X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlre.pod;h=a076d3ad66a8747e1a83a5c3be0d334ce0ba8411;hb=9f2f055aa1e8c86d97b5ea42473ab1747f518f3a;hp=3576364abfe876c1c5e1fb28664628c7f4b1a442;hpb=90a181105a13094bd64be49dfc7995251779b079;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlre.pod b/pod/perlre.pod index 3576364..a076d3a 100644 --- a/pod/perlre.pod +++ b/pod/perlre.pod @@ -102,7 +102,7 @@ X =head3 Metacharacters -The patterns used in Perl pattern matching evolved from the ones supplied in +The patterns used in Perl pattern matching evolved from those supplied in the Version 8 regex routines. (The routines are derived (distantly) from Henry Spencer's freely redistributable reimplementation of the V8 routines.) See L for @@ -258,7 +258,7 @@ X X X X \pP Match P, named property. Use \p{Prop} for longer names. \PP Match non-P \X Match eXtended Unicode "combining character sequence", - equivalent to (?:\PM\pM*) + equivalent to (?>\PM\pM*) \C Match a single C char (octet) even under Unicode. NOTE: breaks up characters into their UTF-8 bytes, so you may end up with malformed pieces of UTF-8. @@ -375,20 +375,60 @@ X X<\p> X<\p{}> digit IsDigit \d graph IsGraph lower IsLower - print IsPrint - punct IsPunct + print IsPrint (but see [2] below) + punct IsPunct (but see [3] below) space IsSpace IsSpacePerl \s upper IsUpper - word IsWord + word IsWord \w xdigit IsXDigit For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent. +However, the equivalence between C<[[:xxxxx:]]> and C<\p{IsXxxxx}> +is not exact. + +=over 4 + +=item [1] + If the C pragma is not used but the C pragma is, the classes correlate with the usual isalpha(3) interface (except for "word" and "blank"). +But if the C or C pragmas are not used and +the string is not C, then C<[[:xxxxx:]]> (and C<\w>, etc.) +will not match characters 0x80-0xff; whereas C<\p{IsXxxxx}> will +force the string to C and can match these characters +(as Unicode). 
+ +=item [2] + +C<\p{IsPrint}> matches characters 0x09-0x0d but C<[[:print:]]> does not. + +=item [3] + +C<[[:punct::]]> matches the following but C<\p{IsPunct}> does not, +because they are classed as symbols (not punctuation) in Unicode. + +=over 4 + +=item C<$> + +Currency symbol + +=item C<+> C<< < >> C<=> C<< > >> C<|> C<~> + +Mathematical symbols + +=item C<^> C<`> + +Modifier symbols (accents) + +=back + +=back + The other named classes are: =over 4 @@ -521,14 +561,14 @@ backreferences. X<\g{1}> X<\g{-1}> X<\g{name}> X X In order to provide a safer and easier way to construct patterns using -backreferences, Perl 5.10 provides the C<\g{N}> notation. The curly -brackets are optional, however omitting them is less safe as the meaning -of the pattern can be changed by text (such as digits) following it. -When N is a positive integer the C<\g{N}> notation is exactly equivalent -to using normal backreferences. When N is a negative integer then it is -a relative backreference referring to the previous N'th capturing group. -When the bracket form is used and N is not an integer, it is treated as a -reference to a named buffer. +backreferences, Perl provides the C<\g{N}> notation (starting with perl +5.10.0). The curly brackets are optional, however omitting them is less +safe as the meaning of the pattern can be changed by text (such as digits) +following it. When N is a positive integer the C<\g{N}> notation is +exactly equivalent to using normal backreferences. When N is a negative +integer then it is a relative backreference referring to the previous N'th +capturing group. When the bracket form is used and N is not an integer, it +is treated as a reference to a named buffer. Thus C<\g{-1}> refers to the last buffer, C<\g{-2}> refers to the buffer before that. For example: @@ -544,7 +584,7 @@ buffer before that. For example: and would match the same as C. 
-Additionally, as of Perl 5.10 you may use named capture buffers and named +Additionally, as of Perl 5.10.0 you may use named capture buffers and named backreferences. The notation is C<< (?...) >> to declare and C<< \k >> to reference. You may also use apostrophes instead of angle brackets to delimit the name; and you may use the bracketed C<< \g{name} >> backreference syntax. @@ -614,7 +654,7 @@ already paid the price. As of 5.005, C<$&> is not so costly as the other two. X<$&> X<$`> X<$'> -As a workaround for this problem, Perl 5.10 introduces C<${^PREMATCH}>, +As a workaround for this problem, Perl 5.10.0 introduces C<${^PREMATCH}>, C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&> and C<$'>, B that they are only guaranteed to be defined after a successful match that was executed with the C</p> (preserve) modifier. @@ -740,7 +780,7 @@ X<(?|)> X This is the "branch reset" pattern, which has the special property that the capture buffers are numbered from the same starting point -in each alternation branch. It is available starting from perl 5.10. +in each alternation branch. It is available starting from perl 5.10.0. Capture buffers are numbered from left to right, but inside this construct the numbering is restarted for each branch. @@ -2108,9 +2148,9 @@ part of this regular expression needs to be converted explicitly =head1 PCRE/Python Support -As of Perl 5.10 Perl supports several Python/PCRE specific extensions +As of Perl 5.10.0, Perl supports several Python/PCRE specific extensions to the regex syntax. While Perl programmers are encouraged to use the -Perl specific syntax, the following are legal in Perl 5.10: +Perl specific syntax, the following are also accepted: =over 4