X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlre.pod;h=a076d3ad66a8747e1a83a5c3be0d334ce0ba8411;hb=9f2f055aa1e8c86d97b5ea42473ab1747f518f3a;hp=3576364abfe876c1c5e1fb28664628c7f4b1a442;hpb=90a181105a13094bd64be49dfc7995251779b079;p=p5sagit%2Fp5-mst-13.2.git
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 3576364..a076d3a 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -102,7 +102,7 @@ X
=head3 Metacharacters
-The patterns used in Perl pattern matching evolved from the ones supplied in
+The patterns used in Perl pattern matching evolved from those supplied in
the Version 8 regex routines. (The routines are derived
(distantly) from Henry Spencer's freely redistributable reimplementation
of the V8 routines.) See L for
@@ -258,7 +258,7 @@ X X X X
\pP Match P, named property. Use \p{Prop} for longer names.
\PP Match non-P
\X Match eXtended Unicode "combining character sequence",
- equivalent to (?:\PM\pM*)
+ equivalent to (?>\PM\pM*)
\C Match a single C char (octet) even under Unicode.
NOTE: breaks up characters into their UTF-8 bytes,
so you may end up with malformed pieces of UTF-8.
@@ -375,20 +375,60 @@ X X<\p> X<\p{}>
digit IsDigit \d
graph IsGraph
lower IsLower
- print IsPrint
- punct IsPunct
+ print IsPrint (but see [2] below)
+ punct IsPunct (but see [3] below)
space IsSpace
IsSpacePerl \s
upper IsUpper
- word IsWord
+ word IsWord \w
xdigit IsXDigit
For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
+However, the equivalence between C<[[:xxxxx:]]> and C<\p{IsXxxxx}>
+is not exact.
+
+=over 4
+
+=item [1]
+
If the C pragma is not used but the C pragma is, the
classes correlate with the usual isalpha(3) interface (except for
"word" and "blank").
+But if the C or C pragmas are not used and
+the string is not C, then C<[[:xxxxx:]]> (and C<\w>, etc.)
+will not match characters 0x80-0xff; whereas C<\p{IsXxxxx}> will
+force the string to C and can match these characters
+(as Unicode).
+
+=item [2]
+
+C<\p{IsPrint}> matches characters 0x09-0x0d but C<[[:print:]]> does not.
+
+=item [3]
+
+C<[[:punct::]]> matches the following but C<\p{IsPunct}> does not,
+because they are classed as symbols (not punctuation) in Unicode.
+
+=over 4
+
+=item C<$>
+
+Currency symbol
+
+=item C<+> C<< < >> C<=> C<< > >> C<|> C<~>
+
+Mathematical symbols
+
+=item C<^> C<`>
+
+Modifier symbols (accents)
+
+=back
+
+=back
+
The other named classes are:
=over 4
@@ -521,14 +561,14 @@ backreferences.
X<\g{1}> X<\g{-1}> X<\g{name}> X X
In order to provide a safer and easier way to construct patterns using
-backreferences, Perl 5.10 provides the C<\g{N}> notation. The curly
-brackets are optional, however omitting them is less safe as the meaning
-of the pattern can be changed by text (such as digits) following it.
-When N is a positive integer the C<\g{N}> notation is exactly equivalent
-to using normal backreferences. When N is a negative integer then it is
-a relative backreference referring to the previous N'th capturing group.
-When the bracket form is used and N is not an integer, it is treated as a
-reference to a named buffer.
+backreferences, Perl provides the C<\g{N}> notation (starting with perl
+5.10.0). The curly brackets are optional, however omitting them is less
+safe as the meaning of the pattern can be changed by text (such as digits)
+following it. When N is a positive integer the C<\g{N}> notation is
+exactly equivalent to using normal backreferences. When N is a negative
+integer then it is a relative backreference referring to the previous N'th
+capturing group. When the bracket form is used and N is not an integer, it
+is treated as a reference to a named buffer.
Thus C<\g{-1}> refers to the last buffer, C<\g{-2}> refers to the
buffer before that. For example:
@@ -544,7 +584,7 @@ buffer before that. For example:
and would match the same as C(Y) ( (X) \3 \1 )/x>.
-Additionally, as of Perl 5.10 you may use named capture buffers and named
+Additionally, as of Perl 5.10.0 you may use named capture buffers and named
backreferences. The notation is C<< (?...) >> to declare and C<< \k >>
to reference. You may also use apostrophes instead of angle brackets to delimit the
name; and you may use the bracketed C<< \g{name} >> backreference syntax.
@@ -614,7 +654,7 @@ already paid the price. As of 5.005, C<$&> is not so costly as the
other two.
X<$&> X<$`> X<$'>
-As a workaround for this problem, Perl 5.10 introduces C<${^PREMATCH}>,
+As a workaround for this problem, Perl 5.10.0 introduces C<${^PREMATCH}>,
C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&>
and C<$'>, B that they are only guaranteed to be defined after a
successful match that was executed with the C
(preserve) modifier.
@@ -740,7 +780,7 @@ X<(?|)> X
This is the "branch reset" pattern, which has the special property
that the capture buffers are numbered from the same starting point
-in each alternation branch. It is available starting from perl 5.10.
+in each alternation branch. It is available starting from perl 5.10.0.
Capture buffers are numbered from left to right, but inside this
construct the numbering is restarted for each branch.
@@ -2108,9 +2148,9 @@ part of this regular expression needs to be converted explicitly
=head1 PCRE/Python Support
-As of Perl 5.10 Perl supports several Python/PCRE specific extensions
+As of Perl 5.10.0, Perl supports several Python/PCRE specific extensions
to the regex syntax. While Perl programmers are encouraged to use the
-Perl specific syntax, the following are legal in Perl 5.10:
+Perl specific syntax, the following are also accepted:
=over 4