X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlre.pod;h=a076d3ad66a8747e1a83a5c3be0d334ce0ba8411;hb=afa74d4282044c64ab152392003f47bb0674abd2;hp=8bd517e2e42e5b079f1a0819ede75593e3fc6d2f;hpb=353c650532037e4006fbdb2176350717f320f7c3;p=p5sagit%2Fp5-mst-13.2.git
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 8bd517e..a076d3a 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -102,7 +102,7 @@ X
=head3 Metacharacters
-The patterns used in Perl pattern matching evolved from the ones supplied in
+The patterns used in Perl pattern matching evolved from those supplied in
the Version 8 regex routines. (The routines are derived
(distantly) from Henry Spencer's freely redistributable reimplementation
of the V8 routines.) See L for
@@ -258,7 +258,7 @@ X X X X
\pP Match P, named property. Use \p{Prop} for longer names.
\PP Match non-P
\X Match eXtended Unicode "combining character sequence",
- equivalent to (?:\PM\pM*)
+ equivalent to (?>\PM\pM*)
\C Match a single C char (octet) even under Unicode.
NOTE: breaks up characters into their UTF-8 bytes,
so you may end up with malformed pieces of UTF-8.
@@ -375,20 +375,60 @@ X X<\p> X<\p{}>
digit IsDigit \d
graph IsGraph
lower IsLower
- print IsPrint
- punct IsPunct
+ print IsPrint (but see [2] below)
+ punct IsPunct (but see [3] below)
space IsSpace
IsSpacePerl \s
upper IsUpper
- word IsWord
+ word IsWord \w
xdigit IsXDigit
For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
+However, the equivalence between C<[[:xxxxx:]]> and C<\p{IsXxxxx}>
+is not exact.
+
+=over 4
+
+=item [1]
+
If the C pragma is not used but the C pragma is, the
classes correlate with the usual isalpha(3) interface (except for
"word" and "blank").
+But if the C or C pragmas are not used and
+the string is not C, then C<[[:xxxxx:]]> (and C<\w>, etc.)
+will not match characters 0x80-0xff; whereas C<\p{IsXxxxx}> will
+force the string to C and can match these characters
+(as Unicode).
+
+=item [2]
+
+C<\p{IsPrint}> matches characters 0x09-0x0d but C<[[:print:]]> does not.
+
+=item [3]
+
+C<[[:punct::]]> matches the following but C<\p{IsPunct}> does not,
+because they are classed as symbols (not punctuation) in Unicode.
+
+=over 4
+
+=item C<$>
+
+Currency symbol
+
+=item C<+> C<< < >> C<=> C<< > >> C<|> C<~>
+
+Mathematical symbols
+
+=item C<^> C<`>
+
+Modifier symbols (accents)
+
+=back
+
+=back
+
The other named classes are:
=over 4
@@ -521,14 +561,14 @@ backreferences.
X<\g{1}> X<\g{-1}> X<\g{name}> X X
In order to provide a safer and easier way to construct patterns using
-backreferences, Perl 5.10 provides the C<\g{N}> notation. The curly
-brackets are optional, however omitting them is less safe as the meaning
-of the pattern can be changed by text (such as digits) following it.
-When N is a positive integer the C<\g{N}> notation is exactly equivalent
-to using normal backreferences. When N is a negative integer then it is
-a relative backreference referring to the previous N'th capturing group.
-When the bracket form is used and N is not an integer, it is treated as a
-reference to a named buffer.
+backreferences, Perl provides the C<\g{N}> notation (starting with perl
+5.10.0). The curly brackets are optional, however omitting them is less
+safe as the meaning of the pattern can be changed by text (such as digits)
+following it. When N is a positive integer the C<\g{N}> notation is
+exactly equivalent to using normal backreferences. When N is a negative
+integer then it is a relative backreference referring to the previous N'th
+capturing group. When the bracket form is used and N is not an integer, it
+is treated as a reference to a named buffer.
Thus C<\g{-1}> refers to the last buffer, C<\g{-2}> refers to the
buffer before that. For example:
@@ -544,7 +584,7 @@ buffer before that. For example:
and would match the same as C(Y) ( (X) \3 \1 )/x>.
-Additionally, as of Perl 5.10 you may use named capture buffers and named
+Additionally, as of Perl 5.10.0 you may use named capture buffers and named
backreferences. The notation is C<< (?...) >> to declare and C<< \k >>
to reference. You may also use apostrophes instead of angle brackets to delimit the
name; and you may use the bracketed C<< \g{name} >> backreference syntax.
@@ -614,7 +654,7 @@ already paid the price. As of 5.005, C<$&> is not so costly as the
other two.
X<$&> X<$`> X<$'>
-As a workaround for this problem, Perl 5.10 introduces C<${^PREMATCH}>,
+As a workaround for this problem, Perl 5.10.0 introduces C<${^PREMATCH}>,
C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&>
and C<$'>, B that they are only guaranteed to be defined after a
successful match that was executed with the C
(preserve) modifier.
@@ -740,7 +780,7 @@ X<(?|)> X
This is the "branch reset" pattern, which has the special property
that the capture buffers are numbered from the same starting point
-in each alternation branch. It is available starting from perl 5.10.
+in each alternation branch. It is available starting from perl 5.10.0.
Capture buffers are numbered from left to right, but inside this
construct the numbering is restarted for each branch.
@@ -761,6 +801,9 @@ which buffer the captured content will be stored.
/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
# 1 2 2 3 2 3 4
+Note: as of Perl 5.10.0, branch resets interfere with the contents of
+the C<%+> hash, that holds named captures. Consider using C<%-> instead.
+
=item Look-Around Assertions
X X X X
@@ -837,9 +880,9 @@ only for fixed-width look-behind.
X<< (?) >> X<(?'NAME')> X X
A named capture buffer. Identical in every respect to normal capturing
-parentheses C<()> but for the additional fact that C<%+> may be used after
-a successful match to refer to a named buffer. See C for more
-details on the C<%+> hash.
+parentheses C<()> but for the additional fact that C<%+> or C<%-> may be
+used after a successful match to refer to a named buffer. See C
+for more details on the C<%+> and C<%-> hashes.
If multiple distinct capture buffers have the same name then the
$+{NAME} will refer to the leftmost defined buffer in the match.
@@ -862,7 +905,7 @@ its Unicode extension (see L),
though it isn't extended by the locale (see L).
B In order to make things easier for programmers with experience
-with the Python or PCRE regex engines, the pattern C<< (?Ppattern) >>
+with the Python or PCRE regex engines, the pattern C<< (?PENAMEEpattern) >>
may be used instead of C<< (?pattern) >>; however this form does not
support the use of single quotes as a delimiter for the name.
@@ -2105,13 +2148,13 @@ part of this regular expression needs to be converted explicitly
=head1 PCRE/Python Support
-As of Perl 5.10 Perl supports several Python/PCRE specific extensions
+As of Perl 5.10.0, Perl supports several Python/PCRE specific extensions
to the regex syntax. While Perl programmers are encouraged to use the
-Perl specific syntax, the following are legal in Perl 5.10:
+Perl specific syntax, the following are also accepted:
=over 4
-=item C<< (?Ppattern) >>
+=item C<< (?PENAMEEpattern) >>
Define a named capture buffer. Equivalent to C<< (?pattern) >>.