X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlre.pod;h=a076d3ad66a8747e1a83a5c3be0d334ce0ba8411;hb=9f2f055aa1e8c86d97b5ea42473ab1747f518f3a;hp=56df3eb48633ec6295ed57a873919e9b428ea103;hpb=e2e6bec7edbce640c3881df6572504b616436a5e;p=p5sagit%2Fp5-mst-13.2.git
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 56df3eb..a076d3a 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -102,7 +102,7 @@ X
=head3 Metacharacters
-The patterns used in Perl pattern matching evolved from the ones supplied in
+The patterns used in Perl pattern matching evolved from those supplied in
the Version 8 regex routines. (The routines are derived
(distantly) from Henry Spencer's freely redistributable reimplementation
of the V8 routines.) See L for
@@ -223,9 +223,9 @@ X<\0> X<\c> X<\N> X<\x>
\e escape (think troff) (ESC)
\033 octal char (example: ESC)
\x1B hex char (example: ESC)
- \x{263a} wide hex char (example: Unicode SMILEY)
+ \x{263a} long hex char (example: Unicode SMILEY)
\cK control char (example: VT)
- \N{name} named char
+ \N{name} named Unicode character
\l lowercase next char (think vi)
\u uppercase next char (think vi)
\L lowercase till \E (think vi)
@@ -258,7 +258,7 @@ X X X X
\pP Match P, named property. Use \p{Prop} for longer names.
\PP Match non-P
\X Match eXtended Unicode "combining character sequence",
- equivalent to (?:\PM\pM*)
+ equivalent to (?>\PM\pM*)
\C Match a single C char (octet) even under Unicode.
NOTE: breaks up characters into their UTF-8 bytes,
so you may end up with malformed pieces of UTF-8.
@@ -270,9 +270,6 @@ X X X X
optionally be wrapped in curly brackets for safer parsing.
\g{name} Named backreference
\k Named backreference
- \N{name} Named Unicode character, or Unicode escape
- \x12 Hexadecimal escape sequence
- \x{1234} Long hexadecimal escape sequence
\K Keep the stuff left of the \K, don't include it in $&
\v Vertical whitespace
\V Not vertical whitespace
@@ -296,7 +293,7 @@ in general.
X<\w> X<\W> X
C<\R> will atomically match a linebreak, including the network line-ending
-"\x0D\x0A". Specifically, X<\R> is exactly equivelent to
+"\x0D\x0A". Specifically, C<\R> is exactly equivalent to
(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
@@ -378,21 +375,61 @@ X X<\p> X<\p{}>
digit IsDigit \d
graph IsGraph
lower IsLower
- print IsPrint
- punct IsPunct
+ print IsPrint (but see [2] below)
+ punct IsPunct (but see [3] below)
space IsSpace
IsSpacePerl \s
upper IsUpper
- word IsWord
+ word IsWord \w
xdigit IsXDigit
For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
+However, the equivalence between C<[[:xxxxx:]]> and C<\p{IsXxxxx}>
+is not exact.
+
+=over 4
+
+=item [1]
+
If the C<utf8> pragma is not used but the C<locale> pragma is, the
classes correlate with the usual isalpha(3) interface (except for
"word" and "blank").
-The assumedly non-obviously named classes are:
+But if the C<locale> or C<encoding> pragmas are not used and
+the string is not C<utf8>, then C<[[:xxxxx:]]> (and C<\w>, etc.)
+will not match characters 0x80-0xff; whereas C<\p{IsXxxxx}> will
+force the string to C<utf8> and can match these characters
+(as Unicode).
+
+=item [2]
+
+C<\p{IsPrint}> matches characters 0x09-0x0d but C<[[:print:]]> does not.
+
+=item [3]
+
+C<[[:punct:]]> matches the following but C<\p{IsPunct}> does not,
+because they are classed as symbols (not punctuation) in Unicode.
+
+=over 4
+
+=item C<$>
+
+Currency symbol
+
+=item C<+> C<< < >> C<=> C<< > >> C<|> C<~>
+
+Mathematical symbols
+
+=item C<^> C<`>
+
+Modifier symbols (accents)
+
+=back
+
+=back
+
+The other named classes are:
=over 4
@@ -524,14 +561,14 @@ backreferences.
X<\g{1}> X<\g{-1}> X<\g{name}> X X
In order to provide a safer and easier way to construct patterns using
-backreferences, Perl 5.10 provides the C<\g{N}> notation. The curly
-brackets are optional, however omitting them is less safe as the meaning
-of the pattern can be changed by text (such as digits) following it.
-When N is a positive integer the C<\g{N}> notation is exactly equivalent
-to using normal backreferences. When N is a negative integer then it is
-a relative backreference referring to the previous N'th capturing group.
-When the bracket form is used and N is not an integer, it is treated as a
-reference to a named buffer.
+backreferences, Perl provides the C<\g{N}> notation (starting with Perl
+5.10.0). The curly brackets are optional; however, omitting them is less
+safe, as the meaning of the pattern can be changed by text (such as digits)
+following it. When N is a positive integer, the C<\g{N}> notation is
+exactly equivalent to using normal backreferences. When N is a negative
+integer, it is a relative backreference referring to the previous N'th
+capturing group. When the bracket form is used and N is not an integer, it
+is treated as a reference to a named buffer.
Thus C<\g{-1}> refers to the last buffer, C<\g{-2}> refers to the
buffer before that. For example:
@@ -547,7 +584,7 @@ buffer before that. For example:
and would match the same as C</(Y) ( (X) \3 \1 )/x>.
-Additionally, as of Perl 5.10 you may use named capture buffers and named
+Additionally, as of Perl 5.10.0 you may use named capture buffers and named
backreferences. The notation is C<< (?<name>...) >> to declare and C<< \k<name> >>
to reference. You may also use apostrophes instead of angle brackets to delimit the
name; and you may use the bracketed C<< \g{name} >> backreference syntax.
@@ -558,7 +595,7 @@ and C<< \k >> refer to the leftmost defined group. (Thus it's possible
to do things with named capture buffers that would otherwise require C<(??{})>
code to accomplish.)
X X
-X<%+> X<$+{name}> X<\k{name}>
+X<%+> X<$+{name}> X<< \k<name> >>
Examples:
@@ -617,7 +654,7 @@ already paid the price. As of 5.005, C<$&> is not so costly as the
other two.
X<$&> X<$`> X<$'>
-As a workaround for this problem, Perl 5.10 introduces C<${^PREMATCH}>,
+As a workaround for this problem, Perl 5.10.0 introduces C<${^PREMATCH}>,
C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&>
and C<$'>, B<except> that they are only guaranteed to be defined after a
successful match that was executed with the C</p>
(preserve) modifier.
@@ -743,7 +780,7 @@ X<(?|)> X
This is the "branch reset" pattern, which has the special property
that the capture buffers are numbered from the same starting point
-in each alternation branch. It is available starting from perl 5.10.
+in each alternation branch. It is available starting from perl 5.10.0.
Capture buffers are numbered from left to right, but inside this
construct the numbering is restarted for each branch.
@@ -764,6 +801,9 @@ which buffer the captured content will be stored.
/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
# 1 2 2 3 2 3 4
+Note: as of Perl 5.10.0, branch resets interfere with the contents of
+the C<%+> hash, which holds named captures. Consider using C<%-> instead.
+
=item Look-Around Assertions
X X X X
@@ -840,9 +880,9 @@ only for fixed-width look-behind.
X<< (?) >> X<(?'NAME')> X X
A named capture buffer. Identical in every respect to normal capturing
-parentheses C<()> but for the additional fact that C<%+> may be used after
-a successful match to refer to a named buffer. See C<perlvar> for more
-details on the C<%+> hash.
+parentheses C<()> but for the additional fact that C<%+> or C<%-> may be
+used after a successful match to refer to a named buffer. See C<perlvar>
+for more details on the C<%+> and C<%-> hashes.
If multiple distinct capture buffers have the same name then the
$+{NAME} will refer to the leftmost defined buffer in the match.
@@ -867,8 +907,7 @@ though it isn't extended by the locale (see L).
B<NOTE:> In order to make things easier for programmers with experience
with the Python or PCRE regex engines, the pattern C<< (?P<NAME>pattern) >>
may be used instead of C<< (?<NAME>pattern) >>; however this form does not
-support the use of single quotes as a delimiter for the name. This is
-only available in Perl 5.10 or later.
+support the use of single quotes as a delimiter for the name.
=item C<< \k<NAME> >>
@@ -886,7 +925,7 @@ Both forms are equivalent.
B<NOTE:> In order to make things easier for programmers with experience
with the Python or PCRE regex engines, the pattern C<< (?P=NAME) >>
-may be used instead of C<< \k<NAME> >> in Perl 5.10 or later.
+may be used instead of C<< \k<NAME> >>.
=item C<(?{ code })>
X<(?{})> X X X
@@ -1111,7 +1150,7 @@ pattern.
B<NOTE:> In order to make things easier for programmers with experience
with the Python or PCRE regex engines the pattern C<< (?P>NAME) >>
-may be used instead of C<< (?&NAME) >> in Perl 5.10 or later.
+may be used instead of C<< (?&NAME) >>.
=item C<(?(condition)yes-pattern|no-pattern)>
X<(?()>
@@ -1390,7 +1429,7 @@ If we add a C<(*PRUNE)> before the count like the following
print "Count=$count\n";
we prevent backtracking and find the count of the longest matching
-at each matching startpoint like so:
+at each matching starting point like so:
aaab
aab
@@ -1436,7 +1475,7 @@ outputs
Count=2
Once the 'aaab' at the start of the string has matched, and the C<(*SKIP)>
-executed, the next startpoint will be where the cursor was when the
+executed, the next starting point will be where the cursor was when the
C<(*SKIP)> was executed.
=item C<(*MARK:NAME)> C<(*:NAME)>
@@ -1475,7 +1514,7 @@ As a shortcut C<(*MARK:NAME)> can be written C<(*:NAME)>.
=item C<(*THEN)> C<(*THEN:NAME)>
-This is similar to the "cut group" operator C<::> from Perl6. Like
+This is similar to the "cut group" operator C<::> from Perl 6. Like
C<(*PRUNE)>, this verb always matches, and when backtracked into on
failure, it causes the regex engine to try the next alternation in the
innermost enclosing group (capturing or otherwise).
@@ -1509,7 +1548,7 @@ backtrack and try C; but the C<(*PRUNE)> verb will simply fail.
=item C<(*COMMIT)>
X<(*COMMIT)>
-This is the Perl6 "commit pattern" C<< <commit> >> or C<:::>. It's a
+This is the Perl 6 "commit pattern" C<< <commit> >> or C<:::>. It's a
zero-width pattern similar to C<(*SKIP)>, except that when backtracked
into on failure it causes the match to fail outright. No further attempts
to find a valid match by advancing the start pointer will occur again.
@@ -2109,9 +2148,9 @@ part of this regular expression needs to be converted explicitly
=head1 PCRE/Python Support
-As of Perl 5.10 Perl supports several Python/PCRE specific extensions
+As of Perl 5.10.0, Perl supports several Python/PCRE specific extensions
to the regex syntax. While Perl programmers are encouraged to use the
-Perl specific syntax, the following are legal in Perl 5.10:
+Perl specific syntax, the following are also accepted:
=over 4