From: Rafael Garcia-Suarez
Date: Tue, 7 Aug 2007 09:41:31 +0000 (+0000)
Subject: Documentation updates for new regexp features
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=64c5a5665d9d2e73526d93f8e1b8e0488ead3228;p=p5sagit%2Fp5-mst-13.2.git

Documentation updates for new regexp features

p4raw-id: //depot/perl@31683
---

diff --git a/pod/perlop.pod b/pod/perlop.pod
index e02ad41..355e8aa 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1056,9 +1056,9 @@ This operator quotes (and possibly compiles) its I<STRING> as a regular
 expression. I<STRING> is interpolated the same way as I<PATTERN>
 in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation
 is done. Returns a Perl value which may be used instead of the
-corresponding C expression. The returned value is a
+corresponding C expression. The returned value is a
 normalized version of the original pattern. It magically differs from
-a string containing the same characters: ref(qr/x/) returns "Regexp",
+a string containing the same characters: C<ref(qr/x/)> returns "Regexp",
 even though dereferencing the result returns undef.
 
 For example,
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 88023ef..07ff022 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -558,7 +558,7 @@ and C<< \k<NAME> >> refer to the leftmost defined group. (Thus it's possible
 to do things with named capture buffers that would otherwise require C<(??{})>
 code to accomplish.)
 X X
-X<%+> X<$+{name}> X<\k{name}>
+X<%+> X<$+{name}> X<< \k<name> >>
 
 Examples:
 
@@ -865,10 +865,9 @@ its Unicode extension (see L<utf8>), though it isn't extended by the
 locale (see L<perllocale>).
 
 B<NOTE:> In order to make things easier for programmers with experience
-with the Python or PCRE regex engines, the pattern C<< (?PE<lt>NAMEE<gt>pattern) >>
+with the Python or PCRE regex engines, the pattern C<< (?P<NAME>pattern) >>
 may be used instead of C<< (?<NAME>pattern) >>; however this form does not
-support the use of single quotes as a delimiter for the name. This is
-only available in Perl 5.10 or later.
+support the use of single quotes as a delimiter for the name.
 
 =item C<< \k<NAME> >>
 
@@ -886,7 +885,7 @@ Both forms are equivalent.
 
 B<NOTE:> In order to make things easier for programmers with experience
 with the Python or PCRE regex engines, the pattern C<< (?P=NAME) >>
-may be used instead of C<< \k<NAME> >> in Perl 5.10 or later.
+may be used instead of C<< \k<NAME> >>.
 
 =item C<(?{ code })>
 X<(?{})> X X X
@@ -1111,7 +1110,7 @@ pattern.
 
 B<NOTE:> In order to make things easier for programmers with experience
 with the Python or PCRE regex engines the pattern C<< (?P>NAME) >>
-may be used instead of C<< (?&NAME) >> in Perl 5.10 or later.
+may be used instead of C<< (?&NAME) >>.
 
 =item C<(?(condition)yes-pattern|no-pattern)>
 X<(?()>
@@ -2115,7 +2114,7 @@ Perl specific syntax, the following are legal in Perl 5.10:
 
 =over 4
 
-=item C<< (?PE<lt>NAMEE<gt>pattern) >>
+=item C<< (?P<NAME>pattern) >>
 
 Define a named capture buffer. Equivalent to C<< (?<NAME>pattern) >>.
 
diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index a5533e3..b9fb3b0 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -36,7 +36,7 @@ applying the given options.
 
 If 'pattern' is an empty string, the last I<successfully> matched
 regex is used. Delimiters other than '/' may be used for both this
-operator and the following ones. The leading C<m> can be ommitted
+operator and the following ones. The leading C<m> can be omitted
 if the delimiter is '/'.
 
 C lets you store a regex in a variable,
@@ -69,7 +69,13 @@ delimiters can be used. Must be reset with reset().
    (...)          Groups subexpressions for capturing to $1, $2...
    (?:...)        Groups subexpressions without capturing (cluster)
    |              Matches either the subexpression preceding or following it
-   \1, \2 ...     Matches the text from the Nth group
+   \1, \2, \3 ...           Matches the text from the Nth group
+   \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
+   \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
+   \g{name}                 Named backreference
+   \k<name>                 Named backreference
+   \k'name'                 Named backreference
+   (?P=name)                Named backreference (python syntax)
 
 =head2 ESCAPE SEQUENCES
 
@@ -167,34 +173,59 @@ All are zero-width assertions.
    \z  Match absolute string end
    \G  Match where previous m//g left off
 
+   \K  Keep the stuff left of the \K, don't include it in $&
+
 =head2 QUANTIFIERS
 
 Quantifiers are greedy by default -- match the B<longest> leftmost.
 
-   Maximal Minimal Allowed range
-   ------- ------- -------------
-   {n,m}   {n,m}?  Must occur at least n times but no more than m times
-   {n,}    {n,}?   Must occur at least n times
-   {n}     {n}?    Must occur exactly n times
-   *       *?      0 or more times (same as {0,})
-   +       +?      1 or more times (same as {1,})
-   ?       ??      0 or 1 time (same as {0,1})
+   Maximal Minimal Possessive Allowed range
+   ------- ------- ---------- -------------
+   {n,m}   {n,m}?  {n,m}+     Must occur at least n times
+                              but no more than m times
+   {n,}    {n,}?   {n,}+      Must occur at least n times
+   {n}     {n}?    {n}+       Must occur exactly n times
+   *       *?      *+         0 or more times (same as {0,})
+   +       +?      ++         1 or more times (same as {1,})
+   ?       ??      ?+         0 or 1 time (same as {0,1})
+
+The possessive forms (new in Perl 5.10) prevent backtracking: what gets
+matched by a pattern with a possessive quantifier will not be backtracked
+into, even if that causes the whole match to fail.
 
 There is no quantifier {,n} -- that gets understood as a literal string.
 
 =head2 EXTENDED CONSTRUCTS
 
-   (?#text)         A comment
-   (?imxs-imsx:...) Enable/disable option (as per m// modifiers)
-   (?=...)          Zero-width positive lookahead assertion
-   (?!...)          Zero-width negative lookahead assertion
-   (?<=...)         Zero-width positive lookbehind assertion
-   (?<!...)         Zero-width negative lookbehind assertion
-   (?>...)          Grab what we can, prohibit backtracking
-   (?{ code })      Embedded code, return value becomes $^R
-   (??{ code })     Dynamic regex, return value used as regex
-   (?(cond)yes|no)  cond being integer corresponding to capturing parens
-   (?(cond)yes)        or a lookaround/eval zero-width assertion
+   (?#text)          A comment
+   (?:...)           Groups subexpressions without capturing (cluster)
+   (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
+   (?=...)           Zero-width positive lookahead assertion
+   (?!...)           Zero-width negative lookahead assertion
+   (?<=...)          Zero-width positive lookbehind assertion
+   (?<!...)          Zero-width negative lookbehind assertion
+   (?>...)           Grab what we can, prohibit backtracking
+   (?|...)           Branch reset
+   (?<name>...)      Named capture
+   (?'name'...)      Named capture
+   (?P<name>...)     Named capture (python syntax)
+   (?{ code })       Embedded code, return value becomes $^R
+   (??{ code })      Dynamic regex, return value used as regex
+   (?N)              Recurse into subpattern number N
+   (?-N), (?+N)      Recurse into Nth previous/next subpattern
+   (?R), (?0)        Recurse at the beginning of the whole pattern
+   (?&name)          Recurse into a named subpattern
+   (?P>name)         Recurse into a named subpattern (python syntax)
+   (?(cond)yes|no)
+   (?(cond)yes)      Conditional expression, where "cond" can be:
+                     (N)       subpattern N has matched something
+                     (<name>)  named subpattern has matched something
+                     ('name')  named subpattern has matched something
+                     (?{code}) code condition
+                     (R)       true if recursing
+                     (RN)      true if recursing into Nth subpattern
+                     (R&name)  true if recursing into named subpattern
+                     (DEFINE)  always false, no no-pattern allowed
 
 =head2 VARIABLES
 
@@ -209,7 +240,7 @@ There is no quantifier {,n} -- that gets understood as a literal string.
    ${^POSTMATCH} Everything after to matched string
 
 The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
-within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
+within your program. Consult L<perlvar> for C<@->
 to see equivalent expressions that won't cause slow down.
 See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
 can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
@@ -253,7 +284,7 @@ certain characters like the German "sharp s" there is a difference.
 
 =head1 AUTHOR
 
-Iain Truskett.
+Iain Truskett. Updated by the Perl 5 Porters.
 
 This document may be distributed under the same terms as Perl itself.
 
@@ -291,6 +322,14 @@ L<perlfaq6> for FAQs on regular expressions.
 
 =item *
 
+L<perlrebackslash> for a reference on backslash sequences.
+
+=item *
+
+L<perlrecharclass> for a reference on character classes.
+
+=item *
+
 The L<re> module to alter behaviour and aid debugging.
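
Illustration (not part of the commit): the perlreref entries added above for named captures, named and relative backreferences, possessive quantifiers and \K can be tried with a short script. This is a minimal sketch assuming Perl 5.10 or later; the sample strings and the variable names (%parts, $quoted_ok, $doubled_ok, $greedy_ok, $possessive_ok, $price) are made up for the example and do not come from the patched documentation.

```perl
use strict;
use warnings;
use 5.010;    # the constructs below need Perl 5.10 or later

# Named capture: (?<name>...) stores its match in %+ under that name.
my $date = '2007-08-07';
my %parts;
if ($date =~ /^(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)$/) {
    %parts = %+;    # copy, because %+ reflects only the last successful match
}

# Named backreference: \k<q> must match the same text that group "q" captured.
my $quoted_ok = q{say "hello"} =~ /(?<q>["'])\w+\k<q>/ ? 1 : 0;

# Relative backreference: \g{-1} refers to the closest preceding group.
my $doubled_ok = 'hello' =~ /(\w)\g{-1}/ ? 1 : 0;    # matches "ll"

# Possessive quantifier: a++ never gives back what it matched, so the
# trailing literal "a" cannot match. The greedy a+ backtracks and succeeds.
my $greedy_ok     = 'aaa' =~ /^a+a$/  ? 1 : 0;
my $possessive_ok = 'aaa' =~ /^a++a$/ ? 1 : 0;

# \K: text matched to the left of \K is required but left out of $&,
# so only the digits are replaced.
(my $price = 'price: 100') =~ s/price: \K\d+/200/;

say "$parts{year}/$parts{month}/$parts{day}";
say "quoted=$quoted_ok doubled=$doubled_ok";
say "greedy=$greedy_ok possessive=$possessive_ok";
say $price;
```

Run as written, the script should print 2007/08/07, then quoted=1 doubled=1, then greedy=1 possessive=0, then price: 200. The possessive case is the one that fails on purpose: it shows the "will not be backtracked into, even if that causes the whole match to fail" behaviour described in the new QUANTIFIERS paragraph.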