X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlre.pod;h=6060e181b752a1f30863d0ef931bfcc5b0f1cb5e;hb=6ef249b908f1fd6caec1b0140c6be9c66f4eb1f2;hp=74a8bd9fd50bdca384c8a45f0ca22fe0397deba7;hpb=76a9873e006cf8f48f57062b2a0dd40b5ed45a95;p=p5sagit%2Fp5-mst-13.2.git
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 74a8bd9..6060e18 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -1,73 +1,111 @@
=head1 NAME
+X
+
+Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
+C<\w>, C<\n>. Unlike some other regular expression languages, there
+are no backslashed symbols that aren't alphanumeric. So anything
+that looks like \\, \(, \), \<, \>, \{, or \} is always
+interpreted as a literal character, not a metacharacter. This was
+once used in a common idiom to disable or quote the special meanings
+of regular expression metacharacters in a string that you want to
+use for a pattern. Simply quote all non-"word" characters:
$pattern =~ s/(\W)/\\$1/g;
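
As a minimal sketch of the quoting idiom above, the following compares the hand-written substitution with the built-in quotemeta() and the inline C<\Q...\E> form. The sample strings and variable names are illustrative assumptions, and the equivalence shown is only claimed for plain ASCII input.

```perl
use strict;
use warnings;

# A string containing several regex metacharacters (hypothetical sample).
my $literal = 'a.b*c (1+1)?';

# The older idiom from the text: backslash every non-"word" character.
(my $by_hand = $literal) =~ s/(\W)/\\$1/g;

# The built-in function performs the same quoting for plain ASCII input.
my $by_builtin = quotemeta($literal);

# \Q...\E applies the same quoting inline, inside the match operator.
my $found        = ('x a.b*c (1+1)? y' =~ /\Q$literal\E/) ? 1 : 0;
my $no_false_hit = ('aXbbbc (11)'      =~ /\Q$literal\E/) ? 0 : 1;

print "by hand:  $by_hand\n";
print "built-in: $by_builtin\n";
```

Quoting inline with C<\Q...\E> avoids keeping a separate escaped copy of the string around.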
-You can also use the built-in quotemeta() function to do this.
-An even easier way to quote metacharacters right in the match operator
-is to say
+(If C modifier is special in that it can only be enabled,
+not disabled, and that its presence anywhere in a pattern has a global
+effect. Thus C<(?-p)> and C<(?-p:...)> are meaningless and will warn
+when executed under C<use warnings>.
+
+The C<code> is not interpolated. Currently,
+the rules to determine where the C<code> ends are somewhat convoluted.
+
+This feature can be used together with the special variable C<$^N> to
+capture the results of submatches in variables without having to keep
+track of the number of nested parentheses. For example:
+
+ $_ = "The brown fox jumps over the lazy dog";
+ /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
+ print "color = $color, animal = $animal\n";
+
+Inside the C<(?{...})> block, C<$_> refers to the string the regular
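+expression is matching against. You can also use C<pos()> to know what is
+the current position of matching within this string.
+
+The C<code> is properly scoped in the following sense: If the assertion
+is backtracked (compare L<"Backtracking">), all changes introduced after
+C<local>ization are undone.
+
+The result of evaluation of C<code> is put into the special variable C<$^R>. This happens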
+immediately, so C<$^R> can be used from other C<(?{ code })> assertions
+inside the same regular expression.
+
+The assignment to C<$^R> above is properly localized, so the old
+value of C<$^R> is restored if the assertion is backtracked; compare
+L<"Backtracking">.
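
The behaviour of C<$^R> described above can be sketched as follows. This is an illustrative example only: the subject string, the values, and the package variable C<$result> are assumptions, and a package variable is used to sidestep the lexical-variable caveat.

```perl
use strict;
use warnings;

# A package variable is used here rather than a lexical.
our $result;

# The first block evaluates to 10, which is stored in $^R immediately.
# The second block reads that value and records $^R + 1.
if ('ab' =~ /a(?{ 10 })b(?{ $result = $^R + 1 })/) {
    print "result = $result\n";
}
```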
+
+Due to an unfortunate implementation issue, the Perl code contained in these
+blocks is treated as a compile time closure that can have seemingly bizarre
+consequences when used with lexically scoped variables inside of subroutines
+or loops. There are various workarounds for this, including simply using
+global variables instead. If you are using this construct and strange results
+occur then check for the use of lexically scoped variables.
+
+For reasons of security, this construct is forbidden if the regular
+expression involves run-time interpolation of variables, unless the
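+perilous C<use re 'eval'> pragma has been used (see L<re>), or the
+variables contain results of the C<qr//> operator.
+
+Because perl's regex engine is not currently re-entrant, interpolated
+code may not invoke the regex engine either directly with C<m//> or C<s///>,
+or indirectly with functions such as C<split>.
+
+In the C<(??{ code })> construct, the C<code> is evaluated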
+at run time, at the moment this subexpression may match. The result
+of evaluation is considered as a regular expression and matched as
+if it were inserted instead of this construct. Note that this means
+that the contents of capture buffers defined inside an eval'ed pattern
+are not available outside of the pattern, and vice versa, there is no
+way for the inner pattern to refer to a capture buffer defined outside.
+Thus,
+
+ ('a' x 100)=~/(??{'(.)' x 100})/
+
+B<will> match, it will B<not> set $1.
+
+The C<code> is not interpolated. As before, the rules to determine
+where the C<code> ends are currently somewhat convoluted.
+
+The following pattern matches a parenthesized group:
+
+ $re = qr{
+ \(
+ (?:
+ (?> [^()]+ ) # Non-parens without backtracking
+ |
+ (??{ $re }) # Group with matching parens
+ )*
+ \)
+ }x;
+
+See also C<(?PARNO)> for a different, more efficient way to accomplish
+the same task.
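
For comparison, here is a hedged sketch of the same balanced-parentheses match written with a recursive group reference, assuming a perl that supports C<(?PARNO)> (5.10 or later). The variable names and the test string are illustrative, not taken from the original text.

```perl
use strict;
use warnings;

# Group 1 wraps the whole balanced group; (?1) recurses into group 1.
my $balanced = qr{
    (                       # start of capture group 1
        \(
        (?:
            (?> [^()]+ )    # non-parens without backtracking
          |
            (?1)            # recurse: a nested balanced group
        )*
        \)
    )
}x;

# In list context the match returns the captured group.
my ($group) = 'f(a(b)c)d(e' =~ $balanced;
print "matched: $group\n" if defined $group;
```

Unlike the C<(??{ $re })> version, this form does not re-evaluate Perl code at each nesting level.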
+
+Because perl's regex engine is not currently re-entrant, delayed
+code may not invoke the regex engine either directly with C<m//> or C<s///>,
+or indirectly with functions such as C<split>.

+PerlThink, the righthand side of an C<s///> is a double-quoted string. C<\1> in
the usual double-quoted string means a control-A. The customary Unix
meaning of C<\1> is kludged in for C<s///>. However, if you get into the habit
of doing that, you get yourself into trouble if you then add an C</e>
modifier.
- s/(\d+)/ \1 + 1 /eg;
+ s/(\d+)/ \1 + 1 /eg; # causes warning under -w
Or if you try to do
s/(\d+)/\1000/;
You can't disambiguate that by saying C<\{1}000>, whereas you can fix it with
-C<${1}000>. Basically, the operation of interpolation should not be confused
+C<${1}000>. The operation of interpolation should not be confused
with the operation of matching a backreference. Certainly they mean two
different things on the I<left> side of the C<s///>.
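
A small sketch of the two replacement forms discussed above, using C<$1> rather than C<\1> on the right-hand side. The input strings and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# ${1}000 is capture group 1 followed by the literal text "000".
(my $padded = 'id 7') =~ s/(\d+)/${1}000/;

# With /e the replacement is Perl code, so $1 is used as a number.
(my $bumped = 'id 7') =~ s/(\d+)/ $1 + 1 /e;

print "$padded\n";
print "$bumped\n";
```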
+
+=head2 Repeated Patterns Matching a Zero-length Substring
+
+B, C etc
+(in these examples C and C and C