\D Match a non-digit character
\pP Match P, named property. Use \p{Prop} for longer names.
\PP Match non-P
- \X Match eXtended Unicode "combining character sequence",
- equivalent to (?>\PM\pM*)
+ \X Match Unicode "eXtended grapheme cluster"
\C Match a single C char (octet) even under Unicode.
NOTE: breaks up characters into their UTF-8 bytes,
so you may end up with malformed pieces of UTF-8.
backreference only if at least 11 left parentheses have opened
before it. And so on. \1 through \9 are always interpreted as
backreferences.
+If the bracketing group did not match, the associated backreference won't
+match either. (This can happen if the bracketing group is optional, or
+in a different branch of an alternation.)
X<\g{1}> X<\g{-1}> X<\g{name}> X<relative backreference> X<named backreference>
In order to provide a safer and easier way to construct patterns using
/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
# 1 2 2 3 2 3 4
-Note: as of Perl 5.10.0, branch resets interfere with the contents of
-the C<%+> hash, that holds named captures. Consider using C<%-> instead.
+Be careful when using the branch reset pattern in combination with
+named captures. Named captures are implemented as being aliases to
+numbered buffers holding the captures, and that interferes with the
+implementation of the branch reset pattern. If you are using named
+captures in a branch reset pattern, it's best to use the same names,
+in the same order, in each of the alternations:
+
+ /(?| (?<a> x ) (?<b> y )
+ | (?<a> z ) (?<b> w )) /x
+
+Not doing so may lead to surprises:
+
+ "12" =~ /(?| (?<a> \d+ ) | (?<b> \D+))/x;
+ say $+ {a}; # Prints '12'
+ say $+ {b}; # *Also* prints '12'.
+
+The problem here is that both the buffer named C<< a >> and the buffer
+named C<< b >> are aliases for the buffer belonging to C<< $1 >>.
=item Look-Around Assertions
X<look-around assertion> X<lookaround assertion> X<look-around> X<lookaround>
See also C<(?PARNO)> for a different, more efficient way to accomplish
the same task.
+For reasons of security, this construct is forbidden if the regular
+expression involves run-time interpolation of variables, unless the
+perilous C<use re 'eval'> pragma has been used (see L<re>), or the
+variables contain results of C<qr//> operator (see
+L<perlop/"qr/STRING/imosx">).
+
Because perl's regex engine is not currently re-entrant, delayed
code may not invoke the regex engine either directly with C<m//> or C<s///>),
or indirectly with functions such as C<split>.
forbidden.
Any pattern containing a special backtracking verb that allows an argument
-has the special behaviour that when executed it sets the current packages'
+has the special behaviour that when executed it sets the current package's
C<$REGERROR> and C<$REGMARK> variables. When doing so the following
rules apply:
when a certain part of the pattern has been successfully matched. This
mark may be given a name. A later C<(*SKIP)> pattern will then skip
forward to that point if backtracked into on failure. Any number of
-C<(*MARK)> patterns are allowed, and the NAME portion is optional and may
-be duplicated.
+C<(*MARK)> patterns are allowed, and the NAME portion may be duplicated.
In addition to interacting with the C<(*SKIP)> pattern, C<(*MARK:NAME)>
can be used to "label" a pattern branch, so that after matching, the