{n,m} Match at least n but not more than m times
(If a curly bracket occurs in any other context, it is treated
-as a regular character.) The "*" modifier is equivalent to C<{0,}>, the "+"
+as a regular character. In particular, the lower bound
+is not optional.) The "*" modifier is equivalent to C<{0,}>, the "+"
modifier to C<{1,}>, and the "?" modifier to C<{0,1}>. n and m are limited
to integral values less than a preset limit defined when perl is built.
This is usually 32766 on the most common platforms. The actual limit can
as endpoints of a range, that's not a range, the "-" is understood
literally. If Unicode is in effect, C<\s> matches also "\x{85}",
"\x{2028}, and "\x{2029}", see L<perlunicode> for more details about
-C<\pP>, C<\PP>, and C<\X>, and L<perluniintro> about Unicode in
-general.
+C<\pP>, C<\PP>, and C<\X>, and L<perluniintro> about Unicode in general.
+You can define your own C<\p> and C<\P> propreties, see L<perlunicode>.
The POSIX character class syntax
several patterns that you want to match against consequent substrings
of your string, see the previous reference. The actual location
where C<\G> will match can also be influenced by using C<pos()> as
-an lvalue. Currently C<\G> only works when used at the
-beginning of the pattern. See L<perlfunc/pos>.
+an lvalue: see L<perlfunc/pos>. Currently C<\G> is only fully
+supported when anchored to the start of the pattern; while it
+is permitted to use it elsewhere, as in C</(?<=\G..)./g>, some
+such uses (C</.\G/g>, for example) currently cause problems, and
+it is recommended that you avoid such usage for now.
The bracketing construct C<( ... )> creates capture buffers. To
refer to the digit'th buffer use \<digit> within the
match. C<$+> returns whatever the last bracket match matched.
C<$&> returns the entire matched string. (At one point C<$0> did
also, but now it returns the name of the program.) C<$`> returns
-everything before the matched string. And C<$'> returns everything
-after the matched string.
+everything before the matched string. C<$'> returns everything
+after the matched string. And C<$^N> contains whatever was matched by
+the most-recently closed group (submatch). C<$^N> can be used in
+extended patterns (see below), for example to assign a submatch to a
+variable.
The numbered variables ($1, $2, $3, etc.) and the related punctuation
-set (C<$+>, C<$&>, C<$`>, and C<$'>) are all dynamically scoped
+set (C<$+>, C<$&>, C<$`>, C<$'>, and C<$^N>) are all dynamically scoped
until the end of the enclosing block or until the next successful
match, whichever comes first. (See L<perlsyn/"Compound Statements">.)
always succeeds, and its C<code> is not interpolated. Currently,
the rules to determine where the C<code> ends are somewhat convoluted.
+This feature can be used together with the special variable C<$^N> to
+capture the results of submatches in variables without having to keep
+track of the number of nested parentheses. For example:
+
+ $_ = "The brown fox jumps over the lazy dog";
+ /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
+ print "color = $color, animal = $animal\n";
+
The C<code> is properly scoped in the following sense: If the assertion
is backtracked (compare L<"Backtracking">), all changes introduced after
C<local>ization are undone, so that
know which variety of success you will achieve.
When using look-ahead assertions and negations, this can all get even
-tricker. Imagine you'd like to find a sequence of non-digits not
+trickier. Imagine you'd like to find a sequence of non-digits not
followed by "123". You might try to write that as
$_ = "ABC123";