Functions documented in the Camel to default to $_ now in
fact do, and all those that do are so documented in L<perlfunc>.
-=item C<m//g> does not trigger a pos() reset on failure
-
-The C<m//g> match iteration construct used to reset the iteration
-when it failed to match (so that the next C<m//g> match would start at
-the beginning of the string). You now have to explicitly do a
-C<pos $str = 0;> to reset the "last match" position, or modify the
-string in some way. This change makes it practical to chain C<m//g>
-matches together in conjunction with ordinary matches using the C<\G>
-zero-width assertion. See L<perlop> and L<perlre>.
+=item C<m//g> does not reset search position on failure
+
+The C<m//g> match iteration construct used to reset its target string's
+search position (which is visible through the C<pos> operator) when a
+match failed; as a result, the next C<m//g> match would start at the
+beginning of the string. With Perl 5.004, the search position must be
+reset explicitly, as with C<pos $str = 0;>, or by modifying the target
+string. This change in Perl makes it possible to chain matches together
+in conjunction with the C<\G> zero-width assertion. See L<perlop> and
+L<perlre>.
+
+Here is an illustration of what it takes to get the old behavior:
+
+    for ( qw(this and that are not what you think you got) ) {
+        while ( /(\w*t\w*)/g ) { print "t word is: $1\n" }
+        pos = 0;  # REQUIRED FOR 5.004
+        while ( /(\w*a\w*)/g ) { print "a word is: $1\n" }
+        print "\n";
+    }
=item C<m//x> ignores whitespace before ?*+{}
restarts the search at that point. You can actually find the current
match position of a string or set it using the pos() function--see
L<perlfunc/pos>.) Note that you can use this feature to stack C<m//g>
-matches or intermix C<m//g> matches with C<m/\G.../>.
+matches or intermix C<m//g> matches with C<m/\G.../g>. Note that
+the C<\G> zero-width assertion is not supported without the C</g>
+modifier; currently, without C</g>, C<\G> behaves just like C<\A>, but
+that's accidental and may change in the future.
If you modify the string in any way, the match position is reset to the
beginning. Examples:
print "1: '";
print $1 while /(o)/g; print "', pos=", pos, "\n";
print "2: '";
- print $1 if /\G(q)/; print "', pos=", pos, "\n";
+ print $1 if /\G(q)/g; print "', pos=", pos, "\n";
print "3: '";
print $1 while /(p)/g; print "', pos=", pos, "\n";
}
\B Match a non-(word boundary)
\A Match at only beginning of string
\Z Match at only end of string (or before newline at the end)
- \G Match only where previous m//g left off
+ \G Match only where previous m//g left off (works only with /g)
A word boundary (C<\b>) is defined as a spot between two characters that
has a C<\w> on one side of it and a C<\W> on the other side of it (in
just like "^" and "$" except that they won't match multiple times when the
C</m> modifier is used, while "^" and "$" will match at every internal line
boundary. To match the actual end of the string, not ignoring newline,
-you can use C<\Z(?!\n)>. The C<\G> assertion can be used to mix global
-matches (using C<m//g>) and non-global ones, as described in
+you can use C<\Z(?!\n)>. The C<\G> assertion can be used to chain global
+matches (using C<m//g>), as described in
L<perlop/"Regexp Quote-Like Operators">.
+
It is also useful when writing C<lex>-like scanners, when you have several
regexps which you want to match against consequent substrings of your
string, see the previous reference.