From: Gurusamy Sarathy
Date: Sat, 12 Apr 1997 20:48:41 +0000 (-0400)
Subject: Explain //g and \G issues
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=a99df21cfa7a5a885ca7e0b0c7aca7e984889792;p=p5sagit%2Fp5-mst-13.2.git

Explain //g and \G issues

private-msgid: 199704122048.QAA25060@aatma.engin.umich.edu
---
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 4aa05ed..fdd4dde 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -350,15 +350,25 @@ of course, or if you want a seed other than the default.
 Functions documented in the Camel to default to $_ now in fact do, and
 all those that do are so documented in L.
 
-=item C does not trigger a pos() reset on failure
-
-The C match iteration construct used to reset the iteration
-when it failed to match (so that the next C match would start at
-the beginning of the string). You now have to explicitly do a
-C to reset the "last match" position, or modify the
-string in some way. This change makes it practical to chain C
-matches together in conjunction with ordinary matches using the C<\G>
-zero-width assertion. See L and L.
+=item C does not reset search position on failure
+
+The C match iteration construct used to reset its target string's
+search position (which is visible through the C operator) when a
+match failed; as a result, the next C match would start at the
+beginning of the string. With Perl 5.004, the search position must be
+reset explicitly, as with C, or by modifying the target
+string. This change in Perl makes it possible to chain matches together
+in conjunction with the C<\G> zero-width assertion. See L and
+L.
+
+Here is an illustration of what it takes to get the old behavior:
+
+    for ( qw(this and that are not what you think you got) ) {
+        while ( /(\w*t\w*)/g ) { print "t word is: $1\n" }
+        pos = 0; # REQUIRED FOR 5.004
+        while ( /(\w*a\w*)/g ) { print "a word is: $1\n" }
+        print "\n";
+    }
 
 =item C ignores whitespace before ?*+{}
 
diff --git a/pod/perlop.pod b/pod/perlop.pod
index 45dafaa..3bd4f21 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -701,7 +701,10 @@ matches. (In other words, it remembers where it left off last time and
 restarts the search at that point. You can actually find the current
 match position of a string or set it using the pos() function--see
 L.) Note that you can use this feature to stack C
-matches or intermix C matches with C.
+matches or intermix C matches with C. Note that
+the C<\G> zero-width assertion is not supported without the C
+modifier; currently, without C, C<\G> behaves just like C<\A>, but
+that's accidental and may change in the future.
 If you modify the string in any way, the match position is reset to the
 beginning. Examples:
 
@@ -724,7 +727,7 @@ beginning. Examples:
         print "1: '";
         print $1 while /(o)/g; print "', pos=", pos, "\n";
         print "2: '";
-        print $1 if /\G(q)/; print "', pos=", pos, "\n";
+        print $1 if /\G(q)/g; print "', pos=", pos, "\n";
         print "3: '";
         print $1 while /(p)/g; print "', pos=", pos, "\n";
     }
diff --git a/pod/perlre.pod b/pod/perlre.pod
index f881a3b..ed9c533 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -163,7 +163,7 @@ Perl defines the following zero-width assertions:
     \B Match a non-(word boundary)
     \A Match at only beginning of string
     \Z Match at only end of string (or before newline at the end)
-    \G Match only where previous m//g left off
+    \G Match only where previous m//g left off (works only with /g)
 
 A word boundary (C<\b>) is defined as a spot between two characters that
 has a C<\w> on one side of it and a C<\W> on the other side of it (in
@@ -173,9 +173,10 @@ represents backspace rather than a word boundary.) The C<\A> and C<\Z> are
 just like "^" and "$" except that they won't match multiple times when the
 C modifier is used, while "^" and "$" will match at every internal line
 boundary. To match the actual end of the string, not ignoring newline,
-you can use C<\Z(?!\n)>. The C<\G> assertion can be used to mix global
-matches (using C) and non-global ones, as described in
+you can use C<\Z(?!\n)>. The C<\G> assertion can be used to chain global
+matches (using C), as described in
 L.
+
 It is also useful when writing C-like scanners, when you have several
 regexps which you want to match against consequent substrings of your
 string, see the previous reference.
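A minimal sketch of the chaining that the perlre and perlop text describes, assuming a modern Perl 5 (5.6 or later, because it uses the @+ array). The helpers tokenize(), after_each_x() and char_at() are hypothetical illustrations, not part of this patch. tokenize() is a small lex-like scanner: each scalar-context m//g match is anchored with \G where the previous one left off, and the loop ends at the first offset where no token matches. It records the end of the last successful match through $+[0] instead of reading pos() after the loop, because what happens to the search position after a failed m//g is exactly the behavior this patch discusses. The other two helpers read pos() after each match and set it before a \G-anchored match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): split a small arithmetic
# expression into tokens by chaining scalar-context m//g matches, each
# anchored by \G at the position where the previous match left off.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    my $end = 0;    # offset just past the last successful match
    while ($src =~ /\G\s*(?:(\d+)|([A-Za-z_]\w*)|([-+*\/()]))/g) {
        $end = $+[0];    # assumption: @+ requires Perl 5.6 or later
        if    (defined $1) { push @tokens, [ NUM   => $1 ] }
        elsif (defined $2) { push @tokens, [ IDENT => $2 ] }
        else               { push @tokens, [ OP    => $3 ] }
    }
    # The loop stops at the first spot where no token matches; anything
    # other than trailing whitespace left there is treated as an error.
    die "unexpected input at offset $end\n"
        if substr($src, $end) =~ /\S/;
    return @tokens;
}

# Hypothetical helper: read pos() after each successful m//g match.
sub after_each_x {
    my ($s) = @_;
    my @positions;
    push @positions, pos($s) while $s =~ /X/g;
    return @positions;
}

# Hypothetical helper: set pos() explicitly, then anchor a /g match there
# with \G to fetch the character at that offset (undef past the end).
sub char_at {
    my ($s, $offset) = @_;
    pos($s) = $offset;
    return $s =~ /\G(.)/g ? $1 : undef;
}

my @tokens = tokenize("x1 + 42*(y - 7)");
print join(' ', map { join ':', @$_ } @tokens), "\n";
```

The scanner keeps every token pattern in one alternation so that each loop iteration is a single \G-anchored match; splitting the alternatives into separate matches would make the scanner depend on whether a failed match resets the search position.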