From: Daniel Chetlin
Date: Tue, 5 Sep 2000 04:57:07 +0000 (-0700)
Subject: \G in non-/g is well-defined now ... right?
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=5d43e42d71f64cebb856e08f031cf1e743bf3445;p=p5sagit%2Fp5-mst-13.2.git

\G in non-/g is well-defined now ... right?

Message-ID: <20000905045707.A8620@ilmd.chetlin.org>

p4raw-id: //depot/perl@7027
---

diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod
index 29136ab..4ab4d4c 100644
--- a/pod/perlfaq6.pod
+++ b/pod/perlfaq6.pod
@@ -527,11 +527,16 @@ variable is no longer "expensive" the way the other two are.
 
 =head2 What good is C<\G> in a regular expression?
 
-The notation C<\G> is used in a match or substitution in conjunction the
-C</g> modifier (and ignored if there's no C</g>) to anchor the regular
-expression to the point just past where the last match occurred, i.e. the
-pos() point. A failed match resets the position of C<\G> unless the
-C</c> modifier is in effect.
+The notation C<\G> is used in a match or substitution in conjunction with
+the C</g> modifier to anchor the regular expression to the point just past
+where the last match occurred, i.e. the pos() point. A failed match resets
+the position of C<\G> unless the C</c> modifier is in effect. C<\G> can be
+used in a match without the C</g> modifier; it acts the same (i.e. still
+anchors at the pos() point) but of course only matches once and does not
+update pos(), as non-C</g> expressions never do. C<\G> in an expression
+applied to a target string that has never been matched against a C</g>
+expression before or has had its pos() reset is functionally equivalent to
+C<\A>, which matches at the beginning of the string.
 
 For example, suppose you had a line of text quoted in standard mail
 and Usenet notation, (that is, with leading C<< > >> characters), and
diff --git a/pod/perlop.pod b/pod/perlop.pod
index b317bde..945d4f3 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -851,9 +851,11 @@ string also resets the search position.
 
 You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
 zero-width assertion that matches the exact position where the previous
-C<m//g>, if any, left off. The C<\G> assertion is not supported without
-the C</g> modifier. (Currently, without C</g>, C<\G> behaves just like
-C<\A>, but that's accidental and may change in the future.)
+C<m//g>, if any, left off. Without the C</g> modifier, the C<\G> assertion
+still anchors at pos(), but the match is of course only attempted once.
+Using C<\G> without C</g> on a target string that has not previously had a
+C</g> match applied to it is the same as using the C<\A> assertion to match
+the beginning of the string.
 
 Examples:
 
@@ -861,7 +863,7 @@ Examples:
     ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
 
     # scalar context
-    $/ = ""; $* = 1; # $* deprecated in modern perls
+    $/ = "";
     while (defined($paragraph = <>)) {
         while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
             $sentences++;
@@ -879,6 +881,7 @@ Examples:
         print "3: '";
         print $1 while /(p)/gc; print "', pos=", pos, "\n";
     }
+    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
 
 The last example should print:
 
@@ -888,6 +891,13 @@ The last example should print:
     1: '', pos=7
     2: 'q', pos=8
     3: '', pos=8
+    Final: 'q', pos=8
+
+Notice that the final match matched C<q> instead of C<p>, which a match
+without the C<\G> anchor would have done. Also note that the final match
+did not update C<pos> -- C<pos> is only updated on a C</g> match. If the
+final match did indeed match C<p>, it's a good bet that you're running an
+older (pre-5.6.0) Perl.
 
 A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can
 combine several regexps like this to process a string part-by-part,