From: Gurusamy Sarathy
Date: Mon, 13 Jan 1997 20:13:12 +0000 (-0500)
Subject: Document use of pos() and /\G/
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=b2a07c1c241ec86f010fc0ea3bfa54c8ec28be90;p=p5sagit%2Fp5-mst-13.2.git

Document use of pos() and /\G/

Subject: Re: resetting pos broken in _20

On Mon, 13 Jan 1997 12:49:24 EST, Ilya Zakharevich wrote:
>Gurusamy Sarathy writes:
>> What's wrong with saying
>> C after /g fails, to get the behavior
>> you want?
>
>Since this has different semantics. You need to get `pos' before each
>match, and reset it after each failing match.
>
> /=/g; /;/g; /=/g; /;/g;
>
>may give you non-monotoneous movement of `pos' over the string, which
>is a bad thing.

Ahh, of course.

>But I still do not understand what you mean by "having pos at
>end". The bug was that position is reset at failing match, probably
>you have some other case in mind?

Never mind, I was missing the possibility of chaining //g matches with
the \G escape :-(

>I did not realize that pos was available at perl 4.?, bug-for-bug
>compatibility may be a reason if this was so for so many years...

The bug fix seems to make a lot sense (to me) now. \G was essentially
useless without the new "incompatiblity", eh?

Here's a pod update that documents current behavior in all
the places I could think of.

 - Sarathy.
 gsar@engin.umich.edu

p5p-msgid: <199701132013.PAA26606@aatma.engin.umich.edu>
---

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index c1cd67d..65bba93 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -2132,7 +2132,9 @@ like shift().
 
 Returns the offset of where the last C search left off for the variable
 is in question ($_ is used when the variable is not specified). May be
-modified to change that offset.
+modified to change that offset. Such modification will also influence
+the C<\G> zero-width assertion in regular expressions. See L and
+L.
 
 =item print FILEHANDLE LIST
 
diff --git a/pod/perlnews.pod b/pod/perlnews.pod
index e6d1225..3ddb1e0 100644
--- a/pod/perlnews.pod
+++ b/pod/perlnews.pod
@@ -23,7 +23,8 @@ file in the distribution for details.
 There is a new Configure question that asks if you want to maintain
 binary compatibility with Perl 5.003. If you choose binary
 compatibility, you do not have to recompile your extensions, but you
-might have symbol conflicts if you embed Perl in another application.
+might have symbol conflicts if you embed Perl in another application,
+just as in the 5.003 release.
 
 =head2 New Opcode Module and Revised Safe Module
 
@@ -186,6 +187,16 @@ function whose prototype you want to retrieve.
 Functions documented in the Camel to default to $_ now in fact do, and
 all those that do are so documented in L.
 
+=head2 C does not trigger a pos() reset on failure
+
+The C match iteration construct used to reset the iteration
+when it failed to match (so that the next C match would start at
+the beginning of the string). You now have to explicitly do a
+C to reset the "last match" position, or modify the
+string in some way. This change makes it practical to chain C
+matches together in conjunction with ordinary matches using the C<\G>
+zero-width assertion. See L and L.
+
 =back
 
 =head2 New Built-in Methods
diff --git a/pod/perlop.pod b/pod/perlop.pod
index a8f34c0..dd3aeab 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -695,7 +695,10 @@ In a scalar context, C iterates through the string, returning TRUE
 each time it matches, and FALSE when it eventually runs out of
 matches. (In other words, it remembers where it left off last time and
 restarts the search at that point. You can actually find the current
-match position of a string using the pos() function--see L.)
+match position of a string or set it using the pos() function--see
+L.) Note that you can use this feature to stack C
+matches or intermix C matches with C.
+
 If you modify the string in any way, the match position is reset to the
 beginning. Examples:
 
@@ -711,6 +714,30 @@ beginning. Examples:
     }
     print "$sentences\n";
 
+    # using m//g with \G
+    $_ = "ppooqppq";
+    while ($i++ < 2) {
+        print "1: '";
+        print $1 while /(o)/g; print "', pos=", pos, "\n";
+        print "2: '";
+        print $1 if /\G(q)/; print "', pos=", pos, "\n";
+        print "3: '";
+        print $1 while /(p)/g; print "', pos=", pos, "\n";
+    }
+
+The last example should print:
+
+    1: 'oo', pos=4
+    2: 'q', pos=4
+    3: 'pp', pos=7
+    1: '', pos=7
+    2: 'q', pos=7
+    3: '', pos=7
+
+Note how C matches change the value reported by C, but the
+non-global match doesn't.
+
+
 =item q/STRING/
 
 =item C<'STRING'>
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 12f9f51..a4c0a7d 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -174,7 +174,10 @@ represents backspace rather than a word boundary.) The C<\A> and C<\Z> are
 just like "^" and "$" except that they won't match multiple times when the
 C modifier is used, while "^" and "$" will match at every internal line
 boundary. To match the actual end of the string, not ignoring newline,
-you can use C<\Z(?!\n)>.
+you can use C<\Z(?!\n)>. The C<\G> assertion can be used to mix global
+matches (using C) and non-global ones, as described in L.
+The actual location where C<\G> will match can also be influenced
+by using C as an lvalue. See L.
 
 When the bracketing construct C<( ... )> is used, \EdigitE matches the
 digit'th substring. Outside of the pattern, always use "$" instead of "\"
diff --git a/pod/perltrap.pod b/pod/perltrap.pod
index b8247a4..4b56dd2 100644
--- a/pod/perltrap.pod
+++ b/pod/perltrap.pod
@@ -1108,6 +1108,26 @@ repeatedly, like C or C.
 
     # perl5 prints: perl5
 
+=item * Regular Expression
+
+Under perl4 and upto version 5.003, a failed C match used to
+reset the internal iterator, so that subsequent C match attempts
+began from the beginning of the string. In perl version 5.004 and later,
+failed C matches do not reset the iterator position (which can be
+found using the C function--see L).
+
+    $test = "foop";
+    for (1..3) {
+        print $1 while ($test =~ /(o)/g);
+        # pos $test = 0; # to get old behavior
+    }
+
+    # perl4 prints: oooooo
+    # perl5.004 prints: oo
+
+You may always reset the iterator yourself as shown in the commented line
+to get the old behavior.
+
 =back
 
 =head2 Subroutine, Signal, Sorting Traps
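
A minimal sketch of the chaining technique this patch documents, written for a current perl rather than the development snapshot the patch targets. It assumes the behavior of released perls, where a failed m//g resets pos() unless the /c modifier is also given, so /gc is used here to keep the position once each loop runs out of matches. The names $str and @log are illustrative only and are not part of the patch.

```perl
use strict;
use warnings;

my $str = "ppooqppq";
my @log;

# Step 1: collect every 'o'. With /c, the final failing attempt leaves
# pos($str) where the last successful match ended.
my $os = '';
$os .= $1 while $str =~ /(o)/gc;
push @log, "1: '$os', pos=" . pos($str);

# Step 2: a non-global match anchored with \G starts at pos($str).
# A match without /g does not move pos($str).
my $q = $str =~ /\G(q)/ ? $1 : '';
push @log, "2: '$q', pos=" . pos($str);

# Step 3: resume the global scan from the remembered position.
my $ps = '';
$ps .= $1 while $str =~ /(p)/gc;
push @log, "3: '$ps', pos=" . pos($str);

print "$_\n" for @log;
```

On a perl where /c behaves as assumed, the three lines printed should agree with the first three lines of the expected output in the perlop hunk. Assigning to pos($str), for example pos($str) = 0, restarts the scan from the beginning.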