Subject: Re: resetting pos broken in _20
On Mon, 13 Jan 1997 12:49:24 EST, Ilya Zakharevich wrote:
>Gurusamy Sarathy writes:
>> What's wrong with saying
>> C<pos $foo = length $foo> after /g fails, to get the behavior
>> you want?
>
>Since this has different semantics. You need to get `pos' before each
>match, and reset it after each failing match.
>
> /=/g; /;/g; /=/g; /;/g;
>
>may give you non-monotonic movement of `pos' over the string, which
>is a bad thing.
Ahh, of course.
>But I still do not understand what you mean by "having pos at
>end". The bug was that position is reset at failing match, probably
>you have some other case in mind?
Never mind, I was missing the possibility of chaining //g matches
with the \G escape :-(
>I did not realize that pos was available at perl 4.?, bug-for-bug
>compatibility may be a reason if this was so for so many years...
The bug fix seems to make a lot of sense (to me) now. \G was essentially
useless without the new "incompatibility", eh?
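
A minimal sketch of the kind of chaining meant here, for illustration
only. The input string and the token names are made up, and the sketch
assumes a released perl, where the non-resetting behavior is requested
with the /c modifier (m//gc) rather than being the default as discussed
in this thread:

```perl
use strict;
use warnings;

my $str = "a=1;b=2";    # hypothetical input, not from the thread
my @tokens;

pos($str) = 0;          # start scanning at the beginning
while (pos($str) < length $str) {
    # \G anchors each attempt at the current pos(); /c keeps pos()
    # where it is when an attempt fails, so the next branch can try
    # from the same offset.
    if    ($str =~ /\G(\w+)/gc) { push @tokens, "WORD($1)" }
    elsif ($str =~ /\G=/gc)     { push @tokens, "EQ" }
    elsif ($str =~ /\G;/gc)     { push @tokens, "SEMI" }
    else  { die "unexpected character at offset ", pos($str), "\n" }
}

print join(" ", @tokens), "\n";
```

If a failed match reset pos to 0, the second branch would never see the
offset left by the first, and the loop above could not advance past the
first "=".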
Here's a pod update that documents current behavior in all the
places I could think of.
- Sarathy.
gsar@engin.umich.edu
p5p-msgid: <199701132013.PAA26606@aatma.engin.umich.edu>
Returns the offset of where the last C<m//g> search left off for the variable
in question ($_ is used when the variable is not specified). May be
-modified to change that offset.
+modified to change that offset. Such modification will also influence
+the C<\G> zero-width assertion in regular expressions. See L<perlre> and
+L<perlop>.
=item print FILEHANDLE LIST
There is a new Configure question that asks if you want to maintain
binary compatibility with Perl 5.003. If you choose binary
compatibility, you do not have to recompile your extensions, but you
-might have symbol conflicts if you embed Perl in another application.
+might have symbol conflicts if you embed Perl in another application,
+just as in the 5.003 release.
=head2 New Opcode Module and Revised Safe Module
Functions documented in the Camel to default to $_ now in
fact do, and all those that do are so documented in L<perlfunc>.
+=head2 C<m//g> does not trigger a pos() reset on failure
+
+The C<m//g> match iteration construct used to reset the iteration
+when it failed to match (so that the next C<m//g> match would start at
+the beginning of the string). You now have to explicitly do a
+C<pos $str = 0;> to reset the "last match" position, or modify the
+string in some way. This change makes it practical to chain C<m//g>
+matches together in conjunction with ordinary matches using the C<\G>
+zero-width assertion. See L<perlop> and L<perlre>.
+
=back
=head2 New Built-in Methods
each time it matches, and FALSE when it eventually runs out of
matches. (In other words, it remembers where it left off last time and
restarts the search at that point. You can actually find the current
-match position of a string using the pos() function--see L<perlfunc>.)
+match position of a string or set it using the pos() function--see
+L<perlfunc/pos>.) Note that you can use this feature to stack C<m//g>
+matches or intermix C<m//g> matches with C<m/\G.../>.
+
If you modify the string in any way, the match position is reset to the
beginning. Examples:
}
print "$sentences\n";
+ # using m//g with \G
+ $_ = "ppooqppq";
+ while ($i++ < 2) {
+ print "1: '";
+ print $1 while /(o)/g; print "', pos=", pos, "\n";
+ print "2: '";
+ print $1 if /\G(q)/; print "', pos=", pos, "\n";
+ print "3: '";
+ print $1 while /(p)/g; print "', pos=", pos, "\n";
+ }
+
+The last example should print:
+
+ 1: 'oo', pos=4
+ 2: 'q', pos=4
+ 3: 'pp', pos=7
+ 1: '', pos=7
+ 2: 'q', pos=7
+ 3: '', pos=7
+
+Note how C<m//g> matches change the value reported by C<pos()>, but the
+non-global match doesn't.
+
+
=item q/STRING/
=item C<'STRING'>
just like "^" and "$" except that they won't match multiple times when the
C</m> modifier is used, while "^" and "$" will match at every internal line
boundary. To match the actual end of the string, not ignoring newline,
-you can use C<\Z(?!\n)>.
+you can use C<\Z(?!\n)>. The C<\G> assertion can be used to mix global
+matches (using C<m//g>) and non-global ones, as described in L<perlop>.
+The actual location where C<\G> will match can also be influenced
+by using C<pos()> as an lvalue. See L<perlfunc/pos>.
When the bracketing construct C<( ... )> is used, \E<lt>digitE<gt> matches the
digit'th substring. Outside of the pattern, always use "$" instead of "\"
# perl5 prints: perl5
+=item * Regular Expression
+
+Under perl4 and up to version 5.003, a failed C<m//g> match used to
+reset the internal iterator, so that subsequent C<m//g> match attempts
+began from the beginning of the string. In perl version 5.004 and later,
+failed C<m//g> matches do not reset the iterator position (which can be
+found using the C<pos()> function--see L<perlfunc/pos>).
+
+ $test = "foop";
+ for (1..3) {
+ print $1 while ($test =~ /(o)/g);
+ # pos $test = 0; # to get old behavior
+ }
+
+ # perl4 prints: oooooo
+ # perl5.004 prints: oo
+
+You may always reset the iterator yourself as shown in the commented line
+to get the old behavior.
+
=back
=head2 Subroutine, Signal, Sorting Traps