X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlretut.pod;h=2c449f8b07ef4061871981f0f9418a00a8772709;hb=22d4bb9ccb8701e68f9243547d7e3a3c55f70908;hp=66f8179ab65d3b56af7f83fcdda817ee808d066d;hpb=4b19af017623bfa3bb72bb164598a517f586e0d3;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index 66f8179..2c449f8 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1672,15 +1672,17 @@ i.e., a non-mark followed by one or more marks.
 
 As if all those classes weren't enough, Perl also defines POSIX style
 character classes. These have the form C<[:name:]>, with C<name> the
-name of the POSIX class. The POSIX classes are alpha, alnum, ascii,
-cntrl, digit, graph, lower, print, punct, space, upper, word, and
-xdigit. If C<utf8> is being used, then these classes are defined the
-same as their corresponding perl Unicode classes: C<[:upper:]> is the
-same as C<\p{IsUpper}>, etc. The POSIX character classes, however,
-don't require using C<utf8>. The C<[:digit:]>, C<[:word:]>, and
+name of the POSIX class. The POSIX classes are C<alpha>, C<alnum>,
+C<ascii>, C<cntrl>, C<digit>, C<graph>, C<lower>, C<print>, C<punct>,
+C<space>, C<upper>, and C<xdigit>, and two extensions, C<word> (a Perl
+extension to match C<\w>), and C<blank> (a GNU extension). If C<utf8>
+is being used, then these classes are defined the same as their
+corresponding perl Unicode classes: C<[:upper:]> is the same as
+C<\p{IsUpper}>, etc. The POSIX character classes, however, don't
+require using C<utf8>. The C<[:digit:]>, C<[:word:]>, and
 C<[:space:]> correspond to the familiar C<\d>, C<\w>, and C<\s>
-character classes. To negate a POSIX class, put a C<^> in front of the
-name, so that, e.g., C<[:^digit:]> corresponds to C<\D> and under
+character classes. To negate a POSIX class, put a C<^> in front of
+the name, so that, e.g., C<[:^digit:]> corresponds to C<\D> and under
 C<utf8>, C<\P{IsDigit}>. The Unicode and POSIX character classes can
 be used just like C<\d>, both inside and outside of character classes:
 
@@ -2044,8 +2046,41 @@ in the regexp. Here are some silly examples:
                                          # prints 'Hi Mom!'
     $x =~ /aaa(?{print "Hi Mom!";})def/; # doesn't match,
                                          # no 'Hi Mom!'
+
+Pay careful attention to the next example:
+
     $x =~ /abc(?{print "Hi Mom!";})ddd/; # doesn't match,
                                          # no 'Hi Mom!'
+                                         # but why not?
+
+At first glance, you'd think that it shouldn't print, because obviously
+the C<ddd> isn't going to match the target string. But look at this
+example:
+
+    $x =~ /abc(?{print "Hi Mom!";})[d]dd/; # doesn't match,
+                                           # but _does_ print
+
+Hmm. What happened here? If you've been following along, you know that
+the above pattern should be effectively the same as the last one --
+enclosing the d in a character class isn't going to change what it
+matches. So why does the first not print while the second one does?
+
+The answer lies in the optimizations the REx engine makes. In the first
+case, all the engine sees are plain old characters (aside from the
+C<?{}> construct). It's smart enough to realize that the string 'ddd'
+doesn't occur in our target string before actually running the pattern
+through. But in the second case, we've tricked it into thinking that our
+pattern is more complicated than it is. It takes a look, sees our
+character class, and decides that it will have to actually run the
+pattern to determine whether or not it matches, and in the process of
+running it hits the print statement before it discovers that we don't
+have a match.
+
+To take a closer look at how the engine does optimizations, see the
+section L<"Pragmas and debugging"> below.
+
+More fun with C<?{}>:
+
     $x =~ /(?{print "Hi Mom!";})/;       # matches,
                                          # prints 'Hi Mom!'
     $x =~ /(?{$c = 1;})(?{print "$c";})/; # matches,