X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlretut.pod;h=6e06f192914b95af82e9cb85856ac46daf6a111f;hb=c23d1eb0e18a49361001d26c686323d50b0c6d21;hp=8c12a5cfbaef1ea3a336835a6408762cab48573e;hpb=54c18d0455d4f9550786bea467f5a04c96e86890;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index 8c12a5c..6e06f19 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -689,10 +689,11 @@ inside goes into the special variables C<$1>, C<$2>, etc. They can be
 used just as ordinary variables:
 
     # extract hours, minutes, seconds
-    $time =~ /(\d\d):(\d\d):(\d\d)/; # match hh:mm:ss format
-    $hours = $1;
-    $minutes = $2;
-    $seconds = $3;
+    if ($time =~ /(\d\d):(\d\d):(\d\d)/) { # match hh:mm:ss format
+        $hours = $1;
+        $minutes = $2;
+        $seconds = $3;
+    }
 
 Now, we know that in scalar context,
 S > returns a true or false
@@ -1403,7 +1404,8 @@ off. C<\G> allows us to easily do context-sensitive matching:
 
 The combination of C and C<\G> allows us to process the string a
 bit at a time and use arbitrary Perl logic to decide what to do next.
-Currently, the C<\G> anchor only works at the beginning of a pattern.
+Currently, the C<\G> anchor is only fully supported when used to anchor
+to the start of the pattern.
 
 C<\G> is also invaluable in processing fixed length records with
 regexps. Suppose we have a snippet of coding region DNA, encoded as
@@ -1706,7 +1708,7 @@ it matches I byte 0-255. So
 The last regexp matches, but is dangerous because the string
 I position is no longer synchronized to the string I
 position. This generates the warning 'Malformed UTF-8
-character'. C<\C> is best used for matching the binary data in strings
+character'. The C<\C> is best used for matching the binary data in strings
 with binary data intermixed with Unicode characters.
 
 Let us now discuss the rest of the character classes. Just as with
@@ -1753,7 +1755,7 @@ For the full list see L.
 The Unicode has also been separated into various sets of charaters
 which you can test with C<\p{In...}> (in) and C<\P{In...}> (not in),
-for example C<\p{InLatin}>, C<\p{InGreek}>, or C<\P{InKatakana}>.
+for example C<\p{Latin}>, C<\p{Greek}>, or C<\P{Katakana}>.
 For the full list see L.
 
 C<\X> is an abbreviation for a character class sequence that includes
@@ -2003,6 +2005,10 @@ They evaluate true if the regexps do I match:
     $x =~ /foo(?!baz)/; # matches, 'baz' doesn't follow 'foo'
     $x =~ /(? is unsupported in lookbehind, because the already
+treacherous definition of C<\C> would become even more so
+when going backwards.
+
 =head2 Using independent subexpressions to prevent backtracking
 
 The last few extended patterns in this tutorial are experimental as of