X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlretut.pod;h=f0b5d1d3891ee463b2454dfb8d7feabefb3a7958;hb=0111df86b68202837d8ca044a27bbc00d7895fb1;hp=bb2423b8af0e10dc08090ff2459f98d7b6dc2b5d;hpb=72ff290864ea88cc224b5d3af7058f500755f94a;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index bb2423b..f0b5d1d 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1403,6 +1403,8 @@ off. C<\G> allows us to easily do context-sensitive matching:
 
 The combination of C and C<\G> allows us to process the string a
 bit at a time and use arbitrary Perl logic to decide what to do next.
+Currently, the C<\G> anchor is only fully supported when used to anchor
+to the start of the pattern.
 
 C<\G> is also invaluable in processing fixed length records with
 regexps. Suppose we have a snippet of coding region DNA, encoded as
@@ -1738,7 +1740,7 @@ traditional Unicode classes:
     IsPrint     /^([LMNPS]|Co|Zs)/
     IsPunct     /^P/
     IsSpace     /^Z/ || ($code =~ /^(0009|000A|000B|000C|000D)$/
-    IsSpacePerl /^Z/ || ($code =~ /^(0009|000A|000C|000D)$/
+    IsSpacePerl /^Z/ || ($code =~ /^(0009|000A|000C|000D|0085|2028|2029)$/
     IsUpper     /^L[ut]/
     IsWord      /^[LMN]/ || $code eq "005F"
     IsXDigit    $code =~ /^00(3[0-9]|[46][1-6])$/
@@ -1752,7 +1754,7 @@ For the full list see L.
 
 The Unicode has also been separated into various sets of charaters
 which you can test with C<\p{In...}> (in) and C<\P{In...}> (not in),
-for example C<\p{InLatin}>, C<\p{InGreek}>, or C<\P{InKatakana}>.
+for example C<\p{Latin}>, C<\p{Greek}>, or C<\P{Katakana}>.
 For the full list see L.
 
 C<\X> is an abbreviation for a character class sequence that includes
@@ -1782,10 +1784,11 @@ C<[:space:]> correspond to the familiar C<\d>, C<\w>, and C<\s>
 character classes. To negate a POSIX class, put a C<^> in front of
 the name, so that, e.g., C<[:^digit:]> corresponds to C<\D> and under
 C, C<\P{IsDigit}>. The Unicode and POSIX character classes can
-be used just like C<\d>, both inside and outside of character classes:
+be used just like C<\d>, with the exception that POSIX character
+classes can only be used inside of a character class:
 
     /\s+[abc[:digit:]xyz]\s*/;  # match a,b,c,x,y,z, or a digit
-    /^=item\s[:digit:]/;        # match '=item',
+    /^=item\s[[:digit:]]/;      # match '=item',
                                 # followed by a space and a digit
     use charnames ":full";
     /\s+[abc\p{IsDigit}xyz]\s+/;  # match a,b,c,x,y,z, or a digit
@@ -2059,7 +2062,7 @@ the first alternative C<[^()]+> matching a substring with no
 parentheses and the second alternative C<\([^()]*\)> matching a
 substring delimited by parentheses. The problem with this regexp is
 that it is pathological: it has nested indeterminate quantifiers
- of the form C<(a+|b)+>. We discussed in Part 1 how nested quantifiers
+of the form C<(a+|b)+>. We discussed in Part 1 how nested quantifiers
 like this could take an exponentially long time to execute if there
 was no match possible. To prevent the exponential blowup, we need to
 prevent useless backtracking at some point. This can be done by