Integrate mainline

[p5sagit/p5-mst-13.2.git] / pod / perlretut.pod
diff --git a/pod/perlretut.pod b/pod/perlretut.pod

index 8f7c8cd..57fc772 100644 (file)
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1403,6 +1403,8 @@ off.  C<\G> allows us to easily do context-sensitive matching:
 
 The combination of C<//g> and C<\G> allows us to process the string a
 bit at a time and use arbitrary Perl logic to decide what to do next.
+Currently, the C<\G> anchor is only fully supported when used to anchor
+to the start of the pattern.
 
 C<\G> is also invaluable in processing fixed length records with
 regexps.  Suppose we have a snippet of coding region DNA, encoded as
@@ -1705,7 +1707,7 @@ it matches I<any> byte 0-255.  So
 The last regexp matches, but is dangerous because the string
 I<character> position is no longer synchronized to the string I<byte>
 position.  This generates the warning 'Malformed UTF-8
-character'.  C<\C> is best used for matching the binary data in strings
+character'.  The C<\C> is best used for matching the binary data in strings
 with binary data intermixed with Unicode characters.
 
 Let us now discuss the rest of the character classes.  Just as with
@@ -1738,7 +1740,7 @@ traditional Unicode classes:
     IsPrint          /^([LMNPS]|Co|Zs)/
     IsPunct          /^P/
     IsSpace          /^Z/ || ($code =~ /^(0009|000A|000B|000C|000D)$/
-    IsSpacePerl      /^Z/ || ($code =~ /^(0009|000A|000C|000D)$/
+    IsSpacePerl      /^Z/ || ($code =~ /^(0009|000A|000C|000D|0085|2028|2029)$/
     IsUpper          /^L[ut]/
     IsWord           /^[LMN]/ || $code eq "005F"
     IsXDigit         $code =~ /^00(3[0-9]|[46][1-6])$/
@@ -1752,7 +1754,7 @@ For the full list see L<perlunicode>.
 
 The Unicode has also been separated into various sets of charaters
 which you can test with C<\p{In...}> (in) and C<\P{In...}> (not in),
-for example C<\p{InLatin}>, C<\p{InGreek}>, or C<\P{InKatakana}>.
+for example C<\p{Latin}>, C<\p{Greek}>, or C<\P{Katakana}>.
 For the full list see L<perlunicode>.
 
 C<\X> is an abbreviation for a character class sequence that includes
@@ -1782,10 +1784,11 @@ C<[:space:]> correspond to the familiar C<\d>, C<\w>, and C<\s>
 character classes.  To negate a POSIX class, put a C<^> in front of
 the name, so that, e.g., C<[:^digit:]> corresponds to C<\D> and under
 C<utf8>, C<\P{IsDigit}>.  The Unicode and POSIX character classes can
-be used just like C<\d>, both inside and outside of character classes:
+be used just like C<\d>, with the exception that POSIX character
+classes can only be used inside of a character class:
 
     /\s+[abc[:digit:]xyz]\s*/;  # match a,b,c,x,y,z, or a digit
-    /^=item\s[:digit:]/;        # match '=item',
+    /^=item\s[[:digit:]]/;      # match '=item',
                                 # followed by a space and a digit
     use charnames ":full";
     /\s+[abc\p{IsDigit}xyz]\s+/;  # match a,b,c,x,y,z, or a digit
@@ -2001,6 +2004,10 @@ They evaluate true if the regexps do I<not> match:
     $x =~ /foo(?!baz)/;  # matches, 'baz' doesn't follow 'foo'
     $x =~ /(?<!\s)foo/;  # matches, there is no \s before 'foo'
 
+The C<\C> is unsupported in lookbehind, because the already
+treacherous definition of C<\C> would become even more so
+when going backwards.
+
 =head2 Using independent subexpressions to prevent backtracking
 
 The last few extended patterns in this tutorial are experimental as of