New release date for 5.12.1 in light of the new RC

diff --git a/pod/perlre.pod b/pod/perlre.pod

index 5287965..d48143e 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -27,15 +27,6 @@ L<perlop/"Gory details of parsing quoted constructs">.
 
 =over 4
 
-=item i
-X</i> X<regex, case-insensitive> X<regexp, case-insensitive>
-X<regular expression, case-insensitive>
-
-Do case-insensitive pattern matching.
-
-If C<use locale> is in effect, the case map is taken from the current
-locale.  See L<perllocale>.
-
 =item m
 X</m> X<regex, multiline> X<regexp, multiline> X<regular expression, multiline>
 
@@ -50,15 +41,39 @@ X<regular expression, single-line>
 Treat string as single line.  That is, change "." to match any character
 whatsoever, even a newline, which normally it would not match.
 
-Used together, as /ms, they let the "." match any character whatsoever,
+Used together, as C</ms>, they let the "." match any character whatsoever,
 while still allowing "^" and "$" to match, respectively, just after
 and just before newlines within the string.
 
+=item i
+X</i> X<regex, case-insensitive> X<regexp, case-insensitive>
+X<regular expression, case-insensitive>
+
+Do case-insensitive pattern matching.
+
+If C<use locale> is in effect, the case map is taken from the current
+locale.  See L<perllocale>.
+
 =item x
 X</x>
 
 Extend your pattern's legibility by permitting whitespace and comments.
 
+=item p
+X</p> X<regex, preserve> X<regexp, preserve>
+
+Preserve the string matched such that ${^PREMATCH}, ${^MATCH}, and
+${^POSTMATCH} are available for use after matching.
+
+=item g and c
+X</g> X</c>
+
+Global matching, and keep the Current position after failed matching.
+Unlike i, m, s and x, these two flags affect the way the regex is used
+rather than the regex itself. See
+L<perlretut/"Using regular expressions in Perl"> for further explanation
+of the g and c modifiers.
+
 =back
 
 These are usually written as "the C</x> modifier", even though the delimiter
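A minimal sketch of the C</p> modifier documented in the hunk above, assuming perl 5.10.0 or later. The sample string and the variable names C<$pre>, C<$match> and C<$post> are made up for illustration.

```perl
use strict;
use warnings;

my $text = "hello world";
my ($pre, $match, $post);

# /p asks perl to preserve the matched string, so the three
# ${^...} variables are available after a successful match.
if ($text =~ /o w/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    print "pre='$pre' match='$match' post='$post'\n";
}
```

Here C<$pre> holds the text before the match, C<$match> the matched text, and C<$post> the remainder.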
@@ -67,27 +82,37 @@ modifiers may also be embedded within the regular expression itself using
 the C<(?...)> construct.  See below.
 
 The C</x> modifier itself needs a little more explanation.  It tells
-the regular expression parser to ignore whitespace that is neither
+the regular expression parser to ignore most whitespace that is neither
 backslashed nor within a character class.  You can use this to break up
 your regular expression into (slightly) more readable parts.  The C<#>
 character is also treated as a metacharacter introducing a comment,
 just as in ordinary Perl code.  This also means that if you want real
 whitespace or C<#> characters in the pattern (outside a character
 class, where they are unaffected by C</x>), then you'll either have to
-escape them (using backslashes or C<\Q...\E>) or encode them using octal
-or hex escapes.  Taken together, these features go a long way towards
+escape them (using backslashes or C<\Q...\E>) or encode them using octal,
+hex, or C<\N{}> escapes.  Taken together, these features go a long way towards
 making Perl's regular expressions more readable.  Note that you have to
 be careful not to include the pattern delimiter in the comment--perl has
 no way of knowing you did not intend to close the pattern early.  See
 the C-comment deletion code in L<perlop>.  Also note that anything inside
-a C<\Q...\E> stays unaffected by C</x>.
+a C<\Q...\E> stays unaffected by C</x>.  And note that C</x> doesn't affect
+space interpretation within a single multi-character construct.  For
+example in C<\x{...}>, regardless of the C</x> modifier, there can be no
+spaces.  Same for a L<quantifier|/Quantifiers> such as C<{3}> or
+C<{5,}>.  Similarly, C<(?:...)> can't have a space between the C<?> and C<:>,
+but can between the C<(> and C<?>.  Within any delimiters for such a
+construct, allowed spaces are not affected by C</x>, and depend on the
+construct.  For example, C<\x{...}> can't have spaces because hexadecimal
+numbers don't have spaces in them.  But, Unicode properties can have spaces, so
+in C<\p{...}>  there can be spaces that follow the Unicode rules, for which see
+L<perluniprops/Properties accessible through \p{} and \P{}>.
 X</x>
 
 =head2 Regular Expressions
 
 =head3 Metacharacters
 
-The patterns used in Perl pattern matching evolved from the ones supplied in
+The patterns used in Perl pattern matching evolved from those supplied in
 the Version 8 regex routines.  (The routines are derived
 (distantly) from Henry Spencer's freely redistributable reimplementation
 of the V8 routines.)  See L<Version 8 Regular Expressions> for
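The C</x> rules described in the hunk above can be sketched as follows. This is an illustrative pattern, not one taken from perlre; the sample string and the names C<$re> and C<@got> are assumptions.

```perl
use strict;
use warnings;

my $re = qr{
    (\d{4})   # year: the quantifier itself may not contain spaces
    \         # a backslashed space matches a literal space
    [ ]?      # a space inside a bracketed class is also literal
    (\w+)     # a word
}x;

# Unescaped whitespace and the comments above are ignored by /x.
my @got = "2010 May" =~ $re;
print "@got\n";
```

The pattern uses C<qr{}> delimiters, so the comments avoid an unbalanced closing brace, which perl would take as the end of the pattern.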
@@ -99,13 +124,13 @@ X<metacharacter>
 X<\> X<^> X<.> X<$> X<|> X<(> X<()> X<[> X<[]>
 
 
-    \  Quote the next metacharacter
-    ^  Match the beginning of the line
-    .  Match any character (except newline)
-    $  Match the end of the line (or before newline at the end)
-    |  Alternation
-    () Grouping
-    [] Character class
+    \        Quote the next metacharacter
+    ^        Match the beginning of the line
+    .        Match any character (except newline)
+    $        Match the end of the line (or before newline at the end)
+    |        Alternation
+    ()       Grouping
+    []       Bracketed Character class
 
 By default, the "^" character is guaranteed to match only the
 beginning of the string, the "$" character only the end (or before the
@@ -130,18 +155,18 @@ X<.> X</s>
 The following standard quantifiers are recognized:
 X<metacharacter> X<quantifier> X<*> X<+> X<?> X<{n}> X<{n,}> X<{n,m}>
 
-    *     Match 0 or more times
-    +     Match 1 or more times
-    ?     Match 1 or 0 times
-    {n}    Match exactly n times
-    {n,}   Match at least n times
-    {n,m}  Match at least n but not more than m times
+    *           Match 0 or more times
+    +           Match 1 or more times
+    ?           Match 1 or 0 times
+    {n}         Match exactly n times
+    {n,}        Match at least n times
+    {n,m}       Match at least n but not more than m times
 
 (If a curly bracket occurs in any other context, it is treated
 as a regular character.  In particular, the lower bound
-is not optional.)  The "*" modifier is equivalent to C<{0,}>, the "+"
-modifier to C<{1,}>, and the "?" modifier to C<{0,1}>.  n and m are limited
-to integral values less than a preset limit defined when perl is built.
+is not optional.)  The "*" quantifier is equivalent to C<{0,}>, the "+"
+quantifier to C<{1,}>, and the "?" quantifier to C<{0,1}>.  n and m are limited
+to non-negative integral values less than a preset limit defined when perl is built.
 This is usually 32766 on the most common platforms.  The actual limit can
 be seen in the error message generated by code such as this:
 
@@ -155,24 +180,24 @@ that the meanings don't change, just the "greediness":
 X<metacharacter> X<greedy> X<greediness>
 X<?> X<*?> X<+?> X<??> X<{n}?> X<{n,}?> X<{n,m}?>
 
-    *?     Match 0 or more times, not greedily
-    +?     Match 1 or more times, not greedily
-    ??     Match 0 or 1 time, not greedily
-    {n}?   Match exactly n times, not greedily
-    {n,}?  Match at least n times, not greedily
-    {n,m}? Match at least n but not more than m times, not greedily
+    *?        Match 0 or more times, not greedily
+    +?        Match 1 or more times, not greedily
+    ??        Match 0 or 1 time, not greedily
+    {n}?      Match exactly n times, not greedily
+    {n,}?     Match at least n times, not greedily
+    {n,m}?    Match at least n but not more than m times, not greedily
 
 By default, when a quantified subpattern does not allow the rest of the
 overall pattern to match, Perl will backtrack. However, this behaviour is
 sometimes undesirable. Thus Perl provides the "possessive" quantifier form
 as well.
 
-    *+     Match 0 or more times and give nothing back
-    ++     Match 1 or more times and give nothing back
-    ?+     Match 0 or 1 time and give nothing back
-    {n}+   Match exactly n times and give nothing back (redundant)
-    {n,}+  Match at least n times and give nothing back
-    {n,m}+ Match at least n but not more than m times and give nothing back
+ *+     Match 0 or more times and give nothing back
+ ++     Match 1 or more times and give nothing back
+ ?+     Match 0 or 1 time and give nothing back
+ {n}+   Match exactly n times and give nothing back (redundant)
+ {n,}+  Match at least n times and give nothing back
+ {n,m}+ Match at least n but not more than m times and give nothing back
 
 For instance,
 
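A small sketch contrasting a greedy quantifier with its possessive form, assuming perl 5.10.0 or later. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $s = "aaaa";

# Greedy a+ first takes all four characters, then backtracks and
# gives one back so that the final "a" can match.
my $greedy = $s =~ /^a+a$/ ? 1 : 0;

# Possessive a++ takes all four characters and gives nothing back,
# so the final "a" has nothing left to match.
my $possessive = $s =~ /^a++a$/ ? 1 : 0;

print "greedy=$greedy possessive=$possessive\n";
```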
@@ -197,226 +222,107 @@ instance the above example could also be written as follows:
 
 Because patterns are processed as double quoted strings, the following
 also work:
-X<\t> X<\n> X<\r> X<\f> X<\e> X<\a> X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
-X<\0> X<\c> X<\N> X<\x>
-
-    \t         tab                   (HT, TAB)
-    \n         newline               (LF, NL)
-    \r         return                (CR)
-    \f         form feed             (FF)
-    \a         alarm (bell)          (BEL)
-    \e         escape (think troff)  (ESC)
-    \033       octal char            (example: ESC)
-    \x1B       hex char              (example: ESC)
-    \x{263a}   wide hex char         (example: Unicode SMILEY)
-    \cK                control char          (example: VT)
-    \N{name}   named char
-    \l         lowercase next char (think vi)
-    \u         uppercase next char (think vi)
-    \L         lowercase till \E (think vi)
-    \U         uppercase till \E (think vi)
-    \E         end case modification (think vi)
-    \Q         quote (disable) pattern metacharacters till \E
-
-If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
-and C<\U> is taken from the current locale.  See L<perllocale>.  For
-documentation of C<\N{name}>, see L<charnames>.
-
-You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
-An unescaped C<$> or C<@> interpolates the corresponding variable,
-while escaping will cause the literal string C<\$> to be matched.
-You'll need to write something like C<m/\Quser\E\@\Qhost/>.
-
-=head3 Character classes
+
+ \t          tab                   (HT, TAB)
+ \n          newline               (LF, NL)
+ \r          return                (CR)
+ \f          form feed             (FF)
+ \a          alarm (bell)          (BEL)
+ \e          escape (think troff)  (ESC)
+ \033        octal char            (example: ESC)
+ \x1B        hex char              (example: ESC)
+ \x{263a}    long hex char         (example: Unicode SMILEY)
+ \cK         control char          (example: VT)
+ \N{name}    named Unicode character
+ \N{U+263D}  Unicode character     (example: FIRST QUARTER MOON)
+ \l          lowercase next char (think vi)
+ \u          uppercase next char (think vi)
+ \L          lowercase till \E (think vi)
+ \U          uppercase till \E (think vi)
+ \Q          quote (disable) pattern metacharacters till \E
+ \E          end either case modification or quoted section, think vi
+
+Details are in L<perlop/Quote and Quote-like Operators>.
+
+=head3 Character Classes and other Special Escapes
 
 In addition, Perl defines the following:
-X<\w> X<\W> X<\s> X<\S> X<\d> X<\D> X<\X> X<\p> X<\P> X<\C>
-X<\g> X<\k> X<\N> X<\K> X<\v> X<\V>
-X<word> X<whitespace> X<character class> X<backreference>
-
-    \w      Match a "word" character (alphanumeric plus "_")
-    \W      Match a non-"word" character
-    \s      Match a whitespace character
-    \S      Match a non-whitespace character
-    \d      Match a digit character
-    \D      Match a non-digit character
-    \pP             Match P, named property.  Use \p{Prop} for longer names.
-    \PP             Match non-P
-    \X      Match eXtended Unicode "combining character sequence",
-             equivalent to (?:\PM\pM*)
-    \C      Match a single C char (octet) even under Unicode.
-            NOTE: breaks up characters into their UTF-8 bytes,
-            so you may end up with malformed pieces of UTF-8.
-            Unsupported in lookbehind.
-    \1       Backreference to a specific group.
-            '1' may actually be any positive integer.
-    \g1      Backreference to a specific or previous group,
-    \g{-1}   number may be negative indicating a previous buffer and may
-             optionally be wrapped in curly brackets for safer parsing.
-    \g{name} Named backreference
-    \k<name> Named backreference
-    \N{name} Named unicode character, or unicode escape
-    \x12     Hexadecimal escape sequence
-    \x{1234} Long hexadecimal escape sequence
-    \K       Keep the stuff left of the \K, don't include it in $&
-    \v       Shortcut for (*PRUNE)
-    \V       Shortcut for (*SKIP)
-
-A C<\w> matches a single alphanumeric character (an alphabetic
-character, or a decimal digit) or C<_>, not a whole word.  Use C<\w+>
-to match a string of Perl-identifier characters (which isn't the same
-as matching an English word).  If C<use locale> is in effect, the list
-of alphabetic characters generated by C<\w> is taken from the current
-locale.  See L<perllocale>.  You may use C<\w>, C<\W>, C<\s>, C<\S>,
-C<\d>, and C<\D> within character classes, but they aren't usable
-as either end of a range. If any of them precedes or follows a "-",
-the "-" is understood literally. If Unicode is in effect, C<\s> matches
-also "\x{85}", "\x{2028}, and "\x{2029}". See L<perlunicode> for more
-details about C<\pP>, C<\PP>, C<\X> and the possibility of defining
-your own C<\p> and C<\P> properties, and L<perluniintro> about Unicode
-in general.
-X<\w> X<\W> X<word>
-
-The POSIX character class syntax
-X<character class>
-
-    [:class:]
-
-is also available.  Note that the C<[> and C<]> brackets are I<literal>;
-they must always be used within a character class expression.
-
-    # this is correct:
-    $string =~ /[[:alpha:]]/;
-
-    # this is not, and will generate a warning:
-    $string =~ /[:alpha:]/;
-
-The available classes and their backslash equivalents (if available) are
-as follows:
-X<character class>
-X<alpha> X<alnum> X<ascii> X<blank> X<cntrl> X<digit> X<graph>
-X<lower> X<print> X<punct> X<space> X<upper> X<word> X<xdigit>
-
-    alpha
-    alnum
-    ascii
-    blank              [1]
-    cntrl
-    digit       \d
-    graph
-    lower
-    print
-    punct
-    space       \s     [2]
-    upper
-    word        \w     [3]
-    xdigit
-
-=over
+X<\g> X<\k> X<\K> X<backreference>
+
+ Sequence   Note    Description
+  [...]     [1]  Match a character according to the rules of the
+                   bracketed character class defined by the "...".
+                   Example: [a-z] matches "a" or "b" or "c" ... or "z"
+  [[:...:]] [2]  Match a character according to the rules of the POSIX
+                   character class "..." within the outer bracketed
+                   character class.  Example: [[:upper:]] matches any
+                   uppercase character.
+  \w        [3]  Match a "word" character (alphanumeric plus "_")
+  \W        [3]  Match a non-"word" character
+  \s        [3]  Match a whitespace character
+  \S        [3]  Match a non-whitespace character
+  \d        [3]  Match a decimal digit character
+  \D        [3]  Match a non-digit character
+  \pP       [3]  Match P, named property.  Use \p{Prop} for longer names
+  \PP       [3]  Match non-P
+  \X        [4]  Match Unicode "eXtended grapheme cluster"
+  \C             Match a single C-language char (octet) even if that is
+                   part of a larger UTF-8 character.  Thus it breaks up
+                   characters into their UTF-8 bytes, so you may end up
+                   with malformed pieces of UTF-8.  Unsupported in
+                   lookbehind.
+  \1        [5]  Backreference to a specific capture buffer or group.
+                   '1' may actually be any positive integer.
+  \g1       [5]  Backreference to a specific or previous group,
+  \g{-1}    [5]  The number may be negative indicating a relative
+                   previous buffer and may optionally be wrapped in
+                   curly brackets for safer parsing.
+  \g{name}  [5]  Named backreference
+  \k<name>  [5]  Named backreference
+  \K        [6]  Keep the stuff left of the \K, don't include it in $&
+  \N        [7]  Any character but \n (experimental).  Not affected by
+                   /s modifier
+  \v        [3]  Vertical whitespace
+  \V        [3]  Not vertical whitespace
+  \h        [3]  Horizontal whitespace
+  \H        [3]  Not horizontal whitespace
+  \R        [4]  Linebreak
+
+=over 4
 
 =item [1]
 
-A GNU extension equivalent to C<[ \t]>, "all horizontal whitespace".
+See L<perlrecharclass/Bracketed Character Classes> for details.
 
 =item [2]
 
-Not exactly equivalent to C<\s> since the C<[[:space:]]> includes
-also the (very rare) "vertical tabulator", "\cK" or chr(11) in ASCII.
+See L<perlrecharclass/POSIX Character Classes> for details.
 
 =item [3]
 
-A Perl extension, see above.
+See L<perlrecharclass/Backslash sequences> for details.
 
-=back
+=item [4]
 
-For example use C<[:upper:]> to match all the uppercase characters.
-Note that the C<[]> are part of the C<[::]> construct, not part of the
-whole character class.  For example:
+See L<perlrebackslash/Misc> for details.
 
-    [01[:alpha:]%]
+=item [5]
 
-matches zero, one, any alphabetic character, and the percent sign.
-
-The following equivalences to Unicode \p{} constructs and equivalent
-backslash character classes (if available), will hold:
-X<character class> X<\p> X<\p{}>
-
-    [[:...:]]  \p{...}         backslash
-
-    alpha       IsAlpha
-    alnum       IsAlnum
-    ascii       IsASCII
-    blank
-    cntrl       IsCntrl
-    digit       IsDigit        \d
-    graph       IsGraph
-    lower       IsLower
-    print       IsPrint
-    punct       IsPunct
-    space       IsSpace
-                IsSpacePerl    \s
-    upper       IsUpper
-    word        IsWord
-    xdigit      IsXDigit
-
-For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
-
-If the C<utf8> pragma is not used but the C<locale> pragma is, the
-classes correlate with the usual isalpha(3) interface (except for
-"word" and "blank").
-
-The assumedly non-obviously named classes are:
-
-=over 4
+See L</Capture buffers> below for details.
 
-=item cntrl
-X<cntrl>
+=item [6]
 
-Any control character.  Usually characters that don't produce output as
-such but instead control the terminal somehow: for example newline and
-backspace are control characters.  All characters with ord() less than
-32 are usually classified as control characters (assuming ASCII,
-the ISO Latin character sets, and Unicode), as is the character with
-the ord() value of 127 (C<DEL>).
+See L</Extended Patterns> below for details.
 
-=item graph
-X<graph>
+=item [7]
 
-Any alphanumeric or punctuation (special) character.
-
-=item print
-X<print>
-
-Any alphanumeric or punctuation (special) character or the space character.
-
-=item punct
-X<punct>
-
-Any punctuation (special) character.
-
-=item xdigit
-X<xdigit>
-
-Any hexadecimal digit.  Though this may feel silly ([0-9A-Fa-f] would
-work just fine) it is included for completeness.
+Note that C<\N> has two meanings.  When of the form C<\N{NAME}>, it matches the
+character whose name is C<NAME>; and similarly when of the form
+C<\N{U+I<wide hex char>}>, it matches the character whose Unicode ordinal is
+I<wide hex char>.  Otherwise it matches any character but C<\n>.
 
 =back
 
-You can negate the [::] character classes by prefixing the class name
-with a '^'. This is a Perl extension.  For example:
-X<character class, negation>
-
-    POSIX         traditional  Unicode
-
-    [[:^digit:]]    \D         \P{IsDigit}
-    [[:^space:]]    \S         \P{IsSpace}
-    [[:^word:]]            \W         \P{IsWord}
-
-Perl respects the POSIX standard in that POSIX character classes are
-only supported within a character class.  The POSIX character classes
-[.cc.] and [=cc=] are recognized but B<not> supported and trying to
-use them will cause an error.
-
 =head3 Assertions
 
 Perl defines the following zero-width assertions:
@@ -425,12 +331,12 @@ X<regexp, zero-width assertion>
 X<regular expression, zero-width assertion>
 X<\b> X<\B> X<\A> X<\Z> X<\z> X<\G>
 
-    \b Match a word boundary
-    \B Match except at a word boundary
-    \A Match only at beginning of string
-    \Z Match only at end of string, or before newline at the end
-    \z Match only at end of string
-    \G Match only at pos() (e.g. at the end-of-match position
+    \b  Match a word boundary
+    \B  Match except at a word boundary
+    \A  Match only at beginning of string
+    \Z  Match only at end of string, or before newline at the end
+    \z  Match only at end of string
+    \G  Match only at pos() (e.g. at the end-of-match position
         of prior m//g)
 
 A word boundary (C<\b>) is a spot between two characters
@@ -494,17 +400,20 @@ left parentheses have opened before it.  Likewise \11 is a
 backreference only if at least 11 left parentheses have opened
 before it.  And so on.  \1 through \9 are always interpreted as
 backreferences.
+If the bracketing group did not match, the associated backreference won't
+match either. (This can happen if the bracketing group is optional, or
+in a different branch of an alternation.)
 
 X<\g{1}> X<\g{-1}> X<\g{name}> X<relative backreference> X<named backreference>
 In order to provide a safer and easier way to construct patterns using
-backreferences, Perl 5.10 provides the C<\g{N}> notation. The curly
-brackets are optional, however omitting them is less safe as the meaning
-of the pattern can be changed by text (such as digits) following it.
-When N is a positive integer the C<\g{N}> notation is exactly equivalent
-to using normal backreferences. When N is a negative integer then it is
-a relative backreference referring to the previous N'th capturing group.
-When the bracket form is used and N is not an integer, it is treated as a
-reference to a named buffer.
+backreferences, Perl provides the C<\g{N}> notation (starting with perl
+5.10.0). The curly brackets are optional, however omitting them is less
+safe as the meaning of the pattern can be changed by text (such as digits)
+following it. When N is a positive integer the C<\g{N}> notation is
+exactly equivalent to using normal backreferences. When N is a negative
+integer then it is a relative backreference referring to the previous N'th
+capturing group. When the bracket form is used and N is not an integer, it
+is treated as a reference to a named buffer.
 
 Thus C<\g{-1}> refers to the last buffer, C<\g{-2}> refers to the
 buffer before that. For example:
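Separately from the document's own example (elided between these hunks), here is a hedged sketch of why a relative backreference is safer when a pattern fragment is embedded in a larger one. It assumes perl 5.10.0 or later, and the names C<$relative>, C<$absolute>, C<$rel_ok> and C<$abs_ok> are invented for illustration.

```perl
use strict;
use warnings;

my $relative = qr/(\w)\g{-1}/;   # refers to the group just before it
my $absolute = qr/(\w)\1/;       # refers to group 1 of the whole pattern

# Once embedded after another capture group, \1 points at "(x)",
# while \g{-1} still points at the fragment's own "(\w)".
my $rel_ok = "x-oo" =~ /(x)-$relative/ ? 1 : 0;
my $abs_ok = "x-oo" =~ /(x)-$absolute/ ? 1 : 0;

print "relative=$rel_ok absolute=$abs_ok\n";
```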
@@ -520,7 +429,7 @@ buffer before that. For example:
 
 and would match the same as C</(Y) ( (X) \3 \1 )/x>.
 
-Additionally, as of Perl 5.10 you may use named capture buffers and named
+Additionally, as of Perl 5.10.0 you may use named capture buffers and named
 backreferences. The notation is C<< (?<name>...) >> to declare and C<< \k<name> >>
 to reference. You may also use apostrophes instead of angle brackets to delimit the
 name; and you may use the bracketed C<< \g{name} >> backreference syntax.
@@ -531,7 +440,7 @@ and C<< \k<name> >> refer to the leftmost defined group. (Thus it's possible
 to do things with named capture buffers that would otherwise require C<(??{})>
 code to accomplish.)
 X<named capture buffer> X<regular expression, named capture buffer>
-X<%+> X<$+{name}> X<\k{name}>
+X<%+> X<$+{name}> X<< \k<name> >>
 
 Examples:
 
@@ -547,9 +456,9 @@ Examples:
          and print "'$1' is the first doubled character\n";
 
     if (/Time: (..):(..):(..)/) {   # parse out values
-       $hours = $1;
-       $minutes = $2;
-       $seconds = $3;
+        $hours = $1;
+        $minutes = $2;
+        $seconds = $3;
     }
 
 Several special variables also refer back to portions of the previous
@@ -590,14 +499,14 @@ already paid the price.  As of 5.005, C<$&> is not so costly as the
 other two.
 X<$&> X<$`> X<$'>
 
-As a workaround for this problem, Perl 5.10 introduces C<${^PREMATCH}>,
+As a workaround for this problem, Perl 5.10.0 introduces C<${^PREMATCH}>,
 C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&>
 and C<$'>, B<except> that they are only guaranteed to be defined after a
-successful match that was executed with the C</k> (keep-copy) modifier.
+successful match that was executed with the C</p> (preserve) modifier.
 The use of these variables incurs no global performance penalty, unlike
 their punctuation char equivalents, however at the trade-off that you
 have to tell perl when you want to use them.
-X</k> X<k modifier>
+X</p> X<p modifier>
 
 Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
 C<\w>, C<\n>.  Unlike some other regular expression languages, there
@@ -652,7 +561,7 @@ whitespace formatting, a simple C<#> will suffice.  Note that Perl closes
 the comment as soon as it sees a C<)>, so there is no way to put a literal
 C<)> in the comment.
 
-=item C<(?kimsx-imsx)>
+=item C<(?pimsx-imsx)>
 X<(?)>
 
 One or more embedded pattern-match modifiers, to be turned on (or
@@ -680,9 +589,13 @@ will match C<blah> in any case, some spaces, and an exact (I<including the case>
 repetition of the previous word, assuming the C</x> modifier, and no C</i>
 modifier outside this group.
 
-Note that the C<k> modifier is special in that it can only be enabled,
+These modifiers do not carry over into named subpatterns called in the
+enclosing group. In other words, a pattern such as C<((?i)(&NAME))> does not
+change the case-sensitivity of the "NAME" pattern.
+
+Note that the C<p> modifier is special in that it can only be enabled,
 not disabled, and that its presence anywhere in a pattern has a global
-effect. Thus C<(?-k)> and C<(?-k:...)> are meaningless and will warn
+effect. Thus C<(?-p)> and C<(?-p:...)> are meaningless and will warn
 when executed under C<use warnings>.
 
 =item C<(?:pattern)>
@@ -716,24 +629,46 @@ X<(?|)> X<Branch reset>
 
 This is the "branch reset" pattern, which has the special property
 that the capture buffers are numbered from the same starting point
-in each branch. 
+in each alternation branch. It is available starting from perl 5.10.0.
+
+Capture buffers are numbered from left to right, but inside this
+construct the numbering is restarted for each branch.
+
+The numbering within each branch will be as normal, and any buffers
+following this construct will be numbered as though the construct
+contained only one branch, that being the one with the most capture
+buffers in it.
 
-Normally capture buffers in a pattern are number sequentially, left
-to right in the pattern. Inside of this construct this behaviour is
-overriden so that the captures buffers in each branch share the same
-numbers. The numbering in each branch will be as normal, and any 
-buffers following the use of this pattern will be numbered as though
-the construct contained only one branch, that being the one with the
-most capture buffers in it.
+This construct will be useful when you want to capture one of a
+number of alternative matches.
 
-Consider the following pattern. The numbers underneath are which
-buffer number the captured content will be stored in.
+Consider the following pattern.  The numbers underneath show in
+which buffer the captured content will be stored.
 
 
     # before  ---------------branch-reset----------- after        
     / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
     # 1            2         2  3        2     3     4  
 
+Be careful when using the branch reset pattern in combination with 
+named captures. Named captures are implemented as being aliases to 
+numbered buffers holding the captures, and that interferes with the
+implementation of the branch reset pattern. If you are using named
+captures in a branch reset pattern, it's best to use the same names,
+in the same order, in each of the alternations:
+
+   /(?|  (?<a> x ) (?<b> y )
+      |  (?<a> z ) (?<b> w )) /x
+
+Not doing so may lead to surprises:
+
+  "12" =~ /(?| (?<a> \d+ ) | (?<b> \D+))/x;
+  say $+ {a};   # Prints '12'
+  say $+ {b};   # *Also* prints '12'.
+
+The problem here is that both the buffer named C<< a >> and the buffer
+named C<< b >> are aliases for the buffer belonging to C<< $1 >>.
+
 =item Look-Around Assertions
 X<look-around assertion> X<lookaround assertion> X<look-around> X<lookaround>
 
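The branch reset construct from the hunk above can be sketched as follows, assuming perl 5.10.0 or later. The input lines and the names C<$re> and C<@values> are made up; the point is only that each alternative stores its capture in C<$1>.

```perl
use strict;
use warnings;

# Inside (?| ... ) numbering restarts in each branch, so both
# alternatives capture into buffer 1.
my $re = qr/(?| key=(\w+) | name:(\w+) )/x;

my @values;
for my $line ("key=alpha", "name:beta") {
    push @values, $1 if $line =~ $re;
}

print "@values\n";
```

Without the branch reset, the second alternative would capture into C<$2> and the loop body would need to check both buffers.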
@@ -784,7 +719,7 @@ not include it in C<$&>. This effectively provides variable length
 look-behind. The use of C<\K> inside of another look-around assertion
 is allowed, but the behaviour is currently not well defined.
 
-For various reasons C<\K> may be signifigantly more efficient than the
+For various reasons C<\K> may be significantly more efficient than the
 equivalent C<< (?<=...) >> construct, and it is especially useful in
 situations where you want to efficiently remove something following
 something else in a string. For instance
@@ -810,9 +745,9 @@ only for fixed-width look-behind.
 X<< (?<NAME>) >> X<(?'NAME')> X<named capture> X<capture>
 
 A named capture buffer. Identical in every respect to normal capturing
-parentheses C<()> but for the additional fact that C<%+> may be used after
-a succesful match to refer to a named buffer. See C<perlvar> for more
-details on the C<%+> hash.
+parentheses C<()> but for the additional fact that C<%+> or C<%-> may be
+used after a successful match to refer to a named buffer. See C<perlvar>
+for more details on the C<%+> and C<%-> hashes.
 
 If multiple distinct capture buffers have the same name then the
 $+{NAME} will refer to the leftmost defined buffer in the match.
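A short sketch of the difference between C<%+> and C<%-> when two buffers share a name, assuming perl 5.10.0 or later. The buffer name C<ch> and the variable names are hypothetical.

```perl
use strict;
use warnings;

my ($leftmost, @all);
if ("ab" =~ /(?<ch>a)(?<ch>b)/) {
    # %+ yields the leftmost defined buffer with that name.
    $leftmost = $+{ch};
    # %- yields an array reference covering every buffer with that name.
    @all = @{ $-{ch} };
}

print "leftmost=$leftmost all=@all\n";
```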
@@ -837,8 +772,7 @@ though it isn't extended by the locale (see L<perllocale>).
 B<NOTE:> In order to make things easier for programmers with experience
 with the Python or PCRE regex engines, the pattern C<< (?PE<lt>NAMEE<gt>pattern) >>
 may be used instead of C<< (?<NAME>pattern) >>; however this form does not
-support the use of single quotes as a delimiter for the name. This is
-only available in Perl 5.10 or later.
+support the use of single quotes as a delimiter for the name.
 
 =item C<< \k<NAME> >>
 
@@ -856,7 +790,7 @@ Both forms are equivalent.
 
 B<NOTE:> In order to make things easier for programmers with experience
 with the Python or PCRE regex engines, the pattern C<< (?P=NAME) >>
-may be used instead of C<< \k<NAME> >> in Perl 5.10 or later.
+may be used instead of C<< \k<NAME> >>.
 
 =item C<(?{ code })>
 X<(?{})> X<regex, code in> X<regexp, code in> X<regular expression, code in>
@@ -888,16 +822,16 @@ C<local>ization are undone, so that
 
   $_ = 'a' x 8;
   m<
-     (?{ $cnt = 0 })                   # Initialize $cnt.
+     (?{ $cnt = 0 })                   # Initialize $cnt.
      (
        a
        (?{
-           local $cnt = $cnt + 1;      # Update $cnt, backtracking-safe.
+           local $cnt = $cnt + 1;      # Update $cnt, backtracking-safe.
        })
      )*
      aaaa
-     (?{ $res = $cnt })                        # On success copy to non-localized
-                                       # location.
+     (?{ $res = $cnt })                # On success copy to
+                                       # non-localized location.
    >x;
 
 will set C<$res = 4>.  Note that after the match, C<$cnt> returns to the globally
@@ -914,18 +848,11 @@ The assignment to C<$^R> above is properly localized, so the old
 value of C<$^R> is restored if the assertion is backtracked; compare
 L<"Backtracking">.
 
-Due to an unfortunate implementation issue, the Perl code contained in these
-blocks is treated as a compile time closure that can have seemingly bizarre
-consequences when used with lexically scoped variables inside of subroutines
-or loops.  There are various workarounds for this, including simply using
-global variables instead.  If you are using this construct and strange results
-occur then check for the use of lexically scoped variables.
-
 For reasons of security, this construct is forbidden if the regular
 expression involves run-time interpolation of variables, unless the
 perilous C<use re 'eval'> pragma has been used (see L<re>), or the
 variables contain results of C<qr//> operator (see
-L<perlop/"qr/STRING/imosx">).
+L<perlop/"qrE<sol>STRINGE<sol>msixpo">).
 
 This restriction is due to the wide-spread and remarkably convenient
 custom of using run-time determined strings as patterns.  For example:
@@ -942,9 +869,15 @@ so you should only do so if you are also using taint checking.
 Better yet, use the carefully constrained evaluation within a Safe
 compartment.  See L<perlsec> for details about both these mechanisms.
 
-Because Perl's regex engine is currently not re-entrant, interpolated
-code may not invoke the regex engine either directly with C<m//> or C<s///>),
-or indirectly with functions such as C<split>.
+B<WARNING>: Use of lexical (C<my>) variables in these blocks is
+broken. The result is unpredictable and will make perl unstable. The
+workaround is to use global (C<our>) variables.
+
+B<WARNING>: Because Perl's regex engine is currently not re-entrant,
+interpolated code may not invoke the regex engine either directly with
+C<m//> or C<s///>, or indirectly with functions such as
+C<split>. Invoking the regex engine in these blocks will make perl
+unstable.
 
 =item C<(??{ code })>
 X<(??{})>
@@ -974,18 +907,24 @@ where the C<code> ends are currently somewhat convoluted.
 The following pattern matches a parenthesized group:
 
   $re = qr{
-            \(
-            (?:
-               (?> [^()]+ )    # Non-parens without backtracking
-             |
-               (??{ $re })     # Group with matching parens
-            )*
-            \)
-         }x;
+             \(
+             (?:
+                (?> [^()]+ )       # Non-parens without backtracking
+              |
+                (??{ $re })        # Group with matching parens
+             )*
+             \)
+          }x;
 
 See also C<(?PARNO)> for a different, more efficient way to accomplish
 the same task.
 
+For reasons of security, this construct is forbidden if the regular
+expression involves run-time interpolation of variables, unless the
+perilous C<use re 'eval'> pragma has been used (see L<re>), or the
+variables contain results of C<qr//> operator (see
+L<perlop/"qrE<sol>STRINGE<sol>msixpo">).
+
 Because perl's regex engine is not currently re-entrant, delayed
 code may not invoke the regex engine either directly with C<m//> or C<s///>),
 or indirectly with functions such as C<split>.
@@ -1081,7 +1020,7 @@ pattern.
 
 B<NOTE:> In order to make things easier for programmers with experience
 with the Python or PCRE regex engines the pattern C<< (?P>NAME) >>
-may be used instead of C<< (?&NAME) >> in Perl 5.10 or later.
+may be used instead of C<< (?&NAME) >>.
 
 =item C<(?(condition)yes-pattern|no-pattern)>
 X<(?()>
@@ -1210,7 +1149,7 @@ Consider this pattern:
 
     m{ \(
           (
-            [^()]+             # x+
+            [^()]+           # x+
           |
             \( [^()]* \)
           )+
@@ -1230,7 +1169,7 @@ hung.  However, a tiny change to this pattern
 
     m{ \(
           (
-            (?> [^()]+ )       # change x+ above to (?> x+ )
+            (?> [^()]+ )        # change x+ above to (?> x+ )
           |
             \( [^()]* \)
           )+
@@ -1295,7 +1234,7 @@ otherwise stated the ARG argument is optional; in some cases, it is
 forbidden.
 
 Any pattern containing a special backtracking verb that allows an argument
-has the special behaviour that when executed it sets the current packages'
+has the special behaviour that when executed it sets the current package's
 C<$REGERROR> and C<$REGMARK> variables. When doing so the following
 rules apply:
 
@@ -1325,7 +1264,7 @@ argument, then C<$REGERROR> and C<$REGMARK> are not touched at all.
 =over 4
 
 =item C<(*PRUNE)> C<(*PRUNE:NAME)>
-X<(*PRUNE)> X<(*PRUNE:NAME)> X<\v>
+X<(*PRUNE)> X<(*PRUNE:NAME)>
 
 This zero-width pattern prunes the backtracking tree at the current point
 when backtracked into on failure. Consider the pattern C<A (*PRUNE) B>,
@@ -1335,8 +1274,6 @@ continues in B, which may also backtrack as necessary; however, should B
 not match, then no further backtracking will take place, and the pattern
 will fail outright at the current starting position.
 
-As a shortcut, C<\v> is exactly equivalent to C<(*PRUNE)>.
-
 The following example counts all the possible matching strings in a
 pattern (without actually matching any of them).
 
@@ -1362,7 +1299,7 @@ If we add a C<(*PRUNE)> before the count like the following
     print "Count=$count\n";
 
 we prevent backtracking and find the count of the longest matching
-at each matching startpoint like so:
+at each matching starting point like so:
 
     aaab
     aab
@@ -1388,8 +1325,6 @@ of this pattern. This effectively means that the regex engine "skips" forward
 to this position on failure and tries to match again, (assuming that
 there is sufficient room to match).
 
-As a shortcut C<\V> is exactly equivalent to C<(*SKIP)>.
-
 The name of the C<(*SKIP:NAME)> pattern has special significance. If a
 C<(*MARK:NAME)> was encountered while matching, then it is that position
 which is used as the "skip point". If no C<(*MARK)> of that name was
@@ -1410,7 +1345,7 @@ outputs
     Count=2
 
 Once the 'aaab' at the start of the string has matched, and the C<(*SKIP)>
-executed, the next startpoint will be where the cursor was when the
+executed, the next starting point will be where the cursor was when the
 C<(*SKIP)> was executed.
 
 =item C<(*MARK:NAME)> C<(*:NAME)>
@@ -1420,8 +1355,7 @@ This zero-width pattern can be used to mark the point reached in a string
 when a certain part of the pattern has been successfully matched. This
 mark may be given a name. A later C<(*SKIP)> pattern will then skip
 forward to that point if backtracked into on failure. Any number of
-C<(*MARK)> patterns are allowed, and the NAME portion is optional and may
-be duplicated.
+C<(*MARK)> patterns are allowed, and the NAME portion may be duplicated.
 
 In addition to interacting with the C<(*SKIP)> pattern, C<(*MARK:NAME)>
 can be used to "label" a pattern branch, so that after matching, the
@@ -1433,7 +1367,7 @@ name of the most recently executed C<(*MARK:NAME)> that was involved
 in the match.
 
 This can be used to determine which branch of a pattern was matched
-without using a seperate capture buffer for each branch, which in turn
+without using a separate capture buffer for each branch, which in turn
 can result in a performance improvement, as perl cannot optimize
 C</(?:(x)|(y)|(z))/> as efficiently as something like
 C</(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/>.
@@ -1449,7 +1383,7 @@ As a shortcut C<(*MARK:NAME)> can be written C<(*:NAME)>.
 
 =item C<(*THEN)> C<(*THEN:NAME)>
 
-This is similar to the "cut group" operator C<::> from Perl6. Like
+This is similar to the "cut group" operator C<::> from Perl 6. Like
 C<(*PRUNE)>, this verb always matches, and when backtracked into on
 failure, it causes the regex engine to try the next alternation in the
 innermost enclosing group (capturing or otherwise).
@@ -1483,7 +1417,7 @@ backtrack and try C; but the C<(*PRUNE)> verb will simply fail.
 =item C<(*COMMIT)>
 X<(*COMMIT)>
 
-This is the Perl6 "commit pattern" C<< <commit> >> or C<:::>. It's a
+This is the Perl 6 "commit pattern" C<< <commit> >> or C<:::>. It's a
 zero-width pattern similar to C<(*SKIP)>, except that when backtracked
 into on failure it causes the match to fail outright. No further attempts
 to find a valid match by advancing the start pointer will occur again.
@@ -1567,7 +1501,7 @@ word following "foo" in the string "Food is on the foo table.":
 
     $_ = "Food is on the foo table.";
     if ( /\b(foo)\s+(\w+)/i ) {
-       print "$2 follows $1.\n";
+        print "$2 follows $1.\n";
     }
 
 When the match runs, the first part of the regular expression (C<\b(foo)>)
@@ -1585,7 +1519,7 @@ like this:
 
     $_ =  "The food is under the bar in the barn.";
     if ( /foo(.*)bar/ ) {
-       print "got <$1>\n";
+        print "got <$1>\n";
     }
 
 Which perhaps unexpectedly yields:
@@ -1605,8 +1539,8 @@ of a string, and you also want to keep the preceding part of the match.
 So you write this:
 
     $_ = "I have 2 numbers: 53147";
-    if ( /(.*)(\d*)/ ) {                               # Wrong!
-       print "Beginning is <$1>, number is <$2>.\n";
+    if ( /(.*)(\d*)/ ) {                                # Wrong!
+        print "Beginning is <$1>, number is <$2>.\n";
     }
 
 That won't work at all, because C<.*> was greedy and gobbled up the
@@ -1619,23 +1553,23 @@ Here are some variants, most of which don't work:
 
     $_ = "I have 2 numbers: 53147";
     @pats = qw{
-       (.*)(\d*)
-       (.*)(\d+)
-       (.*?)(\d*)
-       (.*?)(\d+)
-       (.*)(\d+)$
-       (.*?)(\d+)$
-       (.*)\b(\d+)$
-       (.*\D)(\d+)$
+        (.*)(\d*)
+        (.*)(\d+)
+        (.*?)(\d*)
+        (.*?)(\d+)
+        (.*)(\d+)$
+        (.*?)(\d+)$
+        (.*)\b(\d+)$
+        (.*\D)(\d+)$
     };
 
     for $pat (@pats) {
-       printf "%-12s ", $pat;
-       if ( /$pat/ ) {
-           print "<$1> <$2>\n";
-       } else {
-           print "FAIL\n";
-       }
+        printf "%-12s ", $pat;
+        if ( /$pat/ ) {
+            print "<$1> <$2>\n";
+        } else {
+            print "FAIL\n";
+        }
     }
 
 That will print out:
@@ -1661,8 +1595,8 @@ trickier.  Imagine you'd like to find a sequence of non-digits not
 followed by "123".  You might try to write that as
 
     $_ = "ABC123";
-    if ( /^\D*(?!123)/ ) {             # Wrong!
-       print "Yup, no 123 in $_\n";
+    if ( /^\D*(?!123)/ ) {                # Wrong!
+        print "Yup, no 123 in $_\n";
     }
 
 But that isn't going to match; at least, not the way you're hoping.  It
@@ -1844,7 +1778,7 @@ meaning of C<\1> is kludged in for C<s///>.  However, if you get into the habit
 of doing that, you get yourself into trouble if you then add an C</e>
 modifier.
 
-    s/(\d+)/ \1 + 1 /eg;       # causes warning under -w
+    s/(\d+)/ \1 + 1 /eg;            # causes warning under -w
 
 Or if you try to do
 
@@ -1870,7 +1804,7 @@ loops using regular expressions, with something as innocuous as:
 
 The C<o?> matches at the beginning of C<'foo'>, and since the position
 in the string is not moved by the match, C<o?> would match again and again
-because of the C<*> modifier.  Another common way to create a similar cycle
+because of the C<*> quantifier.  Another common way to create a similar cycle
 is with the looping modifier C<//g>:
 
     @matches = ( 'foo' =~ m{ o? }xg );
@@ -1885,12 +1819,12 @@ However, long experience has shown that many programming tasks may
 be significantly simplified by using repeated subexpressions that
 may match zero-length substrings.  Here's a simple example being:
 
-    @chars = split //, $string;                  # // is not magic in split
+    @chars = split //, $string;                  # // is not magic in split
     ($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
 
 Thus Perl allows such constructs, by I<forcefully breaking
 the infinite loop>.  The rules for this are different for lower-level
-loops given by the greedy modifiers C<*+{}>, and for higher-level
+loops given by the greedy quantifiers C<*+{}>, and for higher-level
 ones like the C</g> modifier or split() operator.
 
 The lower-level loops are I<interrupted> (that is, the loop is
@@ -2058,7 +1992,7 @@ this:
     # We must also take care of not escaping the legitimate \\Y|
     # sequence, hence the presence of '\\' in the conversion rules.
     my %rules = ( '\\' => '\\\\',
-                 'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
+                  'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
     sub convert {
       my $re = shift;
       $re =~ s{
@@ -2083,9 +2017,9 @@ part of this regular expression needs to be converted explicitly
 
 =head1 PCRE/Python Support
 
-As of Perl 5.10 Perl supports several Python/PCRE specific extensions
+As of Perl 5.10.0, Perl supports several Python/PCRE-specific extensions
 to the regex syntax. While Perl programmers are encouraged to use the
-Perl specific syntax, the following are legal in Perl 5.10:
+Perl-specific syntax, the following are also accepted:
 
 =over 4
 
@@ -2105,6 +2039,17 @@ Subroutine call to a named capture buffer. Equivalent to C<< (?&NAME) >>.
 
 =head1 BUGS
 
+There are numerous problems with case-insensitive matching of characters
+outside the ASCII range, especially with those whose folds are multiple
+characters, such as ligatures like C<LATIN SMALL LIGATURE FF>.
+
+In a bracketed character class with case-insensitive matching, ranges only work
+for ASCII characters.  For example,
+C<m/[\N{CYRILLIC CAPITAL LETTER A}-\N{CYRILLIC CAPITAL LETTER YA}]/i>
+doesn't match all the Russian upper and lower case letters.
+
+Many regular expression constructs don't work on EBCDIC platforms.
+
 This document varies from difficult to understand to completely
 and utterly opaque.  The wandering prose riddled with jargon is
 hard to fathom in several places.