perlfunc.pod grammar fixes

[p5sagit/p5-mst-13.2.git] / pod / perlre.pod
diff --git a/pod/perlre.pod b/pod/perlre.pod
index bb52113..39110ff 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -6,7 +6,7 @@ perlre - Perl regular expressions
 
 This page describes the syntax of regular expressions in Perl.  
 
-if you haven't used regular expressions before, a quick-start
+If you haven't used regular expressions before, a quick-start
 introduction is available in L<perlrequick>, and a longer tutorial
 introduction is available in L<perlretut>.
 
@@ -41,11 +41,7 @@ line anywhere within the string.
 Treat string as single line.  That is, change "." to match any character
 whatsoever, even a newline, which normally it would not match.
 
-The C</s> and C</m> modifiers both override the C<$*> setting.  That
-is, no matter what C<$*> contains, C</s> without C</m> will force
-"^" to match only at the beginning of the string and "$" to match
-only at the end (or just before a newline at the end) of the string.
-Together, as /ms, they let the "." match any character whatsoever,
+Used together, as /ms, they let the "." match any character whatsoever,
 while still allowing "^" and "$" to match, respectively, just after
 and just before newlines within the string.
 
@@ -103,13 +99,11 @@ string as a multi-line buffer, such that the "^" will match after any
 newline within the string, and "$" will match before any newline.  At the
 cost of a little more overhead, you can do this by using the /m modifier
 on the pattern match operator.  (Older programs did this by setting C<$*>,
-but this practice is now deprecated.)
+but this practice has been removed in perl 5.9.)
 
 To simplify multi-line substitutions, the "." character never matches a
 newline unless you use the C</s> modifier, which in effect tells Perl to pretend
-the string is a single line--even if it isn't.  The C</s> modifier also
-overrides the setting of C<$*>, in case you have some (badly behaved) older
-code that sets it in another module.
+the string is a single line--even if it isn't.
 
 The following standard quantifiers are recognized:
 
@@ -121,7 +115,8 @@ The following standard quantifiers are recognized:
     {n,m}  Match at least n but not more than m times
 
 (If a curly bracket occurs in any other context, it is treated
-as a regular character.)  The "*" modifier is equivalent to C<{0,}>, the "+"
+as a regular character.  In particular, the lower bound
+is not optional.)  The "*" modifier is equivalent to C<{0,}>, the "+"
 modifier to C<{1,}>, and the "?" modifier to C<{0,1}>.  n and m are limited
 to integral values less than a preset limit defined when perl is built.
 This is usually 32766 on the most common platforms.  The actual limit can
@@ -187,6 +182,7 @@ In addition, Perl defines the following:
     \C Match a single C char (octet) even under Unicode.
        NOTE: breaks up characters into their UTF-8 bytes,
        so you may end up with malformed pieces of UTF-8.
+       Unsupported in lookbehind.
 
 A C<\w> matches a single alphanumeric character (an alphabetic
 character, or a decimal digit) or C<_>, not a whole word.  Use C<\w+>
@@ -199,7 +195,7 @@ as endpoints of a range, that's not a range, the "-" is understood
 literally.  If Unicode is in effect, C<\s> matches also "\x{85}",
 "\x{2028}, and "\x{2029}", see L<perlunicode> for more details about
 C<\pP>, C<\PP>, and C<\X>, and L<perluniintro> about Unicode in general.
-You can define your own C<\p> and C<\P> propreties, see L<perlunicode>.
+You can define your own C<\p> and C<\P> properties, see L<perlunicode>.
 
 The POSIX character class syntax
 
@@ -227,12 +223,12 @@ equivalents (if available) are as follows:
 
 =item [1]
 
-A GNU extension equivalent to C<[ \t]>, `all horizontal whitespace'.
+A GNU extension equivalent to C<[ \t]>, "all horizontal whitespace".
 
 =item [2]
 
 Not exactly equivalent to C<\s> since the C<[[:space:]]> includes
-also the (very rare) `vertical tabulator', "\ck", chr(11).
+also the (very rare) "vertical tabulator", "\ck", chr(11).
 
 =item [3]
 
@@ -256,7 +252,7 @@ backslash character classes (if available), will hold:
     alpha       IsAlpha
     alnum       IsAlnum
     ascii       IsASCII
-    blank      IsSpace
+    blank       IsSpace
     cntrl       IsCntrl
     digit       IsDigit        \d
     graph       IsGraph
@@ -273,7 +269,7 @@ For example C<[:lower:]> and C<\p{IsLower}> are equivalent.
 
 If the C<utf8> pragma is not used but the C<locale> pragma is, the
 classes correlate with the usual isalpha(3) interface (except for
-`word' and `blank').
+"word" and "blank").
 
 The assumedly non-obviously named classes are:
 
@@ -398,11 +394,15 @@ the most-recently closed group (submatch). C<$^N> can be used in
 extended patterns (see below), for example to assign a submatch to a
 variable. 
 
-The numbered variables ($1, $2, $3, etc.) and the related punctuation
+The numbered match variables ($1, $2, $3, etc.) and the related punctuation
 set (C<$+>, C<$&>, C<$`>, C<$'>, and C<$^N>) are all dynamically scoped
 until the end of the enclosing block or until the next successful
 match, whichever comes first.  (See L<perlsyn/"Compound Statements">.)
 
+B<NOTE>: failed matches in Perl do not reset the match variables,
+which makes it easier to write code that tests for a series of more
+specific cases and remembers the best match.
+
 B<WARNING>: Once Perl sees that you need one of C<$&>, C<$`>, or
 C<$'> anywhere in the program, it has to provide them for every
 pattern match.  This may substantially slow your program.  Perl
@@ -562,7 +562,7 @@ only for fixed-width look-behind.
 B<WARNING>: This extended regular expression feature is considered
 highly experimental, and may be changed or deleted without notice.
 
-This zero-width assertion evaluate any embedded Perl code.  It
+This zero-width assertion evaluates any embedded Perl code.  It
 always succeeds, and its C<code> is not interpolated.  Currently,
 the rules to determine where the C<code> ends are somewhat convoluted.
 
@@ -574,6 +574,10 @@ track of the number of nested parentheses. For example:
   /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
   print "color = $color, animal = $animal\n";
 
+Inside the C<(?{...})> block, C<$_> refers to the string the regular
+expression is matching against. You can also use C<pos()> to know what is
+the current position of matching within this string.
+
 The C<code> is properly scoped in the following sense: If the assertion
 is backtracked (compare L<"Backtracking">), all changes introduced after
 C<local>ization are undone, so that
@@ -625,7 +629,7 @@ although it could raise an exception from an illegal pattern.  If
 you turn on the C<use re 'eval'>, though, it is no longer secure,
 so you should only do so if you are also using taint checking.
 Better yet, use the carefully constrained evaluation within a Safe
-module.  See L<perlsec> for details about both these mechanisms.
+compartment.  See L<perlsec> for details about both these mechanisms.
 
 =item C<(??{ code })>
 
@@ -1261,7 +1265,7 @@ Overloaded constants (see L<overload>) provide a simple way to extend
 the functionality of the RE engine.
 
 Suppose that we want to enable a new RE escape-sequence C<\Y|> which
-matches at boundary between white-space characters and non-whitespace
+matches at boundary between whitespace characters and non-whitespace
 characters.  Note that C<(?=\S)(?<!\S)|(?!\S)(?<=\S)> matches exactly
 at these positions, so we want to have each C<\Y|> in the place of the
 more complicated version.  We can create a module C<customre> to do
@@ -1278,7 +1282,9 @@ this:
 
     sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'"}
 
-    my %rules = ( '\\' => '\\', 
+    # We must also take care of not escaping the legitimate \\Y|
+    # sequence, hence the presence of '\\' in the conversion rules.
+    my %rules = ( '\\' => '\\\\', 
                  'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
     sub convert {
       my $re = shift;