X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlre.pod;h=61907f8a488d426a42d882bf2be3533b738a5ad4;hb=5269aecde866056a77e32c937c7c3182bb599487;hp=182f5bd03ffcc055ddb7eb725ade5e3717772992;hpb=e1901655935137420b3a46ad23c873753fcbbbc7;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlre.pod b/pod/perlre.pod
index 182f5bd..61907f8 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -40,7 +40,7 @@ is, no matter what C<$*> contains, C without C will force
 "^" to match only at the beginning of the string and "$" to match
 only at the end (or just before a newline at the end) of the string.
 Together, as /ms, they let the "." match any character whatsoever,
-while yet allowing "^" and "$" to match, respectively, just after
+while still allowing "^" and "$" to match, respectively, just after
 and just before newlines within the string.
 
 =item x
@@ -225,19 +225,21 @@ whole character class. For example:
 matches zero, one, any alphabetic character, and the percentage sign.
 
 If the C pragma is used, the following equivalences to Unicode
-\p{} constructs hold:
+\p{} constructs and equivalent backslash character classes (if available),
+will hold:
 
     alpha       IsAlpha
     alnum       IsAlnum
     ascii       IsASCII
     blank       IsSpace
     cntrl       IsCntrl
-    digit       IsDigit
+    digit       IsDigit        \d
     graph       IsGraph
     lower       IsLower
     print       IsPrint
     punct       IsPunct
     space       IsSpace
+                IsSpacePerl    \s
     upper       IsUpper
     word        IsWord
     xdigit      IsXDigit
@@ -266,7 +268,7 @@ Any alphanumeric or punctuation (special) character.
 
 =item print
 
-Any alphanumeric or punctuation (special) character or space.
+Any alphanumeric or punctuation (special) character or the space character.
 
 =item punct
 
@@ -331,12 +333,14 @@ I.
 
 There is no limit to the number of captured substrings that you may
 use. However Perl also uses \10, \11, etc. as aliases for \010,
-\011, etc. (Recall that 0 means octal, so \011 is the 9'th ASCII
-character, a tab.) Perl resolves this ambiguity by interpreting
-\10 as a backreference only if at least 10 left parentheses have
-opened before it. Likewise \11 is a backreference only if at least
-11 left parentheses have opened before it. And so on. \1 through
-\9 are always interpreted as backreferences."
+\011, etc. (Recall that 0 means octal, so \011 is the character at
+number 9 in your coded character set; which would be the 10th character,
+a horizontal tab under ASCII.) Perl resolves this
+ambiguity by interpreting \10 as a backreference only if at least 10
+left parentheses have opened before it. Likewise \11 is a
+backreference only if at least 11 left parentheses have opened
+before it. And so on. \1 through \9 are always interpreted as
+backreferences.
 
 Examples:
 
@@ -780,7 +784,7 @@ and the first "bar" thereafter.
     got
 
 Here's another example: let's say you'd like to match a number at the end
-of a string, and you also want to keep the preceding part the match.
+of a string, and you also want to keep the preceding of part the match.
 So you write this:
 
     $_ = "I have 2 numbers: 53147";
@@ -846,7 +850,7 @@ followed by "123". You might try to write that as
 
 But that isn't going to match; at least, not the way you're hoping. It
 claims that there is no 123 in the string. Here's a clearer picture of
-why it that pattern matches, contrary to popular expectations:
+why that pattern matches, contrary to popular expectations:
 
     $x = 'ABC123' ;
     $y = 'ABC445' ;
@@ -952,10 +956,10 @@ escape it with a backslash. "-" is also taken literally when it is
 at the end of the list, just before the closing "]". (The
 following all specify the same class of three characters: C<[-az]>,
 C<[az-]>, and C<[a\-z]>. All are different from C<[a-z]>, which
-specifies a class containing twenty-six characters.)
-Also, if you try to use the character classes C<\w>, C<\W>, C<\s>,
-C<\S>, C<\d>, or C<\D> as endpoints of a range, that's not a range,
-the "-" is understood literally.
+specifies a class containing twenty-six characters, even on EBCDIC
+based coded character sets.) Also, if you try to use the character
+classes C<\w>, C<\W>, C<\s>, C<\S>, C<\d>, or C<\D> as endpoints of
+a range, that's not a range, the "-" is understood literally.
 
 Note also that the whole range idea is rather unportable between
 character sets--and even within character sets they may cause results
@@ -967,11 +971,11 @@ spell out the character sets in full.
 Characters may be specified using a metacharacter syntax much like that
 used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
 "\f" a form feed, etc. More generally, \I, where I is a string
-of octal digits, matches the character whose ASCII value is I.
-Similarly, \xI, where I are hexadecimal digits, matches the
-character whose ASCII value is I. The expression \cI matches the
-ASCII character control-I. Finally, the "." metacharacter matches any
-character except "\n" (unless you use C).
+of octal digits, matches the character whose coded character set value
+is I. Similarly, \xI, where I are hexadecimal digits,
+matches the character whose numeric value is I. The expression \cI
+matches the character control-I. Finally, the "." metacharacter
+matches any character except "\n" (unless you use C).
 
 You can specify a series of alternatives for a pattern using "|" to
 separate them, so that C will match any of "fee", "fie",
@@ -1093,7 +1097,7 @@ For example:
     $_ = 'bar';
     s/\w??/<$&>/g;
 
-results in C<"<><><><>">. At each position of the string the best
+results in C<< <><><><> >>. At each position of the string the best
 match given by non-greedy C is the zero-length match, and the I
 match is what is matched by C<\w>. Thus zero-length matches
 alternate with one-character-long matches.
@@ -1275,5 +1279,7 @@ L.
 
 L.
 
+L.
+
 I by Jeffrey Friedl, published by
 O'Reilly and Associates.