X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlre.pod;h=380bc5f1f38231b10aaf3489b25cecfa255f3c34;hb=ba210ebec161cde003bc967e8e460c72f71fb70c;hp=2db4139c30ddb7095ed43d12eeb14628704d3241;hpb=ee8c7f5465f003860e2347a2946abacac39bd9b9;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlre.pod b/pod/perlre.pod
index 2db4139..380bc5f 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -169,7 +169,7 @@ You'll need to write something like C.
 In addition, Perl defines the following:
 
     \w  Match a "word" character (alphanumeric plus "_")
-    \W  Match a non-word character
+    \W  Match a non-"word" character
     \s  Match a whitespace character
     \S  Match a non-whitespace character
     \d  Match a digit character
@@ -180,7 +180,7 @@ In addition, Perl defines the following:
         equivalent to C<(?:\PM\pM*)>
     \C  Match a single C char (octet) even under utf8.
 
-A C<\w> matches a single alphanumeric character, not a whole word.
+A C<\w> matches a single alphanumeric character or C<_>, not a whole word.
 Use C<\w+> to match a string of Perl-identifier characters (which isn't
 the same as matching an English word).  If C is in effect, the list
 of alphabetic characters generated by C<\w> is taken from the
@@ -199,24 +199,30 @@ equivalents (if available) are as follows:
     alpha
     alnum
     ascii
+    blank               [1]
     cntrl
     digit       \d
     graph
     lower
     print
     punct
-    space       \s
+    space       \s      [2]
     upper
-    word        \w
+    word        \w      [3]
     xdigit
 
+  [1] A GNU extension equivalent to C<[ \t]>, `all horizontal whitespace'.
+  [2] Not I to C<\s> since the C<[[:space:]]> includes
+      also the (very rare) `vertical tabulator', "\ck", chr(11).
+  [3] A Perl extension.
+
 For example use C<[:upper:]> to match all the uppercase characters.
-Note that the C<[]> are part of the C<[::]> construct, not part of the whole
-character class.  For example:
+Note that the C<[]> are part of the C<[::]> construct, not part of the
+whole character class.  For example:
 
     [01[:alpha:]%]
 
-matches one, zero, any alphabetic character, and the percentage sign.
+matches zero, one, any alphabetic character, and the percentage sign.
 
 If the C pragma is used, the following equivalences to Unicode
 \p{} constructs hold:
@@ -224,6 +230,7 @@ If the C pragma is used, the following equivalences to Unicode
     alpha       IsAlpha
     alnum       IsAlnum
     ascii       IsASCII
+    blank       IsSpace
     cntrl       IsCntrl
     digit       IsDigit
     graph       IsGraph
@@ -238,8 +245,8 @@ If the C pragma is used, the following equivalences to Unicode
 For example C<[:lower:]> and C<\p{IsLower}> are equivalent.
 
 If the C pragma is not used but the C pragma is, the
-classes correlate with the isalpha(3) interface (except for `word',
-which is a Perl extension, mirroring C<\w>).
+classes correlate with the usual isalpha(3) interface (except for
+`word' and `blank').
 
 The assumedly non-obviously named classes are:
 
@@ -250,23 +257,24 @@ The assumedly non-obviously named classes are:
 Any control character.  Usually characters that don't produce output as
 such but instead control the terminal somehow: for example newline and
 backspace are control characters.  All characters with ord() less than
-32 are most often classified as control characters.
+32 are most often classified as control characters (assuming ASCII,
+the ISO Latin character sets, and Unicode).
 
 =item graph
 
-Any alphanumeric or punctuation character.
+Any alphanumeric or punctuation (special) character.
 
 =item print
 
-Any alphanumeric or punctuation character or space.
+Any alphanumeric or punctuation (special) character or space.
 
 =item punct
 
-Any punctuation character.
+Any punctuation (special) character.
 
 =item xdigit
 
-Any hexadecimal digit.  Though this may feel silly (/0-9a-f/i would
+Any hexadecimal digit.  Though this may feel silly ([0-9A-Fa-f] would
 work just fine) it is included for completeness.
 
 =back
@@ -352,7 +360,7 @@ everything before the matched string.  And C<$'> returns everything
 after the matched string.
 
 The numbered variables ($1, $2, $3, etc.) and the related punctuation
-set (C<<$+>, C<$&>, C<$`>, and C<$'>) are all dynamically scoped
+set (C<$+>, C<$&>, C<$`>, and C<$'>) are all dynamically scoped
 until the end of the enclosing block or until the next successful
 match, whichever comes first.  (See L.)
 
@@ -377,10 +385,11 @@ that looks like \\, \(, \), \<, \>, \{, or \} is always
 interpreted as a literal character, not a metacharacter.  This was
 once used in a common idiom to disable or quote the special meanings
 of regular expression metacharacters in a string that you want to
-use for a pattern. Simply quote all non-alphanumeric characters:
+use for a pattern. Simply quote all non-"word" characters:
 
     $pattern =~ s/(\W)/\\$1/g;
 
+(If C is set, then this depends on the current locale.)
 Today it is more common to use the quotemeta() function or the C<\Q>
 metaquoting escape sequence to disable all metacharacters' special
 meanings like this: