In addition, Perl defines the following:
\w Match a "word" character (alphanumeric plus "_")
- \W Match a non-word character
+ \W Match a non-"word" character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
equivalent to C<(?:\PM\pM*)>
\C Match a single C char (octet) even under utf8.
-A C<\w> matches a single alphanumeric character, not a whole word.
+A C<\w> matches a single alphanumeric character or C<_>, not a whole word.
Use C<\w+> to match a string of Perl-identifier characters (which isn't
the same as matching an English word). If C<use locale> is in effect, the
list of alphabetic characters generated by C<\w> is taken from the
=item graph
-Any alphanumeric or punctuation character.
+Any alphanumeric or punctuation (special) character.
=item print
-Any alphanumeric or punctuation character or space.
+Any alphanumeric or punctuation (special) character or space.
=item punct
-Any punctuation character.
+Any punctuation (special) character.
=item xdigit
interpreted as a literal character, not a metacharacter. This was
once used in a common idiom to disable or quote the special meanings
of regular expression metacharacters in a string that you want to
-use for a pattern. Simply quote all non-alphanumeric characters:
+use for a pattern. Simply quote all non-"word" characters:
$pattern =~ s/(\W)/\\$1/g;
+(If C<use locale> is set, then this depends on the current locale.)
Today it is more common to use the quotemeta() function or the C<\Q>
metaquoting escape sequence to disable all metacharacters' special
meanings like this:
in many situations where on the first sight a simple C<()*> looks like
the correct solution. Suppose we parse text with comments being delimited
by C<#> followed by some optional (horizontal) whitespace. Contrary to
-its appearence, C<#[ \t]*> I<is not> the correct subexpression to match
+its appearance, C<#[ \t]*> I<is not> the correct subexpression to match
the comment delimiter, because it may "give up" some whitespace if
the remainder of the pattern can be made to match that way. The correct
answer is either one of these: