In addition, Perl defines the following:
\w Match a "word" character (alphanumeric plus "_")
- \W Match a non-word character
+ \W Match a non-"word" character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
equivalent to C<(?:\PM\pM*)>
\C Match a single C char (octet) even under utf8.
-A C<\w> matches a single alphanumeric character, not a whole word.
+A C<\w> matches a single alphanumeric character or C<_>, not a whole word.
Use C<\w+> to match a string of Perl-identifier characters (which isn't
the same as matching an English word). If C<use locale> is in effect, the
list of alphabetic characters generated by C<\w> is taken from the
[01[:alpha:]%]
-matches one, zero, any alphabetic character, and the percentage sign.
+matches zero, one, any alphabetic character, and the percentage sign.
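
As a rough illustration of the bracketed class above, the following sketch tests a few single characters against C<[01[:alpha:]%]>. The helper name C<in_class> and the sample characters are illustrative assumptions, not part of the original text, and the results assume plain ASCII input with no C<use locale> in effect.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the single character $ch belongs to
# the class [01[:alpha:]%], and 0 otherwise.
sub in_class {
    my ($ch) = @_;
    return $ch =~ /\A[01[:alpha:]%]\z/ ? 1 : 0;
}

# Illustrative samples: the first five are expected to match,
# the last three are not.
for my $ch ('0', '1', 'a', 'Z', '%', '2', '_', ' ') {
    printf "%-3s %s\n", "'$ch'", in_class($ch) ? "matches" : "does not match";
}
```

Note that C<[:alpha:]> is only meaningful inside an enclosing bracketed class, which is why the POSIX name appears within a second pair of brackets.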
If the C<utf8> pragma is used, the following equivalences to Unicode
\p{} constructs hold:
Any control character. Usually characters that don't produce output as
such but instead control the terminal somehow: for example newline and
backspace are control characters. All characters with ord() less than
-32 are most often classified as control characters.
+32 are most often classified as control characters (assuming ASCII,
+the ISO Latin character sets, and Unicode).
=item graph
-Any alphanumeric or punctuation character.
+Any alphanumeric or punctuation (special) character.
=item print
-Any alphanumeric or punctuation character or space.
+Any alphanumeric or punctuation (special) character or space.
=item punct
-Any punctuation character.
+Any punctuation (special) character.
=item xdigit
-Any hexadecimal digit. Though this may feel silly (/0-9a-f/i would
+Any hexadecimal digit. Though this may feel silly ([0-9A-Fa-f] would
work just fine), it is included for completeness.
=back
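
The class descriptions in the list above can be checked with a small sketch like the following. It reports which of C<cntrl>, C<graph>, C<print>, C<punct>, and C<xdigit> match a handful of ASCII sample characters. The helper name C<classes_of> and the samples are assumptions made for illustration; results for non-ASCII characters, or under C<use locale>, may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a comma-separated list of the POSIX
# classes (from a fixed set of five) that match the single character $ch.
sub classes_of {
    my ($ch) = @_;
    my @hits;
    push @hits, 'cntrl'  if $ch =~ /\A[[:cntrl:]]\z/;
    push @hits, 'graph'  if $ch =~ /\A[[:graph:]]\z/;
    push @hits, 'print'  if $ch =~ /\A[[:print:]]\z/;
    push @hits, 'punct'  if $ch =~ /\A[[:punct:]]\z/;
    push @hits, 'xdigit' if $ch =~ /\A[[:xdigit:]]\z/;
    return join ',', @hits;
}

# Illustrative ASCII samples, labelled for readable output.
my @samples = (
    [ "\n",   'newline'     ],   # ord 10, a control character
    [ "\x08", 'backspace'   ],   # ord 8, a control character
    [ 'g',    'letter'      ],
    [ 'F',    'hex letter'  ],
    [ '7',    'digit'       ],
    [ '!',    'punctuation' ],
    [ ' ',    'space'       ],
);

for my $sample (@samples) {
    my ($ch, $label) = @$sample;
    printf "%-12s %s\n", $label, classes_of($ch);
}
```

The space character is the one sample that C<print> matches but C<graph> does not, which mirrors the difference between the two descriptions above.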
interpreted as a literal character, not a metacharacter. This was
once used in a common idiom to disable or quote the special meanings
of regular expression metacharacters in a string that you want to
-use for a pattern. Simply quote all non-alphanumeric characters:
+use for a pattern. Simply quote all non-"word" characters:
$pattern =~ s/(\W)/\\$1/g;
+(If C<use locale> is in effect, which characters C<\W> matches, and therefore which get quoted, depends on the current locale.)
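
A minimal sketch of the substitution above in use, assuming ASCII input and no C<use locale>. The helper name C<quote_nonword> and the sample strings are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the idiom from the text: backslash every
# non-"word" character so the result matches the input literally.
sub quote_nonword {
    my ($string) = @_;
    (my $pattern = $string) =~ s/(\W)/\\$1/g;
    return $pattern;
}

my $literal   = 'a.b*c (1+1)';   # illustrative input full of metacharacters
my $lookalike = 'aXbbbc 11';     # matches $literal when used unquoted
my $pattern   = quote_nonword($literal);

print "$pattern\n";              # prints a\.b\*c\ \(1\+1\)

# Unquoted, the metacharacters keep their special meanings, so the
# lookalike string matches even though it is not the literal text.
print "unquoted matches lookalike\n" if $lookalike =~ /\A$literal\z/;

# Quoted, only the literal text itself matches.
print "quoted matches literal\n"    if     $literal   =~ /\A$pattern\z/;
print "quoted rejects lookalike\n"  unless $lookalike =~ /\A$pattern\z/;
```

Because C<_> counts as a "word" character, it is left unquoted by this idiom, which is harmless since C<_> has no special meaning in a pattern.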
Today it is more common to use the quotemeta() function or the C<\Q>
metaquoting escape sequence to disable all metacharacters' special
meanings like this: