"\n" =~ /(?s:.)/ # Match (local 'single line' modifier)
"ab" =~ /^.$/ # No match (dot matches one character)
-
=head2 Backslashed sequences
Perl regular expressions contain many backslashed sequences that
\S Match a non-white space character.
\h Match a horizontal white space character.
\H Match a character that isn't horizontal white space.
+ \N Match a character that isn't a newline.
\v Match a vertical white space character.
\V Match a character that isn't vertical white space.
\pP, \p{Prop} Match a character matching a Unicode property.
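Here is a minimal sketch showing two of these sequences side by side. The sample string is invented for illustration. C<\S+> extracts each run of non-white-space characters, and C<\s+> collapses each run of white space into a single space.

```perl
use strict;
use warnings;

# Sample input (made up): leading/trailing spaces, a tab and a newline.
my $text = "  alpha\tbeta\ngamma  ";

# \S+ picks out each run of non-white-space characters.
my @words = $text =~ /(\S+)/g;
print join(",", @words), "\n";    # alpha,beta,gamma

# \s+ matches each run of white space (spaces, tab, newline here).
(my $squeezed = $text) =~ s/\s+/ /g;
print "[$squeezed]\n";            # [ alpha beta gamma ]
```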
=head3 White space
-C<\s> matches any single character that is consider white space. In the
+C<\s> matches any single character that is considered white space. In the
ASCII range, C<\s> matches the horizontal tab (C<\t>), the newline
(C<\n>), the form feed (C<\f>), the carriage return (C<\r>), and the
space (the vertical tab, C<\cK>, is not matched by C<\s>). The exact set
this includes the space and the tab characters. C<\H> will match any character
that is not considered horizontal white space.
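The following minimal sketch applies C<\h> and C<\H> to a few sample characters, chosen only for illustration. The space and the tab count as horizontal white space. The newline does not, so it falls under C<\H>.

```perl
use strict;
use warnings;

# Sample characters; the labels exist only for the printout.
my @samples = ([space => " "], [tab => "\t"], [newline => "\n"], [letter => "a"]);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    my $is_h = $char =~ /\h/ ? "yes" : "no";    # horizontal white space?
    my $is_H = $char =~ /\H/ ? "yes" : "no";    # anything that is not
    print "$name: \\h=$is_h \\H=$is_H\n";
}
```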
+C<\N>, like the dot, will match any character that is not a newline. The
+difference is that C<\N> will not be influenced by the single line C</s>
+regular expression modifier. (Note that, since C<\N{}> is also used for
+Unicode named characters, if C<\N> is followed by an opening brace and
+by a letter, perl will assume that a Unicode character name is coming.)
+
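The difference between the dot and C<\N> appears once the C</s> modifier is in play. This is a minimal sketch, assuming perl 5.12 or later, since C<\N> was introduced in that release. The C<use charnames> line only matters on perls older than 5.16, which do not load named characters automatically.

```perl
use strict;
use warnings;
use charnames ':full';   # needed for \N{NAME} on perls older than 5.16

my $str = "a\nb";

print "a.b    : ", ($str =~ /a.b/    ? "match" : "no match"), "\n";  # no match
print "a.b /s : ", ($str =~ /a.b/s   ? "match" : "no match"), "\n";  # match
print "a\\Nb /s: ", ($str =~ /a\Nb/s ? "match" : "no match"), "\n";  # no match

# \N followed by a brace and a letter is read as a named character instead.
print "named  : ", ("a" =~ /^\N{LATIN SMALL LETTER A}$/ ? "match" : "no match"), "\n";  # match
```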
C<\v> will match any character that is considered vertical white space;
this includes the carriage return and line feed characters (newline).
C<\V> will match any character that is not considered vertical white space.
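C<\v> and C<\V> are handy for handling lines, as this sketch shows. Splitting on C<\v+> accepts both CR LF and bare LF line endings, and C<\V*> captures a line without its terminator. The input string is invented for illustration.

```perl
use strict;
use warnings;

my $input = "one\r\ntwo\nthree";

# \v+ treats a CR LF pair, or a lone LF, as a single separator.
my @lines = split /\v+/, $input;
print scalar(@lines), " lines: @lines\n";    # 3 lines: one two three

# \V* grabs everything up to (not including) the first line ending.
my ($first) = $input =~ /^(\V*)/;
print "first: $first\n";                      # first: one
```

Because consecutive line endings form one run, C<\v+> also swallows empty lines. If blank lines must be kept, split on C<\r?\n> instead.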
C<\pP> and C<\p{Prop}> are character classes that match characters with
a given Unicode property. One-letter classes can be used in the C<\pP>
-form, with the class name following the C<\p>, otherwise, the property
-name is enclosed in braces, and follows the C<\p>. For instance, a
-match for a number can be written as C</\pN/> or as C</\p{Number}/>.
-Lowercase letters are matched by the property I<LowercaseLetter> which
-has as short form I<Ll>. They have to be written as C</\p{Ll}/> or
-C</\p{LowercaseLetter}/>. C</\pLl/> is valid, but means something different.
+form, with the class name following the C<\p>; otherwise, braces are required.
+There is a single form, which is just the property name enclosed in the braces,
+and a compound form, C<\p{name=value}>, which matches if the character's
+property C<name> has the particular C<value>.
+For instance, a match for a number can be written as C</\pN/> or as
+C</\p{Number}/>, or as C</\p{Number=True}/>.
+Lowercase letters are matched by the property I<Lowercase_Letter>, whose
+short form is I<Ll>. Either name needs the braces, so it is written as C</\p{Ll}/>,
+C</\p{Lowercase_Letter}/>, or C</\p{General_Category=Lowercase_Letter}/>
+(the underscores are optional).
+C</\pLl/> is valid, but means something different.
It matches a two-character string: a letter (Unicode property C<\pL>),
followed by a lowercase C<l>.
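This sketch contrasts C<\p{Ll}> with C<\pLl> on a few strings. The strings are written with C<\x{...}> escapes so the source stays ASCII. The sketch also checks that the long name C<\p{Lowercase_Letter}> behaves like C<\p{Ll}>, and it uses the one-letter form C<\pN> for numbers.

```perl
use strict;
use warnings;

# Test strings, written with escapes so the source file stays ASCII.
my @cases = (
    [ 'e-acute'     => "\x{E9}"  ],   # one lowercase letter
    [ 'E-acute, l'  => "\x{C9}l" ],   # an uppercase letter, then 'l'
    [ 'digit seven' => "7"       ],
);

for my $case (@cases) {
    my ($label, $s) = @$case;
    my $ll   = $s =~ /^\p{Ll}$/               ? 1 : 0;  # exactly one lowercase letter
    my $long = $s =~ /^\p{Lowercase_Letter}$/ ? 1 : 0;  # same property, long name
    my $pLl  = $s =~ /^\pLl$/                 ? 1 : 0;  # \pL (any letter), then 'l'
    my $num  = $s =~ /^\pN$/                  ? 1 : 0;  # same as \p{Number}
    print "$label: Ll=$ll Lowercase_Letter=$long pLl=$pLl N=$num\n";
}
```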
-For a list of possible properties, see
-L<perlunicode/Unicode Character Properties>. It is also possible to
-defined your own properties. This is discussed in
+For more details, see L<perlunicode/Unicode Character Properties>; for a
+complete list of possible properties, see
+L<perluniprops/Properties accessible through \p{} and \P{}>.
+It is also possible to define your own properties. This is discussed in
L<perlunicode/User-Defined Character Properties>.
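Here is a minimal sketch of a user-defined property. The name C<InLowerVowel> is hypothetical. The sketch relies on two things: the sub's name begins with C<In> (or C<Is>), and the sub returns one hexadecimal code point per line, or a start and end code point separated by white space for a range.

```perl
use strict;
use warnings;

# Hypothetical property: the lowercase ASCII vowels a, e, i, o, u.
# One hex code point per line; the trailing '' adds a final newline.
sub InLowerVowel {
    return join "\n", qw(0061 0065 0069 006F 0075), '';
}

my @words = qw(rhythm banana sky queue);
my @with_vowels = grep { /\p{InLowerVowel}/ } @words;
print "@with_vowels\n";    # banana queue
```

The sub is defined before the pattern that uses it, so the property is available when the pattern is compiled.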
alpha Any alphabetical character.
alnum Any alphanumerical character.
ascii Any ASCII character.
- blank A GNU extension, equal to a space or a horizontal tab (C<\t>).
+ blank A GNU extension, equal to a space or a horizontal tab ("\t").
cntrl Any control character.
- digit Any digit, equivalent to C<\d>.
+ digit Any digit, equivalent to "\d".
graph Any printable character, excluding a space.
lower Any lowercase character.
print Any printable character, including a space.
punct Any punctuation character.
- space Any white space character. C<\s> plus the vertical tab (C<\cK>).
+ space Any white space character. "\s" plus the vertical tab ("\cK").
upper Any uppercase character.
- word Any "word" character, equivalent to C<\w>.
+ word Any "word" character, equivalent to "\w".
xdigit Any hexadecimal digit, '0' - '9', 'a' - 'f', 'A' - 'F'.
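This small sketch shows some of these classes in use on an arbitrary sample string. The brackets are doubled because C<[:digit:]> is only recognised inside a bracketed character class, so the pattern is written C<[[:digit:]]>.

```perl
use strict;
use warnings;

my $s = "A1 b2!";

# Count how many characters of $s fall in each POSIX class.
for my $class (qw(alpha digit xdigit space punct upper)) {
    my $count = () = $s =~ /[[:${class}:]]/g;   # list-assign idiom counts matches
    print "$class: $count\n";
}
# alpha: 2, digit: 2, xdigit: 4, space: 1, punct: 1, upper: 1
```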
The exact set of characters matched depends on whether the source string
word IsWord
xdigit IsXDigit
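To check the two rows shown here, this sketch compares each POSIX class with its C<\p{}> counterpart over the ASCII range, using only the names from the table. Both rows should report no differences on ASCII. On non-ASCII characters the two forms may diverge, depending on how the source string is stored.

```perl
use strict;
use warnings;

# Each POSIX class next to the \p{} property the table pairs it with.
my %unicode_form = (
    word   => qr/\p{IsWord}/,
    xdigit => qr/\p{IsXDigit}/,
);

my %differences;
for my $posix (sort keys %unicode_form) {
    my $bracket = qr/[[:${posix}:]]/;
    # Code points 0..127 where exactly one of the two forms matches.
    $differences{$posix} = [
        grep { (chr($_) =~ $bracket) xor (chr($_) =~ $unicode_form{$posix}) } 0 .. 127
    ];
    my @diff = @{ $differences{$posix} };
    print "$posix: ", (@diff ? "differs at @diff" : "same on ASCII"), "\n";
}
```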
-Some character classes may have a non-obvious name:
+Some of these names may not be obvious:
=over 4