\n Newline
\r Carriage return
\t Tab
- \038 Any octal ASCII value
+ \037 Any octal ASCII value
\x7f Any hexadecimal ASCII value
\x{263a} A wide hexadecimal value
\cx Control-x
\N{name} A named character
\l Lowercase next character
- \u Uppercase next character
+ \u Titlecase next character
\L Lowercase until \E
- \U Titlecase until \E
+ \U Uppercase until \E
\Q Disable pattern metacharacters until \E
\E End case modification
[f-j-] Dash escaped or at start or end means 'dash'
[^f-j] Caret indicates "match any character _except_ these"
-The following work within or without a character class:
+The following sequences work within or without a character class.
+The first six are locale aware; all are Unicode aware. The default
+character class equivalents are given. See L<perllocale> and
+L<perlunicode> for details.
- \d A digit, same as [0-9]
- \D A nondigit, same as [^0-9]
- \w A word character (alphanumeric), same as [a-zA-Z0-9_]
- \W A non-word character, [^a-zA-Z0-9_]
- \s A whitespace character, same as [ \t\n\r\f]
- \S A non-whitespace character, [^ \t\n\r\f]
- \C Match a byte (with Unicode, '.' matches char)
+ \d A digit [0-9]
+ \D A nondigit [^0-9]
+ \w A word character [a-zA-Z0-9_]
+ \W A non-word character [^a-zA-Z0-9_]
+ \s A whitespace character [ \t\n\r\f]
+ \S A non-whitespace character [^ \t\n\r\f]
+
+ \C Match a byte (with Unicode, '.' matches a character)
\pP Match P-named (Unicode) property
\p{...} Match Unicode property with long name
\PP Match non-P
POSIX character classes and their Unicode and Perl equivalents:
- alnum IsAlnum Alphanumeric
- alpha IsAlpha Alphabetic
- ascii IsASCII Any ASCII char
- blank IsSpace [ \t] Horizontal whitespace (GNU)
- cntrl IsCntrl Control characters
- digit IsDigit \d Digits
- graph IsGraph Alphanumeric and punctuation
- lower IsLower Lowercase chars (locale aware)
- print IsPrint Alphanumeric, punct, and space
- punct IsPunct Punctuation
- space IsSpace [\s\ck] Whitespace
- IsSpacePerl \s Perl's whitespace definition
- upper IsUpper Uppercase chars (locale aware)
- word IsWord \w Alphanumeric plus _ (Perl)
- xdigit IsXDigit [\dA-Fa-f] Hexadecimal digit
+ alnum IsAlnum Alphanumeric
+ alpha IsAlpha Alphabetic
+ ascii IsASCII Any ASCII char
+ blank IsSpace [ \t] Horizontal whitespace (GNU extension)
+ cntrl IsCntrl Control characters
+ digit IsDigit \d Digits
+ graph IsGraph Alphanumeric and punctuation
+ lower IsLower Lowercase chars (locale and Unicode aware)
+ print IsPrint Alphanumeric, punct, and space
+ punct IsPunct Punctuation
+ space IsSpace [\s\ck] Whitespace
+ IsSpacePerl \s Perl's whitespace definition
+ upper IsUpper Uppercase chars (locale and Unicode aware)
+ word IsWord \w Alphanumeric plus _ (Perl extension)
+ xdigit IsXDigit [0-9A-Fa-f] Hexadecimal digit
Within a character class:
split Use regex to split a string into parts
-The first four of these are identical to the escape sequences \l, \u,
-\L, and \U. For Titlecase, see L</Titlecase>.
+The first four of these are like the escape sequences C<\L>, C<\l>,
+C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.
-=head2 Terminology
+=head2 TERMINOLOGY
=head3 Titlecase
Unicode concept which most often is equal to uppercase, but for
certain characters like the German "sharp s" there is a difference.
-=head2 AUTHOR
+=head1 AUTHOR
Iain Truskett.
This document may be distributed under the same terms as Perl itself.
-=head2 SEE ALSO
+=head1 SEE ALSO
=over 4
=back
-=head2 THANKS
+=head1 THANKS
David P.C. Wollmann,
Richard Soderberg,