For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.
-=head1 OPERATORS
+=head2 OPERATORS
=~ determines to which variable the regex is applied.
In its absence, $_ is used.
as a double quoted string unless a single-quote (') is the delimiter.
?pattern? is like m/pattern/ but matches only once. No alternate
- delimiters can be used. Must be reset with L<reset|perlfunc/reset>.
+ delimiters can be used. Must be reset with reset().
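The match-once behaviour can be sketched as follows. This is a minimal illustration, not part of the original text: it uses the explicit C<m?pattern?> spelling, which is the only form newer Perls accept, and the sub name C<first_foo> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the lines on which the one-shot match succeeds.
sub first_foo {
    my @hits;
    for my $line (@_) {
        # m?...? succeeds at most once until reset is called
        push @hits, $line if $line =~ m?foo?;
    }
    return @hits;
}

my @first  = first_foo("foo 1", "foo 2");  # only the first line can match
my @second = first_foo("foo 3");           # the pattern has already matched once
reset;                                     # re-arm the one-shot matches in this package
my @third  = first_foo("foo 4", "foo 5");  # matches once again
```

Calling C<reset> with no argument affects only the one-shot matches in the current package.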
-=head1 SYNTAX
+=head2 SYNTAX
\ Escapes the character immediately following it
. Matches any single character except a newline (unless /s is used)
\n Newline
\r Carriage return
\t Tab
- \038 Any octal ASCII value
+ \037 Any octal ASCII value
\x7f Any hexadecimal ASCII value
\x{263a} A wide hexadecimal value
\cx Control-x
\N{name} A named character
\l Lowercase next character
- \u Uppercase next character
+ \u Titlecase next character
\L Lowercase until \E
\U Uppercase until \E
\Q Disable pattern metacharacters until \E
\E End case modification
+For Titlecase, see L</Titlecase>.
+
This one works differently from normal strings:
\b An assertion, not backspace, except in a character class
[f-j-] Dash escaped or at start or end means 'dash'
[^f-j] Caret indicates "match any character _except_ these"
-The following work within or without a character class:
+The following sequences work within or without a character class.
+The first six are locale aware; all are Unicode aware. The default
+character class equivalents are given. See L<perllocale> and
+L<perlunicode> for details.
+
+ \d A digit [0-9]
+ \D A nondigit [^0-9]
+ \w A word character [a-zA-Z0-9_]
+ \W A non-word character [^a-zA-Z0-9_]
+ \s A whitespace character [ \t\n\r\f]
+ \S A non-whitespace character [^ \t\n\r\f]
- \d A digit, same as [0-9]
- \D A nondigit, same as [^0-9]
- \w A word character (alphanumeric), same as [a-zA-Z0-9_]
- \W A non-word character, [^a-zA-Z0-9_]
- \s A whitespace character, same as [ \t\n\r\f]
- \S A non-whitespace character, [^ \t\n\r\f]
- \C Match a byte (with Unicode, '.' matches char)
+ \C Match a byte (with Unicode, '.' matches a character)
\pP Match P-named (Unicode) property
\p{...} Match Unicode property with long name
\PP Match non-P
POSIX character classes and their Unicode and Perl equivalents:
- alnum IsAlnum Alphanumeric
- alpha IsAlpha Alphabetic
- ascii IsASCII Any ASCII char
- blank IsSpace [ \t] Horizontal whitespace (GNU)
- cntrl IsCntrl Control characters
- digit IsDigit \d Digits
- graph IsGraph Alphanumeric and punctuation
- lower IsLower Lowercase chars (locale aware)
- print IsPrint Alphanumeric, punct, and space
- punct IsPunct Punctuation
- space IsSpace [\s\ck] Whitespace
- IsSpacePerl \s Perl's whitespace definition
- upper IsUpper Uppercase chars (locale aware)
- word IsWord \w Alphanumeric plus _ (Perl)
- xdigit IsXDigit [\dA-Fa-f] Hexadecimal digit
+ alnum IsAlnum Alphanumeric
+ alpha IsAlpha Alphabetic
+ ascii IsASCII Any ASCII char
+ blank IsSpace [ \t] Horizontal whitespace (GNU extension)
+ cntrl IsCntrl Control characters
+ digit IsDigit \d Digits
+ graph IsGraph Alphanumeric and punctuation
+ lower IsLower Lowercase chars (locale and Unicode aware)
+ print IsPrint Alphanumeric, punct, and space
+ punct IsPunct Punctuation
+ space IsSpace [\s\ck] Whitespace
+ IsSpacePerl \s Perl's whitespace definition
+ upper IsUpper Uppercase chars (locale and Unicode aware)
+ word IsWord \w Alphanumeric plus _ (Perl extension)
+ xdigit IsXDigit [0-9A-Fa-f] Hexadecimal digit
Within a character class:
(?(cond)yes|no) cond being integer corresponding to capturing parens
(?(cond)yes) or a lookaround/eval zero-width assertion
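A small sketch of the integer form of the conditional, assuming a pattern that accepts a number with optional but balanced parentheses. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Group 1 optionally captures "("; the conditional demands ")" only if it did.
my $re = qr/^(\()?\d+(?(1)\))$/;

my @results;
for my $candidate ("(12)", "12", "(12", "12)") {
    push @results, ($candidate =~ $re ? 1 : 0);
}
```

Here the "no" branch is omitted, so when group 1 did not take part in the match the conditional matches the empty string.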
-=head1 VARIABLES
+=head2 VARIABLES
$_ Default variable for operators to use
$* Enable multiline matching (deprecated; not in 5.9.0 or later)
Captured groups are numbered according to their I<opening> paren.
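The numbering rule is easiest to see with nested groups. The following is an illustrative sketch using a made-up date string:

```perl
use strict;
use warnings;

my $date = "2024-05-17";

# Opening parens are counted left to right, so the outer group comes first:
# $1 = "2024-05", $2 = "2024", $3 = "05", $4 = "17"
my @groups = $date =~ /((\d{4})-(\d{2}))-(\d{2})/;
```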
-=head1 FUNCTIONS
+=head2 FUNCTIONS
lc Lowercase a string
lcfirst Lowercase first char of a string
uc Uppercase a string
- ucfirst Uppercase first char of a string
+ ucfirst Titlecase first char of a string
+
pos Return or set current match position
quotemeta Quote metacharacters
reset Reset ?pattern? status
split Use regex to split a string into parts
+The first four of these are like the escape sequences C<\L>, C<\l>,
+C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.
+
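The correspondence between the four functions and the escape sequences can be checked directly. This is a minimal sketch with arbitrary sample strings:

```perl
use strict;
use warnings;

my $word = "pERL";

# Function forms, in the order lc, lcfirst, uc, ucfirst
my @by_function = (lc($word), lcfirst("PERL"), uc($word), ucfirst("perl"));

# Escape-sequence forms inside double-quoted strings, in the same order
my @by_escape = ("\L$word\E", "\lPERL", "\U$word\E", "\uperl");
```

Both lists hold the same four strings.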
+=head2 TERMINOLOGY
+
+=head3 Titlecase
+
+A Unicode concept that is usually identical to uppercase, but differs
+for certain characters such as the German "sharp s".
+
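The difference between uppercase and titlecase can be sketched with two characters. This example is an illustration rather than part of the original text. It assumes full Unicode case mappings are in effect, which is why the sharp s string is explicitly upgraded with C<utf8::upgrade> before use.

```perl
use strict;
use warnings;

# U+01C6 (LATIN SMALL LETTER DZ WITH CARON) has distinct upper- and titlecase forms.
my $dz    = "\x{1C6}";
my $upper = uc $dz;        # U+01C4, both letters of the digraph capitalised
my $title = ucfirst $dz;   # U+01C5, only the first letter capitalised

# The sharp s, upgraded so that Unicode rules apply to this 8-bit character.
my $sharp = "\x{DF}";
utf8::upgrade($sharp);
my $sharp_upper = uc $sharp;        # "SS"
my $sharp_title = ucfirst $sharp;   # "Ss"
```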
=head1 AUTHOR
Iain Truskett.