This is usually 32766 on the most common platforms. The actual limit can
be seen in the error message generated by code such as this:
- $_ **= $_ , / {$_} / for 2 .. 42;
+ $_ **= $_ , / {$_} / for 2 .. 42;
By default, a quantified subpattern is "greedy", that is, it will match as
many times as possible (given a particular starting location) while still
the same as matching an English word). If C<use locale> is in effect, the
list of alphabetic characters generated by C<\w> is taken from the
current locale. See L<perllocale>. You may use C<\w>, C<\W>, C<\s>, C<\S>,
-C<\d>, and C<\D> within character classes (though not as either end of
-a range). See L<utf8> for details about C<\pP>, C<\PP>, and C<\X>.
+C<\d>, and C<\D> within character classes, but if you try to use them
+as endpoints of a range, no range is formed and the "-" is understood
+literally. See L<utf8> for details about C<\pP>, C<\PP>, and C<\X>.
The POSIX character class syntax
- [:class:]
+ [:class:]
is also available. The available classes and their backslash
equivalents (if available) are as follows:
Note that the C<[]> are part of the C<[::]> construct, not part of the whole
character class. For example:
- [01[:alpha:]%]
+ [01[:alpha:]%]
matches one, zero, any alphabetic character, and the percentage sign.
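As a rough illustration of the example class above, the following sketch tests a few single characters against it. The sample inputs and the variable names are assumptions chosen for demonstration only; they are not part of the original text.

```perl
use strict;
use warnings;

# The class from the text: the literals "0", "1" and "%", plus the
# POSIX class [:alpha:], all inside one bracketed character class.
my $class = qr/^[01[:alpha:]%]$/;

# Hypothetical sample inputs, chosen only for illustration.
my @samples = ('0', '1', 'x', 'Q', '%', '2', '-', ' ');

# Record which samples match so the result can be inspected later.
my %matched = map { $_ => ($_ =~ $class ? 1 : 0) } @samples;

for my $ch (@samples) {
    printf "'%s' %s\n", $ch, $matched{$ch} ? 'matches' : 'does not match';
}
```

Under these assumptions, "0", "1", "x", "Q" and "%" should match, while "2", "-" and a space should not.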
=item cntrl
- Any control character. Usually characters that don't produce
- output as such but instead control the terminal somehow:
- for example newline and backspace are control characters.
- All characters with ord() less than 32 are most often control
- classified as characters.
+Any control character. Usually characters that don't produce output as
+such but instead control the terminal somehow: for example newline and
+backspace are control characters. All characters with ord() less than
+32 are most often classified as control characters.
=item graph
- Any alphanumeric or punctuation character.
+Any alphanumeric or punctuation character.
=item print
- Any alphanumeric or punctuation character or space.
+Any alphanumeric or punctuation character or space.
=item punct
- Any punctuation character.
+Any punctuation character.
=item xdigit
- Any hexadecimal digit. Though this may feel silly
- (/0-9a-f/i would work just fine) it is included
- for completeness.
-
-=item
+Any hexadecimal digit. Though this may feel silly (/[0-9a-f]/i would
+work just fine) it is included for completeness.
=back
first character after the "[" is "^", the class matches any character not
in the list. Within a list, the "-" character specifies a
range, so that C<a-z> represents all characters between "a" and "z",
-inclusive. If you want "-" itself to be a member of a class, put it
-at the start or end of the list, or escape it with a backslash. (The
+inclusive. If you want either "-" or "]" itself to be a member of a
+class, put it at the start of the list (possibly after a "^"), or
+escape it with a backslash. "-" is also taken literally when it is
+at the end of the list, just before the closing "]". (The
following all specify the same class of three characters: C<[-az]>,
C<[az-]>, and C<[a\-z]>. All are different from C<[a-z]>, which
specifies a class containing twenty-six characters.)
+Also, if you try to use the character classes C<\w>, C<\W>, C<\s>,
+C<\S>, C<\d>, or C<\D> as endpoints of a range, no range is formed and
+the "-" is understood literally.
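The rules above about a literal "-" can be checked with a small sketch like the one below. The helper C<members> and the candidate list are hypothetical, introduced only for this illustration; the C<no warnings 'regexp'> line is there because recent perls may warn about a false range such as C<[\d-z]>.

```perl
use strict;
use warnings;

# Hypothetical helper: return, in candidate order, every character
# from a small test alphabet ("-", digits, lowercase letters) that
# the given pattern matches.
sub members {
    my ($re) = @_;
    return join '', grep { $_ =~ $re } ('-', '0' .. '9', 'a' .. 'z');
}

my $first   = members(qr/[-az]/);    # "-" at the start of the list
my $last    = members(qr/[az-]/);    # "-" at the end of the list
my $escaped = members(qr/[a\-z]/);   # "-" escaped with a backslash
my $range   = members(qr/[a-z]/);    # a genuine range

my $false_range;
{
    # A class escape as a range endpoint does not form a range;
    # silence the warning that newer perls may emit for it.
    no warnings 'regexp';
    $false_range = members(qr/[\d-z]/);
}

print "$first $last $escaped\n";
print length($range), " characters matched by [a-z]\n";
print "[\\d-z] matched: $false_range\n";
```

With this test alphabet, the first three classes each match only "-", "a" and "z", C<[a-z]> matches the twenty-six lowercase letters, and C<[\d-z]> matches the digits plus a literal "-" and "z".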
Note also that the whole range idea is rather unportable between
character sets--and even within character sets they may cause results