matches zero, one, any alphabetic character, and the percentage sign.
If the C<utf8> pragma is used, the following equivalences to Unicode
-\p{} constructs hold:
+\p{} constructs and, where available, to backslash character classes
+will hold:
alpha IsAlpha
alnum IsAlnum
ascii IsASCII
blank IsSpace
cntrl IsCntrl
- digit IsDigit
+ digit IsDigit \d
graph IsGraph
lower IsLower
print IsPrint
punct IsPunct
space IsSpace
+ IsSpacePerl \s
upper IsUpper
word IsWord
xdigit IsXDigit
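The equivalences in the table above can be spot-checked for one row. The following is a minimal sketch, assuming a Perl in which C<\p{IsDigit}> is recognized; the helper name C<digit_views> is hypothetical and is not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a single character matches the
# POSIX class, the \p{} property, and the backslash class for digits.
sub digit_views {
    my ($ch) = @_;
    return (
        ($ch =~ /^[[:digit:]]\z/ ? 1 : 0),   # POSIX class
        ($ch =~ /^\p{IsDigit}\z/ ? 1 : 0),   # Unicode property
        ($ch =~ /^\d\z/          ? 1 : 0),   # backslash class
    );
}

# Print one line per sample character so the three columns can be compared.
for my $ch ('7', 'x', '%') {
    printf "%-3s posix=%d prop=%d backslash=%d\n", "'$ch'", digit_views($ch);
}
```

For these ASCII samples the three columns agree on every line. The same comparison could be repeated for C<[[:space:]]> and C<\s>, bearing in mind that the table lists C<\s> against C<IsSpacePerl> rather than C<IsSpace>.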
internal optimizations done by the regular expression engine, this will
take a painfully long time to run:
- 'aaaaaaaaaaaa' =~ /((a{0,5}){0,5}){0,5}[c]/
+ 'aaaaaaaaaaaa' =~ /((a{0,5}){0,5})*[c]/
-And if you used C<*>'s instead of limiting it to 0 through 5 matches,
-then it would take forever--or until you ran out of stack space.
+If you used C<*> on the internal groups instead of limiting them to
+0 through 5 matches, the match would take forever, or run until you
+exhausted the stack space. Moreover, these internal optimizations are
+not always applicable. For example, if you put C<{0,5}> instead of
+C<*> on the external group, no current optimization applies, and the
+match takes a long time to finish.
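The cost described here comes from backtracking: a quantified subexpression gives back characters it has already matched so that the rest of the pattern can be retried. The following is a minimal sketch of that behavior on a deliberately tiny input, contrasting a plain quantifier with the C<< (?>...) >> construct, which refuses to give anything back. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Plain quantifier: a* first takes all three "a"s, then gives one back
# so that the trailing "ab" can match.
my $plain = "aaab" =~ /^a*ab/ ? 1 : 0;

# (?>a*) keeps all three "a"s and never gives any back, so the
# following "ab" finds no "a" left and the whole match fails.
my $independent = "aaab" =~ /^(?>a*)ab/ ? 1 : 0;

print "plain=$plain independent=$independent\n";
```

This prints C<plain=1 independent=0>. The failure in the second case is the point: once backtracking into a group is forbidden, the engine has far fewer combinations to explore, which is what makes the construct useful against patterns like the one above.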
A powerful tool for optimizing such beasts is what is known as an
"independent group",
notion of better/worse for combining operators. In the description
below C<S> and C<T> are regular subexpressions.
-=over
+=over 4
=item C<ST>