character is also treated as a metacharacter introducing a comment,
just as in ordinary Perl code. This also means that if you want real
whitespace or C<#> characters in the pattern (outside a character
-class, where they are unaffected by C</x>), that you'll either have to
-escape them or encode them using octal or hex escapes. Taken together,
-these features go a long way towards making Perl's regular expressions
-more readable. Note that you have to be careful not to include the
-pattern delimiter in the comment--perl has no way of knowing you did
-not intend to close the pattern early. See the C-comment deletion code
-in L<perlop>.
+class, where they are unaffected by C</x>), then you'll either have to
+escape them (using backslashes or C<\Q...\E>) or encode them using octal
+or hex escapes. Taken together, these features go a long way towards
+making Perl's regular expressions more readable. Note that you have to
+be careful not to include the pattern delimiter in the comment--perl has
+no way of knowing you did not intend to close the pattern early. See
+the C-comment deletion code in L<perlop>. Also note that anything inside
+a C<\Q...\E> stays unaffected by C</x>.
X</x>
=head2 Regular Expressions
[:class:]
-is also available. The available classes and their backslash
-equivalents (if available) are as follows:
+is also available. Note that the C<[> and C<]> brackets are I<literal>;
+they must always be used within a character class expression.
+
+ # this is correct:
+ $string =~ /[[:alpha:]]/;
+
+ # this is not, and will generate a warning:
+ $string =~ /[:alpha:]/;
+
+The available classes and their backslash equivalents (if available) are
+as follows:
X<character class>
X<alpha> X<alnum> X<ascii> X<blank> X<cntrl> X<digit> X<graph>
X<lower> X<print> X<punct> X<space> X<upper> X<word> X<xdigit>
backslash character classes (if available), will hold:
X<character class> X<\p> X<\p{}>
- [:...:] \p{...} backslash
+ [[:...:]] \p{...} backslash
alpha IsAlpha
alnum IsAlnum
word IsWord
xdigit IsXDigit
-For example C<[:lower:]> and C<\p{IsLower}> are equivalent.
+For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
If the C<utf8> pragma is not used but the C<locale> pragma is, the
classes correlate with the usual isalpha(3) interface (except for
with a '^'. This is a Perl extension. For example:
X<character class, negation>
- POSIX traditional Unicode
+ POSIX traditional Unicode
- [:^digit:] \D \P{IsDigit}
- [:^space:] \S \P{IsSpace}
- [:^word:] \W \P{IsWord}
+ [[:^digit:]] \D \P{IsDigit}
+ [[:^space:]] \S \P{IsSpace}
+ [[:^word:]] \W \P{IsWord}
Perl respects the POSIX standard in that POSIX character classes are
only supported within a character class. The POSIX character classes
Better yet, use the carefully constrained evaluation within a Safe
compartment. See L<perlsec> for details about both these mechanisms.
+Because perl's regex engine is not currently re-entrant, interpolated
+code may not invoke the regex engine either directly with C<m//> or C<s///>,
+or indirectly with functions such as C<split>.
+
=item C<(??{ code })>
X<(??{})>
X<regex, postponed> X<regexp, postponed> X<regular expression, postponed>
\)
}x;
+Because perl's regex engine is not currently re-entrant, delayed
+code may not invoke the regex engine either directly with C<m//> or C<s///>,
+or indirectly with functions such as C<split>.
+
=item C<< (?>pattern) >>
X<backtrack> X<backtracking>
claims that there is no 123 in the string. Here's a clearer picture of
why that pattern matches, contrary to popular expectations:
- $x = 'ABC123' ;
- $y = 'ABC445' ;
+ $x = 'ABC123';
+ $y = 'ABC445';
- print "1: got $1\n" if $x =~ /^(ABC)(?!123)/ ;
- print "2: got $1\n" if $y =~ /^(ABC)(?!123)/ ;
+ print "1: got $1\n" if $x =~ /^(ABC)(?!123)/;
+ print "2: got $1\n" if $y =~ /^(ABC)(?!123)/;
- print "3: got $1\n" if $x =~ /^(\D*)(?!123)/ ;
- print "4: got $1\n" if $y =~ /^(\D*)(?!123)/ ;
+ print "3: got $1\n" if $x =~ /^(\D*)(?!123)/;
+ print "4: got $1\n" if $y =~ /^(\D*)(?!123)/;
This prints
of the string in their match. So rewriting this way produces what
you'd expect; that is, case 5 will fail, but case 6 succeeds:
- print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/ ;
- print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/ ;
+ print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/;
+ print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/;
6: got ABC