X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlre.pod;h=61720db90676b4e62a849ae96f150b66159b9a6f;hb=ee6b43cc19efb39ed8a2fdad01d701e59dbdd946;hp=23a7b0fa7183e72b06636d9d8dafbb60e3f1551e;hpb=d74e8afc9309529cf5c6c4390fc311850865d506;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlre.pod b/pod/perlre.pod
index 23a7b0f..61720db 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -70,13 +70,14 @@ your regular expression into (slightly) more readable parts.
 The C<#> character is also treated as a metacharacter introducing a
 comment, just as in ordinary Perl code. This also means that if you
 want real whitespace or C<#> characters in the pattern (outside a character
-class, where they are unaffected by C), that you'll either have to
-escape them or encode them using octal or hex escapes. Taken together,
-these features go a long way towards making Perl's regular expressions
-more readable. Note that you have to be careful not to include the
-pattern delimiter in the comment--perl has no way of knowing you did
-not intend to close the pattern early. See the C-comment deletion code
-in L.
+class, where they are unaffected by C), then you'll either have to
+escape them (using backslashes or C<\Q...\E>) or encode them using octal
+or hex escapes. Taken together, these features go a long way towards
+making Perl's regular expressions more readable. Note that you have to
+be careful not to include the pattern delimiter in the comment--perl has
+no way of knowing you did not intend to close the pattern early. See
+the C-comment deletion code in L. Also note that anything inside
+a C<\Q...\E> stays unaffected by C.
 X
 
 =head2 Regular Expressions
@@ -224,8 +225,17 @@ X
 
     [:class:]
 
-is also available. The available classes and their backslash
-equivalents (if available) are as follows:
+is also available. Note that the C<[> and C<]> brackets are I;
+they must always be used within a character class expression.
+
+    # this is correct:
+    $string =~ /[[:alpha:]]/;
+
+    # this is not, and will generate a warning:
+    $string =~ /[:alpha:]/;
+
+The available classes and their backslash equivalents (if available) are
+as follows:
 X
 X X X X X X X
 X X X X X X X
@@ -274,7 +284,7 @@ The following equivalences to Unicode \p{} constructs and equivalent
 backslash character classes (if available), will hold:
 X X<\p> X<\p{}>
 
-    [:...:]      \p{...}         backslash
+    [[:...:]]    \p{...}         backslash
 
     alpha       IsAlpha
     alnum       IsAlnum
@@ -292,7 +302,7 @@ X X<\p> X<\p{}>
     word        IsWord
     xdigit      IsXDigit
 
-For example C<[:lower:]> and C<\p{IsLower}> are equivalent.
+For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
 
 If the C pragma is not used but the C pragma is, the
 classes correlate with the usual isalpha(3) interface (except for
@@ -339,11 +349,11 @@ You can negate the [::] character classes by prefixing the class name
 with a '^'. This is a Perl extension. For example:
 X
 
-    POSIX         traditional  Unicode
+    POSIX           traditional  Unicode
 
-    [:^digit:]      \D      \P{IsDigit}
-    [:^space:]      \S      \P{IsSpace}
-    [:^word:]       \W      \P{IsWord}
+    [[:^digit:]]    \D      \P{IsDigit}
+    [[:^space:]]    \S      \P{IsSpace}
+    [[:^word:]]     \W      \P{IsWord}
 
 Perl respects the POSIX standard in that POSIX character classes are
 only supported within a character class. The POSIX character classes
@@ -685,6 +695,10 @@ so you should only do so if you are also using taint checking.
 Better yet, use the carefully constrained evaluation within a Safe
 compartment. See L for details about both these mechanisms.
 
+Because perl's regex engine is not currently re-entrant, interpolated
+code may not invoke the regex engine either directly with C or C,
+or indirectly with functions such as C.
+
 =item C<(??{ code })>
 X<(??{})>
 X X X
@@ -715,6 +729,10 @@ The following pattern matches a parenthesized group:
                \)
             }x;
 
+Because perl's regex engine is not currently re-entrant, delayed
+code may not invoke the regex engine either directly with C or C,
+or indirectly with functions such as C.
+
 =item C<< (?>pattern) >>
 X X
 
@@ -960,14 +978,14 @@ But that isn't going to match; at least, not the way you're hoping. It
 claims that there is no 123 in the string. Here's a clearer picture of
 why that pattern matches, contrary to popular expectations:
 
-    $x = 'ABC123' ;
-    $y = 'ABC445' ;
+    $x = 'ABC123';
+    $y = 'ABC445';
 
-    print "1: got $1\n" if $x =~ /^(ABC)(?!123)/ ;
-    print "2: got $1\n" if $y =~ /^(ABC)(?!123)/ ;
+    print "1: got $1\n" if $x =~ /^(ABC)(?!123)/;
+    print "2: got $1\n" if $y =~ /^(ABC)(?!123)/;
 
-    print "3: got $1\n" if $x =~ /^(\D*)(?!123)/ ;
-    print "4: got $1\n" if $y =~ /^(\D*)(?!123)/ ;
+    print "3: got $1\n" if $x =~ /^(\D*)(?!123)/;
+    print "4: got $1\n" if $y =~ /^(\D*)(?!123)/;
 
 This prints
 
@@ -1002,8 +1020,8 @@ are zero-width expressions--they only look, but don't consume any
 of the string in their match. So rewriting this way produces what
 you'd expect; that is, case 5 will fail, but case 6 succeeds:
 
-    print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/ ;
-    print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/ ;
+    print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/;
+    print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/;
 
     6: got ABC
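The behavior the patched text describes can be checked with a short standalone script. This is a sketch, not part of the patch: the helper `capture` and the variables `$naive` and `$fixed` are hypothetical names chosen for illustration. It exercises a POSIX class written inside a bracketed character class, the negated `[[:^digit:]]` form, and lookahead cases 3 to 6 from the text.

```perl
use strict;
use warnings;

# POSIX classes only work inside a bracketed character class,
# so [[:alpha:]] (not [:alpha:]) matches one alphabetic character.
my @words = grep { /^[[:alpha:]]+$/ } qw(abc ab1 XYZ);
print "alpha-only words: @words\n";

# The negated form [[:^digit:]] (a Perl extension) behaves like \D,
# so deleting every non-digit leaves only the digits.
(my $digits = 'a1b2c3') =~ s/[[:^digit:]]//g;
print "digits: $digits\n";

# capture() is a hypothetical helper for this sketch: it returns $1
# when $str matches $re, or undef when the match fails.
sub capture {
    my ($str, $re) = @_;
    return $str =~ $re ? $1 : undef;
}

my $x = 'ABC123';
my $y = 'ABC445';

# (?!123) alone lets \D* backtrack to a shorter prefix...
my $naive = qr/^(\D*)(?!123)/;
# ...while (?=\d) forces \D* to stop right before a digit.
my $fixed = qr/^(\D*)(?=\d)(?!123)/;

# Prints cases 3 to 6 in the same order as the text.
my $case = 3;
for my $re ($naive, $fixed) {
    for my $str ($x, $y) {
        my $got = capture($str, $re);
        print "$case: ", (defined $got ? "got $got" : 'no match'), "\n";
        $case++;
    }
}
```

Consistent with the text, cases 3 and 4 capture `AB` and `ABC`, case 5 fails to match, and case 6 captures `ABC`.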