From: Steve Purkis
Date: Fri, 20 Jan 2006 12:35:06 +0000 (-0500)
Subject: [[:...:]] is equivalent to \p{...}, not [:...:], tweaked from
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=5496314a41c61bc06e565c745abc1dc795ce4db3;p=p5sagit%2Fp5-mst-13.2.git

[[:...:]] is equivalent to \p{...}, not [:...:], tweaked from

Subject: Re: [:...:] and \p{...} character class equivalence in utf8 regexps
Message-Id: <0DAE5956-3ECC-4692-A0C9-C62C8F790C97@multimap.com>
Date: Fri, 20 Jan 2006 12:35:06 -0500

p4raw-id: //depot/perl@27042
---

diff --git a/pod/perlre.pod b/pod/perlre.pod
index f24e971..32a7e6f 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -224,8 +224,17 @@ X
 
     [:class:]
 
-is also available.  The available classes and their backslash
-equivalents (if available) are as follows:
+is also available.  Note that the C<[> and C<]> braces are I<literal>;
+they must always be used within a character class expression.
+
+    # this is correct:
+    $string =~ /[[:alpha:]]/;
+
+    # this is not, and will generate a warning:
+    $string =~ /[:alpha:]/;
+
+The available classes and their backslash equivalents (if available) are
+as follows:
 X
 X X X X X X X
 X X X X X X X
@@ -274,7 +283,7 @@ The following equivalences to Unicode \p{} constructs and equivalent
 backslash character classes (if available), will hold:
 X X<\p> X<\p{}>
 
-    [:...:]     \p{...}         backslash
+    [[:...:]]   \p{...}         backslash
 
     alpha       IsAlpha
     alnum       IsAlnum
@@ -292,7 +301,7 @@ X X<\p> X<\p{}>
     word        IsWord
     xdigit      IsXDigit
 
-For example C<[:lower:]> and C<\p{IsLower}> are equivalent.
+For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
 
 If the C<utf8> pragma is not used but the C<locale> pragma is, the
 classes correlate with the usual isalpha(3) interface (except for
@@ -339,11 +348,11 @@ You can negate the [::] character classes by prefixing the class name
 with a '^'. This is a Perl extension.  For example:
 X
 
-    POSIX         traditional  Unicode
+    POSIX            traditional  Unicode
 
-    [:^digit:]    \D           \P{IsDigit}
-    [:^space:]    \S           \P{IsSpace}
-    [:^word:]     \W           \P{IsWord}
+    [[:^digit:]]     \D           \P{IsDigit}
+    [[:^space:]]     \S           \P{IsSpace}
+    [[:^word:]]      \W           \P{IsWord}
 
 Perl respects the POSIX standard in that POSIX character classes are
 only supported within a character class.  The POSIX character classes