From: Jarkko Hietaniemi
Date: Sat, 29 Sep 2001 20:15:32 +0000 (+0000)
Subject: Explain a bit the new more flexible \p\P syntax.
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=a1cc1cb15ece75062cc270a98e84bc57301a80f0;p=p5sagit%2Fp5-mst-13.2.git

Explain a bit the new more flexible \p\P syntax.

p4raw-id: //depot/perl@12270
---

diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index f27173c..4864909 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -168,11 +168,16 @@ match property) constructs. For instance, C<\p{Lu}> matches any
 character with the Unicode uppercase property, while C<\p{M}> matches
 any mark character. Single letter properties may omit the brackets, so
 that can be written C<\pM> also. Many predefined character classes
-are available, such as C<\p{IsMirrored}> and C<\p{InTibetan}>. The
-recommended names of the C<In> classes are the official Unicode script
-and block names but with all non-alphanumeric characters removed, for
-example the block name C<"Latin-1 Supplement"> becomes
-C<\p{InLatin1Supplement}>.
+are available, such as C<\p{IsMirrored}> and C<\p{InTibetan}>.
+The recommended naming convention of the C<In> classes are the
+official Unicode script and block names, but with all non-alphanumeric
+characters removed, for example the block name C<"Latin-1 Supplement">
+becomes C<\p{InLatin1Supplement}>. Perl will ignore the case of
+letters, and any space or dash can be a space, dash, underbar, or be
+missing altogether, so C<\p{ in latin 1 supplement }> will work, too.
+You can also negate both C<\p{}> and C<\P{}> by introducing a caret
+(^) between the first curly and the property name: C<\p{^InTamil}> is
+equal to C<\P{Tamil}>.
 
 Here is the list as of Unicode 3.1.0 (the two-letter classes) and
 as defined by Perl (the one-letter classes) (in Unicode materials
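The syntax documented by this patch can be tried out with a short script. What follows is only a minimal sketch, not part of the commit: the helper name matches() and the three sample characters are chosen here for illustration, and the script assumes a Perl recent enough to accept the In-prefixed block names and the caret form inside the braces.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): 1 if $char matches $re, else 0.
sub matches {
    my ($char, $re) = @_;
    return $char =~ $re ? 1 : 0;
}

my $upper = "A";          # LATIN CAPITAL LETTER A, general category Lu
my $mark  = "\x{0301}";   # COMBINING ACUTE ACCENT, a mark character
my $tamil = "\x{0B95}";   # TAMIL LETTER KA, inside the Tamil block

my %result = (
    upper_braced   => matches($upper, qr/^\p{Lu}$/),        # two-letter property
    mark_braced    => matches($mark,  qr/^\p{M}$/),         # one-letter, with braces
    mark_short     => matches($mark,  qr/^\pM$/),           # one-letter, braces omitted
    block          => matches($tamil, qr/^\p{InTamil}$/),   # block name with In prefix
    caret_negation => matches($tamil, qr/^\p{^InTamil}$/),  # negated with a caret
    capital_p      => matches($tamil, qr/^\P{InTamil}$/),   # negated with \P
);

printf "%-16s %d\n", $_, $result{$_} for sort keys %result;
```

The last two entries are the point of the comparison: for any given character the caret form and the \P form should report the same answer. The loosely spelled names mentioned in the patch (extra spaces, different letter case) are left out of the sketch on purpose, since how far that tolerance extends may differ between Perl versions.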