X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlre.pod;h=1df6ba3d8a34c1f8e1b6131c8735df3e00c8cfd6;hb=06eaf0bc49fea082c8b8358680815d807a7a925e;hp=382ba652427467b6fa2ec7310507932e1029cf90;hpb=871b02334a356f1bb4272c9eca4a1570888bcd87;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlre.pod b/pod/perlre.pod
index 382ba65..1df6ba3 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -142,6 +142,7 @@ also work:
     \e          escape (think troff)  (ESC)
     \033        octal char (think of a PDP-11)
     \x1B        hex char
+    \x{263a}    wide hex char         (Unicode SMILEY)
     \c[         control char
     \l          lowercase next char (think vi)
     \u          uppercase next char (think vi)
@@ -166,6 +167,11 @@ In addition, Perl defines the following:
     \S  Match a non-whitespace character
     \d  Match a digit character
     \D  Match a non-digit character
+    \pP Match P, named property.  Use \p{Prop} for longer names.
+    \PP Match non-P
+    \X  Match eXtended Unicode "combining character sequence",
+        equivalent to C<(?:\PM\pM*)>
+    \C  Match a single C char (octet) even under utf8.
 
 A C<\w> matches a single alphanumeric character, not a whole
 word.  To match a word you'd need to say C<\w+>.  If C<use locale> is in
@@ -394,6 +400,28 @@ checks, thus to allow $re in the above snippet to contain C<(?{})>
 I<with tainting enabled>, one needs both C<use re 'eval'> and untaint
 the $re.
 
+=item C<(?p{ code })>
+
+I "postponed" regular subexpression.  C<code> is evaluated
+at runtime, at the moment this subexpression may match.  The result of
+evaluation is considered as a regular expression, and matched as if it
+were inserted instead of this construct.
+
+C<code> is not interpolated.  Currently the rules to
+determine where the C<code> ends are somewhat convoluted.
+
+The following regular expression matches matching parenthesized group:
+
+  $re = qr{
+             \(
+             (?:
+                (?> [^()]+ )    # Non-parens without backtracking
+              |
+                (?p{ $re })     # Group with matching parens
+             )*
+             \)
+          }x;
+
 =item C<(?E<gt>pattern)>
 
 An "independent" subexpression.  Matches the substring that a
@@ -458,7 +486,7 @@ the time when used on a similar string with 1000000 C<a>s.  Be aware,
 however, that this pattern currently triggers a warning message under
 B<-w> saying it C<"matches the null string many times">):
 
-On simple groups, such as the pattern C<(?> [^()]+ )>, a comparable
+On simple groups, such as the pattern C<(?E<gt> [^()]+ )>, a comparable
 effect may be achieved by negative lookahead, as in C<[^()]+ (?! [^()] )>.
 This was only 4 times slower on a string with 1000000 C<a>s.
 
@@ -730,6 +758,13 @@ following all specify the same class of three characters: C<[-az]>,
 C<[az-]>, and C<[a\-z]>.  All are different from C<[a-z]>, which
 specifies a class containing twenty-six characters.)
 
+Note also that the whole range idea is rather unportable between
+character sets--and even within character sets they may cause results
+you probably didn't expect.  A sound principle is to use only ranges
+that begin from and end at either alphabets of equal case ([a-e],
+[A-E]), or digits ([0-9]).  Anything else is unsafe.  If in doubt,
+spell out the character sets in full.
+
 Characters may be specified using a metacharacter syntax much like that
 used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
 "\f" a form feed, etc.  More generally, \I<nnn>, where I<nnn> is a string