X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlre.pod;h=1df6ba3d8a34c1f8e1b6131c8735df3e00c8cfd6;hb=06eaf0bc49fea082c8b8358680815d807a7a925e;hp=382ba652427467b6fa2ec7310507932e1029cf90;hpb=871b02334a356f1bb4272c9eca4a1570888bcd87;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlre.pod b/pod/perlre.pod
index 382ba65..1df6ba3 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -142,6 +142,7 @@ also work:
     \e		escape (think troff)  (ESC)
     \033	octal char (think of a PDP-11)
     \x1B	hex char
+    \x{263a}	wide hex char         (Unicode SMILEY)
     \c[		control char
     \l		lowercase next char (think vi)
     \u		uppercase next char (think vi)
@@ -166,6 +167,11 @@ In addition, Perl defines the following:
     \S	Match a non-whitespace character
     \d	Match a digit character
     \D	Match a non-digit character
+    \pP	Match P, named property.  Use \p{Prop} for longer names.
+    \PP	Match non-P
+    \X	Match eXtended Unicode "combining character sequence",
+        equivalent to C<(?:\PM\pM*)>
+    \C	Match a single C char (octet) even under utf8.
 
 A C<\w> matches a single alphanumeric character, not a whole
 word.  To match a word you'd need to say C<\w+>.  If C<use locale> is in
@@ -394,6 +400,28 @@ checks, thus to allow $re in the above snippet to contain C<(?{})>
 I<with tainting enabled>, one needs both C<use re 'eval'> and untaint
 the $re.
 
+=item C<(?p{ code })>
+
+I<Very experimental> "postponed" regular subexpression.  C<code> is evaluated
+at runtime, at the moment this subexpression may match.  The result of the
+evaluation is treated as a regular expression and matched as if it had
+been inserted in place of this construct.
+
+C<code> is not interpolated.  Currently the rules to
+determine where the C<code> ends are somewhat convoluted.
+
+The following regular expression matches a balanced parenthesized group:
+
+  $re = qr{
+	     \(
+	     (?:
+		(?> [^()]+ )	# Non-parens without backtracking
+	      |
+		(?p{ $re })	# Group with matching parens
+	     )*
+	     \)
+	  }x;
+
 =item C<(?E<gt>pattern)>
 
 An "independent" subexpression.  Matches the substring that a
@@ -458,7 +486,7 @@ the time when used on a similar string with 1000000 C<a>s.  Be aware,
 however, that this pattern currently triggers a warning message under
 B<-w> saying it C<"matches the null string many times">):
 
-On simple groups, such as the pattern C<(?> [^()]+ )>, a comparable
+On simple groups, such as the pattern C<(?E<gt> [^()]+ )>, a comparable
 effect may be achieved by negative lookahead, as in C<[^()]+ (?! [^()] )>.
 This was only 4 times slower on a string with 1000000 C<a>s.
 
@@ -730,6 +758,13 @@ following all specify the same class of three characters: C<[-az]>,
 C<[az-]>, and C<[a\-z]>.  All are different from C<[a-z]>, which
 specifies a class containing twenty-six characters.)
 
+Note also that ranges are not portable between character sets, and
+even within a single character set they may produce results you
+probably didn't expect.  A sound principle is to use only ranges that
+begin and end at letters of the same case ([a-e], [A-E]) or at
+digits ([0-9]).  Anything else is unsafe.  If in doubt, spell out
+the character class in full.
+
 Characters may be specified using a metacharacter syntax much like that
 used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
 "\f" a form feed, etc.  More generally, \I<nnn>, where I<nnn> is a string