From: Jeff Pinyan <japhy@pobox.com>
Date: Wed, 14 Apr 2004 17:01:38 +0000 (-0400)
Subject: Re: [PATCH] lib/utf8_heavy.pl -- cascading classes and '&' support
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=bac0b42524fd3607268d7139a21b07697a1c978b;p=p5sagit%2Fp5-mst-13.2.git

Re: [PATCH] lib/utf8_heavy.pl -- cascading classes and '&' support
From: "Jeff 'japhy' Pinyan" <japhy@perlmonk.org>
Message-ID: <Pine.LNX.4.44.0404141659480.11423-301000@perlmonk.org>

p4raw-id: //depot/perl@22713
---
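The perlunicode hunk below drops the requirement that property subroutines live in package main. Here is a minimal sketch of the resulting lookup rule. It assumes a reasonably modern perl, and it gives the hypothetical Lang::IsForeign property (named in the POD example) an invented definition, the Greek and Coptic block U+0370 to U+03FF:

```perl
use strict;
use warnings;

package Lang;

# Hypothetical definition: treat the Greek and Coptic block as "foreign".
sub IsForeign {
    return "0370\t03FF\n";    # one range: two hex code points separated by a tab
}

# Inside package Lang, the bare property name is enough.
sub has_foreign_here {
    my ($txt) = @_;
    return $txt =~ /\p{IsForeign}+/ ? 1 : 0;
}

package main;

# From package main, the property must be qualified with its package.
sub has_foreign_from_main {
    my ($txt) = @_;
    return $txt =~ /\p{Lang::IsForeign}+/ ? 1 : 0;
}

my $greek = "abc \x{3B1}\x{3B2}\x{3B3}";         # "abc " then alpha, beta, gamma
print Lang::has_foreign_here($greek), "\n";       # 1
print has_foreign_from_main($greek), "\n";        # 1
print has_foreign_from_main("plain ascii"), "\n"; # 0
```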

diff --git a/MANIFEST b/MANIFEST
index 1f8239d..cf87b5f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -2952,6 +2952,7 @@ t/TestInit.pm			Preamble library for core tests
 t/test.pl			Simple testing library
 t/uni/case.pl			See if Unicode casing works
 t/uni/chomp.t			See if Unicode chomp works
+t/uni/class.t			See if Unicode classes work (\p)
 t/uni/fold.t			See if Unicode folding works
 t/uni/lower.t			See if Unicode casing works
 t/uni/sprintf.t			See if Unicode sprintf works
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 7de87ac..0817bb3 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -632,10 +632,21 @@ And finally, C<scalar reverse()> reverses by character rather than by byte.
 =head2 User-Defined Character Properties
 
 You can define your own character properties by defining subroutines
-whose names begin with "In" or "Is".  The subroutines must be defined
-in the C<main> package.  The user-defined properties can be used in the
-regular expression C<\p> and C<\P> constructs.  Note that the effect
-is compile-time and immutable once defined.
+whose names begin with "In" or "Is".  The subroutines can be defined in
+any package.  The user-defined properties can be used in the regular
+expression C<\p> and C<\P> constructs; if you are using a user-defined
+property from a package other than the one you are in, you must specify
+its package in the C<\p> or C<\P> construct.
+
+    # assuming property IsForeign defined in Lang::
+    package main;  # property package name required
+    if ($txt =~ /\p{Lang::IsForeign}+/) { ... }
+
+    package Lang;  # property package name not required
+    if ($txt =~ /\p{IsForeign}+/) { ... }
+
+
+Note that the effect is compile-time and immutable once defined.
 
 The subroutines must return a specially-formatted string, with one
 or more newline-separated lines.  Each line must be one of the following:
@@ -650,23 +661,30 @@ tabular characters) denoting a range of Unicode code points to include.
 =item *
 
 Something to include, prefixed by "+": a built-in character
-property (prefixed by "utf8::"), to represent all the characters in that
-property; two hexadecimal code points for a range; or a single
-hexadecimal code point.
+property (prefixed by "utf8::") or a user-defined character property,
+to represent all the characters in that property; two hexadecimal code
+points for a range; or a single hexadecimal code point.
 
 =item *
 
 Something to exclude, prefixed by "-": an existing character
-property (prefixed by "utf8::"), for all the characters in that
-property; two hexadecimal code points for a range; or a single
-hexadecimal code point.
+property (prefixed by "utf8::") or a user-defined character property,
+to represent all the characters in that property; two hexadecimal code
+points for a range; or a single hexadecimal code point.
 
 =item *
 
 Something to negate, prefixed "!": an existing character
-property (prefixed by "utf8::") for all the characters except the
-characters in the property; two hexadecimal code points for a range;
-or a single hexadecimal code point.
+property (prefixed by "utf8::") or a user-defined character property,
+for all the characters except the characters in that property; two
+hexadecimal code points for a range; or a single hexadecimal code point.
+
+=item *
+
+Something to intersect with, prefixed by "&": an existing character
+property (prefixed by "utf8::") or a user-defined character property,
+to keep only the characters that are also in that property; two
+hexadecimal code points for a range; or a single hexadecimal code point.
 
 =back
 
@@ -714,6 +732,19 @@ The negation is useful for defining (surprise!) negated classes.
     END
     }
 
+Intersection is useful for getting the common characters matched by
+two (or more) classes.
+
+    sub InFooAndBar {
+        return <<'END';
+    +main::Foo
+    &main::Bar
+    END
+    }
+
+It's important to remember not to use "&" for the first set -- that
+would be intersecting with nothing (resulting in an empty set).
+
 You can also define your own mappings to be used in the lc(),
 lcfirst(), uc(), and ucfirst() (or their string-inlined versions).
 The principle is the same: define subroutines in the C<main> package
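The "+", "-", "!" and "&" line prefixes documented above can be exercised together. This is a minimal sketch that assumes a reasonably modern perl. Every property name in it (IsAsciiLetter, IsAsciiUpper, IsVowel, IsConsonant, IsNonLetter, IsUpperVowel) is hypothetical and defined in package main, so the cascaded lines qualify them as main::...:

```perl
use strict;
use warnings;

# Hypothetical base properties in package main, given as hex ranges
# (two code points separated by a tab) or as single hex code points.
sub IsAsciiLetter { return "0041\t005A\n0061\t007A\n" }    # A-Z, a-z
sub IsAsciiUpper  { return "0041\t005A\n" }                # A-Z
sub IsVowel {
    # One code point per line for A E I O U a e i o u.
    return join '', map { sprintf("%04X\n", ord($_)) } split //, 'AEIOUaeiou';
}

# "+" includes a property, then "-" removes one: letters that are not vowels.
sub IsConsonant { return "+main::IsAsciiLetter\n-main::IsVowel\n" }

# "!" includes every character that is NOT in the named property.
sub IsNonLetter { return "!main::IsAsciiLetter\n" }

# "&" keeps only the characters already included that are also in the
# named property; the first line uses "+" so there is something to intersect.
sub IsUpperVowel { return "+main::IsVowel\n&main::IsAsciiUpper\n" }

# Return 1 if $str matches $re, else 0.
sub check {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

print check('bcd',  qr/^\p{IsConsonant}+$/),  "\n";   # 1
print check('bad',  qr/^\p{IsConsonant}+$/),  "\n";   # 0 ('a' is a vowel)
print check('AEI',  qr/^\p{IsUpperVowel}+$/), "\n";   # 1
print check('aei',  qr/^\p{IsUpperVowel}+$/), "\n";   # 0 (lowercase)
print check('12 !', qr/^\p{IsNonLetter}+$/),  "\n";   # 1
```

Because IsUpperVowel starts from the vowels and then intersects, swapping its two lines so that "&" comes first would intersect with an empty set and match nothing.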