or C<\P{Katakana}>. Other sets are the Unicode blocks, the names
of which begin with "In". One such block is dedicated to mathematical
operators, and its pattern formula is C<\p{InMathematicalOperators}>.
-For the full list see L<perlunicode>.
+For the full list see L<perluniprops>.
+
+What we have described so far is the single form of the C<\p{...}> character
+classes. There is also a compound form which you may run into. These
+look like C<\p{name=value}> or C<\p{name:value}> (the equals sign and colon
+can be used interchangeably). These are more general than the single form,
+and in fact most of the single forms are just Perl-defined shortcuts for common
+compound forms. For example, the script examples in the previous paragraph
+could be written equivalently as C<\p{Script=Latin}>, C<\p{Script:Greek}>, and
+C<\P{script=katakana}> (case is irrelevant between the C<{}> braces). You may
+never have to use the compound forms, but sometimes they are necessary, and their
+use can make your code easier to understand.
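As a minimal sketch of the compound forms described above, the following script matches three sample characters against C<\p{Script=...}> patterns. The sample characters and the C<%sample> hash are chosen here purely for illustration; they are not part of the original text.

```perl
use strict;
use warnings;

# Illustrative sample characters, written as code points so the
# source file itself stays plain ASCII.
my %sample = (
    latin    => "a",
    greek    => "\x{3B1}",    # GREEK SMALL LETTER ALPHA
    katakana => "\x{30AB}",   # KATAKANA LETTER KA
);

# Compound forms: '=' and ':' are interchangeable, and case is
# ignored between the braces.
my $is_latin     = $sample{latin}    =~ /\p{Script=Latin}/    ? 1 : 0;
my $is_greek     = $sample{greek}    =~ /\p{Script:Greek}/    ? 1 : 0;

# \P negates the property, so this is false for a Katakana character.
my $not_katakana = $sample{katakana} =~ /\P{script=katakana}/ ? 1 : 0;

print "latin=$is_latin greek=$is_greek not_katakana=$not_katakana\n";
```

With these inputs the script should report that the first two tests succeed and the negated Katakana test fails.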
C<\X> is an abbreviation for a character class that comprises
-the Unicode I<combining character sequences>. A combining character
-sequence is a base character followed by any number of diacritics, i.e.,
-signs like accents used to indicate different sounds of a letter. Using
-the Unicode full names, e.g., S<C<A + COMBINING RING>> is a combining
-character sequence with base character C<A> and combining character
-S<C<COMBINING RING>>, which translates in Danish to A with the circle
-atop it, as in the word Angstrom. C<\X> is equivalent to C<\PM\pM*}>,
-i.e., a non-mark followed by one or more marks.
+a Unicode I<extended grapheme cluster>. This represents a "logical character":
+something that appears to be a single character but may be represented
+internally by more than one code point. For example, using the Unicode full
+names, S<C<A + COMBINING RING>> is a grapheme cluster with base character C<A>
+and combining character S<C<COMBINING RING>>, which translates in Danish to A
+with the circle atop it, as in the word Angstrom.
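To make the difference between code points and grapheme clusters concrete, here is a small sketch that counts both in a string. The string and the variable names are assumptions made for this example: C<A> followed by U+030A (COMBINING RING ABOVE) stands in for the combining ring mentioned above.

```perl
use strict;
use warnings;

# "A" followed by COMBINING RING ABOVE (U+030A): two code points
# that display as a single logical character.
my $angstrom = "A\x{30A}ngstrom";

# length() counts code points in a character string.
my $code_points = length $angstrom;

# Each \X match consumes one whole extended grapheme cluster.
my @clusters = $angstrom =~ /(\X)/g;

printf "code points: %d, grapheme clusters: %d\n",
    $code_points, scalar @clusters;
```

The string holds nine code points but only eight clusters, because the first C<\X> match takes the base letter and its combining mark together.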
For the full and latest information about Unicode see the latest
-Unicode standard, or the Unicode Consortium's website http://www.unicode.org/
+Unicode standard, or the Unicode Consortium's website L<http://www.unicode.org>.
As if all those classes weren't enough, Perl also defines POSIX style
character classes. These have the form C<[:name:]>, with C<name> the