From: Karl Williamson
Date: Fri, 25 Dec 2009 17:40:56 +0000 (-0700)
Subject: Update pods
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=c670e63af2af3c154935c36a0c6fb77f614af0bd;p=p5sagit%2Fp5-mst-13.2.git

Update pods

Signed-off-by: Abigail
---

diff --git a/pod/perl5113delta.pod b/pod/perl5113delta.pod
index 5b85e62..55fe29d 100644
--- a/pod/perl5113delta.pod
+++ b/pod/perl5113delta.pod
@@ -56,7 +56,8 @@ now accepted.

 C, which matches a Unicode logical character, has been expanded to work better
 with various Asian languages. It now is defined as an C. (See L).
-Anything matched by previously will continue to be matched. But in addition:
+Anything matched previously that made sense will continue to be matched. But
+in addition:

 =over

@@ -73,7 +74,9 @@ C<\X> will now match a sequence including the C and C characters.
 C<\X> will now always match at least one character, including an initial mark.
 Marks generally come after a base character, but it is possible in Unicode to
 have them in isolation, and C<\X> will now handle that case, for example at the
-beginning of a line or after a C.
+beginning of a line or after a C. And this is the part where C<\X>
+doesn't match the things that it used to that don't make sense. Formerly, for
+example, you could have the nonsensical case of an accented LF.

 =item *

diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index cf33dc5..b7a6bdc 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -511,10 +511,10 @@ This matches a Unicode I.

 C<\X> matches quite well what normal (non-Unicode-programmer) usage
 would consider a single character. As an example, consider a G with some sort
-of accent mark over it (a diacritic). There is no such single character in
-Unicode, but something like one can be constructed by using a G followed by a
-Unicode combining accent, and would be displayed by Unicode-aware software as
-if it were a single character.
+of diacritic mark, such as an arrow. There is no such single character in
+Unicode, but one can be composed using a G followed by a Unicode "COMBINING
+UPWARDS ARROW BELOW", and would be displayed by Unicode-aware software as if it
+were a single character.

 Mnemonic: eI<X>tended Unicode character.

diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 6807e70..09b5215 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -754,15 +754,15 @@ UTS#18 grouping, intersection, union, and removal (subtraction) syntax.

 [c] Try the C<:crlf> layer (see L).

-[d] Avoid C (or say C) to allow
-U+FFFF (C<\x{FFFF}>).
+[d] U+FFFF will currently generate a warning message if 'utf8' warnings are
+    enabled

 =item *

 Level 2 - Extended Unicode Support

     RL2.1 Canonical Equivalents - MISSING [10][11]
-    RL2.2 Default Grapheme Clusters - MISSING [12][13]
+    RL2.2 Default Grapheme Clusters - MISSING [12]
     RL2.3 Default Word Boundaries - MISSING [14]
     RL2.4 Default Loose Matches - MISSING [15]
     RL2.5 Name Properties - MISSING [16]
diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index e0142d4..01915c2 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -47,15 +47,15 @@ lowercasing, and collating (sorting) are defined.

 A Unicode I "character" can actually consist of more than one internal
 I "character" or code point. For Western languages, this is adequately
-represented by a I (like C), followed
+modelled by a I (like C) followed
 by one or more I (like C). This sequence of
 base character and modifiers is called a I.
 Some non-western languages require more complicated
-representations, so Unicode invented a I and then an
-I. For example, A Korean Hangul syllable is
+models, so Unicode created the I concept, and then the
+I. For example, a Korean Hangul syllable is
 considered a single logical character, but most often consists of three actual
-characters: a leading consonant followed by an interior vowel followed by a
-trailing consonant.
+Unicode characters: a leading consonant followed by an interior vowel followed
+by a trailing consonant.

 Whether to call these extended grapheme clusters "characters" depends on your
 point of view. If you are a programmer, you probably would tend towards seeing