X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlunicode.pod;h=73734757532095b281fc200cf29de35c13959895;hb=1a1931329abb89a33c5409dc3da71dd30c04a0b8;hp=0aec6fee1121a7af469227014fca39f26d4257b7;hpb=8f8cf39ca802a67cf132f9179bbf212ddb1ec64e;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 0aec6fe..7373475 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -12,7 +12,7 @@ from cover to cover, Perl does support many Unicode features.
 
 =over 4
 
-=item Input and Output Disciplines
+=item Input and Output Layers
 
 Perl knows when a filehandle uses Perl's internal Unicode encodings
 (UTF-8, or UTF-EBCDIC if in EBCDIC) if the filehandle is opened with
@@ -87,7 +87,7 @@ Unless explicitly stated, Perl operators use character semantics
 for Unicode data and byte semantics for non-Unicode data.
 The decision to use character semantics is made transparently. If
 input data comes from a Unicode source--for example, if a character
-encoding discipline is added to a filehandle or a literal Unicode
+encoding layer is added to a filehandle or a literal Unicode
 string constant appears in a program--character semantics apply.
 Otherwise, byte semantics are in effect. The C<bytes> pragma should
 be used to force byte semantics on Unicode data.
@@ -598,21 +598,14 @@ than one Unicode character.
 
 =back
 
-The following cases do not yet work:
+Things to do with locales (Lithuanian, Turkish, Azeri) do B<not> work
+since Perl does not understand the concept of Unicode locales.
 
-=over 8
-
-=item *
-
-the "final sigma" (Greek), and
-
-=item *
-
-anything to with locales (Lithuanian, Turkish, Azeri).
+See the Unicode Technical Report #21, Case Mappings, for more details.
 
 =back
 
-See the Unicode Technical Report #21, Case Mappings, for more details.
+=over 4
 
 =item *
 
@@ -771,17 +764,19 @@ which will match assigned characters known to be part of the Greek script.
 
         Level 2 - Extended Unicode Support
 
-        3.1     Surrogates                              - MISSING
-        3.2     Canonical Equivalents                   - MISSING       [11][12]
-        3.3     Locale-Independent Graphemes            - MISSING       [13]
-        3.4     Locale-Independent Words                - MISSING       [14]
-        3.5     Locale-Independent Loose Matches        - MISSING       [15]
-
-        [11] see UTR#15 Unicode Normalization
-        [12] have Unicode::Normalize but not integrated to regexes
-        [13] have \X but at this level . should equal that
-        [14] need three classes, not just \w and \W
-        [15] see UTR#21 Case Mappings
+        3.1     Surrogates                              - MISSING       [11]
+        3.2     Canonical Equivalents                   - MISSING       [12][13]
+        3.3     Locale-Independent Graphemes            - MISSING       [14]
+        3.4     Locale-Independent Words                - MISSING       [15]
+        3.5     Locale-Independent Loose Matches        - MISSING       [16]
+
+        [11] Surrogates are solely a UTF-16 concept and Perl's internal
+             representation is UTF-8. The Encode module does UTF-16, though.
+        [12] see UTR#15 Unicode Normalization
+        [13] have Unicode::Normalize but not integrated to regexes
+        [14] have \X but at this level . should equal that
+        [15] need three classes, not just \w and \W
+        [16] see UTR#21 Case Mappings
 
 =item *
 