=over 4
-=item Input and Output Disciplines
+=item Input and Output Layers
Perl knows when a filehandle uses Perl's internal Unicode encodings
(UTF-8, or UTF-EBCDIC if in EBCDIC) if the filehandle is opened with
the C<:utf8> layer. Perl uses character semantics
for Unicode data and byte semantics for non-Unicode data.
The decision to use character semantics is made transparently. If
input data comes from a Unicode source--for example, if a character
-encoding discipline is added to a filehandle or a literal Unicode
+encoding layer is added to a filehandle or a literal Unicode
string constant appears in a program--character semantics apply.
Otherwise, byte semantics are in effect. The C<bytes> pragma should
be used to force byte semantics on Unicode data.
=back
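The following is a minimal sketch of the behaviour described above. It assumes an ASCII platform (so the internal encoding is UTF-8 rather than UTF-EBCDIC) and Perl 5.8 or later, which supports in-memory filehandles; the variable names are illustrative only. A string is written through an C<:encoding(UTF-8)> layer, read back through the same layer, and its length is then compared under character semantics and under the C<bytes> pragma.

```perl
use strict;
use warnings;

# Encode "caf\x{e9}" (U+00E9 is the accented e) into an in-memory
# buffer through an :encoding(UTF-8) output layer.
my $buf = '';
open(my $out, '>:encoding(UTF-8)', \$buf) or die "open for write: $!";
print {$out} "caf\x{e9}";
close $out;

# $buf now holds raw UTF-8 octets: c, a, f, 0xC3, 0xA9.
print "encoded bytes:  ", length($buf), "\n";

# Read it back through the same layer. The data comes from a Unicode
# source, so character semantics apply to the result.
open(my $in, '<:encoding(UTF-8)', \$buf) or die "open for read: $!";
my $str = <$in>;
close $in;
print "characters:     ", length($str), "\n";

# The bytes pragma forces byte semantics on the same string, so length
# counts the octets of Perl's internal UTF-8 representation instead.
{
    use bytes;
    print "internal bytes: ", length($str), "\n";
}
```

The in-memory filehandle is used only to keep the sketch free of filesystem side effects; a regular file opened with the same layer behaves the same way.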
-The following cases do not yet work:
+Locale-sensitive case mappings (for Lithuanian, Turkish, and Azeri) do B<not> work
+since Perl does not understand the concept of Unicode locales.
-=over 8
-
-=item *
-
-the "final sigma" (Greek), and
-
-=item *
-
-anything to with locales (Lithuanian, Turkish, Azeri).
+See the Unicode Technical Report #21, Case Mappings, for more details.
=back
-See the Unicode Technical Report #21, Case Mappings, for more details.
+=over 4
=item *
Level 2 - Extended Unicode Support
- 3.1 Surrogates - MISSING
- 3.2 Canonical Equivalents - MISSING [11][12]
- 3.3 Locale-Independent Graphemes - MISSING [13]
- 3.4 Locale-Independent Words - MISSING [14]
- 3.5 Locale-Independent Loose Matches - MISSING [15]
-
- [11] see UTR#15 Unicode Normalization
- [12] have Unicode::Normalize but not integrated to regexes
- [13] have \X but at this level . should equal that
- [14] need three classes, not just \w and \W
- [15] see UTR#21 Case Mappings
+ 3.1 Surrogates - MISSING [11]
+ 3.2 Canonical Equivalents - MISSING [12][13]
+ 3.3 Locale-Independent Graphemes - MISSING [14]
+ 3.4 Locale-Independent Words - MISSING [15]
+ 3.5 Locale-Independent Loose Matches - MISSING [16]
+
+ [11] Surrogates are solely a UTF-16 concept and Perl's internal
+ representation is UTF-8. The Encode module does support UTF-16, though.
+ [12] see UTR#15 Unicode Normalization
+ [13] have Unicode::Normalize but not integrated to regexes
+ [14] have \X, but at this level . should be equivalent to \X
+ [15] need three classes, not just \w and \W
+ [16] see UTR#21 Case Mappings
=item *