From: Nick Ing-Simmons
Date: Thu, 5 Apr 2001 21:32:26 +0000 (+0000)
Subject: Change sense from "incomplete" to "implemented but needs more work" in perlunicode.pod
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=0a1f2d144e4463451f8627bd1c6ca420a59b01b0;p=p5sagit%2Fp5-mst-13.2.git

Change sense from "incomplete" to "implemented but needs more work" in perlunicode.pod

p4raw-id: //depot/perlio@9569
---

diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 30a4482..bb3ce2b 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -4,28 +4,40 @@ perlunicode - Unicode support in Perl
 
 =head1 DESCRIPTION
 
-=head2 Important Caveat
+=head2 Important Caveats
 
-WARNING: The implementation of Unicode support in Perl is incomplete.
+WARNING: While the implementation of Unicode support in Perl is now fairly
+complete it is still evolving to some extent.
 
-The following areas need further work.
+In particular the way Unicode is handled on EBCDIC platforms is still rather
+experimental. On such a platform references to UTF-8 encoding in this
+document and elsewhere should be read as meaning UTF-EBCDIC as specified
+in Unicode Technical Report 16 unless ASCII vs EBCDIC issues are specifically
+discussed. There is no C pragma or ":utfebcdic" layer, rather
+"utf8" and ":utf8" are re-used to mean platform's "natural" 8-bit encoding
+of Unicode. See L for more discussion of the issues.
+
+The following areas are still under development.
 
 =over 4
 
 =item Input and Output Disciplines
 
-There is currently no easy way to mark data read from a file or other
-external source as being utf8. This will be one of the major areas of
-focus in the near future.
+A filehandle can be marked as containing perl's internal Unicode encoding
+(UTF-8 or UTF-EBCDIC) by opening it with the ":utf8" layer.
+Other encodings can be converted to perl's encoding on input, or from
+perl's encoding on output by use of the ":encoding()" layer.
+There is not yet a clean way to mark the perl source itself as being
+in an particular encoding.
 
 =item Regular Expressions
 
-The existing regular expression compiler does not produce polymorphic
-opcodes. This means that the determination on whether to match Unicode
-characters is made when the pattern is compiled, based on whether the
-pattern contains Unicode characters, and not when the matching happens
-at run time. This needs to be changed to adaptively match Unicode if
-the string to be matched is Unicode.
+The regular expression compiler does now attempt to produce polymorphic
+opcodes. That is the pattern should now adapt to the data and
+automaticaly switch to the Unicode character scheme when presented with Unicode data,
+or a traditional byte scheme when presented with byte data.
+The implementation is still new and (particularly on EBCDIC platforms) may
+need further work.
 
 =item C still needed to enable a few features