From: Audrey Tang
Date: Fri, 26 Jan 2007 05:38:39 +0000 (+0800)
Subject: utf8.pm doc patch
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=2fa62f6675d9a740da5d4e8530030bb9dd10e689;p=p5sagit%2Fp5-mst-13.2.git

utf8.pm doc patch

Message-Id: <5BDAD0DE-3434-4A29-82C6-35AE3EFD27CC@audreyt.org>
p4raw-id: //depot/perl@29992
---

diff --git a/lib/utf8.pm b/lib/utf8.pm
index 56c991b..5ff900d 100644
--- a/lib/utf8.pm
+++ b/lib/utf8.pm
@@ -77,7 +77,7 @@ Enabling the C<utf8> pragma has the following effect:
 =item *
 
 Bytes in the source text that have their high-bit set will be treated
-as being part of a literal UTF-8 character. This includes most
+as being part of a literal UTF-X sequence. This includes most
 literals such as identifier names, string constants, and constant
 regular expression patterns.
 
@@ -89,20 +89,24 @@ treated as being part of a literal UTF-EBCDIC character.
 Note that if you have bytes with the eighth bit on in your script
 (for example embedded Latin-1 in your string literals), C<use utf8>
 will be unhappy since the bytes are most probably not well-formed
-UTF-8. If you want to have such bytes and use utf8, you can disable
-utf8 until the end the block (or file, if at top level) by C<no utf8;>.
+UTF-X. If you want to have such bytes under C<use utf8>, you can disable
+this pragma until the end the block (or file, if at top level) by
+C<no utf8;>.
 
-If you want to automatically upgrade your 8-bit legacy bytes to UTF-8,
+If you want to automatically upgrade your 8-bit legacy bytes to Unicode,
 use the L<encoding> pragma instead of this pragma. For example, if
-you want to implicitly upgrade your ISO 8859-1 (Latin-1) bytes to UTF-8
+you want to implicitly upgrade your ISO 8859-1 (Latin-1) bytes to Unicode
 as used in e.g. C<chr()> and C<\x{...}>, try this:
 
     use encoding "latin-1";
     my $c = chr(0xc4);
     my $x = "\x{c5}";
 
-In case you are wondering: yes, C<use encoding 'utf8';> works much
-the same as C<use utf8;>.
+In case you are wondering: C<use encoding 'utf8';> is mostly the same as
+C<use utf8;>, except that C<use encoding> marks all string literals in the
+source code as Unicode, regardless of whether they contain any high-bit bytes.
+Moreover, C<use encoding> installs IO layers on C<STDIN> and C<STDOUT> to work
+with Unicode strings; see L<encoding> for details.
 
 =head2 Utility functions
 
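
Reviewer note (not part of the patch): the scoping behaviour the patched text describes, where high-bit source bytes are decoded under the pragma and left as raw bytes after it is disabled for a block, can be sketched as below. This is a minimal illustration, assuming the script file itself is saved as UTF-8 on an ASCII-based platform; the variable names $chars and $bytes are hypothetical and do not come from utf8.pm.

```perl
use strict;
use warnings;

my ($chars, $bytes);

{
    use utf8;                 # high-bit bytes in literals are decoded as UTF-8
    $chars = length("é");     # the two source bytes form one character
}

{
    no utf8;                  # pragma disabled until the end of this block
    $bytes = length("é");     # the same two source bytes stay two raw bytes
}

# Assumption: this file is saved as UTF-8, so "é" is the bytes 0xC3 0xA9.
print "with utf8: $chars, without utf8: $bytes\n";
```

The example deliberately avoids `use encoding`, which the patch mentions only for comparison.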