X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperluniintro.pod;h=3a2346004c5e851470344ade66c85667892fd53e;hb=6da82d49cf0eed31af6c929b02e2107edccbb7e2;hp=870926ea1fcb999842e3e8e8dcf216f554e1dd64;hpb=fae2c0fbfb26247eb616ab310ef74b1f4084ba68;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index 870926e..3a23460 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -172,13 +172,15 @@ To output UTF-8, use the C<:utf8> output layer.  Prepending
 to this sample program ensures that the output is completely UTF-8,
 and removes the program's warning.
 
-If your locale environment variables (C<LANGUAGE>, C<LC_ALL>,
-C<LC_CTYPE>, C<LANG>) contain the strings 'UTF-8' or 'UTF8',
-regardless of case, then the default encoding of your STDIN, STDOUT,
-and STDERR and of B<any subsequent file open>, is UTF-8.  Note that
-this means that Perl expects other software to work, too: if Perl has
-been led to believe that STDIN should be UTF-8, but then STDIN coming
-in from another command is not UTF-8, Perl will complain about the
+If your locale environment variables (C<LC_ALL>, C<LC_CTYPE>, C<LANG>)
+contain the strings 'UTF-8' or 'UTF8' (matched case-insensitively)
+B<and> you enable using UTF-8 either by using the C<-C> command line
+switch or by setting the PERL_UTF8_LOCALE environment variable to
+a true value, then the default encoding of your STDIN, STDOUT, and
+STDERR, and of B<any subsequent file open>, is UTF-8.  Note that this
+means that Perl expects other software to work, too: if Perl has been
+led to believe that STDIN should be UTF-8, but then STDIN coming in
+from another command is not UTF-8, Perl will complain about the
 malformed UTF-8.
 
 All features that combine Unicode and I/O also require using the new
@@ -398,9 +400,9 @@ written to the stream.
 For example, the following snippet copies the contents of the file "text.jis"
 (encoded as ISO-2022-JP, aka JIS) to the file "text.utf8", encoded as UTF-8:
 
-    open(my $nihongo, '<:encoding(iso2022-jp)', 'text.jis');
-    open(my $unicode, '>:utf8', 'text.utf8');
-    while (<$nihongo>) { print $unicode }
+    open(my $nihongo, '<:encoding(iso-2022-jp)', 'text.jis');
+    open(my $unicode, '>:utf8', 'text.utf8');
+    while (<$nihongo>) { print $unicode $_ }
 
 The naming of encodings, both by the C<open()> and by the C<open> pragma, is
 similar to the C<encoding> pragma in that it allows for
@@ -862,7 +864,7 @@ If you have the GNU recode installed, you can also use the
 Perl front-end C<Convert::Recode> for character conversions.
 
 The following are fast conversions from ISO 8859-1 (Latin-1) bytes
-to UTF-8 bytes, the code works even with older Perl 5 versions.
+to UTF-8 bytes and back, the code works even with older Perl 5 versions.
 
     # ISO 8859-1 to UTF-8
     s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
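
Reviewer note on the last hunk: the new wording promises conversions "and back", but only the ISO 8859-1 to UTF-8 substitution is visible in this hunk's context. The sketch below is one way to check the round trip on a byte string. The forward substitution is copied from the context line above. The reverse substitution and the helper names latin1_to_utf8 and utf8_to_latin1 are assumptions made for illustration and are not taken from this diff. It assumes plain byte strings, with no I/O layers or "use utf8" involved.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps the substitution shown in the hunk context.
# Each byte 0x80..0xFF becomes a two-byte UTF-8 sequence.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    $bytes =~ s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
    return $bytes;
}

# Hypothetical helper: assumed reverse direction (not shown in this hunk).
# Only lead bytes 0xC2 and 0xC3 can encode code points up to 0xFF, so other
# sequences are left untouched.
sub utf8_to_latin1 {
    my ($bytes) = @_;
    $bytes =~ s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
    return $bytes;
}

my $latin1 = "caf\xE9";                  # "café" as ISO 8859-1 bytes
my $utf8   = latin1_to_utf8($latin1);    # same text as UTF-8 bytes
my $back   = utf8_to_latin1($utf8);      # converted back to ISO 8859-1

printf "latin1: %s\n", join ' ', map { sprintf '%02X', ord } split //, $latin1;
printf "utf8:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8;
printf "back:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $back;
```

The reverse helper does not validate its input. Malformed or non-Latin-1 UTF-8 passes through unchanged, so it is only suitable where the input is known to have come from the forward conversion.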