X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperluniintro.pod;h=751bdc6f02af364b3da93affc456ab8804161cff;hb=a8476e91ee770bd8b0c7183fe1314d9effd435ad;hp=223dbae7fd28f1087f67a0612e94508eeb49575b;hpb=63de3cb284beb0325229608ff63562933eba8f50;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index 223dbae..751bdc6 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -172,14 +172,15 @@ To output UTF-8, use the C<:utf8> output layer. Prepending
 to this sample program ensures that the output is completely UTF-8,
 and removes the program's warning.
 
-If your locale environment variables (C<LANGUAGE>, C<LC_ALL>,
-C<LC_CTYPE>, C<LANG>) contain the strings 'UTF-8' or 'UTF8',
-regardless of case, then the default encoding of your STDIN, STDOUT,
-and STDERR and of B<any subsequent file open>, is UTF-8. Note that
-this means that Perl expects other software to work, too: if Perl has
-been led to believe that STDIN should be UTF-8, but then STDIN coming
-in from another command is not UTF-8, Perl will complain about the
-malformed UTF-8.
+You can enable automatic UTF-8-ification of your standard file
+handles, default C<open()> layer, and C<@ARGV> by using either
+the C<-C> command line switch or the C<PERL_UNICODE> environment
+variable, see L<perlrun> for the documentation of the C<-C> switch.
+
+Note that this means that Perl expects other software to work, too:
+if Perl has been led to believe that STDIN should be UTF-8, but then
+STDIN coming in from another command is not UTF-8, Perl will complain
+about the malformed UTF-8.
 
 All features that combine Unicode and I/O also require using the new
 PerlIO feature. Almost all Perl 5.8 platforms do use PerlIO, though:
@@ -361,7 +362,7 @@ the C<open> pragma. See L<open>, or look at the following example.
 
 With the C<open> pragma you can use the C<:locale> layer
 
-    $ENV{LC_ALL} = $ENV{LANG} = 'ru_RU.KOI8-R';
+    BEGIN { $ENV{LC_ALL} = $ENV{LANG} = 'ru_RU.KOI8-R' }
     # the :locale will probe the locale environment variables like LC_ALL
     use open OUT => ':locale'; # russki parusski
     open(O, ">koi8");
@@ -398,9 +399,9 @@ written to the stream. For example, the following snippet copies the
 contents of the file "text.jis" (encoded as ISO-2022-JP, aka JIS) to
 the file "text.utf8", encoded as UTF-8:
 
-    open(my $nihongo, '<:encoding(iso2022-jp)', 'text.jis');
-    open(my $unicode, '>:utf8', 'text.utf8');
-    while (<$nihongo>) { print $unicode }
+    open(my $nihongo, '<:encoding(iso-2022-jp)', 'text.jis');
+    open(my $unicode, '>:utf8', 'text.utf8');
+    while (<$nihongo>) { print $unicode $_ }
 
 The naming of encodings, both by the C<open()> and by the C<open>
 pragma, is similar to the C<encoding> pragma in that it allows for
@@ -433,7 +434,8 @@ UTF-8 encoded. A C<use open ':utf8'> would have avoided the
 bug, or explicitly opening also the F<file> for input as UTF-8.
 
 B<NOTE>: the C<:utf8> and C<:encoding> features work only if your
-Perl has been built with the new PerlIO feature.
+Perl has been built with the new PerlIO feature (which is the default
+on most systems).
 
 =head2 Displaying Unicode As Text
 
@@ -449,7 +451,7 @@ displayed as C<\x..>, and the rest of the characters as themselves:
                 sprintf("\\x{%04X}", $_) :  # \x{...}
                 chr($_) =~ /[[:cntrl:]]/ ?  # else if control character ...
                 sprintf("\\x%02X", $_) :    # \x..
-                chr($_)                     # else as themselves
+                quotemeta(chr($_))          # else quoted or as themselves
           } unpack("U*", $_[0]));           # unpack Unicode characters
     }
 
@@ -457,9 +459,11 @@ For example,
 
     nice_string("foo\x{100}bar\n")
 
-returns:
+returns the string
+
+    'foo\x{0100}bar\x0A'
 
-    "foo\x{0100}bar\x0A"
+which is ready to be printed.
 
 =head2 Special Cases
 
@@ -500,7 +504,7 @@ Yet another way would be to use the Devel::Peek module:
 
 That shows the UTF8 flag in FLAGS and both the UTF-8 bytes
 and Unicode characters in C<PV>. See also later in this document
-the discussion about the C<is_utf8> function of the C<Encode> module.
+the discussion about the C<utf8::is_utf8()> function.
 
 =back
 
@@ -621,8 +625,7 @@ didn't get the transparency of Unicode quite right.
 
 Okay, if you insist:
 
-    use Encode 'is_utf8';
-    print is_utf8($string) ? 1 : 0, "\n";
+    print utf8::is_utf8($string) ? 1 : 0, "\n";
 
 But note that this doesn't mean that any of the characters in the
 string are necessary UTF-8 encoded, or that any of the characters have
@@ -883,6 +886,6 @@ mailing lists for their valuable feedback.
 
 =head1 AUTHOR, COPYRIGHT, AND LICENSE
 
-Copyright 2001-2002 Jarkko Hietaniemi
+Copyright 2001-2002 Jarkko Hietaniemi E<lt>jhi@iki.fiE<gt>
 
 This document may be distributed under the same terms as Perl itself.