to this sample program ensures that the output is completely UTF-8,
and removes the program's warning.
-If your locale environment variables (C<LANGUAGE>, C<LC_ALL>,
-C<LC_CTYPE>, C<LANG>) contain the strings 'UTF-8' or 'UTF8',
-regardless of case, then the default encoding of your STDIN, STDOUT,
-and STDERR and of B<any subsequent file open>, is UTF-8. Note that
-this means that Perl expects other software to work, too: if Perl has
-been led to believe that STDIN should be UTF-8, but then STDIN coming
-in from another command is not UTF-8, Perl will complain about the
-malformed UTF-8.
+If your locale environment variables (C<LC_ALL>, C<LC_CTYPE>, C<LANG>)
+contain the strings 'UTF-8' or 'UTF8' (matched case-insensitively)
+B<and> you enable UTF-8 either with the C<-C> command line switch or
+by setting the C<PERL_UNICODE> environment variable to the empty
+string, C<""> (see L<perlrun> and the documentation for the C<-C>
+switch for more information about the possible values), then the
+default encoding of your STDIN, STDOUT, and STDERR, and of B<any
+subsequent file open>, will be UTF-8. Note that this means that Perl
+expects other software to work, too: if Perl has been led to believe
+that STDIN should be UTF-8, but then STDIN coming in from another
+command is not UTF-8, Perl will complain about the malformed UTF-8.
All features that combine Unicode and I/O also require using the new
PerlIO feature. Almost all Perl 5.8 platforms do use PerlIO, though:
With the C<open> pragma you can use the C<:locale> layer
- $ENV{LC_ALL} = $ENV{LANG} = 'ru_RU.KOI8-R';
+ BEGIN { $ENV{LC_ALL} = $ENV{LANG} = 'ru_RU.KOI8-R' }
# the :locale will probe the locale environment variables like LC_ALL
use open OUT => ':locale'; # russki parusski
open(O, ">koi8");
contents of the file "text.jis" (encoded as ISO-2022-JP, aka JIS) to
the file "text.utf8", encoded as UTF-8:
- open(my $nihongo, '<:encoding(iso2022-jp)', 'text.jis');
- open(my $unicode, '>:utf8', 'text.utf8');
- while (<$nihongo>) { print $unicode }
+ open(my $nihongo, '<:encoding(iso-2022-jp)', 'text.jis');
+ open(my $unicode, '>:utf8', 'text.utf8');
+ while (<$nihongo>) { print $unicode $_ }
The naming of encodings, both by the C<open()> and by the C<open>
pragma, is similar to the C<encoding> pragma in that it allows for
explicitly opening also the F<file> for input as UTF-8.
B<NOTE>: the C<:utf8> and C<:encoding> features work only if your
-Perl has been built with the new PerlIO feature.
+Perl has been built with the new PerlIO feature (which is the default
+on most systems).
=head2 Displaying Unicode As Text
Perl front-end C<Convert::Recode> for character conversions.
The following are fast conversions from ISO 8859-1 (Latin-1) bytes
-to UTF-8 bytes, the code works even with older Perl 5 versions.
+to UTF-8 bytes and back; the code works even with older Perl 5 versions.
# ISO 8859-1 to UTF-8
s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;