package encoding;

use Encode;

sub import {
    my ($class, $name) = @_;
    # Fall back to the environment when no encoding name is given.
    $name = $ENV{PERL_ENCODING} if @_ < 2;
    my $enc = find_encoding($name);
    unless (defined $enc) {
        require Carp;
        Carp::croak("Unknown encoding '$name'");
    }
    # Install the encoding object used to convert legacy data to Unicode.
    ${^ENCODING} = $enc;
}

=pod

=head1 NAME

encoding - pragma to control the conversion of legacy data into Unicode

=head1 SYNOPSIS

    use encoding "iso 8859-7";

    # The \xDF of ISO 8859-7 is \x{3af} in Unicode.

    $a = "\xDF";
    $b = "\x{100}";

    printf "%#x\n", ord($a); # will print 0x3af, not 0xdf

    $c = $a . $b;

    # $c will be "\x{3af}\x{100}", not "\x{df}\x{100}".

=head1 DESCRIPTION

Normally, when legacy 8-bit data is converted to Unicode, the data is
expected to be Latin-1 (or EBCDIC on EBCDIC platforms). With the
encoding pragma you can change this default.

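As a rough illustration of what changing the default means, the sketch
below calls Encode's decode() directly instead of using the pragma. It
decodes the same byte first as Latin-1 and then as ISO 8859-7. The
helper name C<legacy_to_codepoint> is an assumption made for this
example and is not part of this module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of encoding.pm): decode one legacy byte
# under the named encoding and return the resulting Unicode code point.
sub legacy_to_codepoint {
    my ($encoding, $byte) = @_;
    my $char = decode($encoding, $byte);
    return ord($char);
}

# The same byte yields different code points under different encodings.
printf "%#x\n", legacy_to_codepoint("iso-8859-1", "\xDF");  # prints 0xdf
printf "%#x\n", legacy_to_codepoint("iso-8859-7", "\xDF");  # prints 0x3af
```

The second call corresponds to the conversion shown in the SYNOPSIS for
C<use encoding "iso 8859-7">.
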
The pragma is per script, not lexically scoped per block. Only the last
C<use encoding> matters, and it affects B<the whole script>.

If no encoding is specified, the environment variable C<PERL_ENCODING>
is consulted. If no encoding can be found, an C<Unknown encoding '...'>
error is thrown.

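The lookup order described above can be sketched outside the pragma as
follows. This is a minimal approximation, not the module's own code:
C<resolve_encoding> is a hypothetical helper, it tests for an undefined
name rather than counting arguments, and it uses die() where the module
uses Carp.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper mirroring the lookup order: an explicit name wins,
# otherwise the PERL_ENCODING environment variable is consulted.
sub resolve_encoding {
    my ($name) = @_;
    $name = $ENV{PERL_ENCODING} unless defined $name;
    my $enc = defined $name ? find_encoding($name) : undef;
    die "Unknown encoding '" . (defined $name ? $name : '') . "'\n"
        unless defined $enc;
    return $enc;
}

{
    # Set the variable only for the duration of this block.
    local $ENV{PERL_ENCODING} = "iso-8859-7";
    print resolve_encoding()->name, "\n";          # name comes from the environment
    print resolve_encoding("latin1")->name, "\n";  # explicit argument takes precedence
}

# A name that Encode does not recognise is reported as an error.
my $ok = eval { resolve_encoding("no-such-encoding"); 1 };
print "error: $@" unless $ok;
```
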
=head1 FUTURE POSSIBILITIES

The C<\x..> and C<\0...> escapes in regular expressions are not
affected by this pragma. They probably should be.

Also, chr(), ord(), and C<\N{...}> might become affected.

=head1 KNOWN PROBLEMS

Cannot be combined with C<use utf8>. Note that this is a problem
B<only> if you would like to have Unicode identifiers in your scripts.
You should not need C<use utf8> for anything else these days
(since Perl 5.8.0).

=head1 SEE ALSO

L<perlunicode>, L<Encode>

=cut

1;