=head1 NAME

perli18n - Perl i18n (internationalization)

=head1 DESCRIPTION

Perl supports language-specific notions of data such as
"is this a letter" and "which letter comes first". These
are important issues, especially for languages other
than English, but also for English: it would be very
naïve to think that C<A-Za-z> defines all the letters.

Perl understands language-specific data via the standardized
(ISO C, XPG4, POSIX 1.c) method called "the locale system".
The locale system is controlled per application using one
function call and several environment variables.

=head1 USING LOCALES

If your operating system supports the locale system, the locale
system is installed, and you have set your locale environment
variables correctly (see ENVIRONMENT below) before running Perl,
Perl will interpret your data according to your locale settings.

At run time you can switch locales using C<POSIX::setlocale()>.

    # setlocale is the function call
    # LC_CTYPE will be explained later

    use POSIX qw(setlocale LC_CTYPE);

    # query and save the old locale
    $old_locale = setlocale(LC_CTYPE);

    setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
    # LC_CTYPE is now set to the locale "French, Canada, codeset ISO 8859-1"

    setlocale(LC_CTYPE, "");
    # LC_CTYPE is now set to whatever LC_ALL / LC_CTYPE / LANG define;
    # see below for documentation about LC_ALL / LC_CTYPE / LANG

    # restore the old locale
    setlocale(LC_CTYPE, $old_locale);

The first argument of C<setlocale()> is called B<the category> and the
second argument B<the locale>. The category tells which aspect of data
processing the language-specific rules apply to; the locale tells
which language, country/territory, and codeset to use. Read on for the
naming of locales: not all systems name them as in the example.

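Because locale names differ between systems, a given name may not exist
on yours. The following is a minimal sketch of one way to cope with that:
it relies on C<setlocale()> returning an undefined value when the request
cannot be honoured. The helper name C<first_available_locale> and the list
of candidate names are hypothetical and only serve as illustration.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: try each candidate name in turn and return the
# value reported by setlocale() for the first one the system accepts.
sub first_available_locale {
    my ($category, @candidates) = @_;
    for my $name (@candidates) {
        my $result = setlocale($category, $name);
        return $result if defined $result;   # undef means "not available"
    }
    return;
}

# Query and save the old locale so it can be restored afterwards.
my $old_locale = setlocale(LC_CTYPE);

# The candidate spellings are assumptions; "C" is listed last as a fallback.
my $chosen = first_available_locale(
    LC_CTYPE, "fr_CA.ISO8859-1", "fr_CA", "french", "C"
);
print "LC_CTYPE is now: $chosen\n";

# Restore the old locale.
setlocale(LC_CTYPE, $old_locale) if defined $old_locale;
```

Listing C<"C"> last means the loop should always end with a usable
locale, even on a system that has none of the French locales.
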
For further information about the categories, consult your
L<setlocale(3)> manual page. For the locales available on your system,
also consult the L<setlocale(3)> manual page and see whether it leads
you to a list of the available locales (look in the C<SEE ALSO>
section). If that fails, try the following commands on the command line:

=over 12

=item locale -a

=item nlsinfo

=item ls /usr/lib/nls/loc

=item ls /usr/lib/locale

=item ls /usr/lib/nls

=back

and see whether they list something resembling these:

    en_US.ISO8859-1     de_DE.ISO8859-1     ru_RU.ISO8859-5
    en_US               de_DE               ru_RU
    en                  de                  ru
    english             german              russian
    english.iso88591    german.iso88591     russian.iso88595

Sadly, even though the calling interface has been standardized,
the names of the locales have not. A name usually combines
language, country/territory, and codeset, but the latter parts may
be absent. Two special locales are worth mentioning:

    "C"

and

    "POSIX"

Currently these are effectively the same locale: the difference is
mainly that the first is defined by the C standard and the second
by the POSIX standard. They define the B<default locale> in which
every program starts. The language is (American) English and the
character codeset is C<ASCII>.
B<NOTE>: not all systems have the C<"POSIX"> locale (not all systems
are POSIX): use the C<"C"> locale when you need the default locale.

=head2 Category LC_CTYPE: CHARACTER TYPES

Starting from version 5.002, Perl has obeyed the C<LC_CTYPE>
environment variable, which controls an application's notion of
which characters are alphabetic. In Perl this affects the
regular expression metanotation

    \w

which stands for alphanumeric characters, that is, alphabetic and
numeric characters, plus the underscore (see L<perlre> for more
information about regular expressions). Thanks to C<LC_CTYPE>, and
depending on your locale settings, characters like C<Æ>, C<É>, C<ß>,
and C<ø> can be understood as C<\w> characters.

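As a rough illustration, the sketch below counts how many of the 256
single-byte characters match C<\w> in the C<"C"> locale. It makes one
assumption that goes beyond this document: in Perl 5.004 and later,
locale-aware matching has to be requested with the C<use locale> pragma,
so the pragma appears here. Replacing C<"C"> with a locale available on
your system should change the count.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use locale;    # assumption: needed in Perl 5.004 and later

# Save the current setting, then switch character classification to "C".
my $old_ctype = setlocale(LC_CTYPE);
setlocale(LC_CTYPE, "C");

# Collect every single-byte character that \w accepts in this locale.
my @word_chars = grep { /\w/ } map { chr } 0 .. 255;
my $word_count = scalar @word_chars;
print "$word_count characters match \\w in the C locale\n";

# Restore the previous setting.
setlocale(LC_CTYPE, $old_ctype) if defined $old_ctype;
```

In the C<"C"> locale the count covers at least the ASCII letters, the
digits, and the underscore; locales with accented letters report more.
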
=head2 Category LC_COLLATE: COLLATION

Starting from version 5.003_06, Perl has obeyed the C<LC_COLLATE>
environment variable, which controls an application's notion of the
collation (ordering) of characters. In most Latin alphabets C<B>
follows C<A>, but where do C<Á> and C<Ä> belong?

Here is a code snippet that lists the alphanumeric characters of
the current locale, in the locale's order:

    perl -le 'print sort grep /\w/, map { chr() } 0..255'

As noted above, this works only for Perl versions 5.003_06 and up.

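The same idea can be applied to ordinary strings. The sketch below sorts
a short word list after setting C<LC_COLLATE> to the C<"C"> locale, where
the order is by character code, so uppercase letters come before
lowercase ones. As an assumption beyond this document, it uses the
C<use locale> pragma, which Perl 5.004 and later require for
locale-aware sorting. The word list is made up for the example.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);
use locale;    # assumption: needed in Perl 5.004 and later

# Save the current collation setting, then switch to the default locale.
my $old_collate = setlocale(LC_COLLATE);
setlocale(LC_COLLATE, "C");

# sort() with no comparison block follows the collation in effect.
my @words  = qw(banana Apple cherry apple Banana);
my @sorted = sort @words;
print join(" ", @sorted), "\n";
# In the "C" locale this prints: Apple Banana apple banana cherry

# Restore the previous collation setting.
setlocale(LC_COLLATE, $old_collate) if defined $old_collate;
```

With a language-specific locale in place of C<"C">, the same list will
typically come out with upper and lower case interleaved.
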
B<NOTE>: in Perl releases before 5.003_06, per-locale collation
was possible using the C<I18N::Collate> library module. That module
is now mildly obsolete and should be avoided. The C<LC_COLLATE>
functionality is integrated into the Perl core language and you can
use scalar data completely normally: there is no need to juggle with
the scalar references of C<I18N::Collate>.

=head1 ENVIRONMENT

=over 12

=item PERL_BADLANG

A string that controls whether Perl warns at startup about failed
locale settings. This can happen if the locale support in the
operating system is lacking (broken) in some way. If this string has
a nonzero integer value, Perl will not complain.
B<NOTE>: this only hides the warning message. The message tells
about some problem in your system's locale support and you should
investigate what the problem is.

=back

The following environment variables are not specific to Perl: they are
part of the standardized (ISO C, XPG4, POSIX 1.c) setlocale method for
controlling an application's opinion on data.

=over 12

=item LC_ALL

C<LC_ALL> is the "override-all" locale environment variable. If it is
set, it overrides all the rest of the locale environment variables.

=item LC_CTYPE

C<LC_CTYPE> controls the classification of characters, see above.

If this is unset and C<LC_ALL> is set, C<LC_ALL> is used as
the C<LC_CTYPE>. If both this and C<LC_ALL> are unset but C<LANG>
is set, C<LANG> is used as the C<LC_CTYPE>.
If none of these three is set, the default locale C<"C">
is used as the C<LC_CTYPE>.

=item LC_COLLATE

C<LC_COLLATE> controls the collation of characters, see above.

If this is unset and C<LC_ALL> is set, C<LC_ALL> is used as
the C<LC_COLLATE>. If both this and C<LC_ALL> are unset but
C<LANG> is set, C<LANG> is used as the C<LC_COLLATE>.
If none of these three is set, the default locale C<"C">
is used as the C<LC_COLLATE>.

=item LANG

C<LANG> is the "catch-all" locale environment variable. If it is set,
it is used as the last resort if neither C<LC_ALL> nor the
category-specific C<LC_...> variable is set.

=back

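The lookup order described above (C<LC_ALL> first, then the
category-specific variable, then C<LANG>, then the default C<"C">) can be
sketched as a small function. This only models the documented order; the
real decision is made by the C library when C<setlocale()> is called with
an empty locale name. The name C<effective_locale> is hypothetical, and
treating an empty value the same as an unset one is an assumption of the
sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: given a category variable name such as "LC_CTYPE"
# and a hash reference of environment variables, return the locale name
# that the documented lookup order would select.
sub effective_locale {
    my ($category, $env) = @_;
    for my $var ("LC_ALL", $category, "LANG") {
        my $value = $env->{$var};
        # Assumption: an empty string counts as "not set".
        return $value if defined $value && length $value;
    }
    return "C";    # the default locale when nothing is set
}

# A made-up environment: LANG is set, and LC_COLLATE overrides it.
my %example_env = (LANG => "de_DE", LC_COLLATE => "fr_CA");

print "LC_CTYPE   -> ", effective_locale("LC_CTYPE",   \%example_env), "\n";
print "LC_COLLATE -> ", effective_locale("LC_COLLATE", \%example_env), "\n";
```

Passing C<\%ENV> instead of the made-up hash shows what the current
environment would select for a category.
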
There are further locale-controlling environment variables
(C<LC_MESSAGES>, C<LC_MONETARY>, C<LC_NUMERIC>, C<LC_TIME>) but Perl
B<does not> currently obey them.