Commit | Line | Data |
0cdde29f |
1 | =head1 NAME |
2 | |
3 | perli18n - Perl i18n (internationalization) |
4 | |
5 | =head1 DESCRIPTION |
6 | |
7 | Perl supports the language-specific notions of data like |
8 | "is this a letter" and "which letter comes first". These |
9 | are very important issues especially for languages other |
10 | than English -- but also for English: it would be very |
bee4bbb5 |
11 | naïve indeed to think that C<A-Za-z> defines all the letters. |
0cdde29f |
12 | |
13 | Perl understands the language-specific data via the standardized |
14 | (ISO C, XPG4, POSIX 1.c) method called "the locale system". |
bee4bbb5 |
15 | The locale system is controlled per application using one |
16 | function call and several environment variables. |
0cdde29f |
17 | |
18 | =head1 USING LOCALES |
19 | |
20 | If your operating system supports the locale system, the locale |
21 | support is installed, and you have set your locale environment |
22 | variables correctly (please see below) before running Perl, Perl will |
bee4bbb5 |
23 | understand your data correctly according to your locale settings. |
0cdde29f |
24 | |
25 | At run time you can switch locales using C<POSIX::setlocale()>. |
26 | |
bee4bbb5 |
27 | # setlocale is the function call |
28 | # LC_CTYPE will be explained later |
29 | |
0cdde29f |
30 | use POSIX qw(setlocale LC_CTYPE); |
31 | |
32 | # query and save the old locale. |
33 | $old_locale = setlocale(LC_CTYPE); |
34 | |
35 | setlocale(LC_CTYPE, "fr_CA.ISO8859-1"); |
36 | # LC_CTYPE is now in the locale "French, Canada, codeset ISO 8859-1" |
37 | |
38 | setlocale(LC_CTYPE, ""); |
39 | # LC_CTYPE now follows whatever LC_ALL / LC_CTYPE / LANG define; |
40 | # see below for documentation of LC_ALL / LC_CTYPE / LANG. |
41 | |
42 | # restore the old locale |
43 | setlocale(LC_CTYPE, $old_locale); |
44 | |
bee4bbb5 |
45 | The first argument of C<setlocale()> is called B<the category> and the |
46 | second argument B<the locale>. The category tells in what aspect of data |
0cdde29f |
47 | processing we want to apply language-specific rules; the locale tells |
bee4bbb5 |
48 | which language-country/territory-codeset to use. Read on for the naming |
49 | of the locales, though: not all systems name locales as in the example. |
50 | |
51 | For further information about the categories, please consult your |
52 | L<setlocale(3)> manual. For the locales available in your system, also |
53 | consult the L<setlocale(3)> manual and see whether it leads you to the |
54 | list of the available locales (search for the C<SEE ALSO> section). If |
55 | that fails, try the following commands on the command line: |
0cdde29f |
56 | |
57 | =over 12 |
58 | |
59 | =item locale -a |
60 | |
61 | =item nlsinfo |
62 | |
63 | =item ls /usr/lib/nls/loc |
64 | |
65 | =item ls /usr/lib/locale |
66 | |
67 | =item ls /usr/lib/nls |
68 | |
69 | =back |
70 | |
71 | and see whether they list something resembling these: |
72 | |
73 | en_US.ISO8859-1 de_DE.ISO8859-1 ru_RU.ISO8859-5 |
74 | en_US de_DE ru_RU |
bee4bbb5 |
75 | en de ru |
0cdde29f |
76 | english german russian |
77 | english.iso88591 german.iso88591 russian.iso88595 |
78 | |
79 | Sadly, even though the calling interface has been standardized, |
bee4bbb5 |
80 | the names of the locales have not. The naming is usually |
81 | language-country/territory-codeset, but the latter parts may |
82 | not be present. Two special locales are worth special mention: |
83 | |
84 | "C" |
0cdde29f |
85 | |
bee4bbb5 |
86 | and |
87 | "POSIX" |
0cdde29f |
88 | |
bee4bbb5 |
89 | Currently and effectively these are the same locale: the difference is |
90 | mainly that the first one is defined by the C standard and the second |
91 | one is defined by the POSIX standard. What they define is the |
92 | B<default locale> in which every program starts. The language |
93 | is (American) English and the character codeset is C<ASCII>. |
94 | B<NOTE>: not all systems have the C<"POSIX"> locale (not all systems |
95 | are POSIX): use the C<"C"> locale when you need the default locale. |
96 | |
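The usual language-country/territory-codeset naming described above can be pulled apart with a regular expression. This is only a minimal sketch: the helper name C<parse_locale_name> is hypothetical, and it assumes the common C<language_TERRITORY.codeset> spelling, which not every system uses. Note that it has no special knowledge of C<"C"> and C<"POSIX">; it simply reports them as a "language".

```perl
use strict;
use warnings;

# Hypothetical helper: split a locale name of the common
# language[_territory][.codeset] form into its parts.
# Returns nothing if the name does not follow that form.
sub parse_locale_name {
    my ($name) = @_;
    return unless $name =~ /^([A-Za-z]+)(?:_([A-Za-z]+))?(?:\.([\w-]+))?$/;
    return { language => $1, territory => $2, codeset => $3 };
}

for my $name (qw(fr_CA.ISO8859-1 de_DE ru C POSIX english.iso88591)) {
    my $p = parse_locale_name($name) or next;
    printf "%-18s language=%s territory=%s codeset=%s\n", $name,
        $p->{language}, $p->{territory} // '-', $p->{codeset} // '-';
}
```

Names that only look similar (for example C<english>) still parse, but whether your system accepts them is a separate question that only C<setlocale()> can answer.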
97 | =head2 Category LC_CTYPE: CHARACTER TYPES |
98 | |
99 | Starting from Perl version 5.002, Perl has obeyed the C<LC_CTYPE> |
0cdde29f |
100 | environment variable, which controls an application's notion of |
101 | which characters are alphabetic. In Perl this affects the |
102 | regular expression metanotation |
103 | |
104 | \w |
105 | |
bee4bbb5 |
106 | which stands for alphanumeric characters, that is, alphabetic and |
107 | numeric characters (please consult L<perlre> for more information |
108 | about regular expressions). Thanks to the C<LC_CTYPE>, depending on |
109 | your locale settings, characters like C<Æ>, C<É>, C<ß>, C<ø>, can be |
110 | understood as C<\w> characters. |
0cdde29f |
111 | |
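To see the effect of C<LC_CTYPE> on C<\w>, one can count how many single-byte characters match under different locales. The following is a sketch under two assumptions: it targets a newer Perl, where the C<use locale> pragma must be in effect before C<\w> consults the locale, and it uses the locale name from the earlier example, which may not exist on your system. The helper name C<count_word_chars> is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use locale;   # assumption: newer Perls need this before \w consults LC_CTYPE

# Hypothetical helper: count the single-byte characters that \w
# accepts under the given locale. Returns undef if the locale
# cannot be set (for example because it is not installed).
sub count_word_chars {
    my ($locale) = @_;
    defined setlocale(LC_CTYPE, $locale) or return;
    return scalar grep { chr($_) =~ /\w/ } 0 .. 255;
}

my $in_c = count_word_chars("C");
print "C locale: $in_c word characters\n";

my $in_fr = count_word_chars("fr_CA.ISO8859-1");
print defined $in_fr
    ? "fr_CA.ISO8859-1: $in_fr word characters\n"
    : "fr_CA.ISO8859-1 is not available on this system\n";
```

If the second locale is available, its count should be larger than the one for C<"C">, because the accented letters are then classified as alphabetic.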
bee4bbb5 |
112 | =head2 Category LC_COLLATE: COLLATION |
0cdde29f |
113 | |
bee4bbb5 |
114 | Starting from Perl version 5.003_06, Perl has obeyed the C<LC_COLLATE> |
0cdde29f |
115 | environment variable, which controls an application's notion of the |
bee4bbb5 |
116 | collation (ordering) of characters. In most Latin alphabets C<B> |
117 | follows C<A>, but where do C<Á> and C<Ä> belong? |
0cdde29f |
118 | |
119 | Here is a code snippet that will tell you which characters are |
120 | alphanumeric in the current locale, in locale order: |
121 | |
122 | perl -le 'print sort grep /\w/, map { chr() } 0..255' |
123 | |
124 | As noted above, this will work only for Perl versions 5.003_06 and up. |
125 | |
126 | B<NOTE>: in Perl releases before 5.003_06, per-locale collation |
127 | was possible using the C<I18N::Collate> library module. This is now |
128 | mildly obsolete and to be avoided. The C<LC_COLLATE> functionality is |
129 | integrated into the Perl core language and one can use scalar data |
130 | completely normally -- there is no need to juggle with the scalar |
131 | references of C<I18N::Collate>. |
132 | |
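Because collation works on ordinary scalars, sorting under a locale needs nothing beyond C<sort> and C<cmp>. The sketch below assumes a newer Perl, where the C<use locale> pragma has to be in effect before C<cmp> consults C<LC_COLLATE>; the helper name C<sort_in_locale> is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);
use locale;   # assumption: newer Perls need this before cmp consults LC_COLLATE

# Hypothetical helper: sort words under the named locale, then
# restore the previous LC_COLLATE setting. Returns an empty list
# if the locale cannot be set.
sub sort_in_locale {
    my ($locale, @words) = @_;
    my $old = setlocale(LC_COLLATE);
    defined setlocale(LC_COLLATE, $locale) or return;
    my @sorted = sort { $a cmp $b } @words;
    setlocale(LC_COLLATE, $old);
    return @sorted;
}

my @words = qw(banana Apple apple Banana);
print join(" ", sort_in_locale("C", @words)), "\n";   # plain byte order
print join(" ", sort_in_locale("",  @words)), "\n";   # order from the environment
```

The C<"C"> locale sorts all uppercase ASCII letters before the lowercase ones; what the second line prints depends entirely on your environment settings.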
133 | =head1 ENVIRONMENT |
134 | |
135 | =over 12 |
136 | |
137 | =item PERL_BADLANG |
138 | |
139 | A string that controls whether Perl warns at startup about failed |
bee4bbb5 |
140 | locale settings. This can happen if the locale support in the |
141 | operating system is lacking (broken) in some way. If this string has |
142 | an integer value differing from zero, Perl will not complain. |
0cdde29f |
143 | B<NOTE>: this is just hiding the warning message: the message tells |
144 | about some problem in your system's locale support and you should |
145 | investigate what the problem is. |
146 | |
147 | =back |
148 | |
149 | The following environment variables are not specific to Perl: they are |
150 | part of the standardized (ISO C, XPG4, POSIX 1.c) setlocale method to |
151 | control an application's opinion on data. |
152 | |
153 | =over 12 |
154 | |
155 | =item LC_ALL |
156 | |
157 | C<LC_ALL> is the "override-all" locale environment variable. If it is |
158 | set, it overrides all the rest of the locale environment variables. |
159 | |
160 | =item LC_CTYPE |
161 | |
162 | C<LC_CTYPE> controls the classification of characters, see above. |
163 | |
164 | If this is unset and the C<LC_ALL> is set, the C<LC_ALL> is used as |
165 | the C<LC_CTYPE>. If both this and the C<LC_ALL> are unset but the C<LANG> |
166 | is set, the C<LANG> is used as the C<LC_CTYPE>. |
167 | If none of these three is set, the default locale C<"C"> |
168 | is used as the C<LC_CTYPE>. |
169 | |
170 | =item LC_COLLATE |
171 | |
172 | C<LC_COLLATE> controls the collation of characters, see above. |
173 | |
174 | If this is unset and the C<LC_ALL> is set, the C<LC_ALL> is used as |
175 | the C<LC_COLLATE>. If both this and the C<LC_ALL> are unset but the |
176 | C<LANG> is set, the C<LANG> is used as the C<LC_COLLATE>. |
177 | If none of these three is set, the default locale C<"C"> |
178 | is used as the C<LC_COLLATE>. |
179 | |
180 | =item LANG |
181 | |
182 | C<LANG> is the "catch-all" locale environment variable. If it is set, |
183 | it is used as the last resort if neither C<LC_ALL> nor the |
184 | category-specific C<LC_...> variable is set. |
185 | |
186 | =back |
187 | |
188 | There are further locale-controlling environment variables |
bee4bbb5 |
189 | (C<LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME>) but Perl |
190 | B<does not> currently obey them. |