Commit | Line | Data |
0cdde29f |
1 | =head1 NAME |
2 | |
3 | perli18n - Perl i18n (internationalization) |
4 | |
5 | =head1 DESCRIPTION |
6 | |
7 | Perl supports language-specific notions about data, such as |
8 | "is this a letter" and "which letter comes first". These |
9 | issues matter most for languages other than English, but |
10 | they matter for English too: it would be naive to think |
11 | that C<A-Za-z> defines all the letters. |
12 | |
13 | Perl understands the language-specific data via the standardized |
14 | (ISO C, XPG4, POSIX 1.c) method called "the locale system". |
15 | The locale system is controlled per application using several |
16 | environment variables. |
17 | |
18 | =head1 USING LOCALES |
19 | |
20 | If your operating system supports the locale system, the locale |
21 | data is installed, and your locale environment variables are set |
22 | correctly (see below) before Perl starts, Perl will interpret |
23 | your data according to your locale. |
24 | |
25 | At run time you can switch locales using POSIX::setlocale(). |
26 | |
27 |     use POSIX qw(setlocale LC_CTYPE); |
28 | |
29 |     # Query and save the old locale. |
30 |     $old_locale = setlocale(LC_CTYPE); |
31 | |
32 |     setlocale(LC_CTYPE, "fr_CA.ISO8859-1"); |
33 |     # LC_CTYPE is now "French, Canada, codeset ISO 8859-1". |
34 | |
35 |     setlocale(LC_CTYPE, ""); |
36 |     # LC_CTYPE is now whatever LC_ALL / LC_CTYPE / LANG define. |
37 |     # See below for documentation on LC_ALL / LC_CTYPE / LANG. |
38 | |
39 |     # Restore the old locale. |
40 |     setlocale(LC_CTYPE, $old_locale); |
41 | |
42 | The first argument of setlocale() is called the category and the |
43 | second argument the locale. The category selects the area of data |
44 | processing to which language-specific rules apply; the locale selects |
45 | the language, country or territory, and codeset. For further information |
46 | about the categories, please consult your L<setlocale(3)> manual. For |
47 | the locales available in your system, also consult the L<setlocale(3)> |
48 | manual and see whether it leads you to the list of the available |
49 | locales (search for the C<SEE ALSO> section). If that fails, try |
50 | the following commands on the command line: |
51 | |
52 | =over 12 |
53 | |
54 | =item locale -a |
55 | |
56 | =item nlsinfo |
57 | |
58 | =item ls /usr/lib/nls/loc |
59 | |
60 | =item ls /usr/lib/locale |
61 | |
62 | =item ls /usr/lib/nls |
63 | |
64 | =back |
65 | |
66 | and see whether they list something resembling these: |
67 | |
68 |     en_US.ISO8859-1 de_DE.ISO8859-1 ru_RU.ISO8859-5 |
69 |     en_US de_DE ru_RU |
70 |     english german russian |
71 |     english.iso88591 german.iso88591 russian.iso88595 |
72 | |
73 | Sadly, although the calling interface has been standardized, |
74 | the names of the locales have not. |
75 | |
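Because locale names differ between systems, a program can try several
spellings in turn. The following is only a minimal sketch: the helper name
C<first_available_locale> and the candidate list are illustrative
assumptions, not part of Perl. It relies on POSIX::setlocale() returning
C<undef> when a name is not accepted.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: return the first candidate name that setlocale()
# accepts for LC_CTYPE, or undef if none is accepted. The locale that
# was in effect on entry is restored before returning.
sub first_available_locale {
    my @candidates = @_;
    my $old = setlocale(LC_CTYPE);    # query only, changes nothing
    my $found;
    for my $name (@candidates) {
        if (defined setlocale(LC_CTYPE, $name)) {
            $found = $name;
            last;
        }
    }
    setlocale(LC_CTYPE, $old) if defined $old;
    return $found;
}

# Illustrative candidates; "C" is listed last as a fallback.
my @names  = ("fr_CA.ISO8859-1", "fr_CA", "french", "C");
my $locale = first_available_locale(@names);
print "Using locale: $locale\n";
```

Which name is reported depends entirely on the locales installed on the
system where the sketch runs.
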
76 | =head2 CHARACTER TYPES |
77 | |
78 | Starting with version 5.002, Perl has obeyed the LC_CTYPE |
79 | environment variable, which controls an application's notion of |
80 | which characters are alphabetic. In Perl this affects the |
81 | regular expression metanotation |
82 | |
83 |     \w |
84 | |
85 | which stands for alphanumeric characters, that is, alphabetic |
86 | and numeric characters. Depending on your locale settings, |
87 | characters like C<Æ>, C<É>, C<ß>, C<ø> can be understood |
88 | as C<\w> characters. |
89 | |
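The effect of LC_CTYPE on C<\w> can be observed by counting the matching
characters in two locales. This is a sketch under one stated assumption:
it targets Perl 5.004 or later, where the C<use locale> pragma has to be
in effect for regular expressions to consult the locale. The helper name
C<word_chars> is made up for illustration.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use locale;    # assumption: Perl 5.004 or later, where this pragma is needed

# Hypothetical helper: list the characters in the range 0..255 that
# match \w under the LC_CTYPE locale currently in effect.
sub word_chars {
    return grep { /\w/ } map { chr } 0 .. 255;
}

setlocale(LC_CTYPE, "C");
my @c_chars = word_chars();
printf "C locale: %d word characters\n", scalar @c_chars;

# The empty string selects whatever LC_ALL / LC_CTYPE / LANG define.
setlocale(LC_CTYPE, "");
my @env_chars = word_chars();
printf "environment locale: %d word characters\n", scalar @env_chars;
```

The second count varies with the environment; in an ISO 8859-1 locale it
would be expected to include the accented letters as well.
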
90 | =head2 COLLATION |
91 | |
92 | Starting with version 5.003_06, Perl has obeyed the LC_COLLATE |
93 | environment variable, which controls an application's notion of the |
94 | ordering (collation) of characters. In most Latin alphabets C<B> |
95 | follows C<A>, but where do C<Á> and C<Ä> belong? |
96 | |
97 | Here is a code snippet that prints the alphanumeric characters |
98 | of the current locale, in locale order: |
99 | |
100 |     perl -le 'print sort grep /\w/, map { chr() } 0..255' |
101 | |
102 | As noted above, this will work only for Perl versions 5.003_06 and up. |
103 | |
104 | B<NOTE>: in Perl releases before 5.003_06, per-locale collation |
105 | was possible using the C<I18N::Collate> library module. That module |
106 | is now mildly obsolete and should be avoided. The C<LC_COLLATE> functionality is |
107 | integrated into the Perl core language and one can use scalar data |
108 | completely normally -- there is no need to juggle with the scalar |
109 | references of C<I18N::Collate>. |
110 | |
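Sorting plain scalars under two collation locales might look like the
sketch below. As an assumption, it targets Perl 5.004 or later, where
C<use locale> must be in effect for C<sort> to consult LC_COLLATE; the
word list is illustrative only.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);
use locale;    # assumption: Perl 5.004 or later, where this pragma is needed

my @words = qw(banana Apple cherry apple Banana);

# In the "C" locale, strings are ordered by character code, so all
# upper-case letters come before all lower-case letters.
setlocale(LC_COLLATE, "C");
my @c_sorted = sort @words;
print "C locale:           @c_sorted\n";

# The empty string selects whatever LC_ALL / LC_COLLATE / LANG define.
setlocale(LC_COLLATE, "");
my @env_sorted = sort @words;
print "environment locale: @env_sorted\n";
```

No scalar references are involved: C<sort> is called on ordinary strings,
and only the locale in effect changes between the two calls.
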
111 | =head1 ENVIRONMENT |
112 | |
113 | =over 12 |
114 | |
115 | =item PERL_BADLANG |
116 | |
117 | A string that controls whether Perl warns at startup about failed |
118 | language-specific "locale" settings. Such failures can happen if the |
119 | locale support in the operating system is lacking in some way. If this |
120 | string has an integer value differing from zero, Perl will not complain. |
121 | B<NOTE>: this only hides the warning message: the message points |
122 | to a problem in your system's locale support, and you should |
123 | investigate what the problem is. |
124 | |
125 | =back |
126 | |
127 | The following environment variables are not specific to Perl: they are |
128 | part of the standardized (ISO C, XPG4, POSIX 1.c) setlocale method to |
129 | control an application's opinion on data. |
130 | |
131 | =over 12 |
132 | |
133 | =item LC_ALL |
134 | |
135 | C<LC_ALL> is the "override-all" locale environment variable. If it is |
136 | set, it overrides all the rest of the locale environment variables. |
137 | |
138 | =item LC_CTYPE |
139 | |
140 | C<LC_CTYPE> controls the classification of characters, see above. |
141 | |
142 | If this is unset and the C<LC_ALL> is set, the C<LC_ALL> is used as |
143 | the C<LC_CTYPE>. If both this and the C<LC_ALL> are unset but the C<LANG> |
144 | is set, the C<LANG> is used as the C<LC_CTYPE>. |
145 | If none of these three is set, the default locale C<"C"> |
146 | is used as the C<LC_CTYPE>. |
147 | |
148 | =item LC_COLLATE |
149 | |
150 | C<LC_COLLATE> controls the collation of characters, see above. |
151 | |
152 | If this is unset and the C<LC_ALL> is set, the C<LC_ALL> is used as |
153 | the C<LC_COLLATE>. If both this and the C<LC_ALL> are unset but the |
154 | C<LANG> is set, the C<LANG> is used as the C<LC_COLLATE>. |
155 | If none of these three is set, the default locale C<"C"> |
156 | is used as the C<LC_COLLATE>. |
157 | |
158 | =item LANG |
159 | |
160 | C<LANG> is the "catch-all" locale environment variable. If it is set, |
161 | it is used as the last resort if neither C<LC_ALL> nor the |
162 | category-specific C<LC_...> is set. |
163 | |
164 | =back |
165 | |
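The precedence among C<LC_ALL>, the category-specific variable, and
C<LANG> can be modelled in a few lines. This sketch only restates the
rules listed above; it is not how the C library implements them. The
function name C<effective_locale> is hypothetical, and treating an empty
value like an unset variable is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: given a category variable name such as "LC_CTYPE"
# and a hash reference of environment variables, return the locale name
# that applies: LC_ALL first, then the category, then LANG, else "C".
# Assumption: an empty value counts as unset.
sub effective_locale {
    my ($category, $env) = @_;
    for my $var ("LC_ALL", $category, "LANG") {
        my $value = $env->{$var};
        return $value if defined $value && length $value;
    }
    return "C";    # default locale when none of the three is set
}

# Resolve against the real environment, then against a made-up one.
print effective_locale("LC_CTYPE", \%ENV), "\n";
print effective_locale("LC_COLLATE", { LANG => "de_DE", LC_COLLATE => "C" }), "\n";
```

The second call prints C<C>, because the category-specific variable takes
precedence over C<LANG> when C<LC_ALL> is not set.
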
166 | There are further locale-controlling environment variables |
167 | (C<LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME>) but |
168 | Perl B<does not> currently obey them. |
169 | |