perli18n - Perl i18n (internationalization)
Perl supports language-specific notions of data such as
"is this a letter?" and "which letter comes first?". These
issues matter most for languages other than English, but
English is affected too: it would be naïve indeed to think
that C<A-Za-z> defines all the letters.
Perl understands language-specific data through the standardized
(ISO C, XPG4, POSIX 1.c) mechanism called "the locale system".
Each application controls the locale system with one function
call and several environment variables.
If your operating system supports the locale system, the locale
system is installed, and your locale environment variables are set
correctly (see below) before Perl runs, Perl will interpret your
data correctly according to your locale settings.
At runtime you can switch locales with C<POSIX::setlocale()>:

    # setlocale() is the function call;
    # LC_CTYPE (a category) is explained later.
    use POSIX qw(setlocale LC_CTYPE);

    # Query and save the current locale (one argument = query only).
    my $old_locale = setlocale(LC_CTYPE);

    # Returns undef if this locale is not available on the system.
    setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
    # LC_CTYPE is now "French, Canada, codeset ISO 8859-1".

    setlocale(LC_CTYPE, "");
    # LC_CTYPE is now whatever LC_ALL / LC_CTYPE / LANG define
    # (these environment variables are documented below).

    # Restore the old locale.
    setlocale(LC_CTYPE, $old_locale);
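The save, switch, and restore steps above are easy to get wrong when the
code in between can die. Here is a minimal sketch that wraps them in a
helper; C<with_locale> is a hypothetical name, not part of the POSIX
module, and the sketch relies on C<setlocale()> returning undef when the
requested locale is not available.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# with_locale is a hypothetical helper (not part of POSIX): it runs $code
# with $category switched to $locale, then restores the previous setting,
# even if $code dies. It returns the scalar result of $code.
sub with_locale {
    my ($category, $locale, $code) = @_;
    my $saved = setlocale($category);          # one argument: query only
    defined setlocale($category, $locale)      # undef: locale not available
        or die "locale '$locale' is not available\n";
    my $result;
    my $ok  = eval { $result = $code->(); 1 };
    my $err = $@;
    setlocale($category, $saved);              # restore in every case
    die $err unless $ok;
    return $result;
}

my $inside = with_locale(LC_CTYPE, "C", sub { setlocale(LC_CTYPE) });
print "inside: $inside, after: ", setlocale(LC_CTYPE), "\n";
```

Restoring inside the helper, rather than at each call site, keeps the
locale consistent even when the wrapped code throws an exception.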
The first argument of C<setlocale()> is called B<the category> and the
second B<the locale>. The category selects the aspect of data processing
to which language-specific rules apply; the locale selects the language,
country or territory, and codeset. Read on for how locales are named,
though: not all systems name them as in the example.
For more information about the categories, consult your
L<setlocale(3)> manual page. To find the locales available on your
system, also check whether the L<setlocale(3)> manual leads you to a
list of them (look in its C<SEE ALSO> section). If that fails, try the
following commands at the command line:
=item ls /usr/lib/nls/loc

=item ls /usr/lib/locale
and see whether they list something resembling these:

    en_US.ISO8859-1     de_DE.ISO8859-1     ru_RU.ISO8859-5
    english             german              russian
    english.iso88591    german.iso88591     russian.iso88595
Sadly, even though the calling interface has been standardized, the
names of the locales have not. Names usually take the form
language-country/territory-codeset, but the latter parts may be
absent. Two special locales deserve particular mention: C<"C"> and
C<"POSIX">.
Currently these are effectively the same locale; the difference is
mainly that the first is defined by the C standard and the second by
the POSIX standard. Both define the B<default locale> in which every
program starts: the language is (American) English and the character
codeset is C<ASCII>.
B<NOTE>: not all systems have the C<"POSIX"> locale (not all systems
are POSIX), so use the C<"C"> locale when you need the default locale.
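Because locale names are not standardized, a portable program may have
to try several spellings of the same locale. A minimal sketch, assuming
only that the C<"C"> locale always exists; C<first_available_locale> is
a hypothetical helper, and the candidate names are examples that may or
may not exist on your system.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# first_available_locale is a hypothetical helper: it probes each
# candidate name with setlocale() and returns the first one the system
# accepts, falling back to "C". The category's previous setting is
# restored after probing, so the caller decides when to switch.
sub first_available_locale {
    my ($category, @candidates) = @_;
    my $saved = setlocale($category);
    for my $name (@candidates, "C") {
        next unless defined setlocale($category, $name);
        setlocale($category, $saved);
        return $name;
    }
    setlocale($category, $saved);
    return undef;
}

# The same locale, spelled as different systems might spell it.
my $locale = first_available_locale(LC_CTYPE,
    "de_DE.ISO8859-1", "de_DE", "de", "german");
print "using: $locale\n";
```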
=head2 Category LC_CTYPE: CHARACTER TYPES
Since version 5.002, Perl has obeyed the C<LC_CTYPE> environment
variable, which controls an application's notion of which characters
are alphabetic. In Perl this affects the regular expression
metanotation

    \w

which stands for alphanumeric characters, that is, alphabetic and
numeric characters plus the underscore (see L<perlre> for more
information about regular expressions). Thanks to C<LC_CTYPE>, and
depending on your locale settings, characters like C<Æ>, C<É>, C<ß>,
and C<ø> can be understood as C<\w> characters.
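To see the effect of C<LC_CTYPE> on C<\w>, the following sketch lists
the byte values that C<\w> matches in a given locale. It assumes Perl
5.004 or later, where the C<use locale> pragma must be in scope for
C<\w> to follow C<LC_CTYPE>; C<word_chars_in> is a hypothetical helper,
and C<de_DE.ISO8859-1> is only an example name that may not be installed.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use locale;   # Perl 5.004+: makes \w follow LC_CTYPE in this scope

# Return the characters among bytes 0..255 that \w matches in $locale,
# or undef if that locale is not available.
sub word_chars_in {
    my ($locale) = @_;
    my $saved = setlocale(LC_CTYPE);
    defined setlocale(LC_CTYPE, $locale) or return;
    my $chars = join '', grep { /\w/ } map { chr } 0 .. 255;
    setlocale(LC_CTYPE, $saved);
    return $chars;
}

printf "C: %d word characters\n", length word_chars_in("C");
if (defined(my $de = word_chars_in("de_DE.ISO8859-1"))) {
    printf "de_DE.ISO8859-1: %d word characters\n", length $de;
}
```

In the C<"C"> locale only the ASCII letters, digits, and underscore
qualify; a Latin-1 locale typically adds accented letters such as C<É>.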
=head2 Category LC_COLLATE: COLLATION
Since version 5.003_06, Perl has obeyed the C<LC_COLLATE> environment
variable, which controls an application's notion of the collation
(ordering) of characters. In most Latin alphabets C<B> follows C<A>,
but where do C<Á> and C<Ä> belong?
The following snippet prints the alphanumeric characters of the
current locale, in locale order:

    perl -le 'print sort grep /\w/, map { chr() } 0..255'

As noted above, this works only in Perl 5.003_06 and later.
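The same idea as a small script that sorts a word list under a chosen
collation locale. This is a sketch assuming Perl 5.004 or later (hence
C<use locale>, which makes C<sort> consult C<LC_COLLATE>);
C<sort_in_locale> is hypothetical, and C<en_US.ISO8859-1> is an example
name that may be missing on your system.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);
use locale;   # sort, cmp, lt, gt, ... now follow LC_COLLATE

# Sort @words under $locale's collation; return an empty list if the
# locale is not available. The previous LC_COLLATE is restored.
sub sort_in_locale {
    my ($locale, @words) = @_;
    my $saved = setlocale(LC_COLLATE);
    defined setlocale(LC_COLLATE, $locale) or return;
    my @sorted = sort @words;
    setlocale(LC_COLLATE, $saved);
    return @sorted;
}

my @words = qw(banana Apple apple Banana);
# In the "C" locale this is plain byte order: Apple Banana apple banana
print join(' ', sort_in_locale("C", @words)), "\n";
if (my @en = sort_in_locale("en_US.ISO8859-1", @words)) {
    print join(' ', @en), "\n";
}
```

The second line, when printed, depends on the system's collation tables
and usually interleaves uppercase and lowercase words.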
B<NOTE>: in Perl releases before 5.003_06, per-locale collation was
possible with the C<I18N::Collate> library module. That module is now
mildly obsolete and should be avoided. The C<LC_COLLATE> functionality
is integrated into the Perl core language, so scalar data can be used
completely normally: there is no need to juggle the scalar references
of C<I18N::Collate>.
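To illustrate using scalars "completely normally": under C<use locale>,
the ordinary string operators compare according to C<LC_COLLATE>, and
their results should agree in sign with C<POSIX::strcoll>, the C library
comparison function. A small sketch, assuming Perl 5.004 or later;
C<sign> is a hypothetical helper.

```perl
use strict;
use warnings;
use POSIX qw(setlocale strcoll LC_COLLATE);
use locale;   # cmp, lt, gt, le, ge and sort now consult LC_COLLATE

# Reduce a comparison result to -1, 0 or 1.
sub sign { my ($n) = @_; return $n <=> 0 }

setlocale(LC_COLLATE, "C") or die "cannot set the C locale\n";
for my $pair (["a", "B"], ["abc", "abd"], ["same", "same"]) {
    my ($x, $y) = @$pair;
    # Plain scalars and plain operators: no collation objects required.
    printf "%-4s vs %-4s: cmp=%2d strcoll=%2d\n",
        $x, $y, $x cmp $y, sign(strcoll($x, $y));
}
```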
A string that controls whether Perl warns at startup about failed
locale settings. This can happen if the operating system's locale
support is lacking (broken) in some way. If this string has a nonzero
integer value, Perl will not complain.
B<NOTE>: this only hides the warning message. The message reports a
problem in your system's locale support, and you should investigate
what the problem is.
The following environment variables are not specific to Perl: they are
part of the standardized (ISO C, XPG4, POSIX 1.c) setlocale method for
controlling an application's opinion on data.
C<LC_ALL> is the "override-all" locale environment variable. If it is
set, it overrides all the rest of the locale environment variables.
C<LC_CTYPE> controls the classification of characters; see above.

If this is unset and C<LC_ALL> is set, C<LC_ALL> is used as the
C<LC_CTYPE>. If both this and C<LC_ALL> are unset but C<LANG> is set,
C<LANG> is used as the C<LC_CTYPE>. If none of these three is set,
the default locale C<"C"> is used as the C<LC_CTYPE>.
C<LC_COLLATE> controls the collation of characters; see above.

If this is unset and C<LC_ALL> is set, C<LC_ALL> is used as the
C<LC_COLLATE>. If both this and C<LC_ALL> are unset but C<LANG> is
set, C<LANG> is used as the C<LC_COLLATE>. If none of these three is
set, the default locale C<"C"> is used as the C<LC_COLLATE>.
C<LANG> is the "catch-all" locale environment variable. If it is set,
it is used as the last resort when neither C<LC_ALL> nor the
category-specific C<LC_...> variable is set.
There are further locale-controlling environment variables
(C<LC_MESSAGES>, C<LC_MONETARY>, C<LC_NUMERIC>, C<LC_TIME>), but Perl
B<does not> currently obey them.
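The precedence rules for C<LC_ALL>, the category variables, and C<LANG>
can be summarized as a simple lookup. This sketch only models the rules
described above (and, as POSIX specifies, treats an empty variable as
unset); the C library performs the real lookup when a program calls
C<setlocale($category, "")>, and C<effective_locale> is a hypothetical
name.

```perl
use strict;
use warnings;

# Effective locale for one category: LC_ALL, then the category's own
# variable, then LANG, then the default "C". Empty values count as unset.
sub effective_locale {
    my ($category_var, %env) = @_;
    for my $var ('LC_ALL', $category_var, 'LANG') {
        return $env{$var} if defined $env{$var} && length $env{$var};
    }
    return 'C';
}

for my $cat (qw(LC_CTYPE LC_COLLATE)) {
    print "$cat: ", effective_locale($cat, %ENV), "\n";
}
```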