perl18n - Perl i18n (internationalization)

Perl supports language-specific notions of data such as
"is this a letter?" and "which letter comes first?". These
issues matter especially for languages other than English,
but they matter for English too: it would be naive to think
that C<A-Za-z> defines all the letters.

Perl understands language-specific data through the standardized
(ISO C, XPG4, POSIX 1.c) mechanism called "the locale system".
The locale system is controlled per application through several
environment variables.

If your operating system supports the locale system, the locale data
is installed, and you have set your locale environment variables
correctly (see below) before running Perl, Perl will interpret your
data according to your locale.

At run time you can switch locales with POSIX::setlocale():

    use POSIX qw(setlocale LC_CTYPE);

    # Query and save the old locale.
    my $old_locale = setlocale(LC_CTYPE);

    setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
    # LC_CTYPE is now "French, Canada, codeset ISO 8859-1".

    setlocale(LC_CTYPE, "");
    # LC_CTYPE is now whatever LC_ALL / LC_CTYPE / LANG define.
    # See below for documentation of LC_ALL / LC_CTYPE / LANG.

    # Restore the old locale.
    setlocale(LC_CTYPE, $old_locale);
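
The save, switch, and restore steps shown above can be wrapped in a
small helper so that the old locale is restored even if the code in
between fails. The following is only a sketch: the name
C<with_ctype_locale> is hypothetical and not part of Perl or POSIX,
and the sketch assumes that setlocale() returns an undefined value
for a locale it cannot set.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: run $code with LC_CTYPE temporarily set to
# $locale, then restore the previous LC_CTYPE locale.
sub with_ctype_locale {
    my ($locale, $code) = @_;

    # Query and save the old locale.
    my $old_locale = setlocale(LC_CTYPE);

    # Assumption: setlocale() returns undef when it cannot set the locale.
    defined setlocale(LC_CTYPE, $locale)
        or die "locale '$locale' is not available\n";

    # Run the caller's code, remembering any exception it throws.
    my $result = eval { $code->() };
    my $error  = $@;

    # Restore the old locale before propagating any exception.
    setlocale(LC_CTYPE, $old_locale);
    die $error if $error;

    return $result;
}

# Usage: report the LC_CTYPE locale that is active inside the block.
my $inside = with_ctype_locale("C", sub { setlocale(LC_CTYPE) });
print "inside the block LC_CTYPE was: $inside\n";
```

The C<"C"> locale is used in the usage line because it should be
available on every system; any locale name your system accepts can be
substituted.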

The first argument of setlocale() is the category and the second is
the locale. The category selects the area of data processing to which
language-specific rules apply; the locale names the language, the
country or territory, and the codeset to use. For further information
about the categories, consult your L<setlocale(3)> manual page. For
the locales available on your system, check whether the C<SEE ALSO>
section of that manual page leads you to a list of the available
locales. If that fails, try the following commands on the command
line:

=over 12

=item ls /usr/lib/nls/loc

=item ls /usr/lib/locale

=back

and see whether they list something resembling these:

    en_US.ISO8859-1     de_DE.ISO8859-1     ru_RU.ISO8859-5
    english             german              russian
    english.iso88591    german.iso88591     russian.iso88595

Sadly, although the calling interface has been standardized,
the names of the locales have not.
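
Because locale names differ between systems, one practical approach
is to try several spellings and keep the first one that setlocale()
accepts. The sketch below illustrates this; the helper name
C<first_available_locale> is hypothetical, the candidate names are
taken from the listing above, and the sketch assumes that setlocale()
returns an undefined value for a name it does not recognize (some C
libraries accept almost any name, in which case the first candidate
wins).

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: return the first candidate locale name that
# setlocale() accepts for LC_CTYPE, or undef if none is accepted.
sub first_available_locale {
    my @candidates = @_;

    # Save the current locale so that probing has no lasting effect.
    my $old_locale = setlocale(LC_CTYPE);

    my $found;
    for my $name (@candidates) {
        if (defined setlocale(LC_CTYPE, $name)) {
            $found = $name;
            last;
        }
    }

    # Restore the locale that was active before probing.
    setlocale(LC_CTYPE, $old_locale);
    return $found;
}

my $german = first_available_locale(
    "de_DE.ISO8859-1", "german.iso88591", "german",
);
print defined $german
    ? "German locale found as: $german\n"
    : "no German locale found under the names tried\n";
```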

=head2 CHARACTER TYPES

Starting with version 5.002, Perl has obeyed the C<LC_CTYPE>
environment variable, which controls the application's notion of
which characters are alphabetic. In Perl this affects the regular
expression metanotation

    \w

which stands for alphanumeric characters, that is, alphabetic
and numeric characters. Depending on your locale settings,
characters like C<Æ>, C<É>, C<ß>, C<ø> can be understood
as C<\w> characters.
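
To see the effect of C<LC_CTYPE> on C<\w>, one can collect the
matching characters under a given locale. The sketch below is written
for Perl 5.004 and later, where locale-aware matching has to be
enabled with the C<use locale> pragma; that pragma is an assumption
beyond the text above, which describes earlier releases. The helper
name C<word_chars> is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use locale;    # assumption: required on Perl 5.004 and later

# Hypothetical helper: return the characters in the range 0..255 that
# match \w while LC_CTYPE is set to $locale. Returns an empty list if
# the locale cannot be set.
sub word_chars {
    my ($locale) = @_;

    my $old_locale = setlocale(LC_CTYPE);
    defined setlocale(LC_CTYPE, $locale) or return;

    # \w is evaluated under the locale that is current at match time.
    my @chars = grep { /\w/ } map { chr($_) } 0 .. 255;

    setlocale(LC_CTYPE, $old_locale);
    return @chars;
}

my @in_c = word_chars("C");
printf "%d characters match \\w in the C locale\n", scalar @in_c;
```

Calling C<word_chars> with a locale name from your own system, such
as one of the ISO 8859-1 locales listed earlier, should show
additional characters if that locale is installed.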

=head2 COLLATION

Starting with version 5.003_06, Perl has obeyed the C<LC_COLLATE>
environment variable, which controls the application's notion of the
ordering (collation) of characters. C<B> follows C<A> in most Latin
alphabets, but where do C<Á> and C<Ä> belong?

Here is a one-liner that prints the alphanumeric characters of the
current locale, in locale order:

    perl -le 'print sort grep /\w/, map { chr() } 0..255'

As noted above, this works only with Perl 5.003_06 and later.
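
The same idea applies to sorting whole strings. The sketch below
sorts a word list under a chosen C<LC_COLLATE> locale. As with the
character-type case, it assumes Perl 5.004 or later, where the
C<use locale> pragma is needed for C<cmp> and C<sort> to follow the
locale; the helper name C<sort_in_locale> is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);
use locale;    # assumption: required on Perl 5.004 and later

# Hypothetical helper: sort @words while LC_COLLATE is set to $locale.
# Returns an empty list if the locale cannot be set.
sub sort_in_locale {
    my ($locale, @words) = @_;

    my $old_locale = setlocale(LC_COLLATE);
    defined setlocale(LC_COLLATE, $locale) or return;

    # Under "use locale", cmp compares according to LC_COLLATE.
    my @sorted = sort { $a cmp $b } @words;

    setlocale(LC_COLLATE, $old_locale);
    return @sorted;
}

print join(" ", sort_in_locale("C", qw(banana Apple apple Banana))), "\n";
```

In the C<"C"> locale strings are compared by byte value, so the
capitalized words sort before the lowercase ones. A language-specific
locale will typically interleave them instead.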

B<NOTE>: In Perl releases before 5.003_06, per-locale collation was
available through the C<I18N::Collate> library module. That module is
now mildly obsolete and should be avoided. The C<LC_COLLATE>
functionality is integrated into the Perl core language, so scalar
data can be used completely normally; there is no need to juggle the
scalar references of C<I18N::Collate>.

C<PERL_BADLANG> is a string that controls whether Perl warns at
startup about failed language-specific "locale" settings. Such
failures can happen if the locale support in the operating system is
lacking in some way. If this string has an integer value other than
zero, Perl will not complain. B<NOTE>: This only hides the warning
message. The message points to a problem in your system's locale
support, and you should investigate what that problem is.

The following environment variables are not specific to Perl: they are
part of the standardized (ISO C, XPG4, POSIX 1.c) setlocale() method
for controlling an application's view of data.

C<LC_ALL> is the "override-all" locale environment variable. If it is
set, it overrides all the other locale environment variables.

C<LC_CTYPE> controls the classification of characters; see above.

If C<LC_CTYPE> is unset and C<LC_ALL> is set, C<LC_ALL> is used as
the C<LC_CTYPE>. If both C<LC_CTYPE> and C<LC_ALL> are unset but
C<LANG> is set, C<LANG> is used as the C<LC_CTYPE>.
If none of these three is set, the default locale C<"C">
is used as the C<LC_CTYPE>.

C<LC_COLLATE> controls the collation of characters; see above.

If C<LC_COLLATE> is unset and C<LC_ALL> is set, C<LC_ALL> is used as
the C<LC_COLLATE>. If both C<LC_COLLATE> and C<LC_ALL> are unset but
C<LANG> is set, C<LANG> is used as the C<LC_COLLATE>.
If none of these three is set, the default locale C<"C">
is used as the C<LC_COLLATE>.

C<LANG> is the "catch-all" locale environment variable. If it is set,
it is used as the last resort when neither C<LC_ALL> nor the
category-specific C<LC_...> is set.
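
The fallback order described above (C<LC_ALL> first, then the
category-specific variable, then C<LANG>, and finally the default
C<"C">) can be written down as a short lookup. This is only an
illustration of the rules, not how Perl or the C library implements
them: the helper name C<effective_locale> is hypothetical, and
treating an empty value as unset is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: given a category variable name such as
# "LC_CTYPE" and a hash of environment variables, return the locale
# name that would apply to that category.
sub effective_locale {
    my ($category, %env) = @_;

    # Check the variables in order of precedence.
    for my $var ("LC_ALL", $category, "LANG") {
        # Assumption: an empty value counts as unset.
        return $env{$var} if defined $env{$var} && length $env{$var};
    }

    # None of the three is set: fall back to the default locale.
    return "C";
}

# Usage: report what the current environment selects for each
# category that Perl obeys.
for my $category (qw(LC_CTYPE LC_COLLATE)) {
    printf "%-10s => %s\n", $category, effective_locale($category, %ENV);
}
```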

There are further locale-controlling environment variables
(C<LC_MESSAGES>, C<LC_MONETARY>, C<LC_NUMERIC>, C<LC_TIME>), but
Perl B<does not> currently obey them.