From: Jarkko Hietaniemi
Date: Mon, 7 Oct 1996 19:03:00 +0000 (+0300)
Subject: LC_COLLATE.
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=0cdde29fee506d2ead224cf2317341e41f9cc939;p=p5sagit%2Fp5-mst-13.2.git

LC_COLLATE.

Big patch to add, document, and test LC_COLLATE support. written.
---

diff --git a/pod/perli18n.pod b/pod/perli18n.pod
new file mode 100644
index 0000000..b70f913
--- /dev/null
+++ b/pod/perli18n.pod
@@ -0,0 +1,169 @@
+=head1 NAME
+
+perli18n - Perl i18n (internationalization)
+
+=head1 DESCRIPTION
+
+Perl supports language-specific notions of data such as
+"is this a letter" and "which letter comes first". These
+are very important issues, especially for languages other
+than English -- but also for English: it would be very
+naive indeed to think that C<A-Za-z> defines all the letters.
+
+Perl understands the language-specific data via the standardized
+(ISO C, XPG4, POSIX 1.c) method called "the locale system".
+The locale system is controlled per application using several
+environment variables.
+
+=head1 USING LOCALES
+
+If your operating system supports the locale system, and you have
+installed the locale system, and you have set your locale environment
+variables correctly (please see below) before running Perl, Perl will
+understand your data correctly.
+
+At runtime you can switch locales using POSIX::setlocale().
+
+    use POSIX qw(setlocale LC_CTYPE);
+
+    # query and save the old locale.
+    $old_locale = setlocale(LC_CTYPE);
+
+    setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
+    # LC_CTYPE is now in the locale "French, Canada, codeset ISO 8859-1"
+
+    setlocale(LC_CTYPE, "");
+    # LC_CTYPE is now in the locale that LC_ALL / LC_CTYPE / LANG define.
+    # see below for documentation about LC_ALL / LC_CTYPE / LANG.
+
+    # restore the old locale
+    setlocale(LC_CTYPE, $old_locale);
+
+The first argument of setlocale() is called the category and the
+second argument the locale.
+The category tells in what area of data
+processing we want to apply language-specific rules, the locale tells
+in what language-country/territory-codeset. For further information
+about the categories, please consult your L<setlocale(3)> manual. For
+the locales available in your system, also consult the L<setlocale(3)>
+manual and see whether it leads you to the list of the available
+locales (search for the C<SEE ALSO> section). If that fails, try
+the following commands on the command line:
+
+=over 12
+
+=item locale -a
+
+=item nlsinfo
+
+=item ls /usr/lib/nls/loc
+
+=item ls /usr/lib/locale
+
+=item ls /usr/lib/nls
+
+=back
+
+and see whether they list something resembling these
+
+    en_US.ISO8859-1     de_DE.ISO8859-1     ru_RU.ISO8859-5
+    en_US               de_DE               ru_RU
+    english             german              russian
+    english.iso88591    german.iso88591     russian.iso88595
+
+Sadly enough, even though the calling interface has been standardized,
+the names of the locales have not.
+
+=head2 CHARACTER TYPES
+
+Starting from Perl version 5.002, perl has obeyed the LC_CTYPE
+environment variable, which controls the application's notion of
+which characters are alphabetic. In Perl this affects
+the regular expression metanotation
+
+    \w
+
+which stands for alphanumeric characters, that is, alphabetic
+and numeric characters. Depending on your locale settings,
+characters other than the ASCII letters, digits and C<_> can also
+be understood as C<\w> characters.
+
+=head2 COLLATION
+
+Starting from Perl version 5.003_06, perl has obeyed the LC_COLLATE
+environment variable, which controls the application's notion of the
+ordering (collation) of characters. In most Latin alphabets C<b>
+follows C<a>, but where do the accented letters belong?
+
+Here is a code snippet that will tell you which characters are
+alphanumeric in the current locale, in the locale order:
+
+    perl -le 'print sort grep /\w/, map { chr() } 0..255'
+
+As noted above, this will work only for Perl versions 5.003_06 and up.
+
+B<NOTE>: in the pre-5.003_06 Perl releases per-locale collation
+was possible using the C<I18N::Collate> library module. This is now
+mildly obsolete and to be avoided. The C<LC_COLLATE> functionality is
+integrated into the Perl core language and one can use scalar data
+completely normally -- there is no need to juggle with the scalar
+references of C<I18N::Collate>.
+
+=head1 ENVIRONMENT
+
+=over 12
+
+=item PERL_BADLANG
+
+A string that controls whether Perl warns at startup about failed
+language-specific "locale" settings. This can happen if the locale
+support in the operating system is lacking in some way. If this string
+has an integer value differing from zero, Perl will not complain.
+B<NOTE>: this is just hiding the warning message: the message tells
+about some problem in your system's locale support and you should
+investigate what the problem is.
+
+=back
+
+The following environment variables are not specific to Perl: they are
+part of the standardized (ISO C, XPG4, POSIX 1.c) setlocale method to
+control an application's opinion on data.
+
+=over 12
+
+=item LC_ALL
+
+C<LC_ALL> is the "override-all" locale environment variable. If it is
+set, it overrides all the rest of the locale environment variables.
+
+=item LC_CTYPE
+
+C<LC_CTYPE> controls the classification of characters, see above.
+
+If this is unset and the C<LC_ALL> is set, the C<LC_ALL> is used as
+the C<LC_CTYPE>. If both this and the C<LC_ALL> are unset but the C<LANG>
+is set, the C<LANG> is used as the C<LC_CTYPE>.
+If none of these three is set, the default locale C<"C">
+is used as the C<LC_CTYPE>.
+
+=item LC_COLLATE
+
+C<LC_COLLATE> controls the collation of characters, see above.
+
+If this is unset and the C<LC_ALL> is set, the C<LC_ALL> is used as
+the C<LC_COLLATE>. If both this and the C<LC_ALL> are unset but the
+C<LANG> is set, the C<LANG> is used as the C<LC_COLLATE>.
+If none of these three is set, the default locale C<"C">
+is used as the C<LC_COLLATE>.
+
+=item LANG
+
+C<LANG> is the "catch-all" locale environment variable. If it is set,
+it is used as the last resort if neither the C<LC_ALL> nor the
+category-specific C<LC_...> is set.
+
+=back
+
+There are further locale-controlling environment variables
+(C) but
+Perl B<does not> currently obey them.
+