X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperllocale.pod;h=e1bf5f070df9657fe8eeddcdc7378ca64c3d5dd5;hb=ce3f0a3cd6f30cb49f01b7811c2891acb7bab15a;hp=aac84d6c548070119064ba2afbae58c1c3eafacc;hpb=2bdf8adde6b319db9fd155a84faad4c2821b76dc;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perllocale.pod b/pod/perllocale.pod
index aac84d6..e1bf5f0 100644
--- a/pod/perllocale.pod
+++ b/pod/perllocale.pod
@@ -5,7 +5,7 @@ perllocale - Perl locale handling (internationalization and localization)
 =head1 DESCRIPTION

 Perl supports language-specific notions of data such as "is this a
-letter", "what is the upper-case equivalent of this letter", and "which
+letter", "what is the uppercase equivalent of this letter", and "which
 of these letters comes first". These are important issues, especially
 for languages other than English - but also for English: it would be
 very naE<iuml>ve to think that C defines all the "letters". Perl
@@ -23,6 +23,8 @@ several environment variables.

 B: This feature is new in Perl 5.004, and does not apply
 unless an application specifically requests it - see L.
+The one exception is that write() now B uses the current locale
+- see L<"NOTES">.

 =head1 PREPARING TO USE LOCALES

@@ -61,7 +63,7 @@ C.

 If you want a Perl application to process and present your data
 according to a particular locale, the application code should include
-the S<C<use locale>> pragma (see L) where
+the S<C<use locale>> pragma (see L) where
 appropriate, and B of the following must be true:

 =over 4
@@ -180,7 +182,7 @@ As the example shows, if the second argument is an empty string, the
 category's locale is returned to the default specified by the
 corresponding environment variables. Generally, this results in a
 return to the default which was in force when Perl started up: changes
-to the environment made by the application after start-up may or may not
+to the environment made by the application after startup may or may not
 be noticed, depending on the implementation of your system's C library.

 If the second argument does not correspond to a valid locale, the locale
@@ -321,18 +323,7 @@ can use POSIX::strcoll() if you don't want this fall-back:

 $equal_in_locale will be true if the collation locale specifies a
 dictionary-like ordering which ignores space characters completely, and
-which folds case. Alternatively, you can use this idiom:
-
-    use locale;
-    $s_a = "space and case ignored";
-    $s_b = "SpaceAndCaseIgnored";
-    $equal_in_locale = $s_a ge $s_b && $s_a le $s_b;
-
-which works because neither C<ge> nor C<le> falls back to doing a
-byte-by-byte comparison when the operands are equal according to the
-locale. The idiom may be less efficient than using strcoll(), but,
-unlike that function, it is not confused by strings containing embedded
-nulls.
+which folds case.

 If you have a single string which you want to check for "equality in
 locale" against several others, you might think you could gain a little
@@ -354,13 +345,15 @@ call strxfrm() for both their operands, then do a byte-by-byte
 comparison of the transformed strings. By calling strxfrm() explicitly,
 and using a non locale-affected comparison, the example attempts to save
 a couple of transformations. In fact, it doesn't save anything: Perl
-magic (see L) creates the transformed version of a
+magic (see L) creates the transformed version of a
 string the first time it's needed in a comparison, then keeps it around
 in case it's needed again. An example rewritten the easy way with
-C runs just about as fast. It also copes with null characters
+C runs just about as fast. It also copes with null characters
 embedded in strings; if you call strxfrm() directly, it treats the first
-null it finds as a terminator. In short, don't call strxfrm() directly:
-let Perl do it for you.
+null it finds as a terminator. And don't expect the transformed strings
+it produces to be portable across systems - or even from one revision
+of your operating system to the next. In short, don't call strxfrm()
+directly: let Perl do it for you.

 Note: C<use locale> isn't shown in some of these examples, as it isn't
 needed: strcoll() and strxfrm() exist only to generate locale-dependent
@@ -377,7 +370,14 @@ regular expressions.)
 Thanks to C<LC_CTYPE>, depending on your locale setting, characters like
 'E', 'E', 'E', and 'E' may be understood as C<\w> characters.

-C<LC_CTYPE> also affects the POSIX character-class test functions -
+The C<LC_CTYPE> locale also provides the map used in translating
+characters between lower and uppercase. This affects the case-mapping
+functions - lc(), lcfirst, uc() and ucfirst(); case-mapping
+interpolation with C<\l>, C<\L>, C<\u> or <\U> in double-quoted strings
+and in C<s///> substitutions; and case-independent regular expression
+pattern matching using the C<i> modifier.
+
+Finally, C<LC_CTYPE> affects the POSIX character-class test functions -
 isalpha(), islower() and so on. For example, if you move from the "C"
 locale to a 7-bit Scandinavian one, you may find - possibly to your
 surprise - that "|" moves from the ispunct() class to isalpha().
@@ -478,6 +478,12 @@ characters such as "E" and "|" are alphanumeric.

 =item *

+String interpolation with case-mapping, as in, say, C<$dest =
+"C:\U$name.$ext">, may produce dangerous results if a bogus LC_CTYPE
+case-mapping table is in effect.
+
+=item *
+
 If the decimal point character in the C<LC_NUMERIC> locale is
 surreptitiously changed from a dot to a comma, C produces a
 string result of "123,456". Many people would
@@ -525,22 +531,31 @@ the locale:

 Scalar true/false (or less/equal/greater) result is never tainted.

+=item B (with C<\l>, C<\L>, C<\u> or <\U>)
+
+Result string containing interpolated material is tainted if
+C<use locale> is in effect.
+
 =item B (C):

 Scalar true/false result never tainted.

 Subpatterns, either delivered as an array-context result, or as $1 etc.
 are tainted if C<use locale> is in effect, and the subpattern regular
-expression contains C<\w> (to match an alphanumeric character). The
-matched pattern variable, $&, is also tainted if C<use locale> is in
-effect, and the regular expression contains C<\w>.
+expression contains C<\w> (to match an alphanumeric character), C<\W>
+(non-alphanumeric character), C<\s> (white-space character), or C<\S>
+(non white-space character). The matched pattern variable, $&, $`
+(pre-match), $' (post-match), and $+ (last match) are also tainted if
+C<use locale> is in effect and the regular expression contains C<\w>,
+C<\W>, C<\s>, or C<\S>.

 =item B (C):

-Has the same behavior as the match operator. When C<use locale> is
-in effect, he left operand of C<=~> will become tainted if it is
-modified as a result of a substitution based on a regular expression
-match involving C<\w>.
+Has the same behavior as the match operator. Also, the left
+operand of C<=~> becomes tainted when C<use locale> in effect,
+if it is modified as a result of a substitution based on a regular
+expression match involving C<\w>, C<\W>, C<\s>, or C<\S>; or of
+case-mapping with C<\l>, C<\L>,C<\u> or <\U>.

 =item B (sprintf()):

@@ -569,13 +584,13 @@ True/false results are never tainted.

 Three examples illustrate locale-dependent tainting.
 The first program, which ignores its locale, won't run: a value taken
-directly from the command-line may not be used to name an output file
+directly from the command line may not be used to name an output file
 when taint checks are enabled.

     #/usr/local/bin/perl -T
     # Run with taint checking

-    # Command-line sanity check omitted...
+    # Command line sanity check omitted...
     $tainted_output_file = shift;

     open(F, ">$tainted_output_file")
@@ -583,7 +598,7 @@ when taint checks are enabled.

 The program can be made to run by "laundering" the tainted value through
 a regular expression: the second example - which still ignores locale
-information - runs, creating the file named on its command-line
+information - runs, creating the file named on its command line
 if it can.

     #/usr/local/bin/perl -T
@@ -617,7 +632,7 @@ of a match involving C<\w> when C<use locale> is in effect.
 =item PERL_BADLANG

 A string that can suppress Perl's warning about failed locale settings
-at start-up. Failure can occur if the locale support in the operating
+at startup. Failure can occur if the locale support in the operating
 system is lacking (broken) is some way - or if you mistyped the name of
 a locale when you set up your environment. If this environment variable
 is absent, or has a value which does not evaluate to integer zero - that
@@ -688,7 +703,7 @@ L) was always in force, even if the program environment suggested
 otherwise. By default, Perl still behaves this way so as to maintain
 backward compatibility. If you want a Perl application to pay attention
 to locale information, you B use
-the S<C<use locale>> pragma (see L> Pragma>) to
+the S<C<use locale>> pragma (see L) to
 instruct it to do so.

 Versions of Perl from 5.002 to 5.003 did use the C
@@ -718,6 +733,16 @@ exact multiplier depends on the string's contents, the operating
 system and the locale.) These downsides are dictated more by the
 operating system's implementation of the locale system than by Perl.

+=head2 write() and LC_NUMERIC
+
+Formats are the only part of Perl which unconditionally use information
+from a program's locale; if a program's environment specifies an
+LC_NUMERIC locale, it is always used to specify the decimal point
+character in formatted output. Formatted output cannot be controlled by
+C<use locale> because the pragma is tied to the block structure of the
+program, and, for historical reasons, formats exist outside that block
+structure.
+
 =head2 Freely available locale definitions

 There is a large collection of locale definitions at
@@ -753,7 +778,7 @@ In certain system environments the operating system's locale support
 is broken and cannot be fixed or used by Perl. Such deficiencies can
 and will result in mysterious hangs and/or Perl core dumps when the
 C<use locale> is in effect. When confronted with such a system,
-please report in excruciating detail to C, and
+please report in excruciating detail to >, and
 complain to your vendor: maybe some bug fixes exist for these problems
 in your operating system. Sometimes such bug fixes are called an
 operating system upgrade.
@@ -772,4 +797,4 @@ L
 Jarkko Hietaniemi's original F heavily hacked by Dominic
 Dunlop, assisted by the perl5-porters.

-Last update: Tue Dec 24 16:43:11 EST 1996
+Last update: Wed Jan 22 11:04:58 EST 1997