From: Dominic Dunlop
Date: Sat, 28 Dec 1996 09:56:41 +0000 (+0100)
Subject: Locale-related pod patches, take 2
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=a034a98d8bfd0fd904012bd5227ce209aaaa0b26;p=p5sagit%2Fp5-mst-13.2.git

Locale-related pod patches, take 2

[Ahem. Had the wrong thing in the scratch-pad, didn't I? Please ignore
my previous full posting of a slightly-tweaked perllocale.pod. This mail
contains what I really meant to send.]

Herewith (quick, before _18 appears) locale-related patches to the
documentation in perl5.003_17/pod. The main effect is to add
locale-related information to pods other than perllocale.pod, although
there are some tiny tweaks to that pod too. Produces no complaints from
pod2man; not checked for layout since 5.003_13.

p5p-msgid:
---

diff --git a/pod/perl.pod b/pod/perl.pod
index 76a0f26..7ac7094 100644
--- a/pod/perl.pod
+++ b/pod/perl.pod
@@ -267,8 +267,8 @@ directory. If PERL5LIB is defined, PERLLIB is not used.

 =back

-Perl also has environment variables that control how Perl handles
-language-specific data. Please consult L<perllocale>.
+Perl also has environment variables that control how Perl handles data
+specific to particular natural languages. See L<perllocale>.

 Apart from these, Perl uses no other environment variables, except
 to make them available to the script being executed, and to child
diff --git a/pod/perlform.pod b/pod/perlform.pod
index 4fac1a6..b11936b 100644
--- a/pod/perlform.pod
+++ b/pod/perlform.pod
@@ -72,7 +72,14 @@ separated by commas. The expressions are all evaluated in a list context
 before the line is processed, so a single list expression could produce
 multiple list elements. The expressions may be spread out to more than
 one line if enclosed in braces. If so, the opening brace must be the first
-token on the first line.
+token on the first line. If an expression evaluates to a number with a
+decimal part, and if the corresponding picture specifies that the decimal
+part should appear in the output (that is, any picture except multiple "#"
+characters B<without> an embedded "."), the character used for the decimal
+point is B<always> determined by the current LC_NUMERIC locale. This
+means that, if, for example, the run-time environment happens to specify a
+German locale, "," will be used instead of the default ".". See
+L<perllocale> and L<"WARNINGS"> for more information.

 Picture fields that begin with ^ rather than @ are treated specially.
 With a # field, the field is blanked out if the value is undefined. For
@@ -306,10 +313,20 @@ is to printf(), do this:
 END
     print $string;

-=head1 WARNING
+=head1 WARNINGS

 Lexical variables (declared with "my") are not visible within a
 format unless the format is declared within the scope of the lexical
 variable. (They weren't visible at all before version 5.001.) Furthermore,
 lexical aliases will not be compiled correctly: see
 L<perlfunc/my> for other issues.
+
+Formats are the only part of Perl which unconditionally use information
+from a program's locale; if a program's environment specifies an
+LC_NUMERIC locale, it is always used to specify the decimal point
+character in formatted output. Perl ignores all other aspects of locale
+handling unless the C<use locale> pragma is in effect. Formatted output
+cannot be controlled by C<use locale> because the pragma is tied to the
+block structure of the program, and, for historical reasons, formats
+exist outside that block structure. See L<perllocale> for further
+discussion of locale handling.
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index fe3da14..c39dd29 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -1544,7 +1544,7 @@ C<continue> block, if any, is not executed:

 Returns an lowercased version of EXPR. This is the internal function
 implementing the \L escape in double-quoted strings.
-Should respect any POSIX setlocale() settings.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.

 If EXPR is omitted, uses $_.

@@ -1554,7 +1554,7 @@ If EXPR is omitted, uses $_.

 Returns the value of EXPR with the first character lowercased. This is
 the internal function implementing the \l escape in double-quoted strings.
-Should respect any POSIX setlocale() settings.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.

 If EXPR is omitted, uses $_.

@@ -2106,8 +2106,10 @@ you will have to use a block returning its value instead:

 =item printf FORMAT, LIST

-Equivalent to a "print FILEHANDLE sprintf(FORMAT, LIST)". The first argument
-of the list will be interpreted as the printf format.
+Equivalent to C<print FILEHANDLE sprintf(FORMAT, LIST)>. The first argument
+of the list will be interpreted as the printf format. If C<use locale> is
+in effect, the character used for the decimal point in formatted real numbers
+is affected by the LC_NUMERIC locale. See L<perllocale>.

 =item prototype FUNCTION

@@ -2141,8 +2143,11 @@ Generalized quotes. See L<perlop>.

 =item quotemeta

-Returns the value of EXPR with with all regular expression
-metacharacters backslashed. This is the internal function implementing
+Returns the value of EXPR with with all non-alphanumeric
+characters backslashed. (That is, all characters not matching
+C</[A-Za-z_0-9]/> will be preceded by a backslash in the
+returned string, regardless of any locale settings.)
+This is the internal function implementing
 the \Q escape in double-quoted strings.

 If EXPR is omitted, uses $_.
@@ -2674,6 +2679,9 @@ the subroutine not via @_ but as the package global variables $a and
 $b (see example below). They are passed by reference, so don't
 modify $a and $b. And don't try to declare them as lexicals either.

+When C<use locale> is in effect, C<sort LIST> sorts LIST according to the
+current collation locale. See L<perllocale>.
+
 Examples:

     # sort lexically
@@ -2888,7 +2896,10 @@ Returns a string formatted by the usual printf conventions of the C
 language. See L<sprintf(3)> or L<printf(3)> on your system for details.
 (The * character for an indirectly specified length is not supported,
 but you can get the same effect by interpolating a variable
-into the pattern.) Some C libraries' implementations of sprintf() can
+into the pattern.) If C<use locale> is
+in effect, the character used for the decimal point in formatted real numbers
+is affected by the LC_NUMERIC locale. See L<perllocale>.
+Some C libraries' implementations of sprintf() can
 dump core when fed ludicrous arguments.

 =item sqrt EXPR
@@ -3243,7 +3254,7 @@ on your system.

 Returns an uppercased version of EXPR. This is the internal function
 implementing the \U escape in double-quoted strings.
-Should respect any POSIX setlocale() settings.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.

 If EXPR is omitted, uses $_.

@@ -3253,7 +3264,7 @@ If EXPR is omitted, uses $_.

 Returns the value of EXPR with the first character uppercased. This is
 the internal function implementing the \u escape in double-quoted strings.
-Should respect any POSIX setlocale() settings.
+Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>.

 If EXPR is omitted, uses $_.

diff --git a/pod/perlop.pod b/pod/perlop.pod
index a75cb49..a8f34c0 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -290,6 +290,9 @@ to the right argument.
 Binary "cmp" returns -1, 0, or 1 depending on whether the left argument is stringwise
 less than, equal to, or greater than the right argument.

+"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
+by the current locale if C<use locale> is in effect. See L<perllocale>.
+
 =head2 Bitwise And

 Binary "&" returns its operators ANDed together bit by bit.
@@ -580,6 +583,9 @@ are interpolated, as are the following sequences:
     \E          end case modification
     \Q          quote regexp metacharacters till \E

+If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
+and <\U> is taken from the current locale. See L<perllocale>.
+
 Patterns are subject to an additional level of interpretation as a
 regular expression. This is done as a second pass, after variables are
 interpolated, so that regular expressions may be incorporated into the
@@ -619,6 +625,8 @@ C<!~> operator, the $_ string is searched. (The string specified with
 C<=~> need not be an lvalue--it may be the result of an expression
 evaluation, but remember the C<=~> binds rather tightly.) See also
 L<perlre>.
+See L<perllocale> for discussion of additional considerations which apply
+when C<use locale> is in effect.

 Options are:

@@ -769,6 +777,8 @@ at run-time. If you want the pattern compiled only once the first time the
 variable is interpolated, use the C</o> option. If the pattern
 evaluates to a null string, the last successfully executed regular
 expression is used instead. See L<perlre> for further explanation on these.
+See L<perllocale> for discussion of additional considerations which apply
+when C<use locale> is in effect.

 Options are:

diff --git a/pod/perlre.pod b/pod/perlre.pod
index ce054ec..12f9f51 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -19,6 +19,9 @@ the regular expression inside. These are:

 Do case-insensitive pattern matching.

+If C<use locale> is in effect, the case map is taken from the current
+locale. See L<perllocale>.
+
 =item m

 Treat string as multiple lines. That is, change "^" and "$" from matching
@@ -136,6 +139,9 @@ also work:
     \E          end case modification (think vi)
     \Q          quote regexp metacharacters till \E

+If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
+and <\U> is taken from the current locale. See L<perllocale>.
+
 In addition, Perl defines the following:

     \w  Match a "word" character (alphanumeric plus "_")
@@ -146,9 +152,11 @@ In addition, Perl defines the following:
     \D  Match a non-digit character

 Note that C<\w> matches a single alphanumeric character, not a whole
-word. To match a word you'd need to say C<\w+>. You may use C<\w>,
-C<\W>, C<\s>, C<\S>, C<\d>, and C<\D> within character classes (though not
-as either end of a range).
+word. To match a word you'd need to say C<\w+>. If C<use locale> is in
+effect, the list of alphabetic characters generated by C<\w> is taken
+from the current locale. See L<perllocale>. You may use C<\w>, C<\W>,
+C<\s>, C<\S>, C<\d>, and C<\D> within character classes (though not as
+either end of a range).

 Perl defines the following zero-width assertions:

diff --git a/pod/perlsec.pod b/pod/perlsec.pod
index 2b69727..69de859 100644
--- a/pod/perlsec.pod
+++ b/pod/perlsec.pod
@@ -30,14 +30,14 @@ program more secure than the corresponding C program.

 You may not use data derived from outside your program to affect something
 else outside your program--at least, not by accident. All command-line
-arguments, environment variables, and file input are marked as "tainted".
-Tainted data may not be used directly or indirectly in any command that
-invokes a sub-shell, nor in any command that modifies files, directories,
-or processes. Any variable set within an expression that has previously
-referenced a tainted value itself becomes tainted, even if it is logically
-impossible for the tainted value to influence the variable. Because
-taintedness is associated with each scalar value, some elements of an
-array can be tainted and others not.
+arguments, environment variables, locale information (see L<perllocale>),
+and file input are marked as "tainted". Tainted data may not be used
+directly or indirectly in any command that invokes a sub-shell, nor in any
+command that modifies files, directories, or processes. Any variable set
+within an expression that has previously referenced a tainted value itself
+becomes tainted, even if it is logically impossible for the tainted value
+to influence the variable. Because taintedness is associated with each
+scalar value, some elements of an array can be tainted and others not.

 For example:

@@ -107,10 +107,10 @@ mechanism is by referencing sub-patterns from a regular expression match.
 Perl presumes that if you reference a substring using $1, $2, etc., that
 you knew what you were doing when you wrote the pattern. That means using
 a bit of thought--don't just blindly untaint anything, or you defeat the
-entire mechanism. It's better to verify that the variable has only
-good characters (for certain values of "good") rather than checking
-whether it has any bad characters. That's because it's far too easy to
-miss bad characters that you never thought of.
+entire mechanism. It's better to verify that the variable has only good
+characters (for certain values of "good") rather than checking whether it
+has any bad characters. That's because it's far too easy to miss bad
+characters that you never thought of.

 Here's a test to make sure that the data contains nothing but "word"
 characters (alphabetics, numerics, and underscores), a hyphen, an at sign,
@@ -131,6 +131,14 @@ Laundering data using regular expression is the I<only> mechanism for
 untainting dirty data, unless you use the strategy detailed below to fork
 a child of lesser privilege.

+The example does not untaint $data if C<use locale> is in effect,
+because the characters matched by C<\w> are determined by the locale.
+Perl considers that locale definitions are untrustworthy because they
+contain data from outside the program. If you are writing a
+locale-aware program, and want to launder data with a regular expression
+containing C<\w>, put C<no locale> ahead of the expression in the same
+block. See L<perllocale> for further discussion and examples.
+
 =head2 Cleaning Up Your Path

 For "Insecure C<$ENV{PATH}>" messages, you need to set C<$ENV{'PATH'}> to a