X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlop.pod;h=73066c1119c17cea6ea62cc51c72e76094730cd4;hb=fbad3eb55c1f8c84d1dfd0e484ecddeffc891e79;hp=8b73629a7c042b071ac5d8465c6ef217e38ee302;hpb=7522fed5983de9fb01258eea70b60840a6a2c756;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlop.pod b/pod/perlop.pod index 8b73629..73066c1 100644 --- a/pod/perlop.pod +++ b/pod/perlop.pod @@ -44,7 +44,7 @@ Many operators can be overloaded for objects. See L. =head2 Terms and List Operators (Leftward) -A TERM has the highest precedence in Perl. They includes variables, +A TERM has the highest precedence in Perl. They include variables, quote and quote-like operators, any expression in parentheses, and any function whose arguments are parenthesized. Actually, there aren't really functions in this sense, just list operators and unary @@ -620,9 +620,9 @@ the same character fore and aft, but the 4 sorts of brackets "" qq{} Literal yes `` qx{} Command yes (unless '' is delimiter) qw{} Word list no - // m{} Pattern match yes - qr{} Pattern yes - s{}{} Substitution yes + // m{} Pattern match yes (unless '' is delimiter) + qr{} Pattern yes (unless '' is delimiter) + s{}{} Substitution yes (unless '' is delimiter) tr{}{} Transliteration no (but see below) Note that there can be whitespace between the operator and the quoting @@ -636,7 +636,7 @@ next line. This allows you to write: For constructs that do interpolation, variables beginning with "C<$>" or "C<@>" are interpolated, as are the following sequences. Within -a transliteration, the first ten of these sequences may be used. +a transliteration, the first eleven of these sequences may be used. \t tab (HT, TAB) \n newline (NL) @@ -645,8 +645,9 @@ a transliteration, the first ten of these sequences may be used. 
\b backspace (BS) \a alarm (bell) (BEL) \e escape (ESC) - \033 octal char - \x1b hex char + \033 octal char (ESC) + \x1b hex char (ESC) + \x{263a} wide hex char (SMILEY) \c[ control char \l lowercase next char @@ -752,33 +753,33 @@ Options are: If "/" is the delimiter then the initial C is optional. With the C you can use any pair of non-alphanumeric, non-whitespace characters -as delimiters (if single quotes are used, no interpretation is done -on the replacement string. Unlike Perl 4, Perl 5 treats backticks as normal -delimiters; the replacement text is not evaluated as a command). -This is particularly useful for matching Unix path names -that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is +as delimiters. This is particularly useful for matching Unix path names +that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is the delimiter, then the match-only-once rule of C applies. +If "'" is the delimiter, no variable interpolation is performed on the +PATTERN. PATTERN may contain variables, which will be interpolated (and the -pattern recompiled) every time the pattern search is evaluated. (Note -that C<$)> and C<$|> might not be interpolated because they look like -end-of-string tests.) If you want such a pattern to be compiled only -once, add a C after the trailing delimiter. This avoids expensive -run-time recompilations, and is useful when the value you are -interpolating won't change over the life of the script. However, mentioning -C constitutes a promise that you won't change the variables in the pattern. -If you change them, Perl won't even notice. +pattern recompiled) every time the pattern search is evaluated, except +for when the delimiter is a single quote. (Note that C<$)> and C<$|> +might not be interpolated because they look like end-of-string tests.) +If you want such a pattern to be compiled only once, add a C after +the trailing delimiter. 
This avoids expensive run-time recompilations, +and is useful when the value you are interpolating won't change over +the life of the script. However, mentioning C constitutes a promise +that you won't change the variables in the pattern. If you change them, +Perl won't even notice. If the PATTERN evaluates to the empty string, the last I matched regular expression is used instead. If the C option is not used, C in a list context returns a list consisting of the subexpressions matched by the parentheses in the -pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here -C<$1> etc. are also set, and -that this differs from Perl 4's behavior.) If there are no parentheses, -the return value is the list C<(1)> for success or C<('')> upon failure. -With parentheses, C<()> is returned upon failure. +pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here C<$1> etc. are +also set, and that this differs from Perl 4's behavior.) When there are +no parentheses in the pattern, the return value is the list C<(1)> for +success. With or without parentheses, an empty list is returned upon +failure. Examples: @@ -909,12 +910,48 @@ A double-quoted, interpolated string. =item qr/STRING/imosx -A string which is (possibly) interpolated and then compiled as a -regular expression. The result may be used as a pattern in a match +Quote-as-a-regular-expression operator. I is interpolated the +same way as I in C. If "'" is used as the +delimiter, no variable interpolation is done. Returns a Perl value +which may be used instead of the corresponding C expression. 
+ +For example, + + $rex = qr/my.STRING/is; + s/$rex/foo/; + +is equivalent to + + s/my.STRING/foo/is; + +The result may be used as a subpattern in a match: $re = qr/$pattern/; $string =~ /foo${re}bar/; # can be interpolated in other patterns $string =~ $re; # or used standalone + $string =~ /$re/; # or this way + +Since Perl may compile the pattern at the moment of execution of qr() +operator, using qr() may have speed advantages in I situations, +notably if the result of qr() is used standalone: + + sub match { + my $patterns = shift; + my @compiled = map qr/$_/i, @$patterns; + grep { + my $success = 0; + foreach my $pat (@compiled) { + $success = 1, last if /$pat/; + } + $success; + } @_; + } + +Precompilation of the pattern into an internal representation at the +moment of qr() avoids a need to recompile the pattern every time a +match C is attempted. (Note that Perl has many other +internal optimizations, but none would be triggered in the above +example if we did not use qr() operator.) Options are: @@ -924,19 +961,6 @@ Options are: s Treat string as single line. x Use extended regular expressions. -The benefit from this is that the pattern is precompiled into an internal -representation, and does not need to be recompiled every time a match -is attempted. This makes it very efficient to do something like: - - foreach $pattern (@pattern_list) { - my $re = qr/$pattern/; - foreach $line (@lines) { - if($line =~ /$re/) { - do_something($line); - } - } - } - See L for additional information on valid syntax for STRING, and for a detailed look at the semantics of regular expressions. @@ -1016,13 +1040,20 @@ See L<"I/O Operators"> for more discussion. =item qw/STRING/ -Returns a list of the words extracted out of STRING, using embedded -whitespace as the word delimiters. It is exactly equivalent to +Evaluates to a list of the words extracted out of STRING, using embedded +whitespace as the word delimiters. 
It can be understood as being roughly +equivalent to: split(' ', q/STRING/); -This equivalency means that if used in scalar context, you'll get split's -(unfortunate) scalar context behavior, complete with mysterious warnings. +the difference being that it generates a real list at compile time. So +this expression: + + qw(foo bar baz) + +is exactly equivalent to the list: + + ('foo', 'bar', 'baz') Some frequently seen examples: @@ -1034,6 +1065,10 @@ comments into a multi-line C-string. For this reason the C<-w> switch produce warnings if the STRING contains the "," or the "#" character. +Note that under use L qw() taints because the definition of +whitespace is tainted. See L for more information about +tainting and L for more information about locales. + =item s/PATTERN/REPLACEMENT/egimosx Searches a string for a pattern, and if found, replaces that pattern @@ -1045,7 +1080,7 @@ variable is searched and modified. (The string specified with C<=~> must be scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.) -If the delimiter chosen is single quote, no variable interpolation is +If the delimiter chosen is a single quote, no variable interpolation is done on either the PATTERN or the REPLACEMENT. Otherwise, if the PATTERN contains a $ that looks like a variable rather than an end-of-string test, the variable will be interpolated into the pattern @@ -1138,9 +1173,9 @@ to occur. Here are two common cases: 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e; -=item tr/SEARCHLIST/REPLACEMENTLIST/cds +=item tr/SEARCHLIST/REPLACEMENTLIST/cdsUC -=item y/SEARCHLIST/REPLACEMENTLIST/cds +=item y/SEARCHLIST/REPLACEMENTLIST/cdsUC Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns @@ -1148,6 +1183,7 @@ the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is transliterated. 
(The string specified with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.) + A character range may be specified with a hyphen, so C does the same replacement as C. For B devotees, C is provided as a synonym for C. If the @@ -1155,11 +1191,20 @@ SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has its own pair of quotes, which may or may not be bracketing quotes, e.g., C or C. +Note also that the whole range idea is rather unportable between +character sets--and even within character sets they may cause results +you probably didn't expect. A sound principle is to use only ranges +that begin from and end at either alphabets of equal case (a-e, A-E), +or digits (0-4). Anything else is unsafe. If in doubt, spell out the +character sets in full. + Options: c Complement the SEARCHLIST. d Delete found but unreplaced characters. s Squash duplicate replaced characters. + U Translate to/from UTF-8. + C Translate to/from 8-bit char (octet). If the C modifier is specified, the SEARCHLIST character set is complemented. If the C modifier is specified, any characters specified @@ -1177,6 +1222,10 @@ enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This latter is useful for counting characters in a class or for squashing character sequences in a class. +The first C or C modifier applies to the left side of the translation. +The second one applies to the right side. If present, these modifiers override +the current utf8 state. + Examples: $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case @@ -1196,6 +1245,9 @@ Examples: tr [\200-\377] [\000-\177]; # delete 8th bit + tr/\0-\xFF//CU; # translate Latin-1 to Unicode + tr/\0-\x{FF}//UC; # translate Unicode to Latin-1 + If multiple transliterations are given for a character, only the first one is used: tr/AAA/XYZ/ @@ -1393,6 +1445,7 @@ to change. =head2 I/O Operators There are several I/O operators you should know about. 
+ A string enclosed by backticks (grave accents) first undergoes variable substitution just like a double quoted string. It is then interpreted as a command, and the output of that command is the value @@ -1410,9 +1463,12 @@ The generalized form of backticks is C. (Because backticks always undergo shell expansion as well, see L for security concerns.) -Evaluating a filehandle in angle brackets yields the next line from -that file (newline, if any, included), or C at end of file. -Ordinarily you must assign that value to a variable, but there is one +In a scalar context, evaluating a filehandle in angle brackets yields the +next line from that file (newline, if any, included), or C at +end-of-file. When C<$/> is set to C (i.e. file slurp mode), +it returns C<''> the first time, followed by C subsequently. + +Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. I the input symbol is the only thing inside the conditional of a C or C loop, the value is automatically assigned to the variable @@ -1449,13 +1505,16 @@ The filehandles STDIN, STDOUT, and STDERR are predefined. (The filehandles C, C, and C will also work except in packages, where they would be interpreted as local identifiers rather than global.) Additional filehandles may be created with the open() -function. See L for details on this. +function. See L for details on this. If a E<lt>FILEHANDLEE<gt> is used in a context that is looking for a list, a list consisting of all the input lines is returned, one line per list element. It's easy to make a I data space this way, so use with care. +E<lt>FILEHANDLEE<gt> may also be spelt readline(FILEHANDLE). See +L. + The null filehandle E<lt>E<gt> is special and can be used to emulate the behavior of B and B. Input from E<lt>E<gt> comes either from standard input, or from each file listed on the command line. Here's @@ -1622,9 +1681,10 @@ Bitstrings of any size may be manipulated by the bitwise operators (C<~ | & ^>). 
If the operands to a binary bitwise op are strings of different sizes, -B and B ops will act as if the shorter operand had additional -zero bits on the right, while the B op will act as if the longer -operand were truncated to the length of the shorter. +B<|> and B<^> ops will act as if the shorter operand had additional +zero bits on the right, while the B<&> op will act as if the longer +operand were truncated to the length of the shorter. Note that the +granularity for such extension or truncation is one or more I. # ASCII-based examples print "j p \n" ^ " a h"; # prints "JAPH\n" @@ -1645,6 +1705,9 @@ operation you intend by using C<""> or C<0+>, as in the examples below. $baz = 0+$foo & 0+$bar; # both ops explicitly numeric $biz = "$foo" ^ "$bar"; # both ops explicitly stringy +See L for information on how to manipulate individual bits +in a bit vector. + =head2 Integer Arithmetic By default Perl assumes that it must do most of its arithmetic in