X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlop.pod;h=17728df9d3e7a846ae264ceb697aa7cdc5946d7d;hb=c69f112c145fabe210a7e2c5c2406baeea71af2f;hp=8b48eaf2e68be4899fb106625a28f1bef49a97ed;hpb=5695b28edc67a3f45e8a0f25755d07afef3660ac;p=p5sagit%2Fp5-mst-13.2.git
diff --git a/pod/perlop.pod b/pod/perlop.pod
index 8b48eaf..17728df 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -8,7 +8,7 @@ Perl operators have the following associativity and precedence, listed from highest precedence to lowest. Note that all operators borrowed from C keep the same precedence relationship with each other, even where C's precedence is slightly screwy. (This makes learning -Perl easier for C folks.) With very few exceptions, these all +Perl easier for C folks.) With very few exceptions, these all operate on scalar values only, not array values. left terms and list operators (leftward)
@@ -16,7 +16,7 @@ operate on scalar values only, not array values. nonassoc ++ -- right ** right ! ~ \ and unary + and - - left =~ !~ + left =~ !~ left * / % x left + - . left << >>
@@ -27,7 +27,7 @@ operate on scalar values only, not array values. left | ^ left && left || - nonassoc .. + nonassoc .. ... right ?: right = += -= *= etc. left , =>
@@ -81,11 +81,11 @@ Also note that print ($foo & 255) + 1, "\n"; -probably doesn't do what you expect at first glance. See +probably doesn't do what you expect at first glance. See L<Named Unary Operators> for more discussion of this. Also parsed as terms are the C<do {}> and C<eval {}> constructs, as -well as subroutine and method calls, and the anonymous +well as subroutine and method calls, and the anonymous constructors C<[]> and C<{}>. See also L<Quote and Quote-like Operators> toward the end of this section,
@@ -110,7 +110,7 @@ See L<perlobj>. increment or decrement the variable before returning the value, and if placed after, increment or decrement the variable after returning the value. -The auto-increment operator has a little extra built-in magic to it.
If +The auto-increment operator has a little extra builtin magic to it. If you increment a variable that is numeric, or that has ever been used in a numeric context, you get a normal increment. If, however, the variable has been used in only string contexts since it was set, and
@@ -179,7 +179,12 @@ Binary "*" multiplies two numbers. Binary "/" divides two numbers. -Binary "%" computes the modulus of the two numbers. +Binary "%" computes the modulus of two numbers. Given integer +operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is +C<$a> minus the largest multiple of C<$b> that is not greater than +C<$a>. If C<$b> is negative, then C<$a % $b> is C<$a> minus the +smallest multiple of C<$b> that is not less than C<$a> (i.e. the +result will be less than or equal to zero). Binary "x" is the repetition operator. In a scalar context, it returns a string consisting of the left operand repeated the number of
@@ -347,12 +352,12 @@ operators depending on the context. In a list context, it returns an array of values counting (by ones) from the left value to the right value. This is useful for writing C<for (1..10)> loops and for doing slice operations on arrays. Be aware that under the current implementation, -a temporary array is created, so you'll burn a lot of memory if you +a temporary array is created, so you'll burn a lot of memory if you write something like this: for (1 .. 1_000_000) { # code - } + } In a scalar context, ".." returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator
@@ -387,7 +392,7 @@ As a scalar operator: As a list operator: for (101 .. 200) { print; } # print $_ 100 times - @foo = @foo[$[ .. $#foo]; # an expensive no-op + @foo = @foo[0 .. $#foo]; # an expensive no-op @foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items The range operator (in a list context) makes use of the magical
@@ -416,11 +421,11 @@ like an if-then-else. If the argument before the ?
is true, the argument before the : is returned, otherwise the argument after the : is returned. For example: - printf "I have %d dog%s.\n", $n, + printf "I have %d dog%s.\n", $n, ($n == 1) ? '' : "s"; Scalar or list context propagates downward into the 2nd -or 3rd argument, whichever is selected. +or 3rd argument, whichever is selected. $a = $ok ? $b : $c; # get a scalar @a = $ok ? @b : @c; # get an array
@@ -446,8 +451,8 @@ is equivalent to $a = $a + 2; although without duplicating any side effects that dereferencing the lvalue -might trigger, such as from tie(). Other assignment operators work similarly. -The following are recognized: +might trigger, such as from tie(). Other assignment operators work similarly. +The following are recognized: **= += *= &= <<= &&= -= /= |= >>= ||=
@@ -533,12 +538,12 @@ Address-of operator. (But see the "\" operator for taking a reference.) =item unary * -Dereference-address operator. (Perl's prefix dereferencing +Dereference-address operator. (Perl's prefix dereferencing operators are typed: $, @, %, and &.) =item (TYPE) -Type casting operator. +Type casting operator. =back
@@ -550,7 +555,7 @@ pattern matching capabilities. Perl provides customary quote characters for these behaviors, but also provides a way for you to choose your quote character for any of them. In the following table, a C<{}> represents any pair of delimiters you choose. Non-bracketing delimiters use -the same character fore and aft, but the 4 sorts of brackets +the same character fore and aft, but the 4 sorts of brackets (round, angle, square, curly) will all nest. Customary Generic Meaning Interpolates
@@ -562,6 +567,15 @@ the same character fore and aft, but the 4 sorts of brackets s{}{} Substitution yes tr{}{} Translation no +Note that there can be whitespace between the operator and the quoting +characters, except when C<#> is being used as the quoting character. +C<q#foo#> is parsed as being the string C<foo>, which C<q #foo#> is the +operator C<q> followed by a comment.
Its argument will be taken from the +next line. This allows you to write: + + s {foo} # Replace foo + {bar} # with bar. + For constructs that do interpolation, variables beginning with "C<$>" or "C<@>" are interpolated, as are the following sequences:
@@ -614,9 +628,9 @@ patterns local to the current package are reset. This usage is vaguely deprecated, and may be removed in some future version of Perl. -=item m/PATTERN/gimosx +=item m/PATTERN/cgimosx -=item /PATTERN/gimosx +=item /PATTERN/cgimosx Searches a string for a pattern match, and in a scalar context returns true (1) or false (''). If no string is specified via the C<=~> or
@@ -629,6 +643,7 @@ when C<use locale> is in effect. Options are: + c Do not reset search position on a failed match when /g is in effect. g Match globally, i.e., find all occurrences. i Do case-insensitive pattern matching. m Treat string as multiple lines.
@@ -639,7 +654,8 @@ Options are: If "/" is the delimiter then the initial C<m> is optional. With the C<m> you can use any pair of non-alphanumeric, non-whitespace characters as delimiters. This is particularly useful for matching Unix path names -that contain "/", to avoid LTS (leaning toothpick syndrome). +that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is +the delimiter, then the match-only-once rule of C<?PATTERN?> applies. PATTERN may contain variables, which will be interpolated (and the pattern recompiled) every time the pattern search is evaluated. (Note
@@ -691,68 +707,72 @@ If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern. In a scalar context, C<m//g> iterates through the string, returning TRUE -each time it matches, and FALSE when it eventually runs out of -matches. (In other words, it remembers where it left off last time and -restarts the search at that point. You can actually find the current -match position of a string or set it using the pos() function--see -L.)
Note that you can use this feature to stack C<m//g> -matches or intermix C<m//g> matches with C<m/\G.../>. +each time it matches, and FALSE when it eventually runs out of matches. +(In other words, it remembers where it left off last time and restarts +the search at that point. You can actually find the current match +position of a string or set it using the pos() function; see +L<perlfunc/pos>.) A failed match normally resets the search position to +the beginning of the string, but you can avoid that by adding the C</c> +modifier (e.g. C<m//gc>). Modifying the target string also resets the +search position. + +You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a +zero-width assertion that matches the exact position where the previous +C<m//g>, if any, left off. The C<\G> assertion is not supported without +the C</g> modifier; currently, without C</g>, C<\G> behaves just like +C<\A>, but that's accidental and may change in the future. -If you modify the string in any way, the match position is reset to the -beginning. Examples: +Examples: # list context ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g); # scalar context $/ = ""; $* = 1; # $* deprecated in modern perls - while ($paragraph = <>) { + while (defined($paragraph = <>)) { while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) { $sentences++; } } print "$sentences\n"; - # using m//g with \G - $_ = "ppooqppq"; + # using m//gc with \G + $_ = "ppooqppqq"; while ($i++ < 2) { print "1: '"; - print $1 while /(o)/g; print "', pos=", pos, "\n"; + print $1 while /(o)/gc; print "', pos=", pos, "\n"; print "2: '"; - print $1 if /\G(q)/; print "', pos=", pos, "\n"; + print $1 if /\G(q)/gc; print "', pos=", pos, "\n"; print "3: '"; - print $1 while /(p)/g; print "', pos=", pos, "\n"; + print $1 while /(p)/gc; print "', pos=", pos, "\n"; } The last example should print: 1: 'oo', pos=4 - 2: 'q', pos=4 + 2: 'q', pos=5 3: 'pp', pos=7 1: '', pos=7 - 2: 'q', pos=7 - 3: '', pos=7 - -Note how C<m//g> matches change the value reported by C<pos()>, but the -non-global match doesn't.
+ 2: 'q', pos=8 + 3: '', pos=8 -A useful idiom for C<lex>-like scanners is C</\G.../g>. You can +A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can combine several regexps like this to process a string part-by-part, -doing different actions depending on which regexp matched. The next -regexp would step in at the place the previous one left off. +doing different actions depending on which regexp matched. Each +regexp tries to match where the previous one leaves off. - $_ = <<'EOL'; + $_ = <<'EOL'; $url = new URI::URL "http://www/"; die if $url eq "xXx"; -EOL - LOOP: + EOL + LOOP: { - print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/g; - print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/g; - print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/g; - print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/g; - print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/g; - print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/g; - print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/g; + print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc; + print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc; + print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc; + print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc; + print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc; + print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc; + print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/gc; print ". That's all!\n"; }
@@ -798,7 +818,25 @@ with $/ or $INPUT_RECORD_SEPARATOR). $today = qx{ date }; -See L<I/O Operators> for more discussion. +Note that how the string gets evaluated is entirely subject to the +command interpreter on your system. On most platforms, you will have +to protect shell metacharacters if you want them treated literally. +On some platforms (notably DOS-like ones), the shell may not be +capable of dealing with multiline commands, so putting newlines in +the string may not get you what you want.
You may be able to evaluate +multiple commands in a single line by separating them with the command +separator character, if your shell supports that (e.g. C<;> on many Unix +shells; C<&> on the Windows NT C<cmd> shell). + +Beware that some command shells may place restrictions on the length +of the command line. You must ensure your strings don't exceed this +limit after any necessary interpolations. See the platform-specific +release notes for more details about your particular environment. + +Also realize that using this operator frequently leads to unportable +programs. + +See L<"I/O Operators"> for more discussion. =item qw/STRING/
@@ -812,6 +850,11 @@ Some frequently seen examples: use POSIX qw( setlocale localeconv ) @EXPORT = qw( foo bar baz ); +A common mistake is to try to separate the words with comma or to put +comments into a multi-line qw-string. For this reason the C<-w> +switch produce warnings if the STRING contains the "," or the "#" +character. + =item s/PATTERN/REPLACEMENT/egimosx Searches a string for a pattern, and if found, replaces that pattern
@@ -847,7 +890,7 @@ Options are: Any non-alphanumeric, non-whitespace delimiter may replace the slashes. If single quotes are used, no interpretation is done on the replacement string (the C</e> modifier overrides this, however). Unlike -Perl 4, Perl 5 treats back-ticks as normal delimiters; the replacement +Perl 4, Perl 5 treats backticks as normal delimiters; the replacement text is not evaluated as a command. If the PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own pair of quotes, which may or may not be bracketing quotes, e.g.,
@@ -892,7 +935,7 @@ Examples: s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields -Note the use of $ instead of \ in the last example. Unlike +Note the use of $ instead of \ in the last example. Unlike B<sed>, we use the \E<lt>I<digit>E<gt> form in only the left hand side. Anywhere else it's $E<lt>I<digit>E<gt>.
@@ -916,12 +959,11 @@ with the corresponding character in the replacement list.
It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is translated. (The string specified with =~ must be a scalar variable, an array element, a -hash element, -or an assignment to one of those, i.e., an lvalue.) For B<sed> devotees, -C<y> is provided as a synonym for C<tr>. If the SEARCHLIST is -delimited by bracketing quotes, the REPLACEMENTLIST has its own pair of -quotes, which may or may not be bracketing quotes, e.g., C<tr[A-Z][a-z]> -or C<tr(+-*/)/ABCD/>. +hash element, or an assignment to one of those, i.e., an lvalue.) +For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the +SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has +its own pair of quotes, which may or may not be bracketing quotes, +e.g., C<tr[A-Z][a-z]> or C<tr(+-*/)/ABCD/>. Options:
@@ -984,8 +1026,8 @@ an eval(): =head2 I/O Operators -There are several I/O operators you should know about. -A string is enclosed by back-ticks (grave accents) first undergoes +There are several I/O operators you should know about. +A string is enclosed by backticks (grave accents) first undergoes variable substitution just like a double quoted string. It is then interpreted as a command, and the output of that command is the value of the pseudo-literal, like in a shell. In a scalar context, a single
@@ -998,8 +1040,8 @@ of C<$?>). Unlike in B<csh>, no translation is done on the return data--newlines remain newlines. Unlike in any of the shells, single quotes do not hide variable names in the command from interpretation. To pass a $ through to the shell you need to hide it with a backslash. -The generalized form of back-ticks is C<qx//>. (Because back-ticks -always undergo shell expansion as well, see L<perlsec> for +The generalized form of backticks is C<qx//>. (Because backticks +always undergo shell expansion as well, see L<perlsec> for security concerns.) Evaluating a filehandle in angle brackets yields the next line from
@@ -1044,7 +1086,7 @@ of filenames.
The loop is equivalent to the following Perl-like pseudo code: - unshift(@ARGV, '-') if $#ARGV < $[; + unshift(@ARGV, '-') unless @ARGV; while ($ARGV = shift) { open(ARGV, $ARGV); while (<ARGV>) {
@@ -1064,7 +1106,7 @@ continue as if the input were one big happy file. (But see example under eof() for how to reset line numbers on each file.) If you want to set @ARGV to your own list of files, go right ahead. If -you want to pass switches into your script, you can use one of the +you want to pass switches into your script, you can use one of the Getopts modules or put a loop on the front like this: while ($_ = $ARGV[0], /^-/) {
@@ -1121,7 +1163,7 @@ machine.) Of course, the shortest way to do the above is: Because globbing invokes a shell, it's often faster to call readdir() yourself and do your own grep() on the filenames. Furthermore, due to its current -implementation of using a shell, the glob() routine may get "Arg list too +implementation of using a shell, the glob() routine may get "Arg list too long" errors (unless you've installed tcsh(1L) as F</bin/csh>). A glob evaluates its (embedded) argument only when it is starting a new
@@ -1139,7 +1181,7 @@ than $file = <blah*>; because the latter will alternate between returning a filename and -returning FALSE. +returning FALSE. It you're trying to do variable interpolation, it's definitely better to use the glob() function, because the older notation can cause people
@@ -1160,14 +1202,14 @@ compile time. You can say 'Now is the time for all' . "\n" . 'good men to come to.' -and this all reduces to one string internally. Likewise, if +and this all reduces to one string internally. Likewise, if you say foreach $file (@filenames) { if (-s $file > 5 + 100 * 2**16) { ... } - } + } -the compiler will pre-compute the number that +the compiler will precompute the number that expression represents so that the interpreter won't have to.
@@ -1181,7 +1223,7 @@ floating point.
But by saying you may tell the compiler that it's okay to use integer operations from here to the end of the enclosing BLOCK. An inner BLOCK may -countermand this by saying +countermand this by saying no integer;
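The block scoping of the integer pragma touched by the last hunk can be illustrated with a minimal sketch. It is not part of the patch: the helper names `integer_quotient` and `float_quotient` are hypothetical, and the operands are kept positive so that the result does not depend on how the platform rounds negative integer division.

```perl
use strict;
use warnings;

# Hypothetical helper: "use integer" applies from here to the end of the
# enclosing block (the sub body), so the division is an integer operation.
sub integer_quotient {
    my ($x, $y) = @_;
    use integer;
    return $x / $y;
}

# Hypothetical helper: the inner block countermands the pragma with
# "no integer", so the division is ordinary floating point again.
sub float_quotient {
    my ($x, $y) = @_;
    use integer;
    {
        no integer;
        return $x / $y;
    }
}

print integer_quotient(10, 4), "\n";    # integer division
print float_quotient(10, 4), "\n";      # floating-point division
```

Because the pragma is lexically scoped, code outside these two subs is unaffected by either declaration.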