=head1 NAME

perlop - Perl operators and precedence

=head1 SYNOPSIS

Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Note that all operators
borrowed from C keep the same precedence relationship with each other,
even where C's precedence is slightly screwy. (This makes learning
Perl easier for C folks.) With very few exceptions, these all
operate on scalar values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp
    left        &
    left        | ^
    left        &&
    left        ||
    nonassoc    ..
    right       ?:
    right       = += -= *= etc.
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

In the following sections, these operators are covered in precedence order.

=head1 DESCRIPTION

=head2 Terms and List Operators (Leftward)

A TERM has the highest precedence in Perl. Terms include variables,
quote and quotelike operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you look at the left side of the operator or the right side of it.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort, but
the commas on the left are evaluated after. In other words, list
operators tend to gobble up all the arguments that follow them, and
then act like a simple TERM with regard to the preceding expression.
Note that you have to be careful with parens:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. See
L<Named Unary Operators> for more discussion of this.
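
As a minimal sketch of the same trap, the example below uses a
user-defined list operator so that the parse is visible in the returned
values. The sub name C<show> is hypothetical and exists only for this
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns its arguments joined with commas.
sub show { return join ',', @_ }

my $foo = 0x1FF;

# A parenthesis is the next token, so this parses as (show($foo & 255)) + 1.
my @as_function = (show ($foo & 255) + 1, "x");

# Unary plus hides the parenthesis, so show receives both list elements.
my @as_list_op = (show +($foo & 255) + 1, "x");

print "@as_function\n";   # 256 x
print "@as_list_op\n";    # 256,x
```

In the first call the addition and the C<"x"> are left outside the
argument list, which is the same thing that happens to C<+ 1, "\n"> in
the C<print> example.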

Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L<Quote and Quotelike Operators> toward the end of this section,
as well as L<"I/O Operators">.

=head2 The Arrow Operator

Just as in C and C++, "C<-E<gt>>" is an infix dereference operator. If the
right side is either a C<[...]> or C<{...}> subscript, then the left side
must be either a hard or symbolic reference to an array or hash (or
a location capable of holding a hard reference, if it's an lvalue (assignable)).
See L<perlref>.

Otherwise, the right side is a method name or a simple scalar variable
containing the method name, and the left side must either be an object
(a blessed reference) or a class name (that is, a package name).
See L<perlobj>.
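
The sketch below exercises each form described above: a subscript on the
right side, and a method name given either literally or in a scalar. The
C<Greeter> class and its methods are hypothetical names invented for the
example.

```perl
use strict;
use warnings;

my $aref = [10, 20, 30];
my $href = { lang => 'Perl' };

my $second = $aref->[1];        # array element through a reference
my $lang   = $href->{lang};     # hash element through a reference

# Hypothetical class used only for this illustration.
package Greeter;
sub new   { my ($class, $name) = @_; return bless { name => $name }, $class }
sub hello { my ($self) = @_; return "Hello, $self->{name}" }

package main;

my $obj    = Greeter->new('World');   # class name on the left
my $method = 'hello';                 # method name held in a scalar
my $direct = $obj->hello;             # object on the left
my $byname = $obj->$method();         # same call through the scalar

print "$second $lang\n";   # 20 Perl
print "$direct\n";         # Hello, World
```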

=head2 Autoincrement and Autodecrement

"++" and "--" work as in C. That is, if placed before a variable, they
increment or decrement the variable before returning the value, and if
placed after, increment or decrement the variable after returning the value.

The autoincrement operator has a little extra built-in magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has only been used in string contexts since it was set, and
has a value that is not null and matches the pattern
C</^[a-zA-Z]*[0-9]*$/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = '99');      # prints '100'
    print ++($foo = 'a0');      # prints 'a1'
    print ++($foo = 'Az');      # prints 'Ba'
    print ++($foo = 'zz');      # prints 'aaa'

The autodecrement operator is not magical.
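
To contrast the two operators, here is a small sketch with illustrative
variable names. The warning category is switched off around the
decrement only because a non-numeric string in numeric context would
otherwise produce a warning.

```perl
use strict;
use warnings;

my $id = 'a9';
$id++;                       # string increment with carry: 'b0'

my $tag = 'Zz';
$tag++;                      # carry out of the leftmost character: 'AAa'

my $label = 'ab';
{
    no warnings 'numeric';   # 'ab' is not a number, so it counts as 0
    $label--;                # no magic here: 0 - 1 is -1
}

print "$id $tag $label\n";   # b0 AAa -1
```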

=head2 Exponentiation

Binary "**" is the exponentiation operator. Note that it binds even more
tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)
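
A short check of both properties, the binding relative to unary minus
and the right associativity listed in the precedence table:

```perl
use strict;
use warnings;

my $neg   = -2 ** 4;       # parsed as -(2 ** 4)
my $paren = (-2) ** 4;     # parentheses force the other reading
my $chain = 2 ** 3 ** 2;   # right associative: 2 ** (3 ** 2)

print "$neg $paren $chain\n";   # -16 16 512
```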

=head2 Symbolic Unary Operators

Unary "!" performs logical negation, i.e. "not". See also C<not> for a lower
precedence version of this.

Unary "-" performs arithmetic negation if the operand is numeric. If
the operand is an identifier, a string consisting of a minus sign
concatenated with the identifier is returned. Otherwise, if the string
starts with a plus or minus, a string starting with the opposite sign
is returned. One effect of these rules is that C<-bareword> is equivalent
to C<"-bareword">.
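
The three cases of unary minus can be seen side by side in this small
sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

my $num  = -"10";      # numeric operand: arithmetic negation
my $word = -"foo";     # identifier: minus sign is prepended
my $flip = -"-foo";    # leading minus is replaced by a plus

print "$num $word $flip\n";   # -10 -foo +foo
```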

Unary "~" performs bitwise negation, i.e. 1's complement.

Unary "+" has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L<List Operators>.)

Unary "\" creates a reference to whatever follows it. See L<perlref>.
Do not confuse this behavior with the behavior of backslash within a
string, although both forms do convey the notion of protecting the next
thing from interpretation.

=head2 Binding Operators

Binary "=~" binds a scalar expression to a pattern match. Certain operations
search or modify the string $_ by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or translation. The left argument is what is
supposed to be searched, substituted, or translated instead of the default
$_. The return value indicates the success of the operation. (If the
right argument is an expression rather than a search pattern,
substitution, or translation, it is interpreted as a search pattern at run
time. This is less efficient than an explicit search, since the pattern
must be compiled every time the expression is evaluated--unless you've
used C</o>.)

Binary "!~" is just like "=~" except the return value is negated in
the logical sense.
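
As a brief illustration, the sketch below binds a match, a negated
match, a substitution, and a run-time pattern held in a scalar. All
variable names are made up for the example.

```perl
use strict;
use warnings;

my $line = 'Hello, World';

my $found  = ($line =~ /World/) ? 1 : 0;   # match against $line, not $_
my $absent = ($line !~ /Perl/)  ? 1 : 0;   # negated result

# Bind a substitution to a copy, leaving $line untouched.
(my $copy = $line) =~ s/World/Perl/;

# The right side is an ordinary expression here, so it is
# interpreted as a search pattern at run time.
my $pattern = 'W\w+';
my $dynamic = ($line =~ $pattern) ? 1 : 0;

print "$found $absent $dynamic $copy\n";   # 1 1 1 Hello, Perl
```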

=head2 Multiplicative Operators

Binary "*" multiplies two numbers.

Binary "/" divides two numbers.

Binary "%" computes the modulus of the two numbers.

Binary "x" is the repetition operator. In a scalar context, it
returns a string consisting of the left operand repeated the number of
times specified by the right operand. In a list context, if the left
operand is a list in parens, it repeats the list.

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5

=head2 Additive Operators

Binary "+" returns the sum of two numbers.

Binary "-" returns the difference of two numbers.

Binary "." concatenates two strings.

=head2 Shift Operators

Binary "E<lt>E<lt>" returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers.

Binary "E<gt>E<gt>" returns the value of its left argument shifted right by the
number of bits specified by the right argument. Arguments should be
integers.

=head2 Named Unary Operators

The various named unary operators are treated as functions with one
argument, with optional parentheses. These include the filetest
operators, like C<-f>, C<-M>, etc. See L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. Examples:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because * is higher precedence than ||:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

See also L<"List Operators">.

=head2 Relational Operators

Binary "E<lt>" returns true if the left argument is numerically less than
the right argument.

Binary "E<gt>" returns true if the left argument is numerically greater
than the right argument.

Binary "E<lt>=" returns true if the left argument is numerically less than
or equal to the right argument.

Binary "E<gt>=" returns true if the left argument is numerically greater
than or equal to the right argument.

Binary "lt" returns true if the left argument is stringwise less than
the right argument.

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.

=head2 Equality Operators

Binary "==" returns true if the left argument is numerically equal to
the right argument.

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.

Binary "E<lt>=E<gt>" returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument.

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.

Binary "cmp" returns -1, 0, or 1 depending on whether the left argument is stringwise
less than, equal to, or greater than the right argument.
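
The difference between the numeric and stringwise families shows up
most clearly when sorting or comparing values that look like numbers. A
minimal sketch, with illustrative data:

```perl
use strict;
use warnings;

my @values = (10, 9, 100);

my @numeric = sort { $a <=> $b } @values;   # 9 10 100
my @string  = sort { $a cmp $b } @values;   # 10 100 9

my $num_equal = ('1.0' == 1)   ? 'yes' : 'no';   # same number
my $str_equal = ('1.0' eq '1') ? 'yes' : 'no';   # different strings

print "@numeric | @string | $num_equal $str_equal\n";
# 9 10 100 | 10 100 9 | yes no
```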

=head2 Bitwise And

Binary "&" returns its operands ANDed together bit by bit.

=head2 Bitwise Or and Exclusive Or

Binary "|" returns its operands ORed together bit by bit.

Binary "^" returns its operands XORed together bit by bit.
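
For instance, with two small integer operands written in binary:

```perl
use strict;
use warnings;

my $and = 0b1100 & 0b1010;   # 0b1000
my $or  = 0b1100 | 0b1010;   # 0b1110
my $xor = 0b1100 ^ 0b1010;   # 0b0110

print "$and $or $xor\n";     # 8 14 6
```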

=head2 C-style Logical And

Binary "&&" performs a short-circuit logical AND operation. That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Or

Binary "||" performs a short-circuit logical OR operation. That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

The C<||> and C<&&> operators differ from C's in that, rather than returning
0 or 1, they return the last value evaluated. Thus, a reasonably portable
way to find out the home directory (assuming it's not "0") might be:

    $home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
        (getpwuid($<))[7] || die "You're homeless!\n";
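
The caveat about "0" applies to any default chosen this way. In the
sketch below the C<%opt> hash and its keys are hypothetical settings
used only to show which value C<||> returns.

```perl
use strict;
use warnings;

my %opt = (host => 'example.org', retries => 0);   # hypothetical settings

my $host    = $opt{host}    || 'localhost';   # true value is returned as is
my $user    = $opt{user}    || 'guest';       # missing key falls through
my $retries = $opt{retries} || 3;             # 0 is false, so this becomes 3

# Testing definedness instead keeps a legitimate 0.
my $kept = defined $opt{retries} ? $opt{retries} : 3;

print "$host $user $retries $kept\n";   # example.org guest 3 0
```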

As more readable alternatives to C<&&> and C<||>, Perl provides "and" and
"or" operators (see below). The short-circuit behavior is identical. The
precedence of "and" and "or" is much lower, however, so that you can
safely use them after a list operator without the need for
parentheses:

    unlink "alpha", "beta", "gamma"
            or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
            || (gripe(), next LINE);

=head2 Range Operator

Binary ".." is the range operator, which is really two different
operators depending on the context. In a list context, it returns an
array of values counting (by ones) from the left value to the right
value. This is useful for writing C<for (1..10)> loops and for doing
slice operations on arrays. Be aware that under the current implementation,
a temporary array is created, so you'll burn a lot of memory if you
write something like this:

    for (1 .. 1_000_000) {
        # code
    }

In a scalar context, ".." returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma) operator
of B<sed>, B<awk>, and various editors. Each ".." operator maintains its
own boolean state. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again. (It doesn't become false till the next time the range operator is
evaluated. It can test the right operand and become false on the same
evaluation it became true (as in B<awk>), but it still returns true once.
If you don't want it to test the right operand till the next evaluation
(as in B<sed>), use three dots ("...") instead of two.) The right
operand is not evaluated while the operator is in the "false" state, and
the left operand is not evaluated while the operator is in the "true"
state. The precedence is a little lower than || and &&. The value
returned is either the null string for false, or a sequence number
(beginning with 1) for true. The sequence number is reset for each range
encountered. The final sequence number in a range has the string "E0"
appended to it, which doesn't affect its numeric value, but gives you
something to search for if you want to exclude the endpoint. You can
exclude the beginning point by waiting for the sequence number to be
greater than 1. If either operand of scalar ".." is a numeric literal,
that operand is implicitly compared to the C<$.> variable, the current
line number. Examples:

As a scalar operator:

    if (101 .. 200) { print; }  # print 2nd hundred lines
    next line if (1 .. /^$/);   # skip header lines
    s/^/> / if (/^$/ .. eof()); # quote body

As a list operator:

    for (101 .. 200) { print; } # print $_ 100 times
    @foo = @foo[$[ .. $#foo];   # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];      # slice last 5 items
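
The sequence number and the "E0" suffix described above can be used to
drop both endpoints of a range. The following is a sketch over an
in-memory list rather than a file; the marker strings and variable names
are assumptions made for the example.

```perl
use strict;
use warnings;

my @lines = ('junk', 'BEGIN', 'a', 'b', 'END', 'junk');
my @inside;

for (@lines) {
    # Scalar assignment puts ".." in scalar context, so it acts as a flip-flop.
    if (my $seq = /^BEGIN$/ .. /^END$/) {
        next if $seq == 1;        # first line of the range
        next if $seq =~ /E0$/;    # last line of the range
        push @inside, $_;
    }
}

print "@inside\n";   # a b
```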

The range operator (in a list context) makes use of the magical
autoincrement algorithm if the operands are strings. You
can say

    @alphabet = ('A' .. 'Z');

to get all the letters of the alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[$mday];

to get dates with leading zeros. If the final value specified is not
in the sequence that the magical increment would produce, the sequence
goes until the next value would be longer than the final value
specified.
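
A small sketch of both situations: a final value that lies in the
magical sequence, and one that does not. The variable names are
illustrative.

```perl
use strict;
use warnings;

my @cols  = ('aa' .. 'ad');   # final value is reached
my @short = ('y' .. 'b');     # 'b' never occurs; stops before 'aa'

print "@cols | @short\n";     # aa ab ac ad | y z
```

In the second range the value after C<'z'> would be C<'aa'>, which is
longer than the final value C<'b'>, so the list ends there.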

=head2 Conditional Operator

Ternary "?:" is the conditional operator, just as in C. It works much
like an if-then-else. If the argument before the ? is true, the
argument before the : is returned, otherwise the argument after the :
is returned. For example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

This is not necessarily guaranteed to contribute to the readability of your program.

=head2 Assignment Operators

"=" is the ordinary assignment operator.

Assignment operators work as in C. That is,

    $a += 2;

is equivalent to

    $a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from tie(). Other assignment operators work similarly.
The following are recognized:

    **=    +=    *=    &=    <<=    &&=
           -=    /=    |=    >>=    ||=
           .=    %=    ^=
                 x=

Note that while these are grouped by family, they all have the precedence
of assignment.
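
A few of the less familiar forms from the table, shown as a sketch with
illustrative variable names:

```perl
use strict;
use warnings;

my $str  = 'ab';    $str  x= 3;             # repetition: 'ababab'
my $pow  = 2;       $pow  **= 10;           # exponentiation: 1024
my $bits = 1;       $bits <<= 3;            # left shift: 8
my $path = '/usr';  $path .= '/bin';        # concatenation: '/usr/bin'
my $name;           $name ||= 'anonymous';  # assign only if currently false

print "$str $pow $bits $path $name\n";
# ababab 1024 8 /usr/bin anonymous
```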

Unlike in C, the assignment operator produces a valid lvalue. Modifying
an assignment is equivalent to doing the assignment and then modifying
the variable that was assigned to. This is useful for modifying
a copy of something, like this:

    ($tmp = $global) =~ tr [A-Z] [a-z];

Likewise,

    ($a += 2) *= 3;

is equivalent to

    $a += 2;
    $a *= 3;

=head2 Comma Operator

Binary "," is the comma operator. In a scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value. This is just like C's comma operator.

In a list context, it's just the list argument separator, and inserts
both its arguments into the list.

The =E<gt> digraph is mostly just a synonym for the comma operator. It's useful for
documenting arguments that come in pairs. As of release 5.001, it also forces
any word to the left of it to be interpreted as a string.
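
For example, the keys below need no quotes even under C<use strict>,
and the resulting list is the same one a plain comma would build. The
hash and array names are illustrative.

```perl
use strict;
use warnings;

# The word to the left of => is taken as a string.
my %color = (red => 0xff0000, green => 0x00ff00);

# => still separates list elements exactly as a comma does.
my @pairs = (one => 1, two => 2);

print scalar(@pairs), " $pairs[0] $pairs[2]\n";   # 4 one two
```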

=head2 List Operators (Rightward)

The right side of a list operator has very low precedence,
such that it controls all comma-separated expressions found there.
The only operators with lower precedence are the logical operators
"and", "or", and "not", which may be used to evaluate calls to list
operators without the need for extra parentheses:

    open HANDLE, "filename"
        or die "Can't open: $!\n";

See also discussion of list operators in L<List Operators (Leftward)>.

=head2 Logical Not

Unary "not" returns the logical negation of the expression to its right.
It's the equivalent of "!" except for the very low precedence.

=head2 Logical And

Binary "and" returns the logical conjunction of the two surrounding
expressions. It's equivalent to && except for the very low
precedence. This means that it short-circuits: i.e. the right
expression is evaluated only if the left expression is true.

=head2 Logical or and Exclusive Or

Binary "or" returns the logical disjunction of the two surrounding
expressions. It's equivalent to || except for the very low
precedence. This means that it short-circuits: i.e. the right
expression is evaluated only if the left expression is false.

Binary "xor" returns the exclusive-OR of the two surrounding expressions.
It cannot short circuit, of course.
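
The practical consequence of the low precedence is easiest to see next
to an assignment. In this sketch the C<lookup> sub is a hypothetical
stand-in for any call that returns a false value on failure.

```perl
use strict;
use warnings;

# Hypothetical lookup that returns an empty string on failure.
sub lookup { my ($key) = @_; return $key eq 'alice' ? 'Alice' : '' }

# || binds tighter than =, so the fallback becomes part of the value.
my $tight = lookup('bob') || 'nobody';

# "or" binds looser than =, so the assignment happens first and the
# right side runs only because the assigned value is false.
my $loose;
$loose = lookup('bob') or $loose = 'nobody';

my $one_true  = (1 xor 0) ? 'true' : 'false';
my $both_true = (1 xor 1) ? 'true' : 'false';

print "$tight $loose $one_true $both_true\n";   # nobody nobody true false
```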

=head2 C Operators Missing From Perl

Here is what C has that Perl doesn't:

=over 8

=item unary &

Address-of operator. (But see the "\" operator for taking a reference.)

=item unary *

Dereference-address operator. (Perl's prefix dereferencing
operators are typed: $, @, %, and &.)

=item (TYPE)

Type casting operator.

=back

=head2 Quote and Quotelike Operators

While we usually think of quotes as literal values, in Perl they
function as operators, providing various kinds of interpolating and
pattern matching capabilities. Perl provides customary quote characters
for these behaviors, but also provides a way for you to choose your
quote character for any of them. In the following table, a C<{}> represents
any pair of delimiters you choose. Non-bracketing delimiters use
the same character fore and aft, but the 4 sorts of brackets
(round, angle, square, curly) will all nest.

    Customary  Generic     Meaning          Interpolates
        ''       q{}       Literal              no
        ""      qq{}       Literal              yes
        ``      qx{}       Command              yes
                qw{}       Word list            no
        //       m{}       Pattern match        yes
                 s{}{}     Substitution         yes
                tr{}{}     Translation          no

For constructs that do interpolation, variables beginning with "C<$>" or "C<@>"
are interpolated, as are the following sequences:

    \t          tab             (HT, TAB)
    \n          newline         (LF, NL)
    \r          return          (CR)
    \f          form feed       (FF)
    \b          backspace       (BS)
    \a          alarm (bell)    (BEL)
    \e          escape          (ESC)
    \033        octal char
    \x1b        hex char
    \c[         control char
    \l          lowercase next char
    \u          uppercase next char
    \L          lowercase till \E
    \U          uppercase till \E
    \E          end case modification
    \Q          quote regexp metacharacters till \E

Patterns are subject to an additional level of interpretation as a
regular expression. This is done as a second pass, after variables are
interpolated, so that regular expressions may be incorporated into the
pattern from the variables. If this is not what you want, use C<\Q> to
interpolate a variable literally.

Apart from the above, there are no multiple levels of interpolation. In
particular, contrary to the expectations of shell programmers, backquotes
do I<NOT> interpolate within double quotes, nor do single quotes impede
evaluation of variables when used within double quotes.
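
The sketch below touches the case-modification escapes, the second
interpretation pass for patterns, and the point about single quotes
inside double quotes. Variable names and sample strings are
illustrative.

```perl
use strict;
use warnings;

my $name = 'world';

my $title  = "Hello, \u$name!";   # uppercase next char
my $shout  = "\U$name\E!";        # uppercase till \E
my $quoted = "it's '$name'";      # single quotes do not stop interpolation

my $pat = 'a.c';
my $as_regexp  = ('abc' =~ /^$pat$/)     ? 1 : 0;   # dot is a metacharacter
my $as_literal = ('abc' =~ /^\Q$pat\E$/) ? 1 : 0;   # dot must match itself
my $exact      = ('a.c' =~ /^\Q$pat\E$/) ? 1 : 0;

print "$title $shout $quoted\n";             # Hello, World! WORLD! it's 'world'
print "$as_regexp $as_literal $exact\n";     # 1 0 1
```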

=head2 Regexp Quotelike Operators

Here are the quotelike operators that apply to pattern
matching and related activities.

=over 8

=item ?PATTERN?

This is just like the C</pattern/> search, except that it matches only
once between calls to the reset() operator. This is a useful
optimization when you only want to see the first occurrence of
something in each file of a set of files, for instance. Only C<??>
patterns local to the current package are reset.

This usage is vaguely deprecated, and may be removed in some future
version of Perl.

=item m/PATTERN/gimosx

=item /PATTERN/gimosx

Searches a string for a pattern match, and in a scalar context returns
true (1) or false (''). If no string is specified via the C<=~> or
C<!~> operator, the $_ string is searched. (The string specified with
C<=~> need not be an lvalue--it may be the result of an expression
evaluation, but remember the C<=~> binds rather tightly.) See also
L<perlre>.

Options are:

    g   Match globally, i.e. find all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Only compile pattern once.
    s   Treat string as single line.
    x   Use extended regular expressions.

If "/" is the delimiter then the initial C<m> is optional. With the C<m>
you can use any pair of non-alphanumeric, non-whitespace characters as
delimiters. This is particularly useful for matching Unix path names
that contain "/", to avoid LTS (leaning toothpick syndrome).

PATTERN may contain variables, which will be interpolated (and the
pattern recompiled) every time the pattern search is evaluated. (Note
that C<$)> and C<$|> might not be interpolated because they look like
end-of-string tests.) If you want such a pattern to be compiled only
once, add a C</o> after the trailing delimiter. This avoids expensive
run-time recompilations, and is useful when the value you are
interpolating won't change over the life of the script. However, mentioning
C</o> constitutes a promise that you won't change the variables in the pattern.
If you change them, Perl won't even notice.

If the PATTERN evaluates to a null string, the last
successfully executed regular expression is used instead.

If used in a context that requires a list value, a pattern match returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, i.e. (C<$1>, $2, $3...). (Note that here $1 etc. are also set, and
that this differs from Perl 4's behavior.) If the match fails, a null
array is returned. If the match succeeds, but there were no parentheses,
a list value of (1) is returned.

Examples:

    open(TTY, '/dev/tty');
    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;       # compile only once
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the
remainder of the line, and assigns those three fields to $F1, $F2 and
$Etc. The conditional is true if any variables were assigned, i.e. if
the pattern matched.

The C</g> modifier specifies global pattern matching--that is, matching
as many times as possible within the string. How it behaves depends on
the context. In a list context, it returns a list of all the
substrings matched by all the parentheses in the regular expression.
If there are no parentheses, it returns a list of all the matched
strings, as if there were parentheses around the whole pattern.

In a scalar context, C<m//g> iterates through the string, returning TRUE
each time it matches, and FALSE when it eventually runs out of
matches. (In other words, it remembers where it left off last time and
restarts the search at that point. You can actually find the current
match position of a string using the pos() function--see L<perlfunc>.)
If you modify the string in any way, the match position is reset to the
beginning. Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    $/ = ""; $* = 1;    # $* deprecated in Perl 5
    while ($paragraph = <>) {
        while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    print "$sentences\n";
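
A self-contained sketch of the two C</g> behaviors, using a fixed string
instead of external input so the match positions reported by pos() can
be followed by hand. The variable names are illustrative.

```perl
use strict;
use warnings;

my $text = 'a1 b22 c333';

# Scalar context: one match per evaluation, resuming where the last one ended.
my @ends;
while ($text =~ /\d+/g) {
    push @ends, pos($text);    # offset just past the current match
}

# List context: every match of the parenthesized group at once.
my @numbers = $text =~ /(\d+)/g;

print "@ends\n";      # 2 6 11
print "@numbers\n";   # 1 22 333
```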

=item q/STRING/

=item C<'STRING'>

A single-quoted, literal string. Backslashes are ignored, unless
followed by the delimiter or another backslash, in which case the
delimiter or backslash is interpolated.

    $foo = q!I said, "You said, 'She said it.'"!;
    $bar = q('This is it.');

=item qq/STRING/

=item "STRING"

A double-quoted, interpolated string.

    $_ .= qq
     (*** The previous line contains the naughty word "$1".\n)
                if /(tcl|rexx|python)/;      # :-)

=item qx/STRING/

=item `STRING`

A string which is interpolated and then executed as a system command.
The collected standard output of the command is returned. In scalar
context, it comes back as a single (potentially multi-line) string.
In list context, returns a list of lines (however you've defined lines
with $/ or $INPUT_RECORD_SEPARATOR).

    $today = qx{ date };

See L<I/O Operators> for more discussion.

=item qw/STRING/

Returns a list of the words extracted out of STRING, using embedded
whitespace as the word delimiters. It is exactly equivalent to

    split(' ', q/STRING/);

Some frequently seen examples:

    use POSIX qw( setlocale localeconv );
    @EXPORT = qw( foo bar baz );
748
749=item s/PATTERN/REPLACEMENT/egimosx
750
751Searches a string for a pattern, and if found, replaces that pattern
752with the replacement text and returns the number of substitutions
e37d713d 753made. Otherwise it returns false (specifically, the empty string).
a0d0e21e 754
755If no string is specified via the C<=~> or C<!~> operator, the C<$_>
756variable is searched and modified. (The string specified with C<=~> must
757be a scalar variable, an array element, a hash element, or an assignment
758to one of those, i.e. an lvalue.)
759
760If the delimiter chosen is single quote, no variable interpolation is
761done on either the PATTERN or the REPLACEMENT. Otherwise, if the
762PATTERN contains a $ that looks like a variable rather than an
763end-of-string test, the variable will be interpolated into the pattern
764at run-time. If you only want the pattern compiled once the first time
765the variable is interpolated, use the C</o> option. If the pattern
4633a7c4 766evaluates to a null string, the last successfully executed regular
a0d0e21e 767expression is used instead. See L<perlre> for further explanation on these.
768
769Options are:
770
771 e Evaluate the right side as an expression.
772 g Replace globally, i.e. all occurrences.
773 i Do case-insensitive pattern matching.
774 m Treat string as multiple lines.
775 o Only compile pattern once.
776 s Treat string as single line.
777 x Use extended regular expressions.
778
779Any non-alphanumeric, non-whitespace delimiter may replace the
780slashes. If single quotes are used, no interpretation is done on the
e37d713d 781replacement string (the C</e> modifier overrides this, however). Unlike
782Perl 4, Perl 5 treats backticks as normal delimiters; the replacement
783text is not evaluated as a command. If the
a0d0e21e 784PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
785pair of quotes, which may or may not be bracketing quotes, e.g.
786C<s(foo)(bar)> or C<sE<lt>fooE<gt>/bar/>. A C</e> will cause the
787replacement portion to be interpreter as a full-fledged Perl expression
788and eval()ed right then and there. It is, however, syntax checked at
789compile-time.
790
791Examples:
792
793 s/\bgreen\b/mauve/g; # don't change wintergreen
794
795 $path =~ s|/usr/bin|/usr/local/bin|;
796
797 s/Login: $foo/Login: $bar/; # run-time pattern
798
799 ($foo = $bar) =~ s/this/that/;
800
801 $count = ($paragraph =~ s/Mister\b/Mr./g);
802
803 $_ = 'abc123xyz';
804 s/\d+/$&*2/e; # yields 'abc246xyz'
805 s/\d+/sprintf("%5d",$&)/e; # yields 'abc 246xyz'
806 s/\w/$& x 2/eg; # yields 'aabbcc 224466xxyyzz'
807
808 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
809 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
810 s/^=(\w+)/&pod($1)/ge; # use function call
811
812 # /e's can even nest; this will expand
813 # simple embedded variables in $_
814 s/(\$\w+)/$1/eeg;
815
816 # Delete C comments.
817 $program =~ s {
4633a7c4 818 /\* # Match the opening delimiter.
819 .*? # Match a minimal number of characters.
820 \*/ # Match the closing delimiter.
a0d0e21e 821 } []gsx;
822
823 s/^\s*(.*?)\s*$/$1/; # trim white space
824
825 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
826
827Note the use of $ instead of \ in the last example. Unlike
B<sed>, we only use the \E<lt>I<digit>E<gt> form in the left hand side.
Anywhere else it's $E<lt>I<digit>E<gt>.

Occasionally, you can't just use a C</g> to get all the changes
to occur. Here are two common cases:

    # put commas in the right places in an integer
    1 while s/(.*\d)(\d\d\d)/$1,$2/g;      # perl4
    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;  # perl5

    # expand tabs to 8-column spacing
    1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
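
A single C</g> pass resumes searching after the end of the previous match,
so it never revisits digits it has already stepped over; hence the
C<1 while> loop. As a minimal sketch, the perl5 form can be wrapped in a
helper (the name C<commify> is hypothetical, not a built-in):

    sub commify {
        my $n = shift;
        # Each pass inserts the rightmost missing comma; the loop
        # repeats until the substitution no longer matches.
        1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/;
        return $n;
    }

    print commify(1234567), "\n";       # prints 1,234,567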


=item tr/SEARCHLIST/REPLACEMENTLIST/cds

=item y/SEARCHLIST/REPLACEMENTLIST/cds

Translates all occurrences of the characters found in the search list
with the corresponding character in the replacement list. It returns
the number of characters replaced or deleted. If no string is
specified via the =~ or !~ operator, the $_ string is translated. (The
string specified with =~ must be a scalar variable, an array element,
or an assignment to one of those, i.e. an lvalue.) For B<sed> devotees,
C<y> is provided as a synonym for C<tr>. If the SEARCHLIST is
delimited by bracketing quotes, the REPLACEMENTLIST has its own pair of
quotes, which may or may not be bracketing quotes, e.g. C<tr[A-Z][a-z]>
or C<tr(+-*/)/ABCD/>.

Options:

    c   Complement the SEARCHLIST.
    d   Delete found but unreplaced characters.
    s   Squash duplicate replaced characters.

If the C</c> modifier is specified, the SEARCHLIST character set is
complemented. If the C</d> modifier is specified, any characters specified
by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note
that this is slightly more flexible than the behavior of some B<tr>
programs, which delete anything they find in the SEARCHLIST, period.)
If the C</s> modifier is specified, sequences of characters that were
translated to the same character are squashed down to a single instance of the
character.

If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
than the SEARCHLIST, the final character is replicated till it is long
enough. If the REPLACEMENTLIST is null, the SEARCHLIST is replicated.
This latter is useful for counting characters in a class or for
squashing character sequences in a class.
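
The following sketch illustrates the length rules above on a throwaway
string; the variable names are arbitrary:

    $_ = 'abcde';
    ($short = $_) =~ tr/a-e/AB/;        # final B is replicated: 'ABBBB'
    ($del   = $_) =~ tr/a-e/AB/d;       # c, d and e are deleted: 'AB'
    $count  = tr/a-e//;                 # empty list: $_ unchanged, returns 5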

Examples:

    $ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case

    $cnt = tr/*/*/;             # count the stars in $_

    $cnt = $sky =~ tr/*/*/;     # count the stars in $sky

    $cnt = tr/0-9//;            # count the digits in $_

    tr/a-zA-Z//s;               # bookkeeper -> bokeper

    ($HOST = $host) =~ tr/a-z/A-Z/;

    tr/a-zA-Z/ /cs;             # change non-alphas to single space

    tr [\200-\377]
       [\000-\177];             # delete 8th bit

If multiple translations are given for a character, only the first one is used:

    tr/AAA/XYZ/

will translate any A to X.

Note that because the translation table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote
interpolation. That means that if you want to use variables, you must use
an eval():

    eval "tr/$oldlist/$newlist/";
    die $@ if $@;

    eval "tr/$oldlist/$newlist/, 1" or die $@;
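
As a rough sketch of the same idea applied to a named variable rather than
C<$_> (all variable names here are illustrative), the variable's sigil is
backslashed so that only the two lists are interpolated into the code
string. Note that this only works if neither list contains the delimiter:

    $oldlist = 'a-c';
    $newlist = 'A-C';
    $text    = 'aabbccdd';

    # The string compiled by eval is:  $text =~ tr/a-c/A-C/
    $count = eval "\$text =~ tr/$oldlist/$newlist/";
    die $@ if $@;

    print "$count: $text\n";            # prints 6: AABBCCdd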
913
914=back
915
916=head2 I/O Operators
917
918There are several I/O operators you should know about.
919A string is enclosed by backticks (grave accents) first undergoes
920variable substitution just like a double quoted string. It is then
921interpreted as a command, and the output of that command is the value
922of the pseudo-literal, like in a shell. In a scalar context, a single
923string consisting of all the output is returned. In a list context,
924a list of values is returned, one for each line of output. (You can
925set C<$/> to use a different line terminator.) The command is executed
926each time the pseudo-literal is evaluated. The status value of the
927command is returned in C<$?> (see L<perlvar> for the interpretation
928of C<$?>). Unlike in B<csh>, no translation is done on the return
929data--newlines remain newlines. Unlike in any of the shells, single
930quotes do not hide variable names in the command from interpretation.
931To pass a $ through to the shell you need to hide it with a backslash.
cb1a09d0 932The generalized form of backticks is C<qx//>. (Because backticks
933always undergo shell expansion as well, see L<perlsec> for
934security concerns.)
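
A brief sketch of the context rules above, assuming a Unix-like system
where a shell and the B<echo> command are available:

    $all    = `echo one; echo two`;     # scalar context: "one\ntwo\n"
    @lines  = `echo one; echo two`;     # list context: ("one\n", "two\n")
    $status = $?;                       # status of the last command run

    $name = `echo \$HOME`;              # the backslash lets the shell see $HOME
    $same = qx/echo \$HOME/;            # generalized form of the line above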

Evaluating a filehandle in angle brackets yields the next line from
that file (newline included, so it's never false until end of file, at
which time an undefined value is returned). Ordinarily you must assign
that value to a variable, but there is one situation where an automatic
assignment happens. I<If and ONLY if> the input symbol is the only
thing inside the conditional of a C<while> loop, the value is
automatically assigned to the variable C<$_>. The assigned value is
then tested to see if it is defined. (This may seem like an odd thing
to you, but you'll use the construct in almost every Perl script you
write.) Anyway, the following lines are equivalent to each other:

    while (defined($_ = <STDIN>)) { print; }
    while (<STDIN>) { print; }
    for (;<STDIN>;) { print; }
    print while defined($_ = <STDIN>);
    print while <STDIN>;

The filehandles STDIN, STDOUT and STDERR are predefined. (The
filehandles C<stdin>, C<stdout> and C<stderr> will also work except in
packages, where they would be interpreted as local identifiers rather
than global.) Additional filehandles may be created with the open()
function. See L<perlfunc/open()> for details on this.

If a E<lt>FILEHANDLEE<gt> is used in a context that is looking for a list, a
list consisting of all the input lines is returned, one line per list
element. It's easy to make a I<LARGE> data space this way, so use with
care.

The null filehandle E<lt>E<gt> is special and can be used to emulate the
behavior of B<sed> and B<awk>. Input from E<lt>E<gt> comes either from
standard input, or from each file listed on the command line. Here's
how it works: the first time E<lt>E<gt> is evaluated, the @ARGV array is
checked, and if it is null, C<$ARGV[0]> is set to "-", which when opened
gives you standard input. The @ARGV array is then processed as a list
of filenames. The loop

    while (<>) {
        ...                     # code for each line
    }

is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') if $#ARGV < $[;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...         # code for each line
        }
    }

except that it isn't so cumbersome to say, and will actually work. It
really does shift array @ARGV and put the current filename into variable
$ARGV. It also uses filehandle I<ARGV> internally--E<lt>E<gt> is just a synonym
for E<lt>ARGVE<gt>, which is magical. (The pseudo code above doesn't work
because it treats E<lt>ARGVE<gt> as non-magical.)

You can modify @ARGV before the first E<lt>E<gt> as long as the array ends up
containing the list of filenames you really want. Line numbers (C<$.>)
continue as if the input were one big happy file. (But see example
under eof() for how to reset line numbers on each file.)
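
The following sketch shows both points: @ARGV is set before the first
E<lt>E<gt>, and C<$.> is reset for each file by closing I<ARGV> at each end
of file. The scratch file names are made up for the example:

    # Create two small scratch files to read back.
    @names = ("scratch_a.$$", "scratch_b.$$");
    foreach $name (@names) {
        open(OUT, "> $name") || die "can't create $name: $!";
        print OUT "first\nsecond\n";
        close(OUT);
    }

    @ARGV = @names;                     # must happen before the first <>
    while (<>) {
        chop;                           # remove the newline
        push(@seen, $. . ":" . $_);     # $. is the current line number
        close(ARGV) if eof;             # resets $. before the next file
    }
    unlink(@names);

    print join(' ', @seen), "\n";       # prints 1:first 2:second 1:first 2:second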

If you want to set @ARGV to your own list of files, go right ahead. If
you want to pass switches into your script, you can use one of the
Getopts modules or put a loop on the front like this:

    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        ...             # other switches
    }
    while (<>) {
        ...             # code for each line
    }

The E<lt>E<gt> symbol will return FALSE only once. If you call it again after
this it will assume you are processing another @ARGV list, and if you
haven't set @ARGV, will input from STDIN.

If the string inside the angle brackets is a reference to a scalar
variable (e.g. E<lt>$fooE<gt>), then that variable contains the name of the
filehandle to input from, or a reference to the same. For example:

    $fh = \*STDIN;
    $line = <$fh>;

If the string inside angle brackets is not a filehandle or a scalar
variable containing a filehandle name or reference, then it is interpreted
as a filename pattern to be globbed, and either a list of filenames or the
next filename in the list is returned, depending on context. One level of
$ interpretation is done first, but you can't say C<E<lt>$fooE<gt>>
because that's an indirect filehandle as explained in the previous
paragraph. (In older versions of Perl, programmers would insert curly
brackets to force interpretation as a filename glob: C<E<lt>${foo}E<gt>>.
These days, it's considered cleaner to call the internal function directly
as C<glob($foo)>, which is probably the right way to have done it in the
first place.) Example:

    while (<*.c>) {
        chmod 0644, $_;
    }

is equivalent to

    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
        chop;
        chmod 0644, $_;
    }

In fact, it's currently implemented that way. (Which means it will not
work on filenames with spaces in them unless you have csh(1) on your
machine.) Of course, the shortest way to do the above is:

    chmod 0644, <*.c>;

Because globbing invokes a shell, it's often faster to call readdir() yourself
and just do your own grep() on the filenames. Furthermore, due to its current
implementation of using a shell, the glob() routine may get "Arg list too
long" errors (unless you've installed tcsh(1L) as F</bin/csh>).
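
A minimal sketch of the readdir() alternative, assuming the files of
interest live in the current directory. The pattern skips names that begin
with a dot, as the shell's C<*.c> does, and the list is sorted because
readdir() returns names in no particular order:

    opendir(DIR, '.') || die "can't opendir .: $!";
    @cfiles = grep(/^[^.].*\.c$/, readdir(DIR));    # no shell involved
    closedir(DIR);
    @cfiles = sort @cfiles;

    chmod 0644, @cfiles;                # intended to match: chmod 0644, <*.c>;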

A glob only evaluates its (embedded) argument when it is starting a new
list. All values must be read before it will start over. In a list
context this isn't important, because you automatically get them all
anyway. In a scalar context, however, the operator returns the next value
each time it is called, or a FALSE value if you've just run out. Again,
FALSE is returned only once. So if you're expecting a single value from
a glob, it is much better to say

    ($file) = <blurch*>;

than

    $file = <blurch*>;

because the latter will alternate between returning a filename and
returning FALSE.

If you're trying to do variable interpolation, it's definitely better
to use the glob() function, because the older notation can cause people
to become confused with the indirect filehandle notation.

    @files = glob("$dir/*.[ch]");
    @files = glob($files[$i]);

=head2 Constant Folding

Like C, Perl does a certain amount of expression evaluation at
compile time, whenever it determines that all of the arguments to an
operator are static and have no side effects. In particular, string
concatenation happens at compile time between literals that don't do
variable substitution. Backslash interpretation also happens at
compile time. You can say

    'Now is the time for all' . "\n" .
        'good men to come to.'

and this all reduces to one string internally. Likewise, if
you say

    foreach $file (@filenames) {
        if (-s $file > 5 + 100 * 2**16) { ... }
    }

the compiler will pre-compute the number that
expression represents so that the interpreter
won't have to.


=head2 Integer arithmetic

By default Perl assumes that it must do most of its arithmetic in
floating point. But by saying

    use integer;

you may tell the compiler that it's okay to use integer operations
from here to the end of the enclosing BLOCK. An inner BLOCK may
countermand this by saying

    no integer;

which lasts until the end of that BLOCK.
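
A small sketch of the scoping rule; the variable names are arbitrary:

    $float = 10 / 3;                    # floating point, roughly 3.333
    {
        use integer;
        $int = 10 / 3;                  # integer division: 3
        {
            no integer;
            $inner = 7 / 2;             # floating point again: 3.5
        }
        $outer = 7 / 2;                 # back under "use integer": 3
    }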