=head1 NAME

perlop - Perl operators and precedence

=head1 SYNOPSIS

Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Note that all operators
borrowed from C keep the same precedence relationship with each other,
even where C's precedence is slightly screwy. (This makes learning
Perl easier for C folks.) With very few exceptions, these all
operate on scalar values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp
    left        &
    left        | ^
    left        &&
    left        ||
    nonassoc    ..
    right       ?:
    right       = += -= *= etc.
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

In the following sections, these operators are covered in precedence order.

=head1 DESCRIPTION

=head2 Terms and List Operators (Leftward)

Any TERM is of highest precedence in Perl. These include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you look at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort, but
the commas on the left are evaluated after. In other words, list
operators tend to gobble up all the arguments that follow them, and
then act like a simple TERM with regard to the preceding expression.
Note that you have to be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance: the parenthesized
C<($foo & 255)> is taken as the complete argument list of C<print>, so
the C<+ 1> is added to the return value of C<print>, and the C<"\n"> is
never printed. See L<Named Unary Operators> for more discussion of this.

Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L<Quote and Quote-like Operators> toward the end of this section,
as well as L<"I/O Operators">.

=head2 The Arrow Operator

Just as in C and C++, "C<-E<gt>>" is an infix dereference operator. If the
right side is either a C<[...]> or C<{...}> subscript, then the left side
must be either a hard or symbolic reference to an array or hash (or, if
it is an lvalue (that is, assignable), a location capable of holding a
hard reference). See L<perlref>.

Otherwise, the right side is a method name or a simple scalar variable
containing the method name, and the left side must be either an object
(a blessed reference) or a class name (that is, a package name).
See L<perlobj>.
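
The sketch below exercises both forms of the arrow. The C<Counter>
package and its C<new> and C<bump> methods are hypothetical, defined
only for this illustration:

```perl

    use strict;
    use warnings;

    # Subscript on the right: the left side is a hard reference.
    my $aref   = [10, 20, 30];
    my $href   = { name => 'perl' };
    my $second = $aref->[1];       # 20
    my $name   = $href->{name};    # 'perl'

    # Method name on the right. "Counter" is a hypothetical class.
    package Counter;
    sub new  { my ($class) = @_; return bless { n => 0 }, $class }
    sub bump { my ($self)  = @_; return ++$self->{n} }

    package main;
    my $c      = Counter->new;     # class name on the left
    $c->bump;                      # object on the left; the count is now 1
    my $method = 'bump';
    my $count  = $c->$method;      # method name held in a scalar; count is now 2

```

Writing C<$c-E<gt>$method> looks the method up by name at run time,
which is the "simple scalar variable containing the method name" case
described above.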

=head2 Auto-increment and Auto-decrement

"++" and "--" work as in C. That is, if placed before a variable, they
increment or decrement the variable before returning the value, and if
placed after, increment or decrement the variable after returning the value.

The auto-increment operator has a little extra built-in magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*$/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = '99');      # prints '100'
    print ++($foo = 'a0');      # prints 'a1'
    print ++($foo = 'Az');      # prints 'Ba'
    print ++($foo = 'zz');      # prints 'aaa'

The auto-decrement operator is not magical.
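
A short sketch of both behaviors (variable names are arbitrary).
Decrementing a string that does not look like a number simply treats it
as 0, so the example switches off the C<numeric> warning category for
that one statement:

```perl

    use strict;
    use warnings;

    my $s = 'Az';  $s++;    # 'Ba': 'z' wraps to 'a' and carries into 'A'
    my $t = 'zz';  $t++;    # 'aaa': a carry out of the first character adds one
    my $u = 'a9';  $u++;    # 'b0': letters and digits each stay in their range

    my $d = 'aa';
    {
        no warnings 'numeric';    # 'aa' is not numeric, so it counts as 0
        $d--;                     # -1: there is no magical decrement
    }

```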

=head2 Exponentiation

Binary "**" is the exponentiation operator. Note that it binds even more
tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)
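
A quick check of that precedence rule (variable names are arbitrary):

```perl

    use strict;
    use warnings;

    my $x = -2**4;      # parsed as -(2**4): -16
    my $y = (-2)**4;    # parentheses apply the negation first: 16
    my $z = 2**-1;      # a unary minus on the right operand is fine: 0.5

```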

=head2 Symbolic Unary Operators

Unary "!" performs logical negation, i.e., "not". See also C<not> for a lower
precedence version of this.

Unary "-" performs arithmetic negation if the operand is numeric. If
the operand is an identifier, a string consisting of a minus sign
concatenated with the identifier is returned. Otherwise, if the string
starts with a plus or minus, a string starting with the opposite sign
is returned. One effect of these rules is that C<-bareword> is equivalent
to C<"-bareword">.
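
A sketch of these rules, using quoted strings so that nothing depends
on how barewords are treated under C<use strict>:

```perl

    use strict;
    use warnings;

    my $word = -"foo";      # starts like an identifier: '-foo'
    my $flip = -"-foo";     # starts with a minus: '+foo'
    my $back = -"+foo";     # starts with a plus: '-foo'
    my $num  = -(3 + 4);    # numeric operand: -7

```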

Unary "~" performs bitwise negation, i.e., 1's complement.
(See also L<Integer Arithmetic>.)

Unary "+" has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L<Terms and List Operators (Leftward)>.)

Unary "\" creates a reference to whatever follows it. See L<perlref>.
Do not confuse this behavior with the behavior of backslash within a
string, although both forms do convey the notion of protecting the next
thing from interpretation.

=head2 Binding Operators

Binary "=~" binds a scalar expression to a pattern match. Certain operations
search or modify the string $_ by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or translation. The left argument is what is
supposed to be searched, substituted, or translated instead of the default
$_. The return value indicates the success of the operation. (If the
right argument is an expression rather than a search pattern,
substitution, or translation, it is interpreted as a search pattern at run
time. This can be less efficient than an explicit search, because the
pattern must be compiled every time the expression is evaluated.)

Binary "!~" is just like "=~" except the return value is negated in
the logical sense.
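
A sketch of the binding operators applied to a string other than C<$_>
(the sample text is made up). The last line also shows that a
translation bound with C<=~> returns the number of characters it changed:

```perl

    use strict;
    use warnings;

    my $text = "Hello, world";

    my $hit  = ($text =~ /world/);    # true: searches $text instead of $_
    my $miss = ($text !~ /world/);    # the same test, logically negated: false

    # A right-hand expression that is not a pattern is compiled at run time.
    my $pat = 'w.rld';
    my $dyn = ($text =~ $pat);        # true: 'w.rld' is used as a regex

    # Translate a copy; the count of translated characters is returned.
    my $count = (my $copy = $text) =~ tr/o/0/;    # 2; $text is unchanged

```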

=head2 Multiplicative Operators

Binary "*" multiplies two numbers.

Binary "/" divides two numbers.

Binary "%" computes the modulus of the two numbers.

Binary "x" is the repetition operator. In a scalar context, it
returns a string consisting of the left operand repeated the number of
times specified by the right operand. In a list context, if the left
operand is a list in parentheses, it repeats the list.

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5
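
A few more cases, sketched with arbitrary variable names. The right
operand is taken in scalar context, which is why C<@ones> supplies its
element count:

```perl

    use strict;
    use warnings;

    my $rule  = '-' x 5;        # scalar repetition: '-----'
    my @ones  = (1) x 3;        # list repetition: (1, 1, 1)
    my @pairs = (0, 1) x 2;     # (0, 1, 0, 1)
    my @fives = (5) x @ones;    # count taken in scalar context: (5, 5, 5)
    my $none  = 'abc' x 0;      # a count of 0 gives the empty string

```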

=head2 Additive Operators

Binary "+" returns the sum of two numbers.

Binary "-" returns the difference of two numbers.

Binary "." concatenates two strings.

=head2 Shift Operators

Binary "<<" returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also L<Integer Arithmetic>.)

Binary ">>" returns the value of its left argument shifted right by
the number of bits specified by the right argument. Arguments should
be integers. (See also L<Integer Arithmetic>.)

=head2 Named Unary Operators

The various named unary operators are treated as functions with one
argument, with optional parentheses. These include the filetest
operators, like C<-f>, C<-M>, etc. See L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. Examples:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because * has higher precedence than named unary operators
(whereas || has lower precedence):

    chdir $foo * 20;            # chdir ($foo * 20)
    chdir($foo) * 20;           # (chdir $foo) * 20
    chdir ($foo) * 20;          # (chdir $foo) * 20
    chdir +($foo) * 20;         # chdir ($foo * 20)

    rand 10 * 20;               # rand (10 * 20)
    rand(10) * 20;              # (rand 10) * 20
    rand (10) * 20;             # (rand 10) * 20
    rand +(10) * 20;            # rand (10 * 20)

See also L<"Terms and List Operators (Leftward)">.
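
The same rule can be observed with C<length>, another named unary
operator, next to the multiplicative operator C<x>. This is a sketch;
the variable names are arbitrary:

```perl

    use strict;
    use warnings;

    my $whole  = length "ab" x 3;      # length("ab" x 3): 6
    my $call   = length("ab") x 3;     # (length "ab") x 3: '222'
    my $plus   = length +("ab") x 3;   # unary + hides the parentheses: 6
    my $is_two = length "ab" == 2;     # == binds more loosely: (length "ab") == 2

```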

=head2 Relational Operators

Binary "E<lt>" returns true if the left argument is numerically less than
the right argument.

Binary "E<gt>" returns true if the left argument is numerically greater
than the right argument.

Binary "E<lt>=" returns true if the left argument is numerically less than
or equal to the right argument.

Binary "E<gt>=" returns true if the left argument is numerically greater
than or equal to the right argument.

Binary "lt" returns true if the left argument is stringwise less than
the right argument.

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.

=head2 Equality Operators

Binary "==" returns true if the left argument is numerically equal to
the right argument.

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.

Binary "E<lt>=E<gt>" returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument.

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.

Binary "cmp" returns -1, 0, or 1 depending on whether the left argument
is stringwise less than, equal to, or greater than the right argument.

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
by the current locale if C<use locale> is in effect. See L<perllocale>.
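
Because the numeric and string operators compare differently, sorting
the same values with C<E<lt>=E<gt>> and with C<cmp> can give different
orders. A minimal sketch, assuming C<use locale> is not in effect:

```perl

    use strict;
    use warnings;

    my @values = (10, 9, 100);
    my @by_num = sort { $a <=> $b } @values;    # (9, 10, 100)
    my @by_str = sort { $a cmp $b } @values;    # (10, 100, 9)

    my $num_eq = (10 == 10.0);        # true: numerically equal
    my $str_eq = ('10' eq '10.0');    # false: different strings
    my $lt     = ('abc' lt 'abd');    # true: 'c' sorts before 'd'

```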

=head2 Bitwise And

Binary "&" returns its operands ANDed together bit by bit.
(See also L<Integer Arithmetic>.)

=head2 Bitwise Or and Exclusive Or

Binary "|" returns its operands ORed together bit by bit.
(See also L<Integer Arithmetic>.)

Binary "^" returns its operands XORed together bit by bit.
(See also L<Integer Arithmetic>.)
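
A sketch of the bitwise operators, together with the shift operators
described earlier, on small integers. Decimal literals are used and the
comments show the bit patterns; the variable names are arbitrary:

```perl

    use strict;
    use warnings;

    my $and = 12 & 10;        # 1100 & 1010 = 1000:  8
    my $or  = 12 | 10;        # 1100 | 1010 = 1110: 14
    my $xor = 12 ^ 10;        # 1100 ^ 1010 = 0110:  6

    my $left  = 1 << 4;       # 16
    my $right = 256 >> 2;     # 64

    my $low = 0x1234 & 0xFF;  # keep only the low byte: 0x34, i.e. 52

```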

=head2 C-style Logical And

Binary "&&" performs a short-circuit logical AND operation. That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Or

Binary "||" performs a short-circuit logical OR operation. That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

The C<||> and C<&&> operators differ from C's in that, rather than returning
0 or 1, they return the last value evaluated. Thus, a reasonably portable
way to find out the home directory (assuming it's not "0") might be:

    $home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
        (getpwuid($<))[7] || die "You're homeless!\n";
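
The same idiom, sketched with an ordinary hash standing in for C<%ENV>
so that the result does not depend on the real environment (the hash
contents are hypothetical). The C<$first> line shows why the caveat
about "0" matters: every false value is skipped.

```perl

    use strict;
    use warnings;

    my %env = (LOGDIR => '/home/fred');             # hypothetical environment

    my $home  = $env{HOME} || $env{LOGDIR} || '/';  # '/home/fred'
    my $first = 0 || '' || 'fallback';              # 'fallback': 0 and '' are false
    my $last  = 'a' && 'b';                         # 'b', the last value evaluated
    my $short = 0 && die "never reached\n";         # 0: the right side is skipped

```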

As more readable alternatives to C<&&> and C<||>, Perl provides "and" and
"or" operators (see below). The short-circuit behavior is identical. The
precedence of "and" and "or" is much lower, however, so that you can
safely use them after a list operator without the need for
parentheses:

    unlink "alpha", "beta", "gamma"
            or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
            || (gripe(), next LINE);
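
The precedence difference matters most around assignment, which sits
between C<||> and C<or>. A minimal sketch (variable names are arbitrary):

```perl

    use strict;
    use warnings;

    my $x = 0 || 5;    # || binds tighter than =: $x = (0 || 5), i.e. 5

    my $zero = 0;
    my $y;
    my $fallback_used = 0;
    $y = $zero or $fallback_used = 1;
    # "or" binds looser than =, so this is ($y = $zero) or ($fallback_used = 1):
    # $y ends up 0, and the right side runs because the assignment was false.

```

The second form is a common slip when C<or> is written where C<||> was
intended.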

=head2 Range Operator

Binary ".." is the range operator, which is really two different
operators depending on the context. In a list context, it returns a
list of values counting (by ones) from the left value to the right
value. This is useful for writing C<for (1..10)> loops and for doing
slice operations on arrays. Be aware that under the current implementation,
a temporary array is created, so you'll burn a lot of memory if you
write something like this:

    for (1 .. 1_000_000) {
        # code
    }

In a scalar context, ".." returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma) operator
of B<sed>, B<awk>, and various editors. Each ".." operator maintains its
own boolean state. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again. (It doesn't become false till the next time the range operator is
evaluated. It can test the right operand and become false on the same
evaluation it became true (as in B<awk>), but it still returns true once.
If you don't want it to test the right operand till the next evaluation
(as in B<sed>), use three dots ("...") instead of two.) The right
operand is not evaluated while the operator is in the "false" state, and
the left operand is not evaluated while the operator is in the "true"
state. The precedence is a little lower than || and &&. The value
returned is either the null string for false, or a sequence number
(beginning with 1) for true. The sequence number is reset for each range
encountered. The final sequence number in a range has the string "E0"
appended to it, which doesn't affect its numeric value, but gives you
something to search for if you want to exclude the endpoint. You can
exclude the beginning point by waiting for the sequence number to be
greater than 1. If either operand of scalar ".." is a numeric literal,
that operand is implicitly compared to the C<$.> variable, the current
line number. Examples:

As a scalar operator:

    if (101 .. 200) { print; }  # print 2nd hundred lines
    next line if (1 .. /^$/);   # skip header lines
    s/^/> / if (/^$/ .. eof()); # quote body

As a list operator:

    for (101 .. 200) { print; }         # print $_ 100 times
    @foo = @foo[$[ .. $#foo];           # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];      # slice last 5 items
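
A sketch of the scalar form applied to a list of strings rather than to
input lines, so the implicit comparison with C<$.> does not come into
play. The sample lines and the C<BEGIN>/C<END> markers are made up:

```perl

    use strict;
    use warnings;

    my @lines = ('header', 'BEGIN', 'a', 'b', 'END', 'trailer');

    my (@seq, @inside, @body);
    for (@lines) {
        my $n = /^BEGIN$/ .. /^END$/;    # '' while false, then 1, 2, 3, '4E0'
        push @seq, $n;
        push @inside, $_ if $n;                           # endpoints included
        push @body,   $_ if $n && $n > 1 && $n !~ /E0$/;  # endpoints excluded
    }

```

Checking C<$n E<gt> 1> drops the starting line, and the "E0" suffix marks
the final one, as described above.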

The range operator (in a list context) makes use of the magical
auto-increment algorithm if the operands are strings. You
can say

    @alphabet = ('A' .. 'Z');

to get all the letters of the alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[$mday];

to get dates with leading zeros. If the final value specified is not
in the sequence that the magical increment would produce, the sequence
goes until the next value would be longer than the final value
specified.
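
A sketch of these string ranges, including one whose final value is
never produced by the magical increment (variable names are arbitrary):

```perl

    use strict;
    use warnings;

    my @alphabet = ('A' .. 'Z');      # 26 letters
    my @days     = ('01' .. '31');    # '01', '02', ..., '31'
    my $num      = 11;
    my $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];    # 'b'

    # 'ZZ' is never produced by incrementing 'a', so the list runs until the
    # next value would be longer than 'ZZ': 'a' .. 'z', then 'aa' .. 'zz'.
    my @long = ('a' .. 'ZZ');         # 26 + 26*26 = 702 elements

```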

=head2 Conditional Operator

Ternary "?:" is the conditional operator, just as in C. It works much
like an if-then-else. If the argument before the ? is true, the
argument before the : is returned, otherwise the argument after the :
is returned. For example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

This is not necessarily guaranteed to contribute to the readability of your program.
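
A sketch of the context and lvalue rules above (variable names are
arbitrary):

```perl

    use strict;
    use warnings;

    my $n    = 3;
    my $text = sprintf "I have %d dog%s.", $n, ($n == 1) ? '' : 's';

    my $ok    = 1;
    my @yes   = (1, 2, 3);
    my @no    = (4, 5);
    my @list  = $ok ? @yes : @no;    # list context: (1, 2, 3)
    my $count = $ok ? @yes : @no;    # scalar context: 3, the element count

    my ($x, $y) = (1, 2);
    ($ok ? $x : $y) = 99;            # both branches are lvalues; $x is assigned

```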

=head2 Assignment Operators

"=" is the ordinary assignment operator.

Assignment operators work as in C. That is,

    $a += 2;

is equivalent to

    $a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from tie(). Other assignment operators work similarly.
The following are recognized:

    **=    +=    *=    &=    <<=    &&=
           -=    /=    |=    >>=    ||=
           .=    %=    ^=
                 x=

Note that while these are grouped by family, they all have the precedence
of assignment.

Unlike in C, the assignment operator produces a valid lvalue. Modifying
an assignment is equivalent to doing the assignment and then modifying
the variable that was assigned to. This is useful for modifying
a copy of something, like this:

    ($tmp = $global) =~ tr [A-Z] [a-z];

Likewise,

    ($a += 2) *= 3;

is equivalent to

    $a += 2;
    $a *= 3;
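
A sketch of the equivalences above, plus two of the other recognized
operators (variable names are arbitrary):

```perl

    use strict;
    use warnings;

    my $n = 4;
    ($n += 2) *= 3;                        # $n += 2, then $n *= 3: 18

    my $global = 'HeLLo';
    (my $tmp = $global) =~ tr/A-Z/a-z/;    # $tmp is 'hello'; $global is unchanged

    my $s = 'ab';
    $s x= 3;                               # 'ababab'

    my $name;
    $name ||= 'anonymous';                 # assigns because $name was false

```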

=head2 Comma Operator

Binary "," is the comma operator. In a scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value. This is just like C's comma operator.

In a list context, it's just the list argument separator, and inserts
both its arguments into the list.

The =E<gt> digraph is mostly just a synonym for the comma operator. It's useful for
documenting arguments that come in pairs. As of release 5.001, it also forces
any word to the left of it to be interpreted as a string.

=head2 List Operators (Rightward)

On the right side of a list operator, it has very low precedence,
such that it controls all comma-separated expressions found there.
The only operators with lower precedence are the logical operators
"and", "or", "xor", and "not", which may be used to evaluate calls to list
operators without the need for extra parentheses:

    open HANDLE, "filename"
        or die "Can't open: $!\n";

See also discussion of list operators in L<Terms and List Operators (Leftward)>.

=head2 Logical Not

Unary "not" returns the logical negation of the expression to its right.
It's the equivalent of "!" except for the very low precedence.

=head2 Logical And

Binary "and" returns the logical conjunction of the two surrounding
expressions. It's equivalent to && except for the very low
precedence, so, like &&, it short-circuits: the right
expression is evaluated only if the left expression is true.

=head2 Logical or and Exclusive Or

Binary "or" returns the logical disjunction of the two surrounding
expressions. It's equivalent to || except for the very low
precedence, so, like ||, it short-circuits: the right
expression is evaluated only if the left expression is false.

Binary "xor" returns the exclusive-OR of the two surrounding expressions.
It cannot short-circuit, of course.
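
A short sketch of C<xor> and of how loosely C<not> binds. The
parentheses around each right-hand side keep the assignment itself out
of the picture:

```perl

    use strict;
    use warnings;

    my $either = (1 xor 0);     # true: exactly one side is true
    my $both   = (1 xor 1);     # false: both sides are true
    my $neg    = (not 0);       # true
    my $loose  = (not 1 == 2);  # not binds more loosely than ==: not(1 == 2)

```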

=head2 C Operators Missing From Perl

Here is what C has that Perl doesn't:

=over 8

=item unary &

Address-of operator. (But see the "\" operator for taking a reference.)

=item unary *

Dereference-address operator. (Perl's prefix dereferencing
operators are typed: $, @, %, and &.)

=item (TYPE)

Type casting operator.

=back

=head2 Quote and Quote-like Operators

While we usually think of quotes as literal values, in Perl they
function as operators, providing various kinds of interpolating and
pattern matching capabilities. Perl provides customary quote characters
for these behaviors, but also provides a way for you to choose your
quote character for any of them. In the following table, a C<{}> represents
any pair of delimiters you choose. Non-bracketing delimiters use
the same character fore and aft, but the 4 sorts of brackets
(round, angle, square, curly) will all nest.

    Customary  Generic     Meaning     Interpolates
        ''       q{}       Literal          no
        ""      qq{}       Literal          yes
        ``      qx{}       Command          yes
                qw{}      Word list         no
        //       m{}    Pattern match       yes
                s{}{}   Substitution        yes
               tr{}{}   Translation         no

For constructs that do interpolation, variables beginning with "C<$>" or "C<@>"
are interpolated, as are the following sequences:

    \t      tab             (HT, TAB)
    \n      newline         (LF, NL)
    \r      return          (CR)
    \f      form feed       (FF)
    \b      backspace       (BS)
    \a      alarm (bell)    (BEL)
    \e      escape          (ESC)
    \033    octal char
    \x1b    hex char
    \c[     control char
    \l      lowercase next char
    \u      uppercase next char
    \L      lowercase till \E
    \U      uppercase till \E
    \E      end case modification
    \Q      quote regexp metacharacters till \E

If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
and C<\U> is taken from the current locale. See L<perllocale>.

Patterns are subject to an additional level of interpretation as a
regular expression. This is done as a second pass, after variables are
interpolated, so that regular expressions may be incorporated into the
pattern from the variables. If this is not what you want, use C<\Q> to
interpolate a variable literally.

Apart from the above, there are no multiple levels of interpolation. In
particular, contrary to the expectations of shell programmers, back-quotes
do I<NOT> interpolate within double quotes, nor do single quotes impede
evaluation of variables when used within double quotes.
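
A sketch of the interpolation rules above (the strings are arbitrary),
ending with C<\Q> keeping an interpolated variable from acting as a
pattern:

```perl

    use strict;
    use warnings;

    my $name   = 'world';
    my $single = 'Hello, $name\n';    # no interpolation: 14 characters
    my $double = "Hello, $name\n";    # "Hello, world" followed by a newline

    my $upper  = "\Uhello\E there";   # 'HELLO there'
    my $title  = "\u\LfRED\E";        # 'Fred'
    my $octhex = "\101\x42";          # 'AB'

    # \Q keeps an interpolated variable from acting as a pattern.
    my $lit    = 'a.c';
    my $loose  = ('abc' =~ /$lit/);      # true: '.' matches any character
    my $exact  = ('abc' =~ /\Q$lit\E/);  # false: '.' must be a literal dot

```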

=head2 Regexp Quote-Like Operators

Here are the quote-like operators that apply to pattern
matching and related activities.

=over 8

=item ?PATTERN?

This is just like the C</pattern/> search, except that it matches only
once between calls to the reset() operator. This is a useful
optimization when you want to see only the first occurrence of
something in each file of a set of files, for instance. Only C<??>
patterns local to the current package are reset.

This usage is vaguely deprecated, and may be removed in some future
version of Perl.

=item m/PATTERN/gimosx

=item /PATTERN/gimosx

Searches a string for a pattern match, and in a scalar context returns
true (1) or false (''). If no string is specified via the C<=~> or
C<!~> operator, the $_ string is searched. (The string specified with
C<=~> need not be an lvalue--it may be the result of an expression
evaluation, but remember the C<=~> binds rather tightly.) See also
L<perlre>.
See L<perllocale> for discussion of additional considerations which apply
when C<use locale> is in effect.

Options are:

    g   Match globally, i.e., find all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.

If "/" is the delimiter then the initial C<m> is optional. With the C<m>
you can use any pair of non-alphanumeric, non-whitespace characters as
delimiters. This is particularly useful for matching Unix path names
that contain "/", to avoid LTS (leaning toothpick syndrome).

PATTERN may contain variables, which will be interpolated (and the
pattern recompiled) every time the pattern search is evaluated. (Note
that C<$)> and C<$|> might not be interpolated because they look like
end-of-string tests.) If you want such a pattern to be compiled only
once, add a C</o> after the trailing delimiter. This avoids expensive
run-time recompilations, and is useful when the value you are
interpolating won't change over the life of the script. However, mentioning
C</o> constitutes a promise that you won't change the variables in the pattern.
If you change them, Perl won't even notice.

If the PATTERN evaluates to a null string, the last
successfully executed regular expression is used instead.

If used in a context that requires a list value, a pattern match returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, i.e., (C<$1>, $2, $3...). (Note that here $1 etc. are also set, and
that this differs from Perl 4's behavior.) If the match fails, an empty
list is returned. If the match succeeds, but there were no parentheses,
a list value of (1) is returned.

Examples:

    open(TTY, '/dev/tty');
    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;       # compile only once
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the
remainder of the line, and assigns those three fields to $F1, $F2, and
$Etc. The conditional is true if any variables were assigned, i.e., if
the pattern matched.

The C</g> modifier specifies global pattern matching--that is, matching
as many times as possible within the string. How it behaves depends on
the context. In a list context, it returns a list of all the
substrings matched by all the parentheses in the regular expression.
If there are no parentheses, it returns a list of all the matched
strings, as if there were parentheses around the whole pattern.
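
A sketch of list-context C</g> on a made-up string, with and without
parentheses in the pattern:

```perl

    use strict;
    use warnings;

    my $str = 'a=1, b=2, c=3';

    my @pairs  = $str =~ /(\w)=(\d)/g;   # every group of every match: (a, 1, b, 2, c, 3)
    my %value  = $str =~ /(\w)=(\d)/g;   # the same list, used to build a hash
    my @digits = $str =~ /\d/g;          # no parentheses: whole matches (1, 2, 3)

```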

In a scalar context, C<m//g> iterates through the string, returning TRUE
each time it matches, and FALSE when it eventually runs out of
matches. (In other words, it remembers where it left off last time and
restarts the search at that point. You can actually find the current
match position of a string or set it using the pos() function--see
L<perlfunc/pos>.) Note that you can use this feature to stack C<m//g>
matches or intermix C<m//g> matches with C<m/\G.../>.

If you modify the string in any way, the match position is reset to the
beginning. Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    $/ = "";            # paragraph mode
    while ($paragraph = <>) {
        while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    print "$sentences\n";

    # using m//g with \G
    $_ = "ppooqppq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/g; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/;   print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/g; print "', pos=", pos, "\n";
    }

The last example should print:

    1: 'oo', pos=4
    2: 'q', pos=4
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=7
    3: '', pos=7

Note how C<m//g> matches change the value reported by C<pos()>, but the
non-global match doesn't.

A useful idiom for C<lex>-like scanners is C</\G.../g>. You can
combine several regexps like this to process a string part-by-part,
doing different actions depending on which regexp matched. The next
regexp would step in at the place the previous one left off.

    $_ = <<'EOL';
          $url = new URI::URL "http://www/";   die if $url eq "xXx";
    EOL
    LOOP:
        {
          print(" digits"),       redo LOOP if /\G\d+\b[,.;]?\s*/g;
          print(" lowercase"),    redo LOOP if /\G[a-z]+\b[,.;]?\s*/g;
          print(" UPPERCASE"),    redo LOOP if /\G[A-Z]+\b[,.;]?\s*/g;
          print(" Capitalized"),  redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/g;
          print(" MiXeD"),        redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/g;
          print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/g;
          print(" line-noise"),   redo LOOP if /\G[^A-Za-z0-9]+/g;
          print ". That's all!\n";
        }

Here is the output (split into several lines):

     line-noise lowercase line-noise lowercase UPPERCASE line-noise
     UPPERCASE line-noise lowercase line-noise lowercase line-noise
     lowercase lowercase line-noise lowercase lowercase line-noise
     MiXeD line-noise. That's all!

=item q/STRING/

=item C<'STRING'>

A single-quoted, literal string. A backslash represents a backslash
unless followed by the delimiter or another backslash, in which case
the delimiter or backslash is interpolated.

    $foo = q!I said, "You said, 'She said it.'"!;
    $bar = q('This is it.');
    $baz = '\n';                # a two-character string

=item qq/STRING/

=item "STRING"

A double-quoted, interpolated string.

    $_ .= qq
     (*** The previous line contains the naughty word "$1".\n)
            if /(tcl|rexx|python)/;      # :-)
    $baz = "\n";                # a one-character string

=item qx/STRING/

=item `STRING`

A string which is interpolated and then executed as a system command.
The collected standard output of the command is returned. In scalar
context, it comes back as a single (potentially multi-line) string.
In list context, returns a list of lines (however you've defined lines
with $/ or $INPUT_RECORD_SEPARATOR).

    $today = qx{ date };

See L<I/O Operators> for more discussion.

=item qw/STRING/

Returns a list of the words extracted out of STRING, using embedded
whitespace as the word delimiters. It is exactly equivalent to

    split(' ', q/STRING/);

Some frequently seen examples:

    use POSIX qw( setlocale localeconv );
    @EXPORT = qw( foo bar baz );
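
A quick check of the stated equivalence (the word list is arbitrary):

```perl

    use strict;
    use warnings;

    my @w1 = qw( foo bar   baz );             # runs of whitespace separate words
    my @w2 = split(' ', q/ foo bar   baz /);  # split ' ' also skips leading whitespace

```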
814
=item s/PATTERN/REPLACEMENT/egimosx

Searches a string for a pattern, and if found, replaces that pattern
with the replacement text and returns the number of substitutions
made. Otherwise it returns false (specifically, the empty string).

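A small sketch of the return value, using made-up data: with C</g> the
number of substitutions comes back, and a failed match returns the empty
string.

    my $text  = 'red green red blue';
    my $count = ($text =~ s/red/pink/g);   # 2; $text is 'pink green pink blue'
    my $none  = ($text =~ s/purple/x/);    # no match: '' and $text unchanged
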
If no string is specified via the C<=~> or C<!~> operator, the C<$_>
variable is searched and modified. (The string specified with C<=~> must
be a scalar variable, an array element, a hash element, or an assignment
to one of those, i.e., an lvalue.)

If the delimiter chosen is a single quote, no variable interpolation is
done on either the PATTERN or the REPLACEMENT. Otherwise, if the
PATTERN contains a $ that looks like a variable rather than an
end-of-string test, the variable will be interpolated into the pattern
at run-time. If you want the pattern compiled only once the first time
the variable is interpolated, use the C</o> option. If the pattern
evaluates to a null string, the last successfully executed regular
expression is used instead. See L<perlre> for further explanation on these.
See L<perllocale> for discussion of additional considerations which apply
when C<use locale> is in effect.

Options are:

    e   Evaluate the right side as an expression.
    g   Replace globally, i.e., all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.

Any non-alphanumeric, non-whitespace delimiter may replace the
slashes. If single quotes are used, no interpretation is done on the
replacement string (the C</e> modifier overrides this, however). Unlike
Perl 4, Perl 5 treats back-ticks as normal delimiters; the replacement
text is not evaluated as a command. If the PATTERN is delimited by
bracketing quotes, the REPLACEMENT has its own pair of quotes, which may
or may not be bracketing quotes, e.g., C<s(foo)(bar)> or
C<sE<lt>fooE<gt>/bar/>. A C</e> will cause the replacement portion to be
interpreted as a full-fledged Perl expression and eval()ed right then
and there. It is, however, syntax checked at compile-time.

Examples:

    s/\bgreen\b/mauve/g;        # don't change wintergreen

    $path =~ s|/usr/bin|/usr/local/bin|;

    s/Login: $foo/Login: $bar/; # run-time pattern

    ($foo = $bar) =~ s/this/that/;

    $count = ($paragraph =~ s/Mister\b/Mr./g);

    $_ = 'abc123xyz';
    s/\d+/$&*2/e;               # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;  # yields 'abc  246xyz'
    s/\w/$& x 2/eg;             # yields 'aabbcc  224466xxyyzz'

    s/%(.)/$percent{$1}/g;          # change percent escapes; no /e
    s/%(.)/$percent{$1} || $&/ge;   # expr now, so /e
    s/^=(\w+)/&pod($1)/ge;          # use function call

    # /e's can even nest;  this will expand
    # simple embedded variables in $_
    s/(\$\w+)/$1/eeg;

    # Delete C comments.
    $program =~ s {
        /\*     # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        \*/     # Match the closing delimiter.
    } []gsx;

    s/^\s*(.*?)\s*$/$1/;        # trim white space

    s/([^ ]*) *([^ ]*)/$2 $1/;  # reverse 1st two fields

Note the use of $ instead of \ in the last example. Unlike
B<sed>, we use the \E<lt>I<digit>E<gt> form in only the left-hand side.
Anywhere else it's $E<lt>I<digit>E<gt>.

Occasionally, you can't use just a C</g> to get all the changes
to occur. Here are two common cases:

    # put commas in the right places in an integer
    1 while s/(.*\d)(\d\d\d)/$1,$2/g;      # perl4
    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;  # perl5

    # expand tabs to 8-column spacing
    1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;

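The two loops above can be wrapped as small helper subroutines. The names
C<commify> and C<expand_tabs> are hypothetical, chosen for this sketch;
the bodies are the Perl 5 loops from the examples.

    # Insert commas into the digits of an integer, e.g. 1234567 -> 1,234,567.
    sub commify {
        my $n = shift;
        1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
        return $n;
    }

    # Replace each run of tabs with enough spaces to reach the next
    # 8-column tab stop.
    sub expand_tabs {
        local $_ = shift;
        1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
        return $_;
    }

The C<1 while> idiom simply repeats the substitution until it reports
that nothing more was changed.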

=item tr/SEARCHLIST/REPLACEMENTLIST/cds

=item y/SEARCHLIST/REPLACEMENTLIST/cds

Translates all occurrences of the characters found in the search list
with the corresponding character in the replacement list. It returns
the number of characters replaced or deleted. If no string is
specified via the C<=~> or C<!~> operator, the C<$_> string is
translated. (The string specified with C<=~> must be a scalar variable,
an array element, a hash element, or an assignment to one of those,
i.e., an lvalue.) For B<sed> devotees, C<y> is provided as a synonym
for C<tr>. If the SEARCHLIST is delimited by bracketing quotes, the
REPLACEMENTLIST has its own pair of quotes, which may or may not be
bracketing quotes, e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.

Options:

    c   Complement the SEARCHLIST.
    d   Delete found but unreplaced characters.
    s   Squash duplicate replaced characters.

If the C</c> modifier is specified, the SEARCHLIST character set is
complemented. If the C</d> modifier is specified, any characters specified
by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note
that this is slightly more flexible than the behavior of some B<tr>
programs, which delete anything they find in the SEARCHLIST, period.)
If the C</s> modifier is specified, sequences of characters that were
translated to the same character are squashed down to a single instance
of the character.

If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
than the SEARCHLIST, the final character is replicated till it is long
enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
This latter is useful for counting characters in a class or for
squashing character sequences in a class.

Examples:

    $ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case

    $cnt = tr/*/*/;             # count the stars in $_

    $cnt = $sky =~ tr/*/*/;     # count the stars in $sky

    $cnt = tr/0-9//;            # count the digits in $_

    tr/a-zA-Z//s;               # bookkeeper -> bokeper

    ($HOST = $host) =~ tr/a-z/A-Z/;

    tr/a-zA-Z/ /cs;             # change non-alphas to single space

    tr [\200-\377]
       [\000-\177];             # strip the 8th bit

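A few of the cases above, run against literal strings so the results
can be checked (a sketch; the variable names are illustrative only):

    my $sky   = 'a * b ** c';
    my $stars = ($sky =~ tr/*//);                  # 3; $sky is unchanged
    (my $word  = 'bookkeeper') =~ tr/a-zA-Z//s;    # 'bokeper'
    (my $upper = 'Perl 5')     =~ tr/a-z/A-Z/;     # 'PERL 5'
    (my $clean = 'a,b;;c')     =~ tr/a-zA-Z/ /cs;  # 'a b c'
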
If multiple translations are given for a character, only the first one is used:

    tr/AAA/XYZ/

will translate any A to X.

Note that because the translation table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote
interpolation. That means that if you want to use variables, you must use
an eval():

    eval "tr/$oldlist/$newlist/";
    die $@ if $@;

    eval "tr/$oldlist/$newlist/, 1" or die $@;

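For instance, a ROT13 mapping whose lists live in variables has to be
compiled through a string eval(). This is only a sketch: it assumes the
lists come from trusted code, since anything placed in them (a stray
C</>, for example) becomes part of the Perl source being evaluated.

    my ($oldlist, $newlist) = ('a-zA-Z', 'n-za-mN-ZA-M');
    my $text = 'Hello';
    # \$text keeps the variable name out of the string interpolation,
    # so the eval'd code operates on the lexical $text itself.
    eval "\$text =~ tr/$oldlist/$newlist/, 1" or die $@;
    # $text is now 'Uryyb'
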
=back

=head2 I/O Operators

There are several I/O operators you should know about.
A string enclosed by back-ticks (grave accents) first undergoes
variable substitution just like a double quoted string. It is then
interpreted as a command, and the output of that command is the value
of the pseudo-literal, like in a shell. In a scalar context, a single
string consisting of all the output is returned. In a list context,
a list of values is returned, one for each line of output. (You can
set C<$/> to use a different line terminator.) The command is executed
each time the pseudo-literal is evaluated. The status value of the
command is returned in C<$?> (see L<perlvar> for the interpretation
of C<$?>). Unlike in B<csh>, no translation is done on the return
data: newlines remain newlines. Unlike in any of the shells, single
quotes do not hide variable names in the command from interpretation.
To pass a $ through to the shell you need to hide it with a backslash.
The generalized form of back-ticks is C<qx//>. (Because back-ticks
always undergo shell expansion as well, see L<perlsec> for
security concerns.)

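A short sketch of the context rules and of C<$?>, assuming a Unix-like
system whose shell provides C<echo> and C<exit>:

    my $all   = `echo one; echo two`;     # scalar: "one\ntwo\n"
    my @lines = `echo one; echo two`;     # list: ("one\n", "two\n")
    my $out   = `echo partial; exit 3`;   # output is still collected
    my $code  = $? >> 8;                  # exit status of the command: 3

The command's exit status sits in the high byte of C<$?>, which is why
it is shifted right by eight bits here.
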
Evaluating a filehandle in angle brackets yields the next line from
that file (newline, if any, included), or C<undef> at end of file.
Ordinarily you must assign that value to a variable, but there is one
situation where an automatic assignment happens. I<If and ONLY if> the
input symbol is the only thing inside the conditional of a C<while> or
C<for(;;)> loop, the value is automatically assigned to the variable
C<$_>. The assigned value is then tested to see if it is defined.
(This may seem like an odd thing to you, but you'll use the construct
in almost every Perl script you write.) Anyway, the following lines
are equivalent to each other:

    while (defined($_ = <STDIN>)) { print; }
    while (<STDIN>) { print; }
    for (;<STDIN>;) { print; }
    print while defined($_ = <STDIN>);
    print while <STDIN>;

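The definedness test matters when the last line of a file is C<"0"> with
no trailing newline: that value is false but defined, so a plain truth
test would drop it. The sketch below demonstrates this with a
scalar-backed (in-memory) filehandle, a feature of later perls (5.8 and
up) that is assumed here only to keep the example self-contained.

    my $data = "first\n0";               # last line is "0", no newline
    open(my $in, '<', \$data) or die "can't open in-memory handle: $!";
    my @seen;
    while (<$in>) {                      # tests defined($_ = <$in>)
        push @seen, $_;
    }
    close($in);
    # @seen is ("first\n", "0"): the final "0" is not lost
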
The filehandles STDIN, STDOUT, and STDERR are predefined. (The
filehandles C<stdin>, C<stdout>, and C<stderr> will also work except in
packages, where they would be interpreted as local identifiers rather
than global.) Additional filehandles may be created with the open()
function. See L<perlfunc/open()> for details on this.

If a E<lt>FILEHANDLEE<gt> is used in a context that is looking for a list, a
list consisting of all the input lines is returned, one line per list
element. It's easy to make a I<LARGE> data space this way, so use with
care.

The null filehandle E<lt>E<gt> is special and can be used to emulate the
behavior of B<sed> and B<awk>. Input from E<lt>E<gt> comes either from
standard input, or from each file listed on the command line. Here's
how it works: the first time E<lt>E<gt> is evaluated, the @ARGV array is
checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
gives you standard input. The @ARGV array is then processed as a list
of filenames. The loop

    while (<>) {
        ...                     # code for each line
    }

is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...                 # code for each line
        }
    }

except that it isn't so cumbersome to say, and will actually work. It
really does shift array @ARGV and put the current filename into variable
$ARGV. It also uses filehandle I<ARGV> internally; E<lt>E<gt> is just a
synonym for E<lt>ARGVE<gt>, which is magical. (The pseudo code above
doesn't work because it treats E<lt>ARGVE<gt> as non-magical.)

You can modify @ARGV before the first E<lt>E<gt> as long as the array ends up
containing the list of filenames you really want. Line numbers (C<$.>)
continue as if the input were one big happy file. (But see the example
under C<eof> in L<perlfunc> for how to reset line numbers on each file.)

If you want to set @ARGV to your own list of files, go right ahead. If
you want to pass switches into your script, you can use one of the
Getopt modules (such as Getopt::Std or Getopt::Long) or put a loop on
the front like this:

    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        ...             # other switches
    }
    while (<>) {
        ...             # code for each line
    }

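With the standard Getopt::Std module, the same two switches might be
handled as in this sketch; the option letters mirror the loop above, and
the contents of C<@ARGV> are made up for the example:

    use Getopt::Std;

    local @ARGV = ('-v', '-D', 'parse', 'input.txt');
    my %opts;
    getopts('vD:', \%opts) or die "usage: $0 [-v] [-D flags] [file ...]\n";
    # %opts is now (v => 1, D => 'parse'); @ARGV is ('input.txt'),
    # ready to be read through <>.
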
The E<lt>E<gt> symbol will return FALSE only once. If you call it again after
this, it will assume you are processing another @ARGV list, and if you
haven't set @ARGV, will input from STDIN.

If the string inside the angle brackets is a simple scalar variable
(e.g., E<lt>$fooE<gt>), then that variable contains the name of the
filehandle to input from, or a reference to the same. For example:

    $fh = \*STDIN;
    $line = <$fh>;

If the string inside angle brackets is not a filehandle or a scalar
variable containing a filehandle name or reference, then it is interpreted
as a filename pattern to be globbed, and either a list of filenames or the
next filename in the list is returned, depending on context. One level of
$ interpretation is done first, but you can't say C<E<lt>$fooE<gt>>
because that's an indirect filehandle as explained in the previous
paragraph. (In older versions of Perl, programmers would insert curly
brackets to force interpretation as a filename glob: C<E<lt>${foo}E<gt>>.
These days, it's considered cleaner to call the internal function directly
as C<glob($foo)>, which is probably the right way to have done it in the
first place.) Example:

    while (<*.c>) {
        chmod 0644, $_;
    }

is equivalent to

    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
        chop;
        chmod 0644, $_;
    }

In fact, it's currently implemented that way. (Which means it will not
work on filenames with spaces in them unless you have csh(1) on your
machine.) Of course, the shortest way to do the above is:

    chmod 0644, <*.c>;

Because globbing invokes a shell, it's often faster to call readdir() yourself
and do your own grep() on the filenames. Furthermore, due to its current
implementation of using a shell, the glob() routine may get "Arg list too
long" errors (unless you've installed tcsh(1L) as F</bin/csh>).

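One way to do that, sketched here as a hypothetical C<c_files> routine,
is to read the directory and filter the names with grep(). Unlike a shell
glob, readdir() also returns names beginning with a dot, and the names it
returns do not include the directory part.

    # Return the sorted names of plain files ending in ".c" in $dir.
    sub c_files {
        my ($dir) = @_;
        opendir(my $dh, $dir) or die "can't opendir $dir: $!";
        my @files = grep { /\.c\z/ && -f "$dir/$_" } readdir($dh);
        closedir($dh);
        return sort @files;
    }
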
A glob evaluates its (embedded) argument only when it is starting a new
list. All values must be read before it will start over. In a list
context this isn't important, because you automatically get them all
anyway. In a scalar context, however, the operator returns the next value
each time it is called, or a FALSE value if you've just run out. Again,
FALSE is returned only once. So if you're expecting a single value from
a glob, it is much better to say

    ($file) = <blurch*>;

than

    $file = <blurch*>;

because the latter will alternate between returning a filename and
returning FALSE.

If you're trying to do variable interpolation, it's definitely better
to use the glob() function, because the older notation can cause people
to become confused with the indirect filehandle notation.

    @files = glob("$dir/*.[ch]");
    @files = glob($files[$i]);

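The scalar-context behavior can be seen without any files at all. This
sketch relies on brace expansion: in later perls, whose glob() is
implemented by the File::Glob module rather than a shell, a pattern with
no wildcard characters is returned as-is, so the names need not exist.

    my @all = glob('{alpha,beta,gamma}');      # list context: all three

    my @one_by_one;
    while (defined(my $name = glob('{alpha,beta,gamma}'))) {
        push @one_by_one, $name;               # scalar context: one per call
    }
    # Both arrays hold ('alpha', 'beta', 'gamma'); the loop stops at the
    # single FALSE (undefined) value that follows the last name.
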
=head2 Constant Folding

Like C, Perl does a certain amount of expression evaluation at
compile time, whenever it determines that all of the arguments to an
operator are static and have no side effects. In particular, string
concatenation happens at compile time between literals that don't do
variable substitution. Backslash interpretation also happens at
compile time. You can say

    'Now is the time for all' . "\n" .
        'good men to come to.'

and this all reduces to one string internally. Likewise, if
you say

    foreach $file (@filenames) {
        if (-s $file > 5 + 100 * 2**16) { ... }
    }

the compiler will pre-compute the number that
expression represents so that the interpreter
won't have to.

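One way to watch the folding happen is the B::Deparse module (distributed
with later perls), which turns compiled code back into Perl source. In
this sketch the arithmetic from the example above comes back as a single
precomputed constant:

    use B::Deparse;

    my $deparse = B::Deparse->new;
    my $source  = $deparse->coderef2text(sub { return 5 + 100 * 2**16 });
    # $source contains "return 6553605;" rather than the original expression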

=head2 Integer Arithmetic

By default Perl assumes that it must do most of its arithmetic in
floating point. But by saying

    use integer;

you may tell the compiler that it's okay to use integer operations
from here to the end of the enclosing BLOCK. An inner BLOCK may
countermand this by saying

    no integer;

which lasts until the end of that BLOCK.

The bitwise operators ("&", "|", "^", "~", "<<", and ">>") always
produce integral results. However, C<use integer> still has meaning
for them. By default, their results are interpreted as unsigned
integers. However, if C<use integer> is in effect, their results are
interpreted as signed integers. For example, C<~0> usually evaluates
to a large integral value. However, C<use integer; ~0> is -1.

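A small sketch of the difference, with results as they would typically
appear on a machine with two's-complement integers:

    my $float_div = 10 / 3;          # 3.333...
    my $unsigned  = ~0;              # a large positive value
    my ($int_div, $signed);
    {
        use integer;
        $int_div = 10 / 3;           # 3: the fraction is discarded
        $signed  = ~0;               # -1
    }
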
=head2 Floating-point Arithmetic

While C<use integer> provides integer-only arithmetic, there is no
similar way to provide rounding or truncation at a certain number of
decimal places. For rounding to a certain number of digits, sprintf()
or printf() is usually the easiest route.

The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
functions. The Math::Complex module (part of the standard perl
distribution) defines a number of mathematical functions that can also
work on real numbers. Math::Complex is not as efficient as POSIX, but
POSIX can't work with complex numbers.

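For example (a sketch; the values are arbitrary), sprintf() rounds to a
given number of decimal places, while POSIX's floor() and ceil() round
toward negative and positive infinity respectively:

    use POSIX qw(floor ceil);

    my $pi      = 3.14159265;
    my $rounded = sprintf("%.2f", $pi);   # "3.14"
    my $down    = floor(-2.5);            # -3
    my $up      = ceil(-2.5);             # -2
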
Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.
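
As a sketch of that advice, the hypothetical C<round_to_cents> routine
below rounds a decimal amount to two places, half away from zero, by
working on the digits of its string form. This sidesteps binary
floating point, in which a value such as 1.005 is stored as slightly
less than 1.005, so that C<sprintf("%.2f", 1.005)> typically yields
C<1.00>. The plain-decimal input format it accepts is an assumption of
the sketch.

    # Round a decimal string such as "2.675" or "-3.14159" to cents,
    # half away from zero, using only integer arithmetic on its digits.
    sub round_to_cents {
        my ($amount) = @_;
        my ($sign, $int, $frac) = $amount =~ /^([-+]?)(\d+)(?:\.(\d*))?$/
            or die "not a plain decimal amount: $amount\n";
        $frac = '' unless defined $frac;
        $frac .= '000';                          # pad to at least 3 digits
        my $cents = $int * 100 + substr($frac, 0, 2);
        $cents++ if substr($frac, 2, 1) >= 5;    # remainder >= half a cent
        my $result = sprintf("%d.%02d", int($cents / 100), $cents % 100);
        return ($sign eq '-' && $cents) ? "-$result" : $result;
    }

With it, C<round_to_cents("1.005")> gives C<1.01> and
C<round_to_cents("2.675")> gives C<2.68>.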