a0d0e21e 1=head1 NAME
2
3perlop - Perl operators and precedence
4
5=head1 SYNOPSIS
6
7Perl operators have the following associativity and precedence,
8listed from highest precedence to lowest. Note that all operators
9borrowed from C keep the same precedence relationship with each other,
10even where C's precedence is slightly screwy. (This makes learning
c07a80fd 11Perl easier for C folks.) With very few exceptions, these all
12operate on scalar values only, not array values.
a0d0e21e 13
14 left terms and list operators (leftward)
15 left ->
16 nonassoc ++ --
17 right **
18 right ! ~ \ and unary + and -
19 left =~ !~
20 left * / % x
21 left + - .
22 left << >>
23 nonassoc named unary operators
24 nonassoc < > <= >= lt gt le ge
25 nonassoc == != <=> eq ne cmp
26 left &
27 left | ^
28 left &&
29 left ||
30 nonassoc ..
31 right ?:
32 right = += -= *= etc.
33 left , =>
34 nonassoc list operators (rightward)
a5f75d66 35 right not
a0d0e21e 36 left and
37 left or xor
38
39In the following sections, these operators are covered in precedence order.
40
cb1a09d0 41=head1 DESCRIPTION
a0d0e21e 42
43=head2 Terms and List Operators (Leftward)
44
A TERM has the highest precedence in Perl. These include variables,
5f05dabc 46quote and quote-like operators, any expression in parentheses,
a0d0e21e 47and any function whose arguments are parenthesized. Actually, there
48aren't really functions in this sense, just list operators and unary
49operators behaving as functions because you put parentheses around
50the arguments. These are all documented in L<perlfunc>.
51
52If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
53is followed by a left parenthesis as the next token, the operator and
54arguments within parentheses are taken to be of highest precedence,
55just like a normal function call.
56
57In the absence of parentheses, the precedence of list operators such as
58C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you look at the left side of the operator or the right side of it.
60For example, in
61
62 @ary = (1, 3, sort 4, 2);
63 print @ary; # prints 1324
64
65the commas on the right of the sort are evaluated before the sort, but
66the commas on the left are evaluated after. In other words, list
67operators tend to gobble up all the arguments that follow them, and
68then act like a simple TERM with regard to the preceding expression.
5f05dabc 69Note that you have to be careful with parentheses:
a0d0e21e 70
71 # These evaluate exit before doing the print:
72 print($foo, exit); # Obviously not what you want.
73 print $foo, exit; # Nor is this.
74
75 # These do the print before evaluating exit:
76 (print $foo), exit; # This is what you want.
77 print($foo), exit; # Or this.
78 print ($foo), exit; # Or even this.
79
80Also note that
81
82 print ($foo & 255) + 1, "\n";
83
84probably doesn't do what you expect at first glance. See
85L<Named Unary Operators> for more discussion of this.
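
The leftward and rightward behaviour of list operators can be checked with a small sketch. It reuses the C<sort> example above and adds C<join>; the variable names are illustrative only.

```perl
use strict;
use warnings;

# sort gobbles everything to its right (4, 2); the commas to its
# left are applied afterwards, giving the list (1, 3, 2, 4).
my @ary    = (1, 3, sort 4, 2);
my $joined = join '', @ary;

# Parentheses directly after a list operator delimit its arguments,
# so the trailing 3 is not passed to join.
my @with_parens = (join('-', 1, 2), 3);

# Without parentheses, join takes every remaining argument.
my @without_parens = (join '-', 1, 2, 3);

print "$joined\n";            # 1324
print "@with_parens\n";       # 1-2 3
print "@without_parens\n";    # 1-2-3
```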
86
87Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
88well as subroutine and method calls, and the anonymous
89constructors C<[]> and C<{}>.
90
2ae324a7 91See also L<Quote and Quote-like Operators> toward the end of this section,
c07a80fd 92as well as L<"I/O Operators">.
a0d0e21e 93
94=head2 The Arrow Operator
95
96Just as in C and C++, "C<-E<gt>>" is an infix dereference operator. If the
97right side is either a C<[...]> or C<{...}> subscript, then the left side
98must be either a hard or symbolic reference to an array or hash (or
99a location capable of holding a hard reference, if it's an lvalue (assignable)).
100See L<perlref>.
101
102Otherwise, the right side is a method name or a simple scalar variable
103containing the method name, and the left side must either be an object
104(a blessed reference) or a class name (that is, a package name).
105See L<perlobj>.
106
5f05dabc 107=head2 Auto-increment and Auto-decrement
a0d0e21e 108
109"++" and "--" work as in C. That is, if placed before a variable, they
110increment or decrement the variable before returning the value, and if
111placed after, increment or decrement the variable after returning the value.
112
5f05dabc 113The auto-increment operator has a little extra built-in magic to it. If
a0d0e21e 114you increment a variable that is numeric, or that has ever been used in
115a numeric context, you get a normal increment. If, however, the
5f05dabc 116variable has been used in only string contexts since it was set, and
a0d0e21e 117has a value that is not null and matches the pattern
118C</^[a-zA-Z]*[0-9]*$/>, the increment is done as a string, preserving each
119character within its range, with carry:
120
121 print ++($foo = '99'); # prints '100'
122 print ++($foo = 'a0'); # prints 'a1'
123 print ++($foo = 'Az'); # prints 'Ba'
124 print ++($foo = 'zz'); # prints 'aaa'
125
5f05dabc 126The auto-decrement operator is not magical.
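
A short sketch of the carry behaviour, and of the fact that decrement simply treats the string as a number. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $id = 'a9';
$id++;                        # digit wraps and carries into the letter

my $code = 'Zz';
$code++;                      # carry out of the leftmost character

my $label = 'ab';
{
    no warnings 'numeric';    # 'ab' is not a number; silence any complaint
    $label--;                 # not magical: 'ab' counts as 0, minus 1
}

print "$id $code $label\n";   # b0 AAa -1
```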
a0d0e21e 127
128=head2 Exponentiation
129
130Binary "**" is the exponentiation operator. Note that it binds even more
cb1a09d0 131tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
132implemented using C's pow(3) function, which actually works on doubles
133internally.)
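
The precedence relative to unary minus, and the right associativity listed in the table above, can be seen in a minimal sketch:

```perl
use strict;
use warnings;

my $negated = -2 ** 4;        # -(2 ** 4)
my $grouped = (-2) ** 4;      # parentheses force the other reading
my $tower   = 2 ** 3 ** 2;    # right associative: 2 ** (3 ** 2)

print "$negated $grouped $tower\n";   # -16 16 512
```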
a0d0e21e 134
135=head2 Symbolic Unary Operators
136
5f05dabc 137Unary "!" performs logical negation, i.e., "not". See also C<not> for a lower
a0d0e21e 138precedence version of this.
139
140Unary "-" performs arithmetic negation if the operand is numeric. If
141the operand is an identifier, a string consisting of a minus sign
142concatenated with the identifier is returned. Otherwise, if the string
143starts with a plus or minus, a string starting with the opposite sign
144is returned. One effect of these rules is that C<-bareword> is equivalent
145to C<"-bareword">.
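
The three cases for unary minus might be sketched as follows, using quoted strings rather than barewords so the example runs under C<use strict>:

```perl
use strict;
use warnings;

my $number  = -"10";      # numeric operand: arithmetic negation
my $ident   = -"foo";     # identifier: a minus sign is prepended
my $flipped = -"-foo";    # leading minus becomes a plus
my $back    = -"+foo";    # leading plus becomes a minus

print "$number $ident $flipped $back\n";   # -10 -foo +foo -foo
```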
146
5f05dabc 147Unary "~" performs bitwise negation, i.e., 1's complement.
55497cff 148(See also L<Integer Arithmetic>.)
a0d0e21e 149
150Unary "+" has no effect whatsoever, even on strings. It is useful
151syntactically for separating a function name from a parenthesized expression
152that would otherwise be interpreted as the complete list of function
5ba421f6 153arguments. (See examples above under L<Terms and List Operators (Leftward)>.)
a0d0e21e 154
155Unary "\" creates a reference to whatever follows it. See L<perlref>.
156Do not confuse this behavior with the behavior of backslash within a
157string, although both forms do convey the notion of protecting the next
158thing from interpretation.
159
160=head2 Binding Operators
161
c07a80fd 162Binary "=~" binds a scalar expression to a pattern match. Certain operations
cb1a09d0 163search or modify the string $_ by default. This operator makes that kind
164of operation work on some other string. The right argument is a search
165pattern, substitution, or translation. The left argument is what is
166supposed to be searched, substituted, or translated instead of the default
167$_. The return value indicates the success of the operation. (If the
168right argument is an expression rather than a search pattern,
169substitution, or translation, it is interpreted as a search pattern at run
time. This can be less efficient than an explicit search, because the
pattern must be compiled every time the expression is evaluated.)
a0d0e21e 172
173Binary "!~" is just like "=~" except the return value is negated in
174the logical sense.
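
As a rough illustration of binding, the sketch below matches against a named variable instead of C<$_>, uses an expression as the right argument, and binds a substitution to a copy. All names are invented for the example.

```perl
use strict;
use warnings;

my $line = 'error: disk full';

my $is_error = ($line =~ /^error/)   ? 1 : 0;   # pattern bound to $line
my $not_warn = ($line !~ /^warning/) ? 1 : 0;   # negated sense

# Right argument is an expression: interpreted as a pattern at run time.
my $pattern = '^err' . 'or';
my $dynamic = ($line =~ $pattern) ? 1 : 0;

# Bind a substitution to a copy, leaving the original alone.
(my $copy = $line) =~ s/disk/tape/;

print "$is_error $not_warn $dynamic\n";   # 1 1 1
print "$copy\n";                          # error: tape full
```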
175
176=head2 Multiplicative Operators
177
178Binary "*" multiplies two numbers.
179
180Binary "/" divides two numbers.
181
182Binary "%" computes the modulus of the two numbers.
183
184Binary "x" is the repetition operator. In a scalar context, it
185returns a string consisting of the left operand repeated the number of
186times specified by the right operand. In a list context, if the left
5f05dabc 187operand is a list in parentheses, it repeats the list.
a0d0e21e 188
189 print '-' x 80; # print row of dashes
190
191 print "\t" x ($tab/8), ' ' x ($tab%8); # tab over
192
193 @ones = (1) x 80; # a list of 80 1's
194 @ones = (5) x @ones; # set all elements to 5
195
196
197=head2 Additive Operators
198
199Binary "+" returns the sum of two numbers.
200
201Binary "-" returns the difference of two numbers.
202
203Binary "." concatenates two strings.
204
205=head2 Shift Operators
206
55497cff 207Binary "<<" returns the value of its left argument shifted left by the
208number of bits specified by the right argument. Arguments should be
209integers. (See also L<Integer Arithmetic>.)
a0d0e21e 210
55497cff 211Binary ">>" returns the value of its left argument shifted right by
212the number of bits specified by the right argument. Arguments should
213be integers. (See also L<Integer Arithmetic>.)
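
A minimal sketch with small integer operands:

```perl
use strict;
use warnings;

my $left  = 1 << 4;        # 1 shifted left by 4 bits
my $right = 256 >> 2;      # 256 shifted right by 2 bits
my $kib   = 3 << 10;       # same as 3 * 1024 for small values

print "$left $right $kib\n";   # 16 64 3072
```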
a0d0e21e 214
215=head2 Named Unary Operators
216
217The various named unary operators are treated as functions with one
218argument, with optional parentheses. These include the filetest
219operators, like C<-f>, C<-M>, etc. See L<perlfunc>.
220
221If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
222is followed by a left parenthesis as the next token, the operator and
223arguments within parentheses are taken to be of highest precedence,
224just like a normal function call. Examples:
225
226 chdir $foo || die; # (chdir $foo) || die
227 chdir($foo) || die; # (chdir $foo) || die
228 chdir ($foo) || die; # (chdir $foo) || die
229 chdir +($foo) || die; # (chdir $foo) || die
230
231but, because * is higher precedence than ||:
232
233 chdir $foo * 20; # chdir ($foo * 20)
234 chdir($foo) * 20; # (chdir $foo) * 20
235 chdir ($foo) * 20; # (chdir $foo) * 20
236 chdir +($foo) * 20; # chdir ($foo * 20)
237
238 rand 10 * 20; # rand (10 * 20)
239 rand(10) * 20; # (rand 10) * 20
240 rand (10) * 20; # (rand 10) * 20
241 rand +(10) * 20; # rand (10 * 20)
242
5ba421f6 243See also L<"Terms and List Operators (Leftward)">.
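
The same rules can be observed with operators that return something easy to print. The sketch below uses C<lc> and C<ref>, both named unary operators; concatenation binds more tightly than a named unary operator, while C<eq> binds less tightly.

```perl
use strict;
use warnings;

my $whole  = lc "AB" . "CD";      # lc("AB" . "CD")
my $first  = lc("AB") . "CD";     # (lc "AB") . "CD"
my $plus   = lc +("AB") . "CD";   # unary plus: lc(("AB") . "CD")

my $aref     = [1, 2];
my $is_array = (ref $aref eq 'ARRAY') ? 1 : 0;   # (ref $aref) eq 'ARRAY'

print "$whole $first $plus $is_array\n";   # abcd abCD abcd 1
```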
a0d0e21e 244
245=head2 Relational Operators
246
6ee5d4e7 247Binary "E<lt>" returns true if the left argument is numerically less than
a0d0e21e 248the right argument.
249
6ee5d4e7 250Binary "E<gt>" returns true if the left argument is numerically greater
a0d0e21e 251than the right argument.
252
6ee5d4e7 253Binary "E<lt>=" returns true if the left argument is numerically less than
a0d0e21e 254or equal to the right argument.
255
6ee5d4e7 256Binary "E<gt>=" returns true if the left argument is numerically greater
a0d0e21e 257than or equal to the right argument.
258
259Binary "lt" returns true if the left argument is stringwise less than
260the right argument.
261
262Binary "gt" returns true if the left argument is stringwise greater
263than the right argument.
264
265Binary "le" returns true if the left argument is stringwise less than
266or equal to the right argument.
267
268Binary "ge" returns true if the left argument is stringwise greater
269than or equal to the right argument.
270
271=head2 Equality Operators
272
273Binary "==" returns true if the left argument is numerically equal to
274the right argument.
275
276Binary "!=" returns true if the left argument is numerically not equal
277to the right argument.
278
6ee5d4e7 279Binary "E<lt>=E<gt>" returns -1, 0, or 1 depending on whether the left
280argument is numerically less than, equal to, or greater than the right
281argument.
a0d0e21e 282
283Binary "eq" returns true if the left argument is stringwise equal to
284the right argument.
285
286Binary "ne" returns true if the left argument is stringwise not equal
287to the right argument.
288
289Binary "cmp" returns -1, 0, or 1 depending on whether the left argument is stringwise
290less than, equal to, or greater than the right argument.
291
a034a98d 292"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
293by the current locale if C<use locale> is in effect. See L<perllocale>.
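
The difference between the numeric and stringwise operators shows up as soon as the operands have different lengths. A small sketch, assuming no C<use locale>:

```perl
use strict;
use warnings;

my $num_lt = (10 < 9)        ? 1 : 0;   # numeric: 10 is not less than 9
my $str_lt = ("10" lt "9")   ? 1 : 0;   # stringwise: "1" sorts before "9"
my $num_eq = ("1.0" == "1")  ? 1 : 0;   # same number
my $str_eq = ("1.0" eq "1")  ? 1 : 0;   # different strings

my @by_number = sort { $a <=> $b } (10, 9, 100);
my @by_string = sort { $a cmp $b } (10, 9, 100);

print "$num_lt $str_lt $num_eq $str_eq\n";   # 0 1 1 0
print "@by_number\n";                        # 9 10 100
print "@by_string\n";                        # 10 100 9
```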
294
a0d0e21e 295=head2 Bitwise And
296
Binary "&" returns its operands ANDed together bit by bit.
55497cff 298(See also L<Integer Arithmetic>.)
a0d0e21e 299
300=head2 Bitwise Or and Exclusive Or
301
Binary "|" returns its operands ORed together bit by bit.
55497cff 303(See also L<Integer Arithmetic>.)
a0d0e21e 304
Binary "^" returns its operands XORed together bit by bit.
55497cff 306(See also L<Integer Arithmetic>.)
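
A sketch of the three bitwise operators on small integers. The flag values are hypothetical and chosen only for the illustration.

```perl
use strict;
use warnings;

my $and = 12 & 10;    # 1100 & 1010 = 1000
my $or  = 12 | 10;    # 1100 | 1010 = 1110
my $xor = 12 ^ 10;    # 1100 ^ 1010 = 0110

# Hypothetical permission flags, one bit each.
my ($READ, $WRITE, $EXEC) = (1, 2, 4);
my $mode      = $READ | $EXEC;     # set two flags
my $can_write = $mode & $WRITE;    # test a flag that is not set
my $toggled   = $mode ^ $EXEC;     # flip one flag off

print "$and $or $xor\n";                 # 8 14 6
print "$mode $can_write $toggled\n";     # 5 0 1
```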
a0d0e21e 307
308=head2 C-style Logical And
309
310Binary "&&" performs a short-circuit logical AND operation. That is,
311if the left operand is false, the right operand is not even evaluated.
312Scalar or list context propagates down to the right operand if it
313is evaluated.
314
315=head2 C-style Logical Or
316
317Binary "||" performs a short-circuit logical OR operation. That is,
318if the left operand is true, the right operand is not even evaluated.
319Scalar or list context propagates down to the right operand if it
320is evaluated.
321
322The C<||> and C<&&> operators differ from C's in that, rather than returning
3230 or 1, they return the last value evaluated. Thus, a reasonably portable
324way to find out the home directory (assuming it's not "0") might be:
325
326 $home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
327 (getpwuid($<))[7] || die "You're homeless!\n";
328
329As more readable alternatives to C<&&> and C<||>, Perl provides "and" and
330"or" operators (see below). The short-circuit behavior is identical. The
331precedence of "and" and "or" is much lower, however, so that you can
332safely use them after a list operator without the need for
333parentheses:
334
335 unlink "alpha", "beta", "gamma"
336 or gripe(), next LINE;
337
338With the C-style operators that would have been written like this:
339
340 unlink("alpha", "beta", "gamma")
341 || (gripe(), next LINE);
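
The points above can be tried out in a short sketch. C<%env> stands in for C<%ENV>, and C<note> is a hypothetical user-defined list operator, so the output does not depend on the machine.

```perl
use strict;
use warnings;

# || and && return the last value evaluated, not 0 or 1.
my %env  = (LOGDIR => '/var/tmp');              # stand-in for %ENV
my $home = $env{HOME} || $env{LOGDIR} || '/';   # first true value
my $both = 'left' && 'right';                   # right operand
my $zero = 0 && 'never';                        # left operand

# A hypothetical list operator that records its arguments.
my @log;
sub note { push @log, [@_]; return scalar @_ }

note 'alpha', 'beta' or die "note failed\n";    # (note 'alpha', 'beta') or die
note 'alpha', 0 || 'gamma';                     # note('alpha', (0 || 'gamma'))

print "$home $both $zero\n";     # /var/tmp right 0
print "@{ $log[1] }\n";          # alpha gamma
```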
342
343=head2 Range Operator
344
345Binary ".." is the range operator, which is really two different
346operators depending on the context. In a list context, it returns an
347array of values counting (by ones) from the left value to the right
348value. This is useful for writing C<for (1..10)> loops and for doing
349slice operations on arrays. Be aware that under the current implementation,
350a temporary array is created, so you'll burn a lot of memory if you
351write something like this:
352
353 for (1 .. 1_000_000) {
354 # code
355 }
356
357In a scalar context, ".." returns a boolean value. The operator is
358bistable, like a flip-flop, and emulates the line-range (comma) operator
359of B<sed>, B<awk>, and various editors. Each ".." operator maintains its
360own boolean state. It is false as long as its left operand is false.
361Once the left operand is true, the range operator stays true until the
362right operand is true, I<AFTER> which the range operator becomes false
363again. (It doesn't become false till the next time the range operator is
364evaluated. It can test the right operand and become false on the same
365evaluation it became true (as in B<awk>), but it still returns true once.
366If you don't want it to test the right operand till the next evaluation
367(as in B<sed>), use three dots ("...") instead of two.) The right
368operand is not evaluated while the operator is in the "false" state, and
369the left operand is not evaluated while the operator is in the "true"
370state. The precedence is a little lower than || and &&. The value
371returned is either the null string for false, or a sequence number
372(beginning with 1) for true. The sequence number is reset for each range
373encountered. The final sequence number in a range has the string "E0"
374appended to it, which doesn't affect its numeric value, but gives you
375something to search for if you want to exclude the endpoint. You can
376exclude the beginning point by waiting for the sequence number to be
377greater than 1. If either operand of scalar ".." is a numeric literal,
378that operand is implicitly compared to the C<$.> variable, the current
379line number. Examples:
380
381As a scalar operator:
382
383 if (101 .. 200) { print; } # print 2nd hundred lines
384 next line if (1 .. /^$/); # skip header lines
385 s/^/> / if (/^$/ .. eof()); # quote body
386
387As a list operator:
388
389 for (101 .. 200) { print; } # print $_ 100 times
390 @foo = @foo[$[ .. $#foo]; # an expensive no-op
391 @foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
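
The flip-flop behaviour and the sequence numbers can be observed without reading a file. The sketch below runs scalar ".." over an in-memory list of made-up lines, using patterns rather than line numbers as operands.

```perl
use strict;
use warnings;

my @lines = ('head', 'BEGIN', 'x', 'y', 'END', 'tail');
my (@block, @seq, @inner);

for (@lines) {
    my $n = /^BEGIN$/ .. /^END$/;    # scalar context: flip-flop
    next unless $n;                  # null string while "false"
    push @block, $_;
    push @seq,   $n;
    # Skip the first line (sequence 1) and the last (ends in E0).
    push @inner, $_ if $n > 1 && $n !~ /E0$/;
}

print "@block\n";    # BEGIN x y END
print "@seq\n";      # 1 2 3 4E0
print "@inner\n";    # x y
```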
392
393The range operator (in a list context) makes use of the magical
5f05dabc 394auto-increment algorithm if the operands are strings. You
a0d0e21e 395can say
396
397 @alphabet = ('A' .. 'Z');
398
399to get all the letters of the alphabet, or
400
401 $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];
402
403to get a hexadecimal digit, or
404
405 @z2 = ('01' .. '31'); print $z2[$mday];
406
407to get dates with leading zeros. If the final value specified is not
408in the sequence that the magical increment would produce, the sequence
409goes until the next value would be longer than the final value
410specified.
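
A few list-context ranges, sketched with short sequences so the results are easy to check:

```perl
use strict;
use warnings;

my @letters = ('aa' .. 'ad');            # magical string increment
my @days    = ('08' .. '11');            # leading zero is preserved
my @hex     = (0 .. 9, 'a' .. 'f');      # two ranges in one list
my @last3   = @letters[$#letters - 2 .. $#letters];   # slice

print "@letters\n";         # aa ab ac ad
print "@days\n";            # 08 09 10 11
print scalar(@hex), "\n";   # 16
print "@last3\n";           # ab ac ad
```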
411
412=head2 Conditional Operator
413
414Ternary "?:" is the conditional operator, just as in C. It works much
415like an if-then-else. If the argument before the ? is true, the
416argument before the : is returned, otherwise the argument after the :
cb1a09d0 417is returned. For example:
418
419 printf "I have %d dog%s.\n", $n,
420 ($n == 1) ? '' : "s";
421
422Scalar or list context propagates downward into the 2nd
423or 3rd argument, whichever is selected.
424
425 $a = $ok ? $b : $c; # get a scalar
426 @a = $ok ? @b : @c; # get an array
427 $a = $ok ? @b : @c; # oops, that's just a count!
428
429The operator may be assigned to if both the 2nd and 3rd arguments are
430legal lvalues (meaning that you can assign to them):
a0d0e21e 431
432 ($a_or_b ? $a : $b) = $c;
433
cb1a09d0 434This is not necessarily guaranteed to contribute to the readability of your program.
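
Context propagation and assignment through the conditional operator can be sketched as follows; the variable names are illustrative.

```perl
use strict;
use warnings;

my $n   = 1;
my $msg = sprintf "I have %d dog%s.", $n, ($n == 1) ? '' : 's';

my @first  = (1, 2, 3);
my @second = (4, 5);
my $ok     = 1;

my @list  = $ok ? @first : @second;   # list context: the elements
my $count = $ok ? @first : @second;   # scalar context: just a count

my ($x, $y) = (0, 0);
($ok ? $x : $y) = 7;                  # assigns to $x because $ok is true

print "$msg\n";               # I have 1 dog.
print "@list | $count\n";     # 1 2 3 | 3
print "$x $y\n";              # 7 0
```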
a0d0e21e 435
4633a7c4 436=head2 Assignment Operators
a0d0e21e 437
438"=" is the ordinary assignment operator.
439
440Assignment operators work as in C. That is,
441
442 $a += 2;
443
444is equivalent to
445
446 $a = $a + 2;
447
448although without duplicating any side effects that dereferencing the lvalue
449might trigger, such as from tie(). Other assignment operators work similarly.
450The following are recognized:
451
452 **= += *= &= <<= &&=
453 -= /= |= >>= ||=
454 .= %= ^=
455 x=
456
457Note that while these are grouped by family, they all have the precedence
458of assignment.
459
460Unlike in C, the assignment operator produces a valid lvalue. Modifying
461an assignment is equivalent to doing the assignment and then modifying
462the variable that was assigned to. This is useful for modifying
463a copy of something, like this:
464
465 ($tmp = $global) =~ tr [A-Z] [a-z];
466
467Likewise,
468
469 ($a += 2) *= 3;
470
471is equivalent to
472
473 $a += 2;
474 $a *= 3;
475
748a9306 476=head2 Comma Operator
a0d0e21e 477
478Binary "," is the comma operator. In a scalar context it evaluates
479its left argument, throws that value away, then evaluates its right
480argument and returns that value. This is just like C's comma operator.
481
482In a list context, it's just the list argument separator, and inserts
483both its arguments into the list.
484
6ee5d4e7 485The =E<gt> digraph is mostly just a synonym for the comma operator. It's useful for
cb1a09d0 486documenting arguments that come in pairs. As of release 5.001, it also forces
4633a7c4 487any word to the left of it to be interpreted as a string.
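
The two roles of the comma, and the quoting effect of C<=E<gt>>, might be sketched like this. The subroutines C<first> and C<second> are hypothetical helpers that record the order in which they are called.

```perl
use strict;
use warnings;

my @calls;
sub first  { push @calls, 'first';  return 'a' }
sub second { push @calls, 'second'; return 'b' }

# Scalar context: evaluate the left side, discard it, return the right.
my $result = (first(), second());

# List context: the comma just separates list elements.
my @both = (first(), second());

# => quotes the bare word on its left.
my @pair = (name => 'perl');

print "$result\n";    # b
print "@both\n";      # a b
print "@pair\n";      # name perl
```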
748a9306 488
a0d0e21e 489=head2 List Operators (Rightward)
490
The right side of a list operator has very low precedence,
492such that it controls all comma-separated expressions found there.
493The only operators with lower precedence are the logical operators
494"and", "or", and "not", which may be used to evaluate calls to list
495operators without the need for extra parentheses:
496
497 open HANDLE, "filename"
498 or die "Can't open: $!\n";
499
5ba421f6 500See also discussion of list operators in L<Terms and List Operators (Leftward)>.
a0d0e21e 501
502=head2 Logical Not
503
504Unary "not" returns the logical negation of the expression to its right.
505It's the equivalent of "!" except for the very low precedence.
506
507=head2 Logical And
508
509Binary "and" returns the logical conjunction of the two surrounding
510expressions. It's equivalent to && except for the very low
5f05dabc 511precedence. This means that it short-circuits: i.e., the right
a0d0e21e 512expression is evaluated only if the left expression is true.
513
=head2 Logical Or and Exclusive Or
515
516Binary "or" returns the logical disjunction of the two surrounding
517expressions. It's equivalent to || except for the very low
5f05dabc 518precedence. This means that it short-circuits: i.e., the right
a0d0e21e 519expression is evaluated only if the left expression is false.
520
521Binary "xor" returns the exclusive-OR of the two surrounding expressions.
522It cannot short circuit, of course.
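
A small sketch of the low-precedence operators. C<mark> is a hypothetical helper that records which operands were actually evaluated.

```perl
use strict;
use warnings;

# "not" takes in the whole expression to its right; "!" only the next term.
my $low  = (not 0 || 1) ? 1 : 0;    # not (0 || 1)
my $high = (!0 || 1)    ? 1 : 0;    # (!0) || 1

my @seen;
sub mark { push @seen, $_[0]; return $_[1] }

my $skipped = (0 and mark('A', 1));             # right side never runs
my $either  = (mark('L', 1) xor mark('R', 1));  # both sides always run

print "$low $high\n";              # 0 1
print "@seen\n";                   # L R
print $either ? 1 : 0, "\n";       # 0
```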
523
524=head2 C Operators Missing From Perl
525
526Here is what C has that Perl doesn't:
527
528=over 8
529
530=item unary &
531
532Address-of operator. (But see the "\" operator for taking a reference.)
533
534=item unary *
535
536Dereference-address operator. (Perl's prefix dereferencing
537operators are typed: $, @, %, and &.)
538
539=item (TYPE)
540
541Type casting operator.
542
543=back
544
5f05dabc 545=head2 Quote and Quote-like Operators
a0d0e21e 546
547While we usually think of quotes as literal values, in Perl they
548function as operators, providing various kinds of interpolating and
549pattern matching capabilities. Perl provides customary quote characters
550for these behaviors, but also provides a way for you to choose your
551quote character for any of them. In the following table, a C<{}> represents
552any pair of delimiters you choose. Non-bracketing delimiters use
553the same character fore and aft, but the 4 sorts of brackets
554(round, angle, square, curly) will all nest.
555
556 Customary Generic Meaning Interpolates
557 '' q{} Literal no
558 "" qq{} Literal yes
559 `` qx{} Command yes
560 qw{} Word list no
561 // m{} Pattern match yes
562 s{}{} Substitution yes
563 tr{}{} Translation no
564
cb1a09d0 565For constructs that do interpolation, variables beginning with "C<$>" or "C<@>"
a0d0e21e 566are interpolated, as are the following sequences:
567
6ee5d4e7 568 \t tab (HT, TAB)
569 \n newline (LF, NL)
570 \r return (CR)
571 \f form feed (FF)
572 \b backspace (BS)
573 \a alarm (bell) (BEL)
574 \e escape (ESC)
a0d0e21e 575 \033 octal char
576 \x1b hex char
577 \c[ control char
578 \l lowercase next char
579 \u uppercase next char
580 \L lowercase till \E
581 \U uppercase till \E
582 \E end case modification
583 \Q quote regexp metacharacters till \E
584
a034a98d 585If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
and C<\U> is taken from the current locale. See L<perllocale>.
587
a0d0e21e 588Patterns are subject to an additional level of interpretation as a
589regular expression. This is done as a second pass, after variables are
590interpolated, so that regular expressions may be incorporated into the
591pattern from the variables. If this is not what you want, use C<\Q> to
592interpolate a variable literally.
593
594Apart from the above, there are no multiple levels of interpolation. In
5f05dabc 595particular, contrary to the expectations of shell programmers, back-quotes
a0d0e21e 596do I<NOT> interpolate within double quotes, nor do single quotes impede
597evaluation of variables when used within double quotes.
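
The case-modification escapes, C<\Q>, and the single level of interpolation can be checked with a sketch like the one below (assuming no C<use locale>; the variable names are invented).

```perl
use strict;
use warnings;

my $name = 'world';

my $cap   = "hello, \u$name";        # uppercase next character
my $upper = "\Uloud\E and quiet";    # uppercase until \E
my $lower = "\LSHOUT\E";             # lowercase until \E

# In a pattern, an interpolated '.' is a metacharacter unless quoted.
my $dot  = 'a.b';
my $hit  = ('axb' =~ /^$dot\z/)      ? 1 : 0;
my $miss = ('axb' =~ /^\Q$dot\E\z/)  ? 1 : 0;

# Single quotes and back-quotes are ordinary characters in double quotes.
my $mixed = "'$name' and `$name`";

print "$cap | $upper | $lower\n";   # hello, World | LOUD and quiet | shout
print "$hit $miss\n";               # 1 0
print "$mixed\n";                   # 'world' and `world`
```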
598
5f05dabc 599=head2 Regexp Quote-Like Operators
cb1a09d0 600
5f05dabc 601Here are the quote-like operators that apply to pattern
cb1a09d0 602matching and related activities.
603
a0d0e21e 604=over 8
605
606=item ?PATTERN?
607
608This is just like the C</pattern/> search, except that it matches only
609once between calls to the reset() operator. This is a useful
5f05dabc 610optimization when you want to see only the first occurrence of
a0d0e21e 611something in each file of a set of files, for instance. Only C<??>
612patterns local to the current package are reset.
613
614This usage is vaguely deprecated, and may be removed in some future
615version of Perl.
616
617=item m/PATTERN/gimosx
618
619=item /PATTERN/gimosx
620
621Searches a string for a pattern match, and in a scalar context returns
622true (1) or false (''). If no string is specified via the C<=~> or
623C<!~> operator, the $_ string is searched. (The string specified with
624C<=~> need not be an lvalue--it may be the result of an expression
625evaluation, but remember the C<=~> binds rather tightly.) See also
626L<perlre>.
a034a98d 627See L<perllocale> for discussion of additional considerations which apply
628when C<use locale> is in effect.
a0d0e21e 629
630Options are:
631
5f05dabc 632 g Match globally, i.e., find all occurrences.
a0d0e21e 633 i Do case-insensitive pattern matching.
634 m Treat string as multiple lines.
5f05dabc 635 o Compile pattern only once.
a0d0e21e 636 s Treat string as single line.
637 x Use extended regular expressions.
638
639If "/" is the delimiter then the initial C<m> is optional. With the C<m>
640you can use any pair of non-alphanumeric, non-whitespace characters as
641delimiters. This is particularly useful for matching Unix path names
642that contain "/", to avoid LTS (leaning toothpick syndrome).
643
644PATTERN may contain variables, which will be interpolated (and the
645pattern recompiled) every time the pattern search is evaluated. (Note
646that C<$)> and C<$|> might not be interpolated because they look like
647end-of-string tests.) If you want such a pattern to be compiled only
648once, add a C</o> after the trailing delimiter. This avoids expensive
649run-time recompilations, and is useful when the value you are
650interpolating won't change over the life of the script. However, mentioning
651C</o> constitutes a promise that you won't change the variables in the pattern.
652If you change them, Perl won't even notice.
653
4633a7c4 654If the PATTERN evaluates to a null string, the last
655successfully executed regular expression is used instead.
a0d0e21e 656
657If used in a context that requires a list value, a pattern match returns a
658list consisting of the subexpressions matched by the parentheses in the
5f05dabc 659pattern, i.e., (C<$1>, $2, $3...). (Note that here $1 etc. are also set, and
a0d0e21e 660that this differs from Perl 4's behavior.) If the match fails, a null
661array is returned. If the match succeeds, but there were no parentheses,
662a list value of (1) is returned.
663
664Examples:
665
666 open(TTY, '/dev/tty');
667 <TTY> =~ /^y/i && foo(); # do foo if desired
668
669 if (/Version: *([0-9.]*)/) { $version = $1; }
670
671 next if m#^/usr/spool/uucp#;
672
673 # poor man's grep
674 $arg = shift;
675 while (<>) {
676 print if /$arg/o; # compile only once
677 }
678
679 if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
680
681This last example splits $foo into the first two words and the
5f05dabc 682remainder of the line, and assigns those three fields to $F1, $F2, and
683$Etc. The conditional is true if any variables were assigned, i.e., if
a0d0e21e 684the pattern matched.
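
The list-context return values described above can be sketched with fixed strings:

```perl
use strict;
use warnings;

# Parentheses in the pattern: the captured substrings are returned.
my ($key, $value) = 'color=blue' =~ /^(\w+)=(\w+)$/;

# No parentheses: a successful match returns the list (1).
my @no_parens = ('abc' =~ /b/);

# Failed match: an empty list.
my @failed = ('abc' =~ /z/);

print "$key $value\n";                               # color blue
print scalar(@no_parens), " $no_parens[0]\n";        # 1 1
print scalar(@failed), "\n";                         # 0
```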
685
686The C</g> modifier specifies global pattern matching--that is, matching
687as many times as possible within the string. How it behaves depends on
688the context. In a list context, it returns a list of all the
689substrings matched by all the parentheses in the regular expression.
690If there are no parentheses, it returns a list of all the matched
691strings, as if there were parentheses around the whole pattern.
692
693In a scalar context, C<m//g> iterates through the string, returning TRUE
694each time it matches, and FALSE when it eventually runs out of
695matches. (In other words, it remembers where it left off last time and
696restarts the search at that point. You can actually find the current
44a8e56a 697match position of a string or set it using the pos() function--see
698L<perlfunc/pos>.) Note that you can use this feature to stack C<m//g>
699matches or intermix C<m//g> matches with C<m/\G.../>.
700
a0d0e21e 701If you modify the string in any way, the match position is reset to the
702beginning. Examples:
703
704 # list context
705 ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
706
707 # scalar context
5f05dabc 708 $/ = ""; $* = 1; # $* deprecated in modern perls
a0d0e21e 709 while ($paragraph = <>) {
710 while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
711 $sentences++;
712 }
713 }
714 print "$sentences\n";
715
44a8e56a 716 # using m//g with \G
717 $_ = "ppooqppq";
718 while ($i++ < 2) {
719 print "1: '";
720 print $1 while /(o)/g; print "', pos=", pos, "\n";
721 print "2: '";
722 print $1 if /\G(q)/; print "', pos=", pos, "\n";
723 print "3: '";
724 print $1 while /(p)/g; print "', pos=", pos, "\n";
725 }
726
727The last example should print:
728
729 1: 'oo', pos=4
730 2: 'q', pos=4
731 3: 'pp', pos=7
732 1: '', pos=7
733 2: 'q', pos=7
734 3: '', pos=7
735
736Note how C<m//g> matches change the value reported by C<pos()>, but the
737non-global match doesn't.
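
A compact sketch of C</g> in both contexts, using fixed strings so the results are predictable:

```perl
use strict;
use warnings;

# List context: all captures from all matches, in order.
my $text = 'a1b22c333';
my @nums = $text =~ /(\d+)/g;

# Two captures per match flatten into key/value pairs.
my %pair = 'x=1,y=2' =~ /(\w)=(\d)/g;

# Scalar context: one match per evaluation; pos() reports where it ended.
my $s = 'aXbXc';
my @ends;
while ($s =~ /X/g) {
    push @ends, pos($s);
}

print "@nums\n";               # 1 22 333
print "$pair{x} $pair{y}\n";   # 1 2
print "@ends\n";               # 2 4
```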
738
e7ea3e70 739A useful idiom for C<lex>-like scanners is C</\G.../g>. You can
740combine several regexps like this to process a string part-by-part,
741doing different actions depending on which regexp matched. The next
742regexp would step in at the place the previous one left off.
743
744 $_ = <<'EOL';
745 $url = new URI::URL "http://www/"; die if $url eq "xXx";
746EOL
747 LOOP:
748 {
749 print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/g;
750 print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/g;
751 print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/g;
752 print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/g;
753 print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/g;
754 print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/g;
755 print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/g;
756 print ". That's all!\n";
757 }
758
759Here is the output (split into several lines):
760
761 line-noise lowercase line-noise lowercase UPPERCASE line-noise
762 UPPERCASE line-noise lowercase line-noise lowercase line-noise
763 lowercase lowercase line-noise lowercase lowercase line-noise
764 MiXeD line-noise. That's all!
44a8e56a 765
a0d0e21e 766=item q/STRING/
767
768=item C<'STRING'>
769
A single-quoted, literal string. A backslash stands for itself unless
it is followed by the delimiter or another backslash, in which case
only the delimiter or a single backslash is kept.
773
774 $foo = q!I said, "You said, 'She said it.'"!;
775 $bar = q('This is it.');
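
The backslash rule for single-quoted strings can be seen in a short sketch:

```perl
use strict;
use warnings;

my $apostrophe = 'it\'s';                 # \' yields the delimiter
my $two_chars  = 'a\nb';                  # \n is a backslash and an n
my $path       = 'c:\\dir';               # \\ yields one backslash
my $nested     = q{nested {braces} ok};   # bracketing delimiters nest
my $no_interp  = '$notavar';              # no variable interpolation

print "$apostrophe\n";               # it's
print length($two_chars), "\n";      # 4
print length($path), "\n";           # 6
print "$nested\n";                   # nested {braces} ok
print "$no_interp\n";                # $notavar
```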
776
777=item qq/STRING/
778
779=item "STRING"
780
781A double-quoted, interpolated string.
782
783 $_ .= qq
784 (*** The previous line contains the naughty word "$1".\n)
785 if /(tcl|rexx|python)/; # :-)
786
787=item qx/STRING/
788
789=item `STRING`
790
791A string which is interpolated and then executed as a system command.
792The collected standard output of the command is returned. In scalar
793context, it comes back as a single (potentially multi-line) string.
794In list context, returns a list of lines (however you've defined lines
795with $/ or $INPUT_RECORD_SEPARATOR).
796
797 $today = qx{ date };
798
799See L<I/O Operators> for more discussion.
800
801=item qw/STRING/
802
803Returns a list of the words extracted out of STRING, using embedded
804whitespace as the word delimiters. It is exactly equivalent to
805
806 split(' ', q/STRING/);
807
808Some frequently seen examples:
809
810 use POSIX qw( setlocale localeconv )
811 @EXPORT = qw( foo bar baz );
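
The stated equivalence with C<split> can be checked directly; the word list is arbitrary.

```perl
use strict;
use warnings;

my @words = qw( foo bar baz );
my @split = split(' ', q/ foo bar baz /);

print "@words\n";                                    # foo bar baz
print "@words" eq "@split" ? "same\n" : "differ\n";  # same
print scalar(@words), "\n";                          # 3
```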
812
813=item s/PATTERN/REPLACEMENT/egimosx
814
815Searches a string for a pattern, and if found, replaces that pattern
816with the replacement text and returns the number of substitutions
e37d713d 817made. Otherwise it returns false (specifically, the empty string).
a0d0e21e 818
819If no string is specified via the C<=~> or C<!~> operator, the C<$_>
820variable is searched and modified. (The string specified with C<=~> must
821be a scalar variable, an array element, a hash element, or an assignment
5f05dabc 822to one of those, i.e., an lvalue.)
a0d0e21e 823
824If the delimiter chosen is single quote, no variable interpolation is
825done on either the PATTERN or the REPLACEMENT. Otherwise, if the
826PATTERN contains a $ that looks like a variable rather than an
827end-of-string test, the variable will be interpolated into the pattern
5f05dabc 828at run-time. If you want the pattern compiled only once the first time
a0d0e21e 829the variable is interpolated, use the C</o> option. If the pattern
4633a7c4 830evaluates to a null string, the last successfully executed regular
a0d0e21e 831expression is used instead. See L<perlre> for further explanation on these.
a034a98d 832See L<perllocale> for discussion of additional considerations which apply
833when C<use locale> is in effect.
a0d0e21e 834
835Options are:
836
837 e Evaluate the right side as an expression.
5f05dabc 838 g Replace globally, i.e., all occurrences.
a0d0e21e 839 i Do case-insensitive pattern matching.
840 m Treat string as multiple lines.
5f05dabc 841 o Compile pattern only once.
a0d0e21e 842 s Treat string as single line.
843 x Use extended regular expressions.
844
845Any non-alphanumeric, non-whitespace delimiter may replace the
846slashes. If single quotes are used, no interpretation is done on the
e37d713d 847replacement string (the C</e> modifier overrides this, however). Unlike
5f05dabc 848Perl 4, Perl 5 treats back-ticks as normal delimiters; the replacement
e37d713d 849text is not evaluated as a command. If the
a0d0e21e 850PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
5f05dabc 851pair of quotes, which may or may not be bracketing quotes, e.g.,
a0d0e21e 852C<s(foo)(bar)> or C<sE<lt>fooE<gt>/bar/>. A C</e> will cause the
853replacement portion to be interpreted as a full-fledged Perl expression
854and eval()ed right then and there. It is, however, syntax checked at
855compile-time.
856
857Examples:
858
859 s/\bgreen\b/mauve/g; # don't change wintergreen
860
861 $path =~ s|/usr/bin|/usr/local/bin|;
862
863 s/Login: $foo/Login: $bar/; # run-time pattern
864
865 ($foo = $bar) =~ s/this/that/;
866
867 $count = ($paragraph =~ s/Mister\b/Mr./g);
868
869 $_ = 'abc123xyz';
870 s/\d+/$&*2/e; # yields 'abc246xyz'
871 s/\d+/sprintf("%5d",$&)/e; # yields 'abc  246xyz'
872 s/\w/$& x 2/eg; # yields 'aabbcc  224466xxyyzz'
873
874 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
875 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
876 s/^=(\w+)/&pod($1)/ge; # use function call
877
878 # /e's can even nest; this will expand
879 # simple embedded variables in $_
880 s/(\$\w+)/$1/eeg;
881
882 # Delete C comments.
883 $program =~ s {
4633a7c4 884 /\* # Match the opening delimiter.
885 .*? # Match a minimal number of characters.
886 \*/ # Match the closing delimiter.
a0d0e21e 887 } []gsx;
888
889 s/^\s*(.*?)\s*$/$1/; # trim white space
890
891 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
892
893Note the use of $ instead of \ in the last example. Unlike
5f05dabc 894B<sed>, we use the \E<lt>I<digit>E<gt> form in only the left hand side.
6ee5d4e7 895Anywhere else it's $E<lt>I<digit>E<gt>.
a0d0e21e 896
5f05dabc 897Occasionally, you can't use just a C</g> to get all the changes
a0d0e21e 898to occur. Here are two common cases:
899
900 # put commas in the right places in an integer
901 1 while s/(.*\d)(\d\d\d)/$1,$2/g; # perl4
902 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g; # perl5
903
904 # expand tabs to 8-column spacing
905 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
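
To tie together the return value, the copy-then-modify idiom, and the repeated-substitution loop, here is a short self-contained sketch. The sample strings and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $text = "Mister Smith met Mister Jones";

# With /g, the return value is the number of substitutions made.
my $count = ($text =~ s/Mister\b/Mr./g);

# When nothing matches, the result is false (the empty string).
my $none = ($text =~ s/Madam\b/Ms./);

# Copy first, then edit the copy; $text itself is left alone.
(my $copy = $text) =~ s/Smith/Brown/;

# One /g pass is not enough here, so loop until nothing changes.
my $n = "1234567";
1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

print "$count\n";   # 2
print "$copy\n";    # Mr. Brown met Mr. Jones
print "$n\n";       # 1,234,567
```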
906
907
908=item tr/SEARCHLIST/REPLACEMENTLIST/cds
909
910=item y/SEARCHLIST/REPLACEMENTLIST/cds
911
912Translates all occurrences of the characters found in the search list
913with the corresponding character in the replacement list. It returns
914the number of characters replaced or deleted. If no string is
915specified via the =~ or !~ operator, the $_ string is translated. (The
916string specified with =~ must be a scalar variable, an array element, a
5f05dabc 917hash element, or an assignment to one of those, i.e., an lvalue.) For B<sed> devotees,
a0d0e21e 918C<y> is provided as a synonym for C<tr>. If the SEARCHLIST is
919delimited by bracketing quotes, the REPLACEMENTLIST has its own pair of
5f05dabc 920quotes, which may or may not be bracketing quotes, e.g., C<tr[A-Z][a-z]>
a0d0e21e 921or C<tr(+-*/)/ABCD/>.
922
923Options:
924
925 c Complement the SEARCHLIST.
926 d Delete found but unreplaced characters.
927 s Squash duplicate replaced characters.
928
929If the C</c> modifier is specified, the SEARCHLIST character set is
930complemented. If the C</d> modifier is specified, any characters specified
931by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note
932that this is slightly more flexible than the behavior of some B<tr>
933programs, which delete anything they find in the SEARCHLIST, period.)
934If the C</s> modifier is specified, sequences of characters that were
935translated to the same character are squashed down to a single instance of the
936character.
937
938If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
939exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
940than the SEARCHLIST, the final character is replicated till it is long
941enough. If the REPLACEMENTLIST is null, the SEARCHLIST is replicated.
942This latter is useful for counting characters in a class or for
943squashing character sequences in a class.
944
945Examples:
946
947 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case
948
949 $cnt = tr/*/*/; # count the stars in $_
950
951 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
952
953 $cnt = tr/0-9//; # count the digits in $_
954
955 tr/a-zA-Z//s; # bookkeeper -> bokeper
956
957 ($HOST = $host) =~ tr/a-z/A-Z/;
958
959 tr/a-zA-Z/ /cs; # change non-alphas to single space
960
961 tr [\200-\377]
962 [\000-\177]; # delete 8th bit
963
748a9306 964If multiple translations are given for a character, only the first one is used:
965
966 tr/AAA/XYZ/
967
968will translate any A to X.
969
a0d0e21e 970Note that because the translation table is built at compile time, neither
971the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote
972interpolation. That means that if you want to use variables, you must use
973an eval():
974
975 eval "tr/$oldlist/$newlist/";
976 die $@ if $@;
977
978 eval "tr/$oldlist/$newlist/, 1" or die $@;
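
The following sketch pulls the counting, squashing, deleting, and eval() cases into one runnable piece. All variable names and sample data are assumptions made for the example.

```perl
use strict;
use warnings;

# Counting: the return value is the number of characters matched.
my $sky   = "* . * . *";
my $stars = ($sky =~ tr/*/*/);

# /s with an empty REPLACEMENTLIST squashes runs of the same letter.
(my $squashed = 'bookkeeper') =~ tr/a-zA-Z//s;

# /d with an empty REPLACEMENTLIST deletes everything in the SEARCHLIST.
(my $letters = 'a1b22c') =~ tr/0-9//d;

# The lists are fixed at compile time, so variables need a string eval.
my ($old, $new) = ('a-c', 'A-C');
my $word = 'abcdef';
eval "\$word =~ tr/$old/$new/, 1" or die $@;

print "$stars $squashed $letters $word\n";   # 3 bokeper abc ABCdef
```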
979
980=back
981
982=head2 I/O Operators
983
984There are several I/O operators you should know about.
5f05dabc 985A string enclosed by back-ticks (grave accents) first undergoes
a0d0e21e 986variable substitution just like a double quoted string. It is then
987interpreted as a command, and the output of that command is the value
988of the pseudo-literal, like in a shell. In a scalar context, a single
989string consisting of all the output is returned. In a list context,
990a list of values is returned, one for each line of output. (You can
991set C<$/> to use a different line terminator.) The command is executed
992each time the pseudo-literal is evaluated. The status value of the
993command is returned in C<$?> (see L<perlvar> for the interpretation
994of C<$?>). Unlike in B<csh>, no translation is done on the return
995data--newlines remain newlines. Unlike in any of the shells, single
996quotes do not hide variable names in the command from interpretation.
997To pass a $ through to the shell you need to hide it with a backslash.
5f05dabc 998The generalized form of back-ticks is C<qx//>. (Because back-ticks
cb1a09d0 999always undergo shell expansion as well, see L<perlsec> for
1000security concerns.)
a0d0e21e 1001
1002Evaluating a filehandle in angle brackets yields the next line from
aa689395 1003that file (newline, if any, included), or C<undef> at end of file.
1004Ordinarily you must assign that value to a variable, but there is one
1005situation where an automatic assignment happens. I<If and ONLY if> the
1006input symbol is the only thing inside the conditional of a C<while> or
1007C<for(;;)> loop, the value is automatically assigned to the variable
1008C<$_>. The assigned value is then tested to see if it is defined.
1009(This may seem like an odd thing to you, but you'll use the construct
1010in almost every Perl script you write.) Anyway, the following lines
1011are equivalent to each other:
a0d0e21e 1012
748a9306 1013 while (defined($_ = <STDIN>)) { print; }
a0d0e21e 1014 while (<STDIN>) { print; }
1015 for (;<STDIN>;) { print; }
748a9306 1016 print while defined($_ = <STDIN>);
a0d0e21e 1017 print while <STDIN>;
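
The definedness test matters when a line is false but still valid input, such as a final "0" with no trailing newline. The sketch below assumes a perl recent enough to open a filehandle on an in-memory string, which is used here only so the example needs no external file; C<$data>, C<$fh>, and C<@seen> are illustrative names.

```perl
use strict;
use warnings;

# The last "line" is the string "0" with no newline: false, but defined.
my $data = "1\n0";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my @seen;
while (<$fh>) {          # implicitly: while (defined($_ = <$fh>))
    chomp;
    push @seen, $_;
}
close($fh);

print scalar(@seen), " lines read\n";   # 2 lines read
```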
1018
5f05dabc 1019The filehandles STDIN, STDOUT, and STDERR are predefined. (The
1020filehandles C<stdin>, C<stdout>, and C<stderr> will also work except in
a0d0e21e 1021packages, where they would be interpreted as local identifiers rather
1022than global.) Additional filehandles may be created with the open()
cb1a09d0 1023function. See L<perlfunc/open()> for details on this.
a0d0e21e 1024
6ee5d4e7 1025If a E<lt>FILEHANDLEE<gt> is used in a context that is looking for a list, a
a0d0e21e 1026list consisting of all the input lines is returned, one line per list
1027element. It's easy to make a I<LARGE> data space this way, so use with
1028care.
1029
d28ebecd 1030The null filehandle E<lt>E<gt> is special and can be used to emulate the
1031behavior of B<sed> and B<awk>. Input from E<lt>E<gt> comes either from
a0d0e21e 1032standard input, or from each file listed on the command line. Here's
d28ebecd 1033how it works: the first time E<lt>E<gt> is evaluated, the @ARGV array is
a0d0e21e 1034checked, and if it is null, C<$ARGV[0]> is set to "-", which when opened
1035gives you standard input. The @ARGV array is then processed as a list
1036of filenames. The loop
1037
1038 while (<>) {
1039 ... # code for each line
1040 }
1041
1042is equivalent to the following Perl-like pseudo code:
1043
1044 unshift(@ARGV, '-') if $#ARGV < $[;
1045 while ($ARGV = shift) {
1046 open(ARGV, $ARGV);
1047 while (<ARGV>) {
1048 ... # code for each line
1049 }
1050 }
1051
1052except that it isn't so cumbersome to say, and will actually work. It
1053really does shift array @ARGV and put the current filename into variable
5f05dabc 1054$ARGV. It also uses filehandle I<ARGV> internally--E<lt>E<gt> is just a
1055synonym for E<lt>ARGVE<gt>, which is magical. (The pseudo code above
1056doesn't work because it treats E<lt>ARGVE<gt> as non-magical.)
a0d0e21e 1057
d28ebecd 1058You can modify @ARGV before the first E<lt>E<gt> as long as the array ends up
a0d0e21e 1059containing the list of filenames you really want. Line numbers (C<$.>)
1060continue as if the input were one big happy file. (But see example
1061under eof() for how to reset line numbers on each file.)
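
Here is a sketch of setting @ARGV by hand and watching C<$.> run on across files. It assumes the standard File::Temp module is available for creating two scratch files; the file contents and the C<@names> and C<@seen> variables are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small scratch files to stand in for command-line arguments.
my @names;
for my $body ("one\ntwo\n", "three\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $body;
    close($fh) or die "cannot close $name: $!";
    push @names, $name;
}

@ARGV = @names;          # must happen before the first <>

my @seen;
while (<>) {
    chomp;
    push @seen, "$.:$_"; # $. keeps counting across both files
}

print "@seen\n";                          # 1:one 2:two 3:three
print scalar(@ARGV), " files left\n";     # 0 files left
```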
1062
1063If you want to set @ARGV to your own list of files, go right ahead. If
1064you want to pass switches into your script, you can use one of the
1065Getopt modules or put a loop on the front like this:
1066
1067 while ($_ = $ARGV[0], /^-/) {
1068 shift;
1069 last if /^--$/;
1070 if (/^-D(.*)/) { $debug = $1 }
1071 if (/^-v/) { $verbose++ }
1072 ... # other switches
1073 }
1074 while (<>) {
1075 ... # code for each line
1076 }
1077
d28ebecd 1078The E<lt>E<gt> symbol will return FALSE only once. If you call it again after
a0d0e21e 1079this it will assume you are processing another @ARGV list, and if you
1080haven't set @ARGV, will input from STDIN.
1081
1082If the string inside the angle brackets is a reference to a scalar
5f05dabc 1083variable (e.g., E<lt>$fooE<gt>), then that variable contains the name of the
cb1a09d0 1084filehandle to input from, or a reference to the same. For example:
1085
1086 $fh = \*STDIN;
1087 $line = <$fh>;
a0d0e21e 1088
cb1a09d0 1089If the string inside angle brackets is not a filehandle or a scalar
1090variable containing a filehandle name or reference, then it is interpreted
4633a7c4 1091as a filename pattern to be globbed, and either a list of filenames or the
1092next filename in the list is returned, depending on context. One level of
1093$ interpretation is done first, but you can't say C<E<lt>$fooE<gt>>
1094because that's an indirect filehandle as explained in the previous
6ee5d4e7 1095paragraph. (In older versions of Perl, programmers would insert curly
4633a7c4 1096brackets to force interpretation as a filename glob: C<E<lt>${foo}E<gt>>.
d28ebecd 1097These days, it's considered cleaner to call the internal function directly
4633a7c4 1098as C<glob($foo)>, which is probably the right way to have done it in the
1099first place.) Example:
a0d0e21e 1100
1101 while (<*.c>) {
1102 chmod 0644, $_;
1103 }
1104
1105is equivalent to
1106
1107 open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
1108 while (<FOO>) {
1109 chop;
1110 chmod 0644, $_;
1111 }
1112
1113In fact, it's currently implemented that way. (Which means it will not
1114work on filenames with spaces in them unless you have csh(1) on your
1115machine.) Of course, the shortest way to do the above is:
1116
1117 chmod 0644, <*.c>;
1118
1119Because globbing invokes a shell, it's often faster to call readdir() yourself
5f05dabc 1120and do your own grep() on the filenames. Furthermore, due to its current
a0d0e21e 1121implementation of using a shell, the glob() routine may get "Arg list too
1122long" errors (unless you've installed tcsh(1L) as F</bin/csh>).
1123
5f05dabc 1124A glob evaluates its (embedded) argument only when it is starting a new
4633a7c4 1125list. All values must be read before it will start over. In a list
1126context this isn't important, because you automatically get them all
1127anyway. In a scalar context, however, the operator returns the next value
1128each time it is called, or a FALSE value if you've just run out. Again,
1129FALSE is returned only once. So if you're expecting a single value from
1130a glob, it is much better to say
1131
1132 ($file) = <blurch*>;
1133
1134than
1135
1136 $file = <blurch*>;
1137
1138because the latter will alternate between returning a filename and
1139returning FALSE.
1140
1141If you're trying to do variable interpolation, it's definitely better
1142to use the glob() function, because the older notation can cause people
e37d713d 1143to become confused with the indirect filehandle notation.
4633a7c4 1144
1145 @files = glob("$dir/*.[ch]");
1146 @files = glob($files[$i]);
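
A runnable sketch of glob() with an interpolated directory follows. It assumes File::Temp is available to create a scratch directory, and that the directory name contains no whitespace or glob metacharacters; the file names are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Populate a scratch directory with three empty files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.c b.c c.h)) {
    open(my $fh, '>', "$dir/$name") or die "cannot create $dir/$name: $!";
    close($fh);
}

# List context: all matches at once.
my @c_files = glob("$dir/*.c");

# A list assignment is the safe way to ask for just one match.
my ($first) = glob("$dir/*.[ch]");

print scalar(@c_files), " C files\n";   # 2 C files
print "first match: $first\n";
```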
1147
a0d0e21e 1148=head2 Constant Folding
1149
1150Like C, Perl does a certain amount of expression evaluation at
1151compile time, whenever it determines that all of the arguments to an
1152operator are static and have no side effects. In particular, string
1153concatenation happens at compile time between literals that don't do
1154variable substitution. Backslash interpretation also happens at
1155compile time. You can say
1156
1157 'Now is the time for all' . "\n" .
1158 'good men to come to.'
1159
1160and this all reduces to one string internally. Likewise, if
1161you say
1162
1163 foreach $file (@filenames) {
1164 if (-s $file > 5 + 100 * 2**16) { ... }
1165 }
1166
1167the compiler will pre-compute the number that
1168expression represents so that the interpreter
1169won't have to.
1170
1171
55497cff 1172=head2 Integer Arithmetic
a0d0e21e 1173
1174By default Perl assumes that it must do most of its arithmetic in
1175floating point. But by saying
1176
1177 use integer;
1178
1179you may tell the compiler that it's okay to use integer operations
1180from here to the end of the enclosing BLOCK. An inner BLOCK may
1181countermand this by saying
1182
1183 no integer;
1184
1185which lasts until the end of that BLOCK.
1186
55497cff 1187The bitwise operators ("&", "|", "^", "~", "<<", and ">>") always
1188produce integral results. However, C<use integer> still has meaning
1189for them. By default, their results are interpreted as unsigned
1190integers. However, if C<use integer> is in effect, their results are
5f05dabc 1191interpreted as signed integers. For example, C<~0> usually evaluates
55497cff 1192to a large integral value. However, C<use integer; ~0> is -1.
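
The block scoping and the signed interpretation of C<~0> can be seen side by side in a short sketch (variable names are illustrative only):

```perl
use strict;
use warnings;

my $float = 10 / 3;          # floating point by default

my ($int, $signed);
{
    use integer;             # in effect only until the end of this block
    $int    = 10 / 3;        # integer division
    $signed = ~0;            # interpreted as a signed integer
}

my $unsigned = ~0;           # outside the block: a large unsigned value

print "$int $signed\n";      # 3 -1
print "unsigned is positive\n" if $unsigned > 0;
```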
3c10ad8e 1193
1194=head2 Floating-point Arithmetic
1195
1196While C<use integer> provides integer-only arithmetic, there is no
1197similar way to provide rounding or truncation at a certain number of
1198decimal places. For rounding to a certain number of digits, sprintf()
1199or printf() is usually the easiest route.
1200
1201The POSIX module (part of the standard perl distribution) implements
1202ceil(), floor(), and a number of other mathematical and trigonometric
1203functions. The Math::Complex module (part of the standard perl
1204distribution) defines a number of mathematical functions that can also
1205work on real numbers. Math::Complex is not as efficient as POSIX, but
1206POSIX can't work with complex numbers.
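
A minimal sketch of both routes, rounding with sprintf() and stepping to the neighbouring integers with the POSIX functions. The sample values are arbitrary.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# sprintf() rounds to the requested number of decimal places and
# returns a string.
my $rounded = sprintf("%.2f", 3.14159);

# floor() goes toward negative infinity, ceil() toward positive infinity.
my $down = floor(-3.5);
my $up   = ceil(3.2);

print "$rounded $down $up\n";   # 3.14 -4 4
```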
1207
1208Rounding in financial applications can have serious implications, and
1209the rounding method used should be specified precisely. In these
1210cases, it probably pays not to trust whichever system rounding is
1211being used by Perl, but to instead implement the rounding function you
1212need yourself.