=head1 NAME

perlop - Perl operators and precedence

=head1 SYNOPSIS

Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy. (This makes learning Perl easier
for C folks.) With very few exceptions, these all operate on scalar
values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp
    left        &
    left        | ^
    left        &&
    left        || //
    nonassoc    .. ...
    right       ?:
    right       = += -= *= etc.
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor err

In the following sections, these operators are covered in precedence order.

Many operators can be overloaded for objects. See L<overload>.

=head1 DESCRIPTION

=head2 Terms and List Operators (Leftward)

A TERM has the highest precedence in Perl. TERMs include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort,
but the commas on the left are evaluated after. In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. See
L<Named Unary Operators> for more discussion of this.

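As a rough sketch of that last pitfall (the value of C<$foo> is an
arbitrary assumption made for illustration), either of the following
spellings lets C<print> see the whole expression as its argument list:

    my $foo = 258;                      # sample value; 258 & 255 is 2

    # Parenthesize the complete argument list...
    print(($foo & 255) + 1, "\n");      # prints 3

    # ...or use unary plus so "(" no longer opens the argument list.
    print +($foo & 255) + 1, "\n";      # prints 3
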
Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L<Quote and Quote-like Operators> toward the end of this section,
as well as L<"I/O Operators">.

=head2 The Arrow Operator

"C<< -> >>" is an infix dereference operator, just as it is in C
and C++. If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.) See L<perlreftut> and L<perlref>.

Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name). See L<perlobj>.

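The two uses of the arrow can be sketched as follows. The C<Counter>
class and all variable names are hypothetical, invented only for this
illustration:

    use strict;
    use warnings;

    my $aref = [10, 20, 30];
    my $href = { name => 'camel' };
    my $cref = sub { return "got @_" };

    print $aref->[1], "\n";         # prints 20
    print $href->{name}, "\n";      # prints camel
    print $cref->(1, 2), "\n";      # prints got 1 2

    # Method calls: the left side is a class name or an object.
    package Counter;
    sub new  { my ($class) = @_; return bless { n => 0 }, $class }
    sub bump { my ($self) = @_; return ++$self->{n} }

    package main;
    my $obj    = Counter->new;      # class name on the left
    my $method = 'bump';            # method name held in a scalar
    $obj->bump;
    print $obj->$method, "\n";      # prints 2
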
=head2 Auto-increment and Auto-decrement

"++" and "--" work as in C. That is, if placed before a variable, they
increment or decrement the variable before returning the value, and if
placed after, increment or decrement the variable after returning the value.

The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = '99');      # prints '100'
    print ++($foo = 'a0');      # prints 'a1'
    print ++($foo = 'Az');      # prints 'Ba'
    print ++($foo = 'zz');      # prints 'aaa'

The auto-decrement operator is not magical.

=head2 Exponentiation

Binary "**" is the exponentiation operator. It binds even more
tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)

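A short sketch of these binding rules, together with the right
associativity listed in the precedence table (the variable names are
illustrative only):

    my $neg   = -2 ** 4;            # -16, parsed as -(2 ** 4)
    my $pos   = (-2) ** 4;          # 16
    my $tower = 2 ** 3 ** 2;        # 512, parsed as 2 ** (3 ** 2)

    print "$neg $pos $tower\n";     # prints -16 16 512
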
=head2 Symbolic Unary Operators

Unary "!" performs logical negation, i.e., "not". See also C<not> for a lower
precedence version of this.

Unary "-" performs arithmetic negation if the operand is numeric. If
the operand is an identifier, a string consisting of a minus sign
concatenated with the identifier is returned. Otherwise, if the string
starts with a plus or minus, a string starting with the opposite sign
is returned. One effect of these rules is that C<-bareword> is equivalent
to C<"-bareword">.

Unary "~" performs bitwise negation, i.e., 1's complement. For
example, C<0666 & ~027> is 0640. (See also L<Integer Arithmetic> and
L<Bitwise String Operators>.) Note that the width of the result is
platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the & operator to mask off the excess bits.

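A minimal sketch of the masking advice; the variable names and the
choice of 8-bit and 32-bit masks are assumptions made for illustration:

    my $mode   = 0666 & ~027;       # octal 0640, as in the text
    my $low8   = ~5 & 0xFF;         # 250: keep only the low 8 bits
    my $mask32 = ~0 & 0xFFFFFFFF;   # same value on 32- and 64-bit builds

    printf "%o\n", $mode;           # prints 640
    print "$low8 $mask32\n";        # prints 250 4294967295
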
Unary "+" has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L<Terms and List Operators (Leftward)>.)

Unary "\" creates a reference to whatever follows it. See L<perlreftut>
and L<perlref>. Do not confuse this behavior with the behavior of
backslash within a string, although both forms do convey the notion
of protecting the next thing from interpolation.

=head2 Binding Operators

Binary "=~" binds a scalar expression to a pattern match. Certain operations
search or modify the string $_ by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_. When used in scalar context, the return value generally indicates the
success of the operation. Behavior in list context depends on the particular
operator. See L</"Regexp Quote-Like Operators"> for details.

If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time.

Binary "!~" is just like "=~" except the return value is negated in
the logical sense.

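The following sketch exercises each case described above. The strings
and variable names are made up for the example:

    my $text = "Hello, World";

    my $found = $text =~ /World/;           # true in scalar context
    (my $copy = $text) =~ s/World/Perl/;    # changes $copy, not $text
    my $pattern   = 'W.rld';                # an ordinary string...
    my $dynamic   = $text =~ $pattern;      # ...used as a pattern at run time
    my $no_digits = $text !~ /\d/;          # true: the match fails

    print "$copy\n";                        # prints Hello, Perl
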
=head2 Multiplicative Operators

Binary "*" multiplies two numbers.

Binary "/" divides two numbers.

Binary "%" computes the modulus of two numbers. Given integer
operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is
C<$a> minus the largest multiple of C<$b> that is not greater than
C<$a>. If C<$b> is negative, then C<$a % $b> is C<$a> minus the
smallest multiple of C<$b> that is not less than C<$a> (i.e. the
result will be less than or equal to zero).
Note that when C<use integer> is in scope, "%" gives you direct access
to the modulus operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
execute faster.

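The sign rules are easiest to see side by side. This sketch assumes
C<use integer> is I<not> in scope; the operand pairs are arbitrary:

    my @results;
    for my $pair ([7, 3], [-7, 3], [7, -3], [-7, -3]) {
        my ($x, $y) = @$pair;
        push @results, $x % $y;     # result takes the sign of the right operand
        printf "%2d %% %2d = %2d\n", $x, $y, $x % $y;
    }
    # @results is now (1, 2, -2, -1)
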
Binary "x" is the repetition operator. In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand. In list context, if the left operand is enclosed in
parentheses, it repeats the list.

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5


=head2 Additive Operators

Binary "+" returns the sum of two numbers.

Binary "-" returns the difference of two numbers.

Binary "." concatenates two strings.

=head2 Shift Operators

Binary "<<" returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also L<Integer Arithmetic>.)

Binary ">>" returns the value of its left argument shifted right by
the number of bits specified by the right argument. Arguments should
be integers. (See also L<Integer Arithmetic>.)

Note that both "<<" and ">>" in Perl are implemented directly using
"<<" and ">>" in C. If C<use integer> (see L<Integer Arithmetic>) is
in force then signed C integers are used, else unsigned C integers are
used. Either way, the implementation isn't going to generate results
larger than the size of the integer type Perl was built with (32 bits
or 64 bits).

The result of overflowing the range of the integers is undefined
because it is undefined also in C. In other words, using 32-bit
integers, C<< 1 << 32 >> is undefined. Shifting by a negative number
of bits is also undefined.

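A small sketch that stays inside the well-defined range (shift counts
below the integer width, C<use integer> not in scope); the names are
illustrative:

    my $left  = 1 << 4;                     # 16
    my $right = 256 >> 2;                   # 64
    my $high  = (1 << 31) & 0xFFFFFFFF;     # 2147483648, masked to 32 bits

    print "$left $right $high\n";           # prints 16 64 2147483648
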
=head2 Named Unary Operators

The various named unary operators are treated as functions with one
argument, with optional parentheses. These include the filetest
operators, like C<-f>, C<-M>, etc. See L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. For example,
because named unary operators are higher precedence than ||:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because * is higher precedence than named operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

See also L<"Terms and List Operators (Leftward)">.

=head2 Relational Operators

Binary "<" returns true if the left argument is numerically less than
the right argument.

Binary ">" returns true if the left argument is numerically greater
than the right argument.

Binary "<=" returns true if the left argument is numerically less than
or equal to the right argument.

Binary ">=" returns true if the left argument is numerically greater
than or equal to the right argument.

Binary "lt" returns true if the left argument is stringwise less than
the right argument.

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.

=head2 Equality Operators

Binary "==" returns true if the left argument is numerically equal to
the right argument.

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.

Binary "<=>" returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument. If your platform supports NaNs (not-a-numbers) as numeric
values, using them with "<=>" returns undef. NaN is not "<", "==", ">",
"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
returns true, as does NaN != anything else. If your platform doesn't
support NaNs then NaN is just a string with numeric value 0.

    perl -le '$a = NaN; print "No NaN support here" if $a == $a'
    perl -le '$a = NaN; print "NaN support here" if $a != $a'

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.

Binary "cmp" returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
by the current locale if C<use locale> is in effect. See L<perllocale>.

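The difference between the numeric and stringwise comparisons shows up
most clearly in a C<sort> block. A minimal sketch, assuming C<use locale>
is not in effect and using an arbitrary sample list:

    my @mixed = (10, 9, 100, 1);

    my @numeric    = sort { $a <=> $b } @mixed;     # 1 9 10 100
    my @stringwise = sort { $a cmp $b } @mixed;     # 1 10 100 9

    print "@numeric\n";
    print "@stringwise\n";
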
=head2 Bitwise And

Binary "&" returns its operands ANDed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "&" has lower priority than relational operators, so for example
the brackets are essential in a test like

    print "Even\n" if ($x & 1) == 0;

=head2 Bitwise Or and Exclusive Or

Binary "|" returns its operands ORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Binary "^" returns its operands XORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "|" and "^" have lower priority than relational operators, so
for example the brackets are essential in a test like

    print "false\n" if (8 | 2) != 10;

=head2 C-style Logical And

Binary "&&" performs a short-circuit logical AND operation. That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Or

Binary "||" performs a short-circuit logical OR operation. That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Defined-Or

Although it has no direct equivalent in C, Perl's C<//> operator is related
to its C-style or. In fact, it's exactly the same as C<||>, except that it
tests the left hand side's definedness instead of its truth. Thus, C<$a // $b>
is similar to C<defined($a) || $b> (except that it returns the value of C<$a>
rather than the value of C<defined($a)>) and is exactly equivalent to
C<defined($a) ? $a : $b>. This is very useful for providing default values
for variables. If you actually want to test if at least one of C<$a> and C<$b> is
defined, use C<defined($a // $b)>.

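To see why definedness matters for defaults, consider this sketch. It
assumes a perl in which C<//> is available (5.10 or later among released
versions), and the C<%opt> hash and its keys are hypothetical:

    use 5.010;                              # for the // operator

    my %opt = (retries => 0);

    my $with_or  = $opt{retries} || 3;      # 3: a legitimate 0 is lost
    my $with_dor = $opt{retries} // 3;      # 0: defined, so it is kept
    my $missing  = $opt{timeout} // 30;     # 30: key absent, default used

    print "$with_or $with_dor $missing\n";  # prints 3 0 30
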
The C<||>, C<//> and C<&&> operators differ from C's in that, rather than returning
0 or 1, they return the last value evaluated. Thus, a reasonably portable
way to find out the home directory might be:

    $home = $ENV{'HOME'} // $ENV{'LOGDIR'} //
        (getpwuid($<))[7] // die "You're homeless!\n";

In particular, this means that you shouldn't use this
for selecting between two aggregates for assignment:

    @a = @b || @c;              # this is wrong
    @a = scalar(@b) || @c;      # really meant this
    @a = @b ? @b : @c;          # this works fine, though

As more readable alternatives to C<&&>, C<//> and C<||> when used for
control flow, Perl provides C<and>, C<err> and C<or> operators (see below).
The short-circuit behavior is identical. The precedence of "and", "err"
and "or" is much lower, however, so that you can safely use them after a
list operator without the need for parentheses:

    unlink "alpha", "beta", "gamma"
            or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
            || (gripe(), next LINE);

Using "or" for assignment is unlikely to do what you want; see below.

=head2 Range Operators

Binary ".." is the range operator, which is really two different
operators depending on the context. In list context, it returns a
list of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list. The range operator is useful for writing
C<foreach (1..10)> loops and for doing slice operations on arrays. In
the current implementation, no temporary array is created when the
range operator is used as the expression in C<foreach> loops, but older
versions of Perl might burn a lot of memory when you write something
like this:

    for (1 .. 1_000_000) {
        # code
    }

The range operator also works on strings, using the magical auto-increment;
see below.

In scalar context, ".." returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma) operator
of B<sed>, B<awk>, and various editors. Each ".." operator maintains its
own boolean state. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again. It doesn't become false till the next time the range operator is
evaluated. It can test the right operand and become false on the same
evaluation it became true (as in B<awk>), but it still returns true once.
If you don't want it to test the right operand till the next
evaluation, as in B<sed>, just use three dots ("...") instead of
two. In all other regards, "..." behaves just like ".." does.

The right operand is not evaluated while the operator is in the
"false" state, and the left operand is not evaluated while the
operator is in the "true" state. The precedence is a little lower
than || and &&. The value returned is either the empty string for
false, or a sequence number (beginning with 1) for true. The
sequence number is reset for each range encountered. The final
sequence number in a range has the string "E0" appended to it, which
doesn't affect its numeric value, but gives you something to search
for if you want to exclude the endpoint. You can exclude the
beginning point by waiting for the sequence number to be greater
than 1. If either operand of scalar ".." is a constant expression,
that operand is implicitly compared to the C<$.> variable, the
current line number. Examples:

As a scalar operator:

    if (101 .. 200) { print; }  # print 2nd hundred lines
    next LINE if (1 .. /^$/);   # skip header lines
    s/^/> / if (/^$/ .. eof()); # quote body

    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof();
        # do something based on those
    } continue {
        close ARGV if eof;      # reset $. each file
    }

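The sequence numbers and the trailing "E0" described above can be made
visible with a small self-contained sketch. It reads from an in-memory
filehandle, so it assumes a perl that supports opening a scalar
reference; the six-line input is invented for the example:

    use strict;
    use warnings;

    my $text = join '', map { "line $_\n" } 1 .. 6;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

    my @states;
    while (my $line = <$fh>) {
        my $state = 2 .. 4;     # constant operands are compared with $.
        push @states, $state if $state;
    }
    close $fh;

    print "@states\n";          # prints 1 2 3E0
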
As a list operator:

    for (101 .. 200) { print; } # print $_ 100 times
    @foo = @foo[0 .. $#foo];    # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];      # slice last 5 items

The range operator (in list context) makes use of the magical
auto-increment algorithm if the operands are strings. You
can say

    @alphabet = ('A' .. 'Z');

to get all normal letters of the English alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[$mday];

to get dates with leading zeros. If the final value specified is not
in the sequence that the magical increment would produce, the sequence
goes until the next value would be longer than the final value
specified.

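The stopping rule in the last sentence is easier to follow with concrete
endpoints. In this sketch the endpoints are arbitrary choices:

    my @reached = ('x' .. 'ab');    # x y z aa ab: 'ab' is in the sequence
    my @cut_off = ('x' .. 'c');     # x y z: 'aa' would be longer than 'c'
    my @padded  = ('08' .. '11');   # 08 09 10 11: leading zero is kept

    print "@reached\n@cut_off\n@padded\n";
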
=head2 Conditional Operator

Ternary "?:" is the conditional operator, just as in C. It works much
like an if-then-else. If the argument before the ? is true, the
argument before the : is returned, otherwise the argument after the :
is returned. For example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble. For example, this:

    $a % 2 ? $a += 10 : $a += 2

Really means this:

    (($a % 2) ? ($a += 10) : $a) += 2

Rather than this:

    ($a % 2) ? ($a += 10) : ($a += 2)

That should probably be written more simply as:

    $a += ($a % 2) ? 10 : 2;

=head2 Assignment Operators

"=" is the ordinary assignment operator.

Assignment operators work as in C. That is,

    $a += 2;

is equivalent to

    $a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from tie(). Other assignment operators work similarly.
The following are recognized:

    **=    +=    *=    &=    <<=    &&=
           -=    /=    |=    >>=    ||=
           .=    %=    ^=
                 x=

Although these are grouped by family, they all have the precedence
of assignment.

Unlike in C, the scalar assignment operator produces a valid lvalue.
Modifying an assignment is equivalent to doing the assignment and
then modifying the variable that was assigned to. This is useful
for modifying a copy of something, like this:

    ($tmp = $global) =~ tr [A-Z] [a-z];

Likewise,

    ($a += 2) *= 3;

is equivalent to

    $a += 2;
    $a *= 3;

Similarly, a list assignment in list context produces the list of
lvalues assigned to, and a list assignment in scalar context returns
the number of elements produced by the expression on the right hand
side of the assignment.

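The scalar-context rule for list assignment is what makes the familiar
counting idiom work. A sketch with made-up data:

    my ($x, $y);
    my $n = (($x, $y) = (5, 6, 7));         # 3: counts the right side,
                                            # not the variables filled
    my $commas = () = "a,b,c,d" =~ /,/g;    # 3: counts matches, keeps none

    my $value = 1;
    ($value += 2) *= 3;                     # assignment result is an lvalue

    print "$n $commas $value\n";            # prints 3 3 9
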
=head2 Comma Operator

Binary "," is the comma operator. In scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value. This is just like C's comma operator.

In list context, it's just the list argument separator, and inserts
both its arguments into the list.

The => digraph is mostly just a synonym for the comma operator. It's useful for
documenting arguments that come in pairs. As of release 5.001, it also forces
any word to the left of it to be interpreted as a string.

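A brief sketch of the three behaviors; the variable names are
illustrative only:

    my $i = 0;
    my $last = ($i++, $i + 10);     # scalar context: left side is run
                                    # and discarded, so $last is 11
    my @both = ($i, $i + 10);       # list context: (1, 11)
    my %color = (red => 1, green => 2);     # => quotes the word on its left

    print "$last @both $color{red}\n";      # prints 11 1 11 1
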
=head2 List Operators (Rightward)

On the right side of a list operator, the comma has very low precedence,
such that it controls all comma-separated expressions found there.
The only operators with lower precedence are the logical operators
"and", "or", and "not", which may be used to evaluate calls to list
operators without the need for extra parentheses:

    open HANDLE, "filename"
        or die "Can't open: $!\n";

See also discussion of list operators in L<Terms and List Operators (Leftward)>.

=head2 Logical Not

Unary "not" returns the logical negation of the expression to its right.
It's the equivalent of "!" except for the very low precedence.

=head2 Logical And

Binary "and" returns the logical conjunction of the two surrounding
expressions. It's equivalent to && except for the very low
precedence. This means that it short-circuits: i.e., the right
expression is evaluated only if the left expression is true.

=head2 Logical or, Defined or, and Exclusive Or

Binary "or" returns the logical disjunction of the two surrounding
expressions. It's equivalent to || except for the very low precedence.
This makes it useful for control flow:

    print FH $data              or die "Can't write to FH: $!";

This means that it short-circuits: i.e., the right expression is evaluated
only if the left expression is false. Due to its precedence, you should
probably avoid using this for assignment, only for control flow.

    $a = $b or $c;              # bug: this is wrong
    ($a = $b) or $c;            # really means this
    $a = $b || $c;              # better written this way

However, when it's a list-context assignment and you're trying to use
"||" for control flow, you probably need "or" so that the assignment
takes higher precedence.

    @info = stat($file) || die;     # oops, scalar sense of stat!
    @info = stat($file) or die;     # better, now @info gets its due

Then again, you could always use parentheses.

Binary "err" is equivalent to C<//>--it's just like binary "or", except it tests
its left argument's definedness instead of its truth. There are two ways to
remember "err": either because many functions return C<undef> on an B<err>or,
or as a sort of correction: C<$a=($b err 'default')>

Binary "xor" returns the exclusive-OR of the two surrounding expressions.
It cannot short circuit, of course.

=head2 C Operators Missing From Perl

Here is what C has that Perl doesn't:

=over 8

=item unary &

Address-of operator. (But see the "\" operator for taking a reference.)

=item unary *

Dereference-address operator. (Perl's prefix dereferencing
operators are typed: $, @, %, and &.)

=item (TYPE)

Type-casting operator.

=back

=head2 Quote and Quote-like Operators

While we usually think of quotes as literal values, in Perl they
function as operators, providing various kinds of interpolating and
pattern matching capabilities. Perl provides customary quote characters
for these behaviors, but also provides a way for you to choose your
quote character for any of them. In the following table, a C<{}> represents
any pair of delimiters you choose.

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.

Non-bracketing delimiters use the same character fore and aft, but the four
sorts of brackets (round, angle, square, curly) will all nest, which means
that

    q{foo{bar}baz}

is the same as

    'foo{bar}baz'

Note, however, that this does not always work for quoting Perl code:

    $s = q{ if($a eq "}") ... }; # WRONG

is a syntax error. The C<Text::Balanced> module (from CPAN, and
starting from Perl 5.8 part of the standard distribution) is able
to do this properly.

There can be whitespace between the operator and the quoting
characters, except when C<#> is being used as the quoting character.
C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the
operator C<q> followed by a comment. Its argument will be taken
from the next line. This allows you to write:

    s {foo}  # Replace foo
      {bar}  # with bar.

The following escape sequences are available in constructs that interpolate
and in transliterations.

    \t          tab             (HT, TAB)
    \n          newline         (NL)
    \r          return          (CR)
    \f          form feed       (FF)
    \b          backspace       (BS)
    \a          alarm (bell)    (BEL)
    \e          escape          (ESC)
    \033        octal char      (ESC)
    \x1b        hex char        (ESC)
    \x{263a}    wide hex char   (SMILEY)
    \c[         control char    (ESC)
    \N{name}    named Unicode character

The following escape sequences are available in constructs that interpolate
but not in transliterations.

    \l          lowercase next char
    \u          uppercase next char
    \L          lowercase till \E
    \U          uppercase till \E
    \E          end case modification
    \Q          quote non-word characters till \E

If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
C<\u> and C<\U> is taken from the current locale. See L<perllocale>.
If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
C<\U> is as defined by Unicode. For documentation of C<\N{name}>,
see L<charnames>.

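A minimal sketch of the case-modification and quoting escapes, assuming
plain ASCII data and no C<use locale>; the strings are invented for the
example:

    my $name  = "camel";
    my $title = "\u$name";                  # Camel
    my $shout = "\U$name\E rides";          # CAMEL rides

    my $literal = "a.b";
    my $exact = "a.b" =~ /^\Q$literal\E$/ ? 1 : 0;  # 1
    my $loose = "axb" =~ /^\Q$literal\E$/ ? 1 : 0;  # 0: the dot is quoted

    print "$title, $shout, $exact $loose\n";        # prints Camel, CAMEL rides, 1 0
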
5a964f20 759All systems use the virtual C<"\n"> to represent a line terminator,
760called a "newline". There is no such thing as an unvarying, physical
19799a22 761newline character. It is only an illusion that the operating system,
5a964f20 762device drivers, C libraries, and Perl all conspire to preserve. Not all
763systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example,
764on a Mac, these are reversed, and on systems without line terminator,
765printing C<"\n"> may emit no actual data. In general, use C<"\n"> when
766you mean a "newline" for your system, but use the literal ASCII when you
767need an exact character. For example, most networking protocols expect
2a380090 768and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
5a964f20 769and although they often accept just C<"\012">, they seldom tolerate just
770C<"\015">. If you get in the habit of using C<"\n"> for networking,
771you may be burned some day.
772
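To make the advice above concrete, here is a small sketch that builds a protocol line with explicit octal escapes rather than C<"\n">. The request text and the C<$CRLF> name are only illustrative, and nothing is actually sent anywhere.

```perl
use strict;
use warnings;

# Exact CR and LF characters, independent of what "\n" means locally.
my $CRLF = "\015\012";

# Hypothetical request, used only to show the terminator in place.
my $request = "GET / HTTP/1.0" . $CRLF . $CRLF;

printf "%vd\n", $CRLF;           # 13.10
print length($request), "\n";    # 18
```
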
904501ec 773For constructs that do interpolate, variables beginning with "C<$>"
774or "C<@>" are interpolated. Subscripted variables such as C<$a[3]> or
ad0f383a 775C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
776But method calls such as C<< $obj->meth >> are not.
af9219ee 777
778Interpolating an array or slice interpolates the elements in order,
779separated by the value of C<$">, so is equivalent to interpolating
904501ec 780C<join $", @array>. "Punctuation" arrays such as C<@+> are only
781interpolated if the name is enclosed in braces C<@{+}>.
af9219ee 782
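The effect of C<$"> on array and slice interpolation can be sketched as follows. The array contents are arbitrary sample data.

```perl
use strict;
use warnings;

my @parts = qw(usr local bin);

my $default = "@parts";        # elements joined by the default single space

my $path;
{
    local $" = "/";            # change the list separator for this block only
    $path = "/@parts";
}

my $slice = "@parts[0,1]";     # slices interpolate the same way

print "$default\n";   # usr local bin
print "$path\n";      # /usr/local/bin
print "$slice\n";     # usr local
```

Using C<local> keeps the change from leaking into unrelated code that interpolates arrays.
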
1d2dff63 783You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
784An unescaped C<$> or C<@> interpolates the corresponding variable,
785while escaping will cause the literal string C<\$> to be inserted.
786You'll need to write something like C<m/\Quser\E\@\Qhost/>.
787
a0d0e21e 788Patterns are subject to an additional level of interpretation as a
789regular expression. This is done as a second pass, after variables are
790interpolated, so that regular expressions may be incorporated into the
791pattern from the variables. If this is not what you want, use C<\Q> to
792interpolate a variable literally.
793
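A short sketch of the difference C<\Q> makes when a variable is interpolated into a pattern. The sample strings are invented for the example.

```perl
use strict;
use warnings;

my $needle = "1+1";      # contains the regex metacharacter "+"
my $text   = "1+1=2";

# Without \Q, "+" acts as a quantifier: "one or more 1s, then another 1".
my $as_regex = $text =~ /^$needle/ ? 1 : 0;

# With \Q, the contents of the variable are matched literally.
my $as_literal = $text =~ /^\Q$needle\E/ ? 1 : 0;

print "regex: $as_regex, literal: $as_literal\n";   # regex: 0, literal: 1
```
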
19799a22 794Apart from the behavior described above, Perl does not expand
795multiple levels of interpolation. In particular, contrary to the
796expectations of shell programmers, back-quotes do I<NOT> interpolate
797within double quotes, nor do single quotes impede evaluation of
798variables when used within double quotes.
a0d0e21e 799
5f05dabc 800=head2 Regexp Quote-Like Operators
cb1a09d0 801
5f05dabc 802Here are the quote-like operators that apply to pattern
cb1a09d0 803matching and related activities.
804
a0d0e21e 805=over 8
806
807=item ?PATTERN?
808
809This is just like the C</pattern/> search, except that it matches only
810once between calls to the reset() operator. This is a useful
5f05dabc 811optimization when you want to see only the first occurrence of
a0d0e21e 812something in each file of a set of files, for instance. Only C<??>
813patterns local to the current package are reset.
814
5a964f20 815 while (<>) {
816 if (?^$?) {
817 # blank line between header and body
818 }
819 } continue {
820 reset if eof; # clear ?? status for next file
821 }
822
483b4840 823This usage is vaguely deprecated, which means it just might possibly
19799a22 824be removed in some distant future version of Perl, perhaps somewhere
825around the year 2168.
a0d0e21e 826
fb73857a 827=item m/PATTERN/cgimosx
a0d0e21e 828
fb73857a 829=item /PATTERN/cgimosx
a0d0e21e 830
5a964f20 831Searches a string for a pattern match, and in scalar context returns
19799a22 832true if it succeeds, false if it fails. If no string is specified
833via the C<=~> or C<!~> operator, the $_ string is searched. (The
834string specified with C<=~> need not be an lvalue--it may be the
835result of an expression evaluation, but remember the C<=~> binds
836rather tightly.) See also L<perlre>. See L<perllocale> for
837discussion of additional considerations that apply when C<use locale>
838is in effect.
a0d0e21e 839
840Options are:
841
fb73857a 842 c Do not reset search position on a failed match when /g is in effect.
5f05dabc 843 g Match globally, i.e., find all occurrences.
a0d0e21e 844 i Do case-insensitive pattern matching.
845 m Treat string as multiple lines.
5f05dabc 846 o Compile pattern only once.
a0d0e21e 847 s Treat string as single line.
848 x Use extended regular expressions.
849
850If "/" is the delimiter then the initial C<m> is optional. With the C<m>
01ae956f 851you can use any pair of non-alphanumeric, non-whitespace characters
19799a22 852as delimiters. This is particularly useful for matching path names
853that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
7bac28a0 854the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
19799a22 855If "'" is the delimiter, no interpolation is performed on the PATTERN.
a0d0e21e 856
857PATTERN may contain variables, which will be interpolated (and the
f70b4f9c 858pattern recompiled) every time the pattern search is evaluated, except
1f247705 859for when the delimiter is a single quote. (Note that C<$(>, C<$)>, and
860C<$|> are not interpolated because they look like end-of-string tests.)
f70b4f9c 861If you want such a pattern to be compiled only once, add a C</o> after
862the trailing delimiter. This avoids expensive run-time recompilations,
863and is useful when the value you are interpolating won't change over
864the life of the script. However, mentioning C</o> constitutes a promise
865that you won't change the variables in the pattern. If you change them,
13a2d996 866Perl won't even notice. See also L<"qr/STRING/imosx">.
a0d0e21e 867
5a964f20 868If the PATTERN evaluates to the empty string, the last
d65afb4b 869I<successfully> matched regular expression is used instead. In this
870case, only the C<g> and C<c> flags on the empty pattern are honoured -
871the other flags are taken from the original pattern. If no match has
872previously succeeded, this will (silently) act instead as a genuine
873empty pattern (which will always match).
a0d0e21e 874
c963b151 875Note that it's possible to confuse Perl into thinking C<//> (the empty
876regex) is really C<//> (the defined-or operator). Perl is usually pretty
877good about this, but some pathological cases might trigger this, such as
878C<$a///> (is that C<($a) / (//)> or C<$a // />?) and C<print $fh //>
879(C<print $fh(//> or C<print($fh //>?). In all of these examples, Perl
880will assume you meant defined-or. If you meant the empty regex, just
881use parentheses or spaces to disambiguate, or even prefix the empty
882regex with an C<m> (so C<//> becomes C<m//>).
883
19799a22 884If the C</g> option is not used, C<m//> in list context returns a
a0d0e21e 885list consisting of the subexpressions matched by the parentheses in the
f7e33566 886pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here C<$1> etc. are
887also set, and that this differs from Perl 4's behavior.) When there are
888no parentheses in the pattern, the return value is the list C<(1)> for
889success. With or without parentheses, an empty list is returned upon
890failure.
a0d0e21e 891
892Examples:
893
894 open(TTY, '/dev/tty');
895 <TTY> =~ /^y/i && foo(); # do foo if desired
896
897 if (/Version: *([0-9.]*)/) { $version = $1; }
898
899 next if m#^/usr/spool/uucp#;
900
901 # poor man's grep
902 $arg = shift;
903 while (<>) {
904 print if /$arg/o; # compile only once
905 }
906
907 if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
908
909This last example splits $foo into the first two words and the
5f05dabc 910remainder of the line, and assigns those three fields to $F1, $F2, and
911$Etc. The conditional is true if any variables were assigned, i.e., if
a0d0e21e 912the pattern matched.
913
19799a22 914The C</g> modifier specifies global pattern matching--that is,
915matching as many times as possible within the string. How it behaves
916depends on the context. In list context, it returns a list of the
917substrings matched by any capturing parentheses in the regular
918expression. If there are no parentheses, it returns a list of all
919the matched strings, as if there were parentheses around the whole
920pattern.
a0d0e21e 921
7e86de3e 922In scalar context, each execution of C<m//g> finds the next match,
19799a22 923returning true if it matches, and false if there is no further match.
7e86de3e 924The position after the last match can be read or set using the pos()
925function; see L<perlfunc/pos>. A failed match normally resets the
926search position to the beginning of the string, but you can avoid that
927by adding the C</c> modifier (e.g. C<m//gc>). Modifying the target
928string also resets the search position.
c90c0ff4 929
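The interaction between scalar-context C<m//g>, pos(), and the C</c> modifier can be sketched like this. The string and variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = "aXbXc";
my @positions;

# Each scalar-context /g match resumes where the previous one ended.
while ($str =~ /X/g) {
    push @positions, pos($str);
}

# The failed match that ended the loop reset the search position.
my $after_loop = defined(pos($str)) ? pos($str) : "undef";

# With /c, a failed match leaves pos() where it was.
my $first  = $str =~ /X/gc;    # matches the first X; pos is now 2
my $second = $str =~ /Z/gc;    # fails; pos stays at 2
my $kept   = pos($str);

print "@positions\n";          # 2 4
print "$after_loop $kept\n";   # undef 2
```
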
930You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
931zero-width assertion that matches the exact position where the previous
5d43e42d 932C<m//g>, if any, left off. Without the C</g> modifier, the C<\G> assertion
933still anchors at pos(), but the match is of course only attempted once.
934Using C<\G> without C</g> on a target string that has not previously had a
935C</g> match applied to it is the same as using the C<\A> assertion to match
fe4b3f22 936the beginning of the string. Note also that, currently, C<\G> is only
937properly supported when anchored at the very beginning of the pattern.
c90c0ff4 938
939Examples:
a0d0e21e 940
941 # list context
942 ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
943
944 # scalar context
5d43e42d 945 $/ = "";
19799a22 946 while (defined($paragraph = <>)) {
947 while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
948 $sentences++;
a0d0e21e 949 }
950 }
951 print "$sentences\n";
952
c90c0ff4 953 # using m//gc with \G
137443ea 954 $_ = "ppooqppqq";
44a8e56a 955 while ($i++ < 2) {
956 print "1: '";
c90c0ff4 957 print $1 while /(o)/gc; print "', pos=", pos, "\n";
44a8e56a 958 print "2: '";
c90c0ff4 959 print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
44a8e56a 960 print "3: '";
c90c0ff4 961 print $1 while /(p)/gc; print "', pos=", pos, "\n";
44a8e56a 962 }
5d43e42d 963 print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
44a8e56a 964
965The last example should print:
966
967 1: 'oo', pos=4
137443ea 968 2: 'q', pos=5
44a8e56a 969 3: 'pp', pos=7
970 1: '', pos=7
137443ea 971 2: 'q', pos=8
972 3: '', pos=8
5d43e42d 973 Final: 'q', pos=8
974
975Notice that the final match matched C<q> instead of C<p>, which a match
976without the C<\G> anchor would have done. Also note that the final match
977did not update C<pos> -- C<pos> is only updated on a C</g> match. If the
978final match did indeed match C<p>, it's a good bet that you're running an
979older (pre-5.6.0) Perl.
44a8e56a 980
c90c0ff4 981A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can
e7ea3e70 982combine several regexps like this to process a string part-by-part,
c90c0ff4 983doing different actions depending on which regexp matched. Each
984regexp tries to match where the previous one leaves off.
e7ea3e70 985
3fe9a6f1 986 $_ = <<'EOL';
e7ea3e70 987 $url = new URI::URL "http://www/"; die if $url eq "xXx";
3fe9a6f1 988 EOL
989 LOOP:
e7ea3e70 990 {
c90c0ff4 991 print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
992 print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
993 print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
994 print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
995 print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
996 print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
997 print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/gc;
e7ea3e70 998 print ". That's all!\n";
999 }
1000
1001Here is the output (split into several lines):
1002
1003 line-noise lowercase line-noise lowercase UPPERCASE line-noise
1004 UPPERCASE line-noise lowercase line-noise lowercase line-noise
1005 lowercase lowercase line-noise lowercase lowercase line-noise
1006 MiXeD line-noise. That's all!
44a8e56a 1007
a0d0e21e 1008=item q/STRING/
1009
1010=item C<'STRING'>
1011
19799a22 1012A single-quoted, literal string. A backslash represents a backslash
68dc0745 1013unless followed by the delimiter or another backslash, in which case
1014the delimiter or backslash is interpolated.
a0d0e21e 1015
1016 $foo = q!I said, "You said, 'She said it.'"!;
1017 $bar = q('This is it.');
68dc0745 1018 $baz = '\n'; # a two-character string
a0d0e21e 1019
1020=item qq/STRING/
1021
1022=item "STRING"
1023
1024A double-quoted, interpolated string.
1025
1026 $_ .= qq
1027 (*** The previous line contains the naughty word "$1".\n)
19799a22 1028 if /\b(tcl|java|python)\b/i; # :-)
68dc0745 1029 $baz = "\n"; # a one-character string
a0d0e21e 1030
eec2d3df 1031=item qr/STRING/imosx
1032
322edccd 1033This operator quotes (and possibly compiles) its I<STRING> as a regular
19799a22 1034expression. I<STRING> is interpolated the same way as I<PATTERN>
1035in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation
1036is done. Returns a Perl value which may be used instead of the
1037corresponding C</STRING/imosx> expression.
4b6a7270 1038
1039For example,
1040
1041 $rex = qr/my.STRING/is;
1042 s/$rex/foo/;
1043
1044is equivalent to
1045
1046 s/my.STRING/foo/is;
1047
1048The result may be used as a subpattern in a match:
eec2d3df 1049
1050 $re = qr/$pattern/;
0a92e3a8 1051 $string =~ /foo${re}bar/; # can be interpolated in other patterns
1052 $string =~ $re; # or used standalone
4b6a7270 1053 $string =~ /$re/; # or this way
1054
1055Since Perl may compile the pattern at the moment of execution of the qr()
19799a22 1056operator, using qr() may have speed advantages in some situations,
4b6a7270 1057notably if the result of qr() is used standalone:
1058
1059 sub match {
1060 my $patterns = shift;
1061 my @compiled = map qr/$_/i, @$patterns;
1062 grep {
1063 my $success = 0;
a7665c5e 1064 foreach my $pat (@compiled) {
4b6a7270 1065 $success = 1, last if /$pat/;
1066 }
1067 $success;
1068 } @_;
1069 }
1070
19799a22 1071Precompilation of the pattern into an internal representation at
1072the moment of qr() avoids a need to recompile the pattern every
1073time a match C</$pat/> is attempted. (Perl has many other internal
1074optimizations, but none would be triggered in the above example if
1075we did not use the qr() operator.)
eec2d3df 1076
1077Options are:
1078
1079 i Do case-insensitive pattern matching.
1080 m Treat string as multiple lines.
1081 o Compile pattern only once.
1082 s Treat string as single line.
1083 x Use extended regular expressions.
1084
0a92e3a8 1085See L<perlre> for additional information on valid syntax for STRING, and
1086for a detailed look at the semantics of regular expressions.
1087
a0d0e21e 1088=item qx/STRING/
1089
1090=item `STRING`
1091
43dd4d21 1092A string which is (possibly) interpolated and then executed as a
1093system command with C</bin/sh> or its equivalent. Shell wildcards,
1094pipes, and redirections will be honored. The collected standard
1095output of the command is returned; standard error is unaffected. In
1096scalar context, it comes back as a single (potentially multi-line)
1097string, or undef if the command failed. In list context, returns a
1098list of lines (however you've defined lines with $/ or
1099$INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
5a964f20 1100
1101Because backticks do not affect standard error, use shell file descriptor
1102syntax (assuming the shell supports this) if you care to address this.
1103To capture a command's STDERR and STDOUT together:
a0d0e21e 1104
5a964f20 1105 $output = `cmd 2>&1`;
1106
1107To capture a command's STDOUT but discard its STDERR:
1108
1109 $output = `cmd 2>/dev/null`;
1110
1111To capture a command's STDERR but discard its STDOUT (ordering is
1112important here):
1113
1114 $output = `cmd 2>&1 1>/dev/null`;
1115
1116To exchange a command's STDOUT and STDERR in order to capture the STDERR
1117but leave its STDOUT to come out the old STDERR:
1118
1119 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
1120
1121To read both a command's STDOUT and its STDERR separately, it's easiest
1122and safest to redirect them separately to files, and then read from those
1123files when the program is done:
1124
1125 system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");
1126
1127Using single-quote as a delimiter protects the command from Perl's
1128double-quote interpolation, passing it on to the shell instead:
1129
1130 $perl_info = qx(ps $$); # that's Perl's $$
1131 $shell_info = qx'ps $$'; # that's the new shell's $$
1132
19799a22 1133How that string gets evaluated is entirely subject to the command
5a964f20 1134interpreter on your system. On most platforms, you will have to protect
1135shell metacharacters if you want them treated literally. This is in
1136practice difficult to do, as it's unclear how to escape which characters.
1137See L<perlsec> for a clean and safe example of a manual fork() and exec()
1138to emulate backticks safely.
a0d0e21e 1139
bb32b41a 1140On some platforms (notably DOS-like ones), the shell may not be
1141capable of dealing with multiline commands, so putting newlines in
1142the string may not get you what you want. You may be able to evaluate
1143multiple commands in a single line by separating them with the command
1144separator character, if your shell supports that (e.g. C<;> on many Unix
1145shells; C<&> on the Windows NT C<cmd> shell).
1146
0f897271 1147Beginning with v5.6.0, Perl will attempt to flush all files opened for
1148output before starting the child process, but this may not be supported
1149on some platforms (see L<perlport>). To be safe, you may need to set
1150C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
1151C<IO::Handle> on any open handles.
1152
bb32b41a 1153Beware that some command shells may place restrictions on the length
1154of the command line. You must ensure your strings don't exceed this
1155limit after any necessary interpolations. See the platform-specific
1156release notes for more details about your particular environment.
1157
5a964f20 1158Using this operator can lead to programs that are difficult to port,
1159because the shell commands called vary between systems, and may in
1160fact not be present at all. As one example, the C<type> command under
1161the POSIX shell is very different from the C<type> command under DOS.
1162That doesn't mean you should go out of your way to avoid backticks
1163when they're the right way to get something done. Perl was made to be
1164a glue language, and one of the things it glues together is commands.
1165Just understand what you're getting yourself into.
bb32b41a 1166
dc848c6f 1167See L<"I/O Operators"> for more discussion.
a0d0e21e 1168
945c54fd 1169=item qw/STRING/
1170
1171Evaluates to a list of the words extracted out of STRING, using embedded
1172whitespace as the word delimiters. It can be understood as being roughly
1173equivalent to:
1174
1175 split(' ', q/STRING/);
1176
1177the difference being that it generates a real list at compile time. So
1178this expression:
1179
1180 qw(foo bar baz)
1181
1182is semantically equivalent to the list:
1183
1184 'foo', 'bar', 'baz'
1185
1186Some frequently seen examples:
1187
1188 use POSIX qw( setlocale localeconv );
1189 @EXPORT = qw( foo bar baz );
1190
1191A common mistake is to try to separate the words with commas or to
1192put comments into a multi-line C<qw>-string. For this reason, the
1193C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
1194produce warnings if the STRING contains the "," or the "#" character.
1195
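A brief sketch of how C<qw> behaves, compared with the roughly equivalent C<split>. The word lists are arbitrary.

```perl
use strict;
use warnings;

my @words = qw(foo bar baz);
my @same  = split(' ', q/foo bar baz/);

# qw splits on any whitespace, including newlines, and never interpolates,
# so $beta below is just five literal characters.
my @multi = qw(
    alpha
    $beta
);

print scalar(@words), " ", scalar(@multi), "\n";   # 3 2
print "$multi[1]\n";                               # $beta
```
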
a0d0e21e 1196=item s/PATTERN/REPLACEMENT/egimosx
1197
1198Searches a string for a pattern, and if found, replaces that pattern
1199with the replacement text and returns the number of substitutions
e37d713d 1200made. Otherwise it returns false (specifically, the empty string).
a0d0e21e 1201
1202If no string is specified via the C<=~> or C<!~> operator, the C<$_>
1203variable is searched and modified. (The string specified with C<=~> must
5a964f20 1204be a scalar variable, an array element, a hash element, or an assignment
5f05dabc 1205to one of those, i.e., an lvalue.)
a0d0e21e 1206
19799a22 1207If the delimiter chosen is a single quote, no interpolation is
a0d0e21e 1208done on either the PATTERN or the REPLACEMENT. Otherwise, if the
1209PATTERN contains a $ that looks like a variable rather than an
1210end-of-string test, the variable will be interpolated into the pattern
5f05dabc 1211at run-time. If you want the pattern compiled only once the first time
a0d0e21e 1212the variable is interpolated, use the C</o> option. If the pattern
5a964f20 1213evaluates to the empty string, the last successfully executed regular
a0d0e21e 1214expression is used instead. See L<perlre> for further explanation on these.
5a964f20 1215See L<perllocale> for discussion of additional considerations that apply
a034a98d 1216when C<use locale> is in effect.
a0d0e21e 1217
1218Options are:
1219
1220 e Evaluate the right side as an expression.
5f05dabc 1221 g Replace globally, i.e., all occurrences.
a0d0e21e 1222 i Do case-insensitive pattern matching.
1223 m Treat string as multiple lines.
5f05dabc 1224 o Compile pattern only once.
a0d0e21e 1225 s Treat string as single line.
1226 x Use extended regular expressions.
1227
1228Any non-alphanumeric, non-whitespace delimiter may replace the
1229slashes. If single quotes are used, no interpretation is done on the
e37d713d 1230replacement string (the C</e> modifier overrides this, however). Unlike
54310121 1231Perl 4, Perl 5 treats backticks as normal delimiters; the replacement
e37d713d 1232text is not evaluated as a command. If the
a0d0e21e 1233PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
5f05dabc 1234pair of quotes, which may or may not be bracketing quotes, e.g.,
35f2feb0 1235C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the
cec88af6 1236replacement portion to be treated as a full-fledged Perl expression
1237and evaluated right then and there. It is, however, syntax checked at
1238compile-time. A second C<e> modifier will cause the replacement portion
1239to be C<eval>ed before being run as a Perl expression.
a0d0e21e 1240
1241Examples:
1242
1243 s/\bgreen\b/mauve/g; # don't change wintergreen
1244
1245 $path =~ s|/usr/bin|/usr/local/bin|;
1246
1247 s/Login: $foo/Login: $bar/; # run-time pattern
1248
5a964f20 1249 ($foo = $bar) =~ s/this/that/; # copy first, then change
a0d0e21e 1250
5a964f20 1251 $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
a0d0e21e 1252
1253 $_ = 'abc123xyz';
1254 s/\d+/$&*2/e; # yields 'abc246xyz'
1255 s/\d+/sprintf("%5d",$&)/e; # yields 'abc  246xyz'
1256 s/\w/$& x 2/eg; # yields 'aabbcc  224466xxyyzz'
1257
1258 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
1259 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
1260 s/^=(\w+)/&pod($1)/ge; # use function call
1261
5a964f20 1262 # expand variables in $_, but dynamics only, using
1263 # symbolic dereferencing
1264 s/\$(\w+)/${$1}/g;
1265
cec88af6 1266 # Add one to the value of any numbers in the string
1267 s/(\d+)/1 + $1/eg;
1268
1269 # This will expand any embedded scalar variable
1270 # (including lexicals) in $_ : First $1 is interpolated
1271 # to the variable name, and then evaluated
a0d0e21e 1272 s/(\$\w+)/$1/eeg;
1273
5a964f20 1274 # Delete (most) C comments.
a0d0e21e 1275 $program =~ s {
4633a7c4 1276 /\* # Match the opening delimiter.
1277 .*? # Match a minimal number of characters.
1278 \*/ # Match the closing delimiter.
a0d0e21e 1279 } []gsx;
1280
5a964f20 1281 s/^\s*(.*?)\s*$/$1/; # trim white space in $_, expensively
1282
1283 for ($variable) { # trim white space in $variable, cheap
1284 s/^\s+//;
1285 s/\s+$//;
1286 }
a0d0e21e 1287
1288 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
1289
54310121 1290Note the use of $ instead of \ in the last example. Unlike
35f2feb0 1291B<sed>, we use the \<I<digit>> form in only the left hand side.
1292Anywhere else it's $<I<digit>>.
a0d0e21e 1293
5f05dabc 1294Occasionally, you can't use just a C</g> to get all the changes
19799a22 1295to occur that you might want. Here are two common cases:
a0d0e21e 1296
1297 # put commas in the right places in an integer
19799a22 1298 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
a0d0e21e 1299
1300 # expand tabs to 8-column spacing
1301 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
1302
6940069f 1303=item tr/SEARCHLIST/REPLACEMENTLIST/cds
a0d0e21e 1304
6940069f 1305=item y/SEARCHLIST/REPLACEMENTLIST/cds
a0d0e21e 1306
2c268ad5 1307Transliterates all occurrences of the characters found in the search list
a0d0e21e 1308with the corresponding character in the replacement list. It returns
1309the number of characters replaced or deleted. If no string is
2c268ad5 1310specified via the =~ or !~ operator, the $_ string is transliterated. (The
54310121 1311string specified with =~ must be a scalar variable, an array element, a
1312hash element, or an assignment to one of those, i.e., an lvalue.)
8ada0baa 1313
2c268ad5 1314A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
1315does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
54310121 1316For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the
1317SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
1318its own pair of quotes, which may or may not be bracketing quotes,
2c268ad5 1319e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.
a0d0e21e 1320
cc255d5f 1321Note that C<tr> does B<not> do regular expression character classes
1322such as C<\d> or C<[:lower:]>. The C<tr> operator is not equivalent to
1323the tr(1) utility. If you want to map strings between lower/upper
1324cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider
1325using the C<s> operator if you need regular expressions.
1326
8ada0baa 1327Note also that the whole range idea is rather unportable between
1328character sets--and even within character sets they may cause results
1329you probably didn't expect. A sound principle is to use only ranges
1330that begin from and end at either alphabets of equal case (a-e, A-E),
1331or digits (0-4). Anything else is unsafe. If in doubt, spell out the
1332character sets in full.
1333
a0d0e21e 1334Options:
1335
1336 c Complement the SEARCHLIST.
1337 d Delete found but unreplaced characters.
1338 s Squash duplicate replaced characters.
1339
19799a22 1340If the C</c> modifier is specified, the SEARCHLIST character set
1341is complemented. If the C</d> modifier is specified, any characters
1342specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
1343(Note that this is slightly more flexible than the behavior of some
1344B<tr> programs, which delete anything they find in the SEARCHLIST,
1345period.) If the C</s> modifier is specified, sequences of characters
1346that were transliterated to the same character are squashed down
1347to a single instance of the character.
a0d0e21e 1348
1349If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
1350exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
1351than the SEARCHLIST, the final character is replicated till it is long
5a964f20 1352enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
a0d0e21e 1353This latter is useful for counting characters in a class or for
1354squashing character sequences in a class.
1355
1356Examples:
1357
1358 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case
1359
1360 $cnt = tr/*/*/; # count the stars in $_
1361
1362 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
1363
1364 $cnt = tr/0-9//; # count the digits in $_
1365
1366 tr/a-zA-Z//s; # bookkeeper -> bokeper
1367
1368 ($HOST = $host) =~ tr/a-z/A-Z/;
1369
1370 tr/a-zA-Z/ /cs; # change non-alphas to single space
1371
1372 tr [\200-\377]
1373 [\000-\177]; # delete 8th bit
1374
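The examples above do not show what happens when the REPLACEMENTLIST is shorter than the SEARCHLIST. Here is a minimal sketch of that case, with and without the C</d> and C</s> modifiers. The input string is arbitrary.

```perl
use strict;
use warnings;

# Replacement list shorter than search list: its final character is
# replicated, so this behaves like tr/abc/xyy/.
(my $padded = "aabbcc") =~ tr/abc/xy/;

# With /d nothing is replicated; search characters with no counterpart
# in the replacement list are deleted instead.
(my $deleted = "aabbcc") =~ tr/abc/xy/d;

# /s squashes runs of characters transliterated to the same character.
(my $squashed = "aabbcc") =~ tr/abc/xy/s;

print "$padded $deleted $squashed\n";   # xxyyyy xxyy xy
```
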
19799a22 1375If multiple transliterations are given for a character, only the
1376first one is used:
748a9306 1377
1378 tr/AAA/XYZ/
1379
2c268ad5 1380will transliterate any A to X.
748a9306 1381
19799a22 1382Because the transliteration table is built at compile time, neither
a0d0e21e 1383the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote
19799a22 1384interpolation. That means that if you want to use variables, you
1385must use an eval():
a0d0e21e 1386
1387 eval "tr/$oldlist/$newlist/";
1388 die $@ if $@;
1389
1390 eval "tr/$oldlist/$newlist/, 1" or die $@;
1391
7e3b091d 1392=item <<EOF
1393
1394A line-oriented form of quoting is based on the shell "here-document"
1395syntax. Following a C<< << >> you specify a string to terminate
1396the quoted material, and all lines following the current line down to
1397the terminating string are the value of the item. The terminating
1398string may be either an identifier (a word), or some quoted text. If
1399quoted, the type of quotes you use determines the treatment of the
1400text, just as in regular quoting. An unquoted identifier works like
1401double quotes. There must be no space between the C<< << >> and
1402the identifier, unless the identifier is quoted. (If you put a space it
1403will be treated as a null identifier, which is valid, and matches the first
1404empty line.) The terminating string must appear by itself (unquoted and
1405with no surrounding whitespace) on the terminating line.
1406
1407 print <<EOF;
1408 The price is $Price.
1409 EOF
1410
1411 print << "EOF"; # same as above
1412 The price is $Price.
1413 EOF
1414
1415 print << `EOC`; # execute commands
1416 echo hi there
1417 echo lo there
1418 EOC
1419
1420 print <<"foo", <<"bar"; # you can stack them
1421 I said foo.
1422 foo
1423 I said bar.
1424 bar
1425
1426 myfunc(<< "THIS", 23, <<'THAT');
1427 Here's a line
1428 or two.
1429 THIS
1430 and here's another.
1431 THAT
1432
1433Just don't forget that you have to put a semicolon on the end
1434to finish the statement, as Perl doesn't know you're not going to
1435try to do this:
1436
1437 print <<ABC
1438 179231
1439 ABC
1440 + 20;
1441
1442If you want your here-docs to be indented with the
1443rest of the code, you'll need to remove leading whitespace
1444from each line manually:
1445
1446 ($quote = <<'FINIS') =~ s/^\s+//gm;
1447 The Road goes ever on and on,
1448 down from the door where it began.
1449 FINIS
1450
1451If you use a here-doc within a delimited construct, such as in C<s///eg>,
1452the quoted material must come on the lines following the final delimiter.
1453So instead of
1454
1455 s/this/<<E . 'that'
1456 the other
1457 E
1458 . 'more '/eg;
1459
1460you have to write
1461
1462 s/this/<<E . 'that'
1463 . 'more '/eg;
1464 the other
1465 E
1466
1467If the terminating identifier is on the last line of the program, you
1468must be sure there is a newline after it; otherwise, Perl will give the
1469warning B<Can't find string terminator "END" anywhere before EOF...>.
1470
1471Additionally, the quoting rules for the identifier are not related to
1472Perl's quoting rules -- C<q()>, C<qq()>, and the like are not supported
1473in place of C<''> and C<"">, and the only interpolation is for backslashing
1474the quoting character:
1475
1476 print << "abc\"def";
1477 testing...
1478 abc"def
1479
1480Finally, quoted strings cannot span multiple lines. The general rule is
1481that the identifier must be a string literal. Stick with that, and you
1482should be safe.
1483
a0d0e21e 1484=back
1485
75e14d17 1486=head2 Gory details of parsing quoted constructs
1487
19799a22 1488When presented with something that might have several different
1489interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
1490principle to pick the most probable interpretation. This strategy
1491is so successful that Perl programmers often do not suspect the
1492ambivalence of what they write. But from time to time, Perl's
1493notions differ substantially from what the author honestly meant.
1494
1495This section hopes to clarify how Perl handles quoted constructs.
1496Although the most common reason to learn this is to unravel labyrinthine
1497regular expressions, because the initial steps of parsing are the
1498same for all quoting operators, they are all discussed together.
1499
1500The most important Perl parsing rule is the first one discussed
1501below: when processing a quoted construct, Perl first finds the end
1502of that construct, then interprets its contents. If you understand
1503this rule, you may skip the rest of this section on the first
1504reading. The other rules are likely to contradict the user's
1505expectations much less frequently than this first one.
1506
1507Some passes discussed below are performed concurrently, but because
1508their results are the same, we consider them individually. For different
1509quoting constructs, Perl performs different numbers of passes, from
1510one to five, but these passes are always performed in the same order.
75e14d17 1511
13a2d996 1512=over 4
75e14d17 1513
1514=item Finding the end
1515
The first pass is finding the end of the quoted construct, whether
it be a multicharacter delimiter C<"\nEOF\n"> in the C<<<EOF>
construct, a C</> that terminates a C<qq//> construct, a C<]> which
terminates a C<qq[]> construct, or a C<< > >> which terminates a
fileglob started with C<< < >>.
75e14d17 1521
When searching for single-character non-pairing delimiters, such
as C</>, combinations of C<\\> and C<\/> are skipped. However,
when searching for a single-character pairing delimiter like C<[>,
combinations of C<\\>, C<\]>, and C<\[> are all skipped, and nested
C<[>, C<]> are skipped as well. When searching for multicharacter
delimiters, nothing is skipped.
75e14d17 1528
19799a22 1529For constructs with three-part delimiters (C<s///>, C<y///>, and
1530C<tr///>), the search is repeated once more.
75e14d17 1531
19799a22 1532During this search no attention is paid to the semantics of the construct.
1533Thus:
75e14d17 1534
1535 "$hash{"$foo/$bar"}"
1536
2a94b7ce 1537or:
75e14d17 1538
1539 m/
2a94b7ce 1540 bar # NOT a comment, this slash / terminated m//!
75e14d17 1541 /x
1542
19799a22 1543do not form legal quoted expressions. The quoted part ends on the
1544first C<"> and C</>, and the rest happens to be a syntax error.
1545Because the slash that terminated C<m//> was followed by a C<SPACE>,
1546the example above is not C<m//x>, but rather C<m//> with no C</x>
1547modifier. So the embedded C<#> is interpreted as a literal C<#>.
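A minimal sketch of this first pass, using illustrative variable names
that do not come from the text above. Because the end of the construct
is located before its contents are interpreted, nested pairing
delimiters and backslashed delimiters are skipped during the search,
and choosing a pairing delimiter lets the hash lookup shown earlier
parse as intended.

```perl
use strict;
use warnings;

# Nested [ ] pairs are skipped while Perl looks for the closing delimiter.
my $nested = qq[outer [inner] done];

# \/ is skipped during the search, so the string does not end at "path".
my $path = qq/path\/to\/file/;

# With { } as delimiters the inner braces nest, and the inner double
# quotes are no longer delimiters, so the hash lookup is legal.
my %hash = ('a/b' => 'value');
my ($foo, $bar) = ('a', 'b');
my $lookup = qq{$hash{"$foo/$bar"}};

print "$nested\n";   # outer [inner] done
print "$path\n";     # path/to/file
print "$lookup\n";   # value
```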
75e14d17 1548
1549=item Removal of backslashes before delimiters
1550
During the second pass, text between the starting and ending
delimiters is copied to a safe location, and the C<\> is removed
from combinations consisting of C<\> and delimiter--or delimiters,
meaning both starting and ending delimiters will be handled, should
these differ. This removal does not happen for multi-character
delimiters. Note that the combination C<\\> is left intact, just as
it was.
75e14d17 1557
19799a22 1558Starting from this step no information about the delimiters is
1559used in parsing.
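The effect of this pass can be observed with C<q//>, where no further
interpolation muddies the result apart from the C<\\> handling
described under Interpolation. This is a sketch with made-up variable
names; the lengths are what distinguish a removed backslash from a
retained one.

```perl
use strict;
use warnings;

# The backslash is removed before either delimiter of a pairing construct.
my $brackets = q[a\]b\[c];

# "n" is not a delimiter, so this backslash survives untouched.
my $plain = q/a\nb/;

# \\ is left alone by this pass; q// later reduces it to one backslash.
my $doubled = q/a\\b/;

print "$brackets\n";          # a]b[c
print length($plain), "\n";   # 4
print length($doubled), "\n"; # 3
```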
75e14d17 1560
1561=item Interpolation
1562
19799a22 1563The next step is interpolation in the text obtained, which is now
1564delimiter-independent. There are four different cases.
75e14d17 1565
13a2d996 1566=over 4
75e14d17 1567
1568=item C<<<'EOF'>, C<m''>, C<s'''>, C<tr///>, C<y///>
1569
1570No interpolation is performed.
1571
1572=item C<''>, C<q//>
1573
1574The only interpolation is removal of C<\> from pairs C<\\>.
1575
35f2feb0 1576=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>
75e14d17 1577
19799a22 1578C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
1579converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
1580is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
1581The other combinations are replaced with appropriate expansions.
2a94b7ce 1582
Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
is interpolated in the usual way. Something like C<"\Q\\E"> has
no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
result is the same as for C<"\\\\E">. As a general rule, backslashes
between C<\Q> and C<\E> may lead to counterintuitive results. So,
C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
as C<"\\\t"> (since TAB is not alphanumeric). Note also that:
2a94b7ce 1590
1591 $str = '\t';
1592 return "\Q$str";
1593
1594may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
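To make the difference concrete, the following sketch prints the
character codes of both results. The variable names are illustrative
and the codes shown assume an ASCII-based platform.

```perl
use strict;
use warnings;

my $str = '\t';                  # two characters: backslash and "t"

my $from_escape   = "\Q\t\E";    # \t becomes a TAB first, then the TAB is quoted
my $from_variable = "\Q$str";    # quotemeta applied to the two-character string

printf "%vd\n", $from_escape;    # 92.9      (backslash, TAB)
printf "%vd\n", $from_variable;  # 92.92.116 (backslash, backslash, "t")
```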
1595
19799a22 1596Interpolated scalars and arrays are converted internally to the C<join> and
92d29cee 1597C<.> catenation operations. Thus, C<"$foo XXX '@arr'"> becomes:
75e14d17 1598
19799a22 1599 $foo . " XXX '" . (join $", @arr) . "'";
75e14d17 1600
19799a22 1601All operations above are performed simultaneously, left to right.
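A short check of that equivalence, with sample data chosen for
illustration. It also shows that the list separator C<$"> is consulted
when the string is evaluated, so localizing it changes the result.

```perl
use strict;
use warnings;

my $foo = 'head';
my @arr = (1, 2, 3);

my $interpolated = "$foo XXX '@arr'";
my $explicit     = $foo . " XXX '" . (join $", @arr) . "'";

print "$interpolated\n";   # head XXX '1 2 3'
print "$explicit\n";       # head XXX '1 2 3'

my $dashed;
{
    local $" = '-';        # the separator used when arrays are interpolated
    $dashed = "$foo XXX '@arr'";
}
print "$dashed\n";         # head XXX '1-2-3'
```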
75e14d17 1602
Because the result of C<"\Q STRING \E"> has all metacharacters
quoted, there is no way to insert a literal C<$> or C<@> inside a
C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to become
C<"\\\$">; if not, it is interpreted as the start of an interpolated
scalar.
75e14d17 1608
19799a22 1609Note also that the interpolation code needs to make a decision on
1610where the interpolated scalar ends. For instance, whether
35f2feb0 1611C<< "a $b -> {c}" >> really means:
75e14d17 1612
1613 "a " . $b . " -> {c}";
1614
2a94b7ce 1615or:
75e14d17 1616
1617 "a " . $b -> {c};
1618
Most of the time, Perl picks the longest possible text that does not
include spaces between components and which contains matching braces
or brackets. Because the outcome may be determined by voting based
on heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
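The sketch below contrasts the two readings for simple cases where the
heuristics are not in doubt. The names C<$b_ref> and C<$label> are
invented for the example. When the intent matters, spelling out the
concatenation avoids relying on the guess at all.

```perl
use strict;
use warnings;

my $b_ref = { c => 'value' };
my $label = 'X';

# No spaces between components: the arrow and subscript are interpolated.
my $deref = "a $b_ref->{c}";

# Spaces around the arrow: only the scalar itself is interpolated.
my $no_deref = "a $label -> {c}";

# Explicit concatenation leaves nothing to the heuristics.
my $explicit = "a " . $b_ref->{c};

print "$deref\n";      # a value
print "$no_deref\n";   # a X -> {c}
print "$explicit\n";   # a value
```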
75e14d17 1624
=item C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>
1626
19799a22 1627Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
1628happens (almost) as with C<qq//> constructs, but the substitution
1629of C<\> followed by RE-special chars (including C<\>) is not
1630performed. Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
1631a C<#>-comment in a C<//x>-regular expression, no processing is
1632performed whatsoever. This is the first step at which the presence
1633of the C<//x> modifier is relevant.
1634
Interpolation has several quirks: C<$|>, C<$(>, and C<$)> are not
interpolated, and constructs C<$var[SOMETHING]> are voted (by several
different estimators) to be either an array element or C<$var>
followed by an RE alternative. This is where the notation
C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
array element C<-9>, not as a regular expression from the variable
C<$arr> followed by a digit, which would be the interpretation of
C</$arr[0-9]/>. Since voting among different estimators may occur,
the result is not predictable.
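When both a scalar and an array of the same name exist, braces can
state the intent explicitly instead of leaving it to the estimators.
This is a sketch with invented data: C<${arr[1]}> forces the array
element, while C<${arr}[0-9]> forces the scalar followed by a
character class.

```perl
use strict;
use warnings;

my @arr = ('x', 'y');
my $arr = 'z';

# Braces around the whole subscripted name: array element $arr[1], i.e. "y".
my $element_match = ('y' =~ /^${arr[1]}$/) ? 1 : 0;

# Braces around the name only: scalar $arr followed by a character class.
my $class_match = ('z7' =~ /^${arr}[0-9]$/) ? 1 : 0;

print "$element_match $class_match\n";   # 1 1
```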
1644
1645It is at this step that C<\1> is begrudgingly converted to C<$1> in
1646the replacement text of C<s///> to correct the incorrigible
1647I<sed> hackers who haven't picked up the saner idiom yet. A warning
9f1b1f2d 1648is emitted if the C<use warnings> pragma or the B<-w> command-line flag
1649(that is, the C<$^W> variable) was set.
19799a22 1650
The lack of processing of C<\\> creates specific restrictions on
the post-processed text. If the delimiter is C</>, one cannot get
the combination C<\/> into the result of this step. C</> will
finish the regular expression, C<\/> will be stripped to C</> on
the previous step, and C<\\/> will be left as is. Because C</> is
equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be a character special to the
RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
alphanumeric char, as in:
2a94b7ce 1660
1661 m m ^ a \s* b mmx;
1662
In the RE above, which is intentionally obfuscated for illustration, the
delimiter is C<m>, the modifier is C<mx>, and after backslash-removal the
RE is the same as for C<m/ ^ a \s* b /mx>. There's more than one
reason you're encouraged to restrict your delimiters to non-alphanumeric,
non-whitespace choices.
75e14d17 1668
1669=back
1670
19799a22 1671This step is the last one for all constructs except regular expressions,
75e14d17 1672which are processed further.
1673
1674=item Interpolation of regular expressions
1675
Previous steps were performed during the compilation of Perl code,
but this one happens at run time--although it may be optimized to
be calculated at compile time if appropriate. After the preprocessing
described above, and possibly after evaluation if catenation,
joining, casing translation, or metaquoting are involved, the
resulting I<string> is passed to the RE engine for compilation.
1682
1683Whatever happens in the RE engine might be better discussed in L<perlre>,
1684but for the sake of continuity, we shall do so here.
1685
1686This is another step where the presence of the C<//x> modifier is
1687relevant. The RE engine scans the string from left to right and
1688converts it to a finite automaton.
1689
1690Backslashed characters are either replaced with corresponding
1691literal strings (as with C<\{>), or else they generate special nodes
1692in the finite automaton (as with C<\b>). Characters special to the
1693RE engine (such as C<|>) generate corresponding nodes or groups of
1694nodes. C<(?#...)> comments are ignored. All the rest is either
1695converted to literal strings to match, or else is ignored (as is
1696whitespace and C<#>-style comments if C<//x> is present).
1697
1698Parsing of the bracketed character class construct, C<[...]>, is
1699rather different than the rule used for the rest of the pattern.
1700The terminator of this construct is found using the same rules as
1701for finding the terminator of a C<{}>-delimited construct, the only
1702exception being that C<]> immediately following C<[> is treated as
1703though preceded by a backslash. Similarly, the terminator of
1704C<(?{...})> is found using the same rules as for finding the
1705terminator of a C<{}>-delimited construct.
1706
It is possible to inspect both the string given to the RE engine and
the resulting finite automaton. See the arguments C<debug>/C<debugcolor>
in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
switch documented in L<perlrun/"Command Switches">.
75e14d17 1711
1712=item Optimization of regular expressions
1713
7522fed5 1714This step is listed for completeness only. Since it does not change
75e14d17 1715semantics, details of this step are not documented and are subject
19799a22 1716to change without notice. This step is performed over the finite
1717automaton that was generated during the previous pass.
2a94b7ce 1718
19799a22 1719It is at this stage that C<split()> silently optimizes C</^/> to
1720mean C</^/m>.
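The visible consequence is that C<split /^/> breaks a multi-line string
into lines even though C</m> was not written. A small sketch with
sample text chosen for illustration:

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

my @implicit = split /^/,  $text;   # treated as if /^/m had been written
my @explicit = split /^/m, $text;

print scalar(@implicit), "\n";      # 3
print scalar(@explicit), "\n";      # 3
print $implicit[1];                 # two (the newline stays attached)
```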
75e14d17 1721
1722=back
1723
a0d0e21e 1724=head2 I/O Operators
1725
54310121 1726There are several I/O operators you should know about.
fbad3eb5 1727
7b8d334a 1728A string enclosed by backticks (grave accents) first undergoes
19799a22 1729double-quote interpolation. It is then interpreted as an external
1730command, and the output of that command is the value of the
e9c56f9b 1731backtick string, like in a shell. In scalar context, a single string
1732consisting of all output is returned. In list context, a list of
1733values is returned, one per line of output. (You can set C<$/> to use
1734a different line terminator.) The command is executed each time the
1735pseudo-literal is evaluated. The status value of the command is
1736returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
1737Unlike in B<csh>, no translation is done on the return data--newlines
1738remain newlines. Unlike in any of the shells, single quotes do not
1739hide variable names in the command from interpretation. To pass a
1740literal dollar-sign through to the shell you need to hide it with a
1741backslash. The generalized form of backticks is C<qx//>. (Because
1742backticks always undergo shell expansion as well, see L<perlsec> for
1743security concerns.)
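The following sketch illustrates the context rules and the handling of
a literal dollar sign. It assumes a POSIX-style shell with an C<echo>
command is available; the commands themselves are placeholders.

```perl
use strict;
use warnings;

# Scalar context: all output as one string.
my $all = `echo one; echo two`;

# List context: one element per line of output.
my @lines = `echo one; echo two`;
my $status = $? >> 8;              # exit status of the last command

# Single quotes do not stop Perl's interpolation, so the dollar sign
# needs a backslash to reach the shell as a literal character.
my $literal = `echo '\$5'`;

print $all;                        # one, then two, on separate lines
print scalar(@lines), "\n";        # 2
print $literal;                    # $5
```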
19799a22 1744
1745In scalar context, evaluating a filehandle in angle brackets yields
1746the next line from that file (the newline, if any, included), or
1747C<undef> at end-of-file or on error. When C<$/> is set to C<undef>
1748(sometimes known as file-slurp mode) and the file is empty, it
1749returns C<''> the first time, followed by C<undef> subsequently.
1750
1751Ordinarily you must assign the returned value to a variable, but
1752there is one situation where an automatic assignment happens. If
1753and only if the input symbol is the only thing inside the conditional
1754of a C<while> statement (even if disguised as a C<for(;;)> loop),
1755the value is automatically assigned to the global variable $_,
1756destroying whatever was there previously. (This may seem like an
1757odd thing to you, but you'll use the construct in almost every Perl
17b829fa 1758script you write.) The $_ variable is not implicitly localized.
19799a22 1759You'll have to put a C<local $_;> before the loop if you want that
1760to happen.
1761
1762The following lines are equivalent:
a0d0e21e 1763
748a9306 1764 while (defined($_ = <STDIN>)) { print; }
7b8d334a 1765 while ($_ = <STDIN>) { print; }
a0d0e21e 1766 while (<STDIN>) { print; }
1767 for (;<STDIN>;) { print; }
748a9306 1768 print while defined($_ = <STDIN>);
7b8d334a 1769 print while ($_ = <STDIN>);
a0d0e21e 1770 print while <STDIN>;
1771
19799a22 1772This also behaves similarly, but avoids $_ :
7b8d334a 1773
1774 while (my $line = <STDIN>) { print $line }
1775
In these loop constructs, the assigned value (whether assignment
is automatic or explicit) is then tested to see whether it is
defined. The defined test avoids problems where a line has a string
value that would be treated as false by Perl, for example a "" or
a "0" with no trailing newline. If you really mean for such values
to terminate the loop, they should be tested for explicitly:
7b8d334a 1782
1783 while (($_ = <STDIN>) ne '0') { ... }
1784 while (<STDIN>) { last unless $_; ... }
1785
In other boolean contexts, C<< <I<filehandle>> >> without an
explicit C<defined> test or comparison elicits a warning if the
C<use warnings> pragma or the B<-w>
command-line switch (the C<$^W> variable) is in effect.
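The implicit C<defined> test in a C<while> condition can be seen with
input whose final line is a bare "0". This sketch reads from an
in-memory filehandle (which assumes a perl built with PerlIO) so that
it needs no external file; the data is invented.

```perl
use strict;
use warnings;

my $data = "first\n0";                        # last line is "0" with no newline
open(my $fh, '<', \$data) or die "open: $!";  # in-memory filehandle

my @seen;
while (my $line = <$fh>) {    # tested as defined(my $line = <$fh>)
    push @seen, $line;
}
close $fh;

print scalar(@seen), "\n";    # 2: the false-looking "0" line was not lost
```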
7b8d334a 1790
5f05dabc 1791The filehandles STDIN, STDOUT, and STDERR are predefined. (The
19799a22 1792filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
1793in packages, where they would be interpreted as local identifiers
1794rather than global.) Additional filehandles may be created with
1795the open() function, amongst others. See L<perlopentut> and
1796L<perlfunc/open> for details on this.
a0d0e21e 1797
35f2feb0 1798If a <FILEHANDLE> is used in a context that is looking for
19799a22 1799a list, a list comprising all input lines is returned, one line per
1800list element. It's easy to grow to a rather large data space this
1801way, so use with care.
a0d0e21e 1802
35f2feb0 1803<FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>.
19799a22 1804See L<perlfunc/readline>.
fbad3eb5 1805
35f2feb0 1806The null filehandle <> is special: it can be used to emulate the
1807behavior of B<sed> and B<awk>. Input from <> comes either from
a0d0e21e 1808standard input, or from each file listed on the command line. Here's
35f2feb0 1809how it works: the first time <> is evaluated, the @ARGV array is
5a964f20 1810checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
a0d0e21e 1811gives you standard input. The @ARGV array is then processed as a list
1812of filenames. The loop
1813
1814 while (<>) {
1815 ... # code for each line
1816 }
1817
1818is equivalent to the following Perl-like pseudo code:
1819
3e3baf6d 1820 unshift(@ARGV, '-') unless @ARGV;
a0d0e21e 1821 while ($ARGV = shift) {
1822 open(ARGV, $ARGV);
1823 while (<ARGV>) {
1824 ... # code for each line
1825 }
1826 }
1827
19799a22 1828except that it isn't so cumbersome to say, and will actually work.
1829It really does shift the @ARGV array and put the current filename
1830into the $ARGV variable. It also uses filehandle I<ARGV>
35f2feb0 1831internally--<> is just a synonym for <ARGV>, which
19799a22 1832is magical. (The pseudo code above doesn't work because it treats
35f2feb0 1833<ARGV> as non-magical.)
a0d0e21e 1834
35f2feb0 1835You can modify @ARGV before the first <> as long as the array ends up
a0d0e21e 1836containing the list of filenames you really want. Line numbers (C<$.>)
19799a22 1837continue as though the input were one big happy file. See the example
1838in L<perlfunc/eof> for how to reset line numbers on each file.
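A self-contained sketch of this behavior: it creates two temporary
files (hypothetical stand-ins for real arguments), assigns their names
to @ARGV, and records C<$.> for each line read through C<< <> >>.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two throwaway input files so the example does not depend on real arguments.
my ($fh1, $file1) = tempfile(UNLINK => 1);
print {$fh1} "a\nb\n";
close $fh1;
my ($fh2, $file2) = tempfile(UNLINK => 1);
print {$fh2} "c\n";
close $fh2;

@ARGV = ($file1, $file2);

my @log;
while (<>) {
    chomp;
    push @log, join(':', $., $_);   # $. keeps counting across files
}

print "$_\n" for @log;              # 1:a, 2:b, 3:c
print scalar(@ARGV), "\n";          # 0, because <> shifted the names off
```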
5a964f20 1839
1840If you want to set @ARGV to your own list of files, go right ahead.
1841This sets @ARGV to all plain text files if no @ARGV was given:
1842
1843 @ARGV = grep { -f && -T } glob('*') unless @ARGV;
a0d0e21e 1844
5a964f20 1845You can even set them to pipe commands. For example, this automatically
1846filters compressed arguments through B<gzip>:
1847
1848 @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
1849
1850If you want to pass switches into your script, you can use one of the
a0d0e21e 1851Getopts modules or put a loop on the front like this:
1852
1853 while ($_ = $ARGV[0], /^-/) {
1854 shift;
1855 last if /^--$/;
1856 if (/^-D(.*)/) { $debug = $1 }
1857 if (/^-v/) { $verbose++ }
5a964f20 1858 # ... # other switches
a0d0e21e 1859 }
5a964f20 1860
a0d0e21e 1861 while (<>) {
5a964f20 1862 # ... # code for each line
a0d0e21e 1863 }
1864
35f2feb0 1865The <> symbol will return C<undef> for end-of-file only once.
19799a22 1866If you call it again after this, it will assume you are processing another
1867@ARGV list, and if you haven't set @ARGV, will read input from STDIN.
a0d0e21e 1868
b159ebd3 1869If what the angle brackets contain is a simple scalar variable (e.g.,
35f2feb0 1870<$foo>), then that variable contains the name of the
19799a22 1871filehandle to input from, or its typeglob, or a reference to the
1872same. For example:
cb1a09d0 1873
1874 $fh = \*STDIN;
1875 $line = <$fh>;
a0d0e21e 1876
5a964f20 1877If what's within the angle brackets is neither a filehandle nor a simple
1878scalar variable containing a filehandle name, typeglob, or typeglob
1879reference, it is interpreted as a filename pattern to be globbed, and
1880either a list of filenames or the next filename in the list is returned,
19799a22 1881depending on context. This distinction is determined on syntactic
35f2feb0 1882grounds alone. That means C<< <$x> >> is always a readline() from
1883an indirect handle, but C<< <$hash{key}> >> is always a glob().
5a964f20 1884That's because $x is a simple scalar variable, but C<$hash{key}> is
1885not--it's a hash element.
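The syntactic distinction can be sketched as follows. The temporary
directory and file names are invented for the example, and the sketch
assumes the directory path contains no whitespace or glob
metacharacters.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A scratch directory holding two empty files to glob for.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.c b.c)) {
    open(my $out, '>', "$dir/$name") or die "open: $!";
    close $out;
}

# Simple scalar inside the angle brackets: a readline() on that handle.
my $data = "line one\n";
open(my $x, '<', \$data) or die "open: $!";
my $line = <$x>;

# Hash element inside the angle brackets: a glob() on the pattern.
my %hash  = (key => "$dir/*.c");
my @files = <$hash{key}>;

print $line;                    # line one
print scalar(@files), "\n";     # 2
```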
1886
1887One level of double-quote interpretation is done first, but you can't
35f2feb0 1888say C<< <$foo> >> because that's an indirect filehandle as explained
5a964f20 1889in the previous paragraph. (In older versions of Perl, programmers
1890would insert curly brackets to force interpretation as a filename glob:
35f2feb0 1891C<< <${foo}> >>. These days, it's considered cleaner to call the
5a964f20 1892internal function directly as C<glob($foo)>, which is probably the right
19799a22 1893way to have done it in the first place.) For example:
a0d0e21e 1894
1895 while (<*.c>) {
1896 chmod 0644, $_;
1897 }
1898
3a4b19e4 1899is roughly equivalent to:
a0d0e21e 1900
1901 open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
1902 while (<FOO>) {
5b3eff12 1903 chomp;
a0d0e21e 1904 chmod 0644, $_;
1905 }
1906
3a4b19e4 1907except that the globbing is actually done internally using the standard
1908C<File::Glob> extension. Of course, the shortest way to do the above is:
a0d0e21e 1909
1910 chmod 0644, <*.c>;
1911
19799a22 1912A (file)glob evaluates its (embedded) argument only when it is
1913starting a new list. All values must be read before it will start
1914over. In list context, this isn't important because you automatically
1915get them all anyway. However, in scalar context the operator returns
069e01df 1916the next value each time it's called, or C<undef> when the list has
19799a22 1917run out. As with filehandle reads, an automatic C<defined> is
1918generated when the glob occurs in the test part of a C<while>,
1919because legal glob returns (e.g. a file called F<0>) would otherwise
1920terminate the loop. Again, C<undef> is returned only once. So if
1921you're expecting a single value from a glob, it is much better to
1922say
4633a7c4 1923
1924 ($file) = <blurch*>;
1925
1926than
1927
1928 $file = <blurch*>;
1929
1930because the latter will alternate between returning a filename and
19799a22 1931returning false.
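The alternation is easy to observe when the same glob operator is
evaluated repeatedly in scalar context. In this sketch the directory
and the single file F<blurch1> are created only for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
open(my $out, '>', "$dir/blurch1") or die "open: $!";
close $out;

my @results;
for my $round (1 .. 4) {
    # Scalar context, and the same operator on every pass of the loop.
    my $file = glob("$dir/blurch*");
    push @results, defined $file ? 'name' : 'undef';
}
print "@results\n";               # name undef name undef

# List assignment always starts a fresh list and takes its first element.
my ($first) = glob("$dir/blurch*");
print defined $first ? "found\n" : "missing\n";   # found
```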
4633a7c4 1932
b159ebd3 1933If you're trying to do variable interpolation, it's definitely better
4633a7c4 1934to use the glob() function, because the older notation can cause people
e37d713d 1935to become confused with the indirect filehandle notation.
4633a7c4 1936
1937 @files = glob("$dir/*.[ch]");
1938 @files = glob($files[$i]);
1939
a0d0e21e 1940=head2 Constant Folding
1941
1942Like C, Perl does a certain amount of expression evaluation at
19799a22 1943compile time whenever it determines that all arguments to an
a0d0e21e 1944operator are static and have no side effects. In particular, string
1945concatenation happens at compile time between literals that don't do
19799a22 1946variable substitution. Backslash interpolation also happens at
a0d0e21e 1947compile time. You can say
1948
1949 'Now is the time for all' . "\n" .
1950 'good men to come to.'
1951
54310121 1952and this all reduces to one string internally. Likewise, if
a0d0e21e 1953you say
1954
1955 foreach $file (@filenames) {
5a964f20 1956 if (-s $file > 5 + 100 * 2**16) { }
54310121 1957 }
a0d0e21e 1958
19799a22 1959the compiler will precompute the number which that expression
1960represents so that the interpreter won't have to.
a0d0e21e 1961
2c268ad5 1962=head2 Bitwise String Operators
1963
1964Bitstrings of any size may be manipulated by the bitwise operators
1965(C<~ | & ^>).
1966
19799a22 1967If the operands to a binary bitwise op are strings of different
1968sizes, B<|> and B<^> ops act as though the shorter operand had
1969additional zero bits on the right, while the B<&> op acts as though
1970the longer operand were truncated to the length of the shorter.
1971The granularity for such extension or truncation is one or more
1972bytes.
2c268ad5 1973
1974 # ASCII-based examples
1975 print "j p \n" ^ " a h"; # prints "JAPH\n"
1976 print "JA" | " ph\n"; # prints "japh\n"
1977 print "japh\nJunk" & '_____'; # prints "JAPH\n";
1978 print 'p N$' ^ " E<H\n"; # prints "Perl\n";
1979
19799a22 1980If you are intending to manipulate bitstrings, be certain that
2c268ad5 1981you're supplying bitstrings: If an operand is a number, that will imply
19799a22 1982a B<numeric> bitwise operation. You may explicitly show which type of
2c268ad5 1983operation you intend by using C<""> or C<0+>, as in the examples below.
1984
1985 $foo = 150 | 105 ; # yields 255 (0x96 | 0x69 is 0xFF)
1986 $foo = '150' | 105 ; # yields 255
1987 $foo = 150 | '105'; # yields 255
1988 $foo = '150' | '105'; # yields string '155' (under ASCII)
1989
1990 $baz = 0+$foo & 0+$bar; # both ops explicitly numeric
1991 $biz = "$foo" ^ "$bar"; # both ops explicitly stringy
a0d0e21e 1992
1ae175c8 1993See L<perlfunc/vec> for information on how to manipulate individual bits
1994in a bit vector.
1995
55497cff 1996=head2 Integer Arithmetic
a0d0e21e 1997
19799a22 1998By default, Perl assumes that it must do most of its arithmetic in
a0d0e21e 1999floating point. But by saying
2000
2001 use integer;
2002
2003you may tell the compiler that it's okay to use integer operations
19799a22 2004(if it feels like it) from here to the end of the enclosing BLOCK.
2005An inner BLOCK may countermand this by saying
a0d0e21e 2006
2007 no integer;
2008
19799a22 2009which lasts until the end of that BLOCK. Note that this doesn't
2010mean everything is only an integer, merely that Perl may use integer
2011operations if it is so inclined. For example, even under C<use
2012integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
2013or so.
2014
2015Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
13a2d996 2016and ">>") always produce integral results. (But see also
2017L<Bitwise String Operators>.) However, C<use integer> still has meaning for
19799a22 2018them. By default, their results are interpreted as unsigned integers, but
2019if C<use integer> is in effect, their results are interpreted
2020as signed integers. For example, C<~0> usually evaluates to a large
2021integral value. However, C<use integer; ~0> is C<-1> on twos-complement
2022machines.
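A brief sketch of the block scoping and of the signed interpretation of
bitwise results. The variable names are illustrative, and the C<-1>
result assumes a twos-complement machine as noted above.

```perl
use strict;
use warnings;

my $float_quotient = 10 / 3;         # ordinary floating-point division

my ($int_quotient, $all_bits, $root);
{
    use integer;                     # in effect until the end of this block
    $int_quotient = 10 / 3;          # integer division
    $all_bits     = ~0;              # interpreted as a signed integer
    $root         = sqrt(2);         # sqrt is unaffected
}

print "$int_quotient\n";             # 3
print "$all_bits\n";                 # -1
print $root > 1.41 ? "fractional\n" : "truncated\n";   # fractional
```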
68dc0745 2023
2024=head2 Floating-point Arithmetic
2025
2026While C<use integer> provides integer-only arithmetic, there is no
19799a22 2027analogous mechanism to provide automatic rounding or truncation to a
2028certain number of decimal places. For rounding to a certain number
2029of digits, sprintf() or printf() is usually the easiest route.
2030See L<perlfaq4>.
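For instance, rounding for display might look like the sketch below.
The values are arbitrary, and note that sprintf() returns a string.

```perl
use strict;
use warnings;

my $pi = 3.14159;

my $hundredths = sprintf("%.2f", $pi);         # two decimal places
my $tenths     = sprintf("%.1f", 1234.5678);   # one decimal place

print "$hundredths\n";    # 3.14
print "$tenths\n";        # 1234.6
printf "%.3f\n", 2 / 3;   # 0.667
```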
68dc0745 2031
5a964f20 2032Floating-point numbers are only approximations to what a mathematician
2033would call real numbers. There are infinitely more reals than floats,
2034so some corners must be cut. For example:
2035
2036 printf "%.20g\n", 123456789123456789;
2037 # produces 123456789123456784
2038
Testing for exact floating-point equality or inequality is not a
good idea. Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
decimal places. See Knuth, volume II, for a more robust treatment of
this topic.
2044
2045 sub fp_equal {
2046 my ($X, $Y, $POINTS) = @_;
2047 my ($tX, $tY);
2048 $tX = sprintf("%.${POINTS}g", $X);
2049 $tY = sprintf("%.${POINTS}g", $Y);
2050 return $tX eq $tY;
2051 }
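A usage sketch for fp_equal(), repeated here in condensed form so the
example runs on its own. It assumes IEEE 754 double-precision floats,
where C<0.1 + 0.2> is not exactly C<0.3>.

```perl
use strict;
use warnings;

# Condensed copy of the routine above: compare after formatting both
# numbers to $POINTS significant digits.
sub fp_equal {
    my ($X, $Y, $POINTS) = @_;
    return sprintf("%.${POINTS}g", $X) eq sprintf("%.${POINTS}g", $Y);
}

my $sum = 0.1 + 0.2;

my $exact  = ($sum == 0.3)            ? 1 : 0;
my $approx = fp_equal($sum, 0.3, 10)  ? 1 : 0;

print "$exact\n";     # 0 with IEEE 754 doubles
print "$approx\n";    # 1
```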
2052
The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and other mathematical and trigonometric functions.
The Math::Complex module (part of the standard perl distribution)
defines mathematical functions that work on both the reals and the
imaginary numbers. Math::Complex is not as efficient as POSIX, but
POSIX can't work with complex numbers.
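A short sketch using both modules. The inputs are arbitrary; note that
C<use Math::Complex> replaces the built-in sqrt() with a version that
accepts negative arguments.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);
use Math::Complex;

my $up   = ceil(1.2);     # smallest integer not less than 1.2
my $down = floor(-1.2);   # largest integer not greater than -1.2

my $z = sqrt(-4);         # Math::Complex's sqrt returns a complex number

print "$up\n";                        # 2
print "$down\n";                      # -2
print Re($z), " ", Im($z), "\n";      # 0 2
```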
2059
2060Rounding in financial applications can have serious implications, and
2061the rounding method used should be specified precisely. In these
2062cases, it probably pays not to trust whichever system rounding is
2063being used by Perl, but to instead implement the rounding function you
2064need yourself.
5a964f20 2065
2066=head2 Bigger Numbers
2067
2068The standard Math::BigInt and Math::BigFloat modules provide
19799a22 2069variable-precision arithmetic and overloaded operators, although
cd5c4fce 2070they're currently pretty slow. At the cost of some space and
19799a22 2071considerable speed, they avoid the normal pitfalls associated with
2072limited-precision representations.
5a964f20 2073
2074 use Math::BigInt;
2075 $x = Math::BigInt->new('123456789123456789');
2076 print $x * $x;
2077
2078 # prints +15241578780673678515622620750190521
19799a22 2079
There are several modules that let you calculate with unlimited or
fixed precision (bound only by memory and CPU time). There are also
some non-standard modules that provide faster implementations via
external C libraries.
2084
2085Here is a short, but incomplete summary:
2086
2087 Math::Fraction big, unlimited fractions like 9973 / 12967
2088 Math::String treat string sequences like numbers
2089 Math::FixedPrecision calculate with a fixed precision
2090 Math::Currency for currency calculations
2091 Bit::Vector manipulate bit vectors fast (uses C)
2092 Math::BigIntFast Bit::Vector wrapper for big numbers
2093 Math::Pari provides access to the Pari C library
2094 Math::BigInteger uses an external C library
2095 Math::Cephes uses external Cephes C library (no big numbers)
2096 Math::Cephes::Fraction fractions via the Cephes library
2097 Math::GMP another one using an external C library
2098
2099Choose wisely.
16070b82 2100
2101=cut