=head1 NAME
X<operator>

perlop - Perl operators and precedence

=head1 DESCRIPTION

=head2 Operator Precedence and Associativity
X<operator, precedence> X<precedence> X<associativity>

Operator precedence and associativity work in Perl more or less like
they do in mathematics.

I<Operator precedence> means some operators are evaluated before
others. For example, in C<2 + 4 * 5>, the multiplication has higher
precedence, so C<4 * 5> is evaluated first, yielding C<2 + 20 == 22>
and not C<6 * 5 == 30>.

I<Operator associativity> defines what happens if a sequence of the
same operators is used one after another: whether the evaluator will
evaluate the left operations first or the right. For example, in
C<8 - 4 - 2>, subtraction is left associative, so Perl evaluates the
expression left to right. C<8 - 4> is evaluated first, making the
expression C<4 - 2 == 2> and not C<8 - 2 == 6>.
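
The two worked examples above can be checked directly in Perl. This is a
minimal sketch assuming a Perl 5 interpreter; the variable names are
illustrative, and C<**> is included because it is right associative, so it
groups the other way.

```perl
use strict;
use warnings;

# Precedence: * binds tighter than +, so 4 * 5 is evaluated first.
my $prec = 2 + 4 * 5;        # 22, not 30

# Left associativity: 8 - 4 - 2 is (8 - 4) - 2.
my $left = 8 - 4 - 2;        # 2, not 6

# Right associativity: 2 ** 3 ** 2 is 2 ** (3 ** 2).
my $right = 2 ** 3 ** 2;     # 512, not 64

print "$prec $left $right\n";    # prints "22 2 512"
```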

Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy. (This makes learning Perl easier
for C folks.) With very few exceptions, these all operate on scalar
values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp ~~
    left        &
    left        | ^
    left        &&
    left        || //
    nonassoc    .. ...
    right       ?:
    right       = += -= *= etc.
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

In the following sections, these operators are covered in precedence order.

Many operators can be overloaded for objects. See L<overload>.

=head2 Terms and List Operators (Leftward)
X<list operator> X<operator, list> X<term>

TERMs have the highest precedence in Perl. They include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort,
but the commas on the left are evaluated after. In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.
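
A small sketch of the gobbling behaviour, using only core Perl. C<sort> is
used instead of C<print> so the effect of the parentheses can be inspected as
data; the array names are made up for illustration.

```perl
use strict;
use warnings;

# Without parentheses, sort gobbles every argument to its right...
my @gobbled = (1, 3, sort 4, 2);    # (1, 3, 2, 4)

# ...but with parentheses it takes only what is inside them.
my @limited = (1, 3, sort(4), 2);   # (1, 3, 4, 2)

print "@gobbled | @limited\n";      # prints "1 3 2 4 | 1 3 4 2"
```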

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for C<print> which is evaluated (printing
the result of C<$foo & 255>). Then one is added to the return value
of C<print> (usually 1). The result is something like this:

    1 + 1, "\n";    # Obviously not what you meant.

To do what you meant properly, you must write:

    print(($foo & 255) + 1, "\n");

See L<Named Unary Operators> for more discussion of this.

Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L<Quote and Quote-like Operators> toward the end of this section,
as well as L</"I/O Operators">.

=head2 The Arrow Operator
X<arrow> X<dereference> X<< -> >>

"C<< -> >>" is an infix dereference operator, just as it is in C
and C++. If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.) See L<perlreftut> and L<perlref>.

Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name). See L<perlobj>.
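
The following sketch exercises each form of the arrow described above. The
C<Counter> class and its C<new> and C<count> methods are hypothetical,
written only so there is something to call.

```perl
use strict;
use warnings;

my $aref = [10, 20, 30];
my $href = { name => 'perl' };
my $cref = sub { return $_[0] * 2 };

my $second = $aref->[1];       # array subscript: 20
my $name   = $href->{name};    # hash subscript: 'perl'
my $double = $cref->(21);      # subroutine call: 42

# Hypothetical class, used only to illustrate method calls.
package Counter;
sub new   { my ($class, %args) = @_; return bless { count => $args{start} }, $class }
sub count { my ($self) = @_; return $self->{count} }

package main;

my $c        = Counter->new(start => 5);  # left side is a class name
my $method   = 'count';
my $via_name = $c->$method;               # method name held in a scalar
```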

=head2 Auto-increment and Auto-decrement
X<increment> X<auto-increment> X<++> X<decrement> X<auto-decrement> X<-->

"++" and "--" work as in C. That is, if placed before a variable,
they increment or decrement the variable by one before returning the
value, and if placed after, increment or decrement after returning the
value.

    $i = 0;  $j = 0;
    print $i++;  # prints 0
    print ++$j;  # prints 1

Note that just as in C, Perl doesn't define B<when> the variable is
incremented or decremented. You just know it will be done sometime
before or after the value is returned. This also means that modifying
a variable twice in the same statement will lead to undefined behaviour.
Avoid statements like:

    $i = $i ++;
    print ++ $i + $i ++;

Perl will not guarantee what the result of the above statements is.

The auto-increment operator has a little extra built-in magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = '99');      # prints '100'
    print ++($foo = 'a0');      # prints 'a1'
    print ++($foo = 'Az');      # prints 'Ba'
    print ++($foo = 'zz');      # prints 'aaa'

C<undef> is always treated as numeric, and in particular is changed
to C<0> before incrementing (so that a post-increment of an undef value
will return C<0> rather than C<undef>).

The auto-decrement operator is not magical.
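
A short sketch of the rules above; the variable names are illustrative. It
assumes C<use warnings> is on, which is why the non-numeric decrement is
wrapped in C<no warnings 'numeric'>.

```perl
use strict;
use warnings;

my $undef_val;
my $old = $undef_val++;    # undef is treated as 0: returns 0, leaves 1

my $id = 'a9';
$id++;                     # magic string increment with carry: 'b0'

my $dec = 'aa';
{
    no warnings 'numeric'; # 'aa' is not numeric; silence that warning
    $dec--;                # no magic: 'aa' becomes 0, then -1
}
```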

=head2 Exponentiation
X<**> X<exponentiation> X<power>

Binary "**" is the exponentiation operator. It binds even more
tightly than unary minus, so C<-2**4> is C<-(2**4)>, not C<(-2)**4>. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)
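
A quick check of the precedence rule, assuming the usual IEEE double
behaviour for the fractional exponent:

```perl
use strict;
use warnings;

my $neg  = -2**4;      # -(2**4) == -16
my $pos  = (-2)**4;    # 16
my $root = 2**0.5;     # computed in floating point, about 1.41421
```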

=head2 Symbolic Unary Operators
X<unary operator> X<operator, unary>

Unary "!" performs logical negation, i.e., "not". See also C<not> for a lower
precedence version of this.
X<!>

Unary "-" performs arithmetic negation if the operand is numeric. If
the operand is an identifier, a string consisting of a minus sign
concatenated with the identifier is returned. Otherwise, if the string
starts with a plus or minus, a string starting with the opposite sign
is returned. One effect of these rules is that -bareword is equivalent
to the string "-bareword". If, however, the string begins with a
non-alphabetic character (excluding "+" or "-"), Perl will attempt to convert
the string to a number and the arithmetic negation is performed. If the
string cannot be cleanly converted to a number, Perl will give the warning
B<Argument "the string" isn't numeric in negation (-) at ...>.
X<-> X<negation, arithmetic>

Unary "~" performs bitwise negation, i.e., 1's complement. For
example, C<0666 & ~027> is 0640. (See also L<Integer Arithmetic> and
L<Bitwise String Operators>.) Note that the width of the result is
platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the "&" operator to mask off the excess bits.
X<~> X<negation, binary>

Unary "+" has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L<Terms and List Operators (Leftward)>.)
X<+>

Unary "\" creates a reference to whatever follows it. See L<perlreftut>
and L<perlref>. Do not confuse this behavior with the behavior of
backslash within a string, although both forms do convey the notion
of protecting the next thing from interpolation.
X<\> X<reference> X<backslash>
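
The sketch below exercises each symbolic unary operator described in this
section; the values in the comments follow from the rules above, and the
variable names are illustrative only.

```perl
use strict;
use warnings;

my $not   = !1;            # logical negation: false (the empty string)
my $minus = -"foo";        # identifier: the string "-foo"
my $flip  = -"-bar";       # leading minus flipped: "+bar"
my $num   = -"10";         # numeric string: arithmetic negation, -10
my $mask  = 0666 & ~027;   # bitwise negation, then mask: 0640
my $low8  = ~0 & 0xFF;     # mask off the platform-dependent high bits: 255
my $ref   = \$num;         # reference to $num
```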

=head2 Binding Operators
X<binding> X<operator, binding> X<=~> X<!~>

Binary "=~" binds a scalar expression to a pattern match. Certain operations
search or modify the string $_ by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_. When used in scalar context, the return value generally indicates the
success of the operation. Behavior in list context depends on the particular
operator. See L</"Regexp Quote-Like Operators"> for details and
L<perlretut> for examples using these operators.

If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time. Note that this means that its contents will be interpolated twice, so

    '\\' =~ q'\\';

is not ok, as the regex engine will end up trying to compile the
pattern C<\>, which it will consider a syntax error.

Binary "!~" is just like "=~" except the return value is negated in
the logical sense.
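
A sketch of binding in practice, with made-up variable names. It shows a
match, a negated match, a substitution on a copy (whose return value in
scalar context is the number of substitutions made), and a run-time pattern
taken from an ordinary string expression.

```perl
use strict;
use warnings;

my $text = "Hello, world";

my $found  = $text =~ /world/;   # true: matched against $text, not $_
my $absent = $text !~ /xyz/;     # true: "!~" negates a failed match

# s///g on a copy returns the number of substitutions made.
my $n = (my $copy = $text) =~ s/o/0/g;   # 2; $copy is "Hell0, w0rld"

# An expression on the right is used as a pattern at run time.
my $pattern = 'w.rld';
my $dynamic = $text =~ $pattern;         # true
```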
253
254=head2 Multiplicative Operators
d74e8afc 255X<operator, multiplicative>
a0d0e21e 256
257Binary "*" multiplies two numbers.
d74e8afc 258X<*>
a0d0e21e 259
260Binary "/" divides two numbers.
d74e8afc 261X</> X<slash>
a0d0e21e 262
54310121 263Binary "%" computes the modulus of two numbers. Given integer
264operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is
265C<$a> minus the largest multiple of C<$b> that is not greater than
266C<$a>. If C<$b> is negative, then C<$a % $b> is C<$a> minus the
267smallest multiple of C<$b> that is not less than C<$a> (i.e. the
89b4f0ad 268result will be less than or equal to zero). If the operands
4848a83b 269C<$a> and C<$b> are floating point values and the absolute value of
270C<$b> (that is C<abs($b)>) is less than C<(UV_MAX + 1)>, only
271the integer portion of C<$a> and C<$b> will be used in the operation
272(Note: here C<UV_MAX> means the maximum of the unsigned integer type).
273If the absolute value of the right operand (C<abs($b)>) is greater than
274or equal to C<(UV_MAX + 1)>, "%" computes the floating-point remainder
275C<$r> in the equation C<($r = $a - $i*$b)> where C<$i> is a certain
276integer that makes C<$r> should have the same sign as the right operand
277C<$b> (B<not> as the left operand C<$a> like C function C<fmod()>)
278and the absolute value less than that of C<$b>.
0412d526 279Note that when C<use integer> is in scope, "%" gives you direct access
55d729e4 280to the modulus operator as implemented by your C compiler. This
281operator is not as well defined for negative operands, but it will
282execute faster.
d74e8afc 283X<%> X<remainder> X<modulus> X<mod>
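
The sign rules are easiest to see with small operands. This sketch assumes
C<use integer> is B<not> in effect, so Perl's own rules apply rather than the
C compiler's:

```perl
use strict;
use warnings;

my @results = (
    7 % 3,      #  1
    -7 % 3,     #  2: right operand positive, result in 0 .. 2
    7 % -3,     # -2: right operand negative, result in -2 .. 0
    -7 % -3,    # -1
    7.5 % 2,    #  1: only the integer portions (7 and 2) are used
);
print "@results\n";    # prints "1 2 -2 -1 1"
```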

Binary "x" is the repetition operator. In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand. In list context, if the left operand is enclosed in
parentheses or is a list formed by C<qw/STRING/>, it repeats the list.
If the right operand is zero or negative, it returns an empty string
or an empty list, depending on the context.
X<x>

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5
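
A few more cases of "x", as a sketch with illustrative names. The last line
relies on the right operand being evaluated in scalar context, so an array
there supplies its element count.

```perl
use strict;
use warnings;

my $rule  = '-' x 10;          # '----------'
my @pairs = (1, 2) x 3;        # (1, 2, 1, 2, 1, 2)
my $none  = 'ab' x 0;          # '' (zero count)
my @nil   = (1, 2) x 0;        # () (zero count, list form)
my @five  = (5) x @pairs;      # six 5s: @pairs counts as 6
```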

=head2 Additive Operators
X<operator, additive>

Binary "+" returns the sum of two numbers.
X<+>

Binary "-" returns the difference of two numbers.
X<->

Binary "." concatenates two strings.
X<string, concatenation> X<concatenation>
X<cat> X<concat> X<concatenate> X<.>
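
A short sketch contrasting the two operators on the same operands. Because
"+", "-" and "." share one precedence level and are left associative, a
mixed expression is evaluated strictly left to right:

```perl
use strict;
use warnings;

my $sum    = "3" + "4";    # numeric addition: 7
my $joined = "3" . "4";    # string concatenation: "34"
my $diff   = 10 - 2.5;     # 7.5
my $mixed  = "1" . 2 + 3;  # ("1" . 2) + 3, i.e. "12" + 3: 15
```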

=head2 Shift Operators
X<shift operator> X<operator, shift> X<<< << >>>
X<<< >> >>> X<right shift> X<left shift> X<bitwise shift>
X<shl> X<shr> X<shift, right> X<shift, left>

Binary "<<" returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also L<Integer Arithmetic>.)

Binary ">>" returns the value of its left argument shifted right by
the number of bits specified by the right argument. Arguments should
be integers. (See also L<Integer Arithmetic>.)

Note that both "<<" and ">>" in Perl are implemented directly using
"<<" and ">>" in C. If C<use integer> (see L<Integer Arithmetic>) is
in force then signed C integers are used, else unsigned C integers are
used. Either way, the implementation isn't going to generate results
larger than the size of the integer type Perl was built with (32 bits
or 64 bits).

The result of overflowing the range of the integers is undefined
because it is also undefined in C. In other words, using 32-bit
integers, C<< 1 << 32 >> is undefined. Shifting by a negative number
of bits is also undefined.
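
Some small, in-range shifts, as a sketch; the values stay well inside
32 bits, so the results do not depend on how Perl was built:

```perl
use strict;
use warnings;

my $left  = 1 << 4;        # 16
my $right = 256 >> 2;      # 64
my $bits  = 0b1011 << 1;   # 0b10110 == 22
```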

=head2 Named Unary Operators
X<operator, named unary>

The various named unary operators are treated as functions with one
argument, with optional parentheses.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. For example, because named unary
operators have higher precedence than "||":

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because "*" has higher precedence than named unary operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

Regarding precedence, the filetest operators, like C<-f>, C<-M>, etc. are
treated like named unary operators, but they don't follow this functional
parenthesis rule. That means, for example, that C<-f($file).".bak"> is
equivalent to C<-f "$file.bak">.
X<-X> X<filetest> X<operator, filetest>

See also L<"Terms and List Operators (Leftward)">.
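
The same rules apply to every named unary operator, not just C<chdir> and
C<rand>. This sketch uses C<lc>, which is also a named unary operator,
because its result is easy to inspect; the variable names are illustrative.

```perl
use strict;
use warnings;

# "." binds tighter than a named unary operator,
# so the whole concatenation becomes lc's argument.
my $all  = lc "ABC" . "DEF";    # lc("ABCDEF"): "abcdef"

# A parenthesis right after the name ends the argument list.
my $part = lc("ABC") . "DEF";   # "abcDEF"

# "eq" binds looser than a named unary operator.
my $same = lc "ABC" eq "abc";   # (lc "ABC") eq "abc": true
```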

=head2 Relational Operators
X<relational operator> X<operator, relational>

Binary "<" returns true if the left argument is numerically less than
the right argument.
X<< < >>

Binary ">" returns true if the left argument is numerically greater
than the right argument.
X<< > >>

Binary "<=" returns true if the left argument is numerically less than
or equal to the right argument.
X<< <= >>

Binary ">=" returns true if the left argument is numerically greater
than or equal to the right argument.
X<< >= >>

Binary "lt" returns true if the left argument is stringwise less than
the right argument.
X<< lt >>

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.
X<< gt >>

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.
X<< le >>

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.
X<< ge >>
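
The numeric and string operators can disagree about the same pair of values,
as this sketch (with illustrative names) shows:

```perl
use strict;
use warnings;

my $num_lt = 10 < 9;           # false: 10 is numerically greater
my $str_lt = "10" lt "9";      # true: "1" sorts before "9"
my $str_ge = "abc" ge "abd";   # false: "c" sorts before "d"
```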

=head2 Equality Operators
X<equality> X<equal> X<equals> X<operator, equality>

Binary "==" returns true if the left argument is numerically equal to
the right argument.
X<==>

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.
X<!=>

Binary "<=>" returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument. If your platform supports NaNs (not-a-numbers) as numeric
values, using them with "<=>" returns undef. NaN is not "<", "==", ">",
"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
returns true, as does NaN != anything else. If your platform doesn't
support NaNs then NaN is just a string with numeric value 0.
X<< <=> >> X<spaceship>

    perl -le '$a = "NaN"; print "No NaN support here" if $a == $a'
    perl -le '$a = "NaN"; print "NaN support here" if $a != $a'

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.
X<eq>

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.
X<ne>

Binary "cmp" returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.
X<cmp>

Binary "~~" does a smart match between its arguments. Smart matching
is described in L<perlsyn/"Smart matching in detail">.
X<~~>

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
by the current locale if C<use locale> is in effect. See L<perllocale>.
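
A common use of "<=>" and "cmp" is as a C<sort> comparator. This sketch
sorts the same made-up list both ways and assumes C<use locale> is not in
effect:

```perl
use strict;
use warnings;

my @nums = (10, 9, 100);

my @by_number = sort { $a <=> $b } @nums;   # (9, 10, 100)
my @by_string = sort { $a cmp $b } @nums;   # (10, 100, 9)

my $spaceship = 2 <=> 10;                   # -1
my $stringcmp = "2" cmp "10";               # 1: "2" sorts after "1"
```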

=head2 Bitwise And
X<operator, bitwise, and> X<bitwise and> X<&>

Binary "&" returns its operands ANDed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "&" has lower precedence than the relational and equality
operators, so for example the parentheses are essential in a test like

    print "Even\n" if ($x & 1) == 0;

=head2 Bitwise Or and Exclusive Or
X<operator, bitwise, or> X<bitwise or> X<|> X<operator, bitwise, xor>
X<bitwise xor> X<^>

Binary "|" returns its operands ORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Binary "^" returns its operands XORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "|" and "^" have lower precedence than the relational and
equality operators, so for example the parentheses are essential in a
test like

    print "false\n" if (8 | 2) != 10;
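
A sketch of all three bitwise operators, plus the precedence trap described
above. The unparenthesized expression is wrapped in
C<no warnings 'precedence'> because, with warnings enabled, Perl is expected
to warn about exactly this mistake.

```perl
use strict;
use warnings;

my $and = 0b1100 & 0b1010;    # 0b1000 == 8
my $or  = 0b1100 | 0b1010;    # 0b1110 == 14
my $xor = 0b1100 ^ 0b1010;    # 0b0110 == 6

# Keep the even numbers; the parentheses around ($_ & 1) are required.
my @even = grep { ($_ & 1) == 0 } 1 .. 6;    # (2, 4, 6)

my $oops;
{
    no warnings 'precedence';
    $oops = 8 | 2 != 10;      # parsed as 8 | (2 != 10), i.e. 8 | 1 == 9
}
```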

=head2 C-style Logical And
X<&&> X<logical and> X<operator, logical, and>

Binary "&&" performs a short-circuit logical AND operation. That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Or
X<||> X<operator, logical, or>

Binary "||" performs a short-circuit logical OR operation. That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Defined-Or
X<//> X<operator, logical, defined-or>

Although it has no direct equivalent in C, Perl's C<//> operator is related
to its C-style or. In fact, it's exactly the same as C<||>, except that it
tests the left hand side's definedness instead of its truth. Thus, C<$a // $b>
is similar to C<defined($a) || $b> (except that it returns the value of C<$a>
rather than the value of C<defined($a)>) and is exactly equivalent to
C<defined($a) ? $a : $b>. This is very useful for providing default values
for variables. If you actually want to test if at least one of C<$a> and
C<$b> is defined, use C<defined($a // $b)>.
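
A sketch of the three short-circuit operators side by side, assuming
Perl 5.10 or later for C<//>. The C<$calls> counter is only there to show
that the right operand is skipped:

```perl
use strict;
use warnings;

my $first  = 0 || "default";      # "default": 0 is false
my $last   = 5 && 7;              # 7: the last value evaluated
my $kept   = 0 // "default";      # 0: 0 is defined
my $missing;
my $filled = $missing // "default";   # "default": $missing is undef

my $calls = 0;
my $short = 1 || $calls++;        # right side never runs; $calls stays 0
```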

The C<||>, C<//> and C<&&> operators return the last value evaluated
(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
portable way to find out the home directory might be:

    $home = $ENV{'HOME'} // $ENV{'LOGDIR'} //
        (getpwuid($<))[7] // die "You're homeless!\n";

In particular, this means that you shouldn't use this
for selecting between two aggregates for assignment:

    @a = @b || @c;              # this is wrong
    @a = scalar(@b) || @c;      # really meant this
    @a = @b ? @b : @c;          # this works fine, though
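
To see why the first form is wrong, this sketch (with made-up arrays)
checks what actually ends up in the target array:

```perl
use strict;
use warnings;

my @b = (4, 5, 6);
my @c = (7, 8);

my @wrong = @b || @c;        # @b is evaluated in scalar context: (3)
my @right = @b ? @b : @c;    # (4, 5, 6)
```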

As more readable alternatives to C<&&> and C<||> when used for
control flow, Perl provides the C<and> and C<or> operators (see below).
The short-circuit behavior is identical. The precedence of "and"
and "or" is much lower, however, so that you can safely use them after a
list operator without the need for parentheses:

    unlink "alpha", "beta", "gamma"
            or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
            || (gripe(), next LINE);

Using "or" for assignment is unlikely to do what you want; see below.

=head2 Range Operators
X<operator, range> X<range> X<..> X<...>

Binary ".." is the range operator, which is really two different
operators depending on the context. In list context, it returns a
list of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list. The range operator is useful for writing
C<foreach (1..10)> loops and for doing slice operations on arrays. In
the current implementation, no temporary array is created when the
range operator is used as the expression in C<foreach> loops, but older
versions of Perl might burn a lot of memory when you write something
like this:

    for (1 .. 1_000_000) {
        # code
    }

The range operator also works on strings, using the magical
auto-increment; see below.

In scalar context, ".." returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma) operator
of B<sed>, B<awk>, and various editors. Each ".." operator maintains its
own boolean state. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again. It doesn't become false till the next time the range operator is
evaluated. It can test the right operand and become false on the same
evaluation it became true (as in B<awk>), but it still returns true once.
If you don't want it to test the right operand till the next
evaluation, as in B<sed>, just use three dots ("...") instead of
two. In all other regards, "..." behaves just like ".." does.

The right operand is not evaluated while the operator is in the
"false" state, and the left operand is not evaluated while the
operator is in the "true" state. The precedence is a little lower
than || and &&. The value returned is either the empty string for
false, or a sequence number (beginning with 1) for true. The
sequence number is reset for each range encountered. The final
sequence number in a range has the string "E0" appended to it, which
doesn't affect its numeric value, but gives you something to search
for if you want to exclude the endpoint. You can exclude the
beginning point by waiting for the sequence number to be greater
than 1.
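
The sequence numbers are easiest to see by collecting them. In this sketch
the input lines and the C<BEGIN>/C<END> markers are made up; the scalar
assignment puts ".." in scalar context, and because it is the same operator
each time through the loop, it keeps its state between iterations.

```perl
use strict;
use warnings;

my @seq;
for (qw(a BEGIN b c END d)) {
    # Scalar assignment gives ".." scalar context: it acts as a flip-flop.
    my $state = /BEGIN/ .. /END/;
    push @seq, $state;
}
# @seq holds: "", 1, 2, 3, "4E0", ""
```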

If either operand of scalar ".." is a constant expression,
that operand is considered true if it is equal (C<==>) to the current
input line number (the C<$.> variable).

To be pedantic, the comparison is actually C<int(EXPR) == int(EXPR)>,
but that is only an issue if you use a floating point expression; when
implicitly using C<$.> as described in the previous paragraph, the
comparison is C<int(EXPR) == int($.)> which is only an issue when C<$.>
is set to a floating point value and you are not reading from a file.
Furthermore, C<"span" .. "spat"> or C<2.18 .. 3.14> will not do what
you want in scalar context because each of the operands is evaluated
using its integer representation.

Examples:

As a scalar operator:

    if (101 .. 200) { print; } # print 2nd hundred lines, short for
                               #  if ($. == 101 .. $. == 200) ...

    next LINE if (1 .. /^$/);  # skip header lines, short for
                               #   ... if ($. == 1 .. /^$/);
                               # (typically in a loop labeled LINE)

    s/^/> / if (/^$/ .. eof());  # quote body

    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof;
        if ($in_header) {
            # ...
        } else { # in body
            # ...
        }
    } continue {
        close ARGV if eof;             # reset $. each file
    }

Here's a simple example to illustrate the difference between
the two range operators:

    @lines = ("   - Foo",
              "01 - Bar",
              "1  - Baz",
              "   - Quux");

    foreach (@lines) {
        if (/0/ .. /1/) {
            print "$_\n";
        }
    }

This program will print only the line containing "Bar". If
the range operator is changed to C<...>, it will also print the
"Baz" line.

And now some examples as a list operator:

    for (101 .. 200) { print; } # print $_ 100 times
    @foo = @foo[0 .. $#foo];    # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];      # slice last 5 items

The range operator (in list context) makes use of the magical
auto-increment algorithm if the operands are strings. You
can say

    @alphabet = ('A' .. 'Z');

to get all normal letters of the English alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[$mday - 1];  # $mday as from localtime, 1 .. 31

to get dates with leading zeros.

If the final value specified is not in the sequence that the magical
increment would produce, the sequence goes until the next value would
be longer than the final value specified.
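
A sketch of these string-range rules, with illustrative values. The
C<'08' .. '11'> line relies, as with C<'01' .. '31'> above, on the leading
zero keeping Perl from treating the operands as plain numbers:

```perl
use strict;
use warnings;

my @pairs    = ('aa' .. 'ad');   # aa ab ac ad
my @rollover = ('x'  .. 'ab');   # x y z aa ab
my @padded   = ('08' .. '11');   # 08 09 10 11

# 'B' never appears in the sequence 'a', 'b', ..., 'z', 'aa', ...,
# so the range stops before the values grow longer than one character.
my @capped   = ('a' .. 'B');     # 'a' .. 'z': 26 elements
```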

If the initial value specified isn't part of a magical increment
sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
only the initial value will be returned. So the following will only
return an alpha:

    use charnames 'greek';
    my @greek_small =  ("\N{alpha}" .. "\N{omega}");

To get lower-case Greek letters, use this instead:

    my @greek_small =  map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") );

Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
return two elements in list context.

    @list = (2.18 .. 3.14); # same as @list = (2 .. 3);

=head2 Conditional Operator
X<operator, conditional> X<operator, ternary> X<ternary> X<?:>

Ternary "?:" is the conditional operator, just as in C. It works much
like an if-then-else. If the argument before the ? is true, the
argument before the : is returned, otherwise the argument after the :
is returned. For example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble. For example, this:

    $a % 2 ? $a += 10 : $a += 2

Really means this:

    (($a % 2) ? ($a += 10) : $a) += 2

Rather than this:

    ($a % 2) ? ($a += 10) : ($a += 2)

That should probably be written more simply as:

    $a += ($a % 2) ? 10 : 2;
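
A sketch of both behaviours above, using lexical variables instead of the
global C<$a> (which C<sort> reserves). The names are illustrative; the final
values follow the parse shown above.

```perl
use strict;
use warnings;

my $n   = 1;
my $msg = sprintf "I have %d dog%s.", $n, ($n == 1) ? '' : 's';

# Both branches are lvalues, so the conditional can be assigned to.
my ($x, $y) = (0, 0);
my $pick_x = 1;
($pick_x ? $x : $y) = 5;          # assigns to $x

# Parsed as (($v % 2) ? ($v += 10) : $v) += 2
my $v = 1;
$v % 2 ? $v += 10 : $v += 2;      # $v ends up 13, not 11

# The simpler form does what was intended.
my $w = 1;
$w += ($w % 2) ? 10 : 2;          # 11
```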
724
4633a7c4 725=head2 Assignment Operators
d74e8afc 726X<assignment> X<operator, assignment> X<=> X<**=> X<+=> X<*=> X<&=>
5ac3b81c 727X<<< <<= >>> X<&&=> X<-=> X</=> X<|=> X<<< >>= >>> X<||=> X<//=> X<.=>
d74e8afc 728X<%=> X<^=> X<x=>
a0d0e21e 729
730"=" is the ordinary assignment operator.
731
732Assignment operators work as in C. That is,
733
734 $a += 2;
735
736is equivalent to
737
738 $a = $a + 2;
739
740although without duplicating any side effects that dereferencing the lvalue
54310121 741might trigger, such as from tie(). Other assignment operators work similarly.
742The following are recognized:
a0d0e21e 743
744 **= += *= &= <<= &&=
9f10b797 745 -= /= |= >>= ||=
746 .= %= ^= //=
747 x=
a0d0e21e 748
19799a22 749Although these are grouped by family, they all have the precedence
a0d0e21e 750of assignment.
751
b350dd2f 752Unlike in C, the scalar assignment operator produces a valid lvalue.
753Modifying an assignment is equivalent to doing the assignment and
754then modifying the variable that was assigned to. This is useful
755for modifying a copy of something, like this:
a0d0e21e 756
757 ($tmp = $global) =~ tr [A-Z] [a-z];
758
759Likewise,
760
761 ($a += 2) *= 3;
762
763is equivalent to
764
765 $a += 2;
766 $a *= 3;
767
b350dd2f 768Similarly, a list assignment in list context produces the list of
769lvalues assigned to, and a list assignment in scalar context returns
770the number of elements produced by the expression on the right-hand
771side of the assignment.
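A sketch of assignments used as lvalues, plus the common idiom that relies on a list assignment in scalar context to count matches (the strings are illustrative):

```perl
use strict;
use warnings;

# A scalar assignment is an lvalue: copy first, then modify only the copy.
my $global = "Hello World";
(my $tmp = $global) =~ tr/A-Z/a-z/;

# Likewise, ($n += 2) *= 3 is $n += 2 followed by $n *= 3.
my $n = 1;
($n += 2) *= 3;

# A list assignment in scalar context yields the number of elements
# produced by its right-hand side, even if the left-hand list is empty.
my $digits = () = "a1b2c3" =~ /\d/g;

print "$global / $tmp / $n / $digits\n";
```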
772
748a9306 773=head2 Comma Operator
d74e8afc 774X<comma> X<operator, comma> X<,>
a0d0e21e 775
5a964f20 776Binary "," is the comma operator. In scalar context it evaluates
a0d0e21e 777its left argument, throws that value away, then evaluates its right
778argument and returns that value. This is just like C's comma operator.
779
5a964f20 780In list context, it's just the list argument separator, and inserts
ed5c6d31 781both its arguments into the list. These arguments are also evaluated
782from left to right.
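A minimal sketch of the two behaviours; C<record()> is a hypothetical helper that logs each call so the evaluation order is visible:

```perl
use strict;
use warnings;

my @calls;

# Hypothetical helper: records its argument, then returns it.
sub record { push @calls, $_[0]; return $_[0] }

# Scalar context: the left operand is evaluated and discarded,
# then the right operand's value is returned.
my $last = (record("left"), record("right"));

# List context: the comma simply separates list elements,
# which are still evaluated left to right.
my @both = (record("a"), record("b"));
```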
a0d0e21e 783
d042e63d 784The C<< => >> operator is a synonym for the comma, but forces any word
719b43e8 785(consisting entirely of word characters) to its left to be interpreted
a44e5664 786as a string (as of 5.001). This includes words that might otherwise be
787considered a constant or function call.
788
789 use constant FOO => "something";
790
791 my %h = ( FOO => 23 );
792
793is equivalent to:
794
795 my %h = ("FOO", 23);
796
797It is I<NOT>:
798
799 my %h = ("something", 23);
800
801If the argument on the left is not a word, it is first interpreted as
802an expression, and then the string value of that is used.
719b43e8 803
804The C<< => >> operator is helpful in documenting the correspondence
805between keys and values in hashes, and other paired elements in lists.
748a9306 806
a44e5664 807 %hash = ( $key => $value );
808 login( $username => $password );
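A quick check of the quoting rule, assuming the same C<FOO> constant; calling it with explicit parentheses is one way to use its value as a key instead:

```perl
use strict;
use warnings;
use constant FOO => "something";

# The bareword to the left of => is quoted, even though FOO is a constant.
my %quoted = ( FOO => 23 );

# Calling the constant explicitly uses its value as the key.
my %called = ( FOO() => 23 );

print join(",", keys %quoted), " ", join(",", keys %called), "\n";
```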
809
a0d0e21e 810=head2 List Operators (Rightward)
d74e8afc 811X<operator, list, rightward> X<list operator>
a0d0e21e 812
813On its right side, a list operator has very low precedence,
814such that it controls all comma-separated expressions found there.
815The only operators with lower precedence are the logical operators
816"and", "or", and "not", which may be used to evaluate calls to list
817operators without the need for extra parentheses:
818
819 open HANDLE, "filename"
820 or die "Can't open: $!\n";
821
5ba421f6 822See also the discussion of list operators in L<Terms and List Operators (Leftward)>.
a0d0e21e 823
824=head2 Logical Not
d74e8afc 825X<operator, logical, not> X<not>
a0d0e21e 826
827Unary "not" returns the logical negation of the expression to its right.
828It's the equivalent of "!" except for the very low precedence.
829
830=head2 Logical And
d74e8afc 831X<operator, logical, and> X<and>
a0d0e21e 832
833Binary "and" returns the logical conjunction of the two surrounding
834expressions. It's equivalent to && except for the very low
5f05dabc 835precedence. Like &&, it short-circuits: the right
836expression is evaluated only if the left expression is true.
837
c963b151 838=head2 Logical or, Defined or, and Exclusive Or
f23102e2 839X<operator, logical, or> X<operator, logical, xor>
d74e8afc 840X<operator, logical, defined or> X<operator, logical, exclusive or>
f23102e2 841X<or> X<xor>
a0d0e21e 842
843Binary "or" returns the logical disjunction of the two surrounding
5a964f20 844expressions. It's equivalent to || except for the very low precedence.
845This makes it useful for control flow:
846
847 print FH $data or die "Can't write to FH: $!";
848
849Like ||, it short-circuits: the right expression is evaluated
850only if the left expression is false. Due to its precedence, you should
851probably avoid using it for assignment and use it only for control flow.
852
853 $a = $b or $c; # bug: this is wrong
854 ($a = $b) or $c; # really means this
855 $a = $b || $c; # better written this way
856
19799a22 857However, when it's a list-context assignment and you're trying to use
5a964f20 858"||" for control flow, you probably need "or" so that the assignment
859takes higher precedence.
860
861 @info = stat($file) || die; # oops, scalar sense of stat!
862 @info = stat($file) or die; # better, now @info gets its due
863
c963b151 864Then again, you could always use parentheses.
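A sketch contrasting the two operators; C<fields()> is a hypothetical helper that returns an array, so in scalar context it yields the element count:

```perl
use strict;
use warnings;

my $empty    = 0;
my $fallback = "default";

# `or` binds more loosely than `=`: this is ($r1 = $empty) or (...).
my ($r1, $fell_through) = (undef, 0);
$r1 = $empty or $fell_through = 1;

# `||` binds more tightly than `=`, so $r2 receives the fallback.
my $r2 = $empty || $fallback;

# Hypothetical helper returning an array (a count in scalar context).
sub fields { my @f = ("a", "b", "c"); return @f }

my @good = fields() or die "no fields";   # (@good = fields()) or die
my @bad  = fields() || die "no fields";   # fields() called in scalar context
```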
865
a0d0e21e 866Binary "xor" returns the exclusive-OR of the two surrounding expressions.
867It cannot short-circuit, since the result always depends on both operands.
868
869=head2 C Operators Missing From Perl
d74e8afc 870X<operator, missing from perl> X<&> X<*>
871X<typecasting> X<(TYPE)>
a0d0e21e 872
873Here is what C has that Perl doesn't:
874
875=over 8
876
877=item unary &
878
879Address-of operator. (But see the "\" operator for taking a reference.)
880
881=item unary *
882
54310121 883Dereference-address operator. (Perl's prefix dereferencing
a0d0e21e 884operators are typed: $, @, %, and &.)
885
886=item (TYPE)
887
19799a22 888Type-casting operator.
a0d0e21e 889
890=back
891
5f05dabc 892=head2 Quote and Quote-like Operators
89d205f2 893X<operator, quote> X<operator, quote-like> X<q> X<qq> X<qx> X<qw> X<m>
d74e8afc 894X<qr> X<s> X<tr> X<'> X<''> X<"> X<""> X<//> X<`> X<``> X<<< << >>>
895X<escape sequence> X<escape>
896
a0d0e21e 897
898While we usually think of quotes as literal values, in Perl they
899function as operators, providing various kinds of interpolating and
900pattern matching capabilities. Perl provides customary quote characters
901for these behaviors, but also provides a way for you to choose your
902quote character for any of them. In the following table, a C<{}> represents
9f10b797 903any pair of delimiters you choose.
a0d0e21e 904
2c268ad5 905 Customary Generic Meaning Interpolates
906 '' q{} Literal no
907 "" qq{} Literal yes
af9219ee 908 `` qx{} Command yes*
2c268ad5 909 qw{} Word list no
af9219ee 910 // m{} Pattern match yes*
911 qr{} Pattern yes*
912 s{}{} Substitution yes*
2c268ad5 913 tr{}{} Transliteration no (but see below)
7e3b091d 914 <<EOF here-doc yes*
a0d0e21e 915
af9219ee 916 * unless the delimiter is ''.
917
87275199 918Non-bracketing delimiters use the same character fore and aft, but the four
919sorts of brackets (round, angle, square, curly) will all nest, which means
9f10b797 920that
87275199 921
9f10b797 922 q{foo{bar}baz}
35f2feb0 923
9f10b797 924is the same as
87275199 925
926 'foo{bar}baz'
927
928Note, however, that this does not always work for quoting Perl code:
929
930 $s = q{ if($a eq "}") ... }; # WRONG
931
83df6a1d 932is a syntax error. The C<Text::Balanced> module (from CPAN, and
933starting from Perl 5.8 part of the standard distribution) is able
934to do this properly.
87275199 935
19799a22 936There can be whitespace between the operator and the quoting
fb73857a 937characters, except when C<#> is being used as the quoting character.
19799a22 938C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the
939operator C<q> followed by a comment. Its argument will be taken
940from the next line. This allows you to write:
fb73857a 941
942 s {foo} # Replace foo
943 {bar} # with bar.
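A small sketch of the delimiter rules above (the strings are illustrative):

```perl
use strict;
use warnings;

# Bracketing delimiters nest, so the inner braces are part of the string.
my $nested = q{foo{bar}baz};

# With no whitespace, '#' is just another delimiter.
my $hashed = q#foo#;

# Whitespace and comments may separate an operator from its delimiters.
(my $text = "some foo here") =~ s {foo}   # Replace foo
                                  {bar};  # with bar.

print "$nested $hashed $text\n";
```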
944
904501ec 945The following escape sequences are available in constructs that interpolate
946and in transliterations.
d74e8afc 947X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N>
a0d0e21e 948
6ee5d4e7 949 \t tab (HT, TAB)
5a964f20 950 \n newline (NL)
6ee5d4e7 951 \r return (CR)
952 \f form feed (FF)
953 \b backspace (BS)
954 \a alarm (bell) (BEL)
955 \e escape (ESC)
ee9f418e 956 \033 octal char (example: ESC)
957 \x1b hex char (example: ESC)
958 \x{263a} wide hex char (example: SMILEY)
959 \c[ control char (example: ESC)
95cc3e0c 960 \N{name} named Unicode character
2c268ad5 961
ee9f418e 962The character following C<\c> is mapped to some other character by
963converting letters to upper case and then (on ASCII systems) by inverting
964the 7th bit (0x40). The most interesting range is from '@' to '_'
965(0x40 through 0x5F), resulting in a control character from 0x00
966through 0x1F. A '?' maps to the DEL character. On EBCDIC systems only
967'@', the letters, '[', '\', ']', '^', '_' and '?' will work, resulting
968in 0x00 through 0x1F and 0x7F.
969
4c77eaa2 970B<NOTE>: Unlike C and other languages, Perl has no \v escape sequence for
ee9f418e 971the vertical tab (VT - ASCII 11), but you may use C<\ck> or C<\x0b>.
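Assuming an ASCII platform, the mapping can be checked against the literal escapes with a short sketch; C<ctrl_char()> is a hypothetical helper, not a built-in:

```perl
use strict;
use warnings;

# Hypothetical helper: upper-case the character, then flip bit 0x40.
sub ctrl_char { my ($c) = @_; return chr(ord(uc $c) ^ 0x40) }

my %literal = (
    'A' => "\cA",   # 0x01
    'a' => "\ca",   # also 0x01: letters are upper-cased first
    '[' => "\c[",   # ESC
    '_' => "\c_",   # 0x1F
    '?' => "\c?",   # DEL
    'k' => "\ck",   # vertical tab
);

for my $c (sort keys %literal) {
    printf "\\c%s => 0x%02X\n", $c, ord(ctrl_char($c));
}
```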
4c77eaa2 972
904501ec 973The following escape sequences are available in constructs that interpolate
974but not in transliterations.
d74e8afc 975X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
904501ec 976
a0d0e21e 977 \l lowercase next char
978 \u uppercase next char
979 \L lowercase till \E
980 \U uppercase till \E
981 \E end case modification
1d2dff63 982 \Q quote non-word characters till \E
a0d0e21e 983
95cc3e0c 984If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
985C<\u> and C<\U> is taken from the current locale. See L<perllocale>.
986If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
987beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
988C<\U> is as defined by Unicode. For documentation of C<\N{name}>,
989see L<charnames>.
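A brief sketch of the case-modification and quoting escapes, using an illustrative string; C<"\u\L$name"> behaves like C<ucfirst(lc($name))>:

```perl
use strict;
use warnings;

my $name = "wORLD";

# \u upper-cases the next character; \L lower-cases until \E.
my $greeting = "hello, \u\L$name\E!";

# \U upper-cases until \E; the rest of the string is untouched.
my $shout = "\Uquiet\E please";

# \Q backslash-escapes every non-word character until \E.
my $quoted = "\Qa.b*c\E";
```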
a034a98d 990
5a964f20 991All systems use the virtual C<"\n"> to represent a line terminator,
992called a "newline". There is no such thing as an unvarying, physical
19799a22 993newline character. It is only an illusion that the operating system,
5a964f20 994device drivers, C libraries, and Perl all conspire to preserve. Not all
995systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example,
996on classic Mac OS, these are reversed, and on systems without a line terminator,
997printing C<"\n"> may emit no actual data. In general, use C<"\n"> when
998you mean a "newline" for your system, but use the literal ASCII when you
999need an exact character. For example, most networking protocols expect
2a380090 1000and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
5a964f20 1001and although they often accept just C<"\012">, they seldom tolerate just
1002C<"\015">. If you get in the habit of using C<"\n"> for networking,
1003you may be burned some day.
d74e8afc 1004X<newline> X<line terminator> X<eol> X<end of line>
1005X<\n> X<\r> X<\r\n>
5a964f20 1006
904501ec 1007For constructs that do interpolate, variables beginning with "C<$>"
1008or "C<@>" are interpolated. Subscripted variables such as C<$a[3]> or
ad0f383a 1009C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
1010But method calls such as C<< $obj->meth >> are not.
af9219ee 1011
1012Interpolating an array or slice interpolates the elements in order,
1013separated by the value of C<$">, so is equivalent to interpolating
6deea57f 1014C<join $", @array>. "Punctuation" arrays such as C<@*> are only
1015interpolated if the name is enclosed in braces C<@{*}>, but special
1016arrays C<@_>, C<@+>, and C<@-> are interpolated, even without braces.
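A sketch of array, slice, and subscript interpolation, with illustrative data:

```perl
use strict;
use warnings;

my @words = ("a", "b", "c");
my $href  = { key => [10, 20] };

my $joined = "@words";                         # elements joined with $" (a space by default)
my $dashed = do { local $" = "-"; "@words" };  # same as join("-", @words)
my $slice  = "@words[0, 2]";                   # array slices interpolate too
my $deep   = "first: $href->{key}[0]";         # so do subscript chains
```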
af9219ee 1017
89d205f2 1018You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
1019An unescaped C<$> or C<@> interpolates the corresponding variable,
1d2dff63 1020while escaping will cause the literal string C<\$> to be inserted.
89d205f2 1021You'll need to write something like C<m/\Quser\E\@\Qhost/>.
1d2dff63 1022
a0d0e21e 1023Patterns are subject to an additional level of interpretation as a
1024regular expression. This is done as a second pass, after variables are
1025interpolated, so that regular expressions may be incorporated into the
1026pattern from the variables. If this is not what you want, use C<\Q> to
1027interpolate a variable literally.
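A sketch of the difference C<\Q> makes, assuming the interpolated text comes from somewhere untrusted, such as user input:

```perl
use strict;
use warnings;

my $input = "1+1";

# Interpolated as a regex, "1+1" means "one or more 1s, then a 1".
my $as_regex   = ("1+1=2" =~ /^$input=/) ? 1 : 0;

# \Q...\E makes the interpolated text match literally.
my $as_literal = ("1+1=2" =~ /^\Q$input\E=/) ? 1 : 0;

# A literal @ cannot appear inside \Q, so close the sequence first.
my $addr_ok = ('user@host' =~ m/^\Quser\E\@\Qhost\E$/) ? 1 : 0;
```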
1028
19799a22 1029Apart from the behavior described above, Perl does not expand
1030multiple levels of interpolation. In particular, contrary to the
1031expectations of shell programmers, back-quotes do I<NOT> interpolate
1032within double quotes, nor do single quotes impede evaluation of
1033variables when used within double quotes.
a0d0e21e 1034
5f05dabc 1035=head2 Regexp Quote-Like Operators
d74e8afc 1036X<operator, regexp>
cb1a09d0 1037
5f05dabc 1038Here are the quote-like operators that apply to pattern
cb1a09d0 1039matching and related activities.
1040
a0d0e21e 1041=over 8
1042
87e95b7f 1043=item qr/STRING/msixpo
01c6f5f4 1044X<qr> X</i> X</m> X</o> X</s> X</x> X</p>
a0d0e21e 1045
87e95b7f 1046This operator quotes (and possibly compiles) its I<STRING> as a regular
1047expression. I<STRING> is interpolated the same way as I<PATTERN>
1048in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation
1049is done. Returns a Perl value which may be used instead of the
64c5a566 1050corresponding C</STRING/msixpo> expression. The returned value is a
85dd5c8b 1051normalized version of the original pattern. It magically differs from
64c5a566 1052a string containing the same characters: C<ref(qr/x/)> returns "Regexp",
85dd5c8b 1053even though dereferencing the result returns undef.
a0d0e21e 1054
87e95b7f 1055For example,
1056
1057 $rex = qr/my.STRING/is;
85dd5c8b 1058 print $rex; # prints (?si-xm:my.STRING)
87e95b7f 1059 s/$rex/foo/;
1060
1061is equivalent to
1062
1063 s/my.STRING/foo/is;
1064
1065The result may be used as a subpattern in a match:
1066
1067 $re = qr/$pattern/;
1068 $string =~ /foo${re}bar/; # can be interpolated in other patterns
1069 $string =~ $re; # or used standalone
1070 $string =~ /$re/; # or this way
1071
1072Since Perl may compile the pattern at the moment the qr() operator is
1073executed, using qr() may have speed advantages in some situations,
1074notably if the result of qr() is used standalone:
1075
1076 sub match {
1077 my $patterns = shift;
1078 my @compiled = map qr/$_/i, @$patterns;
1079 grep {
1080 my $success = 0;
1081 foreach my $pat (@compiled) {
1082 $success = 1, last if /$pat/;
1083 }
1084 $success;
1085 } @_;
5a964f20 1086 }
1087
87e95b7f 1088Precompilation of the pattern into an internal representation at
1089the moment of qr() avoids the need to recompile the pattern every
1090time a match C</$pat/> is attempted. (Perl has many other internal
1091optimizations, but none would be triggered in the above example if
1092we did not use the qr() operator.)
1093
1094Options are:
1095
1096 m Treat string as multiple lines.
1097 s Treat string as single line. (Make . match a newline)
1098 i Do case-insensitive pattern matching.
1099 x Use extended regular expressions.
1100 p When matching, preserve a copy of the matched string so
1101 that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
1102 o Compile pattern only once.
1103
1104If a precompiled pattern is embedded in a larger pattern then the effect
1105of 'msixp' will be propagated appropriately. The effect of the 'o'
1106modifier is not propagated; it is restricted to those patterns that
1107explicitly use it.
1108
1109See L<perlre> for additional information on valid syntax for STRING, and
1110for a detailed look at the semantics of regular expressions.
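A sketch showing that flags compiled into a C<qr//> object stay with that sub-pattern when it is embedded in a larger one:

```perl
use strict;
use warnings;

my $word = qr/hello/i;   # the /i applies only inside $word

my $is_regexp = ref($word);                             # "Regexp"
my $inner_ci  = ("say HELLO" =~ /say $word/) ? 1 : 0;   # embedded part ignores case
my $outer_cs  = ("SAY hello" =~ /say $word/) ? 1 : 0;   # outer "say" is still case-sensitive
my $alone     = ("Hello there" =~ $word) ? 1 : 0;       # used standalone
```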
a0d0e21e 1111
87e95b7f 1112=item m/PATTERN/msixpogc
89d205f2 1113X<m> X<operator, match>
1114X<regexp, options> X<regexp> X<regex, options> X<regex>
01c6f5f4 1115X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c>
a0d0e21e 1116
87e95b7f 1117=item /PATTERN/msixpogc
a0d0e21e 1118
5a964f20 1119Searches a string for a pattern match, and in scalar context returns
19799a22 1120true if it succeeds, false if it fails. If no string is specified
1121via the C<=~> or C<!~> operator, the $_ string is searched. (The
1122string specified with C<=~> need not be an lvalue--it may be the
1123result of an expression evaluation, but remember the C<=~> binds
1124rather tightly.) See also L<perlre>. See L<perllocale> for
1125discussion of additional considerations that apply when C<use locale>
1126is in effect.
a0d0e21e 1127
01c6f5f4 1128Options are as described in C<qr//>; in addition, the following match
1129process modifiers are available:
a0d0e21e 1130
cde0cee5 1131 g Match globally, i.e., find all occurrences.
1132 c Do not reset search position on a failed match when /g is in effect.
a0d0e21e 1133
1134If "/" is the delimiter then the initial C<m> is optional. With the C<m>
89d205f2 1135you can use any pair of non-alphanumeric, non-whitespace characters
19799a22 1136as delimiters. This is particularly useful for matching path names
1137that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
7bac28a0 1138the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
19799a22 1139If "'" is the delimiter, no interpolation is performed on the PATTERN.
a0d0e21e 1140
1141PATTERN may contain variables, which will be interpolated (and the
f70b4f9c 1142pattern recompiled) every time the pattern search is evaluated, except
1f247705 1143for when the delimiter is a single quote. (Note that C<$(>, C<$)>, and
1144C<$|> are not interpolated because they look like end-of-string tests.)
f70b4f9c 1145If you want such a pattern to be compiled only once, add a C</o> after
1146the trailing delimiter. This avoids expensive run-time recompilations,
1147and is useful when the value you are interpolating won't change over
1148the life of the script. However, mentioning C</o> constitutes a promise
1149that you won't change the variables in the pattern. If you change them,
01c6f5f4 1150Perl won't even notice. See also L<"qr/STRING/msixpo">.
a0d0e21e 1151
5a964f20 1152If the PATTERN evaluates to the empty string, the last
d65afb4b 1153I<successfully> matched regular expression is used instead. In this
1154case, only the C<g> and C<c> flags on the empty pattern are honoured;
1155the other flags are taken from the original pattern. If no match has
1156previously succeeded, this will (silently) act instead as a genuine
1157empty pattern (which will always match).
a0d0e21e 1158
89d205f2 1159Note that it's possible to confuse Perl into thinking C<//> (the empty
1160regex) is really C<//> (the defined-or operator). Perl is usually pretty
1161good about this, but some pathological cases might trigger this, such as
1162C<$a///> (is that C<($a) / (//)> or C<$a // />?) and C<print $fh //>
1163(C<print $fh(//> or C<print($fh //>?). In all of these examples, Perl
1164will assume you meant defined-or. If you meant the empty regex, just
1165use parentheses or spaces to disambiguate, or even prefix the empty
c963b151 1166regex with an C<m> (so C<//> becomes C<m//>).
1167
19799a22 1168If the C</g> option is not used, C<m//> in list context returns a
a0d0e21e 1169list consisting of the subexpressions matched by the parentheses in the
f7e33566 1170pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here C<$1> etc. are
1171also set, and that this differs from Perl 4's behavior.) When there are
1172no parentheses in the pattern, the return value is the list C<(1)> for
1173success. With or without parentheses, an empty list is returned upon
1174failure.
a0d0e21e 1175
1176Examples:
1177
1178 open(TTY, '/dev/tty');
1179 <TTY> =~ /^y/i && foo(); # do foo if desired
1180
1181 if (/Version: *([0-9.]*)/) { $version = $1; }
1182
1183 next if m#^/usr/spool/uucp#;
1184
1185 # poor man's grep
1186 $arg = shift;
1187 while (<>) {
1188 print if /$arg/o; # compile only once
1189 }
1190
1191 if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
1192
1193This last example splits $foo into the first two words and the
5f05dabc 1194remainder of the line, and assigns those three fields to $F1, $F2, and
1195$Etc. The conditional is true if any variables were assigned, i.e., if
a0d0e21e 1196the pattern matched.
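A sketch of the list-context return values described above, using an illustrative string for C<$foo>:

```perl
use strict;
use warnings;

my $foo = "alpha beta gamma delta";

# With capturing parentheses, the list of captures is returned.
my ($F1, $F2, $Etc) = $foo =~ /^(\S+)\s+(\S+)\s*(.*)/;

# Without parentheses, success returns the list (1).
my @no_parens = $foo =~ /beta/;

# Failure returns an empty list, with or without parentheses.
my @failed = $foo =~ /(\d+)/;
```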
1197
19799a22 1198The C</g> modifier specifies global pattern matching--that is,
1199matching as many times as possible within the string. How it behaves
1200depends on the context. In list context, it returns a list of the
1201substrings matched by any capturing parentheses in the regular
1202expression. If there are no parentheses, it returns a list of all
1203the matched strings, as if there were parentheses around the whole
1204pattern.
a0d0e21e 1205
7e86de3e 1206In scalar context, each execution of C<m//g> finds the next match,
19799a22 1207returning true if it matches, and false if there is no further match.
7e86de3e 1208The position after the last match can be read or set using the pos()
1209function; see L<perlfunc/pos>. A failed match normally resets the
1210search position to the beginning of the string, but you can avoid that
1211by adding the C</c> modifier (e.g. C<m//gc>). Modifying the target
1212string also resets the search position.
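A sketch of scalar and list context with C</g>, including how C</c> preserves the position after a failed match (the strings are illustrative):

```perl
use strict;
use warnings;

my $str = "a1b22c333";

# List context: every capture from every match.
my @numbers = $str =~ /(\d+)/g;

# Scalar context: one match per evaluation; pos() records where it stopped.
my @ends;
while ($str =~ /\d+/g) {
    push @ends, pos($str);
}
my $after_loop = pos($str);   # the final failed match reset it to undef

# With /c, a failed match leaves the position alone.
my $s = "xx12yy";
my $m1 = $s =~ /x+/g;         # pos($s) is now 2
my $m2 = $s =~ /z/gc;         # fails, position kept
my $kept = pos($s);
my $m3 = $s =~ /z/g;          # fails without /c, position reset
my $reset = pos($s);
```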
c90c0ff4 1213
1214You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
1215zero-width assertion that matches the exact position where the previous
5d43e42d 1216C<m//g>, if any, left off. Without the C</g> modifier, the C<\G> assertion
1217still anchors at pos(), but the match is of course only attempted once.
1218Using C<\G> without C</g> on a target string that has not previously had a
1219C</g> match applied to it is the same as using the C<\A> assertion to match
fe4b3f22 1220the beginning of the string. Note also that, currently, C<\G> is only
1221properly supported when anchored at the very beginning of the pattern.
c90c0ff4 1222
1223Examples:
a0d0e21e 1224
1225 # list context
1226 ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
1227
1228 # scalar context
5d43e42d 1229 $/ = "";
19799a22 1230 while (defined($paragraph = <>)) {
1231 while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
1232 $sentences++;
a0d0e21e 1233 }
1234 }
1235 print "$sentences\n";
1236
c90c0ff4 1237 # using m//gc with \G
137443ea 1238 $_ = "ppooqppqq";
44a8e56a 1239 while ($i++ < 2) {
1240 print "1: '";
c90c0ff4 1241 print $1 while /(o)/gc; print "', pos=", pos, "\n";
44a8e56a 1242 print "2: '";
c90c0ff4 1243 print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
44a8e56a 1244 print "3: '";
c90c0ff4 1245 print $1 while /(p)/gc; print "', pos=", pos, "\n";
44a8e56a 1246 }
5d43e42d 1247 print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
44a8e56a 1248
1249The last example should print:
1250
1251 1: 'oo', pos=4
137443ea 1252 2: 'q', pos=5
44a8e56a 1253 3: 'pp', pos=7
1254 1: '', pos=7
137443ea 1255 2: 'q', pos=8
1256 3: '', pos=8
5d43e42d 1257 Final: 'q', pos=8
1258
1259Notice that the final match matched C<q> instead of C<p>, which a match
1260without the C<\G> anchor would have done. Also note that the final match
1261did not update C<pos> -- C<pos> is only updated on a C</g> match. If the
1262final match did indeed match C<p>, it's a good bet that you're running an
1263older (pre-5.6.0) Perl.
44a8e56a 1264
c90c0ff4 1265A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can
e7ea3e70 1266combine several regexps like this to process a string part-by-part,
c90c0ff4 1267doing different actions depending on which regexp matched. Each
1268regexp tries to match where the previous one leaves off.
e7ea3e70 1269
3fe9a6f1 1270 $_ = <<'EOL';
63acfd00 1271 $url = URI::URL->new( "http://www/" ); die if $url eq "xXx";
3fe9a6f1 1272 EOL
1273 LOOP:
e7ea3e70 1274 {
c90c0ff4 1275 print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
1276 print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
1277 print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
1278 print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
1279 print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
1280 print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
1281 print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/gc;
e7ea3e70 1282 print ". That's all!\n";
1283 }
1284
1285Here is the output (split into several lines):
1286
1287 line-noise lowercase line-noise lowercase UPPERCASE line-noise
1288 UPPERCASE line-noise lowercase line-noise lowercase line-noise
1289 lowercase lowercase line-noise lowercase lowercase line-noise
1290 MiXeD line-noise. That's all!
44a8e56a 1291
87e95b7f 1292=item ?PATTERN?
1293X<?>
1294
1295This is just like the C</pattern/> search, except that it matches only
1296once between calls to the reset() operator. This is a useful
1297optimization when you want to see only the first occurrence of
1298something in each file of a set of files, for instance. Only C<??>
1299patterns local to the current package are reset.
1300
1301 while (<>) {
1302 if (?^$?) {
1303 # blank line between header and body
1304 }
1305 } continue {
1306 reset if eof; # clear ?? status for next file
1307 }
1308
1309This usage is vaguely deprecated, which means it just might possibly
1310be removed in some distant future version of Perl, perhaps somewhere
1311around the year 2168.
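A sketch of the match-once behaviour, written as C<m?...?> (the C<m> form with C<?> delimiters described earlier). C<first_numbered()> is a hypothetical helper; because the C<??> match lives inside it, C<reset> re-arms it for a second pass:

```perl
use strict;
use warnings;

my @lines = ("header", "b1", "c2", "d3");

# Hypothetical helper: m?...? matches at most once until reset() is called.
sub first_numbered { my ($line) = @_; return $line =~ m?\d? }

my @pass1 = grep { first_numbered($_) } @lines;   # only the first numbered line
reset;                                            # re-arm ?? matches in this package
my @pass2 = grep { first_numbered($_) } @lines;   # matches once more
```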
1312
1313=item s/PATTERN/REPLACEMENT/msixpogce
1314X<substitute> X<substitution> X<replace> X<regexp, replace>
01c6f5f4 1315X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e>
87e95b7f 1316
1317Searches a string for a pattern, and if found, replaces that pattern
1318with the replacement text and returns the number of substitutions
1319made. Otherwise it returns false (specifically, the empty string).
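A sketch of the return values, using the C<\bgreen\b> pattern on an illustrative string:

```perl
use strict;
use warnings;

my $text = "green grass, wintergreen, green";

# The return value is the number of substitutions made...
my $made = ($text =~ s/\bgreen\b/mauve/g);

# ...or the empty string when nothing matched.
my $none = ($text =~ s/purple/violet/);
```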
1320
1321If no string is specified via the C<=~> or C<!~> operator, the C<$_>
1322variable is searched and modified. (The string specified with C<=~> must
1323be a scalar variable, an array element, a hash element, or an assignment
1324to one of those, i.e., an lvalue.)
1325
1326If the delimiter chosen is a single quote, no interpolation is
1327done on either the PATTERN or the REPLACEMENT. Otherwise, if the
1328PATTERN contains a $ that looks like a variable rather than an
1329end-of-string test, the variable will be interpolated into the pattern
1330at run-time. If you want the pattern compiled only once the first time
1331the variable is interpolated, use the C</o> option. If the pattern
1332evaluates to the empty string, the last successfully executed regular
1333expression is used instead. See L<perlre> for further explanation on these.
1334See L<perllocale> for discussion of additional considerations that apply
1335when C<use locale> is in effect.
1336
1337Options are as with m//, with the addition of the following
1338replacement-specific options:
1339
1340 e Evaluate the right side as an expression.
1341 ee Evaluate the right side as a string, then eval the result.
1342
1343Any non-alphanumeric, non-whitespace delimiter may replace the
1344slashes. If single quotes are used, no interpretation is done on the
1345replacement string (the C</e> modifier overrides this, however). Unlike
1346Perl 4, Perl 5 treats backticks as normal delimiters; the replacement
1347text is not evaluated as a command. If the
1348PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
1349pair of quotes, which may or may not be bracketing quotes, e.g.,
1350C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the
1351replacement portion to be treated as a full-fledged Perl expression
1352and evaluated right then and there. It is, however, syntax checked at
1353compile-time. A second C<e> modifier will cause the replacement portion
1354to be C<eval>ed before being run as a Perl expression.
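A sketch of C</e> and C</ee> on illustrative strings. Since C</ee> runs arbitrary code, it should only ever see replacement text you control:

```perl
use strict;
use warnings;

# /e: the replacement is Perl code, evaluated for each match.
(my $doubled = "3 apples, 4 pears") =~ s/(\d+)/$1 * 2/ge;

# /ee: the replacement is evaluated once to produce a string,
# and that string is then eval'ed as code.
my $n    = 41;
my $expr = '$n + 1';
(my $result = "value: EXPR") =~ s/EXPR/$expr/ee;
```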
1355
1356Examples:
1357
1358 s/\bgreen\b/mauve/g; # don't change wintergreen
1359
1360 $path =~ s|/usr/bin|/usr/local/bin|;
1361
1362 s/Login: $foo/Login: $bar/; # run-time pattern
1363
1364 ($foo = $bar) =~ s/this/that/; # copy first, then change
1365
1366 $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
1367
1368 $_ = 'abc123xyz';
1369 s/\d+/$&*2/e; # yields 'abc246xyz'
1370 s/\d+/sprintf("%5d",$&)/e; # yields 'abc  246xyz'
1371 s/\w/$& x 2/eg; # yields 'aabbcc  224466xxyyzz'
1372
1373 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
1374 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
1375 s/^=(\w+)/pod($1)/ge; # use function call
1376
1377 # expand variables in $_, but dynamics only, using
1378 # symbolic dereferencing
1379 s/\$(\w+)/${$1}/g;
1380
1381 # Add one to the value of any numbers in the string
1382 s/(\d+)/1 + $1/eg;
1383
1384 # This will expand any embedded scalar variable
1385 # (including lexicals) in $_ : First $1 is interpolated
1386 # to the variable name, and then evaluated
1387 s/(\$\w+)/$1/eeg;
1388
1389 # Delete (most) C comments.
1390 $program =~ s {
1391 /\* # Match the opening delimiter.
1392 .*? # Match a minimal number of characters.
1393 \*/ # Match the closing delimiter.
1394 } []gsx;
1395
1396 s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_, expensively
1397
1398 for ($variable) { # trim whitespace in $variable, cheap
1399 s/^\s+//;
1400 s/\s+$//;
1401 }
1402
1403 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
1404
1405Note the use of $ instead of \ in the last example. Unlike
1406B<sed>, we use the \<I<digit>> form in only the left-hand side.
1407Anywhere else it's $<I<digit>>.
1408
1409Occasionally, you can't use just a C</g> to get all the changes
1410to occur that you might want. Here are two common cases:
1411
1412 # put commas in the right places in an integer
1413 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
1414
1415 # expand tabs to 8-column spacing
1416 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
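Both idioms applied to illustrative inputs; each substitution is repeated until it stops matching:

```perl
use strict;
use warnings;

# Commify: each pass can only add commas where a digit is followed by
# exactly three trailing digits, so repeat until nothing changes.
my $num = "1234567";
1 while $num =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

# Expand tabs to 8-column stops; $` is the text before the match,
# so length($`) % 8 is the current column within the tab stop.
my $line = "ab\tc";
1 while $line =~ s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
```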
1417
1418=back
1419
1420=head2 Quote-Like Operators
1421X<operator, quote-like>
1422
01c6f5f4 1423=over 4
1424
a0d0e21e 1425=item q/STRING/
5d44bfff 1426X<q> X<quote, single> X<'> X<''>
a0d0e21e 1427
5d44bfff 1428=item 'STRING'
a0d0e21e 1429
19799a22 1430A single-quoted, literal string. A backslash represents a backslash
68dc0745 1431unless followed by the delimiter or another backslash, in which case
1432the delimiter or backslash is interpolated.
a0d0e21e 1433
1434 $foo = q!I said, "You said, 'She said it.'"!;
1435 $bar = q('This is it.');
68dc0745 1436 $baz = '\n'; # a two-character string
a0d0e21e 1437
1438=item qq/STRING/
d74e8afc 1439X<qq> X<quote, double> X<"> X<"">
a0d0e21e 1440
1441=item "STRING"
1442
1443A double-quoted, interpolated string.
1444
1445 $_ .= qq
1446 (*** The previous line contains the naughty word "$1".\n)
19799a22 1447 if /\b(tcl|java|python)\b/i; # :-)
68dc0745 1448 $baz = "\n"; # a one-character string
a0d0e21e 1449
1450=item qx/STRING/
d74e8afc 1451X<qx> X<`> X<``> X<backtick>
a0d0e21e 1452
1453=item `STRING`
1454
43dd4d21 1455A string which is (possibly) interpolated and then executed as a
1456system command with C</bin/sh> or its equivalent. Shell wildcards,
1457pipes, and redirections will be honored. The collected standard
1458output of the command is returned; standard error is unaffected. In
1459scalar context, it comes back as a single (potentially multi-line)
1460string, or undef if the command failed. In list context, returns a
1461list of lines (however you've defined lines with $/ or
1462$INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
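A minimal sketch, assuming a Unix-like system with a POSIX C</bin/sh>; C<$?> holds the wait status of the command, so shifting it right by 8 gives the exit code:

```perl
use strict;
use warnings;

# Assumes a Unix-like system with a POSIX /bin/sh.
my $text  = `echo one; echo two`;   # scalar context: one multi-line string
my @lines = `echo one; echo two`;   # list context: one element per line
my $exit  = $? >> 8;                # exit code of the last command run
```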
5a964f20 1463
1464Because backticks do not affect standard error, use shell file descriptor
1465syntax (assuming the shell supports this) if you care to address this.
1466To capture a command's STDERR and STDOUT together:
a0d0e21e 1467
5a964f20 1468 $output = `cmd 2>&1`;
1469
1470To capture a command's STDOUT but discard its STDERR:
1471
1472 $output = `cmd 2>/dev/null`;
1473
1474To capture a command's STDERR but discard its STDOUT (ordering is
1475important here):
1476
1477 $output = `cmd 2>&1 1>/dev/null`;
1478
1479To exchange a command's STDOUT and STDERR in order to capture the STDERR
1480but leave its STDOUT to come out the old STDERR:
1481
1482 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
1483
1484To read both a command's STDOUT and its STDERR separately, it's easiest
2359510d 1485to redirect them separately to files, and then read from those files
1486when the program is done:
5a964f20 1487
2359510d 1488 system("program args 1>program.stdout 2>program.stderr");
5a964f20 1489
30398227 1490The STDIN filehandle used by the command is inherited from Perl's STDIN.
1491For example:
1492
    open(BLAM, "blam") || die "Can't open blam: $!";
    open(STDIN, "<&BLAM") || die "Can't dup BLAM: $!";
1495 print `sort`;
1496
1497will print the sorted contents of the file "blam".
1498
5a964f20 1499Using single-quote as a delimiter protects the command from Perl's
1500double-quote interpolation, passing it on to the shell instead:
1501
1502 $perl_info = qx(ps $$); # that's Perl's $$
1503 $shell_info = qx'ps $$'; # that's the new shell's $$
1504
How that string gets evaluated is entirely subject to the command
interpreter on your system. On most platforms, you will have to protect
shell metacharacters if you want them treated literally. This is in
practice difficult to do, as it's unclear which characters need escaping
and how. See L<perlsec> for a clean and safe example of a manual fork()
and exec() to emulate backticks safely.
a0d0e21e 1511
bb32b41a 1512On some platforms (notably DOS-like ones), the shell may not be
1513capable of dealing with multiline commands, so putting newlines in
1514the string may not get you what you want. You may be able to evaluate
1515multiple commands in a single line by separating them with the command
1516separator character, if your shell supports that (e.g. C<;> on many Unix
1517shells; C<&> on the Windows NT C<cmd> shell).
1518
0f897271 1519Beginning with v5.6.0, Perl will attempt to flush all files opened for
1520output before starting the child process, but this may not be supported
1521on some platforms (see L<perlport>). To be safe, you may need to set
1522C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
1523C<IO::Handle> on any open handles.
1524
bb32b41a 1525Beware that some command shells may place restrictions on the length
1526of the command line. You must ensure your strings don't exceed this
1527limit after any necessary interpolations. See the platform-specific
1528release notes for more details about your particular environment.
1529
5a964f20 1530Using this operator can lead to programs that are difficult to port,
1531because the shell commands called vary between systems, and may in
1532fact not be present at all. As one example, the C<type> command under
1533the POSIX shell is very different from the C<type> command under DOS.
1534That doesn't mean you should go out of your way to avoid backticks
1535when they're the right way to get something done. Perl was made to be
1536a glue language, and one of the things it glues together is commands.
1537Just understand what you're getting yourself into.
bb32b41a 1538
da87341d 1539See L</"I/O Operators"> for more discussion.
a0d0e21e 1540
945c54fd 1541=item qw/STRING/
d74e8afc 1542X<qw> X<quote, list> X<quote, words>
945c54fd 1543
1544Evaluates to a list of the words extracted out of STRING, using embedded
1545whitespace as the word delimiters. It can be understood as being roughly
1546equivalent to:
1547
1548 split(' ', q/STRING/);
1549
efb1e162 1550the differences being that it generates a real list at compile time, and
1551in scalar context it returns the last element in the list. So
945c54fd 1552this expression:
1553
1554 qw(foo bar baz)
1555
1556is semantically equivalent to the list:
1557
1558 'foo', 'bar', 'baz'
1559
1560Some frequently seen examples:
1561
    use POSIX qw( setlocale localeconv );
1563 @EXPORT = qw( foo bar baz );
1564
A common mistake is to try to separate the words with commas or to
put comments into a multi-line C<qw>-string. For this reason, the
C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
produce warnings if the STRING contains the "," or the "#" character.
1569
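The equivalence with C<split> and the comma warning can both be observed directly. The warning is issued while the C<qw> is compiled, not when it runs. So this sketch compiles the offending code in a string C<eval> and collects the warning through C<$SIG{__WARN__}>:

    use strict;
    use warnings;

    my @words = qw(foo bar   baz);              # runs of whitespace are fine
    my @split = split(' ', q/foo bar   baz/);   # rough run-time equivalent
    my @list  = ('foo', 'bar', 'baz');          # the list qw() stands for

    # Compile code containing a comma inside qw() and capture the warning.
    my @warnings;
    {
        local $SIG{__WARN__} = sub { push @warnings, @_ };
        eval q{ use warnings; my @oops = qw(foo, bar); 1 } or die $@;
    }
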
a0d0e21e 1570
6940069f 1571=item tr/SEARCHLIST/REPLACEMENTLIST/cds
d74e8afc 1572X<tr> X<y> X<transliterate> X</c> X</d> X</s>
a0d0e21e 1573
6940069f 1574=item y/SEARCHLIST/REPLACEMENTLIST/cds
a0d0e21e 1575
2c268ad5 1576Transliterates all occurrences of the characters found in the search list
a0d0e21e 1577with the corresponding character in the replacement list. It returns
1578the number of characters replaced or deleted. If no string is
2c268ad5 1579specified via the =~ or !~ operator, the $_ string is transliterated. (The
54310121 1580string specified with =~ must be a scalar variable, an array element, a
1581hash element, or an assignment to one of those, i.e., an lvalue.)
8ada0baa 1582
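Here is a small sketch of the two most common idioms: transliterating a copy, and counting with an empty REPLACEMENTLIST. The DNA string is just sample data:

    use strict;
    use warnings;

    my $dna = 'GATTACA';

    # Transliterate a copy, leaving the original untouched.
    (my $rna = $dna) =~ tr/T/U/;

    # With an empty REPLACEMENTLIST nothing changes, but the return
    # value still counts the matching characters.
    my $adenines = ($dna =~ tr/A//);
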
89d205f2 1583A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
2c268ad5 1584does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
54310121 1585For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the
1586SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
1587its own pair of quotes, which may or may not be bracketing quotes,
2c268ad5 1588e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.
a0d0e21e 1589
cc255d5f 1590Note that C<tr> does B<not> do regular expression character classes
e0c83546 1591such as C<\d> or C<[:lower:]>. The C<tr> operator is not equivalent to
cc255d5f 1592the tr(1) utility. If you want to map strings between lower/upper
1593cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider
1594using the C<s> operator if you need regular expressions.
1595
Note also that the whole range idea is rather unportable between
character sets--and even within character sets, ranges may cause results
you probably didn't expect. A sound principle is to use only ranges
that begin from and end at either alphabetic characters of equal case
(a-e, A-E), or digits (0-4). Anything else is unsafe. If in doubt, spell
out the character sets in full.
1602
a0d0e21e 1603Options:
1604
1605 c Complement the SEARCHLIST.
1606 d Delete found but unreplaced characters.
1607 s Squash duplicate replaced characters.
1608
19799a22 1609If the C</c> modifier is specified, the SEARCHLIST character set
1610is complemented. If the C</d> modifier is specified, any characters
1611specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
1612(Note that this is slightly more flexible than the behavior of some
1613B<tr> programs, which delete anything they find in the SEARCHLIST,
1614period.) If the C</s> modifier is specified, sequences of characters
1615that were transliterated to the same character are squashed down
1616to a single instance of the character.
a0d0e21e 1617
1618If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
1619exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
1620than the SEARCHLIST, the final character is replicated till it is long
5a964f20 1621enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
a0d0e21e 1622This latter is useful for counting characters in a class or for
1623squashing character sequences in a class.
1624
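The padding rule and the three modifiers can be compared side by side on throwaway copies. The comment on each line notes its expected result:

    use strict;
    use warnings;

    # REPLACEMENTLIST shorter than SEARCHLIST: its last character is reused.
    (my $padded = 'abcde') =~ tr/a-e/xy/;             # 'xyyyy'

    # Same lists with /d: unreplaced characters (c, d, e) are deleted.
    (my $deleted = 'abcde') =~ tr/a-e/xy/d;           # 'xy'

    # Empty REPLACEMENTLIST plus /s: squash runs of repeated letters.
    (my $squashed = 'aabbccdd') =~ tr/a-z//s;         # 'abcd'

    # /c complements SEARCHLIST; /s squashes runs of non-letters to one space.
    (my $spaced = 'hello, world!') =~ tr/a-zA-Z/ /cs; # 'hello world '
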
1625Examples:
1626
1627 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case
1628
1629 $cnt = tr/*/*/; # count the stars in $_
1630
1631 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
1632
1633 $cnt = tr/0-9//; # count the digits in $_
1634
1635 tr/a-zA-Z//s; # bookkeeper -> bokeper
1636
1637 ($HOST = $host) =~ tr/a-z/A-Z/;
1638
1639 tr/a-zA-Z/ /cs; # change non-alphas to single space
1640
1641 tr [\200-\377]
1642 [\000-\177]; # delete 8th bit
1643
19799a22 1644If multiple transliterations are given for a character, only the
1645first one is used:
748a9306 1646
1647 tr/AAA/XYZ/
1648
2c268ad5 1649will transliterate any A to X.
748a9306 1650
Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote
interpolation. That means that if you want to use variables, you
must use an eval():
a0d0e21e 1655
1656 eval "tr/$oldlist/$newlist/";
1657 die $@ if $@;
1658
1659 eval "tr/$oldlist/$newlist/, 1" or die $@;
1660
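Here is a fuller sketch of the C<eval> idiom, using made-up lists. The lists become part of Perl source code. The sketch therefore assumes they come from a trusted place and do not contain the C</> delimiter. Never feed unchecked user input into it:

    use strict;
    use warnings;

    my ($oldlist, $newlist) = ('a-c', 'x-z');   # ranges work here too
    my $text = 'aabbcc';

    # $text is escaped so the eval'd code refers to the variable itself,
    # while the two lists are interpolated into the generated source.
    eval "\$text =~ tr/$oldlist/$newlist/; 1" or die $@;
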
7e3b091d 1661=item <<EOF
d74e8afc 1662X<here-doc> X<heredoc> X<here-document> X<<< << >>>
7e3b091d 1663
1664A line-oriented form of quoting is based on the shell "here-document"
1665syntax. Following a C<< << >> you specify a string to terminate
1666the quoted material, and all lines following the current line down to
89d205f2 1667the terminating string are the value of the item.
1668
1669The terminating string may be either an identifier (a word), or some
1670quoted text. An unquoted identifier works like double quotes.
1671There may not be a space between the C<< << >> and the identifier,
1672unless the identifier is explicitly quoted. (If you put a space it
1673will be treated as a null identifier, which is valid, and matches the
1674first empty line.) The terminating string must appear by itself
1675(unquoted and with no surrounding whitespace) on the terminating line.
1676
If the terminating string is quoted, the type of quotes used determines
the treatment of the text.
1679
1680=over 4
1681
1682=item Double Quotes
1683
1684Double quotes indicate that the text will be interpolated using exactly
1685the same rules as normal double quoted strings.
7e3b091d 1686
1687 print <<EOF;
1688 The price is $Price.
1689 EOF
1690
1691 print << "EOF"; # same as above
1692 The price is $Price.
1693 EOF
1694
89d205f2 1695
1696=item Single Quotes
1697
1698Single quotes indicate the text is to be treated literally with no
1699interpolation of its content. This is similar to single quoted
1700strings except that backslashes have no special meaning, with C<\\>
1701being treated as two backslashes and not one as they would in every
1702other quoting construct.
1703
This is the only form of quoting in Perl where there is no need
to worry about escaping content, something that code generators
can and do make good use of.
1707
1708=item Backticks
1709
1710The content of the here doc is treated just as it would be if the
1711string were embedded in backticks. Thus the content is interpolated
1712as though it were double quoted and then executed via the shell, with
1713the results of the execution returned.
1714
1715 print << `EOC`; # execute command and get results
7e3b091d 1716 echo hi there
7e3b091d 1717 EOC
1718
89d205f2 1719=back
1720
1721It is possible to stack multiple here-docs in a row:
1722
7e3b091d 1723 print <<"foo", <<"bar"; # you can stack them
1724 I said foo.
1725 foo
1726 I said bar.
1727 bar
1728
1729 myfunc(<< "THIS", 23, <<'THAT');
1730 Here's a line
1731 or two.
1732 THIS
1733 and here's another.
1734 THAT
1735
1736Just don't forget that you have to put a semicolon on the end
1737to finish the statement, as Perl doesn't know you're not going to
1738try to do this:
1739
1740 print <<ABC
1741 179231
1742 ABC
1743 + 20;
1744
872d7e53 1745If you want to remove the line terminator from your here-docs,
1746use C<chomp()>.
1747
1748 chomp($string = <<'END');
1749 This is a string.
1750 END
1751
1752If you want your here-docs to be indented with the rest of the code,
1753you'll need to remove leading whitespace from each line manually:
7e3b091d 1754
1755 ($quote = <<'FINIS') =~ s/^\s+//gm;
89d205f2 1756 The Road goes ever on and on,
7e3b091d 1757 down from the door where it began.
1758 FINIS
1759
1760If you use a here-doc within a delimited construct, such as in C<s///eg>,
1761the quoted material must come on the lines following the final delimiter.
1762So instead of
1763
1764 s/this/<<E . 'that'
1765 the other
1766 E
1767 . 'more '/eg;
1768
1769you have to write
1770
89d205f2 1771 s/this/<<E . 'that'
1772 . 'more '/eg;
1773 the other
1774 E
7e3b091d 1775
If the terminating identifier is on the last line of the program, you
must be sure there is a newline after it; otherwise, Perl will stop with
the error B<Can't find string terminator "END" anywhere before EOF...>.
1779
89d205f2 1780Additionally, the quoting rules for the end of string identifier are not
1781related to Perl's quoting rules -- C<q()>, C<qq()>, and the like are not
1782supported in place of C<''> and C<"">, and the only interpolation is for
1783backslashing the quoting character:
7e3b091d 1784
1785 print << "abc\"def";
1786 testing...
1787 abc"def
1788
1789Finally, quoted strings cannot span multiple lines. The general rule is
1790that the identifier must be a string literal. Stick with that, and you
1791should be safe.
1792
a0d0e21e 1793=back
1794
75e14d17 1795=head2 Gory details of parsing quoted constructs
d74e8afc 1796X<quote, gory details>
75e14d17 1797
19799a22 1798When presented with something that might have several different
1799interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
1800principle to pick the most probable interpretation. This strategy
is so successful that Perl programmers often do not suspect the
ambiguity of what they write. But from time to time, Perl's
1803notions differ substantially from what the author honestly meant.
1804
1805This section hopes to clarify how Perl handles quoted constructs.
1806Although the most common reason to learn this is to unravel labyrinthine
1807regular expressions, because the initial steps of parsing are the
1808same for all quoting operators, they are all discussed together.
1809
1810The most important Perl parsing rule is the first one discussed
1811below: when processing a quoted construct, Perl first finds the end
1812of that construct, then interprets its contents. If you understand
1813this rule, you may skip the rest of this section on the first
1814reading. The other rules are likely to contradict the user's
1815expectations much less frequently than this first one.
1816
1817Some passes discussed below are performed concurrently, but because
1818their results are the same, we consider them individually. For different
1819quoting constructs, Perl performs different numbers of passes, from
6deea57f 1820one to four, but these passes are always performed in the same order.
75e14d17 1821
13a2d996 1822=over 4
75e14d17 1823
1824=item Finding the end
1825
6deea57f 1826The first pass is finding the end of the quoted construct, where
1827the information about the delimiters is used in parsing.
1828During this search, text between the starting and ending delimiters
is copied to a safe location. The copied text thereby becomes delimiter-independent.
1830
1831If the construct is a here-doc, the ending delimiter is a line
1832that has a terminating string as the content. Therefore C<<<EOF> is
1833terminated by C<EOF> immediately followed by C<"\n"> and starting
1834from the first column of the terminating line.
1835When searching for the terminating line of a here-doc, nothing
1836is skipped. In other words, lines after the here-doc syntax
1837are compared with the terminating string line by line.
1838
1839For the constructs except here-docs, single characters are used as starting
1840and ending delimiters. If the starting delimiter is an opening punctuation
1841(that is C<(>, C<[>, C<{>, or C<< < >>), the ending delimiter is the
1842corresponding closing punctuation (that is C<)>, C<]>, C<}>, or C<< > >>).
1843If the starting delimiter is an unpaired character like C</> or a closing
punctuation, the ending delimiter is the same as the starting delimiter.
1845Therefore a C</> terminates a C<qq//> construct, while a C<]> terminates
1846C<qq[]> and C<qq]]> constructs.
1847
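These rules can be seen in a few one-line C<q> strings. The values are samples only:

    use strict;
    use warnings;

    my $nested  = q{a {nested} pair};  # the inner {} pair is skipped
    my $escaped = q/path\/to/;         # \/ is an escaped delimiter; \ is dropped
    my $closing = q]x];                # closing punctuation used at both ends
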
1848When searching for single-character delimiters, escaped delimiters
1849and C<\\> are skipped. For example, while searching for terminating C</>,
1850combinations of C<\\> and C<\/> are skipped. If the delimiters are
1851bracketing, nested pairs are also skipped. For example, while searching
1852for closing C<]> paired with the opening C<[>, combinations of C<\\>, C<\]>,
1853and C<\[> are all skipped, and nested C<[> and C<]> are skipped as well.
1854However, when backslashes are used as the delimiters (like C<qq\\> and
1855C<tr\\\>), nothing is skipped.
During the search for the end, backslashes that escape delimiters
are removed (strictly speaking, they are not copied to the safe location).
75e14d17 1858
19799a22 1859For constructs with three-part delimiters (C<s///>, C<y///>, and
1860C<tr///>), the search is repeated once more.
If the first delimiter is not an opening punctuation, the three delimiters
must be the same, as in C<s!!!> and C<tr)))>, in which case the second
delimiter terminates the left part and starts the right part at once.
If the left part is delimited by bracketing punctuation (that is C<()>,
C<[]>, C<{}>, or C<< <> >>), the right part needs another pair of
delimiters such as C<s(){}> and C<tr[]//>. In these cases, whitespace
and comments are allowed between both parts, though the comment must follow
at least one whitespace character; otherwise a character expected as the
start of the comment may be regarded as the starting delimiter of the right part.
75e14d17 1870
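Here is a sketch of the delimiter variations for three-part constructs. Each one is applied to a fresh copy of the same sample string:

    use strict;
    use warnings;

    my $str = 'foo foo';

    # Unpaired delimiter: the same character is used three times.
    (my $bang = $str) =~ s!foo!bar!g;

    # Bracketing delimiters: each part gets its own pair, and
    # whitespace (even a newline) may separate the two parts.
    (my $braces = $str) =~ s{foo}
                             {baz}g;

    # A comment between the parts must be preceded by whitespace.
    (my $commented = $str) =~ tr[a-z]   # lower case...
                                [A-Z];  # ...to upper case
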
19799a22 1871During this search no attention is paid to the semantics of the construct.
1872Thus:
75e14d17 1873
1874 "$hash{"$foo/$bar"}"
1875
2a94b7ce 1876or:
75e14d17 1877
89d205f2 1878 m/
2a94b7ce 1879 bar # NOT a comment, this slash / terminated m//!
75e14d17 1880 /x
1881
19799a22 1882do not form legal quoted expressions. The quoted part ends on the
1883first C<"> and C</>, and the rest happens to be a syntax error.
1884Because the slash that terminated C<m//> was followed by a C<SPACE>,
1885the example above is not C<m//x>, but rather C<m//> with no C</x>
1886modifier. So the embedded C<#> is interpreted as a literal C<#>.
75e14d17 1887
89d205f2 1888Also no attention is paid to C<\c\> (multichar control char syntax) during
1889this search. Thus the second C<\> in C<qq/\c\/> is interpreted as a part
1890of C<\/>, and the following C</> is not recognized as a delimiter.
0d594e51 1891Instead, use C<\034> or C<\x1c> at the end of quoted constructs.
1892
75e14d17 1893=item Interpolation
d74e8afc 1894X<interpolation>
75e14d17 1895
19799a22 1896The next step is interpolation in the text obtained, which is now
89d205f2 1897delimiter-independent. There are multiple cases.
75e14d17 1898
13a2d996 1899=over 4
75e14d17 1900
89d205f2 1901=item C<<<'EOF'>
75e14d17 1902
1903No interpolation is performed.
6deea57f 1904Note that the combination C<\\> is left intact, since escaped delimiters
1905are not available for here-docs.
75e14d17 1906
6deea57f 1907=item C<m''>, the pattern of C<s'''>
89d205f2 1908
6deea57f 1909No interpolation is performed at this stage.
Any backslashed sequences including C<\\> are treated at the stage
of L</"parsing regular expressions">.
89d205f2 1912
6deea57f 1913=item C<''>, C<q//>, C<tr'''>, C<y'''>, the replacement of C<s'''>
75e14d17 1914
89d205f2 1915The only interpolation is removal of C<\> from pairs of C<\\>.
6deea57f 1916Therefore C<-> in C<tr'''> and C<y'''> is treated literally
1917as a hyphen and no character range is available.
1918C<\1> in the replacement of C<s'''> does not work as C<$1>.
89d205f2 1919
1920=item C<tr///>, C<y///>
1921
6deea57f 1922No variable interpolation occurs. String modifying combinations for
1923case and quoting such as C<\Q>, C<\U>, and C<\E> are not recognized.
1924The other escape sequences such as C<\200> and C<\t> and backslashed
1925characters such as C<\\> and C<\-> are converted to appropriate literals.
89d205f2 1926The character C<-> is treated specially and therefore C<\-> is treated
1927as a literal C<->.
75e14d17 1928
89d205f2 1929=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>, C<<<"EOF">
75e14d17 1930
19799a22 1931C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
1932converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
1933is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
6deea57f 1934The other escape sequences such as C<\200> and C<\t> and backslashed
1935characters such as C<\\> and C<\-> are replaced with appropriate
1936expansions.
2a94b7ce 1937
19799a22 1938Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
1939is interpolated in the usual way. Something like C<"\Q\\E"> has
no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
1941result is the same as for C<"\\\\E">. As a general rule, backslashes
1942between C<\Q> and C<\E> may lead to counterintuitive results. So,
1943C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
1944as C<"\\\t"> (since TAB is not alphanumeric). Note also that:
2a94b7ce 1945
1946 $str = '\t';
1947 return "\Q$str";
1948
1949may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
1950
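In practice, C<\Q...\E> is mostly used to match a variable's contents literally inside a pattern. Here is a brief sketch with an arbitrary sample string:

    use strict;
    use warnings;

    my $str    = 'a.b*c';
    my $quoted = "\Q$str\E";     # same as quotemeta($str)

    # With \Q...\E the metacharacters match literally; without it
    # "." and "*" keep their regular expression meaning.
    my $literal = ('a.b*c' =~ /^\Q$str\E$/) ? 1 : 0;
    my $as_re   = ('aXbbc' =~ /^$str$/)     ? 1 : 0;
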
19799a22 1951Interpolated scalars and arrays are converted internally to the C<join> and
92d29cee 1952C<.> catenation operations. Thus, C<"$foo XXX '@arr'"> becomes:
75e14d17 1953
19799a22 1954 $foo . " XXX '" . (join $", @arr) . "'";
75e14d17 1955
19799a22 1956All operations above are performed simultaneously, left to right.
75e14d17 1957
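Because of the C<join> rewrite, the list separator C<$"> (a single space by default) controls how arrays interpolate. A short sketch makes this visible:

    use strict;
    use warnings;

    my @arr = (1, 2, 3);

    my $interp = "[@arr]";                          # default separator
    my $joined = '[' . join($", @arr) . ']';        # the equivalent join
    my $dashed = do { local $" = '-'; "[@arr]" };   # custom separator
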
19799a22 1958Because the result of C<"\Q STRING \E"> has all metacharacters
1959quoted, there is no way to insert a literal C<$> or C<@> inside a
C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to become
1961C<"\\\$">; if not, it is interpreted as the start of an interpolated
1962scalar.
75e14d17 1963
19799a22 1964Note also that the interpolation code needs to make a decision on
89d205f2 1965where the interpolated scalar ends. For instance, whether
35f2feb0 1966C<< "a $b -> {c}" >> really means:
75e14d17 1967
1968 "a " . $b . " -> {c}";
1969
2a94b7ce 1970or:
75e14d17 1971
1972 "a " . $b -> {c};
1973
Most of the time, the interpolated expression is the longest possible
text that does not include spaces between components and that contains
matching braces or brackets. Because the outcome may be determined by
voting based on heuristic estimators, the result is not strictly
predictable. Fortunately, it's usually correct for ambiguous cases.
75e14d17 1979
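The sketch below uses an illustrative hash reference to show how spacing changes the result. The C<@{[ ... ]}> form at the end is a common way to interpolate an arbitrary expression when in doubt:

    use strict;
    use warnings;

    my $b_ref = { c => 'value' };

    my $tight  = "a $b_ref->{c}";          # no spaces: element lookup
    my $spaced = "a $b_ref -> {c}";        # stops at $b_ref: "a HASH(0x...) -> {c}"
    my $forced = "a @{[ $b_ref->{c} ]}";   # any expression via an anonymous array
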
6deea57f 1980=item the replacement of C<s///>
75e14d17 1981
19799a22 1982Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
6deea57f 1983happens as with C<qq//> constructs.
1984
1985It is at this step that C<\1> is begrudgingly converted to C<$1> in
1986the replacement text of C<s///>, in order to correct the incorrigible
1987I<sed> hackers who haven't picked up the saner idiom yet. A warning
1988is emitted if the C<use warnings> pragma or the B<-w> command-line flag
1989(that is, the C<$^W> variable) was set.
1990
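Here is a minimal sketch showing that both spellings produce the same result. The C<no warnings> block exists only to keep the C<\1> form quiet:

    use strict;
    use warnings;

    # Preferred Perl idiom: refer to captures as $1, $2, ...
    (my $swapped = 'hello world') =~ s/(\w+) (\w+)/$2 $1/;

    # sed-style backreferences still work, because \2 and \1 are
    # rewritten to $2 and $1 while the replacement is parsed.
    my $sed_style;
    {
        no warnings;    # silence "\1 better written as $1"
        ($sed_style = 'hello world') =~ s/(\w+) (\w+)/\2 \1/;
    }
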
=item C<RE> in C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>
1992
cc74c5bd 1993Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\E>,
1994and interpolation happens (almost) as with C<qq//> constructs.
1995
1996However any other combinations of C<\> followed by a character
1997are not substituted but only skipped, in order to parse them
1998as regular expressions at the following step.
6deea57f 1999As C<\c> is skipped at this step, C<@> of C<\c@> in RE is possibly
1749ea0d 2000treated as an array symbol (for example C<@foo>),
6deea57f 2001even though the same text in C<qq//> gives interpolation of C<\c@>.
6deea57f 2002
2003Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
19799a22 2004a C<#>-comment in a C<//x>-regular expression, no processing is
2005performed whatsoever. This is the first step at which the presence
2006of the C<//x> modifier is relevant.
2007
1749ea0d 2008Interpolation in patterns has several quirks: C<$|>, C<$(>, C<$)>, C<@+>
2009and C<@-> are not interpolated, and constructs C<$var[SOMETHING]> are
2010voted (by several different estimators) to be either an array element
2011or C<$var> followed by an RE alternative. This is where the notation
C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
2013array element C<-9>, not as a regular expression from the variable
2014C<$arr> followed by a digit, which would be the interpretation of
2015C</$arr[0-9]/>. Since voting among different estimators may occur,
2016the result is not predictable.
2017
19799a22 2018The lack of processing of C<\\> creates specific restrictions on
2019the post-processed text. If the delimiter is C</>, one cannot get
2020the combination C<\/> into the result of this step. C</> will
2021finish the regular expression, C<\/> will be stripped to C</> on
2022the previous step, and C<\\/> will be left as is. Because C</> is
2023equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be a character special to the
2025RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
2026alphanumeric char, as in:
2a94b7ce 2027
2028 m m ^ a \s* b mmx;
2029
19799a22 2030In the RE above, which is intentionally obfuscated for illustration, the
6deea57f 2031delimiter is C<m>, the modifier is C<mx>, and after delimiter-removal the
89d205f2 2032RE is the same as for C<m/ ^ a \s* b /mx>. There's more than one
19799a22 2033reason you're encouraged to restrict your delimiters to non-alphanumeric,
2034non-whitespace choices.
75e14d17 2035
2036=back
2037
19799a22 2038This step is the last one for all constructs except regular expressions,
75e14d17 2039which are processed further.
2040
6deea57f 2041=item parsing regular expressions
2042X<regexp, parse>
75e14d17 2043
19799a22 2044Previous steps were performed during the compilation of Perl code,
2045but this one happens at run time--although it may be optimized to
be calculated at compile time if appropriate. After the preprocessing
6deea57f 2047described above, and possibly after evaluation if concatenation,
19799a22 2048joining, casing translation, or metaquoting are involved, the
2049resulting I<string> is passed to the RE engine for compilation.
2050
2051Whatever happens in the RE engine might be better discussed in L<perlre>,
2052but for the sake of continuity, we shall do so here.
2053
2054This is another step where the presence of the C<//x> modifier is
2055relevant. The RE engine scans the string from left to right and
2056converts it to a finite automaton.
2057
2058Backslashed characters are either replaced with corresponding
2059literal strings (as with C<\{>), or else they generate special nodes
2060in the finite automaton (as with C<\b>). Characters special to the
2061RE engine (such as C<|>) generate corresponding nodes or groups of
2062nodes. C<(?#...)> comments are ignored. All the rest is either
2063converted to literal strings to match, or else is ignored (as is
2064whitespace and C<#>-style comments if C<//x> is present).
2065
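The sketch below illustrates how whitespace and comments are discarded under C</x>, using a date pattern in an arbitrary format. None of its comments contain the C</> delimiter, which would otherwise end the pattern early:

    use strict;
    use warnings;

    my $date_re = qr/
        ^ (\d{4})   # year
        - (\d{2})   # month
        - (\d{2})   # day
        $
    /x;

    my ($year, $month, $day) = '2010-04-26' =~ $date_re;
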
2066Parsing of the bracketed character class construct, C<[...]>, is
2067rather different than the rule used for the rest of the pattern.
2068The terminator of this construct is found using the same rules as
2069for finding the terminator of a C<{}>-delimited construct, the only
2070exception being that C<]> immediately following C<[> is treated as
2071though preceded by a backslash. Similarly, the terminator of
2072C<(?{...})> is found using the same rules as for finding the
2073terminator of a C<{}>-delimited construct.
2074
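A tiny sketch of the C<]>-first rule, using sample strings only:

    use strict;
    use warnings;

    # A ']' immediately after '[' is literal: this class matches ']' or 'a'.
    my $brackets = qr/^[]a]+$/;

    my $yes = (']a]' =~ $brackets) ? 1 : 0;
    my $no  = ('a]b' =~ $brackets) ? 1 : 0;
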
2075It is possible to inspect both the string given to RE engine and the
2076resulting finite automaton. See the arguments C<debug>/C<debugcolor>
2077in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
4a4eefd0 2078switch documented in L<perlrun/"Command Switches">.
75e14d17 2079
2080=item Optimization of regular expressions
d74e8afc 2081X<regexp, optimization>
75e14d17 2082
7522fed5 2083This step is listed for completeness only. Since it does not change
75e14d17 2084semantics, details of this step are not documented and are subject
19799a22 2085to change without notice. This step is performed over the finite
2086automaton that was generated during the previous pass.
2a94b7ce 2087
19799a22 2088It is at this stage that C<split()> silently optimizes C</^/> to
2089mean C</^/m>.
75e14d17 2090
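The effect is easy to observe. This sketch splits a string into lines and keeps each newline:

    use strict;
    use warnings;

    # split treats /^/ as /^/m, so the string is cut after every newline.
    my @lines = split /^/, "one\ntwo\nthree\n";
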
2091=back
2092
a0d0e21e 2093=head2 I/O Operators
d74e8afc 2094X<operator, i/o> X<operator, io> X<io> X<while> X<filehandle>
2095X<< <> >> X<@ARGV>
a0d0e21e 2096
54310121 2097There are several I/O operators you should know about.
fbad3eb5 2098
7b8d334a 2099A string enclosed by backticks (grave accents) first undergoes
19799a22 2100double-quote interpolation. It is then interpreted as an external
2101command, and the output of that command is the value of the
e9c56f9b 2102backtick string, like in a shell. In scalar context, a single string
2103consisting of all output is returned. In list context, a list of
2104values is returned, one per line of output. (You can set C<$/> to use
2105a different line terminator.) The command is executed each time the
2106pseudo-literal is evaluated. The status value of the command is
2107returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
2108Unlike in B<csh>, no translation is done on the return data--newlines
2109remain newlines. Unlike in any of the shells, single quotes do not
2110hide variable names in the command from interpretation. To pass a
2111literal dollar-sign through to the shell you need to hide it with a
2112backslash. The generalized form of backticks is C<qx//>. (Because
2113backticks always undergo shell expansion as well, see L<perlsec> for
2114security concerns.)
d74e8afc 2115X<qx> X<`> X<``> X<backtick> X<glob>
19799a22 2116
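Here is a sketch of inspecting the status after backticks. It again uses C<$^X> as a stand-in command and assumes a shell that accepts double-quoted arguments. The exit code lives in the high byte of C<$?>:

    use strict;
    use warnings;

    # The child perl prints a word and exits with status 3.
    my $output = `"$^X" -e "print 'done'; exit 3"`;

    my $exit_code = $? >> 8;    # exit status: high byte of $?
    my $signal    = $? & 127;   # signal number, if the child was killed
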
2117In scalar context, evaluating a filehandle in angle brackets yields
2118the next line from that file (the newline, if any, included), or
2119C<undef> at end-of-file or on error. When C<$/> is set to C<undef>
2120(sometimes known as file-slurp mode) and the file is empty, it
2121returns C<''> the first time, followed by C<undef> subsequently.
2122
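Here is a sketch of slurp mode, including the empty-file case just described. It uses in-memory filehandles, which open a reference to a scalar and are available since Perl 5.8, in place of real files:

    use strict;
    use warnings;

    # Read a whole "file" at once by undefining $/ locally.
    my $data = "line 1\nline 2\n";
    open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
    my $contents = do { local $/; <$fh> };
    close($fh);

    # An empty file yields '' on the first slurp and undef afterwards.
    my $empty = '';
    open(my $efh, '<', \$empty) or die "Can't open in-memory file: $!";
    my $first  = do { local $/; <$efh> };
    my $second = do { local $/; <$efh> };
    close($efh);
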
2123Ordinarily you must assign the returned value to a variable, but
2124there is one situation where an automatic assignment happens. If
2125and only if the input symbol is the only thing inside the conditional
2126of a C<while> statement (even if disguised as a C<for(;;)> loop),
2127the value is automatically assigned to the global variable $_,
2128destroying whatever was there previously. (This may seem like an
2129odd thing to you, but you'll use the construct in almost every Perl
17b829fa 2130script you write.) The $_ variable is not implicitly localized.
19799a22 2131You'll have to put a C<local $_;> before the loop if you want that
2132to happen.
2133
2134The following lines are equivalent:
a0d0e21e 2135
748a9306 2136 while (defined($_ = <STDIN>)) { print; }
7b8d334a 2137 while ($_ = <STDIN>) { print; }
a0d0e21e 2138 while (<STDIN>) { print; }
2139 for (;<STDIN>;) { print; }
748a9306 2140 print while defined($_ = <STDIN>);
7b8d334a 2141 print while ($_ = <STDIN>);
a0d0e21e 2142 print while <STDIN>;
2143
This also behaves similarly, but avoids $_:
7b8d334a 2145
89d205f2 2146 while (my $line = <STDIN>) { print $line }
7b8d334a 2147
19799a22 2148In these loop constructs, the assigned value (whether assignment
2149is automatic or explicit) is then tested to see whether it is
defined. The defined test avoids problems where the line has a string
2151value that would be treated as false by Perl, for example a "" or
2152a "0" with no trailing newline. If you really mean for such values
2153to terminate the loop, they should be tested for explicitly:
7b8d334a 2154
2155 while (($_ = <STDIN>) ne '0') { ... }
2156 while (<STDIN>) { last unless $_; ... }
2157
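The sketch below shows the case the implicit C<defined> test protects against: a final line consisting of C<"0"> with no newline. An in-memory filehandle stands in for a real file:

    use strict;
    use warnings;

    # The last "line" is a lone "0" with no trailing newline, which is false.
    my $data = "first\n0";
    open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

    my @seen;
    while (my $line = <$fh>) {    # Perl adds an implicit defined() test
        push @seen, $line;
    }
    close($fh);
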
35f2feb0 2158In other boolean contexts, C<< <I<filehandle>> >> without an
explicit C<defined> test or comparison elicits a warning if the
9f1b1f2d 2160C<use warnings> pragma or the B<-w>
19799a22 2161command-line switch (the C<$^W> variable) is in effect.
7b8d334a 2162
5f05dabc 2163The filehandles STDIN, STDOUT, and STDERR are predefined. (The
19799a22 2164filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
2165in packages, where they would be interpreted as local identifiers
2166rather than global.) Additional filehandles may be created with
2167the open() function, amongst others. See L<perlopentut> and
2168L<perlfunc/open> for details on this.
X<stdin> X<stdout> X<stderr>
a0d0e21e 2170
35f2feb0 2171If a <FILEHANDLE> is used in a context that is looking for
19799a22 2172a list, a list comprising all input lines is returned, one line per
2173list element. It's easy to grow to a rather large data space this
2174way, so use with care.
a0d0e21e 2175
35f2feb0 2176<FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>.
19799a22 2177See L<perlfunc/readline>.
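
Here is a short sketch of list-context reading. For illustration it uses an in-memory filehandle. Both the angle-bracket form and the readline() spelling return every remaining line, and each line keeps its trailing newline:

    use strict;
    use warnings;

    my $text = "alpha\nbeta\ngamma\n";

    open(my $fh, '<', \$text) or die "Cannot open: $!";
    my @lines = <$fh>;              # list context: all lines at once
    close($fh);

    open($fh, '<', \$text) or die "Cannot open: $!";
    my @same = readline($fh);       # the readline() spelling
    close($fh);

    print scalar(@lines), "\n";     # prints 3

For large inputs, the line-at-a-time C<while> loops keep memory use bounded.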

The null filehandle <> is special: it can be used to emulate the
behavior of B<sed> and B<awk>. Input from <> comes either from
standard input, or from each file listed on the command line. Here's
how it works: the first time <> is evaluated, the @ARGV array is
checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
gives you standard input. The @ARGV array is then processed as a list
of filenames. The loop

    while (<>) {
        ...                     # code for each line
    }

is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...                 # code for each line
        }
    }

except that it isn't so cumbersome to say, and will actually work.
It really does shift the @ARGV array and put the current filename
into the $ARGV variable. It also uses filehandle I<ARGV>
internally--<> is just a synonym for <ARGV>, which
is magical. (The pseudo code above doesn't work because it treats
<ARGV> as non-magical.)

You can modify @ARGV before the first <> as long as the array ends up
containing the list of filenames you really want. Line numbers (C<$.>)
continue as though the input were one big happy file. See the example
in L<perlfunc/eof> for how to reset line numbers on each file.
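
The idiom described in L<perlfunc/eof> closes C<ARGV> at the end of each file, which resets C<$.>. Below is a minimal sketch. It assumes two scratch files created with the core File::Temp module, and their names are hypothetical:

    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    # Create two small scratch files to read through <>.
    my $dir = tempdir(CLEANUP => 1);
    for my $name (qw(a.txt b.txt)) {
        open(my $out, '>', "$dir/$name") or die "Cannot write $name: $!";
        print $out "one\ntwo\n";
        close($out);
    }

    my @report;
    {
        local @ARGV = ("$dir/a.txt", "$dir/b.txt");
        while (<>) {
            chomp;
            push @report, "$.:$_";
            close(ARGV) if eof;    # eof without parens: end of the current file
        }
    }
    print "@report\n";    # prints "1:one 2:two 1:one 2:two"

Without the C<close(ARGV) if eof> line, the numbers would run from 1 to 4 across both files.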

If you want to set @ARGV to your own list of files, go right ahead.
This sets @ARGV to all plain text files if no @ARGV was given:

    @ARGV = grep { -f && -T } glob('*') unless @ARGV;

You can even set them to pipe commands. For example, this automatically
filters compressed arguments through B<gzip>:

    @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;

If you want to pass switches into your script, you can use one of the
Getopt modules (such as L<Getopt::Std> or L<Getopt::Long>) or put a
loop on the front like this:

    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        # ...          # other switches
    }

    while (<>) {
        # ...          # code for each line
    }
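
For comparison, the sketch below handles the same two switches with the core L<Getopt::Long> module. It simulates the command line by assigning to @ARGV, which is an assumption for illustration. Also, C<-D> takes its value as a separate argument here, not attached as in the loop above:

    use strict;
    use warnings;
    use Getopt::Long;

    # Simulated command line for this example.
    @ARGV = ('-D', 'parser', '-v', '-v', '--', 'input.txt');

    my $debug   = '';
    my $verbose = 0;
    GetOptions(
        'D=s' => \$debug,      # -D takes a string value
        'v+'  => \$verbose,    # each -v increments the counter
    ) or die "Invalid options\n";

    # "--" ends option processing; @ARGV now holds ('input.txt').

GetOptions() removes the options it recognizes from @ARGV. A following C<< while (<>) >> loop therefore sees only the remaining file names.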

The <> symbol will return C<undef> for end-of-file only once.
If you call it again after this, it will assume you are processing another
@ARGV list, and if you haven't set @ARGV, will read input from STDIN.

If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the
filehandle to input from, or its typeglob, or a reference to the
same. For example:

    $fh = \*STDIN;
    $line = <$fh>;

If what's within the angle brackets is neither a filehandle nor a simple
scalar variable containing a filehandle name, typeglob, or typeglob
reference, it is interpreted as a filename pattern to be globbed, and
either a list of filenames or the next filename in the list is returned,
depending on context. This distinction is determined on syntactic
grounds alone. That means C<< <$x> >> is always a readline() from
an indirect handle, but C<< <$hash{key}> >> is always a glob().
That's because $x is a simple scalar variable, but C<$hash{key}> is
not--it's a hash element. Even C<< <$x > >> (note the extra space)
is treated as C<glob("$x ")>, not C<readline($x)>.

One level of double-quote interpretation is done first, but you can't
say C<< <$foo> >> because that's an indirect filehandle as explained
in the previous paragraph. (In older versions of Perl, programmers
would insert curly brackets to force interpretation as a filename glob:
C<< <${foo}> >>. These days, it's considered cleaner to call the
internal function directly as C<glob($foo)>, which is probably the right
way to have done it in the first place.) For example:

    while (<*.c>) {
        chmod 0644, $_;
    }

is roughly equivalent to:

    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
        chomp;
        chmod 0644, $_;
    }

except that the globbing is actually done internally using the standard
C<File::Glob> extension. Of course, the shortest way to do the above is:

    chmod 0644, <*.c>;

A (file)glob evaluates its (embedded) argument only when it is
starting a new list. All values must be read before it will start
over. In list context, this isn't important because you automatically
get them all anyway. However, in scalar context the operator returns
the next value each time it's called, or C<undef> when the list has
run out. As with filehandle reads, an automatic C<defined> is
generated when the glob occurs in the test part of a C<while>,
because legal glob returns (e.g. a file called F<0>) would otherwise
terminate the loop. Again, C<undef> is returned only once. So if
you're expecting a single value from a glob, it is much better to
say

    ($file) = <blurch*>;

than

    $file = <blurch*>;

because the latter will alternate between returning a filename and
returning false.
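
Here is a minimal sketch of that alternation. It assumes a scratch directory, created with the core File::Temp module, that holds exactly one matching file. It also assumes the directory path contains no spaces, because the default glob splits its pattern on whitespace. The glob() function form used here behaves the same way as C<< <blurch*> >>:

    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    my $dir = tempdir(CLEANUP => 1);
    open(my $out, '>', "$dir/blurch1") or die "Cannot create file: $!";
    close($out);

    # List assignment: the glob runs to completion; the first match is kept.
    my ($file) = glob("$dir/blurch*");

    # Scalar context: the same glob op acts as an iterator.
    my @results;
    for (1 .. 4) {
        my $next = glob("$dir/blurch*");
        push @results, defined $next ? 'name' : 'undef';
    }
    print "@results\n";    # prints "name undef name undef"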

If you're trying to do variable interpolation, it's definitely better
to use the glob() function, because the older notation can cause people
to become confused with the indirect filehandle notation.

    @files = glob("$dir/*.[ch]");
    @files = glob($files[$i]);

=head2 Constant Folding
X<constant folding> X<folding>

Like C, Perl does a certain amount of expression evaluation at
compile time whenever it determines that all arguments to an
operator are static and have no side effects. In particular, string
concatenation happens at compile time between literals that don't do
variable substitution. Backslash interpolation also happens at
compile time. You can say

    'Now is the time for all' . "\n" .
        'good men to come to.'

and this all reduces to one string internally. Likewise, if
you say

    foreach $file (@filenames) {
        if (-s $file > 5 + 100 * 2**16) {  }
    }

the compiler will precompute the number which that expression
represents so that the interpreter won't have to.
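
One way to observe the folding is the core B::Deparse module, which turns compiled code back into Perl source. Here is a small sketch. The exact formatting of the deparsed text varies between Perl versions, but the folded constant should appear in place of the arithmetic:

    use strict;
    use warnings;
    use B::Deparse;

    my $deparse = B::Deparse->new;
    my $source  = $deparse->coderef2text(sub { return 5 + 100 * 2**16 });
    print "$source\n";    # the body contains 6553605, not the expression

From the command line, C<perl -MO=Deparse -e 'print 5 + 100 * 2**16'> gives a similar view.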

=head2 No-ops
X<no-op> X<nop>

Perl doesn't officially have a no-op operator, but the bare constants
C<0> and C<1> are special-cased not to produce a warning in void
context, so you can, for example, safely do

    1 while foo();
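
The sketch below checks this behavior. It compiles two snippets with warnings enabled and collects anything sent to C<$SIG{__WARN__}>. The helper name C<warnings_for> is hypothetical:

    use strict;
    use warnings;

    # Compile and run a snippet under "use warnings", returning its warnings.
    sub warnings_for {
        my ($src) = @_;
        my @caught;
        local $SIG{__WARN__} = sub { push @caught, $_[0] };
        eval "use warnings; $src; 1" or die $@;
        return @caught;
    }

    my @quiet = warnings_for('my $n = 3; 1 while $n-- > 0');
    my @noisy = warnings_for('2');     # any other bare constant
    print scalar(@quiet), " vs ", scalar(@noisy), "\n";    # prints "0 vs 1"

The second snippet warns "Useless use of a constant (2) in void context".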

=head2 Bitwise String Operators
X<operator, bitwise, string>

Bitstrings of any size may be manipulated by the bitwise operators
(C<~ | & ^>).

If the operands to a binary bitwise op are strings of different
sizes, B<|> and B<^> ops act as though the shorter operand had
additional zero bits on the right, while the B<&> op acts as though
the longer operand were truncated to the length of the shorter.
The granularity for such extension or truncation is one or more
bytes.

    # ASCII-based examples
    print "j p \n" ^ " a h";            # prints "JAPH\n"
    print "JA" | "  ph\n";              # prints "japh\n"
    print "japh\nJunk" & '_____';       # prints "JAPH\n"
    print 'p N$' ^ " E<H\n";            # prints "Perl\n"

If you are intending to manipulate bitstrings, be certain that
you're supplying bitstrings: if either operand is a number, that will
imply a B<numeric> bitwise operation. You may explicitly show which type
of operation you intend by using C<""> or C<0+>, as in the examples below.

    $foo =  150  |  105;        # yields 255  (0x96 | 0x69 is 0xFF)
    $foo = '150' |  105;        # yields 255
    $foo =  150  | '105';       # yields 255
    $foo = '150' | '105';       # yields string '155' (under ASCII)

    $baz = 0+$foo & 0+$bar;     # both ops explicitly numeric
    $biz = "$foo" ^ "$bar";     # both ops explicitly stringy

See L<perlfunc/vec> for information on how to manipulate individual bits
in a bit vector.

=head2 Integer Arithmetic
X<integer>

By default, Perl assumes that it must do most of its arithmetic in
floating point. But by saying

    use integer;

you may tell the compiler that it's okay to use integer operations
(if it feels like it) from here to the end of the enclosing BLOCK.
An inner BLOCK may countermand this by saying

    no integer;

which lasts until the end of that BLOCK. Note that this doesn't
mean everything is only an integer, merely that Perl may use integer
operations if it is so inclined. For example, even under C<use
integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
or so.

Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
and ">>") always produce integral results. (But see also
L<Bitwise String Operators>.) However, C<use integer> still has meaning for
them. By default, their results are interpreted as unsigned integers, but
if C<use integer> is in effect, their results are interpreted
as signed integers. For example, C<~0> usually evaluates to a large
integral value. However, C<use integer; ~0> is C<-1> on two's-complement
machines.
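
The small sketch below illustrates these points. It scopes C<use integer> to individual C<do> blocks, so the rest of the code keeps the default behavior. The value of C<$unsigned> depends on the platform's integer size, so it is not spelled out:

    use strict;
    use warnings;

    my $unsigned = ~0;                            # all bits set, unsigned
    my $signed   = do { use integer; ~0 };        # same bits, signed: -1
    my $quotient = do { use integer; 10 / 3 };    # integer division: 3
    my $root     = do { use integer; sqrt(2) };   # still about 1.41421

    print "$unsigned $signed $quotient $root\n";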

=head2 Floating-point Arithmetic
X<floating-point> X<floating point> X<float> X<real>

While C<use integer> provides integer-only arithmetic, there is no
analogous mechanism to provide automatic rounding or truncation to a
certain number of decimal places. For rounding to a certain number
of digits, sprintf() or printf() is usually the easiest route.
See L<perlfaq4>.
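
Here is a minimal sketch of rounding for display with sprintf() and printf(). The result of sprintf() is a string. Values that look like exact halves in decimal may also round unexpectedly, because their binary representation lies slightly above or below the half:

    use strict;
    use warnings;

    my $pi      = 3.14159265;
    my $rounded = sprintf("%.2f", $pi);    # the string "3.14"
    printf "%.3f\n", $pi;                  # prints 3.142

    # 2.675 is stored as slightly less than 2.675, so this
    # usually yields "2.67" rather than "2.68".
    my $tricky = sprintf("%.2f", 2.675);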

Floating-point numbers are only approximations to what a mathematician
would call real numbers. There are infinitely more reals than floats,
so some corners must be cut. For example:

    printf "%.20g\n", 123456789123456789;
    #        produces 123456789123456784

Testing for exact floating-point equality or inequality is not a
good idea. Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
significant digits. See Knuth, volume II, for a more robust treatment of
this topic.

    sub fp_equal {
        my ($X, $Y, $POINTS) = @_;
        my ($tX, $tY);
        $tX = sprintf("%.${POINTS}g", $X);
        $tY = sprintf("%.${POINTS}g", $Y);
        return $tX eq $tY;
    }
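
A common alternative is to compare the difference against a tolerance scaled to the size of the operands. This technique is swapped in here; it is not the one described above. In this sketch, the helper C<approx_equal> and the tolerance C<1e-12> are assumptions chosen for illustration:

    use strict;
    use warnings;

    # Hypothetical helper: true if $x and $y differ by no more than
    # $rel times the larger of their magnitudes.
    sub approx_equal {
        my ($x, $y, $rel) = @_;
        my $scale = abs($x) > abs($y) ? abs($x) : abs($y);
        return abs($x - $y) <= $rel * $scale;
    }

    my $sum = 0.1 + 0.2;
    print $sum == 0.3 ? "equal\n" : "not equal\n";              # prints "not equal"
    print approx_equal($sum, 0.3, 1e-12) ? "close\n" : "far\n";  # prints "close"

A purely relative test treats any nonzero value as far from zero. Comparisons against zero therefore usually need an absolute tolerance as well.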

The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and other mathematical and trigonometric functions.
The Math::Complex module (part of the standard perl distribution)
defines mathematical functions that work on both the reals and the
imaginary numbers. Math::Complex is not as efficient as POSIX, but
POSIX can't work with complex numbers.
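
Here is a brief sketch that uses both modules. POSIX is loaded without importing its many functions, so the code calls them with their package names. Math::Complex exports its own sqrt(), which returns a complex value for negative arguments, and the Re() and Im() accessors:

    use strict;
    use warnings;
    use POSIX ();
    use Math::Complex;

    my $down = POSIX::floor(-2.5);    # -3
    my $up   = POSIX::ceil(-2.5);     # -2
    my $z    = sqrt(-4);              # complex result: 0 + 2i

    print "$down $up\n";                          # prints "-3 -2"
    print "Re=", Re($z), " Im=", Im($z), "\n";    # prints "Re=0 Im=2"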

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

=head2 Bigger Numbers
X<number, arbitrary precision>

The standard Math::BigInt and Math::BigFloat modules provide
variable-precision arithmetic and overloaded operators, although
they're currently pretty slow. At the cost of some space and
considerable speed, they avoid the normal pitfalls associated with
limited-precision representations.

    use Math::BigInt;
    $x = Math::BigInt->new('123456789123456789');
    print $x * $x;

    # prints 15241578780673678515622620750190521
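
Math::BigFloat works the same way for non-integers. In this minimal sketch, the values are passed as strings, which avoids any intermediate conversion through a native floating-point number:

    use strict;
    use warnings;
    use Math::BigFloat;

    my $sum = Math::BigFloat->new('0.1') + Math::BigFloat->new('0.2');
    print "$sum\n";                                            # prints 0.3
    print $sum == Math::BigFloat->new('0.3') ? "exact\n" : "inexact\n";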

There are several modules that let you calculate with unlimited
(bounded only by memory and CPU time) or fixed precision. There are
also some non-standard modules that provide faster implementations
via external C libraries.

Here is a short but incomplete summary:

    Math::Fraction          big, unlimited fractions like 9973 / 12967
    Math::String            treat string sequences like numbers
    Math::FixedPrecision    calculate with a fixed precision
    Math::Currency          for currency calculations
    Bit::Vector             manipulate bit vectors fast (uses C)
    Math::BigIntFast        Bit::Vector wrapper for big numbers
    Math::Pari              provides access to the Pari C library
    Math::BigInteger        uses an external C library
    Math::Cephes            uses the external Cephes C library (no big numbers)
    Math::Cephes::Fraction  fractions via the Cephes library
    Math::GMP               another one using an external C library

Choose wisely.

=cut