=head1 NAME

perlop - Perl operators and precedence

=head1 DESCRIPTION

=head2 Operator Precedence and Associativity

Operator precedence and associativity work in Perl more or less like
they do in mathematics.

I<Operator precedence> means some operators are evaluated before
others. For example, in C<2 + 4 * 5>, the multiplication has higher
precedence so C<4 * 5> is evaluated first yielding C<2 + 20 == 22>
and not C<6 * 5 == 30>.

I<Operator associativity> defines what happens if a sequence of the
same operators is used one after another: whether the evaluator will
evaluate the left operations first or the right. For example, in
C<8 - 4 - 2>, subtraction is left associative so Perl evaluates the
expression left to right. C<8 - 4> is evaluated first making the
expression C<4 - 2 == 2> and not C<8 - 2 == 6>.

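The two rules can be checked directly. The following is a minimal
sketch (the variable names are illustrative only); it also shows a
right-associative operator, C<**>, for contrast:

```perl
use strict;
use warnings;

my $prec  = 2 + 4 * 5;      # multiplication binds tighter: 2 + (4 * 5)
my $left  = 8 - 4 - 2;      # left associative: (8 - 4) - 2
my $right = 2 ** 3 ** 2;    # right associative: 2 ** (3 ** 2)

print "$prec $left $right\n";   # prints "22 2 512"
```
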
Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy. (This makes learning Perl easier
for C folks.) With very few exceptions, these all operate on scalar
values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp
    left        &
    left        | ^
    left        &&
    left        || //
    nonassoc    ..  ...
    right       ?:
    right       = += -= *= etc.
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor err

In the following sections, these operators are covered in precedence order.

Many operators can be overloaded for objects. See L<overload>.

=head2 Terms and List Operators (Leftward)

A TERM has the highest precedence in Perl. TERMs include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort,
but the commas on the left are evaluated after. In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for C<print> which is evaluated (printing
the result of C<$foo & 255>). Then one is added to the return value
of C<print> (usually 1). The result is something like this:

    1 + 1, "\n";    # Obviously not what you meant.

To do what you meant properly, you must write:

    print(($foo & 255) + 1, "\n");

See L<Named Unary Operators> for more discussion of this.

Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L<Quote and Quote-like Operators> toward the end of this section,
as well as L<"I/O Operators">.

=head2 The Arrow Operator

"C<< -> >>" is an infix dereference operator, just as it is in C
and C++. If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.) See L<perlreftut> and L<perlref>.

Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name). See L<perlobj>.

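As an illustration of the forms described above, here is a small
sketch. The C<Counter> class and all variable names are hypothetical,
invented only for this example:

```perl
use strict;
use warnings;

my $aref = [10, 20, 30];
my $href = { name => 'perl' };
my $cref = sub { return "called with @_" };

my $second = $aref->[1];        # array element through a reference
my $name   = $href->{name};     # hash element through a reference
my $result = $cref->(1, 2);     # subroutine call through a reference

# A hypothetical class, defined here only for the illustration.
package Counter;
sub new  { my ($class) = @_; return bless { n => 0 }, $class }
sub bump { my ($self)  = @_; return ++$self->{n} }

package main;
my $obj    = Counter->new;      # left side is a class name
my $method = 'bump';            # method name held in a simple scalar
$obj->bump;                     # left side is an object
my $count  = $obj->$method;     # second call, so the counter is now 2

print "$second $name $result $count\n";
```
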
=head2 Auto-increment and Auto-decrement

"++" and "--" work as in C. That is, if placed before a variable,
they increment or decrement the variable by one before returning the
value, and if placed after, increment or decrement after returning the
value.

    $i = 0;  $j = 0;
    print $i++;  # prints 0
    print ++$j;  # prints 1

The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = '99');      # prints '100'
    print ++($foo = 'a0');      # prints 'a1'
    print ++($foo = 'Az');      # prints 'Ba'
    print ++($foo = 'zz');      # prints 'aaa'

C<undef> is always treated as numeric, and in particular is changed
to C<0> before incrementing (so that a post-increment of an undef value
will return C<0> rather than C<undef>).

The auto-decrement operator is not magical.

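A few further cases may help, shown here as a sketch with
illustrative variable names: carry between a digit and a letter, carry
that lengthens the string, the C<undef> rule, and the non-magical
decrement.

```perl
use strict;
use warnings;

my $carry = 'a9';  $carry++;    # carry from the digit into the letter: 'b0'
my $grow  = 'Zz';  $grow++;     # carry past the first character: 'AAa'

my $unset;
my $old = $unset++;             # post-increment of undef returns 0

my $dec = 'ab';
{
    no warnings 'numeric';      # 'ab' is not a number, so silence the warning
    $dec--;                     # no string magic: 'ab' counts as 0, giving -1
}

print "$carry $grow $old $unset $dec\n";
```
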
=head2 Exponentiation

Binary "**" is the exponentiation operator. It binds even more
tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)

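A short sketch of the binding rule (variable names are illustrative):

```perl
use strict;
use warnings;

my $neg  = -2 ** 4;      # parsed as -(2 ** 4), so -16
my $pos  = (-2) ** 4;    # parentheses force the other reading: 16
my $root = 2 ** 0.5;     # fractional exponents work, since doubles are used

print "$neg $pos $root\n";
```
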
=head2 Symbolic Unary Operators

Unary "!" performs logical negation, i.e., "not". See also C<not> for a lower
precedence version of this.

Unary "-" performs arithmetic negation if the operand is numeric. If
the operand is an identifier, a string consisting of a minus sign
concatenated with the identifier is returned. Otherwise, if the string
starts with a plus or minus, a string starting with the opposite sign
is returned. One effect of these rules is that C<-bareword> is equivalent
to C<"-bareword">.

Unary "~" performs bitwise negation, i.e., 1's complement. For
example, C<0666 & ~027> is 0640. (See also L<Integer Arithmetic> and
L<Bitwise String Operators>.) Note that the width of the result is
platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the & operator to mask off the excess bits.

Unary "+" has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L<Terms and List Operators (Leftward)>.)

Unary "\" creates a reference to whatever follows it. See L<perlreftut>
and L<perlref>. Do not confuse this behavior with the behavior of
backslash within a string, although both forms do convey the notion
of protecting the next thing from interpolation.

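The string behavior of unary "-" and the masking advice for "~" can
be sketched as follows (variable names are illustrative):

```perl
use strict;
use warnings;

my $flag = -"foo";          # identifier-like string: "-foo"
my $flip = -"-bar";         # leading minus becomes a plus: "+bar"
my $perm = 0666 & ~027;     # the example from the text: octal 0640
my $low8 = ~0 & 0xFF;       # mask to 8 bits: 255 on 32-bit and 64-bit builds

printf "%s %s %04o %d\n", $flag, $flip, $perm, $low8;
```
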
=head2 Binding Operators

Binary "=~" binds a scalar expression to a pattern match. Certain operations
search or modify the string $_ by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_. When used in scalar context, the return value generally indicates the
success of the operation. Behavior in list context depends on the particular
operator. See L</"Regexp Quote-Like Operators"> for details.

If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time.

Binary "!~" is just like "=~" except the return value is negated in
the logical sense.

=head2 Multiplicative Operators

Binary "*" multiplies two numbers.

Binary "/" divides two numbers.

Binary "%" computes the modulus of two numbers. Given integer
operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is
C<$a> minus the largest multiple of C<$b> that is not greater than
C<$a>. If C<$b> is negative, then C<$a % $b> is C<$a> minus the
smallest multiple of C<$b> that is not less than C<$a> (i.e. the
result will be less than or equal to zero).
Note that when C<use integer> is in scope, "%" gives you direct access
to the modulus operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
execute faster.

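Working the modulus rule through for each sign combination gives the
following (a sketch without C<use integer>; the array name is
illustrative). The result always takes the sign of the right operand:

```perl
use strict;
use warnings;

my @mods = (
     7 %  3,    #  7 -   6  =  1
    -7 %  3,    # -7 - (-9) =  2   (largest multiple of 3 not above -7 is -9)
     7 % -3,    #  7 -   9  = -2   (smallest multiple of -3 not below 7 is 9)
    -7 % -3,    # -7 - (-6) = -1
);

print "@mods\n";    # prints "1 2 -2 -1"
```
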
Binary "x" is the repetition operator. In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand. In list context, if the left operand is enclosed in
parentheses, it repeats the list. If the right operand is zero or
negative, it returns an empty string or an empty list, depending on the
context.

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5

=head2 Additive Operators

Binary "+" returns the sum of two numbers.

Binary "-" returns the difference of two numbers.

Binary "." concatenates two strings.

=head2 Shift Operators

Binary "<<" returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also L<Integer Arithmetic>.)

Binary ">>" returns the value of its left argument shifted right by
the number of bits specified by the right argument. Arguments should
be integers. (See also L<Integer Arithmetic>.)

Note that both "<<" and ">>" in Perl are implemented directly using
"<<" and ">>" in C. If C<use integer> (see L<Integer Arithmetic>) is
in force then signed C integers are used, else unsigned C integers are
used. Either way, the implementation isn't going to generate results
larger than the size of the integer type Perl was built with (32 bits
or 64 bits).

The result of overflowing the range of the integers is undefined
because it is undefined also in C. In other words, using 32-bit
integers, C<< 1 << 32 >> is undefined. Shifting by a negative number
of bits is also undefined.

=head2 Named Unary Operators

The various named unary operators are treated as functions with one
argument, with optional parentheses.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. For example,
because named unary operators are higher precedence than ||:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because * is higher precedence than named operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

Regarding precedence, the filetest operators, like C<-f>, C<-M>, etc. are
treated like named unary operators, but they don't follow this functional
parenthesis rule. That means, for example, that C<-f($file).".bak"> is
equivalent to C<-f "$file.bak">.

See also L<"Terms and List Operators (Leftward)">.

=head2 Relational Operators

Binary "<" returns true if the left argument is numerically less than
the right argument.

Binary ">" returns true if the left argument is numerically greater
than the right argument.

Binary "<=" returns true if the left argument is numerically less than
or equal to the right argument.

Binary ">=" returns true if the left argument is numerically greater
than or equal to the right argument.

Binary "lt" returns true if the left argument is stringwise less than
the right argument.

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.

=head2 Equality Operators

Binary "==" returns true if the left argument is numerically equal to
the right argument.

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.

Binary "<=>" returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument. If your platform supports NaNs (not-a-numbers) as numeric
values, using them with "<=>" returns undef. NaN is not "<", "==", ">",
"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
returns true, as does NaN != anything else. If your platform doesn't
support NaNs then NaN is just a string with numeric value 0.

    perl -le '$a = NaN; print "No NaN support here" if $a == $a'
    perl -le '$a = NaN; print "NaN support here" if $a != $a'

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.

Binary "cmp" returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
by the current locale if C<use locale> is in effect. See L<perllocale>.

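The difference between the numeric and stringwise three-way
comparisons is easiest to see in a C<sort> block. This is a sketch
that assumes C<use locale> is not in effect; the variable names are
illustrative:

```perl
use strict;
use warnings;

my @nums  = sort { $a <=> $b } (10, 9, 100);    # numeric order
my @words = sort { $a cmp $b } (10, 9, 100);    # stringwise order

print "@nums\n";     # prints "9 10 100"
print "@words\n";    # prints "10 100 9"
```
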
=head2 Bitwise And

Binary "&" returns its operands ANDed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "&" has lower priority than relational operators, so for example
the brackets are essential in a test like

    print "Even\n" if ($x & 1) == 0;

=head2 Bitwise Or and Exclusive Or

Binary "|" returns its operands ORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Binary "^" returns its operands XORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "|" and "^" have lower priority than relational operators, so
for example the brackets are essential in a test like

    print "false\n" if (8 | 2) != 10;

=head2 C-style Logical And

Binary "&&" performs a short-circuit logical AND operation. That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Or

Binary "||" performs a short-circuit logical OR operation. That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Defined-Or

Although it has no direct equivalent in C, Perl's C<//> operator is related
to its C-style or. In fact, it's exactly the same as C<||>, except that it
tests the left hand side's definedness instead of its truth. Thus, C<$a // $b>
is similar to C<defined($a) || $b> (except that it returns the value of C<$a>
rather than the value of C<defined($a)>) and is exactly equivalent to
C<defined($a) ? $a : $b>. This is very useful for providing default values
for variables. If you actually want to test if at least one of C<$a> and
C<$b> is defined, use C<defined($a // $b)>.

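The practical difference from C<||> shows up when a value is defined
but false. A minimal sketch, assuming a perl that provides C<//>
(5.10 or later); the C<%opt> hash and its keys are made up for the
example:

```perl
use 5.010;
use strict;
use warnings;

my %opt = (depth => 0, name => undef);

my $with_or  = $opt{depth} || 5;            # 0 is false, so the default wins: 5
my $with_dor = $opt{depth} // 5;            # 0 is defined, so it is kept: 0
my $name     = $opt{name}  // 'anonymous';  # undef, so the default is used

print "$with_or $with_dor $name\n";         # prints "5 0 anonymous"
```
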
The C<||>, C<//> and C<&&> operators return the last value evaluated
(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
portable way to find out the home directory might be:

    $home = $ENV{'HOME'} // $ENV{'LOGDIR'} //
        (getpwuid($<))[7] // die "You're homeless!\n";

In particular, this means that you shouldn't use this
for selecting between two aggregates for assignment:

    @a = @b || @c;              # this is wrong
    @a = scalar(@b) || @c;      # really meant this
    @a = @b ? @b : @c;          # this works fine, though

As more readable alternatives to C<&&>, C<//> and C<||> when used for
control flow, Perl provides C<and>, C<err> and C<or> operators (see below).
The short-circuit behavior is identical. The precedence of "and", "err"
and "or" is much lower, however, so that you can safely use them after a
list operator without the need for parentheses:

    unlink "alpha", "beta", "gamma"
            or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
            || (gripe(), next LINE);

Using "or" for assignment is unlikely to do what you want; see below.

=head2 Range Operators

Binary ".." is the range operator, which is really two different
operators depending on the context. In list context, it returns a
list of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list. The range operator is useful for writing
C<foreach (1..10)> loops and for doing slice operations on arrays. In
the current implementation, no temporary array is created when the
range operator is used as the expression in C<foreach> loops, but older
versions of Perl might burn a lot of memory when you write something
like this:

    for (1 .. 1_000_000) {
        # code
    }

The range operator also works on strings, using the magical
auto-increment; see below.

In scalar context, ".." returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma) operator
of B<sed>, B<awk>, and various editors. Each ".." operator maintains its
own boolean state. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again. It doesn't become false till the next time the range operator is
evaluated. It can test the right operand and become false on the same
evaluation it became true (as in B<awk>), but it still returns true once.
If you don't want it to test the right operand till the next
evaluation, as in B<sed>, just use three dots ("...") instead of
two. In all other regards, "..." behaves just like ".." does.

The right operand is not evaluated while the operator is in the
"false" state, and the left operand is not evaluated while the
operator is in the "true" state. The precedence is a little lower
than || and &&. The value returned is either the empty string for
false, or a sequence number (beginning with 1) for true. The
sequence number is reset for each range encountered. The final
sequence number in a range has the string "E0" appended to it, which
doesn't affect its numeric value, but gives you something to search
for if you want to exclude the endpoint. You can exclude the
beginning point by waiting for the sequence number to be greater
than 1.

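The sequence numbers and the "E0" marker can be observed by capturing
the value of the scalar ".." on each evaluation. The sketch below uses
made-up input tokens (C<START>, C<END>) rather than real file input,
and non-constant operands so that C<$.> is not involved:

```perl
use strict;
use warnings;

my (@seq, @inner);
for my $line (qw(a START b c END d)) {
    # Scalar assignment gives ".." scalar context, so it is a flip-flop.
    my $seq = ($line eq 'START') .. ($line eq 'END');
    push @seq, $seq;

    # Exclude both endpoints: skip sequence number 1 and the "E0" value.
    push @inner, $line if $seq && $seq > 1 && $seq !~ /E0$/;
}

print join(',', @seq), "\n";    # prints ",1,2,3,4E0,"
print "@inner\n";               # prints "b c"
```
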
If either operand of scalar ".." is a constant expression,
that operand is considered true if it is equal (C<==>) to the current
input line number (the C<$.> variable).

To be pedantic, the comparison is actually C<int(EXPR) == int(EXPR)>,
but that is only an issue if you use a floating point expression; when
implicitly using C<$.> as described in the previous paragraph, the
comparison is C<int(EXPR) == int($.)> which is only an issue when C<$.>
is set to a floating point value and you are not reading from a file.
Furthermore, C<"span" .. "spat"> or C<2.18 .. 3.14> will not do what
you want in scalar context because each of the operands is evaluated
using its integer representation.

Examples:

As a scalar operator:

    if (101 .. 200) { print; }  # print 2nd hundred lines, short for
                                #   if ($. == 101 .. $. == 200) ...
    next line if (1 .. /^$/);   # skip header lines, short for
                                #   ... if ($. == 1 .. /^$/);
    s/^/> / if (/^$/ .. eof()); # quote body

    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof;
        if ($in_header) {
            # ...
        } else { # in body
            # ...
        }
    } continue {
        close ARGV if eof;      # reset $. each file
    }

As a list operator:

    for (101 .. 200) { print; } # print $_ 100 times
    @foo = @foo[0 .. $#foo];    # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];      # slice last 5 items

The range operator (in list context) makes use of the magical
auto-increment algorithm if the operands are strings. You
can say

    @alphabet = ('A' .. 'Z');

to get all normal letters of the English alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[$mday];

to get dates with leading zeros. If the final value specified is not
in the sequence that the magical increment would produce, the sequence
goes until the next value would be longer than the final value
specified.

Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
return two elements in list context.

    @list = (2.18 .. 3.14); # same as @list = (2 .. 3);

=head2 Conditional Operator

Ternary "?:" is the conditional operator, just as in C. It works much
like an if-then-else. If the argument before the ? is true, the
argument before the : is returned, otherwise the argument after the :
is returned. For example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

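One use of the assignable form is to pick the target of an update.
This sketch (with illustrative variable names) sums odd and even
numbers into separate variables:

```perl
use strict;
use warnings;

my ($evens, $odds) = (0, 0);
for my $n (1 .. 5) {
    # The conditional selects which variable receives the addition.
    ($n % 2 ? $odds : $evens) += $n;
}

print "$odds $evens\n";    # prints "9 6"
```
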
Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble. For example, this:

    $a % 2 ? $a += 10 : $a += 2

Really means this:

    (($a % 2) ? ($a += 10) : $a) += 2

Rather than this:

    ($a % 2) ? ($a += 10) : ($a += 2)

That should probably be written more simply as:

    $a += ($a % 2) ? 10 : 2;

=head2 Assignment Operators

"=" is the ordinary assignment operator.

Assignment operators work as in C. That is,

    $a += 2;

is equivalent to

    $a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from tie(). Other assignment operators work similarly.
The following are recognized:

    **=    +=    *=    &=    <<=    &&=
           -=    /=    |=    >>=    ||=
           .=    %=    ^=
                 x=

Although these are grouped by family, they all have the precedence
of assignment.

Unlike in C, the scalar assignment operator produces a valid lvalue.
Modifying an assignment is equivalent to doing the assignment and
then modifying the variable that was assigned to. This is useful
for modifying a copy of something, like this:

    ($tmp = $global) =~ tr [A-Z] [a-z];

Likewise,

    ($a += 2) *= 3;

is equivalent to

    $a += 2;
    $a *= 3;

Similarly, a list assignment in list context produces the list of
lvalues assigned to, and a list assignment in scalar context returns
the number of elements produced by the expression on the right hand
side of the assignment.

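The scalar-context rule for list assignment is what makes the common
counting idiom work. The following sketch uses illustrative names and
sample data:

```perl
use strict;
use warnings;

# Assigning to an empty list in scalar context counts the matches.
my $count = () = "abcabc" =~ /b/g;          # 2

# The count reflects the right-hand side, not the number of lvalues.
my ($x, $y);
my $n = (($x, $y) = (7, 8, 9));             # 3, although only two are stored

# Scalar assignment yields an lvalue, so the copy can be modified in place.
(my $copy = "Hello") =~ tr/A-Z/a-z/;        # "hello"

print "$count $n $x $y $copy\n";
```
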
=head2 Comma Operator

Binary "," is the comma operator. In scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value. This is just like C's comma operator.

In list context, it's just the list argument separator, and inserts
both its arguments into the list.

The C<< => >> operator is a synonym for the comma, but forces any word
to its left to be interpreted as a string (as of 5.001). It is helpful
in documenting the correspondence between keys and values in hashes,
and other paired elements in lists.

=head2 List Operators (Rightward)

On the right side of a list operator, the comma has very low precedence,
such that it controls all comma-separated expressions found there.
The only operators with lower precedence are the logical operators
"and", "or", and "not", which may be used to evaluate calls to list
operators without the need for extra parentheses:

    open HANDLE, "filename"
        or die "Can't open: $!\n";

See also discussion of list operators in L<Terms and List Operators (Leftward)>.

=head2 Logical Not

Unary "not" returns the logical negation of the expression to its right.
It's the equivalent of "!" except for the very low precedence.

=head2 Logical And

Binary "and" returns the logical conjunction of the two surrounding
expressions. It's equivalent to && except for the very low
precedence. This means that it short-circuits: i.e., the right
expression is evaluated only if the left expression is true.

=head2 Logical or, Defined or, and Exclusive Or

Binary "or" returns the logical disjunction of the two surrounding
expressions. It's equivalent to || except for the very low precedence.
This makes it useful for control flow:

    print FH $data              or die "Can't write to FH: $!";

This means that it short-circuits: i.e., the right expression is evaluated
only if the left expression is false. Due to its precedence, you should
probably avoid using this for assignment, only for control flow.

    $a = $b or $c;              # bug: this is wrong
    ($a = $b) or $c;            # really means this
    $a = $b || $c;              # better written this way

However, when it's a list-context assignment and you're trying to use
"||" for control flow, you probably need "or" so that the assignment
takes higher precedence.

    @info = stat($file) || die;     # oops, scalar sense of stat!
    @info = stat($file) or die;     # better, now @info gets its due

Then again, you could always use parentheses.

Binary "err" is equivalent to C<//>--it's just like binary "or", except it tests
its left argument's definedness instead of its truth. There are two ways to
remember "err": either because many functions return C<undef> on an B<err>or,
or as a sort of correction: C<$a=($b err 'default')>.

Binary "xor" returns the exclusive-OR of the two surrounding expressions.
It cannot short circuit, of course.

=head2 C Operators Missing From Perl

Here is what C has that Perl doesn't:

=over 8

=item unary &

Address-of operator. (But see the "\" operator for taking a reference.)

=item unary *

Dereference-address operator. (Perl's prefix dereferencing
operators are typed: $, @, %, and &.)

=item (TYPE)

Type-casting operator.

=back

5f05dabc 742=head2 Quote and Quote-like Operators
a0d0e21e 743
744While we usually think of quotes as literal values, in Perl they
745function as operators, providing various kinds of interpolating and
746pattern matching capabilities. Perl provides customary quote characters
747for these behaviors, but also provides a way for you to choose your
748quote character for any of them. In the following table, a C<{}> represents
87275199 749any pair of delimiters you choose.
a0d0e21e 750
2c268ad5 751 Customary Generic Meaning Interpolates
752 '' q{} Literal no
753 "" qq{} Literal yes
af9219ee 754 `` qx{} Command yes*
2c268ad5 755 qw{} Word list no
af9219ee 756 // m{} Pattern match yes*
757 qr{} Pattern yes*
758 s{}{} Substitution yes*
2c268ad5 759 tr{}{} Transliteration no (but see below)
7e3b091d 760 <<EOF here-doc yes*
a0d0e21e 761
af9219ee 762 * unless the delimiter is ''.
763
87275199 764Non-bracketing delimiters use the same character fore and aft, but the four
765sorts of brackets (round, angle, square, curly) will all nest, which means
766that
767
768 q{foo{bar}baz}
35f2feb0 769
87275199 770is the same as
771
772 'foo{bar}baz'
773
774Note, however, that this does not always work for quoting Perl code:
775
776 $s = q{ if($a eq "}") ... }; # WRONG
777
83df6a1d 778is a syntax error. The C<Text::Balanced> module (from CPAN, and
779starting from Perl 5.8 part of the standard distribution) is able
780to do this properly.
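
A short sketch of the nesting rule, and of two simple ways around the
failing case when the text is known in advance: pick a delimiter that does
not occur in the text, or backslash the unbalanced bracket. The quoted
strings are arbitrary examples.

```perl
use strict;
use warnings;

my $nested = q{foo{bar}baz};              # balanced braces nest
my $plain  = 'foo{bar}baz';

# The lone "}" inside the quoted code would end q{...} early, so use
# a delimiter that does not appear in the text ...
my $code1 = q! if ($x eq "}") { ... } !;
# ... or escape the unbalanced bracket.
my $code2 = q{ if ($x eq "\}") ... };

print $nested eq $plain ? "same\n" : "different\n";
```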
87275199 781
19799a22 782There can be whitespace between the operator and the quoting
fb73857a 783characters, except when C<#> is being used as the quoting character.
19799a22 784C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the
785operator C<q> followed by a comment. Its argument will be taken
786from the next line. This allows you to write:
fb73857a 787
788 s {foo} # Replace foo
789 {bar} # with bar.
790
904501ec 791The following escape sequences are available in constructs that interpolate
792and in transliterations.
a0d0e21e 793
6ee5d4e7 794 \t tab (HT, TAB)
5a964f20 795 \n newline (NL)
6ee5d4e7 796 \r return (CR)
797 \f form feed (FF)
798 \b backspace (BS)
799 \a alarm (bell) (BEL)
800 \e escape (ESC)
a0ed51b3 801 \033 octal char (ESC)
802 \x1b hex char (ESC)
803 \x{263a} wide hex char (SMILEY)
19799a22 804 \c[ control char (ESC)
95cc3e0c 805 \N{name} named Unicode character
2c268ad5 806
4c77eaa2 807B<NOTE>: Unlike C and other languages, Perl has no \v escape sequence for
808the vertical tab (VT - ASCII 11).
809
904501ec 810The following escape sequences are available in constructs that interpolate
811but not in transliterations.
812
a0d0e21e 813 \l lowercase next char
814 \u uppercase next char
815 \L lowercase till \E
816 \U uppercase till \E
817 \E end case modification
1d2dff63 818 \Q quote non-word characters till \E
a0d0e21e 819
95cc3e0c 820If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
821C<\u> and C<\U> is taken from the current locale. See L<perllocale>.
822If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
823beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
824C<\U> is as defined by Unicode. For documentation of C<\N{name}>,
825see L<charnames>.
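
The two tables above can be exercised with a few string literals. This is
only a sketch; the numeric values assume an ASCII platform, and the sample
strings are made up.

```perl
use strict;
use warnings;

# Four spellings of the same ESC character (on an ASCII platform).
my @esc = ("\e", "\033", "\x1b", "\c[");

my $wide   = "\x{263a}";                # one character, code point 0x263A
my $lower  = "\LHELLO\E World";         # "hello World"
my $first  = "\uhello";                 # "Hello"
my $quoted = "\Qa.b*c\E";               # 'a\.b\*c'

print join(',', map { ord($_) } @esc), "\n";   # 27,27,27,27
```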
a034a98d 826
5a964f20 827All systems use the virtual C<"\n"> to represent a line terminator,
828called a "newline". There is no such thing as an unvarying, physical
19799a22 829newline character. It is only an illusion that the operating system,
5a964f20 830device drivers, C libraries, and Perl all conspire to preserve. Not all
831systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example,
832on a Mac, these are reversed, and on systems without a line terminator,
833printing C<"\n"> may emit no actual data. In general, use C<"\n"> when
834you mean a "newline" for your system, but use the literal ASCII when you
835need an exact character. For example, most networking protocols expect
2a380090 836and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
5a964f20 837and although they often accept just C<"\012">, they seldom tolerate just
838C<"\015">. If you get in the habit of using C<"\n"> for networking,
839you may be burned some day.
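
A minimal sketch of the advice for network code: spell the terminator out
in octal instead of relying on C<"\n">. The protocol line and host name
are placeholders, and the octet values shown assume ASCII.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";                 # exact octets, whatever "\n" means locally

# Hypothetical protocol line; the host name is a placeholder.
my $request = "HELO client.example" . $CRLF;

my @octets = map { ord($_) } split //, $CRLF;
print "terminator octets: @octets\n";  # 13 10
```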
840
904501ec 841For constructs that do interpolate, variables beginning with "C<$>"
842or "C<@>" are interpolated. Subscripted variables such as C<$a[3]> or
ad0f383a 843C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
844But method calls such as C<< $obj->meth >> are not.
af9219ee 845
846Interpolating an array or slice interpolates the elements in order,
847separated by the value of C<$">, so is equivalent to interpolating
904501ec 848C<join $", @array>. "Punctuation" arrays such as C<@+> are only
849interpolated if the name is enclosed in braces C<@{+}>.
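
The interpolation rules for arrays, slices and subscripted variables can
be sketched like this; the data is invented for the example.

```perl
use strict;
use warnings;

my @fields = ('a', 'b', 'c');
my $data   = { key => ['first', 'second'] };

my $default = "@fields";                  # "a b c": $" defaults to a space
my $slice   = "@fields[0,1]";             # "a b"
my $nested  = "got $data->{key}[0]";      # "got first"

my $csv;
{
    local $" = ',';                       # change the separator for this block only
    $csv = "@fields";                     # "a,b,c"
}

print "$default | $slice | $nested | $csv\n";
```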
af9219ee 850
1d2dff63 851You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
852An unescaped C<$> or C<@> interpolates the corresponding variable,
853while escaping will cause the literal string C<\$> to be inserted.
854You'll need to write something like C<m/\Quser\E\@\Qhost/>.
855
a0d0e21e 856Patterns are subject to an additional level of interpretation as a
857regular expression. This is done as a second pass, after variables are
858interpolated, so that regular expressions may be incorporated into the
859pattern from the variables. If this is not what you want, use C<\Q> to
860interpolate a variable literally.
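
To illustrate the second pass, here is a sketch in which the same variable
is interpolated once as a regular expression and once literally via
C<\Q>. The strings are arbitrary test data.

```perl
use strict;
use warnings;

my $literal = 'a.b*c';                       # contains regex metacharacters

my $as_regex = ('aXbbbc' =~ /$literal/)      ? 1 : 0;   # "." and "*" are live: 1
my $as_text  = ('aXbbbc' =~ /\Q$literal\E/)  ? 1 : 0;   # taken literally: 0
my $exact    = ('xa.b*cx' =~ /\Q$literal\E/) ? 1 : 0;   # literal text found: 1

# A literal "@" has to sit outside the \Q...\E span.
my $addr_ok  = ('user@host' =~ m/\Quser\E\@\Qhost/) ? 1 : 0;

print "$as_regex $as_text $exact $addr_ok\n";
```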
861
19799a22 862Apart from the behavior described above, Perl does not expand
863multiple levels of interpolation. In particular, contrary to the
864expectations of shell programmers, back-quotes do I<NOT> interpolate
865within double quotes, nor do single quotes impede evaluation of
866variables when used within double quotes.
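
A one-line sketch for shell programmers: inside double quotes the single
quotes are ordinary characters, so the variable is still expanded, and the
back-quotes are ordinary characters too, so no command is run.

```perl
use strict;
use warnings;

my $n = 3;
my $text = "count: '$n', command: `date`";
print "$text\n";    # count: '3', command: `date`
```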
a0d0e21e 867
5f05dabc 868=head2 Regexp Quote-Like Operators
cb1a09d0 869
5f05dabc 870Here are the quote-like operators that apply to pattern
cb1a09d0 871matching and related activities.
872
a0d0e21e 873=over 8
874
875=item ?PATTERN?
876
877This is just like the C</pattern/> search, except that it matches only
878once between calls to the reset() operator. This is a useful
5f05dabc 879optimization when you want to see only the first occurrence of
a0d0e21e 880something in each file of a set of files, for instance. Only C<??>
881patterns local to the current package are reset.
882
5a964f20 883 while (<>) {
884 if (?^$?) {
885 # blank line between header and body
886 }
887 } continue {
888 reset if eof; # clear ?? status for next file
889 }
890
483b4840 891This usage is vaguely deprecated, which means it just might possibly
19799a22 892be removed in some distant future version of Perl, perhaps somewhere
893around the year 2168.
a0d0e21e 894
fb73857a 895=item m/PATTERN/cgimosx
a0d0e21e 896
fb73857a 897=item /PATTERN/cgimosx
a0d0e21e 898
5a964f20 899Searches a string for a pattern match, and in scalar context returns
19799a22 900true if it succeeds, false if it fails. If no string is specified
901via the C<=~> or C<!~> operator, the $_ string is searched. (The
902string specified with C<=~> need not be an lvalue--it may be the
903result of an expression evaluation, but remember the C<=~> binds
904rather tightly.) See also L<perlre>. See L<perllocale> for
905discussion of additional considerations that apply when C<use locale>
906is in effect.
a0d0e21e 907
908Options are:
909
fb73857a 910 c Do not reset search position on a failed match when /g is in effect.
5f05dabc 911 g Match globally, i.e., find all occurrences.
a0d0e21e 912 i Do case-insensitive pattern matching.
913 m Treat string as multiple lines.
5f05dabc 914 o Compile pattern only once.
a0d0e21e 915 s Treat string as single line.
916 x Use extended regular expressions.
917
918If "/" is the delimiter then the initial C<m> is optional. With the C<m>
01ae956f 919you can use any pair of non-alphanumeric, non-whitespace characters
19799a22 920as delimiters. This is particularly useful for matching path names
921that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
7bac28a0 922the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
19799a22 923If "'" is the delimiter, no interpolation is performed on the PATTERN.
a0d0e21e 924
925PATTERN may contain variables, which will be interpolated (and the
f70b4f9c 926pattern recompiled) every time the pattern search is evaluated, except
1f247705 927for when the delimiter is a single quote. (Note that C<$(>, C<$)>, and
928C<$|> are not interpolated because they look like end-of-string tests.)
f70b4f9c 929If you want such a pattern to be compiled only once, add a C</o> after
930the trailing delimiter. This avoids expensive run-time recompilations,
931and is useful when the value you are interpolating won't change over
932the life of the script. However, mentioning C</o> constitutes a promise
933that you won't change the variables in the pattern. If you change them,
13a2d996 934Perl won't even notice. See also L<"qr/STRING/imosx">.
a0d0e21e 935
5a964f20 936If the PATTERN evaluates to the empty string, the last
d65afb4b 937I<successfully> matched regular expression is used instead. In this
938case, only the C<g> and C<c> flags on the empty pattern are honoured -
939the other flags are taken from the original pattern. If no match has
940previously succeeded, this will (silently) act instead as a genuine
941empty pattern (which will always match).
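
A small sketch of the empty-pattern rule, using C<m//> to keep the parse
unambiguous. The strings are invented; the point is that after a
successful C</full/> the empty pattern stands for C</full/>, so it does
not match everywhere.

```perl
use strict;
use warnings;

my $log = "disk full on /var";
$log =~ /full/ or die "setup match failed";      # last successful pattern

# m// is now shorthand for /full/, not a pattern that matches everywhere.
my $miss = ("cache empty" =~ m//) ? 1 : 0;       # 0
my $hit  = ("cache full"  =~ m//) ? 1 : 0;       # 1

print "miss=$miss hit=$hit\n";
```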
a0d0e21e 942
c963b151 943Note that it's possible to confuse Perl into thinking C<//> (the empty
944regex) is really C<//> (the defined-or operator). Perl is usually pretty
945good about this, but some pathological cases might trigger this, such as
946C<$a///> (is that C<($a) / (//)> or C<$a // />?) and C<print $fh //>
947(C<print $fh(//> or C<print($fh //>?). In all of these examples, Perl
948will assume you meant defined-or. If you meant the empty regex, just
949use parentheses or spaces to disambiguate, or even prefix the empty
950regex with an C<m> (so C<//> becomes C<m//>).
951
19799a22 952If the C</g> option is not used, C<m//> in list context returns a
a0d0e21e 953list consisting of the subexpressions matched by the parentheses in the
f7e33566 954pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here C<$1> etc. are
955also set, and that this differs from Perl 4's behavior.) When there are
956no parentheses in the pattern, the return value is the list C<(1)> for
957success. With or without parentheses, an empty list is returned upon
958failure.
a0d0e21e 959
960Examples:
961
962 open(TTY, '/dev/tty');
963 <TTY> =~ /^y/i && foo(); # do foo if desired
964
965 if (/Version: *([0-9.]*)/) { $version = $1; }
966
967 next if m#^/usr/spool/uucp#;
968
969 # poor man's grep
970 $arg = shift;
971 while (<>) {
972 print if /$arg/o; # compile only once
973 }
974
975 if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
976
977This last example splits $foo into the first two words and the
5f05dabc 978remainder of the line, and assigns those three fields to $F1, $F2, and
979$Etc. The conditional is true if any variables were assigned, i.e., if
a0d0e21e 980the pattern matched.
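
The three possible list-context results (captures, C<(1)>, and the empty
list) can be seen side by side in this sketch; the time string is
arbitrary.

```perl
use strict;
use warnings;

my ($hour, $min) = ("12:34" =~ /^(\d+):(\d+)$/);   # captures: (12, 34)
my @no_parens    = ("12:34" =~ /:/);               # no captures: (1)
my @failed       = ("12:34" =~ /x/);               # failure: ()

# In scalar context a list assignment yields the number of values on
# its right side, which is why it works as a condition.
my $matched = (() = "12:34" =~ /^(\d+):(\d+)$/);   # 2

print "$hour $min [@no_parens] ", scalar(@failed), " $matched\n";
```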
981
19799a22 982The C</g> modifier specifies global pattern matching--that is,
983matching as many times as possible within the string. How it behaves
984depends on the context. In list context, it returns a list of the
985substrings matched by any capturing parentheses in the regular
986expression. If there are no parentheses, it returns a list of all
987the matched strings, as if there were parentheses around the whole
988pattern.
a0d0e21e 989
7e86de3e 990In scalar context, each execution of C<m//g> finds the next match,
19799a22 991returning true if it matches, and false if there is no further match.
7e86de3e 992The position after the last match can be read or set using the pos()
993function; see L<perlfunc/pos>. A failed match normally resets the
994search position to the beginning of the string, but you can avoid that
995by adding the C</c> modifier (e.g. C<m//gc>). Modifying the target
996string also resets the search position.
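
As a sketch of the two contexts, the following collects all matches at
once in list context and then walks the same string match by match in
scalar context, reading C<pos()> after each step. The sample string is
made up.

```perl
use strict;
use warnings;

my $s = "a1b22c333";

my @all = ($s =~ /\d+/g);                 # list context: ('1', '22', '333')

my @found;
while ($s =~ /(\d+)/g) {                  # scalar context: one match per pass
    push @found, "$1 ends at " . pos($s);
}
my $after = pos($s);                      # undef: the failed match reset it

print join('; ', @found), "\n";
```

The loop ends when the match fails, and because C</c> was not given that
failure also resets the search position.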
c90c0ff4 997
998You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
999zero-width assertion that matches the exact position where the previous
5d43e42d 1000C<m//g>, if any, left off. Without the C</g> modifier, the C<\G> assertion
1001still anchors at pos(), but the match is of course only attempted once.
1002Using C<\G> without C</g> on a target string that has not previously had a
1003C</g> match applied to it is the same as using the C<\A> assertion to match
fe4b3f22 1004the beginning of the string. Note also that, currently, C<\G> is only
1005properly supported when anchored at the very beginning of the pattern.
c90c0ff4 1006
1007Examples:
a0d0e21e 1008
1009 # list context
1010 ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
1011
1012 # scalar context
5d43e42d 1013 $/ = "";
19799a22 1014 while (defined($paragraph = <>)) {
1015 while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
1016 $sentences++;
a0d0e21e 1017 }
1018 }
1019 print "$sentences\n";
1020
c90c0ff4 1021 # using m//gc with \G
137443ea 1022 $_ = "ppooqppqq";
44a8e56a 1023 while ($i++ < 2) {
1024 print "1: '";
c90c0ff4 1025 print $1 while /(o)/gc; print "', pos=", pos, "\n";
44a8e56a 1026 print "2: '";
c90c0ff4 1027 print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
44a8e56a 1028 print "3: '";
c90c0ff4 1029 print $1 while /(p)/gc; print "', pos=", pos, "\n";
44a8e56a 1030 }
5d43e42d 1031 print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
44a8e56a 1032
1033The last example should print:
1034
1035 1: 'oo', pos=4
137443ea 1036 2: 'q', pos=5
44a8e56a 1037 3: 'pp', pos=7
1038 1: '', pos=7
137443ea 1039 2: 'q', pos=8
1040 3: '', pos=8
5d43e42d 1041 Final: 'q', pos=8
1042
1043Notice that the final match matched C<q> instead of C<p>, which a match
1044without the C<\G> anchor would have done. Also note that the final match
1045did not update C<pos> -- C<pos> is only updated on a C</g> match. If the
1046final match did indeed match C<p>, it's a good bet that you're running an
1047older (pre-5.6.0) Perl.
44a8e56a 1048
c90c0ff4 1049A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can
e7ea3e70 1050combine several regexps like this to process a string part-by-part,
c90c0ff4 1051doing different actions depending on which regexp matched. Each
1052regexp tries to match where the previous one leaves off.
e7ea3e70 1053
3fe9a6f1 1054 $_ = <<'EOL';
e7ea3e70 1055 $url = new URI::URL "http://www/"; die if $url eq "xXx";
3fe9a6f1 1056 EOL
1057 LOOP:
e7ea3e70 1058 {
c90c0ff4 1059 print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
1060 print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
1061 print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
1062 print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
1063 print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
1064 print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
1065 print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/gc;
e7ea3e70 1066 print ". That's all!\n";
1067 }
1068
1069Here is the output (split into several lines):
1070
1071 line-noise lowercase line-noise lowercase UPPERCASE line-noise
1072 UPPERCASE line-noise lowercase line-noise lowercase line-noise
1073 lowercase lowercase line-noise lowercase lowercase line-noise
1074 MiXeD line-noise. That's all!
44a8e56a 1075
a0d0e21e 1076=item q/STRING/
1077
1078=item C<'STRING'>
1079
19799a22 1080A single-quoted, literal string. A backslash represents a backslash
68dc0745 1081unless followed by the delimiter or another backslash, in which case
1082the delimiter or backslash is interpolated.
a0d0e21e 1083
1084 $foo = q!I said, "You said, 'She said it.'"!;
1085 $bar = q('This is it.');
68dc0745 1086 $baz = '\n'; # a two-character string
a0d0e21e 1087
1088=item qq/STRING/
1089
1090=item "STRING"
1091
1092A double-quoted, interpolated string.
1093
1094 $_ .= qq
1095 (*** The previous line contains the naughty word "$1".\n)
19799a22 1096 if /\b(tcl|java|python)\b/i; # :-)
68dc0745 1097 $baz = "\n"; # a one-character string
a0d0e21e 1098
eec2d3df 1099=item qr/STRING/imosx
1100
322edccd 1101This operator quotes (and possibly compiles) its I<STRING> as a regular
19799a22 1102expression. I<STRING> is interpolated the same way as I<PATTERN>
1103in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation
1104is done. Returns a Perl value which may be used instead of the
1105corresponding C</STRING/imosx> expression.
4b6a7270 1106
1107For example,
1108
1109 $rex = qr/my.STRING/is;
1110 s/$rex/foo/;
1111
1112is equivalent to
1113
1114 s/my.STRING/foo/is;
1115
1116The result may be used as a subpattern in a match:
eec2d3df 1117
1118 $re = qr/$pattern/;
0a92e3a8 1119 $string =~ /foo${re}bar/; # can be interpolated in other patterns
1120 $string =~ $re; # or used standalone
4b6a7270 1121 $string =~ /$re/; # or this way
1122
1123Since Perl may compile the pattern at the moment of execution of the qr()
19799a22 1124operator, using qr() may have speed advantages in some situations,
4b6a7270 1125notably if the result of qr() is used standalone:
1126
1127 sub match {
1128 my $patterns = shift;
1129 my @compiled = map qr/$_/i, @$patterns;
1130 grep {
1131 my $success = 0;
a7665c5e 1132 foreach my $pat (@compiled) {
4b6a7270 1133 $success = 1, last if /$pat/;
1134 }
1135 $success;
1136 } @_;
1137 }
1138
19799a22 1139Precompilation of the pattern into an internal representation at
1140the moment of qr() avoids a need to recompile the pattern every
1141time a match C</$pat/> is attempted. (Perl has many other internal
1142optimizations, but none would be triggered in the above example if
1143we did not use the qr() operator.)
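
The same idea in a self-contained form, as a sketch: compile each pattern
once with C<qr//>, then use the compiled objects standalone in a loop.
The pattern strings and the word list are hypothetical input.

```perl
use strict;
use warnings;

my @patterns = ('^foo', 'bar$');                  # hypothetical user input
my @compiled = map { qr/$_/i } @patterns;         # compile each one once

my @hits;
for my $word (qw(Food crowbar baz)) {
    for my $re (@compiled) {
        if ($word =~ $re) {                       # standalone use of the qr object
            push @hits, $word;
            last;
        }
    }
}
print "@hits\n";    # Food crowbar
```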
eec2d3df 1144
1145Options are:
1146
1147 i Do case-insensitive pattern matching.
1148 m Treat string as multiple lines.
1149 o Compile pattern only once.
1150 s Treat string as single line.
1151 x Use extended regular expressions.
1152
0a92e3a8 1153See L<perlre> for additional information on valid syntax for STRING, and
1154for a detailed look at the semantics of regular expressions.
1155
a0d0e21e 1156=item qx/STRING/
1157
1158=item `STRING`
1159
43dd4d21 1160A string which is (possibly) interpolated and then executed as a
1161system command with C</bin/sh> or its equivalent. Shell wildcards,
1162pipes, and redirections will be honored. The collected standard
1163output of the command is returned; standard error is unaffected. In
1164scalar context, it comes back as a single (potentially multi-line)
1165string, or undef if the command failed. In list context, returns a
1166list of lines (however you've defined lines with $/ or
1167$INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
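
A sketch of the two contexts, assuming a POSIX-like shell in which
C<echo> is available; on other systems the command would have to be
replaced. The exit status of the command is left in C<$?>.

```perl
use strict;
use warnings;

my $text  = `echo one; echo two`;      # scalar context: one multi-line string
my @lines = `echo one; echo two`;      # list context: one element per line
my $exit  = $? >> 8;                   # exit status of the last command run

chomp(my @clean = @lines);             # strip the line terminators
print scalar(@lines), " lines, exit status $exit: @clean\n";
```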
5a964f20 1168
1169Because backticks do not affect standard error, use shell file descriptor
1170syntax (assuming the shell supports this) if you care to address this.
1171To capture a command's STDERR and STDOUT together:
a0d0e21e 1172
5a964f20 1173 $output = `cmd 2>&1`;
1174
1175To capture a command's STDOUT but discard its STDERR:
1176
1177 $output = `cmd 2>/dev/null`;
1178
1179To capture a command's STDERR but discard its STDOUT (ordering is
1180important here):
1181
1182 $output = `cmd 2>&1 1>/dev/null`;
1183
1184To exchange a command's STDOUT and STDERR in order to capture the STDERR
1185but leave its STDOUT to come out the old STDERR:
1186
1187 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
1188
1189To read both a command's STDOUT and its STDERR separately, it's easiest
2359510d 1190to redirect them separately to files, and then read from those files
1191when the program is done:
5a964f20 1192
2359510d 1193 system("program args 1>program.stdout 2>program.stderr");
5a964f20 1194
1195Using single-quote as a delimiter protects the command from Perl's
1196double-quote interpolation, passing it on to the shell instead:
1197
1198 $perl_info = qx(ps $$); # that's Perl's $$
1199 $shell_info = qx'ps $$'; # that's the new shell's $$
1200
19799a22 1201How that string gets evaluated is entirely subject to the command
5a964f20 1202interpreter on your system. On most platforms, you will have to protect
1203shell metacharacters if you want them treated literally. This is in
1204practice difficult to do, as it's unclear how to escape which characters.
1205See L<perlsec> for a clean and safe example of a manual fork() and exec()
1206to emulate backticks safely.
a0d0e21e 1207
bb32b41a 1208On some platforms (notably DOS-like ones), the shell may not be
1209capable of dealing with multiline commands, so putting newlines in
1210the string may not get you what you want. You may be able to evaluate
1211multiple commands in a single line by separating them with the command
1212separator character, if your shell supports that (e.g. C<;> on many Unix
1213shells; C<&> on the Windows NT C<cmd> shell).
1214
0f897271 1215Beginning with v5.6.0, Perl will attempt to flush all files opened for
1216output before starting the child process, but this may not be supported
1217on some platforms (see L<perlport>). To be safe, you may need to set
1218C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
1219C<IO::Handle> on any open handles.
1220
bb32b41a 1221Beware that some command shells may place restrictions on the length
1222of the command line. You must ensure your strings don't exceed this
1223limit after any necessary interpolations. See the platform-specific
1224release notes for more details about your particular environment.
1225
5a964f20 1226Using this operator can lead to programs that are difficult to port,
1227because the shell commands called vary between systems, and may in
1228fact not be present at all. As one example, the C<type> command under
1229the POSIX shell is very different from the C<type> command under DOS.
1230That doesn't mean you should go out of your way to avoid backticks
1231when they're the right way to get something done. Perl was made to be
1232a glue language, and one of the things it glues together is commands.
1233Just understand what you're getting yourself into.
bb32b41a 1234
dc848c6f 1235See L<"I/O Operators"> for more discussion.
a0d0e21e 1236
945c54fd 1237=item qw/STRING/
1238
1239Evaluates to a list of the words extracted out of STRING, using embedded
1240whitespace as the word delimiters. It can be understood as being roughly
1241equivalent to:
1242
1243 split(' ', q/STRING/);
1244
efb1e162 1245the differences being that it generates a real list at compile time, and
1246in scalar context it returns the last element in the list. So
945c54fd 1247this expression:
1248
1249 qw(foo bar baz)
1250
1251is semantically equivalent to the list:
1252
1253 'foo', 'bar', 'baz'
1254
1255Some frequently seen examples:
1256
1257 use POSIX qw( setlocale localeconv );
1258 @EXPORT = qw( foo bar baz );
1259
1260A common mistake is to try to separate the words with commas or to
1261put comments into a multi-line C<qw>-string. For this reason, the
1262C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
1263produce warnings if the STRING contains the "," or the "#" character.
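
A brief sketch of C<qw> next to its rough C<split> equivalent, plus the
common lookup-table idiom. The word list is arbitrary.

```perl
use strict;
use warnings;

my @words = qw( foo bar
                baz );                       # any whitespace separates words
my @split = split(' ', q/ foo bar  baz /);   # the rough run-time equivalent

my %is_export = map { ($_ => 1) } qw(foo bar baz);   # handy for lookup tables

print scalar(@words), " words: @words\n";
```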
1264
a0d0e21e 1265=item s/PATTERN/REPLACEMENT/egimosx
1266
1267Searches a string for a pattern, and if found, replaces that pattern
1268with the replacement text and returns the number of substitutions
e37d713d 1269made. Otherwise it returns false (specifically, the empty string).
a0d0e21e 1270
1271If no string is specified via the C<=~> or C<!~> operator, the C<$_>
1272variable is searched and modified. (The string specified with C<=~> must
5a964f20 1273be a scalar variable, an array element, a hash element, or an assignment
5f05dabc 1274to one of those, i.e., an lvalue.)
a0d0e21e 1275
19799a22 1276If the delimiter chosen is a single quote, no interpolation is
a0d0e21e 1277done on either the PATTERN or the REPLACEMENT. Otherwise, if the
1278PATTERN contains a $ that looks like a variable rather than an
1279end-of-string test, the variable will be interpolated into the pattern
5f05dabc 1280at run-time. If you want the pattern compiled only once the first time
a0d0e21e 1281the variable is interpolated, use the C</o> option. If the pattern
5a964f20 1282evaluates to the empty string, the last successfully executed regular
a0d0e21e 1283expression is used instead. See L<perlre> for further explanation on these.
5a964f20 1284See L<perllocale> for discussion of additional considerations that apply
a034a98d 1285when C<use locale> is in effect.
a0d0e21e 1286
1287Options are:
1288
1289 e Evaluate the right side as an expression.
5f05dabc 1290 g Replace globally, i.e., all occurrences.
a0d0e21e 1291 i Do case-insensitive pattern matching.
1292 m Treat string as multiple lines.
5f05dabc 1293 o Compile pattern only once.
a0d0e21e 1294 s Treat string as single line.
1295 x Use extended regular expressions.
1296
1297Any non-alphanumeric, non-whitespace delimiter may replace the
1298slashes. If single quotes are used, no interpretation is done on the
e37d713d 1299replacement string (the C</e> modifier overrides this, however). Unlike
54310121 1300Perl 4, Perl 5 treats backticks as normal delimiters; the replacement
e37d713d 1301text is not evaluated as a command. If the
a0d0e21e 1302PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
5f05dabc 1303pair of quotes, which may or may not be bracketing quotes, e.g.,
35f2feb0 1304C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the
cec88af6 1305replacement portion to be treated as a full-fledged Perl expression
1306and evaluated right then and there. It is, however, syntax checked at
1307compile-time. A second C<e> modifier will cause the replacement portion
1308to be C<eval>ed before being run as a Perl expression.
a0d0e21e 1309
1310Examples:
1311
1312 s/\bgreen\b/mauve/g; # don't change wintergreen
1313
1314 $path =~ s|/usr/bin|/usr/local/bin|;
1315
1316 s/Login: $foo/Login: $bar/; # run-time pattern
1317
5a964f20 1318 ($foo = $bar) =~ s/this/that/; # copy first, then change
a0d0e21e 1319
5a964f20 1320 $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
a0d0e21e 1321
1322 $_ = 'abc123xyz';
1323 s/\d+/$&*2/e; # yields 'abc246xyz'
1324 s/\d+/sprintf("%5d",$&)/e; # yields 'abc  246xyz'
1325 s/\w/$& x 2/eg; # yields 'aabbcc  224466xxyyzz'
1326
1327 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
1328 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
1329 s/^=(\w+)/&pod($1)/ge; # use function call
1330
5a964f20 1331 # expand variables in $_, but dynamics only, using
1332 # symbolic dereferencing
1333 s/\$(\w+)/${$1}/g;
1334
cec88af6 1335 # Add one to the value of any numbers in the string
1336 s/(\d+)/1 + $1/eg;
1337
1338 # This will expand any embedded scalar variable
1339 # (including lexicals) in $_ : First $1 is interpolated
1340 # to the variable name, and then evaluated
a0d0e21e 1341 s/(\$\w+)/$1/eeg;
1342
5a964f20 1343 # Delete (most) C comments.
a0d0e21e 1344 $program =~ s {
4633a7c4 1345 /\* # Match the opening delimiter.
1346 .*? # Match a minimal number of characters.
1347 \*/ # Match the closing delimiter.
a0d0e21e 1348 } []gsx;
1349
5a964f20 1350 s/^\s*(.*?)\s*$/$1/; # trim white space in $_, expensively
1351
1352 for ($variable) { # trim white space in $variable, cheap
1353 s/^\s+//;
1354 s/\s+$//;
1355 }
a0d0e21e 1356
1357 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
1358
54310121 1359Note the use of $ instead of \ in the last example. Unlike
35f2feb0 1360B<sed>, we use the \<I<digit>> form only on the left-hand side.
1361Anywhere else it's $<I<digit>>.
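
The return value, the copy-then-change idiom and the C</e> modifier can
be checked together in a short sketch; the sample sentences are invented.

```perl
use strict;
use warnings;

my $text = "Mister Smith met Mister Jones";

(my $copy = $text) =~ s/Mister\b/Mr./g;          # copy first, change the copy
my $count = ($text =~ s/Mister\b/Mr./g);         # 2 substitutions
my $none  = ($text =~ s/Madam\b/Ms./g);          # no match: empty string

my $order = "3 apples and 10 pears";
$order =~ s/(\d+)/$1 * 2/eg;                     # /e: replacement is an expression

(my $swapped = "hello world") =~ s/(\w+) (\w+)/$2 $1/;   # $1, not \1, on the right

print "$count: $text | $order | $swapped\n";
```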
a0d0e21e 1362
5f05dabc 1363Occasionally, you can't use just a C</g> to get all the changes
19799a22 1364to occur that you might want. Here are two common cases:
a0d0e21e 1365
1366 # put commas in the right places in an integer
19799a22 1367 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
a0d0e21e 1368
1369 # expand tabs to 8-column spacing
1370 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
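
Both loops can be traced on small inputs. In this sketch the comma
example needs two passes, because after a substitution the scan continues
to the right of the match and never revisits the digits on its left. The
input values are arbitrary.

```perl
use strict;
use warnings;

my $n = "1234567";
1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;      # 1234,567 then 1,234,567

my $line = "ab\tc";
1 while $line =~ s/\t+/' ' x (length($&)*8 - length($`)%8)/e;

print "$n\n";                                    # 1,234,567
print length($line), "\n";                       # 9: "c" now sits in column 8
```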
1371
6940069f 1372=item tr/SEARCHLIST/REPLACEMENTLIST/cds
a0d0e21e 1373
6940069f 1374=item y/SEARCHLIST/REPLACEMENTLIST/cds
a0d0e21e 1375
2c268ad5 1376Transliterates all occurrences of the characters found in the search list
a0d0e21e 1377with the corresponding character in the replacement list. It returns
1378the number of characters replaced or deleted. If no string is
2c268ad5 1379specified via the =~ or !~ operator, the $_ string is transliterated. (The
54310121 1380string specified with =~ must be a scalar variable, an array element, a
1381hash element, or an assignment to one of those, i.e., an lvalue.)
8ada0baa 1382
2c268ad5 1383A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
1384does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
54310121 1385For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the
1386SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
1387its own pair of quotes, which may or may not be bracketing quotes,
2c268ad5 1388e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.
a0d0e21e 1389
cc255d5f 1390Note that C<tr> does B<not> do regular expression character classes
1391such as C<\d> or C<[:lower:]>. The C<tr> operator is not equivalent to
1392the tr(1) utility. If you want to map strings between lower/upper
1393cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider
1394using the C<s> operator if you need regular expressions.
1395
8ada0baa 1396Note also that the whole range idea is rather unportable between
1397character sets--and even within character sets ranges may cause results
1398you probably didn't expect. A sound principle is to use only ranges
1399that begin from and end at either alphabets of equal case (a-e, A-E),
1400or digits (0-4). Anything else is unsafe. If in doubt, spell out the
1401character sets in full.
1402
a0d0e21e 1403Options:
1404
1405 c Complement the SEARCHLIST.
1406 d Delete found but unreplaced characters.
1407 s Squash duplicate replaced characters.
1408
19799a22 1409If the C</c> modifier is specified, the SEARCHLIST character set
1410is complemented. If the C</d> modifier is specified, any characters
1411specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
1412(Note that this is slightly more flexible than the behavior of some
1413B<tr> programs, which delete anything they find in the SEARCHLIST,
1414period.) If the C</s> modifier is specified, sequences of characters
1415that were transliterated to the same character are squashed down
1416to a single instance of the character.
a0d0e21e 1417
1418If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
1419exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
1420than the SEARCHLIST, the final character is replicated till it is long
5a964f20 1421enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
a0d0e21e 1422This latter is useful for counting characters in a class or for
1423squashing character sequences in a class.
1424
1425Examples:
1426
1427 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case
1428
1429 $cnt = tr/*/*/; # count the stars in $_
1430
1431 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
1432
1433 $cnt = tr/0-9//; # count the digits in $_
1434
1435 tr/a-zA-Z//s; # bookkeeper -> bokeper
1436
1437 ($HOST = $host) =~ tr/a-z/A-Z/;
1438
1439 tr/a-zA-Z/ /cs; # change non-alphas to single space
1440
1441 tr [\200-\377]
1442 [\000-\177]; # delete 8th bit
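
The modifiers and the rules for a short or empty REPLACEMENTLIST can be
compared in one sketch. Each line works on a copy so that the results are
independent; the sample strings are made up.

```perl
use strict;
use warnings;

my $s = "hello world";

my $vowels = ($s =~ tr/aeiou//);            # empty replacement: count only, 3
(my $deleted  = $s) =~ tr/lo//d;            # /d: "he wrd"
(my $squashed = "bookkeeper") =~ tr/a-z//s; # /s: "bokeper"
(my $masked   = "a1b2") =~ tr/a-z/_/c;      # /c: non-letters become "_": "a_b_"
(my $padded   = "abcd") =~ tr/a-d/xy/;      # short list: "y" is repeated: "xyyy"

print "$vowels $deleted $squashed $masked $padded\n";
```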
1443
19799a22 1444If multiple transliterations are given for a character, only the
1445first one is used:
748a9306 1446
1447 tr/AAA/XYZ/
1448
2c268ad5 1449will transliterate any A to X.
748a9306 1450
19799a22 1451Because the transliteration table is built at compile time, neither
a0d0e21e 1452the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote
19799a22 1453interpolation. That means that if you want to use variables, you
1454must use an eval():
a0d0e21e 1455
1456 eval "tr/$oldlist/$newlist/";
1457 die $@ if $@;
1458
1459 eval "tr/$oldlist/$newlist/, 1" or die $@;
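
A runnable sketch of the C<eval> approach applied to a named variable
instead of C<$_>. The list values are hypothetical; because the string is
compiled as Perl code, they should only ever come from a trusted source.

```perl
use strict;
use warnings;

my ($oldlist, $newlist) = ('a-c', 'A-C');   # hypothetical run-time values
my $str = "abcabc xyz";

# The variables are interpolated into the source text first; only then
# does eval compile the tr/// with its fixed table.
my $count = eval "\$str =~ tr/$oldlist/$newlist/";
die $@ if $@;

print "$count changed: $str\n";   # 6 changed: ABCABC xyz
```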
1460
7e3b091d 1461=item <<EOF
1462
1463A line-oriented form of quoting is based on the shell "here-document"
1464syntax. Following a C<< << >> you specify a string to terminate
1465the quoted material, and all lines following the current line down to
1466the terminating string are the value of the item. The terminating
1467string may be either an identifier (a word), or some quoted text. If
1468quoted, the type of quotes you use determines the treatment of the
1469text, just as in regular quoting. An unquoted identifier works like
1470double quotes. There must be no space between the C<< << >> and
1471the identifier, unless the identifier is quoted. (If you put a space it
1472will be treated as a null identifier, which is valid, and matches the first
1473empty line.) The terminating string must appear by itself (unquoted and
1474with no surrounding whitespace) on the terminating line.
1475
1476 print <<EOF;
1477 The price is $Price.
1478 EOF
1479
1480 print << "EOF"; # same as above
1481 The price is $Price.
1482 EOF
1483
1484 print << `EOC`; # execute commands
1485 echo hi there
1486 echo lo there
1487 EOC
1488
1489 print <<"foo", <<"bar"; # you can stack them
1490 I said foo.
1491 foo
1492 I said bar.
1493 bar
1494
1495 myfunc(<< "THIS", 23, <<'THAT');
1496 Here's a line
1497 or two.
1498 THIS
1499 and here's another.
1500 THAT
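
To see how the quoting of the terminator changes the treatment of the
text, the following sketch captures two here-documents into variables
(the value given to C<$Price> is made up):

```perl
use strict;
use warnings;

my $Price = 42;    # illustrative value

# Double-quoted (or bare) terminator: the body is interpolated.
my $interpolated = <<"EOF";
The price is $Price.
EOF

# Single-quoted terminator: the body is taken literally.
my $literal = <<'EOF';
The price is $Price.
EOF

print $interpolated;    # The price is 42.
print $literal;         # The price is $Price.
```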
1501
1502Just don't forget that you have to put a semicolon on the end
1503to finish the statement, as Perl doesn't know you're not going to
1504try to do this:
1505
1506 print <<ABC
1507 179231
1508 ABC
1509 + 20;
1510
1511If you want your here-docs to be indented with the
1512rest of the code, you'll need to remove leading whitespace
1513from each line manually:
1514
1515 ($quote = <<'FINIS') =~ s/^\s+//gm;
1516 The Road goes ever on and on,
1517 down from the door where it began.
1518 FINIS
1519
1520If you use a here-doc within a delimited construct, such as in C<s///eg>,
1521the quoted material must come on the lines following the final delimiter.
1522So instead of
1523
1524 s/this/<<E . 'that'
1525 the other
1526 E
1527 . 'more '/eg;
1528
1529you have to write
1530
1531 s/this/<<E . 'that'
1532 . 'more '/eg;
1533 the other
1534 E
1535
1536If the terminating identifier is on the last line of the program, you
1537must be sure there is a newline after it; otherwise, Perl will give the
1538warning B<Can't find string terminator "END" anywhere before EOF...>.
1539
1540Additionally, the quoting rules for the identifier are not related to
1541Perl's quoting rules -- C<q()>, C<qq()>, and the like are not supported
1542in place of C<''> and C<"">, and the only interpolation is for backslashing
1543the quoting character:
1544
1545 print << "abc\"def";
1546 testing...
1547 abc"def
1548
1549Finally, quoted strings cannot span multiple lines. The general rule is
1550that the identifier must be a string literal. Stick with that, and you
1551should be safe.
1552
a0d0e21e 1553=back
1554
75e14d17 1555=head2 Gory details of parsing quoted constructs
1556
19799a22 1557When presented with something that might have several different
1558interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
1559principle to pick the most probable interpretation. This strategy
1560is so successful that Perl programmers often do not suspect the
1561ambivalence of what they write. But from time to time, Perl's
1562notions differ substantially from what the author honestly meant.
1563
1564This section hopes to clarify how Perl handles quoted constructs.
1565Although the most common reason to learn this is to unravel labyrinthine
1566regular expressions, because the initial steps of parsing are the
1567same for all quoting operators, they are all discussed together.
1568
1569The most important Perl parsing rule is the first one discussed
1570below: when processing a quoted construct, Perl first finds the end
1571of that construct, then interprets its contents. If you understand
1572this rule, you may skip the rest of this section on the first
1573reading. The other rules are likely to contradict the user's
1574expectations much less frequently than this first one.
1575
1576Some passes discussed below are performed concurrently, but because
1577their results are the same, we consider them individually. For different
1578quoting constructs, Perl performs different numbers of passes, from
1579one to five, but these passes are always performed in the same order.
75e14d17 1580
13a2d996 1581=over 4
75e14d17 1582
1583=item Finding the end
1584
The first pass is finding the end of the quoted construct, whether
it be a multicharacter delimiter C<"\nEOF\n"> in the C<<<EOF>
construct, a C</> that terminates a C<qq//> construct, a C<]> that
terminates a C<qq[]> construct, or a C<< > >> that terminates a
fileglob started with C<< < >>.
75e14d17 1590
When searching for single-character non-pairing delimiters, such
as C</>, combinations of C<\\> and C<\/> are skipped. However,
when searching for a single-character pairing delimiter such as C<[>,
combinations of C<\\>, C<\]>, and C<\[> are all skipped, and nested
C<[>, C<]> are skipped as well. When searching for multicharacter
delimiters, nothing is skipped.
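
A small sketch of what "skipped" means in practice, using C<q//> so
that no interpolation gets in the way (the strings are arbitrary):

```perl
use strict;
use warnings;

# Pairing delimiters: the nested [b] does not end the construct.
my $nested = q[a [b] c];

# Non-pairing delimiter: the escaped \/ does not end the construct.
my $slashed = q/usr\/local/;

print "$nested\n";     # a [b] c
print "$slashed\n";    # usr/local
```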
75e14d17 1597
19799a22 1598For constructs with three-part delimiters (C<s///>, C<y///>, and
1599C<tr///>), the search is repeated once more.
75e14d17 1600
19799a22 1601During this search no attention is paid to the semantics of the construct.
1602Thus:
75e14d17 1603
1604 "$hash{"$foo/$bar"}"
1605
2a94b7ce 1606or:
75e14d17 1607
1608 m/
2a94b7ce 1609 bar # NOT a comment, this slash / terminated m//!
75e14d17 1610 /x
1611
19799a22 1612do not form legal quoted expressions. The quoted part ends on the
1613first C<"> and C</>, and the rest happens to be a syntax error.
1614Because the slash that terminated C<m//> was followed by a C<SPACE>,
1615the example above is not C<m//x>, but rather C<m//> with no C</x>
1616modifier. So the embedded C<#> is interpreted as a literal C<#>.
75e14d17 1617
1618=item Removal of backslashes before delimiters
1619
During the second pass, text between the starting and ending
delimiters is copied to a safe location, and the C<\> is removed
from combinations consisting of C<\> and a delimiter (or delimiters,
meaning that both the starting and ending delimiters are handled,
should these differ). This removal does not happen for multi-character
delimiters. Note that the combination C<\\> is left intact, just as it was.
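
The effect of this pass can be observed with C<q{}>, as in the sketch
below. Only a backslash that precedes one of the delimiters is removed.
A backslash before any other character survives.

```perl
use strict;
use warnings;

my $close_brace = q{a\}b};    # backslash before the closing delimiter is removed
my $open_brace  = q{a\{b};    # likewise for the opening delimiter
my $not_delim   = q{a\/b};    # "/" is not a delimiter here, so the backslash stays

print "$close_brace\n";    # a}b
print "$open_brace\n";     # a{b
print "$not_delim\n";      # a\/b
```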
75e14d17 1626
19799a22 1627Starting from this step no information about the delimiters is
1628used in parsing.
75e14d17 1629
1630=item Interpolation
1631
19799a22 1632The next step is interpolation in the text obtained, which is now
1633delimiter-independent. There are four different cases.
75e14d17 1634
13a2d996 1635=over 4
75e14d17 1636
1637=item C<<<'EOF'>, C<m''>, C<s'''>, C<tr///>, C<y///>
1638
1639No interpolation is performed.
1640
1641=item C<''>, C<q//>
1642
1643The only interpolation is removal of C<\> from pairs C<\\>.
1644
35f2feb0 1645=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>
75e14d17 1646
19799a22 1647C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
1648converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
1649is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
1650The other combinations are replaced with appropriate expansions.
2a94b7ce 1651
19799a22 1652Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
1653is interpolated in the usual way. Something like C<"\Q\\E"> has
no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
1655result is the same as for C<"\\\\E">. As a general rule, backslashes
1656between C<\Q> and C<\E> may lead to counterintuitive results. So,
1657C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
1658as C<"\\\t"> (since TAB is not alphanumeric). Note also that:
2a94b7ce 1659
1660 $str = '\t';
1661 return "\Q$str";
1662
1663may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
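
The difference between the two forms can be checked directly. This is
only a sketch, and the variable names are illustrative:

```perl
use strict;
use warnings;

# \t is expanded to a TAB first, then quotemeta() escapes that TAB.
my $quoted_tab = "\Q\t\E";

# Here the variable holds a backslash and a "t"; quotemeta() escapes
# the backslash and leaves the "t" alone.
my $str = '\t';
my $quoted_text = "\Q$str";

printf "%d %d\n", length $quoted_tab, length $quoted_text;    # 2 3
```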
1664
19799a22 1665Interpolated scalars and arrays are converted internally to the C<join> and
92d29cee 1666C<.> catenation operations. Thus, C<"$foo XXX '@arr'"> becomes:
75e14d17 1667
19799a22 1668 $foo . " XXX '" . (join $", @arr) . "'";
75e14d17 1669
19799a22 1670All operations above are performed simultaneously, left to right.
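
The equivalence with C<join> can be verified with a short sketch (the
data is made up). Changing C<$"> changes the interpolated result
accordingly:

```perl
use strict;
use warnings;

my $foo = "head";
my @arr = (1, 2, 3);

my $interpolated = "$foo XXX '@arr'";
my $explicit     = $foo . " XXX '" . (join $", @arr) . "'";

# $" is the list separator used for interpolated arrays.
my $with_commas = do { local $" = ','; "$foo XXX '@arr'" };

print "$interpolated\n";    # head XXX '1 2 3'
print "$with_commas\n";     # head XXX '1,2,3'
```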
75e14d17 1671
19799a22 1672Because the result of C<"\Q STRING \E"> has all metacharacters
1673quoted, there is no way to insert a literal C<$> or C<@> inside a
C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to become
1675C<"\\\$">; if not, it is interpreted as the start of an interpolated
1676scalar.
75e14d17 1677
19799a22 1678Note also that the interpolation code needs to make a decision on
1679where the interpolated scalar ends. For instance, whether
35f2feb0 1680C<< "a $b -> {c}" >> really means:
75e14d17 1681
1682 "a " . $b . " -> {c}";
1683
2a94b7ce 1684or:
75e14d17 1685
1686 "a " . $b -> {c};
1687
Most of the time, the choice is the longest possible text that does not
include spaces between components and which contains matching braces or
brackets. Because the outcome may be determined by voting based
on heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
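
When the intent matters, it can be spelled out so that no guessing is
needed. A sketch, using a hypothetical hash reference C<$rec> in place
of C<$b>:

```perl
use strict;
use warnings;

my $rec = { c => 'value' };    # hypothetical data

# No spaces between the components: this is a hash element lookup.
my $element = "a $rec->{c}";

# Explicit concatenation: the reference itself, then literal text.
my $scalar = "a " . $rec . " -> {c}";

print "$element\n";    # a value
print "$scalar\n";     # something like: a HASH(0x55d0c8a3b2a0) -> {c}
```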
75e14d17 1693
=item C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>
1695
19799a22 1696Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
1697happens (almost) as with C<qq//> constructs, but the substitution
1698of C<\> followed by RE-special chars (including C<\>) is not
1699performed. Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
1700a C<#>-comment in a C<//x>-regular expression, no processing is
1701performed whatsoever. This is the first step at which the presence
1702of the C<//x> modifier is relevant.
1703
1704Interpolation has several quirks: C<$|>, C<$(>, and C<$)> are not
1705interpolated, and constructs C<$var[SOMETHING]> are voted (by several
1706different estimators) to be either an array element or C<$var>
1707followed by an RE alternative. This is where the notation
C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
1709array element C<-9>, not as a regular expression from the variable
1710C<$arr> followed by a digit, which would be the interpretation of
1711C</$arr[0-9]/>. Since voting among different estimators may occur,
1712the result is not predictable.
1713
1714It is at this step that C<\1> is begrudgingly converted to C<$1> in
1715the replacement text of C<s///> to correct the incorrigible
1716I<sed> hackers who haven't picked up the saner idiom yet. A warning
9f1b1f2d 1717is emitted if the C<use warnings> pragma or the B<-w> command-line flag
1718(that is, the C<$^W> variable) was set.
19799a22 1719
1720The lack of processing of C<\\> creates specific restrictions on
1721the post-processed text. If the delimiter is C</>, one cannot get
1722the combination C<\/> into the result of this step. C</> will
1723finish the regular expression, C<\/> will be stripped to C</> on
1724the previous step, and C<\\/> will be left as is. Because C</> is
1725equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be a character special to the
1727RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
1728alphanumeric char, as in:
2a94b7ce 1729
1730 m m ^ a \s* b mmx;
1731
19799a22 1732In the RE above, which is intentionally obfuscated for illustration, the
2a94b7ce 1733delimiter is C<m>, the modifier is C<mx>, and after backslash-removal the
aa863641 1734RE is the same as for C<m/ ^ a \s* b /mx>. There's more than one
19799a22 1735reason you're encouraged to restrict your delimiters to non-alphanumeric,
1736non-whitespace choices.
75e14d17 1737
1738=back
1739
19799a22 1740This step is the last one for all constructs except regular expressions,
75e14d17 1741which are processed further.
1742
1743=item Interpolation of regular expressions
1744
19799a22 1745Previous steps were performed during the compilation of Perl code,
1746but this one happens at run time--although it may be optimized to
1747be calculated at compile time if appropriate. After preprocessing
1748described above, and possibly after evaluation if catenation,
1749joining, casing translation, or metaquoting are involved, the
1750resulting I<string> is passed to the RE engine for compilation.
1751
1752Whatever happens in the RE engine might be better discussed in L<perlre>,
1753but for the sake of continuity, we shall do so here.
1754
1755This is another step where the presence of the C<//x> modifier is
1756relevant. The RE engine scans the string from left to right and
1757converts it to a finite automaton.
1758
1759Backslashed characters are either replaced with corresponding
1760literal strings (as with C<\{>), or else they generate special nodes
1761in the finite automaton (as with C<\b>). Characters special to the
1762RE engine (such as C<|>) generate corresponding nodes or groups of
1763nodes. C<(?#...)> comments are ignored. All the rest is either
converted to literal strings to match, or else is ignored (as are
whitespace and C<#>-style comments if C<//x> is present).
1766
1767Parsing of the bracketed character class construct, C<[...]>, is
1768rather different than the rule used for the rest of the pattern.
1769The terminator of this construct is found using the same rules as
1770for finding the terminator of a C<{}>-delimited construct, the only
1771exception being that C<]> immediately following C<[> is treated as
1772though preceded by a backslash. Similarly, the terminator of
1773C<(?{...})> is found using the same rules as for finding the
1774terminator of a C<{}>-delimited construct.
1775
It is possible to inspect both the string given to the RE engine and the
1777resulting finite automaton. See the arguments C<debug>/C<debugcolor>
1778in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
4a4eefd0 1779switch documented in L<perlrun/"Command Switches">.
75e14d17 1780
1781=item Optimization of regular expressions
1782
7522fed5 1783This step is listed for completeness only. Since it does not change
75e14d17 1784semantics, details of this step are not documented and are subject
19799a22 1785to change without notice. This step is performed over the finite
1786automaton that was generated during the previous pass.
2a94b7ce 1787
19799a22 1788It is at this stage that C<split()> silently optimizes C</^/> to
1789mean C</^/m>.
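
The visible effect of this optimization is that C<split /^/> breaks a
string into lines even without an explicit C</m>. A minimal sketch
(the input string is arbitrary):

```perl
use strict;
use warnings;

# /^/ is treated as /^/m by split, so it matches at the start of every line.
my @lines = split /^/, "one\ntwo\nthree\n";

print scalar(@lines), "\n";    # 3
print $lines[1];               # two
```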
75e14d17 1790
1791=back
1792
a0d0e21e 1793=head2 I/O Operators
1794
54310121 1795There are several I/O operators you should know about.
fbad3eb5 1796
7b8d334a 1797A string enclosed by backticks (grave accents) first undergoes
19799a22 1798double-quote interpolation. It is then interpreted as an external
1799command, and the output of that command is the value of the
e9c56f9b 1800backtick string, like in a shell. In scalar context, a single string
1801consisting of all output is returned. In list context, a list of
1802values is returned, one per line of output. (You can set C<$/> to use
1803a different line terminator.) The command is executed each time the
1804pseudo-literal is evaluated. The status value of the command is
1805returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
1806Unlike in B<csh>, no translation is done on the return data--newlines
1807remain newlines. Unlike in any of the shells, single quotes do not
1808hide variable names in the command from interpretation. To pass a
1809literal dollar-sign through to the shell you need to hide it with a
1810backslash. The generalized form of backticks is C<qx//>. (Because
1811backticks always undergo shell expansion as well, see L<perlsec> for
1812security concerns.)
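
The context rules can be tried out with the sketch below. It uses
C<$^X> (the path of the running perl) as the external command, which
is only an assumption made so that the example does not depend on a
particular system utility. It also assumes that path contains no
whitespace and that the shell accepts double-quoted arguments.

```perl
use strict;
use warnings;

# Scalar context: all output as one string.
my $answer = qx{$^X -e "print 42"};

# List context: one element per line of output.
my @lines = qx{$^X -le "print for 1..3"};

# Exit status of the most recent command.
my $status = $? >> 8;

print "$answer\n";             # 42
print scalar(@lines), "\n";    # 3
```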
19799a22 1813
1814In scalar context, evaluating a filehandle in angle brackets yields
1815the next line from that file (the newline, if any, included), or
1816C<undef> at end-of-file or on error. When C<$/> is set to C<undef>
1817(sometimes known as file-slurp mode) and the file is empty, it
1818returns C<''> the first time, followed by C<undef> subsequently.
1819
1820Ordinarily you must assign the returned value to a variable, but
1821there is one situation where an automatic assignment happens. If
1822and only if the input symbol is the only thing inside the conditional
1823of a C<while> statement (even if disguised as a C<for(;;)> loop),
1824the value is automatically assigned to the global variable $_,
1825destroying whatever was there previously. (This may seem like an
1826odd thing to you, but you'll use the construct in almost every Perl
17b829fa 1827script you write.) The $_ variable is not implicitly localized.
19799a22 1828You'll have to put a C<local $_;> before the loop if you want that
1829to happen.
1830
1831The following lines are equivalent:
a0d0e21e 1832
748a9306 1833 while (defined($_ = <STDIN>)) { print; }
7b8d334a 1834 while ($_ = <STDIN>) { print; }
a0d0e21e 1835 while (<STDIN>) { print; }
1836 for (;<STDIN>;) { print; }
748a9306 1837 print while defined($_ = <STDIN>);
7b8d334a 1838 print while ($_ = <STDIN>);
a0d0e21e 1839 print while <STDIN>;
1840
19799a22 1841This also behaves similarly, but avoids $_ :
7b8d334a 1842
1843 while (my $line = <STDIN>) { print $line }
1844
19799a22 1845In these loop constructs, the assigned value (whether assignment
1846is automatic or explicit) is then tested to see whether it is
defined. The defined test avoids problems where a line has a string
1848value that would be treated as false by Perl, for example a "" or
1849a "0" with no trailing newline. If you really mean for such values
1850to terminate the loop, they should be tested for explicitly:
7b8d334a 1851
1852 while (($_ = <STDIN>) ne '0') { ... }
1853 while (<STDIN>) { last unless $_; ... }
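
The difference between the implicit C<defined> test and an explicit
truth test can be seen with an in-memory filehandle, as in this sketch.
The data is made up, and opening a reference to a scalar assumes a perl
built with PerlIO.

```perl
use strict;
use warnings;

my $data = "first\n0";    # the last line is "0" with no trailing newline

open(my $fh, '<', \$data) or die "open: $!";
my @seen;
while (my $line = <$fh>) {    # Perl tests defined(...) here, not truth
    chomp $line;
    push @seen, $line;
}
close $fh;

# An explicit truth test stops early at the "0" line.
open($fh, '<', \$data) or die "open: $!";
my @truthy;
while (defined(my $line = <$fh>)) {
    last unless $line;
    chomp $line;
    push @truthy, $line;
}
close $fh;

print "@seen\n";      # first 0
print "@truthy\n";    # first
```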
1854
In other boolean contexts, C<< <I<filehandle>> >> without an
explicit C<defined> test or comparison elicits a warning if the
C<use warnings> pragma or the B<-w>
command-line switch (the C<$^W> variable) is in effect.
7b8d334a 1859
5f05dabc 1860The filehandles STDIN, STDOUT, and STDERR are predefined. (The
19799a22 1861filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
1862in packages, where they would be interpreted as local identifiers
1863rather than global.) Additional filehandles may be created with
1864the open() function, amongst others. See L<perlopentut> and
1865L<perlfunc/open> for details on this.
a0d0e21e 1866
35f2feb0 1867If a <FILEHANDLE> is used in a context that is looking for
19799a22 1868a list, a list comprising all input lines is returned, one line per
1869list element. It's easy to grow to a rather large data space this
1870way, so use with care.
a0d0e21e 1871
35f2feb0 1872<FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>.
19799a22 1873See L<perlfunc/readline>.
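
A sketch of list-context reads and of the C<readline> spelling, again
using an in-memory filehandle as a stand-in for a real file (an
assumption made to keep the example self-contained):

```perl
use strict;
use warnings;

my $data = "a\nb\nc\n";

open(my $fh, '<', \$data) or die "open: $!";
my @all = <$fh>;               # list context: every line at once
close $fh;

open($fh, '<', \$data) or die "open: $!";
my $first = readline($fh);     # scalar context: the next line only
my @rest  = readline($fh);     # list context: whatever is left
close $fh;

print scalar(@all), " ", scalar(@rest), "\n";    # 3 2
```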
fbad3eb5 1874
35f2feb0 1875The null filehandle <> is special: it can be used to emulate the
1876behavior of B<sed> and B<awk>. Input from <> comes either from
a0d0e21e 1877standard input, or from each file listed on the command line. Here's
35f2feb0 1878how it works: the first time <> is evaluated, the @ARGV array is
5a964f20 1879checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
a0d0e21e 1880gives you standard input. The @ARGV array is then processed as a list
1881of filenames. The loop
1882
1883 while (<>) {
1884 ... # code for each line
1885 }
1886
1887is equivalent to the following Perl-like pseudo code:
1888
3e3baf6d 1889 unshift(@ARGV, '-') unless @ARGV;
a0d0e21e 1890 while ($ARGV = shift) {
1891 open(ARGV, $ARGV);
1892 while (<ARGV>) {
1893 ... # code for each line
1894 }
1895 }
1896
19799a22 1897except that it isn't so cumbersome to say, and will actually work.
1898It really does shift the @ARGV array and put the current filename
1899into the $ARGV variable. It also uses filehandle I<ARGV>
35f2feb0 1900internally--<> is just a synonym for <ARGV>, which
19799a22 1901is magical. (The pseudo code above doesn't work because it treats
35f2feb0 1902<ARGV> as non-magical.)
a0d0e21e 1903
35f2feb0 1904You can modify @ARGV before the first <> as long as the array ends up
a0d0e21e 1905containing the list of filenames you really want. Line numbers (C<$.>)
19799a22 1906continue as though the input were one big happy file. See the example
1907in L<perlfunc/eof> for how to reset line numbers on each file.
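
The following sketch shows C<< <> >> walking through C<@ARGV> with
C<$.> counting across files. The two temporary files and their contents
are invented for the example; C<File::Temp> is used only to create them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small files to stand in for command-line arguments.
my @paths;
for my $text ("alpha\nbeta\n", "gamma\n") {
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "close: $!";
    push @paths, $path;
}

my @seen;
{
    local @ARGV = @paths;           # modify @ARGV before the first <>
    while (my $line = <>) {
        chomp $line;
        push @seen, "$.:$line";     # $. keeps counting across files
    }
}

print "@seen\n";    # 1:alpha 2:beta 3:gamma
```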
5a964f20 1908
1909If you want to set @ARGV to your own list of files, go right ahead.
1910This sets @ARGV to all plain text files if no @ARGV was given:
1911
1912 @ARGV = grep { -f && -T } glob('*') unless @ARGV;
a0d0e21e 1913
5a964f20 1914You can even set them to pipe commands. For example, this automatically
1915filters compressed arguments through B<gzip>:
1916
1917 @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
1918
1919If you want to pass switches into your script, you can use one of the
a0d0e21e 1920Getopts modules or put a loop on the front like this:
1921
1922 while ($_ = $ARGV[0], /^-/) {
1923 shift;
1924 last if /^--$/;
1925 if (/^-D(.*)/) { $debug = $1 }
1926 if (/^-v/) { $verbose++ }
5a964f20 1927 # ... # other switches
a0d0e21e 1928 }
5a964f20 1929
a0d0e21e 1930 while (<>) {
5a964f20 1931 # ... # code for each line
a0d0e21e 1932 }
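
For comparison, here is a sketch of the module-based alternative using
the core C<Getopt::Std> module. The argument list is hard-coded for
illustration. Note that C<getopts> sets a flag such as C<-v> to 1
rather than counting repetitions as the hand-written loop does.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical command line: -v, -D with a value, then a filename.
local @ARGV = ('-v', '-Dtrace', 'input.txt');

my %opts;
getopts('vD:', \%opts) or die "bad switches\n";    # "D:" means -D takes a value

print "verbose=$opts{v} debug=$opts{D} rest=@ARGV\n";
# verbose=1 debug=trace rest=input.txt
```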
1933
35f2feb0 1934The <> symbol will return C<undef> for end-of-file only once.
19799a22 1935If you call it again after this, it will assume you are processing another
1936@ARGV list, and if you haven't set @ARGV, will read input from STDIN.
a0d0e21e 1937
b159ebd3 1938If what the angle brackets contain is a simple scalar variable (e.g.,
35f2feb0 1939<$foo>), then that variable contains the name of the
19799a22 1940filehandle to input from, or its typeglob, or a reference to the
1941same. For example:
cb1a09d0 1942
1943 $fh = \*STDIN;
1944 $line = <$fh>;
a0d0e21e 1945
5a964f20 1946If what's within the angle brackets is neither a filehandle nor a simple
1947scalar variable containing a filehandle name, typeglob, or typeglob
1948reference, it is interpreted as a filename pattern to be globbed, and
1949either a list of filenames or the next filename in the list is returned,
19799a22 1950depending on context. This distinction is determined on syntactic
35f2feb0 1951grounds alone. That means C<< <$x> >> is always a readline() from
1952an indirect handle, but C<< <$hash{key}> >> is always a glob().
5a964f20 1953That's because $x is a simple scalar variable, but C<$hash{key}> is
1954not--it's a hash element.
1955
1956One level of double-quote interpretation is done first, but you can't
35f2feb0 1957say C<< <$foo> >> because that's an indirect filehandle as explained
5a964f20 1958in the previous paragraph. (In older versions of Perl, programmers
1959would insert curly brackets to force interpretation as a filename glob:
35f2feb0 1960C<< <${foo}> >>. These days, it's considered cleaner to call the
5a964f20 1961internal function directly as C<glob($foo)>, which is probably the right
19799a22 1962way to have done it in the first place.) For example:
a0d0e21e 1963
1964 while (<*.c>) {
1965 chmod 0644, $_;
1966 }
1967
3a4b19e4 1968is roughly equivalent to:
a0d0e21e 1969
1970 open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
1971 while (<FOO>) {
5b3eff12 1972 chomp;
a0d0e21e 1973 chmod 0644, $_;
1974 }
1975
3a4b19e4 1976except that the globbing is actually done internally using the standard
1977C<File::Glob> extension. Of course, the shortest way to do the above is:
a0d0e21e 1978
1979 chmod 0644, <*.c>;
1980
19799a22 1981A (file)glob evaluates its (embedded) argument only when it is
1982starting a new list. All values must be read before it will start
1983over. In list context, this isn't important because you automatically
1984get them all anyway. However, in scalar context the operator returns
069e01df 1985the next value each time it's called, or C<undef> when the list has
19799a22 1986run out. As with filehandle reads, an automatic C<defined> is
1987generated when the glob occurs in the test part of a C<while>,
1988because legal glob returns (e.g. a file called F<0>) would otherwise
1989terminate the loop. Again, C<undef> is returned only once. So if
1990you're expecting a single value from a glob, it is much better to
1991say
4633a7c4 1992
1993 ($file) = <blurch*>;
1994
1995than
1996
1997 $file = <blurch*>;
1998
1999because the latter will alternate between returning a filename and
19799a22 2000returning false.
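
The alternation can be reproduced with the sketch below, which creates
a single matching file in a temporary directory. The directory and
file are assumptions made for the example, and the directory path is
assumed to contain no whitespace.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$dir/blurch1") or die "open: $!";
close $fh;

# List assignment: always yields the first match.
my ($file) = glob("$dir/blurch*");

# Scalar context: the same glob keeps its place between evaluations.
my @results;
for my $try (1 .. 3) {
    my $name = glob("$dir/blurch*");
    push @results, defined $name ? 'name' : 'undef';
}

print "@results\n";    # name undef name
```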
4633a7c4 2001
b159ebd3 2002If you're trying to do variable interpolation, it's definitely better
4633a7c4 2003to use the glob() function, because the older notation can cause people
e37d713d 2004to become confused with the indirect filehandle notation.
4633a7c4 2005
2006 @files = glob("$dir/*.[ch]");
2007 @files = glob($files[$i]);
2008
a0d0e21e 2009=head2 Constant Folding
2010
2011Like C, Perl does a certain amount of expression evaluation at
19799a22 2012compile time whenever it determines that all arguments to an
a0d0e21e 2013operator are static and have no side effects. In particular, string
2014concatenation happens at compile time between literals that don't do
19799a22 2015variable substitution. Backslash interpolation also happens at
a0d0e21e 2016compile time. You can say
2017
2018 'Now is the time for all' . "\n" .
2019 'good men to come to.'
2020
54310121 2021and this all reduces to one string internally. Likewise, if
a0d0e21e 2022you say
2023
2024 foreach $file (@filenames) {
5a964f20 2025 if (-s $file > 5 + 100 * 2**16) { }
54310121 2026 }
a0d0e21e 2027
19799a22 2028the compiler will precompute the number which that expression
2029represents so that the interpreter won't have to.
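
One way to observe the folding is to ask the core C<B::Deparse> module
to print a compiled subroutine back as source. This is only a sketch;
the exact text it produces varies between Perl versions, but the
arithmetic should appear already reduced to a single number.

```perl
use strict;
use warnings;
use B::Deparse;

my $code = sub { return 5 + 100 * 2**16 };

# coderef2text() shows the subroutine body as the compiler left it.
my $text = B::Deparse->new->coderef2text($code);

print "$text\n";    # the body contains 6553605 rather than the expression
```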
a0d0e21e 2030
2c268ad5 2031=head2 Bitwise String Operators
2032
2033Bitstrings of any size may be manipulated by the bitwise operators
2034(C<~ | & ^>).
2035
19799a22 2036If the operands to a binary bitwise op are strings of different
2037sizes, B<|> and B<^> ops act as though the shorter operand had
2038additional zero bits on the right, while the B<&> op acts as though
2039the longer operand were truncated to the length of the shorter.
2040The granularity for such extension or truncation is one or more
2041bytes.
2c268ad5 2042
2043 # ASCII-based examples
2044 print "j p \n" ^ " a h"; # prints "JAPH\n"
    print "JA" | "  ph\n";         # prints "japh\n"
2046 print "japh\nJunk" & '_____'; # prints "JAPH\n";
2047 print 'p N$' ^ " E<H\n"; # prints "Perl\n";
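
The padding and truncation rules for operands of different lengths can
be checked in isolation. A sketch with arbitrary ASCII strings, assuming
the default behavior in which string operands select the string form of
the operators:

```perl
use strict;
use warnings;

# | pads the shorter operand with zero bytes on the right.
my $or = "AB" | "  cd";

# & truncates the longer operand to the length of the shorter.
my $and = "abcd" & "__";

print "$or $and\n";    # abcd AB
```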
2048
19799a22 2049If you are intending to manipulate bitstrings, be certain that
2c268ad5 2050you're supplying bitstrings: If an operand is a number, that will imply
19799a22 2051a B<numeric> bitwise operation. You may explicitly show which type of
2c268ad5 2052operation you intend by using C<""> or C<0+>, as in the examples below.
2053
2054 $foo = 150 | 105 ; # yields 255 (0x96 | 0x69 is 0xFF)
2055 $foo = '150' | 105 ; # yields 255
2056 $foo = 150 | '105'; # yields 255
2057 $foo = '150' | '105'; # yields string '155' (under ASCII)
2058
2059 $baz = 0+$foo & 0+$bar; # both ops explicitly numeric
2060 $biz = "$foo" ^ "$bar"; # both ops explicitly stringy
a0d0e21e 2061
1ae175c8 2062See L<perlfunc/vec> for information on how to manipulate individual bits
2063in a bit vector.
2064
55497cff 2065=head2 Integer Arithmetic
a0d0e21e 2066
19799a22 2067By default, Perl assumes that it must do most of its arithmetic in
a0d0e21e 2068floating point. But by saying
2069
2070 use integer;
2071
2072you may tell the compiler that it's okay to use integer operations
19799a22 2073(if it feels like it) from here to the end of the enclosing BLOCK.
2074An inner BLOCK may countermand this by saying
a0d0e21e 2075
2076 no integer;
2077
19799a22 2078which lasts until the end of that BLOCK. Note that this doesn't
2079mean everything is only an integer, merely that Perl may use integer
2080operations if it is so inclined. For example, even under C<use
2081integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
2082or so.
2083
2084Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
13a2d996 2085and ">>") always produce integral results. (But see also
2086L<Bitwise String Operators>.) However, C<use integer> still has meaning for
19799a22 2087them. By default, their results are interpreted as unsigned integers, but
2088if C<use integer> is in effect, their results are interpreted
2089as signed integers. For example, C<~0> usually evaluates to a large
2090integral value. However, C<use integer; ~0> is C<-1> on twos-complement
2091machines.
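
A sketch that contrasts the two modes, with the pragma confined to an
inner block (the variable names are illustrative, and the C<-1> result
assumes a twos-complement machine):

```perl
use strict;
use warnings;

my $float_quotient = 7 / 2;    # default arithmetic

my ($int_quotient, $signed_not);
{
    use integer;               # in effect until the end of this block
    $int_quotient = 7 / 2;     # integer division
    $signed_not   = ~0;        # interpreted as a signed integer
}

my $unsigned_not = ~0;         # interpreted as an unsigned integer

print "$float_quotient $int_quotient $signed_not\n";    # 3.5 3 -1
```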
68dc0745 2092
2093=head2 Floating-point Arithmetic
2094
2095While C<use integer> provides integer-only arithmetic, there is no
19799a22 2096analogous mechanism to provide automatic rounding or truncation to a
2097certain number of decimal places. For rounding to a certain number
2098of digits, sprintf() or printf() is usually the easiest route.
2099See L<perlfaq4>.
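
For instance, as a minimal sketch with arbitrary values:

```perl
use strict;
use warnings;

my $two_places = sprintf("%.2f", 3.14159);    # round to two decimal places
my $whole      = sprintf("%.0f", 2.7);        # round to a whole number

print "$two_places $whole\n";    # 3.14 3
```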
68dc0745 2100
5a964f20 2101Floating-point numbers are only approximations to what a mathematician
2102would call real numbers. There are infinitely more reals than floats,
2103so some corners must be cut. For example:
2104
2105 printf "%.20g\n", 123456789123456789;
2106 # produces 123456789123456784
2107
Testing for exact floating-point equality or inequality is
not a good idea. Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
significant digits (the C<%g> format used below counts significant digits,
not decimal places). See Knuth, volume II, for a more robust treatment of
this topic.
2113
2114 sub fp_equal {
2115 my ($X, $Y, $POINTS) = @_;
2116 my ($tX, $tY);
2117 $tX = sprintf("%.${POINTS}g", $X);
2118 $tY = sprintf("%.${POINTS}g", $Y);
2119 return $tX eq $tY;
2120 }
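
A usage sketch follows, with the subroutine repeated so that the
example stands alone. The exact-comparison result assumes a perl built
with ordinary IEEE 754 double-precision floats.

```perl
use strict;
use warnings;

sub fp_equal {
    my ($X, $Y, $POINTS) = @_;
    my $tX = sprintf("%.${POINTS}g", $X);
    my $tY = sprintf("%.${POINTS}g", $Y);
    return $tX eq $tY;
}

my $sum = 0.1 + 0.2;

my $exact = ($sum == 0.3)             ? 1 : 0;    # exact comparison
my $close = fp_equal($sum, 0.3, 10)   ? 1 : 0;    # compare 10 significant digits

print "$exact $close\n";    # 0 1
```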
2121
68dc0745 2122The POSIX module (part of the standard perl distribution) implements
19799a22 2123ceil(), floor(), and other mathematical and trigonometric functions.
2124The Math::Complex module (part of the standard perl distribution)
2125defines mathematical functions that work on both the reals and the
imaginary numbers. Math::Complex is not as efficient as POSIX, but
68dc0745 2127POSIX can't work with complex numbers.
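
A brief sketch of both modules side by side (the input values are
arbitrary):

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);
use Math::Complex;    # replaces sqrt with a complex-aware version

my $up   = ceil(3.2);      # smallest integer not less than 3.2
my $down = floor(-3.2);    # largest integer not greater than -3.2

my $root = sqrt(-4);       # a complex number rather than an error

print "$up $down\n";                     # 4 -4
print Re($root), " ", Im($root), "\n";   # 0 2
```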
2128
2129Rounding in financial applications can have serious implications, and
2130the rounding method used should be specified precisely. In these
2131cases, it probably pays not to trust whichever system rounding is
2132being used by Perl, but to instead implement the rounding function you
2133need yourself.
5a964f20 2134
2135=head2 Bigger Numbers
2136
2137The standard Math::BigInt and Math::BigFloat modules provide
19799a22 2138variable-precision arithmetic and overloaded operators, although
cd5c4fce 2139they're currently pretty slow. At the cost of some space and
19799a22 2140considerable speed, they avoid the normal pitfalls associated with
2141limited-precision representations.
5a964f20 2142
2143 use Math::BigInt;
2144 $x = Math::BigInt->new('123456789123456789');
2145 print $x * $x;
2146
2147 # prints +15241578780673678515622620750190521
19799a22 2148
Several modules let you calculate with unlimited precision (bound only by
memory and CPU time) or with fixed precision. There are also
some non-standard modules that provide faster implementations via
external C libraries.
2153
Here is a short but incomplete summary:
2155
  Math::Fraction           big, unlimited fractions like 9973 / 12967
  Math::String             treat string sequences like numbers
  Math::FixedPrecision     calculate with a fixed precision
  Math::Currency           for currency calculations
  Bit::Vector              manipulate bit vectors fast (uses C)
  Math::BigIntFast         Bit::Vector wrapper for big numbers
  Math::Pari               provides access to the Pari C library
  Math::BigInteger         uses an external C library
  Math::Cephes             uses external Cephes C library (no big numbers)
  Math::Cephes::Fraction   fractions via the Cephes library
  Math::GMP                another one using an external C library
2167
2168Choose wisely.
16070b82 2169
2170=cut