pod/perlop.pod

   1 =head1 NAME
   2
   3 perlop - Perl operators and precedence
   4
   5 =head1 SYNOPSIS
   6
   7 Perl operators have the following associativity and precedence,
   8 listed from highest precedence to lowest.  Operators borrowed from
   9 C keep the same precedence relationship with each other, even where
  10 C's precedence is slightly screwy.  (This makes learning Perl easier
  11 for C folks.)  With very few exceptions, these all operate on scalar
  12 values only, not array values.
  13
  14     left        terms and list operators (leftward)
  15     left        ->
  16     nonassoc    ++ --
  17     right       **
  18     right       ! ~ \ and unary + and -
  19     left        =~ !~
  20     left        * / % x
  21     left        + - .
  22     left        << >>
  23     nonassoc    named unary operators
  24     nonassoc    < > <= >= lt gt le ge
  25     nonassoc    == != <=> eq ne cmp
  26     left        &
  27     left        | ^
  28     left        &&
  29     left        || //
  30     nonassoc    ..  ...
  31     right       ?:
  32     right       = += -= *= etc.
  33     left        , =>
  34     nonassoc    list operators (rightward)
  35     right       not
  36     left        and
  37     left        or xor err
  38
  39 In the following sections, these operators are covered in precedence order.
  40
  41 Many operators can be overloaded for objects.  See L<overload>.
  42
  43 =head1 DESCRIPTION
  44
  45 =head2 Terms and List Operators (Leftward)
  46
   47 A TERM has the highest precedence in Perl.  Terms include variables,
  48 quote and quote-like operators, any expression in parentheses,
  49 and any function whose arguments are parenthesized.  Actually, there
  50 aren't really functions in this sense, just list operators and unary
  51 operators behaving as functions because you put parentheses around
  52 the arguments.  These are all documented in L<perlfunc>.
  53
  54 If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
  55 is followed by a left parenthesis as the next token, the operator and
  56 arguments within parentheses are taken to be of highest precedence,
  57 just like a normal function call.
  58
  59 In the absence of parentheses, the precedence of list operators such as
  60 C<print>, C<sort>, or C<chmod> is either very high or very low depending on
  61 whether you are looking at the left side or the right side of the operator.
  62 For example, in
  63
  64     @ary = (1, 3, sort 4, 2);
  65     print @ary;         # prints 1324
  66
  67 the commas on the right of the sort are evaluated before the sort,
  68 but the commas on the left are evaluated after.  In other words,
  69 list operators tend to gobble up all arguments that follow, and
  70 then act like a simple TERM with regard to the preceding expression.
  71 Be careful with parentheses:
  72
  73     # These evaluate exit before doing the print:
  74     print($foo, exit);  # Obviously not what you want.
  75     print $foo, exit;   # Nor is this.
  76
  77     # These do the print before evaluating exit:
  78     (print $foo), exit; # This is what you want.
  79     print($foo), exit;  # Or this.
  80     print ($foo), exit; # Or even this.
  81
  82 Also note that
  83
  84     print ($foo & 255) + 1, "\n";
  85
  86 probably doesn't do what you expect at first glance.  See
  87 L<Named Unary Operators> for more discussion of this.
  88
  89 Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
  90 well as subroutine and method calls, and the anonymous
  91 constructors C<[]> and C<{}>.
  92
  93 See also L<Quote and Quote-like Operators> toward the end of this section,
  94 as well as L<"I/O Operators">.
  95
  96 =head2 The Arrow Operator
  97
  98 "C<< -> >>" is an infix dereference operator, just as it is in C
  99 and C++.  If the right side is either a C<[...]>, C<{...}>, or a
 100 C<(...)> subscript, then the left side must be either a hard or
 101 symbolic reference to an array, a hash, or a subroutine respectively.
 102 (Or technically speaking, a location capable of holding a hard
 103 reference, if it's an array or hash reference being used for
 104 assignment.)  See L<perlreftut> and L<perlref>.
 105
 106 Otherwise, the right side is a method name or a simple scalar
 107 variable containing either the method name or a subroutine reference,
 108 and the left side must be either an object (a blessed reference)
 109 or a class name (that is, a package name).  See L<perlobj>.
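
The forms described above can be sketched as follows.  The C<Counter>
package and every variable name here are hypothetical, introduced only
for illustration.

```perl
use strict;
use warnings;

# Hypothetical class, defined only for this sketch.
{
    package Counter;
    sub new   { my ($class, $start) = @_; return bless { n => $start }, $class }
    sub value { return $_[0]{n} }
}

my $aref = [10, 20, 30];
my $href = { color => 'red' };
my $cref = sub { return $_[0] * 2 };

my $second  = $aref->[1];       # [...] subscript: left side is an array ref
my $color   = $href->{color};   # {...} subscript: left side is a hash ref
my $doubled = $cref->(21);      # (...) subscript: left side is a code ref

my $obj    = Counter->new(5);   # left side is a class name
my $direct = $obj->value;       # left side is an object
my $method = 'value';
my $byname = $obj->$method;     # scalar holding a method name
my $code   = \&Counter::value;
my $bycode = $obj->$code;       # scalar holding a subroutine reference
```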
 110
 111 =head2 Auto-increment and Auto-decrement
 112
 113 "++" and "--" work as in C.  That is, if placed before a variable, they
 114 increment or decrement the variable before returning the value, and if
 115 placed after, increment or decrement the variable after returning the value.
 116
 117 The auto-increment operator has a little extra builtin magic to it.  If
 118 you increment a variable that is numeric, or that has ever been used in
 119 a numeric context, you get a normal increment.  If, however, the
 120 variable has been used in only string contexts since it was set, and
 121 has a value that is not the empty string and matches the pattern
 122 C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
 123 character within its range, with carry:
 124
 125     print ++($foo = '99');      # prints '100'
 126     print ++($foo = 'a0');      # prints 'a1'
 127     print ++($foo = 'Az');      # prints 'Ba'
 128     print ++($foo = 'zz');      # prints 'aaa'
 129
 130 The auto-decrement operator is not magical.
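
A short sketch of the difference between the two operators, using
made-up variable names.  The numeric warning is silenced only so that
the non-magical decrement of a string can be shown.

```perl
use strict;
use warnings;

my $id = 'Zz';
$id++;                  # string increment with carry into a new character

my $ver = 'a9';
$ver++;                 # the digit wraps and the letter is carried

my $num = 'zz';
{
    no warnings 'numeric';
    $num--;             # not magical: 'zz' is treated as 0, then decremented
}
```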
 131
 132 =head2 Exponentiation
 133
 134 Binary "**" is the exponentiation operator.  It binds even more
 135 tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
 136 implemented using C's pow(3) function, which actually works on doubles
 137 internally.)
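
As a small illustration (variable names are arbitrary), the following
shows both the binding relative to unary minus and the right
associativity listed in the precedence table:

```perl
use strict;
use warnings;

my $neg   = -2 ** 4;       # parsed as -(2 ** 4)
my $pos   = (-2) ** 4;     # parentheses force the negation first
my $tower = 2 ** 3 ** 2;   # right associative: 2 ** (3 ** 2)
```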
 138
 139 =head2 Symbolic Unary Operators
 140
 141 Unary "!" performs logical negation, i.e., "not".  See also C<not> for a lower
 142 precedence version of this.
 143
 144 Unary "-" performs arithmetic negation if the operand is numeric.  If
 145 the operand is an identifier, a string consisting of a minus sign
 146 concatenated with the identifier is returned.  Otherwise, if the string
 147 starts with a plus or minus, a string starting with the opposite sign
 148 is returned.  One effect of these rules is that C<-bareword> is equivalent
 149 to C<"-bareword">.
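
A minimal sketch of these rules, using quoted strings as operands (the
variable names are illustrative only):

```perl
use strict;
use warnings;

my $number  = -"10";     # numeric operand: arithmetic negation
my $option  = -"foo";    # identifier: minus sign is prepended
my $flipped = -"-bar";   # leading minus becomes a plus
my $negated = -"+baz";   # leading plus becomes a minus
```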
 150
 151 Unary "~" performs bitwise negation, i.e., 1's complement.  For
 152 example, C<0666 & ~027> is 0640.  (See also L<Integer Arithmetic> and
 153 L<Bitwise String Operators>.)  Note that the width of the result is
 154 platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
 155 bits wide on a 64-bit platform, so if you are expecting a certain bit
  156 width, remember to use the & operator to mask off the excess bits.
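
For instance, masking makes the result independent of the platform's
integer width.  This is only a sketch; the variable names are not part
of any API.

```perl
use strict;
use warnings;

my $perms = 0666 & ~027;          # clear the bits set in 027
my $low32 = ~0 & 0xFFFF_FFFF;     # keep only the low 32 bits of ~0
my $byte  = ~0x0F & 0xFF;         # complement, then keep one byte
```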
 157
 158 Unary "+" has no effect whatsoever, even on strings.  It is useful
 159 syntactically for separating a function name from a parenthesized expression
 160 that would otherwise be interpreted as the complete list of function
 161 arguments.  (See examples above under L<Terms and List Operators (Leftward)>.)
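
A sketch of the effect, using C<uc> purely as a convenient named unary
operator:

```perl
use strict;
use warnings;

# Parenthesis is the next token, so it encloses the complete argument.
my $call_then_concat = uc ("a") . "b";    # (uc "a") . "b"

# Unary plus stops the parenthesis from being taken as the argument list.
my $concat_then_call = uc +("a") . "b";   # uc("a" . "b")
```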
 162
 163 Unary "\" creates a reference to whatever follows it.  See L<perlreftut>
 164 and L<perlref>.  Do not confuse this behavior with the behavior of
 165 backslash within a string, although both forms do convey the notion
 166 of protecting the next thing from interpolation.
 167
 168 =head2 Binding Operators
 169
 170 Binary "=~" binds a scalar expression to a pattern match.  Certain operations
 171 search or modify the string $_ by default.  This operator makes that kind
 172 of operation work on some other string.  The right argument is a search
 173 pattern, substitution, or transliteration.  The left argument is what is
 174 supposed to be searched, substituted, or transliterated instead of the default
 175 $_.  When used in scalar context, the return value generally indicates the
 176 success of the operation.  Behavior in list context depends on the particular
 177 operator.  See L</"Regexp Quote-Like Operators"> for details.
 178
 179 If the right argument is an expression rather than a search pattern,
 180 substitution, or transliteration, it is interpreted as a search pattern at run
 181 time.
 182
 183 Binary "!~" is just like "=~" except the return value is negated in
 184 the logical sense.
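
The following sketch binds matches and a substitution to a variable
other than C<$_>.  The sample string and variable names are invented
for the example.

```perl
use strict;
use warnings;

my $line = "Error: disk full";

my $is_error    = $line =~ /^Error/;     # match against $line, not $_
my $not_warning = $line !~ /^Warning/;   # logically negated match

(my $fixed = $line) =~ s/disk/tape/;     # substitute in a copy

my $pattern = 'd.sk';
my $dynamic = $line =~ $pattern;         # expression used as a pattern at run time
```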
 185
 186 =head2 Multiplicative Operators
 187
 188 Binary "*" multiplies two numbers.
 189
 190 Binary "/" divides two numbers.
 191
 192 Binary "%" computes the modulus of two numbers.  Given integer
 193 operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is
 194 C<$a> minus the largest multiple of C<$b> that is not greater than
 195 C<$a>.  If C<$b> is negative, then C<$a % $b> is C<$a> minus the
 196 smallest multiple of C<$b> that is not less than C<$a> (i.e. the
 197 result will be less than or equal to zero).
 198 Note that when C<use integer> is in scope, "%" gives you direct access
 199 to the modulus operator as implemented by your C compiler.  This
 200 operator is not as well defined for negative operands, but it will
 201 execute faster.
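
Worked through for small operands (without C<use integer>), the rule
above gives a result whose sign follows the right operand:

```perl
use strict;
use warnings;

my $pos_pos = 7 % 3;      # 7 - 6
my $neg_pos = -7 % 3;     # -7 - (-9)
my $pos_neg = 7 % -3;     # 7 - 9
my $neg_neg = -7 % -3;    # -7 - (-6)
```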
 202
 203 Binary "x" is the repetition operator.  In scalar context or if the left
 204 operand is not enclosed in parentheses, it returns a string consisting
 205 of the left operand repeated the number of times specified by the right
 206 operand.  In list context, if the left operand is enclosed in
 207 parentheses, it repeats the list.
 208
 209     print '-' x 80;             # print row of dashes
 210
 211     print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over
 212
 213     @ones = (1) x 80;           # a list of 80 1's
 214     @ones = (5) x @ones;        # set all elements to 5
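
A further sketch contrasting string and list repetition, with
arbitrary variable names:

```perl
use strict;
use warnings;

my $rule   = '-' x 10;            # string repetition
my @triple = ('a', 'b') x 3;      # list repetition: left operand in parentheses
my @zeros  = (0) x 4;
my @fives  = (5) x @zeros;        # right operand is the element count of @zeros
```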
 215
 216
 217 =head2 Additive Operators
 218
 219 Binary "+" returns the sum of two numbers.
 220
 221 Binary "-" returns the difference of two numbers.
 222
 223 Binary "." concatenates two strings.
 224
 225 =head2 Shift Operators
 226
 227 Binary "<<" returns the value of its left argument shifted left by the
 228 number of bits specified by the right argument.  Arguments should be
 229 integers.  (See also L<Integer Arithmetic>.)
 230
 231 Binary ">>" returns the value of its left argument shifted right by
 232 the number of bits specified by the right argument.  Arguments should
 233 be integers.  (See also L<Integer Arithmetic>.)
 234
 235 Note that both "<<" and ">>" in Perl are implemented directly using
 236 "<<" and ">>" in C.  If C<use integer> (see L<Integer Arithmetic>) is
 237 in force then signed C integers are used, else unsigned C integers are
 238 used.  Either way, the implementation isn't going to generate results
 239 larger than the size of the integer type Perl was built with (32 bits
 240 or 64 bits).
 241
 242 The result of overflowing the range of the integers is undefined
 243 because it is undefined also in C.  In other words, using 32-bit
 244 integers, C<< 1 << 32 >> is undefined.  Shifting by a negative number
 245 of bits is also undefined.
 246
 247 =head2 Named Unary Operators
 248
 249 The various named unary operators are treated as functions with one
 250 argument, with optional parentheses.  These include the filetest
 251 operators, like C<-f>, C<-M>, etc.  See L<perlfunc>.
 252
 253 If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
 254 is followed by a left parenthesis as the next token, the operator and
 255 arguments within parentheses are taken to be of highest precedence,
 256 just like a normal function call.  For example,
 257 because named unary operators are higher precedence than ||:
 258
 259     chdir $foo    || die;       # (chdir $foo) || die
 260     chdir($foo)   || die;       # (chdir $foo) || die
 261     chdir ($foo)  || die;       # (chdir $foo) || die
 262     chdir +($foo) || die;       # (chdir $foo) || die
 263
 264 but, because * is higher precedence than named operators:
 265
 266     chdir $foo * 20;    # chdir ($foo * 20)
 267     chdir($foo) * 20;   # (chdir $foo) * 20
 268     chdir ($foo) * 20;  # (chdir $foo) * 20
 269     chdir +($foo) * 20; # chdir ($foo * 20)
 270
 271     rand 10 * 20;       # rand (10 * 20)
 272     rand(10) * 20;      # (rand 10) * 20
 273     rand (10) * 20;     # (rand 10) * 20
 274     rand +(10) * 20;    # rand (10 * 20)
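
The same rules can be checked with C<length>, which returns a value
that is easy to inspect.  This is only an illustrative sketch.

```perl
use strict;
use warnings;

my $n = 10;

my $len_of_product = length $n * 2;     # length(20): * binds tighter
my $len_times_two  = length($n) * 2;    # parentheses make it a function call
my $len_or_default = length $n || 99;   # (length $n) || 99: || binds looser
```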
 275
 276 See also L<"Terms and List Operators (Leftward)">.
 277
 278 =head2 Relational Operators
 279
 280 Binary "<" returns true if the left argument is numerically less than
 281 the right argument.
 282
 283 Binary ">" returns true if the left argument is numerically greater
 284 than the right argument.
 285
 286 Binary "<=" returns true if the left argument is numerically less than
 287 or equal to the right argument.
 288
 289 Binary ">=" returns true if the left argument is numerically greater
 290 than or equal to the right argument.
 291
 292 Binary "lt" returns true if the left argument is stringwise less than
 293 the right argument.
 294
 295 Binary "gt" returns true if the left argument is stringwise greater
 296 than the right argument.
 297
 298 Binary "le" returns true if the left argument is stringwise less than
 299 or equal to the right argument.
 300
 301 Binary "ge" returns true if the left argument is stringwise greater
 302 than or equal to the right argument.
 303
 304 =head2 Equality Operators
 305
 306 Binary "==" returns true if the left argument is numerically equal to
 307 the right argument.
 308
 309 Binary "!=" returns true if the left argument is numerically not equal
 310 to the right argument.
 311
 312 Binary "<=>" returns -1, 0, or 1 depending on whether the left
 313 argument is numerically less than, equal to, or greater than the right
 314 argument.  If your platform supports NaNs (not-a-numbers) as numeric
 315 values, using them with "<=>" returns undef.  NaN is not "<", "==", ">",
 316 "<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
 317 returns true, as does NaN != anything else. If your platform doesn't
 318 support NaNs then NaN is just a string with numeric value 0.
 319
 320     perl -le '$a = NaN; print "No NaN support here" if $a == $a'
 321     perl -le '$a = NaN; print "NaN support here" if $a != $a'
 322
 323 Binary "eq" returns true if the left argument is stringwise equal to
 324 the right argument.
 325
 326 Binary "ne" returns true if the left argument is stringwise not equal
 327 to the right argument.
 328
 329 Binary "cmp" returns -1, 0, or 1 depending on whether the left
 330 argument is stringwise less than, equal to, or greater than the right
 331 argument.
 332
 333 "lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
 334 by the current locale if C<use locale> is in effect.  See L<perllocale>.
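
The numeric and stringwise operators can give different answers for
the same operands.  A sketch, assuming C<use locale> is not in effect:

```perl
use strict;
use warnings;

my $num_equal = (10 == 10.0)     ? 1 : 0;   # same number
my $str_equal = ("10" eq "10.0") ? 1 : 0;   # different strings

my $num_order = 2 <=> 10;                   # 2 is numerically smaller
my $str_order = "2" cmp "10";               # "2" sorts after "1..."

my @by_number = sort { $a <=> $b } (10, 9, 100);
my @by_string = sort { $a cmp $b } (10, 9, 100);
```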
 335
 336 =head2 Bitwise And
 337
 338 Binary "&" returns its operands ANDed together bit by bit.
 339 (See also L<Integer Arithmetic> and L<Bitwise String Operators>.)
 340
  341 Note that "&" has lower priority than relational and equality operators,
  342 so for example the parentheses are essential in a test like
 343
 344         print "Even\n" if ($x & 1) == 0;
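
To see what goes wrong without the parentheses, compare the two forms
below.  This is a sketch; the C<precedence> warning is disabled only so
the faulty form can be shown.

```perl
use strict;
use warnings;

my $x = 4;

my $intended = (($x & 1) == 0) ? 'even' : 'odd';

my $misparsed;
{
    no warnings 'precedence';
    # Parsed as $x & (1 == 0), which is 4 & 0 here.
    $misparsed = ($x & 1 == 0) ? 'even' : 'odd';
}
```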
 345
 346 =head2 Bitwise Or and Exclusive Or
 347
 348 Binary "|" returns its operands ORed together bit by bit.
 349 (See also L<Integer Arithmetic> and L<Bitwise String Operators>.)
 350
 351 Binary "^" returns its operands XORed together bit by bit.
 352 (See also L<Integer Arithmetic> and L<Bitwise String Operators>.)
 353
  354 Note that "|" and "^" have lower priority than relational and equality
  355 operators, so for example the parentheses are essential in a test like
 356
 357         print "false\n" if (8 | 2) != 10;
 358
 359 =head2 C-style Logical And
 360
 361 Binary "&&" performs a short-circuit logical AND operation.  That is,
 362 if the left operand is false, the right operand is not even evaluated.
 363 Scalar or list context propagates down to the right operand if it
 364 is evaluated.
 365
 366 =head2 C-style Logical Or
 367
 368 Binary "||" performs a short-circuit logical OR operation.  That is,
 369 if the left operand is true, the right operand is not even evaluated.
 370 Scalar or list context propagates down to the right operand if it
 371 is evaluated.
 372
 373 =head2 C-style Logical Defined-Or
 374
 375 Although it has no direct equivalent in C, Perl's C<//> operator is related
 376 to its C-style or.  In fact, it's exactly the same as C<||>, except that it
 377 tests the left hand side's definedness instead of its truth.  Thus, C<$a // $b>
 378 is similar to C<defined($a) || $b> (except that it returns the value of C<$a>
 379 rather than the value of C<defined($a)>) and is exactly equivalent to
 380 C<defined($a) ? $a : $b>.  This is very useful for providing default values
 381 for variables.  If you actually want to test if at least one of C<$a> and C<$b> is
 382 defined, use C<defined($a // $b)>.
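
A sketch of the difference when a legitimate value happens to be
false.  It assumes a perl that supports C<//>; the C<%opt> hash and
its keys are hypothetical.

```perl
use strict;
use warnings;

my %opt = (retries => 0);

my $with_or  = $opt{retries} || 3;    # 0 is false, so the default wins
my $with_dor = $opt{retries} // 3;    # 0 is defined, so it is kept
my $timeout  = $opt{timeout} // 30;   # missing key is undef, default used

my $any_defined = defined($opt{timeout} // $opt{retries}) ? 1 : 0;
```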
 383
 384 The C<||>, C<//> and C<&&> operators differ from C's in that, rather than returning
 385 0 or 1, they return the last value evaluated.  Thus, a reasonably portable
 386 way to find out the home directory might be:
 387
 388     $home = $ENV{'HOME'} // $ENV{'LOGDIR'} //
 389         (getpwuid($<))[7] // die "You're homeless!\n";
 390
 391 In particular, this means that you shouldn't use this
 392 for selecting between two aggregates for assignment:
 393
 394     @a = @b || @c;              # this is wrong
 395     @a = scalar(@b) || @c;      # really meant this
 396     @a = @b ? @b : @c;          # this works fine, though
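
Concretely, with some sample arrays (names chosen for the example):

```perl
use strict;
use warnings;

my @b = (1, 2, 3);
my @c = (9);
my @empty;

my @wrong    = @b || @c;        # @b is evaluated in scalar context: its count
my @fallback = @empty || @c;    # left side is false, so @c is returned as a list
my @right    = @b ? @b : @c;    # the selected branch keeps list context
```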
 397
 398 As more readable alternatives to C<&&>, C<//> and C<||> when used for
 399 control flow, Perl provides C<and>, C<err> and C<or> operators (see below).
 400 The short-circuit behavior is identical.  The precedence of "and", "err"
 401 and "or" is much lower, however, so that you can safely use them after a
 402 list operator without the need for parentheses:
 403
 404     unlink "alpha", "beta", "gamma"
 405             or gripe(), next LINE;
 406
 407 With the C-style operators that would have been written like this:
 408
 409     unlink("alpha", "beta", "gamma")
 410             || (gripe(), next LINE);
 411
 412 Using "or" for assignment is unlikely to do what you want; see below.
 413
 414 =head2 Range Operators
 415
 416 Binary ".." is the range operator, which is really two different
  417 operators depending on the context.  In list context, it returns a
 418 list of values counting (up by ones) from the left value to the right
 419 value.  If the left value is greater than the right value then it
  420 returns the empty list.  The range operator is useful for writing
 421 C<foreach (1..10)> loops and for doing slice operations on arrays. In
 422 the current implementation, no temporary array is created when the
 423 range operator is used as the expression in C<foreach> loops, but older
 424 versions of Perl might burn a lot of memory when you write something
 425 like this:
 426
 427     for (1 .. 1_000_000) {
 428         # code
 429     }
 430
  431 The range operator also works on strings, using the magical auto-increment;
  432 see below.
 433
 434 In scalar context, ".." returns a boolean value.  The operator is
 435 bistable, like a flip-flop, and emulates the line-range (comma) operator
 436 of B<sed>, B<awk>, and various editors.  Each ".." operator maintains its
 437 own boolean state.  It is false as long as its left operand is false.
 438 Once the left operand is true, the range operator stays true until the
 439 right operand is true, I<AFTER> which the range operator becomes false
 440 again.  It doesn't become false till the next time the range operator is
 441 evaluated.  It can test the right operand and become false on the same
 442 evaluation it became true (as in B<awk>), but it still returns true once.
 443 If you don't want it to test the right operand till the next
 444 evaluation, as in B<sed>, just use three dots ("...") instead of
 445 two.  In all other regards, "..." behaves just like ".." does.
 446
 447 The right operand is not evaluated while the operator is in the
 448 "false" state, and the left operand is not evaluated while the
 449 operator is in the "true" state.  The precedence is a little lower
 450 than || and &&.  The value returned is either the empty string for
 451 false, or a sequence number (beginning with 1) for true.  The
 452 sequence number is reset for each range encountered.  The final
 453 sequence number in a range has the string "E0" appended to it, which
 454 doesn't affect its numeric value, but gives you something to search
 455 for if you want to exclude the endpoint.  You can exclude the
 456 beginning point by waiting for the sequence number to be greater
 457 than 1.  If either operand of scalar ".." is a constant expression,
 458 that operand is implicitly compared to the C<$.> variable, the
 459 current line number.  Examples:
 460
 461 As a scalar operator:
 462
 463     if (101 .. 200) { print; }  # print 2nd hundred lines
  464     next LINE if (1 .. /^$/);   # skip header lines
 465     s/^/> / if (/^$/ .. eof()); # quote body
 466
 467     # parse mail messages
 468     while (<>) {
 469         $in_header =   1  .. /^$/;
 470         $in_body   = /^$/ .. eof();
 471         # do something based on those
 472     } continue {
 473         close ARGV if eof;              # reset $. each file
 474     }
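
The sequence number and the "E0" suffix can be used as sketched below
to keep or drop the endpoints.  The data and the BEGIN/END markers are
invented for the example, and the operands are patterns rather than
constants, so C<$.> is not involved.

```perl
use strict;
use warnings;

my @lines = ('intro', 'BEGIN', 'a', 'b', 'END', 'outro');

# Keep everything from the start marker through the end marker.
my @inclusive;
for (@lines) {
    push @inclusive, $_ if /^BEGIN$/ .. /^END$/;
}

# Keep only the lines strictly between the markers.
my @interior;
for (@lines) {
    my $seq = /^BEGIN$/ .. /^END$/;      # '' or 1, 2, ..., "NE0"
    push @interior, $_
        if $seq && $seq > 1 && $seq !~ /E0$/;
}
```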
 475
 476 As a list operator:
 477
 478     for (101 .. 200) { print; } # print $_ 100 times
 479     @foo = @foo[0 .. $#foo];    # an expensive no-op
 480     @foo = @foo[$#foo-4 .. $#foo];      # slice last 5 items
 481
 482 The range operator (in list context) makes use of the magical
 483 auto-increment algorithm if the operands are strings.  You
 484 can say
 485
 486     @alphabet = ('A' .. 'Z');
 487
 488 to get all normal letters of the English alphabet, or
 489
 490     $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];
 491
 492 to get a hexadecimal digit, or
 493
 494     @z2 = ('01' .. '31');  print $z2[$mday];
 495
 496 to get dates with leading zeros.  If the final value specified is not
 497 in the sequence that the magical increment would produce, the sequence
 498 goes until the next value would be longer than the final value
 499 specified.
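
A few more string ranges, as a sketch of the rules just described:

```perl
use strict;
use warnings;

my @cols  = ('aa' .. 'ad');   # plain letter sequence
my @days  = ('08' .. '11');   # leading zero is preserved
my @carry = ('x' .. 'ab');    # carries from one letter to two
my @short = ('y' .. 'b');     # 'b' is never reached; stops before 'aa'
```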
 500
 501 =head2 Conditional Operator
 502
 503 Ternary "?:" is the conditional operator, just as in C.  It works much
 504 like an if-then-else.  If the argument before the ? is true, the
 505 argument before the : is returned, otherwise the argument after the :
 506 is returned.  For example:
 507
 508     printf "I have %d dog%s.\n", $n,
 509             ($n == 1) ? '' : "s";
 510
 511 Scalar or list context propagates downward into the 2nd
 512 or 3rd argument, whichever is selected.
 513
 514     $a = $ok ? $b : $c;  # get a scalar
 515     @a = $ok ? @b : @c;  # get an array
 516     $a = $ok ? @b : @c;  # oops, that's just a count!
 517
 518 The operator may be assigned to if both the 2nd and 3rd arguments are
 519 legal lvalues (meaning that you can assign to them):
 520
 521     ($a_or_b ? $a : $b) = $c;
 522
 523 Because this operator produces an assignable result, using assignments
 524 without parentheses will get you in trouble.  For example, this:
 525
 526     $a % 2 ? $a += 10 : $a += 2
 527
 528 Really means this:
 529
 530     (($a % 2) ? ($a += 10) : $a) += 2
 531
 532 Rather than this:
 533
 534     ($a % 2) ? ($a += 10) : ($a += 2)
 535
 536 That should probably be written more simply as:
 537
 538     $a += ($a % 2) ? 10 : 2;
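
The difference is visible for an odd starting value.  The sketch below
uses lexical variables with arbitrary names.

```perl
use strict;
use warnings;

# Assigning through the conditional operator.
my ($left, $right) = (0, 0);
my $pick_left = 1;
($pick_left ? $left : $right) = 5;

# Unparenthesized form: the trailing += 2 applies to the whole result.
my $n = 3;
$n % 2 ? $n += 10 : $n += 2;

# Simpler form that does what was intended.
my $m = 3;
$m += ($m % 2) ? 10 : 2;
```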
 539
 540 =head2 Assignment Operators
 541
 542 "=" is the ordinary assignment operator.
 543
 544 Assignment operators work as in C.  That is,
 545
 546     $a += 2;
 547
 548 is equivalent to
 549
 550     $a = $a + 2;
 551
 552 although without duplicating any side effects that dereferencing the lvalue
 553 might trigger, such as from tie().  Other assignment operators work similarly.
 554 The following are recognized:
 555
 556     **=    +=    *=    &=    <<=    &&=
 557            -=    /=    |=    >>=    ||=
  558            .=    %=    ^=           //=
 559                  x=
 560
 561 Although these are grouped by family, they all have the precedence
 562 of assignment.
 563
 564 Unlike in C, the scalar assignment operator produces a valid lvalue.
 565 Modifying an assignment is equivalent to doing the assignment and
 566 then modifying the variable that was assigned to.  This is useful
 567 for modifying a copy of something, like this:
 568
 569     ($tmp = $global) =~ tr [A-Z] [a-z];
 570
 571 Likewise,
 572
 573     ($a += 2) *= 3;
 574
 575 is equivalent to
 576
 577     $a += 2;
 578     $a *= 3;
 579
 580 Similarly, a list assignment in list context produces the list of
 581 lvalues assigned to, and a list assignment in scalar context returns
 582 the number of elements produced by the expression on the right hand
 583 side of the assignment.
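
These properties can be sketched as follows.  The variable names are
illustrative only.

```perl
use strict;
use warnings;

my $global = "MiXeD";
(my $lower = $global) =~ tr/A-Z/a-z/;    # modifies the copy, not $global

my $n = 1;
($n += 2) *= 3;                          # same as $n += 2; $n *= 3;

# List assignment in scalar context: number of right-hand elements.
my $assigned = (my ($first, $second) = (5, 6, 7));

# The same property counts matches by assigning to an empty list.
my $commas = () = "a,b,c" =~ /,/g;
```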
 584
 585 =head2 Comma Operator
 586
 587 Binary "," is the comma operator.  In scalar context it evaluates
 588 its left argument, throws that value away, then evaluates its right
 589 argument and returns that value.  This is just like C's comma operator.
 590
 591 In list context, it's just the list argument separator, and inserts
 592 both its arguments into the list.
 593
 594 The => digraph is mostly just a synonym for the comma operator.  It's useful for
 595 documenting arguments that come in pairs.  As of release 5.001, it also forces
 596 any word to the left of it to be interpreted as a string.
 597
 598 =head2 List Operators (Rightward)
 599
  600 On the right side of a list operator, the comma has very low precedence,
 601 such that it controls all comma-separated expressions found there.
 602 The only operators with lower precedence are the logical operators
 603 "and", "or", and "not", which may be used to evaluate calls to list
 604 operators without the need for extra parentheses:
 605
 606     open HANDLE, "filename"
 607         or die "Can't open: $!\n";
 608
 609 See also discussion of list operators in L<Terms and List Operators (Leftward)>.
 610
 611 =head2 Logical Not
 612
 613 Unary "not" returns the logical negation of the expression to its right.
 614 It's the equivalent of "!" except for the very low precedence.
 615
 616 =head2 Logical And
 617
 618 Binary "and" returns the logical conjunction of the two surrounding
 619 expressions.  It's equivalent to && except for the very low
 620 precedence.  This means that it short-circuits: i.e., the right
 621 expression is evaluated only if the left expression is true.
 622
 623 =head2 Logical or, Defined or, and Exclusive Or
 624
 625 Binary "or" returns the logical disjunction of the two surrounding
 626 expressions.  It's equivalent to || except for the very low precedence.
  627 This makes it useful for control flow:
 628
 629     print FH $data              or die "Can't write to FH: $!";
 630
 631 This means that it short-circuits: i.e., the right expression is evaluated
 632 only if the left expression is false.  Due to its precedence, you should
 633 probably avoid using this for assignment, only for control flow.
 634
 635     $a = $b or $c;              # bug: this is wrong
 636     ($a = $b) or $c;            # really means this
 637     $a = $b || $c;              # better written this way
 638
 639 However, when it's a list-context assignment and you're trying to use
 640 "||" for control flow, you probably need "or" so that the assignment
 641 takes higher precedence.
 642
 643     @info = stat($file) || die;     # oops, scalar sense of stat!
 644     @info = stat($file) or die;     # better, now @info gets its due
 645
 646 Then again, you could always use parentheses.
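
Both pitfalls can be reproduced in a few lines.  In this sketch
C<pair()> is a hypothetical stand-in for a function such as C<stat>
that returns a list, and the C<void> warning is disabled only to show
the faulty assignment.

```perl
use strict;
use warnings;

my ($primary, $fallback) = (0, 42);

my $loose;
{
    no warnings 'void';
    $loose = $primary or $fallback;   # ($loose = $primary) or $fallback
}
my $tight = $primary || $fallback;    # $loose gets the fallback only here

sub pair { return (7, 8) }            # hypothetical list-returning function

my @scalar_sense = pair() || die "no data";   # pair() called in scalar context
my @list_sense   = pair() or  die "no data";  # assignment happens first
```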
 647
 648 Binary "err" is equivalent to C<//>--it's just like binary "or", except it tests
 649 its left argument's definedness instead of its truth.  There are two ways to
 650 remember "err":  either because many functions return C<undef> on an B<err>or,
 651 or as a sort of correction:  C<$a=($b err 'default')>
 652
 653 Binary "xor" returns the exclusive-OR of the two surrounding expressions.
 654 It cannot short circuit, of course.
 655
 656 =head2 C Operators Missing From Perl
 657
 658 Here is what C has that Perl doesn't:
 659
 660 =over 8
 661
 662 =item unary &
 663
 664 Address-of operator.  (But see the "\" operator for taking a reference.)
 665
 666 =item unary *
 667
 668 Dereference-address operator. (Perl's prefix dereferencing
 669 operators are typed: $, @, %, and &.)
 670
 671 =item (TYPE)
 672
 673 Type-casting operator.
 674
 675 =back
 676
 677 =head2 Quote and Quote-like Operators
 678
 679 While we usually think of quotes as literal values, in Perl they
 680 function as operators, providing various kinds of interpolating and
 681 pattern matching capabilities.  Perl provides customary quote characters
 682 for these behaviors, but also provides a way for you to choose your
 683 quote character for any of them.  In the following table, a C<{}> represents
 684 any pair of delimiters you choose.
 685
 686     Customary  Generic        Meaning        Interpolates
 687         ''       q{}          Literal             no
 688         ""      qq{}          Literal             yes
 689         ``      qx{}          Command             yes*
 690                 qw{}         Word list            no
 691         //       m{}       Pattern match          yes*
 692                 qr{}          Pattern             yes*
 693                  s{}{}      Substitution          yes*
 694                 tr{}{}    Transliteration         no (but see below)
 695         <<EOF                 here-doc            yes*
 696
 697         * unless the delimiter is ''.
 698
 699 Non-bracketing delimiters use the same character fore and aft, but the four
 700 sorts of brackets (round, angle, square, curly) will all nest, which means
 701 that
 702
 703         q{foo{bar}baz}
 704
 705 is the same as
 706
 707         'foo{bar}baz'
 708
 709 Note, however, that this does not always work for quoting Perl code:
 710
 711         $s = q{ if($a eq "}") ... }; # WRONG
 712
 713 is a syntax error. The C<Text::Balanced> module (from CPAN, and
 714 starting from Perl 5.8 part of the standard distribution) is able
 715 to do this properly.
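
A brief sketch of the generic forms from the table, with made-up
contents:

```perl
use strict;
use warnings;

my $name = 'World';

my $nested  = q{foo{bar}baz};        # inner braces nest
my $literal = q{Hello, $name!};      # no interpolation
my $interp  = qq{Hello, $name!};     # interpolates $name
my @words   = qw(alpha beta gamma);  # word list
```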
 716
 717 There can be whitespace between the operator and the quoting
 718 characters, except when C<#> is being used as the quoting character.
 719 C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the
 720 operator C<q> followed by a comment.  Its argument will be taken
 721 from the next line.  This allows you to write:
 722
 723     s {foo}  # Replace foo
 724       {bar}  # with bar.
 725
 726 The following escape sequences are available in constructs that interpolate
 727 and in transliterations.
 728
 729     \t          tab             (HT, TAB)
 730     \n          newline         (NL)
 731     \r          return          (CR)
 732     \f          form feed       (FF)
 733     \b          backspace       (BS)
 734     \a          alarm (bell)    (BEL)
 735     \e          escape          (ESC)
 736     \033        octal char      (ESC)
 737     \x1b        hex char        (ESC)
 738     \x{263a}    wide hex char   (SMILEY)
 739     \c[         control char    (ESC)
 740     \N{name}    named Unicode character
 741
 742 The following escape sequences are available in constructs that interpolate
 743 but not in transliterations.
 744
 745     \l          lowercase next char
 746     \u          uppercase next char
 747     \L          lowercase till \E
 748     \U          uppercase till \E
 749     \E          end case modification
 750     \Q          quote non-word characters till \E
 751
 752 If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
 753 C<\u> and C<\U> is taken from the current locale.  See L<perllocale>.
 754 If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
 755 beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
 756 C<\U> is as defined by Unicode.  For documentation of C<\N{name}>,
 757 see L<charnames>.
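
A minimal sketch of the case-modifying escapes and C<\Q>, assuming no
C<use locale> is in effect and using a made-up C<$name> value:

```perl
use strict;
use warnings;

my $name = "jOHN sMITH";

my $ucfirst = "\u\L$name\E";     # lowercase everything, capitalize the first char
my $shout   = "\U$name\E!";      # uppercase up to \E only
my $lcfirst = "\lABC";           # lowercase only the next char
my $quoted  = "\Qa.b*c\E.d";     # backslash non-word chars up to \E

print "$ucfirst\n$shout\n$lcfirst\n$quoted\n";
```

Note that the C<!> and the final C<.d> fall after the C<\E>, so they are
left untouched.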
 758
 759 All systems use the virtual C<"\n"> to represent a line terminator,
 760 called a "newline".  There is no such thing as an unvarying, physical
 761 newline character.  It is only an illusion that the operating system,
 762 device drivers, C libraries, and Perl all conspire to preserve.  Not all
 763 systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF.  For example,
on a Mac, these are reversed, and on systems without a line terminator,
 765 printing C<"\n"> may emit no actual data.  In general, use C<"\n"> when
 766 you mean a "newline" for your system, but use the literal ASCII when you
 767 need an exact character.  For example, most networking protocols expect
 768 and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
 769 and although they often accept just C<"\012">, they seldom tolerate just
 770 C<"\015">.  If you get in the habit of using C<"\n"> for networking,
 771 you may be burned some day.
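
One way to follow this advice is to name the wire terminator explicitly.
The sketch below is illustrative only: C<$CRLF> and the request line are
assumptions made for the example, not part of any particular module.

```perl
use strict;
use warnings;

# Spell out the wire format rather than relying on "\n".
my $CRLF = "\015\012";

# Hypothetical request line, for illustration only.
my $request = "GET / HTTP/1.0" . $CRLF . $CRLF;

printf "terminator is %d bytes: %s\n",
    length($CRLF),
    join(" ", map { sprintf("0x%02X", ord($_)) } split(//, $CRLF));
```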
 772
 773 For constructs that do interpolate, variables beginning with "C<$>"
 774 or "C<@>" are interpolated.  Subscripted variables such as C<$a[3]> or
 775 C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
 776 But method calls such as C<< $obj->meth >> are not.
 777
 778 Interpolating an array or slice interpolates the elements in order,
 779 separated by the value of C<$">, so is equivalent to interpolating
C<join $", @array>.  "Punctuation" arrays such as C<@+> are only
 781 interpolated if the name is enclosed in braces C<@{+}>.
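
The interpolation rules for arrays and slices can be sketched as follows.
The data is invented for the example, and the C<@{[ ... ]}> construct is a
common workaround for expressions that are not interpolated, not a
dedicated feature.

```perl
use strict;
use warnings;

my @list = (1, 2, 3);
my %age  = (tom => 30, ann => 25);

my $default = "@list";                           # joined with $" (a space by default)
my $slice   = "@list[0,1] @age{'ann','tom'}";    # array slice and hash slice

my $custom;
{
    local $" = ", ";                             # change the separator in this block only
    $custom = "(@list)";
}

# Arbitrary expressions, such as method calls, are not interpolated;
# wrapping them in @{[ ... ]} is one way around that.
my $count = "items: @{[ scalar @list ]}";

print "$default\n$slice\n$custom\n$count\n";
```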
 782
 783 You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
 784 An unescaped C<$> or C<@> interpolates the corresponding variable,
 785 while escaping will cause the literal string C<\$> to be inserted.
 786 You'll need to write something like C<m/\Quser\E\@\Qhost/>.
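
The same idea with variables, as a small sketch.  C<$user> and C<$host>
are hypothetical values; the point is that the literal C<@> sits between
two separate C<\Q...\E> runs.

```perl
use strict;
use warnings;

my $user = 'a.b';                  # hypothetical values for illustration
my $host = 'example.com';

# \Q...\E quotes the interpolated values; the escaped @ sits outside them.
my $re = qr/\A\Q$user\E\@\Q$host\E\z/;

my $exact = ('a.b@example.com' =~ $re) ? 1 : 0;
my $loose = ('aXb@exampleXcom' =~ $re) ? 1 : 0;

print "exact=$exact loose=$loose\n";
```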
 787
 788 Patterns are subject to an additional level of interpretation as a
 789 regular expression.  This is done as a second pass, after variables are
 790 interpolated, so that regular expressions may be incorporated into the
 791 pattern from the variables.  If this is not what you want, use C<\Q> to
 792 interpolate a variable literally.
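
To illustrate the second pass, the sketch below interpolates the same
(made-up) string once as a regular expression and once literally:

```perl
use strict;
use warnings;

my $input = 'a.c';     # text that happens to contain a metacharacter

my $as_regex    = ('abc' =~ /\A$input\z/)     ? 1 : 0;   # "." matches any character
my $as_literal  = ('abc' =~ /\A\Q$input\E\z/) ? 1 : 0;   # "." must be a real dot
my $literal_hit = ('a.c' =~ /\A\Q$input\E\z/) ? 1 : 0;

print "regex=$as_regex literal=$as_literal literal_hit=$literal_hit\n";
```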
 793
 794 Apart from the behavior described above, Perl does not expand
 795 multiple levels of interpolation.  In particular, contrary to the
 796 expectations of shell programmers, back-quotes do I<NOT> interpolate
 797 within double quotes, nor do single quotes impede evaluation of
 798 variables when used within double quotes.
 799
 800 =head2 Regexp Quote-Like Operators
 801
 802 Here are the quote-like operators that apply to pattern
 803 matching and related activities.
 804
 805 =over 8
 806
 807 =item ?PATTERN?
 808
 809 This is just like the C</pattern/> search, except that it matches only
 810 once between calls to the reset() operator.  This is a useful
 811 optimization when you want to see only the first occurrence of
 812 something in each file of a set of files, for instance.  Only C<??>
 813 patterns local to the current package are reset.
 814
 815     while (<>) {
 816         if (?^$?) {
 817                             # blank line between header and body
 818         }
 819     } continue {
 820         reset if eof;       # clear ?? status for next file
 821     }
 822
 823 This usage is vaguely deprecated, which means it just might possibly
 824 be removed in some distant future version of Perl, perhaps somewhere
 825 around the year 2168.
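
A self-contained sketch of the match-once behaviour follows.  It uses the
explicit C<m?PATTERN?> spelling, which is the form accepted by newer perls
(the bare C<?PATTERN?> form is no longer available there); the list of
lines is invented for the example.

```perl
use strict;
use warnings;

my @lines = ("apple", "avocado", "banana");
my @hits;

for my $pass (1, 2) {
    for my $line (@lines) {
        # Succeeds at most once until reset() is called.
        push @hits, "$pass:$line" if $line =~ m?^a?;
    }
    reset;    # re-arm the m?...? match for the next pass
}

print "@hits\n";
```

Although "avocado" also starts with C<a>, only the first success in each
pass is recorded.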
 826
 827 =item m/PATTERN/cgimosx
 828
 829 =item /PATTERN/cgimosx
 830
 831 Searches a string for a pattern match, and in scalar context returns
 832 true if it succeeds, false if it fails.  If no string is specified
 833 via the C<=~> or C<!~> operator, the $_ string is searched.  (The
 834 string specified with C<=~> need not be an lvalue--it may be the
 835 result of an expression evaluation, but remember the C<=~> binds
 836 rather tightly.)  See also L<perlre>.  See L<perllocale> for
 837 discussion of additional considerations that apply when C<use locale>
 838 is in effect.
 839
 840 Options are:
 841
 842     c   Do not reset search position on a failed match when /g is in effect.
 843     g   Match globally, i.e., find all occurrences.
 844     i   Do case-insensitive pattern matching.
 845     m   Treat string as multiple lines.
 846     o   Compile pattern only once.
 847     s   Treat string as single line.
 848     x   Use extended regular expressions.
 849
 850 If "/" is the delimiter then the initial C<m> is optional.  With the C<m>
 851 you can use any pair of non-alphanumeric, non-whitespace characters
 852 as delimiters.  This is particularly useful for matching path names
 853 that contain "/", to avoid LTS (leaning toothpick syndrome).  If "?" is
 854 the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
 855 If "'" is the delimiter, no interpolation is performed on the PATTERN.
 856
 857 PATTERN may contain variables, which will be interpolated (and the
 858 pattern recompiled) every time the pattern search is evaluated, except
 859 for when the delimiter is a single quote.  (Note that C<$(>, C<$)>, and
 860 C<$|> are not interpolated because they look like end-of-string tests.)
 861 If you want such a pattern to be compiled only once, add a C</o> after
 862 the trailing delimiter.  This avoids expensive run-time recompilations,
 863 and is useful when the value you are interpolating won't change over
 864 the life of the script.  However, mentioning C</o> constitutes a promise
 865 that you won't change the variables in the pattern.  If you change them,
 866 Perl won't even notice.  See also L<"qr/STRING/imosx">.
 867
 868 If the PATTERN evaluates to the empty string, the last
 869 I<successfully> matched regular expression is used instead. In this
case, only the C<g> and C<c> flags on the empty pattern are honoured -
 871 the other flags are taken from the original pattern. If no match has
 872 previously succeeded, this will (silently) act instead as a genuine
 873 empty pattern (which will always match).
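
This rule is easy to trip over, so here is a small sketch of it.  The
strings are arbitrary, and C</(?:)/> is shown only as one possible way to
write a pattern that is genuinely empty.

```perl
use strict;
use warnings;

"hello" =~ /l+/ or die "setup match failed";

# An empty pattern now reuses /l+/ rather than matching the empty string.
my $reused = ("xyz" =~ //) ? 1 : 0;

# An explicit empty group is one way to get a genuinely empty pattern.
my $empty = ("xyz" =~ /(?:)/) ? 1 : 0;

print "reused=$reused empty=$empty\n";
```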
 874
 875 Note that it's possible to confuse Perl into thinking C<//> (the empty
 876 regex) is really C<//> (the defined-or operator).  Perl is usually pretty
 877 good about this, but some pathological cases might trigger this, such as
 878 C<$a///> (is that C<($a) / (//)> or C<$a // />?) and C<print $fh //>
 879 (C<print $fh(//> or C<print($fh //>?).  In all of these examples, Perl
 880 will assume you meant defined-or.  If you meant the empty regex, just
 881 use parentheses or spaces to disambiguate, or even prefix the empty
 882 regex with an C<m> (so C<//> becomes C<m//>).
 883
 884 If the C</g> option is not used, C<m//> in list context returns a
 885 list consisting of the subexpressions matched by the parentheses in the
 886 pattern, i.e., (C<$1>, C<$2>, C<$3>...).  (Note that here C<$1> etc. are
 887 also set, and that this differs from Perl 4's behavior.)  When there are
 888 no parentheses in the pattern, the return value is the list C<(1)> for
 889 success.  With or without parentheses, an empty list is returned upon
 890 failure.
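
The three list-context outcomes described above, as a short sketch with
made-up input strings:

```perl
use strict;
use warnings;

my ($hour, $min) = "12:34" =~ /(\d+):(\d+)/;    # captures returned as a list
my @no_parens    = "abc"   =~ /b/;              # the list (1) on success
my @failed       = "abc"   =~ /(x)/;            # empty list on failure

printf "%s h %s min, no_parens=(%s), failed has %d elements\n",
    $hour, $min, "@no_parens", scalar @failed;
```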
 891
 892 Examples:
 893
 894     open(TTY, '/dev/tty');
 895     <TTY> =~ /^y/i && foo();    # do foo if desired
 896
 897     if (/Version: *([0-9.]*)/) { $version = $1; }
 898
 899     next if m#^/usr/spool/uucp#;
 900
 901     # poor man's grep
 902     $arg = shift;
 903     while (<>) {
 904         print if /$arg/o;       # compile only once
 905     }
 906
 907     if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
 908
 909 This last example splits $foo into the first two words and the
 910 remainder of the line, and assigns those three fields to $F1, $F2, and
 911 $Etc.  The conditional is true if any variables were assigned, i.e., if
 912 the pattern matched.
 913
 914 The C</g> modifier specifies global pattern matching--that is,
 915 matching as many times as possible within the string.  How it behaves
 916 depends on the context.  In list context, it returns a list of the
 917 substrings matched by any capturing parentheses in the regular
 918 expression.  If there are no parentheses, it returns a list of all
 919 the matched strings, as if there were parentheses around the whole
 920 pattern.
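
A brief sketch of C</g> in list context, with and without capturing
parentheses (the input strings are arbitrary):

```perl
use strict;
use warnings;

my @numbers = "a1b22c333" =~ /\d+/g;         # no parentheses: whole matches
my %pairs   = "x=1,y=2"   =~ /(\w)=(\d)/g;   # two captures per match, flattened

print "numbers: @numbers\n";
print "pairs: ", join(", ", map { "$_=$pairs{$_}" } sort keys %pairs), "\n";
```

Because the captures from every match are returned as one flat list,
assigning them to a hash pairs them up as keys and values.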
 921
 922 In scalar context, each execution of C<m//g> finds the next match,
 923 returning true if it matches, and false if there is no further match.
 924 The position after the last match can be read or set using the pos()
function; see L<perlfunc/pos>.  A failed match normally resets the
 926 search position to the beginning of the string, but you can avoid that
 927 by adding the C</c> modifier (e.g. C<m//gc>).  Modifying the target
 928 string also resets the search position.
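
The effect of scalar-context C</g> on pos(), with and without C</c>, can
be sketched like this (the target string is arbitrary):

```perl
use strict;
use warnings;

my $text = "aXbXc";
my @ends;
while ($text =~ /X/g) {
    push @ends, pos($text);          # offset just past each match
}

# The loop ended on a failed match, which reset the search position.
my $after_fail = defined pos($text) ? "kept" : "reset";

# With /c, a failed match leaves pos() alone.
$text =~ /X/gc;                      # succeeds, pos is now 2
$text =~ /nomatch/gc;                # fails, pos is still 2
my $with_c = pos($text);

print "ends=@ends after_fail=$after_fail with_c=$with_c\n";
```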
 929
 930 You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
 931 zero-width assertion that matches the exact position where the previous
 932 C<m//g>, if any, left off.  Without the C</g> modifier, the C<\G> assertion
 933 still anchors at pos(), but the match is of course only attempted once.
 934 Using C<\G> without C</g> on a target string that has not previously had a
 935 C</g> match applied to it is the same as using the C<\A> assertion to match
 936 the beginning of the string.  Note also that, currently, C<\G> is only
 937 properly supported when anchored at the very beginning of the pattern.
 938
 939 Examples:
 940
 941     # list context
 942     ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
 943
 944     # scalar context
 945     $/ = "";
 946     while (defined($paragraph = <>)) {
 947         while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
 948             $sentences++;
 949         }
 950     }
 951     print "$sentences\n";
 952
 953     # using m//gc with \G
 954     $_ = "ppooqppqq";
 955     while ($i++ < 2) {
 956         print "1: '";
 957         print $1 while /(o)/gc; print "', pos=", pos, "\n";
 958         print "2: '";
 959         print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
 960         print "3: '";
 961         print $1 while /(p)/gc; print "', pos=", pos, "\n";
 962     }
 963     print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
 964
 965 The last example should print:
 966
 967     1: 'oo', pos=4
 968     2: 'q', pos=5
 969     3: 'pp', pos=7
 970     1: '', pos=7
 971     2: 'q', pos=8
 972     3: '', pos=8
 973     Final: 'q', pos=8
 974
 975 Notice that the final match matched C<q> instead of C<p>, which a match
 976 without the C<\G> anchor would have done. Also note that the final match
 977 did not update C<pos> -- C<pos> is only updated on a C</g> match. If the
 978 final match did indeed match C<p>, it's a good bet that you're running an
 979 older (pre-5.6.0) Perl.
 980
 981 A useful idiom for C<lex>-like scanners is C</\G.../gc>.  You can
 982 combine several regexps like this to process a string part-by-part,
 983 doing different actions depending on which regexp matched.  Each
 984 regexp tries to match where the previous one leaves off.
 985
 986  $_ = <<'EOL';
 987       $url = new URI::URL "http://www/";   die if $url eq "xXx";
 988  EOL
 989  LOOP:
 990     {
 991       print(" digits"),         redo LOOP if /\G\d+\b[,.;]?\s*/gc;
 992       print(" lowercase"),      redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
 993       print(" UPPERCASE"),      redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
 994       print(" Capitalized"),    redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
 995       print(" MiXeD"),          redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
 996       print(" alphanumeric"),   redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
 997       print(" line-noise"),     redo LOOP if /\G[^A-Za-z0-9]+/gc;
 998       print ". That's all!\n";
 999     }
1000
1001 Here is the output (split into several lines):
1002
1003  line-noise lowercase line-noise lowercase UPPERCASE line-noise
1004  UPPERCASE line-noise lowercase line-noise lowercase line-noise
1005  lowercase lowercase line-noise lowercase lowercase line-noise
1006  MiXeD line-noise. That's all!
1007
1008 =item q/STRING/
1009
1010 =item C<'STRING'>
1011
1012 A single-quoted, literal string.  A backslash represents a backslash
1013 unless followed by the delimiter or another backslash, in which case
1014 the delimiter or backslash is interpolated.
1015
1016     $foo = q!I said, "You said, 'She said it.'"!;
1017     $bar = q('This is it.');
1018     $baz = '\n';                # a two-character string
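
The backslash rule can be checked with a few more strings.  This is only
a sketch, and the values are invented for illustration:

```perl
use strict;
use warnings;

my $slash = q/a\/b/;       # escaped delimiter yields the delimiter: a/b
my $back  = 'C:\\dir';     # doubled backslash yields one backslash: C:\dir
my $kept  = 'a\tb';        # any other backslash is kept: a, \, t, b

printf "%s (%d)  %s (%d)  %s (%d)\n",
    $slash, length($slash), $back, length($back), $kept, length($kept);
```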
1019
1020 =item qq/STRING/
1021
1022 =item "STRING"
1023
1024 A double-quoted, interpolated string.
1025
1026     $_ .= qq
1027      (*** The previous line contains the naughty word "$1".\n)
1028                 if /\b(tcl|java|python)\b/i;      # :-)
1029     $baz = "\n";                # a one-character string
1030
1031 =item qr/STRING/imosx
1032
1033 This operator quotes (and possibly compiles) its I<STRING> as a regular
1034 expression.  I<STRING> is interpolated the same way as I<PATTERN>
1035 in C<m/PATTERN/>.  If "'" is used as the delimiter, no interpolation
1036 is done.  Returns a Perl value which may be used instead of the
1037 corresponding C</STRING/imosx> expression.
1038
1039 For example,
1040
1041     $rex = qr/my.STRING/is;
1042     s/$rex/foo/;
1043
1044 is equivalent to
1045
1046     s/my.STRING/foo/is;
1047
1048 The result may be used as a subpattern in a match:
1049
1050     $re = qr/$pattern/;
1051     $string =~ /foo${re}bar/;   # can be interpolated in other patterns
1052     $string =~ $re;             # or used standalone
1053     $string =~ /$re/;           # or this way
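
When a C<qr//> object is interpolated into a larger pattern, its flags
stay with the fragment.  A minimal sketch, with made-up strings:

```perl
use strict;
use warnings;

my $word = qr/hello/i;     # /i applies to this fragment only

# The outer pattern is still case-sensitive.
my $mixed = ("say HELLO" =~ /\Asay $word\z/) ? 1 : 0;
my $upper = ("SAY HELLO" =~ /\Asay $word\z/) ? 1 : 0;

print "mixed=$mixed upper=$upper\n";
```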
1054
Since Perl may compile the pattern at the moment the qr() operator is
executed, using qr() may have speed advantages in some situations,
notably if the result of qr() is used standalone:
1058
1059     sub match {
1060         my $patterns = shift;
1061         my @compiled = map qr/$_/i, @$patterns;
1062         grep {
1063             my $success = 0;
1064             foreach my $pat (@compiled) {
1065                 $success = 1, last if /$pat/;
1066             }
1067             $success;
1068         } @_;
1069     }
1070
1071 Precompilation of the pattern into an internal representation at
1072 the moment of qr() avoids a need to recompile the pattern every
1073 time a match C</$pat/> is attempted.  (Perl has many other internal
1074 optimizations, but none would be triggered in the above example if
we did not use the qr() operator.)
1076
1077 Options are:
1078
1079     i   Do case-insensitive pattern matching.
1080     m   Treat string as multiple lines.
1081     o   Compile pattern only once.
1082     s   Treat string as single line.
1083     x   Use extended regular expressions.
1084
1085 See L<perlre> for additional information on valid syntax for STRING, and
1086 for a detailed look at the semantics of regular expressions.
1087
1088 =item qx/STRING/
1089
1090 =item `STRING`
1091
1092 A string which is (possibly) interpolated and then executed as a
1093 system command with C</bin/sh> or its equivalent.  Shell wildcards,
1094 pipes, and redirections will be honored.  The collected standard
1095 output of the command is returned; standard error is unaffected.  In
1096 scalar context, it comes back as a single (potentially multi-line)
1097 string, or undef if the command failed.  In list context, returns a
1098 list of lines (however you've defined lines with $/ or
1099 $INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
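
The two return styles can be sketched as follows.  The example assumes a
POSIX-like shell that provides C<echo> and C<printf>; on other systems the
commands would need to be replaced.

```perl
use strict;
use warnings;

# Assumes a POSIX-like shell that provides echo and printf.
my $scalar = `echo hello`;              # one string, trailing newline included
my @lines  = qx'printf "a\nb\n"';       # list context: one element per line
my $status = $? >> 8;                   # exit status of the last command run

chomp(my $trimmed = $scalar);
print "scalar='$trimmed' lines=", scalar(@lines), " status=$status\n";
```

The single-quote delimiter on the second command keeps Perl from
interpreting C<\n> itself, so the shell's C<printf> sees it.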
1100
1101 Because backticks do not affect standard error, use shell file descriptor
1102 syntax (assuming the shell supports this) if you care to address this.
1103 To capture a command's STDERR and STDOUT together:
1104
1105     $output = `cmd 2>&1`;
1106
1107 To capture a command's STDOUT but discard its STDERR:
1108
1109     $output = `cmd 2>/dev/null`;
1110
1111 To capture a command's STDERR but discard its STDOUT (ordering is
1112 important here):
1113
1114     $output = `cmd 2>&1 1>/dev/null`;
1115
1116 To exchange a command's STDOUT and STDERR in order to capture the STDERR
1117 but leave its STDOUT to come out the old STDERR:
1118
1119     $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
1120
1121 To read both a command's STDOUT and its STDERR separately, it's easiest
1122 and safest to redirect them separately to files, and then read from those
1123 files when the program is done:
1124
1125     system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");
1126
1127 Using single-quote as a delimiter protects the command from Perl's
1128 double-quote interpolation, passing it on to the shell instead:
1129
1130     $perl_info  = qx(ps $$);            # that's Perl's $$
1131     $shell_info = qx'ps $$';            # that's the new shell's $$
1132
1133 How that string gets evaluated is entirely subject to the command
1134 interpreter on your system.  On most platforms, you will have to protect
1135 shell metacharacters if you want them treated literally.  This is in
1136 practice difficult to do, as it's unclear how to escape which characters.
1137 See L<perlsec> for a clean and safe example of a manual fork() and exec()
1138 to emulate backticks safely.
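
This is not the L<perlsec> example, but a related sketch: the list form of
a piped open() runs the program directly, so no shell ever sees the
arguments.  It assumes a platform where list-form pipe opens are supported
and an external C<echo> program is on the PATH; C<$arg> is a made-up value.

```perl
use strict;
use warnings;

# Hypothetical untrusted input containing shell metacharacters.
my $arg = 'a; echo injected';

# List-form piped open runs the program directly, with no shell involved.
# Assumes an external "echo" program is on the PATH.
open(my $fh, '-|', 'echo', $arg) or die "cannot run echo: $!";
my $output = do { local $/; <$fh> };    # slurp everything the child printed
close($fh) or warn "echo exited with status $?";

print $output;
```

The semicolon reaches C<echo> as ordinary text instead of ending the
command.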
1139
1140 On some platforms (notably DOS-like ones), the shell may not be
1141 capable of dealing with multiline commands, so putting newlines in
1142 the string may not get you what you want.  You may be able to evaluate
1143 multiple commands in a single line by separating them with the command
1144 separator character, if your shell supports that (e.g. C<;> on many Unix
1145 shells; C<&> on the Windows NT C<cmd> shell).
1146
1147 Beginning with v5.6.0, Perl will attempt to flush all files opened for
1148 output before starting the child process, but this may not be supported
1149 on some platforms (see L<perlport>).  To be safe, you may need to set
1150 C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
1151 C<IO::Handle> on any open handles.
1152
1153 Beware that some command shells may place restrictions on the length
1154 of the command line.  You must ensure your strings don't exceed this
1155 limit after any necessary interpolations.  See the platform-specific
1156 release notes for more details about your particular environment.
1157
1158 Using this operator can lead to programs that are difficult to port,
1159 because the shell commands called vary between systems, and may in
1160 fact not be present at all.  As one example, the C<type> command under
1161 the POSIX shell is very different from the C<type> command under DOS.
1162 That doesn't mean you should go out of your way to avoid backticks
1163 when they're the right way to get something done.  Perl was made to be
1164 a glue language, and one of the things it glues together is commands.
1165 Just understand what you're getting yourself into.
1166
1167 See L<"I/O Operators"> for more discussion.
1168
1169 =item qw/STRING/
1170
1171 Evaluates to a list of the words extracted out of STRING, using embedded
1172 whitespace as the word delimiters.  It can be understood as being roughly
1173 equivalent to:
1174
1175     split(' ', q/STRING/);
1176
1177 the difference being that it generates a real list at compile time.  So
1178 this expression:
1179
1180     qw(foo bar baz)
1181
1182 is semantically equivalent to the list:
1183
1184     'foo', 'bar', 'baz'
1185
1186 Some frequently seen examples:
1187
    use POSIX qw( setlocale localeconv );
1189     @EXPORT = qw( foo bar baz );
1190
A common mistake is to try to separate the words with commas or to
put comments into a multi-line C<qw>-string.  For this reason, the
C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
produce warnings if the STRING contains the "," or the "#" character.
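
To see what the comma mistake actually produces, consider this sketch.
The warning is switched off locally, purely so that the result can be shown
without noise:

```perl
use strict;
use warnings;

my @good = qw( foo bar baz );

my @oops;
{
    no warnings 'qw';                  # silence the warning to show the result
    @oops = qw( foo, bar, baz );       # the commas become part of the words
}

print "good: @good\n";
print "oops: @oops\n";
```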
1195
1196 =item s/PATTERN/REPLACEMENT/egimosx
1197
1198 Searches a string for a pattern, and if found, replaces that pattern
1199 with the replacement text and returns the number of substitutions
1200 made.  Otherwise it returns false (specifically, the empty string).
1201
1202 If no string is specified via the C<=~> or C<!~> operator, the C<$_>
1203 variable is searched and modified.  (The string specified with C<=~> must
be a scalar variable, an array element, a hash element, or an assignment
1205 to one of those, i.e., an lvalue.)
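
A short sketch of the return value and of the lvalue requirement, using an
arbitrary string:

```perl
use strict;
use warnings;

my $text = "banana";

my $changed = ($text =~ s/a/o/g);    # number of substitutions made
my $missed  = ($text =~ s/z/y/);     # nothing to replace: empty string (false)

# An assignment is an lvalue too, so this copies first and edits the copy.
(my $copy = $text) =~ s/o/0/;

print "text=$text changed=$changed missed='$missed' copy=$copy\n";
```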
1206
1207 If the delimiter chosen is a single quote, no interpolation is
1208 done on either the PATTERN or the REPLACEMENT.  Otherwise, if the
1209 PATTERN contains a $ that looks like a variable rather than an
1210 end-of-string test, the variable will be interpolated into the pattern
1211 at run-time.  If you want the pattern compiled only once the first time
1212 the variable is interpolated, use the C</o> option.  If the pattern
1213 evaluates to the empty string, the last successfully executed regular
1214 expression is used instead.  See L<perlre> for further explanation on these.
1215 See L<perllocale> for discussion of additional considerations that apply
1216 when C<use locale> is in effect.
1217
1218 Options are:
1219
1220     e   Evaluate the right side as an expression.
1221     g   Replace globally, i.e., all occurrences.
1222     i   Do case-insensitive pattern matching.
1223     m   Treat string as multiple lines.
1224     o   Compile pattern only once.
1225     s   Treat string as single line.
1226     x   Use extended regular expressions.
1227
1228 Any non-alphanumeric, non-whitespace delimiter may replace the
1229 slashes.  If single quotes are used, no interpretation is done on the
1230 replacement string (the C</e> modifier overrides this, however).  Unlike
1231 Perl 4, Perl 5 treats backticks as normal delimiters; the replacement
1232 text is not evaluated as a command.  If the
1233 PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
1234 pair of quotes, which may or may not be bracketing quotes, e.g.,
1235 C<s(foo)(bar)> or C<< s<foo>/bar/ >>.  A C</e> will cause the
1236 replacement portion to be treated as a full-fledged Perl expression
1237 and evaluated right then and there.  It is, however, syntax checked at
1238 compile-time. A second C<e> modifier will cause the replacement portion
1239 to be C<eval>ed before being run as a Perl expression.
1240
1241 Examples:
1242
1243     s/\bgreen\b/mauve/g;                # don't change wintergreen
1244
1245     $path =~ s|/usr/bin|/usr/local/bin|;
1246
1247     s/Login: $foo/Login: $bar/; # run-time pattern
1248
1249     ($foo = $bar) =~ s/this/that/;      # copy first, then change
1250
1251     $count = ($paragraph =~ s/Mister\b/Mr./g);  # get change-count
1252
1253     $_ = 'abc123xyz';
1254     s/\d+/$&*2/e;               # yields 'abc246xyz'
1255     s/\d+/sprintf("%5d",$&)/e;  # yields 'abc  246xyz'
1256     s/\w/$& x 2/eg;             # yields 'aabbcc  224466xxyyzz'
1257
1258     s/%(.)/$percent{$1}/g;      # change percent escapes; no /e
1259     s/%(.)/$percent{$1} || $&/ge;       # expr now, so /e
1260     s/^=(\w+)/&pod($1)/ge;      # use function call
1261
1262     # expand variables in $_, but dynamics only, using
1263     # symbolic dereferencing
1264     s/\$(\w+)/${$1}/g;
1265
1266     # Add one to the value of any numbers in the string
1267     s/(\d+)/1 + $1/eg;
1268
1269     # This will expand any embedded scalar variable
1270     # (including lexicals) in $_ : First $1 is interpolated
1271     # to the variable name, and then evaluated
1272     s/(\$\w+)/$1/eeg;
1273
1274     # Delete (most) C comments.
1275     $program =~ s {
1276         /\*     # Match the opening delimiter.
1277         .*?     # Match a minimal number of characters.
1278         \*/     # Match the closing delimiter.
1279     } []gsx;
1280
1281     s/^\s*(.*?)\s*$/$1/;        # trim white space in $_, expensively
1282
1283     for ($variable) {           # trim white space in $variable, cheap
1284         s/^\s+//;
1285         s/\s+$//;
1286     }
1287
1288     s/([^ ]*) *([^ ]*)/$2 $1/;  # reverse 1st two fields
1289
1290 Note the use of $ instead of \ in the last example.  Unlike
B<sed>, we use the \<I<digit>> form only on the left-hand side.
1292 Anywhere else it's $<I<digit>>.
1293
1294 Occasionally, you can't use just a C</g> to get all the changes
1295 to occur that you might want.  Here are two common cases:
1296
1297     # put commas in the right places in an integer
1298     1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
1299
1300     # expand tabs to 8-column spacing
1301     1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
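
Here are the same two loops applied to concrete values, as a worked
sketch.  The inputs are invented; each loop keeps substituting until a
pass makes no further change.

```perl
use strict;
use warnings;

# Commas: each pass inserts one more comma, working from the right.
my $n = "1234567";
1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

# Tabs: each pass expands the leftmost run of tabs to the next 8-column stop.
my $line = "a\tbc\td";
1 while $line =~ s/\t+/' ' x (length($&)*8 - length($`)%8)/e;

print "$n\n";
print "[$line] length ", length($line), "\n";
```

The first pass over C<$n> gives C<1234,567> and the second gives
C<1,234,567>; a single C</g> pass is not enough because the matches would
have to overlap.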
1302
1303 =item tr/SEARCHLIST/REPLACEMENTLIST/cds
1304
1305 =item y/SEARCHLIST/REPLACEMENTLIST/cds
1306
1307 Transliterates all occurrences of the characters found in the search list
1308 with the corresponding character in the replacement list.  It returns
1309 the number of characters replaced or deleted.  If no string is
1310 specified via the =~ or !~ operator, the $_ string is transliterated.  (The
1311 string specified with =~ must be a scalar variable, an array element, a
1312 hash element, or an assignment to one of those, i.e., an lvalue.)
1313
1314 A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
1315 does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
1316 For B<sed> devotees, C<y> is provided as a synonym for C<tr>.  If the
1317 SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
1318 its own pair of quotes, which may or may not be bracketing quotes,
1319 e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.
1320
1321 Note that C<tr> does B<not> do regular expression character classes
such as C<\d> or C<[:lower:]>.  The C<tr> operator is not equivalent to
1323 the tr(1) utility.  If you want to map strings between lower/upper
1324 cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider
1325 using the C<s> operator if you need regular expressions.
1326
1327 Note also that the whole range idea is rather unportable between
1328 character sets--and even within character sets they may cause results
1329 you probably didn't expect.  A sound principle is to use only ranges
1330 that begin from and end at either alphabets of equal case (a-e, A-E),
1331 or digits (0-4).  Anything else is unsafe.  If in doubt, spell out the
1332 character sets in full.
1333
1334 Options:
1335
1336     c   Complement the SEARCHLIST.
1337     d   Delete found but unreplaced characters.
1338     s   Squash duplicate replaced characters.
1339
1340 If the C</c> modifier is specified, the SEARCHLIST character set
1341 is complemented.  If the C</d> modifier is specified, any characters
1342 specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
1343 (Note that this is slightly more flexible than the behavior of some
1344 B<tr> programs, which delete anything they find in the SEARCHLIST,
1345 period.) If the C</s> modifier is specified, sequences of characters
1346 that were transliterated to the same character are squashed down
1347 to a single instance of the character.
1348
1349 If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
1350 exactly as specified.  Otherwise, if the REPLACEMENTLIST is shorter
1351 than the SEARCHLIST, the final character is replicated till it is long
1352 enough.  If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
The latter is useful for counting characters in a class or for
1354 squashing character sequences in a class.
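
The interaction of the modifiers with short or empty replacement lists is
easier to see side by side.  The following is a sketch with made-up
strings:

```perl
use strict;
use warnings;

my $word = "bookkeeper!!";

my $e_count = ($word =~ tr/e//);              # empty list: count only, no change
(my $squashed = $word)    =~ tr/a-z//s;       # squash runs of the same letter
(my $letters  = $word)    =~ tr/a-z//cd;      # delete everything except a-z
(my $padded   = "abcabc") =~ tr/a-c/A/;       # final "A" is replicated for b and c
(my $strict   = "abcabc") =~ tr/a-c/A/d;      # with /d, unmatched b and c are deleted

print "$e_count $squashed $letters $padded $strict\n";
```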
1355
1356 Examples:
1357
1358     $ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case
1359
1360     $cnt = tr/*/*/;             # count the stars in $_
1361
1362     $cnt = $sky =~ tr/*/*/;     # count the stars in $sky
1363
1364     $cnt = tr/0-9//;            # count the digits in $_
1365
1366     tr/a-zA-Z//s;               # bookkeeper -> bokeper
1367
1368     ($HOST = $host) =~ tr/a-z/A-Z/;
1369
1370     tr/a-zA-Z/ /cs;             # change non-alphas to single space
1371
1372     tr [\200-\377]
1373        [\000-\177];             # delete 8th bit
1374
1375 If multiple transliterations are given for a character, only the
1376 first one is used:
1377
1378     tr/AAA/XYZ/
1379
1380 will transliterate any A to X.
1381
1382 Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST is subjected to double-quote
1384 interpolation.  That means that if you want to use variables, you
1385 must use an eval():
1386
1387     eval "tr/$oldlist/$newlist/";
1388     die $@ if $@;
1389
1390     eval "tr/$oldlist/$newlist/, 1" or die $@;
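
A complete sketch of the eval() approach applied to a lexical variable.
The lists are hypothetical run-time values; since they are pasted into
source code, they should come from a trusted source.

```perl
use strict;
use warnings;

my ($oldlist, $newlist) = ('a-c', 'A-C');    # hypothetical run-time values
my $text = "abcdef";

# \$text is escaped so that the variable itself, not its value,
# appears in the code handed to eval.
my $count = eval "\$text =~ tr/$oldlist/$newlist/";
die $@ if $@;

print "$text ($count changed)\n";
```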
1391
1392 =item <<EOF
1393
1394 A line-oriented form of quoting is based on the shell "here-document"
1395 syntax.  Following a C<< << >> you specify a string to terminate
1396 the quoted material, and all lines following the current line down to
1397 the terminating string are the value of the item.  The terminating
1398 string may be either an identifier (a word), or some quoted text.  If
1399 quoted, the type of quotes you use determines the treatment of the
1400 text, just as in regular quoting.  An unquoted identifier works like
1401 double quotes.  There must be no space between the C<< << >> and
1402 the identifier, unless the identifier is quoted.  (If you put a space it
1403 will be treated as a null identifier, which is valid, and matches the first
1404 empty line.)  The terminating string must appear by itself (unquoted and
1405 with no surrounding whitespace) on the terminating line.
1406
1407        print <<EOF;
1408     The price is $Price.
1409     EOF
1410
1411        print << "EOF"; # same as above
1412     The price is $Price.
1413     EOF
1414
1415        print << `EOC`; # execute commands
1416     echo hi there
1417     echo lo there
1418     EOC
1419
1420        print <<"foo", <<"bar"; # you can stack them
1421     I said foo.
1422     foo
1423     I said bar.
1424     bar
1425
1426        myfunc(<< "THIS", 23, <<'THAT');
1427     Here's a line
1428     or two.
1429     THIS
1430     and here's another.
1431     THAT
1432
1433 Just don't forget that you have to put a semicolon on the end
1434 to finish the statement, as Perl doesn't know you're not going to
1435 try to do this:
1436
1437        print <<ABC
1438     179231
1439     ABC
1440        + 20;
1441
1442 If you want your here-docs to be indented with the
1443 rest of the code, you'll need to remove leading whitespace
1444 from each line manually:
1445
1446     ($quote = <<'FINIS') =~ s/^\s+//gm;
1447        The Road goes ever on and on,
1448        down from the door where it began.
1449     FINIS
1450
1451 If you use a here-doc within a delimited construct, such as in C<s///eg>,
1452 the quoted material must come on the lines following the final delimiter.
1453 So instead of
1454
1455     s/this/<<E . 'that'
1456     the other
1457     E
1458      . 'more '/eg;
1459
1460 you have to write
1461
1462     s/this/<<E . 'that'
1463      . 'more '/eg;
1464     the other
1465     E
1466
1467 If the terminating identifier is on the last line of the program, you
1468 must be sure there is a newline after it; otherwise, Perl will give the
1469 warning B<Can't find string terminator "END" anywhere before EOF...>.
1470
1471 Additionally, the quoting rules for the identifier are not related to
1472 Perl's quoting rules -- C<q()>, C<qq()>, and the like are not supported
1473 in place of C<''> and C<"">, and the only interpolation is for backslashing
1474 the quoting character:
1475
1476     print << "abc\"def";
1477     testing...
1478     abc"def
1479
1480 Finally, quoted strings cannot span multiple lines.  The general rule is
1481 that the identifier must be a string literal.  Stick with that, and you
1482 should be safe.
1483
1484 =back
1485
1486 =head2 Gory details of parsing quoted constructs
1487
When presented with something that might have several different
interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
principle to pick the most probable interpretation.  This strategy
is so successful that Perl programmers often do not suspect the
ambiguity of what they write.  But from time to time, Perl's
notions differ substantially from what the author honestly meant.
1494
1495 This section hopes to clarify how Perl handles quoted constructs.
1496 Although the most common reason to learn this is to unravel labyrinthine
1497 regular expressions, because the initial steps of parsing are the
1498 same for all quoting operators, they are all discussed together.
1499
1500 The most important Perl parsing rule is the first one discussed
1501 below: when processing a quoted construct, Perl first finds the end
1502 of that construct, then interprets its contents.  If you understand
1503 this rule, you may skip the rest of this section on the first
1504 reading.  The other rules are likely to contradict the user's
1505 expectations much less frequently than this first one.
1506
1507 Some passes discussed below are performed concurrently, but because
1508 their results are the same, we consider them individually.  For different
1509 quoting constructs, Perl performs different numbers of passes, from
1510 one to five, but these passes are always performed in the same order.
1511
1512 =over 4
1513
1514 =item Finding the end
1515
The first pass is finding the end of the quoted construct, whether
it be a multicharacter delimiter C<"\nEOF\n"> in the C<<<EOF>
construct, a C</> that terminates a C<qq//> construct, a C<]> which
terminates a C<qq[]> construct, or a C<< > >> which terminates a
fileglob started with C<< < >>.

When searching for single-character non-pairing delimiters, such
as C</>, combinations of C<\\> and C<\/> are skipped.  However,
when searching for a single-character pairing delimiter like C<[>,
combinations of C<\\>, C<\]>, and C<\[> are all skipped, and nested
C<[>, C<]> are skipped as well.  When searching for multicharacter
delimiters, nothing is skipped.
1528
1529 For constructs with three-part delimiters (C<s///>, C<y///>, and
1530 C<tr///>), the search is repeated once more.
1531
1532 During this search no attention is paid to the semantics of the construct.
1533 Thus:
1534
1535     "$hash{"$foo/$bar"}"
1536
1537 or:
1538
1539     m/
1540       bar       # NOT a comment, this slash / terminated m//!
1541      /x
1542
1543 do not form legal quoted expressions.   The quoted part ends on the
1544 first C<"> and C</>, and the rest happens to be a syntax error.
1545 Because the slash that terminated C<m//> was followed by a C<SPACE>,
1546 the example above is not C<m//x>, but rather C<m//> with no C</x>
1547 modifier.  So the embedded C<#> is interpreted as a literal C<#>.
1548
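The first failure above can be observed safely with a string C<eval>.  This is only a sketch: the hash contents and variable values are made up for illustration, and the work-around shown (building the key first) is one of several possible ones.

```perl
use strict;
use warnings;

my %hash = ('a/b' => 'found');
my ($foo, $bar) = ('a', 'b');

# Compile the problematic expression at run time so the failure can be
# caught: the string ends at the second double quote, not the last one.
my $result = eval q{ "$hash{"$foo/$bar"}" };
my $failed = defined $result ? 0 : 1;
print "nested quotes: ", ($failed ? "syntax error" : "ok"), "\n";

# Building the key first avoids nesting the same delimiter.
my $key   = "$foo/$bar";
my $value = "$hash{$key}";
print "separate key: $value\n";
```
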
1549 =item Removal of backslashes before delimiters
1550
During the second pass, text between the starting and ending
delimiters is copied to a safe location, and the C<\> is removed
from combinations consisting of C<\> and delimiter--or delimiters,
meaning both starting and ending delimiters will be handled,
should these differ.  This removal does not happen for
multi-character delimiters.  Note that the combination C<\\> is
left intact.
1557
1558 Starting from this step no information about the delimiters is
1559 used in parsing.
1560
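A small sketch of this pass, using C<q//> so that no other interpolation gets in the way (the strings themselves are arbitrary):

```perl
use strict;
use warnings;

# Non-pairing delimiter: \/ loses its backslash.
my $slash = q/usr\/local/;

# Pairing delimiters: both \] and \[ lose their backslash.
my $bracket = q[a\]b\[c];

print "$slash\n";     # prints "usr/local"
print "$bracket\n";   # prints "a]b[c"
```
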
1561 =item Interpolation
1562
1563 The next step is interpolation in the text obtained, which is now
1564 delimiter-independent.  There are four different cases.
1565
1566 =over 4
1567
1568 =item C<<<'EOF'>, C<m''>, C<s'''>, C<tr///>, C<y///>
1569
1570 No interpolation is performed.
1571
1572 =item C<''>, C<q//>
1573
1574 The only interpolation is removal of C<\> from pairs C<\\>.
1575
1576 =item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>
1577
1578 C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
1579 converted to corresponding Perl constructs.  Thus, C<"$foo\Qbaz$bar">
1580 is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
1581 The other combinations are replaced with appropriate expansions.
1582
Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
is interpolated in the usual way.  Something like C<"\Q\\E"> has
no C<\E> inside.  Instead, it has C<\Q>, C<\\>, and C<E>, so the
result is the same as for C<"\\\\E">.  As a general rule, backslashes
between C<\Q> and C<\E> may lead to counterintuitive results.  So,
C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
as C<"\\\t"> (since TAB is not alphanumeric).  Note also that:
1590
1591   $str = '\t';
1592   return "\Q$str";
1593
1594 may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
1595
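The difference between the two forms can be checked directly.  This is only an illustrative sketch; the variable names are not from the text above.

```perl
use strict;
use warnings;

my $escape_first = "\Q\t\E";   # "\t" becomes a TAB first, then gets quoted
my $str          = '\t';       # two characters: a backslash and "t"
my $quote_first  = "\Q$str";   # quotemeta applied to those two characters

# backslash + TAB versus backslash + backslash + "t"
printf "%d %d\n", length $escape_first, length $quote_first;   # prints "2 3"
```
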
1596 Interpolated scalars and arrays are converted internally to the C<join> and
1597 C<.> catenation operations.  Thus, C<"$foo XXX '@arr'"> becomes:
1598
1599   $foo . " XXX '" . (join $", @arr) . "'";
1600
1601 All operations above are performed simultaneously, left to right.
1602
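The equivalence with C<join> and C<.> described above can be verified with a short sketch (the values are arbitrary); it also shows that changing C<$"> changes how the array is interpolated.

```perl
use strict;
use warnings;

my $foo = 'head';
my @arr = (1, 2, 3);

my $interpolated = "$foo XXX '@arr'";
my $spelled_out  = $foo . " XXX '" . (join $", @arr) . "'";

my $with_commas;
{
    local $" = ',';    # list separator used for array interpolation
    $with_commas = "'@arr'";
}

print "$interpolated\n";   # prints "head XXX '1 2 3'"
print "$with_commas\n";    # prints "'1,2,3'"
```
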
Because the result of C<"\Q STRING \E"> has all metacharacters
quoted, there is no way to insert a literal C<$> or C<@> inside a
C<\Q\E> pair.  If protected by C<\>, C<$> will be quoted to become
C<"\\\$">; if not, it is interpreted as the start of an interpolated
scalar.
1608
1609 Note also that the interpolation code needs to make a decision on
1610 where the interpolated scalar ends.  For instance, whether
1611 C<< "a $b -> {c}" >> really means:
1612
1613   "a " . $b . " -> {c}";
1614
1615 or:
1616
1617   "a " . $b -> {c};
1618
Most of the time, the longest possible text that does not include
spaces between components and which contains matching braces or
brackets is chosen.  Because the outcome may be determined by voting
based on heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
1624
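A sketch of how spacing changes where the interpolated expression ends, assuming a hash reference in a lexical named C<$ref> (a name chosen here for illustration).  The C<@{[ ... ]}> form is one way to make the intended expression explicit.

```perl
use strict;
use warnings;

my $ref = { c => 'value' };

my $tight  = "a $ref->{c}";      # no spaces: the element is interpolated
my $spaced = "a $ref -> {c}";    # spaces: only $ref itself is interpolated
my $forced = "a @{[ $ref->{c} ]} -> {c}";   # explicit expression

print "$tight\n";    # prints "a value"
print "$spaced\n";   # something like "a HASH(0x...) -> {c}"
print "$forced\n";   # prints "a value -> {c}"
```
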
=item C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>
1626
1627 Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
1628 happens (almost) as with C<qq//> constructs, but the substitution
1629 of C<\> followed by RE-special chars (including C<\>) is not
1630 performed.  Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
1631 a C<#>-comment in a C<//x>-regular expression, no processing is
1632 performed whatsoever.  This is the first step at which the presence
1633 of the C<//x> modifier is relevant.
1634
Interpolation has several quirks: C<$|>, C<$(>, and C<$)> are not
interpolated, and constructs C<$var[SOMETHING]> are voted (by several
different estimators) to be either an array element or C<$var>
followed by an RE alternative.  This is where the notation
C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
array element C<-9>, not as a regular expression from the variable
C<$arr> followed by a digit, which would be the interpretation of
C</$arr[0-9]/>.  Since voting among different estimators may occur,
the result is not predictable.
1644
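Rather than relying on the estimators, braces can state the intent either way.  A minimal sketch with made-up data:

```perl
use strict;
use warnings;

my $var = 'id';
my @arr = ('zero', 'one');

# ${var} ends the variable name, so [0-9] is a character class.
my $class_match = 'id7' =~ /^${var}[0-9]$/ ? 1 : 0;

# ${arr[1]} can only be an element of @arr.
my $element_match = 'one' =~ /^${arr[1]}$/ ? 1 : 0;

print "$class_match $element_match\n";   # prints "1 1"
```
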
1645 It is at this step that C<\1> is begrudgingly converted to C<$1> in
1646 the replacement text of C<s///> to correct the incorrigible
1647 I<sed> hackers who haven't picked up the saner idiom yet.  A warning
1648 is emitted if the C<use warnings> pragma or the B<-w> command-line flag
1649 (that is, the C<$^W> variable) was set.
1650
The lack of processing of C<\\> creates specific restrictions on
the post-processed text.  If the delimiter is C</>, one cannot get
the combination C<\/> into the result of this step.  C</> will
finish the regular expression, C<\/> will be stripped to C</> on
the previous step, and C<\\/> will be left as is.  Because C</> is
equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be a character special to the
RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
alphanumeric char, as in:
1660
1661   m m ^ a \s* b mmx;
1662
In the RE above, which is intentionally obfuscated for illustration, the
delimiter is C<m>, the modifier is C<mx>, and after backslash-removal the
RE is the same as for C<m/ ^ a \s* b /mx>.  There's more than one
reason you're encouraged to restrict your delimiters to non-alphanumeric,
non-whitespace choices.
1668
1669 =back
1670
1671 This step is the last one for all constructs except regular expressions,
1672 which are processed further.
1673
1674 =item Interpolation of regular expressions
1675
Previous steps were performed during the compilation of Perl code,
but this one happens at run time--although it may be optimized to
be calculated at compile time if appropriate.  After the preprocessing
described above, and possibly after evaluation if catenation,
joining, casing translation, or metaquoting are involved, the
resulting I<string> is passed to the RE engine for compilation.
1682
1683 Whatever happens in the RE engine might be better discussed in L<perlre>,
1684 but for the sake of continuity, we shall do so here.
1685
1686 This is another step where the presence of the C<//x> modifier is
1687 relevant.  The RE engine scans the string from left to right and
1688 converts it to a finite automaton.
1689
1690 Backslashed characters are either replaced with corresponding
1691 literal strings (as with C<\{>), or else they generate special nodes
1692 in the finite automaton (as with C<\b>).  Characters special to the
1693 RE engine (such as C<|>) generate corresponding nodes or groups of
1694 nodes.  C<(?#...)> comments are ignored.  All the rest is either
1695 converted to literal strings to match, or else is ignored (as is
1696 whitespace and C<#>-style comments if C<//x> is present).
1697
1698 Parsing of the bracketed character class construct, C<[...]>, is
1699 rather different than the rule used for the rest of the pattern.
1700 The terminator of this construct is found using the same rules as
1701 for finding the terminator of a C<{}>-delimited construct, the only
1702 exception being that C<]> immediately following C<[> is treated as
1703 though preceded by a backslash.  Similarly, the terminator of
1704 C<(?{...})> is found using the same rules as for finding the
1705 terminator of a C<{}>-delimited construct.
1706
It is possible to inspect both the string given to the RE engine and the
resulting finite automaton.  See the arguments C<debug>/C<debugcolor>
in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
switch documented in L<perlrun/"Command Switches">.
1711
1712 =item Optimization of regular expressions
1713
1714 This step is listed for completeness only.  Since it does not change
1715 semantics, details of this step are not documented and are subject
1716 to change without notice.  This step is performed over the finite
1717 automaton that was generated during the previous pass.
1718
1719 It is at this stage that C<split()> silently optimizes C</^/> to
1720 mean C</^/m>.
1721
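The visible effect of this optimization is that C<split /^/> breaks a string into lines even without an explicit C</m>.  A short sketch with an arbitrary string:

```perl
use strict;
use warnings;

my $text  = "first\nsecond\nthird\n";
my @lines = split /^/, $text;    # treated as split /^/m

print scalar(@lines), "\n";      # prints "3"
print $lines[1];                 # prints "second" followed by a newline
```
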
1722 =back
1723
1724 =head2 I/O Operators
1725
1726 There are several I/O operators you should know about.
1727
1728 A string enclosed by backticks (grave accents) first undergoes
1729 double-quote interpolation.  It is then interpreted as an external
1730 command, and the output of that command is the value of the
1731 backtick string, like in a shell.  In scalar context, a single string
1732 consisting of all output is returned.  In list context, a list of
1733 values is returned, one per line of output.  (You can set C<$/> to use
1734 a different line terminator.)  The command is executed each time the
1735 pseudo-literal is evaluated.  The status value of the command is
1736 returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
1737 Unlike in B<csh>, no translation is done on the return data--newlines
1738 remain newlines.  Unlike in any of the shells, single quotes do not
1739 hide variable names in the command from interpretation.  To pass a
1740 literal dollar-sign through to the shell you need to hide it with a
1741 backslash.  The generalized form of backticks is C<qx//>.  (Because
1742 backticks always undergo shell expansion as well, see L<perlsec> for
1743 security concerns.)
1744
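The context-dependent return value can be seen in a small sketch.  It assumes a POSIX-like shell that provides C<printf>; single quotes are used as the C<qx> delimiter so that Perl passes the command to the shell without interpolating it.

```perl
use strict;
use warnings;

# Scalar context: all output in one string.
my $all = qx'printf "one\ntwo\n"';

# List context: one element per line of output.
my @lines = qx'printf "one\ntwo\n"';

# Exit status of the most recent command.
my $exit = $? >> 8;

print scalar(@lines), " lines, exit status $exit\n";
```
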
1745 In scalar context, evaluating a filehandle in angle brackets yields
1746 the next line from that file (the newline, if any, included), or
1747 C<undef> at end-of-file or on error.  When C<$/> is set to C<undef>
1748 (sometimes known as file-slurp mode) and the file is empty, it
1749 returns C<''> the first time, followed by C<undef> subsequently.
1750
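A sketch of the slurp-mode behavior on an empty file, using the standard File::Temp module to create a scratch file (the file and variable names are illustrative):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create an empty scratch file.
my ($out, $name) = tempfile(UNLINK => 1);
close $out or die "Cannot close $name: $!";

open my $in, '<', $name or die "Cannot open $name: $!";
my ($first, $second);
{
    local $/;            # undef: file-slurp mode
    $first  = <$in>;     # empty string on the first read
    $second = <$in>;     # undef afterwards
}
close $in;

print defined $first  ? "first: defined\n"  : "first: undef\n";
print defined $second ? "second: defined\n" : "second: undef\n";
```
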
1751 Ordinarily you must assign the returned value to a variable, but
1752 there is one situation where an automatic assignment happens.  If
1753 and only if the input symbol is the only thing inside the conditional
1754 of a C<while> statement (even if disguised as a C<for(;;)> loop),
1755 the value is automatically assigned to the global variable $_,
1756 destroying whatever was there previously.  (This may seem like an
1757 odd thing to you, but you'll use the construct in almost every Perl
1758 script you write.)  The $_ variable is not implicitly localized.
1759 You'll have to put a C<local $_;> before the loop if you want that
1760 to happen.
1761
1762 The following lines are equivalent:
1763
1764     while (defined($_ = <STDIN>)) { print; }
1765     while ($_ = <STDIN>) { print; }
1766     while (<STDIN>) { print; }
1767     for (;<STDIN>;) { print; }
1768     print while defined($_ = <STDIN>);
1769     print while ($_ = <STDIN>);
1770     print while <STDIN>;
1771
1772 This also behaves similarly, but avoids $_ :
1773
1774     while (my $line = <STDIN>) { print $line }
1775
In these loop constructs, the assigned value (whether assignment
is automatic or explicit) is then tested to see whether it is
defined.  The defined test avoids problems where a line has a string
value that would be treated as false by Perl, for example a "" or
a "0" with no trailing newline.  If you really mean for such values
to terminate the loop, they should be tested for explicitly:
1782
1783     while (($_ = <STDIN>) ne '0') { ... }
1784     while (<STDIN>) { last unless $_; ... }
1785
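The effect of the implicit C<defined> test can be demonstrated with an in-memory filehandle whose last line is a bare C<0>.  This is a sketch with made-up data:

```perl
use strict;
use warnings;

my $data = "first\n0";    # the last line is "0" with no newline
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {    # implicitly tested with defined()
    chomp $line;
    push @seen, $line;
}
close $fh;

print scalar(@seen), " lines read\n";   # prints "2 lines read"
```
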
In other boolean contexts, C<< <I<filehandle>> >> without an
explicit C<defined> test or comparison elicits a warning if the
C<use warnings> pragma or the B<-w>
command-line switch (the C<$^W> variable) is in effect.
1790
1791 The filehandles STDIN, STDOUT, and STDERR are predefined.  (The
1792 filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
1793 in packages, where they would be interpreted as local identifiers
1794 rather than global.)  Additional filehandles may be created with
1795 the open() function, amongst others.  See L<perlopentut> and
1796 L<perlfunc/open> for details on this.
1797
1798 If a <FILEHANDLE> is used in a context that is looking for
1799 a list, a list comprising all input lines is returned, one line per
1800 list element.  It's easy to grow to a rather large data space this
1801 way, so use with care.
1802
1803 <FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>.
1804 See L<perlfunc/readline>.
1805
1806 The null filehandle <> is special: it can be used to emulate the
1807 behavior of B<sed> and B<awk>.  Input from <> comes either from
1808 standard input, or from each file listed on the command line.  Here's
1809 how it works: the first time <> is evaluated, the @ARGV array is
1810 checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
1811 gives you standard input.  The @ARGV array is then processed as a list
1812 of filenames.  The loop
1813
1814     while (<>) {
1815         ...                     # code for each line
1816     }
1817
1818 is equivalent to the following Perl-like pseudo code:
1819
1820     unshift(@ARGV, '-') unless @ARGV;
1821     while ($ARGV = shift) {
1822         open(ARGV, $ARGV);
1823         while (<ARGV>) {
1824             ...         # code for each line
1825         }
1826     }
1827
1828 except that it isn't so cumbersome to say, and will actually work.
1829 It really does shift the @ARGV array and put the current filename
1830 into the $ARGV variable.  It also uses filehandle I<ARGV>
1831 internally--<> is just a synonym for <ARGV>, which
1832 is magical.  (The pseudo code above doesn't work because it treats
1833 <ARGV> as non-magical.)
1834
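To see the magic at work without a real command line, the sketch below fills C<@ARGV> with two scratch files created through the standard File::Temp module.  The file contents and the C<@log> array are purely illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small files to stand in for command-line arguments.
my @names;
for my $content ("a1\na2\n", "b1\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "Cannot close $name: $!";
    push @names, $name;
}

my @log;
{
    local @ARGV = @names;
    while (<>) {
        chomp;
        push @log, "$.:$_";    # $. keeps counting across files
    }
}

print "@log\n";    # prints "1:a1 2:a2 3:b1"
```
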
1835 You can modify @ARGV before the first <> as long as the array ends up
1836 containing the list of filenames you really want.  Line numbers (C<$.>)
1837 continue as though the input were one big happy file.  See the example
1838 in L<perlfunc/eof> for how to reset line numbers on each file.
1839
1840 If you want to set @ARGV to your own list of files, go right ahead.
1841 This sets @ARGV to all plain text files if no @ARGV was given:
1842
1843     @ARGV = grep { -f && -T } glob('*') unless @ARGV;
1844
1845 You can even set them to pipe commands.  For example, this automatically
1846 filters compressed arguments through B<gzip>:
1847
1848     @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
1849
1850 If you want to pass switches into your script, you can use one of the
1851 Getopts modules or put a loop on the front like this:
1852
1853     while ($_ = $ARGV[0], /^-/) {
1854         shift;
1855         last if /^--$/;
1856         if (/^-D(.*)/) { $debug = $1 }
1857         if (/^-v/)     { $verbose++  }
1858         # ...           # other switches
1859     }
1860
1861     while (<>) {
1862         # ...           # code for each line
1863     }
1864
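As a sketch of the module route, the standard Getopt::Std module can parse the same two switches.  The command line is simulated here by assigning to C<@ARGV>; the switch values and file name are hypothetical.

```perl
use strict;
use warnings;
use Getopt::Std qw(getopts);

# Simulated command line, for illustration only.
local @ARGV = ('-v', '-Dtrace', 'input.txt');

my %opt;
getopts('vD:', \%opt)
    or die "Usage: $0 [-v] [-D level] file ...\n";

# Switches are removed from @ARGV; the remaining entries are file names.
print "verbose: $opt{v}, debug: $opt{D}, files: @ARGV\n";
```
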
1865 The <> symbol will return C<undef> for end-of-file only once.
1866 If you call it again after this, it will assume you are processing another
1867 @ARGV list, and if you haven't set @ARGV, will read input from STDIN.
1868
1869 If what the angle brackets contain is a simple scalar variable (e.g.,
1870 <$foo>), then that variable contains the name of the
1871 filehandle to input from, or its typeglob, or a reference to the
1872 same.  For example:
1873
1874     $fh = \*STDIN;
1875     $line = <$fh>;
1876
1877 If what's within the angle brackets is neither a filehandle nor a simple
1878 scalar variable containing a filehandle name, typeglob, or typeglob
1879 reference, it is interpreted as a filename pattern to be globbed, and
1880 either a list of filenames or the next filename in the list is returned,
1881 depending on context.  This distinction is determined on syntactic
1882 grounds alone.  That means C<< <$x> >> is always a readline() from
1883 an indirect handle, but C<< <$hash{key}> >> is always a glob().
1884 That's because $x is a simple scalar variable, but C<$hash{key}> is
1885 not--it's a hash element.
1886
1887 One level of double-quote interpretation is done first, but you can't
1888 say C<< <$foo> >> because that's an indirect filehandle as explained
1889 in the previous paragraph.  (In older versions of Perl, programmers
1890 would insert curly brackets to force interpretation as a filename glob:
1891 C<< <${foo}> >>.  These days, it's considered cleaner to call the
1892 internal function directly as C<glob($foo)>, which is probably the right
1893 way to have done it in the first place.)  For example:
1894
1895     while (<*.c>) {
1896         chmod 0644, $_;
1897     }
1898
1899 is roughly equivalent to:
1900
1901     open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
1902     while (<FOO>) {
1903         chomp;
1904         chmod 0644, $_;
1905     }
1906
1907 except that the globbing is actually done internally using the standard
1908 C<File::Glob> extension.  Of course, the shortest way to do the above is:
1909
1910     chmod 0644, <*.c>;
1911
1912 A (file)glob evaluates its (embedded) argument only when it is
1913 starting a new list.  All values must be read before it will start
1914 over.  In list context, this isn't important because you automatically
1915 get them all anyway.  However, in scalar context the operator returns
1916 the next value each time it's called, or C<undef> when the list has
1917 run out.  As with filehandle reads, an automatic C<defined> is
1918 generated when the glob occurs in the test part of a C<while>,
1919 because legal glob returns (e.g. a file called F<0>) would otherwise
1920 terminate the loop.  Again, C<undef> is returned only once.  So if
1921 you're expecting a single value from a glob, it is much better to
1922 say
1923
1924     ($file) = <blurch*>;
1925
1926 than
1927
1928     $file = <blurch*>;
1929
1930 because the latter will alternate between returning a filename and
1931 returning false.
1932
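The alternation can be reproduced with a scratch directory holding a single matching file.  This sketch uses the standard File::Temp module and assumes the temporary directory path contains no whitespace, since C<glob> splits its argument on whitespace.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/blurch1" or die "Cannot create file: $!";
close $fh;

# List assignment: the glob is fully evaluated each time.
my @list_results;
for (1 .. 3) {
    my ($file) = glob("$dir/blurch*");
    push @list_results, defined $file ? 'name' : 'undef';
}

# Scalar assignment: the same glob steps through its list,
# returns undef once when it runs out, then starts over.
my @scalar_results;
for (1 .. 3) {
    my $file = glob("$dir/blurch*");
    push @scalar_results, defined $file ? 'name' : 'undef';
}

print "@list_results\n";     # prints "name name name"
print "@scalar_results\n";   # prints "name undef name"
```
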
1933 If you're trying to do variable interpolation, it's definitely better
1934 to use the glob() function, because the older notation can cause people
1935 to become confused with the indirect filehandle notation.
1936
1937     @files = glob("$dir/*.[ch]");
1938     @files = glob($files[$i]);
1939
1940 =head2 Constant Folding
1941
1942 Like C, Perl does a certain amount of expression evaluation at
1943 compile time whenever it determines that all arguments to an
1944 operator are static and have no side effects.  In particular, string
1945 concatenation happens at compile time between literals that don't do
1946 variable substitution.  Backslash interpolation also happens at
1947 compile time.  You can say
1948
1949     'Now is the time for all' . "\n" .
1950         'good men to come to.'
1951
1952 and this all reduces to one string internally.  Likewise, if
1953 you say
1954
1955     foreach $file (@filenames) {
1956         if (-s $file > 5 + 100 * 2**16) {  }
1957     }
1958
1959 the compiler will precompute the number which that expression
1960 represents so that the interpreter won't have to.
1961
1962 =head2 Bitwise String Operators
1963
1964 Bitstrings of any size may be manipulated by the bitwise operators
1965 (C<~ | & ^>).
1966
1967 If the operands to a binary bitwise op are strings of different
1968 sizes, B<|> and B<^> ops act as though the shorter operand had
1969 additional zero bits on the right, while the B<&> op acts as though
1970 the longer operand were truncated to the length of the shorter.
1971 The granularity for such extension or truncation is one or more
1972 bytes.
1973
1974     # ASCII-based examples
1975     print "j p \n" ^ " a h";            # prints "JAPH\n"
1976     print "JA" | "  ph\n";              # prints "japh\n"
1977     print "japh\nJunk" & '_____';       # prints "JAPH\n";
1978     print 'p N$' ^ " E<H\n";            # prints "Perl\n";
1979
1980 If you are intending to manipulate bitstrings, be certain that
1981 you're supplying bitstrings: If an operand is a number, that will imply
1982 a B<numeric> bitwise operation.  You may explicitly show which type of
1983 operation you intend by using C<""> or C<0+>, as in the examples below.
1984
1985     $foo =  150  |  105 ;       # yields 255  (0x96 | 0x69 is 0xFF)
1986     $foo = '150' |  105 ;       # yields 255
1987     $foo =  150  | '105';       # yields 255
1988     $foo = '150' | '105';       # yields string '155' (under ASCII)
1989
1990     $baz = 0+$foo & 0+$bar;     # both ops explicitly numeric
1991     $biz = "$foo" ^ "$bar";     # both ops explicitly stringy
1992
1993 See L<perlfunc/vec> for information on how to manipulate individual bits
1994 in a bit vector.
1995
1996 =head2 Integer Arithmetic
1997
1998 By default, Perl assumes that it must do most of its arithmetic in
1999 floating point.  But by saying
2000
2001     use integer;
2002
2003 you may tell the compiler that it's okay to use integer operations
2004 (if it feels like it) from here to the end of the enclosing BLOCK.
2005 An inner BLOCK may countermand this by saying
2006
2007     no integer;
2008
2009 which lasts until the end of that BLOCK.  Note that this doesn't
2010 mean everything is only an integer, merely that Perl may use integer
2011 operations if it is so inclined.  For example, even under C<use
2012 integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
2013 or so.
2014
2015 Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
2016 and ">>") always produce integral results.  (But see also
2017 L<Bitwise String Operators>.)  However, C<use integer> still has meaning for
2018 them.  By default, their results are interpreted as unsigned integers, but
2019 if C<use integer> is in effect, their results are interpreted
2020 as signed integers.  For example, C<~0> usually evaluates to a large
2021 integral value.  However, C<use integer; ~0> is C<-1> on twos-complement
2022 machines.
2023
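A brief sketch of these effects, with C<use integer> confined to an inner block.  The C<-1> result assumes a two's-complement machine, as noted above.

```perl
use strict;
use warnings;

my $float_div = 10 / 3;    # floating-point division by default

my ($int_div, $int_not, $root);
{
    use integer;
    $int_div = 10 / 3;     # 3: integer division
    $int_not = ~0;         # -1: result interpreted as signed
    $root    = sqrt(2);    # still a floating-point result
}

my $unsigned_not = ~0;     # large unsigned value outside the block

print "$int_div $int_not\n";   # prints "3 -1"
```
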
2024 =head2 Floating-point Arithmetic
2025
2026 While C<use integer> provides integer-only arithmetic, there is no
2027 analogous mechanism to provide automatic rounding or truncation to a
2028 certain number of decimal places.  For rounding to a certain number
2029 of digits, sprintf() or printf() is usually the easiest route.
2030 See L<perlfaq4>.
2031
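For instance, a minimal sketch of rounding for display with sprintf() (the values are arbitrary and deliberately avoid exact halfway cases):

```perl
use strict;
use warnings;

my $two_places   = sprintf("%.2f", 3.14159);   # "3.14"
my $three_places = sprintf("%.3f", 2.71828);   # "2.718"

print "$two_places $three_places\n";
```
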
2032 Floating-point numbers are only approximations to what a mathematician
2033 would call real numbers.  There are infinitely more reals than floats,
2034 so some corners must be cut.  For example:
2035
2036     printf "%.20g\n", 123456789123456789;
2037     #        produces 123456789123456784
2038
Testing floating-point numbers for exact equality or inequality is
not a good idea.  Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
significant digits (which is what the C<%g> precision specifies).
See Knuth, volume II, for a more robust treatment of this topic.
2044
2045     sub fp_equal {
2046         my ($X, $Y, $POINTS) = @_;
2047         my ($tX, $tY);
2048         $tX = sprintf("%.${POINTS}g", $X);
2049         $tY = sprintf("%.${POINTS}g", $Y);
2050         return $tX eq $tY;
2051     }
2052
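A usage sketch follows.  It repeats the function in condensed form so that it runs on its own, and it assumes Perl's numbers are IEEE 754 doubles, which is the common case but not guaranteed on every build.

```perl
use strict;
use warnings;

sub fp_equal {
    my ($X, $Y, $POINTS) = @_;
    return sprintf("%.${POINTS}g", $X) eq sprintf("%.${POINTS}g", $Y);
}

my $sum = 0.1 + 0.2;

# Exact comparison fails because neither side is exactly representable.
my $exact = ($sum == 0.3) ? 1 : 0;

# Comparing the first 10 significant digits succeeds.
my $approx = fp_equal($sum, 0.3, 10) ? 1 : 0;

print "exact: $exact, approx: $approx\n";
```
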
The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and other mathematical and trigonometric functions.
The Math::Complex module (part of the standard perl distribution)
defines mathematical functions that work on both the reals and the
imaginary numbers.  Math::Complex is not as efficient as POSIX, but
POSIX can't work with complex numbers.
2059
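A short sketch of floor() and ceil() from POSIX, next to the built-in int(), which truncates toward zero (the inputs are arbitrary):

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

my @results = (
    floor(3.7),     # 3
    ceil(3.2),      # 4
    floor(-3.5),    # -4
    ceil(-3.5),     # -3
    int(-3.5),      # -3: int() truncates toward zero
);

print "@results\n";    # prints "3 4 -4 -3 -3"
```
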
2060 Rounding in financial applications can have serious implications, and
2061 the rounding method used should be specified precisely.  In these
2062 cases, it probably pays not to trust whichever system rounding is
2063 being used by Perl, but to instead implement the rounding function you
2064 need yourself.
2065
2066 =head2 Bigger Numbers
2067
2068 The standard Math::BigInt and Math::BigFloat modules provide
2069 variable-precision arithmetic and overloaded operators, although
2070 they're currently pretty slow. At the cost of some space and
2071 considerable speed, they avoid the normal pitfalls associated with
2072 limited-precision representations.
2073
2074     use Math::BigInt;
2075     $x = Math::BigInt->new('123456789123456789');
2076     print $x * $x;
2077
2078     # prints +15241578780673678515622620750190521
2079
Several modules let you calculate with unlimited precision (bound
only by memory and CPU time) or with fixed precision.  There are also
some non-standard modules that provide faster implementations via
external C libraries.

Here is a short but incomplete summary:
2086
2087         Math::Fraction          big, unlimited fractions like 9973 / 12967
2088         Math::String            treat string sequences like numbers
2089         Math::FixedPrecision    calculate with a fixed precision
2090         Math::Currency          for currency calculations
2091         Bit::Vector             manipulate bit vectors fast (uses C)
2092         Math::BigIntFast        Bit::Vector wrapper for big numbers
2093         Math::Pari              provides access to the Pari C library
2094         Math::BigInteger        uses an external C library
2095         Math::Cephes            uses external Cephes C library (no big numbers)
2096         Math::Cephes::Fraction  fractions via the Cephes library
2097         Math::GMP               another one using an external C library
2098
2099 Choose wisely.
2100
2101 =cut