pod/perlfaq4.pod

   1 =head1 NAME
   2
   3 perlfaq4 - Data Manipulation
   4
   5 =head1 DESCRIPTION
   6
   7 This section of the FAQ answers questions related to manipulating
   8 numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
   9
  10 =head1 Data: Numbers
  11
  12 =head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
  13
  14 Internally, your computer represents floating-point numbers in binary.
  15 Digital (as in powers of two) computers cannot store all numbers
  16 exactly.  Some real numbers lose precision in the process.  This is a
  17 problem with how computers store numbers and affects all computer
  18 languages, not just Perl.
  19
  20 L<perlnumber> shows the gory details of number representations and
  21 conversions.
  22
  23 To limit the number of decimal places in your numbers, you can use the
  24 printf or sprintf function.  See the L<"Floating Point
  25 Arithmetic"|perlop> for more details.
  26
  27         printf "%.2f", 10/3;
  28
  29         my $number = sprintf "%.2f", 10/3;
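
As a rough sketch of the two usual workarounds (the variable names are only illustrative): round for display with C<sprintf>, and compare floating-point results against a small tolerance instead of using C<==>. The tolerance of C<1e-9> is an assumption; pick one that suits your data.

```perl
use strict;
use warnings;

my $sum = 0.1 + 0.2;                    # not exactly 0.3 in binary floating point

# Round only when displaying or storing the value as text.
my $shown = sprintf "%.2f", $sum;       # "0.30"

# Compare with a tolerance rather than with ==.
my $tolerance = 1e-9;                   # assumed tolerance, adjust as needed
my $close     = abs($sum - 0.3) < $tolerance;

print "shown: $shown, close enough: ", ($close ? "yes" : "no"), "\n";
```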
  30
  31 =head2 Why is int() broken?
  32
  33 Your C<int()> is most probably working just fine.  It's the numbers that
  34 aren't quite what you think.
  35
  36 First, see the answer to "Why am I getting long decimals
  37 (eg, 19.9499999999999) instead of the numbers I should be getting
  38 (eg, 19.95)?".
  39
  40 For example, this
  41
  42         print int(0.6/0.2-2), "\n";
  43
  44 will on most computers print 0, not 1, because even such simple
  45 numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
  46 numbers.  What you think of in the above as 'three' is really more like
  47 2.9999999999999995559.
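
If what you want is the nearest integer rather than truncation, one hedged workaround is to round instead of calling C<int()>. A minimal sketch:

```perl
use strict;
use warnings;

my $value = 0.6 / 0.2 - 2;              # mathematically 1, usually stored as slightly less

my $truncated = int($value);            # truncates toward zero
my $rounded   = sprintf "%.0f", $value; # rounds to the nearest integer: "1"

print "truncated: $truncated, rounded: $rounded\n";
```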
  48
  49 =head2 Why isn't my octal data interpreted correctly?
  50
  51 (contributed by brian d foy)
  52
  53 You're probably trying to convert a string to a number, which Perl only
  54 converts as a decimal number. When Perl converts a string to a number, it
  55 ignores leading spaces and zeroes, then assumes the rest of the digits
  56 are in base 10:
  57
  58         my $string = '0644';
  59
  60         print $string + 0;  # prints 644
  61
  62         print $string + 44; # prints 688, certainly not octal!
  63
  64 This problem usually involves one of the Perl built-ins that has the
  65 same name as a Unix command that uses octal numbers as arguments on the
  66 command line. In this example, C<chmod> on the command line knows that
  67 its first argument is octal because that's what it does:
  68
  69         %prompt> chmod 644 file
  70
  71 If you want to use the same literal digits (644) in Perl, you have to tell
  72 Perl to treat them as octal numbers either by prefixing the digits with
  73 a C<0> or using C<oct>:
  74
  75         chmod(     0644, $file);   # right, has leading zero
  76         chmod( oct(644), $file );  # also correct
  77
  78 The problem comes in when you take your numbers from something that Perl
  79 thinks is a string, such as a command line argument in C<@ARGV>:
  80
  81         chmod( $ARGV[0],      $file);   # wrong, even if "0644"
  82
  83         chmod( oct($ARGV[0]), $file );  # correct, treat string as octal
  84
  85 You can always check the value you're using by printing it in octal
  86 notation to ensure it matches what you think it should be. Print it
  87 in octal and decimal format:
  88
  89         printf "0%o %d", $number, $number;
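
Putting these pieces together, here is a small sketch that validates a mode given as a string before converting it. The helper name C<mode_from_string> is hypothetical, not a Perl built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string such as "644" or "0644" into a number.
sub mode_from_string {
    my ($string) = @_;
    die "Not an octal mode: $string\n"
        unless $string =~ /\A[0-7]{1,4}\z/;
    return oct $string;                 # oct() treats the digits as base 8
}

my $mode = mode_from_string('0644');
printf "0%o %d\n", $mode, $mode;        # prints "0644 420"
```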
  90
  91 =head2 Does Perl have a round() function?  What about ceil() and floor()?  Trig functions?
  92
  93 Remember that C<int()> merely truncates toward 0.  For rounding to a
  94 certain number of digits, C<sprintf()> or C<printf()> is usually the
  95 easiest route.
  96
  97         printf("%.3f", 3.1415926535);   # prints 3.142
  98
  99 The C<POSIX> module (part of the standard Perl distribution)
 100 implements C<ceil()>, C<floor()>, and a number of other mathematical
 101 and trigonometric functions.
 102
 103         use POSIX;
 104         $ceil   = ceil(3.5);   # 4
 105         $floor  = floor(3.5);  # 3
 106
 107 In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
 108 module.  With 5.004, the C<Math::Trig> module (part of the standard Perl
 109 distribution) implements the trigonometric functions. Internally it
 110 uses the C<Math::Complex> module and some functions can break out from
 111 the real axis into the complex plane, for example the inverse sine of
 112 2.
 113
 114 Rounding in financial applications can have serious implications, and
 115 the rounding method used should be specified precisely.  In these
 116 cases, it probably pays not to trust whichever system rounding is
 117 being used by Perl, but to instead implement the rounding function you
 118 need yourself.
 119
 120 To see why, notice how you'll still have an issue on half-way-point
 121 alternation:
 122
 123         for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
 124
 125         0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
 126         0.8 0.8 0.9 0.9 1.0 1.0
 127
 128 Don't blame Perl.  It's the same as in C.  IEEE says we have to do
 129 this. Perl numbers whose absolute values are integers under 2**31 (on
 130 32 bit machines) will work pretty much like mathematical integers.
 131 Other numbers are not guaranteed.
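
One common way to sidestep these surprises for money (a sketch under assumptions, not the only approach) is to keep amounts as integer cents and write the rounding rule out explicitly. The helper C<apply_rate_half_up> and its units (cents and basis points) are hypothetical, and it assumes non-negative values small enough to stay exact integers.

```perl
use strict;
use warnings;

# Hypothetical helper: multiply an amount in cents by a rate in basis
# points (1/100 of a percent), rounding halves up.
sub apply_rate_half_up {
    my ($cents, $basis_points) = @_;
    my $product   = $cents * $basis_points;       # exact integer
    my $remainder = $product % 10_000;
    my $whole     = ($product - $remainder) / 10_000;
    return $remainder >= 5_000 ? $whole + 1 : $whole;
}

print apply_rate_half_up(12_345, 825), "\n";      # 8.25% of 123.45, in cents
```

Because the half-way test is done on integers, the result does not depend on how a decimal fraction happens to be stored in binary.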
 132
 133 =head2 How do I convert between numeric representations/bases/radixes?
 134
 135 As always with Perl there is more than one way to do it.  Below are a
 136 few examples of approaches to making common conversions between number
 137 representations.  This is intended to be representative rather than
 138 exhaustive.
 139
 140 Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
 141 module from CPAN. The reason you might choose C<Bit::Vector> over the
 142 perl built in functions is that it works with numbers of ANY size,
 143 that it is optimized for speed on some operations, and for at least
 144 some programmers the notation might be familiar.
 145
 146 =over 4
 147
 148 =item How do I convert hexadecimal into decimal
 149
 150 Using perl's built in conversion of C<0x> notation:
 151
 152         $dec = 0xDEADBEEF;
 153
 154 Using the C<hex> function:
 155
 156         $dec = hex("DEADBEEF");
 157
 158 Using C<pack>:
 159
 160         $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
 161
 162 Using the CPAN module C<Bit::Vector>:
 163
 164         use Bit::Vector;
 165         $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
 166         $dec = $vec->to_Dec();
 167
 168 =item How do I convert from decimal to hexadecimal
 169
 170 Using C<sprintf>:
 171
 172         $hex = sprintf("%X", 3735928559); # upper case A-F
 173         $hex = sprintf("%x", 3735928559); # lower case a-f
 174
 175 Using C<unpack>:
 176
 177         $hex = unpack("H*", pack("N", 3735928559));
 178
 179 Using C<Bit::Vector>:
 180
 181         use Bit::Vector;
 182         $vec = Bit::Vector->new_Dec(32, -559038737);
 183         $hex = $vec->to_Hex();
 184
 185 And C<Bit::Vector> supports odd bit counts:
 186
 187         use Bit::Vector;
 188         $vec = Bit::Vector->new_Dec(33, 3735928559);
 189         $vec->Resize(32); # suppress leading 0 if unwanted
 190         $hex = $vec->to_Hex();
 191
 192 =item How do I convert from octal to decimal
 193
 194 Using Perl's built in conversion of numbers with leading zeros:
 195
 196         $dec = 033653337357; # note the leading 0!
 197
 198 Using the C<oct> function:
 199
 200         $dec = oct("33653337357");
 201
 202 Using C<Bit::Vector>:
 203
 204         use Bit::Vector;
 205         $vec = Bit::Vector->new(32);
 206         $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
 207         $dec = $vec->to_Dec();
 208
 209 =item How do I convert from decimal to octal
 210
 211 Using C<sprintf>:
 212
 213         $oct = sprintf("%o", 3735928559);
 214
 215 Using C<Bit::Vector>:
 216
 217         use Bit::Vector;
 218         $vec = Bit::Vector->new_Dec(32, -559038737);
 219         $oct = reverse join('', $vec->Chunk_List_Read(3));
 220
 221 =item How do I convert from binary to decimal
 222
 223 Perl 5.6 lets you write binary numbers directly with
 224 the C<0b> notation:
 225
 226         $number = 0b10110110;
 227
 228 Using C<oct>:
 229
 230         my $input = "10110110";
 231         $decimal = oct( "0b$input" );
 232
 233 Using C<pack> and C<ord>:
 234
 235         $decimal = ord(pack('B8', '10110110'));
 236
 237 Using C<pack> and C<unpack> for larger strings:
 238
 239         $int = unpack("N", pack("B32",
 240         substr("0" x 32 . "11110101011011011111011101111", -32)));
 241         $dec = sprintf("%d", $int);
 242
 243         # substr() is used to left pad a 32 character string with zeros.
 244
 245 Using C<Bit::Vector>:
 246
 247         $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
 248         $dec = $vec->to_Dec();
 249
 250 =item How do I convert from decimal to binary
 251
 252 Using C<sprintf> (perl 5.6+):
 253
 254         $bin = sprintf("%b", 3735928559);
 255
 256 Using C<unpack>:
 257
 258         $bin = unpack("B*", pack("N", 3735928559));
 259
 260 Using C<Bit::Vector>:
 261
 262         use Bit::Vector;
 263         $vec = Bit::Vector->new_Dec(32, -559038737);
 264         $bin = $vec->to_Bin();
 265
 266 The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
 267 are left as an exercise to the inclined reader.
 268
 269 =back
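
As a starting point for those remaining transformations, one approach is to go through a plain number using only built-ins. This sketch assumes the values fit in a native integer:

```perl
use strict;
use warnings;

my $hex = "DEADBEEF";

my $bin      = sprintf "%b", hex $hex;        # hex -> binary
my $oct      = sprintf "%o", hex $hex;        # hex -> octal
my $hex_back = sprintf "%X", oct "0b$bin";    # binary -> hex

print "$bin\n$oct\n$hex_back\n";
```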
 270
 271 =head2 Why doesn't & work the way I want it to?
 272
 273 The behavior of binary arithmetic operators depends on whether they're
 274 used on numbers or strings.  The operators treat a string as a series
 275 of bits and work with that (the string C<"3"> is the bit pattern
 276 C<00110011>).  The operators work with the binary form of a number
 277 (the number C<3> is treated as the bit pattern C<00000011>).
 278
 279 So, saying C<11 & 3> performs the "and" operation on numbers (yielding
 280 C<3>).  Saying C<"11" & "3"> performs the "and" operation on strings
 281 (yielding C<"1">).
 282
 283 Most problems with C<&> and C<|> arise because the programmer thinks
 284 they have a number but really it's a string.  The rest arise because
 285 the programmer says:
 286
 287         if ("\020\020" & "\101\101") {
 288                 # ...
 289                 }
 290
 291 but a string consisting of two null bytes (the result of C<"\020\020"
 292 & "\101\101">) is not a false value in Perl.  You need:
 293
 294         if ( ("\020\020" & "\101\101") =~ /[^\000]/) {
 295                 # ...
 296                 }
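
When the operands may be strings that merely look like numbers, a simple hedge is to force numeric context before applying the operator, for example by adding zero. A short sketch (written without the C<bitwise> feature enabled):

```perl
use strict;
use warnings;

my ($x, $y) = ("11", "3");              # strings, perhaps read from input

my $string_and  = $x & $y;              # "and" on the characters: "1"
my $numeric_and = ($x + 0) & ($y + 0);  # "and" on the numbers: 3

print "string: $string_and, numeric: $numeric_and\n";
```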
 297
 298 =head2 How do I multiply matrices?
 299
 300 Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
 301 or the PDL extension (also available from CPAN).
 302
 303 =head2 How do I perform an operation on a series of integers?
 304
 305 To call a function on each element in an array, and collect the
 306 results, use:
 307
 308         @results = map { my_func($_) } @array;
 309
 310 For example:
 311
 312         @triple = map { 3 * $_ } @single;
 313
 314 To call a function on each element of an array, but ignore the
 315 results:
 316
 317         foreach $iterator (@array) {
 318                 some_func($iterator);
 319                 }
 320
 321 To call a function on each integer in a (small) range, you B<can> use:
 322
 323         @results = map { some_func($_) } (5 .. 25);
 324
 325 but you should be aware that the C<..> operator creates an array of
 326 all integers in the range.  This can take a lot of memory for large
 327 ranges.  Instead use:
 328
 329         @results = ();
 330         for ($i=5; $i < 500_005; $i++) {
 331                 push(@results, some_func($i));
 332                 }
 333
 334 This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
 335 loop will iterate over the range, without creating the entire range.
 336
 337         for my $i (5 .. 500_005) {
 338                 push(@results, some_func($i));
 339                 }
 340
 341 will not create a list of 500,000 integers.
 342
 343 =head2 How can I output Roman numerals?
 344
 345 Get the C<Roman> module from CPAN: http://www.cpan.org/modules/by-module/Roman
 346
 347 =head2 Why aren't my random numbers random?
 348
 349 If you're using a version of Perl before 5.004, you must call C<srand>
 350 once at the start of your program to seed the random number generator.
 351
 352          BEGIN { srand() if $] < 5.004 }
 353
 354 5.004 and later automatically call C<srand> at the beginning.  Don't
 355 call C<srand> more than once--you make your numbers less random,
 356 rather than more.
 357
 358 Computers are good at being predictable and bad at being random
 359 (despite appearances caused by bugs in your programs :-).  The
 360 F<random> article in the "Far More Than You Ever Wanted To Know"
 361 collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy
 362 of Tom Phoenix, talks more about this.  John von Neumann said, "Anyone
 363 who attempts to generate random numbers by deterministic means is, of
 364 course, living in a state of sin."
 365
 366 If you want numbers that are more random than C<rand> with C<srand>
 367 provides, you should also check out the C<Math::TrulyRandom> module from
 368 CPAN.  It uses the imperfections in your system's timer to generate
 369 random numbers, but this takes quite a while.  If you want a better
 370 pseudorandom generator than comes with your operating system, look at
 371 "Numerical Recipes in C" at http://www.nr.com/ .
 372
 373 =head2 How do I get a random number between X and Y?
 374
 375 To get a random number between two values, you can use the C<rand()>
 376 built-in to get a random number between 0 and 1. From there, you shift
 377 that into the range that you want.
 378
 379 C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
 380 what you want to have perl figure out is a random number in the range
 381 from 0 to the difference between your I<X> and I<Y>.
 382
 383 That is, to get a number between 10 and 15, inclusive, you want a
 384 random number between 0 and 5 that you can then add to 10.
 385
 386         my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
 387
 388 Hence you derive the following simple function to abstract
 389 that. It selects a random integer between the two given
 390 integers (inclusive). For example: C<random_int_between(50,120)>.
 391
 392         sub random_int_between {
 393                 my($min, $max) = @_;
 394                 # Assumes that the two arguments are integers themselves!
 395                 return $min if $min == $max;
 396                 ($min, $max) = ($max, $min)  if  $min > $max;
 397                 return $min + int rand(1 + $max - $min);
 398                 }
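
A quick, informal way to check such a function is to call it many times and confirm that every result stays within the requested bounds. This sketch repeats the function so that it runs on its own:

```perl
use strict;
use warnings;

sub random_int_between {
    my ($min, $max) = @_;
    ($min, $max) = ($max, $min) if $min > $max;
    return $min + int rand(1 + $max - $min);
}

my %seen;
$seen{ random_int_between(10, 15) }++ for 1 .. 1_000;

# Every key should be one of 10, 11, 12, 13, 14, or 15.
print join(", ", sort { $a <=> $b } keys %seen), "\n";
```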
 399
 400 =head1 Data: Dates
 401
 402 =head2 How do I find the day or week of the year?
 403
 404 The localtime function returns the day of the year.  Without an
 405 argument localtime uses the current time.
 406
 407         $day_of_year = (localtime)[7];   # 0-based: January 1 is day 0
 408
 409 The C<POSIX> module can also format a date as the day of the year or
 410 week of the year.
 411
 412         use POSIX qw/strftime/;
 413         my $day_of_year  = strftime "%j", localtime;
 414         my $week_of_year = strftime "%W", localtime;
 415
 416 To get the day or week of the year for any date, use C<POSIX>'s
 417 C<mktime> to get a time in epoch seconds for the argument to localtime.
 418
 419         use POSIX qw/mktime strftime/;
 420         my $week_of_year = strftime "%W",
 421                 localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
 422
 423 The C<Date::Calc> module provides two functions to calculate these.
 424
 425         use Date::Calc qw( Day_of_Year Week_of_Year );
 426         my $day_of_year  = Day_of_Year(  1987, 12, 18 );
 427         my $week_of_year = Week_of_Year( 1987, 12, 18 );
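
If C<Date::Calc> isn't installed, the standard C<Time::Local> and C<POSIX> modules can do much the same for an arbitrary date. This sketch works in UTC (via C<timegm> and C<gmtime>) so that the answer doesn't depend on the local time zone:

```perl
use strict;
use warnings;
use POSIX       qw(strftime);
use Time::Local qw(timegm);

# 18 December 1987; months are 0-based, so December is 11.
my $epoch = timegm(0, 0, 0, 18, 11, 1987);

my $day_of_year  = strftime "%j", gmtime $epoch;   # 001..366
my $week_of_year = strftime "%W", gmtime $epoch;   # weeks start on Monday

print "day $day_of_year, week $week_of_year\n";
```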
 428
 429 =head2 How do I find the current century or millennium?
 430
 431 Use the following simple functions:
 432
 433         sub get_century    {
 434                 return int((((localtime(shift || time))[5] + 1999))/100);
 435                 }
 436
 437         sub get_millennium {
 438                 return 1+int((((localtime(shift || time))[5] + 1899))/1000);
 439                 }
 440
 441 On some systems, the C<POSIX> module's C<strftime()> function has been
 442 extended in a non-standard way to use a C<%C> format, which they
 443 sometimes claim is the "century". It isn't, because on most such
 444 systems, this is only the first two digits of the four-digit year, and
 445 thus cannot be used to reliably determine the current century or
 446 millennium.
 447
 448 =head2 How can I compare two dates and find the difference?
 449
 450 (contributed by brian d foy)
 451
 452 You could just store all your dates as a number and then subtract.
 453 Life isn't always that simple though. If you want to work with
 454 formatted dates, the C<Date::Manip>, C<Date::Calc>, or C<DateTime>
 455 modules can help you.
 456
 457 =head2 How can I take a string and turn it into epoch seconds?
 458
 459 If it's a regular enough string that it always has the same format,
 460 you can split it up and pass the parts to C<timelocal> in the standard
 461 C<Time::Local> module.  Otherwise, you should look into the C<Date::Calc>
 462 and C<Date::Manip> modules from CPAN.
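
For a fixed format such as C<YYYY-MM-DD hh:mm:ss>, the split-and-convert approach might look like this sketch. It assumes the string is in UTC and therefore uses C<timegm>; use C<timelocal> the same way for local time. The helper name C<epoch_from_string> is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: parse "YYYY-MM-DD hh:mm:ss" (UTC) into epoch seconds.
sub epoch_from_string {
    my ($string) = @_;
    my ($year, $mon, $mday, $hour, $min, $sec) =
        $string =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
        or die "Unrecognized date format: $string\n";
    return timegm($sec, $min, $hour, $mday, $mon - 1, $year);  # month is 0-based
}

print epoch_from_string("2001-11-13 01:00:00"), "\n";   # 1005613200
```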
 463
 464 =head2 How can I find the Julian Day?
 465
 466 (contributed by brian d foy and Dave Cross)
 467
 468 You can use the C<Time::JulianDay> module available on CPAN.  Ensure
 469 that you really want to find a Julian day, though, as many people have
 470 different ideas about Julian days.  See
 471 http://www.hermetic.ch/cal_stud/jdn.htm for instance.
 472
 473 You can also try the C<DateTime> module, which can convert a date/time
 474 to a Julian Day.
 475
 476         $ perl -MDateTime -le'print DateTime->today->jd'
 477         2453401.5
 478
 479 Or the modified Julian Day
 480
 481         $ perl -MDateTime -le'print DateTime->today->mjd'
 482         53401
 483
 484 Or even the day of the year (which is what some people think of as a
 485 Julian day)
 486
 487         $ perl -MDateTime -le'print DateTime->today->doy'
 488         31
 489
 490 =head2 How do I find yesterday's date?
 491 X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
 492 X<daylight saving time> X<day> X<Today_and_Now> X<localtime>
 493 X<timelocal>
 494
 495 (contributed by brian d foy)
 496
 497 Use one of the Date modules. The C<DateTime> module makes it simple, and
 498 gives you the same time of day, only the day before.
 499
 500         use DateTime;
 501
 502         my $yesterday = DateTime->now->subtract( days => 1 );
 503
 504         print "Yesterday was $yesterday\n";
 505
 506 You can also use the C<Date::Calc> module using its C<Today_and_Now>
 507 function.
 508
 509         use Date::Calc qw( Today_and_Now Add_Delta_DHMS );
 510
 511         my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );
 512
 513         print "@date_time\n";
 514
 515 Most people try to use the time rather than the calendar to figure out
 516 dates, but that assumes that days are twenty-four hours each.  For
 517 most people, there are two days a year when they aren't: the switch to
 518 and from summer time throws this off. Let the modules do the work.
 519
 520 If you absolutely must do it yourself (or can't use one of the
 521 modules), here's a solution using C<Time::Local>, which comes with
 522 Perl:
 523
 524         # contributed by Gunnar Hjalmarsson
 525          use Time::Local;
 526          my $today = timelocal 0, 0, 12, ( localtime )[3..5];
 527          my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
 528          printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
 529
 530 In this case, you measure the day starting at noon, and subtract 24
 531 hours. Even if the length of the calendar day is 23 or 25 hours,
 532 you'll still end up on the previous calendar day, although not at
 533 noon. Since you don't care about the time, the one hour difference
 534 doesn't matter and you end up with the previous date.
 535
 536 =head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
 537
 538 Short answer: No, Perl does not have a Year 2000 problem.  Yes, Perl is
 539 Y2K compliant (whatever that means). The programmers you've hired to
 540 use it, however, probably are not.
 541
 542 Long answer: The question betrays a misunderstanding of the issue.
 543 Perl is just as Y2K compliant as your pencil--no more, and no less.
 544 Can you use your pencil to write a non-Y2K-compliant memo?  Of course
 545 you can.  Is that the pencil's fault?  Of course it isn't.
 546
 547 The date and time functions supplied with Perl (gmtime and localtime)
 548 supply adequate information to determine the year well beyond 2000
 549 (2038 is when trouble strikes for 32-bit machines).  The year returned
 550 by these functions when used in a list context is the year minus 1900.
 551 For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
 552 number. To avoid the year 2000 problem simply do not treat the year as
 553 a 2-digit number.  It isn't.
 554
 555 When gmtime() and localtime() are used in scalar context they return
 556 a timestamp string that contains a fully-expanded year.  For example,
 557 C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
 558 2001".  There's no year 2000 problem here.
 559
 560 That doesn't mean that Perl can't be used to create non-Y2K compliant
 561 programs.  It can.  But so can your pencil.  It's the fault of the user,
 562 not the language.  At the risk of inflaming the NRA: "Perl doesn't
 563 break Y2K, people do."  See http://www.perl.org/about/y2k.html for
 564 a longer exposition.
 565
 566 =head1 Data: Strings
 567
 568 =head2 How do I validate input?
 569
 570 (contributed by brian d foy)
 571
 572 There are many ways to ensure that values are what you expect or
 573 want to accept. Besides the specific examples that we cover in the
 574 perlfaq, you can also look at the modules with "Assert" and "Validate"
 575 in their names, along with other modules such as C<Regexp::Common>.
 576
 577 Some modules have validation for particular types of input, such
 578 as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
 579 and C<Data::Validate::IP>.
 580
 581 =head2 How do I unescape a string?
 582
 583 It depends just what you mean by "escape".  URL escapes are dealt
 584 with in L<perlfaq9>.  Shell escapes with the backslash (C<\>)
 585 character are removed with
 586
 587         s/\\(.)/$1/g;
 588
 589 This won't expand C<"\n"> or C<"\t"> or any other special escapes.
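
If you do want the common special escapes expanded as well, one possible sketch uses a lookup table. The table below covers only C<\n> and C<\t>; it is an illustration, not a complete list.

```perl
use strict;
use warnings;

my %special = ( n => "\n", t => "\t" );

# Single quotes keep the backslashes literal in this sample input.
my $escaped = 'tab:\t backslash:\\\\ newline:\n';

# Replace known escapes from the table; otherwise just drop the backslash.
(my $plain = $escaped) =~ s/\\(.)/exists $special{$1} ? $special{$1} : $1/ges;

print $plain;
```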
 590
 591 =head2 How do I remove consecutive pairs of characters?
 592
 593 (contributed by brian d foy)
 594
 595 You can use the substitution operator to find pairs of characters (or
 596 runs of characters) and replace them with a single instance. In this
 597 substitution, we find a character in C<(.)>. The memory parentheses
 598 store the matched character in the back-reference C<\1> and we use
 599 that to require that the same thing immediately follow it. We replace
 600 that part of the string with the character in C<$1>.
 601
 602         s/(.)\1/$1/g;
 603
 604 We can also use the transliteration operator, C<tr///>. In this
 605 example, the search list side of our C<tr///> contains nothing, but
 606 the C<c> option complements that so it contains everything. The
 607 replacement list also contains nothing, so the transliteration is
 608 almost a no-op since it won't do any replacements (or more exactly,
 609 replace the character with itself). However, the C<s> option squashes
 610 duplicated and consecutive characters in the string so a character
 611 does not show up next to itself
 612
 613         my $str = 'Haarlem';   # in the Netherlands
 614         $str =~ tr///cs;       # Now Harlem, like in New York
 615
 616 =head2 How do I expand function calls in a string?
 617
 618 (contributed by brian d foy)
 619
 620 This is documented in L<perlref>, and although it's not the easiest
 621 thing to read, it does work. In each of these examples, we call the
 622 function inside the braces used to dereference a reference. If we
 623 have more than one return value, we can construct and dereference an
 624 anonymous array. In this case, we call the function in list context.
 625
 626         print "The time values are @{ [localtime] }.\n";
 627
 628 If we want to call the function in scalar context, we have to do a bit
 629 more work. We can really have any code we like inside the braces, so
 630 we simply have to end with the scalar reference, although how you do
 631 that is up to you, and you can use code inside the braces. Note that
 632 the use of parens creates a list context, so we need C<scalar> to
 633 force the scalar context on the function:
 634
 635         print "The time is ${\(scalar localtime)}.\n";
 636
 637         print "The time is ${ my $x = localtime; \$x }.\n";
 638
 639 If your function already returns a reference, you don't need to create
 640 the reference yourself.
 641
 642         sub timestamp { my $t = localtime; \$t }
 643
 644         print "The time is ${ timestamp() }.\n";
 645
 646 The C<Interpolation> module can also do a lot of magic for you. You can
 647 specify a variable name, in this case C<E>, to set up a tied hash that
 648 does the interpolation for you. It has several other methods to do this
 649 as well.
 650
 651         use Interpolation E => 'eval';
 652         print "The time values are $E{localtime()}.\n";
 653
 654 In most cases, it is probably easier to simply use string concatenation,
 655 which also forces scalar context.
 656
 657         print "The time is " . localtime() . ".\n";
 658
 659 =head2 How do I find matching/nesting anything?
 660
 661 This isn't something that a single non-recursive regular expression
 662 can do, no matter how complicated.  To find something between two single
 663 characters, a pattern like C</x([^x]*)x/> will get the intervening
 664 bits in $1. For multiple ones, then something more like
 665 C</alpha(.*?)omega/> would be needed. But none of these deals with
 666 nested patterns.  For balanced expressions using C<(>, C<{>, C<[> or
 667 C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
 668 L<perlre/(??{ code })>.  For other cases, you'll have to write a
 669 parser.
 670
 671 If you are serious about writing a parser, there are a number of
 672 modules or oddities that will make your life a lot easier.  There are
 673 the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
 674 C<Text::Balanced>; and the C<byacc> program. Starting from perl 5.8
 675 the C<Text::Balanced> is part of the standard distribution.
 676
 677 One simple destructive, inside-out approach that you might try is to
 678 pull out the smallest nesting parts one at a time:
 679
 680         while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
 681                 # do something with $1
 682                 }
 683
 684 A more complicated and sneaky approach is to make Perl's regular
 685 expression engine do it for you.  This is courtesy Dean Inada, and
 686 rather has the nature of an Obfuscated Perl Contest entry, but it
 687 really does work:
 688
 689         # $_ contains the string to parse
 690         # BEGIN and END are the opening and closing markers for the
 691         # nested text.
 692
 693         @( = ('(','');
 694         @) = (')','');
 695         ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
 696         @$ = (eval{/$re/},$@!~/unmatched/i);
 697         print join("\n",@$[0..$#$]) if( $$[-1] );
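
On Perl 5.10 and later, recursive patterns offer another option for balanced delimiters. The sketch below is one way to capture the first balanced parenthesized group; the sample string is made up, and L<perlre> describes C<(?1)> and possessive quantifiers in detail.

```perl
use strict;
use warnings;

my $text = "call(alpha(beta)gamma) tail(x)";

# Group 1 matches "(", then any mix of non-parenthesis runs or nested
# groups (the recursion into group 1), then ")".
my $balanced = qr/ ( \( (?: [^()]++ | (?1) )* \) ) /x;

if ($text =~ $balanced) {
    print "First balanced group: $1\n";    # (alpha(beta)gamma)
}
```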
 698
 699 =head2 How do I reverse a string?
 700
 701 Use C<reverse()> in scalar context, as documented in
 702 L<perlfunc/reverse>.
 703
 704         $reversed = reverse $string;
 705
 706 =head2 How do I expand tabs in a string?
 707
 708 You can do it yourself:
 709
 710         1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
 711
 712 Or you can just use the C<Text::Tabs> module (part of the standard Perl
 713 distribution).
 714
 715         use Text::Tabs;
 716         @expanded_lines = expand(@lines_with_tabs);
 717
 718 =head2 How do I reformat a paragraph?
 719
 720 Use C<Text::Wrap> (part of the standard Perl distribution):
 721
 722         use Text::Wrap;
 723         print wrap("\t", '  ', @paragraphs);
 724
 725 The paragraphs you give to C<Text::Wrap> should not contain embedded
 726 newlines.  C<Text::Wrap> doesn't justify the lines (flush-right).
 727
 728 Or use the CPAN module C<Text::Autoformat>.  Formatting files can be
 729 easily done by making a shell alias, like so:
 730
 731         alias fmt="perl -i -MText::Autoformat -n0777 \
 732                 -e 'print autoformat $_, {all=>1}' $*"
 733
 734 See the documentation for C<Text::Autoformat> to appreciate its many
 735 capabilities.
 736
 737 =head2 How can I access or change N characters of a string?
 738
 739 You can access the first characters of a string with substr().
 740 To get the first character, for example, start at position 0
 741 and grab the string of length 1.
 742
 743
 744         $string = "Just another Perl Hacker";
 745         $first_char = substr( $string, 0, 1 );  #  'J'
 746
 747 To change part of a string, you can use the optional fourth
 748 argument which is the replacement string.
 749
 750         substr( $string, 13, 4, "Perl 5.8.0" );
 751
 752 You can also use substr() as an lvalue.
 753
 754         substr( $string, 13, 4 ) =  "Perl 5.8.0";
 755
 756 =head2 How do I change the Nth occurrence of something?
 757
 758 You have to keep track of N yourself.  For example, let's say you want
 759 to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
 760 C<"whosoever"> or C<"whomsoever">, case insensitively.  These
 761 all assume that $_ contains the string to be altered.
 762
 763         $count = 0;
 764         s{((whom?)ever)}{
 765         ++$count == 5       # is it the 5th?
 766             ? "${2}soever"  # yes, swap
 767             : $1            # renege and leave it there
 768                 }ige;
 769
 770 In the more general case, you can use the C</g> modifier in a C<while>
 771 loop, keeping count of matches.
 772
 773         $WANT = 3;
 774         $count = 0;
 775         $_ = "One fish two fish red fish blue fish";
 776         while (/(\w+)\s+fish\b/gi) {
 777                 if (++$count == $WANT) {
 778                         print "The third fish is a $1 one.\n";
 779                         }
 780                 }
 781
 782 That prints out: C<"The third fish is a red one.">  You can also use a
 783 repetition count and repeated pattern like this:
 784
 785         /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
 786
 787 =head2 How can I count the number of occurrences of a substring within a string?
 788
 789 There are a number of ways, with varying efficiency.  If you want a
 790 count of a certain single character (X) within a string, you can use the
 791 C<tr///> function like so:
 792
 793         $string = "ThisXlineXhasXsomeXx'sXinXit";
 794         $count = ($string =~ tr/X//);
 795         print "There are $count X characters in the string";
 796
 797 This is fine if you are just looking for a single character.  However,
 798 if you are trying to count multiple character substrings within a
 799 larger string, C<tr///> won't work.  What you can do is wrap a while()
 800 loop around a global pattern match.  For example, let's count negative
 801 integers:
 802
 803         $string = "-9 55 48 -2 23 -76 4 14 -44";
 804         while ($string =~ /-\d+/g) { $count++ }
 805         print "There are $count negative numbers in the string";
 806
 807 Another version uses a global match in list context, then assigns the
 808 result to a scalar, producing a count of the number of matches.
 809
 810         $count = () = $string =~ /-\d+/g;
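
For a fixed (non-pattern) substring, C<index> in a loop is another option. This sketch counts non-overlapping occurrences; the helper name C<count_substring> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal substring.
sub count_substring {
    my ($string, $substring) = @_;
    return 0 unless length $substring;       # avoid looping forever
    my ($count, $position) = (0, 0);
    while ( ( $position = index($string, $substring, $position) ) != -1 ) {
        $count++;
        $position += length $substring;      # skip past this occurrence
    }
    return $count;
}

print count_substring("One fish two fish red fish blue fish", "fish"), "\n";  # 4
```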
 811
 812 =head2 How do I capitalize all the words on one line?
 813 X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>
 814
 815 (contributed by brian d foy)
 816
 817 Damian Conway's L<Text::Autoformat> handles all of the thinking
 818 for you.
 819
 820         use Text::Autoformat;
 821         my $x = "Dr. Strangelove or: How I Learned to Stop ".
 822           "Worrying and Love the Bomb";
 823
 824         print $x, "\n";
 825         for my $style (qw( sentence title highlight )) {
 826                 print autoformat($x, { case => $style }), "\n";
 827                 }
 828
 829 How do you want to capitalize those words?
 830
 831         FRED AND BARNEY'S LODGE        # all uppercase
 832         Fred And Barney's Lodge        # title case
 833         Fred and Barney's Lodge        # highlight case
 834
 835 It's not as easy a problem as it looks. How many words do you think
 836 are in there? Wait for it... wait for it.... If you answered 5
 837 you're right. Perl words are groups of C<\w+>, but that's not what
 838 you want to capitalize. How is Perl supposed to know not to capitalize
 839 that C<s> after the apostrophe? You could try a regular expression:
 840
 841         $string =~ s/ (
 842                                  (^\w)    #at the beginning of the line
 843                                    |      # or
 844                                  (\s\w)   #preceded by whitespace
 845                                    )
 846                                 /\U$1/xg;
 847
 848         $string =~ s/([\w']+)/\u\L$1/g;
 849
 850 Now, what if you don't want to capitalize that "and"? Just use
 851 L<Text::Autoformat> and get on with the next problem. :)
 852
 853 =head2 How can I split a [character] delimited string except when inside [character]?
 854
 855 Several modules can handle this sort of parsing--C<Text::Balanced>,
 856 C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.
 857
 858 Take the example case of trying to split a string that is
 859 comma-separated into its different fields. You can't use C<split(/,/)>
 860 because you shouldn't split if the comma is inside quotes.  For
 861 example, take a data line like this:
 862
 863         SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
 864
 865 Due to the restriction of the quotes, this is a fairly complex
 866 problem.  Thankfully, we have Jeffrey Friedl, author of
 867 I<Mastering Regular Expressions>, to handle these for us.  He
 868 suggests (assuming your string is contained in C<$text>):
 869
 870          @new = ();
 871          push(@new, $+) while $text =~ m{
 872                  "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
 873                 | ([^,]+),?
 874                 | ,
 875                 }gx;
 876          push(@new, undef) if substr($text,-1,1) eq ',';
 877
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).
 881
 882 Alternatively, the C<Text::ParseWords> module (part of the standard
 883 Perl distribution) lets you say:
 884
 885         use Text::ParseWords;
 886         @new = quotewords(",", 0, $text);
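
As an illustration only, here is that call applied to the sample data
line shown earlier. The sketch assumes C<$text> holds a single record
with no embedded newlines.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A false second argument asks quotewords to drop the quotes from each field.
my @new = quotewords(",", 0, $text);

printf "%d fields\n", scalar @new;
print "[$_]\n" for @new;
```

Keep in mind that C<quotewords> also treats single quotes as quoting
characters, so a field containing a lone apostrophe may not parse the
way you expect.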
 887
 888 =head2 How do I strip blank space from the beginning/end of a string?
 889
 890 (contributed by brian d foy)
 891
 892 A substitution can do this for you. For a single line, you want to
 893 replace all the leading or trailing whitespace with nothing. You
 894 can do that with a pair of substitutions.
 895
 896         s/^\s+//;
 897         s/\s+$//;
 898
 899 You can also write that as a single substitution, although it turns
 900 out the combined statement is slower than the separate ones. That
 901 might not matter to you, though.
 902
 903         s/^\s+|\s+$//g;
 904
In this regular expression, the alternation matches either at the
beginning or the end of the string since the alternation has a lower
precedence than the anchors. With the C</g> flag, the substitution
makes all possible matches, so it gets both. Remember, the trailing
newline matches the C<\s+>, and the C<$> anchor can match to the
physical end of the string, so the newline disappears too. Just add
the newline to the output, which has the added benefit of preserving
"blank" (consisting entirely of whitespace) lines which the C<^\s+>
would remove all by itself.
 914
 915         while( <> )
 916                 {
 917                 s/^\s+|\s+$//g;
 918                 print "$_\n";
 919                 }
 920
 921 For a multi-line string, you can apply the regular expression
 922 to each logical line in the string by adding the C</m> flag (for
 923 "multi-line"). With the C</m> flag, the C<$> matches I<before> an
 924 embedded newline, so it doesn't remove it. It still removes the
 925 newline at the end of the string.
 926
 927         $string =~ s/^\s+|\s+$//gm;
 928
Remember that lines consisting entirely of whitespace will disappear,
since the first part of the alternation can match the entire string
and replace it with nothing. If you need to keep embedded blank lines,
you have to do a little more work. Instead of matching any whitespace
(since that includes a newline), just match the other whitespace.
 934
 935         $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
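
To see the difference between the last two substitutions, here is a
small sketch on a made-up three-line string whose middle line holds
only spaces. The names C<$text>, C<$greedy>, and C<$careful> are
illustrative.

```perl
use strict;
use warnings;

my $text = "  one  \n   \n  two  \n";

# \s also matches newlines, so the whitespace-only line is swallowed.
(my $greedy = $text) =~ s/^\s+|\s+$//gm;

# Matching only tabs, form feeds, and spaces leaves every newline alone.
(my $careful = $text) =~ s/^[\t\f ]+|[\t\f ]+$//mg;

print "greedy:  ", join("|", split /\n/, $greedy,  -1), "\n";
print "careful: ", join("|", split /\n/, $careful, -1), "\n";
```

The first result has lost both the blank line and the final newline,
while the second keeps the line structure of the original string.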
 936
 937 =head2 How do I pad a string with blanks or pad a number with zeroes?
 938
 939 In the following examples, C<$pad_len> is the length to which you wish
 940 to pad the string, C<$text> or C<$num> contains the string to be padded,
 941 and C<$pad_char> contains the padding character. You can use a single
 942 character string constant instead of the C<$pad_char> variable if you
 943 know what it is in advance. And in the same way you can use an integer in
 944 place of C<$pad_len> if you know the pad length in advance.
 945
 946 The simplest method uses the C<sprintf> function. It can pad on the left
 947 or right with blanks and on the left with zeroes and it will not
 948 truncate the result. The C<pack> function can only pad strings on the
 949 right with blanks and it will truncate the result to a maximum length of
 950 C<$pad_len>.
 951
 952         # Left padding a string with blanks (no truncation):
 953         $padded = sprintf("%${pad_len}s", $text);
 954         $padded = sprintf("%*s", $pad_len, $text);  # same thing
 955
 956         # Right padding a string with blanks (no truncation):
 957         $padded = sprintf("%-${pad_len}s", $text);
 958         $padded = sprintf("%-*s", $pad_len, $text); # same thing
 959
 960         # Left padding a number with 0 (no truncation):
 961         $padded = sprintf("%0${pad_len}d", $num);
 962         $padded = sprintf("%0*d", $pad_len, $num); # same thing
 963
 964         # Right padding a string with blanks using pack (will truncate):
 965         $padded = pack("A$pad_len",$text);
 966
 967 If you need to pad with a character other than blank or zero you can use
 968 one of the following methods.  They all generate a pad string with the
 969 C<x> operator and combine that with C<$text>. These methods do
 970 not truncate C<$text>.
 971
 972 Left and right padding with any character, creating a new string:
 973
 974         $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
 975         $padded = $text . $pad_char x ( $pad_len - length( $text ) );
 976
 977 Left and right padding with any character, modifying C<$text> directly:
 978
 979         substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
 980         $text .= $pad_char x ( $pad_len - length( $text ) );
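
A short worked example may make the truncation difference concrete.
The values below are made up for illustration.

```perl
use strict;
use warnings;

my $pad_len  = 5;
my $pad_char = '*';

my $short = 'ab';
my $long  = 'abcdefgh';

my $left    = sprintf("%${pad_len}s", $short);    # "   ab"
my $right   = sprintf("%-${pad_len}s", $short);   # "ab   "
my $zeroes  = sprintf("%0${pad_len}d", 42);       # "00042"
my $kept    = sprintf("%${pad_len}s", $long);     # "abcdefgh", not truncated
my $cut     = pack("A$pad_len", $long);           # "abcde", truncated
my $starred = $pad_char x ( $pad_len - length($short) ) . $short;  # "***ab"

print "[$_]\n" for $left, $right, $zeroes, $kept, $cut, $starred;
```

When the text is already longer than C<$pad_len>, the repeat count for
C<x> is negative, which Perl treats as zero, so nothing is added.
Recent versions of Perl warn about a negative repeat count under
C<use warnings>.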
 981
 982 =head2 How do I extract selected columns from a string?
 983
 984 (contributed by brian d foy)
 985
If you know which columns contain the data, you can
use C<substr> to extract a single column.
 988
 989         my $column = substr( $line, $start_column, $length );
 990
 991 You can use C<split> if the columns are separated by whitespace or
 992 some other delimiter, as long as whitespace or the delimiter cannot
 993 appear as part of the data.
 994
 995         my $line    = ' fred barney   betty   ';
 996         my @columns = split /\s+/, $line;
 997                 # ( '', 'fred', 'barney', 'betty' );
 998
 999         my $line    = 'fred||barney||betty';
1000         my @columns = split /\|/, $line;
1001                 # ( 'fred', '', 'barney', '', 'betty' );
1002
1003 If you want to work with comma-separated values, don't do this since
1004 that format is a bit more complicated. Use one of the modules that
1005 handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
1006 C<Text::CSV_PP>.
1007
1008 If you want to break apart an entire line of fixed columns, you can use
1009 C<unpack> with the A (ASCII) format. By using a number after the format
1010 specifier, you can denote the column width. See the C<pack> and C<unpack>
1011 entries in L<perlfunc> for more details.
1012
        my @fields = unpack( "A8 A8 A8 A16 A4", $line );
1014
1015 Note that spaces in the format argument to C<unpack> do not denote literal
1016 spaces. If you have space separated data, you may want C<split> instead.
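
As a sketch with made-up data, assume a hypothetical record made of
three 8-character columns:

```perl
use strict;
use warnings;

# Each name is padded with spaces to fill an 8-character column.
my $line = "fred    barney  betty   ";

# unpack takes the template first and the data second.
my @fields = unpack("A8 A8 A8", $line);

print join("|", @fields), "\n";
```

The C<A> format strips the trailing spaces from each field, so the
padding does not end up in C<@fields>.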
1017
1018 =head2 How do I find the soundex value of a string?
1019
1020 (contributed by brian d foy)
1021
You can use the C<Text::Soundex> module. If you want to do fuzzy or close
matching, you might also try the C<String::Approx>,
C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.
1025
1026 =head2 How can I expand variables in text strings?
1027
1028 (contributed by brian d foy)
1029
1030 If you can avoid it, don't, or if you can use a templating system,
1031 such as C<Text::Template> or C<Template> Toolkit, do that instead. You
1032 might even be able to get the job done with C<sprintf> or C<printf>:
1033
1034         my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
1035
However, for the one-off simple case where I don't want to pull out a
full templating system, I'll use a string that has two Perl scalar
variables in it. In this example, I want to expand C<$foo> and C<$bar>
to their variables' values:
1040
1041         my $foo = 'Fred';
1042         my $bar = 'Barney';
1043         $string = 'Say hello to $foo and $bar';
1044
1045 One way I can do this involves the substitution operator and a double
1046 C</e> flag.  The first C</e> evaluates C<$1> on the replacement side and
1047 turns it into C<$foo>. The second /e starts with C<$foo> and replaces
1048 it with its value. C<$foo>, then, turns into 'Fred', and that's finally
1049 what's left in the string:
1050
1051         $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1052
1053 The C</e> will also silently ignore violations of strict, replacing
1054 undefined variable names with the empty string. Since I'm using the
1055 C</e> flag (twice even!), I have all of the same security problems I
1056 have with C<eval> in its string form. If there's something odd in
1057 C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
1058 I could get myself in trouble.
1059
1060 To get around the security problem, I could also pull the values from
1061 a hash instead of evaluating variable names. Using a single C</e>, I
1062 can check the hash to ensure the value exists, and if it doesn't, I
1063 can replace the missing value with a marker, in this case C<???> to
1064 signal that I missed something:
1065
1066         my $string = 'This has $foo and $bar';
1067
1068         my %Replacements = (
1069                 foo  => 'Fred',
1070                 );
1071
1072         # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1073         $string =~ s/\$(\w+)/
1074                 exists $Replacements{$1} ? $Replacements{$1} : '???'
1075                 /eg;
1076
1077         print $string;
1078
1079 =head2 What's wrong with always quoting "$vars"?
1080
1081 The problem is that those double-quotes force
1082 stringification--coercing numbers and references into strings--even
1083 when you don't want them to be strings.  Think of it this way:
1084 double-quote expansion is used to produce new strings.  If you already
1085 have a string, why do you need more?
1086
1087 If you get used to writing odd things like these:
1088
1089         print "$var";           # BAD
1090         $new = "$old";          # BAD
1091         somefunc("$var");       # BAD
1092
1093 You'll be in trouble.  Those should (in 99.8% of the cases) be
1094 the simpler and more direct:
1095
1096         print $var;
1097         $new = $old;
1098         somefunc($var);
1099
1100 Otherwise, besides slowing you down, you're going to break code when
1101 the thing in the scalar is actually neither a string nor a number, but
1102 a reference:
1103
1104         func(\@array);
1105         sub func {
1106                 my $aref = shift;
1107                 my $oref = "$aref";  # WRONG
1108                 }
1109
1110 You can also get into subtle problems on those few operations in Perl
1111 that actually do care about the difference between a string and a
1112 number, such as the magical C<++> autoincrement operator or the
1113 syscall() function.
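
A small sketch of such operations follows. The magical increment is
the one named above. The bitwise C<|> operator is an additional
illustration that is not mentioned in the original answer, and the
sketch assumes the C<bitwise> feature is not enabled.

```perl
use strict;
use warnings;

# The magical increment applies to a value that has only been used as a
# string and matches /^[a-zA-Z]*[0-9]*\z/.
my $label = "Az";
$label++;                      # "Ba", not a number

# Bitwise operators treat numbers and strings differently.
my ($x, $y) = (12, 3);
my $numeric = $x | $y;         # 15: OR of the two integers
my $stringy = "$x" | "$y";     # "32": OR of the characters, one by one

print "$label $numeric $stringy\n";
```

The needless quotes in the last assignment silently change the answer
from a number to a string of OR-ed characters.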
1114
1115 Stringification also destroys arrays.
1116
1117         @lines = `command`;
1118         print "@lines";     # WRONG - extra blanks
1119         print @lines;       # right
1120
1121 =head2 Why don't my E<lt>E<lt>HERE documents work?
1122
1123 Check for these three things:
1124
1125 =over 4
1126
1127 =item There must be no space after the E<lt>E<lt> part.
1128
1129 =item There (probably) should be a semicolon at the end.
1130
1131 =item You can't (easily) have any space in front of the tag.
1132
1133 =back
1134
1135 If you want to indent the text in the here document, you
1136 can do this:
1137
1138     # all in one
1139     ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1140         your text
1141         goes here
1142     HERE_TARGET
1143
1144 But the HERE_TARGET must still be flush against the margin.
1145 If you want that indented also, you'll have to quote
1146 in the indentation.
1147
1148     ($quote = <<'    FINIS') =~ s/^\s+//gm;
1149             ...we will have peace, when you and all your works have
1150             perished--and the works of your dark master to whom you
1151             would deliver us. You are a liar, Saruman, and a corrupter
1152             of men's hearts.  --Theoden in /usr/src/perl/taint.c
1153         FINIS
1154     $quote =~ s/\s+--/\n--/;
1155
1156 A nice general-purpose fixer-upper function for indented here documents
1157 follows.  It expects to be called with a here document as its argument.
1158 It looks to see whether each line begins with a common substring, and
1159 if so, strips that substring off.  Otherwise, it takes the amount of leading
1160 whitespace found on the first line and removes that much off each
1161 subsequent line.
1162
1163     sub fix {
1164         local $_ = shift;
1165         my ($white, $leader);  # common whitespace and common leading string
1166         if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1167             ($white, $leader) = ($2, quotemeta($1));
1168         } else {
1169             ($white, $leader) = (/^(\s+)/, '');
1170         }
1171         s/^\s*?$leader(?:$white)?//gm;
1172         return $_;
1173     }
1174
1175 This works with leading special strings, dynamically determined:
1176
1177         $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
1178         @@@ int
1179         @@@ runops() {
1180         @@@     SAVEI32(runlevel);
1181         @@@     runlevel++;
1182         @@@     while ( op = (*op->op_ppaddr)() );
1183         @@@     TAINT_NOT;
1184         @@@     return 0;
1185         @@@ }
1186         MAIN_INTERPRETER_LOOP
1187
1188 Or with a fixed amount of leading whitespace, with remaining
1189 indentation correctly preserved:
1190
1191         $poem = fix<<EVER_ON_AND_ON;
1192        Now far ahead the Road has gone,
1193           And I must follow, if I can,
1194        Pursuing it with eager feet,
1195           Until it joins some larger way
1196        Where many paths and errands meet.
1197           And whither then? I cannot say.
1198                 --Bilbo in /usr/src/perl/pp_ctl.c
1199         EVER_ON_AND_ON
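
If your code only needs to run on Perl 5.26 or later, the indented
here-document syntax C<<< <<~ >>> removes the common leading
whitespace for you and lets the terminator be indented too. A minimal
sketch, in which the tag name C<EOT> is arbitrary:

```perl
use v5.26;    # <<~ is only available from Perl 5.26 on
use warnings;

my $text = <<~EOT;
    your text
      goes here
    EOT

print $text;
```

Only the indentation shared by every line is removed, so the second
line keeps its two extra spaces.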
1200
1201 =head1 Data: Arrays
1202
1203 =head2 What is the difference between a list and an array?
1204
1205 An array has a changeable length.  A list does not.  An array is
1206 something you can push or pop, while a list is a set of values.  Some
1207 people make the distinction that a list is a value while an array is a
1208 variable. Subroutines are passed and return lists, you put things into
1209 list context, you initialize arrays with lists, and you C<foreach()>
1210 across a list.  C<@> variables are arrays, anonymous arrays are
1211 arrays, arrays in scalar context behave like the number of elements in
1212 them, subroutines access their arguments through the array C<@_>, and
1213 C<push>/C<pop>/C<shift> only work on arrays.
1214
1215 As a side note, there's no such thing as a list in scalar context.
1216 When you say
1217
1218         $scalar = (2, 5, 7, 9);
1219
1220 you're using the comma operator in scalar context, so it uses the scalar
1221 comma operator.  There never was a list there at all! This causes the
1222 last value to be returned: 9.
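
The following sketch puts the cases side by side. The variable names
are illustrative only.

```perl
use strict;
use warnings;

my @array = (2, 5, 7, 9);

my $length = @array;               # array in scalar context: 4, its size

my $last;
{
    no warnings 'void';            # the discarded 2, 5 and 7 would warn
    $last = (2, 5, 7, 9);          # comma operator: 9, the last value
}

my ($first) = (2, 5, 7, 9);        # list assignment: 2, the first value
my $count   = () = (2, 5, 7, 9);   # count of the right-hand list: 4

print "$length $last $first $count\n";
```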
1223
1224 =head2 What is the difference between $array[1] and @array[1]?
1225
1226 The former is a scalar value; the latter an array slice, making
1227 it a list with one (scalar) value.  You should use $ when you want a
1228 scalar value (most of the time) and @ when you want a list with one
1229 scalar value in it (very, very rarely; nearly never, in fact).
1230
1231 Sometimes it doesn't make a difference, but sometimes it does.
1232 For example, compare:
1233
1234         $good[0] = `some program that outputs several lines`;
1235
1236 with
1237
1238         @bad[0]  = `same program that outputs several lines`;
1239
1240 The C<use warnings> pragma and the B<-w> flag will warn you about these
1241 matters.
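
To make the difference visible without running an external program,
this sketch uses a hypothetical helper, C<context>, that reports the
context it was called in:

```perl
use strict;
use warnings;

# Hypothetical helper that reports the context it is called in.
sub context { return wantarray ? "list" : "scalar" }

my (@good, @bad);

$good[0] = context();        # element assignment gives scalar context

{
    no warnings 'syntax';    # "Scalar value @bad[0] better written as $bad[0]"
    @bad[0] = context();     # slice assignment gives list context
}

print "good: $good[0], bad: $bad[0]\n";
```

With backticks, scalar context returns all of the output as one
string, while list context returns one element per line, of which the
one-element slice keeps only the first.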
1242
1243 =head2 How can I remove duplicate elements from a list or array?
1244
1245 (contributed by brian d foy)
1246
1247 Use a hash. When you think the words "unique" or "duplicated", think
1248 "hash keys".
1249
1250 If you don't care about the order of the elements, you could just
1251 create the hash then extract the keys. It's not important how you
1252 create that hash: just that you use C<keys> to get the unique
1253 elements.
1254
1255         my %hash   = map { $_, 1 } @array;
1256         # or a hash slice: @hash{ @array } = ();
1257         # or a foreach: $hash{$_} = 1 foreach ( @array );
1258
1259         my @unique = keys %hash;
1260
1261 If you want to use a module, try the C<uniq> function from
1262 C<List::MoreUtils>. In list context it returns the unique elements,
1263 preserving their order in the list. In scalar context, it returns the
1264 number of unique elements.
1265
1266         use List::MoreUtils qw(uniq);
1267
1268         my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1269         my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1270
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in C<%seen>. The C<next> statement
creates the key and immediately uses its value, which is C<undef>, so
the loop continues to the C<push> and increments the value for that
key. The next time the loop sees that same element, its key exists in
the hash I<and> the value for that key is true (since it's not 0 or
C<undef>), so the C<next> skips that iteration and the loop goes to the
next element.
1280
1281         my @unique = ();
1282         my %seen   = ();
1283
1284         foreach my $elem ( @array )
1285                 {
1286                 next if $seen{ $elem }++;
1287                 push @unique, $elem;
1288                 }
1289
1290 You can write this more briefly using a grep, which does the
1291 same thing.
1292
1293         my %seen = ();
1294         my @unique = grep { ! $seen{ $_ }++ } @array;
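
Newer versions of the core C<List::Util> module (1.45 and later, as
bundled with Perl 5.26) also export a C<uniq> function, so you may not
need a CPAN module at all. A brief sketch with made-up data:

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);

my @array  = qw( b a b c a );
my @unique = uniq @array;     # keeps the first occurrence of each element

print "@unique\n";
```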
1295
1296 =head2 How can I tell whether a certain element is contained in a list or array?
1297
1298 (portions of this answer contributed by Anno Siegel and brian d foy)
1299
1300 Hearing the word "in" is an I<in>dication that you probably should have
1301 used a hash, not a list or array, to store your data.  Hashes are
1302 designed to answer this question quickly and efficiently.  Arrays aren't.
1303
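That being said, there are several ways to approach this.  In Perl 5.10
and later, you can use the smart match operator to check that an item is
contained in an array or a hash. Note that smart match has been marked
experimental since Perl 5.18 and deprecated since Perl 5.38, so newer
perls warn when you use it: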
1307
1308         use 5.010;
1309
1310         if( $item ~~ @array )
1311                 {
1312                 say "The array contains $item"
1313                 }
1314
1315         if( $item ~~ %hash )
1316                 {
1317                 say "The hash contains $item"
1318                 }
1319
1320 With earlier versions of Perl, you have to do a bit more work. If you
1321 are going to make this query many times over arbitrary string values,
1322 the fastest way is probably to invert the original array and maintain a
1323 hash whose keys are the first array's values:
1324
1325         @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1326         %is_blue = ();
1327         for (@blues) { $is_blue{$_} = 1 }
1328
1329 Now you can check whether C<$is_blue{$some_color}>.  It might have
1330 been a good idea to keep the blues all in a hash in the first place.
1331
1332 If the values are all small integers, you could use a simple indexed
1333 array.  This kind of an array will take up less space:
1334
1335         @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1336         @is_tiny_prime = ();
1337         for (@primes) { $is_tiny_prime[$_] = 1 }
        # or simply  @is_tiny_prime[@primes] = (1) x @primes;

Now you check whether C<$is_tiny_prime[$some_number]>.
1341
1342 If the values in question are integers instead of strings, you can save
1343 quite a lot of space by using bit strings instead:
1344
1345         @articles = ( 1..10, 150..2000, 2017 );
1346         undef $read;
1347         for (@articles) { vec($read,$_,1) = 1 }
1348
1349 Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1350
1351 These methods guarantee fast individual tests but require a re-organization
1352 of the original list or array.  They only pay off if you have to test
1353 multiple values against the same array.
1354
1355 If you are testing only once, the standard module C<List::Util> exports
1356 the function C<first> for this purpose.  It works by stopping once it
1357 finds the element. It's written in C for speed, and its Perl equivalent
1358 looks like this subroutine:
1359
1360         sub first (&@) {
1361                 my $code = shift;
1362                 foreach (@_) {
1363                         return $_ if &{$code}();
1364                 }
1365                 undef;
1366         }
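
A membership test built on C<first> might look like the following
sketch. Because C<first> returns the element itself, which can be a
false value such as C<0>, the sketch checks the result with
C<defined>. The C<any> function, available in C<List::Util> 1.33 and
later, sidesteps the issue by returning a plain true or false value.
The sample data is made up.

```perl
use strict;
use warnings;
use List::Util 1.33 qw(first any);

my @array = ( 0, 'foo', 'bar' );

# first stops at the first match and returns that element.
my $has_foo  = defined( first { $_ eq 'foo' } @array );

# The matching element here is 0, which is false but still defined.
my $has_zero = defined( first { $_ eq '0' } @array );

# any returns a boolean directly.
my $has_baz  = any { $_ eq 'baz' } @array;

print "foo: ",  ( $has_foo  ? "yes" : "no" ), "\n";
print "zero: ", ( $has_zero ? "yes" : "no" ), "\n";
print "baz: ",  ( $has_baz  ? "yes" : "no" ), "\n";
```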
1367
1368 If speed is of little concern, the common idiom uses grep in scalar context
1369 (which returns the number of items that passed its condition) to traverse the
1370 entire list. This does have the benefit of telling you how many matches it
1371 found, though.
1372
1373         my $is_there = grep $_ eq $whatever, @array;
1374
1375 If you want to actually extract the matching elements, simply use grep in
1376 list context.
1377
1378         my @matches = grep $_ eq $whatever, @array;
1379
1380 =head2 How do I compute the difference of two arrays?  How do I compute the intersection of two arrays?
1381
1382 Use a hash.  Here's code to do both and more.  It assumes that each
1383 element is unique in a given array:
1384
1385         @union = @intersection = @difference = ();
1386         %count = ();
1387         foreach $element (@array1, @array2) { $count{$element}++ }
1388         foreach $element (keys %count) {
1389                 push @union, $element;
1390                 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1391                 }
1392
1393 Note that this is the I<symmetric difference>, that is, all elements
1394 in either A or in B but not in both.  Think of it as an xor operation.
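
If you want the one-sided difference instead, that is, the elements of
the first array that do not appear in the second, a sketch along these
lines may help. The array contents and the names C<%in_second>,
C<@only_in_first>, and C<@in_both> are illustrative.

```perl
use strict;
use warnings;

my @array1 = qw( a b c d );
my @array2 = qw( c d e );

# Index the second array so that each lookup is a single hash access.
my %in_second = map { $_ => 1 } @array2;

my @only_in_first = grep { !$in_second{$_} } @array1;   # a b
my @in_both       = grep {  $in_second{$_} } @array1;   # c d

print "only in first: @only_in_first\n";
print "in both:       @in_both\n";
```

Because C<grep> walks C<@array1> in order, both results keep the
original order of the first array.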
1395
1396 =head2 How do I test whether two arrays or hashes are equal?
1397
1398 With Perl 5.10 and later, the smart match operator can give you the answer
1399 with the least amount of work:
1400
1401         use 5.010;
1402
1403         if( @array1 ~~ @array2 )
1404                 {
1405                 say "The arrays are the same";
1406                 }
1407
1408         if( %hash1 ~~ %hash2 ) # doesn't check values!
1409                 {
1410                 say "The hash keys are the same";
1411                 }
1412
1413 The following code works for single-level arrays.  It uses a
1414 stringwise comparison, and does not distinguish defined versus
1415 undefined empty strings.  Modify if you have other needs.
1416
1417         $are_equal = compare_arrays(\@frogs, \@toads);
1418
1419         sub compare_arrays {
1420                 my ($first, $second) = @_;
1421                 no warnings;  # silence spurious -w undef complaints
1422                 return 0 unless @$first == @$second;
1423                 for (my $i = 0; $i < @$first; $i++) {
1424                         return 0 if $first->[$i] ne $second->[$i];
1425                         }
1426                 return 1;
1427                 }
1428
1429 For multilevel structures, you may wish to use an approach more
1430 like this one.  It uses the CPAN module C<FreezeThaw>:
1431
1432         use FreezeThaw qw(cmpStr);
1433         @a = @b = ( "this", "that", [ "more", "stuff" ] );
1434
1435         printf "a and b contain %s arrays\n",
1436                 cmpStr(\@a, \@b) == 0
1437                 ? "the same"
1438                 : "different";
1439
1440 This approach also works for comparing hashes.  Here we'll demonstrate
1441 two different answers:
1442
1443         use FreezeThaw qw(cmpStr cmpStrHard);
1444
1445         %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1446         $a{EXTRA} = \%b;
1447         $b{EXTRA} = \%a;
1448
1449         printf "a and b contain %s hashes\n",
1450         cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1451
1452         printf "a and b contain %s hashes\n",
1453         cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1454
1455
The first reports that both hashes contain the same data,
while the second reports that they do not.  Which you prefer is left as
an exercise to the reader.
1459
1460 =head2 How do I find the first array element for which a condition is true?
1461
1462 To find the first array element which satisfies a condition, you can
1463 use the C<first()> function in the C<List::Util> module, which comes
1464 with Perl 5.8. This example finds the first element that contains
1465 "Perl".
1466
1467         use List::Util qw(first);
1468
1469         my $element = first { /Perl/ } @array;
1470
1471 If you cannot use C<List::Util>, you can make your own loop to do the
1472 same thing.  Once you find the element, you stop the loop with last.
1473
1474         my $found;
1475         foreach ( @array ) {
1476                 if( /Perl/ ) { $found = $_; last }
1477                 }
1478
1479 If you want the array index, you can iterate through the indices
1480 and check the array element at each index until you find one
1481 that satisfies the condition.
1482
1483         my( $found, $index ) = ( undef, -1 );
        for( my $i = 0; $i < @array; $i++ ) {
1485                 if( $array[$i] =~ /Perl/ ) {
1486                         $found = $array[$i];
1487                         $index = $i;
1488                         last;
1489                         }
1490                 }
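
If C<List::Util> is available, a more compact sketch of the same idea
searches the indices instead of the elements. The sample data is made
up.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( 'awk', 'Perl 5', 'sed', 'Perl 6' );

# Search the indices rather than the elements; first returns undef if none match.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print "Found '$array[$index]' at index $index\n" if defined $index;
```

Check the result with C<defined>, since a match at index 0 is a false
value.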
1491
1492 =head2 How do I handle linked lists?
1493
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either
end, or you can use splice to add and/or remove an arbitrary number of
elements at arbitrary points.  Both pop and shift are O(1)
operations on Perl's dynamic arrays.  In the absence of shifts and
pops, push in general needs to reallocate on the order of every log(N)
times, and unshift will need to copy pointers each time.
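
As a quick sketch of those array operations on made-up data:

```perl
use strict;
use warnings;

my @list = ( 1 .. 5 );

push    @list, 6;            # add at the end:    1 2 3 4 5 6
unshift @list, 0;            # add at the front:  0 1 2 3 4 5 6
my $last  = pop   @list;     # remove from the end:   6
my $first = shift @list;     # remove from the front: 0

# Insert two elements before index 2 without removing anything.
splice( @list, 2, 0, 'x', 'y' );          # 1 2 x y 3 4 5

# Remove two elements starting at index 1.
my @removed = splice( @list, 1, 2 );      # removes 2 and x

print "list: @list\n";
print "removed: @removed\n";
```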
1501
1502 If you really, really wanted, you could use structures as described in
1503 L<perldsc> or L<perltoot> and do just what the algorithm book tells
1504 you to do.  For example, imagine a list node like this:
1505
1506         $node = {
1507                 VALUE => 42,
1508                 LINK  => undef,
1509                 };
1510
1511 You could walk the list this way:
1512
1513         print "List: ";
1514         for ($node = $head;  $node; $node = $node->{LINK}) {
1515                 print $node->{VALUE}, " ";
1516                 }
1517         print "\n";
1518
1519 You could add to the list this way:
1520
1521         my ($head, $tail);
1522         $tail = append($head, 1);       # grow a new head
        for my $value ( 2 .. 10 ) {
1524                 $tail = append($tail, $value);
1525                 }
1526
1527         sub append {
1528                 my($list, $value) = @_;
1529                 my $node = { VALUE => $value };
1530                 if ($list) {
1531                         $node->{LINK} = $list->{LINK};
1532                         $list->{LINK} = $node;
1533                         }
1534                 else {
1535                         $_[0] = $node;      # replace caller's version
1536                         }
1537                 return $node;
1538                 }
1539
But again, Perl's built-in arrays are virtually always good enough.
1541
1542 =head2 How do I handle circular lists?
1543 X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
1544 X<cycle> X<modulus>
1545
1546 (contributed by brian d foy)
1547
1548 If you want to cycle through an array endlessly, you can increment the
1549 index modulo the number of elements in the array:
1550
1551         my @array = qw( a b c );
1552         my $i = 0;
1553
1554         while( 1 ) {
1555                 print $array[ $i++ % @array ], "\n";
1556                 last if $i > 20;
1557                 }
1558
1559 You can also use C<Tie::Cycle> to use a scalar that always has the
1560 next element of the circular array:
1561
1562         use Tie::Cycle;
1563
1564         tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1565
1566         print $cycle; # FFFFFF
1567         print $cycle; # 000000
1568         print $cycle; # FFFF00
1569
The C<Array::Iterator::Circular> module creates an iterator object for
circular arrays:
1572
1573         use Array::Iterator::Circular;
1574
1575         my $color_iterator = Array::Iterator::Circular->new(
1576                 qw(red green blue orange)
1577                 );
1578
1579         foreach ( 1 .. 20 ) {
1580                 print $color_iterator->next, "\n";
1581                 }
1582
1583 =head2 How do I shuffle an array randomly?
1584
If you have Perl 5.8.0 or later installed, or Scalar-List-Utils 1.03
or later, you can say:
1587
1588         use List::Util 'shuffle';
1589
1590         @shuffled = shuffle(@list);
1591
1592 If not, you can use a Fisher-Yates shuffle.
1593
1594         sub fisher_yates_shuffle {
1595                 my $deck = shift;  # $deck is a reference to an array
1596                 return unless @$deck; # must not be empty!
1597
1598                 my $i = @$deck;
1599                 while (--$i) {
1600                         my $j = int rand ($i+1);
1601                         @$deck[$i,$j] = @$deck[$j,$i];
1602                         }
1603         }
1604
1605         # shuffle my mpeg collection
1606         #
1607         my @mpeg = <audio/*/*.mp3>;
1608         fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
1609         print @mpeg;
1610
1611 Note that the above implementation shuffles an array in place,
1612 unlike the C<List::Util::shuffle()> which takes a list and returns
1613 a new shuffled list.
1614
You've probably seen shuffling algorithms that work using C<splice>,
repeatedly removing a randomly chosen element from the old array and
appending it to a new one:
1617
1618         srand;
1619         @new = ();
1620         @old = 1 .. 10;  # just a demo
1621         while (@old) {
1622                 push(@new, splice(@old, rand @old, 1));
1623                 }
1624
1625 This is bad because splice is already O(N), and since you do it N
1626 times, you just invented a quadratic algorithm; that is, O(N**2).
1627 This does not scale, although Perl is so efficient that you probably
1628 won't notice this until you have rather largish arrays.
1629
1630 =head2 How do I process/modify each element of an array?
1631
1632 Use C<for>/C<foreach>:
1633
1634         for (@lines) {
1635                 s/foo/bar/;     # change that word
1636                 tr/XZ/ZX/;      # swap those letters
1637                 }
1638
1639 Here's another; let's compute spherical volumes:
1640
1641         for (@volumes = @radii) {   # @volumes has changed parts
1642                 $_ **= 3;
1643                 $_ *= (4/3) * 3.14159;  # this will be constant folded
1644                 }
1645
1646 which can also be done with C<map()> which is made to transform
1647 one list into another:
1648
1649         @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1650
If you want to do the same thing to modify the values of a
hash, you can use the C<values> function.  As of Perl 5.6
the values are not copied, so if you modify C<$orbit> (in this
case), you modify the value.
1655
1656         for $orbit ( values %orbits ) {
1657                 ($orbit **= 3) *= (4/3) * 3.14159;
1658                 }
1659
1660 Prior to perl 5.6 C<values> returned copies of the values,
1661 so older perl code often contains constructions such as
1662 C<@orbits{keys %orbits}> instead of C<values %orbits> where
1663 the hash is to be modified.
1664
1665 =head2 How do I select a random element from an array?
1666
1667 Use the C<rand()> function (see L<perlfunc/rand>):
1668
1669         $index   = rand @array;
1670         $element = $array[$index];
1671
1672 Or, simply:
1673
1674         my $element = $array[ rand @array ];
1675
1676 =head2 How do I permute N elements of a list?
X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
1678 X<The Art of Computer Programming> X<Fischer-Krause>
1679
1680 Use the C<List::Permutor> module on CPAN. If the list is actually an
1681 array, try the C<Algorithm::Permute> module (also on CPAN). It's
1682 written in XS code and is very efficient:
1683
1684         use Algorithm::Permute;
1685
1686         my @array = 'a'..'d';
1687         my $p_iterator = Algorithm::Permute->new ( \@array );
1688
        while (my @perm = $p_iterator->next) {
                print "next permutation: (@perm)\n";
                }
1692
1693 For even faster execution, you could do:
1694
1695         use Algorithm::Permute;
1696
1697         my @array = 'a'..'d';
1698
1699         Algorithm::Permute::permute {
1700                 print "next permutation: (@array)\n";
1701                 } @array;
1702
Here's a little program that generates all permutations of all the
words on each line of input. The algorithm embodied in the
C<permute()> function is discussed in Volume 4 of
Knuth's I<The Art of Computer Programming> and will work on any list:
1707
1708         #!/usr/bin/perl -n
1709         # Fischer-Krause ordered permutation generator
1710
1711         sub permute (&@) {
1712                 my $code = shift;
1713                 my @idx = 0..$#_;
1714                 while ( $code->(@_[@idx]) ) {
1715                         my $p = $#idx;
1716                         --$p while $idx[$p-1] > $idx[$p];
1717                         my $q = $p or return;
1718                         push @idx, reverse splice @idx, $p;
1719                         ++$q while $idx[$p-1] > $idx[$q];
1720                         @idx[$p-1,$q]=@idx[$q,$p-1];
1721                 }
1722         }
1723
1724         permute { print "@_\n" } split;
1725
1726 The C<Algorithm::Loops> module also provides the C<NextPermute> and
1727 C<NextPermuteNum> functions which efficiently find all unique permutations
1728 of an array, even if it contains duplicate values, modifying it in-place:
1729 if its elements are in reverse-sorted order then the array is reversed,
1730 making it sorted, and it returns false; otherwise the next
1731 permutation is returned.
1732
1733 C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
1734 you can enumerate all the permutations of C<0..9> like this:
1735
1736         use Algorithm::Loops qw(NextPermuteNum);
1737
        my @list = 0..9;
        do { print "@list\n" } while NextPermuteNum @list;
1740
1741 =head2 How do I sort an array by (anything)?
1742
Supply a comparison function to C<sort()> (described in L<perlfunc/sort>):
1744
1745         @list = sort { $a <=> $b } @list;
1746
1747 The default sort function is cmp, string comparison, which would
1748 sort C<(1, 2, 10)> into C<(1, 10, 2)>.  C<< <=> >>, used above, is
1749 the numerical comparison operator.
1750
1751 If you have a complicated function needed to pull out the part you
1752 want to sort on, then don't do it inside the sort function.  Pull it
1753 out first, because the sort BLOCK can be called many times for the
1754 same element.  Here's an example of how to pull out the first word
1755 after the first number on each item, and then sort those words
1756 case-insensitively.
1757
1758         @idx = ();
1759         for (@data) {
1760                 ($item) = /\d+\s*(\S+)/;
1761                 push @idx, uc($item);
                }
1763         @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1764
1765 which could also be written this way, using a trick
1766 that's come to be known as the Schwartzian Transform:
1767
1768         @sorted = map  { $_->[0] }
1769                 sort { $a->[1] cmp $b->[1] }
1770                 map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1771
1772 If you need to sort on several fields, the following paradigm is useful.
1773
1774         @sorted = sort {
1775                 field1($a) <=> field1($b) ||
1776                 field2($a) cmp field2($b) ||
1777                 field3($a) cmp field3($b)
1778                 } @data;
1779
1780 This can be conveniently combined with precalculation of keys as given
1781 above.
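
As a rough sketch of combining the two ideas, the following computes both
sort keys once per record and then compares them in order. The
C<name:age> record format, the sample data, and the variable names are
made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical records of the form "name:age".
my @data = ( 'bob:25', 'Alice:30', 'carol:25', 'alice:22' );

my @sorted =
    map  { $_->[0] }                 # recover the original record
    sort {
        $a->[1] <=> $b->[1]          # first by age, numerically
            ||
        $a->[2] cmp $b->[2]          # then by name, ignoring case
    }
    map  {
        my ( $name, $age ) = split /:/;
        [ $_, $age, lc $name ];      # keys computed once per record
    } @data;

print "$_\n" for @sorted;
```

Each record is split only once, no matter how many times C<sort> ends up
comparing it.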
1782
1783 See the F<sort> article in the "Far More Than You Ever Wanted
1784 To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
1785 more about this approach.
1786
1787 See also the question later in L<perlfaq4> on sorting hashes.
1788
1789 =head2 How do I manipulate arrays of bits?
1790
1791 Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
1792 operations.
1793
1794 For example, you don't have to store individual bits in an array
1795 (which would mean that you're wasting a lot of space). To convert an
1796 array of bits to a string, use C<vec()> to set the right bits. This
1797 sets C<$vec> to have bit N set only if C<$ints[N]> was set:
1798
1799         @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1800         $vec = '';
1801         foreach( 0 .. $#ints ) {
1802                 vec($vec,$_,1) = 1 if $ints[$_];
1803                 }
1804
1805 The string C<$vec> only takes up as many bits as it needs. For
1806 instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
1807 bytes to store them (not counting the scalar variable overhead).
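
A quick way to check this for yourself, using a made-up 16-element
C<@ints> whose last bit is set (the string only grows as far as the
highest bit you actually set):

```perl
use strict;
use warnings;

# 16 illustrative bits; bit 15 is set so the vector spans two full bytes.
my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 );

my $vec = '';
foreach my $n ( 0 .. $#ints ) {
    vec( $vec, $n, 1 ) = 1 if $ints[$n];
}

print "bytes used: ", length($vec), "\n";
print "bits:       ", unpack( "b*", $vec ), "\n";
```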
1808
1809 Here's how, given a vector in C<$vec>, you can get those bits into
1810 your C<@ints> array:
1811
1812         sub bitvec_to_list {
1813                 my $vec = shift;
1814                 my @ints;
1815                 # Find null-byte density then select best algorithm
1816                 if ($vec =~ tr/\0// / length $vec > 0.95) {
1817                         use integer;
1818                         my $i;
1819
1820                         # This method is faster with mostly null-bytes
1821                         while($vec =~ /[^\0]/g ) {
1822                                 $i = -9 + 8 * pos $vec;
1823                                 push @ints, $i if vec($vec, ++$i, 1);
1824                                 push @ints, $i if vec($vec, ++$i, 1);
1825                                 push @ints, $i if vec($vec, ++$i, 1);
1826                                 push @ints, $i if vec($vec, ++$i, 1);
1827                                 push @ints, $i if vec($vec, ++$i, 1);
1828                                 push @ints, $i if vec($vec, ++$i, 1);
1829                                 push @ints, $i if vec($vec, ++$i, 1);
1830                                 push @ints, $i if vec($vec, ++$i, 1);
1831                                 }
1832                         }
1833                 else {
1834                         # This method is a fast general algorithm
1835                         use integer;
1836                         my $bits = unpack "b*", $vec;
1837                         push @ints, 0 if $bits =~ s/^(\d)// && $1;
1838                         push @ints, pos $bits while($bits =~ /1/g);
1839                         }
1840
1841                 return \@ints;
1842                 }
1843
1844 This method gets faster the more sparse the bit vector is.
1845 (Courtesy of Tim Bunce and Winfried Koenig.)
1846
1847 You can make the while loop a lot shorter with this suggestion
1848 from Benjamin Goldberg:
1849
1850         while($vec =~ /[^\0]+/g ) {
1851                 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1852                 }
1853
1854 Or use the CPAN module C<Bit::Vector>:
1855
        use Bit::Vector;

        $vector = Bit::Vector->new($num_of_bits);
        $vector->Index_List_Store(@ints);
        @ints = $vector->Index_List_Read();
1859
C<Bit::Vector> provides efficient methods for bit vectors, sets of
small integers and "big int" math.
1862
1863 Here's a more extensive illustration using vec():
1864
1865         # vec demo
1866         $vector = "\xff\x0f\xef\xfe";
1867         print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1868         unpack("N", $vector), "\n";
1869         $is_set = vec($vector, 23, 1);
1870         print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1871         pvec($vector);
1872
1873         set_vec(1,1,1);
1874         set_vec(3,1,1);
1875         set_vec(23,1,1);
1876
1877         set_vec(3,1,3);
1878         set_vec(3,2,3);
1879         set_vec(3,4,3);
1880         set_vec(3,4,7);
1881         set_vec(3,8,3);
1882         set_vec(3,8,7);
1883
1884         set_vec(0,32,17);
1885         set_vec(1,32,17);
1886
1887         sub set_vec {
1888                 my ($offset, $width, $value) = @_;
1889                 my $vector = '';
1890                 vec($vector, $offset, $width) = $value;
1891                 print "offset=$offset width=$width value=$value\n";
1892                 pvec($vector);
1893                 }
1894
1895         sub pvec {
1896                 my $vector = shift;
1897                 my $bits = unpack("b*", $vector);
1898                 my $i = 0;
1899                 my $BASE = 8;
1900
1901                 print "vector length in bytes: ", length($vector), "\n";
1902                 @bytes = unpack("A8" x length($vector), $bits);
1903                 print "bits are: @bytes\n\n";
1904                 }
1905
1906 =head2 Why does defined() return true on empty arrays and hashes?
1907
1908 The short story is that you should probably only use defined on scalars or
1909 functions, not on aggregates (arrays and hashes).  See L<perlfunc/defined>
1910 in the 5.004 release or later of Perl for more detail.
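
If what you really want to know is whether an aggregate contains
anything, test it in boolean context instead. A minimal sketch, with
illustrative variable names:

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# An empty array or hash is false in boolean context.
print "array is empty\n" unless @array;
print "hash is empty\n"  unless %hash;

push @array, 'x';
$hash{key} = 'value';

# Once they hold something, they are true.
print "array has ", scalar(@array), " element(s)\n" if @array;
print "hash has ", scalar( keys %hash ), " key(s)\n" if %hash;
```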
1911
1912 =head1 Data: Hashes (Associative Arrays)
1913
1914 =head2 How do I process an entire hash?
1915
1916 (contributed by brian d foy)
1917
There are a couple of ways that you can process an entire hash. You
can get a list of keys, then go through each key, or grab one
key-value pair at a time.
1921
1922 To go through all of the keys, use the C<keys> function. This extracts
1923 all of the keys of the hash and gives them back to you as a list. You
1924 can then get the value through the particular key you're processing:
1925
1926         foreach my $key ( keys %hash ) {
                my $value = $hash{$key};
1928                 ...
1929                 }
1930
1931 Once you have the list of keys, you can process that list before you
1932 process the hash elements. For instance, you can sort the keys so you
1933 can process them in lexical order:
1934
1935         foreach my $key ( sort keys %hash ) {
                my $value = $hash{$key};
1937                 ...
1938                 }
1939
1940 Or, you might want to only process some of the items. If you only want
1941 to deal with the keys that start with C<text:>, you can select just
1942 those using C<grep>:
1943
1944         foreach my $key ( grep /^text:/, keys %hash ) {
                my $value = $hash{$key};
1946                 ...
1947                 }
1948
1949 If the hash is very large, you might not want to create a long list of
1950 keys. To save some memory, you can grab one key-value pair at a time using
1951 C<each()>, which returns a pair you haven't seen yet:
1952
1953         while( my( $key, $value ) = each( %hash ) ) {
1954                 ...
1955                 }
1956
1957 The C<each> operator returns the pairs in apparently random order, so if
1958 ordering matters to you, you'll have to stick with the C<keys> method.
1959
1960 The C<each()> operator can be a bit tricky though. You can't add or
1961 delete keys of the hash while you're using it without possibly
1962 skipping or re-processing some pairs after Perl internally rehashes
1963 all of the elements. Additionally, a hash has only one iterator, so if
1964 you use C<keys>, C<values>, or C<each> on the same hash, you can reset
1965 the iterator and mess up your processing. See the C<each> entry in
1966 L<perlfunc> for more details.
1967
1968 =head2 How do I merge two hashes?
1969 X<hash> X<merge> X<slice, hash>
1970
1971 (contributed by brian d foy)
1972
1973 Before you decide to merge two hashes, you have to decide what to do
1974 if both hashes contain keys that are the same and if you want to leave
1975 the original hashes as they were.
1976
1977 If you want to preserve the original hashes, copy one hash (C<%hash1>)
1978 to a new hash (C<%new_hash>), then add the keys from the other hash
(C<%hash2>) to the new hash. Checking that the key already exists in
1980 C<%new_hash> gives you a chance to decide what to do with the
1981 duplicates:
1982
1983         my %new_hash = %hash1; # make a copy; leave %hash1 alone
1984
1985         foreach my $key2 ( keys %hash2 )
1986                 {
1987                 if( exists $new_hash{$key2} )
1988                         {
1989                         warn "Key [$key2] is in both hashes!";
1990                         # handle the duplicate (perhaps only warning)
1991                         ...
1992                         next;
1993                         }
1994                 else
1995                         {
1996                         $new_hash{$key2} = $hash2{$key2};
1997                         }
1998                 }
1999
2000 If you don't want to create a new hash, you can still use this looping
2001 technique; just change the C<%new_hash> to C<%hash1>.
2002
2003         foreach my $key2 ( keys %hash2 )
2004                 {
2005                 if( exists $hash1{$key2} )
2006                         {
2007                         warn "Key [$key2] is in both hashes!";
2008                         # handle the duplicate (perhaps only warning)
2009                         ...
2010                         next;
2011                         }
2012                 else
2013                         {
2014                         $hash1{$key2} = $hash2{$key2};
2015                         }
2016                 }
2017
2018 If you don't care that one hash overwrites keys and values from the other, you
2019 could just use a hash slice to add one hash to another. In this case, values
2020 from C<%hash2> replace values from C<%hash1> when they have keys in common:
2021
2022         @hash1{ keys %hash2 } = values %hash2;
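
If you want the same overwrite behavior but need to leave both
originals untouched, one common idiom is to flatten both hashes into a
list and assign that to a new hash; when a key appears twice, the later
pair wins. The hash names and contents here are only illustrative:

```perl
use strict;
use warnings;

my %hash1 = ( a => 1,  b => 2 );
my %hash2 = ( b => 20, c => 30 );

# Pairs from %hash2 come later in the list, so they replace
# pairs from %hash1 that share a key.
my %merged = ( %hash1, %hash2 );

foreach my $key ( sort keys %merged ) {
    print "$key => $merged{$key}\n";
}
```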
2023
2024 =head2 What happens if I add or remove keys from a hash while iterating over it?
2025
2026 (contributed by brian d foy)
2027
2028 The easy answer is "Don't do that!"
2029
2030 If you iterate through the hash with each(), you can delete the key
2031 most recently returned without worrying about it.  If you delete or add
2032 other keys, the iterator may skip or double up on them since perl
2033 may rearrange the hash table.  See the
2034 entry for C<each()> in L<perlfunc>.
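
If you do need to remove entries as you go, here is a small sketch of
two approaches that stay within those rules. The hash contents are made
up for the example:

```perl
use strict;
use warnings;

my %hash = ( apple => 1, banana => 0, cherry => 3, date => 0 );

# keys() builds the complete list before the loop starts, so
# deleting entries inside the loop cannot disturb the iteration.
foreach my $key ( keys %hash ) {
    delete $hash{$key} unless $hash{$key};
}

# With each(), deleting the key most recently returned is safe.
while ( my ( $key, $value ) = each %hash ) {
    delete $hash{$key} if $value > 2;
}

print join( ", ", sort keys %hash ), "\n";
```

The first form costs the memory for the list of keys; the second does
not, but you must not touch any key other than the current one.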
2035
2036 =head2 How do I look up a hash element by value?
2037
2038 Create a reverse hash:
2039
2040         %by_value = reverse %by_key;
2041         $key = $by_value{$value};
2042
2043 That's not particularly efficient.  It would be more space-efficient
2044 to use:
2045
2046         while (($key, $value) = each %by_key) {
2047                 $by_value{$value} = $key;
                }
2049
2050 If your hash could have repeated values, the methods above will only find
2051 one of the associated keys.   This may or may not worry you.  If it does
2052 worry you, you can always reverse the hash into a hash of arrays instead:
2053
2054         while (($key, $value) = each %by_key) {
                push @{$key_list_by_value{$value}}, $key;
2056                 }
2057
2058 =head2 How can I know how many entries are in a hash?
2059
2060 (contributed by brian d foy)
2061
2062 This is very similar to "How do I process an entire hash?", also in
2063 L<perlfaq4>, but a bit simpler in the common cases.
2064
You can use the C<keys()> built-in function in scalar context to find out
how many entries you have in a hash:
2067
2068         my $key_count = keys %hash; # must be scalar context!
2069
2070 If you want to find out how many entries have a defined value, that's
2071 a bit different. You have to check each value. A C<grep> is handy:
2072
2073         my $defined_value_count = grep { defined } values %hash;
2074
2075 You can use that same structure to count the entries any way that
2076 you like. If you want the count of the keys with vowels in them,
2077 you just test for that instead:
2078
2079         my $vowel_count = grep { /[aeiou]/ } keys %hash;
2080
2081 The C<grep> in scalar context returns the count. If you want the list
2082 of matching items, just use it in list context instead:
2083
2084         my @defined_values = grep { defined } values %hash;
2085
2086 The C<keys()> function also resets the iterator, which means that you may
2087 see strange results if you use this between uses of other hash operators
2088 such as C<each()>.
2089
2090 =head2 How do I sort a hash (optionally by value instead of key)?
2091
2092 (contributed by brian d foy)
2093
2094 To sort a hash, start with the keys. In this example, we give the list of
2095 keys to the sort function which then compares them ASCIIbetically (which
2096 might be affected by your locale settings). The output list has the keys
2097 in ASCIIbetical order. Once we have the keys, we can go through them to
2098 create a report which lists the keys in ASCIIbetical order.
2099
2100         my @keys = sort { $a cmp $b } keys %hash;
2101
2102         foreach my $key ( @keys )
2103                 {
2104                 printf "%-20s %6d\n", $key, $hash{$key};
2105                 }
2106
2107 We could get more fancy in the C<sort()> block though. Instead of
2108 comparing the keys, we can compute a value with them and use that
2109 value as the comparison.
2110
2111 For instance, to make our report order case-insensitive, we use
2112 the C<\L> sequence in a double-quoted string to make everything
2113 lowercase. The C<sort()> block then compares the lowercased
2114 values to determine in which order to put the keys.
2115
2116         my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
2117
2118 Note: if the computation is expensive or the hash has many elements,
2119 you may want to look at the Schwartzian Transform to cache the
2120 computation results.
2121
2122 If we want to sort by the hash value instead, we use the hash key
2123 to look it up. We still get out a list of keys, but this time they
2124 are ordered by their value.
2125
2126         my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2127
2128 From there we can get more complex. If the hash values are the same,
2129 we can provide a secondary sort on the hash key.
2130
2131         my @keys = sort {
2132                 $hash{$a} <=> $hash{$b}
2133                         or
2134                 "\L$a" cmp "\L$b"
2135                 } keys %hash;
2136
2137 =head2 How can I always keep my hash sorted?
2138 X<hash tie sort DB_File Tie::IxHash>
2139
2140 You can look into using the C<DB_File> module and C<tie()> using the
2141 C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
2142 Databases">. The C<Tie::IxHash> module from CPAN might also be
instructive. Although this does keep your hash sorted, you might not
like the slowdown you suffer from the tie interface. Are you sure you
2145 need to do this? :)
2146
2147 =head2 What's the difference between "delete" and "undef" with hashes?
2148
2149 Hashes contain pairs of scalars: the first is the key, the
2150 second is the value.  The key will be coerced to a string,
2151 although the value can be any kind of scalar: string,
2152 number, or reference.  If a key C<$key> is present in
2153 %hash, C<exists($hash{$key})> will return true.  The value
2154 for a given key can be C<undef>, in which case
2155 C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
2156 will return true.  This corresponds to (C<$key>, C<undef>)
2157 being in the hash.
2158
2159 Pictures help...  Here's the C<%hash> table:
2160
2161           keys  values
2162         +------+------+
2163         |  a   |  3   |
2164         |  x   |  7   |
2165         |  d   |  0   |
2166         |  e   |  2   |
2167         +------+------+
2168
2169 And these conditions hold
2170
2171         $hash{'a'}                       is true
2172         $hash{'d'}                       is false
2173         defined $hash{'d'}               is true
2174         defined $hash{'a'}               is true
2175         exists $hash{'a'}                is true (Perl 5 only)
2176         grep ($_ eq 'a', keys %hash)     is true
2177
2178 If you now say
2179
2180         undef $hash{'a'}
2181
2182 your table now reads:
2183
2184
2185           keys  values
2186         +------+------+
2187         |  a   | undef|
2188         |  x   |  7   |
2189         |  d   |  0   |
2190         |  e   |  2   |
2191         +------+------+
2192
2193 and these conditions now hold; changes in caps:
2194
2195         $hash{'a'}                       is FALSE
2196         $hash{'d'}                       is false
2197         defined $hash{'d'}               is true
2198         defined $hash{'a'}               is FALSE
2199         exists $hash{'a'}                is true (Perl 5 only)
2200         grep ($_ eq 'a', keys %hash)     is true
2201
2202 Notice the last two: you have an undef value, but a defined key!
2203
2204 Now, consider this:
2205
2206         delete $hash{'a'}
2207
2208 your table now reads:
2209
2210           keys  values
2211         +------+------+
2212         |  x   |  7   |
2213         |  d   |  0   |
2214         |  e   |  2   |
2215         +------+------+
2216
2217 and these conditions now hold; changes in caps:
2218
2219         $hash{'a'}                       is false
2220         $hash{'d'}                       is false
2221         defined $hash{'d'}               is true
2222         defined $hash{'a'}               is false
2223         exists $hash{'a'}                is FALSE (Perl 5 only)
2224         grep ($_ eq 'a', keys %hash)     is FALSE
2225
2226 See, the whole entry is gone!
2227
2228 =head2 Why don't my tied hashes make the defined/exists distinction?
2229
2230 This depends on the tied hash's implementation of EXISTS().
2231 For example, there isn't the concept of undef with hashes
2232 that are tied to DBM* files. It also means that exists() and
2233 defined() do the same thing with a DBM* file, and what they
2234 end up doing is not what they do with ordinary hashes.
2235
2236 =head2 How do I reset an each() operation part-way through?
2237
2238 (contributed by brian d foy)
2239
2240 You can use the C<keys> or C<values> functions to reset C<each>. To
2241 simply reset the iterator used by C<each> without doing anything else,
2242 use one of them in void context:
2243
2244         keys %hash; # resets iterator, nothing else.
2245         values %hash; # resets iterator, nothing else.
2246
2247 See the documentation for C<each> in L<perlfunc>.
2248
2249 =head2 How can I get the unique keys from two hashes?
2250
2251 First you extract the keys from the hashes into lists, then solve
2252 the "removing duplicates" problem described above.  For example:
2253
2254         %seen = ();
2255         for $element (keys(%foo), keys(%bar)) {
2256                 $seen{$element}++;
2257                 }
2258         @uniq = keys %seen;
2259
2260 Or more succinctly:
2261
2262         @uniq = keys %{{%foo,%bar}};
2263
2264 Or if you really want to save space:
2265
2266         %seen = ();
2267         while (defined ($key = each %foo)) {
2268                 $seen{$key}++;
2269         }
2270         while (defined ($key = each %bar)) {
2271                 $seen{$key}++;
2272         }
2273         @uniq = keys %seen;
2274
2275 =head2 How can I store a multidimensional array in a DBM file?
2276
2277 Either stringify the structure yourself (no fun), or else
2278 get the MLDBM (which uses Data::Dumper) module from CPAN and layer
2279 it on top of either DB_File or GDBM_File.
2280
2281 =head2 How can I make my hash remember the order I put elements into it?
2282
Use the C<Tie::IxHash> module from CPAN.
2284
2285         use Tie::IxHash;
2286
2287         tie my %myhash, 'Tie::IxHash';
2288
2289         for (my $i=0; $i<20; $i++) {
2290                 $myhash{$i} = 2*$i;
2291                 }
2292
2293         my @keys = keys %myhash;
2294         # @keys = (0,1,2,3,...)
2295
2296 =head2 Why does passing a subroutine an undefined element in a hash create it?
2297
2298 (contributed by brian d foy)
2299
2300 Are you using a really old version of Perl?
2301
2302 Normally, accessing a hash key's value for a nonexistent key will
2303 I<not> create the key.
2304
2305         my %hash  = ();
2306         my $value = $hash{ 'foo' };
2307         print "This won't print\n" if exists $hash{ 'foo' };
2308
2309 Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
2310 Since you could assign directly to C<$_[0]>, Perl had to be ready to
2311 make that assignment so it created the hash key ahead of time:
2312
        my_sub( $hash{ 'foo' } );
2314         print "This will print before 5.004\n" if exists $hash{ 'foo' };
2315
2316         sub my_sub {
2317                 # $_[0] = 'bar'; # create hash key in case you do this
2318                 1;
2319                 }
2320
2321 Since Perl 5.004, however, this situation is a special case and Perl
2322 creates the hash key only when you make the assignment:
2323
        my_sub( $hash{ 'foo' } );
2325         print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2326
2327         sub my_sub {
2328                 $_[0] = 'bar';
2329                 }
2330
2331 However, if you want the old behavior (and think carefully about that
2332 because it's a weird side effect), you can pass a hash slice instead.
2333 Perl 5.004 didn't make this a special case:
2334
2335         my_sub( @hash{ qw/foo/ } );
2336
2337 =head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
2338
2339 Usually a hash ref, perhaps like this:
2340
2341         $record = {
2342                 NAME   => "Jason",
2343                 EMPNO  => 132,
2344                 TITLE  => "deputy peon",
2345                 AGE    => 23,
2346                 SALARY => 37_000,
2347                 PALS   => [ "Norbert", "Rhys", "Phineas"],
2348         };
2349
References are documented in L<perlref> and L<perlreftut>.
2351 Examples of complex data structures are given in L<perldsc> and
2352 L<perllol>.  Examples of structures and object-oriented classes are
2353 in L<perltoot>.
2354
2355 =head2 How can I use a reference as a hash key?
2356
2357 (contributed by brian d foy and Ben Morrow)
2358
2359 Hash keys are strings, so you can't really use a reference as the key.
2360 When you try to do that, perl turns the reference into its stringified
2361 form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
2362 back the reference from the stringified form, at least without doing
2363 some extra work on your own.
2364
Remember that the entry in the hash will still be there even if
the referenced variable goes out of scope, and that it is entirely
possible for Perl to subsequently allocate a different variable at
the same address. This means a new variable might accidentally
be associated with the value for an old one.
2370
2371 If you have Perl 5.10 or later, and you just want to store a value
2372 against the reference for lookup later, you can use the core
C<Hash::Util::FieldHash> module. This will also handle renaming the
2374 keys if you use multiple threads (which causes all variables to be
2375 reallocated at new addresses, changing their stringification), and
2376 garbage-collecting the entries when the referenced variable goes out
2377 of scope.
2378
2379 If you actually need to be able to get a real reference back from
2380 each hash entry, you can use the Tie::RefHash module, which does the
2381 required work for you.
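
As an illustration, here is a minimal sketch using C<Tie::RefHash>. The
hash name and the sample data are invented for the example:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %color_of, 'Tie::RefHash';

my $apple  = { name => 'apple' };
my $banana = { name => 'banana' };

$color_of{$apple}  = 'red';
$color_of{$banana} = 'yellow';

foreach my $fruit ( keys %color_of ) {
    # $fruit is still a real hash reference, not a string,
    # so it can be dereferenced even under "use strict".
    print "$fruit->{name} is $color_of{$fruit}\n";
}
```

With an ordinary hash, the dereference inside the loop would fail under
C<use strict> because the keys would be plain strings.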
2382
2383 =head1 Data: Misc
2384
2385 =head2 How do I handle binary data correctly?
2386
2387 Perl is binary clean, so it can handle binary data just fine.
2388 On Windows or DOS, however, you have to use C<binmode> for binary
2389 files to avoid conversions for line endings. In general, you should
2390 use C<binmode> any time you want to work with binary data.
2391
2392 Also see L<perlfunc/"binmode"> or L<perlopentut>.
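
A small sketch of the idea: it writes a few bytes (including a carriage
return and a line feed) to a scratch file and reads them back, calling
C<binmode> on both handles so that nothing is translated on the way. The
use of C<File::Temp> for the scratch file is just a convenience for the
example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write four bytes, including CR and LF, to a scratch file.
my ( $out, $filename ) = tempfile();
binmode $out;
print {$out} "\x00\x0D\x0A\xFF";
close $out or die "Cannot close $filename: $!";

# Read them back without any line-ending conversion.
open my $in, '<', $filename or die "Cannot open $filename: $!";
binmode $in;
my $bytes = do { local $/; <$in> };
close $in;
unlink $filename;

printf "read %d bytes: %s\n", length($bytes), unpack( "H*", $bytes );
```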
2393
2394 If you're concerned about 8-bit textual data then see L<perllocale>.
2395 If you want to deal with multibyte characters, however, there are
2396 some gotchas.  See the section on Regular Expressions.
2397
2398 =head2 How do I determine whether a scalar is a number/whole/integer/float?
2399
2400 Assuming that you don't care about IEEE notations like "NaN" or
2401 "Infinity", you probably just want to use a regular expression.
2402
2403         if (/\D/)            { print "has nondigits\n" }
2404         if (/^\d+$/)         { print "is a whole number\n" }
2405         if (/^-?\d+$/)       { print "is an integer\n" }
2406         if (/^[+-]?\d+$/)    { print "is a +/- integer\n" }
2407         if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
2408         if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
2409         if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
2410                         { print "a C float\n" }
2411
There are also some commonly used modules for the task.
L<Scalar::Util> (distributed with Perl since 5.8) provides access to
perl's internal function C<looks_like_number> for determining whether
a variable looks like a number.  L<Data::Types> exports functions that
validate data types using both the above and other regular
expressions.  Finally, C<Regexp::Common> has regular expressions to
match various types of numbers.  All three modules are available from
the CPAN.
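
For instance, a quick sketch of C<looks_like_number> applied to a few
sample strings (the list of candidates is arbitrary):

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Try a handful of strings and report what perl makes of each one.
for my $candidate ('42', '-3.14', '1e10', 'abc', '', '0x1F') {
    printf "%-8s %s\n", "'$candidate'",
        looks_like_number($candidate)
            ? 'looks like a number'
            : 'does not look like a number';
}
```

Note that this asks whether perl itself would treat the string as a
number, which is not necessarily the same set of strings that any one
of the regular expressions above accepts.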
2420
2421 If you're on a POSIX system, Perl supports the C<POSIX::strtod>
2422 function.  Its semantics are somewhat cumbersome, so here's a
2423 C<getnum> wrapper function for more convenient access.  This function
2424 takes a string and returns the number it found, or C<undef> for input
2425 that isn't a C float.  The C<is_numeric> function is a front end to
2426 C<getnum> if you just want to say, "Is this a float?"
2427
        sub getnum {
                use POSIX qw(strtod);
                my $str = shift;
                $str =~ s/^\s+//;       # trim leading whitespace
                $str =~ s/\s+$//;       # trim trailing whitespace
                $! = 0;                 # clear errno before the call
                # strtod returns the number and the count of
                # characters it could not parse
                my($num, $unparsed) = strtod($str);
                if (($str eq '') || ($unparsed != 0) || $!) {
                        return undef;
                }
                else {
                        return $num;
                }
        }
2442
2443         sub is_numeric { defined getnum($_[0]) }
2444
Or you could check out the L<String::Scanf> module on the CPAN
instead. The C<POSIX> module (part of the standard Perl distribution)
provides the C<strtod> and C<strtol> functions for converting strings
to doubles and longs, respectively.
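
A wrapper for C<strtol> can follow the same pattern as C<getnum>.
The sketch below is only an illustration: C<getint> is a hypothetical
helper name, and the optional base argument is an assumption of this
example rather than something the text above prescribes.

```perl
use strict;
use warnings;
use POSIX qw(strtol);

# getint (hypothetical helper): return the integer found in the string,
# or undef if the string is empty or has trailing junk.
sub getint {
    my ($str, $base) = @_;
    $base = 10 unless defined $base;
    return undef unless defined $str;

    $str =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
    return undef if $str eq '';

    $! = 0;                          # clear errno before the call
    # In list context strtol returns the number and the count of
    # characters it could not parse.
    my ($num, $unparsed) = strtol($str, $base);
    return undef if $unparsed != 0 || $!;
    return $num;
}

for my $input ('42', ' 17 ', '12abc') {
    my $n = getint($input);
    print "'$input' => ", (defined $n ? $n : 'undef'), "\n";
}
print "'ff' in base 16 => ", getint('ff', 16), "\n";
```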
2449
2450 =head2 How do I keep persistent data across program calls?
2451
For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>.  More generally, you should consult the C<FreezeThaw>
or C<Storable> modules from CPAN.  Starting from Perl 5.8, C<Storable> is part
of the standard distribution.  Here's one example using C<Storable>'s C<store>
and C<retrieve> functions:
2457
2458         use Storable;
2459         store(\%hash, "filename");
2460
2461         # later on...
2462         $href = retrieve("filename");        # by ref
2463         %hash = %{ retrieve("filename") };   # direct to hash
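
A self-contained variant of the same idea is sketched below. The
temporary directory, file name, and sample data are assumptions made
so that the example can run on its own.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

my %settings = ( colour => 'blue', sizes => [ 1, 2, 3 ] );

# Use a scratch directory that is removed when the program exits.
my $file = tempdir(CLEANUP => 1) . '/settings.sto';

# Save the whole structure, nested array included.
store(\%settings, $file) or die "Cannot store to $file";

# A later run of the program would start here.
my $restored = retrieve($file);
print "colour: $restored->{colour}, sizes: @{ $restored->{sizes} }\n";
```

If the file has to be read on a machine with a different
architecture, C<Storable> also provides C<nstore>, which writes the
data in network byte order.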
2464
2465 =head2 How do I print out or copy a recursive data structure?
2466
The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures.  The C<Storable> module on CPAN (or the
5.8 release of Perl) provides a function called C<dclone> that recursively
copies its argument.
2471
2472         use Storable qw(dclone);
2473         $r2 = dclone($r1);
2474
Here C<$r1> can be a reference to any kind of data structure you'd like;
it will be deeply copied.  Because C<dclone> takes and returns references,
you have to add extra punctuation if you want to copy a hash of arrays
into another hash.
2479
2480         %newhash = %{ dclone(\%oldhash) };
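
To see that the copy really is independent, here is a small sketch
with a self-referential hash. The data is invented for the example,
and C<$Data::Dumper::Sortkeys> is set only to make the printed output
stable.

```perl
use strict;
use warnings;
use Data::Dumper;
use Storable qw(dclone);

# A hash of arrays that also refers back to itself.
my %oldhash = ( fruit => [ 'apple', 'pear' ] );
$oldhash{self} = \%oldhash;

# Deep copy: the nested array and the cycle are both duplicated.
my %newhash = %{ dclone(\%oldhash) };

# Changing the copy leaves the original untouched.
push @{ $newhash{fruit} }, 'plum';

$Data::Dumper::Sortkeys = 1;
print Dumper(\%oldhash);
print "original has ", scalar @{ $oldhash{fruit} }, " fruit, ",
      "copy has ", scalar @{ $newhash{fruit} }, "\n";
```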
2481
2482 =head2 How do I define methods for every class/object?
2483
2484 (contributed by Ben Morrow)
2485
You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
be very careful to consider the consequences of doing this: adding
methods to every object is very likely to have unintended
consequences. If possible, it would be better to have all your objects
inherit from some common base class, or to use an object system like
Moose that supports roles.
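
The two approaches can be sketched side by side. All package and
method names below (C<describe>, C<label>, C<My::Base>, C<Animal>,
C<Plant>) are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Approach 1: a method in UNIVERSAL is visible to every class.
sub UNIVERSAL::describe {
    my $self = shift;
    return 'instance of ' . (ref($self) || $self);
}

package Animal;
sub new { return bless {}, shift }

# Approach 2: a common base class limits the method to your own classes.
package My::Base;
sub new   { return bless {}, shift }
sub label { return 'labelled ' . ref(shift) }

package Plant;
our @ISA = ('My::Base');

package main;

print Animal->new->describe, "\n";   # found through UNIVERSAL
print Plant->new->label, "\n";       # found through My::Base
print "Animal has no label method\n" unless Animal->can('label');
```

The second form keeps the new method out of classes you did not
write, which is why it is usually the safer choice.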
2492
2493 =head2 How do I verify a credit card checksum?
2494
2495 Get the C<Business::CreditCard> module from CPAN.
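
The checksum in question is the Luhn algorithm. If you only want to
see how it works, the following is a minimal sketch in plain Perl. It
is not the module's implementation or interface, and C<luhn_ok> is a
name invented for this example; for real use, prefer the module.

```perl
use strict;
use warnings;

# luhn_ok (hypothetical helper): true if the digits pass the Luhn check.
sub luhn_ok {
    my ($number) = @_;

    # Allow spaces and dashes as separators, then insist on digits only.
    (my $digits = $number) =~ s/[\s-]//g;
    return 0 unless $digits =~ /^\d+$/;

    my $sum    = 0;
    my $double = 0;    # the rightmost digit is not doubled

    # Walk the digits from right to left, doubling every second one.
    for my $d (reverse split //, $digits) {
        my $value = $double ? $d * 2 : $d;
        $value -= 9 if $value > 9;    # same as adding the two digits
        $sum   += $value;
        $double = !$double;
    }

    return $sum % 10 == 0 ? 1 : 0;
}

for my $candidate ('4111 1111 1111 1111', '4111111111111112') {
    print "$candidate: ", (luhn_ok($candidate) ? 'valid' : 'invalid'), "\n";
}
```

Passing the check says only that the number is well formed, not that
the card exists or can be charged.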
2496
2497 =head2 How do I pack arrays of doubles or floats for XS code?
2498
2499 The arrays.h/arrays.c code in the C<PGPLOT> module on CPAN does just this.
2500 If you're doing a lot of float or double processing, consider using
2501 the C<PDL> module from CPAN instead--it makes number-crunching easy.
2502
2503 See L<http://search.cpan.org/dist/PGPLOT> for the code.
2504
2505 =head1 REVISION
2506
2507 Revision: $Revision$
2508
2509 Date: $Date$
2510
2511 See L<perlfaq> for source control details and availability.
2512
2513 =head1 AUTHOR AND COPYRIGHT
2514
2515 Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
2516 other authors as noted. All rights reserved.
2517
2518 This documentation is free; you can redistribute it and/or modify it
2519 under the same terms as Perl itself.
2520
2521 Irrespective of its distribution, all code examples in this file
2522 are hereby placed into the public domain.  You are permitted and
2523 encouraged to use this code in your own programs for fun
2524 or for profit as you see fit.  A simple comment in the code giving
2525 credit would be courteous but is not required.