pod/perlfaq4.pod

=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 1.61 $, $Date: 2005/03/11 16:27:53 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

Internally, your computer represents floating-point numbers
in binary. Digital (as in powers of two) computers cannot
store all numbers exactly.  Some real numbers lose precision
in the process.  This is a problem with how computers store
numbers and affects all computer languages, not just Perl.

L<perlnumber> shows the gory details of number
representations and conversions.

To limit the number of decimal places in your numbers, you
can use the printf or sprintf function.  See
L<perlop/"Floating-point Arithmetic"> for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;

=head2 Why is int() broken?

Your int() is most probably working just fine.  It's the numbers that
aren't quite what you think.

First, see the above item "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers.  What you think of above as 'three' is really more like
2.9999999999999995559.
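
If what you really want is the nearest integer rather than truncation,
one option is to round explicitly instead of calling int().  A minimal
sketch, with variable names made up for the example:

    my $raw     = 0.6/0.2 - 2;             # mathematically 1, usually stored just below it
    my $chopped = int($raw);               # truncates toward zero
    my $rounded = sprintf("%.0f", $raw);   # rounds to the nearest integer

    print "$chopped $rounded\n";

On a typical machine with IEEE doubles, C<$chopped> comes out as 0 while
C<$rounded> is 1.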

=head2 Why isn't my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur as
literals in your program.  Octal literals in perl must start with a
leading "0" and hexadecimal literals must start with a leading "0x".
If they are read in from somewhere and assigned, no automatic
conversion takes place.  You must explicitly use oct() or hex() if you
want the values converted to decimal.  oct() interprets hex ("0x350"),
octal ("0350" or even without the leading "0", like "377") and binary
("0b1010") numbers, while hex() only converts hexadecimal ones, with
or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
The inverse mapping from decimal to octal can be done with either the
"%o" or "%O" sprintf() formats.

This problem shows up most often when people try using chmod(), mkdir(),
umask(), or sysopen(), which by widespread tradition typically take
permissions in octal.

    chmod(644,  $file); # WRONG
    chmod(0644, $file); # right

Note the mistake in the first line was specifying the decimal literal
644, rather than the intended octal literal 0644.  The problem can
be seen with:

    printf("%#o",644); # prints 01204

Surely you had not intended C<chmod(01204, $file);> - did you?  If you
want to use numeric literals as arguments to chmod() et al. then please
try to express them as octal constants, that is with a leading zero and
with the following digits restricted to the set 0..7.
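
When the permission arrives as a string, for instance from a
configuration file or the command line, the same rule applies: convert
it with oct() before handing it to chmod().  A small sketch, in which
C<$mode_string> and the temporary file are stand-ins for whatever your
program really uses:

    use File::Temp qw(tempfile);

    my $mode_string = "644";                # e.g. read from a config file
    my $mode        = oct($mode_string);    # 420 decimal, i.e. 0644

    my ($fh, $file) = tempfile(UNLINK => 1);
    chmod($mode, $file) or die "chmod failed on $file: $!";

    printf "%04o\n", $mode;                 # prints 0644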

=head2 Does Perl have a round() function?  What about ceil() and floor()?  Trig functions?

Remember that int() merely truncates toward 0.  For rounding to a
certain number of digits, sprintf() or printf() is usually the easiest
route.

    printf("%.3f", 3.1415926535);       # prints 3.142

The POSIX module (part of the standard Perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
functions.

    use POSIX;
    $ceil   = ceil(3.5);                        # 4
    $floor  = floor(3.5);                       # 3

In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
module.  With 5.004, the Math::Trig module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the Math::Complex module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely.  In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl.  It's the same as in C.  IEEE says we have to do this.
Perl numbers whose absolute values are integers under 2**31 (on 32 bit
machines) will work pretty much like mathematical integers.  Other numbers
are not guaranteed.
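
One way to follow the advice about writing your own rounding function
is to spell the rule out explicitly.  The sketch below rounds to the
nearest integer with exact halves going away from zero.  The name
C<round_half_away> is invented for this example, and the function can
only see the value that floating point actually stored, so it is best
fed amounts that are exactly representable (for instance, values
already scaled so that the interesting digit sits just after the
decimal point).

    use POSIX qw(floor ceil);

    # Round to the nearest integer; exact halves go away from zero.
    sub round_half_away {
        my ($x) = @_;
        return $x >= 0 ? floor($x + 0.5) : ceil($x - 0.5);
    }

    print join(" ", map { round_half_away($_) } 2.5, 3.5, -2.5, 2.4), "\n";
    # prints 3 4 -3 2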

=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it.  Below
are a few examples of approaches to making common conversions
between number representations.  This is intended to be representative
rather than exhaustive.

Some of the examples below use the Bit::Vector module from CPAN.
The reason you might choose Bit::Vector over the perl built in
functions is that it works with numbers of ANY size, that it is
optimized for speed on some operations, and for at least some
programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using perl's built in conversion of 0x notation:

    $dec = 0xDEADBEEF;

Using the hex function:

    $dec = hex("DEADBEEF");

Using pack:

    $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to hexadecimal

Using sprintf:

    $hex = sprintf("%X", 3735928559); # upper case A-F
    $hex = sprintf("%x", 3735928559); # lower case a-f

Using unpack:

    $hex = unpack("H*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And Bit::Vector supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built in conversion of numbers with leading zeros:

    $dec = 033653337357; # note the leading 0!

Using the oct function:

    $dec = oct("33653337357");

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using sprintf:

    $oct = sprintf("%o", 3735928559);

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the 0b notation:

    $number = 0b10110110;

Using oct:

    my $input = "10110110";
    $decimal = oct( "0b$input" );

Using pack and ord:

    $decimal = ord(pack('B8', '10110110'));

Using pack and unpack for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() is used to left pad a 32 character string with zeros.

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using sprintf (perl 5.6+):

    $bin = sprintf("%b", 3735928559);

Using unpack:

    $bin = unpack("B*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back
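
As a hint for those remaining transformations, they can usually be had
by chaining a parsing function (hex or oct) with a sprintf format,
passing through an ordinary Perl number on the way.  A brief sketch,
assuming the values fit in a native integer:

    my $hex = "ff";

    my $oct       = sprintf("%o", hex($hex));       # hex -> oct
    my $bin       = sprintf("%b", hex($hex));       # hex -> bin
    my $hex_again = sprintf("%x", oct("0b$bin"));   # bin -> hex

    print "$oct $bin $hex_again\n";                 # prints 377 11111111 ff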

=head2 Why doesn't & work the way I want it to?

The behavior of binary arithmetic operators depends on whether they're
used on numbers or strings.  The operators treat a string as a series
of bits and work with that (the string C<"3"> is the bit pattern
C<00110011>).  The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>).  Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string.  The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl.  You need:

    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # ...
    }
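
When the operands come from input and are therefore strings, one common
way to get the numeric behavior is to force them into numbers first,
for example by adding zero.  A short sketch with made-up variable
names, assuming the default operator semantics (no C<bitwise> feature
enabled):

    my ($x, $y) = ("11", "3");              # e.g. fields split from a line of input

    my $as_strings = $x & $y;               # bytewise AND of the two strings
    my $as_numbers = ($x + 0) & ($y + 0);   # numeric AND

    print "$as_strings $as_numbers\n";      # prints 1 3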

=head2 How do I multiply matrices?

Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
or the PDL extension (also available from CPAN).
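
For small matrices where installing a module is not an option, the
textbook triple loop is easy enough to write by hand.  The following is
only a sketch: C<mmult> is a name invented here, matrices are assumed
to be references to arrays of rows, and nothing is done about speed or
numerical care, which is where the modules above earn their keep.

    # Multiply two matrices stored as references to arrays of rows.
    sub mmult {
        my ($m1, $m2) = @_;
        my ($rows, $inner, $cols) =
            (scalar @$m1, scalar @$m2, scalar @{ $m2->[0] });
        die "dimension mismatch" unless @{ $m1->[0] } == $inner;

        my @product;
        for my $i (0 .. $rows - 1) {
            for my $j (0 .. $cols - 1) {
                my $sum = 0;
                $sum += $m1->[$i][$_] * $m2->[$_][$j] for 0 .. $inner - 1;
                $product[$i][$j] = $sum;
            }
        }
        return \@product;
    }

    my $c = mmult([[1, 2], [3, 4]], [[5, 6], [7, 8]]);
    print join(" ", @$_), "\n" for @$c;     # prints "19 22" then "43 50"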

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range.  This can take a lot of memory for large
ranges.  Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of 500,000 integers.

=head2 How can I output Roman numerals?

Get the Roman module from CPAN
(http://www.cpan.org/modules/by-module/Roman).

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning.  Don't
call C<srand> more than once---you make your numbers less random, rather
than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-).  The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of
Tom Phoenix, talks more about this.  John von Neumann said, ``Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''

If you want numbers that are more random than C<rand> with C<srand>
provides, you should also check out the Math::TrulyRandom module from
CPAN.  It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while.  If you want a better
pseudorandom generator than comes with your operating system, look at
``Numerical Recipes in C'' at http://www.nr.com/ .

=head2 How do I get a random number between X and Y?

C<rand($x)> returns a number such that
C<< 0 <= rand($x) < $x >>. Thus what you want to have perl
figure out is a random number in the range from 0 to the
difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you
want a random number between 0 and 5 that you can then add
to 10.

    my $number = 10 + int rand( 15-10+1 );

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive).  For example: C<random_int_in(50,120)>.

    sub random_int_in ($$) {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min)  if  $min > $max;
        return $min + int rand(1 + $max - $min);
    }

=head1 Data: Dates

=head2 How do I find the day or week of the year?

The localtime function returns the day of the year, counting from 0
for January 1.  Without an argument localtime uses the current time.

    $day_of_year = (localtime)[7];

The POSIX module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;
    my $week_of_year = strftime "%W", localtime;

To get the day of year for any date, use the Time::Local module to get
a time in epoch seconds for the argument to localtime.

    use POSIX qw/strftime/;
    use Time::Local;
    my $week_of_year = strftime "%W",
        localtime( timelocal( 0, 0, 0, 18, 11, 1987 ) );

The Date::Calc module provides two functions to calculate these.

    use Date::Calc qw(Day_of_Year Week_of_Year);
    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );

=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century    {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, the POSIX module's strftime() function has
been extended in a non-standard way to use a C<%C> format,
which they sometimes claim is the "century".  It isn't,
because on most such systems, this is only the first two
digits of the four-digit year, and thus cannot be used to
reliably determine the current century or millennium.

=head2 How can I compare two dates and find the difference?

If you're storing your dates as epoch seconds then simply subtract one
from the other.  If you've got a structured date (distinct year, day,
month, hour, minute, seconds values), then for reasons of accessibility,
simplicity, and efficiency, merely use either timelocal or timegm (from
the Time::Local module in the standard distribution) to reduce structured
dates to epoch seconds.  However, if you don't know the precise format of
your dates, then you should probably use either of the Date::Manip and
Date::Calc modules from CPAN before you go hacking up your own parsing
routine to handle arbitrary date formats.
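
As a rough illustration of the structured-date case, the sketch below
reduces two calendar dates to epoch seconds and subtracts them.  It
uses timegm rather than timelocal so that every day is exactly 86400
seconds long, and the two dates are arbitrary examples.

    use Time::Local qw(timegm);

    # timegm takes (sec, min, hour, mday, month, year); months count from 0.
    my $start = timegm(0, 0, 0, 18, 11, 1987);   # 18 Dec 1987
    my $end   = timegm(0, 0, 0,  1,  0, 1988);   #  1 Jan 1988

    my $days = ($end - $start) / (24 * 60 * 60);
    print "$days days apart\n";                  # prints "14 days apart"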

=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
Time::Local module.  Otherwise, you should look into the Date::Calc
and Date::Manip modules from CPAN.
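
For instance, assuming the input always looks like
C<YYYY-MM-DD HH:MM:SS> in local time (the format and the sample string
are only for illustration), the pieces can be picked apart with a
regular expression and handed to timelocal:

    use Time::Local qw(timelocal);

    my $string = "2005-03-11 16:27:53";

    my ($year, $mon, $mday, $hour, $min, $sec) =
        $string =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
            or die "unexpected date format: $string";

    # timelocal wants the month counted from 0.
    my $epoch = timelocal($sec, $min, $hour, $mday, $mon - 1, $year);

    print scalar(localtime $epoch), "\n";   # Fri Mar 11 16:27:53 2005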

=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the Time::JulianDay module available on CPAN.  Ensure that
you really want to find a Julian day, though, as many people have
different ideas about Julian days.  See
http://www.hermetic.ch/cal_stud/jdn.htm for instance.

You can also try the DateTime module, which can convert a date/time
to a Julian Day.

    $ perl -MDateTime -le'print DateTime->today->jd'
    2453401.5

Or the modified Julian Day

    $ perl -MDateTime -le'print DateTime->today->mjd'
    53401

Or even the day of the year (which is what some people think of as a
Julian day)

    $ perl -MDateTime -le'print DateTime->today->doy'
    31

=head2 How do I find yesterday's date?

If you only need to find the date (and not the same time), you
can use the Date::Calc module.

    use Date::Calc qw(Today Add_Delta_Days);

    my @date = Add_Delta_Days( Today(), -1 );

    print "@date\n";

Most people try to use the time rather than the calendar to
figure out dates, but that assumes that your days are
twenty-four hours each.  For most people, there are two days
a year when they aren't: the switch to and from summer time
throws this off. Russ Allbery offers this solution.

    sub yesterday {
        my $now  = defined $_[0] ? $_[0] : time;
        my $then = $now - 60 * 60 * 24;
        my $ndst = (localtime $now)[8] > 0;
        my $tdst = (localtime $then)[8] > 0;
        $then - ($tdst - $ndst) * 60 * 60;
    }

This should give you "this time yesterday" in seconds since the epoch,
relative to the first argument (or the current time if no argument is
given), suitable for passing to localtime or whatever else you need to
do with it.  $ndst is whether we're currently in daylight savings time;
$tdst is whether the point 24 hours ago was in daylight savings time.
If $tdst and $ndst are the same, a boundary wasn't crossed, and the
correction will subtract 0.  If $tdst is 1 and $ndst is 0, subtract an
hour more from yesterday's time since we gained an extra hour while
going off daylight savings time.  If $tdst is 0 and $ndst is 1, subtract
a negative hour (add an hour) to yesterday's time since we lost an hour.

All of this is because during those days when one switches off or onto
DST, a "day" isn't 24 hours long; it's either 23 or 25.

The explicit settings of $ndst and $tdst are necessary because localtime
only says it returns the system tm struct, and the system tm struct at
least on Solaris doesn't guarantee any particular positive value (like,
say, 1) for isdst, just a positive value.  And that value can
potentially be negative, if DST information isn't available (this sub
just treats those cases like no DST).

Note that between 2am and 3am on the day after the time zone switches
off daylight savings time, the exact hour of "yesterday" corresponding
to the current hour is not clearly defined.  Note also that if used
between 2am and 3am the day after the change to daylight savings time,
the result will be between 3am and 4am of the previous day; it's
arguable whether this is correct.

This sub does not attempt to deal with leap seconds (most things don't).

=head2 Does Perl have a Year 2000 problem?  Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem.  Yes, Perl is
Y2K compliant (whatever that means).  The programmers you've hired to
use it, however, probably are not.

Long answer: The question belies a true understanding of the issue.
Perl is just as Y2K compliant as your pencil--no more, and no less.
Can you use your pencil to write a non-Y2K-compliant memo?  Of course
you can.  Is that the pencil's fault?  Of course it isn't.

The date and time functions supplied with Perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines).  The year returned
by these functions when used in a list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number. To avoid the year 2000 problem simply do not treat the year as
a 2-digit number.  It isn't.

When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year.  For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001".  There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K compliant
programs.  It can.  But so can your pencil.  It's the fault of the user,
not the language.  At the risk of inflaming the NRA: ``Perl doesn't
break Y2K, people do.''  See http://www.perl.org/about/y2k.html for
a longer exposition.

=head1 Data: Strings

=head2 How do I validate input?

The answer to this question is usually a regular expression, perhaps
with auxiliary logic.  See the more specific questions (numbers, mail
addresses, etc.) for details.

=head2 How do I unescape a string?

It depends just what you mean by ``escape''.  URL escapes are dealt
with in L<perlfaq9>.  Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

=head2 How do I remove consecutive pairs of characters?

To turn C<"abbcccd"> into C<"abccd">:

    s/(.)\1/$1/g;       # add /s to include newlines

Here's a solution that turns "abbcccd" to "abcd":

    y///cs;     # y == tr, but shorter :-)

=head2 How do I expand function calls in a string?

This is documented in L<perlref>.  In general, this is fraught with
quoting and readability problems, but it is possible.  To interpolate
a subroutine call (in list context) into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";
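
The same trick works for scalar context by interpolating a dereferenced
scalar reference, although plain concatenation is often easier to read.
A short sketch; the C<mysub> defined here is only a stand-in that
returns its arguments in list context and their count in scalar
context:

    sub mysub { return wantarray ? @_ : scalar @_ }

    my $list   = "My sub returned @{[ mysub(1,2,3) ]} that time.";
    my $scalar = "My sub returned ${\ scalar mysub(1,2,3) } that time.";
    my $concat = "My sub returned " . mysub(1,2,3) . " that time.";

    print "$list\n$scalar\n$concat\n";

With this stand-in, the first line reports C<1 2 3> and the other two
report C<3>.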

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated.  To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple ones, then something more like
C</alpha(.*?)omega/> would be needed.  But none of these deals with
nested patterns.  For balanced expressions using C<(>, C<{>, C<[>
or C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
L<perlre/(??{ code })>.  For other cases, you'll have to write a parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier.  There are
the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
and the byacc program.  Starting with perl 5.8, Text::Balanced
is part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you.  This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );

=head2 How do I reverse a string?

Use reverse() in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use Text::Wrap (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The paragraphs you give to Text::Wrap should not contain embedded
newlines.  Text::Wrap doesn't justify the lines (flush-right).

Or use the CPAN module Text::Autoformat.  Formatting files can be easily
done by making a shell alias, like so:

    alias fmt="perl -i -MText::Autoformat -n0777 \
        -e 'print autoformat $_, {all=>1}' $*"

See the documentation for Text::Autoformat to appreciate its many
capabilities.

=head2 How can I access or change N characters of a string?

You can access the first characters of a string with substr().
To get the first character, for example, start at position 0
and grab the string of length 1.

    $string = "Just another Perl Hacker";
    $first_char = substr( $string, 0, 1 );  #  'J'

To change part of a string, you can use the optional fourth
argument which is the replacement string.

    substr( $string, 13, 4, "Perl 5.8.0" );

You can also use substr() as an lvalue.

    substr( $string, 13, 4 ) =  "Perl 5.8.0";
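
Two further details of substr() are handy here: a negative offset
counts from the end of the string, and the four-argument form returns
the text it replaced.  A small sketch, with the replacement word chosen
arbitrarily:

    my $string = "Just another Perl Hacker";

    my $last_word = substr( $string, -6 );         # 'Hacker'
    my $old = substr( $string, 13, 4, "Camel" );   # returns the replaced 'Perl'

    print "$last_word / $old / $string\n";
    # prints "Hacker / Perl / Just another Camel Hacker"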

=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself.  For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively.  These
all assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5           # is it the 5th?
            ? "${2}soever"      # yes, swap
            : $1                # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one.">  You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency.  If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character.  However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work.  What you can do is wrap a while()
loop around a global pattern match.  For example, let's count negative
integers:

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;     # start fresh; $count may hold an earlier result
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";

Another version uses a global match in list context, then assigns the
result to a scalar, producing a count of the number of matches.

    $count = () = $string =~ /-\d+/g;
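
For a fixed substring rather than a pattern, a loop around index()
avoids having to quote regex metacharacters.  This sketch counts
non-overlapping occurrences; the sample text and variable names are
only for illustration:

    my $string = "One fish two fish red fish blue fish";
    my $needle = "fish";

    my ($count, $pos) = (0, 0);
    while ( ($pos = index($string, $needle, $pos)) != -1 ) {
        $count++;
        $pos += length $needle;     # continue after this occurrence
    }

    print "$count\n";               # prints 4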

=head2 How do I capitalize all the words on one line?

To make the first letter of each word upper case:

    $line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning "C<don't do it>" into "C<Don'T
Do It>".  Sometimes you might want this.  Other times you might need a
more thorough solution (Suggested by brian d foy):

    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;

To make the whole line upper case:

    $line = uc($line);

To force each word to be lower case, with the first letter upper case:

    $line =~ s/(\w+)/\u\L$1/g;

You can (and probably should) enable locale awareness of those
characters by placing a C<use locale> pragma in your program.
See L<perllocale> for endless details on locales.

This is sometimes referred to as putting something into "title
case", but that's not quite accurate.  Consider the proper
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.

Damian Conway's L<Text::Autoformat> module provides some smart
case transformations:

    use Text::Autoformat;
    my $x = "Dr. Strangelove or: How I Learned to Stop ".
      "Worrying and Love the Bomb";

    print $x, "\n";
    for my $style (qw( sentence title highlight ))
    {
        print autoformat($x, { case => $style }), "\n";
    }

 805 =head2 How can I split a [character] delimited string except when inside [character]?
 806
 807 Several modules can handle this sort of pasing---Text::Balanced,
 808 Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
 809
 810 Take the example case of trying to split a string that is
 811 comma-separated into its different fields. You can't use C<split(/,/)>
 812 because you shouldn't split if the comma is inside quotes.  For
 813 example, take a data line like this:
 814
 815     SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
 816
 817 Due to the restriction of the quotes, this is a fairly complex
 818 problem.  Thankfully, we have Jeffrey Friedl, author of
 819 I<Mastering Regular Expressions>, to handle these for us.  He
 820 suggests (assuming your string is contained in $text):
 821
 822      @new = ();
 823      push(@new, $+) while $text =~ m{
 824          "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
 825        | ([^,]+),?
 826        | ,
 827      }gx;
 828      push(@new, undef) if substr($text,-1,1) eq ',';
 829
 830 If you want to represent quotation marks inside a
 831 quotation-mark-delimited field, escape them with backslashes (eg,
 832 C<"like \"this\"">).
 833
 834 Alternatively, the Text::ParseWords module (part of the standard Perl
 835 distribution) lets you say:
 836
 837     use Text::ParseWords;
 838     @new = quotewords(",", 0, $text);
 839
 840 There's also a Text::CSV (Comma-Separated Values) module on CPAN.
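
As a quick check of the Text::ParseWords approach, the following sketch
runs C<quotewords> on the sample data line shown earlier.  The variable
names are only illustrative.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# The sample record from above; single quotes keep it literal.
my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A second argument of 0 means "do not keep the quotes in the result".
my @fields = quotewords(',', 0, $text);

printf "%2d: [%s]\n", $_, $fields[$_] for 0 .. $#fields;
```

The commas inside C<"Cimetrix, Inc"> and C<"Error, Core Dumped"> stay
part of their fields, and the empty C<""> comes back as an empty string.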
 841
 842 =head2 How do I strip blank space from the beginning/end of a string?
 843
 844 Although the simplest approach would seem to be
 845
 846     $string =~ s/^\s*(.*?)\s*$/$1/;
 847
 848 not only is this unnecessarily slow and destructive, it also fails with
 849 embedded newlines.  It is much faster to do this operation in two steps:
 850
 851     $string =~ s/^\s+//;
 852     $string =~ s/\s+$//;
 853
 854 Or more nicely written as:
 855
 856     for ($string) {
 857         s/^\s+//;
 858         s/\s+$//;
 859     }
 860
 861 This idiom takes advantage of the C<foreach> loop's aliasing
 862 behavior to factor out common code.  You can do this
 863 on several strings at once, or arrays, or even the
 864 values of a hash if you use a slice:
 865
 866     # trim whitespace in the scalar, the array,
 867     # and all the values in the hash
 868     foreach ($scalar, @array, @hash{keys %hash}) {
 869         s/^\s+//;
 870         s/\s+$//;
 871     }
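
If you trim in many places, you might wrap the two-step idiom in a
small function.  This is only a sketch; the name C<trim> is made up
here and is not a Perl built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a trimmed copy and leaves its argument alone.
sub trim {
    my ($string) = @_;
    for ($string) {      # alias $_ to the copy
        s/^\s+//;        # leading whitespace
        s/\s+$//;        # trailing whitespace
    }
    return $string;
}

my $raw     = "  \t first line\nsecond line \n\n";
my $trimmed = trim($raw);
print "[$trimmed]\n";
```

The embedded newline between the two lines survives, while the
whitespace at both ends is removed.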
 872
 873 =head2 How do I pad a string with blanks or pad a number with zeroes?
 874
 875 In the following examples, C<$pad_len> is the length to which you wish
 876 to pad the string, C<$text> or C<$num> contains the string to be padded,
 877 and C<$pad_char> contains the padding character. You can use a single
 878 character string constant instead of the C<$pad_char> variable if you
 879 know what it is in advance. And in the same way you can use an integer in
 880 place of C<$pad_len> if you know the pad length in advance.
 881
 882 The simplest method uses the C<sprintf> function. It can pad on the left
 883 or right with blanks and on the left with zeroes and it will not
 884 truncate the result. The C<pack> function can only pad strings on the
 885 right with blanks and it will truncate the result to a maximum length of
 886 C<$pad_len>.
 887
 888     # Left padding a string with blanks (no truncation):
 889         $padded = sprintf("%${pad_len}s", $text);
 890         $padded = sprintf("%*s", $pad_len, $text);  # same thing
 891
 892     # Right padding a string with blanks (no truncation):
 893         $padded = sprintf("%-${pad_len}s", $text);
 894         $padded = sprintf("%-*s", $pad_len, $text); # same thing
 895
 896     # Left padding a number with 0 (no truncation):
 897         $padded = sprintf("%0${pad_len}d", $num);
 898         $padded = sprintf("%0*d", $pad_len, $num); # same thing
 899
 900     # Right padding a string with blanks using pack (will truncate):
 901         $padded = pack("A$pad_len", $text);
 902
 903 If you need to pad with a character other than blank or zero you can use
 904 one of the following methods.  They all generate a pad string with the
 905 C<x> operator and combine that with C<$text>. These methods do
 906 not truncate C<$text>.
 907
 908 Left and right padding with any character, creating a new string:
 909
 910     $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
 911     $padded = $text . $pad_char x ( $pad_len - length( $text ) );
 912
 913 Left and right padding with any character, modifying C<$text> directly:
 914
 915     substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
 916     $text .= $pad_char x ( $pad_len - length( $text ) );
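
To see the methods side by side, here is a small sketch with made-up
values (C<'abc'>, C<42>, a pad length of 6, and C<'*'> as the pad
character):

```perl
use strict;
use warnings;

my ($text, $num, $pad_len, $pad_char) = ('abc', 42, 6, '*');

my $left   = sprintf("%*s",  $pad_len, $text);   # "   abc"
my $right  = sprintf("%-*s", $pad_len, $text);   # "abc   "
my $zeroes = sprintf("%0*d", $pad_len, $num);    # "000042"
my $cut    = pack("A2", $text);                  # "ab" (pack truncates)

# The x operator with any pad character; the parentheses spell out the precedence.
my $stars  = ($pad_char x ($pad_len - length $text)) . $text;   # "***abc"

print "[$_]\n" for $left, $right, $zeroes, $cut, $stars;
```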
 917
 918 =head2 How do I extract selected columns from a string?
 919
 920 Use substr() or unpack(), both documented in L<perlfunc>.
 921 If you prefer thinking in terms of columns instead of widths,
 922 you can use this kind of thing:
 923
 924     # determine the unpack format needed to split Linux ps output
 925     # arguments are cut columns
 926     my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
 927
 928     sub cut2fmt {
 929         my(@positions) = @_;
 930         my $template  = '';
 931         my $lastpos   = 1;
 932         for my $place (@positions) {
 933             $template .= "A" . ($place - $lastpos) . " ";
 934             $lastpos   = $place;
 935         }
 936         $template .= "A*";
 937         return $template;
 938     }
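
Here is how the generated template might be used with C<unpack>.  The
record and its column positions are invented for the illustration, and
C<cut2fmt> is repeated so the sketch runs on its own.

```perl
use strict;
use warnings;

# Same cut2fmt as above.
sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    $template .= "A*";
    return $template;
}

# Made-up record: name in columns 1-7, id in columns 8-13, status after that.
my $line = "alice  12345 running";
my $fmt  = cut2fmt(8, 14);                 # "A7 A6 A*"
my ($name, $id, $status) = unpack($fmt, $line);

print "$name|$id|$status\n";               # alice|12345|running
```

The C<A> format strips trailing blanks, so the fields come back without
their column padding.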
 939
 940 =head2 How do I find the soundex value of a string?
 941
 942 (contributed by brian d foy)
 943
 944 You can use the Text::Soundex module. If you want to do fuzzy or close
 945 matching, you might also try the String::Approx, Text::Metaphone,
 946 and Text::DoubleMetaphone modules.
 947
 948 =head2 How can I expand variables in text strings?
 949
 950 Let's assume that you have a string that contains placeholder
 951 variables.
 952
 953     $text = 'this has a $foo in it and a $bar';
 954
 955 You can use a substitution with a double evaluation.  The
 956 first /e turns C<$1> into C<$foo>, and the second /e turns
 957 C<$foo> into its value.  You may want to wrap this in an
 958 C<eval>: if you try to get the value of an undeclared variable
 959 while running under C<use strict>, you get a fatal error.
 960
 961     eval { $text =~ s/(\$\w+)/$1/eeg };
 962     die if $@;
 963
 964 It's probably better in the general case to treat those
 965 variables as entries in some special hash.  For example:
 966
 967     %user_defs = (
 968         foo  => 23,
 969         bar  => 19,
 970     );
 971     $text =~ s/\$(\w+)/$user_defs{$1}/g;
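
With the plain hash lookup, a placeholder that has no entry in the hash
is replaced by nothing.  One possible variation, shown here as a sketch
with a made-up C<$baz> placeholder, uses /e to leave unknown names
untouched:

```perl
use strict;
use warnings;

my %user_defs = ( foo => 23, bar => 19 );
my $text = 'this has a $foo in it and a $bar, but no $baz';

# /e evaluates the replacement as code, so unknown names can be kept as-is.
(my $expanded = $text) =~
    s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$expanded\n";
```

Only names that exist as keys in C<%user_defs> are substituted, and no
program variables are ever consulted.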
 972
 973 =head2 What's wrong with always quoting "$vars"?
 974
 975 The problem is that those double-quotes force stringification--
 976 coercing numbers and references into strings--even when you
 977 don't want them to be strings.  Think of it this way: double-quote
 978 expansion is used to produce new strings.  If you already
 979 have a string, why do you need more?
 980
 981 If you get used to writing odd things like these:
 982
 983     print "$var";       # BAD
 984     $new = "$old";      # BAD
 985     somefunc("$var");   # BAD
 986
 987 You'll be in trouble.  Those should (in 99.8% of the cases) be
 988 the simpler and more direct:
 989
 990     print $var;
 991     $new = $old;
 992     somefunc($var);
 993
 994 Otherwise, besides slowing you down, you're going to break code when
 995 the thing in the scalar is actually neither a string nor a number, but
 996 a reference:
 997
 998     func(\@array);
 999     sub func {
1000         my $aref = shift;
1001         my $oref = "$aref";  # WRONG
1002     }
1003
1004 You can also get into subtle problems on those few operations in Perl
1005 that actually do care about the difference between a string and a
1006 number, such as the magical C<++> autoincrement operator or the
1007 syscall() function.
1008
1009 Stringification also destroys arrays.
1010
1011     @lines = `command`;
1012     print "@lines";             # WRONG - extra blanks
1013     print @lines;               # right
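
A short sketch makes both effects visible.  The data is made up; the
point is only what the quotes do to it.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $copy  = $aref;        # still a reference
my $str   = "$aref";      # now just text such as "ARRAY(0x...)"

print ref($copy) ? "copy is a reference\n" : "copy is a plain string\n";
print ref($str)  ? "str is a reference\n"  : "str is a plain string\n";

# Interpolating an array joins its elements with $" (a space by default).
my @lines  = ("one\n", "two\n");
my $joined = "@lines";    # "one\n two\n": note the space before "two"
```

Once a reference has been turned into a string, there is no way to
dereference it again.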
1014
1015 =head2 Why don't my E<lt>E<lt>HERE documents work?
1016
1017 Check for these three things:
1018
1019 =over 4
1020
1021 =item There must be no space after the E<lt>E<lt> part.
1022
1023 =item There (probably) should be a semicolon at the end.
1024
1025 =item You can't (easily) have any space in front of the tag.
1026
1027 =back
1028
1029 If you want to indent the text in the here document, you
1030 can do this:
1031
1032     # all in one
1033     ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1034         your text
1035         goes here
1036     HERE_TARGET
1037
1038 But the HERE_TARGET must still be flush against the margin.
1039 If you want that indented also, you'll have to quote
1040 in the indentation.
1041
1042     ($quote = <<'    FINIS') =~ s/^\s+//gm;
1043             ...we will have peace, when you and all your works have
1044             perished--and the works of your dark master to whom you
1045             would deliver us. You are a liar, Saruman, and a corrupter
1046             of men's hearts.  --Theoden in /usr/src/perl/taint.c
1047         FINIS
1048     $quote =~ s/\s+--/\n--/;
1049
1050 A nice general-purpose fixer-upper function for indented here documents
1051 follows.  It expects to be called with a here document as its argument.
1052 It looks to see whether each line begins with a common substring, and
1053 if so, strips that substring off.  Otherwise, it takes the amount of leading
1054 whitespace found on the first line and removes that much off each
1055 subsequent line.
1056
1057     sub fix {
1058         local $_ = shift;
1059         my ($white, $leader);  # common whitespace and common leading string
1060         if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1061             ($white, $leader) = ($2, quotemeta($1));
1062         } else {
1063             ($white, $leader) = (/^(\s+)/, '');
1064         }
1065         s/^\s*?$leader(?:$white)?//gm;
1066         return $_;
1067     }
1068
1069 This works with leading special strings, dynamically determined:
1070
1071     $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
1072         @@@ int
1073         @@@ runops() {
1074         @@@     SAVEI32(runlevel);
1075         @@@     runlevel++;
1076         @@@     while ( op = (*op->op_ppaddr)() );
1077         @@@     TAINT_NOT;
1078         @@@     return 0;
1079         @@@ }
1080     MAIN_INTERPRETER_LOOP
1081
1082 Or with a fixed amount of leading whitespace, with remaining
1083 indentation correctly preserved:
1084
1085     $poem = fix<<EVER_ON_AND_ON;
1086        Now far ahead the Road has gone,
1087           And I must follow, if I can,
1088        Pursuing it with eager feet,
1089           Until it joins some larger way
1090        Where many paths and errands meet.
1091           And whither then? I cannot say.
1092                 --Bilbo in /usr/src/perl/pp_ctl.c
1093     EVER_ON_AND_ON
1094
1095 =head1 Data: Arrays
1096
1097 =head2 What is the difference between a list and an array?
1098
1099 An array has a changeable length.  A list does not.  An array is something
1100 you can push or pop, while a list is a set of values.  Some people make
1101 the distinction that a list is a value while an array is a variable.
1102 Subroutines are passed and return lists, you put things into list
1103 context, you initialize arrays with lists, and you foreach() across
1104 a list.  C<@> variables are arrays, anonymous arrays are arrays, arrays
1105 in scalar context behave like the number of elements in them, subroutines
1106 access their arguments through the array C<@_>, and push/pop/shift only work
1107 on arrays.
1108
1109 As a side note, there's no such thing as a list in scalar context.
1110 When you say
1111
1112     $scalar = (2, 5, 7, 9);
1113
1114 you're using the comma operator in scalar context, so it uses the scalar
1115 comma operator.  There never was a list there at all!  This causes the
1116 last value to be returned: 9.
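
The different contexts are easier to compare next to each other.  This
is just an illustrative sketch using the same four values:

```perl
use strict;
use warnings;
no warnings 'void';   # the comma operator discards 2, 5 and 7 on purpose

my @array  = (2, 5, 7, 9);
my $count  = @array;             # array in scalar context: 4
my $last   = (2, 5, 7, 9);       # comma operator in scalar context: 9
my ($head) = (2, 5, 7, 9);       # list assignment: 2
my $n      = () = (2, 5, 7, 9);  # count of a list assignment: 4

print "count=$count last=$last head=$head n=$n\n";
```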
1117
1118 =head2 What is the difference between $array[1] and @array[1]?
1119
1120 The former is a scalar value; the latter an array slice, making
1121 it a list with one (scalar) value.  You should use $ when you want a
1122 scalar value (most of the time) and @ when you want a list with one
1123 scalar value in it (very, very rarely; nearly never, in fact).
1124
1125 Sometimes it doesn't make a difference, but sometimes it does.
1126 For example, compare:
1127
1128     $good[0] = `some program that outputs several lines`;
1129
1130 with
1131
1132     @bad[0]  = `same program that outputs several lines`;
1133
1134 The C<use warnings> pragma and the B<-w> flag will warn you about these
1135 matters.
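
To see the difference without running an external program, the sketch
below uses a stand-in function (C<output>, invented for this example)
that behaves like backticks: one string in scalar context, a list of
lines in list context.

```perl
use strict;
use warnings;
no warnings 'syntax';   # silence "Scalar value @bad[0] better written as $bad[0]"

# Stand-in for backticks: one string in scalar context, a list of lines in list context.
sub output { return wantarray ? ("line 1\n", "line 2\n") : "line 1\nline 2\n" }

my (@good, @bad);
$good[0] = output();    # scalar assignment: the whole output
@bad[0]  = output();    # list assignment to a one-element slice: only the first line
```

The slice puts the right-hand side in list context and then keeps only
as many values as it has slots, so the second line is silently lost.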
1136
1137 =head2 How can I remove duplicate elements from a list or array?
1138
1139 There are several possible ways, depending on whether the array is
1140 ordered and whether you wish to preserve the ordering.
1141
1142 =over 4
1143
1144 =item a)
1145
1146 If @in is sorted, and you want @out to be sorted:
1147 (this assumes all true values in the array)
1148
1149     $prev = "not equal to $in[0]";
1150     @out = grep($_ ne $prev && ($prev = $_, 1), @in);
1151
1152 This is nice in that it doesn't use much extra memory, simulating
1153 uniq(1)'s behavior of removing only adjacent duplicates.  The ", 1"
1154 guarantees that the expression is true (so that grep picks it up)
1155 even if the $_ is 0, "", or undef.
1156
1157 =item b)
1158
1159 If you don't know whether @in is sorted:
1160
1161     undef %saw;
1162     @out = grep(!$saw{$_}++, @in);
1163
1164 =item c)
1165
1166 Like (b), but @in contains only small integers:
1167
1168     @out = grep(!$saw[$_]++, @in);
1169
1170 =item d)
1171
1172 A way to do (b) without any loops or greps:
1173
1174     undef %saw;
1175     @saw{@in} = ();
1176     @out = sort keys %saw;  # remove sort if undesired
1177
1178 =item e)
1179
1180 Like (d), but @in contains only small positive integers:
1181
1182     undef @ary;
1183     @ary[@in] = @in;
1184     @out = grep {defined} @ary;
1185
1186 =back
1187
1188 But perhaps you should have been using a hash all along, eh?
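
As a worked example of method (b), with made-up input, note that the
first occurrence of each value is kept and the original order is
preserved:

```perl
use strict;
use warnings;

my @in = qw(pear apple pear fig apple);

# Method (b): keep the first occurrence of each value, in original order.
my %saw;
my @out = grep { !$saw{$_}++ } @in;

print "@out\n";                       # pear apple fig
print "$_ seen $saw{$_} time(s)\n" for sort keys %saw;
```

As a side effect, C<%saw> ends up holding a count for each value.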
1189
1190 =head2 How can I tell whether a certain element is contained in a list or array?
1191
1192 Hearing the word "in" is an I<in>dication that you probably should have
1193 used a hash, not a list or array, to store your data.  Hashes are
1194 designed to answer this question quickly and efficiently.  Arrays aren't.
1195
1196 That being said, there are several ways to approach this.  If you
1197 are going to make this query many times over arbitrary string values,
1198 the fastest way is probably to invert the original array and maintain a
1199 hash whose keys are the first array's values.
1200
1201     @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1202     %is_blue = ();
1203     for (@blues) { $is_blue{$_} = 1 }
1204
1205 Now you can check whether $is_blue{$some_color}.  It might have been a
1206 good idea to keep the blues all in a hash in the first place.
1207
1208 If the values are all small integers, you could use a simple indexed
1209 array.  This kind of an array will take up less space:
1210
1211     @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1212     @is_tiny_prime = ();
1213     for (@primes) { $is_tiny_prime[$_] = 1 }
1214     # or simply  @is_tiny_prime[@primes] = (1) x @primes;
1215
1216 Now you check whether $is_tiny_prime[$some_number].
1217
1218 If the values in question are integers instead of strings, you can save
1219 quite a lot of space by using bit strings instead:
1220
1221     @articles = ( 1..10, 150..2000, 2017 );
1222     undef $read;
1223     for (@articles) { vec($read,$_,1) = 1 }
1224
1225 Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1226
1227 Please do not use
1228
1229     ($is_there) = grep $_ eq $whatever, @array;
1230
1231 or worse yet
1232
1233     ($is_there) = grep /$whatever/, @array;
1234
1235 These are slow (checks every element even if the first matches),
1236 inefficient (same reason), and potentially buggy (what if there are
1237 regex characters in $whatever?).  If you're only testing once, then
1238 use:
1239
1240     $is_there = 0;
1241     foreach $elt (@array) {
1242         if ($elt eq $elt_to_find) {
1243             $is_there = 1;
1244             last;
1245         }
1246     }
1247     if ($is_there) { ... }
1248
1249 =head2 How do I compute the difference of two arrays?  How do I compute the intersection of two arrays?
1250
1251 Use a hash.  Here's code to do both and more.  It assumes that
1252 each element is unique in a given array:
1253
1254     @union = @intersection = @difference = ();
1255     %count = ();
1256     foreach $element (@array1, @array2) { $count{$element}++ }
1257     foreach $element (keys %count) {
1258         push @union, $element;
1259         push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1260     }
1261
1262 Note that this is the I<symmetric difference>, that is, all elements in
1263 either A or in B but not in both.  Think of it as an xor operation.
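
If you want the one-sided difference instead (the elements of the first
array that are not in the second), a lookup hash built from the second
array is one way to get it.  The names C<%in2> and C<@only_in_1> are
made up for this sketch.

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e);

my %in2 = map { $_ => 1 } @array2;
my @only_in_1    = grep { !$in2{$_} } @array1;   # a b
my @intersection = grep {  $in2{$_} } @array1;   # c d

print "only in first: @only_in_1\n";
print "in both: @intersection\n";
```

Because C<grep> walks C<@array1> in order, the results keep the order
of the first array.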
1264
1265 =head2 How do I test whether two arrays or hashes are equal?
1266
1267 The following code works for single-level arrays.  It uses a stringwise
1268 comparison, and does not distinguish defined versus undefined empty
1269 strings.  Modify if you have other needs.
1270
1271     $are_equal = compare_arrays(\@frogs, \@toads);
1272
1273     sub compare_arrays {
1274         my ($first, $second) = @_;
1275         no warnings;  # silence spurious -w undef complaints
1276         return 0 unless @$first == @$second;
1277         for (my $i = 0; $i < @$first; $i++) {
1278             return 0 if $first->[$i] ne $second->[$i];
1279         }
1280         return 1;
1281     }
1282
1283 For multilevel structures, you may wish to use an approach more
1284 like this one.  It uses the CPAN module FreezeThaw:
1285
1286     use FreezeThaw qw(cmpStr);
1287     @a = @b = ( "this", "that", [ "more", "stuff" ] );
1288
1289     printf "a and b contain %s arrays\n",
1290         cmpStr(\@a, \@b) == 0
1291             ? "the same"
1292             : "different";
1293
1294 This approach also works for comparing hashes.  Here
1295 we'll demonstrate two different answers:
1296
1297     use FreezeThaw qw(cmpStr cmpStrHard);
1298
1299     %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1300     $a{EXTRA} = \%b;
1301     $b{EXTRA} = \%a;
1302
1303     printf "a and b contain %s hashes\n",
1304         cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1305
1306     printf "a and b contain %s hashes\n",
1307         cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1308
1309
1310 The first reports that both of those hashes contain the same data,
1311 while the second reports that they do not.  Which you prefer is left as
1312 an exercise to the reader.
1313
1314 =head2 How do I find the first array element for which a condition is true?
1315
1316 To find the first array element which satisfies a condition, you can
1317 use the first() function in the List::Util module, which comes with
1318 Perl 5.8.  This example finds the first element that contains "Perl".
1319
1320         use List::Util qw(first);
1321
1322         my $element = first { /Perl/ } @array;
1323
1324 If you cannot use List::Util, you can make your own loop to do the
1325 same thing.  Once you find the element, you stop the loop with last.
1326
1327         my $found;
1328         foreach my $element ( @array )
1329                 {
1330                 if( $element =~ /Perl/ ) { $found = $element; last }
1331                 }
1332
1333 If you want the array index, you can iterate through the indices
1334 and check the array element at each index until you find one
1335 that satisfies the condition.
1336
1337     my( $found, $index ) = ( undef, -1 );
1338     for( my $i = 0; $i < @array; $i++ )
1339         {
1340         if( $array[$i] =~ /Perl/ )
1341             {
1342             $found = $array[$i];
1343             $index = $i;
1344             last;
1345             }
1346         }
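
If List::Util is available, the index search can also be written with
C<first> by searching the indices instead of the elements.  This is a
sketch with made-up data:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ('Python', 'Learning Perl', 'Ruby', 'Perl Cookbook');

# Search the indices rather than the elements.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

if (defined $index) { print "found '$array[$index]' at index $index\n" }
else                { print "no match\n" }
```

Test the result with C<defined>, since a match at index 0 is a
perfectly good answer that happens to be false.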
1347
1348 =head2 How do I handle linked lists?
1349
1350 In general, you usually don't need a linked list in Perl, since with
1351 regular arrays, you can push and pop or shift and unshift at either end,
1352 or you can use splice to add and/or remove arbitrary number of elements at
1353 arbitrary points.  Both pop and shift are O(1) operations on Perl's
1354 dynamic arrays.  In the absence of shifts and pops, push in general
1355 needs to reallocate on the order of every log(N) times, and unshift will
1356 need to copy pointers each time.
1357
1358 If you really, really wanted, you could use structures as described in
1359 L<perldsc> or L<perltoot> and do just what the algorithm book tells you
1360 to do.  For example, imagine a list node like this:
1361
1362     $node = {
1363         VALUE => 42,
1364         LINK  => undef,
1365     };
1366
1367 You could walk the list this way:
1368
1369     print "List: ";
1370     for ($node = $head;  $node; $node = $node->{LINK}) {
1371         print $node->{VALUE}, " ";
1372     }
1373     print "\n";
1374
1375 You could add to the list this way:
1376
1377     my ($head, $tail);
1378     $tail = append($head, 1);       # grow a new head
1379     for $value ( 2 .. 10 ) {
1380         $tail = append($tail, $value);
1381     }
1382
1383     sub append {
1384         my($list, $value) = @_;
1385         my $node = { VALUE => $value };
1386         if ($list) {
1387             $node->{LINK} = $list->{LINK};
1388             $list->{LINK} = $node;
1389         } else {
1390             $_[0] = $node;      # replace caller's version
1391         }
1392         return $node;
1393     }
1394
1395 But again, Perl's built-in arrays are virtually always good enough.
1396
1397 =head2 How do I handle circular lists?
1398
1399 Circular lists could be handled in the traditional fashion with linked
1400 lists, or you could just do something like this with an array:
1401
1402     unshift(@array, pop(@array));  # the last shall be first
1403     push(@array, shift(@array));   # and vice versa
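
A small sketch shows the rotation at work.  If you only need to read
the elements in circular order, wrapping the index with the modulus
operator is another option that leaves the array untouched.

```perl
use strict;
use warnings;

my @ring = (1 .. 5);

push @ring, shift @ring;        # rotate left
my @left = @ring;               # 2 3 4 5 1

unshift @ring, pop @ring;       # rotate right, undoing the step above
my @back = @ring;               # 1 2 3 4 5

# Or leave the array alone and wrap the index with the modulus operator.
my @seen = map { $ring[ $_ % @ring ] } 3 .. 7;   # 4 5 1 2 3
```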
1404
1405 =head2 How do I shuffle an array randomly?
1406
1407 If you have Perl 5.8.0 or later installed, or if you have
1408 Scalar-List-Utils 1.03 or later installed, you can say:
1409
1410     use List::Util 'shuffle';
1411
1412         @shuffled = shuffle(@list);
1413
1414 If not, you can use a Fisher-Yates shuffle.
1415
1416     sub fisher_yates_shuffle {
1417         my $deck = shift;  # $deck is a reference to an array
1418         my $i = @$deck;
1419         while ($i--) {
1420             my $j = int rand ($i+1);
1421             @$deck[$i,$j] = @$deck[$j,$i];
1422         }
1423     }
1424
1425     # shuffle my mpeg collection
1426     #
1427     my @mpeg = <audio/*/*.mp3>;
1428     fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
1429     print @mpeg;
1430
1431 Note that the above implementation shuffles an array in place,
1432 unlike the List::Util::shuffle() which takes a list and returns
1433 a new shuffled list.
1434
1435 You've probably seen shuffling algorithms that work using splice,
1436 randomly picking another element to swap the current element with:
1437
1438     srand;
1439     @new = ();
1440     @old = 1 .. 10;  # just a demo
1441     while (@old) {
1442         push(@new, splice(@old, rand @old, 1));
1443     }
1444
1445 This is bad because splice is already O(N), and since you do it N times,
1446 you just invented a quadratic algorithm; that is, O(N**2).  This does
1447 not scale, although Perl is so efficient that you probably won't notice
1448 this until you have rather largish arrays.
1449
1450 =head2 How do I process/modify each element of an array?
1451
1452 Use C<for>/C<foreach>:
1453
1454     for (@lines) {
1455         s/foo/bar/;     # change that word
1456         y/XZ/ZX/;       # swap those letters
1457     }
1458
1459 Here's another; let's compute spherical volumes:
1460
1461     for (@volumes = @radii) {   # @volumes has changed parts
1462         $_ **= 3;
1463         $_ *= (4/3) * 3.14159;  # this will be constant folded
1464     }
1465
1466 which can also be done with map() which is made to transform
1467 one list into another:
1468
1469         @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1470
1471 If you want to do the same thing to modify the values of the
1472 hash, you can use the C<values> function.  As of Perl 5.6
1473 the values are not copied, so if you modify $orbit (in this
1474 case), you modify the value.
1475
1476     for $orbit ( values %orbits ) {
1477         ($orbit **= 3) *= (4/3) * 3.14159;
1478     }
1479
1480 Prior to perl 5.6 C<values> returned copies of the values,
1481 so older perl code often contains constructions such as
1482 C<@orbits{keys %orbits}> instead of C<values %orbits> where
1483 the hash is to be modified.
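
The aliasing is easy to confirm with a small hash.  This sketch assumes
Perl 5.6 or later, and the data is made up:

```perl
use strict;
use warnings;

my %price = ( apple => 2, pear => 3 );

# Each $p is an alias to the stored value, so the hash itself changes.
for my $p ( values %price ) {
    $p *= 10;
}

print "$_ => $price{$_}\n" for sort keys %price;   # apple => 20, pear => 30
```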
1484
1485 =head2 How do I select a random element from an array?
1486
1487 Use the rand() function (see L<perlfunc/rand>):
1488
1489     $index   = rand @array;
1490     $element = $array[$index];
1491
1492 Or, simply:

1493     my $element = $array[ rand @array ];
1494
1495 =head2 How do I permute N elements of a list?
1496
1497 Use the List::Permutor module on CPAN.  If the list is
1498 actually an array, try the Algorithm::Permute module (also
1499 on CPAN).  It's written in XS code and is very efficient.
1500
1501         use Algorithm::Permute;
1502         my @array = 'a'..'d';
1503         my $p_iterator = Algorithm::Permute->new ( \@array );
1504         while (my @perm = $p_iterator->next) {
1505            print "next permutation: (@perm)\n";
1506         }
1507
1508 For even faster execution, you could do:
1509
1510    use Algorithm::Permute;
1511    my @array = 'a'..'d';
1512    Algorithm::Permute::permute {
1513       print "next permutation: (@array)\n";
1514    } @array;
1515
1516 Here's a little program that generates all permutations of
1517 all the words on each line of input. The algorithm embodied
1518 in the permute() function is discussed in Volume 4 (still
1519 unpublished) of Knuth's I<The Art of Computer Programming>
1520 and will work on any list:
1521
1522         #!/usr/bin/perl -n
1523         # Fischer-Krause ordered permutation generator
1524
1525         sub permute (&@) {
1526                 my $code = shift;
1527                 my @idx = 0..$#_;
1528                 while ( $code->(@_[@idx]) ) {
1529                         my $p = $#idx;
1530                         --$p while $idx[$p-1] > $idx[$p];
1531                         my $q = $p or return;
1532                         push @idx, reverse splice @idx, $p;
1533                         ++$q while $idx[$p-1] > $idx[$q];
1534                         @idx[$p-1,$q]=@idx[$q,$p-1];
1535                 }
1536         }
1537
1538         permute {print"@_\n"} split;
1539
1540 =head2 How do I sort an array by (anything)?
1541
1542 Supply a comparison function to sort() (described in L<perlfunc/sort>):
1543
1544     @list = sort { $a <=> $b } @list;
1545
1546 The default sort function is cmp, string comparison, which would
1547 sort C<(1, 2, 10)> into C<(1, 10, 2)>.  C<< <=> >>, used above, is
1548 the numerical comparison operator.
1549
1550 If you have a complicated function needed to pull out the part you
1551 want to sort on, then don't do it inside the sort function.  Pull it
1552 out first, because the sort BLOCK can be called many times for the
1553 same element.  Here's an example of how to pull out the first word
1554 after the first number on each item, and then sort those words
1555 case-insensitively.
1556
1557     @idx = ();
1558     for (@data) {
1559         ($item) = /\d+\s*(\S+)/;
1560         push @idx, uc($item);
1561     }
1562     @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1563
1564 which could also be written this way, using a trick
1565 that's come to be known as the Schwartzian Transform:
1566
1567     @sorted = map  { $_->[0] }
1568               sort { $a->[1] cmp $b->[1] }
1569               map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1570
1571 If you need to sort on several fields, the following paradigm is useful.
1572
1573     @sorted = sort { field1($a) <=> field1($b) ||
1574                      field2($a) cmp field2($b) ||
1575                      field3($a) cmp field3($b)
1576                    }     @data;
1577
1578 This can be conveniently combined with precalculation of keys as given
1579 above.
1580
1581 See the F<sort> article in the "Far More Than You Ever Wanted
1582 To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
1583 more about this approach.
1584
1585 See also the question below on sorting hashes.
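
Putting the two ideas together, here is a sketch that precalculates the
keys once per item and then sorts on two fields.  The C<name:age>
record format is invented for the example.

```perl
use strict;
use warnings;

my @data = ('carol:30', 'alice:25', 'bob:30', 'dave:25');

# Precompute [ original, name, age ] once per item, sort by age then name, then unwrap.
my @sorted = map  { $_->[0] }
             sort { $a->[2] <=> $b->[2] || $a->[1] cmp $b->[1] }
             map  { [ $_, split /:/ ] } @data;

print "@sorted\n";   # alice:25 dave:25 bob:30 carol:30
```

The C<split> runs once per record rather than once per comparison,
which is the whole point of pulling the keys out first.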
1586
1587 =head2 How do I manipulate arrays of bits?
1588
1589 Use pack() and unpack(), or else vec() and the bitwise operations.
1590
1591 For example, this sets bit N of $vec for each integer N in @ints:
1592
1593     $vec = '';
1594     foreach(@ints) { vec($vec,$_,1) = 1 }
1595
1596 Here's how, given a vector in $vec, you can
1597 get those bits into your @ints array:
1598
1599     sub bitvec_to_list {
1600         my $vec = shift;
1601         my @ints;
1602         # Find null-byte density then select best algorithm
1603         if ($vec =~ tr/\0// / length $vec > 0.95) {
1604             use integer;
1605             my $i;
1606             # This method is faster with mostly null-bytes
1607             while($vec =~ /[^\0]/g ) {
1608                 $i = -9 + 8 * pos $vec;
1609                 push @ints, $i if vec($vec, ++$i, 1);
1610                 push @ints, $i if vec($vec, ++$i, 1);
1611                 push @ints, $i if vec($vec, ++$i, 1);
1612                 push @ints, $i if vec($vec, ++$i, 1);
1613                 push @ints, $i if vec($vec, ++$i, 1);
1614                 push @ints, $i if vec($vec, ++$i, 1);
1615                 push @ints, $i if vec($vec, ++$i, 1);
1616                 push @ints, $i if vec($vec, ++$i, 1);
1617             }
1618         } else {
1619             # This method is a fast general algorithm
1620             use integer;
1621             my $bits = unpack "b*", $vec;
1622             push @ints, 0 if $bits =~ s/^(\d)// && $1;
1623             push @ints, pos $bits while($bits =~ /1/g);
1624         }
1625         return \@ints;
1626     }
1627
1628 This method gets faster the more sparse the bit vector is.
1629 (Courtesy of Tim Bunce and Winfried Koenig.)
1630
1631 You can make the while loop a lot shorter with this suggestion
1632 from Benjamin Goldberg:
1633
1634         while($vec =~ /[^\0]+/g ) {
1635            push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1636         }
1637
1638 Or use the CPAN module Bit::Vector:
1639
1640     $vector = Bit::Vector->new($num_of_bits);
1641     $vector->Index_List_Store(@ints);
1642     @ints = $vector->Index_List_Read();
1643
1644 Bit::Vector provides efficient methods for bit vectors, sets of small
1645 integers, and "big int" math.
1646
1647 Here's a more extensive illustration using vec():
1648
1649     # vec demo
1650     $vector = "\xff\x0f\xef\xfe";
1651     print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1652         unpack("N", $vector), "\n";
1653     $is_set = vec($vector, 23, 1);
1654     print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1655     pvec($vector);
1656
1657     set_vec(1,1,1);
1658     set_vec(3,1,1);
1659     set_vec(23,1,1);
1660
1661     set_vec(3,1,3);
1662     set_vec(3,2,3);
1663     set_vec(3,4,3);
1664     set_vec(3,4,7);
1665     set_vec(3,8,3);
1666     set_vec(3,8,7);
1667
1668     set_vec(0,32,17);
1669     set_vec(1,32,17);
1670
1671     sub set_vec {
1672         my ($offset, $width, $value) = @_;
1673         my $vector = '';
1674         vec($vector, $offset, $width) = $value;
1675         print "offset=$offset width=$width value=$value\n";
1676         pvec($vector);
1677     }
1678
1679     sub pvec {
1680         my $vector = shift;
1681         my $bits = unpack("b*", $vector);
1682         my $i = 0;
1683         my $BASE = 8;
1684
1685         print "vector length in bytes: ", length($vector), "\n";
1686         @bytes = unpack("A8" x length($vector), $bits);
1687         print "bits are: @bytes\n\n";
1688     }
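
For small vectors, a simpler way to get the set bits back is to test
every position in turn.  This sketch is not one of the optimized
methods above, just a readable round trip with made-up integers:

```perl
use strict;
use warnings;

my @ints = (1, 5, 12);

my $vec = '';
vec($vec, $_, 1) = 1 for @ints;            # set one bit per integer

# Test every bit position in turn; simple, but linear in the vector length.
my @back = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;

print "bytes: ", length($vec), "  bits set: @back\n";   # bytes: 2  bits set: 1 5 12
```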
1689
1690 =head2 Why does defined() return true on empty arrays and hashes?
1691
1692 The short story is that you should probably only use defined on scalars or
1693 functions, not on aggregates (arrays and hashes).  See L<perlfunc/defined>
1694 in the 5.004 release or later of Perl for more detail.
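
What you usually want to know is whether the aggregate is empty, and
for that you can simply use it in boolean or scalar context.  A minimal
sketch, with made-up variable names:

```perl
use strict;
use warnings;

my (@array, %hash);

# An aggregate in boolean (scalar) context is false when empty.
my $array_empty = @array ? 0 : 1;
my $hash_empty  = %hash  ? 0 : 1;

push @array, 'x';
$hash{key} = 'value';
my $array_size = @array;            # 1
my $hash_size  = keys %hash;        # 1
```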
1695
1696 =head1 Data: Hashes (Associative Arrays)
1697
1698 =head2 How do I process an entire hash?
1699
1700 Use the each() function (see L<perlfunc/each>) if you don't care
1701 whether it's sorted:
1702
1703     while ( ($key, $value) = each %hash) {
1704         print "$key = $value\n";
1705     }
1706
1707 If you want it sorted, you'll have to use foreach() on the result of
1708 sorting the keys as shown in an earlier question.
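
As a small sketch of the sorted variant (the sample data is made up
for illustration):

    my %hash = (pear => 3, apple => 1, fig => 2);

    # Sort the keys, then look each value up in the hash.
    foreach my $key (sort keys %hash) {
        print "$key = $hash{$key}\n";
    }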
1709
1710 =head2 What happens if I add or remove keys from a hash while iterating over it?
1711
1712 (contributed by brian d foy)
1713
1714 The easy answer is "Don't do that!"
1715
1716 If you iterate through the hash with each(), you can delete the key
1717 most recently returned without worrying about it.  If you delete or add
1718 other keys, the iterator may skip or double up on them since perl
1719 may rearrange the hash table.  See the
1720 entry for C<each()> in L<perlfunc>.
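
If you do need to add or remove other keys, one way to stay out of
trouble is to take a snapshot of the keys first and loop over that list
rather than over the live iterator.  A sketch with made-up data and a
made-up C<_seen> suffix:

    my %hash = (a => 1, b => 2, c => 3, d => 4);

    # keys() builds the whole key list before the loop starts, so
    # changing %hash inside the loop cannot confuse the iteration.
    foreach my $key (keys %hash) {
        delete $hash{$key} if $hash{$key} % 2;            # drop odd values
        $hash{"${key}_seen"} = 1 if exists $hash{$key};   # add new keys
    }

The keys added inside the loop are not visited, because they were not
part of the snapshot.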
1721
1722 =head2 How do I look up a hash element by value?
1723
1724 Create a reverse hash:
1725
1726     %by_value = reverse %by_key;
1727     $key = $by_value{$value};
1728
That's not particularly efficient, since C<reverse> has to build a
temporary list of every key and value.  It would be more space-efficient
to use:
1731
1732     while (($key, $value) = each %by_key) {
1733         $by_value{$value} = $key;
1734     }
1735
1736 If your hash could have repeated values, the methods above will only find
1737 one of the associated keys.   This may or may not worry you.  If it does
1738 worry you, you can always reverse the hash into a hash of arrays instead:
1739
1740      while (($key, $value) = each %by_key) {
1741          push @{$key_list_by_value{$value}}, $key;
1742      }
1743
1744 =head2 How can I know how many entries are in a hash?
1745
1746 If you mean how many keys, then all you have to do is
1747 use the keys() function in a scalar context:
1748
1749     $num_keys = keys %hash;
1750
1751 The keys() function also resets the iterator, which means that you may
1752 see strange results if you use this between uses of other hash operators
1753 such as each().
1754
1755 =head2 How do I sort a hash (optionally by value instead of key)?
1756
1757 Internally, hashes are stored in a way that prevents you from imposing
1758 an order on key-value pairs.  Instead, you have to sort a list of the
1759 keys or values:
1760
1761     @keys = sort keys %hash;    # sorted by key
1762     @keys = sort {
1763                     $hash{$a} cmp $hash{$b}
1764             } keys %hash;       # and by value
1765
Here we'll do a reverse numeric sort by value, and if two values are
equal, sort by length of key (longest first), or if that fails, by
straight ASCII comparison of the keys (well, possibly modified by your
locale--see L<perllocale>).
1770
1771     @keys = sort {
1772                 $hash{$b} <=> $hash{$a}
1773                           ||
1774                 length($b) <=> length($a)
1775                           ||
1776                       $a cmp $b
1777     } keys %hash;
1778
1779 =head2 How can I always keep my hash sorted?
1780
1781 You can look into using the DB_File module and tie() using the
1782 $DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
1783 The Tie::IxHash module from CPAN might also be instructive.
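
A minimal sketch of the DB_File approach, assuming DB_File was built
with your perl (it needs the Berkeley DB library).  The hash name and
sample keys are made up for illustration:

    use DB_File;
    use Fcntl;    # for O_RDWR and O_CREAT

    # An undef filename asks DB_File for an in-memory database.
    tie my %sorted, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_BTREE
        or die "Cannot tie in-memory BTREE: $!";

    $sorted{$_} = 1 for qw(pear apple fig);

    # The BTREE keeps its keys in lexical order by default, so
    # keys() returns them already sorted.
    my @order = keys %sorted;
    print "@order\n";

    untie %sorted;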
1784
1785 =head2 What's the difference between "delete" and "undef" with hashes?
1786
1787 Hashes contain pairs of scalars: the first is the key, the
1788 second is the value.  The key will be coerced to a string,
1789 although the value can be any kind of scalar: string,
1790 number, or reference.  If a key $key is present in
1791 %hash, C<exists($hash{$key})> will return true.  The value
1792 for a given key can be C<undef>, in which case
1793 C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
1794 will return true.  This corresponds to (C<$key>, C<undef>)
1795 being in the hash.
1796
1797 Pictures help...  here's the %hash table:
1798
1799           keys  values
1800         +------+------+
1801         |  a   |  3   |
1802         |  x   |  7   |
1803         |  d   |  0   |
1804         |  e   |  2   |
1805         +------+------+
1806
1807 And these conditions hold
1808
1809         $hash{'a'}                       is true
1810         $hash{'d'}                       is false
1811         defined $hash{'d'}               is true
1812         defined $hash{'a'}               is true
1813         exists $hash{'a'}                is true (Perl5 only)
1814         grep ($_ eq 'a', keys %hash)     is true
1815
1816 If you now say
1817
1818         undef $hash{'a'}
1819
1820 your table now reads:
1821
1822
1823           keys  values
1824         +------+------+
1825         |  a   | undef|
1826         |  x   |  7   |
1827         |  d   |  0   |
1828         |  e   |  2   |
1829         +------+------+
1830
1831 and these conditions now hold; changes in caps:
1832
1833         $hash{'a'}                       is FALSE
1834         $hash{'d'}                       is false
1835         defined $hash{'d'}               is true
1836         defined $hash{'a'}               is FALSE
1837         exists $hash{'a'}                is true (Perl5 only)
1838         grep ($_ eq 'a', keys %hash)     is true
1839
1840 Notice the last two: you have an undef value, but a defined key!
1841
1842 Now, consider this:
1843
1844         delete $hash{'a'}
1845
1846 your table now reads:
1847
1848           keys  values
1849         +------+------+
1850         |  x   |  7   |
1851         |  d   |  0   |
1852         |  e   |  2   |
1853         +------+------+
1854
1855 and these conditions now hold; changes in caps:
1856
1857         $hash{'a'}                       is false
1858         $hash{'d'}                       is false
1859         defined $hash{'d'}               is true
1860         defined $hash{'a'}               is false
1861         exists $hash{'a'}                is FALSE (Perl5 only)
1862         grep ($_ eq 'a', keys %hash)     is FALSE
1863
1864 See, the whole entry is gone!
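
The same sequence as a short script, using the table above:

    my %hash = (a => 3, x => 7, d => 0, e => 2);

    undef $hash{a};     # the key stays, the value becomes undef
    print "after undef:  exists=", (exists  $hash{a} ? 1 : 0),
          " defined=",             (defined $hash{a} ? 1 : 0), "\n";

    delete $hash{a};    # the key and the value both go away
    print "after delete: exists=", (exists  $hash{a} ? 1 : 0),
          " defined=",             (defined $hash{a} ? 1 : 0), "\n";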
1865
1866 =head2 Why don't my tied hashes make the defined/exists distinction?
1867
1868 This depends on the tied hash's implementation of EXISTS().
1869 For example, there isn't the concept of undef with hashes
1870 that are tied to DBM* files. It also means that exists() and
1871 defined() do the same thing with a DBM* file, and what they
1872 end up doing is not what they do with ordinary hashes.
1873
1874 =head2 How do I reset an each() operation part-way through?
1875
1876 Using C<keys %hash> in scalar context returns the number of keys in
1877 the hash I<and> resets the iterator associated with the hash.  You may
1878 need to do this if you use C<last> to exit a loop early so that when you
1879 re-enter it, the hash iterator has been reset.
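
A small sketch of that situation, with made-up data:

    my %hash = (a => 1, b => 2, c => 3);

    # Leave the loop early; the iterator stays where it stopped.
    while (my ($key, $value) = each %hash) {
        last;
    }

    # Scalar context: counts the keys and resets the iterator.
    my $num_keys = keys %hash;

    # This loop starts from the beginning again and sees every pair.
    my $count = 0;
    while (my ($key, $value) = each %hash) {
        $count++;
    }
    print "visited $count of $num_keys pairs\n";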
1880
1881 =head2 How can I get the unique keys from two hashes?
1882
1883 First you extract the keys from the hashes into lists, then solve
1884 the "removing duplicates" problem described above.  For example:
1885
1886     %seen = ();
1887     for $element (keys(%foo), keys(%bar)) {
1888         $seen{$element}++;
1889     }
1890     @uniq = keys %seen;
1891
1892 Or more succinctly:
1893
1894     @uniq = keys %{{%foo,%bar}};
1895
1896 Or if you really want to save space:
1897
1898     %seen = ();
1899     while (defined ($key = each %foo)) {
1900         $seen{$key}++;
1901     }
1902     while (defined ($key = each %bar)) {
1903         $seen{$key}++;
1904     }
1905     @uniq = keys %seen;
1906
1907 =head2 How can I store a multidimensional array in a DBM file?
1908
Either stringify the structure yourself (no fun), or else
get the MLDBM module (which uses Data::Dumper) from CPAN and layer
it on top of either DB_File or GDBM_File.
1912
1913 =head2 How can I make my hash remember the order I put elements into it?
1914
Use the Tie::IxHash module from CPAN.
1916
1917     use Tie::IxHash;
1918     tie my %myhash, 'Tie::IxHash';
1919     for (my $i=0; $i<20; $i++) {
1920         $myhash{$i} = 2*$i;
1921     }
1922     my @keys = keys %myhash;
1923     # @keys = (0,1,2,3,...)
1924
1925 =head2 Why does passing a subroutine an undefined element in a hash create it?
1926
1927 If you say something like:
1928
1929     somefunc($hash{"nonesuch key here"});
1930
1931 Then that element "autovivifies"; that is, it springs into existence
1932 whether you store something there or not.  That's because functions
1933 get scalars passed in by reference.  If somefunc() modifies C<$_[0]>,
1934 it has to be ready to write it back into the caller's version.
1935
This has been fixed as of Perl 5.004: the element is now created only
if the function actually assigns to C<$_[0]>.
1937
Normally, merely accessing a key's value for a nonexistent key does
I<not> create that key.  This is different from awk's behavior.
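
On a perl recent enough to include that fix, the difference can be
observed directly.  In this sketch the subroutine names and hash keys
are made up for illustration:

    my %hash;

    sub read_only { my $copy = $_[0]; return }   # only reads its argument
    sub writer    { $_[0] = 'set';    return }   # assigns through the alias

    my $value = $hash{missing};     # plain read
    read_only($hash{passed});       # passed, but never assigned to
    writer($hash{written});         # assigned through $_[0]

    # Only the key that was actually assigned to exists now.
    print join(", ", sort keys %hash), "\n";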
1941
1942 =head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
1943
1944 Usually a hash ref, perhaps like this:
1945
1946     $record = {
1947         NAME   => "Jason",
1948         EMPNO  => 132,
1949         TITLE  => "deputy peon",
1950         AGE    => 23,
1951         SALARY => 37_000,
1952         PALS   => [ "Norbert", "Rhys", "Phineas"],
1953     };
1954
References are documented in L<perlref> and L<perlreftut>.
1956 Examples of complex data structures are given in L<perldsc> and
1957 L<perllol>.  Examples of structures and object-oriented classes are
1958 in L<perltoot>.
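
As a brief sketch, here is how the fields of such a record might be
read and updated.  The record is a cut-down copy of the one above, and
the extra name pushed at the end is made up:

    my $record = {
        NAME => "Jason",
        AGE  => 23,
        PALS => [ "Norbert", "Rhys", "Phineas" ],
    };

    print "Name: $record->{NAME}\n";            # scalar field
    $record->{AGE}++;                           # update in place
    print "First pal: $record->{PALS}[0]\n";    # element of the nested array
    print "Pals: @{ $record->{PALS} }\n";       # the whole nested array
    push @{ $record->{PALS} }, "Wilma";         # extend the nested array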
1959
1960 =head2 How can I use a reference as a hash key?
1961
You can't do this directly, because hash keys are always converted to
strings, but you could use the standard Tie::RefHash module distributed
with Perl.
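
A minimal sketch of Tie::RefHash in use; the hash name and data are
made up for illustration:

    use Tie::RefHash;

    tie my %color_of, 'Tie::RefHash';

    my $point = [ 3, 4 ];           # an array ref we want to use as a key
    $color_of{$point} = 'red';

    for my $key (keys %color_of) {
        # With an ordinary hash, $key would be a plain string such as
        # "ARRAY(0x...)"; here it is still a usable reference.
        print "(@$key) is $color_of{$key}\n";
    }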
1964
1965 =head1 Data: Misc
1966
1967 =head2 How do I handle binary data correctly?
1968
1969 Perl is binary clean, so this shouldn't be a problem.  For example,
1970 this works fine (assuming the files are found):
1971
1972     if (`cat /vmunix` =~ /gzip/) {
1973         print "Your kernel is GNU-zip enabled!\n";
1974     }
1975
1976 On less elegant (read: Byzantine) systems, however, you have
1977 to play tedious games with "text" versus "binary" files.  See
1978 L<perlfunc/"binmode"> or L<perlopentut>.
1979
1980 If you're concerned about 8-bit ASCII data, then see L<perllocale>.
1981
1982 If you want to deal with multibyte characters, however, there are
1983 some gotchas.  See the section on Regular Expressions.
1984
1985 =head2 How do I determine whether a scalar is a number/whole/integer/float?
1986
1987 Assuming that you don't care about IEEE notations like "NaN" or
1988 "Infinity", you probably just want to use a regular expression.
1989
1990    if (/\D/)            { print "has nondigits\n" }
1991    if (/^\d+$/)         { print "is a whole number\n" }
1992    if (/^-?\d+$/)       { print "is an integer\n" }
1993    if (/^[+-]?\d+$/)    { print "is a +/- integer\n" }
1994    if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
1995    if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
1996    if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
1997                         { print "a C float\n" }
1998
1999 There are also some commonly used modules for the task.
2000 L<Scalar::Util> (distributed with 5.8) provides access to perl's
2001 internal function C<looks_like_number> for determining
2002 whether a variable looks like a number.  L<Data::Types>
2003 exports functions that validate data types using both the
2004 above and other regular expressions. Thirdly, there is
2005 C<Regexp::Common> which has regular expressions to match
2006 various types of numbers. Those three modules are available
2007 from the CPAN.
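
For instance, a short sketch using C<looks_like_number> on a few
made-up sample strings.  It applies perl's own idea of a number, which
is not identical to any one of the patterns above:

    use Scalar::Util qw(looks_like_number);

    for my $candidate ('42', '-3.14', '1e5', 'abc', '12abc', '') {
        printf "%-8s %s\n", "'$candidate'",
            looks_like_number($candidate) ? "looks like a number"
                                          : "does not look like a number";
    }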
2008
2009 If you're on a POSIX system, Perl supports the C<POSIX::strtod>
2010 function.  Its semantics are somewhat cumbersome, so here's a C<getnum>
2011 wrapper function for more convenient access.  This function takes
2012 a string and returns the number it found, or C<undef> for input that
2013 isn't a C float.  The C<is_numeric> function is a front end to C<getnum>
if you just want to say, "Is this a float?"
2015
2016     sub getnum {
2017         use POSIX qw(strtod);
2018         my $str = shift;
2019         $str =~ s/^\s+//;
2020         $str =~ s/\s+$//;
2021         $! = 0;
2022         my($num, $unparsed) = strtod($str);
2023         if (($str eq '') || ($unparsed != 0) || $!) {
2024             return undef;
2025         } else {
2026             return $num;
2027         }
2028     }
2029
2030     sub is_numeric { defined getnum($_[0]) }
2031
Or you could check out the L<String::Scanf> module on the CPAN
instead. The POSIX module (part of the standard Perl distribution) provides
the C<strtod> and C<strtol> functions for converting strings to doubles and
longs, respectively.
2036
2037 =head2 How do I keep persistent data across program calls?
2038
2039 For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>.  More generally, you should consult the FreezeThaw
or Storable modules from CPAN.  Starting with Perl 5.8, Storable is part
2042 of the standard distribution.  Here's one example using Storable's C<store>
2043 and C<retrieve> functions:
2044
2045     use Storable;
2046     store(\%hash, "filename");
2047
2048     # later on...
2049     $href = retrieve("filename");        # by ref
2050     %hash = %{ retrieve("filename") };   # direct to hash
2051
2052 =head2 How do I print out or copy a recursive data structure?
2053
The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures.  The Storable module on CPAN (or the
5.8 release of Perl) provides a function called C<dclone> that recursively
copies its argument.
2058
2059     use Storable qw(dclone);
2060     $r2 = dclone($r1);
2061
Here C<$r1> can be a reference to any kind of data structure you'd like.
It will be deeply copied.  Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.
2066
2067     %newhash = %{ dclone(\%oldhash) };
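
To see why a deep copy matters, compare it with a plain assignment,
which copies only the top level.  A sketch with made-up data:

    use Storable qw(dclone);

    my %oldhash = (fruits => [ 'apple', 'pear' ]);

    my %shallow = %oldhash;                 # copies only the top level
    my %deep    = %{ dclone(\%oldhash) };   # copies the nested array too

    push @{ $oldhash{fruits} }, 'fig';

    print "shallow copy sees: @{ $shallow{fruits} }\n";  # shares the inner array
    print "deep copy sees:    @{ $deep{fruits} }\n";     # has its own array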
2068
2069 =head2 How do I define methods for every class/object?
2070
2071 Use the UNIVERSAL class (see L<UNIVERSAL>).
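
A minimal sketch: the method name C<describe> and the C<Counter> class
are hypothetical, chosen only for illustration.  Anything defined in
package UNIVERSAL is inherited by every class, so use this sparingly.

    sub UNIVERSAL::describe {
        my $self = shift;
        return "I am a " . (ref($self) || $self);
    }

    package Counter;
    sub new { return bless {}, shift }

    package main;
    print Counter->new->describe, "\n";   # called on an object
    print Counter->describe, "\n";        # called on a class name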
2072
2073 =head2 How do I verify a credit card checksum?
2074
2075 Get the Business::CreditCard module from CPAN.
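
Card numbers carry a check digit computed with the Luhn algorithm.  The
following is a hand-rolled sketch of that check, not the code used by
Business::CreditCard; the function name C<luhn_ok> is made up, and the
number shown is a commonly used test number, not a real account.

    # Returns true if the digit string passes the Luhn check.
    sub luhn_ok {
        my $number = shift;
        $number =~ s/[\s-]//g;                  # allow spaces and dashes
        return 0 unless $number =~ /^\d+$/;

        my ($sum, $double) = (0, 0);
        for my $char (reverse split //, $number) {
            my $digit = $char;
            $digit *= 2 if $double;             # double every second digit
            $digit -= 9 if $digit > 9;          # same as adding its two digits
            $sum   += $digit;
            $double = !$double;
        }
        return $sum % 10 == 0;
    }

    print luhn_ok("4111 1111 1111 1111") ? "valid\n" : "invalid\n";

Passing this check only means the number is well formed; it says
nothing about whether the account exists.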
2076
2077 =head2 How do I pack arrays of doubles or floats for XS code?
2078
2079 The kgbpack.c code in the PGPLOT module on CPAN does just this.
2080 If you're doing a lot of float or double processing, consider using
2081 the PDL module from CPAN instead--it makes number-crunching easy.
2082
2083 =head1 AUTHOR AND COPYRIGHT
2084
2085 Copyright (c) 1997-2005 Tom Christiansen, Nathan Torkington, and
2086 other authors as noted. All rights reserved.
2087
2088 This documentation is free; you can redistribute it and/or modify it
2089 under the same terms as Perl itself.
2090
2091 Irrespective of its distribution, all code examples in this file
2092 are hereby placed into the public domain.  You are permitted and
2093 encouraged to use this code in your own programs for fun
2094 or for profit as you see fit.  A simple comment in the code giving
2095 credit would be courteous but is not required.