=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 1.19 $, $Date: 1997/04/24 22:43:57 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to the manipulation
of data as numbers, dates, strings, arrays, hashes, and miscellaneous
data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

Internally, your computer represents floating-point numbers in binary.
Floating-point numbers read in from a file, or appearing as literals
in your program, are converted from their decimal floating-point
representation (eg, 19.95) to the internal binary representation.

However, 19.95 can't be precisely represented as a binary
floating-point number, just as 1/3 can't be exactly represented as a
decimal floating-point number.  The computer's binary representation
of 19.95, therefore, isn't exactly 19.95.

When a floating-point number gets printed, the binary floating-point
representation is converted back to decimal.  These decimal numbers
are displayed in either the format you specify with printf(), or the
current output format for numbers (see L<perlvar/"$#">) if you use
print.  C<$#> has a different default value in Perl5 than it did in
Perl4.  Changing C<$#> yourself is deprecated.

This affects B<all> computer languages that represent decimal
floating-point numbers in binary, not just Perl.  Perl provides
arbitrary-precision decimal numbers with the Math::BigFloat module
(part of the standard Perl distribution), but mathematical operations
are consequently slower.

To get rid of the superfluous digits, just use a format (eg,
C<printf("%.2f", 19.95)>) to get the required precision.
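
As a small illustration of the point above (a sketch only; the variable
names are made up for this example), formatting the same value with many
digits exposes the binary approximation, while a two-digit format hides
it again:

```perl
use strict;
use warnings;

my $price = 19.95;

# Asking for far more digits than you typed exposes the binary
# approximation (the exact digits depend on your platform).
my $raw = sprintf("%.20f", $price);

# Asking for two decimal places rounds the approximation back to
# the value you meant.
my $shown = sprintf("%.2f", $price);

print "stored as roughly $raw\n";
print "displayed as $shown\n";
```

sprintf() takes the same formats as printf() but returns the string
instead of printing it, which is handy when you need the rounded text
for something other than output.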

=head2 Why isn't my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur
as literals in your program.  If they are read in from somewhere and
assigned, no automatic conversion takes place.  You must explicitly
use oct() or hex() if you want the values converted.  oct() interprets
both hex ("0x350") numbers and octal ones ("0350" or even without the
leading "0", like "377"), while hex() only converts hexadecimal ones,
with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".

This problem shows up most often when people try using chmod(), mkdir(),
umask(), or sysopen(), which all want permissions in octal.

    chmod(644,  $file); # WRONG -- perl -w catches this
    chmod(0644, $file); # right
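
Here is a short sketch of the explicit conversions described above.
The strings stand in for values read from a file or typed by a user;
the variable names are invented for the example:

```perl
use strict;
use warnings;

# Strings such as these might come from a file or from user input.
my $perm_string = "0644";
my $hex_string  = "0x350";

my $perm     = oct($perm_string);   # octal digits
my $from_hex = oct($hex_string);    # oct() honors a leading "0x"
my $bare_oct = oct("377");          # no leading 0 needed
my $bare_hex = hex("ff");           # no leading "0x" needed

printf "%s is %d in decimal and %o in octal\n",
    $perm_string, $perm, $perm;
```

Once converted, C<$perm> is an ordinary number and can be handed
straight to chmod().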

=head2 Does perl have a round function?  What about ceil() and floor()?  Trig functions?

For rounding to a certain number of digits, sprintf() or printf() is
usually the easiest route.

The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
functions.

In 5.000 to 5.003 Perls, trigonometry was done in the Math::Complex
module.  With 5.004, the Math::Trig module (part of the standard perl
distribution) implements the trigonometric functions. Internally it
uses the Math::Complex module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely.  In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.
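
The following sketch shows the three approaches side by side.  The
round_half_away() function is a made-up name and only one possible
rounding rule (halves go away from zero); it is here to show the shape
of a hand-written rounder, not to recommend a rule for your
application:

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

my $value = 3.14159;

my $two_places = sprintf("%.2f", $value);   # round to two decimal places
my $up         = ceil($value);              # smallest integer >= $value
my $down       = floor($value);             # largest integer <= $value
my $neg_down   = floor(-$value);            # floor moves toward -infinity

# One possible hand-written rule: round halves away from zero.
# This is only an illustration; pick the rule your application requires.
sub round_half_away {
    my ($n) = @_;
    return $n < 0 ? -int(-$n + 0.5) : int($n + 0.5);
}

print "$two_places $up $down $neg_down ", round_half_away(2.5), "\n";
```

Keep in mind that a rounder built on floating-point numbers inherits
their inexact representation, which is one more reason to state the
rule precisely and test it on your own boundary cases.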

=head2 How do I convert bits into ints?

To turn a string of 1s and 0s like '10110110' into a scalar containing
its numeric value, use the pack() function (documented in
L<perlfunc/"pack">) to build the corresponding byte, then ord() to
read that byte as a number.  pack() on its own returns a one-character
string, not a number:

    $decimal = ord(pack('B8', '10110110'));

Here's an example of going the other way:

    $binary_string = join('', unpack('B*', "\x29"));
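
For bit strings longer than eight digits, one common approach is to
pad the string to 32 digits and go through a 32-bit integer.  The
function names bin2dec() and dec2bin() below are just labels chosen
for this sketch, which assumes values that fit in 32 bits:

```perl
use strict;
use warnings;

# Convert a string of up to 32 binary digits to a number.
sub bin2dec {
    my ($bits) = @_;
    # Left-pad with zeros to exactly 32 digits, pack them into four
    # bytes, then read those bytes back as an unsigned 32-bit integer.
    return unpack("N", pack("B32", substr("0" x 32 . $bits, -32)));
}

# Convert a non-negative number below 2**32 to binary digits.
sub dec2bin {
    my ($number) = @_;
    my $bits = unpack("B32", pack("N", $number));
    $bits =~ s/^0+(?=\d)//;    # drop leading zeros, keep at least one digit
    return $bits;
}

print bin2dec("10110110"), "\n";
print dec2bin(41), "\n";
```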

=head2 How do I multiply matrices?

Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
or the PDL extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        &my_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { &my_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range.  This can take a lot of memory for large
ranges.  Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, &my_func($i));
    }

=head2 How can I output Roman numerals?

Get the Roman module from http://www.perl.com/CPAN/modules/by-module/Roman.

=head2 Why aren't my random numbers random?

The short explanation is that you're getting pseudorandom numbers, not
random ones, because that's how these things work.  A longer
explanation is available on
http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy of Tom
Phoenix.

You should also check out the Math::TrulyRandom module from CPAN.

=head1 Data: Dates

=head2 How do I find the week-of-the-year/day-of-the-year?

The day of the year is in the array returned by localtime() (see
L<perlfunc/"localtime">):

    $day_of_year = (localtime(time()))[7];

or more legibly (in 5.004 or higher):

    use Time::localtime;
    $day_of_year = localtime(time())->yday;

You can find the week of the year by dividing this by 7:

    $week_of_year = int($day_of_year / 7);

Of course, this numbers the weeks from zero, and it treats every week
as starting on the same weekday as January 1st.
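
If you would rather have calendar weeks, the strftime() function from
the standard POSIX module is one alternative.  This sketch uses a fixed
timestamp and gmtime() so that it prints the same thing everywhere;
with real data you would probably pass localtime(time()) instead:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# A fixed timestamp so the result is repeatable:
# Tue Nov 13 01:00:00 2001 GMT.
my @when = gmtime(1005613200);

my $day_of_year  = strftime("%j", @when);   # 001 .. 366, counted from one
my $week_of_year = strftime("%U", @when);   # 00 .. 53, weeks begin on Sunday

print "day $day_of_year, week $week_of_year\n";
```

Note that C<%j> counts days from 1, whereas the yday field returned by
localtime() counts from 0.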

=head2 How can I compare two date strings?

Use the Date::Manip or Date::DateCalc modules from CPAN.

=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to timelocal in the standard
Time::Local module.  Otherwise, you should look into one of the
Date modules from CPAN.
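
Here is a minimal sketch of the split-and-convert approach.  The input
format is an assumption made for the example, and it uses timegm()
(the GMT sibling of timelocal() in the same module) so that the answer
does not depend on your time zone:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumed input format for this sketch: "YYYY-MM-DD hh:mm:ss" in GMT.
my $stamp = "2001-11-13 01:00:00";

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "unrecognized date: $stamp";

# Months are numbered from 0, as in the list returned by gmtime().
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);

print "$stamp is $epoch seconds since the epoch\n";
```

For local times, call timelocal() with the same argument order.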

=head2 How can I find the Julian Day?

Neither Date::Manip nor Date::DateCalc deals with Julian days.
Instead, there is an example of Julian date calculation in
http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz,
which should help.

=head2 Does Perl have a year 2000 problem?

Not unless you use Perl to create one. The date and time functions
supplied with perl (gmtime and localtime) supply adequate information
to determine the year well beyond 2000 (2038 is when trouble strikes).
The year returned by these functions when used in a list context is
the year minus 1900. For years between 1910 and 1999 this I<happens>
to be a 2-digit decimal number. To avoid the year 2000 problem simply
do not treat the year as a 2-digit number.  It isn't.

When gmtime() and localtime() are used in a scalar context they return
a timestamp string that contains a fully-expanded year.  For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001".  There's no year 2000 problem here.

=head1 Data: Strings

=head2 How do I validate input?

The answer to this question is usually a regular expression, perhaps
with auxiliary logic.  See the more specific questions (numbers, email
addresses, etc.) for details.

=head2 How do I unescape a string?

It depends just what you mean by "escape".  URL escapes are dealt with
in L<perlfaq9>.  Shell escapes with the backslash (\)
character are removed with:

    s/\\(.)/$1/g;

Note that this won't expand \n or \t or any other special escapes.

=head2 How do I remove consecutive pairs of characters?

To turn "abbcccd" into "abccd":

    s/(.)\1/$1/g;

=head2 How do I expand function calls in a string?

This is documented in L<perlref>.  In general, this is fraught with
quoting and readability problems, but it is possible.  To interpolate
a subroutine call (in a list context) into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for
arbitrary expressions:

    print "That yields ${\($n + 5)} widgets\n";

See also "How can I expand variables in text strings?" in this section
of the FAQ.

=head2 How do I find matching/nesting anything?

This isn't something that can be tackled in one regular expression, no
matter how complicated.  To find something between two single characters,
a pattern like C</x([^x]*)x/> will get the intervening bits in $1. For
delimiters longer than one character, something more like
C</alpha(.*?)omega/> would be needed.  But none of these deals with
nested patterns, nor can they.  For that you'll have to write a parser.
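
A parser for this need not be elaborate.  As a rough sketch, the
following function (first_balanced() is a name invented here) walks the
string one character at a time and keeps a depth counter, which is the
part a single pattern cannot do.  It handles only round parentheses and
knows nothing about quoting or escapes:

```perl
use strict;
use warnings;

# Return the text inside the first balanced pair of parentheses,
# or undef if the parentheses never balance.
sub first_balanced {
    my ($text) = @_;
    my $depth = 0;
    my $start;
    for my $pos (0 .. length($text) - 1) {
        my $char = substr($text, $pos, 1);
        if ($char eq '(') {
            $start = $pos + 1 if $depth == 0;   # where the contents begin
            $depth++;
        }
        elsif ($char eq ')' && $depth > 0) {
            $depth--;
            # Back at depth zero: the outermost pair has just closed.
            return substr($text, $start, $pos - $start) if $depth == 0;
        }
    }
    return undef;
}

print first_balanced("f(a, g(b, c), d) + h(e)"), "\n";
```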

=head2 How do I reverse a string?

Use reverse() in a scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

=head2 How do I expand tabs in a string?

You can do it the old-fashioned way:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use Text::Wrap (part of the standard perl distribution):

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The paragraphs you give to Text::Wrap may not contain embedded
newlines.  Text::Wrap doesn't justify the lines (flush-right).

=head2 How can I access/change the first N letters of a string?

There are many ways.  If you just want to grab a copy, use
substr:

    $first_byte = substr($a, 0, 1);

If you want to modify part of a string, the simplest way is often to
use substr() as an lvalue:

    substr($a, 0, 3) = "Tom";

Although those with a regexp kind of thought process will likely prefer

    $a =~ s/^.../Tom/;

=head2 How do I change the Nth occurrence of something?

You have to keep track.  For example, let's say you want
to change the fifth occurrence of "whoever" or "whomever"
into "whosoever" or "whomsoever", case insensitively.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5           # is it the 5th?
            ? "${2}soever"      # yes, swap
            : $1                # renege and leave it there
    }igex;

=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency.  If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> operator like so:

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character.  However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work.  What you can do is wrap a while()
loop around a global pattern match.  For example, let's count negative
integers:

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
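
If you don't need to do anything else inside the loop, the same count
can be had in one statement by evaluating the match in list context.
This is a sketch of that idiom using the same sample data:

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# The match returns every hit in list context; assigning that list to
# the empty list () and then to a scalar yields the number of hits.
my $count = () = $string =~ /-\d+/g;

print "There are $count negative numbers in the string\n";
```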

=head2 How do I capitalize all the words on one line?

To make the first letter of each word upper case:

        $line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning "C<don't do it>" into "C<Don'T
Do It>".  Sometimes you might want this, instead (Suggested by Brian
Foy E<lt>comdog@computerdog.comE<gt>):

    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;

To make the whole line upper case:

        $line = uc($line);

To force each word to be lower case, with the first letter upper case:

        $line =~ s/(\w+)/\u\L$1/g;

=head2 How can I split a [character] delimited string except when inside [character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated
into its different fields.  (We'll pretend you said comma-separated, not
comma-delimited, which is different and almost never what you mean.) You
can't use C<split(/,/)> because you shouldn't split if the comma is inside
quotes.  For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem.  Thankfully, we have Jeffrey Friedl, author of a highly
recommended book on regular expressions, to handle these for us.  He
suggests (assuming your string is contained in $text):

     @new = ();
     push(@new, $+) while $text =~ m{
         "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
       | ([^,]+),?
       | ,
     }gx;
     push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).  Unescaping them is a task addressed earlier in
this section.

Alternatively, the Text::ParseWords module (part of the standard perl
distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);

=head2 How do I strip blank space from the beginning/end of a string?

The simplest approach, albeit not the fastest, is probably like this:

    $string =~ s/^\s*(.*?)\s*$/$1/;

It would be faster to do this in two steps:

    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

Or more nicely written as:

    for ($string) {
        s/^\s+//;
        s/\s+$//;
    }

=head2 How do I extract selected columns from a string?

Use substr() or unpack(), both documented in L<perlfunc>.

=head2 How do I find the soundex value of a string?

Use the standard Text::Soundex module distributed with perl.

=head2 How can I expand variables in text strings?

Let's assume that you have a string like:

    $text = 'this has a $foo in it and a $bar';

You can expand the variables it mentions with a substitution that
uses symbolic references:

    $text =~ s/\$(\w+)/${$1}/g;

This only works for global (package) variables, not for C<my>
variables, and it is not allowed under C<use strict 'refs'>.

Before version 5 of perl, this had to be done with a double-eval
substitution:

    $text =~ s/(\$\w+)/$1/eeg;

Which is bizarre enough that you'll probably actually need an EEG
afterwards. :-)

See also "How do I expand function calls in a string?" in this section
of the FAQ.
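
If the names to expand are known in advance, an alternative that works
under C<use strict> is to keep the values in a hash and look them up
there.  This is a sketch of that idea, not the method shown above; the
hash %value_of and its contents are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical lookup table standing in for the variables to expand.
my %value_of = (foo => 'Fred', bar => 'Barney');

my $text = 'this has a $foo in it and a $bar';

# Replace each $name that has an entry in the table; leave others alone.
$text =~ s/\$(\w+)/exists $value_of{$1} ? $value_of{$1} : "\$$1"/ge;

print "$text\n";
```

Because only names present in the hash are replaced, text supplied by
someone else cannot reach arbitrary variables in your program.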

=head2 What's wrong with always quoting "$vars"?

The problem is that those double-quotes force stringification,
coercing numbers and references into strings, even when you
don't want them to be.

If you get used to writing odd things like these:

    print "$var";       # BAD
    $new = "$old";      # BAD
    somefunc("$var");   # BAD

You'll be in trouble.  Those should (in 99.8% of the cases) be
the simpler and more direct:

    print $var;
    $new = $old;
    somefunc($var);

Otherwise, besides slowing you down, you're going to break code when
the thing in the scalar is actually neither a string nor a number, but
a reference:

    func(\@array);
    sub func {
        my $aref = shift;
        my $oref = "$aref";  # WRONG
    }

You can also get into subtle problems on those few operations in Perl
that actually do care about the difference between a string and a
number, such as the magical C<++> autoincrement operator or the
syscall() function.
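
To see what goes wrong with a quoted reference, here is a small sketch
(the variable names are arbitrary).  The quoted copy is plain text that
merely looks like a reference and can no longer be used to reach the
array:

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

my $aref   = \@array;    # a real reference
my $string = "$aref";    # only text such as "ARRAY(0x...)"

print "reference: ", ref($aref)   ? "yes" : "no", "\n";
print "string:    ", ref($string) ? "yes" : "no", "\n";

# The real reference still reaches the data.
my $last = $aref->[-1];
```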

=head2 Why don't my <<HERE documents work?

Check for these three things:

=over 4

=item 1. There must be no space after the << part.

=item 2. There (probably) should be a semicolon at the end.

=item 3. You can't (easily) have any space in front of the tag.

=back

=head1 Data: Arrays

=head2 What is the difference between $array[1] and @array[1]?

The former is a scalar value, the latter an array slice, which makes
it a list with one (scalar) value.  You should use $ when you want a
scalar value (most of the time) and @ when you want a list with one
scalar value in it (very, very rarely; nearly never, in fact).

Sometimes it doesn't make a difference, but sometimes it does.
For example, compare:

    $good[0] = `some program that outputs several lines`;

with

    @bad[0]  = `same program that outputs several lines`;

The first stores all of the output in one element.  The second runs
the program in list context, gets one element per line, and keeps only
the first line.  The B<-w> flag will warn you about these matters.

=head2 How can I extract just the unique elements of an array?

There are several possible ways, depending on whether the array is
ordered and whether you wish to preserve the ordering.

=over 4

=item a) If @in is sorted, and you want @out to be sorted:

    $prev = 'nonesuch';
    @out = grep($_ ne $prev && ($prev = $_, 1), @in);

This is nice in that it doesn't use much extra memory,
simulating uniq(1)'s behavior of removing only adjacent
duplicates.  The C<, 1> makes the test succeed even when the
element itself is a false value such as 0.

=item b) If you don't know whether @in is sorted:

    undef %saw;
    @out = grep(!$saw{$_}++, @in);

=item c) Like (b), but @in contains only small integers:

    @out = grep(!$saw[$_]++, @in);

=item d) A way to do (b) without any loops or greps:

    undef %saw;
    @saw{@in} = ();
    @out = sort keys %saw;  # remove sort if undesired

=item e) Like (d), but @in contains only small positive integers:

    undef @ary;
    @ary[@in] = @in;
    @out = grep { defined } @ary;   # skip the slots that were never filled

=back

=head2 How can I tell whether an array contains a certain element?

There are several ways to approach this.  If you are going to make
this query many times and the values are arbitrary strings, the
fastest way is probably to invert the original array and keep a
hash lying about whose keys are the first array's values.

    @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
    undef %is_blue;
    for (@blues) { $is_blue{$_} = 1 }

Now you can check whether $is_blue{$some_color}.  It might have been a
good idea to keep the blues all in a hash in the first place.

If the values are all small integers, you could use a simple indexed
array.  This kind of an array will take up less space:

    @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
    undef @is_tiny_prime;
    for (@primes) { $is_tiny_prime[$_] = 1; }

Now you check whether $is_tiny_prime[$some_number].

If the values in question are integers instead of strings, you can save
quite a lot of space by using bit strings instead:

    @articles = ( 1..10, 150..2000, 2017 );
    undef $read;
    for (@articles) { vec($read,$_,1) = 1 }

Now check whether C<vec($read,$n,1)> is true for some C<$n>.

Please do not use

    $is_there = grep $_ eq $whatever, @array;

or worse yet

    $is_there = grep /$whatever/, @array;

These are slow (checks every element even if the first matches),
inefficient (same reason), and potentially buggy (what if there are
regexp characters in $whatever?).

=head2 How do I compute the difference of two arrays?  How do I compute the intersection of two arrays?

Use a hash.  Here's code to do both and more.  It assumes that
each element is unique in a given array.  Note that @difference ends
up holding the elements that appear in exactly one of the two arrays
(the symmetric difference):

    @union = @intersection = @difference = ();
    %count = ();
    foreach $element (@array1, @array2) { $count{$element}++ }
    foreach $element (keys %count) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }
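
If what you want instead is a one-sided difference (the elements of the
first array that are missing from the second), a lookup hash built from
the second array is enough.  The following is a sketch with made-up
data; unlike the code above it also keeps the elements in their
original order:

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e f);

# Lookup table of everything in the second array.
my %in_second = map { $_ => 1 } @array2;

# Elements of @array1 that also occur in @array2.
my @intersection = grep { $in_second{$_} } @array1;

# Elements of @array1 that do not occur in @array2.
my @only_in_first = grep { !$in_second{$_} } @array1;

print "both: @intersection\n";
print "first only: @only_in_first\n";
```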

=head2 How do I find the first array element for which a condition is true?

You can use this if you care about the index:

    for ($i=0; $i < @array; $i++) {
        if ($array[$i] eq "Waldo") {
            $found_index = $i;
            last;
        }
    }

Now C<$found_index> has what you want.

=head2 How do I handle linked lists?

In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either end,
or you can use splice to add and/or remove an arbitrary number of
elements at arbitrary points.

If you really, really wanted, you could use structures as described in
L<perldsc> or L<perltoot> and do just what the algorithm book tells you
to do.

=head2 How do I handle circular lists?

Circular lists could be handled in the traditional fashion with linked
lists, or you could just do something like this with an array:

    unshift(@array, pop(@array));  # the last shall be first
    push(@array, shift(@array));   # and vice versa

=head2 How do I shuffle an array randomly?

Here's a shuffling algorithm which repeatedly removes a randomly
chosen element from the old list and appends it to the new one:

    srand;
    @new = ();
    @old = 1 .. 10;  # just a demo
    while (@old) {
        push(@new, splice(@old, rand @old, 1));
    }

For large arrays, this avoids a lot of the reshuffling:

    srand;
    @new = ();
    @old = 1 .. 10000;  # just a demo
    for( @old ){
        my $r = rand @new+1;
        push(@new,$new[$r]);
        $new[$r] = $_;
    }
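
Another well-known option, not shown above, is the Fisher-Yates
shuffle, which rearranges the array in place by swapping elements and
so needs no second array.  This is a sketch; the function name is
simply descriptive:

```perl
use strict;
use warnings;

# Shuffle the array referenced by $array in place.
sub fisher_yates_shuffle {
    my ($array) = @_;
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int rand($i + 1);            # random index in 0 .. $i
        @$array[$i, $j] = @$array[$j, $i];   # swap the two elements
    }
    return;
}

my @deck = (1 .. 10);
fisher_yates_shuffle(\@deck);
print "@deck\n";
```

Pass a reference, as in C<fisher_yates_shuffle(\@deck)>, so that the
function works on your array rather than on a copy.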

=head2 How do I process/modify each element of an array?

Use C<for>/C<foreach>:

    for (@lines) {
        s/foo/bar/;
        tr[a-z][A-Z];
    }

Here's another; let's compute spherical volumes:

    for (@radii) {
        $_ **= 3;
        $_ *= (4/3) * 3.14159;  # this will be constant folded
    }

=head2 How do I select a random element from an array?

Use the rand() function (see L<perlfunc/rand>):

    srand;                      # not needed for 5.004 and later
    $index   = rand @array;
    $element = $array[$index];

=head2 How do I permute N elements of a list?

Here's a little program that generates all permutations
of all the words on each line of input.  The algorithm embodied
in the permut() function should work on any list:

    #!/usr/bin/perl -n
    # permute - tchrist@perl.com
    permut([split], []);
    sub permut {
        my @head = @{ $_[0] };
        my @tail = @{ $_[1] };
        unless (@head) {
            # stop recursing when there are no elements in the head
            print "@tail\n";
        } else {
            # for all elements in @head, move one from @head to @tail
            # and call permut() on the new @head and @tail
            my(@newhead,@newtail,$i);
            foreach $i (0 .. $#head) {
                @newhead = @head;
                @newtail = @tail;
                unshift(@newtail, splice(@newhead, $i, 1));
                permut([@newhead], [@newtail]);
            }
        }
    }

=head2 How do I sort an array by (anything)?

Supply a comparison function to sort() (described in L<perlfunc/sort>):

    @list = sort { $a <=> $b } @list;

The default sort function is cmp, string comparison, which would
sort C<(1, 2, 10)> into C<(1, 10, 2)>.  C<E<lt>=E<gt>>, used above, is
the numerical comparison operator.

If you have a complicated function needed to pull out the part you
want to sort on, then don't do it inside the sort function.  Pull it
out first, because the sort BLOCK can be called many times for the
same element.  Here's an example of how to pull out the first word
after the first number on each item, and then sort those words
case-insensitively.

    @idx = ();
    for (@data) {
        ($item) = /\d+\s*(\S+)/;
        push @idx, uc($item);
    }
    @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];

Which could also be written this way, using a trick
that's come to be known as the Schwartzian Transform:

    @sorted = map  { $_->[0] }
              sort { $a->[1] cmp $b->[1] }
              map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] } @data;

If you need to sort on several fields, the following paradigm is useful.

    @sorted = sort { field1($a) <=> field1($b) ||
                     field2($a) cmp field2($b) ||
                     field3($a) cmp field3($b)
                   }     @data;

This can be conveniently combined with precalculation of keys as given
above.

See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for more about
this approach.

See also the question below on sorting hashes.
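
To make the Schwartzian Transform easier to follow, here it is again as
a complete sketch with made-up data.  Read the three stages from the
bottom up:

```perl
use strict;
use warnings;

# Made-up records: a number followed by a word.
my @data = ("3 pear", "10 Apple", "7 banana");

my @sorted =
    map  { $_->[0] }                          # 3. recover the original item
    sort { $a->[1] cmp $b->[1] }              # 2. compare the precomputed keys
    map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] } # 1. pair each item with its key
    @data;

print join(", ", @sorted), "\n";
```

Each key is computed once per item, however many comparisons the sort
ends up making.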

=head2 How do I manipulate arrays of bits?

Use pack() and unpack(), or else vec() and the bitwise operations.

For example, this sets bit N of $vec for every number N listed
in @ints:

    $vec = '';
    foreach(@ints) { vec($vec,$_,1) = 1 }

And here's how, given a vector in $vec, you can
get those bits into your @ints array:

    sub bitvec_to_list {
        my $vec = shift;
        my @ints;
        # Find null-byte density then select best algorithm
        if ($vec =~ tr/\0// / length $vec > 0.95) {
            use integer;
            my $i;
            # This method is faster with mostly null-bytes
            while($vec =~ /[^\0]/g ) {
                $i = -9 + 8 * pos $vec;
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
            }
        } else {
            # This method is a fast general algorithm
            use integer;
            my $bits = unpack "b*", $vec;
            push @ints, 0 if $bits =~ s/^(\d)// && $1;
            push @ints, pos $bits while($bits =~ /1/g);
        }
        return \@ints;
    }

This method gets faster the more sparse the bit vector is.
(Courtesy of Tim Bunce and Winfried Koenig.)

=head2 Why does defined() return true on empty arrays and hashes?

See L<perlfunc/defined> in the 5.004 release or later of Perl.

=head1 Data: Hashes (Associative Arrays)

=head2 How do I process an entire hash?

Use the each() function (see L<perlfunc/each>) if you don't care
whether it's sorted:

    while (($key,$value) = each %hash) {
        print "$key = $value\n";
    }

If you want it sorted, you'll have to use foreach() on the result of
sorting the keys as shown in an earlier question.

=head2 What happens if I add or remove keys from a hash while iterating over it?

Don't do that.

=head2 How do I look up a hash element by value?

Create a reverse hash:

    %by_value = reverse %by_key;
    $key = $by_value{$value};

That's not particularly efficient.  It would be more space-efficient
to use:

    while (($key, $value) = each %by_key) {
        $by_value{$value} = $key;
    }

If your hash could have repeated values, the methods above will only
find one of the associated keys.   This may or may not worry you.
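
If it does worry you, one way around it is to make each entry of the
reverse hash a list of keys rather than a single key.  This sketch
assumes made-up data in which two keys share a value; the hash names
are invented for the example:

```perl
use strict;
use warnings;

# Example data in which two keys share the value "red".
my %color_of = (apple => 'red', cherry => 'red', lime => 'green');

# Map each value to a list of every key that holds it.
my %fruits_by_color;
for my $fruit (sort keys %color_of) {
    push @{ $fruits_by_color{ $color_of{$fruit} } }, $fruit;
}

print "red: @{ $fruits_by_color{red} }\n";
```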

=head2 How can I know how many entries are in a hash?

If you mean how many keys, then all you have to do is
take the scalar sense of the keys() function:

    $num_keys = scalar keys %hash;

In void context, keys() just resets the iterator, which is faster
for tied hashes.

=head2 How do I sort a hash (optionally by value instead of key)?

Internally, hashes are stored in a way that prevents you from imposing
an order on key-value pairs.  Instead, you have to sort a list of the
keys or values:

    @keys = sort keys %hash;    # sorted by key
    @keys = sort {
                    $hash{$a} cmp $hash{$b}
            } keys %hash;       # and by value

Here we'll do a reverse numeric sort by value, and if two values are
identical, sort by length of key, and if that fails, by straight ASCII
comparison of the keys (well, possibly modified by your locale -- see
L<perllocale>).

    @keys = sort {
                $hash{$b} <=> $hash{$a}
                          ||
                length($b) <=> length($a)
                          ||
                      $a cmp $b
    } keys %hash;
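
Here is the same three-level comparison as a complete sketch, run on a
small made-up hash of word counts so that you can see each tie-breaker
take effect:

```perl
use strict;
use warnings;

# Made-up word counts.
my %count = (the => 12, perl => 12, a => 30, of => 12);

my @keys = sort {
    $count{$b} <=> $count{$a}         # larger counts first
        || length($b) <=> length($a)  # then longer keys first
        || $a cmp $b                  # then alphabetical
} keys %count;

print "@keys\n";
```

The three words that share the count 12 come out longest first.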
 860
 861 =head2 How can I always keep my hash sorted?
 862
 863 You can look into using the DB_File module and tie() using the
 864 $DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
 865
 866 =head2 What's the difference between "delete" and "undef" with hashes?
 867
 868 Hashes are pairs of scalars: the first is the key, the second is the
 869 value.  The key will be coerced to a string, although the value can be
 870 any kind of scalar: string, number, or reference.  If a key C<$key> is
 871 present in the array, C<exists($key)> will return true.  The value for
 872 a given key can be C<undef>, in which case C<$array{$key}> will be
 873 C<undef> while C<$exists{$key}> will return true.  This corresponds to
 874 (C<$key>, C<undef>) being in the hash.
 875
 876 Pictures help...  here's the C<%ary> table:
 877
 878           keys  values
 879         +------+------+
 880         |  a   |  3   |
 881         |  x   |  7   |
 882         |  d   |  0   |
 883         |  e   |  2   |
 884         +------+------+
 885
 886 And these conditions hold:
 887
 888         $ary{'a'}                       is true
 889         $ary{'d'}                       is false
 890         defined $ary{'d'}               is true
 891         defined $ary{'a'}               is true
 892         exists $ary{'a'}                is true (perl5 only)
 893         grep ($_ eq 'a', keys %ary)     is true
 894
 895 If you now say
 896
 897         undef $ary{'a'}
 898
 899 your table now reads:
 900
 901
 902           keys  values
 903         +------+------+
 904         |  a   | undef|
 905         |  x   |  7   |
 906         |  d   |  0   |
 907         |  e   |  2   |
 908         +------+------+
 909
 910 and these conditions now hold; changes in caps:
 911
 912         $ary{'a'}                       is FALSE
 913         $ary{'d'}                       is false
 914         defined $ary{'d'}               is true
 915         defined $ary{'a'}               is FALSE
 916         exists $ary{'a'}                is true (perl5 only)
 917         grep ($_ eq 'a', keys %ary)     is true
 918
 919 Notice the last two: you have an undef value, but a defined key!
 920
 921 Now, consider this:
 922
 923         delete $ary{'a'}
 924
 925 your table now reads:
 926
 927           keys  values
 928         +------+------+
 929         |  x   |  7   |
 930         |  d   |  0   |
 931         |  e   |  2   |
 932         +------+------+
 933
 934 and these conditions now hold; changes in caps:
 935
 936         $ary{'a'}                       is false
 937         $ary{'d'}                       is false
 938         defined $ary{'d'}               is true
 939         defined $ary{'a'}               is false
 940         exists $ary{'a'}                is FALSE (perl5 only)
 941         grep ($_ eq 'a', keys %ary)     is FALSE
 942
 943 See, the whole entry is gone!
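
If you'd rather watch this happen than read tables, something like the
following sketch should reproduce them.  The helper name C<state_of> is
made up for this illustration.

```perl
use strict;
use warnings;

my %ary = (a => 3, x => 7, d => 0, e => 2);

# Describe one entry: is the key there, is the value defined, is it true?
sub state_of {
    my ($h, $k) = @_;
    return join ",",
        (exists  $h->{$k} ? "exists"  : "missing"),
        (defined $h->{$k} ? "defined" : "undef"),
        ($h->{$k}         ? "true"    : "false");
}

my @report;
push @report, state_of(\%ary, 'a');   # the original entry
undef $ary{'a'};
push @report, state_of(\%ary, 'a');   # key kept, value now undef
delete $ary{'a'};
push @report, state_of(\%ary, 'a');   # whole entry gone
push @report, state_of(\%ary, 'd');   # present and defined, but false

print "$_\n" for @report;
```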
 944
 945 =head2 Why don't my tied hashes make the defined/exists distinction?
 946
 947 That depends on how the tied class implements EXISTS() and FETCH();
 948 there is no DEFINED() method.  For example, there isn't the concept of
 949 undef with hashes that are tied to DBM* files. This means the true/false
 950 tables above will give different results when used on such a hash.  It
 951 also means that exists and defined do the same thing with a DBM* file,
 952 and what they end up doing is not what they do with ordinary hashes.
 953
 954 =head2 How do I reset an each() operation part-way through?
 955
 956 Using C<keys %hash> in a scalar context returns the number of keys in
 957 the hash I<and> resets the iterator associated with the hash.  You may
 958 need to do this if you use C<last> to exit a loop early so that when you
 959 re-enter it, the hash iterator has been reset.
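
A minimal sketch of the situation, using a small made-up hash: the first
loop bails out early, and the bare C<keys %hash> puts the iterator back at
the start so the second loop sees everything.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);

# Leave the loop early; the hash's iterator is now partway through.
while (my ($k, $v) = each %hash) {
    last;
}

keys %hash;    # void context: resets the iterator, nothing else

# Count the pairs the next each() loop gets to see.
my $seen = 0;
while (my ($k, $v) = each %hash) {
    $seen++;
}
print "saw $seen pairs\n";
```

Without the C<keys %hash> line, the second loop would pick up where the
first one stopped and miss the pair that was already consumed.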
 960
 961 =head2 How can I get the unique keys from two hashes?
 962
 963 First you extract the keys from the hashes into arrays, and then solve
 964 the problem of uniquifying an array, described above.  For example:
 965
 966     %seen = ();
 967     for $element (keys(%foo), keys(%bar)) {
 968         $seen{$element}++;
 969     }
 970     @uniq = keys %seen;
 971
 972 Or more succinctly:
 973
 974     @uniq = keys %{{%foo,%bar}};
 975
 976 Or if you really want to save space:
 977
 978     %seen = ();
 979     while (defined ($key = each %foo)) {
 980         $seen{$key}++;
 981     }
 982     while (defined ($key = each %bar)) {
 983         $seen{$key}++;
 984     }
 985     @uniq = keys %seen;
 986
 987 =head2 How can I store a multidimensional array in a DBM file?
 988
 989 Either stringify the structure yourself (no fun), or else
 990 get the MLDBM (which uses Data::Dumper) module from CPAN and layer
 991 it on top of either DB_File or GDBM_File.
 992
 993 =head2 How can I make my hash remember the order I put elements into it?
 994
 995 Use the Tie::IxHash module from CPAN.
 996
 997     use Tie::IxHash;
 998     tie(%myhash, 'Tie::IxHash');
 999     for ($i=0; $i<20; $i++) {
1000         $myhash{$i} = 2*$i;
1001     }
1002     @keys = keys %myhash;
1003     # @keys = (0,1,2,3,...)
1004
1005 =head2 Why does passing a subroutine an undefined element in a hash create it?
1006
1007 If you say something like:
1008
1009     somefunc($hash{"nonesuch key here"});
1010
1011 Then that element "autovivifies"; that is, it springs into existence
1012 whether you store something there or not.  That's because functions
1013 get scalars passed in by reference.  If somefunc() modifies C<$_[0]>,
1014 it has to be ready to write it back into the caller's version.
1015
1016 This has been fixed as of perl5.004.
1017
1018 Normally, merely accessing a key's value for a nonexistent key does
1019 I<not> cause that key to be forever there.  This is different from
1020 awk's behavior.
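
Here is a small sketch of that behavior on perl5.004 or later.  The
subroutine names C<reader> and C<writer> are invented for the example; the
point is that only the one that assigns through C<$_[0]> should leave a key
behind.

```perl
use strict;
use warnings;

my %hash;

sub reader { my $copy = $_[0]; return }   # only reads its argument
sub writer { $_[0] = "set";    return }   # assigns through the alias

my $value = $hash{"plain read"};          # ordinary rvalue access
reader($hash{"passed to reader"});
writer($hash{"passed to writer"});

for my $key ("plain read", "passed to reader", "passed to writer") {
    printf "%-18s %s\n", $key, (exists $hash{$key} ? "exists" : "absent");
}
```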
1021
1022 =head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
1023
1024 Use references (documented in L<perlref>).  Examples of complex data
1025 structures are given in L<perldsc> and L<perllol>.  Examples of
1026 structures and object-oriented classes are in L<perltoot>.
1027
1028 =head2 How can I use a reference as a hash key?
1029
1030 You can't do this directly, but you could use the standard Tie::RefHash
1031 module distributed with perl.
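
A short sketch of how that might look (the module's name is spelled
Tie::RefHash; the variable names and data here are made up):

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %label_for, 'Tie::RefHash';

my $point = [3, 4];                 # hypothetical data: an array ref as key
$label_for{$point} = "corner";

for my $key (keys %label_for) {
    # With an ordinary hash, $key would be a plain string such as
    # "ARRAY(0x...)"; here it is still a usable reference.
    print "@$key is the $label_for{$key}\n" if ref $key;
}
```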
1032
1033 =head1 Data: Misc
1034
1035 =head2 How do I handle binary data correctly?
1036
1037 Perl is binary clean, so this shouldn't be a problem.  For example,
1038 this works fine (assuming the files are found):
1039
1040     if (`cat /vmunix` =~ /gzip/) {
1041         print "Your kernel is GNU-zip enabled!\n";
1042     }
1043
1044 On some systems, however, you have to play tedious games with "text"
1045 versus "binary" files.  See L<perlfunc/"binmode">.
1046
1047 If you're concerned about 8-bit ASCII data, then see L<perllocale>.
1048
1049 If you want to deal with multibyte characters, however, there are
1050 some gotchas.  See the section on Regular Expressions.
1051
1052 =head2 How do I determine whether a scalar is a number/whole/integer/float?
1053
1054 Assuming that you don't care about IEEE notations like "NaN" or
1055 "Infinity", you probably just want to use a regular expression.
1056
1057    warn "has nondigits"        if     /\D/;
1058    warn "not a whole number"   unless /^\d+$/;
1059    warn "not an integer"       unless /^-?\d+$/;  # rejects +3
1060    warn "not an integer"       unless /^[+-]?\d+$/;
1061    warn "not a decimal number" unless /^-?\d+\.?\d*$/;  # rejects .2
1062    warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
1063    warn "not a C float"
1064        unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
1065
1066 Or you could check out
1067 http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz
1068 instead.  The POSIX module (part of the standard Perl distribution)
1069 provides the C<strtol> and C<strtod> functions for converting strings
1070 to longs and doubles, respectively.
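
One way to use the POSIX route is sketched below.  In list context,
POSIX's strtod() returns the converted number and the count of characters
it could not parse.  The wrapper name C<getnum> is just a name chosen for
this example, and it deliberately ignores overflow and locale questions.

```perl
use strict;
use warnings;
use POSIX qw(strtod);

# Return the numeric value of a string, or undef if the whole
# string (ignoring surrounding whitespace) isn't a number.
sub getnum {
    my $str = shift;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    return undef if $str eq '';
    my ($num, $unparsed) = strtod($str);
    return $unparsed == 0 ? $num : undef;
}

for my $str ("42", "  3.14 ", "1e3", "12abc", "") {
    my $num = getnum($str);
    print(defined($num) ? "'$str' => $num\n" : "'$str' is not a number\n");
}
```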
1071
1072 =head2 How do I keep persistent data across program calls?
1073
1074 For some specific applications, you can use one of the DBM modules.
1075 See L<AnyDBM_File>.  More generically, you should consult the
1076 FreezeThaw, Storable, or Class::Eroot modules from CPAN.
1077
1078 =head2 How do I print out or copy a recursive data structure?
1079
1080 The Data::Dumper module on CPAN is nice for printing out
1081 data structures, and FreezeThaw for copying them.  For example:
1082
1083     use FreezeThaw qw(freeze thaw);
1084     $new = thaw freeze $old;
1085
1086 Here C<$old> can be (a reference to) any kind of data structure you'd like.
1087 It will be deeply copied.
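
If you have Storable (mentioned in the previous answer) rather than
FreezeThaw, its dclone() function should do much the same job.  This
sketch swaps in Storable purely as an illustration; the data structure is
made up.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $old = { name => "root", kids => [ { name => "leaf" } ] };
$old->{kids}[0]{parent} = $old;      # make the structure recursive

my $new = dclone($old);              # deep copy; the cycle is copied too

# Changing the copy must leave the original alone.
$new->{kids}[0]{name} = "changed";
print "original: $old->{kids}[0]{name}\n";
print "copy:     $new->{kids}[0]{name}\n";
```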
1088
1089 =head2 How do I define methods for every class/object?
1090
1091 Use the UNIVERSAL class (see L<UNIVERSAL>).
1092
1093 =head2 How do I verify a credit card checksum?
1094
1095 Get the Business::CreditCard module from CPAN.
1096
1097 =head1 AUTHOR AND COPYRIGHT
1098
1099 Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
1100 All rights reserved.  See L<perlfaq> for distribution information.
1101