=head1 NAME

perlfaq4 - Data Manipulation

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

Internally, your computer represents floating-point numbers in binary.
Digital (as in powers of two) computers cannot store all numbers
exactly. Some real numbers lose precision in the process. This is a
problem with how computers store numbers and affects all computer
languages, not just Perl.

L<perlnumber> shows the gory details of number representations and
conversions.

To limit the number of decimal places in your numbers, you can use the
C<printf> or C<sprintf> function. See
L<perlop/"Floating Point Arithmetic"> for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;

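The same C<sprintf> trick helps when comparing two floating-point
values. The following is a minimal sketch, assuming typical IEEE 754
doubles; C<nearly_equal> is a hypothetical helper name, not a Perl
built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: true when two numbers agree once both are
# formatted to $places decimal places.
sub nearly_equal {
    my ($x, $y, $places) = @_;
    return sprintf("%.${places}f", $x) eq sprintf("%.${places}f", $y);
}

my $sum = 0.1 + 0.2;

# With IEEE 754 doubles the stored sum is not exactly the stored 0.3.
print "exactly equal\n" if $sum == 0.3;
print "close enough\n"  if nearly_equal($sum, 0.3, 10);
```

Choose the number of places to suit your data; ten is only an
illustrative value.
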
=head2 Why is int() broken?

Your C<int()> is most probably working just fine. It's the numbers that
aren't quite what you think.

First, see the answer to "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think of in the above as 'three' is really more like
2.9999999999999995559.

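If what you actually want is the nearest integer rather than
truncation, one option is to round with C<sprintf> instead of calling
C<int()>. This is a small sketch, assuming IEEE 754 doubles; the
variable names are illustrative only.

```perl
use strict;
use warnings;

my $value = 0.6 / 0.2 - 2;    # mathematically 1, stored as slightly less

my $truncated = int($value);              # drops the fractional part
my $rounded   = sprintf("%.0f", $value);  # rounds to the nearest integer

print "int: $truncated, rounded: $rounded\n";
```

Be aware that C<"%.0f"> follows the C library's rounding of exact
halves, which is discussed in the rounding question below.
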
=head2 Why isn't my octal data interpreted correctly?

(contributed by brian d foy)

You're probably trying to convert a string to a number, which Perl only
converts as a decimal number. When Perl converts a string to a number, it
ignores leading spaces and zeroes, then assumes the rest of the digits
are in base 10:

    my $string = '0644';

    print $string + 0;  # prints 644

    print $string + 44; # prints 688, certainly not octal!

This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, C<chmod> on the command line knows that
its first argument is octal because that's what it does:

    %prompt> chmod 644 file

If you want to use the same literal digits (644) in Perl, you have to tell
Perl to treat them as octal numbers either by prefixing the digits with
a C<0> or using C<oct>:

    chmod(     0644, $file );   # right, has leading zero
    chmod( oct(644), $file );   # also correct

The problem comes in when you take your numbers from something that Perl
thinks is a string, such as a command line argument in C<@ARGV>:

    chmod( $ARGV[0],      $file );   # wrong, even if "0644"

    chmod( oct($ARGV[0]), $file );   # correct, treat string as octal

You can always check the value you're using by printing it in octal
notation to ensure it matches what you think it should be. Print it
in octal and decimal format:

    printf "0%o %d", $number, $number;

=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that C<int()> merely truncates toward 0. For rounding to a
certain number of digits, C<sprintf()> or C<printf()> is usually the
easiest route.

    printf("%.3f", 3.1415926535);   # prints 3.142

The C<POSIX> module (part of the standard Perl distribution)
implements C<ceil()>, C<floor()>, and a number of other mathematical
and trigonometric functions.

    use POSIX;
    $ceil   = ceil(3.5);   # 4
    $floor  = floor(3.5);  # 3

In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
module. With 5.004, the C<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the C<Math::Complex> module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do
this. Perl numbers whose absolute values are integers under 2**31 (on
32 bit machines) will work pretty much like mathematical integers.
Other numbers are not guaranteed.

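As one illustration of writing the rounding rule yourself, here is a
sketch of a "round half up" function built on C<POSIX::floor>. The
name C<round_half_up> is hypothetical, and the rule itself is only an
assumption; a financial application should implement whatever rule its
specification demands, ideally on integer units such as cents.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: round to the nearest integer, with exact halves
# always going up (toward positive infinity).
sub round_half_up {
    my ($value) = @_;
    return floor($value + 0.5);
}

# Compare the fixed rule with whatever the C library does for "%.0f".
for my $value (0.5, 1.5, 2.5, -2.5) {
    printf "%5.1f  sprintf: %2s  round_half_up: %2d\n",
        $value, sprintf("%.0f", $value), round_half_up($value);
}
```

The sample values are all exactly representable in binary, so any
difference between the two columns comes from the rounding rule and
not from representation error.
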
=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it. Below are a
few examples of approaches to making common conversions between number
representations. This is intended to be representative rather than
exhaustive.

Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
module from CPAN. The reason you might choose C<Bit::Vector> over the
Perl built-in functions is that it works with numbers of ANY size,
that it is optimized for speed on some operations, and for at least
some programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using perl's built in conversion of C<0x> notation:

    $dec = 0xDEADBEEF;

Using the C<hex> function:

    $dec = hex("DEADBEEF");

Using C<pack>:

    $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to hexadecimal

Using C<sprintf>:

    $hex = sprintf("%X", 3735928559); # upper case A-F
    $hex = sprintf("%x", 3735928559); # lower case a-f

Using C<unpack>:

    $hex = unpack("H*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And C<Bit::Vector> supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built in conversion of numbers with leading zeros:

    $dec = 033653337357; # note the leading 0!

Using the C<oct> function:

    $dec = oct("33653337357");

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using C<sprintf>:

    $oct = sprintf("%o", 3735928559);

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the C<0b> notation:

    $number = 0b10110110;

Using C<oct>:

    my $input = "10110110";
    $decimal = oct( "0b$input" );

Using C<pack> and C<ord>:

    $decimal = ord(pack('B8', '10110110'));

Using C<pack> and C<unpack> for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() is used to left pad a 32 character string with zeros.

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using C<sprintf> (perl 5.6+):

    $bin = sprintf("%b", 3735928559);

Using C<unpack>:

    $bin = unpack("B*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back

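For readers who would rather skip the exercise, the remaining
transformations can be sketched by chaining the built-ins shown above:
parse with C<hex> or C<oct>, then format with C<sprintf>. The sample
values here are illustrative only.

```perl
use strict;
use warnings;

# hex -> octal: parse with hex(), format with %o
my $oct = sprintf("%o", hex("DEADBEEF"));      # "33653337357"

# binary -> hex: oct() understands a "0b" prefix, %X formats hex
my $hex = sprintf("%X", oct("0b10110110"));    # "B6"

# octal -> binary: oct() parses bare octal digits, %b formats binary
my $bin = sprintf("%b", oct("755"));           # "111101101"

print "$oct $hex $bin\n";
```

Every route passes through an ordinary Perl number, so it is limited
to values that fit in your perl's native integers; for larger values
the C<Bit::Vector> approach above is the safer choice.
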
=head2 Why doesn't & work the way I want it to?

The behavior of binary arithmetic operators depends on whether they're
used on numbers or strings. The operators treat a string as a series
of bits and work with that (the string C<"3"> is the bit pattern
C<00110011>). The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:

    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # ...
    }

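When the operands arrive as strings (from a file or the command line,
say), adding zero forces the numeric form of the operation. A minimal
sketch with illustrative variable names, assuming the C<bitwise>
feature of newer perls is not enabled:

```perl
use strict;
use warnings;

my ($x, $y) = ("11", "3");    # strings, as if read from input

my $as_strings = $x & $y;                # string "and"
my $as_numbers = (0 + $x) & (0 + $y);    # numeric "and"

print "strings: $as_strings, numbers: $as_numbers\n";
```

The string result is C<"1"> and the numeric result is C<3>, matching
the two cases described above.
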
=head2 How do I multiply matrices?

Use the C<Math::Matrix> or C<Math::MatrixReal> modules (available from CPAN)
or the C<PDL> extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of 500,000 integers.

=head2 How can I output Roman numerals?

Get the C<Roman> module from CPAN
(http://www.cpan.org/modules/by-module/Roman).

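If installing a module is not an option, the conversion is short
enough to sketch by hand. C<to_roman> below is a hypothetical helper,
not the C<Roman> module's interface, and it assumes integers from 1 to
3999 written in the usual subtractive notation.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Roman module's API): convert an integer
# from 1 to 3999 into a Roman numeral by greedy subtraction.
sub to_roman {
    my ($n) = @_;
    die "to_roman() expects an integer from 1 to 3999\n"
        unless defined $n && $n =~ /^\d+$/ && $n >= 1 && $n <= 3999;

    my @table = (
        [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
        [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
        [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'],
        [1,    'I'],
    );

    my $roman = '';
    for my $pair (@table) {
        my ($value, $symbol) = @$pair;
        while ($n >= $value) {    # take the largest symbol that fits
            $roman .= $symbol;
            $n     -= $value;
        }
    }
    return $roman;
}

print to_roman(1987), "\n";    # MCMLXXXVII
```
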
=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once--you make your numbers less random,
rather than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy
of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than C<rand> with C<srand>
provides, you should also check out the C<Math::TrulyRandom> module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
"Numerical Recipes in C" at http://www.nr.com/ .

=head2 How do I get a random number between X and Y?

To get a random number between two values, you can use the C<rand()>
built-in to get a random number between 0 and 1. From there, you shift
that into the range that you want.

C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
what you want to have perl figure out is a random number in the range
from 0 to the difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you want a
random number between 0 and 5 that you can then add to 10.

    my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_between(50,120)>.

    sub random_int_between {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min)  if  $min > $max;
        return $min + int rand(1 + $max - $min);
    }

=head1 Data: Dates

=head2 How do I find the day or week of the year?

The C<localtime> function returns the day of the year, counting from 0
for January 1. Without an argument C<localtime> uses the current time.

    $day_of_year = (localtime)[7];

The C<POSIX> module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;
    my $week_of_year = strftime "%W", localtime;

To get the day or week of the year for any date, use C<POSIX>'s
C<mktime> to get a time in epoch seconds for the argument to
C<localtime>.

    use POSIX qw/mktime strftime/;
    my $week_of_year = strftime "%W",
        localtime( mktime( 0, 0, 0, 18, 11, 87 ) );

The C<Date::Calc> module provides two functions to calculate these.
Import them by name, since the module exports nothing by default.

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );

=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, the C<POSIX> module's C<strftime()> function has been
extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such
systems, this is only the first two digits of the four-digit year, and
thus cannot be used to reliably determine the current century or
millennium.

=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as a number and then subtract.
Life isn't always that simple though. If you want to work with
formatted dates, the C<Date::Manip>, C<Date::Calc>, or C<DateTime>
modules can help you.

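The "store as a number and subtract" approach can be sketched with the
standard C<Time::Local> module. C<days_between> is a hypothetical
helper; it assumes plain calendar dates and works in UTC so that
daylight saving changes cannot shorten or lengthen a day.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: whole days from one calendar date to another.
# Each argument is an array reference: [ year, month (1-12), day ].
sub days_between {
    my ($from, $to) = @_;
    my @epoch = map { timegm(0, 0, 0, $_->[2], $_->[1] - 1, $_->[0]) }
                ($from, $to);
    return ($epoch[1] - $epoch[0]) / 86_400;    # seconds per day
}

print days_between([1987, 12, 18], [1988, 1, 1]), "\n";    # 14
```

For anything involving time zones, business days, or human-readable
durations, the modules named above remain the better choice.
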
=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
C<Time::Local> module. Otherwise, you should look into the C<Date::Calc>
and C<Date::Manip> modules from CPAN.

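For a fixed format, the split-and-convert step might look like the
following sketch. The C<YYYY-MM-DD HH:MM:SS> layout and the name
C<parse_timestamp> are assumptions made for illustration, and the
string is treated as UTC via C<timegm>; swap in C<timelocal> for
strings in local time.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS" (assumed to be UTC)
# into epoch seconds, dying on anything that does not match.
sub parse_timestamp {
    my ($string) = @_;
    my ($year, $mon, $mday, $hour, $min, $sec) =
        $string =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or die "Unrecognized timestamp: $string\n";
    # Time::Local counts months from 0, like localtime does.
    return timegm($sec, $min, $hour, $mday, $mon - 1, $year);
}

print parse_timestamp("1970-01-02 00:00:00"), "\n";    # 86400
```
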
=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the C<Time::JulianDay> module available on CPAN. Ensure
that you really want to find a Julian day, though, as many people have
different ideas about Julian days. See
http://www.hermetic.ch/cal_stud/jdn.htm for instance.

You can also try the C<DateTime> module, which can convert a date/time
to a Julian Day.

    $ perl -MDateTime -le'print DateTime->today->jd'
    2453401.5

Or the modified Julian Day

    $ perl -MDateTime -le'print DateTime->today->mjd'
    53401

Or even the day of the year (which is what some people think of as a
Julian day)

    $ perl -MDateTime -le'print DateTime->today->doy'
    31

=head2 How do I find yesterday's date?
X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
X<daylight saving time> X<day> X<Today_and_Now> X<localtime>
X<timelocal>

(contributed by brian d foy)

Use one of the Date modules. The C<DateTime> module makes it simple, and
gives you the same time of day, only the day before.

    use DateTime;

    my $yesterday = DateTime->now->subtract( days => 1 );

    print "Yesterday was $yesterday\n";

You can also use the C<Date::Calc> module using its C<Today_and_Now>
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.

If you absolutely must do it yourself (or can't use one of the
modules), here's a solution using C<Time::Local>, which comes with
Perl:

    # contributed by Gunnar Hjalmarsson
    use Time::Local;
    my $today = timelocal 0, 0, 12, ( localtime )[3..5];
    my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
    printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;

In this case, you measure the day starting at noon, and subtract 24
hours. Even if the length of the calendar day is 23 or 25 hours,
you'll still end up on the previous calendar day, although not at
noon. Since you don't care about the time, the one hour difference
doesn't matter and you end up with the previous date.

=head2 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?

(contributed by brian d foy)

Perl itself never had a Y2K problem, although that never stopped people
from creating Y2K problems on their own. See the documentation for
C<localtime> for its proper use.

Starting with Perl 5.11, C<localtime> and C<gmtime> can handle dates past
03:14:08 January 19, 2038, when a 32-bit based time would overflow. You
still might get a warning on a 32-bit C<perl>:

    % perl5.11.2 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
    Integer overflow in hexadecimal number at -e line 1.
    Wed Nov 1 19:42:39 5576711

On a 64-bit C<perl>, you can get even larger dates for those really long
running projects:

    % perl5.11.2 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
    Thu Nov 2 00:42:39 5576711

You're still out of luck if you need to keep track of decaying protons
though.

=head1 Data: Strings

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as C<Regexp::Common>.

Some modules have validation for particular types of input, such
as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
and C<Data::Validate::IP>.

=head2 How do I unescape a string?

It depends on just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

    s/(.)\1/$1/g;

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself.

    my $str = 'Haarlem';   # in the Netherlands
    $str =~ tr///cs;       # Now Harlem, like in New York

=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference. If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

    print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can really have any code we like inside the braces, so
we simply have to end with the scalar reference, although how you do
that is up to you, and you can use code inside the braces. Note that
the use of parens creates a list context, so we need C<scalar> to
force the scalar context on the function:

    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

    sub timestamp { my $t = localtime; \$t }

    print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

    use Interpolation E => 'eval';
    print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

    print "The time is " . localtime() . ".\n";

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple ones, then something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
C<< < >> as delimiters, use the CPAN module C<Regexp::Common>, or see
L<perlre/(??{ code })>. For other cases, you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
C<Text::Balanced>; and the C<byacc> program. Starting from perl 5.8,
C<Text::Balanced> is part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy of Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );

695
ac9dac7f 696Use C<reverse()> in scalar context, as documented in
68dc0745 697L<perlfunc/reverse>.
698
ac9dac7f 699 $reversed = reverse $string;
68dc0745 700
701=head2 How do I expand tabs in a string?
702
5a964f20 703You can do it yourself:
68dc0745 704
ac9dac7f 705 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
68dc0745 706
ac9dac7f 707Or you can just use the C<Text::Tabs> module (part of the standard Perl
68dc0745 708distribution).
709
ac9dac7f 710 use Text::Tabs;
711 @expanded_lines = expand(@lines_with_tabs);
68dc0745 712
713=head2 How do I reformat a paragraph?
714
ac9dac7f 715Use C<Text::Wrap> (part of the standard Perl distribution):
68dc0745 716
ac9dac7f 717 use Text::Wrap;
718 print wrap("\t", ' ', @paragraphs);
68dc0745 719
ac9dac7f 720The paragraphs you give to C<Text::Wrap> should not contain embedded
721newlines. C<Text::Wrap> doesn't justify the lines (flush-right).
46fc3d4c 722
ac9dac7f 723Or use the CPAN module C<Text::Autoformat>. Formatting files can be
724easily done by making a shell alias, like so:
bc06af74 725
ac9dac7f 726 alias fmt="perl -i -MText::Autoformat -n0777 \
727 -e 'print autoformat $_, {all=>1}' $*"
bc06af74 728
ac9dac7f 729See the documentation for C<Text::Autoformat> to appreciate its many
bc06af74 730capabilities.
731
49d635f9 732=head2 How can I access or change N characters of a string?
68dc0745 733
49d635f9 734You can access the first characters of a string with substr().
735To get the first character, for example, start at position 0
197aec24 736and grab the string of length 1.
68dc0745 737
68dc0745 738
49d635f9 739 $string = "Just another Perl Hacker";
ac9dac7f 740 $first_char = substr( $string, 0, 1 ); # 'J'
68dc0745 741
49d635f9 742To change part of a string, you can use the optional fourth
743argument which is the replacement string.
68dc0745 744
ac9dac7f 745 substr( $string, 13, 4, "Perl 5.8.0" );
197aec24 746
49d635f9 747You can also use substr() as an lvalue.
68dc0745 748
ac9dac7f 749 substr( $string, 13, 4 ) = "Perl 5.8.0";
197aec24 750
68dc0745 751=head2 How do I change the Nth occurrence of something?
752
92c2ed05 753You have to keep track of N yourself. For example, let's say you want
754to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
d92eb7b0 755C<"whosoever"> or C<"whomsoever">, case insensitively. These
756all assume that $_ contains the string to be altered.
68dc0745 757
ac9dac7f 758 $count = 0;
759 s{((whom?)ever)}{
760 ++$count == 5 # is it the 5th?
761 ? "${2}soever" # yes, swap
762 : $1 # renege and leave it there
763 }ige;
68dc0745 764
5a964f20 765In the more general case, you can use the C</g> modifier in a C<while>
766loop, keeping count of matches.
767
ac9dac7f 768 $WANT = 3;
769 $count = 0;
770 $_ = "One fish two fish red fish blue fish";
771 while (/(\w+)\s+fish\b/gi) {
772 if (++$count == $WANT) {
773 print "The third fish is a $1 one.\n";
774 }
775 }
5a964f20 776
92c2ed05 777That prints out: C<"The third fish is a red one."> You can also use a
5a964f20 778repetition count and repeated pattern like this:
779
ac9dac7f 780 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
5a964f20 781
68dc0745 782=head2 How can I count the number of occurrences of a substring within a string?
783
a6dd486b 784There are a number of ways, with varying efficiency. If you want a
68dc0745 785count of a certain single character (X) within a string, you can use the
786C<tr///> function like so:
787
ac9dac7f 788 $string = "ThisXlineXhasXsomeXx'sXinXit";
789 $count = ($string =~ tr/X//);
790 print "There are $count X characters in the string";
68dc0745 791
792This is fine if you are just looking for a single character. However,
793if you are trying to count multiple character substrings within a
794larger string, C<tr///> won't work. What you can do is wrap a while()
795loop around a global pattern match. For example, let's count negative
796integers:
797
ac9dac7f 798 $string = "-9 55 48 -2 23 -76 4 14 -44";
799 while ($string =~ /-\d+/g) { $count++ }
800 print "There are $count negative numbers in the string";
68dc0745 801
881bdbd4 802Another version uses a global match in list context, then assigns the
803result to a scalar, producing a count of the number of matches.
804
805 $count = () = $string =~ /-\d+/g;
806
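To count a literal multi-character substring rather than a pattern, one possible sketch quotes the substring with C<\Q...\E> so that metacharacters are taken literally. A global match never counts overlapping occurrences, so the second subroutine steps through the string with C<index> instead. Both subroutine names are invented for this example.

```perl
use strict;
use warnings;

# Count non-overlapping occurrences of a literal substring.
sub count_substr {
    my ( $string, $sub ) = @_;
    return 0 unless length $sub;
    my $count = () = $string =~ /\Q$sub\E/g;
    return $count;
}

# Count occurrences including overlapping ones by restarting the
# search one character after each hit.
sub count_overlapping {
    my ( $string, $sub ) = @_;
    return 0 unless length $sub;
    my ( $count, $pos ) = ( 0, 0 );
    while ( ( $pos = index( $string, $sub, $pos ) ) != -1 ) {
        $count++;
        $pos++;
    }
    return $count;
}

print count_substr( "abababa", "aba" ), "\n";
print count_overlapping( "abababa", "aba" ), "\n";
```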
109f0441 807=head2 How do I capitalize all the words on one line?
808X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>
5a964f20 809
109f0441 810(contributed by brian d foy)
65acb1b1 811
109f0441 812Damian Conway's L<Text::Autoformat> handles all of the thinking
813for you.
369b44b4 814
ac9dac7f 815 use Text::Autoformat;
816 my $x = "Dr. Strangelove or: How I Learned to Stop ".
817 "Worrying and Love the Bomb";
369b44b4 818
ac9dac7f 819 print $x, "\n";
820 for my $style (qw( sentence title highlight )) {
821 print autoformat($x, { case => $style }), "\n";
822 }
369b44b4 823
109f0441 824How do you want to capitalize those words?
825
826 FRED AND BARNEY'S LODGE # all uppercase
827 Fred And Barney's Lodge # title case
828 Fred and Barney's Lodge # highlight case
829
830It's not as easy a problem as it looks. How many words do you think
831are in there? Wait for it... wait for it.... If you answered 5
832you're right. Perl words are groups of C<\w+>, but that's not what
833you want to capitalize. How is Perl supposed to know not to capitalize
834that C<s> after the apostrophe? You could try a regular expression:
835
836 $string =~ s/ (
837 (^\w) #at the beginning of the line
838 | # or
839 (\s\w) #preceded by whitespace
840 )
841 /\U$1/xg;
842
843 $string =~ s/([\w']+)/\u\L$1/g;
844
845Now, what if you don't want to capitalize that "and"? Just use
846L<Text::Autoformat> and get on with the next problem. :)
847
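If you can't install a module and only need a rough approximation, a sketch along the following lines keeps a short list of "small" words in lowercase unless they come first. The word list and the name C<title_case> are assumptions made for this example, not part of any module.

```perl
use strict;
use warnings;

# Words left in lowercase unless they start the line (an arbitrary list).
my %small = map { $_ => 1 } qw( a an and the of or to in );

sub title_case {
    my ($string) = @_;
    my $seen = 0;    # number of words handled so far
    $string = lc $string;
    $string =~ s/([\w']+)/ ( $seen++ && $small{$1} ) ? $1 : ucfirst $1 /ge;
    return $string;
}

print title_case("FRED AND BARNEY'S LODGE"), "\n";
```

This still knows nothing about proper nouns, abbreviations, or other languages, which is why the module remains the better answer.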
49d635f9 848=head2 How can I split a [character] delimited string except when inside [character]?
68dc0745 849
ac9dac7f 850Several modules can handle this sort of parsing--C<Text::Balanced>,
851C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.
49d635f9 852
853Take the example case of trying to split a string that is
854comma-separated into its different fields. You can't use C<split(/,/)>
855because you shouldn't split if the comma is inside quotes. For
856example, take a data line like this:
68dc0745 857
ac9dac7f 858 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
68dc0745 859
860Due to the restriction of the quotes, this is a fairly complex
197aec24 861problem. Thankfully, we have Jeffrey Friedl, author of
49d635f9 862I<Mastering Regular Expressions>, to handle these for us. He
ac9dac7f 863suggests (assuming your string is contained in C<$text>):
68dc0745 864
ac9dac7f 865 @new = ();
866 push(@new, $+) while $text =~ m{
867 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
868 | ([^,]+),?
869 | ,
870 }gx;
871 push(@new, undef) if substr($text,-1,1) eq ',';
68dc0745 872
46fc3d4c 873If you want to represent quotation marks inside a
874quotation-mark-delimited field, escape them with backslashes (eg,
49d635f9 875C<"like \"this\"">).
46fc3d4c 876
ac9dac7f 877Alternatively, the C<Text::ParseWords> module (part of the standard
878Perl distribution) lets you say:
68dc0745 879
ac9dac7f 880 use Text::ParseWords;
881 @new = quotewords(",", 0, $text);
65acb1b1 882
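As a quick check, here is the C<Text::ParseWords> approach applied to the sample data line from above. The variable names are the ones used in this answer; the loop just prints each field so you can see where the splits landed.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# The second argument (0) tells quotewords to drop the quotes
# around each field instead of keeping them.
my @new = quotewords( ',', 0, $text );

printf "%2d: [%s]\n", $_, $new[$_] for 0 .. $#new;
```

The commas inside the quoted fields stay put, so C<Cimetrix, Inc> comes back as a single field.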
68dc0745 883=head2 How do I strip blank space from the beginning/end of a string?
884
6670e5e7 885(contributed by brian d foy)
68dc0745 886
6670e5e7 887A substitution can do this for you. For a single line, you want to
888replace all the leading or trailing whitespace with nothing. You
889can do that with a pair of substitutions.
68dc0745 890
6670e5e7 891 s/^\s+//;
892 s/\s+$//;
68dc0745 893
6670e5e7 894You can also write that as a single substitution, although it turns
895out the combined statement is slower than the separate ones. That
896might not matter to you, though.
68dc0745 897
6670e5e7 898 s/^\s+|\s+$//g;
68dc0745 899
6670e5e7 900In this regular expression, the alternation matches either at the
901beginning or the end of the string since the alternation has a lower
902precedence than the anchors. With the C</g> flag, the substitution
903makes all possible matches, so it gets both. Remember, the trailing
904newline matches the C<\s+>, and the C<$> anchor can match to the
905physical end of the string, so the newline disappears too. Just add
906the newline to the output, which has the added benefit of preserving
907"blank" (consisting entirely of whitespace) lines which the C<^\s+>
908would remove all by itself.
68dc0745 909
6670e5e7 910 while( <> )
911 {
912 s/^\s+|\s+$//g;
913 print "$_\n";
914 }
5a964f20 915
6670e5e7 916For a multi-line string, you can apply the regular expression
917to each logical line in the string by adding the C</m> flag (for
918"multi-line"). With the C</m> flag, the C<$> matches I<before> an
919embedded newline, so it doesn't remove it. It still removes the
920newline at the end of the string.
921
ac9dac7f 922 $string =~ s/^\s+|\s+$//gm;
6670e5e7 923
924Remember that lines consisting entirely of whitespace will disappear,
925since the first part of the alternation can match the entire string
926and replace it with nothing. If you need to keep embedded blank lines,
927you have to do a little more work. Instead of matching any whitespace
928(since that includes a newline), just match the other whitespace.
929
930 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
5a964f20 931
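If you trim strings often, you might wrap the pair of substitutions in a subroutine that returns a trimmed copy. This is a minimal sketch; C<trim> is just a name chosen here. The second half shows what the multi-line version above does to a string with an embedded blank line.

```perl
use strict;
use warnings;

# Return a copy of the string without leading or trailing whitespace.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $single = trim("  hello world \n");

# Per-line trimming that keeps the newlines, and so the blank line.
my $multi = "  a  \n\n  b  \n";
$multi =~ s/^[\t\f ]+|[\t\f ]+$//mg;

print "[$single]\n";
print $multi;
```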
65acb1b1 932=head2 How do I pad a string with blanks or pad a number with zeroes?
933
65acb1b1 934In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0 935to pad the string, C<$text> or C<$num> contains the string to be padded,
936and C<$pad_char> contains the padding character. You can use a single
937character string constant instead of the C<$pad_char> variable if you
938know what it is in advance. And in the same way you can use an integer in
939place of C<$pad_len> if you know the pad length in advance.
65acb1b1 940
d92eb7b0 941The simplest method uses the C<sprintf> function. It can pad on the left
942or right with blanks and on the left with zeroes and it will not
943truncate the result. The C<pack> function can only pad strings on the
944right with blanks and it will truncate the result to a maximum length of
945C<$pad_len>.
65acb1b1 946
ac9dac7f 947 # Left padding a string with blanks (no truncation):
04d666b1 948 $padded = sprintf("%${pad_len}s", $text);
949 $padded = sprintf("%*s", $pad_len, $text); # same thing
65acb1b1 950
ac9dac7f 951 # Right padding a string with blanks (no truncation):
04d666b1 952 $padded = sprintf("%-${pad_len}s", $text);
953 $padded = sprintf("%-*s", $pad_len, $text); # same thing
65acb1b1 954
ac9dac7f 955 # Left padding a number with 0 (no truncation):
04d666b1 956 $padded = sprintf("%0${pad_len}d", $num);
957 $padded = sprintf("%0*d", $pad_len, $num); # same thing
65acb1b1 958
ac9dac7f 959 # Right padding a string with blanks using pack (will truncate):
960 $padded = pack("A$pad_len",$text);
65acb1b1 961
d92eb7b0 962If you need to pad with a character other than blank or zero you can use
963one of the following methods. They all generate a pad string with the
964C<x> operator and combine that with C<$text>. These methods do
965not truncate C<$text>.
65acb1b1 966
d92eb7b0 967Left and right padding with any character, creating a new string:
65acb1b1 968
ac9dac7f 969 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
970 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 971
d92eb7b0 972Left and right padding with any character, modifying C<$text> directly:
65acb1b1 973
ac9dac7f 974 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
975 $text .= $pad_char x ( $pad_len - length( $text ) );
65acb1b1 976
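To make the differences concrete, here is a small worked example with sample values chosen only for illustration: a pad length of 6, the text C<abc>, the number 42, and C<*> as the padding character.

```perl
use strict;
use warnings;

my ( $text, $num, $pad_len, $pad_char ) = ( 'abc', 42, 6, '*' );

my $left   = sprintf( "%*s",  $pad_len, $text );    # blanks on the left
my $right  = sprintf( "%-*s", $pad_len, $text );    # blanks on the right
my $zeroes = sprintf( "%0*d", $pad_len, $num );     # zeroes on the left

# sprintf never truncates, but pack with "A" does.
my $long   = sprintf( "%*s", $pad_len, 'abcdefghij' );
my $packed = pack( "A$pad_len", 'abcdefghij' );

# Any other padding character, built with the x operator.
my $stars = $pad_char x ( $pad_len - length $text ) . $text;

print "[$_]\n" for $left, $right, $zeroes, $long, $packed, $stars;
```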
68dc0745 977=head2 How do I extract selected columns from a string?
978
e573f903 979(contributed by brian d foy)
980
981If you know which columns contain the data, you can
982use C<substr> to extract a single column.
983
984 my $column = substr( $line, $start_column, $length );
985
986You can use C<split> if the columns are separated by whitespace or
987some other delimiter, as long as whitespace or the delimiter cannot
988appear as part of the data.
989
990 my $line = ' fred barney betty ';
991 my @columns = split /\s+/, $line;
992 # ( '', 'fred', 'barney', 'betty' );
993
994 my $line = 'fred||barney||betty';
995 my @columns = split /\|/, $line;
996 # ( 'fred', '', 'barney', '', 'betty' );
997
998If you want to work with comma-separated values, don't do this since
999that format is a bit more complicated. Use one of the modules that
109f0441 1000handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
e573f903 1001C<Text::CSV_PP>.
1002
1003If you want to break apart an entire line of fixed columns, you can use
589a5df2 1004C<unpack> with the A (ASCII) format. By using a number after the format
e573f903 1005specifier, you can denote the column width. See the C<pack> and C<unpack>
1006entries in L<perlfunc> for more details.
1007
1008 my @fields = unpack( "A8 A8 A8 A16 A4", $line );
1009
1010Note that spaces in the format argument to C<unpack> do not denote literal
1011spaces. If you have space separated data, you may want C<split> instead.
68dc0745 1012
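Here is a short, self-contained sketch of the fixed-width case. The record is built with C<sprintf> only so that the example has something to take apart; the field widths match the template used above, and the sample names are made up.

```perl
use strict;
use warnings;

# Build a sample fixed-width record: three 8-column fields,
# one 16-column field, and one 4-column field.
my $line = sprintf "%-8s%-8s%-8s%-16s%-4s", qw( fred barney betty bedrock ok );

# The template comes first, then the data. The "A" format strips
# the trailing spaces that pad each column.
my @fields = unpack( "A8 A8 A8 A16 A4", $line );

# substr pulls out a single column and keeps its padding.
my $second = substr( $line, 8, 8 );

print join( '|', @fields ), "\n";
```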
1013=head2 How do I find the soundex value of a string?
1014
7678cced 1015(contributed by brian d foy)
1016
1017You can use the C<Text::Soundex> module. If you want to do fuzzy or close
ac9dac7f 1018matching, you might also try the C<String::Approx>,
1019C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.
68dc0745 1020
1021=head2 How can I expand variables in text strings?
1022
e573f903 1023(contributed by brian d foy)
5a964f20 1024
322be77c 1025If you can avoid it, don't, or if you can use a templating system,
c195e131 1026such as C<Text::Template> or C<Template> Toolkit, do that instead. You
1027might even be able to get the job done with C<sprintf> or C<printf>:
1028
1029 my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
322be77c 1030
1031However, for the one-off simple case where I don't want to pull out a
1032full templating system, I'll use a string that has two Perl scalar
1033variables in it. In this example, I want to expand C<$foo> and C<$bar>
c195e131 1034to their variables' values:
e573f903 1035
1036 my $foo = 'Fred';
1037 my $bar = 'Barney';
1038 $string = 'Say hello to $foo and $bar';
1039
1040One way I can do this involves the substitution operator and a double
1041C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
1042turns it into C<$foo>. The second /e starts with C<$foo> and replaces
1043it with its value. C<$foo>, then, turns into 'Fred', and that's finally
c195e131 1044what's left in the string:
e573f903 1045
1046 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
322be77c 1047
e573f903 1048The C</e> will also silently ignore violations of strict, replacing
c195e131 1049undefined variable names with the empty string. Since I'm using the
109f0441 1050C</e> flag (twice even!), I have all of the same security problems I
c195e131 1051have with C<eval> in its string form. If there's something odd in
1052C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
1053I could get myself in trouble.
1054
1055To get around the security problem, I could also pull the values from
1056a hash instead of evaluating variable names. Using a single C</e>, I
1057can check the hash to ensure the value exists, and if it doesn't, I
1058can replace the missing value with a marker, in this case C<???> to
1059signal that I missed something:
e573f903 1060
1061 my $string = 'This has $foo and $bar';
109f0441 1062
e573f903 1063 my %Replacements = (
1064 foo => 'Fred',
ac9dac7f 1065 );
322be77c 1066
e573f903 1067 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1068 $string =~ s/\$(\w+)/
1069 exists $Replacements{$1} ? $Replacements{$1} : '???'
1070 /eg;
322be77c 1071
e573f903 1072 print $string;
322be77c 1073
68dc0745 1074=head2 What's wrong with always quoting "$vars"?
1075
ac9dac7f 1076The problem is that those double-quotes force
e573f903 1077stringification--coercing numbers and references into strings--even
1078when you don't want them to be strings. Think of it this way:
1079double-quote expansion is used to produce new strings. If you already
1080have a string, why do you need more?
68dc0745 1081
1082If you get used to writing odd things like these:
1083
ac9dac7f 1084 print "$var"; # BAD
1085 $new = "$old"; # BAD
1086 somefunc("$var"); # BAD
68dc0745 1087
1088You'll be in trouble. Those should (in 99.8% of the cases) be
1089the simpler and more direct:
1090
ac9dac7f 1091 print $var;
1092 $new = $old;
1093 somefunc($var);
68dc0745 1094
1095Otherwise, besides slowing you down, you're going to break code when
1096the thing in the scalar is actually neither a string nor a number, but
1097a reference:
1098
ac9dac7f 1099 func(\@array);
1100 sub func {
1101 my $aref = shift;
1102 my $oref = "$aref"; # WRONG
1103 }
68dc0745 1104
1105You can also get into subtle problems on those few operations in Perl
1106that actually do care about the difference between a string and a
1107number, such as the magical C<++> autoincrement operator or the
1108syscall() function.
1109
197aec24 1110Stringification also destroys arrays.
5a964f20 1111
ac9dac7f 1112 @lines = `command`;
1113 print "@lines"; # WRONG - extra blanks
1114 print @lines; # right
5a964f20 1115
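A short sketch makes both problems visible. The data here is invented for the example; the point is only what the quotes do to a reference and to an array.

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );
my $aref  = \@array;
my $oref  = "$aref";    # now a plain string such as "ARRAY(0x...)"

print ref($aref) ? "reference\n" : "plain string\n";
print ref($oref) ? "reference\n" : "plain string\n";

# Interpolating an array joins its elements with the list
# separator variable, which is a single space by default.
my @lines  = ( "first\n", "second\n" );
my $quoted = "@lines";
my $plain  = join '', @lines;
```

The stringified copy can no longer be dereferenced, and the interpolated array picks up a stray space at the start of every line after the first.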
04d666b1 1116=head2 Why don't my E<lt>E<lt>HERE documents work?
68dc0745 1117
1118Check for these three things:
1119
1120=over 4
1121
04d666b1 1122=item There must be no space after the E<lt>E<lt> part.
68dc0745 1123
197aec24 1124=item There (probably) should be a semicolon at the end.
68dc0745 1125
197aec24 1126=item You can't (easily) have any space in front of the tag.
68dc0745 1127
1128=back
1129
197aec24 1130If you want to indent the text in the here document, you
5a964f20 1131can do this:
1132
1133 # all in one
1134 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1135 your text
1136 goes here
1137 HERE_TARGET
1138
1139But the HERE_TARGET must still be flush against the margin.
197aec24 1140If you want that indented also, you'll have to quote
5a964f20 1141in the indentation.
1142
1143 ($quote = <<' FINIS') =~ s/^\s+//gm;
1144 ...we will have peace, when you and all your works have
1145 perished--and the works of your dark master to whom you
1146 would deliver us. You are a liar, Saruman, and a corrupter
1147 of men's hearts. --Theoden in /usr/src/perl/taint.c
1148 FINIS
83ded9ee 1149 $quote =~ s/\s+--/\n--/;
5a964f20 1150
1151A nice general-purpose fixer-upper function for indented here documents
1152follows. It expects to be called with a here document as its argument.
1153It looks to see whether each line begins with a common substring, and
a6dd486b 1154if so, strips that substring off. Otherwise, it takes the amount of leading
1155whitespace found on the first line and removes that much off each
5a964f20 1156subsequent line.
1157
1158 sub fix {
1159 local $_ = shift;
a6dd486b 1160 my ($white, $leader); # common whitespace and common leading string
5a964f20 1161 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1162 ($white, $leader) = ($2, quotemeta($1));
1163 } else {
1164 ($white, $leader) = (/^(\s+)/, '');
1165 }
1166 s/^\s*?$leader(?:$white)?//gm;
1167 return $_;
1168 }
1169
c8db1d39 1170This works with leading special strings, dynamically determined:
5a964f20 1171
ac9dac7f 1172 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
5a964f20 1173 @@@ int
1174 @@@ runops() {
1175 @@@ SAVEI32(runlevel);
1176 @@@ runlevel++;
d92eb7b0 1177 @@@ while ( op = (*op->op_ppaddr)() );
5a964f20 1178 @@@ TAINT_NOT;
1179 @@@ return 0;
1180 @@@ }
ac9dac7f 1181 MAIN_INTERPRETER_LOOP
5a964f20 1182
a6dd486b 1183Or with a fixed amount of leading whitespace, with remaining
5a964f20 1184indentation correctly preserved:
1185
ac9dac7f 1186 $poem = fix<<EVER_ON_AND_ON;
5a964f20 1187 Now far ahead the Road has gone,
1188 And I must follow, if I can,
1189 Pursuing it with eager feet,
1190 Until it joins some larger way
1191 Where many paths and errands meet.
1192 And whither then? I cannot say.
1193 --Bilbo in /usr/src/perl/pp_ctl.c
ac9dac7f 1194 EVER_ON_AND_ON
5a964f20 1195
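If you can rely on Perl 5.26 or later, the indented here-document syntax C<E<lt>E<lt>~> does this stripping for you: the terminator may be indented, and its indentation is removed from every line of the text. A minimal sketch, wrapped in a hypothetical C<message> subroutine just to give it some indentation to strip:

```perl
use strict;
use warnings;
use v5.26;    # indented here-documents need Perl 5.26 or later

sub message {
    my $text = <<~HERE_TARGET;
        your text
          goes here
        HERE_TARGET
    return $text;
}

print message();
```

Indentation beyond that of the terminator is preserved, so the second line keeps its two extra spaces.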
68dc0745 1196=head1 Data: Arrays
1197
65acb1b1 1198=head2 What is the difference between a list and an array?
1199
ac9dac7f 1200An array has a changeable length. A list does not. An array is
1201something you can push or pop, while a list is a set of values. Some
1202people make the distinction that a list is a value while an array is a
1203variable. Subroutines are passed and return lists, you put things into
1204list context, you initialize arrays with lists, and you C<foreach()>
1205across a list. C<@> variables are arrays, anonymous arrays are
1206arrays, arrays in scalar context behave like the number of elements in
1207them, subroutines access their arguments through the array C<@_>, and
1208C<push>/C<pop>/C<shift> only work on arrays.
65acb1b1 1209
1210As a side note, there's no such thing as a list in scalar context.
1211When you say
1212
ac9dac7f 1213 $scalar = (2, 5, 7, 9);
65acb1b1 1214
d92eb7b0 1215you're using the comma operator in scalar context, so it uses the scalar
ac9dac7f 1216comma operator. There never was a list there at all! This causes the
d92eb7b0 1217last value to be returned: 9.
65acb1b1 1218
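The following sketch puts the cases side by side. The values are the ones from the example above; the variable names are only illustrative.

```perl
use strict;
use warnings;
no warnings 'void';    # the discarded constants below would otherwise warn

my @array = ( 2, 5, 7, 9 );

my $last    = ( 2, 5, 7, 9 );         # comma operator: the last value
my $count   = @array;                 # array in scalar context: its length
my ($first) = ( 2, 5, 7, 9 );         # list assignment: the first value
my $howmany = () = ( 2, 5, 7, 9 );    # list assignment in scalar context

print "$last $count $first $howmany\n";
```

The last line uses the same trick as the counting idiom shown earlier in this document: a list assignment in scalar context returns the number of elements on its right-hand side.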
68dc0745 1219=head2 What is the difference between $array[1] and @array[1]?
1220
a6dd486b 1221The former is a scalar value; the latter an array slice, making
68dc0745 1222it a list with one (scalar) value. You should use $ when you want a
1223scalar value (most of the time) and @ when you want a list with one
1224scalar value in it (very, very rarely; nearly never, in fact).
1225
1226Sometimes it doesn't make a difference, but sometimes it does.
1227For example, compare:
1228
ac9dac7f 1229 $good[0] = `some program that outputs several lines`;
68dc0745 1230
1231with
1232
ac9dac7f 1233 @bad[0] = `same program that outputs several lines`;
68dc0745 1234
197aec24 1235The C<use warnings> pragma and the B<-w> flag will warn you about these
9f1b1f2d 1236matters.
68dc0745 1237
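To see the difference without running an external program, the sketch below uses a made-up subroutine, C<fake_command>, that imitates backticks: it returns everything as one string in scalar context and a list of lines in list context.

```perl
use strict;
use warnings;

# Hypothetical stand-in for backticks.
sub fake_command {
    my @lines = ( "line one\n", "line two\n" );
    return wantarray ? @lines : join '', @lines;
}

my ( @good, @bad );

$good[0] = fake_command();    # scalar assignment: all of the output

{
    no warnings 'syntax';     # silence the one-element-slice warning
    @bad[0] = fake_command(); # list assignment: only the first line fits
}

print "good: $good[0]";
print "bad:  $bad[0]";
```

The slice assignment quietly throws away every line after the first.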
d92eb7b0 1238=head2 How can I remove duplicate elements from a list or array?
68dc0745 1239
6670e5e7 1240(contributed by brian d foy)
68dc0745 1241
6670e5e7 1242Use a hash. When you think the words "unique" or "duplicated", think
1243"hash keys".
68dc0745 1244
6670e5e7 1245If you don't care about the order of the elements, you could just
1246create the hash then extract the keys. It's not important how you
1247create that hash: just that you use C<keys> to get the unique
1248elements.
551e1d92 1249
ac9dac7f 1250 my %hash = map { $_, 1 } @array;
1251 # or a hash slice: @hash{ @array } = ();
1252 # or a foreach: $hash{$_} = 1 foreach ( @array );
1253
1254 my @unique = keys %hash;
68dc0745 1255
ac9dac7f 1256If you want to use a module, try the C<uniq> function from
1257C<List::MoreUtils>. In list context it returns the unique elements,
1258preserving their order in the list. In scalar context, it returns the
1259number of unique elements.
1260
1261 use List::MoreUtils qw(uniq);
1262
1263 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1264 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
68dc0745 1265
6670e5e7 1266You can also go through each element and skip the ones you've seen
1267before. Use a hash to keep track. The first time the loop sees an
1268element, that element has no key in C<%seen>. The C<next> statement
1269creates the key and immediately uses its value, which is C<undef>, so
1270the loop continues to the C<push> and increments the value for that
1271key. The next time the loop sees that same element, its key exists in
1272the hash I<and> the value for that key is true (since it's not 0 or
ac9dac7f 1273C<undef>), so the C<next> skips that iteration and the loop goes to the
1274next element.
551e1d92 1275
6670e5e7 1276 my @unique = ();
1277 my %seen = ();
68dc0745 1278
6670e5e7 1279 foreach my $elem ( @array )
1280 {
1281 next if $seen{ $elem }++;
1282 push @unique, $elem;
1283 }
68dc0745 1284
6670e5e7 1285You can write this more briefly using a grep, which does the
1286same thing.
68dc0745 1287
ac9dac7f 1288 my %seen = ();
1289 my @unique = grep { ! $seen{ $_ }++ } @array;
65acb1b1 1290
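The same idiom extends to cases where "duplicate" means something looser than string equality: you compute a key for each element and count the keys instead. As a sketch, assuming you want to treat names that differ only in case as duplicates:

```perl
use strict;
use warnings;

my @array = qw( Fred fred Barney FRED barney Wilma );

# Use the lowercased name as the hash key; the first spelling
# encountered is the one that survives, and order is preserved.
my %seen   = ();
my @unique = grep { !$seen{ lc $_ }++ } @array;

print "@unique\n";
```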
ddbc1f16 1291=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1292
109f0441 1293(portions of this answer contributed by Anno Siegel and brian d foy)
9e72e4c6 1294
5a964f20 1295Hearing the word "in" is an I<in>dication that you probably should have
1296used a hash, not a list or array, to store your data. Hashes are
1297designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1298
109f0441 1299That being said, there are several ways to approach this. In Perl 5.10
1300and later, you can use the smart match operator to check that an item is
1301contained in an array or a hash:
1302
1303 use 5.010;
1304
1305 if( $item ~~ @array )
1306 {
1307 say "The array contains $item"
1308 }
1309
1310 if( $item ~~ %hash )
1311 {
1312 say "The hash contains $item"
1313 }
1314
1315With earlier versions of Perl, you have to do a bit more work. If you
5a964f20 1316are going to make this query many times over arbitrary string values,
881bdbd4 1317the fastest way is probably to invert the original array and maintain a
109f0441 1318hash whose keys are the first array's values:
68dc0745 1319
ac9dac7f 1320 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1321 %is_blue = ();
1322 for (@blues) { $is_blue{$_} = 1 }
68dc0745 1323
ac9dac7f 1324Now you can check whether C<$is_blue{$some_color}>. It might have
1325been a good idea to keep the blues all in a hash in the first place.
68dc0745 1326
1327If the values are all small integers, you could use a simple indexed
1328array. This kind of an array will take up less space:
1329
ac9dac7f 1330 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1331 @is_tiny_prime = ();
1332 for (@primes) { $is_tiny_prime[$_] = 1 }
1333 # or simply @is_tiny_prime[@primes] = (1) x @primes;
68dc0745 1334
1335Now you check whether C<$is_tiny_prime[$some_number]>.
1336
1337If the values in question are integers instead of strings, you can save
1338quite a lot of space by using bit strings instead:
1339
ac9dac7f 1340 @articles = ( 1..10, 150..2000, 2017 );
1341 undef $read;
1342 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1343
1344Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1345
9e72e4c6 1346These methods guarantee fast individual tests but require a re-organization
1347of the original list or array. They only pay off if you have to test
1348multiple values against the same array.
68dc0745 1349
ac9dac7f 1350If you are testing only once, the standard module C<List::Util> exports
9e72e4c6 1351the function C<first> for this purpose. It works by stopping once it
c195e131 1352finds the element. It's written in C for speed, and its Perl equivalent
9e72e4c6 1353looks like this subroutine:
68dc0745 1354
9e72e4c6 1355 sub first (&@) {
1356 my $code = shift;
1357 foreach (@_) {
1358 return $_ if &{$code}();
1359 }
1360 undef;
1361 }
68dc0745 1362
9e72e4c6 1363If speed is of little concern, the common idiom uses grep in scalar context
1364(which returns the number of items that passed its condition) to traverse the
1365entire list. This does have the benefit of telling you how many matches it
1366found, though.
68dc0745 1367
9e72e4c6 1368 my $is_there = grep $_ eq $whatever, @array;
65acb1b1 1369
9e72e4c6 1370If you want to actually extract the matching elements, simply use grep in
1371list context.
68dc0745 1372
9e72e4c6 1373 my @matches = grep $_ eq $whatever, @array;
58103a2e 1374
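One thing to watch with C<first> is that it returns the matching element itself, which may be false. The sketch below, using sample data invented for the purpose, compares a hash lookup with a C<first> test wrapped in C<defined>.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( 0, 'azure', 'teal' );

# Many lookups: build the hash once, then test with exists.
my %is_member = map { $_ => 1 } @array;
my $has_teal  = exists $is_member{teal};
my $has_red   = exists $is_member{red};

# A single lookup: first stops at the first hit. It returns the
# element, so check definedness rather than truth.
my $naive    = first { $_ eq '0' } @array;
my $has_zero = defined( first { $_ eq '0' } @array );

print "teal: ", ( $has_teal ? "yes" : "no" ), "\n";
print "zero: ", ( $has_zero ? "yes" : "no" ), "\n";
```

The C<defined> test still can't tell a missing element from a matching C<undef>, so fall back on C<grep> if your list may contain undefined values.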
68dc0745 1375=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1376
ac9dac7f 1377Use a hash. Here's code to do both and more. It assumes that each
1378element is unique in a given array:
68dc0745 1379
ac9dac7f 1380 @union = @intersection = @difference = ();
1381 %count = ();
1382 foreach $element (@array1, @array2) { $count{$element}++ }
1383 foreach $element (keys %count) {
1384 push @union, $element;
1385 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1386 }
68dc0745 1387
ac9dac7f 1388Note that this is the I<symmetric difference>, that is, all elements
1389in either A or in B but not in both. Think of it as an xor operation.
d92eb7b0 1390
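If you want the one-way difference instead (elements of the first array that are missing from the second), a sketch like this one builds a lookup hash from the second array and filters the first. The sample data and the result names are made up for the example.

```perl
use strict;
use warnings;

my @array1 = qw( a b c d );
my @array2 = qw( c d e );

# Lookup table for everything in the second array.
my %in_array2 = map { $_ => 1 } @array2;

my @only_in_1 = grep { !$in_array2{$_} } @array1;    # one-way difference
my @in_both   = grep {  $in_array2{$_} } @array1;    # intersection

print "only in first: @only_in_1\n";
print "in both:       @in_both\n";
```

Unlike the version that walks C<keys %count>, this keeps the elements in the order they had in the first array.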
65acb1b1 1391=head2 How do I test whether two arrays or hashes are equal?
1392
109f0441 1393With Perl 5.10 and later, the smart match operator can give you the answer
1394with the least amount of work:
1395
1396 use 5.010;
1397
1398 if( @array1 ~~ @array2 )
1399 {
1400 say "The arrays are the same";
1401 }
1402
1403 if( %hash1 ~~ %hash2 ) # doesn't check values!
1404 {
1405 say "The hash keys are the same";
1406 }
1407
ac9dac7f 1408The following code works for single-level arrays. It uses a
1409stringwise comparison, and does not distinguish defined versus
1410undefined empty strings. Modify if you have other needs.
65acb1b1 1411
ac9dac7f 1412 $are_equal = compare_arrays(\@frogs, \@toads);
65acb1b1 1413
ac9dac7f 1414 sub compare_arrays {
1415 my ($first, $second) = @_;
1416 no warnings; # silence spurious -w undef complaints
1417 return 0 unless @$first == @$second;
1418 for (my $i = 0; $i < @$first; $i++) {
1419 return 0 if $first->[$i] ne $second->[$i];
1420 }
1421 return 1;
1422 }
65acb1b1 1423
1424For multilevel structures, you may wish to use an approach more
ac9dac7f 1425like this one. It uses the CPAN module C<FreezeThaw>:
65acb1b1 1426
ac9dac7f 1427 use FreezeThaw qw(cmpStr);
1428 @a = @b = ( "this", "that", [ "more", "stuff" ] );
65acb1b1 1429
ac9dac7f 1430 printf "a and b contain %s arrays\n",
1431 cmpStr(\@a, \@b) == 0
1432 ? "the same"
1433 : "different";
65acb1b1 1434
ac9dac7f 1435This approach also works for comparing hashes. Here we'll demonstrate
1436two different answers:
65acb1b1 1437
ac9dac7f 1438 use FreezeThaw qw(cmpStr cmpStrHard);
65acb1b1 1439
ac9dac7f 1440 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1441 $a{EXTRA} = \%b;
1442 $b{EXTRA} = \%a;
65acb1b1 1443
ac9dac7f 1444 printf "a and b contain %s hashes\n",
65acb1b1 1445 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1446
ac9dac7f 1447 printf "a and b contain %s hashes\n",
65acb1b1 1448 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1449
1450
1451The first reports that both of the hashes contain the same data,
1452while the second reports that they do not. Which you prefer is left as
1453an exercise to the reader.
1454
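For single-level hashes you may not need a module at all. The following is a sketch in the spirit of C<compare_arrays> above; the name C<compare_hashes> is invented here. Unlike C<compare_arrays>, it treats an undefined value as different from an empty string.

```perl
use strict;
use warnings;

# Compare two single-level hashes by key and by stringified value.
sub compare_hashes {
    my ( $first, $second ) = @_;
    return 0 unless scalar( keys %$first ) == scalar( keys %$second );
    for my $key ( keys %$first ) {
        return 0 unless exists $second->{$key};
        my ( $x, $y ) = ( $first->{$key}, $second->{$key} );
        return 0 if defined $x xor defined $y;    # undef only equals undef
        return 0 if defined $x and $x ne $y;
    }
    return 1;
}

my %frogs = ( green => 1, spotted => undef );
my %toads = ( spotted => undef, green => 1 );

print compare_hashes( \%frogs, \%toads ) ? "same\n" : "different\n";
```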
68dc0745 1455=head2 How do I find the first array element for which a condition is true?
1456
49d635f9 1457To find the first array element which satisfies a condition, you can
ac9dac7f 1458use the C<first()> function in the C<List::Util> module, which comes
1459with Perl 5.8. This example finds the first element that contains
1460"Perl".
49d635f9 1461
1462 use List::Util qw(first);
197aec24 1463
49d635f9 1464 my $element = first { /Perl/ } @array;
197aec24 1465
ac9dac7f 1466If you cannot use C<List::Util>, you can make your own loop to do the
49d635f9 1467same thing. Once you find the element, you stop the loop with C<last>.
1468
1469 my $found;
ac9dac7f 1470 foreach ( @array ) {
6670e5e7 1471 if( /Perl/ ) { $found = $_; last }
49d635f9 1472 }
1473
1474If you want the array index, you can iterate through the indices
1475and check the array element at each index until you find one
1476that satisfies the condition.
1477
197aec24 1478 my( $found, $index ) = ( undef, -1 );
ac9dac7f 1479 for( my $i = 0; $i < @array; $i++ ) {
1480 if( $array[$i] =~ /Perl/ ) {
6670e5e7 1481 $found = $array[$i];
1482 $index = $i;
1483 last;
1484 }
1485 }
68dc0745 1486
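You can also combine the two ideas and hand C<first> the list of indices rather than the elements. This sketch uses sample data made up for the example; since a matching index of 0 is false, test the result with C<defined>.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( 'Python', 'Perl 5', 'Ruby', 'Perl 6' );

# Search the indices; the block looks up the element for each one.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

if ( defined $index ) {
    print "Found '$array[$index]' at index $index\n";
}
```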
1487=head2 How do I handle linked lists?
1488
1489In general, you usually don't need a linked list in Perl, since with
ac9dac7f 1490regular arrays, you can push and pop or shift and unshift at either
1491end, or you can use splice to add and/or remove an arbitrary number of
ac003c96 1492elements at arbitrary points. Both pop and shift are O(1)
ac9dac7f 1493operations on Perl's dynamic arrays. In the absence of shifts and
1494pops, push in general needs to reallocate on the order of every log(N)
1495times, and unshift will need to copy pointers each time.
68dc0745 1496
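As a quick illustration of those array operations standing in for a linked list, here is a sketch with throwaway sample data:

```perl
use strict;
use warnings;

my @list = qw( a b c d );

push    @list, 'e';    # add at the tail:  a b c d e
unshift @list, 'z';    # add at the head:  z a b c d e

# Insert two elements before index 2 without removing anything.
splice( @list, 2, 0, 'x', 'y' );    # z a x y b c d e

# Remove two elements starting at index 4.
my @removed = splice( @list, 4, 2 );    # takes out b and c

my $tail = pop @list;      # e
my $head = shift @list;    # z

print "@list\n";
```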
1497If you really, really wanted, you could use structures as described in
ac9dac7f 1498L<perldsc> or L<perltoot> and do just what the algorithm book tells
1499you to do. For example, imagine a list node like this:
65acb1b1 1500
ac9dac7f 1501 $node = {
1502 VALUE => 42,
1503 LINK => undef,
1504 };
65acb1b1 1505
1506You could walk the list this way:
1507
ac9dac7f 1508 print "List: ";
1509 for ($node = $head; $node; $node = $node->{LINK}) {
1510 print $node->{VALUE}, " ";
1511 }
1512 print "\n";
65acb1b1 1513
a6dd486b 1514You could add to the list this way:
65acb1b1 1515
ac9dac7f 1516 my ($head, $tail);
1517 $tail = append($head, 1); # grow a new head
1518 for $value ( 2 .. 10 ) {
1519 $tail = append($tail, $value);
1520 }
65acb1b1 1521
ac9dac7f 1522 sub append {
1523 my($list, $value) = @_;
1524 my $node = { VALUE => $value };
1525 if ($list) {
1526 $node->{LINK} = $list->{LINK};
1527 $list->{LINK} = $node;
1528 }
1529 else {
1530 $_[0] = $node; # replace caller's version
1531 }
1532 return $node;
1533 }
65acb1b1 1534
1535But again, Perl's built-ins are virtually always good enough.
68dc0745 1536
1537=head2 How do I handle circular lists?
109f0441 1538X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
1539X<cycle> X<modulus>
68dc0745 1540
109f0441 1541(contributed by brian d foy)
1542
589a5df2 1543If you want to cycle through an array endlessly, you can increment the
109f0441 1544index modulo the number of elements in the array:
68dc0745 1545
109f0441 1546 my @array = qw( a b c );
1547 my $i = 0;
1548
1549 while( 1 ) {
1550 print $array[ $i++ % @array ], "\n";
1551 last if $i > 20;
1552 }
ac9dac7f 1553
109f0441 1554You can also use C<Tie::Cycle> to use a scalar that always has the
1555next element of the circular array:
ac9dac7f 1556
1557 use Tie::Cycle;
1558
1559 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1560
1561 print $cycle; # FFFFFF
1562 print $cycle; # 000000
1563 print $cycle; # FFFF00
68dc0745 1564
109f0441 1565The C<Array::Iterator::Circular> module creates an iterator object for
1566circular arrays:
1567
1568 use Array::Iterator::Circular;
1569
1570 my $color_iterator = Array::Iterator::Circular->new(
1571 qw(red green blue orange)
1572 );
1573
1574 foreach ( 1 .. 20 ) {
1575 print $color_iterator->next, "\n";
1576 }
1577
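If you'd rather not install a module, a closure can hide the index for you. This is only a sketch of the modulo technique from the start of this answer; C<make_cycle> is a name invented for the example.

```perl
use strict;
use warnings;

# Return a subroutine that hands out the items in order, forever.
sub make_cycle {
    my @items = @_;
    my $i     = 0;
    return sub { return $items[ $i++ % @items ] };
}

my $next_color = make_cycle(qw( red green blue ));

print $next_color->(), "\n" for 1 .. 5;
```

Each call to C<make_cycle> gets its own private copy of the list and its own counter, so several cycles can run independently.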
68dc0745 1578=head2 How do I shuffle an array randomly?
1579
45bbf655 1580If you have Perl 5.8.0 or later installed, or if you have
1581Scalar-List-Utils 1.03 or later installed, you can say:
1582
ac9dac7f 1583 use List::Util 'shuffle';
45bbf655 1584
1585 @shuffled = shuffle(@list);
1586
f05bbc40 1587If not, you can use a Fisher-Yates shuffle.
5a964f20 1588
ac9dac7f 1589 sub fisher_yates_shuffle {
1590 my $deck = shift; # $deck is a reference to an array
109f0441 1591 return unless @$deck; # must not be empty!
1592
ac9dac7f 1593 my $i = @$deck;
1594 while (--$i) {
1595 my $j = int rand ($i+1);
1596 @$deck[$i,$j] = @$deck[$j,$i];
1597 }
1598 }
5a964f20 1599
ac9dac7f 1600 # shuffle my mpeg collection
1601 #
1602 my @mpeg = <audio/*/*.mp3>;
1603 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1604 print @mpeg;
5a964f20 1605
45bbf655 1606Note that the above implementation shuffles an array in place,
ac9dac7f 1607unlike the C<List::Util::shuffle()> which takes a list and returns
45bbf655 1608a new shuffled list.
1609
d92eb7b0 1610You've probably seen shuffling algorithms that work using splice,
a6dd486b 1611randomly picking another element to swap the current element with:
68dc0745 1612
ac9dac7f 1613 srand;
1614 @new = ();
1615 @old = 1 .. 10; # just a demo
1616 while (@old) {
1617 push(@new, splice(@old, rand @old, 1));
1618 }
68dc0745 1619
ac9dac7f 1620This is bad because splice is already O(N), and since you do it N
1621times, you just invented a quadratic algorithm; that is, O(N**2).
1622This does not scale, although Perl is so efficient that you probably
1623won't notice this until you have rather largish arrays.
68dc0745 1624
1625=head2 How do I process/modify each element of an array?
1626
1627Use C<for>/C<foreach>:
1628
ac9dac7f 1629 for (@lines) {
6670e5e7 1630 s/foo/bar/; # change that word
1631 tr/XZ/ZX/; # swap those letters
ac9dac7f 1632 }
68dc0745 1633
1634Here's another; let's compute spherical volumes:
1635
ac9dac7f 1636 for (@volumes = @radii) { # @volumes has changed parts
6670e5e7 1637 $_ **= 3;
1638 $_ *= (4/3) * 3.14159; # this will be constant folded
ac9dac7f 1639 }
197aec24 1640
ac9dac7f 1641which can also be done with C<map()> which is made to transform
49d635f9 1642one list into another:
1643
1644 @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
68dc0745 1645
76817d6d 1646If you want to do the same thing to modify the values of the
1647hash, you can use the C<values> function. As of Perl 5.6
1648the values are not copied, so if you modify $orbit (in this
1649case), you modify the value.
5a964f20 1650
ac9dac7f 1651 for $orbit ( values %orbits ) {
6670e5e7 1652 ($orbit **= 3) *= (4/3) * 3.14159;
ac9dac7f 1653 }
818c4caa 1654
76817d6d 1655Prior to perl 5.6 C<values> returned copies of the values,
1656so older perl code often contains constructions such as
1657C<@orbits{keys %orbits}> instead of C<values %orbits> where
1658the hash is to be modified.
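
The older hash-slice idiom still works, because C<for> aliases C<$_> to
each element of the slice. Here is a small sketch with made-up data:

```perl
use strict;
use warnings;

my %orbits = ( inner => 1, middle => 2, outer => 3 );    # made-up radii

# Each element of a hash slice is aliased to the hash's own value,
# so changing $_ changes the hash itself.
for ( @orbits{ keys %orbits } ) {
    $_ **= 3;
}

print "$_ => $orbits{$_}\n" for sort keys %orbits;
```

After the loop the values are 1, 8, and 27.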
818c4caa 1659
68dc0745 1660=head2 How do I select a random element from an array?
1661
ac9dac7f 1662Use the C<rand()> function (see L<perlfunc/rand>):
68dc0745 1663
ac9dac7f 1664 $index = rand @array;
1665 $element = $array[$index];
68dc0745 1666
793f5136 1667Or, simply:
ac9dac7f 1668
1669 my $element = $array[ rand @array ];
5a964f20 1670
68dc0745 1671=head2 How do I permute N elements of a list?
X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
1673X<The Art of Computer Programming> X<Fischer-Krause>
68dc0745 1674
c195e131 1675Use the C<List::Permutor> module on CPAN. If the list is actually an
ac9dac7f 1676array, try the C<Algorithm::Permute> module (also on CPAN). It's
c195e131 1677written in XS code and is very efficient:
49d635f9 1678
1679 use Algorithm::Permute;
c195e131 1680
49d635f9 1681 my @array = 'a'..'d';
1682 my $p_iterator = Algorithm::Permute->new ( \@array );
c195e131 1683
49d635f9 1684 while (my @perm = $p_iterator->next) {
1685 print "next permutation: (@perm)\n";
ac9dac7f 1686 }
49d635f9 1687
197aec24 1688For even faster execution, you could do:
1689
ac9dac7f 1690 use Algorithm::Permute;
c195e131 1691
ac9dac7f 1692 my @array = 'a'..'d';
c195e131 1693
ac9dac7f 1694 Algorithm::Permute::permute {
1695 print "next permutation: (@array)\n";
1696 } @array;
197aec24 1697
c195e131 1698Here's a little program that generates all permutations of all the
1699words on each line of input. The algorithm embodied in the
1700C<permute()> function is discussed in Volume 4 (still unpublished) of
1701Knuth's I<The Art of Computer Programming> and will work on any list:
49d635f9 1702
1703 #!/usr/bin/perl -n
ac003c96 1704 # Fischer-Krause ordered permutation generator
49d635f9 1705
1706 sub permute (&@) {
1707 my $code = shift;
1708 my @idx = 0..$#_;
1709 while ( $code->(@_[@idx]) ) {
1710 my $p = $#idx;
1711 --$p while $idx[$p-1] > $idx[$p];
1712 my $q = $p or return;
1713 push @idx, reverse splice @idx, $p;
1714 ++$q while $idx[$p-1] > $idx[$q];
1715 @idx[$p-1,$q]=@idx[$q,$p-1];
1716 }
68dc0745 1717 }
68dc0745 1718
c195e131 1719 permute { print "@_\n" } split;
1720
1721The C<Algorithm::Loops> module also provides the C<NextPermute> and
1722C<NextPermuteNum> functions which efficiently find all unique permutations
1723of an array, even if it contains duplicate values, modifying it in-place:
1724if its elements are in reverse-sorted order then the array is reversed,
1725making it sorted, and it returns false; otherwise the next
1726permutation is returned.
1727
1728C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
1729you can enumerate all the permutations of C<0..9> like this:
1730
1731 use Algorithm::Loops qw(NextPermuteNum);
109f0441 1732
c195e131 1733 my @list= 0..9;
1734 do { print "@list\n" } while NextPermuteNum @list;
b8d2732a 1735
68dc0745 1736=head2 How do I sort an array by (anything)?
1737
1738Supply a comparison function to sort() (described in L<perlfunc/sort>):
1739
ac9dac7f 1740 @list = sort { $a <=> $b } @list;
68dc0745 1741
1742The default sort function is cmp, string comparison, which would
c47ff5f1 1743sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
68dc0745 1744the numerical comparison operator.
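
To see the difference, sort the same three numbers both ways:

```perl
use strict;
use warnings;

my @numbers = ( 1, 2, 10 );

my @as_strings = sort @numbers;                  # default: string comparison
my @as_numbers = sort { $a <=> $b } @numbers;    # numeric comparison

print "string order:  @as_strings\n";    # 1 10 2
print "numeric order: @as_numbers\n";    # 1 2 10
```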
1745
1746If you have a complicated function needed to pull out the part you
1747want to sort on, then don't do it inside the sort function. Pull it
1748out first, because the sort BLOCK can be called many times for the
1749same element. Here's an example of how to pull out the first word
1750after the first number on each item, and then sort those words
1751case-insensitively.
1752
ac9dac7f 1753 @idx = ();
1754 for (@data) {
1755 ($item) = /\d+\s*(\S+)/;
1756 push @idx, uc($item);
1757 }
1758 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
68dc0745 1759
a6dd486b 1760which could also be written this way, using a trick
68dc0745 1761that's come to be known as the Schwartzian Transform:
1762
ac9dac7f 1763 @sorted = map { $_->[0] }
1764 sort { $a->[1] cmp $b->[1] }
1765 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 1766
1767If you need to sort on several fields, the following paradigm is useful.
1768
ac9dac7f 1769 @sorted = sort {
1770 field1($a) <=> field1($b) ||
1771 field2($a) cmp field2($b) ||
1772 field3($a) cmp field3($b)
1773 } @data;
68dc0745 1774
1775This can be conveniently combined with precalculation of keys as given
1776above.
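
As a sketch of that combination, the following computes two keys per
item once, then sorts on both. The C<name:age> record format is made up
for the example:

```perl
use strict;
use warnings;

# Made-up records of the form "name:age".
my @data = ( 'carol:41', 'Bob:29', 'alice:41', 'dave:29' );

my @sorted =
    map  { $_->[0] }                 # 3. recover the original item
    sort {                           # 2. compare the precomputed keys
        $a->[1] <=> $b->[1]          #    age, numerically
            ||
        $a->[2] cmp $b->[2]          #    then name, ignoring case
    }
    map  {                           # 1. compute both keys once per item
        my ( $name, $age ) = split /:/;
        [ $_, $age, lc $name ];
    } @data;

print "$_\n" for @sorted;
```

This prints C<Bob:29>, C<dave:29>, C<alice:41>, C<carol:41>, one per line.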
1777
See the F<sort> article in the "Far More Than You Ever Wanted
To Know" collection at L<http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz> for
more about this approach.
68dc0745 1781
ac9dac7f 1782See also the question later in L<perlfaq4> on sorting hashes.
68dc0745 1783
1784=head2 How do I manipulate arrays of bits?
1785
ac9dac7f 1786Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
1787operations.
1788
109f0441 1789For example, you don't have to store individual bits in an array
1790(which would mean that you're wasting a lot of space). To convert an
1791array of bits to a string, use C<vec()> to set the right bits. This
1792sets C<$vec> to have bit N set only if C<$ints[N]> was set:
ac9dac7f 1793
109f0441 1794 @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
ac9dac7f 1795 $vec = '';
109f0441 1796 foreach( 0 .. $#ints ) {
1797 vec($vec,$_,1) = 1 if $ints[$_];
1798 }
ac9dac7f 1799
109f0441 1800The string C<$vec> only takes up as many bits as it needs. For
1801instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
1802bytes to store them (not counting the scalar variable overhead).
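
You can check that for yourself. This sketch uses sixteen sample bits,
with the last one set so that the string really does grow to two bytes:

```perl
use strict;
use warnings;

# Sixteen sample bits; the values are arbitrary.
my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 );

my $vec = '';
for my $n ( 0 .. $#ints ) {
    vec( $vec, $n, 1 ) = 1 if $ints[$n];
}

print "entries: ", scalar(@ints), "\n";          # 16
print "bytes:   ", length($vec), "\n";           # 2
print "bits:    ", unpack( "b*", $vec ), "\n";   # 1001100000000001
```

C<vec()> only extends the string as far as the highest bit you assign
to, so trailing zero bits take no space at all.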
1803
1804Here's how, given a vector in C<$vec>, you can get those bits into
1805your C<@ints> array:
ac9dac7f 1806
1807 sub bitvec_to_list {
1808 my $vec = shift;
1809 my @ints;
1810 # Find null-byte density then select best algorithm
1811 if ($vec =~ tr/\0// / length $vec > 0.95) {
1812 use integer;
1813 my $i;
1814
1815 # This method is faster with mostly null-bytes
1816 while($vec =~ /[^\0]/g ) {
1817 $i = -9 + 8 * pos $vec;
1818 push @ints, $i if vec($vec, ++$i, 1);
1819 push @ints, $i if vec($vec, ++$i, 1);
1820 push @ints, $i if vec($vec, ++$i, 1);
1821 push @ints, $i if vec($vec, ++$i, 1);
1822 push @ints, $i if vec($vec, ++$i, 1);
1823 push @ints, $i if vec($vec, ++$i, 1);
1824 push @ints, $i if vec($vec, ++$i, 1);
1825 push @ints, $i if vec($vec, ++$i, 1);
1826 }
1827 }
1828 else {
1829 # This method is a fast general algorithm
1830 use integer;
1831 my $bits = unpack "b*", $vec;
1832 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1833 push @ints, pos $bits while($bits =~ /1/g);
1834 }
1835
1836 return \@ints;
1837 }
68dc0745 1838
1839This method gets faster the more sparse the bit vector is.
1840(Courtesy of Tim Bunce and Winfried Koenig.)
1841
76817d6d 1842You can make the while loop a lot shorter with this suggestion
1843from Benjamin Goldberg:
1844
1845 while($vec =~ /[^\0]+/g ) {
ac9dac7f 1846 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1847 }
76817d6d 1848
ac9dac7f 1849Or use the CPAN module C<Bit::Vector>:
cc30d1a7 1850
    use Bit::Vector;

    $vector = Bit::Vector->new($num_of_bits);
    $vector->Index_List_Store(@ints);
    @ints = $vector->Index_List_Read();
cc30d1a7 1854
C<Bit::Vector> provides efficient methods for bit vectors, sets of
small integers, and "big int" math.
cc30d1a7 1857
1858Here's a more extensive illustration using vec():
65acb1b1 1859
ac9dac7f 1860 # vec demo
1861 $vector = "\xff\x0f\xef\xfe";
1862 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
65acb1b1 1863 unpack("N", $vector), "\n";
ac9dac7f 1864 $is_set = vec($vector, 23, 1);
1865 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
65acb1b1 1866 pvec($vector);
65acb1b1 1867
ac9dac7f 1868 set_vec(1,1,1);
1869 set_vec(3,1,1);
1870 set_vec(23,1,1);
1871
1872 set_vec(3,1,3);
1873 set_vec(3,2,3);
1874 set_vec(3,4,3);
1875 set_vec(3,4,7);
1876 set_vec(3,8,3);
1877 set_vec(3,8,7);
1878
1879 set_vec(0,32,17);
1880 set_vec(1,32,17);
1881
1882 sub set_vec {
1883 my ($offset, $width, $value) = @_;
1884 my $vector = '';
1885 vec($vector, $offset, $width) = $value;
1886 print "offset=$offset width=$width value=$value\n";
1887 pvec($vector);
1888 }
65acb1b1 1889
ac9dac7f 1890 sub pvec {
1891 my $vector = shift;
1892 my $bits = unpack("b*", $vector);
1893 my $i = 0;
1894 my $BASE = 8;
1895
1896 print "vector length in bytes: ", length($vector), "\n";
1897 @bytes = unpack("A8" x length($vector), $bits);
1898 print "bits are: @bytes\n\n";
1899 }
65acb1b1 1900
68dc0745 1901=head2 Why does defined() return true on empty arrays and hashes?
1902
65acb1b1 1903The short story is that you should probably only use defined on scalars or
1904functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1905in the 5.004 release or later of Perl for more detail.
68dc0745 1906
1907=head1 Data: Hashes (Associative Arrays)
1908
1909=head2 How do I process an entire hash?
1910
ee891a00 1911(contributed by brian d foy)
1912
There are a couple of ways that you can process an entire hash. You
can get a list of keys, then go through each key, or grab one
key-value pair at a time.
68dc0745 1916
ee891a00 1917To go through all of the keys, use the C<keys> function. This extracts
1918all of the keys of the hash and gives them back to you as a list. You
1919can then get the value through the particular key you're processing:
1920
    foreach my $key ( keys %hash ) {
        my $value = $hash{$key};
        ...
    }
68dc0745 1925
ee891a00 1926Once you have the list of keys, you can process that list before you
109f0441 1927process the hash elements. For instance, you can sort the keys so you
ee891a00 1928can process them in lexical order:
1929
    foreach my $key ( sort keys %hash ) {
        my $value = $hash{$key};
        ...
    }
1934
1935Or, you might want to only process some of the items. If you only want
1936to deal with the keys that start with C<text:>, you can select just
1937those using C<grep>:
1938
    foreach my $key ( grep /^text:/, keys %hash ) {
        my $value = $hash{$key};
        ...
    }
1943
1944If the hash is very large, you might not want to create a long list of
109f0441 1945keys. To save some memory, you can grab one key-value pair at a time using
ee891a00 1946C<each()>, which returns a pair you haven't seen yet:
1947
1948 while( my( $key, $value ) = each( %hash ) ) {
1949 ...
1950 }
1951
1952The C<each> operator returns the pairs in apparently random order, so if
1953ordering matters to you, you'll have to stick with the C<keys> method.
1954
1955The C<each()> operator can be a bit tricky though. You can't add or
1956delete keys of the hash while you're using it without possibly
1957skipping or re-processing some pairs after Perl internally rehashes
1958all of the elements. Additionally, a hash has only one iterator, so if
1959you use C<keys>, C<values>, or C<each> on the same hash, you can reset
1960the iterator and mess up your processing. See the C<each> entry in
1961L<perlfunc> for more details.
68dc0745 1962
109f0441 1963=head2 How do I merge two hashes?
1964X<hash> X<merge> X<slice, hash>
1965
1966(contributed by brian d foy)
1967
1968Before you decide to merge two hashes, you have to decide what to do
1969if both hashes contain keys that are the same and if you want to leave
1970the original hashes as they were.
1971
If you want to preserve the original hashes, copy one hash (C<%hash1>)
to a new hash (C<%new_hash>), then add the keys from the other hash
(C<%hash2>) to the new hash. Checking that the key already exists in
C<%new_hash> gives you a chance to decide what to do with the
duplicates:
1977
1978 my %new_hash = %hash1; # make a copy; leave %hash1 alone
1979
1980 foreach my $key2 ( keys %hash2 )
1981 {
1982 if( exists $new_hash{$key2} )
1983 {
1984 warn "Key [$key2] is in both hashes!";
1985 # handle the duplicate (perhaps only warning)
1986 ...
1987 next;
1988 }
1989 else
1990 {
1991 $new_hash{$key2} = $hash2{$key2};
1992 }
1993 }
1994
1995If you don't want to create a new hash, you can still use this looping
1996technique; just change the C<%new_hash> to C<%hash1>.
1997
1998 foreach my $key2 ( keys %hash2 )
1999 {
2000 if( exists $hash1{$key2} )
2001 {
2002 warn "Key [$key2] is in both hashes!";
2003 # handle the duplicate (perhaps only warning)
2004 ...
2005 next;
2006 }
2007 else
2008 {
2009 $hash1{$key2} = $hash2{$key2};
2010 }
2011 }
2012
2013If you don't care that one hash overwrites keys and values from the other, you
2014could just use a hash slice to add one hash to another. In this case, values
2015from C<%hash2> replace values from C<%hash1> when they have keys in common:
2016
2017 @hash1{ keys %hash2 } = values %hash2;
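
If you want the same overwrite behavior but would rather leave both
originals alone, a plain list assignment also works, since later pairs
replace earlier ones. A short sketch with sample data:

```perl
use strict;
use warnings;

my %hash1 = ( apple  => 1,  banana => 2 );     # sample data
my %hash2 = ( banana => 20, cherry => 30 );

# Both hashes flatten into one list of pairs; later pairs win,
# so %hash2 overrides %hash1 for the shared key "banana".
my %merged = ( %hash1, %hash2 );

print "$_ => $merged{$_}\n" for sort keys %merged;
```

This prints C<apple =E<gt> 1>, C<banana =E<gt> 20>, and C<cherry =E<gt> 30>,
and neither C<%hash1> nor C<%hash2> changes.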
2018
68dc0745 2019=head2 What happens if I add or remove keys from a hash while iterating over it?
2020
28b41a80 2021(contributed by brian d foy)
d92eb7b0 2022
28b41a80 2023The easy answer is "Don't do that!"
d92eb7b0 2024
28b41a80 2025If you iterate through the hash with each(), you can delete the key
2026most recently returned without worrying about it. If you delete or add
2027other keys, the iterator may skip or double up on them since perl
2028may rearrange the hash table. See the
2029entry for C<each()> in L<perlfunc>.
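
The one safe case looks like this in practice. The data is made up; the
loop drops every pair with a negative value:

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => -2, c => 3, d => -4 );    # sample data

# Deleting the key that each() just returned is the one safe modification.
while ( my ( $key, $value ) = each %hash ) {
    delete $hash{$key} if $value < 0;
}

print join( ", ", sort keys %hash ), "\n";    # a, c
```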
68dc0745 2030
2031=head2 How do I look up a hash element by value?
2032
2033Create a reverse hash:
2034
ac9dac7f 2035 %by_value = reverse %by_key;
2036 $key = $by_value{$value};
68dc0745 2037
2038That's not particularly efficient. It would be more space-efficient
2039to use:
2040
ac9dac7f 2041 while (($key, $value) = each %by_key) {
2042 $by_value{$value} = $key;
2043 }
68dc0745 2044
d92eb7b0 2045If your hash could have repeated values, the methods above will only find
2046one of the associated keys. This may or may not worry you. If it does
2047worry you, you can always reverse the hash into a hash of arrays instead:
2048
ac9dac7f 2049 while (($key, $value) = each %by_key) {
2050 push @{$key_list_by_value{$value}}, $key;
2051 }
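
With that structure in hand, looking up every key for a value is a
single dereference. A sketch with sample data:

```perl
use strict;
use warnings;

# Sample data with a repeated value.
my %by_key = ( apple => 'red', cherry => 'red', banana => 'yellow' );

my %key_list_by_value;
while ( my ( $key, $value ) = each %by_key ) {
    push @{ $key_list_by_value{$value} }, $key;
}

# each() returns pairs in no particular order, so sort for stable output.
for my $value ( sort keys %key_list_by_value ) {
    my @keys = sort @{ $key_list_by_value{$value} };
    print "$value: @keys\n";
}
```

This prints C<red: apple cherry> and then C<yellow: banana>.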
68dc0745 2052
2053=head2 How can I know how many entries are in a hash?
2054
109f0441 2055(contributed by brian d foy)
2056
2057This is very similar to "How do I process an entire hash?", also in
2058L<perlfaq4>, but a bit simpler in the common cases.
2059
You can use the C<keys()> built-in function in scalar context to find out
how many entries you have in a hash:
68dc0745 2062
109f0441 2063 my $key_count = keys %hash; # must be scalar context!
2064
2065If you want to find out how many entries have a defined value, that's
2066a bit different. You have to check each value. A C<grep> is handy:
2067
2068 my $defined_value_count = grep { defined } values %hash;
68dc0745 2069
109f0441 2070You can use that same structure to count the entries any way that
2071you like. If you want the count of the keys with vowels in them,
2072you just test for that instead:
2073
2074 my $vowel_count = grep { /[aeiou]/ } keys %hash;
2075
2076The C<grep> in scalar context returns the count. If you want the list
2077of matching items, just use it in list context instead:
2078
2079 my @defined_values = grep { defined } values %hash;
2080
2081The C<keys()> function also resets the iterator, which means that you may
197aec24 2082see strange results if you use this between uses of other hash operators
109f0441 2083such as C<each()>.
68dc0745 2084
2085=head2 How do I sort a hash (optionally by value instead of key)?
2086
a05e4845 2087(contributed by brian d foy)
2088
2089To sort a hash, start with the keys. In this example, we give the list of
2090keys to the sort function which then compares them ASCIIbetically (which
2091might be affected by your locale settings). The output list has the keys
2092in ASCIIbetical order. Once we have the keys, we can go through them to
2093create a report which lists the keys in ASCIIbetical order.
2094
2095 my @keys = sort { $a cmp $b } keys %hash;
58103a2e 2096
a05e4845 2097 foreach my $key ( @keys )
2098 {
109f0441 2099 printf "%-20s %6d\n", $key, $hash{$key};
a05e4845 2100 }
2101
58103a2e 2102We could get more fancy in the C<sort()> block though. Instead of
a05e4845 2103comparing the keys, we can compute a value with them and use that
58103a2e 2104value as the comparison.
a05e4845 2105
2106For instance, to make our report order case-insensitive, we use
58103a2e 2107the C<\L> sequence in a double-quoted string to make everything
a05e4845 2108lowercase. The C<sort()> block then compares the lowercased
2109values to determine in which order to put the keys.
2110
2111 my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
58103a2e 2112
a05e4845 2113Note: if the computation is expensive or the hash has many elements,
58103a2e 2114you may want to look at the Schwartzian Transform to cache the
a05e4845 2115computation results.
2116
2117If we want to sort by the hash value instead, we use the hash key
2118to look it up. We still get out a list of keys, but this time they
2119are ordered by their value.
2120
2121 my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2122
2123From there we can get more complex. If the hash values are the same,
2124we can provide a secondary sort on the hash key.
2125
58103a2e 2126 my @keys = sort {
2127 $hash{$a} <=> $hash{$b}
a05e4845 2128 or
2129 "\L$a" cmp "\L$b"
2130 } keys %hash;
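
Swapping C<$a> and C<$b> in a comparison reverses that part of the
ordering. This sketch, using sample counts, lists the largest values
first and still breaks ties by key:

```perl
use strict;
use warnings;

my %hash = ( apple => 3, Banana => 7, cherry => 3, date => 7 );    # sample counts

my @keys = sort {
    $hash{$b} <=> $hash{$a}      # $b before $a: largest value first
        or
    "\L$a" cmp "\L$b"            # ties broken by key, ignoring case
} keys %hash;

printf "%-20s %6d\n", $_, $hash{$_} for @keys;
```

The keys come out as C<Banana>, C<date>, C<apple>, C<cherry>.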
68dc0745 2131
2132=head2 How can I always keep my hash sorted?
X<hash> X<tie> X<sort> X<DB_File> X<Tie::IxHash>
68dc0745 2134
You can look into using the C<DB_File> module and C<tie()> using the
C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
Databases">. The C<Tie::IxHash> module from CPAN might also be
instructive. Although this does keep your hash sorted, you might not
like the slowdown you suffer from the tie interface. Are you sure you
need to do this? :)
68dc0745 2141
2142=head2 What's the difference between "delete" and "undef" with hashes?
2143
92993692 2144Hashes contain pairs of scalars: the first is the key, the
2145second is the value. The key will be coerced to a string,
2146although the value can be any kind of scalar: string,
number, or reference. If a key C<$key> is present in
C<%hash>, C<exists($hash{$key})> will return true. The value
2149for a given key can be C<undef>, in which case
2150C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
2151will return true. This corresponds to (C<$key>, C<undef>)
2152being in the hash.
68dc0745 2153
589a5df2 2154Pictures help... Here's the C<%hash> table:
68dc0745 2155
2156 keys values
2157 +------+------+
2158 | a | 3 |
2159 | x | 7 |
2160 | d | 0 |
2161 | e | 2 |
2162 +------+------+
2163
And these conditions hold:
2165
92993692 2166 $hash{'a'} is true
2167 $hash{'d'} is false
2168 defined $hash{'d'} is true
2169 defined $hash{'a'} is true
e9d185f8 2170 exists $hash{'a'} is true (Perl 5 only)
92993692 2171 grep ($_ eq 'a', keys %hash) is true
68dc0745 2172
2173If you now say
2174
92993692 2175 undef $hash{'a'}
68dc0745 2176
2177your table now reads:
2178
2179
2180 keys values
2181 +------+------+
2182 | a | undef|
2183 | x | 7 |
2184 | d | 0 |
2185 | e | 2 |
2186 +------+------+
2187
2188and these conditions now hold; changes in caps:
2189
92993692 2190 $hash{'a'} is FALSE
2191 $hash{'d'} is false
2192 defined $hash{'d'} is true
2193 defined $hash{'a'} is FALSE
e9d185f8 2194 exists $hash{'a'} is true (Perl 5 only)
92993692 2195 grep ($_ eq 'a', keys %hash) is true
68dc0745 2196
2197Notice the last two: you have an undef value, but a defined key!
2198
2199Now, consider this:
2200
92993692 2201 delete $hash{'a'}
68dc0745 2202
2203your table now reads:
2204
2205 keys values
2206 +------+------+
2207 | x | 7 |
2208 | d | 0 |
2209 | e | 2 |
2210 +------+------+
2211
2212and these conditions now hold; changes in caps:
2213
92993692 2214 $hash{'a'} is false
2215 $hash{'d'} is false
2216 defined $hash{'d'} is true
2217 defined $hash{'a'} is false
e9d185f8 2218 exists $hash{'a'} is FALSE (Perl 5 only)
92993692 2219 grep ($_ eq 'a', keys %hash) is FALSE
68dc0745 2220
2221See, the whole entry is gone!
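
If you'd rather check the tables than trust them, this sketch runs the
same two steps. The C<describe> helper is a name invented for the
example:

```perl
use strict;
use warnings;

my %hash = ( a => 3, x => 7, d => 0, e => 2 );

# Hypothetical helper: report exists/defined for one key.
sub describe {
    my ( $hash, $key ) = @_;
    return sprintf "exists=%d defined=%d",
        ( exists $hash->{$key} )  ? 1 : 0,
        ( defined $hash->{$key} ) ? 1 : 0;
}

undef $hash{'a'};
my $after_undef = describe( \%hash, 'a' );

delete $hash{'a'};
my $after_delete = describe( \%hash, 'a' );

print "after undef:  $after_undef\n";     # exists=1 defined=0
print "after delete: $after_delete\n";    # exists=0 defined=0
```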
2222
2223=head2 Why don't my tied hashes make the defined/exists distinction?
2224
92993692 2225This depends on the tied hash's implementation of EXISTS().
2226For example, there isn't the concept of undef with hashes
2227that are tied to DBM* files. It also means that exists() and
2228defined() do the same thing with a DBM* file, and what they
2229end up doing is not what they do with ordinary hashes.
68dc0745 2230
2231=head2 How do I reset an each() operation part-way through?
2232
fb2fe781 2233(contributed by brian d foy)
2234
2235You can use the C<keys> or C<values> functions to reset C<each>. To
2236simply reset the iterator used by C<each> without doing anything else,
2237use one of them in void context:
2238
2239 keys %hash; # resets iterator, nothing else.
2240 values %hash; # resets iterator, nothing else.
2241
2242See the documentation for C<each> in L<perlfunc>.
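
A typical place this matters is after leaving a C<while>/C<each> loop
early. Here is a sketch with sample data:

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );    # sample data

# Leave the first loop early; the iterator stays parked mid-hash.
while ( my ( $key, $value ) = each %hash ) {
    last;
}

keys %hash;    # void context: rewind the iterator

# This loop now starts from the beginning and sees every pair.
my $seen = 0;
while ( my ( $key, $value ) = each %hash ) {
    $seen++;
}
print "saw $seen pairs\n";    # saw 3 pairs
```

Without the C<keys %hash;> line, the second loop would pick up where
the first one stopped and miss a pair.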
68dc0745 2243
2244=head2 How can I get the unique keys from two hashes?
2245
d92eb7b0 2246First you extract the keys from the hashes into lists, then solve
2247the "removing duplicates" problem described above. For example:
68dc0745 2248
ac9dac7f 2249 %seen = ();
2250 for $element (keys(%foo), keys(%bar)) {
2251 $seen{$element}++;
2252 }
2253 @uniq = keys %seen;
68dc0745 2254
2255Or more succinctly:
2256
ac9dac7f 2257 @uniq = keys %{{%foo,%bar}};
68dc0745 2258
2259Or if you really want to save space:
2260
ac9dac7f 2261 %seen = ();
2262 while (defined ($key = each %foo)) {
2263 $seen{$key}++;
2264 }
2265 while (defined ($key = each %bar)) {
2266 $seen{$key}++;
2267 }
2268 @uniq = keys %seen;
68dc0745 2269
2270=head2 How can I store a multidimensional array in a DBM file?
2271
2272Either stringify the structure yourself (no fun), or else
2273get the MLDBM (which uses Data::Dumper) module from CPAN and layer
2274it on top of either DB_File or GDBM_File.
2275
2276=head2 How can I make my hash remember the order I put elements into it?
2277
Use the C<Tie::IxHash> module from CPAN.
68dc0745 2279
ac9dac7f 2280 use Tie::IxHash;
2281
2282 tie my %myhash, 'Tie::IxHash';
2283
2284 for (my $i=0; $i<20; $i++) {
2285 $myhash{$i} = 2*$i;
2286 }
2287
2288 my @keys = keys %myhash;
2289 # @keys = (0,1,2,3,...)
46fc3d4c 2290
68dc0745 2291=head2 Why does passing a subroutine an undefined element in a hash create it?
2292
109f0441 2293(contributed by brian d foy)
2294
2295Are you using a really old version of Perl?
2296
2297Normally, accessing a hash key's value for a nonexistent key will
2298I<not> create the key.
2299
2300 my %hash = ();
2301 my $value = $hash{ 'foo' };
2302 print "This won't print\n" if exists $hash{ 'foo' };
2303
2304Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
2305Since you could assign directly to C<$_[0]>, Perl had to be ready to
2306make that assignment so it created the hash key ahead of time:
2307
2308 my_sub( $hash{ 'foo' } );
2309 print "This will print before 5.004\n" if exists $hash{ 'foo' };
68dc0745 2310
109f0441 2311 sub my_sub {
2312 # $_[0] = 'bar'; # create hash key in case you do this
2313 1;
2314 }
2315
2316Since Perl 5.004, however, this situation is a special case and Perl
2317creates the hash key only when you make the assignment:
68dc0745 2318
109f0441 2319 my_sub( $hash{ 'foo' } );
2320 print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2321
2322 sub my_sub {
2323 $_[0] = 'bar';
2324 }
68dc0745 2325
109f0441 2326However, if you want the old behavior (and think carefully about that
2327because it's a weird side effect), you can pass a hash slice instead.
2328Perl 5.004 didn't make this a special case:
68dc0745 2329
109f0441 2330 my_sub( @hash{ qw/foo/ } );
68dc0745 2331
fc36a67e 2332=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 2333
65acb1b1 2334Usually a hash ref, perhaps like this:
2335
ac9dac7f 2336 $record = {
2337 NAME => "Jason",
2338 EMPNO => 132,
2339 TITLE => "deputy peon",
2340 AGE => 23,
2341 SALARY => 37_000,
2342 PALS => [ "Norbert", "Rhys", "Phineas"],
2343 };
65acb1b1 2344
References are documented in L<perlref> and L<perlreftut>.
2346Examples of complex data structures are given in L<perldsc> and
2347L<perllol>. Examples of structures and object-oriented classes are
2348in L<perltoot>.
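
Once you have such a record, you reach its fields with the arrow
operator. This sketch repeats the record so it runs on its own; the
extra pal's name is made up:

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

print "Name:      $record->{NAME}\n";
print "First pal: $record->{PALS}[0]\n";
print "All pals:  @{ $record->{PALS} }\n";

$record->{AGE}++;                        # fields are ordinary scalars
push @{ $record->{PALS} }, "Ferb";       # nested array grows like any other
print "Pal count: ", scalar @{ $record->{PALS} }, "\n";    # 4
```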
68dc0745 2349
2350=head2 How can I use a reference as a hash key?
2351
109f0441 2352(contributed by brian d foy and Ben Morrow)
9e72e4c6 2353
2354Hash keys are strings, so you can't really use a reference as the key.
2355When you try to do that, perl turns the reference into its stringified
ac9dac7f 2356form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
2357back the reference from the stringified form, at least without doing
109f0441 2358some extra work on your own.
2359
2360Remember that the entry in the hash will still be there even if
2361the referenced variable goes out of scope, and that it is entirely
2362possible for Perl to subsequently allocate a different variable at
2363the same address. This will mean a new variable might accidentally
be associated with the value for an old one.
2365
2366If you have Perl 5.10 or later, and you just want to store a value
2367against the reference for lookup later, you can use the core
C<Hash::Util::FieldHash> module. This will also handle renaming the
2369keys if you use multiple threads (which causes all variables to be
2370reallocated at new addresses, changing their stringification), and
2371garbage-collecting the entries when the referenced variable goes out
2372of scope.
2373
2374If you actually need to be able to get a real reference back from
2375each hash entry, you can use the Tie::RefHash module, which does the
2376required work for you.
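
A minimal sketch of the C<Tie::RefHash> approach, using made-up fruit
records as keys:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %color_of, 'Tie::RefHash';

my $apple  = { name => 'apple' };     # made-up records
my $banana = { name => 'banana' };

$color_of{$apple}  = 'red';
$color_of{$banana} = 'yellow';

# With an ordinary hash these keys would be plain strings; here they
# come back as real references that can be dereferenced.
for my $fruit ( keys %color_of ) {
    print "$fruit->{name} is $color_of{$fruit}\n";
}
```

The order of the output lines is not guaranteed. As with any tied hash,
expect it to be slower than a plain one.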
68dc0745 2377
2378=head1 Data: Misc
2379
2380=head2 How do I handle binary data correctly?
2381
ac9dac7f 2382Perl is binary clean, so it can handle binary data just fine.
e573f903 2383On Windows or DOS, however, you have to use C<binmode> for binary
ac9dac7f 2384files to avoid conversions for line endings. In general, you should
2385use C<binmode> any time you want to work with binary data.
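
As an illustration, this sketch writes four bytes, including a CR/LF
pair, to a scratch file and reads them back unchanged. It uses
C<File::Temp> only to get a throwaway filename:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ( $out, $filename ) = tempfile();    # scratch file for the demo
binmode $out;                           # no line-ending translation on write
print {$out} "\x00\x0d\x0a\xff";
close $out or die "close: $!";

open my $in, '<', $filename or die "open: $!";
binmode $in;                            # and none on read
my $bytes = do { local $/; <$in> };     # slurp the whole file
close $in;
unlink $filename;

print "read ", length($bytes), " bytes\n";    # read 4 bytes
```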
68dc0745 2386
ac9dac7f 2387Also see L<perlfunc/"binmode"> or L<perlopentut>.
68dc0745 2388
ac9dac7f 2389If you're concerned about 8-bit textual data then see L<perllocale>.
54310121 2390If you want to deal with multibyte characters, however, there are
68dc0745 2391some gotchas. See the section on Regular Expressions.
2392
2393=head2 How do I determine whether a scalar is a number/whole/integer/float?
2394
2395Assuming that you don't care about IEEE notations like "NaN" or
2396"Infinity", you probably just want to use a regular expression.
2397
ac9dac7f 2398 if (/\D/) { print "has nondigits\n" }
2399 if (/^\d+$/) { print "is a whole number\n" }
2400 if (/^-?\d+$/) { print "is an integer\n" }
2401 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
2402 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
2403 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
2404 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
881bdbd4 2405 { print "a C float\n" }
68dc0745 2406
f0d19b68 2407There are also some commonly used modules for the task.
2408L<Scalar::Util> (distributed with 5.8) provides access to perl's
ac9dac7f 2409internal function C<looks_like_number> for determining whether a
2410variable looks like a number. L<Data::Types> exports functions that
2411validate data types using both the above and other regular
2412expressions. Thirdly, there is C<Regexp::Common> which has regular
2413expressions to match various types of numbers. Those three modules are
2414available from the CPAN.
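
For example, a quick check with C<looks_like_number> might look like
this. The candidate strings and the C<%verdict> hash are made up for
the example:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my %verdict;
for my $candidate ( '42', '-3.14', '1e6', '.5', 'abc', '12abc', '' ) {
    $verdict{$candidate} = looks_like_number($candidate) ? 1 : 0;
    printf "%-8s %s\n", "'$candidate'",
        $verdict{$candidate} ? 'looks like a number' : 'does not';
}
```

The first four strings pass and the last three fail. Because the
function follows perl's own rules for numeric strings, it may accept
some forms that the regular expressions above reject.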
f0d19b68 2415
2416If you're on a POSIX system, Perl supports the C<POSIX::strtod>
ac9dac7f 2417function. Its semantics are somewhat cumbersome, so here's a
2418C<getnum> wrapper function for more convenient access. This function
2419takes a string and returns the number it found, or C<undef> for input
2420that isn't a C float. The C<is_numeric> function is a front end to
2421C<getnum> if you just want to say, "Is this a float?"
2422
2423 sub getnum {
2424 use POSIX qw(strtod);
2425 my $str = shift;
2426 $str =~ s/^\s+//;
2427 $str =~ s/\s+$//;
2428 $! = 0;
2429 my($num, $unparsed) = strtod($str);
2430 if (($str eq '') || ($unparsed != 0) || $!) {
2431 return undef;
2432 }
2433 else {
2434 return $num;
2435 }
2436 }
5a964f20 2437
ac9dac7f 2438 sub is_numeric { defined getnum($_[0]) }
5a964f20 2439
Or you could check out the L<String::Scanf> module on the CPAN
instead. The C<POSIX> module (part of the standard Perl distribution)
provides the C<strtod> and C<strtol> functions for converting strings
to doubles and longs, respectively.
68dc0745 2444
2445=head2 How do I keep persistent data across program calls?
2446
2447For some specific applications, you can use one of the DBM modules.
ac9dac7f 2448See L<AnyDBM_File>. More generically, you should consult the C<FreezeThaw>
2449or C<Storable> modules from CPAN. Starting from Perl 5.8 C<Storable> is part
2450of the standard distribution. Here's one example using C<Storable>'s C<store>
fe854a6f 2451and C<retrieve> functions:
65acb1b1 2452
ac9dac7f 2453 use Storable;
2454 store(\%hash, "filename");
65acb1b1 2455
ac9dac7f 2456 # later on...
2457 $href = retrieve("filename"); # by ref
2458 %hash = %{ retrieve("filename") }; # direct to hash
68dc0745 2459
2460=head2 How do I print out or copy a recursive data structure?
2461
The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures. The C<Storable> module on CPAN (or the
5.8 release of Perl) provides a function called C<dclone> that recursively
copies its argument.
65acb1b1 2466
ac9dac7f 2467 use Storable qw(dclone);
2468 $r2 = dclone($r1);
68dc0745 2469
Here C<$r1> can be a reference to any kind of data structure you'd like.
It will be deeply copied. Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.
68dc0745 2474
ac9dac7f 2475 %newhash = %{ dclone(\%oldhash) };
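
To see why the deep copy matters, compare it with a plain hash
assignment, which copies only the top level. The data here is made up:

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = ( fruits => [ 'apple', 'banana' ], veg => [ 'carrot' ] );

my %shallow = %oldhash;                    # copies only the top level
my %deep    = %{ dclone( \%oldhash ) };    # copies the inner arrays too

push @{ $oldhash{fruits} }, 'cherry';

print "shallow: @{ $shallow{fruits} }\n";    # apple banana cherry
print "deep:    @{ $deep{fruits} }\n";       # apple banana
```

The shallow copy still shares its inner arrays with C<%oldhash>, so it
sees the new element; the deep copy does not.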
68dc0745 2476
2477=head2 How do I define methods for every class/object?
2478
109f0441 2479(contributed by Ben Morrow)
2480
You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
be very careful to consider the consequences of doing this: adding
methods to every object is very likely to have unintended
consequences. If possible, it would be better to have all your objects
inherit from some common base class, or to use an object system like
Moose that supports roles.
68dc0745 2487
2488=head2 How do I verify a credit card checksum?
2489
ac9dac7f 2490Get the C<Business::CreditCard> module from CPAN.
68dc0745 2491
65acb1b1 2492=head2 How do I pack arrays of doubles or floats for XS code?
2493
109f0441 2494The arrays.h/arrays.c code in the C<PGPLOT> module on CPAN does just this.
65acb1b1 2495If you're doing a lot of float or double processing, consider using
ac9dac7f 2496the C<PDL> module from CPAN instead--it makes number-crunching easy.
65acb1b1 2497
109f0441 2498See L<http://search.cpan.org/dist/PGPLOT> for the code.
2499
500071f4 2500=head1 REVISION
2501
109f0441 2502Revision: $Revision$
500071f4 2503
109f0441 2504Date: $Date$
500071f4 2505
2506See L<perlfaq> for source control details and availability.
2507
68dc0745 2508=head1 AUTHOR AND COPYRIGHT
2509
109f0441 2510Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
7678cced 2511other authors as noted. All rights reserved.
5a964f20 2512
5a7beb56 2513This documentation is free; you can redistribute it and/or modify it
2514under the same terms as Perl itself.
5a964f20 2515
2516Irrespective of its distribution, all code examples in this file
2517are hereby placed into the public domain. You are permitted and
2518encouraged to use this code in your own programs for fun
2519or for profit as you see fit. A simple comment in the code giving
2520credit would be courteous but is not required.