68dc0745 1=head1 NAME
2
571e049f 3perlfaq4 - Data Manipulation ($Revision: 1.61 $, $Date: 2005/03/11 16:27:53 $)
68dc0745 4
5=head1 DESCRIPTION
6
ae3d0b9f 7This section of the FAQ answers questions related to manipulating
8numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
68dc0745 9
10=head1 Data: Numbers
11
46fc3d4c 12=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
13
49d635f9 14Internally, your computer represents floating-point numbers
15in binary. Digital (as in powers of two) computers cannot
16store all numbers exactly. Some real numbers lose precision
17in the process. This is a problem with how computers store
18numbers and affects all computer languages, not just Perl.
46fc3d4c 19
L<perlnumber> shows the gory details of number
representations and conversions.
22
To limit the number of decimal places in your numbers, you
can use the printf or sprintf function. See
L<perlop/"Floating Point Arithmetic"> for more details.
49d635f9 26
27 printf "%.2f", 10/3;
197aec24 28
49d635f9 29 my $number = sprintf "%.2f", 10/3;
197aec24 30
32969b6e 31=head2 Why is int() broken?
32
33Your int() is most probably working just fine. It's the numbers that
34aren't quite what you think.
35
36First, see the above item "Why am I getting long decimals
37(eg, 19.9499999999999) instead of the numbers I should be getting
38(eg, 19.95)?".
39
40For example, this
41
42 print int(0.6/0.2-2), "\n";
43
will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think of in the above as 'three' is really more like
2.9999999999999995559.
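
If what you actually want is the nearest integer rather than truncation, one option is to round explicitly instead of calling int(). The following is only a sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $quotient = 0.6 / 0.2 - 2;                 # a hair under 1 on most machines

my $truncated = int($quotient);               # int() simply drops the fraction
my $rounded   = sprintf("%.0f", $quotient);   # sprintf() rounds to nearest

print "truncated: $truncated, rounded: $rounded\n";
```

Whether rounding is appropriate depends on what the number means in your program; for array indexes and counters it usually is.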
48
68dc0745 49=head2 Why isn't my octal data interpreted correctly?
50
49d635f9 51Perl only understands octal and hex numbers as such when they occur as
52literals in your program. Octal literals in perl must start with a
53leading "0" and hexadecimal literals must start with a leading "0x".
54If they are read in from somewhere and assigned, no automatic
55conversion takes place. You must explicitly use oct() or hex() if you
56want the values converted to decimal. oct() interprets hex ("0x350"),
57octal ("0350" or even without the leading "0", like "377") and binary
58("0b1010") numbers, while hex() only converts hexadecimal ones, with
59or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
33ce146f 60The inverse mapping from decimal to octal can be done with either the
49d635f9 61"%o" or "%O" sprintf() formats.
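
The conversions described above can be sketched as follows. The variable names are illustrative only; the input strings are the ones mentioned in the paragraph.

```perl
use strict;
use warnings;

# Strings read from a file or typed by a user are never converted for you.
my $from_hex_string   = oct("0x350");    # 848
my $from_octal_string = oct("0350");     # 232
my $no_leading_zero   = oct("377");      # 255
my $from_binary       = oct("0b1010");   # 10
my $hex_with_prefix   = hex("0x255");    # 597
my $hex_bare          = hex("ff");       # 255

# And back again: decimal 420 is octal 644.
my $octal_text = sprintf("%o", 420);     # "644"
```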
68dc0745 62
63This problem shows up most often when people try using chmod(), mkdir(),
197aec24 64umask(), or sysopen(), which by widespread tradition typically take
33ce146f 65permissions in octal.
68dc0745 66
33ce146f 67 chmod(644, $file); # WRONG
68dc0745 68 chmod(0644, $file); # right
69
197aec24 70Note the mistake in the first line was specifying the decimal literal
33ce146f 71644, rather than the intended octal literal 0644. The problem can
72be seen with:
73
434f7166 74 printf("%#o",644); # prints 01204
33ce146f 75
76Surely you had not intended C<chmod(01204, $file);> - did you? If you
77want to use numeric literals as arguments to chmod() et al. then please
197aec24 78try to express them as octal constants, that is with a leading zero and
33ce146f 79with the following digits restricted to the set 0..7.
80
65acb1b1 81=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
68dc0745 82
92c2ed05 83Remember that int() merely truncates toward 0. For rounding to a
84certain number of digits, sprintf() or printf() is usually the easiest
85route.
86
87 printf("%.3f", 3.1415926535); # prints 3.142
68dc0745 88
87275199 89The POSIX module (part of the standard Perl distribution) implements
68dc0745 90ceil(), floor(), and a number of other mathematical and trigonometric
91functions.
92
92c2ed05 93 use POSIX;
94 $ceil = ceil(3.5); # 4
95 $floor = floor(3.5); # 3
96
a6dd486b 97In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
87275199 98module. With 5.004, the Math::Trig module (part of the standard Perl
46fc3d4c 99distribution) implements the trigonometric functions. Internally it
100uses the Math::Complex module and some functions can break out from
101the real axis into the complex plane, for example the inverse sine of
1022.
68dc0745 103
104Rounding in financial applications can have serious implications, and
105the rounding method used should be specified precisely. In these
106cases, it probably pays not to trust whichever system rounding is
107being used by Perl, but to instead implement the rounding function you
108need yourself.
109
65acb1b1 110To see why, notice how you'll still have an issue on half-way-point
111alternation:
112
113 for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
114
197aec24 115 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
65acb1b1 116 0.8 0.8 0.9 0.9 1.0 1.0
117
118Don't blame Perl. It's the same as in C. IEEE says we have to do this.
119Perl numbers whose absolute values are integers under 2**31 (on 32 bit
120machines) will work pretty much like mathematical integers. Other numbers
121are not guaranteed.
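
For the roll-your-own rounding suggested above, here is a minimal sketch of one possible rule, "round half away from zero". The function name is hypothetical, and the code still works in binary floating point, so a value such as 1.005 that cannot be stored exactly may already sit just below the half-way point before the function ever sees it.

```perl
use strict;
use warnings;

# Hypothetical helper: round half away from zero to $places decimal places.
sub round_half_away {
    my ($value, $places) = @_;
    my $factor = 10 ** $places;
    my $offset = $value < 0 ? -0.5 : 0.5;
    return int($value * $factor + $offset) / $factor;
}

print round_half_away( 2.5,   0), "\n";   # 3
print round_half_away(-2.5,   0), "\n";   # -3
print round_half_away( 0.125, 2), "\n";   # 0.13
```

For money, keeping amounts as whole numbers of the smallest unit (cents, say) and rounding only when converting for display sidesteps most of these surprises.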
122
6f0efb17 123=head2 How do I convert between numeric representations/bases/radixes?
68dc0745 124
6761e064 125As always with Perl there is more than one way to do it. Below
126are a few examples of approaches to making common conversions
127between number representations. This is intended to be representational
128rather than exhaustive.
68dc0745 129
6761e064 130Some of the examples below use the Bit::Vector module from CPAN.
131The reason you might choose Bit::Vector over the perl built in
132functions is that it works with numbers of ANY size, that it is
133optimized for speed on some operations, and for at least some
134programmers the notation might be familiar.
d92eb7b0 135
818c4caa 136=over 4
137
138=item How do I convert hexadecimal into decimal
d92eb7b0 139
6761e064 140Using perl's built in conversion of 0x notation:
141
6f0efb17 142 $dec = 0xDEADBEEF;
7207e29d 143
6761e064 144Using the hex function:
145
6f0efb17 146 $dec = hex("DEADBEEF");
6761e064 147
148Using pack:
149
6f0efb17 150 $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
6761e064 151
152Using the CPAN module Bit::Vector:
153
154 use Bit::Vector;
155 $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
156 $dec = $vec->to_Dec();
157
818c4caa 158=item How do I convert from decimal to hexadecimal
6761e064 159
04d666b1 160Using sprintf:
6761e064 161
6f0efb17 162 $hex = sprintf("%X", 3735928559); # upper case A-F
163 $hex = sprintf("%x", 3735928559); # lower case a-f
6761e064 164
6f0efb17 165Using unpack:
6761e064 166
167 $hex = unpack("H*", pack("N", 3735928559));
168
6f0efb17 169Using Bit::Vector:
6761e064 170
171 use Bit::Vector;
172 $vec = Bit::Vector->new_Dec(32, -559038737);
173 $hex = $vec->to_Hex();
174
175And Bit::Vector supports odd bit counts:
176
177 use Bit::Vector;
178 $vec = Bit::Vector->new_Dec(33, 3735928559);
179 $vec->Resize(32); # suppress leading 0 if unwanted
180 $hex = $vec->to_Hex();
181
818c4caa 182=item How do I convert from octal to decimal
6761e064 183
184Using Perl's built in conversion of numbers with leading zeros:
185
6f0efb17 186 $dec = 033653337357; # note the leading 0!
6761e064 187
188Using the oct function:
189
6f0efb17 190 $dec = oct("33653337357");
6761e064 191
192Using Bit::Vector:
193
194 use Bit::Vector;
195 $vec = Bit::Vector->new(32);
196 $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
197 $dec = $vec->to_Dec();
198
818c4caa 199=item How do I convert from decimal to octal
6761e064 200
201Using sprintf:
202
203 $oct = sprintf("%o", 3735928559);
204
6f0efb17 205Using Bit::Vector:
6761e064 206
207 use Bit::Vector;
208 $vec = Bit::Vector->new_Dec(32, -559038737);
209 $oct = reverse join('', $vec->Chunk_List_Read(3));
210
818c4caa 211=item How do I convert from binary to decimal
6761e064 212
2c646907 213Perl 5.6 lets you write binary numbers directly with
214the 0b notation:
215
6f0efb17 216 $number = 0b10110110;
217
218Using oct:
219
220 my $input = "10110110";
221 $decimal = oct( "0b$input" );
2c646907 222
6f0efb17 223Using pack and ord:
d92eb7b0 224
225 $decimal = ord(pack('B8', '10110110'));
68dc0745 226
6f0efb17 227Using pack and unpack for larger strings:
6761e064 228
229 $int = unpack("N", pack("B32",
230 substr("0" x 32 . "11110101011011011111011101111", -32)));
231 $dec = sprintf("%d", $int);
232
5efd7060 233 # substr() is used to left pad a 32 character string with zeros.
6761e064 234
235Using Bit::Vector:
236
237 $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
238 $dec = $vec->to_Dec();
239
818c4caa 240=item How do I convert from decimal to binary
6761e064 241
4dfcc30b 242Using sprintf (perl 5.6+):
243
244 $bin = sprintf("%b", 3735928559);
245
246Using unpack:
6761e064 247
248 $bin = unpack("B*", pack("N", 3735928559));
249
250Using Bit::Vector:
251
252 use Bit::Vector;
253 $vec = Bit::Vector->new_Dec(32, -559038737);
254 $bin = $vec->to_Bin();
255
256The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
257are left as an exercise to the inclined reader.
68dc0745 258
818c4caa 259=back
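
For the impatient reader, the remaining transformations mostly amount to chaining a string-to-number step with a number-to-string step. A short sketch using only built-in functions (the variable names are illustrative):

```perl
use strict;
use warnings;

# hex -> oct: go through a plain number
my $oct = sprintf("%o", hex("DEADBEEF"));

# bin -> hex: oct() understands a "0b" prefix
my $hex = sprintf("%X", oct("0b11011110101011011011111011101111"));

# oct -> bin
my $bin = sprintf("%b", oct("33653337357"));

print "$oct\n$hex\n$bin\n";
```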
68dc0745 260
65acb1b1 261=head2 Why doesn't & work the way I want it to?
262
263The behavior of binary arithmetic operators depends on whether they're
264used on numbers or strings. The operators treat a string as a series
265of bits and work with that (the string C<"3"> is the bit pattern
266C<00110011>). The operators work with the binary form of a number
267(the number C<3> is treated as the bit pattern C<00000011>).
268
269So, saying C<11 & 3> performs the "and" operation on numbers (yielding
49d635f9 270C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
65acb1b1 271(yielding C<"1">).
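
A small sketch of the difference, plus one common way to force numeric treatment when your data arrived as strings. The variable names are made up for illustration, and the example assumes the classic behavior of C<&> described here.

```perl
use strict;
use warnings;

my $numeric = 11 & 3;        # numbers: 1011 & 0011 gives 3
my $stringy = "11" & "3";    # strings: ANDs the characters, gives "1"

# Input usually arrives as a string; adding 0 forces a number.
my ($x, $y) = ("11", "3");
my $forced  = ($x + 0) & ($y + 0);   # 3

print "$numeric $stringy $forced\n";
```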
272
273Most problems with C<&> and C<|> arise because the programmer thinks
274they have a number but really it's a string. The rest arise because
275the programmer says:
276
277 if ("\020\020" & "\101\101") {
278 # ...
279 }
280
but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. To detect an all-null
result, you need:
283
284 if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
285 # ...
286 }
287
68dc0745 288=head2 How do I multiply matrices?
289
290Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
291or the PDL extension (also available from CPAN).
292
293=head2 How do I perform an operation on a series of integers?
294
295To call a function on each element in an array, and collect the
296results, use:
297
298 @results = map { my_func($_) } @array;
299
300For example:
301
302 @triple = map { 3 * $_ } @single;
303
304To call a function on each element of an array, but ignore the
305results:
306
307 foreach $iterator (@array) {
65acb1b1 308 some_func($iterator);
68dc0745 309 }
310
311To call a function on each integer in a (small) range, you B<can> use:
312
65acb1b1 313 @results = map { some_func($_) } (5 .. 25);
68dc0745 314
315but you should be aware that the C<..> operator creates an array of
316all integers in the range. This can take a lot of memory for large
317ranges. Instead use:
318
319 @results = ();
320 for ($i=5; $i < 500_005; $i++) {
65acb1b1 321 push(@results, some_func($i));
68dc0745 322 }
323
This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.
326
327 for my $i (5 .. 500_005) {
328 push(@results, some_func($i));
329 }
330
331will not create a list of 500,000 integers.
332
68dc0745 333=head2 How can I output Roman numerals?
334
Get the Roman module from http://www.cpan.org/modules/by-module/Roman .
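
If installing a module is not an option, the conversion is short enough to write by hand. This is only a sketch for positive integers; C<to_roman> is a hypothetical name, not the Roman module's interface.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a positive integer to Roman numerals
# by repeatedly subtracting the largest value that still fits.
sub to_roman {
    my ($n) = @_;
    my @table = (
        [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
        [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
        [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'], [1, 'I'],
    );
    my $roman = '';
    for my $pair (@table) {
        my ($value, $letters) = @$pair;
        while ($n >= $value) {
            $roman .= $letters;
            $n     -= $value;
        }
    }
    return $roman;
}

print to_roman(1987), "\n";   # MCMLXXXVII
```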
68dc0745 336
337=head2 Why aren't my random numbers random?
338
65acb1b1 339If you're using a version of Perl before 5.004, you must call C<srand>
340once at the start of your program to seed the random number generator.
49d635f9 341
5cd0b561 342 BEGIN { srand() if $] < 5.004 }
49d635f9 343
65acb1b1 3445.004 and later automatically call C<srand> at the beginning. Don't
49d635f9 345call C<srand> more than once---you make your numbers less random, rather
65acb1b1 346than more.
92c2ed05 347
Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection at http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of
Tom Phoenix, talks more about this. John von Neumann said, ``Anyone
353who attempts to generate random numbers by deterministic means is, of
65acb1b1 354course, living in a state of sin.''
355
356If you want numbers that are more random than C<rand> with C<srand>
357provides, you should also check out the Math::TrulyRandom module from
358CPAN. It uses the imperfections in your system's timer to generate
359random numbers, but this takes quite a while. If you want a better
92c2ed05 360pseudorandom generator than comes with your operating system, look at
65acb1b1 361``Numerical Recipes in C'' at http://www.nr.com/ .
68dc0745 362
881bdbd4 363=head2 How do I get a random number between X and Y?
364
793f5136 365C<rand($x)> returns a number such that
366C<< 0 <= rand($x) < $x >>. Thus what you want to have perl
367figure out is a random number in the range from 0 to the
368difference between your I<X> and I<Y>.
369
370That is, to get a number between 10 and 15, inclusive, you
371want a random number between 0 and 5 that you can then add
372to 10.
373
374 my $number = 10 + int rand( 15-10+1 );
375
Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_in(50,120)>.
881bdbd4 379
380 sub random_int_in ($$) {
381 my($min, $max) = @_;
382 # Assumes that the two arguments are integers themselves!
383 return $min if $min == $max;
384 ($min, $max) = ($max, $min) if $min > $max;
385 return $min + int rand(1 + $max - $min);
386 }
387
68dc0745 388=head1 Data: Dates
389
5cd0b561 390=head2 How do I find the day or week of the year?
68dc0745 391
571e049f 392The localtime function returns the day of the year. Without an
5cd0b561 393argument localtime uses the current time.
68dc0745 394
5cd0b561 395 $day_of_year = (localtime)[7];
ffc145e8 396
5cd0b561 397The POSIX module can also format a date as the day of the year or
398week of the year.
68dc0745 399
5cd0b561 400 use POSIX qw/strftime/;
401 my $day_of_year = strftime "%j", localtime;
402 my $week_of_year = strftime "%W", localtime;
403
To get the day or week of the year for any date, use the Time::Local
module to get a time in epoch seconds for the argument to localtime.
ffc145e8 406
5cd0b561 407 use POSIX qw/strftime/;
408 use Time::Local;
409 my $week_of_year = strftime "%W",
410 localtime( timelocal( 0, 0, 0, 18, 11, 1987 ) );
411
The Date::Calc module provides two functions to calculate these.

    use Date::Calc qw(Day_of_Year Week_of_Year);
    my $day_of_year = Day_of_Year( 1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );
ffc145e8 417
d92eb7b0 418=head2 How do I find the current century or millennium?
419
420Use the following simple functions:
421
197aec24 422 sub get_century {
d92eb7b0 423 return int((((localtime(shift || time))[5] + 1999))/100);
197aec24 424 }
571e049f 425
197aec24 426 sub get_millennium {
d92eb7b0 427 return 1+int((((localtime(shift || time))[5] + 1899))/1000);
197aec24 428 }
d92eb7b0 429
49d635f9 430On some systems, the POSIX module's strftime() function has
431been extended in a non-standard way to use a C<%C> format,
432which they sometimes claim is the "century". It isn't,
433because on most such systems, this is only the first two
434digits of the four-digit year, and thus cannot be used to
435reliably determine the current century or millennium.
d92eb7b0 436
92c2ed05 437=head2 How can I compare two dates and find the difference?
68dc0745 438
92c2ed05 439If you're storing your dates as epoch seconds then simply subtract one
440from the other. If you've got a structured date (distinct year, day,
d92eb7b0 441month, hour, minute, seconds values), then for reasons of accessibility,
442simplicity, and efficiency, merely use either timelocal or timegm (from
443the Time::Local module in the standard distribution) to reduce structured
444dates to epoch seconds. However, if you don't know the precise format of
445your dates, then you should probably use either of the Date::Manip and
446Date::Calc modules from CPAN before you go hacking up your own parsing
447routine to handle arbitrary date formats.
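
As a minimal sketch of the structured-date case, the following reduces two dates to epoch seconds and subtracts them. It uses timegm so the answer does not depend on the local time zone; the dates and variable names are only examples.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# timegm(sec, min, hour, mday, month (0-11), year)
my $start = timegm(0, 0, 0, 18, 11, 1987);   # 18 Dec 1987, 00:00:00 UTC
my $end   = timegm(0, 0, 0, 25, 11, 1987);   # 25 Dec 1987, 00:00:00 UTC

my $seconds = $end - $start;
my $days    = $seconds / (24 * 60 * 60);

print "$days days apart\n";   # 7 days apart
```

With timelocal the same subtraction can come out an hour off when the two dates straddle a change to or from summer time.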
68dc0745 448
449=head2 How can I take a string and turn it into epoch seconds?
450
451If it's a regular enough string that it always has the same format,
92c2ed05 452you can split it up and pass the parts to C<timelocal> in the standard
453Time::Local module. Otherwise, you should look into the Date::Calc
454and Date::Manip modules from CPAN.
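
For the fixed-format case, a sketch might look like this. The timestamp layout (C<YYYY-MM-DD hh:mm:ss>) is an assumption made for the example, and timegm is used so the result does not depend on the local time zone; substitute timelocal if your strings are in local time.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $stamp = "1987-12-18 10:30:00";   # assumed format: YYYY-MM-DD hh:mm:ss

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "Unrecognized date: $stamp";

# Months are numbered 0 to 11 for timegm() and timelocal().
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);

print "$epoch\n";
```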
68dc0745 455
456=head2 How can I find the Julian Day?
457
7678cced 458(contributed by brian d foy and Dave Cross)
459
460You can use the Time::JulianDay module available on CPAN. Ensure that
461you really want to find a Julian day, though, as many people have
462different ideas about Julian days. See
463http://www.hermetic.ch/cal_stud/jdn.htm for instance.
464
465You can also try the DateTime module, which can convert a date/time
466to a Julian Day.
467
468 $ perl -MDateTime -le'print DateTime->today->jd'
469 2453401.5
470
471Or the modified Julian Day
472
473 $ perl -MDateTime -le'print DateTime->today->mjd'
474 53401
475
476Or even the day of the year (which is what some people think of as a
477Julian day)
478
479 $ perl -MDateTime -le'print DateTime->today->doy'
480 31
be94a901 481
65acb1b1 482=head2 How do I find yesterday's date?
483
49d635f9 484If you only need to find the date (and not the same time), you
485can use the Date::Calc module.
65acb1b1 486
49d635f9 487 use Date::Calc qw(Today Add_Delta_Days);
197aec24 488
49d635f9 489 my @date = Add_Delta_Days( Today(), -1 );
197aec24 490
49d635f9 491 print "@date\n";
65acb1b1 492
49d635f9 493Most people try to use the time rather than the calendar to
494figure out dates, but that assumes that your days are
495twenty-four hours each. For most people, there are two days
496a year when they aren't: the switch to and from summer time
497throws this off. Russ Allbery offers this solution.
d92eb7b0 498
499 sub yesterday {
49d635f9 500 my $now = defined $_[0] ? $_[0] : time;
501 my $then = $now - 60 * 60 * 24;
502 my $ndst = (localtime $now)[8] > 0;
503 my $tdst = (localtime $then)[8] > 0;
504 $then - ($tdst - $ndst) * 60 * 60;
505 }
197aec24 506
This should give you "this time yesterday" in seconds since epoch relative to
508the first argument or the current time if no argument is given and
509suitable for passing to localtime or whatever else you need to do with
510it. $ndst is whether we're currently in daylight savings time; $tdst is
511whether the point 24 hours ago was in daylight savings time. If $tdst
512and $ndst are the same, a boundary wasn't crossed, and the correction
513will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
514from yesterday's time since we gained an extra hour while going off
515daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
516negative hour (add an hour) to yesterday's time since we lost an hour.
517
518All of this is because during those days when one switches off or onto
519DST, a "day" isn't 24 hours long; it's either 23 or 25.
520
521The explicit settings of $ndst and $tdst are necessary because localtime
522only says it returns the system tm struct, and the system tm struct at
523least on Solaris doesn't guarantee any particular positive value (like,
524say, 1) for isdst, just a positive value. And that value can
525potentially be negative, if DST information isn't available (this sub
526just treats those cases like no DST).
527
528Note that between 2am and 3am on the day after the time zone switches
529off daylight savings time, the exact hour of "yesterday" corresponding
530to the current hour is not clearly defined. Note also that if used
531between 2am and 3am the day after the change to daylight savings time,
532the result will be between 3am and 4am of the previous day; it's
533arguable whether this is correct.
534
535This sub does not attempt to deal with leap seconds (most things don't).
536
537
d92eb7b0 538
87275199 539=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
68dc0745 540
65acb1b1 541Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
542Y2K compliant (whatever that means). The programmers you've hired to
543use it, however, probably are not.
544
545Long answer: The question belies a true understanding of the issue.
546Perl is just as Y2K compliant as your pencil--no more, and no less.
547Can you use your pencil to write a non-Y2K-compliant memo? Of course
548you can. Is that the pencil's fault? Of course it isn't.
92c2ed05 549
87275199 550The date and time functions supplied with Perl (gmtime and localtime)
65acb1b1 551supply adequate information to determine the year well beyond 2000
552(2038 is when trouble strikes for 32-bit machines). The year returned
90fdbbb7 553by these functions when used in a list context is the year minus 1900.
65acb1b1 554For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
555number. To avoid the year 2000 problem simply do not treat the year as
556a 2-digit number. It isn't.
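
In other words, add 1900 and be done with it. A minimal sketch, using a fixed moment in November 2001 so that the result is predictable:

```perl
use strict;
use warnings;

my $when = 1005613200;                  # a moment in November 2001
my $year = (gmtime($when))[5] + 1900;   # element 5 is the year minus 1900

print "$year\n";                        # 2001
```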
68dc0745 557
5a964f20 558When gmtime() and localtime() are used in scalar context they return
68dc0745 559a timestamp string that contains a fully-expanded year. For example,
560C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
5612001". There's no year 2000 problem here.
562
5a964f20 563That doesn't mean that Perl can't be used to create non-Y2K compliant
564programs. It can. But so can your pencil. It's the fault of the user,
565not the language. At the risk of inflaming the NRA: ``Perl doesn't
c98c5709 566break Y2K, people do.'' See http://www.perl.org/about/y2k.html for
5a964f20 567a longer exposition.
568
68dc0745 569=head1 Data: Strings
570
571=head2 How do I validate input?
572
573The answer to this question is usually a regular expression, perhaps
5a964f20 574with auxiliary logic. See the more specific questions (numbers, mail
68dc0745 575addresses, etc.) for details.
576
577=head2 How do I unescape a string?
578
92c2ed05 579It depends just what you mean by ``escape''. URL escapes are dealt
580with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
a6dd486b 581character are removed with
68dc0745 582
583 s/\\(.)/$1/g;
584
92c2ed05 585This won't expand C<"\n"> or C<"\t"> or any other special escapes.
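
If you do want a few of the special escapes expanded as well, one approach is to look each escaped character up in a table and fall back to the character itself. This is a sketch only; the table is illustrative and deliberately small.

```perl
use strict;
use warnings;

# Illustrative table; extend it with whichever escapes your data uses.
my %escape = ( n => "\n", t => "\t", r => "\r", 0 => "\0" );

my $string = 'one\ttwo\n';    # single quotes: the backslashes are literal
$string =~ s/\\(.)/exists $escape{$1} ? $escape{$1} : $1/ge;

print $string;
```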
68dc0745 586
587=head2 How do I remove consecutive pairs of characters?
588
92c2ed05 589To turn C<"abbcccd"> into C<"abccd">:
68dc0745 590
d92eb7b0 591 s/(.)\1/$1/g; # add /s to include newlines
592
593Here's a solution that turns "abbcccd" to "abcd":
594
595 y///cs; # y == tr, but shorter :-)
68dc0745 596
597=head2 How do I expand function calls in a string?
598
599This is documented in L<perlref>. In general, this is fraught with
600quoting and readability problems, but it is possible. To interpolate
5a964f20 601a subroutine call (in list context) into a string:
68dc0745 602
603 print "My sub returned @{[mysub(1,2,3)]} that time.\n";
604
68dc0745 605=head2 How do I find matching/nesting anything?
606
92c2ed05 607This isn't something that can be done in one regular expression, no
608matter how complicated. To find something between two single
609characters, a pattern like C</x([^x]*)x/> will get the intervening
610bits in $1. For multiple ones, then something more like
611C</alpha(.*?)omega/> would be needed. But none of these deals with
f0f835c2 612nested patterns. For balanced expressions using C<(>, C<{>, C<[>
613or C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
614L<perlre/(??{ code })>. For other cases, you'll have to write a parser.
92c2ed05 615
616If you are serious about writing a parser, there are a number of
6a2af475 617modules or oddities that will make your life a lot easier. There are
618the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
and the byacc program. Starting with perl 5.8, Text::Balanced
is part of the standard distribution.
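
As a taste of what Text::Balanced offers, here is a minimal sketch that pulls one balanced, possibly nested, parenthesized group off the front of a string. The sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(a (nested) group) and the rest";

# In list context: the balanced chunk, then whatever follows it.
my ($extracted, $remainder) = extract_bracketed($text, "()");

print "extracted: $extracted\n";   # extracted: (a (nested) group)
print "remainder:$remainder\n";    # remainder: and the rest
```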
68dc0745 621
92c2ed05 622One simple destructive, inside-out approach that you might try is to
623pull out the smallest nesting parts one at a time:
5a964f20 624
d92eb7b0 625 while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
5a964f20 626 # do something with $1
197aec24 627 }
5a964f20 628
65acb1b1 629A more complicated and sneaky approach is to make Perl's regular
630expression engine do it for you. This is courtesy Dean Inada, and
631rather has the nature of an Obfuscated Perl Contest entry, but it
632really does work:
633
634 # $_ contains the string to parse
635 # BEGIN and END are the opening and closing markers for the
636 # nested text.
c47ff5f1 637
65acb1b1 638 @( = ('(','');
639 @) = (')','');
640 ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
5ed30e05 641 @$ = (eval{/$re/},$@!~/unmatched/i);
65acb1b1 642 print join("\n",@$[0..$#$]) if( $$[-1] );
643
68dc0745 644=head2 How do I reverse a string?
645
5a964f20 646Use reverse() in scalar context, as documented in
68dc0745 647L<perlfunc/reverse>.
648
649 $reversed = reverse $string;
650
651=head2 How do I expand tabs in a string?
652
5a964f20 653You can do it yourself:
68dc0745 654
655 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
656
87275199 657Or you can just use the Text::Tabs module (part of the standard Perl
68dc0745 658distribution).
659
660 use Text::Tabs;
661 @expanded_lines = expand(@lines_with_tabs);
662
663=head2 How do I reformat a paragraph?
664
87275199 665Use Text::Wrap (part of the standard Perl distribution):
68dc0745 666
667 use Text::Wrap;
668 print wrap("\t", ' ', @paragraphs);
669
92c2ed05 670The paragraphs you give to Text::Wrap should not contain embedded
46fc3d4c 671newlines. Text::Wrap doesn't justify the lines (flush-right).
672
bc06af74 673Or use the CPAN module Text::Autoformat. Formatting files can be easily
674done by making a shell alias, like so:
675
676 alias fmt="perl -i -MText::Autoformat -n0777 \
677 -e 'print autoformat $_, {all=>1}' $*"
678
679See the documentation for Text::Autoformat to appreciate its many
680capabilities.
681
49d635f9 682=head2 How can I access or change N characters of a string?
68dc0745 683
49d635f9 684You can access the first characters of a string with substr().
685To get the first character, for example, start at position 0
197aec24 686and grab the string of length 1.
68dc0745 687
68dc0745 688
49d635f9 689 $string = "Just another Perl Hacker";
690 $first_char = substr( $string, 0, 1 ); # 'J'
68dc0745 691
49d635f9 692To change part of a string, you can use the optional fourth
693argument which is the replacement string.
68dc0745 694
49d635f9 695 substr( $string, 13, 4, "Perl 5.8.0" );
197aec24 696
49d635f9 697You can also use substr() as an lvalue.
68dc0745 698
49d635f9 699 substr( $string, 13, 4 ) = "Perl 5.8.0";
197aec24 700
68dc0745 701=head2 How do I change the Nth occurrence of something?
702
92c2ed05 703You have to keep track of N yourself. For example, let's say you want
704to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
d92eb7b0 705C<"whosoever"> or C<"whomsoever">, case insensitively. These
706all assume that $_ contains the string to be altered.
68dc0745 707
708 $count = 0;
709 s{((whom?)ever)}{
710 ++$count == 5 # is it the 5th?
711 ? "${2}soever" # yes, swap
712 : $1 # renege and leave it there
d92eb7b0 713 }ige;
68dc0745 714
5a964f20 715In the more general case, you can use the C</g> modifier in a C<while>
716loop, keeping count of matches.
717
718 $WANT = 3;
719 $count = 0;
d92eb7b0 720 $_ = "One fish two fish red fish blue fish";
5a964f20 721 while (/(\w+)\s+fish\b/gi) {
722 if (++$count == $WANT) {
723 print "The third fish is a $1 one.\n";
5a964f20 724 }
725 }
726
92c2ed05 727That prints out: C<"The third fish is a red one."> You can also use a
5a964f20 728repetition count and repeated pattern like this:
729
730 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
731
68dc0745 732=head2 How can I count the number of occurrences of a substring within a string?
733
a6dd486b 734There are a number of ways, with varying efficiency. If you want a
68dc0745 735count of a certain single character (X) within a string, you can use the
736C<tr///> function like so:
737
368c9434 738 $string = "ThisXlineXhasXsomeXx'sXinXit";
68dc0745 739 $count = ($string =~ tr/X//);
d92eb7b0 740 print "There are $count X characters in the string";
68dc0745 741
742This is fine if you are just looking for a single character. However,
743if you are trying to count multiple character substrings within a
744larger string, C<tr///> won't work. What you can do is wrap a while()
745loop around a global pattern match. For example, let's count negative
746integers:
747
748 $string = "-9 55 48 -2 23 -76 4 14 -44";
749 while ($string =~ /-\d+/g) { $count++ }
750 print "There are $count negative numbers in the string";
751
881bdbd4 752Another version uses a global match in list context, then assigns the
753result to a scalar, producing a count of the number of matches.
754
755 $count = () = $string =~ /-\d+/g;
756
68dc0745 757=head2 How do I capitalize all the words on one line?
758
759To make the first letter of each word upper case:
3fe9a6f1 760
68dc0745 761 $line =~ s/\b(\w)/\U$1/g;
762
46fc3d4c 763This has the strange effect of turning "C<don't do it>" into "C<Don'T
a6dd486b 764Do It>". Sometimes you might want this. Other times you might need a
24f1ba9b 765more thorough solution (Suggested by brian d foy):
46fc3d4c 766
767 $string =~ s/ (
768 (^\w) #at the beginning of the line
769 | # or
770 (\s\w) #preceded by whitespace
771 )
772 /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;
774
68dc0745 775To make the whole line upper case:
3fe9a6f1 776
68dc0745 777 $line = uc($line);
778
779To force each word to be lower case, with the first letter upper case:
3fe9a6f1 780
68dc0745 781 $line =~ s/(\w+)/\u\L$1/g;
782
5a964f20 783You can (and probably should) enable locale awareness of those
784characters by placing a C<use locale> pragma in your program.
92c2ed05 785See L<perllocale> for endless details on locales.
5a964f20 786
65acb1b1 787This is sometimes referred to as putting something into "title
d92eb7b0 788case", but that's not quite accurate. Consider the proper
65acb1b1 789capitalization of the movie I<Dr. Strangelove or: How I Learned to
790Stop Worrying and Love the Bomb>, for example.
791
369b44b4 792Damian Conway's L<Text::Autoformat> module provides some smart
793case transformations:
794
795 use Text::Autoformat;
796 my $x = "Dr. Strangelove or: How I Learned to Stop ".
797 "Worrying and Love the Bomb";
798
799 print $x, "\n";
800 for my $style (qw( sentence title highlight ))
801 {
802 print autoformat($x, { case => $style }), "\n";
803 }
804
49d635f9 805=head2 How can I split a [character] delimited string except when inside [character]?
68dc0745 806
Several modules can handle this sort of parsing---Text::Balanced,
Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
49d635f9 809
810Take the example case of trying to split a string that is
811comma-separated into its different fields. You can't use C<split(/,/)>
812because you shouldn't split if the comma is inside quotes. For
813example, take a data line like this:
68dc0745 814
815 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
816
817Due to the restriction of the quotes, this is a fairly complex
197aec24 818problem. Thankfully, we have Jeffrey Friedl, author of
49d635f9 819I<Mastering Regular Expressions>, to handle these for us. He
68dc0745 820suggests (assuming your string is contained in $text):
821
822 @new = ();
823 push(@new, $+) while $text =~ m{
824 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
825 | ([^,]+),?
826 | ,
827 }gx;
828 push(@new, undef) if substr($text,-1,1) eq ',';
829
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).
46fc3d4c 833
87275199 834Alternatively, the Text::ParseWords module (part of the standard Perl
68dc0745 835distribution) lets you say:
836
837 use Text::ParseWords;
838 @new = quotewords(",", 0, $text);
839
a6dd486b 840There's also a Text::CSV (Comma-Separated Values) module on CPAN.
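
As a rough illustration, the sketch below runs C<quotewords> over the
sample line shown earlier and prints one field per line. The variable
names are only for this example.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# The sample data line from above.
my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A false second argument asks quotewords to strip the quotes.
my @new = quotewords(",", 0, $text);

printf "%2d: %s\n", $_, $new[$_] for 0 .. $#new;
```

Commas inside quoted fields, as in C<"Cimetrix, Inc">, stay within a
single field.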
65acb1b1 841
68dc0745 842=head2 How do I strip blank space from the beginning/end of a string?
843
a6dd486b 844Although the simplest approach would seem to be
68dc0745 845
846 $string =~ s/^\s*(.*?)\s*$/$1/;
847
a6dd486b 848not only is this unnecessarily slow and destructive, it also fails with
d92eb7b0 849embedded newlines. It is much faster to do this operation in two steps:
68dc0745 850
851 $string =~ s/^\s+//;
852 $string =~ s/\s+$//;
853
854Or more nicely written as:
855
856 for ($string) {
857 s/^\s+//;
858 s/\s+$//;
859 }
860
5e3006a4 861This idiom takes advantage of the C<foreach> loop's aliasing
5a964f20 862behavior to factor out common code. You can do this
197aec24 863on several strings at once, or arrays, or even the
d92eb7b0 864values of a hash if you use a slice:
5a964f20 865
197aec24 866 # trim whitespace in the scalar, the array,
5a964f20 867 # and all the values in the hash
868 foreach ($scalar, @array, @hash{keys %hash}) {
869 s/^\s+//;
870 s/\s+$//;
871 }
872
65acb1b1 873=head2 How do I pad a string with blanks or pad a number with zeroes?
874
65acb1b1 875In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0 876to pad the string, C<$text> or C<$num> contains the string to be padded,
877and C<$pad_char> contains the padding character. You can use a single
878character string constant instead of the C<$pad_char> variable if you
879know what it is in advance. And in the same way you can use an integer in
880place of C<$pad_len> if you know the pad length in advance.
65acb1b1 881
d92eb7b0 882The simplest method uses the C<sprintf> function. It can pad on the left
883or right with blanks and on the left with zeroes and it will not
884truncate the result. The C<pack> function can only pad strings on the
885right with blanks and it will truncate the result to a maximum length of
886C<$pad_len>.
65acb1b1 887
d92eb7b0 888 # Left padding a string with blanks (no truncation):
04d666b1 889 $padded = sprintf("%${pad_len}s", $text);
890 $padded = sprintf("%*s", $pad_len, $text); # same thing
65acb1b1 891
d92eb7b0 892 # Right padding a string with blanks (no truncation):
04d666b1 893 $padded = sprintf("%-${pad_len}s", $text);
894 $padded = sprintf("%-*s", $pad_len, $text); # same thing
65acb1b1 895
197aec24 896 # Left padding a number with 0 (no truncation):
04d666b1 897 $padded = sprintf("%0${pad_len}d", $num);
898 $padded = sprintf("%0*d", $pad_len, $num); # same thing
65acb1b1 899
d92eb7b0 900 # Right padding a string with blanks using pack (will truncate):
901 $padded = pack("A$pad_len",$text);
65acb1b1 902
d92eb7b0 903If you need to pad with a character other than blank or zero you can use
904one of the following methods. They all generate a pad string with the
905C<x> operator and combine that with C<$text>. These methods do
906not truncate C<$text>.
65acb1b1 907
d92eb7b0 908Left and right padding with any character, creating a new string:
65acb1b1 909
d92eb7b0 910 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
911 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 912
d92eb7b0 913Left and right padding with any character, modifying C<$text> directly:
65acb1b1 914
d92eb7b0 915 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
916 $text .= $pad_char x ( $pad_len - length( $text ) );
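
To see the methods side by side, here is a small sketch with made-up
values for C<$text>, C<$num>, C<$pad_len>, and C<$pad_char>:

```perl
use strict;
use warnings;

my ($text, $num, $pad_len, $pad_char) = ("abc", 42, 6, "*");

my $left   = sprintf("%*s",  $pad_len, $text);   # "   abc"
my $right  = sprintf("%-*s", $pad_len, $text);   # "abc   "
my $zeroes = sprintf("%0*d", $pad_len, $num);    # "000042"
my $packed = pack("A2", $text);                  # "ab" (pack truncates)

# x binds more tightly than . so the pad string is built first.
my $stars  = $pad_char x ( $pad_len - length($text) ) . $text;   # "***abc"

print "[$_]\n" for $left, $right, $zeroes, $packed, $stars;
```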
65acb1b1 917
68dc0745 918=head2 How do I extract selected columns from a string?
919
920Use substr() or unpack(), both documented in L<perlfunc>.
197aec24 921If you prefer thinking in terms of columns instead of widths,
5a964f20 922you can use this kind of thing:
923
924 # determine the unpack format needed to split Linux ps output
925 # arguments are cut columns
926 my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
927
197aec24 928 sub cut2fmt {
5a964f20 929 my(@positions) = @_;
930 my $template = '';
931 my $lastpos = 1;
932 for my $place (@positions) {
197aec24 933 $template .= "A" . ($place - $lastpos) . " ";
5a964f20 934 $lastpos = $place;
935 }
936 $template .= "A*";
937 return $template;
938 }
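
Here is a sketch of how the generated template might be used. The
record is invented for the example, and C<cut2fmt> is repeated so the
snippet runs on its own:

```perl
use strict;
use warnings;

# Same helper as above: turn 1-based cut columns into an unpack template.
sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos = $place;
    }
    $template .= "A*";
    return $template;
}

my $fmt = cut2fmt(8, 14);                      # "A7 A6 A*"

# Made-up record: columns 1-7, columns 8-13, then the rest.
my $line   = sprintf("%-7s%6s%s", "alice", 42, "sleeping");
my @fields = unpack($fmt, $line);

print "[$_]\n" for @fields;    # [alice] [    42] [sleeping]
```

The C<A> format strips trailing blanks but keeps leading ones, which is
why the second field still carries its padding.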
68dc0745 939
940=head2 How do I find the soundex value of a string?
941
7678cced 942(contributed by brian d foy)
943
You can use the Text::Soundex module. If you want to do fuzzy or close
matching, you might also try the String::Approx, Text::Metaphone,
and Text::DoubleMetaphone modules.
68dc0745 947
948=head2 How can I expand variables in text strings?
949
7678cced 950Let's assume that you have a string that contains placeholder
951variables.
68dc0745 952
953 $text = 'this has a $foo in it and a $bar';
5a964f20 954
7678cced 955You can use a substitution with a double evaluation. The
956first /e turns C<$1> into C<$foo>, and the second /e turns
957C<$foo> into its value. You may want to wrap this in an
958C<eval>: if you try to get the value of an undeclared variable
959while running under C<use strict>, you get a fatal error.
5a964f20 960
7678cced 961 eval { $text =~ s/(\$\w+)/$1/eeg };
962 die if $@;
68dc0745 963
5a964f20 964It's probably better in the general case to treat those
965variables as entries in some special hash. For example:
966
197aec24 967 %user_defs = (
5a964f20 968 foo => 23,
969 bar => 19,
970 );
971 $text =~ s/\$(\w+)/$user_defs{$1}/g;
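
One possible refinement, sketched here with a made-up C<$baz>
placeholder, is to leave names that are not in the hash untouched
rather than replacing them with nothing:

```perl
use strict;
use warnings;

my %user_defs = ( foo => 23, bar => 19 );
my $text = 'this has a $foo in it, a $bar, and an unknown $baz';

# Replace known names with their values; leave unknown ones as they were.
$text =~ s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$text\n";   # this has a 23 in it, a 19, and an unknown $baz
```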
68dc0745 972
973=head2 What's wrong with always quoting "$vars"?
974
a6dd486b 975The problem is that those double-quotes force stringification--
976coercing numbers and references into strings--even when you
977don't want them to be strings. Think of it this way: double-quote
197aec24 978expansion is used to produce new strings. If you already
65acb1b1 979have a string, why do you need more?
68dc0745 980
981If you get used to writing odd things like these:
982
983 print "$var"; # BAD
984 $new = "$old"; # BAD
985 somefunc("$var"); # BAD
986
987You'll be in trouble. Those should (in 99.8% of the cases) be
988the simpler and more direct:
989
990 print $var;
991 $new = $old;
992 somefunc($var);
993
994Otherwise, besides slowing you down, you're going to break code when
995the thing in the scalar is actually neither a string nor a number, but
996a reference:
997
998 func(\@array);
999 sub func {
1000 my $aref = shift;
1001 my $oref = "$aref"; # WRONG
1002 }
1003
1004You can also get into subtle problems on those few operations in Perl
1005that actually do care about the difference between a string and a
1006number, such as the magical C<++> autoincrement operator or the
1007syscall() function.
1008
197aec24 1009Stringification also destroys arrays.
5a964f20 1010
1011 @lines = `command`;
1012 print "@lines"; # WRONG - extra blanks
1013 print @lines; # right
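
A short sketch of both problems, using throwaway data:

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $copy  = "$aref";            # now just text such as "ARRAY(0x...)"

print ref($aref) ? "reference\n" : "plain string\n";   # reference
print ref($copy) ? "reference\n" : "plain string\n";   # plain string

# Interpolating an array joins its elements with $" (a space by default).
my @lines = ("one\n", "two\n");
print "@lines";                 # "one\n two\n" - note the extra blank
print @lines;                   # "one\ntwo\n"
```

Once the reference has been turned into a string, there is no way to
get back to the original array through C<$copy>.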
1014
04d666b1 1015=head2 Why don't my E<lt>E<lt>HERE documents work?
68dc0745 1016
1017Check for these three things:
1018
1019=over 4
1020
04d666b1 1021=item There must be no space after the E<lt>E<lt> part.
68dc0745 1022
197aec24 1023=item There (probably) should be a semicolon at the end.
68dc0745 1024
197aec24 1025=item You can't (easily) have any space in front of the tag.
68dc0745 1026
1027=back
1028
197aec24 1029If you want to indent the text in the here document, you
5a964f20 1030can do this:
1031
1032 # all in one
1033 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1034 your text
1035 goes here
1036 HERE_TARGET
1037
1038But the HERE_TARGET must still be flush against the margin.
197aec24 1039If you want that indented also, you'll have to quote
5a964f20 1040in the indentation.
1041
1042 ($quote = <<' FINIS') =~ s/^\s+//gm;
1043 ...we will have peace, when you and all your works have
1044 perished--and the works of your dark master to whom you
1045 would deliver us. You are a liar, Saruman, and a corrupter
1046 of men's hearts. --Theoden in /usr/src/perl/taint.c
1047 FINIS
83ded9ee 1048 $quote =~ s/\s+--/\n--/;
5a964f20 1049
1050A nice general-purpose fixer-upper function for indented here documents
1051follows. It expects to be called with a here document as its argument.
1052It looks to see whether each line begins with a common substring, and
a6dd486b 1053if so, strips that substring off. Otherwise, it takes the amount of leading
1054whitespace found on the first line and removes that much off each
5a964f20 1055subsequent line.
1056
1057 sub fix {
1058 local $_ = shift;
a6dd486b 1059 my ($white, $leader); # common whitespace and common leading string
5a964f20 1060 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1061 ($white, $leader) = ($2, quotemeta($1));
1062 } else {
1063 ($white, $leader) = (/^(\s+)/, '');
1064 }
1065 s/^\s*?$leader(?:$white)?//gm;
1066 return $_;
1067 }
1068
c8db1d39 1069This works with leading special strings, dynamically determined:
5a964f20 1070
1071 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
1072 @@@ int
1073 @@@ runops() {
1074 @@@ SAVEI32(runlevel);
1075 @@@ runlevel++;
d92eb7b0 1076 @@@ while ( op = (*op->op_ppaddr)() );
5a964f20 1077 @@@ TAINT_NOT;
1078 @@@ return 0;
1079 @@@ }
1080 MAIN_INTERPRETER_LOOP
1081
a6dd486b 1082Or with a fixed amount of leading whitespace, with remaining
5a964f20 1083indentation correctly preserved:
1084
1085 $poem = fix<<EVER_ON_AND_ON;
1086 Now far ahead the Road has gone,
1087 And I must follow, if I can,
1088 Pursuing it with eager feet,
1089 Until it joins some larger way
1090 Where many paths and errands meet.
1091 And whither then? I cannot say.
1092 --Bilbo in /usr/src/perl/pp_ctl.c
1093 EVER_ON_AND_ON
1094
68dc0745 1095=head1 Data: Arrays
1096
65acb1b1 1097=head2 What is the difference between a list and an array?
1098
1099An array has a changeable length. A list does not. An array is something
1100you can push or pop, while a list is a set of values. Some people make
1101the distinction that a list is a value while an array is a variable.
1102Subroutines are passed and return lists, you put things into list
1103context, you initialize arrays with lists, and you foreach() across
1104a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
1105in scalar context behave like the number of elements in them, subroutines
a6dd486b 1106access their arguments through the array C<@_>, and push/pop/shift only work
65acb1b1 1107on arrays.
1108
1109As a side note, there's no such thing as a list in scalar context.
1110When you say
1111
1112 $scalar = (2, 5, 7, 9);
1113
d92eb7b0 1114you're using the comma operator in scalar context, so it uses the scalar
1115comma operator. There never was a list there at all! This causes the
1116last value to be returned: 9.
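
The following sketch contrasts the three cases. The C<do> block exists
only to keep the C<void> warning quiet for the one line that provokes
it:

```perl
use strict;
use warnings;

my @array = (2, 5, 7, 9);

my $count = @array;              # array in scalar context: 4 elements
my $last  = do {
    no warnings 'void';          # perl warns about the discarded values
    my $s = (2, 5, 7, 9);        # comma operator in scalar context
    $s;
};
my ($first) = (2, 5, 7, 9);      # list assignment: takes the first value

print "count=$count last=$last first=$first\n";   # count=4 last=9 first=2
```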
65acb1b1 1117
68dc0745 1118=head2 What is the difference between $array[1] and @array[1]?
1119
a6dd486b 1120The former is a scalar value; the latter an array slice, making
68dc0745 1121it a list with one (scalar) value. You should use $ when you want a
1122scalar value (most of the time) and @ when you want a list with one
1123scalar value in it (very, very rarely; nearly never, in fact).
1124
1125Sometimes it doesn't make a difference, but sometimes it does.
1126For example, compare:
1127
1128 $good[0] = `some program that outputs several lines`;
1129
1130with
1131
1132 @bad[0] = `same program that outputs several lines`;
1133
197aec24 1134The C<use warnings> pragma and the B<-w> flag will warn you about these
9f1b1f2d 1135matters.
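
The difference is one of context. In this sketch a hypothetical
C<context()> subroutine stands in for the backticks and simply reports
how it was called:

```perl
use strict;
use warnings;
no warnings 'syntax';   # silences "Scalar value @bad[0] better written as $bad[0]"

# Hypothetical stand-in for the backticks: reports its calling context.
sub context { return wantarray ? "list" : "scalar" }

my (@good, @bad);
$good[0] = context();   # scalar assignment: scalar context
@bad[0]  = context();   # slice assignment: list context

print "good: $good[0], bad: $bad[0]\n";   # good: scalar, bad: list
```

With real backticks, scalar context returns all the output as one
string, while list context returns one line per element, so the slice
assignment keeps only the first line.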
68dc0745 1136
d92eb7b0 1137=head2 How can I remove duplicate elements from a list or array?
68dc0745 1138
1139There are several possible ways, depending on whether the array is
1140ordered and whether you wish to preserve the ordering.
1141
1142=over 4
1143
551e1d92 1144=item a)
1145
1146If @in is sorted, and you want @out to be sorted:
5a964f20 1147(this assumes all true values in the array)
68dc0745 1148
a4341a65 1149 $prev = "not equal to $in[0]";
3bc5ef3e 1150 @out = grep($_ ne $prev && ($prev = $_, 1), @in);
68dc0745 1151
c8db1d39 1152This is nice in that it doesn't use much extra memory, simulating
3bc5ef3e 1153uniq(1)'s behavior of removing only adjacent duplicates. The ", 1"
1154guarantees that the expression is true (so that grep picks it up)
1155even if the $_ is 0, "", or undef.
68dc0745 1156
551e1d92 1157=item b)
1158
1159If you don't know whether @in is sorted:
68dc0745 1160
1161 undef %saw;
1162 @out = grep(!$saw{$_}++, @in);
1163
551e1d92 1164=item c)
1165
1166Like (b), but @in contains only small integers:
68dc0745 1167
1168 @out = grep(!$saw[$_]++, @in);
1169
551e1d92 1170=item d)
1171
1172A way to do (b) without any loops or greps:
68dc0745 1173
1174 undef %saw;
1175 @saw{@in} = ();
1176 @out = sort keys %saw; # remove sort if undesired
1177
551e1d92 1178=item e)
1179
1180Like (d), but @in contains only small positive integers:
68dc0745 1181
1182 undef @ary;
1183 @ary[@in] = @in;
87275199 1184 @out = grep {defined} @ary;
68dc0745 1185
1186=back
1187
65acb1b1 1188But perhaps you should have been using a hash all along, eh?
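
For instance, approach (b) keeps the first occurrence of each value in
its original position. A minimal sketch with invented data:

```perl
use strict;
use warnings;

my @in = qw(pear apple pear fig apple kiwi);

# Approach (b): keep the first occurrence of each value, in original order.
my %saw;
my @out = grep { !$saw{$_}++ } @in;

print "@out\n";   # pear apple fig kiwi
```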
1189
ddbc1f16 1190=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1191
1192Hearing the word "in" is an I<in>dication that you probably should have
1193used a hash, not a list or array, to store your data. Hashes are
1194designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1195
5a964f20 1196That being said, there are several ways to approach this. If you
1197are going to make this query many times over arbitrary string values,
881bdbd4 1198the fastest way is probably to invert the original array and maintain a
1199hash whose keys are the first array's values.
68dc0745 1200
1201 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
881bdbd4 1202 %is_blue = ();
68dc0745 1203 for (@blues) { $is_blue{$_} = 1 }
1204
1205Now you can check whether $is_blue{$some_color}. It might have been a
1206good idea to keep the blues all in a hash in the first place.
1207
1208If the values are all small integers, you could use a simple indexed
1209array. This kind of an array will take up less space:
1210
1211 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
881bdbd4 1212 @is_tiny_prime = ();
d92eb7b0 1213 for (@primes) { $is_tiny_prime[$_] = 1 }
 # or simply @is_tiny_prime[@primes] = (1) x @primes;
68dc0745 1215
1216Now you check whether $is_tiny_prime[$some_number].
1217
1218If the values in question are integers instead of strings, you can save
1219quite a lot of space by using bit strings instead:
1220
1221 @articles = ( 1..10, 150..2000, 2017 );
1222 undef $read;
7b8d334a 1223 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1224
1225Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1226
1227Please do not use
1228
a6dd486b 1229 ($is_there) = grep $_ eq $whatever, @array;
68dc0745 1230
1231or worse yet
1232
a6dd486b 1233 ($is_there) = grep /$whatever/, @array;
68dc0745 1234
1235These are slow (checks every element even if the first matches),
1236inefficient (same reason), and potentially buggy (what if there are
d92eb7b0 1237regex characters in $whatever?). If you're only testing once, then
65acb1b1 1238use:
1239
1240 $is_there = 0;
1241 foreach $elt (@array) {
1242 if ($elt eq $elt_to_find) {
1243 $is_there = 1;
1244 last;
1245 }
1246 }
1247 if ($is_there) { ... }
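
If you do go the hash route, a hash slice builds the lookup table in
one statement. A sketch, where the test colors are arbitrary:

```perl
use strict;
use warnings;

my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;

# Build the lookup hash in one step with a hash slice.
my %is_blue;
@is_blue{@blues} = (1) x @blues;

for my $color (qw/teal crimson/) {
    print "$color is ", ( $is_blue{$color} ? "" : "not " ), "blue\n";
}
```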
68dc0745 1248
1249=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1250
1251Use a hash. Here's code to do both and more. It assumes that
1252each element is unique in a given array:
1253
1254 @union = @intersection = @difference = ();
1255 %count = ();
1256 foreach $element (@array1, @array2) { $count{$element}++ }
1257 foreach $element (keys %count) {
1258 push @union, $element;
1259 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1260 }
1261
d92eb7b0 1262Note that this is the I<symmetric difference>, that is, all elements in
a6dd486b 1263either A or in B but not in both. Think of it as an xor operation.
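
Here is the same technique run on two small made-up arrays. The keys
are sorted only to make the output predictable. The one-sided
difference at the end is an addition for this example, not part of the
code above:

```perl
use strict;
use warnings;

my @array1 = (1, 2, 3, 4);
my @array2 = (3, 4, 5, 6);

my (@union, @intersection, @difference);
my %count;
$count{$_}++ for @array1, @array2;
for my $element (sort { $a <=> $b } keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

# One-sided difference: elements of @array1 that are not in @array2.
my %in_2 = map { $_ => 1 } @array2;
my @only_in_1 = grep { !$in_2{$_} } @array1;

print "union:        @union\n";          # 1 2 3 4 5 6
print "intersection: @intersection\n";   # 3 4
print "difference:   @difference\n";     # 1 2 5 6
print "only in 1:    @only_in_1\n";      # 1 2
```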
d92eb7b0 1264
65acb1b1 1265=head2 How do I test whether two arrays or hashes are equal?
1266
1267The following code works for single-level arrays. It uses a stringwise
1268comparison, and does not distinguish defined versus undefined empty
1269strings. Modify if you have other needs.
1270
1271 $are_equal = compare_arrays(\@frogs, \@toads);
1272
1273 sub compare_arrays {
1274 my ($first, $second) = @_;
9f1b1f2d 1275 no warnings; # silence spurious -w undef complaints
65acb1b1 1276 return 0 unless @$first == @$second;
1277 for (my $i = 0; $i < @$first; $i++) {
1278 return 0 if $first->[$i] ne $second->[$i];
1279 }
1280 return 1;
1281 }
1282
1283For multilevel structures, you may wish to use an approach more
1284like this one. It uses the CPAN module FreezeThaw:
1285
1286 use FreezeThaw qw(cmpStr);
1287 @a = @b = ( "this", "that", [ "more", "stuff" ] );
1288
1289 printf "a and b contain %s arrays\n",
197aec24 1290 cmpStr(\@a, \@b) == 0
1291 ? "the same"
65acb1b1 1292 : "different";
1293
1294This approach also works for comparing hashes. Here
1295we'll demonstrate two different answers:
1296
1297 use FreezeThaw qw(cmpStr cmpStrHard);
1298
1299 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1300 $a{EXTRA} = \%b;
197aec24 1301 $b{EXTRA} = \%a;
65acb1b1 1302
1303 printf "a and b contain %s hashes\n",
1304 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1305
1306 printf "a and b contain %s hashes\n",
1307 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1308
1309
The first reports that both hashes contain the same data,
while the second reports that they do not. Which you prefer is left as
an exercise to the reader.
1313
68dc0745 1314=head2 How do I find the first array element for which a condition is true?
1315
49d635f9 1316To find the first array element which satisfies a condition, you can
1317use the first() function in the List::Util module, which comes with
1318Perl 5.8. This example finds the first element that contains "Perl".
1319
1320 use List::Util qw(first);
197aec24 1321
49d635f9 1322 my $element = first { /Perl/ } @array;
197aec24 1323
49d635f9 1324If you cannot use List::Util, you can make your own loop to do the
1325same thing. Once you find the element, you stop the loop with last.
1326
1327 my $found;
1328 foreach my $element ( @array )
1329 {
     if( $element =~ /Perl/ ) { $found = $element; last }
1331 }
1332
1333If you want the array index, you can iterate through the indices
1334and check the array element at each index until you find one
1335that satisfies the condition.
1336
197aec24 1337 my( $found, $index ) = ( undef, -1 );
 for( my $i = 0; $i < @array; $i++ )
49d635f9 1339 {
197aec24 1340 if( $array[$i] =~ /Perl/ )
1341 {
49d635f9 1342 $found = $array[$i];
197aec24 1343 $index = $i;
49d635f9 1344 last;
1345 }
68dc0745 1346 }
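
If List::Util is available, one way to get the index without writing
the loop yourself is to hand C<first> the indices rather than the
elements. This is only a sketch, and the sample array is invented:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ("Python rocks", "I like Perl", "Perl too");

# Search the indices instead of the elements.
# first() returns undef if nothing matches.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print defined $index
    ? "found at index $index: $array[$index]\n"
    : "not found\n";
```

Test the result with C<defined>, since an index of 0 is a perfectly
good answer but is false.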
68dc0745 1347
1348=head2 How do I handle linked lists?
1349
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either end,
or you can use splice to add and/or remove an arbitrary number of elements at
arbitrary points. Both pop and shift are O(1) operations on Perl's
dynamic arrays. In the absence of shifts and pops, push in general
needs to reallocate on the order of every log(N) times, and unshift will
need to copy pointers each time.
68dc0745 1357
1358If you really, really wanted, you could use structures as described in
1359L<perldsc> or L<perltoot> and do just what the algorithm book tells you
65acb1b1 1360to do. For example, imagine a list node like this:
1361
1362 $node = {
1363 VALUE => 42,
1364 LINK => undef,
1365 };
1366
1367You could walk the list this way:
1368
1369 print "List: ";
1370 for ($node = $head; $node; $node = $node->{LINK}) {
1371 print $node->{VALUE}, " ";
1372 }
1373 print "\n";
1374
a6dd486b 1375You could add to the list this way:
65acb1b1 1376
1377 my ($head, $tail);
1378 $tail = append($head, 1); # grow a new head
1379 for $value ( 2 .. 10 ) {
1380 $tail = append($tail, $value);
1381 }
1382
1383 sub append {
1384 my($list, $value) = @_;
1385 my $node = { VALUE => $value };
1386 if ($list) {
1387 $node->{LINK} = $list->{LINK};
1388 $list->{LINK} = $node;
1389 } else {
1390 $_[0] = $node; # replace caller's version
1391 }
1392 return $node;
1393 }
1394
But again, Perl's built-in arrays are virtually always good enough.
68dc0745 1396
1397=head2 How do I handle circular lists?
1398
1399Circular lists could be handled in the traditional fashion with linked
1400lists, or you could just do something like this with an array:
1401
1402 unshift(@array, pop(@array)); # the last shall be first
1403 push(@array, shift(@array)); # and vice versa
1404
1405=head2 How do I shuffle an array randomly?
1406
45bbf655 1407If you either have Perl 5.8.0 or later installed, or if you have
1408Scalar-List-Utils 1.03 or later installed, you can say:
1409
f05bbc40 1410 use List::Util 'shuffle';
45bbf655 1411
1412 @shuffled = shuffle(@list);
1413
f05bbc40 1414If not, you can use a Fisher-Yates shuffle.
5a964f20 1415
5a964f20 1416 sub fisher_yates_shuffle {
cc30d1a7 1417 my $deck = shift; # $deck is a reference to an array
1418 my $i = @$deck;
f05bbc40 1419 while ($i--) {
5a964f20 1420 my $j = int rand ($i+1);
cc30d1a7 1421 @$deck[$i,$j] = @$deck[$j,$i];
5a964f20 1422 }
1423 }
1424
cc30d1a7 1425 # shuffle my mpeg collection
1426 #
1427 my @mpeg = <audio/*/*.mp3>;
1428 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1429 print @mpeg;
5a964f20 1430
45bbf655 1431Note that the above implementation shuffles an array in place,
1432unlike the List::Util::shuffle() which takes a list and returns
1433a new shuffled list.
1434
You've probably seen shuffling algorithms that work using splice,
randomly picking an element to move from the old list to the new one:
68dc0745 1437
1438 srand;
1439 @new = ();
1440 @old = 1 .. 10; # just a demo
1441 while (@old) {
1442 push(@new, splice(@old, rand @old, 1));
1443 }
1444
5a964f20 1445This is bad because splice is already O(N), and since you do it N times,
1446you just invented a quadratic algorithm; that is, O(N**2). This does
1447not scale, although Perl is so efficient that you probably won't notice
1448this until you have rather largish arrays.
68dc0745 1449
1450=head2 How do I process/modify each element of an array?
1451
1452Use C<for>/C<foreach>:
1453
1454 for (@lines) {
5a964f20 1455 s/foo/bar/; # change that word
1456 y/XZ/ZX/; # swap those letters
68dc0745 1457 }
1458
1459Here's another; let's compute spherical volumes:
1460
5a964f20 1461 for (@volumes = @radii) { # @volumes has changed parts
68dc0745 1462 $_ **= 3;
1463 $_ *= (4/3) * 3.14159; # this will be constant folded
1464 }
197aec24 1465
49d635f9 1466which can also be done with map() which is made to transform
1467one list into another:
1468
1469 @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
68dc0745 1470
76817d6d 1471If you want to do the same thing to modify the values of the
1472hash, you can use the C<values> function. As of Perl 5.6
1473the values are not copied, so if you modify $orbit (in this
1474case), you modify the value.
5a964f20 1475
76817d6d 1476 for $orbit ( values %orbits ) {
197aec24 1477 ($orbit **= 3) *= (4/3) * 3.14159;
5a964f20 1478 }
818c4caa 1479
76817d6d 1480Prior to perl 5.6 C<values> returned copies of the values,
1481so older perl code often contains constructions such as
1482C<@orbits{keys %orbits}> instead of C<values %orbits> where
1483the hash is to be modified.
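
A quick check of that aliasing behavior, with a made-up hash:

```perl
use strict;
use warnings;

my %orbits = ( inner => 1, outer => 2 );

# Each $orbit is an alias to the stored value, so this edits the hash in place.
for my $orbit ( values %orbits ) {
    $orbit *= 10;
}

print "$_ => $orbits{$_}\n" for sort keys %orbits;   # inner => 10, outer => 20
```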
818c4caa 1484
68dc0745 1485=head2 How do I select a random element from an array?
1486
1487Use the rand() function (see L<perlfunc/rand>):
1488
68dc0745 1489 $index = rand @array;
1490 $element = $array[$index];
1491
Or, simply:

 my $element = $array[ rand @array ];
5a964f20 1494
68dc0745 1495=head2 How do I permute N elements of a list?
1496
49d635f9 1497Use the List::Permutor module on CPAN. If the list is
1498actually an array, try the Algorithm::Permute module (also
1499on CPAN). It's written in XS code and is very efficient.
1500
1501 use Algorithm::Permute;
1502 my @array = 'a'..'d';
1503 my $p_iterator = Algorithm::Permute->new ( \@array );
1504 while (my @perm = $p_iterator->next) {
1505 print "next permutation: (@perm)\n";
1506 }
1507
197aec24 1508For even faster execution, you could do:
1509
1510 use Algorithm::Permute;
1511 my @array = 'a'..'d';
1512 Algorithm::Permute::permute {
1513 print "next permutation: (@array)\n";
1514 } @array;
1515
49d635f9 1516Here's a little program that generates all permutations of
1517all the words on each line of input. The algorithm embodied
1518in the permute() function is discussed in Volume 4 (still
1519unpublished) of Knuth's I<The Art of Computer Programming>
1520and will work on any list:
1521
1522 #!/usr/bin/perl -n
 # Fischer-Krause ordered permutation generator
1524
1525 sub permute (&@) {
1526 my $code = shift;
1527 my @idx = 0..$#_;
1528 while ( $code->(@_[@idx]) ) {
1529 my $p = $#idx;
1530 --$p while $idx[$p-1] > $idx[$p];
1531 my $q = $p or return;
1532 push @idx, reverse splice @idx, $p;
1533 ++$q while $idx[$p-1] > $idx[$q];
1534 @idx[$p-1,$q]=@idx[$q,$p-1];
1535 }
68dc0745 1536 }
68dc0745 1537
49d635f9 1538 permute {print"@_\n"} split;
b8d2732a 1539
68dc0745 1540=head2 How do I sort an array by (anything)?
1541
1542Supply a comparison function to sort() (described in L<perlfunc/sort>):
1543
1544 @list = sort { $a <=> $b } @list;
1545
1546The default sort function is cmp, string comparison, which would
c47ff5f1 1547sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
68dc0745 1548the numerical comparison operator.
1549
1550If you have a complicated function needed to pull out the part you
1551want to sort on, then don't do it inside the sort function. Pull it
1552out first, because the sort BLOCK can be called many times for the
1553same element. Here's an example of how to pull out the first word
1554after the first number on each item, and then sort those words
1555case-insensitively.
1556
1557 @idx = ();
1558 for (@data) {
1559 ($item) = /\d+\s*(\S+)/;
1560 push @idx, uc($item);
1561 }
1562 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1563
a6dd486b 1564which could also be written this way, using a trick
68dc0745 1565that's come to be known as the Schwartzian Transform:
1566
1567 @sorted = map { $_->[0] }
1568 sort { $a->[1] cmp $b->[1] }
d92eb7b0 1569 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 1570
1571If you need to sort on several fields, the following paradigm is useful.
1572
1573 @sorted = sort { field1($a) <=> field1($b) ||
1574 field2($a) cmp field2($b) ||
1575 field3($a) cmp field3($b)
1576 } @data;
1577
1578This can be conveniently combined with precalculation of keys as given
1579above.
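
As a sketch of that combination, the following computes both keys once
per item and then sorts numerically on the leading number and
case-insensitively on the word after it. The data is invented for the
example:

```perl
use strict;
use warnings;

my @data = ("10 banana", "2 cherry", "10 Apple");

# Precompute [ original, number, upper-cased word ], sort on the two
# keys in turn, then unwrap the original strings.
my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] || $a->[2] cmp $b->[2] }
             map  { my ($n, $w) = /(\d+)\s*(\S+)/; [ $_, $n, uc $w ] } @data;

print "$_\n" for @sorted;   # 2 cherry, 10 Apple, 10 banana
```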
1580
379e39d7 1581See the F<sort> article in the "Far More Than You Ever Wanted
49d635f9 1582To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
06a5f41f 1583more about this approach.
68dc0745 1584
1585See also the question below on sorting hashes.
1586
1587=head2 How do I manipulate arrays of bits?
1588
1589Use pack() and unpack(), or else vec() and the bitwise operations.
1590
For example, this sets bit N in $vec for every number N in @ints:
1592
1593 $vec = '';
1594 foreach(@ints) { vec($vec,$_,1) = 1 }
1595
cc30d1a7 1596Here's how, given a vector in $vec, you can
68dc0745 1597get those bits into your @ints array:
1598
1599 sub bitvec_to_list {
1600 my $vec = shift;
1601 my @ints;
1602 # Find null-byte density then select best algorithm
1603 if ($vec =~ tr/\0// / length $vec > 0.95) {
1604 use integer;
1605 my $i;
1606 # This method is faster with mostly null-bytes
1607 while($vec =~ /[^\0]/g ) {
1608 $i = -9 + 8 * pos $vec;
1609 push @ints, $i if vec($vec, ++$i, 1);
1610 push @ints, $i if vec($vec, ++$i, 1);
1611 push @ints, $i if vec($vec, ++$i, 1);
1612 push @ints, $i if vec($vec, ++$i, 1);
1613 push @ints, $i if vec($vec, ++$i, 1);
1614 push @ints, $i if vec($vec, ++$i, 1);
1615 push @ints, $i if vec($vec, ++$i, 1);
1616 push @ints, $i if vec($vec, ++$i, 1);
1617 }
1618 } else {
1619 # This method is a fast general algorithm
1620 use integer;
1621 my $bits = unpack "b*", $vec;
1622 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1623 push @ints, pos $bits while($bits =~ /1/g);
1624 }
1625 return \@ints;
1626 }
1627
1628This method gets faster the more sparse the bit vector is.
1629(Courtesy of Tim Bunce and Winfried Koenig.)
1630
76817d6d 1631You can make the while loop a lot shorter with this suggestion
1632from Benjamin Goldberg:
1633
1634 while($vec =~ /[^\0]+/g ) {
1635 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1636 }
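
For small vectors you can check the round trip directly. This sketch
uses a brute-force scan over every bit, which is simple but not what
you want for large, sparse vectors:

```perl
use strict;
use warnings;

my @ints = (1, 3, 8);

# Set one bit per listed position.
my $vec = '';
vec($vec, $_, 1) = 1 for @ints;

# "b*" shows each byte's bits in ascending order, lowest bit first.
my $bits = unpack("b*", $vec);                 # 0101000010000000

# Read the positions back by testing every bit.
my @back = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;

print "$bits\n@back\n";                        # second line: 1 3 8
```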
1637
cc30d1a7 1638Or use the CPAN module Bit::Vector:
1639
1640 $vector = Bit::Vector->new($num_of_bits);
1641 $vector->Index_List_Store(@ints);
1642 @ints = $vector->Index_List_Read();
1643
Bit::Vector provides efficient methods for bit vectors, sets of small integers
and "big int" math.
cc30d1a7 1646
1647Here's a more extensive illustration using vec():
65acb1b1 1648
1649 # vec demo
1650 $vector = "\xff\x0f\xef\xfe";
197aec24 1651 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
65acb1b1 1652 unpack("N", $vector), "\n";
1653 $is_set = vec($vector, 23, 1);
1654 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1655 pvec($vector);
1656
1657 set_vec(1,1,1);
1658 set_vec(3,1,1);
1659 set_vec(23,1,1);
1660
1661 set_vec(3,1,3);
1662 set_vec(3,2,3);
1663 set_vec(3,4,3);
1664 set_vec(3,4,7);
1665 set_vec(3,8,3);
1666 set_vec(3,8,7);
1667
1668 set_vec(0,32,17);
1669 set_vec(1,32,17);
1670
197aec24 1671 sub set_vec {
65acb1b1 1672 my ($offset, $width, $value) = @_;
1673 my $vector = '';
1674 vec($vector, $offset, $width) = $value;
1675 print "offset=$offset width=$width value=$value\n";
1676 pvec($vector);
1677 }
1678
1679 sub pvec {
1680 my $vector = shift;
1681 my $bits = unpack("b*", $vector);
1682 my $i = 0;
1683 my $BASE = 8;
1684
1685 print "vector length in bytes: ", length($vector), "\n";
1686 @bytes = unpack("A8" x length($vector), $bits);
1687 print "bits are: @bytes\n\n";
197aec24 1688 }
65acb1b1 1689
68dc0745 1690=head2 Why does defined() return true on empty arrays and hashes?
1691
65acb1b1 1692The short story is that you should probably only use defined on scalars or
1693functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1694in the 5.004 release or later of Perl for more detail.
68dc0745 1695
1696=head1 Data: Hashes (Associative Arrays)
1697
1698=head2 How do I process an entire hash?
1699
1700Use the each() function (see L<perlfunc/each>) if you don't care
1701whether it's sorted:
1702
5a964f20 1703 while ( ($key, $value) = each %hash) {
68dc0745 1704 print "$key = $value\n";
1705 }
1706
1707If you want it sorted, you'll have to use foreach() on the result of
1708sorting the keys as shown in an earlier question.
1709
1710=head2 What happens if I add or remove keys from a hash while iterating over it?
1711
28b41a80 1712(contributed by brian d foy)
d92eb7b0 1713
28b41a80 1714The easy answer is "Don't do that!"
d92eb7b0 1715
28b41a80 1716If you iterate through the hash with each(), you can delete the key
1717most recently returned without worrying about it. If you delete or add
1718other keys, the iterator may skip or double up on them since perl
1719may rearrange the hash table. See the
1720entry for C<each()> in L<perlfunc>.
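The following sketch, using an invented hash, shows the one deletion
that is safe inside an each() loop, and then one way to add keys: loop
over the list that keys() returns, which is a snapshot, rather than
over the live iterator.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3, d => 4);

# Deleting the key that each() most recently returned is safe.
while (my ($key, $value) = each %hash) {
    delete $hash{$key} if $value % 2 == 0;
}

# keys() builds its list up front, so adding entries inside this
# loop does not disturb the iteration.
for my $key (keys %hash) {
    $hash{"${key}_copy"} = $hash{$key};
}

print join(", ", sort keys %hash), "\n";
```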
68dc0745 1721
1722=head2 How do I look up a hash element by value?
1723
Create a reverse hash:

    %by_value = reverse %by_key;
    $key = $by_value{$value};

That's not particularly efficient, because C<reverse> first flattens
the whole hash into a temporary list.  It would be more space-efficient
to use:

    while (($key, $value) = each %by_key) {
        $by_value{$value} = $key;
    }

If your hash could have repeated values, the methods above will only find
one of the associated keys.  This may or may not worry you.  If it does
worry you, you can always reverse the hash into a hash of arrays instead:

    while (($key, $value) = each %by_key) {
        push @{$key_list_by_value{$value}}, $key;
    }
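Here is a small sketch of the hash-of-arrays approach with invented
data, showing how to pull back every key that shares a value:

```perl
use strict;
use warnings;

my %by_key = (apple => "red", cherry => "red", banana => "yellow");

my %key_list_by_value;
while (my ($key, $value) = each %by_key) {
    push @{ $key_list_by_value{$value} }, $key;
}

# Each value now maps to a reference to an array of keys.
my @red_keys = sort @{ $key_list_by_value{red} };
print "red: @red_keys\n";
```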
68dc0745 1743
1744=head2 How can I know how many entries are in a hash?
1745
If you mean how many keys, then all you have to do is
use the keys() function in a scalar context:

    $num_keys = keys %hash;

The keys() function also resets the iterator, which means that you may
see strange results if you use this between uses of other hash operators
such as each().
68dc0745 1754
1755=head2 How do I sort a hash (optionally by value instead of key)?
1756
Internally, hashes are stored in a way that prevents you from imposing
an order on key-value pairs.  Instead, you have to sort a list of the
keys or values:

    @keys = sort keys %hash;    # sorted by key
    @keys = sort {
        $hash{$a} cmp $hash{$b}
    } keys %hash;               # and by value

Here we'll do a reverse numeric sort by value, and if two values are
identical, sort by length of key, or if that fails, by straight ASCII
comparison of the keys (well, possibly modified by your locale--see
L<perllocale>).

    @keys = sort {
        $hash{$b} <=> $hash{$a}
            ||
        length($b) <=> length($a)
            ||
        $a cmp $b
    } keys %hash;
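To see the three-level comparison at work, here is a sketch with
invented data in which three entries share the same value, so the
tie-breakers decide their order:

```perl
use strict;
use warnings;

my %hash = (pear => 3, fig => 3, apple => 1, kiwi => 3);

my @keys = sort {
    $hash{$b} <=> $hash{$a}          # larger values first
        ||
    length($b) <=> length($a)        # then longer keys first
        ||
    $a cmp $b                        # then plain string order
} keys %hash;

print "@keys\n";
```

The keys "kiwi" and "pear" tie on both value and length, so the final
string comparison places "kiwi" first.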
1778
1779=head2 How can I always keep my hash sorted?
1780
1781You can look into using the DB_File module and tie() using the
1782$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
5a964f20 1783The Tie::IxHash module from CPAN might also be instructive.
68dc0745 1784
1785=head2 What's the difference between "delete" and "undef" with hashes?
1786
Hashes contain pairs of scalars: the first is the key, the
second is the value.  The key will be coerced to a string,
although the value can be any kind of scalar: string,
number, or reference.  If a key C<$key> is present in
C<%hash>, C<exists($hash{$key})> will return true.  The value
for a given key can be C<undef>, in which case
C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
will return true.  This corresponds to (C<$key>, C<undef>)
being in the hash.

Pictures help... here's the C<%hash> table:

      keys  values
    +------+------+
    |  a   |  3   |
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

And these conditions hold:

    $hash{'a'}                       is true
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is true
    exists $hash{'a'}                is true (Perl5 only)
    grep ($_ eq 'a', keys %hash)     is true

If you now say

    undef $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  a   | undef|
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is FALSE
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is FALSE
    exists $hash{'a'}                is true (Perl5 only)
    grep ($_ eq 'a', keys %hash)     is true

Notice the last two: you have an undef value, but a defined key!

Now, consider this:

    delete $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is false
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is false
    exists $hash{'a'}                is FALSE (Perl5 only)
    grep ($_ eq 'a', keys %hash)     is FALSE

See, the whole entry is gone!
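The same sequence can be checked in a few lines. This is only a sketch
that records what exists() and defined() report after each step; the
variable names are invented.

```perl
use strict;
use warnings;

my %hash = (a => 3, x => 7, d => 0, e => 2);

undef $hash{'a'};
my $exists_after_undef  = exists  $hash{'a'} ? 1 : 0;   # key is still there
my $defined_after_undef = defined $hash{'a'} ? 1 : 0;   # but its value is undef

delete $hash{'a'};
my $exists_after_delete = exists $hash{'a'} ? 1 : 0;    # whole entry is gone

print "after undef:  exists=$exists_after_undef defined=$defined_after_undef\n";
print "after delete: exists=$exists_after_delete keys=", scalar(keys %hash), "\n";
```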
1865
1866=head2 Why don't my tied hashes make the defined/exists distinction?
1867
92993692 1868This depends on the tied hash's implementation of EXISTS().
1869For example, there isn't the concept of undef with hashes
1870that are tied to DBM* files. It also means that exists() and
1871defined() do the same thing with a DBM* file, and what they
1872end up doing is not what they do with ordinary hashes.
68dc0745 1873
1874=head2 How do I reset an each() operation part-way through?
1875
5a964f20 1876Using C<keys %hash> in scalar context returns the number of keys in
68dc0745 1877the hash I<and> resets the iterator associated with the hash. You may
1878need to do this if you use C<last> to exit a loop early so that when you
46fc3d4c 1879re-enter it, the hash iterator has been reset.
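A minimal sketch of the reset, with an invented hash: one each() call
stands in for a loop that was left early, and C<keys> in scalar
context then rewinds the iterator so the next loop sees every entry.

```perl
use strict;
use warnings;

my %hash = (one => 1, two => 2, three => 3);

# Simulate leaving a loop early: fetch one pair and stop, which
# leaves the iterator parked part-way through the hash.
my ($first_key) = each %hash;

# keys() in scalar context counts the keys and resets the iterator.
my $count = keys %hash;

# This loop now starts from the beginning and visits every key.
my @seen;
while (my ($key, $value) = each %hash) {
    push @seen, $key;
}
print "saw ", scalar(@seen), " of $count keys\n";
```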
68dc0745 1880
1881=head2 How can I get the unique keys from two hashes?
1882
First you extract the keys from the hashes into lists, then solve
the "removing duplicates" problem described above.  For example:

    %seen = ();
    for $element (keys(%foo), keys(%bar)) {
        $seen{$element}++;
    }
    @uniq = keys %seen;

Or more succinctly:

    @uniq = keys %{{%foo,%bar}};

Or if you really want to save space:

    %seen = ();
    while (defined ($key = each %foo)) {
        $seen{$key}++;
    }
    while (defined ($key = each %bar)) {
        $seen{$key}++;
    }
    @uniq = keys %seen;
1906
1907=head2 How can I store a multidimensional array in a DBM file?
1908
1909Either stringify the structure yourself (no fun), or else
1910get the MLDBM (which uses Data::Dumper) module from CPAN and layer
1911it on top of either DB_File or GDBM_File.
1912
1913=head2 How can I make my hash remember the order I put elements into it?
1914
Use the Tie::IxHash module from CPAN.

    use Tie::IxHash;
    tie my %myhash, 'Tie::IxHash';
    for (my $i=0; $i<20; $i++) {
        $myhash{$i} = 2*$i;
    }
    my @keys = keys %myhash;
    # @keys = (0,1,2,3,...)
1924
68dc0745 1925=head2 Why does passing a subroutine an undefined element in a hash create it?
1926
If you say something like:

    somefunc($hash{"nonesuch key here"});

Then that element "autovivifies"; that is, it springs into existence
whether you store something there or not.  That's because functions
get scalars passed in by reference.  If somefunc() modifies C<$_[0]>,
it has to be ready to write it back into the caller's version.

This has been fixed as of Perl5.004.

Normally, merely accessing a key's value for a nonexistent key does
I<not> cause that key to be forever there.  This is different from
awk's behavior.
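On a Perl with the fix, the behavior can be observed with a sketch
like the following. The helper names C<peek> and C<poke> are invented
for illustration: one only reads its argument, the other assigns
through C<$_[0]>.

```perl
use strict;
use warnings;

my %hash;

sub peek { my $copy = $_[0]; return }   # only reads its argument
sub poke { $_[0] = "stored"; return }   # assigns through @_

my $value = $hash{"plain read"};        # a plain read creates nothing
peek($hash{"passed but only read"});    # neither does passing it to a reader
poke($hash{"passed and assigned"});     # assigning through $_[0] creates it

print join(", ", keys %hash), "\n";
```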
1941
fc36a67e 1942=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 1943
Usually a hash ref, perhaps like this:

    $record = {
        NAME   => "Jason",
        EMPNO  => 132,
        TITLE  => "deputy peon",
        AGE    => 23,
        SALARY => 37_000,
        PALS   => [ "Norbert", "Rhys", "Phineas"],
    };
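Fields of such a record are reached through the reference. The sketch
below repeats the record so that it runs on its own; the variables
C<$name>, C<@pals>, and C<$first_pal> are invented for the example.

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

# Scalar fields are reached with the arrow operator.
my $name = $record->{NAME};
$record->{AGE}++;

# A nested array is dereferenced before being used as a list.
my @pals      = @{ $record->{PALS} };
my $first_pal = $record->{PALS}[0];

print "$name is $record->{AGE}; pals: @pals\n";
```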
1954
References are documented in L<perlref> and L<perlreftut>.
Examples of complex data structures are given in L<perldsc> and
L<perllol>.  Examples of structures and object-oriented classes are
in L<perltoot>.
68dc0745 1959
1960=head2 How can I use a reference as a hash key?
1961
fe854a6f 1962You can't do this directly, but you could use the standard Tie::RefHash
87275199 1963module distributed with Perl.
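A minimal sketch of Tie::RefHash in use, with invented variable names.
The point is that the key comes back from keys() as a live reference
rather than as a string:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %color_of, 'Tie::RefHash';

my $point = [ 1, 2 ];
$color_of{$point} = "red";

# With an ordinary hash the key would come back as a string such as
# "ARRAY(0x...)"; here it is still a usable array reference.
my ($key) = keys %color_of;
print "key is a ", ref($key), " holding @$key\n";
```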
68dc0745 1964
1965=head1 Data: Misc
1966
1967=head2 How do I handle binary data correctly?
1968
1969Perl is binary clean, so this shouldn't be a problem. For example,
1970this works fine (assuming the files are found):
1971
    if (`cat /vmunix` =~ /gzip/) {
        print "Your kernel is GNU-zip enabled!\n";
    }
1975
d92eb7b0 1976On less elegant (read: Byzantine) systems, however, you have
1977to play tedious games with "text" versus "binary" files. See
49d635f9 1978L<perlfunc/"binmode"> or L<perlopentut>.
68dc0745 1979
1980If you're concerned about 8-bit ASCII data, then see L<perllocale>.
1981
54310121 1982If you want to deal with multibyte characters, however, there are
68dc0745 1983some gotchas. See the section on Regular Expressions.
1984
1985=head2 How do I determine whether a scalar is a number/whole/integer/float?
1986
1987Assuming that you don't care about IEEE notations like "NaN" or
1988"Infinity", you probably just want to use a regular expression.
1989
    if (/\D/)            { print "has nondigits\n" }
    if (/^\d+$/)         { print "is a whole number\n" }
    if (/^-?\d+$/)       { print "is an integer\n" }
    if (/^[+-]?\d+$/)    { print "is a +/- integer\n" }
    if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
    if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
    if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
                         { print "a C float\n" }
68dc0745 1998
f0d19b68 1999There are also some commonly used modules for the task.
2000L<Scalar::Util> (distributed with 5.8) provides access to perl's
2001internal function C<looks_like_number> for determining
2002whether a variable looks like a number. L<Data::Types>
2003exports functions that validate data types using both the
2004above and other regular expressions. Thirdly, there is
2005C<Regexp::Common> which has regular expressions to match
2006various types of numbers. Those three modules are available
2007from the CPAN.
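For instance, a short sketch using C<looks_like_number> from
Scalar::Util; the candidate strings and the C<%verdict> hash are
invented for illustration:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @candidates = ("42", "-3.14", "1e10", "0x1F", "abc", "");

# Record a 1 or 0 for each candidate string.
my %verdict;
for my $candidate (@candidates) {
    $verdict{$candidate} = looks_like_number($candidate) ? 1 : 0;
    printf "%-8s %s\n", "'$candidate'",
        $verdict{$candidate} ? "looks like a number"
                             : "does not look like a number";
}
```

Note that a hex-style string such as "0x1F" is not accepted, since
Perl does not treat it as a number when converting from a string.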
2008
If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function.  Its semantics are somewhat cumbersome, so here's a C<getnum>
wrapper function for more convenient access.  This function takes
a string and returns the number it found, or C<undef> for input that
isn't a C float.  The C<is_numeric> function is a front end to C<getnum>
if you just want to say, "Is this a float?"

    sub getnum {
        use POSIX qw(strtod);
        my $str = shift;
        $str =~ s/^\s+//;
        $str =~ s/\s+$//;
        $! = 0;
        my($num, $unparsed) = strtod($str);
        if (($str eq '') || ($unparsed != 0) || $!) {
            return undef;
        } else {
            return $num;
        }
    }

    sub is_numeric { defined getnum($_[0]) }

Or you could check out the L<String::Scanf> module on the CPAN
instead.  The POSIX module (part of the standard Perl distribution) provides
the C<strtod> and C<strtol> functions for converting strings to doubles
and longs, respectively.
68dc0745 2036
2037=head2 How do I keep persistent data across program calls?
2038
2039For some specific applications, you can use one of the DBM modules.
fe854a6f 2040See L<AnyDBM_File>. More generically, you should consult the FreezeThaw
2041or Storable modules from CPAN. Starting from Perl 5.8 Storable is part
2042of the standard distribution. Here's one example using Storable's C<store>
2043and C<retrieve> functions:
65acb1b1 2044
    use Storable;
    store(\%hash, "filename");

    # later on...
    $href = retrieve("filename");        # by ref
    %hash = %{ retrieve("filename") };   # direct to hash
68dc0745 2051
2052=head2 How do I print out or copy a recursive data structure?
2053
The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures.  The Storable module on CPAN (or the
5.8 release of Perl) provides a function called C<dclone> that recursively
copies its argument.

    use Storable qw(dclone);
    $r2 = dclone($r1);

Here $r1 can be a reference to any kind of data structure you'd like.
It will be deeply copied.  Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.

    %newhash = %{ dclone(\%oldhash) };
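The difference from a plain assignment shows up once the original is
modified. In this sketch (data and variable names invented), the
shallow copy still shares the nested array while the deep copy does
not:

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = (fruits => [ "apple", "fig" ]);

my %shallow = %oldhash;                 # copies only the top level
my %deep    = %{ dclone(\%oldhash) };   # copies the nested array too

push @{ $oldhash{fruits} }, "pear";

print "shallow copy sees ", scalar(@{ $shallow{fruits} }), " fruits\n";
print "deep copy sees ",    scalar(@{ $deep{fruits} }),    " fruits\n";
```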
68dc0745 2068
2069=head2 How do I define methods for every class/object?
2070
2071Use the UNIVERSAL class (see L<UNIVERSAL>).
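As a rough sketch, a subroutine defined in the UNIVERSAL package can
be called as a method on any class or object. The method name
C<describe_self> and the C<Dog> class are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical method name; anything defined in UNIVERSAL is inherited
# by every class, so choose names that are unlikely to collide.
sub UNIVERSAL::describe_self {
    my ($self) = @_;
    return "I am a " . (ref($self) || $self);
}

package Dog;
sub new { return bless {}, shift }

package main;

my $text = Dog->new->describe_self;
print "$text\n";
```

Because such a method appears in every class, including ones from
modules you did not write, it is worth using sparingly.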
2072
2073=head2 How do I verify a credit card checksum?
2074
2075Get the Business::CreditCard module from CPAN.
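If you only want to see how the checksum itself works, card numbers
use the Luhn (mod 10) algorithm. The sketch below is a hand-rolled
illustration of that algorithm, not the Business::CreditCard
interface; the function name C<luhn_ok> is invented here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the digits pass the Luhn (mod 10) check.
sub luhn_ok {
    my ($number) = @_;
    (my $digits = $number) =~ s/[\s-]//g;     # allow spaces and dashes
    return 0 unless $digits =~ /^\d+$/;

    my $sum = 0;
    my @reversed = reverse split //, $digits;
    for my $i (0 .. $#reversed) {
        my $digit = $reversed[$i];
        if ($i % 2) {                         # every second digit from the right
            $digit *= 2;
            $digit -= 9 if $digit > 9;        # same as adding the two digits
        }
        $sum += $digit;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

print luhn_ok("4111 1111 1111 1111") ? "valid\n" : "invalid\n";
```

Passing the checksum only means the digits are self-consistent; it
says nothing about whether the account exists.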
2076
65acb1b1 2077=head2 How do I pack arrays of doubles or floats for XS code?
2078
2079The kgbpack.c code in the PGPLOT module on CPAN does just this.
2080If you're doing a lot of float or double processing, consider using
2081the PDL module from CPAN instead--it makes number-crunching easy.
2082
68dc0745 2083=head1 AUTHOR AND COPYRIGHT
2084
7678cced 2085Copyright (c) 1997-2005 Tom Christiansen, Nathan Torkington, and
2086other authors as noted. All rights reserved.
5a964f20 2087
5a7beb56 2088This documentation is free; you can redistribute it and/or modify it
2089under the same terms as Perl itself.
5a964f20 2090
2091Irrespective of its distribution, all code examples in this file
2092are hereby placed into the public domain. You are permitted and
2093encouraged to use this code in your own programs for fun
2094or for profit as you see fit. A simple comment in the code giving
2095credit would be courteous but is not required.