=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 1.21 $, $Date: 2002/05/06 13:08:46 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
68dc0745 9
10=head1 Data: Numbers
11
46fc3d4c 12=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
13
5a964f20 14The infinite set that a mathematician thinks of as the real numbers can
a6dd486b 15only be approximated on a computer, since the computer only has a finite
5a964f20 16number of bits to store an infinite number of, um, numbers.
17
46fc3d4c 18Internally, your computer represents floating-point numbers in binary.
92c2ed05 19Floating-point numbers read in from a file or appearing as literals
20in your program are converted from their decimal floating-point
a6dd486b 21representation (eg, 19.95) to an internal binary representation.
46fc3d4c 22
23However, 19.95 can't be precisely represented as a binary
24floating-point number, just like 1/3 can't be exactly represented as a
25decimal floating-point number. The computer's binary representation
26of 19.95, therefore, isn't exactly 19.95.
27
28When a floating-point number gets printed, the binary floating-point
29representation is converted back to decimal. These decimal numbers
30are displayed in either the format you specify with printf(), or the
a6dd486b 31current output format for numbers. (See L<perlvar/"$#"> if you use
46fc3d4c 32print. C<$#> has a different default value in Perl5 than it did in
87275199 33Perl4. Changing C<$#> yourself is deprecated.)
46fc3d4c 34
35This affects B<all> computer languages that represent decimal
36floating-point numbers in binary, not just Perl. Perl provides
37arbitrary-precision decimal numbers with the Math::BigFloat module
38(part of the standard Perl distribution), but mathematical operations
39are consequently slower.
40
80ba158a 41If precision is important, such as when dealing with money, it's good
1affb2ee 42to work with integers and then divide at the last possible moment.
43For example, work in pennies (1995) instead of dollars and cents
6b927632 44(19.95) and divide by 100 at the end.
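
As a minimal sketch of the integer approach described above (the variable
names and prices are made up for illustration), you might keep every
amount in whole cents and convert only when formatting the result:

```perl
use strict;
use warnings;

# Hypothetical prices, held as whole cents so that sums stay exact.
my @prices_in_cents = (1995, 505, 10);

my $total_cents = 0;
$total_cents += $_ for @prices_in_cents;

# Divide by 100 only at the last moment, when formatting for display.
my $total = sprintf("%.2f", $total_cents / 100);
print "Total: $total\n";
```

All of the arithmetic happens on integers, so no rounding error can
accumulate along the way.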
1affb2ee 45
46fc3d4c 46To get rid of the superfluous digits, just use a format (eg,
47C<printf("%.2f", 19.95)>) to get the required precision.
65acb1b1 48See L<perlop/"Floating-point Arithmetic">.
46fc3d4c 49
68dc0745 50=head2 Why isn't my octal data interpreted correctly?
51
52Perl only understands octal and hex numbers as such when they occur
33ce146f 53as literals in your program. Octal literals in perl must start with
54a leading "0" and hexadecimal literals must start with a leading "0x".
55If they are read in from somewhere and assigned, no automatic
56conversion takes place. You must explicitly use oct() or hex() if you
57want the values converted to decimal. oct() interprets
68dc0745 58both hex ("0x350") numbers and octal ones ("0350" or even without the
59leading "0", like "377"), while hex() only converts hexadecimal ones,
60with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
33ce146f 61The inverse mapping from decimal to octal can be done with either the
62"%o" or "%O" sprintf() formats. To get from decimal to hex try either
63the "%x" or the "%X" formats to sprintf().
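
The conversions above can be sketched as follows. The input strings stand
in for data read from a file or typed by a user; the variable names are
only illustrative.

```perl
use strict;
use warnings;

# Strings as they might arrive from a file or the command line.
my $mode = oct("0644");     # octal string -> 420
my $addr = oct("0x350");    # oct() also understands a "0x" prefix -> 848
my $byte = hex("ff");       # hex string -> 255

# And back again with sprintf().
my $mode_str = sprintf("%o", $mode);    # "644"
my $byte_str = sprintf("%x", $byte);    # "ff"

print "$mode $addr $byte $mode_str $byte_str\n";
```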
68dc0745 64
65This problem shows up most often when people try using chmod(), mkdir(),
33ce146f 66umask(), or sysopen(), which by widespread tradition typically take
67permissions in octal.
68dc0745 68
33ce146f 69 chmod(644, $file); # WRONG
68dc0745 70 chmod(0644, $file); # right
71
33ce146f 72Note the mistake in the first line was specifying the decimal literal
73644, rather than the intended octal literal 0644. The problem can
74be seen with:
75
434f7166 76 printf("%#o",644); # prints 01204
33ce146f 77
78Surely you had not intended C<chmod(01204, $file);> - did you? If you
79want to use numeric literals as arguments to chmod() et al. then please
80try to express them as octal constants, that is with a leading zero and
81with the following digits restricted to the set 0..7.
82
65acb1b1 83=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
68dc0745 84
92c2ed05 85Remember that int() merely truncates toward 0. For rounding to a
86certain number of digits, sprintf() or printf() is usually the easiest
87route.
88
89 printf("%.3f", 3.1415926535); # prints 3.142
68dc0745 90
87275199 91The POSIX module (part of the standard Perl distribution) implements
68dc0745 92ceil(), floor(), and a number of other mathematical and trigonometric
93functions.
94
92c2ed05 95 use POSIX;
96 $ceil = ceil(3.5); # 4
97 $floor = floor(3.5); # 3
98
a6dd486b 99In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
87275199 100module. With 5.004, the Math::Trig module (part of the standard Perl
46fc3d4c 101distribution) implements the trigonometric functions. Internally it
102uses the Math::Complex module and some functions can break out from
103the real axis into the complex plane, for example the inverse sine of
1042.
68dc0745 105
106Rounding in financial applications can have serious implications, and
107the rounding method used should be specified precisely. In these
108cases, it probably pays not to trust whichever system rounding is
109being used by Perl, but to instead implement the rounding function you
110need yourself.
111
65acb1b1 112To see why, notice how you'll still have an issue on half-way-point
113alternation:
114
115 for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
116
117 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
118 0.8 0.8 0.9 0.9 1.0 1.0
119
120Don't blame Perl. It's the same as in C. IEEE says we have to do this.
121Perl numbers whose absolute values are integers under 2**31 (on 32 bit
122machines) will work pretty much like mathematical integers. Other numbers
123are not guaranteed.
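
If you do write your own rounding routine, one possible sketch is shown
below. It rounds to the nearest integer, with exact halves going away
from zero. The name C<round_half_away> is made up for this example, and
your application may well call for a different rule.

```perl
use strict;
use warnings;

# Round to the nearest integer, with exact halves going away from zero.
# Hypothetical helper, not a core function.
sub round_half_away {
    my ($n) = @_;
    return $n >= 0 ? int($n + 0.5) : -int(-$n + 0.5);
}

print join(" ", map { round_half_away($_) } (2.4, 2.5, -2.5, 3.5)), "\n";
```

Keep in mind that the argument is still a binary floating-point number,
so a computed value may already sit slightly to one side of the half-way
point before the function sees it. Working in integer units avoids that.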
124
ae3d0b9f 125=head2 How do I convert between numeric representations?
68dc0745 126
6761e064 127As always with Perl there is more than one way to do it. Below
128are a few examples of approaches to making common conversions
129between number representations. This is intended to be representational
130rather than exhaustive.
68dc0745 131
Some of the examples below use the Bit::Vector module from CPAN.
The reasons you might choose Bit::Vector over the perl built-in
functions are that it works with numbers of ANY size, that it is
optimized for speed on some operations, and that for at least some
programmers the notation might be familiar.
d92eb7b0 137
=over 4

=item B<How do I convert hexadecimal into decimal:>

Using perl's built-in conversion of 0x notation:

    $int = 0xDEADBEEF;
    $dec = sprintf("%d", $int);

Using the hex function:

    $int = hex("DEADBEEF");
    $dec = sprintf("%d", $int);

Using pack:

    $int = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
    $dec = sprintf("%d", $int);

Using the CPAN module Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();

=item B<How do I convert from decimal to hexadecimal:>

Using sprintf:

    $hex = sprintf("%X", 3735928559);

Using pack and unpack:

    $hex = unpack("H*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And Bit::Vector supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item B<How do I convert from octal to decimal:>

Using Perl's built-in conversion of numbers with leading zeros:

    $int = 033653337357; # note the leading 0!
    $dec = sprintf("%d", $int);

Using the oct function:

    $int = oct("33653337357");
    $dec = sprintf("%d", $int);

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item B<How do I convert from decimal to octal:>

Using sprintf:

    $oct = sprintf("%o", 3735928559);

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item B<How do I convert from binary to decimal:>

Perl 5.6 lets you write binary numbers directly with
the 0b notation:

    $number = 0b10110110;

Using pack and ord:

    $decimal = ord(pack('B8', '10110110'));

Using pack and unpack for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() is used to left pad a 32 character string with zeros.

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item B<How do I convert from decimal to binary:>

Using pack and unpack:

    $bin = unpack("B*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

=back

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

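
As a hint for the remaining transformations, one possible approach is to
chain the built-ins: parse the input with hex() or oct(), then format the
number with sprintf(). This sketch assumes Perl 5.6 or later, since it
relies on the C<%b> format and on oct() accepting a C<0b> prefix.

```perl
use strict;
use warnings;

my $hex_to_oct = sprintf("%o", hex("ff"));            # "377"
my $bin_to_hex = sprintf("%x", oct("0b10110110"));    # "b6"
my $oct_to_bin = sprintf("%b", oct("0377"));          # "11111111"

print "$hex_to_oct $bin_to_hex $oct_to_bin\n";
```
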
65acb1b1 255=head2 Why doesn't & work the way I want it to?
256
The behavior of the bitwise operators depends on whether they're
used on numbers or strings. Given strings, the operators treat each
string as a series of bits and work with that (the string C<"3"> is the
bit pattern C<00110011>). Given numbers, the operators work with the
binary form of the number (the number C<3> is treated as the bit pattern
C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).
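
A small sketch of the difference, using made-up variable names. It
assumes the classic behavior of C<&>, that is, without the C<bitwise>
feature of newer perls enabled.

```perl
use strict;
use warnings;

my $num_and = 11 & 3;         # numeric: 1011 & 0011 = 0011, which is 3
my $str_and = "11" & "3";     # string: "1" & "3" byte by byte, which is "1"

# Data read from a file arrives as strings; adding 0 forces a number.
my ($x, $y) = ("11", "3");
my $forced = ($x + 0) & ($y + 0);

print "$num_and $str_and $forced\n";
```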
266
267Most problems with C<&> and C<|> arise because the programmer thinks
268they have a number but really it's a string. The rest arise because
269the programmer says:
270
271 if ("\020\020" & "\101\101") {
272 # ...
273 }
274
275but a string consisting of two null bytes (the result of C<"\020\020"
276& "\101\101">) is not a false value in Perl. You need:
277
278 if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
279 # ...
280 }
281
68dc0745 282=head2 How do I multiply matrices?
283
284Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
285or the PDL extension (also available from CPAN).
286
287=head2 How do I perform an operation on a series of integers?
288
289To call a function on each element in an array, and collect the
290results, use:
291
292 @results = map { my_func($_) } @array;
293
294For example:
295
296 @triple = map { 3 * $_ } @single;
297
298To call a function on each element of an array, but ignore the
299results:
300
301 foreach $iterator (@array) {
65acb1b1 302 some_func($iterator);
68dc0745 303 }
304
305To call a function on each integer in a (small) range, you B<can> use:
306
65acb1b1 307 @results = map { some_func($_) } (5 .. 25);
68dc0745 308
309but you should be aware that the C<..> operator creates an array of
310all integers in the range. This can take a lot of memory for large
311ranges. Instead use:
312
313 @results = ();
314 for ($i=5; $i < 500_005; $i++) {
65acb1b1 315 push(@results, some_func($i));
68dc0745 316 }
317
87275199 318This situation has been fixed in Perl5.005. Use of C<..> in a C<for>
319loop will iterate over the range, without creating the entire range.
320
321 for my $i (5 .. 500_005) {
322 push(@results, some_func($i));
323 }
324
325will not create a list of 500,000 integers.
326
68dc0745 327=head2 How can I output Roman numerals?
328
Get the Roman module, available from http://www.cpan.org/modules/by-module/Roman .
68dc0745 330
331=head2 Why aren't my random numbers random?
332
65acb1b1 333If you're using a version of Perl before 5.004, you must call C<srand>
334once at the start of your program to seed the random number generator.
3355.004 and later automatically call C<srand> at the beginning. Don't
336call C<srand> more than once--you make your numbers less random, rather
337than more.
92c2ed05 338
Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz , courtesy of
Tom Phoenix, talks more about this. John von Neumann said, ``Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''
346
347If you want numbers that are more random than C<rand> with C<srand>
348provides, you should also check out the Math::TrulyRandom module from
349CPAN. It uses the imperfections in your system's timer to generate
350random numbers, but this takes quite a while. If you want a better
92c2ed05 351pseudorandom generator than comes with your operating system, look at
65acb1b1 352``Numerical Recipes in C'' at http://www.nr.com/ .
68dc0745 353
881bdbd4 354=head2 How do I get a random number between X and Y?
355
356Use the following simple function. It selects a random integer between
357(and possibly including!) the two given integers, e.g.,
358C<random_int_in(50,120)>
359
360 sub random_int_in ($$) {
361 my($min, $max) = @_;
362 # Assumes that the two arguments are integers themselves!
363 return $min if $min == $max;
364 ($min, $max) = ($max, $min) if $min > $max;
365 return $min + int rand(1 + $max - $min);
366 }
367
68dc0745 368=head1 Data: Dates
369
370=head2 How do I find the week-of-the-year/day-of-the-year?
371
372The day of the year is in the array returned by localtime() (see
373L<perlfunc/"localtime">):
374
375 $day_of_year = (localtime(time()))[7];
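
The answer above does not cover the week of the year. One possible way to
get it is the C<%U> format of the POSIX module's strftime(). This is a
sketch under the assumption that Sunday-based weeks are what you want;
other conventions exist (C<%W> counts from Monday, and ISO 8601 weeks
differ again).

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my @now = localtime(time());

my $day_of_year  = $now[7] + 1;             # localtime counts days from 0
my $week_of_year = strftime("%U", @now);    # "00" .. "53", weeks start on Sunday

print "Day $day_of_year, week $week_of_year\n";
```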
376
d92eb7b0 377=head2 How do I find the current century or millennium?
378
379Use the following simple functions:
380
381 sub get_century {
382 return int((((localtime(shift || time))[5] + 1999))/100);
383 }
384 sub get_millennium {
385 return 1+int((((localtime(shift || time))[5] + 1899))/1000);
386 }
387
388On some systems, you'll find that the POSIX module's strftime() function
389has been extended in a non-standard way to use a C<%C> format, which they
390sometimes claim is the "century". It isn't, because on most such systems,
391this is only the first two digits of the four-digit year, and thus cannot
392be used to reliably determine the current century or millennium.
393
92c2ed05 394=head2 How can I compare two dates and find the difference?
68dc0745 395
92c2ed05 396If you're storing your dates as epoch seconds then simply subtract one
397from the other. If you've got a structured date (distinct year, day,
d92eb7b0 398month, hour, minute, seconds values), then for reasons of accessibility,
399simplicity, and efficiency, merely use either timelocal or timegm (from
400the Time::Local module in the standard distribution) to reduce structured
401dates to epoch seconds. However, if you don't know the precise format of
402your dates, then you should probably use either of the Date::Manip and
403Date::Calc modules from CPAN before you go hacking up your own parsing
404routine to handle arbitrary date formats.
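
For structured dates, the reduction to epoch seconds might look like the
following sketch. The two dates are arbitrary examples, and timegm() is
used so that daylight saving time does not enter into the subtraction.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# timegm(sec, min, hour, mday, mon, year); note that mon is 0-based.
my $start = timegm(0, 0, 0, 1, 0, 2002);    # 1 Jan 2002 00:00:00 UTC
my $end   = timegm(0, 0, 0, 1, 2, 2002);    # 1 Mar 2002 00:00:00 UTC

my $days = ($end - $start) / (24 * 60 * 60);
print "$days days apart\n";
```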
68dc0745 405
406=head2 How can I take a string and turn it into epoch seconds?
407
408If it's a regular enough string that it always has the same format,
92c2ed05 409you can split it up and pass the parts to C<timelocal> in the standard
410Time::Local module. Otherwise, you should look into the Date::Calc
411and Date::Manip modules from CPAN.
68dc0745 412
413=head2 How can I find the Julian Day?
414
2a2bf5f4 415Use the Time::JulianDay module (part of the Time-modules bundle
416available from CPAN.)
d92eb7b0 417
Before you immerse yourself too deeply in this, be sure to verify that
it is the I<Julian> Day you really want. Are you interested in a way
of getting serial days so that you can just tell how many days apart
they are, or so that you can also do other date arithmetic? If you
are interested in performing date arithmetic, this can be done using
the Date::Manip or Date::Calc modules.
89435c96 424
There are too many details and too much confusion on this issue to cover in
426this FAQ, but the term is applied (correctly) to a calendar now
427supplanted by the Gregorian Calendar, with the Julian Calendar failing
428to adjust properly for leap years on centennial years (among other
429annoyances). The term is also used (incorrectly) to mean: [1] days in
430the Gregorian Calendar; and [2] days since a particular starting time
431or `epoch', usually 1970 in the Unix world and 1980 in the
432MS-DOS/Windows world. If you find that it is not the first meaning
433that you really want, then check out the Date::Manip and Date::Calc
434modules. (Thanks to David Cassell for most of this text.)
be94a901 435
65acb1b1 436=head2 How do I find yesterday's date?
437
438The C<time()> function returns the current time in seconds since the
d92eb7b0 439epoch. Take twenty-four hours off that:
65acb1b1 440
441 $yesterday = time() - ( 24 * 60 * 60 );
442
443Then you can pass this to C<localtime()> and get the individual year,
444month, day, hour, minute, seconds values.
445
d92eb7b0 446Note very carefully that the code above assumes that your days are
447twenty-four hours each. For most people, there are two days a year
448when they aren't: the switch to and from summer time throws this off.
449A solution to this issue is offered by Russ Allbery.
450
451 sub yesterday {
452 my $now = defined $_[0] ? $_[0] : time;
453 my $then = $now - 60 * 60 * 24;
454 my $ndst = (localtime $now)[8] > 0;
455 my $tdst = (localtime $then)[8] > 0;
456 $then - ($tdst - $ndst) * 60 * 60;
457 }
458 # Should give you "this time yesterday" in seconds since epoch relative to
459 # the first argument or the current time if no argument is given and
460 # suitable for passing to localtime or whatever else you need to do with
461 # it. $ndst is whether we're currently in daylight savings time; $tdst is
462 # whether the point 24 hours ago was in daylight savings time. If $tdst
463 # and $ndst are the same, a boundary wasn't crossed, and the correction
464 # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
465 # from yesterday's time since we gained an extra hour while going off
466 # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
467 # negative hour (add an hour) to yesterday's time since we lost an hour.
468 #
469 # All of this is because during those days when one switches off or onto
470 # DST, a "day" isn't 24 hours long; it's either 23 or 25.
471 #
472 # The explicit settings of $ndst and $tdst are necessary because localtime
473 # only says it returns the system tm struct, and the system tm struct at
87275199 474 # least on Solaris doesn't guarantee any particular positive value (like,
d92eb7b0 475 # say, 1) for isdst, just a positive value. And that value can
476 # potentially be negative, if DST information isn't available (this sub
477 # just treats those cases like no DST).
478 #
479 # Note that between 2am and 3am on the day after the time zone switches
480 # off daylight savings time, the exact hour of "yesterday" corresponding
481 # to the current hour is not clearly defined. Note also that if used
482 # between 2am and 3am the day after the change to daylight savings time,
483 # the result will be between 3am and 4am of the previous day; it's
484 # arguable whether this is correct.
485 #
486 # This sub does not attempt to deal with leap seconds (most things don't).
487 #
488 # Copyright relinquished 1999 by Russ Allbery <rra@stanford.edu>
489 # This code is in the public domain
490
87275199 491=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
68dc0745 492
65acb1b1 493Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
494Y2K compliant (whatever that means). The programmers you've hired to
495use it, however, probably are not.
496
497Long answer: The question belies a true understanding of the issue.
498Perl is just as Y2K compliant as your pencil--no more, and no less.
499Can you use your pencil to write a non-Y2K-compliant memo? Of course
500you can. Is that the pencil's fault? Of course it isn't.
92c2ed05 501
87275199 502The date and time functions supplied with Perl (gmtime and localtime)
65acb1b1 503supply adequate information to determine the year well beyond 2000
504(2038 is when trouble strikes for 32-bit machines). The year returned
90fdbbb7 505by these functions when used in a list context is the year minus 1900.
65acb1b1 506For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
507number. To avoid the year 2000 problem simply do not treat the year as
508a 2-digit number. It isn't.
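
A short sketch of the right and the wrong way to handle the year field.
The epoch value is an arbitrary timestamp from November 2001.

```perl
use strict;
use warnings;

my $epoch = 1005613200;    # 13 Nov 2001, UTC

# Right: the field is "years since 1900", so add 1900.
my $year = (gmtime($epoch))[5] + 1900;

# Wrong: gluing "19" in front treats the field as a 2-digit year.
my $wrong = "19" . (gmtime($epoch))[5];

print "$year vs. $wrong\n";
```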
68dc0745 509
5a964f20 510When gmtime() and localtime() are used in scalar context they return
68dc0745 511a timestamp string that contains a fully-expanded year. For example,
512C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
5132001". There's no year 2000 problem here.
514
5a964f20 515That doesn't mean that Perl can't be used to create non-Y2K compliant
516programs. It can. But so can your pencil. It's the fault of the user,
517not the language. At the risk of inflaming the NRA: ``Perl doesn't
518break Y2K, people do.'' See http://language.perl.com/news/y2k.html for
519a longer exposition.
520
68dc0745 521=head1 Data: Strings
522
523=head2 How do I validate input?
524
525The answer to this question is usually a regular expression, perhaps
5a964f20 526with auxiliary logic. See the more specific questions (numbers, mail
68dc0745 527addresses, etc.) for details.
528
529=head2 How do I unescape a string?
530
92c2ed05 531It depends just what you mean by ``escape''. URL escapes are dealt
532with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
a6dd486b 533character are removed with
68dc0745 534
535 s/\\(.)/$1/g;
536
92c2ed05 537This won't expand C<"\n"> or C<"\t"> or any other special escapes.
68dc0745 538
539=head2 How do I remove consecutive pairs of characters?
540
92c2ed05 541To turn C<"abbcccd"> into C<"abccd">:
68dc0745 542
d92eb7b0 543 s/(.)\1/$1/g; # add /s to include newlines
544
545Here's a solution that turns "abbcccd" to "abcd":
546
547 y///cs; # y == tr, but shorter :-)
68dc0745 548
549=head2 How do I expand function calls in a string?
550
551This is documented in L<perlref>. In general, this is fraught with
552quoting and readability problems, but it is possible. To interpolate
5a964f20 553a subroutine call (in list context) into a string:
68dc0745 554
555 print "My sub returned @{[mysub(1,2,3)]} that time.\n";
556
557If you prefer scalar context, similar chicanery is also useful for
558arbitrary expressions:
559
560 print "That yields ${\($n + 5)} widgets\n";
561
92c2ed05 562Version 5.004 of Perl had a bug that gave list context to the
563expression in C<${...}>, but this is fixed in version 5.005.
564
565See also ``How can I expand variables in text strings?'' in this
566section of the FAQ.
46fc3d4c 567
68dc0745 568=head2 How do I find matching/nesting anything?
569
92c2ed05 570This isn't something that can be done in one regular expression, no
571matter how complicated. To find something between two single
572characters, a pattern like C</x([^x]*)x/> will get the intervening
573bits in $1. For multiple ones, then something more like
574C</alpha(.*?)omega/> would be needed. But none of these deals with
575nested patterns, nor can they. For that you'll have to write a
576parser.
577
578If you are serious about writing a parser, there are a number of
6a2af475 579modules or oddities that will make your life a lot easier. There are
580the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
and the byacc program. Starting with perl 5.8, Text::Balanced
is part of the standard distribution.
68dc0745 583
92c2ed05 584One simple destructive, inside-out approach that you might try is to
585pull out the smallest nesting parts one at a time:
5a964f20 586
d92eb7b0 587 while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
5a964f20 588 # do something with $1
589 }
590
65acb1b1 591A more complicated and sneaky approach is to make Perl's regular
592expression engine do it for you. This is courtesy Dean Inada, and
593rather has the nature of an Obfuscated Perl Contest entry, but it
594really does work:
595
596 # $_ contains the string to parse
597 # BEGIN and END are the opening and closing markers for the
598 # nested text.
c47ff5f1 599
65acb1b1 600 @( = ('(','');
601 @) = (')','');
602 ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
5ed30e05 603 @$ = (eval{/$re/},$@!~/unmatched/i);
65acb1b1 604 print join("\n",@$[0..$#$]) if( $$[-1] );
605
68dc0745 606=head2 How do I reverse a string?
607
5a964f20 608Use reverse() in scalar context, as documented in
68dc0745 609L<perlfunc/reverse>.
610
611 $reversed = reverse $string;
612
613=head2 How do I expand tabs in a string?
614
5a964f20 615You can do it yourself:
68dc0745 616
617 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
618
87275199 619Or you can just use the Text::Tabs module (part of the standard Perl
68dc0745 620distribution).
621
622 use Text::Tabs;
623 @expanded_lines = expand(@lines_with_tabs);
624
625=head2 How do I reformat a paragraph?
626
87275199 627Use Text::Wrap (part of the standard Perl distribution):
68dc0745 628
629 use Text::Wrap;
630 print wrap("\t", ' ', @paragraphs);
631
92c2ed05 632The paragraphs you give to Text::Wrap should not contain embedded
46fc3d4c 633newlines. Text::Wrap doesn't justify the lines (flush-right).
634
bc06af74 635Or use the CPAN module Text::Autoformat. Formatting files can be easily
636done by making a shell alias, like so:
637
638 alias fmt="perl -i -MText::Autoformat -n0777 \
639 -e 'print autoformat $_, {all=>1}' $*"
640
641See the documentation for Text::Autoformat to appreciate its many
642capabilities.
643
68dc0745 644=head2 How can I access/change the first N letters of a string?
645
646There are many ways. If you just want to grab a copy, use
92c2ed05 647substr():
68dc0745 648
649 $first_byte = substr($a, 0, 1);
650
651If you want to modify part of a string, the simplest way is often to
652use substr() as an lvalue:
653
654 substr($a, 0, 3) = "Tom";
655
92c2ed05 656Although those with a pattern matching kind of thought process will
a6dd486b 657likely prefer
68dc0745 658
659 $a =~ s/^.../Tom/;
660
661=head2 How do I change the Nth occurrence of something?
662
92c2ed05 663You have to keep track of N yourself. For example, let's say you want
664to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
d92eb7b0 665C<"whosoever"> or C<"whomsoever">, case insensitively. These
666all assume that $_ contains the string to be altered.
68dc0745 667
668 $count = 0;
669 s{((whom?)ever)}{
670 ++$count == 5 # is it the 5th?
671 ? "${2}soever" # yes, swap
672 : $1 # renege and leave it there
d92eb7b0 673 }ige;
68dc0745 674
5a964f20 675In the more general case, you can use the C</g> modifier in a C<while>
676loop, keeping count of matches.
677
678 $WANT = 3;
679 $count = 0;
d92eb7b0 680 $_ = "One fish two fish red fish blue fish";
5a964f20 681 while (/(\w+)\s+fish\b/gi) {
682 if (++$count == $WANT) {
683 print "The third fish is a $1 one.\n";
5a964f20 684 }
685 }
686
92c2ed05 687That prints out: C<"The third fish is a red one."> You can also use a
5a964f20 688repetition count and repeated pattern like this:
689
690 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
691
68dc0745 692=head2 How can I count the number of occurrences of a substring within a string?
693
a6dd486b 694There are a number of ways, with varying efficiency. If you want a
68dc0745 695count of a certain single character (X) within a string, you can use the
696C<tr///> function like so:
697
368c9434 698 $string = "ThisXlineXhasXsomeXx'sXinXit";
68dc0745 699 $count = ($string =~ tr/X//);
d92eb7b0 700 print "There are $count X characters in the string";
68dc0745 701
702This is fine if you are just looking for a single character. However,
703if you are trying to count multiple character substrings within a
704larger string, C<tr///> won't work. What you can do is wrap a while()
705loop around a global pattern match. For example, let's count negative
706integers:
707
    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
711
881bdbd4 712Another version uses a global match in list context, then assigns the
713result to a scalar, producing a count of the number of matches.
714
715 $count = () = $string =~ /-\d+/g;
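
The same list-assignment idiom works for a literal multi-character
substring. In this sketch the names C<$string> and C<$needle> are just
examples, and matches are counted without overlap.

```perl
use strict;
use warnings;

my $string = "One fish two fish red fish blue fish";
my $needle = "fish";

# \Q...\E quotes any regex metacharacters in the substring.
my $count = () = $string =~ /\Q$needle\E/g;
print "There are $count occurrences of '$needle'\n";
```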
716
68dc0745 717=head2 How do I capitalize all the words on one line?
718
719To make the first letter of each word upper case:
3fe9a6f1 720
68dc0745 721 $line =~ s/\b(\w)/\U$1/g;
722
This has the strange effect of turning "C<don't do it>" into "C<Don'T
Do It>". Sometimes you might want this. Other times you might need a
more thorough solution (suggested by brian d foy):

    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;

734
68dc0745 735To make the whole line upper case:
3fe9a6f1 736
68dc0745 737 $line = uc($line);
738
739To force each word to be lower case, with the first letter upper case:
3fe9a6f1 740
68dc0745 741 $line =~ s/(\w+)/\u\L$1/g;
742
5a964f20 743You can (and probably should) enable locale awareness of those
744characters by placing a C<use locale> pragma in your program.
92c2ed05 745See L<perllocale> for endless details on locales.
5a964f20 746
65acb1b1 747This is sometimes referred to as putting something into "title
d92eb7b0 748case", but that's not quite accurate. Consider the proper
65acb1b1 749capitalization of the movie I<Dr. Strangelove or: How I Learned to
750Stop Worrying and Love the Bomb>, for example.
751
68dc0745 752=head2 How can I split a [character] delimited string except when inside
753[character]? (Comma-separated files)
754
755Take the example case of trying to split a string that is comma-separated
756into its different fields. (We'll pretend you said comma-separated, not
757comma-delimited, which is different and almost never what you mean.) You
758can't use C<split(/,/)> because you shouldn't split if the comma is inside
759quotes. For example, take a data line like this:
760
761 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
762
763Due to the restriction of the quotes, this is a fairly complex
764problem. Thankfully, we have Jeffrey Friedl, author of a highly
765recommended book on regular expressions, to handle these for us. He
766suggests (assuming your string is contained in $text):
767
768 @new = ();
769 push(@new, $+) while $text =~ m{
770 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
771 | ([^,]+),?
772 | ,
773 }gx;
774 push(@new, undef) if substr($text,-1,1) eq ',';
775
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">). Unescaping them is a task addressed earlier in
this section.
780
87275199 781Alternatively, the Text::ParseWords module (part of the standard Perl
68dc0745 782distribution) lets you say:
783
784 use Text::ParseWords;
785 @new = quotewords(",", 0, $text);
786
a6dd486b 787There's also a Text::CSV (Comma-Separated Values) module on CPAN.
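
As a quick check of the Text::ParseWords approach, here is a sketch that
runs quotewords() over the sample data line from this answer:

```perl
use strict;
use warnings;
use Text::ParseWords;

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';
my @new  = quotewords(",", 0, $text);

print scalar(@new), " fields\n";
print "$new[2]\n";    # the comma inside the quotes is preserved
```

Passing 0 as the second argument strips the quotes from each field;
pass 1 instead if you want to keep them.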
65acb1b1 788
68dc0745 789=head2 How do I strip blank space from the beginning/end of a string?
790
a6dd486b 791Although the simplest approach would seem to be
68dc0745 792
793 $string =~ s/^\s*(.*?)\s*$/$1/;
794
a6dd486b 795not only is this unnecessarily slow and destructive, it also fails with
d92eb7b0 796embedded newlines. It is much faster to do this operation in two steps:
68dc0745 797
798 $string =~ s/^\s+//;
799 $string =~ s/\s+$//;
800
801Or more nicely written as:
802
803 for ($string) {
804 s/^\s+//;
805 s/\s+$//;
806 }
807
5e3006a4 808This idiom takes advantage of the C<foreach> loop's aliasing
5a964f20 809behavior to factor out common code. You can do this
810on several strings at once, or arrays, or even the
d92eb7b0 811values of a hash if you use a slice:
5a964f20 812
813 # trim whitespace in the scalar, the array,
814 # and all the values in the hash
815 foreach ($scalar, @array, @hash{keys %hash}) {
816 s/^\s+//;
817 s/\s+$//;
818 }
819
65acb1b1 820=head2 How do I pad a string with blanks or pad a number with zeroes?
821
d92eb7b0 822(This answer contributed by Uri Guttman, with kibitzing from
823Bart Lateur.)
65acb1b1 824
825In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0 826to pad the string, C<$text> or C<$num> contains the string to be padded,
827and C<$pad_char> contains the padding character. You can use a single
828character string constant instead of the C<$pad_char> variable if you
829know what it is in advance. And in the same way you can use an integer in
830place of C<$pad_len> if you know the pad length in advance.
65acb1b1 831
d92eb7b0 832The simplest method uses the C<sprintf> function. It can pad on the left
833or right with blanks and on the left with zeroes and it will not
834truncate the result. The C<pack> function can only pad strings on the
835right with blanks and it will truncate the result to a maximum length of
836C<$pad_len>.
65acb1b1 837
d92eb7b0 838 # Left padding a string with blanks (no truncation):
839 $padded = sprintf("%${pad_len}s", $text);
65acb1b1 840
d92eb7b0 841 # Right padding a string with blanks (no truncation):
842 $padded = sprintf("%-${pad_len}s", $text);
65acb1b1 843
d92eb7b0 844 # Left padding a number with 0 (no truncation):
845 $padded = sprintf("%0${pad_len}d", $num);
65acb1b1 846
d92eb7b0 847 # Right padding a string with blanks using pack (will truncate):
848 $padded = pack("A$pad_len",$text);
65acb1b1 849
d92eb7b0 850If you need to pad with a character other than blank or zero you can use
851one of the following methods. They all generate a pad string with the
852C<x> operator and combine that with C<$text>. These methods do
853not truncate C<$text>.
65acb1b1 854
d92eb7b0 855Left and right padding with any character, creating a new string:
65acb1b1 856
d92eb7b0 857 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
858 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 859
d92eb7b0 860Left and right padding with any character, modifying C<$text> directly:
65acb1b1 861
d92eb7b0 862 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
863 $text .= $pad_char x ( $pad_len - length( $text ) );
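To see the methods side by side, here is a small sketch with concrete
values. The C<pad_left> helper is hypothetical (it is not part of
Perl); it wraps the C<x> approach and guards against a negative
repeat count when the text is already long enough.

```perl
use strict;
use warnings;

# Hypothetical helper: left-pad $text to $pad_len using $pad_char.
# Returns $text unchanged when it is already long enough.
sub pad_left {
    my ($text, $pad_len, $pad_char) = @_;
    my $missing = $pad_len - length $text;
    return $missing > 0 ? ($pad_char x $missing) . $text : $text;
}

print sprintf("%6s",  "abc"), "|\n";        # "   abc|"
print sprintf("%-6s", "abc"), "|\n";        # "abc   |"
print sprintf("%06d", 42), "\n";            # "000042"
print pack("A6", "abc"), "|\n";             # "abc   |"
print pack("A3", "abcdef"), "\n";           # "abc" (truncated)
print pad_left("abc", 6, "*"), "\n";        # "***abc"
print pad_left("abcdefgh", 6, "*"), "\n";   # "abcdefgh" (not truncated)
```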
65acb1b1 864
68dc0745 865=head2 How do I extract selected columns from a string?
866
867Use substr() or unpack(), both documented in L<perlfunc>.
5a964f20 868If you prefer thinking in terms of columns instead of widths,
869you can use this kind of thing:
870
871 # determine the unpack format needed to split Linux ps output
872 # arguments are cut columns
873 my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
874
875 sub cut2fmt {
876 my(@positions) = @_;
877 my $template = '';
878 my $lastpos = 1;
879 for my $place (@positions) {
880 $template .= "A" . ($place - $lastpos) . " ";
881 $lastpos = $place;
882 }
883 $template .= "A*";
884 return $template;
885 }
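A short usage sketch may help. It repeats C<cut2fmt> so that it runs
on its own, and the three-column sample line is made up for
illustration. The arguments are the 1-based columns at which the
second and later fields begin.

```perl
use strict;
use warnings;

sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos = $place;
    }
    $template .= "A*";
    return $template;
}

# Fields start at columns 1, 7 and 12 of this invented line.
my $line   = "alice 123  sleeping";
my $fmt    = cut2fmt(7, 12);            # "A6 A5 A*"
my @fields = unpack($fmt, $line);       # "A" strips trailing blanks

print "format: $fmt\n";
print "[$_]\n" for @fields;
```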
68dc0745 886
887=head2 How do I find the soundex value of a string?
888
87275199 889Use the standard Text::Soundex module distributed with Perl.
a6dd486b 890Before you do so, you may want to determine whether `soundex' is in
d92eb7b0 891fact what you think it is. Knuth's soundex algorithm compresses words
892into a small space, and so it does not necessarily distinguish between
893two words which you might want to appear separately. For example, the
894last names `Knuth' and `Kant' are both mapped to the soundex code K530.
895If Text::Soundex does not do what you are looking for, you might want
896to consider the String::Approx module available at CPAN.
68dc0745 897
898=head2 How can I expand variables in text strings?
899
900Let's assume that you have a string like:
901
902 $text = 'this has a $foo in it and a $bar';
5a964f20 903
904If those were both global variables, then this would
905suffice:
906
65acb1b1 907 $text =~ s/\$(\w+)/${$1}/g; # no /e needed
68dc0745 908
5a964f20 909But since they are probably lexicals, or at least, they could
910be, you'd have to do this:
68dc0745 911
912 $text =~ s/(\$\w+)/$1/eeg;
65acb1b1 913 die if $@; # needed /ee, not /e
68dc0745 914
5a964f20 915It's probably better in the general case to treat those
916variables as entries in some special hash. For example:
917
918 %user_defs = (
919 foo => 23,
920 bar => 19,
921 );
922 $text =~ s/\$(\w+)/$user_defs{$1}/g;
68dc0745 923
92c2ed05 924See also ``How do I expand function calls in a string?'' in this section
46fc3d4c 925of the FAQ.
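One possible refinement of the hash approach, shown here only as a
sketch: use C</e> so that names missing from the hash are left alone
rather than replaced by an empty string. The C<$unknown> placeholder
is invented for the example.

```perl
use strict;
use warnings;

my %user_defs = (
    foo => 23,
    bar => 19,
);
my $text = 'this has a $foo in it, a $bar, and an $unknown';

# Replace known names; put unknown ones back exactly as they were.
$text =~ s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$text\n";
```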
926
68dc0745 927=head2 What's wrong with always quoting "$vars"?
928
a6dd486b 929The problem is that those double-quotes force stringification--
930coercing numbers and references into strings--even when you
931don't want them to be strings. Think of it this way: double-quote
65acb1b1 932expansion is used to produce new strings. If you already
933have a string, why do you need more?
68dc0745 934
935If you get used to writing odd things like these:
936
937 print "$var"; # BAD
938 $new = "$old"; # BAD
939 somefunc("$var"); # BAD
940
941You'll be in trouble. Those should (in 99.8% of the cases) be
942the simpler and more direct:
943
944 print $var;
945 $new = $old;
946 somefunc($var);
947
948Otherwise, besides slowing you down, you're going to break code when
949the thing in the scalar is actually neither a string nor a number, but
950a reference:
951
952 func(\@array);
953 sub func {
954 my $aref = shift;
955 my $oref = "$aref"; # WRONG
956 }
957
958You can also get into subtle problems on those few operations in Perl
959that actually do care about the difference between a string and a
960number, such as the magical C<++> autoincrement operator or the
961syscall() function.
962
5a964f20 963Stringification also destroys arrays.
964
965 @lines = `command`;
966 print "@lines"; # WRONG - extra blanks
967 print @lines; # right
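Both failure modes are easy to demonstrate. This is a small sketch
with made-up data; the exact address inside the stringified reference
will differ from run to run.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $copy  = $aref;       # still a reference
my $str   = "$aref";     # now just text such as "ARRAY(0x...)"

print "copy is a ", (ref($copy) || "plain string"), "\n";
print "str is a ",  (ref($str)  || "plain string"), "\n";

my @lines  = ("one\n", "two\n");
my $joined = "@lines";   # elements joined with $" (a space by default)
print $joined;           # the second line now starts with a blank
```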
968
c47ff5f1 969=head2 Why don't my <<HERE documents work?
68dc0745 970
971Check for these three things:
972
973=over 4
974
975=item 1. There must be no space after the << part.
976
977=item 2. There (probably) should be a semicolon at the end.
978
979=item 3. You can't (easily) have any space in front of the tag.
980
981=back
982
5a964f20 983If you want to indent the text in the here document, you
984can do this:
985
986 # all in one
987 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
988 your text
989 goes here
990 HERE_TARGET
991
992But the HERE_TARGET must still be flush against the margin.
993If you want that indented also, you'll have to quote
994in the indentation.
995
996 ($quote = <<' FINIS') =~ s/^\s+//gm;
997 ...we will have peace, when you and all your works have
998 perished--and the works of your dark master to whom you
999 would deliver us. You are a liar, Saruman, and a corrupter
1000 of men's hearts. --Theoden in /usr/src/perl/taint.c
1001 FINIS
83ded9ee 1002 $quote =~ s/\s+--/\n--/;
5a964f20 1003
1004A nice general-purpose fixer-upper function for indented here documents
1005follows. It expects to be called with a here document as its argument.
1006It looks to see whether each line begins with a common substring, and
a6dd486b 1007if so, strips that substring off. Otherwise, it takes the amount of leading
1008whitespace found on the first line and removes that much off each
5a964f20 1009subsequent line.
1010
1011 sub fix {
1012 local $_ = shift;
a6dd486b 1013 my ($white, $leader); # common whitespace and common leading string
5a964f20 1014 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1015 ($white, $leader) = ($2, quotemeta($1));
1016 } else {
1017 ($white, $leader) = (/^(\s+)/, '');
1018 }
1019 s/^\s*?$leader(?:$white)?//gm;
1020 return $_;
1021 }
1022
c8db1d39 1023This works with leading special strings, dynamically determined:
5a964f20 1024
1025 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
1026 @@@ int
1027 @@@ runops() {
1028 @@@ SAVEI32(runlevel);
1029 @@@ runlevel++;
d92eb7b0 1030 @@@ while ( op = (*op->op_ppaddr)() );
5a964f20 1031 @@@ TAINT_NOT;
1032 @@@ return 0;
1033 @@@ }
1034 MAIN_INTERPRETER_LOOP
1035
a6dd486b 1036Or with a fixed amount of leading whitespace, with remaining
5a964f20 1037indentation correctly preserved:
1038
1039 $poem = fix<<EVER_ON_AND_ON;
1040 Now far ahead the Road has gone,
1041 And I must follow, if I can,
1042 Pursuing it with eager feet,
1043 Until it joins some larger way
1044 Where many paths and errands meet.
1045 And whither then? I cannot say.
1046 --Bilbo in /usr/src/perl/pp_ctl.c
1047 EVER_ON_AND_ON
1048
68dc0745 1049=head1 Data: Arrays
1050
65acb1b1 1051=head2 What is the difference between a list and an array?
1052
1053An array has a changeable length. A list does not. An array is something
1054you can push or pop, while a list is a set of values. Some people make
1055the distinction that a list is a value while an array is a variable.
1056Subroutines are passed and return lists, you put things into list
1057context, you initialize arrays with lists, and you foreach() across
1058a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
1059in scalar context behave like the number of elements in them, subroutines
a6dd486b 1060access their arguments through the array C<@_>, and push/pop/shift only work
65acb1b1 1061on arrays.
1062
1063As a side note, there's no such thing as a list in scalar context.
1064When you say
1065
1066 $scalar = (2, 5, 7, 9);
1067
d92eb7b0 1068you're using the comma operator in scalar context, so it uses the scalar
1069comma operator. There never was a list there at all! This causes the
1070last value to be returned: 9.
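The difference shows up as soon as you compare the three cases side
by side. A minimal sketch:

```perl
use strict;
use warnings;

my @array = (2, 5, 7, 9);
my $count = @array;            # array in scalar context: 4

my $last;
{
    no warnings 'void';        # Perl warns about the discarded constants
    $last = (2, 5, 7, 9);      # comma operator in scalar context: 9
}

my ($first) = (2, 5, 7, 9);    # list assignment: 2

print "count=$count last=$last first=$first\n";
```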
65acb1b1 1071
68dc0745 1072=head2 What is the difference between $array[1] and @array[1]?
1073
a6dd486b 1074The former is a scalar value; the latter an array slice, making
68dc0745 1075it a list with one (scalar) value. You should use $ when you want a
1076scalar value (most of the time) and @ when you want a list with one
1077scalar value in it (very, very rarely; nearly never, in fact).
1078
1079Sometimes it doesn't make a difference, but sometimes it does.
1080For example, compare:
1081
1082 $good[0] = `some program that outputs several lines`;
1083
1084with
1085
1086 @bad[0] = `same program that outputs several lines`;
1087
9f1b1f2d 1088The C<use warnings> pragma and the B<-w> flag will warn you about these
1089matters.
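The reason is context: assigning to a scalar element calls the right
side in scalar context, while assigning to a slice calls it in list
context. In this sketch the C<output> subroutine is a hypothetical
stand-in for backticks, so that the example runs without an external
program.

```perl
use strict;
use warnings;

# Hypothetical stand-in for backticks: returns separate lines in list
# context and one long string in scalar context.
sub output {
    my @lines = ("first\n", "second\n", "third\n");
    return wantarray ? @lines : join "", @lines;
}

my (@good, @bad);
$good[0] = output();        # scalar assignment: the whole output
{
    no warnings 'syntax';   # silence the "better written as" warning
    @bad[0] = output();     # list assignment: only the first line is kept
}

print "good: $good[0]";
print "bad:  $bad[0]";
```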
68dc0745 1090
d92eb7b0 1091=head2 How can I remove duplicate elements from a list or array?
68dc0745 1092
1093There are several possible ways, depending on whether the array is
1094ordered and whether you wish to preserve the ordering.
1095
1096=over 4
1097
551e1d92 1098=item a)
1099
1100If @in is sorted, and you want @out to be sorted:
5a964f20 1101(this assumes all true values in the array)
68dc0745 1102
a4341a65 1103 $prev = "not equal to $in[0]";
3bc5ef3e 1104 @out = grep($_ ne $prev && ($prev = $_, 1), @in);
68dc0745 1105
c8db1d39 1106This is nice in that it doesn't use much extra memory, simulating
3bc5ef3e 1107uniq(1)'s behavior of removing only adjacent duplicates. The ", 1"
1108guarantees that the expression is true (so that grep picks it up)
1109even if the $_ is 0, "", or undef.
68dc0745 1110
551e1d92 1111=item b)
1112
1113If you don't know whether @in is sorted:
68dc0745 1114
1115 undef %saw;
1116 @out = grep(!$saw{$_}++, @in);
1117
551e1d92 1118=item c)
1119
1120Like (b), but @in contains only small integers:
68dc0745 1121
1122 @out = grep(!$saw[$_]++, @in);
1123
551e1d92 1124=item d)
1125
1126A way to do (b) without any loops or greps:
68dc0745 1127
1128 undef %saw;
1129 @saw{@in} = ();
1130 @out = sort keys %saw; # remove sort if undesired
1131
551e1d92 1132=item e)
1133
1134Like (d), but @in contains only small positive integers:
68dc0745 1135
1136 undef @ary;
1137 @ary[@in] = @in;
87275199 1138 @out = grep {defined} @ary;
68dc0745 1139
1140=back
1141
65acb1b1 1142But perhaps you should have been using a hash all along, eh?
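For instance, method (b) keeps the first occurrence of each element
and preserves the original order. A quick sketch with made-up data:

```perl
use strict;
use warnings;

my @in = qw(pear apple pear fig apple);

my %saw;
my @out = grep { !$saw{$_}++ } @in;   # true only the first time we see $_

print "@out\n";
```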
1143
ddbc1f16 1144=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1145
1146Hearing the word "in" is an I<in>dication that you probably should have
1147used a hash, not a list or array, to store your data. Hashes are
1148designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1149
5a964f20 1150That being said, there are several ways to approach this. If you
1151are going to make this query many times over arbitrary string values,
881bdbd4 1152the fastest way is probably to invert the original array and maintain a
1153hash whose keys are the first array's values.
68dc0745 1154
1155 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
881bdbd4 1156 %is_blue = ();
68dc0745 1157 for (@blues) { $is_blue{$_} = 1 }
1158
1159Now you can check whether $is_blue{$some_color}. It might have been a
1160good idea to keep the blues all in a hash in the first place.
1161
1162If the values are all small integers, you could use a simple indexed
1163array. This kind of an array will take up less space:
1164
1165 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
881bdbd4 1166 @is_tiny_prime = ();
d92eb7b0 1167 for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply  @is_tiny_prime[@primes] = (1) x @primes;
68dc0745 1169
1170Now you check whether $is_tiny_prime[$some_number].
1171
1172If the values in question are integers instead of strings, you can save
1173quite a lot of space by using bit strings instead:
1174
1175 @articles = ( 1..10, 150..2000, 2017 );
1176 undef $read;
7b8d334a 1177 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1178
1179Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1180
1181Please do not use
1182
a6dd486b 1183 ($is_there) = grep $_ eq $whatever, @array;
68dc0745 1184
1185or worse yet
1186
a6dd486b 1187 ($is_there) = grep /$whatever/, @array;
68dc0745 1188
These are slow (they check every element even if the first matches),
inefficient (same reason), and potentially buggy (what if there are
regex metacharacters in $whatever?).  If you're only testing once, then
use:
1193
1194 $is_there = 0;
1195 foreach $elt (@array) {
1196 if ($elt eq $elt_to_find) {
1197 $is_there = 1;
1198 last;
1199 }
1200 }
1201 if ($is_there) { ... }
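If your List::Util is version 1.33 or later, its C<any> function does
the same short-circuiting search in one line. This is a sketch that
assumes such a version is installed; the loop above needs no module
at all.

```perl
use strict;
use warnings;
use List::Util qw(any);    # any() needs List::Util 1.33 or later

my @array       = qw(azure cerulean teal);
my $elt_to_find = 'teal';

# any() stops at the first element for which the block is true.
my $is_there = any { $_ eq $elt_to_find } @array;

print $is_there ? "found\n" : "not found\n";
```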
68dc0745 1202
1203=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1204
1205Use a hash. Here's code to do both and more. It assumes that
1206each element is unique in a given array:
1207
1208 @union = @intersection = @difference = ();
1209 %count = ();
1210 foreach $element (@array1, @array2) { $count{$element}++ }
1211 foreach $element (keys %count) {
1212 push @union, $element;
1213 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1214 }
1215
d92eb7b0 1216Note that this is the I<symmetric difference>, that is, all elements in
a6dd486b 1217either A or in B but not in both. Think of it as an xor operation.
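If you want the one-sided difference instead (elements of the first
array that are missing from the second), a lookup hash on the second
array is enough. A minimal sketch with made-up data:

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e);

# Mark everything in the second array, then keep what is unmarked.
my %in_second     = map  { $_ => 1 } @array2;
my @only_in_first = grep { !$in_second{$_} } @array1;

print "@only_in_first\n";
```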
d92eb7b0 1218
65acb1b1 1219=head2 How do I test whether two arrays or hashes are equal?
1220
1221The following code works for single-level arrays. It uses a stringwise
1222comparison, and does not distinguish defined versus undefined empty
1223strings. Modify if you have other needs.
1224
1225 $are_equal = compare_arrays(\@frogs, \@toads);
1226
1227 sub compare_arrays {
1228 my ($first, $second) = @_;
9f1b1f2d 1229 no warnings; # silence spurious -w undef complaints
65acb1b1 1230 return 0 unless @$first == @$second;
1231 for (my $i = 0; $i < @$first; $i++) {
1232 return 0 if $first->[$i] ne $second->[$i];
1233 }
1234 return 1;
1235 }
1236
1237For multilevel structures, you may wish to use an approach more
1238like this one. It uses the CPAN module FreezeThaw:
1239
1240 use FreezeThaw qw(cmpStr);
1241 @a = @b = ( "this", "that", [ "more", "stuff" ] );
1242
1243 printf "a and b contain %s arrays\n",
1244 cmpStr(\@a, \@b) == 0
1245 ? "the same"
1246 : "different";
1247
1248This approach also works for comparing hashes. Here
1249we'll demonstrate two different answers:
1250
1251 use FreezeThaw qw(cmpStr cmpStrHard);
1252
1253 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1254 $a{EXTRA} = \%b;
1255 $b{EXTRA} = \%a;
1256
1257 printf "a and b contain %s hashes\n",
1258 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1259
1260 printf "a and b contain %s hashes\n",
1261 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1262
1263
The first reports that both of the hashes contain the same data,
while the second reports that they do not.  Which you prefer is left as
an exercise to the reader.
1267
68dc0745 1268=head2 How do I find the first array element for which a condition is true?
1269
1270You can use this if you care about the index:
1271
65acb1b1 1272 for ($i= 0; $i < @array; $i++) {
68dc0745 1273 if ($array[$i] eq "Waldo") {
1274 $found_index = $i;
1275 last;
1276 }
1277 }
1278
1279Now C<$found_index> has what you want.
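The same search can be written with C<first> from List::Util by
iterating over the indices. This is a sketch with invented data. Test
the result with C<defined>, since index 0 is a valid answer but a
false value.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = qw(Wenda Odlaw Waldo Woof);

# first() returns the first index for which the block is true,
# or undef when nothing matches.
my $found_index = first { $array[$_] eq "Waldo" } 0 .. $#array;

print defined $found_index ? "found at $found_index\n" : "not found\n";
```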
1280
1281=head2 How do I handle linked lists?
1282
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either end,
or you can use splice to add and/or remove an arbitrary number of
elements at arbitrary points.  Both pop and shift are O(1) operations on
Perl's dynamic arrays.  In the absence of shifts and pops, push in
general needs to reallocate on the order of every log(N) times, and
unshift will need to copy pointers each time.
68dc0745 1290
1291If you really, really wanted, you could use structures as described in
1292L<perldsc> or L<perltoot> and do just what the algorithm book tells you
65acb1b1 1293to do. For example, imagine a list node like this:
1294
1295 $node = {
1296 VALUE => 42,
1297 LINK => undef,
1298 };
1299
1300You could walk the list this way:
1301
1302 print "List: ";
1303 for ($node = $head; $node; $node = $node->{LINK}) {
1304 print $node->{VALUE}, " ";
1305 }
1306 print "\n";
1307
a6dd486b 1308You could add to the list this way:
65acb1b1 1309
1310 my ($head, $tail);
1311 $tail = append($head, 1); # grow a new head
1312 for $value ( 2 .. 10 ) {
1313 $tail = append($tail, $value);
1314 }
1315
1316 sub append {
1317 my($list, $value) = @_;
1318 my $node = { VALUE => $value };
1319 if ($list) {
1320 $node->{LINK} = $list->{LINK};
1321 $list->{LINK} = $node;
1322 } else {
1323 $_[0] = $node; # replace caller's version
1324 }
1325 return $node;
1326 }
1327
But again, Perl's built-in arrays are virtually always good enough.
68dc0745 1329
1330=head2 How do I handle circular lists?
1331
1332Circular lists could be handled in the traditional fashion with linked
1333lists, or you could just do something like this with an array:
1334
1335 unshift(@array, pop(@array)); # the last shall be first
1336 push(@array, shift(@array)); # and vice versa
1337
1338=head2 How do I shuffle an array randomly?
1339
If you have either Perl 5.8.0 or later, or Scalar-List-Utils 1.03 or
later, installed, you can say:
1342
f05bbc40 1343 use List::Util 'shuffle';
45bbf655 1344
1345 @shuffled = shuffle(@list);
1346
f05bbc40 1347If not, you can use a Fisher-Yates shuffle.
5a964f20 1348
5a964f20 1349 sub fisher_yates_shuffle {
cc30d1a7 1350 my $deck = shift; # $deck is a reference to an array
1351 my $i = @$deck;
f05bbc40 1352 while ($i--) {
5a964f20 1353 my $j = int rand ($i+1);
cc30d1a7 1354 @$deck[$i,$j] = @$deck[$j,$i];
5a964f20 1355 }
1356 }
1357
cc30d1a7 1358 # shuffle my mpeg collection
1359 #
1360 my @mpeg = <audio/*/*.mp3>;
1361 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1362 print @mpeg;
5a964f20 1363
45bbf655 1364Note that the above implementation shuffles an array in place,
1365unlike the List::Util::shuffle() which takes a list and returns
1366a new shuffled list.
1367
You've probably seen shuffling algorithms that work using splice,
randomly picking an element to remove from the old list and append to
the new one:
68dc0745 1370
1371 srand;
1372 @new = ();
1373 @old = 1 .. 10; # just a demo
1374 while (@old) {
1375 push(@new, splice(@old, rand @old, 1));
1376 }
1377
5a964f20 1378This is bad because splice is already O(N), and since you do it N times,
1379you just invented a quadratic algorithm; that is, O(N**2). This does
1380not scale, although Perl is so efficient that you probably won't notice
1381this until you have rather largish arrays.
68dc0745 1382
1383=head2 How do I process/modify each element of an array?
1384
1385Use C<for>/C<foreach>:
1386
1387 for (@lines) {
5a964f20 1388 s/foo/bar/; # change that word
1389 y/XZ/ZX/; # swap those letters
68dc0745 1390 }
1391
1392Here's another; let's compute spherical volumes:
1393
5a964f20 1394 for (@volumes = @radii) { # @volumes has changed parts
68dc0745 1395 $_ **= 3;
1396 $_ *= (4/3) * 3.14159; # this will be constant folded
1397 }
1398
5a964f20 1399If you want to do the same thing to modify the values of the hash,
1400you may not use the C<values> function, oddly enough. You need a slice:
1401
1402 for $orbit ( @orbits{keys %orbits} ) {
1403 ($orbit **= 3) *= (4/3) * 3.14159;
1404 }
1405
68dc0745 1406=head2 How do I select a random element from an array?
1407
1408Use the rand() function (see L<perlfunc/rand>):
1409
5a964f20 1410 # at the top of the program:
68dc0745 1411 srand; # not needed for 5.004 and later
5a964f20 1412
1413 # then later on
68dc0745 1414 $index = rand @array;
1415 $element = $array[$index];
1416
5a964f20 1417Make sure you I<only call srand once per program, if then>.
1418If you are calling it more than once (such as before each
1419call to rand), you're almost certainly doing something wrong.
1420
68dc0745 1421=head2 How do I permute N elements of a list?
1422
1423Here's a little program that generates all permutations
1424of all the words on each line of input. The algorithm embodied
5a964f20 1425in the permute() function should work on any list:
68dc0745 1426
1427 #!/usr/bin/perl -n
5a964f20 1428 # tsc-permute: permute each word of input
1429 permute([split], []);
1430 sub permute {
1431 my @items = @{ $_[0] };
1432 my @perms = @{ $_[1] };
1433 unless (@items) {
1434 print "@perms\n";
68dc0745 1435 } else {
5a964f20 1436 my(@newitems,@newperms,$i);
1437 foreach $i (0 .. $#items) {
1438 @newitems = @items;
1439 @newperms = @perms;
1440 unshift(@newperms, splice(@newitems, $i, 1));
1441 permute([@newitems], [@newperms]);
68dc0745 1442 }
1443 }
1444 }
1445
b8d2732a 1446Unfortunately, this algorithm is very inefficient. The Algorithm::Permute
1447module from CPAN runs at least an order of magnitude faster. If you don't
1448have a C compiler (or a binary distribution of Algorithm::Permute), then
1449you can use List::Permutor which is written in pure Perl, and is still
f8620f40 1450several times faster than the algorithm above.
b8d2732a 1451
68dc0745 1452=head2 How do I sort an array by (anything)?
1453
1454Supply a comparison function to sort() (described in L<perlfunc/sort>):
1455
1456 @list = sort { $a <=> $b } @list;
1457
1458The default sort function is cmp, string comparison, which would
c47ff5f1 1459sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
68dc0745 1460the numerical comparison operator.
1461
1462If you have a complicated function needed to pull out the part you
1463want to sort on, then don't do it inside the sort function. Pull it
1464out first, because the sort BLOCK can be called many times for the
1465same element. Here's an example of how to pull out the first word
1466after the first number on each item, and then sort those words
1467case-insensitively.
1468
1469 @idx = ();
1470 for (@data) {
1471 ($item) = /\d+\s*(\S+)/;
1472 push @idx, uc($item);
1473 }
1474 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1475
a6dd486b 1476which could also be written this way, using a trick
68dc0745 1477that's come to be known as the Schwartzian Transform:
1478
1479 @sorted = map { $_->[0] }
1480 sort { $a->[1] cmp $b->[1] }
d92eb7b0 1481 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 1482
1483If you need to sort on several fields, the following paradigm is useful.
1484
1485 @sorted = sort { field1($a) <=> field1($b) ||
1486 field2($a) cmp field2($b) ||
1487 field3($a) cmp field3($b)
1488 } @data;
1489
1490This can be conveniently combined with precalculation of keys as given
1491above.
1492
See the F<sort> article in the "Far More Than You Ever Wanted
To Know" collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz for
more about this approach.
68dc0745 1496
1497See also the question below on sorting hashes.
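Here is the Schwartzian Transform from above applied to a few made-up
records, as a sketch of what each stage does: the inner C<map> pairs
every item with its precomputed key, C<sort> compares only the keys,
and the outer C<map> throws the keys away again.

```perl
use strict;
use warnings;

my @data = ("10 pear", "2 Apple", "33 banana");

my @sorted = map  { $_->[0] }                       # 3. keep the original item
             sort { $a->[1] cmp $b->[1] }           # 2. compare cached keys
             map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] }  # 1. [item, KEY]
             @data;

print "$_\n" for @sorted;
```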
1498
1499=head2 How do I manipulate arrays of bits?
1500
1501Use pack() and unpack(), or else vec() and the bitwise operations.
1502
For example, this sets bit N of $vec for each integer N found in @ints:
1504
1505 $vec = '';
1506 foreach(@ints) { vec($vec,$_,1) = 1 }
1507
cc30d1a7 1508Here's how, given a vector in $vec, you can
68dc0745 1509get those bits into your @ints array:
1510
1511 sub bitvec_to_list {
1512 my $vec = shift;
1513 my @ints;
1514 # Find null-byte density then select best algorithm
1515 if ($vec =~ tr/\0// / length $vec > 0.95) {
1516 use integer;
1517 my $i;
1518 # This method is faster with mostly null-bytes
1519 while($vec =~ /[^\0]/g ) {
1520 $i = -9 + 8 * pos $vec;
1521 push @ints, $i if vec($vec, ++$i, 1);
1522 push @ints, $i if vec($vec, ++$i, 1);
1523 push @ints, $i if vec($vec, ++$i, 1);
1524 push @ints, $i if vec($vec, ++$i, 1);
1525 push @ints, $i if vec($vec, ++$i, 1);
1526 push @ints, $i if vec($vec, ++$i, 1);
1527 push @ints, $i if vec($vec, ++$i, 1);
1528 push @ints, $i if vec($vec, ++$i, 1);
1529 }
1530 } else {
1531 # This method is a fast general algorithm
1532 use integer;
1533 my $bits = unpack "b*", $vec;
1534 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1535 push @ints, pos $bits while($bits =~ /1/g);
1536 }
1537 return \@ints;
1538 }
1539
1540This method gets faster the more sparse the bit vector is.
1541(Courtesy of Tim Bunce and Winfried Koenig.)
1542
cc30d1a7 1543Or use the CPAN module Bit::Vector:
1544
1545 $vector = Bit::Vector->new($num_of_bits);
1546 $vector->Index_List_Store(@ints);
1547 @ints = $vector->Index_List_Read();
1548
Bit::Vector provides efficient methods for bit vectors, sets of small
integers, and "big int" math.
1551
1552Here's a more extensive illustration using vec():
65acb1b1 1553
1554 # vec demo
1555 $vector = "\xff\x0f\xef\xfe";
1556 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1557 unpack("N", $vector), "\n";
1558 $is_set = vec($vector, 23, 1);
1559 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1560 pvec($vector);
1561
1562 set_vec(1,1,1);
1563 set_vec(3,1,1);
1564 set_vec(23,1,1);
1565
1566 set_vec(3,1,3);
1567 set_vec(3,2,3);
1568 set_vec(3,4,3);
1569 set_vec(3,4,7);
1570 set_vec(3,8,3);
1571 set_vec(3,8,7);
1572
1573 set_vec(0,32,17);
1574 set_vec(1,32,17);
1575
1576 sub set_vec {
1577 my ($offset, $width, $value) = @_;
1578 my $vector = '';
1579 vec($vector, $offset, $width) = $value;
1580 print "offset=$offset width=$width value=$value\n";
1581 pvec($vector);
1582 }
1583
1584 sub pvec {
1585 my $vector = shift;
1586 my $bits = unpack("b*", $vector);
1587 my $i = 0;
1588 my $BASE = 8;
1589
1590 print "vector length in bytes: ", length($vector), "\n";
1591 @bytes = unpack("A8" x length($vector), $bits);
1592 print "bits are: @bytes\n\n";
1593 }
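For small vectors, a simple round trip is often all you need. This
sketch sets a few bits with vec() and reads them back with a plain
grep. It makes no attempt at the speed of C<bitvec_to_list> above.

```perl
use strict;
use warnings;

my @ints = (1, 3, 5);

# Set bit N for every N in @ints.
my $vec = '';
vec($vec, $_, 1) = 1 for @ints;

# "b*" lists the bits of each byte in ascending order.
my $bits = unpack("b*", $vec);

# Recover the positions of the set bits.
my @back = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;

print "bits: $bits\n";
print "set:  @back\n";
```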
1594
68dc0745 1595=head2 Why does defined() return true on empty arrays and hashes?
1596
65acb1b1 1597The short story is that you should probably only use defined on scalars or
1598functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1599in the 5.004 release or later of Perl for more detail.
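What you usually want to know is whether the aggregate is empty, and
for that you only need to evaluate it in boolean context. A minimal
sketch (newer Perl releases go further and treat C<defined(@array)>
and C<defined(%hash)> as errors):

```perl
use strict;
use warnings;

my (@array, %hash);

my $array_was_empty = @array ? 0 : 1;   # true: no elements yet
my $hash_was_empty  = %hash  ? 0 : 1;   # true: no keys yet

push @array, undef;                     # one element, even though it is undef
$hash{key} = undef;                     # one key, even though its value is undef

my $array_is_empty = @array ? 0 : 1;    # false now
my $hash_is_empty  = %hash  ? 0 : 1;    # false now

print "before: $array_was_empty $hash_was_empty\n";
print "after:  $array_is_empty $hash_is_empty\n";
```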
68dc0745 1600
1601=head1 Data: Hashes (Associative Arrays)
1602
1603=head2 How do I process an entire hash?
1604
1605Use the each() function (see L<perlfunc/each>) if you don't care
1606whether it's sorted:
1607
5a964f20 1608 while ( ($key, $value) = each %hash) {
68dc0745 1609 print "$key = $value\n";
1610 }
1611
1612If you want it sorted, you'll have to use foreach() on the result of
1613sorting the keys as shown in an earlier question.
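A sorted traversal might look like this sketch, with made-up data:

```perl
use strict;
use warnings;

my %hash = (pear => 3, apple => 7, fig => 1);

my @report;
for my $key (sort keys %hash) {          # keys in string order
    push @report, "$key = $hash{$key}";
}

print "$_\n" for @report;
```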
1614
1615=head2 What happens if I add or remove keys from a hash while iterating over it?
1616
d92eb7b0 1617Don't do that. :-)
1618
1619[lwall] In Perl 4, you were not allowed to modify a hash at all while
87275199 1620iterating over it. In Perl 5 you can delete from it, but you still
d92eb7b0 1621can't add to it, because that might cause a doubling of the hash table,
1622in which half the entries get copied up to the new top half of the
87275199 1623table, at which point you've totally bamboozled the iterator code.
d92eb7b0 1624Even if the table doesn't double, there's no telling whether your new
1625entry will be inserted before or after the current iterator position.
1626
a6dd486b 1627Either treasure up your changes and make them after the iterator finishes
d92eb7b0 1628or use keys to fetch all the old keys at once, and iterate over the list
1629of keys.
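The second suggestion can be sketched as follows. The hash contents
and the C<restock_> naming are invented for the example.

```perl
use strict;
use warnings;

my %stock = (apple => 0, pear => 4, fig => 0);

# keys() builds the complete list before the loop starts, so the hash
# can be changed freely inside the loop body.
for my $item (keys %stock) {
    if ($stock{$item} == 0) {
        delete $stock{$item};
        $stock{"restock_$item"} = 10;
    }
}

print join(", ", sort keys %stock), "\n";
```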
68dc0745 1630
1631=head2 How do I look up a hash element by value?
1632
1633Create a reverse hash:
1634
1635 %by_value = reverse %by_key;
1636 $key = $by_value{$value};
1637
1638That's not particularly efficient. It would be more space-efficient
1639to use:
1640
1641 while (($key, $value) = each %by_key) {
1642 $by_value{$value} = $key;
1643 }
1644
d92eb7b0 1645If your hash could have repeated values, the methods above will only find
1646one of the associated keys. This may or may not worry you. If it does
1647worry you, you can always reverse the hash into a hash of arrays instead:
1648
1649 while (($key, $value) = each %by_key) {
1650 push @{$key_list_by_value{$value}}, $key;
1651 }
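With some made-up data, the hash of arrays works out like this
sketch. The key lists are sorted before printing because each()
returns entries in no particular order.

```perl
use strict;
use warnings;

my %by_key = (apple => "red", cherry => "red", banana => "yellow");

my %key_list_by_value;
while (my ($key, $value) = each %by_key) {
    push @{ $key_list_by_value{$value} }, $key;
}

for my $value (sort keys %key_list_by_value) {
    my @keys = sort @{ $key_list_by_value{$value} };
    print "$value: @keys\n";
}
```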
68dc0745 1652
1653=head2 How can I know how many entries are in a hash?
1654
1655If you mean how many keys, then all you have to do is
875e5c2f 1656use the keys() function in a scalar context:
68dc0745 1657
875e5c2f 1658 $num_keys = keys %hash;
68dc0745 1659
875e5c2f 1660The keys() function also resets the iterator, which means that you may
1661see strange results if you use this between uses of other hash operators
1662such as each().
68dc0745 1663
1664=head2 How do I sort a hash (optionally by value instead of key)?
1665
1666Internally, hashes are stored in a way that prevents you from imposing
1667an order on key-value pairs. Instead, you have to sort a list of the
1668keys or values:
1669
1670 @keys = sort keys %hash; # sorted by key
1671 @keys = sort {
1672 $hash{$a} cmp $hash{$b}
1673 } keys %hash; # and by value
1674
Here we'll do a reverse numeric sort by value, and if two values are
identical, sort by length of key, or if that fails, by straight ASCII
comparison of the keys (well, possibly modified by your locale--see
L<perllocale>).
1679
1680 @keys = sort {
1681 $hash{$b} <=> $hash{$a}
1682 ||
1683 length($b) <=> length($a)
1684 ||
1685 $a cmp $b
1686 } keys %hash;
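To make the tie-breaking visible, here is the same comparison applied
to a small made-up hash in which three keys share one value:

```perl
use strict;
use warnings;

my %hash = (kiwi => 5, fig => 5, banana => 9, apple => 2, pear => 5);

my @keys = sort {
    $hash{$b} <=> $hash{$a}          # larger values first
        ||
    length($b) <=> length($a)        # then longer keys first
        ||
    $a cmp $b                        # then plain string order
} keys %hash;

print "$_ => $hash{$_}\n" for @keys;
```

Here C<kiwi> and C<pear> tie on both value and length, so the final
string comparison decides their order.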

=head2 How can I always keep my hash sorted?

You can look into using the DB_File module and tie() using the
$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
The Tie::IxHash module from CPAN might also be instructive.

=head2 What's the difference between "delete" and "undef" with hashes?

Hashes are pairs of scalars: the first is the key, the second is the
value. The key will be coerced to a string, although the value can be
any kind of scalar: string, number, or reference. If a key C<$key> is
present in the hash C<%ary>, C<exists($ary{$key})> will return true. The
value for a given key can be C<undef>, in which case C<$ary{$key}> will
be C<undef> while C<exists $ary{$key}> will return true. This corresponds
to (C<$key>, C<undef>) being in the hash.

Pictures help... here's the C<%ary> table:

          keys  values
        +------+------+
        |  a   |  3   |
        |  x   |  7   |
        |  d   |  0   |
        |  e   |  2   |
        +------+------+

And these conditions hold:

        $ary{'a'}                       is true
        $ary{'d'}                       is false
        defined $ary{'d'}               is true
        defined $ary{'a'}               is true
        exists $ary{'a'}                is true (Perl5 only)
        grep ($_ eq 'a', keys %ary)     is true

If you now say

        undef $ary{'a'}

your table now reads:

          keys  values
        +------+------+
        |  a   | undef|
        |  x   |  7   |
        |  d   |  0   |
        |  e   |  2   |
        +------+------+

and these conditions now hold; changes in caps:

        $ary{'a'}                       is FALSE
        $ary{'d'}                       is false
        defined $ary{'d'}               is true
        defined $ary{'a'}               is FALSE
        exists $ary{'a'}                is true (Perl5 only)
        grep ($_ eq 'a', keys %ary)     is true

Notice the last two: you have an undef value, but a defined key!

Now, consider this:

        delete $ary{'a'}

your table now reads:

          keys  values
        +------+------+
        |  x   |  7   |
        |  d   |  0   |
        |  e   |  2   |
        +------+------+

and these conditions now hold; changes in caps:

        $ary{'a'}                       is false
        $ary{'d'}                       is false
        defined $ary{'d'}               is true
        defined $ary{'a'}               is false
        exists $ary{'a'}                is FALSE (Perl5 only)
        grep ($_ eq 'a', keys %ary)     is FALSE

See, the whole entry is gone!
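
If you'd rather check the tables than trust them, a short script along
these lines should do. It rebuilds the same C<%ary> and records what
defined() and exists() report at each step; the variable names are just
for illustration.

```perl
use strict;
use warnings;

my %ary = (a => 3, x => 7, d => 0, e => 2);

# undef clears the value but keeps the key.
undef $ary{'a'};
my $defined_after_undef = defined $ary{'a'} ? 1 : 0;
my $exists_after_undef  = exists  $ary{'a'} ? 1 : 0;

# delete removes the whole entry.
delete $ary{'a'};
my $exists_after_delete = exists $ary{'a'} ? 1 : 0;
my $count_after_delete  = keys %ary;

print "after undef:  defined=$defined_after_undef exists=$exists_after_undef\n";
print "after delete: exists=$exists_after_delete keys=$count_after_delete\n";
```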

=head2 Why don't my tied hashes make the defined/exists distinction?

That depends on how the tied class implements its EXISTS() and FETCH()
methods. There is no separate DEFINED() method: defined() simply tests
whatever FETCH() returns. For example, there isn't the concept of undef
with hashes that are tied to DBM* files. This means the true/false tables
above will give different results when used on such a hash. It also means
that exists() and defined() do the same thing with a DBM* file, and what
they end up doing is not what they do with ordinary hashes.

=head2 How do I reset an each() operation part-way through?

Using C<keys %hash> in scalar context returns the number of keys in
the hash I<and> resets the iterator associated with the hash. You may
need to do this if you use C<last> to exit a loop early so that when you
re-enter it, the hash iterator has been reset.
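
A minimal sketch of that situation, using a made-up three-element hash:
the first loop bails out early, C<keys> rewinds the iterator, and the
second loop sees every pair again.

```perl
use strict;
use warnings;

my %hash = (one => 1, two => 2, three => 3);

# Leave the loop early; the iterator stays parked after the first pair.
while (my ($key, $value) = each %hash) {
    last;
}

# Calling keys() in scalar context rewinds the iterator.
my $num_keys = keys %hash;

# This second pass therefore starts from the beginning.
my $seen = 0;
while (my ($key, $value) = each %hash) {
    $seen++;
}
print "$seen of $num_keys pairs seen\n";
```

Without the call to C<keys>, the second loop would pick up where the
first one stopped and miss a pair.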

=head2 How can I get the unique keys from two hashes?

First you extract the keys from the hashes into lists, then solve
the "removing duplicates" problem described above. For example:

    %seen = ();
    for $element (keys(%foo), keys(%bar)) {
        $seen{$element}++;
    }
    @uniq = keys %seen;

Or more succinctly:

    @uniq = keys %{{%foo,%bar}};

Or if you really want to save space:

    %seen = ();
    while (defined ($key = each %foo)) {
        $seen{$key}++;
    }
    while (defined ($key = each %bar)) {
        $seen{$key}++;
    }
    @uniq = keys %seen;

=head2 How can I store a multidimensional array in a DBM file?

Either stringify the structure yourself (no fun), or else
get the MLDBM (which uses Data::Dumper) module from CPAN and layer
it on top of either DB_File or GDBM_File.

=head2 How can I make my hash remember the order I put elements into it?

Use the Tie::IxHash module from CPAN.

    use Tie::IxHash;
    tie(%myhash, 'Tie::IxHash');
    for ($i=0; $i<20; $i++) {
        $myhash{$i} = 2*$i;
    }
    @keys = keys %myhash;
    # @keys = (0,1,2,3,...)

=head2 Why does passing a subroutine an undefined element in a hash create it?

If you say something like:

    somefunc($hash{"nonesuch key here"});

Then that element "autovivifies"; that is, it springs into existence
whether you store something there or not. That's because functions
get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
it has to be ready to write it back into the caller's version.

This has been fixed as of Perl5.004.

Normally, merely accessing a key's value for a nonexistent key does
I<not> cause that key to be forever there. This is different from
awk's behavior.
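
On a Perl with the fix (5.004 or later), you can watch the difference
with a sketch like this one. The subroutines C<peek> and C<store> are
hypothetical names invented for the demonstration.

```perl
use strict;
use warnings;

my %hash;

sub peek  { my $copy = $_[0]; return }   # only reads its argument
sub store { $_[0] = 'stored'; return }   # assigns through the alias

peek($hash{'read only'});
my $created_by_read = exists $hash{'read only'} ? 1 : 0;

store($hash{'written'});
my $created_by_write = exists $hash{'written'} ? 1 : 0;

print "read created key: $created_by_read, write created key: $created_by_write\n";
```

The element only comes into being when the subroutine actually assigns
to C<$_[0]>.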

=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?

Usually a hash ref, perhaps like this:

    $record = {
        NAME   => "Jason",
        EMPNO  => 132,
        TITLE  => "deputy peon",
        AGE    => 23,
        SALARY => 37_000,
        PALS   => [ "Norbert", "Rhys", "Phineas"],
    };

References are documented in L<perlref> and L<perlreftut>.
Examples of complex data structures are given in L<perldsc> and
L<perllol>. Examples of structures and object-oriented classes are
in L<perltoot>.
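
Once you have such a record, getting at its fields might look like the
following sketch. The record is the one from the answer above; the
variable names holding the results are assumptions for illustration.

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

# Scalar fields are reached with the arrow operator.
my $name = $record->{NAME};

# Nested array: index into it, or dereference the whole list.
my $first_pal = $record->{PALS}[0];
my $pal_count = @{ $record->{PALS} };

# Fields can be updated in place.
$record->{AGE}++;

print "$name has $pal_count pals, starting with $first_pal\n";
```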

=head2 How can I use a reference as a hash key?

You can't do this directly, but you could use the standard Tie::RefHash
module distributed with Perl.
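
A minimal sketch of Tie::RefHash in use follows. The hash name
C<%color_of> and the array reference used as a key are made up for the
example.

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %color_of, 'Tie::RefHash';

my $point = [ 1, 2 ];          # hypothetical key: an array reference
$color_of{$point} = 'red';

# Keys come back as real references, not as strings like "ARRAY(0x...)".
my ($key) = keys %color_of;
my $key_type = ref $key;
my $first    = $key->[0];

print "key is a $key_type reference whose first element is $first\n";
```

With an ordinary hash the key would have been flattened to a string,
and dereferencing it under C<use strict> would die.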

=head1 Data: Misc

=head2 How do I handle binary data correctly?

Perl is binary clean, so this shouldn't be a problem. For example,
this works fine (assuming the files are found):

    if (`cat /vmunix` =~ /gzip/) {
        print "Your kernel is GNU-zip enabled!\n";
    }

On less elegant (read: Byzantine) systems, however, you have
to play tedious games with "text" versus "binary" files. See
L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking
systems are curses out of Microsoft, who seem to be committed to putting
the backward into backward compatibility.

If you're concerned about 8-bit textual data, then see L<perllocale>.

If you want to deal with multibyte characters, however, there are
some gotchas. See the section on Regular Expressions.

=head2 How do I determine whether a scalar is a number/whole/integer/float?

Assuming that you don't care about IEEE notations like "NaN" or
"Infinity", you probably just want to use a regular expression.

    if (/\D/)            { print "has nondigits\n" }
    if (/^\d+$/)         { print "is a whole number\n" }
    if (/^-?\d+$/)       { print "is an integer\n" }
    if (/^[+-]?\d+$/)    { print "is a +/- integer\n" }
    if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
    if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
    if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
                         { print "a C float\n" }

If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a C<getnum>
wrapper function for more convenient access. This function takes
a string and returns the number it found, or C<undef> for input that
isn't a C float. The C<is_numeric> function is a front end to C<getnum>
if you just want to say, "Is this a float?"

    sub getnum {
        use POSIX qw(strtod);
        my $str = shift;
        $str =~ s/^\s+//;
        $str =~ s/\s+$//;
        $! = 0;
        my($num, $unparsed) = strtod($str);
        if (($str eq '') || ($unparsed != 0) || $!) {
            return undef;
        } else {
            return $num;
        }
    }

    sub is_numeric { defined getnum($_[0]) }

Or you could check out the String::Scanf module on CPAN instead. The
POSIX module (part of the standard Perl distribution) provides the
C<strtod> and C<strtol> functions for converting strings to doubles
and longs, respectively.
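
To get a feel for what those two functions hand back, here's a small
sketch. In list context each returns the number it parsed plus a count
of the characters it could not consume; the input strings are made up.

```perl
use strict;
use warnings;
use POSIX qw(strtod strtol);

# strtod parses as much of the string as looks like a float.
my ($num, $unparsed) = strtod("1e3xyz");

# strtol takes the numeric base as its second argument; here base 16.
my ($int, $left_over) = strtol("ff", 16);

print "strtod: $num with $unparsed unparsed\n";
print "strtol: $int with $left_over unparsed\n";
```

A nonzero leftover count is the signal that the string held something
other than a clean number.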

=head2 How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generically, you should consult the FreezeThaw
or Storable modules from CPAN. Starting from Perl 5.8 Storable is part
of the standard distribution. Here's one example using Storable's C<store>
and C<retrieve> functions:

    use Storable;
    store(\%hash, "filename");

    # later on...
    $href = retrieve("filename");        # by ref
    %hash = %{ retrieve("filename") };   # direct to hash
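
A complete round trip might look like the sketch below. It assumes a
Perl that ships Storable, uses File::Temp to stand in for "filename",
and stores a made-up hash with a nested array.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

my %hash = (name => 'Jason', pals => [ 'Norbert', 'Rhys' ]);

# A throwaway file stands in for "filename"; it is removed at exit.
my ($fh, $filename) = tempfile(UNLINK => 1);
close $fh;

store(\%hash, $filename);

# later on, possibly in another run of the program...
my %restored = %{ retrieve($filename) };

print "restored $restored{name} with pals @{ $restored{pals} }\n";
```

Nested references survive the trip, which is the main reason to reach
for Storable rather than a flat DBM file.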

=head2 How do I print out or copy a recursive data structure?

The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures. The Storable module (on CPAN, and part
of the standard distribution starting from Perl 5.8) provides a function
called C<dclone> that recursively copies its argument.

    use Storable qw(dclone);
    $r2 = dclone($r1);

Where C<$r1> can be a reference to any kind of data structure you'd like.
It will be deeply copied. Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.

    %newhash = %{ dclone(\%oldhash) };
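
To see why the deep copy matters, compare it with a plain assignment in
this sketch. The fruit data and the C<%shallow> hash are assumptions
added for the comparison.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = (fruits => [ 'apple', 'pear' ]);

my %shallow = %oldhash;                  # copies only the top level
my %newhash = %{ dclone(\%oldhash) };    # copies the nested array too

push @{ $oldhash{fruits} }, 'plum';

my $shallow_count = @{ $shallow{fruits} };   # shares the inner array
my $deep_count    = @{ $newhash{fruits} };   # has its own copy
print "shallow copy sees $shallow_count, deep copy sees $deep_count\n";
```

The plain assignment still points at the original inner array, so it
sees the new element; the C<dclone> copy does not.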

=head2 How do I define methods for every class/object?

Use the UNIVERSAL class (see L<UNIVERSAL>).

=head2 How do I verify a credit card checksum?

Get the Business::CreditCard module from CPAN.

=head2 How do I pack arrays of doubles or floats for XS code?

The kgbpack.c code in the PGPLOT module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
the PDL module from CPAN instead--it makes number-crunching easy.

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington.
All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.