=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 1.22 $, $Date: 2002/05/16 12:44:24 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

The infinite set that a mathematician thinks of as the real numbers can
only be approximated on a computer, since the computer only has a finite
number of bits to store an infinite number of, um, numbers.

Internally, your computer represents floating-point numbers in binary.
Floating-point numbers read in from a file or appearing as literals
in your program are converted from their decimal floating-point
representation (eg, 19.95) to an internal binary representation.

However, 19.95 can't be precisely represented as a binary
floating-point number, just like 1/3 can't be exactly represented as a
decimal floating-point number. The computer's binary representation
of 19.95, therefore, isn't exactly 19.95.

When a floating-point number gets printed, the binary floating-point
representation is converted back to decimal. These decimal numbers
are displayed in either the format you specify with printf(), or the
current output format for numbers. (See L<perlvar/"$#"> if you use
print. C<$#> has a different default value in Perl5 than it did in
Perl4. Changing C<$#> yourself is deprecated.)

This affects B<all> computer languages that represent decimal
floating-point numbers in binary, not just Perl. Perl provides
arbitrary-precision decimal numbers with the Math::BigFloat module
(part of the standard Perl distribution), but mathematical operations
are consequently slower.

If precision is important, such as when dealing with money, it's good
to work with integers and then divide at the last possible moment.
For example, work in pennies (1995) instead of dollars and cents
(19.95) and divide by 100 at the end.

To get rid of the superfluous digits, just use a format (eg,
C<printf("%.2f", 19.95)>) to get the required precision.
See L<perlop/"Floating-point Arithmetic">.
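
As a rough illustration of the points above, here is a small sketch. It
is not part of the original answer, and the variable names and prices
are made up for the example. It prints the stored value of 19.95 with
more digits than usual, then the formatted value, then a total kept in
integer pennies.

```perl
use strict;
use warnings;

# Ask for more digits than the default output format shows to see
# what is actually stored for the literal 19.95.
my $stored = sprintf("%.20f", 19.95);

# A format with the precision you want hides the superfluous digits.
my $shown = sprintf("%.2f", 19.95);                   # "19.95"

# Money kept as integer pennies: add exactly, divide only for display.
my @prices_in_pennies = (1995, 5, 1000);
my $total_pennies = 0;
$total_pennies += $_ for @prices_in_pennies;
my $total = sprintf("%.2f", $total_pennies / 100);    # "30.00"

print "$stored\n$shown\n$total\n";
```

The first line of output is where the long decimals show up; the other
two are what you normally want to present to a user.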

=head2 Why isn't my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur
as literals in your program. Octal literals in perl must start with
a leading "0" and hexadecimal literals must start with a leading "0x".
If they are read in from somewhere and assigned, no automatic
conversion takes place. You must explicitly use oct() or hex() if you
want the values converted to decimal. oct() interprets
both hex ("0x350") numbers and octal ones ("0350" or even without the
leading "0", like "377"), while hex() only converts hexadecimal ones,
with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
The inverse mapping from decimal to octal can be done with either the
"%o" or "%O" sprintf() formats. To get from decimal to hex try either
the "%x" or the "%X" formats to sprintf().

This problem shows up most often when people try using chmod(), mkdir(),
umask(), or sysopen(), which by widespread tradition typically take
permissions in octal.

    chmod(644, $file);   # WRONG
    chmod(0644, $file);  # right

Note the mistake in the first line was specifying the decimal literal
644, rather than the intended octal literal 0644. The problem can
be seen with:

    printf("%#o",644); # prints 01204

Surely you had not intended C<chmod(01204, $file);> - did you? If you
want to use numeric literals as arguments to chmod() et al. then please
try to express them as octal constants, that is with a leading zero and
with the following digits restricted to the set 0..7.
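
When the permission comes from input rather than from a literal, the
conversion has to be explicit. The following is a minimal sketch (the
variable names are only for illustration) showing what happens to the
string "0644" with and without oct():

```perl
use strict;
use warnings;

# Strings as they might arrive from a file or the command line.
my $mode_string = "0644";
my $hex_string  = "0x1f";

my $as_number = $mode_string + 0;    # 644: numeric conversion ignores the leading 0
my $mode      = oct($mode_string);   # 420, the same value as the literal 0644
my $from_hex  = oct($hex_string);    # 31: oct() also understands a leading "0x"
my $also_hex  = hex("ff");           # 255

# Back to an octal string for display.
my $display = sprintf("%04o", $mode);    # "0644"

print "$as_number $mode $from_hex $also_hex $display\n";
```

Passing C<$mode> (not C<$as_number>) to chmod() gives the permissions
the string was meant to describe.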

=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that int() merely truncates toward 0. For rounding to a
certain number of digits, sprintf() or printf() is usually the easiest
route.

    printf("%.3f", 3.1415926535);       # prints 3.142

The POSIX module (part of the standard Perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
functions.

    use POSIX;
    $ceil   = ceil(3.5);                # 4
    $floor  = floor(3.5);               # 3

In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
module. With 5.004, the Math::Trig module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the Math::Complex module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do this.
Perl numbers whose absolute values are integers under 2**31 (on 32 bit
machines) will work pretty much like mathematical integers. Other numbers
are not guaranteed.
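
If you do write your own rounding function, it might look something
like the sketch below. The name C<round_half_away> and its interface
are invented for this example, and "round half away from zero" is only
one of several possible rules; pick the one your application requires.
Note that it can only be as exact as its input: a value such as 1.005
is already stored as slightly less than 1.005, so for money it is
still safer to keep integer pennies.

```perl
use strict;
use warnings;

# Hypothetical helper: round half away from zero to $places decimals.
sub round_half_away {
    my ($number, $places) = @_;
    my $factor = 10 ** $places;
    my $scaled = $number * $factor;
    # int() truncates toward zero, so nudge by 0.5 in the direction of the sign.
    return int($scaled + ($scaled < 0 ? -0.5 : 0.5)) / $factor;
}

print round_half_away(2.5, 0), "\n";     # 3
print round_half_away(-2.5, 0), "\n";    # -3
print round_half_away(1.25, 1), "\n";    # 1.3
```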

=head2 How do I convert between numeric representations?

As always with Perl there is more than one way to do it. Below
are a few examples of approaches to making common conversions
between number representations. This is intended to be representative
rather than exhaustive.

Some of the examples below use the Bit::Vector module from CPAN.
The reason you might choose Bit::Vector over the perl built-in
functions is that it works with numbers of ANY size, that it is
optimized for speed on some operations, and for at least some
programmers the notation might be familiar.

=item B<How do I convert hexadecimal into decimal:>

Using perl's built in conversion of 0x notation:

    $int = 0xDEADBEEF;
    $dec = sprintf("%d", $int);

Using the hex function:

    $int = hex("DEADBEEF");
    $dec = sprintf("%d", $int);

Using pack:

    $int = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
    $dec = sprintf("%d", $int);

Using the CPAN module Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();

=item B<How do I convert from decimal to hexadecimal:>

Using sprintf:

    $hex = sprintf("%X", 3735928559);

Using unpack:

    $hex = unpack("H*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And Bit::Vector supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item B<How do I convert from octal to decimal:>

Using Perl's built in conversion of numbers with leading zeros:

    $int = 033653337357; # note the leading 0!
    $dec = sprintf("%d", $int);

Using the oct function:

    $int = oct("33653337357");
    $dec = sprintf("%d", $int);

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item B<How do I convert from decimal to octal:>

Using sprintf:

    $oct = sprintf("%o", 3735928559);

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item B<How do I convert from binary to decimal:>

Perl 5.6 lets you write binary numbers directly with
the 0b notation:

    $number = 0b10110110;

Using pack and ord:

    $decimal = ord(pack('B8', '10110110'));

Using pack and unpack for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() is used to left pad a 32 character string with zeros.

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item B<How do I convert from decimal to binary:>

Using unpack:

    $bin = unpack("B*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.
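
For the reader who would rather not do the exercise, one possible
approach (a sketch using only built-ins, and assuming Perl 5.6 or
later for C<%b> and for oct() on "0b" strings) is to go through a
plain number: convert in with hex() or oct(), convert out with
sprintf().

```perl
use strict;
use warnings;

my $hex = "ff";

my $bin_from_hex = sprintf("%b", hex($hex));            # "11111111"
my $oct_from_hex = sprintf("%o", hex($hex));            # "377"
my $hex_from_bin = sprintf("%x", oct("0b10110110"));    # "b6"
my $dec_from_bin = oct("0b10110110");                   # 182

print "$bin_from_hex $oct_from_hex $hex_from_bin $dec_from_bin\n";
```

This only works for values that fit in a native integer; for anything
larger, the Bit::Vector approaches above are the safer choice.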

=head2 Why doesn't & work the way I want it to?

The behavior of binary arithmetic operators depends on whether they're
used on numbers or strings. The operators treat a string as a series
of bits and work with that (the string C<"3"> is the bit pattern
C<00110011>). The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:

    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # ...
    }
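
The difference between the numeric and the string form is easy to
check for yourself. In this short sketch (variable names are
illustrative only), adding 0 is used as one way of forcing values that
arrived as strings to be treated as numbers:

```perl
use strict;
use warnings;

my $numeric = 11 & 3;        # 3: 1011 & 0011 on numbers
my $string  = "11" & "3";    # "1": bytewise AND of "1" (00110001) and "3" (00110011)

# Values read from input are strings; adding 0 forces numeric context.
my ($x, $y) = ("11", "3");
my $forced = ($x + 0) & ($y + 0);    # 3

print "$numeric $string $forced\n";
```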

=head2 How do I multiply matrices?

Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
or the PDL extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates an array of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of 500,000 integers.

=head2 How can I output Roman numerals?

Get the http://www.cpan.org/modules/by-module/Roman module.

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.
5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once--you make your numbers less random, rather
than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz , courtesy of
Tom Phoenix, talks more about this. John von Neumann said, ``Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''

If you want numbers that are more random than C<rand> with C<srand>
provides, you should also check out the Math::TrulyRandom module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
``Numerical Recipes in C'' at http://www.nr.com/ .

=head2 How do I get a random number between X and Y?

Use the following simple function. It selects a random integer between
(and possibly including!) the two given integers, e.g.,
C<random_int_in(50,120)>

    sub random_int_in ($$) {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min) if $min > $max;
        return $min + int rand(1 + $max - $min);
    }

=head1 Data: Dates

=head2 How do I find the week-of-the-year/day-of-the-year?

The day of the year is in the array returned by localtime() (see
L<perlfunc/"localtime">):

    $day_of_year = (localtime(time()))[7];
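
The answer above covers only the day of the year, and the value
localtime() returns is zero-based. One way to get the week as well,
sketched here on the assumption that your C library's strftime()
supports the standard C<%j> (day of year) and C<%U> (week of year,
weeks starting on Sunday) conversions, is the POSIX module. The
example uses gmtime(0) so that the result does not depend on your time
zone; pass C<localtime(time)> instead for the current date.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my @then = gmtime(0);    # 1970-01-01 00:00:00 UTC, a Thursday

my $yday         = $then[7];                  # 0: zero-based day of the year
my $day_of_year  = strftime("%j", @then);     # "001": one-based, zero-padded
my $week_of_year = strftime("%U", @then);     # "00": days before the first Sunday are week 0

print "$yday $day_of_year $week_of_year\n";
```

Week numbering conventions differ (ISO 8601 weeks, for instance, are
not what C<%U> gives you), so check which one you actually need.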

=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }
    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, you'll find that the POSIX module's strftime() function
has been extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such systems,
this is only the first two digits of the four-digit year, and thus cannot
be used to reliably determine the current century or millennium.

=head2 How can I compare two dates and find the difference?

If you're storing your dates as epoch seconds then simply subtract one
from the other. If you've got a structured date (distinct year, day,
month, hour, minute, seconds values), then for reasons of accessibility,
simplicity, and efficiency, merely use either timelocal or timegm (from
the Time::Local module in the standard distribution) to reduce structured
dates to epoch seconds. However, if you don't know the precise format of
your dates, then you should probably use either of the Date::Manip and
Date::Calc modules from CPAN before you go hacking up your own parsing
routine to handle arbitrary date formats.

=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
Time::Local module. Otherwise, you should look into the Date::Calc
and Date::Manip modules from CPAN.
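
As a sketch of the "regular enough string" case, suppose (this format
and the helper name C<epoch_from_string> are assumptions made for the
example) that your dates always look like "YYYY-MM-DD hh:mm:ss" and
are in UTC. The example uses C<timegm> so the result does not depend
on the local time zone; use C<timelocal> with the same arguments for
local times.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper for the fixed format "YYYY-MM-DD hh:mm:ss", taken as UTC.
sub epoch_from_string {
    my ($string) = @_;
    my ($year, $mon, $mday, $hour, $min, $sec) =
        $string =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or die "unrecognized date: $string";
    # timegm wants the month as 0..11
    return timegm($sec, $min, $hour, $mday, $mon - 1, $year);
}

my $epoch = epoch_from_string("2001-11-13 01:00:00");    # 1005613200

# Once both dates are epoch seconds, the difference is a subtraction.
my $diff = epoch_from_string("2001-01-01 00:00:00")
         - epoch_from_string("2000-01-01 00:00:00");     # 366 days, in seconds

print "$epoch $diff\n";
```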

=head2 How can I find the Julian Day?

Use the Time::JulianDay module (part of the Time-modules bundle
available from CPAN).

Before you immerse yourself too deeply in this, be sure to verify that
it is the I<Julian> Day you really want. Are you interested in a way
of getting serial days so that you can just tell how many days apart
they are, or so that you can also do other date arithmetic? If you
are interested in performing date arithmetic, this can be done using
the modules Date::Manip or Date::Calc.

There are too many details and too much confusion on this issue to cover
in this FAQ, but the term is applied (correctly) to a calendar now
supplanted by the Gregorian Calendar, with the Julian Calendar failing
to adjust properly for leap years on centennial years (among other
annoyances). The term is also used (incorrectly) to mean: [1] days in
the Gregorian Calendar; and [2] days since a particular starting time
or `epoch', usually 1970 in the Unix world and 1980 in the
MS-DOS/Windows world. If you find that it is not the first meaning
that you really want, then check out the Date::Manip and Date::Calc
modules. (Thanks to David Cassell for most of this text.)

=head2 How do I find yesterday's date?

The C<time()> function returns the current time in seconds since the
epoch. Take twenty-four hours off that:

    $yesterday = time() - ( 24 * 60 * 60 );

Then you can pass this to C<localtime()> and get the individual year,
month, day, hour, minute, seconds values.

Note very carefully that the code above assumes that your days are
twenty-four hours each. For most people, there are two days a year
when they aren't: the switch to and from summer time throws this off.
A solution to this issue is offered by Russ Allbery.

    sub yesterday {
        my $now  = defined $_[0] ? $_[0] : time;
        my $then = $now - 60 * 60 * 24;
        my $ndst = (localtime $now)[8] > 0;
        my $tdst = (localtime $then)[8] > 0;
        $then - ($tdst - $ndst) * 60 * 60;
    }
    # Should give you "this time yesterday" in seconds since epoch relative to
    # the first argument or the current time if no argument is given and
    # suitable for passing to localtime or whatever else you need to do with
    # it. $ndst is whether we're currently in daylight savings time; $tdst is
    # whether the point 24 hours ago was in daylight savings time. If $tdst
    # and $ndst are the same, a boundary wasn't crossed, and the correction
    # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
    # from yesterday's time since we gained an extra hour while going off
    # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
    # negative hour (add an hour) to yesterday's time since we lost an hour.
    #
    # All of this is because during those days when one switches off or onto
    # DST, a "day" isn't 24 hours long; it's either 23 or 25.
    #
    # The explicit settings of $ndst and $tdst are necessary because localtime
    # only says it returns the system tm struct, and the system tm struct at
    # least on Solaris doesn't guarantee any particular positive value (like,
    # say, 1) for isdst, just a positive value. And that value can
    # potentially be negative, if DST information isn't available (this sub
    # just treats those cases like no DST).
    #
    # Note that between 2am and 3am on the day after the time zone switches
    # off daylight savings time, the exact hour of "yesterday" corresponding
    # to the current hour is not clearly defined. Note also that if used
    # between 2am and 3am the day after the change to daylight savings time,
    # the result will be between 3am and 4am of the previous day; it's
    # arguable whether this is correct.
    #
    # This sub does not attempt to deal with leap seconds (most things don't).
    #
    # Copyright relinquished 1999 by Russ Allbery <rra@stanford.edu>
    # This code is in the public domain

=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
Y2K compliant (whatever that means). The programmers you've hired to
use it, however, probably are not.

Long answer: The question betrays a lack of understanding of the issue.
Perl is just as Y2K compliant as your pencil--no more, and no less.
Can you use your pencil to write a non-Y2K-compliant memo? Of course
you can. Is that the pencil's fault? Of course it isn't.

The date and time functions supplied with Perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines). The year returned
by these functions when used in a list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number. To avoid the year 2000 problem simply do not treat the year as
a 2-digit number. It isn't.

When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year. For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001". There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K compliant
programs. It can. But so can your pencil. It's the fault of the user,
not the language. At the risk of inflaming the NRA: ``Perl doesn't
break Y2K, people do.'' See http://language.perl.com/news/y2k.html for
a longer exposition.
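
To make the "year minus 1900" point concrete, here is a brief sketch
(variable names are illustrative) using the same timestamp as above.
It shows the correct arithmetic next to the string concatenation that
causes the classic bug:

```perl
use strict;
use warnings;

my @parts = gmtime(1005613200);    # Tue Nov 13 01:00:00 2001 UTC

my $year  = $parts[5] + 1900;      # 2001: add 1900 to the returned value
my $wrong = "19" . $parts[5];      # "19101": treating the year as two digits

# The month is zero-based as well, so add 1 before printing it.
my $stamp = sprintf("%04d-%02d-%02d", $year, $parts[4] + 1, $parts[3]);    # "2001-11-13"

print "$year $wrong $stamp\n";
```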

=head1 Data: Strings

=head2 How do I validate input?

The answer to this question is usually a regular expression, perhaps
with auxiliary logic. See the more specific questions (numbers, mail
addresses, etc.) for details.

=head2 How do I unescape a string?

It depends just what you mean by ``escape''. URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

=head2 How do I remove consecutive pairs of characters?

To turn C<"abbcccd"> into C<"abccd">:

    s/(.)\1/$1/g;       # add /s to include newlines

Here's a solution that turns "abbcccd" to "abcd":

    y///cs;     # y == tr, but shorter :-)

=head2 How do I expand function calls in a string?

This is documented in L<perlref>. In general, this is fraught with
quoting and readability problems, but it is possible. To interpolate
a subroutine call (in list context) into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for
arbitrary expressions:

    print "That yields ${\($n + 5)} widgets\n";

Version 5.004 of Perl had a bug that gave list context to the
expression in C<${...}>, but this is fixed in version 5.005.

See also ``How can I expand variables in text strings?'' in this
section of the FAQ.

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple ones, then something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns, nor can they. For that you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
and the byacc program. Starting with perl 5.8, Text::Balanced
is part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );
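
If the nesting markers are ordinary brackets, Text::Balanced (mentioned
above) may be all you need. This is only a small sketch of one of its
functions, C<extract_bracketed>, called in list context on a made-up
string; see the module's documentation for the full interface.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(a (b) c) rest";

# In list context extract_bracketed returns the balanced substring,
# the remainder of the text, and any skipped prefix.
my ($extracted, $remainder) = extract_bracketed($text, "()");

print "extracted: $extracted\n";    # extracted: (a (b) c)
print "remainder:$remainder\n";     # remainder: rest
```

The text has to start with the opening bracket (optionally preceded by
whitespace) for the extraction to succeed.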

=head2 How do I reverse a string?

Use reverse() in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use Text::Wrap (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The paragraphs you give to Text::Wrap should not contain embedded
newlines. Text::Wrap doesn't justify the lines (flush-right).

Or use the CPAN module Text::Autoformat. Formatting files can be easily
done by making a shell alias, like so:

    alias fmt="perl -i -MText::Autoformat -n0777 \
        -e 'print autoformat $_, {all=>1}' $*"

See the documentation for Text::Autoformat to appreciate its many
capabilities.

=head2 How can I access/change the first N letters of a string?

There are many ways. If you just want to grab a copy, use
substr():

    $first_byte = substr($a, 0, 1);

If you want to modify part of a string, the simplest way is often to
use substr() as an lvalue:

    substr($a, 0, 3) = "Tom";

Although those with a pattern matching kind of thought process will
likely prefer

    $a =~ s/^.../Tom/;

=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself. For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively. These
all assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5           # is it the 5th?
            ? "${2}soever"      # yes, swap
            : $1                # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one."> You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency. If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work. What you can do is wrap a while()
loop around a global pattern match. For example, let's count negative
integers:

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;    # start from zero, not from an earlier count
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";

Another version uses a global match in list context, then assigns the
result to a scalar, producing a count of the number of matches.

    $count = () = $string =~ /-\d+/g;

=head2 How do I capitalize all the words on one line?

To make the first letter of each word upper case:

    $line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning "C<don't do it>" into "C<Don'T
Do It>". Sometimes you might want this. Other times you might need a
more thorough solution (Suggested by brian d foy):

    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;

To make the whole line upper case:

    $line = uc($line);

To force each word to be lower case, with the first letter upper case:

    $line =~ s/(\w+)/\u\L$1/g;

You can (and probably should) enable locale awareness of those
characters by placing a C<use locale> pragma in your program.
See L<perllocale> for endless details on locales.

This is sometimes referred to as putting something into "title
case", but that's not quite accurate. Consider the proper
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.

=head2 How can I split a [character] delimited string except when inside
[character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated
into its different fields. (We'll pretend you said comma-separated, not
comma-delimited, which is different and almost never what you mean.) You
can't use C<split(/,/)> because you shouldn't split if the comma is inside
quotes. For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem. Thankfully, we have Jeffrey Friedl, author of a highly
recommended book on regular expressions, to handle these for us. He
suggests (assuming your string is contained in $text):

    @new = ();
    push(@new, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
      | ([^,]+),?
      | ,
    }gx;
    push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">). Unescaping them is a task addressed earlier in
this section.

Alternatively, the Text::ParseWords module (part of the standard Perl
distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);

There's also a Text::CSV (Comma-Separated Values) module on CPAN.
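
As a quick check of the Text::ParseWords approach, here is a sketch run
on a shortened version of the data line above (the shortened line is
made up for the example). The second argument of 0 tells quotewords()
to drop the quotes from the returned fields.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"Cimetrix, Inc","Bob Smith",N,8';

# The comma inside "Cimetrix, Inc" is kept; the surrounding quotes are not.
my @fields = quotewords(",", 0, $text);

print scalar(@fields), " fields\n";    # 5 fields
print "$fields[1]\n";                  # Cimetrix, Inc
```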
65acb1b1 788
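As a rough sketch of the Text::ParseWords route, here is the sample record from above run through quotewords(). The variable names are ours, and the record is assumed to contain no apostrophes.

```perl
use strict;
use warnings;
use Text::ParseWords;   # part of the standard Perl distribution

# The sample record from above; single quotes keep the embedded
# double quotes literal.
my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A second argument of 0 asks quotewords() to strip the quotes
# from the fields it returns.
my @fields = quotewords(",", 0, $text);

printf "%2d: [%s]\n", $_, $fields[$_] for 0 .. $#fields;
```

Be aware that quotewords() treats single quotes as quoting characters too, so data full of apostrophes is probably better served by a dedicated CSV module.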
68dc0745 789=head2 How do I strip blank space from the beginning/end of a string?
790
a6dd486b 791Although the simplest approach would seem to be
68dc0745 792
793 $string =~ s/^\s*(.*?)\s*$/$1/;
794
a6dd486b 795not only is this unnecessarily slow and destructive, it also fails with
d92eb7b0 796embedded newlines. It is much faster to do this operation in two steps:
68dc0745 797
798 $string =~ s/^\s+//;
799 $string =~ s/\s+$//;
800
801Or more nicely written as:
802
803 for ($string) {
804 s/^\s+//;
805 s/\s+$//;
806 }
807
5e3006a4 808This idiom takes advantage of the C<foreach> loop's aliasing
5a964f20 809behavior to factor out common code. You can do this
810on several strings at once, or arrays, or even the
d92eb7b0 811values of a hash if you use a slice:
5a964f20 812
813 # trim whitespace in the scalar, the array,
814 # and all the values in the hash
815 foreach ($scalar, @array, @hash{keys %hash}) {
816 s/^\s+//;
817 s/\s+$//;
818 }
819
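If you would rather not modify the originals, the same two substitutions can be wrapped in a small function. This is only a sketch: C<trim> is a hypothetical helper name, not a Perl built-in.

```perl
use strict;
use warnings;

# trim() is a hypothetical helper, not a built-in. It works on a copy
# of its arguments, so the caller's strings are left untouched.
sub trim {
    my @copy = @_;
    for (@copy) {        # $_ aliases each element of @copy in turn
        s/^\s+//;        # strip leading whitespace
        s/\s+$//;        # strip trailing whitespace
    }
    return wantarray ? @copy : $copy[0];
}

my $clean        = trim("  \t hello world \n");
my @clean_fields = trim(" a ", "b  ", "  c");
```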
65acb1b1 820=head2 How do I pad a string with blanks or pad a number with zeroes?
821
d92eb7b0 822(This answer contributed by Uri Guttman, with kibitzing from
823Bart Lateur.)
65acb1b1 824
825In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0 826to pad the string, C<$text> or C<$num> contains the string to be padded,
827and C<$pad_char> contains the padding character. You can use a single
828character string constant instead of the C<$pad_char> variable if you
829know what it is in advance. And in the same way you can use an integer in
830place of C<$pad_len> if you know the pad length in advance.
65acb1b1 831
d92eb7b0 832The simplest method uses the C<sprintf> function. It can pad on the left
833or right with blanks and on the left with zeroes and it will not
834truncate the result. The C<pack> function can only pad strings on the
835right with blanks and it will truncate the result to a maximum length of
836C<$pad_len>.
65acb1b1 837
d92eb7b0 838 # Left padding a string with blanks (no truncation):
839 $padded = sprintf("%${pad_len}s", $text);
65acb1b1 840
d92eb7b0 841 # Right padding a string with blanks (no truncation):
842 $padded = sprintf("%-${pad_len}s", $text);
65acb1b1 843
d92eb7b0 844 # Left padding a number with 0 (no truncation):
845 $padded = sprintf("%0${pad_len}d", $num);
65acb1b1 846
d92eb7b0 847 # Right padding a string with blanks using pack (will truncate):
848 $padded = pack("A$pad_len",$text);
65acb1b1 849
d92eb7b0 850If you need to pad with a character other than blank or zero you can use
851one of the following methods. They all generate a pad string with the
852C<x> operator and combine that with C<$text>. These methods do
853not truncate C<$text>.
65acb1b1 854
d92eb7b0 855Left and right padding with any character, creating a new string:
65acb1b1 856
d92eb7b0 857 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
858 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 859
d92eb7b0 860Left and right padding with any character, modifying C<$text> directly:
65acb1b1 861
d92eb7b0 862 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
863 $text .= $pad_char x ( $pad_len - length( $text ) );
65acb1b1 864
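To make the truncation difference concrete, here is a small sketch with made-up values, using a pad length of 5 and one string that is deliberately too long:

```perl
use strict;
use warnings;

my $pad_len = 5;
my $long    = "abcdefgh";     # longer than $pad_len

my $left   = sprintf("%${pad_len}s",  "ab");   # "   ab"
my $right  = sprintf("%-${pad_len}s", "ab");   # "ab   "
my $zeroes = sprintf("%0${pad_len}d", 42);     # "00042"

my $kept   = sprintf("%${pad_len}s", $long);   # sprintf leaves it whole
my $cut    = pack("A$pad_len", $long);         # pack truncates: "abcde"

# Padding with an arbitrary character using the x operator.
my $dots   = "." x ($pad_len - length("ab")) . "ab";   # "...ab"
```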
68dc0745 865=head2 How do I extract selected columns from a string?
866
867Use substr() or unpack(), both documented in L<perlfunc>.
5a964f20 868If you prefer thinking in terms of columns instead of widths,
869you can use this kind of thing:
870
871 # determine the unpack format needed to split Linux ps output
872 # arguments are cut columns
873 my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
874
875 sub cut2fmt {
876 my(@positions) = @_;
877 my $template = '';
878 my $lastpos = 1;
879 for my $place (@positions) {
880 $template .= "A" . ($place - $lastpos) . " ";
881 $lastpos = $place;
882 }
883 $template .= "A*";
884 return $template;
885 }
68dc0745 886
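Here is one way the resulting template might be fed to unpack(). The record layout (user in columns 1-5, pid in 6-10, command from column 11) is invented for the illustration, and cut2fmt() is repeated only so the sketch runs on its own.

```perl
use strict;
use warnings;

# Same cut2fmt() as above, repeated so this sketch is self-contained.
sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    $template .= "A*";
    return $template;
}

# Hypothetical record: columns 1-5 user, 6-10 pid, 11 onward command.
my $line = "alice  42 vim notes.txt";
my $fmt  = cut2fmt(6, 11);                       # "A5 A5 A*"
my ($user, $pid, $command) = unpack($fmt, $line);

# The "A" format strips trailing blanks but keeps leading ones,
# so $pid comes back as "  42".
print "$user runs '$command' as process ", $pid + 0, "\n";
```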
887=head2 How do I find the soundex value of a string?
888
87275199 889Use the standard Text::Soundex module distributed with Perl.
a6dd486b 890Before you do so, you may want to determine whether `soundex' is in
d92eb7b0 891fact what you think it is. Knuth's soundex algorithm compresses words
892into a small space, and so it does not necessarily distinguish between
893two words which you might want to appear separately. For example, the
894last names `Knuth' and `Kant' are both mapped to the soundex code K530.
895If Text::Soundex does not do what you are looking for, you might want
896to consider the String::Approx module available at CPAN.
68dc0745 897
898=head2 How can I expand variables in text strings?
899
900Let's assume that you have a string like:
901
902 $text = 'this has a $foo in it and a $bar';
5a964f20 903
904If those were both global variables, then this would
905suffice:
906
65acb1b1 907 $text =~ s/\$(\w+)/${$1}/g; # no /e needed
68dc0745 908
5a964f20 909But since they are probably lexicals, or at least, they could
910be, you'd have to do this:
68dc0745 911
912 $text =~ s/(\$\w+)/$1/eeg;
65acb1b1 913 die if $@; # needed /ee, not /e
68dc0745 914
5a964f20 915It's probably better in the general case to treat those
916variables as entries in some special hash. For example:
917
918 %user_defs = (
919 foo => 23,
920 bar => 19,
921 );
922 $text =~ s/\$(\w+)/$user_defs{$1}/g;
68dc0745 923
92c2ed05 924See also ``How do I expand function calls in a string?'' in this section
46fc3d4c 925of the FAQ.
926
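A variation on the hash approach, offered as a sketch rather than the one true way: with C</e> you can leave unknown names in place instead of turning them into empty strings. The C<$baz> placeholder is made up for the demonstration.

```perl
use strict;
use warnings;

my %user_defs = ( foo => 23, bar => 19 );
my $text = 'this has a $foo in it, a $bar, and an unknown $baz';

# /e evaluates the replacement as code. Names missing from the hash
# are put back unchanged rather than silently becoming empty strings.
(my $expanded = $text) =~
    s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$expanded\n";
```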
68dc0745 927=head2 What's wrong with always quoting "$vars"?
928
a6dd486b 929The problem is that those double-quotes force stringification--
930coercing numbers and references into strings--even when you
931don't want them to be strings. Think of it this way: double-quote
65acb1b1 932expansion is used to produce new strings. If you already
933have a string, why do you need more?
68dc0745 934
935If you get used to writing odd things like these:
936
937 print "$var"; # BAD
938 $new = "$old"; # BAD
939 somefunc("$var"); # BAD
940
941You'll be in trouble. Those should (in 99.8% of the cases) be
942the simpler and more direct:
943
944 print $var;
945 $new = $old;
946 somefunc($var);
947
948Otherwise, besides slowing you down, you're going to break code when
949the thing in the scalar is actually neither a string nor a number, but
950a reference:
951
952 func(\@array);
953 sub func {
954 my $aref = shift;
955 my $oref = "$aref"; # WRONG
956 }
957
958You can also get into subtle problems on those few operations in Perl
959that actually do care about the difference between a string and a
960number, such as the magical C<++> autoincrement operator or the
961syscall() function.
962
5a964f20 963Stringification also destroys arrays.
964
965 @lines = `command`;
966 print "@lines"; # WRONG - extra blanks
967 print @lines; # right
968
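A short sketch of what the quotes actually do to a reference and to an array; the variable names are ours:

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $copy  = "$aref";         # now just text that looks like ARRAY(0x...)

my $still_ref = ref $aref;   # "ARRAY"
my $was_ref   = ref $copy;   # ""  (a plain string, not a reference)

# Interpolating an array joins its elements with $" (a space by default).
my @lines  = ("one\n", "two\n");
my $joined = "@lines";       # "one\n two\n", note the extra blank
```

Once the reference has been flattened into C<$copy>, there is no way to get back to the array through it.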
c47ff5f1 969=head2 Why don't my <<HERE documents work?
68dc0745 970
971Check for these three things:
972
973=over 4
974
975=item 1. There must be no space after the << part.
976
977=item 2. There (probably) should be a semicolon at the end.
978
979=item 3. You can't (easily) have any space in front of the tag.
980
981=back
982
5a964f20 983If you want to indent the text in the here document, you
984can do this:
985
986 # all in one
987 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
988 your text
989 goes here
990 HERE_TARGET
991
992But the HERE_TARGET must still be flush against the margin.
993If you want that indented also, you'll have to quote
994in the indentation.
995
996 ($quote = <<' FINIS') =~ s/^\s+//gm;
997 ...we will have peace, when you and all your works have
998 perished--and the works of your dark master to whom you
999 would deliver us. You are a liar, Saruman, and a corrupter
1000 of men's hearts. --Theoden in /usr/src/perl/taint.c
1001 FINIS
83ded9ee 1002 $quote =~ s/\s+--/\n--/;
5a964f20 1003
1004A nice general-purpose fixer-upper function for indented here documents
1005follows. It expects to be called with a here document as its argument.
1006It looks to see whether each line begins with a common substring, and
a6dd486b 1007if so, strips that substring off. Otherwise, it takes the amount of leading
1008whitespace found on the first line and removes that much off each
5a964f20 1009subsequent line.
1010
1011 sub fix {
1012 local $_ = shift;
a6dd486b 1013 my ($white, $leader); # common whitespace and common leading string
5a964f20 1014 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1015 ($white, $leader) = ($2, quotemeta($1));
1016 } else {
1017 ($white, $leader) = (/^(\s+)/, '');
1018 }
1019 s/^\s*?$leader(?:$white)?//gm;
1020 return $_;
1021 }
1022
c8db1d39 1023This works with leading special strings, dynamically determined:
5a964f20 1024
1025 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
1026 @@@ int
1027 @@@ runops() {
1028 @@@ SAVEI32(runlevel);
1029 @@@ runlevel++;
d92eb7b0 1030 @@@ while ( op = (*op->op_ppaddr)() );
5a964f20 1031 @@@ TAINT_NOT;
1032 @@@ return 0;
1033 @@@ }
1034 MAIN_INTERPRETER_LOOP
1035
a6dd486b 1036Or with a fixed amount of leading whitespace, with remaining
5a964f20 1037indentation correctly preserved:
1038
1039 $poem = fix<<EVER_ON_AND_ON;
1040 Now far ahead the Road has gone,
1041 And I must follow, if I can,
1042 Pursuing it with eager feet,
1043 Until it joins some larger way
1044 Where many paths and errands meet.
1045 And whither then? I cannot say.
1046 --Bilbo in /usr/src/perl/pp_ctl.c
1047 EVER_ON_AND_ON
1048
68dc0745 1049=head1 Data: Arrays
1050
65acb1b1 1051=head2 What is the difference between a list and an array?
1052
1053An array has a changeable length. A list does not. An array is something
1054you can push or pop, while a list is a set of values. Some people make
1055the distinction that a list is a value while an array is a variable.
1056Subroutines are passed and return lists, you put things into list
1057context, you initialize arrays with lists, and you foreach() across
1058a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
1059in scalar context behave like the number of elements in them, subroutines
a6dd486b 1060access their arguments through the array C<@_>, and push/pop/shift only work
65acb1b1 1061on arrays.
1062
1063As a side note, there's no such thing as a list in scalar context.
1064When you say
1065
1066 $scalar = (2, 5, 7, 9);
1067
d92eb7b0 1068you're using the comma operator in scalar context, so it uses the scalar
1069comma operator. There never was a list there at all! This causes the
1070last value to be returned: 9.
65acb1b1 1071
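The three cases side by side, as a small sketch. The C<no warnings 'void'> is there only because perl rightly complains about the discarded constants.

```perl
use strict;
use warnings;

my @array = (2, 5, 7, 9);
my $count = @array;            # array in scalar context: element count

my $last;
{
    no warnings 'void';        # perl warns about the discarded 2, 5, 7
    $last = (2, 5, 7, 9);      # comma operator: evaluates to 9
}

my ($first) = (2, 5, 7, 9);    # list assignment: takes the first element
```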
68dc0745 1072=head2 What is the difference between $array[1] and @array[1]?
1073
a6dd486b 1074The former is a scalar value; the latter an array slice, making
68dc0745 1075it a list with one (scalar) value. You should use $ when you want a
1076scalar value (most of the time) and @ when you want a list with one
1077scalar value in it (very, very rarely; nearly never, in fact).
1078
1079Sometimes it doesn't make a difference, but sometimes it does.
1080For example, compare:
1081
1082 $good[0] = `some program that outputs several lines`;
1083
1084with
1085
1086 @bad[0] = `same program that outputs several lines`;
1087
9f1b1f2d 1088The C<use warnings> pragma and the B<-w> flag will warn you about these
1089matters.
68dc0745 1090
d92eb7b0 1091=head2 How can I remove duplicate elements from a list or array?
68dc0745 1092
1093There are several possible ways, depending on whether the array is
1094ordered and whether you wish to preserve the ordering.
1095
1096=over 4
1097
551e1d92 1098=item a)
1099
1100If @in is sorted, and you want @out to be sorted:
68dc0745 1102
a4341a65 1103 $prev = "not equal to $in[0]";
3bc5ef3e 1104 @out = grep($_ ne $prev && ($prev = $_, 1), @in);
68dc0745 1105
c8db1d39 1106This is nice in that it doesn't use much extra memory, simulating
3bc5ef3e 1107uniq(1)'s behavior of removing only adjacent duplicates. The ", 1"
1108guarantees that the expression is true (so that grep picks it up)
1109even if the $_ is 0, "", or undef.
68dc0745 1110
551e1d92 1111=item b)
1112
1113If you don't know whether @in is sorted:
68dc0745 1114
1115 undef %saw;
1116 @out = grep(!$saw{$_}++, @in);
1117
551e1d92 1118=item c)
1119
1120Like (b), but @in contains only small integers:
68dc0745 1121
1122 @out = grep(!$saw[$_]++, @in);
1123
551e1d92 1124=item d)
1125
1126A way to do (b) without any loops or greps:
68dc0745 1127
1128 undef %saw;
1129 @saw{@in} = ();
1130 @out = sort keys %saw; # remove sort if undesired
1131
551e1d92 1132=item e)
1133
1134Like (d), but @in contains only small positive integers:
68dc0745 1135
1136 undef @ary;
1137 @ary[@in] = @in;
87275199 1138 @out = grep {defined} @ary;
68dc0745 1139
1140=back
1141
65acb1b1 1142But perhaps you should have been using a hash all along, eh?
1143
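For comparison, here are approaches (b) and (d) run on the same made-up input. Note that (b) keeps the order of first appearance, while (d) only gives you whatever order you impose afterwards.

```perl
use strict;
use warnings;

my @in = qw(pear apple pear fig apple kiwi);

# Approach (b): keep an element only the first time it is seen.
my %saw;
my @in_order = grep { !$saw{$_}++ } @in;

# Approach (d): a hash slice collects the distinct values as keys.
my %uniq;
@uniq{@in} = ();
my @sorted = sort keys %uniq;

print "@in_order\n@sorted\n";
```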
ddbc1f16 1144=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1145
1146Hearing the word "in" is an I<in>dication that you probably should have
1147used a hash, not a list or array, to store your data. Hashes are
1148designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1149
5a964f20 1150That being said, there are several ways to approach this. If you
1151are going to make this query many times over arbitrary string values,
881bdbd4 1152the fastest way is probably to invert the original array and maintain a
1153hash whose keys are the first array's values.
68dc0745 1154
1155 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
881bdbd4 1156 %is_blue = ();
68dc0745 1157 for (@blues) { $is_blue{$_} = 1 }
1158
1159Now you can check whether $is_blue{$some_color}. It might have been a
1160good idea to keep the blues all in a hash in the first place.
1161
1162If the values are all small integers, you could use a simple indexed
1163array. This kind of an array will take up less space:
1164
1165 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
881bdbd4 1166 @is_tiny_prime = ();
d92eb7b0 1167 for (@primes) { $is_tiny_prime[$_] = 1 }
1168 # or simply @is_tiny_prime[@primes] = (1) x @primes;
68dc0745 1169
1170Now you check whether $is_tiny_prime[$some_number].
1171
1172If the values in question are integers instead of strings, you can save
1173quite a lot of space by using bit strings instead:
1174
1175 @articles = ( 1..10, 150..2000, 2017 );
1176 undef $read;
7b8d334a 1177 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1178
1179Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1180
1181Please do not use
1182
a6dd486b 1183 ($is_there) = grep $_ eq $whatever, @array;
68dc0745 1184
1185or worse yet
1186
a6dd486b 1187 ($is_there) = grep /$whatever/, @array;
68dc0745 1188
1189These are slow (checks every element even if the first matches),
1190inefficient (same reason), and potentially buggy (what if there are
d92eb7b0 1191regex characters in $whatever?). If you're only testing once, then
65acb1b1 1192use:
1193
1194 $is_there = 0;
1195 foreach $elt (@array) {
1196 if ($elt eq $elt_to_find) {
1197 $is_there = 1;
1198 last;
1199 }
1200 }
1201 if ($is_there) { ... }
68dc0745 1202
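If you do expect repeated lookups, the lookup hash can be built in one step with a hash slice. A minimal sketch, with C<crimson> thrown in as a color that is assumed not to be blue:

```perl
use strict;
use warnings;

my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;

# Build the lookup hash in one step with a hash slice.
my %is_blue;
@is_blue{@blues} = (1) x @blues;

# Each query is now a single hash lookup, however long @blues is.
my @answers = map { $is_blue{$_} ? "yes" : "no" } qw(teal crimson);
print "@answers\n";
```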
1203=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1204
1205Use a hash. Here's code to do both and more. It assumes that
1206each element is unique in a given array:
1207
1208 @union = @intersection = @difference = ();
1209 %count = ();
1210 foreach $element (@array1, @array2) { $count{$element}++ }
1211 foreach $element (keys %count) {
1212 push @union, $element;
1213 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1214 }
1215
d92eb7b0 1216Note that this is the I<symmetric difference>, that is, all elements in
a6dd486b 1217either A or in B but not in both. Think of it as an xor operation.
d92eb7b0 1218
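If what you wanted was the one-sided difference (elements of one array that are missing from the other), something along these lines should do. The array contents are invented for the example.

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e);

# Elements of @array1 that do not appear in @array2.
my %in_second     = map { $_ => 1 } @array2;
my @only_in_first = grep { !$in_second{$_} } @array1;

# And the other direction.
my %in_first       = map { $_ => 1 } @array1;
my @only_in_second = grep { !$in_first{$_} } @array2;

print "only in first: @only_in_first\n";
print "only in second: @only_in_second\n";
```

Taken together, the two results make up the symmetric difference.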
65acb1b1 1219=head2 How do I test whether two arrays or hashes are equal?
1220
1221The following code works for single-level arrays. It uses a stringwise
1222comparison, and does not distinguish defined versus undefined empty
1223strings. Modify if you have other needs.
1224
1225 $are_equal = compare_arrays(\@frogs, \@toads);
1226
1227 sub compare_arrays {
1228 my ($first, $second) = @_;
9f1b1f2d 1229 no warnings; # silence spurious -w undef complaints
65acb1b1 1230 return 0 unless @$first == @$second;
1231 for (my $i = 0; $i < @$first; $i++) {
1232 return 0 if $first->[$i] ne $second->[$i];
1233 }
1234 return 1;
1235 }
1236
1237For multilevel structures, you may wish to use an approach more
1238like this one. It uses the CPAN module FreezeThaw:
1239
1240 use FreezeThaw qw(cmpStr);
1241 @a = @b = ( "this", "that", [ "more", "stuff" ] );
1242
1243 printf "a and b contain %s arrays\n",
1244 cmpStr(\@a, \@b) == 0
1245 ? "the same"
1246 : "different";
1247
1248This approach also works for comparing hashes. Here
1249we'll demonstrate two different answers:
1250
1251 use FreezeThaw qw(cmpStr cmpStrHard);
1252
1253 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1254 $a{EXTRA} = \%b;
1255 $b{EXTRA} = \%a;
1256
1257 printf "a and b contain %s hashes\n",
1258 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1259
1260 printf "a and b contain %s hashes\n",
1261 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1262
1263
1264The first reports that both hashes contain the same data,
1265while the second reports that they do not. Which you prefer is left as
1266an exercise to the reader.
1267
68dc0745 1268=head2 How do I find the first array element for which a condition is true?
1269
1270You can use this if you care about the index:
1271
65acb1b1 1272 for ($i= 0; $i < @array; $i++) {
68dc0745 1273 if ($array[$i] eq "Waldo") {
1274 $found_index = $i;
1275 last;
1276 }
1277 }
1278
1279Now C<$found_index> has what you want.
1280
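If List::Util is available (it comes with Perl 5.8.0 and later), its first() function stops at the first match for you. A sketch with made-up names:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = qw(Wenda Odlaw Waldo Wizard Waldo);

# Index of the first match: search the indices, test the elements.
my $found_index = first { $array[$_] eq "Waldo" } 0 .. $#array;

# The element itself, when the index does not matter.
my $w_name = first { /^Wi/ } @array;

# first() returns undef when nothing matches.
my $missing = first { $_ eq "Nobody" } @array;
```

Since index 0 is a perfectly good answer, test the result with defined() rather than for truth.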
1281=head2 How do I handle linked lists?
1282
1283In general, you usually don't need a linked list in Perl, since with
1284regular arrays, you can push and pop or shift and unshift at either end,
5a964f20 1285or you can use splice to add and/or remove an arbitrary number of elements at
87275199 1286arbitrary points. Both pop and shift are O(1) operations on Perl's
5a964f20 1287dynamic arrays. In the absence of shifts and pops, push in general
1288needs to reallocate on the order of every log(N) times, and unshift will
1289need to copy pointers each time.
68dc0745 1290
1291If you really, really wanted, you could use structures as described in
1292L<perldsc> or L<perltoot> and do just what the algorithm book tells you
65acb1b1 1293to do. For example, imagine a list node like this:
1294
1295 $node = {
1296 VALUE => 42,
1297 LINK => undef,
1298 };
1299
1300You could walk the list this way:
1301
1302 print "List: ";
1303 for ($node = $head; $node; $node = $node->{LINK}) {
1304 print $node->{VALUE}, " ";
1305 }
1306 print "\n";
1307
a6dd486b 1308You could add to the list this way:
65acb1b1 1309
1310 my ($head, $tail);
1311 $tail = append($head, 1); # grow a new head
1312 for $value ( 2 .. 10 ) {
1313 $tail = append($tail, $value);
1314 }
1315
1316 sub append {
1317 my($list, $value) = @_;
1318 my $node = { VALUE => $value };
1319 if ($list) {
1320 $node->{LINK} = $list->{LINK};
1321 $list->{LINK} = $node;
1322 } else {
1323 $_[0] = $node; # replace caller's version
1324 }
1325 return $node;
1326 }
1327
1328But again, Perl's built-in arrays are virtually always good enough.
68dc0745 1329
1330=head2 How do I handle circular lists?
1331
1332Circular lists could be handled in the traditional fashion with linked
1333lists, or you could just do something like this with an array:
1334
1335 unshift(@array, pop(@array)); # the last shall be first
1336 push(@array, shift(@array)); # and vice versa
1337
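Another option, sketched here with invented data, is to leave the array alone and wrap an index around with the modulus operator:

```perl
use strict;
use warnings;

my @array = qw(north east south west);

# Rotating the array itself: the first element moves to the end.
push(@array, shift(@array));          # now: east south west north

# Alternatively, leave the array alone and wrap an index around.
my @visited;
my $i = 0;
for (1 .. 6) {
    push @visited, $array[ $i++ % @array ];
}
print "@visited\n";
```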
1338=head2 How do I shuffle an array randomly?
1339
45bbf655 1340If you either have Perl 5.8.0 or later installed, or if you have
1341Scalar-List-Utils 1.03 or later installed, you can say:
1342
f05bbc40 1343 use List::Util 'shuffle';
45bbf655 1344
1345 @shuffled = shuffle(@list);
1346
f05bbc40 1347If not, you can use a Fisher-Yates shuffle.
5a964f20 1348
5a964f20 1349 sub fisher_yates_shuffle {
cc30d1a7 1350 my $deck = shift; # $deck is a reference to an array
1351 my $i = @$deck;
f05bbc40 1352 while ($i--) {
5a964f20 1353 my $j = int rand ($i+1);
cc30d1a7 1354 @$deck[$i,$j] = @$deck[$j,$i];
5a964f20 1355 }
1356 }
1357
cc30d1a7 1358 # shuffle my mpeg collection
1359 #
1360 my @mpeg = <audio/*/*.mp3>;
1361 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1362 print @mpeg;
5a964f20 1363
45bbf655 1364Note that the above implementation shuffles an array in place,
1365unlike List::Util::shuffle(), which takes a list and returns
1366a new shuffled list.
1367
d92eb7b0 1368You've probably seen shuffling algorithms that work using splice,
a6dd486b 1369randomly picking another element to swap the current element with:
68dc0745 1370
1371 srand;
1372 @new = ();
1373 @old = 1 .. 10; # just a demo
1374 while (@old) {
1375 push(@new, splice(@old, rand @old, 1));
1376 }
1377
5a964f20 1378This is bad because splice is already O(N), and since you do it N times,
1379you just invented a quadratic algorithm; that is, O(N**2). This does
1380not scale, although Perl is so efficient that you probably won't notice
1381this until you have rather largish arrays.
68dc0745 1382
1383=head2 How do I process/modify each element of an array?
1384
1385Use C<for>/C<foreach>:
1386
1387 for (@lines) {
5a964f20 1388 s/foo/bar/; # change that word
1389 y/XZ/ZX/; # swap those letters
68dc0745 1390 }
1391
1392Here's another; let's compute spherical volumes:
1393
5a964f20 1394 for (@volumes = @radii) { # @volumes has changed parts
68dc0745 1395 $_ **= 3;
1396 $_ *= (4/3) * 3.14159; # this will be constant folded
1397 }
1398
76817d6d 1399If you want to do the same thing to modify the values of the
1400hash, you can use the C<values> function. As of Perl 5.6
1401the values are not copied, so if you modify $orbit (in this
1402case), you modify the value.
5a964f20 1403
76817d6d 1404 for $orbit ( values %orbits ) {
5a964f20 1405 ($orbit **= 3) *= (4/3) * 3.14159;
1406 }
76817d6d 1407
1408Prior to perl 5.6 C<values> returned copies of the values,
1409so older perl code often contains constructions such as
1410C<@orbits{keys %orbits}> instead of C<values %orbits> where
1411the hash is to be modified.
1412
68dc0745 1413=head2 How do I select a random element from an array?
1414
1415Use the rand() function (see L<perlfunc/rand>):
1416
5a964f20 1417 # at the top of the program:
68dc0745 1418 srand; # not needed for 5.004 and later
5a964f20 1419
1420 # then later on
68dc0745 1421 $index = rand @array;
1422 $element = $array[$index];
1423
5a964f20 1424Make sure you I<only call srand once per program, if then>.
1425If you are calling it more than once (such as before each
1426call to rand), you're almost certainly doing something wrong.
1427
68dc0745 1428=head2 How do I permute N elements of a list?
1429
1430Here's a little program that generates all permutations
1431of all the words on each line of input. The algorithm embodied
5a964f20 1432in the permute() function should work on any list:
68dc0745 1433
1434 #!/usr/bin/perl -n
5a964f20 1435 # tsc-permute: permute each word of input
1436 permute([split], []);
1437 sub permute {
1438 my @items = @{ $_[0] };
1439 my @perms = @{ $_[1] };
1440 unless (@items) {
1441 print "@perms\n";
68dc0745 1442 } else {
5a964f20 1443 my(@newitems,@newperms,$i);
1444 foreach $i (0 .. $#items) {
1445 @newitems = @items;
1446 @newperms = @perms;
1447 unshift(@newperms, splice(@newitems, $i, 1));
1448 permute([@newitems], [@newperms]);
68dc0745 1449 }
1450 }
1451 }
1452
b8d2732a 1453Unfortunately, this algorithm is very inefficient. The Algorithm::Permute
1454module from CPAN runs at least an order of magnitude faster. If you don't
1455have a C compiler (or a binary distribution of Algorithm::Permute), then
1456you can use List::Permutor which is written in pure Perl, and is still
f8620f40 1457several times faster than the algorithm above.
b8d2732a 1458
68dc0745 1459=head2 How do I sort an array by (anything)?
1460
1461Supply a comparison function to sort() (described in L<perlfunc/sort>):
1462
1463 @list = sort { $a <=> $b } @list;
1464
1465The default sort function is cmp, string comparison, which would
c47ff5f1 1466sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
68dc0745 1467the numerical comparison operator.
1468
1469If you have a complicated function needed to pull out the part you
1470want to sort on, then don't do it inside the sort function. Pull it
1471out first, because the sort BLOCK can be called many times for the
1472same element. Here's an example of how to pull out the first word
1473after the first number on each item, and then sort those words
1474case-insensitively.
1475
1476 @idx = ();
1477 for (@data) {
1478 ($item) = /\d+\s*(\S+)/;
1479 push @idx, uc($item);
1480 }
1481 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1482
a6dd486b 1483which could also be written this way, using a trick
68dc0745 1484that's come to be known as the Schwartzian Transform:
1485
1486 @sorted = map { $_->[0] }
1487 sort { $a->[1] cmp $b->[1] }
d92eb7b0 1488 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 1489
1490If you need to sort on several fields, the following paradigm is useful.
1491
1492 @sorted = sort { field1($a) <=> field1($b) ||
1493 field2($a) cmp field2($b) ||
1494 field3($a) cmp field3($b)
1495 } @data;
1496
1497This can be conveniently combined with precalculation of keys as given
1498above.
1499
06a5f41f 1500See the F<sort> article in the "Far More Than You Ever Wanted
1501To Know" collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz for
1502more about this approach.
68dc0745 1503
1504See also the question below on sorting hashes.
1505
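Here is a sketch of the multi-field paradigm combined with precalculated keys. The colon-separated records and their field order are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical records of the form "name:department:age".
my @data = ("carol:ops:41", "alice:dev:35", "bob:dev:35", "dave:dev:29");

# Split each record once, then compare the fields in order:
# department (string), age (numeric, oldest first), name (string).
my @sorted =
    map  { $_->[0] }
    sort {
        $a->[2] cmp $b->[2]
            ||
        $b->[3] <=> $a->[3]
            ||
        $a->[1] cmp $b->[1]
    }
    map  { [ $_, split /:/ ] } @data;

print "$_\n" for @sorted;
```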
1506=head2 How do I manipulate arrays of bits?
1507
1508Use pack() and unpack(), or else vec() and the bitwise operations.
1509
1510For example, this sets bit N of $vec for every integer N in @ints:
1511
1512 $vec = '';
1513 foreach(@ints) { vec($vec,$_,1) = 1 }
1514
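To see what ends up in the vector, you can dump it with unpack(). A small sketch, assuming C<@ints> holds the bit positions 1, 3, and 10:

```perl
use strict;
use warnings;

my @ints = (1, 3, 10);

my $vec = '';
vec($vec, $_, 1) = 1 for @ints;     # the string grows as needed

# "b*" lists the bits of each byte in ascending order.
my $bits = unpack("b*", $vec);      # "0101000000100000"

# The simple (if slow) way back: test every bit position.
my @again = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;

print "$bits\n@again\n";
```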
cc30d1a7 1515Here's how, given a vector in $vec, you can
68dc0745 1516get those bits into your @ints array:
1517
1518 sub bitvec_to_list {
1519 my $vec = shift;
1520 my @ints;
1521 # Find null-byte density then select best algorithm
1522 if ($vec =~ tr/\0// / length $vec > 0.95) {
1523 use integer;
1524 my $i;
1525 # This method is faster with mostly null-bytes
1526 while($vec =~ /[^\0]/g ) {
1527 $i = -9 + 8 * pos $vec;
1528 push @ints, $i if vec($vec, ++$i, 1);
1529 push @ints, $i if vec($vec, ++$i, 1);
1530 push @ints, $i if vec($vec, ++$i, 1);
1531 push @ints, $i if vec($vec, ++$i, 1);
1532 push @ints, $i if vec($vec, ++$i, 1);
1533 push @ints, $i if vec($vec, ++$i, 1);
1534 push @ints, $i if vec($vec, ++$i, 1);
1535 push @ints, $i if vec($vec, ++$i, 1);
1536 }
1537 } else {
1538 # This method is a fast general algorithm
1539 use integer;
1540 my $bits = unpack "b*", $vec;
1541 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1542 push @ints, pos $bits while($bits =~ /1/g);
1543 }
1544 return \@ints;
1545 }
1546
1547This method gets faster the more sparse the bit vector is.
1548(Courtesy of Tim Bunce and Winfried Koenig.)
1549
76817d6d 1550You can make the while loop a lot shorter with this suggestion
1551from Benjamin Goldberg:
1552
1553 while($vec =~ /[^\0]+/g ) {
1554 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1555 }
1556
cc30d1a7 1557Or use the CPAN module Bit::Vector:
1558
1559 $vector = Bit::Vector->new($num_of_bits);
1560 $vector->Index_List_Store(@ints);
1561 @ints = $vector->Index_List_Read();
1562
1563Bit::Vector provides efficient methods for bit vectors, sets of small integers,
1564and "big int" math.
1565
1566Here's a more extensive illustration using vec():
65acb1b1 1567
1568 # vec demo
1569 $vector = "\xff\x0f\xef\xfe";
1570 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1571 unpack("N", $vector), "\n";
1572 $is_set = vec($vector, 23, 1);
1573 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1574 pvec($vector);
1575
1576 set_vec(1,1,1);
1577 set_vec(3,1,1);
1578 set_vec(23,1,1);
1579
1580 set_vec(3,1,3);
1581 set_vec(3,2,3);
1582 set_vec(3,4,3);
1583 set_vec(3,4,7);
1584 set_vec(3,8,3);
1585 set_vec(3,8,7);
1586
1587 set_vec(0,32,17);
1588 set_vec(1,32,17);
1589
1590 sub set_vec {
1591 my ($offset, $width, $value) = @_;
1592 my $vector = '';
1593 vec($vector, $offset, $width) = $value;
1594 print "offset=$offset width=$width value=$value\n";
1595 pvec($vector);
1596 }
1597
1598 sub pvec {
1599 my $vector = shift;
1600 my $bits = unpack("b*", $vector);
1601 my $i = 0;
1602 my $BASE = 8;
1603
1604 print "vector length in bytes: ", length($vector), "\n";
1605 @bytes = unpack("A8" x length($vector), $bits);
1606 print "bits are: @bytes\n\n";
1607 }
1608
68dc0745 1609=head2 Why does defined() return true on empty arrays and hashes?
1610
65acb1b1 1611The short story is that you should probably only use defined on scalars or
1612functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1613in the 5.004 release or later of Perl for more detail.
68dc0745 1614
1615=head1 Data: Hashes (Associative Arrays)
1616
1617=head2 How do I process an entire hash?
1618
1619Use the each() function (see L<perlfunc/each>) if you don't care
1620whether it's sorted:
1621
5a964f20 1622 while ( ($key, $value) = each %hash) {
68dc0745 1623 print "$key = $value\n";
1624 }
1625
1626If you want it sorted, you'll have to use foreach() on the result of
1627sorting the keys as shown in an earlier question.
1628
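The sorted version might look like this; the hash contents are made up:

```perl
use strict;
use warnings;

my %hash = (pear => 3, apple => 7, fig => 1);

my @report;
for my $key (sort keys %hash) {
    push @report, "$key = $hash{$key}";
}
print "$_\n" for @report;
```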
1629=head2 What happens if I add or remove keys from a hash while iterating over it?
1630
d92eb7b0 1631Don't do that. :-)
1632
1633[lwall] In Perl 4, you were not allowed to modify a hash at all while
87275199 1634iterating over it. In Perl 5 you can delete from it, but you still
d92eb7b0 1635can't add to it, because that might cause a doubling of the hash table,
1636in which half the entries get copied up to the new top half of the
87275199 1637table, at which point you've totally bamboozled the iterator code.
d92eb7b0 1638Even if the table doesn't double, there's no telling whether your new
1639entry will be inserted before or after the current iterator position.
1640
a6dd486b 1641Either treasure up your changes and make them after the iterator finishes
d92eb7b0 1642or use keys to fetch all the old keys at once, and iterate over the list
1643of keys.
68dc0745 1644
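A sketch of the second suggestion, with invented data: keys() hands back the complete list before the loop starts, so adding and deleting inside the loop cannot confuse any iterator.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);

# The loop walks a snapshot of the keys, so the hash itself
# can be changed freely along the way.
for my $key (keys %hash) {
    if ($hash{$key} % 2) {
        delete $hash{$key};               # drop entries with odd values
    } else {
        $hash{ uc $key } = $hash{$key};   # add an upper-case twin
    }
}

print join(",", sort keys %hash), "\n";
```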
1645=head2 How do I look up a hash element by value?
1646
1647Create a reverse hash:
1648
1649 %by_value = reverse %by_key;
1650 $key = $by_value{$value};
1651
1652That's not particularly efficient. It would be more space-efficient
1653to use:
1654
1655 while (($key, $value) = each %by_key) {
1656 $by_value{$value} = $key;
1657 }
1658
d92eb7b0 1659If your hash could have repeated values, the methods above will only find
1660one of the associated keys. This may or may not worry you. If it does
1661worry you, you can always reverse the hash into a hash of arrays instead:
1662
1663 while (($key, $value) = each %by_key) {
1664 push @{$key_list_by_value{$value}}, $key;
1665 }
68dc0745 1666
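For instance, with a made-up hash of fruit colors in which two keys share a value:

```perl
use strict;
use warnings;

my %by_key = (apple => "red", cherry => "red", banana => "yellow");

my %key_list_by_value;
while (my ($key, $value) = each %by_key) {
    push @{ $key_list_by_value{$value} }, $key;
}

# Every key that maps to "red"; sorted because each() order is arbitrary.
my @red = sort @{ $key_list_by_value{red} };
print "@red\n";
```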
1667=head2 How can I know how many entries are in a hash?
1668
1669If you mean how many keys, then all you have to do is
875e5c2f 1670use the keys() function in a scalar context:
68dc0745 1671
875e5c2f 1672 $num_keys = keys %hash;
68dc0745 1673
875e5c2f 1674The keys() function also resets the iterator, which means that you may
1675see strange results if you use this between uses of other hash operators
1676such as each().
68dc0745 1677
1678=head2 How do I sort a hash (optionally by value instead of key)?
1679
1680Internally, hashes are stored in a way that prevents you from imposing
1681an order on key-value pairs. Instead, you have to sort a list of the
1682keys or values:
1683
1684 @keys = sort keys %hash; # sorted by key
1685 @keys = sort {
1686 $hash{$a} cmp $hash{$b}
1687 } keys %hash; # and by value
1688
1689Here we'll do a reverse numeric sort by value, and if two keys are
a6dd486b 1690identical, sort by length of key, or if that fails, by straight ASCII
1691comparison of the keys (well, possibly modified by your locale--see
68dc0745 1692L<perllocale>).
1693
1694 @keys = sort {
1695 $hash{$b} <=> $hash{$a}
1696 ||
1697 length($b) <=> length($a)
1698 ||
1699 $a cmp $b
1700 } keys %hash;
1701
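To see the tie-breaking at work, here is the same comparison applied to a small hash. The data is invented for the example; C<pear> and C<fig> share a value, so the key-length test decides between them.

```perl
use strict;
use warnings;

# Hypothetical counts; "pear" and "fig" tie on value.
my %hash = (apple => 3, pear => 5, fig => 5, kiwi => 1);

my @keys = sort {
    $hash{$b} <=> $hash{$a}        # larger values first
        ||
    length($b) <=> length($a)      # on a tie, longer keys first
        ||
    $a cmp $b                      # finally, plain string order
} keys %hash;

print "$_ => $hash{$_}\n" for @keys;
```

With this input the order comes out as pear, fig, apple, kiwi.
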
1702=head2 How can I always keep my hash sorted?
1703
1704You can look into using the DB_File module and tie() using the
1705$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
5a964f20 1706The Tie::IxHash module from CPAN might also be instructive.
68dc0745 1707
1708=head2 What's the difference between "delete" and "undef" with hashes?
1709
1710Hashes are pairs of scalars: the first is the key, the second is the
1711value. The key will be coerced to a string, although the value can be
1712any kind of scalar: string, number, or reference. If a key C<$key> is
1713present in C<%hash>, C<exists($hash{$key})> will return true. The value for
1714a given key can be C<undef>, in which case C<$hash{$key}> will be
1715C<undef> while C<exists $hash{$key}> will return true. This corresponds to
1716(C<$key>, C<undef>) being in the hash.
1717
1718Pictures help... here's the C<%ary> table:
1719
1720 keys values
1721 +------+------+
1722 | a | 3 |
1723 | x | 7 |
1724 | d | 0 |
1725 | e | 2 |
1726 +------+------+
1727
1728And these conditions hold
1729
1730 $ary{'a'} is true
1731 $ary{'d'} is false
1732 defined $ary{'d'} is true
1733 defined $ary{'a'} is true
87275199 1734 exists $ary{'a'} is true (Perl5 only)
68dc0745 1735 grep ($_ eq 'a', keys %ary) is true
1736
1737If you now say
1738
1739 undef $ary{'a'}
1740
1741your table now reads:
1742
1743
1744 keys values
1745 +------+------+
1746 | a | undef|
1747 | x | 7 |
1748 | d | 0 |
1749 | e | 2 |
1750 +------+------+
1751
1752and these conditions now hold; changes in caps:
1753
1754 $ary{'a'} is FALSE
1755 $ary{'d'} is false
1756 defined $ary{'d'} is true
1757 defined $ary{'a'} is FALSE
87275199 1758 exists $ary{'a'} is true (Perl5 only)
68dc0745 1759 grep ($_ eq 'a', keys %ary) is true
1760
1761Notice the last two: you have an undef value, but a defined key!
1762
1763Now, consider this:
1764
1765 delete $ary{'a'}
1766
1767your table now reads:
1768
1769 keys values
1770 +------+------+
1771 | x | 7 |
1772 | d | 0 |
1773 | e | 2 |
1774 +------+------+
1775
1776and these conditions now hold; changes in caps:
1777
1778 $ary{'a'} is false
1779 $ary{'d'} is false
1780 defined $ary{'d'} is true
1781 defined $ary{'a'} is false
87275199 1782 exists $ary{'a'} is FALSE (Perl5 only)
68dc0745 1783 grep ($_ eq 'a', keys %ary) is FALSE
1784
1785See, the whole entry is gone!
1786
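If you'd rather watch it happen than trust the tables, a short script along these lines reproduces them. The C<status> helper is a hypothetical name used only for this sketch.

```perl
use strict;
use warnings;

my %ary = (a => 3, x => 7, d => 0, e => 2);

# Report truth, definedness, and existence for one key as 1 or 0.
sub status {
    my ($key) = @_;
    return sprintf "true=%d defined=%d exists=%d",
        ($ary{$key}         ? 1 : 0),
        (defined $ary{$key} ? 1 : 0),
        (exists $ary{$key}  ? 1 : 0);
}

print "initially:    ", status('a'), "\n";
undef $ary{'a'};                 # value goes away, key stays
print "after undef:  ", status('a'), "\n";
delete $ary{'a'};                # key and value both go away
print "after delete: ", status('a'), "\n";
```

The middle line is the interesting one: the value is neither true nor defined, yet the key still exists.
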
1787=head2 Why don't my tied hashes make the defined/exists distinction?
1788
1789They may or may not implement the EXISTS() and DEFINED() methods
1790differently. For example, there isn't the concept of undef with hashes
1791that are tied to DBM* files. This means the true/false tables above
1792will give different results when used on such a hash. It also means
1793that exists and defined do the same thing with a DBM* file, and what
1794they end up doing is not what they do with ordinary hashes.
1795
1796=head2 How do I reset an each() operation part-way through?
1797
5a964f20 1798Using C<keys %hash> in scalar context returns the number of keys in
68dc0745 1799the hash I<and> resets the iterator associated with the hash. You may
1800need to do this if you use C<last> to exit a loop early so that when you
46fc3d4c 1801re-enter it, the hash iterator has been reset.
68dc0745 1802
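A small sketch of that situation, using a made-up three-element hash: the first loop bails out early, C<keys> resets the iterator, and the second loop then sees every key.

```perl
use strict;
use warnings;

my %hash = (one => 1, two => 2, three => 3);

# Leave the loop early; the iterator stays parked part-way through.
while (my ($key, $value) = each %hash) {
    last;
}

# keys in scalar context returns the count and resets the iterator.
my $count = keys %hash;

# This pass starts from the beginning again.
my $seen = 0;
while (defined(my $key = each %hash)) {
    $seen++;
}
print "visited $seen of $count keys\n";
```

Without the call to C<keys>, the second loop would pick up where the first one stopped and miss an entry.
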
1803=head2 How can I get the unique keys from two hashes?
1804
d92eb7b0 1805First you extract the keys from the hashes into lists, then solve
1806the "removing duplicates" problem described above. For example:
68dc0745 1807
1808 %seen = ();
1809 for $element (keys(%foo), keys(%bar)) {
1810 $seen{$element}++;
1811 }
1812 @uniq = keys %seen;
1813
1814Or more succinctly:
1815
1816 @uniq = keys %{{%foo,%bar}};
1817
1818Or if you really want to save space:
1819
1820 %seen = ();
1821 while (defined ($key = each %foo)) {
1822 $seen{$key}++;
1823 }
1824 while (defined ($key = each %bar)) {
1825 $seen{$key}++;
1826 }
1827 @uniq = keys %seen;
1828
1829=head2 How can I store a multidimensional array in a DBM file?
1830
1831Either stringify the structure yourself (no fun), or else
1832get the MLDBM (which uses Data::Dumper) module from CPAN and layer
1833it on top of either DB_File or GDBM_File.
1834
1835=head2 How can I make my hash remember the order I put elements into it?
1836
1837Use the Tie::IxHash module from CPAN.
1838
46fc3d4c 1839 use Tie::IxHash;
1840 tie(%myhash, 'Tie::IxHash');
1841 for ($i=0; $i<20; $i++) {
1842 $myhash{$i} = 2*$i;
1843 }
1844 @keys = keys %myhash;
1845 # @keys = (0,1,2,3,...)
1846
68dc0745 1847=head2 Why does passing a subroutine an undefined element in a hash create it?
1848
1849If you say something like:
1850
1851 somefunc($hash{"nonesuch key here"});
1852
1853Then that element "autovivifies"; that is, it springs into existence
1854whether you store something there or not. That's because functions
1855get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
1856it has to be ready to write it back into the caller's version.
1857
87275199 1858This has been fixed as of Perl5.004.
68dc0745 1859
1860Normally, merely accessing a key's value for a nonexistent key does
1861I<not> cause that key to be forever there. This is different from
1862awk's behavior.
1863
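Here is a rough way to check the behavior on your own perl. The C<peek> and C<poke> subroutines are hypothetical stand-ins for somefunc(): one only reads its argument, the other assigns through C<$_[0]>.

```perl
use strict;
use warnings;

my %hash;

sub peek { my $copy = $_[0]; return }    # only reads its argument
sub poke { $_[0] = 'set';    return }    # assigns through the alias

peek($hash{"nonesuch"});
print "after peek: ",
    (exists $hash{"nonesuch"} ? "exists" : "absent"), "\n";

poke($hash{"nonesuch"});
print "after poke: ",
    (exists $hash{"nonesuch"} ? "exists" : "absent"), "\n";
```

On a perl with the fix, the key should be absent after peek() and present only once poke() has actually stored something.
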
fc36a67e 1864=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 1865
65acb1b1 1866Usually a hash ref, perhaps like this:
1867
1868 $record = {
1869 NAME => "Jason",
1870 EMPNO => 132,
1871 TITLE => "deputy peon",
1872 AGE => 23,
1873 SALARY => 37_000,
1874 PALS => [ "Norbert", "Rhys", "Phineas"],
1875 };
1876
1877References are documented in L<perlref> and the upcoming L<perlreftut>.
1878Examples of complex data structures are given in L<perldsc> and
1879L<perllol>. Examples of structures and object-oriented classes are
1880in L<perltoot>.
68dc0745 1881
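For illustration, here is one way to read and update such a record. The extra pal and the C<@staff> array are assumptions added for the example, not part of the record above.

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

# Fields are reached with the arrow operator.
print "$record->{NAME} is a $record->{TITLE}\n";
print "First pal: $record->{PALS}[0]\n";

# The nested array is an ordinary array once dereferenced.
push @{ $record->{PALS} }, "Ferb";          # hypothetical new pal
print "Pal count: ", scalar @{ $record->{PALS} }, "\n";

# An array of such records gives you an array of hashes.
my @staff = ($record);
print "Staff member 0: $staff[0]{NAME}\n";
```
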
1882=head2 How can I use a reference as a hash key?
1883
fe854a6f 1884You can't do this directly, but you could use the standard Tie::RefHash
87275199 1885module distributed with Perl.
68dc0745 1886
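A minimal sketch of Tie::RefHash in use; the variable names and data are made up. With an ordinary hash the key would be flattened to a string such as C<ARRAY(0x...)>, whereas here it comes back as a usable reference.

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %color_of, 'Tie::RefHash';

my $point = [ 1, 2 ];            # hypothetical data to use as a key
$color_of{$point} = 'red';

for my $key (keys %color_of) {
    # The key is still a real array reference, so it can be dereferenced.
    print ref($key), " (@$key) is $color_of{$key}\n";
}
```
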
1887=head1 Data: Misc
1888
1889=head2 How do I handle binary data correctly?
1890
1891Perl is binary clean, so this shouldn't be a problem. For example,
1892this works fine (assuming the files are found):
1893
1894 if (`cat /vmunix` =~ /gzip/) {
1895 print "Your kernel is GNU-zip enabled!\n";
1896 }
1897
d92eb7b0 1898On less elegant (read: Byzantine) systems, however, you have
1899to play tedious games with "text" versus "binary" files. See
1900L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking
1901systems are curses out of Microsoft, who seem to be committed to putting
1902the backward into backward compatibility.
68dc0745 1903
1904If you're concerned about 8-bit ASCII data, then see L<perllocale>.
1905
54310121 1906If you want to deal with multibyte characters, however, there are
68dc0745 1907some gotchas. See the section on Regular Expressions.
1908
1909=head2 How do I determine whether a scalar is a number/whole/integer/float?
1910
1911Assuming that you don't care about IEEE notations like "NaN" or
1912"Infinity", you probably just want to use a regular expression.
1913
65acb1b1 1914 if (/\D/) { print "has nondigits\n" }
1915 if (/^\d+$/) { print "is a whole number\n" }
1916 if (/^-?\d+$/) { print "is an integer\n" }
1917 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
1918 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
881bdbd4 1919 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
65acb1b1 1920 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
881bdbd4 1921 { print "a C float\n" }
68dc0745 1922
5a964f20 1923If you're on a POSIX system, Perl supports the C<POSIX::strtod>
1924function. Its semantics are somewhat cumbersome, so here's a C<getnum>
1925wrapper function for more convenient access. This function takes
1926a string and returns the number it found, or C<undef> for input that
1927isn't a C float. The C<is_numeric> function is a front end to C<getnum>
1928if you just want to say, ``Is this a float?''
1929
1930 sub getnum {
1931 use POSIX qw(strtod);
1932 my $str = shift;
1933 $str =~ s/^\s+//;
1934 $str =~ s/\s+$//;
1935 $! = 0;
1936 my($num, $unparsed) = strtod($str);
1937 if (($str eq '') || ($unparsed != 0) || $!) {
1938 return undef;
1939 } else {
1940 return $num;
1941 }
1942 }
1943
072dc14b 1944 sub is_numeric { defined getnum($_[0]) }
5a964f20 1945
6cecdcac 1946Or you could check out the String::Scanf module on CPAN instead. The
1947POSIX module (part of the standard Perl distribution) provides the
bf4acbe4 1948C<strtod> and C<strtol> functions for converting strings to doubles and longs,
6cecdcac 1949respectively.
68dc0745 1950
1951=head2 How do I keep persistent data across program calls?
1952
1953For some specific applications, you can use one of the DBM modules.
fe854a6f 1954See L<AnyDBM_File>. More generically, you should consult the FreezeThaw
1955or Storable modules from CPAN. Starting from Perl 5.8 Storable is part
1956of the standard distribution. Here's one example using Storable's C<store>
1957and C<retrieve> functions:
65acb1b1 1958
1959 use Storable;
1960 store(\%hash, "filename");
1961
1962 # later on...
1963 $href = retrieve("filename"); # by ref
1964 %hash = %{ retrieve("filename") }; # direct to hash
68dc0745 1965
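A self-contained round trip might look like the following sketch. It writes to a temporary file from File::Temp rather than a fixed filename; that choice, and the sample data, are assumptions made for the example.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Hypothetical data, including a nested array.
my %hash = (name => "Jason", pals => [ "Norbert", "Rhys" ]);

# Get a scratch filename that is cleaned up when the program exits.
my ($fh, $filename) = tempfile(UNLINK => 1);
close $fh;

store(\%hash, $filename);

# later on...
my $href = retrieve($filename);
print "$href->{name} has ", scalar @{ $href->{pals} }, " pals\n";
```

The nested array survives the trip, which is the part that flat DBM files can't give you on their own.
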
1966=head2 How do I print out or copy a recursive data structure?
1967
65acb1b1 1968The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
1969for printing out data structures. The Storable module, found on CPAN,
1970provides a function called C<dclone> that recursively copies its argument.
1971
1972 use Storable qw(dclone);
1973 $r2 = dclone($r1);
68dc0745 1974
65acb1b1 1975Where $r1 can be a reference to any kind of data structure you'd like.
1976It will be deeply copied. Because C<dclone> takes and returns references,
1977you'd have to add extra punctuation if you had a hash of arrays that
1978you wanted to copy.
68dc0745 1979
65acb1b1 1980 %newhash = %{ dclone(\%oldhash) };
68dc0745 1981
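To see why the deep copy matters, compare it with a plain assignment in this sketch. The hash contents are invented for the example.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = (fruits => [ 'apple' ], nuts => [ 'pecan' ]);

my %shallow = %oldhash;                   # copies only the top level
my %newhash = %{ dclone(\%oldhash) };     # copies the inner arrays too

# Change the original after both copies were taken.
push @{ $oldhash{fruits} }, 'banana';

print "shallow copy sees: @{ $shallow{fruits} }\n";
print "deep copy sees:    @{ $newhash{fruits} }\n";
```

The shallow copy shares the inner arrays with the original and so sees the new element; the C<dclone> copy does not.
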
1982=head2 How do I define methods for every class/object?
1983
1984Use the UNIVERSAL class (see L<UNIVERSAL>).
1985
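As a small sketch: a subroutine defined in the UNIVERSAL package becomes callable on any object or class name. The C<describe> method and the C<Counter> class are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical method, added to every class by way of UNIVERSAL.
sub UNIVERSAL::describe {
    my $self = shift;
    return "I am a " . (ref($self) || $self);
}

{
    package Counter;                      # hypothetical class
    sub new { return bless {}, shift }
}

print Counter->new->describe, "\n";       # called on an object
print Counter->describe, "\n";            # called on the class name
```

Bear in mind that such a method really does show up everywhere, so pick a name unlikely to collide with anyone else's.
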
1986=head2 How do I verify a credit card checksum?
1987
1988Get the Business::CreditCard module from CPAN.
1989
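If you only want to see how such a checksum works, card numbers commonly use the Luhn (mod 10) check, which can be sketched in a few lines. The C<luhn_ok> function below is a hypothetical illustration, not the module's interface, and it says nothing about whether a card is real or active.

```perl
use strict;
use warnings;

# Return 1 if the digits in $number pass the Luhn (mod 10) check.
sub luhn_ok {
    my ($number) = @_;
    (my $digits = $number) =~ s/\D//g;     # drop spaces and dashes
    return 0 unless length $digits;

    my ($sum, $double) = (0, 0);
    for my $d (reverse split //, $digits) {
        # Every second digit from the right is doubled;
        # a two-digit result has its digits added (same as minus 9).
        my $n = $double ? $d * 2 : $d;
        $n -= 9 if $n > 9;
        $sum += $n;
        $double = !$double;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

# A widely used test number, not a real account.
print luhn_ok("4111 1111 1111 1111") ? "valid\n" : "invalid\n";
```
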
65acb1b1 1990=head2 How do I pack arrays of doubles or floats for XS code?
1991
1992The kgbpack.c code in the PGPLOT module on CPAN does just this.
1993If you're doing a lot of float or double processing, consider using
1994the PDL module from CPAN instead--it makes number-crunching easy.
1995
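On the Perl side, the built-in C<pack> can already lay out native doubles in a string, which is roughly the shape of buffer that C code expecting an array of doubles wants. This is a minimal sketch with made-up values; how the XS side reads the buffer is left out.

```perl
use strict;
use warnings;

my @values = (1.5, -2.25, 3.0);           # hypothetical data

# "d*" packs every value as a double in native format;
# use "f*" instead for single-precision floats.
my $packed = pack("d*", @values);
print "packed ", scalar(@values), " doubles into ",
    length($packed), " bytes\n";

# unpack reverses the process.
my @back = unpack("d*", $packed);
print "round trip: @back\n";
```
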
68dc0745 1996=head1 AUTHOR AND COPYRIGHT
1997
0bc0ad85 1998Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington.
5a964f20 1999All rights reserved.
2000
5a7beb56 2001This documentation is free; you can redistribute it and/or modify it
2002under the same terms as Perl itself.
5a964f20 2003
2004Irrespective of its distribution, all code examples in this file
2005are hereby placed into the public domain. You are permitted and
2006encouraged to use this code in your own programs for fun
2007or for profit as you see fit. A simple comment in the code giving
2008credit would be courteous but is not required.