=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 1.11 $, $Date: 2002/01/11 02:31:20 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to the manipulation
of data as numbers, dates, strings, arrays, hashes, and miscellaneous
data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

The infinite set that a mathematician thinks of as the real numbers can
only be approximated on a computer, since the computer only has a finite
number of bits to store an infinite number of, um, numbers.

Internally, your computer represents floating-point numbers in binary.
Floating-point numbers read in from a file or appearing as literals
in your program are converted from their decimal floating-point
representation (eg, 19.95) to an internal binary representation.

However, 19.95 can't be precisely represented as a binary
floating-point number, just like 1/3 can't be exactly represented as a
decimal floating-point number. The computer's binary representation
of 19.95, therefore, isn't exactly 19.95.

When a floating-point number gets printed, the binary floating-point
representation is converted back to decimal. These decimal numbers
are displayed in either the format you specify with printf(), or the
current output format for numbers. (See L<perlvar/"$#"> if you use
print. C<$#> has a different default value in Perl5 than it did in
Perl4. Changing C<$#> yourself is deprecated.)

This affects B<all> computer languages that represent decimal
floating-point numbers in binary, not just Perl. Perl provides
arbitrary-precision decimal numbers with the Math::BigFloat module
(part of the standard Perl distribution), but mathematical operations
are consequently slower.

If precision is important, such as when dealing with money, it's good
to work with integers and then divide at the last possible moment.
For example, work in pennies (1995) instead of dollars and cents
(19.95) and divide by 100 at the end.

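As a rough illustration of the integer advice above, the sketch below
adds a few made-up prices held as whole cents and converts to dollars
only when formatting the result. The prices and variable names are
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical prices, stored as whole cents rather than fractional dollars.
my @prices_in_cents = (1995, 1005, 250);

# Integer addition is exact, so no binary rounding error can creep in.
my $total_cents = 0;
$total_cents += $_ for @prices_in_cents;

# Split into dollars and cents only at the last moment, for display.
my $display = sprintf("%d.%02d", int($total_cents / 100), $total_cents % 100);
print "Total: \$$display\n";
```
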
To get rid of the superfluous digits, just use a format (eg,
C<printf("%.2f", 19.95)>) to get the required precision.
See L<perlop/"Floating-point Arithmetic">.

=head2 Why isn't my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur
as literals in your program. Octal literals in perl must start with
a leading "0" and hexadecimal literals must start with a leading "0x".
If they are read in from somewhere and assigned, no automatic
conversion takes place. You must explicitly use oct() or hex() if you
want the values converted to decimal. oct() interprets
both hex ("0x350") numbers and octal ones ("0350" or even without the
leading "0", like "377"), while hex() only converts hexadecimal ones,
with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
The inverse mapping from decimal to octal can be done with either the
"%o" or "%O" sprintf() formats. To get from decimal to hex try either
the "%x" or the "%X" formats to sprintf().

This problem shows up most often when people try using chmod(), mkdir(),
umask(), or sysopen(), which by widespread tradition typically take
permissions in octal.

    chmod(644,  $file);   # WRONG
    chmod(0644, $file);   # right

Note the mistake in the first line was specifying the decimal literal
644, rather than the intended octal literal 0644. The problem can
be seen with:

    printf("%#o",644);    # prints 01204

Surely you had not intended C<chmod(01204, $file);> - did you? If you
want to use numeric literals as arguments to chmod() et al. then please
try to express them as octal constants, that is with a leading zero and
with the following digits restricted to the set 0..7.

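The conversions described above can be sketched as follows for values
that arrive as text. The sample strings are made up; only oct(), hex(),
and sprintf() are used.

```perl
use strict;
use warnings;

# Hypothetical input: a permission string as it might arrive from a config file.
my $mode_string = "0644";

my $mode     = oct($mode_string);   # octal digits -> number (420 decimal)
my $from_hex = hex("ff");           # hex digits -> number (255)
my $prefixed = oct("0x1F");         # oct() also honours a leading 0x (31)

# Going the other way: format a number as octal or hex text.
my $as_octal = sprintf("%o", $mode);       # "644"
my $as_hex   = sprintf("%x", $from_hex);   # "ff"

print "$mode $from_hex $prefixed $as_octal $as_hex\n";
```
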
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that int() merely truncates toward 0. For rounding to a
certain number of digits, sprintf() or printf() is usually the easiest
route.

    printf("%.3f", 3.1415926535);   # prints 3.142

The POSIX module (part of the standard Perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
functions.

    use POSIX;
    $ceil  = ceil(3.5);     # 4
    $floor = floor(3.5);    # 3

In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
module. With 5.004, the Math::Trig module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the Math::Complex module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do this.
Perl numbers whose absolute values are integers under 2**31 (on 32 bit
machines) will work pretty much like mathematical integers. Other numbers
are not guaranteed.

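As one possible way to follow the advice about writing your own
rounding function, here is a minimal sketch that works on non-negative
integer amounts and always rounds exact halves up. The helper name
C<round_half_up> and the policy itself are assumptions for
illustration; a real application should implement whatever rule its
specification demands.

```perl
use strict;
use warnings;

# Round a non-negative integer amount to the nearest multiple of $unit,
# expressed in units, with exact halves always rounded up.
# (Hypothetical helper; the rounding policy is an assumption.)
sub round_half_up {
    my ($amount, $unit) = @_;
    my $quotient  = int($amount / $unit);
    my $remainder = $amount % $unit;
    return 2 * $remainder >= $unit ? $quotient + 1 : $quotient;
}

# Tenths of a cent rounded to whole cents.
print round_half_up(1994, 10), "\n";   # 199
print round_half_up(1995, 10), "\n";   # 200
```

Because every value is an integer, the half-way test is exact and does
not depend on how a decimal fraction happens to be stored in binary.
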
=head2 How do I convert between numeric representations?

As always with Perl there is more than one way to do it. Below
are a few examples of approaches to making common conversions
between number representations. This is intended to be representative
rather than exhaustive.

Some of the examples below use the Bit::Vector module from CPAN.
The reasons you might choose Bit::Vector over the perl built-in
functions are that it works with numbers of ANY size, that it is
optimized for speed on some operations, and that for at least some
programmers the notation might be familiar.

=over 4

=item B<How do I convert hexadecimal into decimal:>

Using perl's built-in conversion of 0x notation:

    $int = 0xDEADBEEF;
    $dec = sprintf("%d", $int);

Using the hex function:

    $int = hex("DEADBEEF");
    $dec = sprintf("%d", $int);

Using pack:

    $int = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
    $dec = sprintf("%d", $int);

Using the CPAN module Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();

=item B<How do I convert from decimal to hexadecimal:>

Using sprintf:

    $hex = sprintf("%X", 3735928559);

Using unpack:

    $hex = unpack("H*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And Bit::Vector supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item B<How do I convert from octal to decimal:>

Using Perl's built-in conversion of numbers with leading zeros:

    $int = 033653337357; # note the leading 0!
    $dec = sprintf("%d", $int);

Using the oct function:

    $int = oct("33653337357");
    $dec = sprintf("%d", $int);

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item B<How do I convert from decimal to octal:>

Using sprintf:

    $oct = sprintf("%o", 3735928559);

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item B<How do I convert from binary to decimal:>

Using pack and ord:

    $decimal = ord(pack('B8', '10110110'));

Using pack and unpack for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() is used to left pad a 32 character string with zeros.

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item B<How do I convert from decimal to binary:>

Using unpack:

    $bin = unpack("B*", pack("N", 3735928559));

Using Bit::Vector:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

=back

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

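For readers who would rather not do the exercise, one possible
approach is to parse the text into a number with hex() or oct() and
format it again with sprintf(). This is only a sketch using built-ins;
it assumes the values fit in a native integer.

```perl
use strict;
use warnings;

# hex -> oct: parse with hex(), format with %o
my $oct = sprintf("%o", hex("DEADBEEF"));

# bin -> hex: oct() understands a leading "0b" for binary digits
my $hex = sprintf("%X", oct("0b" . "11011110101011011011111011101111"));

# oct -> bin: parse with oct(), format with %b
my $bin = sprintf("%b", oct("33653337357"));

print "$oct\n$hex\n$bin\n";
```
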
=head2 Why doesn't & work the way I want it to?

The behavior of the bitwise operators depends on whether they're
used on numbers or strings. The operators treat a string as a series
of bits and work with that (the string C<"3"> is the bit pattern
C<00110011>). The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. To test whether the
result contains only null bytes, you need:

    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # ...
    }

Use C<=~> in place of C<!~> to test instead for at least one non-null
byte.

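The difference between the numeric and string forms of C<&> can be
seen in a short sketch. It assumes the default behavior, that is, the
C<bitwise> feature of newer perls (which makes C<&> always numeric) is
not enabled.

```perl
use strict;
use warnings;

my $numeric = 11 & 3;                   # numbers: 1011 & 0011
my $stringy = "11" & "3";               # strings: combined byte by byte
my $forced  = ("11" + 0) & ("3" + 0);   # adding 0 forces numeric operands

# Two null bytes: an all-zero result that is nevertheless a true value.
my $nulls   = "\020\020" & "\101\101";
my $is_true = $nulls ? 1 : 0;

print "$numeric '$stringy' $forced $is_true\n";
```
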
=head2 How do I multiply matrices?

Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
or the PDL extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not build the whole list of integers in memory first.

=head2 How can I output Roman numerals?

Get the http://www.cpan.org/modules/by-module/Roman module.

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.
5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once--you make your numbers less random, rather
than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection at http://www.cpan.org/olddoc/FMTEYEWTK.tgz , courtesy of
Tom Phoenix, talks more about this. John von Neumann said, ``Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''

If you want numbers that are more random than C<rand> with C<srand>
provides, you should also check out the Math::TrulyRandom module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
``Numerical Recipes in C'' at http://www.nr.com/ .

=head1 Data: Dates

=head2 How do I find the week-of-the-year/day-of-the-year?

The day of the year is in the array returned by localtime() (see
L<perlfunc/"localtime">):

    $day_of_year = (localtime(time()))[7];

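The answer above covers only the day of the year. For the week of the
year, one option is the strftime() function from the POSIX module, as
sketched below. The C<%U> and C<%W> formats are standard C, but they
count weeks from the first Sunday or Monday of the year and are not
the same as ISO 8601 week numbers, so check which definition you need.
A fixed timestamp and gmtime() are used here only to keep the output
repeatable.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Tue Nov 13 01:00:00 2001 UTC; use localtime(time) for the current local date.
my @t = gmtime(1005613200);

my $day_of_year = $t[7];                 # zero-based, so Jan 1 is 0
my $day_one_up  = strftime("%j", @t);    # one-based and zero-padded
my $week_sunday = strftime("%U", @t);    # weeks starting on Sunday
my $week_monday = strftime("%W", @t);    # weeks starting on Monday

print "$day_of_year $day_one_up $week_sunday $week_monday\n";
```
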
=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }
    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, you'll find that the POSIX module's strftime() function
has been extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such systems,
this is only the first two digits of the four-digit year, and thus cannot
be used to reliably determine the current century or millennium.

=head2 How can I compare two dates and find the difference?

If you're storing your dates as epoch seconds then simply subtract one
from the other. If you've got a structured date (distinct year, day,
month, hour, minute, seconds values), then for reasons of accessibility,
simplicity, and efficiency, merely use either timelocal or timegm (from
the Time::Local module in the standard distribution) to reduce structured
dates to epoch seconds. However, if you don't know the precise format of
your dates, then you should probably use either of the Date::Manip and
Date::Calc modules from CPAN before you go hacking up your own parsing
routine to handle arbitrary date formats.

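A minimal sketch of the structured-date case, using timegm() from
Time::Local so that daylight saving changes cannot affect the result.
The two dates are arbitrary examples.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# timegm(sec, min, hour, mday, month (0-11), year)
my $start = timegm(0, 0, 0,  1,  0, 2001);   # 2001-01-01 00:00:00 UTC
my $end   = timegm(0, 0, 0, 13, 10, 2001);   # 2001-11-13 00:00:00 UTC

my $seconds = $end - $start;
my $days    = $seconds / (24 * 60 * 60);

print "$days days apart\n";   # 316 days apart
```
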
=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
Time::Local module. Otherwise, you should look into the Date::Calc
and Date::Manip modules from CPAN.

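For a fixed format, the splitting can be done with a single pattern
match. The sketch below assumes strings of the form
C<YYYY-MM-DD hh:mm:ss> in UTC and therefore uses timegm(); substitute
timelocal() for local times.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumed input format: "YYYY-MM-DD hh:mm:ss", interpreted as UTC.
my $stamp = "2001-11-13 01:00:00";

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "unrecognised date: $stamp";

# Months are 0-11 for timegm(), so subtract one.
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);

print "$epoch\n";   # 1005613200
```
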
=head2 How can I find the Julian Day?

Use the Time::JulianDay module (part of the Time-modules bundle
available from CPAN.)

Before you immerse yourself too deeply in this, be sure to verify that
it is the I<Julian> Day you really want. Are you interested in a way
of getting serial days so that you can tell how many days apart two
dates are, or so that you can also do other date arithmetic? If you
are interested in performing date arithmetic, this can be done using
the Date::Manip or Date::Calc modules.

There are too many details and too much confusion on this issue to
cover in this FAQ, but the term is applied (correctly) to a calendar now
supplanted by the Gregorian Calendar, with the Julian Calendar failing
to adjust properly for leap years on centennial years (among other
annoyances). The term is also used (incorrectly) to mean: [1] days in
the Gregorian Calendar; and [2] days since a particular starting time
or `epoch', usually 1970 in the Unix world and 1980 in the
MS-DOS/Windows world. If you find that it is not the first meaning
that you really want, then check out the Date::Manip and Date::Calc
modules. (Thanks to David Cassell for most of this text.)

=head2 How do I find yesterday's date?

The C<time()> function returns the current time in seconds since the
epoch. Take twenty-four hours off that:

    $yesterday = time() - ( 24 * 60 * 60 );

Then you can pass this to C<localtime()> and get the individual year,
month, day, hour, minute, seconds values.

Note very carefully that the code above assumes that your days are
twenty-four hours each. For most people, there are two days a year
when they aren't: the switch to and from summer time throws this off.
A solution to this issue is offered by Russ Allbery.

    sub yesterday {
        my $now  = defined $_[0] ? $_[0] : time;
        my $then = $now - 60 * 60 * 24;
        my $ndst = (localtime $now)[8] > 0;
        my $tdst = (localtime $then)[8] > 0;
        $then - ($tdst - $ndst) * 60 * 60;
    }
    # Should give you "this time yesterday" in seconds since epoch relative to
    # the first argument or the current time if no argument is given and
    # suitable for passing to localtime or whatever else you need to do with
    # it. $ndst is whether we're currently in daylight savings time; $tdst is
    # whether the point 24 hours ago was in daylight savings time. If $tdst
    # and $ndst are the same, a boundary wasn't crossed, and the correction
    # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
    # from yesterday's time since we gained an extra hour while going off
    # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
    # negative hour (add an hour) to yesterday's time since we lost an hour.
    #
    # All of this is because during those days when one switches off or onto
    # DST, a "day" isn't 24 hours long; it's either 23 or 25.
    #
    # The explicit settings of $ndst and $tdst are necessary because localtime
    # only says it returns the system tm struct, and the system tm struct at
    # least on Solaris doesn't guarantee any particular positive value (like,
    # say, 1) for isdst, just a positive value. And that value can
    # potentially be negative, if DST information isn't available (this sub
    # just treats those cases like no DST).
    #
    # Note that between 2am and 3am on the day after the time zone switches
    # off daylight savings time, the exact hour of "yesterday" corresponding
    # to the current hour is not clearly defined. Note also that if used
    # between 2am and 3am the day after the change to daylight savings time,
    # the result will be between 3am and 4am of the previous day; it's
    # arguable whether this is correct.
    #
    # This sub does not attempt to deal with leap seconds (most things don't).
    #
    # Copyright relinquished 1999 by Russ Allbery <rra@stanford.edu>
    # This code is in the public domain

=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
Y2K compliant (whatever that means). The programmers you've hired to
use it, however, probably are not.

Long answer: The question belies a true understanding of the issue.
Perl is just as Y2K compliant as your pencil--no more, and no less.
Can you use your pencil to write a non-Y2K-compliant memo? Of course
you can. Is that the pencil's fault? Of course it isn't.

The date and time functions supplied with Perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines). The year returned
by these functions when used in a list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number. To avoid the year 2000 problem simply do not treat the year as
a 2-digit number. It isn't.

When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year. For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001". There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K compliant
programs. It can. But so can your pencil. It's the fault of the user,
not the language. At the risk of inflaming the NRA: ``Perl doesn't
break Y2K, people do.'' See http://language.perl.com/news/y2k.html for
a longer exposition.

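In practice, treating the year correctly comes down to adding 1900 to
the value that gmtime() or localtime() returns in list context, as in
this small sketch built on the timestamp mentioned above.

```perl
use strict;
use warnings;

my ($mday, $mon, $year) = (gmtime(1005613200))[3, 4, 5];

# $year is years since 1900 (101 here) and $mon is 0-11, so adjust both.
my $date = sprintf("%04d-%02d-%02d", $year + 1900, $mon + 1, $mday);

print "$date\n";   # 2001-11-13
```
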
=head1 Data: Strings

=head2 How do I validate input?

The answer to this question is usually a regular expression, perhaps
with auxiliary logic. See the more specific questions (numbers, mail
addresses, etc.) for details.

=head2 How do I unescape a string?

It depends just what you mean by ``escape''. URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

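A short sketch of what that substitution does and does not do, using
two made-up strings:

```perl
use strict;
use warnings;

# Single quotes keep the backslashes literal in these sample strings.
my $shell   = 'file\ name\*';
my $newline = 'two\nlines';

for ($shell, $newline) {
    s/\\(.)/$1/g;    # drop each backslash, keep the character after it
}

print "$shell\n";     # file name*
print "$newline\n";   # twonlines
```

The second string shows the limitation: the backslash is removed, but
the C<n> that follows is kept as an ordinary letter rather than turned
into a newline.
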
=head2 How do I remove consecutive pairs of characters?

To turn C<"abbcccd"> into C<"abccd">:

    s/(.)\1/$1/g;    # add /s to include newlines

Here's a solution that turns "abbcccd" to "abcd":

    y///cs;    # y == tr, but shorter :-)

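The two operations above can be compared side by side on the same
input:

```perl
use strict;
use warnings;

(my $pairs = "abbcccd") =~ s/(.)\1/$1/g;   # each doubled character becomes one
(my $runs  = "abbcccd") =~ tr///cs;        # each run collapses to one character

print "$pairs $runs\n";   # abccd abcd
```
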
=head2 How do I expand function calls in a string?

This is documented in L<perlref>. In general, this is fraught with
quoting and readability problems, but it is possible. To interpolate
a subroutine call (in list context) into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for
arbitrary expressions:

    print "That yields ${\($n + 5)} widgets\n";

Version 5.004 of Perl had a bug that gave list context to the
expression in C<${...}>, but this is fixed in version 5.005.

See also ``How can I expand variables in text strings?'' in this
section of the FAQ.

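Here is a self-contained version of both tricks. The sub C<doubled>
and the value of C<$n> are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical sub that returns a list.
sub doubled { return map { $_ * 2 } @_ }

my $n = 37;

# @{[ ... ]} builds an anonymous array and interpolates its elements.
my $list_msg = "My sub returned @{[ doubled(1, 2, 3) ]} that time.";

# ${\ ... } takes a reference to a value and dereferences it in place.
my $scalar_msg = "That yields ${\($n + 5)} widgets";

print "$list_msg\n$scalar_msg\n";
```
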
=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multi-character delimiters, something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns, nor can they. For that you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
and the byacc program. Starting with perl 5.8, Text::Balanced
is part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/);
    print join("\n",@$[0..$#$]) if( $$[-1] );

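To see the simpler inside-out approach at work, the sketch below
applies that same substitution to a small made-up string and collects
what each pass captures. Note that with C</g> only the last match of a
pass is left in C<$1>; this input has one innermost pair per pass.

```perl
use strict;
use warnings;

my $text = "xBEGINaBEGINbENDcENDy";
my @found;

# Each pass removes the innermost BEGIN...END pairs, so inner text
# is seen before the text that encloses it.
while ($text =~ s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
    push @found, $1;
}

print join(",", @found), " left: $text\n";   # b,ac left: xy
```
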
=head2 How do I reverse a string?

Use reverse() in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

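As a quick check that the two approaches agree, this sketch expands
the same made-up line both ways, relying on the default tab stop of 8
columns in Text::Tabs.

```perl
use strict;
use warnings;
use Text::Tabs;    # exports expand(); tab stops default to every 8 columns

my $line = "ab\tc\t\td";

# Hand-rolled expansion, one run of tabs at a time.
my $by_hand = $line;
1 while $by_hand =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

# The module does the same job.
my ($by_module) = expand($line);

print $by_hand eq $by_module ? "same\n" : "different\n";
```
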
=head2 How do I reformat a paragraph?

Use Text::Wrap (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The paragraphs you give to Text::Wrap should not contain embedded
newlines. Text::Wrap doesn't justify the lines (flush-right).

Or use the CPAN module Text::Autoformat. Formatting files can be easily
done by making a shell alias, like so:

    alias fmt="perl -i -MText::Autoformat -n0777 \
        -e 'print autoformat $_, {all=>1}' $*"

See the documentation for Text::Autoformat to appreciate its many
capabilities.

=head2 How can I access/change the first N letters of a string?

There are many ways. If you just want to grab a copy, use
substr():

    $first_byte = substr($a, 0, 1);

If you want to modify part of a string, the simplest way is often to
use substr() as an lvalue:

    substr($a, 0, 3) = "Tom";

Although those with a pattern matching kind of thought process will
likely prefer

    $a =~ s/^.../Tom/;

=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself. For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively. These
all assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5           # is it the 5th?
            ? "${2}soever"      # yes, swap
            : $1                # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one."> You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency. If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work. What you can do is wrap a while()
loop around a global pattern match. For example, let's count negative
integers:

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";

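Another common idiom, shown here only as an alternative sketch, puts
the global match in list context and counts the results without an
explicit loop.

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# Assigning the matches to an empty list, and that assignment to a
# scalar, yields the number of matches.
my $count = () = $string =~ /-\d+/g;

print "There are $count negative numbers in the string\n";   # 4
```
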
=head2 How do I capitalize all the words on one line?

To make the first letter of each word upper case:

    $line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning "C<don't do it>" into "C<Don'T
Do It>". Sometimes you might want this. Other times you might need a
more thorough solution (Suggested by brian d foy):

    $string =~ s/ (
                   (^\w)    # at the beginning of the line
                     |      # or
                   (\s\w)   # preceded by whitespace
                  )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;

To make the whole line upper case:

    $line = uc($line);

To force each word to be lower case, with the first letter upper case:

    $line =~ s/(\w+)/\u\L$1/g;

You can (and probably should) enable locale awareness of those
characters by placing a C<use locale> pragma in your program.
See L<perllocale> for endless details on locales.

This is sometimes referred to as putting something into "title
case", but that's not quite accurate. Consider the proper
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.

=head2 How can I split a [character] delimited string except when inside
[character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated
into its different fields. (We'll pretend you said comma-separated, not
comma-delimited, which is different and almost never what you mean.) You
can't use C<split(/,/)> because you shouldn't split if the comma is inside
quotes. For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem. Thankfully, we have Jeffrey Friedl, author of a highly
recommended book on regular expressions, to handle these for us. He
suggests (assuming your string is contained in $text):

    @new = ();
    push(@new, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
        | ([^,]+),?
        | ,
    }gx;
    push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">). Unescaping them is a task addressed earlier in
this section.

Alternatively, the Text::ParseWords module (part of the standard Perl
distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);

There's also a Text::CSV (Comma-Separated Values) module on CPAN.

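Applied to the sample data line above, the Text::ParseWords approach
looks like this:

```perl
use strict;
use warnings;
use Text::ParseWords;

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A false second argument tells quotewords() to strip the quotes.
my @new = quotewords(",", 0, $text);

print scalar(@new), " fields\n";
print "$new[2]\n";     # Cimetrix, Inc
print "$new[-1]\n";    # Error, Core Dumped
```
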
68dc0745 766=head2 How do I strip blank space from the beginning/end of a string?
767
a6dd486b 768Although the simplest approach would seem to be
68dc0745 769
770 $string =~ s/^\s*(.*?)\s*$/$1/;
771
a6dd486b 772not only is this unnecessarily slow and destructive, it also fails with
d92eb7b0 773embedded newlines. It is much faster to do this operation in two steps:
68dc0745 774
775 $string =~ s/^\s+//;
776 $string =~ s/\s+$//;
777
778Or more nicely written as:
779
780 for ($string) {
781 s/^\s+//;
782 s/\s+$//;
783 }
784
5e3006a4 785This idiom takes advantage of the C<foreach> loop's aliasing
5a964f20 786behavior to factor out common code. You can do this
787on several strings at once, or arrays, or even the
d92eb7b0 788values of a hash if you use a slice:
5a964f20 789
790 # trim whitespace in the scalar, the array,
791 # and all the values in the hash
792 foreach ($scalar, @array, @hash{keys %hash}) {
793 s/^\s+//;
794 s/\s+$//;
795 }
796
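If you would rather get trimmed copies back than modify your variables in place, the same two substitutions can be wrapped in a small function. This is only a sketch; the name C<trim> is made up here and is not a Perl builtin.

```perl
use strict;
use warnings;

# Hypothetical helper: return trimmed copies, leaving the arguments untouched.
sub trim {
    my @copies = @_;
    for (@copies) {
        s/^\s+//;
        s/\s+$//;
    }
    return wantarray ? @copies : $copies[0];
}

my $raw     = "  two\nlines \t\n";
my $trimmed = trim($raw);                    # embedded newline survives
my @fields  = trim("  a ", "b\n", "\tc");
```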
65acb1b1 797=head2 How do I pad a string with blanks or pad a number with zeroes?
798
d92eb7b0 799(This answer contributed by Uri Guttman, with kibitzing from
800Bart Lateur.)
65acb1b1 801
802In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0 803to pad the string, C<$text> or C<$num> contains the string to be padded,
804and C<$pad_char> contains the padding character. You can use a single
805character string constant instead of the C<$pad_char> variable if you
806know what it is in advance. And in the same way you can use an integer in
807place of C<$pad_len> if you know the pad length in advance.
65acb1b1 808
d92eb7b0 809The simplest method uses the C<sprintf> function. It can pad on the left
810or right with blanks and on the left with zeroes and it will not
811truncate the result. The C<pack> function can only pad strings on the
812right with blanks and it will truncate the result to a maximum length of
813C<$pad_len>.
65acb1b1 814
d92eb7b0 815 # Left padding a string with blanks (no truncation):
816 $padded = sprintf("%${pad_len}s", $text);
65acb1b1 817
d92eb7b0 818 # Right padding a string with blanks (no truncation):
819 $padded = sprintf("%-${pad_len}s", $text);
65acb1b1 820
d92eb7b0 821 # Left padding a number with 0 (no truncation):
822 $padded = sprintf("%0${pad_len}d", $num);
65acb1b1 823
d92eb7b0 824 # Right padding a string with blanks using pack (will truncate):
825 $padded = pack("A$pad_len",$text);
65acb1b1 826
d92eb7b0 827If you need to pad with a character other than blank or zero you can use
828one of the following methods. They all generate a pad string with the
829C<x> operator and combine that with C<$text>. These methods do
830not truncate C<$text>.
65acb1b1 831
d92eb7b0 832Left and right padding with any character, creating a new string:
65acb1b1 833
d92eb7b0 834 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
835 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 836
d92eb7b0 837Left and right padding with any character, modifying C<$text> directly:
65acb1b1 838
d92eb7b0 839 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
840 $text .= $pad_char x ( $pad_len - length( $text ) );
65acb1b1 841
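To see the difference between the padding methods side by side, here is a short sketch with concrete values. The sample strings and the pad length of 8 are arbitrary choices for the demonstration.

```perl
use strict;
use warnings;

my ($pad_len, $pad_char) = (8, '*');
my ($text, $num, $long)  = ('abc', 42, 'abcdefghijkl');

my $left   = sprintf("%${pad_len}s",  $text);   # "     abc"
my $right  = sprintf("%-${pad_len}s", $text);   # "abc     "
my $zeroes = sprintf("%0${pad_len}d", $num);    # "00000042"
my $kept   = sprintf("%${pad_len}s",  $long);   # too long: returned whole
my $cut    = pack("A$pad_len", $long);          # too long: cut to "abcdefgh"
my $stars  = $pad_char x ($pad_len - length $text) . $text;  # "*****abc"

print "[$_]\n" for $left, $right, $zeroes, $kept, $cut, $stars;
```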
68dc0745 842=head2 How do I extract selected columns from a string?
843
844Use substr() or unpack(), both documented in L<perlfunc>.
5a964f20 845If you prefer thinking in terms of columns instead of widths,
846you can use this kind of thing:
847
848 # determine the unpack format needed to split Linux ps output
849 # arguments are cut columns
850 my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
851
852 sub cut2fmt {
853 my(@positions) = @_;
854 my $template = '';
855 my $lastpos = 1;
856 for my $place (@positions) {
857 $template .= "A" . ($place - $lastpos) . " ";
858 $lastpos = $place;
859 }
860 $template .= "A*";
861 return $template;
862 }
68dc0745 863
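Here is how the template from cut2fmt() might be fed to unpack(). The record layout below is invented for the example (fields starting at columns 1, 6, 12 and 15); it is not real ps output.

```perl
use strict;
use warnings;

sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos = $place;
    }
    $template .= "A*";
    return $template;
}

# Made-up record: fields start at columns 1, 6, 12 and 15.
my $line = "root 1234  S  /sbin/init";
my $fmt  = cut2fmt(6, 12, 15);            # "A5 A6 A3 A*"
my ($user, $pid, $state, $cmd) = unpack($fmt, $line);

print "$user|$pid|$state|$cmd\n";
```

The C<A> format strips trailing blanks from each field, which is usually what you want for column-aligned text.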
864=head2 How do I find the soundex value of a string?
865
87275199 866Use the standard Text::Soundex module distributed with Perl.
a6dd486b 867Before you do so, you may want to determine whether `soundex' is in
d92eb7b0 868fact what you think it is. Knuth's soundex algorithm compresses words
869into a small space, and so it does not necessarily distinguish between
870two words which you might want to appear separately. For example, the
871last names `Knuth' and `Kant' are both mapped to the soundex code K530.
872If Text::Soundex does not do what you are looking for, you might want
873to consider the String::Approx module available at CPAN.
68dc0745 874
875=head2 How can I expand variables in text strings?
876
877Let's assume that you have a string like:
878
879 $text = 'this has a $foo in it and a $bar';
5a964f20 880
881If those were both global variables, then this would
882suffice:
883
65acb1b1 884 $text =~ s/\$(\w+)/${$1}/g; # no /e needed
68dc0745 885
5a964f20 886But since they are probably lexicals, or at least, they could
887be, you'd have to do this:
68dc0745 888
    $text =~ s/(\$\w+)/$1/eeg;    # needed /ee, not /e
    die if $@;
68dc0745 891
5a964f20 892It's probably better in the general case to treat those
893variables as entries in some special hash. For example:
894
895 %user_defs = (
896 foo => 23,
897 bar => 19,
898 );
899 $text =~ s/\$(\w+)/$user_defs{$1}/g;
68dc0745 900
92c2ed05 901See also ``How do I expand function calls in a string?'' in this section
46fc3d4c 902of the FAQ.
903
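One thing the hash approach leaves open is what to do with names that have no entry. The sketch below leaves unknown names untouched; that policy, and the extra C<$baz> in the sample text, are assumptions made for the example.

```perl
use strict;
use warnings;

my %user_defs = (
    foo => 23,
    bar => 19,
);

my $text = 'this has a $foo in it and a $bar and a $baz';

# Replace known names; put unknown ones back exactly as they were.
(my $expanded = $text) =~
    s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$expanded\n";
```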
68dc0745 904=head2 What's wrong with always quoting "$vars"?
905
a6dd486b 906The problem is that those double-quotes force stringification--
907coercing numbers and references into strings--even when you
908don't want them to be strings. Think of it this way: double-quote
65acb1b1 909expansion is used to produce new strings. If you already
910have a string, why do you need more?
68dc0745 911
912If you get used to writing odd things like these:
913
914 print "$var"; # BAD
915 $new = "$old"; # BAD
916 somefunc("$var"); # BAD
917
918You'll be in trouble. Those should (in 99.8% of the cases) be
919the simpler and more direct:
920
921 print $var;
922 $new = $old;
923 somefunc($var);
924
925Otherwise, besides slowing you down, you're going to break code when
926the thing in the scalar is actually neither a string nor a number, but
927a reference:
928
929 func(\@array);
930 sub func {
931 my $aref = shift;
932 my $oref = "$aref"; # WRONG
933 }
934
935You can also get into subtle problems on those few operations in Perl
936that actually do care about the difference between a string and a
937number, such as the magical C<++> autoincrement operator or the
938syscall() function.
939
5a964f20 940Stringification also destroys arrays.
941
942 @lines = `command`;
943 print "@lines"; # WRONG - extra blanks
944 print @lines; # right
945
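A short sketch that makes both problems visible. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $oops  = "$aref";             # now just text such as "ARRAY(0x...)"

print "real reference: ", ref($aref), "\n";
print "quoted copy:    ", (ref($oops) || 'plain string'), "\n";

my @lines  = ("one\n", "two\n");
my $quoted = "@lines";           # elements joined with $" (a space by default)
my $plain  = join '', @lines;    # what print @lines would output
```

The quoted copy can no longer be dereferenced, and the quoted array has picked up a blank at the start of its second line.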
c47ff5f1 946=head2 Why don't my <<HERE documents work?
68dc0745 947
948Check for these three things:
949
950=over 4
951
952=item 1. There must be no space after the << part.
953
954=item 2. There (probably) should be a semicolon at the end.
955
956=item 3. You can't (easily) have any space in front of the tag.
957
958=back
959
5a964f20 960If you want to indent the text in the here document, you
961can do this:
962
    # all in one
    ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
        your text
        goes here
    HERE_TARGET
968
969But the HERE_TARGET must still be flush against the margin.
970If you want that indented also, you'll have to quote
971in the indentation.
972
    ($quote = <<'    FINIS') =~ s/^\s+//gm;
            ...we will have peace, when you and all your works have
            perished--and the works of your dark master to whom you
            would deliver us. You are a liar, Saruman, and a corrupter
            of men's hearts. --Theoden in /usr/src/perl/taint.c
        FINIS
    $quote =~ s/\s*--/\n--/;
980
981A nice general-purpose fixer-upper function for indented here documents
982follows. It expects to be called with a here document as its argument.
983It looks to see whether each line begins with a common substring, and
a6dd486b 984if so, strips that substring off. Otherwise, it takes the amount of leading
985whitespace found on the first line and removes that much off each
5a964f20 986subsequent line.
987
    sub fix {
        local $_ = shift;
        my ($white, $leader);  # common whitespace and common leading string
        if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
            ($white, $leader) = ($2, quotemeta($1));
        } else {
            ($white, $leader) = (/^(\s+)/, '');
        }
        s/^\s*?$leader(?:$white)?//gm;
        return $_;
    }
999
c8db1d39 1000This works with leading special strings, dynamically determined:
5a964f20 1001
1002 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
1003 @@@ int
1004 @@@ runops() {
1005 @@@ SAVEI32(runlevel);
1006 @@@ runlevel++;
d92eb7b0 1007 @@@ while ( op = (*op->op_ppaddr)() );
5a964f20 1008 @@@ TAINT_NOT;
1009 @@@ return 0;
1010 @@@ }
1011 MAIN_INTERPRETER_LOOP
1012
a6dd486b 1013Or with a fixed amount of leading whitespace, with remaining
5a964f20 1014indentation correctly preserved:
1015
1016 $poem = fix<<EVER_ON_AND_ON;
1017 Now far ahead the Road has gone,
1018 And I must follow, if I can,
1019 Pursuing it with eager feet,
1020 Until it joins some larger way
1021 Where many paths and errands meet.
1022 And whither then? I cannot say.
1023 --Bilbo in /usr/src/perl/pp_ctl.c
1024 EVER_ON_AND_ON
1025
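If you can assume a much newer Perl than the ones this answer was written for, there is also a built-in way. The sketch below assumes Perl 5.26 or later, where the C<<< <<~ >>> form strips the common leading whitespace and lets the terminator be indented too.

```perl
use strict;
use warnings;
use 5.026;   # assumption: the <<~ form needs Perl 5.26 or later

my $poem = <<~EVER_ON_AND_ON;
    Now far ahead the Road has gone,
      And I must follow, if I can,
    EVER_ON_AND_ON

print $poem;
```

Only the indentation shared by every line is removed, so the second line keeps its extra two blanks.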
68dc0745 1026=head1 Data: Arrays
1027
65acb1b1 1028=head2 What is the difference between a list and an array?
1029
1030An array has a changeable length. A list does not. An array is something
1031you can push or pop, while a list is a set of values. Some people make
1032the distinction that a list is a value while an array is a variable.
1033Subroutines are passed and return lists, you put things into list
1034context, you initialize arrays with lists, and you foreach() across
1035a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
1036in scalar context behave like the number of elements in them, subroutines
a6dd486b 1037access their arguments through the array C<@_>, and push/pop/shift only work
65acb1b1 1038on arrays.
1039
1040As a side note, there's no such thing as a list in scalar context.
1041When you say
1042
1043 $scalar = (2, 5, 7, 9);
1044
d92eb7b0 1045you're using the comma operator in scalar context, so it uses the scalar
1046comma operator. There never was a list there at all! This causes the
1047last value to be returned: 9.
65acb1b1 1048
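The distinction is easier to believe once you have watched it happen. A minimal sketch, with the void-context warning switched off so the first line runs quietly:

```perl
use strict;
use warnings;
no warnings 'void';     # the discarded 2, 5, 7 would otherwise draw a warning

my $scalar  = (2, 5, 7, 9);     # comma operator: $scalar is 9
my @array   = (2, 5, 7, 9);     # list assigned to an array
my $count   = @array;           # array in scalar context: 4
my ($first) = (2, 5, 7, 9);     # list assignment: $first is 2

print "scalar=$scalar count=$count first=$first\n";
```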
68dc0745 1049=head2 What is the difference between $array[1] and @array[1]?
1050
a6dd486b 1051The former is a scalar value; the latter an array slice, making
68dc0745 1052it a list with one (scalar) value. You should use $ when you want a
1053scalar value (most of the time) and @ when you want a list with one
1054scalar value in it (very, very rarely; nearly never, in fact).
1055
1056Sometimes it doesn't make a difference, but sometimes it does.
1057For example, compare:
1058
1059 $good[0] = `some program that outputs several lines`;
1060
1061with
1062
1063 @bad[0] = `same program that outputs several lines`;
1064
9f1b1f2d 1065The C<use warnings> pragma and the B<-w> flag will warn you about these
1066matters.
68dc0745 1067
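To show the difference without running an external program, the sketch below uses a stand-in function, C<output()>, that imitates backticks: the whole output as one string in scalar context, one line per element in list context. The function is hypothetical and exists only for this demonstration.

```perl
use strict;
use warnings;
no warnings 'syntax';   # silence "Scalar value @bad[0] better written as $bad[0]"

# Stand-in for backticks: whole output in scalar context, lines in list context.
sub output {
    my @lines = ("first\n", "second\n");
    return wantarray ? @lines : join '', @lines;
}

my (@good, @bad);
$good[0] = output();    # scalar assignment: gets everything
@bad[0]  = output();    # list assignment to a one-element slice: gets line one
```

The slice quietly throws away every line after the first.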
d92eb7b0 1068=head2 How can I remove duplicate elements from a list or array?
68dc0745 1069
1070There are several possible ways, depending on whether the array is
1071ordered and whether you wish to preserve the ordering.
1072
1073=over 4
1074
551e1d92 1075=item a)
1076
If @in is sorted, and you want @out to be sorted:
68dc0745 1079
a4341a65 1080 $prev = "not equal to $in[0]";
3bc5ef3e 1081 @out = grep($_ ne $prev && ($prev = $_, 1), @in);
68dc0745 1082
c8db1d39 1083This is nice in that it doesn't use much extra memory, simulating
3bc5ef3e 1084uniq(1)'s behavior of removing only adjacent duplicates. The ", 1"
1085guarantees that the expression is true (so that grep picks it up)
1086even if the $_ is 0, "", or undef.
68dc0745 1087
551e1d92 1088=item b)
1089
1090If you don't know whether @in is sorted:
68dc0745 1091
1092 undef %saw;
1093 @out = grep(!$saw{$_}++, @in);
1094
551e1d92 1095=item c)
1096
1097Like (b), but @in contains only small integers:
68dc0745 1098
1099 @out = grep(!$saw[$_]++, @in);
1100
551e1d92 1101=item d)
1102
1103A way to do (b) without any loops or greps:
68dc0745 1104
1105 undef %saw;
1106 @saw{@in} = ();
1107 @out = sort keys %saw; # remove sort if undesired
1108
551e1d92 1109=item e)
1110
1111Like (d), but @in contains only small positive integers:
68dc0745 1112
1113 undef @ary;
1114 @ary[@in] = @in;
87275199 1115 @out = grep {defined} @ary;
68dc0745 1116
1117=back
1118
65acb1b1 1119But perhaps you should have been using a hash all along, eh?
1120
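The difference between (a) and (b) shows up as soon as the input is not sorted. A small sketch with made-up data:

```perl
use strict;
use warnings;

my @in = qw(b a b b c a);

# (a) adjacent duplicates only, like uniq(1)
my $prev = "not equal to $in[0]";
my @adjacent = grep($_ ne $prev && ($prev = $_, 1), @in);   # b a b c a

# (b) every duplicate, first occurrence wins
my %saw;
my @unique = grep(!$saw{$_}++, @in);                        # b a c

print "@adjacent\n@unique\n";
```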
ddbc1f16 1121=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1122
1123Hearing the word "in" is an I<in>dication that you probably should have
1124used a hash, not a list or array, to store your data. Hashes are
1125designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1126
5a964f20 1127That being said, there are several ways to approach this. If you
1128are going to make this query many times over arbitrary string values,
1129the fastest way is probably to invert the original array and keep an
68dc0745 1130associative array lying about whose keys are the first array's values.
1131
1132 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1133 undef %is_blue;
1134 for (@blues) { $is_blue{$_} = 1 }
1135
1136Now you can check whether $is_blue{$some_color}. It might have been a
1137good idea to keep the blues all in a hash in the first place.
1138
1139If the values are all small integers, you could use a simple indexed
1140array. This kind of an array will take up less space:
1141
    @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
    undef @is_tiny_prime;
    for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply  @is_tiny_prime[@primes] = (1) x @primes;
68dc0745 1146
1147Now you check whether $is_tiny_prime[$some_number].
1148
1149If the values in question are integers instead of strings, you can save
1150quite a lot of space by using bit strings instead:
1151
1152 @articles = ( 1..10, 150..2000, 2017 );
1153 undef $read;
7b8d334a 1154 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1155
1156Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1157
1158Please do not use
1159
a6dd486b 1160 ($is_there) = grep $_ eq $whatever, @array;
68dc0745 1161
1162or worse yet
1163
a6dd486b 1164 ($is_there) = grep /$whatever/, @array;
68dc0745 1165
1166These are slow (checks every element even if the first matches),
1167inefficient (same reason), and potentially buggy (what if there are
d92eb7b0 1168regex characters in $whatever?). If you're only testing once, then
65acb1b1 1169use:
1170
1171 $is_there = 0;
1172 foreach $elt (@array) {
1173 if ($elt eq $elt_to_find) {
1174 $is_there = 1;
1175 last;
1176 }
1177 }
1178 if ($is_there) { ... }
68dc0745 1179
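Both approaches can be sketched in a few lines. The one-off test here uses first() from List::Util instead of the explicit loop; that is a substitution made for brevity and assumes a Perl that ships List::Util (5.8 or later, or the module from CPAN).

```perl
use strict;
use warnings;
use List::Util qw(first);

my @blues = qw(azure cerulean teal turquoise lapis-lazuli);

# Many lookups: build the hash once, then each test is a single hash access.
my %is_blue;
@is_blue{@blues} = (1) x @blues;

# One lookup: first() stops at the first match.  The defined() test
# assumes that @blues contains no undef elements.
my $found_teal = defined( first { $_ eq 'teal' } @blues );
my $found_pink = defined( first { $_ eq 'pink' } @blues );
```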
1180=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1181
1182Use a hash. Here's code to do both and more. It assumes that
1183each element is unique in a given array:
1184
1185 @union = @intersection = @difference = ();
1186 %count = ();
1187 foreach $element (@array1, @array2) { $count{$element}++ }
1188 foreach $element (keys %count) {
1189 push @union, $element;
1190 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1191 }
1192
d92eb7b0 1193Note that this is the I<symmetric difference>, that is, all elements in
a6dd486b 1194either A or in B but not in both. Think of it as an xor operation.
d92eb7b0 1195
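If what you want is the one-sided difference (elements of the first array that are missing from the second), a lookup hash built from the second array is enough. A sketch with invented data:

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e);

# One-sided difference: in @array1 but not in @array2.
my %in_second = map { $_ => 1 } @array2;
my @only_in_first = grep { !$in_second{$_} } @array1;          # a b

# Symmetric difference, for comparison: seen exactly once overall.
my %count;
$count{$_}++ for @array1, @array2;
my @symmetric = sort grep { $count{$_} == 1 } keys %count;     # a b e
```

Like the code above, the symmetric version assumes that no element is repeated within a single array.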
65acb1b1 1196=head2 How do I test whether two arrays or hashes are equal?
1197
1198The following code works for single-level arrays. It uses a stringwise
1199comparison, and does not distinguish defined versus undefined empty
1200strings. Modify if you have other needs.
1201
1202 $are_equal = compare_arrays(\@frogs, \@toads);
1203
1204 sub compare_arrays {
1205 my ($first, $second) = @_;
9f1b1f2d 1206 no warnings; # silence spurious -w undef complaints
65acb1b1 1207 return 0 unless @$first == @$second;
1208 for (my $i = 0; $i < @$first; $i++) {
1209 return 0 if $first->[$i] ne $second->[$i];
1210 }
1211 return 1;
1212 }
1213
1214For multilevel structures, you may wish to use an approach more
1215like this one. It uses the CPAN module FreezeThaw:
1216
1217 use FreezeThaw qw(cmpStr);
1218 @a = @b = ( "this", "that", [ "more", "stuff" ] );
1219
1220 printf "a and b contain %s arrays\n",
1221 cmpStr(\@a, \@b) == 0
1222 ? "the same"
1223 : "different";
1224
1225This approach also works for comparing hashes. Here
1226we'll demonstrate two different answers:
1227
1228 use FreezeThaw qw(cmpStr cmpStrHard);
1229
1230 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1231 $a{EXTRA} = \%b;
1232 $b{EXTRA} = \%a;
1233
1234 printf "a and b contain %s hashes\n",
1235 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1236
1237 printf "a and b contain %s hashes\n",
1238 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1239
1240
The first reports that both of the hashes contain the same data,
while the second reports that they do not. Which you prefer is left as
an exercise to the reader.
1244
68dc0745 1245=head2 How do I find the first array element for which a condition is true?
1246
1247You can use this if you care about the index:
1248
65acb1b1 1249 for ($i= 0; $i < @array; $i++) {
68dc0745 1250 if ($array[$i] eq "Waldo") {
1251 $found_index = $i;
1252 last;
1253 }
1254 }
1255
1256Now C<$found_index> has what you want.
1257
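The same search can be written with first() from List::Util, applied to the indices rather than the elements. This is an alternative sketch, not the loop above, and it assumes List::Util is available.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = qw(Wally Waldo Wanda Waldo);

# first() stops at the first index whose element matches
my $found_index = first { $array[$_] eq 'Waldo' } 0 .. $#array;
my $missing     = first { $array[$_] eq 'Wilma' } 0 .. $#array;
```

Test the result with defined(), not for truth, because a match at index 0 is a perfectly good answer.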
1258=head2 How do I handle linked lists?
1259
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either end,
or you can use splice to add and/or remove an arbitrary number of elements at
arbitrary points. Both pop and shift are O(1) operations on Perl's
dynamic arrays. In the absence of shifts and pops, push in general
needs to reallocate on the order of every log(N) times, and unshift will
need to copy pointers each time.
68dc0745 1267
1268If you really, really wanted, you could use structures as described in
1269L<perldsc> or L<perltoot> and do just what the algorithm book tells you
65acb1b1 1270to do. For example, imagine a list node like this:
1271
1272 $node = {
1273 VALUE => 42,
1274 LINK => undef,
1275 };
1276
1277You could walk the list this way:
1278
1279 print "List: ";
1280 for ($node = $head; $node; $node = $node->{LINK}) {
1281 print $node->{VALUE}, " ";
1282 }
1283 print "\n";
1284
a6dd486b 1285You could add to the list this way:
65acb1b1 1286
1287 my ($head, $tail);
1288 $tail = append($head, 1); # grow a new head
1289 for $value ( 2 .. 10 ) {
1290 $tail = append($tail, $value);
1291 }
1292
1293 sub append {
1294 my($list, $value) = @_;
1295 my $node = { VALUE => $value };
1296 if ($list) {
1297 $node->{LINK} = $list->{LINK};
1298 $list->{LINK} = $node;
1299 } else {
1300 $_[0] = $node; # replace caller's version
1301 }
1302 return $node;
1303 }
1304
But again, Perl's built-in arrays are virtually always good enough.
68dc0745 1306
1307=head2 How do I handle circular lists?
1308
1309Circular lists could be handled in the traditional fashion with linked
1310lists, or you could just do something like this with an array:
1311
1312 unshift(@array, pop(@array)); # the last shall be first
1313 push(@array, shift(@array)); # and vice versa
1314
1315=head2 How do I shuffle an array randomly?
1316
45bbf655 1317If you either have Perl 5.8.0 or later installed, or if you have
1318Scalar-List-Utils 1.03 or later installed, you can say:
1319
1320 use List::Util 'shuffle';
1321
1322 @shuffled = shuffle(@list);
1323
1324If not, you can use this:
5a964f20 1325
    # fisher_yates_shuffle
    # generate a random permutation of an array in place
    # As in shuffling a deck of cards
    #
    sub fisher_yates_shuffle {
        my $deck = shift;  # $deck is a reference to an array
        return unless @$deck;  # nothing to do; --$i would go negative
        my $i = @$deck;
        while (--$i) {
            my $j = int rand ($i+1);
            @$deck[$i,$j] = @$deck[$j,$i];
        }
    }
1338
cc30d1a7 1339And here is an example of using it:
1340
1341 #
1342 # shuffle my mpeg collection
1343 #
1344 my @mpeg = <audio/*/*.mp3>;
1345 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1346 print @mpeg;
5a964f20 1347
45bbf655 1348Note that the above implementation shuffles an array in place,
1349unlike the List::Util::shuffle() which takes a list and returns
1350a new shuffled list.
1351
You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:
68dc0745 1354
1355 srand;
1356 @new = ();
1357 @old = 1 .. 10; # just a demo
1358 while (@old) {
1359 push(@new, splice(@old, rand @old, 1));
1360 }
1361
5a964f20 1362This is bad because splice is already O(N), and since you do it N times,
1363you just invented a quadratic algorithm; that is, O(N**2). This does
1364not scale, although Perl is so efficient that you probably won't notice
1365this until you have rather largish arrays.
68dc0745 1366
1367=head2 How do I process/modify each element of an array?
1368
1369Use C<for>/C<foreach>:
1370
1371 for (@lines) {
5a964f20 1372 s/foo/bar/; # change that word
1373 y/XZ/ZX/; # swap those letters
68dc0745 1374 }
1375
1376Here's another; let's compute spherical volumes:
1377
5a964f20 1378 for (@volumes = @radii) { # @volumes has changed parts
68dc0745 1379 $_ **= 3;
1380 $_ *= (4/3) * 3.14159; # this will be constant folded
1381 }
1382
5a964f20 1383If you want to do the same thing to modify the values of the hash,
1384you may not use the C<values> function, oddly enough. You need a slice:
1385
1386 for $orbit ( @orbits{keys %orbits} ) {
1387 ($orbit **= 3) *= (4/3) * 3.14159;
1388 }
1389
68dc0745 1390=head2 How do I select a random element from an array?
1391
1392Use the rand() function (see L<perlfunc/rand>):
1393
5a964f20 1394 # at the top of the program:
68dc0745 1395 srand; # not needed for 5.004 and later
5a964f20 1396
1397 # then later on
68dc0745 1398 $index = rand @array;
1399 $element = $array[$index];
1400
5a964f20 1401Make sure you I<only call srand once per program, if then>.
1402If you are calling it more than once (such as before each
1403call to rand), you're almost certainly doing something wrong.
1404
68dc0745 1405=head2 How do I permute N elements of a list?
1406
1407Here's a little program that generates all permutations
1408of all the words on each line of input. The algorithm embodied
5a964f20 1409in the permute() function should work on any list:
68dc0745 1410
    #!/usr/bin/perl -n
    # tsc-permute: permute each word of input
    permute([split], []);
    sub permute {
        my @items = @{ $_[0] };
        my @perms = @{ $_[1] };
        unless (@items) {
            print "@perms\n";
        } else {
            my(@newitems,@newperms,$i);
            foreach $i (0 .. $#items) {
                @newitems = @items;
                @newperms = @perms;
                unshift(@newperms, splice(@newitems, $i, 1));
                permute([@newitems], [@newperms]);
            }
        }
    }
1429
b8d2732a 1430Unfortunately, this algorithm is very inefficient. The Algorithm::Permute
1431module from CPAN runs at least an order of magnitude faster. If you don't
1432have a C compiler (or a binary distribution of Algorithm::Permute), then
1433you can use List::Permutor which is written in pure Perl, and is still
f8620f40 1434several times faster than the algorithm above.
b8d2732a 1435
68dc0745 1436=head2 How do I sort an array by (anything)?
1437
1438Supply a comparison function to sort() (described in L<perlfunc/sort>):
1439
1440 @list = sort { $a <=> $b } @list;
1441
1442The default sort function is cmp, string comparison, which would
c47ff5f1 1443sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
68dc0745 1444the numerical comparison operator.
1445
1446If you have a complicated function needed to pull out the part you
1447want to sort on, then don't do it inside the sort function. Pull it
1448out first, because the sort BLOCK can be called many times for the
1449same element. Here's an example of how to pull out the first word
1450after the first number on each item, and then sort those words
1451case-insensitively.
1452
1453 @idx = ();
1454 for (@data) {
1455 ($item) = /\d+\s*(\S+)/;
1456 push @idx, uc($item);
1457 }
1458 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1459
a6dd486b 1460which could also be written this way, using a trick
68dc0745 1461that's come to be known as the Schwartzian Transform:
1462
1463 @sorted = map { $_->[0] }
1464 sort { $a->[1] cmp $b->[1] }
d92eb7b0 1465 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 1466
1467If you need to sort on several fields, the following paradigm is useful.
1468
1469 @sorted = sort { field1($a) <=> field1($b) ||
1470 field2($a) cmp field2($b) ||
1471 field3($a) cmp field3($b)
1472 } @data;
1473
1474This can be conveniently combined with precalculation of keys as given
1475above.
1476
See the F<sort> article in the "Far More Than You Ever Wanted
To Know" collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz for
more about this approach.
68dc0745 1480
1481See also the question below on sorting hashes.
1482
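Here is the Schwartzian Transform run on some made-up data, followed by a two-field variant that precalculates both keys. The sample strings are assumptions chosen so that no two items tie on the first sort.

```perl
use strict;
use warnings;

my @data = ('10 cherry', '2 banana', '33 Apple', '2 date');

# Sort by the first word after the number, ignoring case.
my @sorted = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] }
             map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] } @data;

# Several fields: the number first, then the word as a tie-breaker.
my @by_num = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] or $a->[2] cmp $b->[2] }
             map  { [ $_, /(\d+)\s*(\S+)/ ] } @data;

print join(', ', @sorted), "\n";
print join(', ', @by_num), "\n";
```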
1483=head2 How do I manipulate arrays of bits?
1484
1485Use pack() and unpack(), or else vec() and the bitwise operations.
1486
1487For example, this sets $vec to have bit N set if $ints[N] was set:
1488
1489 $vec = '';
1490 foreach(@ints) { vec($vec,$_,1) = 1 }
1491
cc30d1a7 1492Here's how, given a vector in $vec, you can
68dc0745 1493get those bits into your @ints array:
1494
1495 sub bitvec_to_list {
1496 my $vec = shift;
1497 my @ints;
1498 # Find null-byte density then select best algorithm
1499 if ($vec =~ tr/\0// / length $vec > 0.95) {
1500 use integer;
1501 my $i;
1502 # This method is faster with mostly null-bytes
1503 while($vec =~ /[^\0]/g ) {
1504 $i = -9 + 8 * pos $vec;
1505 push @ints, $i if vec($vec, ++$i, 1);
1506 push @ints, $i if vec($vec, ++$i, 1);
1507 push @ints, $i if vec($vec, ++$i, 1);
1508 push @ints, $i if vec($vec, ++$i, 1);
1509 push @ints, $i if vec($vec, ++$i, 1);
1510 push @ints, $i if vec($vec, ++$i, 1);
1511 push @ints, $i if vec($vec, ++$i, 1);
1512 push @ints, $i if vec($vec, ++$i, 1);
1513 }
1514 } else {
1515 # This method is a fast general algorithm
1516 use integer;
1517 my $bits = unpack "b*", $vec;
1518 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1519 push @ints, pos $bits while($bits =~ /1/g);
1520 }
1521 return \@ints;
1522 }
1523
1524This method gets faster the more sparse the bit vector is.
1525(Courtesy of Tim Bunce and Winfried Koenig.)
1526
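For small vectors you may not need anything so elaborate. The sketch below sets a few bits with vec() and reads them back by simply testing every position; it trades speed for brevity and is not the algorithm above.

```perl
use strict;
use warnings;

my @ints = (1, 5, 9);

my $vec = '';
vec($vec, $_, 1) = 1 for @ints;

my $bits = unpack("b*", $vec);                 # "0100010001000000"
my @back = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;   # 1 5 9

print "$bits\n@back\n";
```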
cc30d1a7 1527Or use the CPAN module Bit::Vector:
1528
1529 $vector = Bit::Vector->new($num_of_bits);
1530 $vector->Index_List_Store(@ints);
1531 @ints = $vector->Index_List_Read();
1532
Bit::Vector provides efficient methods for bit vectors, sets of small integers
and "big int" math.
1535
1536Here's a more extensive illustration using vec():
65acb1b1 1537
1538 # vec demo
1539 $vector = "\xff\x0f\xef\xfe";
1540 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1541 unpack("N", $vector), "\n";
1542 $is_set = vec($vector, 23, 1);
1543 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1544 pvec($vector);
1545
1546 set_vec(1,1,1);
1547 set_vec(3,1,1);
1548 set_vec(23,1,1);
1549
1550 set_vec(3,1,3);
1551 set_vec(3,2,3);
1552 set_vec(3,4,3);
1553 set_vec(3,4,7);
1554 set_vec(3,8,3);
1555 set_vec(3,8,7);
1556
1557 set_vec(0,32,17);
1558 set_vec(1,32,17);
1559
1560 sub set_vec {
1561 my ($offset, $width, $value) = @_;
1562 my $vector = '';
1563 vec($vector, $offset, $width) = $value;
1564 print "offset=$offset width=$width value=$value\n";
1565 pvec($vector);
1566 }
1567
1568 sub pvec {
1569 my $vector = shift;
1570 my $bits = unpack("b*", $vector);
1571 my $i = 0;
1572 my $BASE = 8;
1573
1574 print "vector length in bytes: ", length($vector), "\n";
1575 @bytes = unpack("A8" x length($vector), $bits);
1576 print "bits are: @bytes\n\n";
1577 }
1578
68dc0745 1579=head2 Why does defined() return true on empty arrays and hashes?
1580
65acb1b1 1581The short story is that you should probably only use defined on scalars or
1582functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1583in the 5.004 release or later of Perl for more detail.
68dc0745 1584
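If what you really want to know is whether an aggregate is empty, test the aggregate itself in boolean context. A minimal sketch:

```perl
use strict;
use warnings;

my (@array, %hash);

my $array_was_empty = @array ? 0 : 1;     # 1
my $hash_was_empty  = %hash  ? 0 : 1;     # 1

push @array, undef;          # one element, even though its value is undef
$hash{key} = undef;          # one key, even though its value is undef

my $array_has_items = @array ? 1 : 0;     # 1
my $hash_has_keys   = %hash  ? 1 : 0;     # 1
```

Recent versions of Perl go further and refuse to compile C<defined(@array)> and C<defined(%hash)> at all.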
1585=head1 Data: Hashes (Associative Arrays)
1586
1587=head2 How do I process an entire hash?
1588
1589Use the each() function (see L<perlfunc/each>) if you don't care
1590whether it's sorted:
1591
5a964f20 1592 while ( ($key, $value) = each %hash) {
68dc0745 1593 print "$key = $value\n";
1594 }
1595
1596If you want it sorted, you'll have to use foreach() on the result of
1597sorting the keys as shown in an earlier question.
1598
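For the sorted case, a sketch might look like this. The hash contents are invented for the example.

```perl
use strict;
use warnings;

my %hash = (pear => 3, apple => 7, fig => 1);

my @report;
foreach my $key (sort keys %hash) {
    push @report, "$key = $hash{$key}";
}
print "$_\n" for @report;
```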
1599=head2 What happens if I add or remove keys from a hash while iterating over it?
1600
d92eb7b0 1601Don't do that. :-)
1602
1603[lwall] In Perl 4, you were not allowed to modify a hash at all while
87275199 1604iterating over it. In Perl 5 you can delete from it, but you still
d92eb7b0 1605can't add to it, because that might cause a doubling of the hash table,
1606in which half the entries get copied up to the new top half of the
87275199 1607table, at which point you've totally bamboozled the iterator code.
d92eb7b0 1608Even if the table doesn't double, there's no telling whether your new
1609entry will be inserted before or after the current iterator position.
1610
a6dd486b 1611Either treasure up your changes and make them after the iterator finishes
d92eb7b0 1612or use keys to fetch all the old keys at once, and iterate over the list
1613of keys.
68dc0745 1614
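The second suggestion can be sketched as follows. The C<%stock> hash and the C<_reserve> naming are made up for the illustration.

```perl
use strict;
use warnings;

my %stock = (apples => 3, pears => 0, plums => 12, figs => 0);

# keys() returns a complete list before the loop body runs, so the loop
# walks that list rather than the hash's own iterator.
for my $item (keys %stock) {
    delete $stock{$item} if $stock{$item} == 0;              # drop sold-out items
    $stock{"${item}_reserve"} = 1 if $item eq 'plums';       # adding is safe here
}

print join(', ', sort keys %stock), "\n";
```

The new key is not visited by this loop, because it was not in the list when the loop started.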
1615=head2 How do I look up a hash element by value?
1616
1617Create a reverse hash:
1618
1619 %by_value = reverse %by_key;
1620 $key = $by_value{$value};
1621
1622That's not particularly efficient. It would be more space-efficient
1623to use:
1624
1625 while (($key, $value) = each %by_key) {
1626 $by_value{$value} = $key;
1627 }
1628
d92eb7b0 1629If your hash could have repeated values, the methods above will only find
1630one of the associated keys. This may or may not worry you. If it does
1631worry you, you can always reverse the hash into a hash of arrays instead:
1632
1633 while (($key, $value) = each %by_key) {
1634 push @{$key_list_by_value{$value}}, $key;
1635 }
68dc0745 1636
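Here is the hash-of-arrays inversion run on a small hash with a repeated value. The fruit data is invented for the example.

```perl
use strict;
use warnings;

my %by_key = (apple => 'red', cherry => 'red', banana => 'yellow');

my %key_list_by_value;
while (my ($key, $value) = each %by_key) {
    push @{ $key_list_by_value{$value} }, $key;
}

# each() returns pairs in no particular order, so sort before printing
for my $value (sort keys %key_list_by_value) {
    print "$value: @{[ sort @{ $key_list_by_value{$value} } ]}\n";
}
```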
1637=head2 How can I know how many entries are in a hash?
1638
1639If you mean how many keys, then all you have to do is
875e5c2f 1640use the keys() function in a scalar context:
68dc0745 1641
875e5c2f 1642 $num_keys = keys %hash;
68dc0745 1643
875e5c2f 1644The keys() function also resets the iterator, which means that you may
1645see strange results if you use this between uses of other hash operators
1646such as each().
68dc0745 1647
1648=head2 How do I sort a hash (optionally by value instead of key)?
1649
1650Internally, hashes are stored in a way that prevents you from imposing
1651an order on key-value pairs. Instead, you have to sort a list of the
1652keys or values:
1653
1654 @keys = sort keys %hash; # sorted by key
1655 @keys = sort {
1656 $hash{$a} cmp $hash{$b}
1657 } keys %hash; # and by value
1658
Here we'll do a reverse numeric sort by value, and if two values are
equal, sort by length of key, or if that fails, by straight ASCII
comparison of the keys (well, possibly modified by your locale--see
L<perllocale>).
1663
1664 @keys = sort {
1665 $hash{$b} <=> $hash{$a}
1666 ||
1667 length($b) <=> length($a)
1668 ||
1669 $a cmp $b
1670 } keys %hash;
1671
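To see all three levels of that comparison at work, try it on a hash with some deliberate ties. The data below is made up so that each tie-breaker is needed at least once.

```perl
use strict;
use warnings;

my %hash = (pear => 5, apple => 3, kiwi => 3, lime => 3, fig => 3, plum => 1);

my @keys = sort {
    $hash{$b} <=> $hash{$a}          # larger values first
        ||
    length($b) <=> length($a)        # then longer keys first
        ||
    $a cmp $b                        # then plain string order
} keys %hash;

print "@keys\n";
```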
1672=head2 How can I always keep my hash sorted?
1673
1674You can look into using the DB_File module and tie() using the
1675$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
5a964f20 1676The Tie::IxHash module from CPAN might also be instructive.
68dc0745 1677
1678=head2 What's the difference between "delete" and "undef" with hashes?
1679
Hashes are pairs of scalars: the first is the key, the second is the
value. The key will be coerced to a string, although the value can be
any kind of scalar: string, number, or reference. If a key C<$key> is
present in the hash C<%array>, C<exists($array{$key})> will return true.
The value for a given key can be C<undef>, in which case C<$array{$key}>
will be C<undef> while C<exists $array{$key}> will return true. This
corresponds to (C<$key>, C<undef>) being in the hash.
1687
1688Pictures help... here's the C<%ary> table:
1689
1690 keys values
1691 +------+------+
1692 | a | 3 |
1693 | x | 7 |
1694 | d | 0 |
1695 | e | 2 |
1696 +------+------+
1697
1698And these conditions hold
1699
1700 $ary{'a'} is true
1701 $ary{'d'} is false
1702 defined $ary{'d'} is true
1703 defined $ary{'a'} is true
87275199 1704 exists $ary{'a'} is true (Perl5 only)
68dc0745 1705 grep ($_ eq 'a', keys %ary) is true
1706
1707If you now say
1708
1709 undef $ary{'a'}
1710
1711your table now reads:
1712
1713
1714 keys values
1715 +------+------+
1716 | a | undef|
1717 | x | 7 |
1718 | d | 0 |
1719 | e | 2 |
1720 +------+------+
1721
1722and these conditions now hold; changes in caps:
1723
1724 $ary{'a'} is FALSE
1725 $ary{'d'} is false
1726 defined $ary{'d'} is true
1727 defined $ary{'a'} is FALSE
87275199 1728 exists $ary{'a'} is true (Perl5 only)
68dc0745 1729 grep ($_ eq 'a', keys %ary) is true
1730
1731Notice the last two: the value is undef, but the key still exists!
1732
1733Now, consider this:
1734
1735 delete $ary{'a'}
1736
1737your table now reads:
1738
1739 keys values
1740 +------+------+
1741 | x | 7 |
1742 | d | 0 |
1743 | e | 2 |
1744 +------+------+
1745
1746and these conditions now hold; changes in caps:
1747
1748 $ary{'a'} is false
1749 $ary{'d'} is false
1750 defined $ary{'d'} is true
1751 defined $ary{'a'} is false
87275199 1752 exists $ary{'a'} is FALSE (Perl5 only)
68dc0745 1753 grep ($_ eq 'a', keys %ary) is FALSE
1754
1755See, the whole entry is gone!
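
The tables above can be checked with a short script. This is only a sketch: the C<status> helper is a hypothetical name introduced here, and it reports truth, definedness, and existence for one key of C<%ary>.

```perl
use strict;
use warnings;

my %ary = (a => 3, x => 7, d => 0, e => 2);

# Hypothetical helper: summarise the three tests for one key.
sub status {
    my ($key) = @_;
    return join ' ',
        'true:'    . ($ary{$key}         ? 1 : 0),
        'defined:' . (defined $ary{$key} ? 1 : 0),
        'exists:'  . (exists  $ary{$key} ? 1 : 0);
}

my $before = status('a');

undef $ary{'a'};                 # value becomes undef, key stays
my $after_undef = status('a');

delete $ary{'a'};                # the whole entry is removed
my $after_delete = status('a');

print "before:       $before\n";
print "after undef:  $after_undef\n";
print "after delete: $after_delete\n";
```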
1756
1757=head2 Why don't my tied hashes make the defined/exists distinction?
1758
1759That depends on how the tied class implements EXISTS() and FETCH().
1760(There is no DEFINED() method; defined() only tests what FETCH()
1761returns.) For example, hashes tied to DBM* files have no concept of
1762undef, so the true/false tables above give different results on such
1763a hash. It also means that exists and defined do the same thing with
1764a DBM* file, and that neither behaves as it does with an ordinary hash.
1765
1766=head2 How do I reset an each() operation part-way through?
1767
5a964f20 1768Using C<keys %hash> in scalar context returns the number of keys in
68dc0745 1769the hash I<and> resets the iterator associated with the hash. You may
1770need to do this if you use C<last> to exit a loop early so that when you
46fc3d4c 1771re-enter it, the hash iterator has been reset.
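
A minimal sketch of the reset, using a made-up three-element hash. It relies on the fact that an unmodified hash is traversed in the same order each time within one run of the program.

```perl
use strict;
use warnings;

my %hash = (one => 1, two => 2, three => 3);   # hypothetical data

my $first  = each %hash;    # scalar context: each returns just the key
my $second = each %hash;    # the iterator has moved on to another key

my $count = keys %hash;     # counts the keys and rewinds the iterator

my $again = each %hash;     # iteration starts from the beginning again
print "restarted at the first key\n" if $again eq $first;
```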
68dc0745 1772
1773=head2 How can I get the unique keys from two hashes?
1774
d92eb7b0 1775First you extract the keys from the hashes into lists, then solve
1776the "removing duplicates" problem described above. For example:
68dc0745 1777
1778 %seen = ();
1779 for $element (keys(%foo), keys(%bar)) {
1780 $seen{$element}++;
1781 }
1782 @uniq = keys %seen;
1783
1784Or more succinctly:
1785
1786 @uniq = keys %{{%foo,%bar}};
1787
1788Or if you really want to save space:
1789
1790 %seen = ();
1791 while (defined ($key = each %foo)) {
1792 $seen{$key}++;
1793 }
1794 while (defined ($key = each %bar)) {
1795 $seen{$key}++;
1796 }
1797 @uniq = keys %seen;
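
Another common way to write the same thing, shown here as a sketch rather than as part of the original answer, is to assign to a hash slice. The hashes C<%foo> and C<%bar> below hold made-up data.

```perl
use strict;
use warnings;

my %foo = (a => 1, b => 2);     # hypothetical data
my %bar = (b => 3, c => 4);

my %seen;
@seen{ keys %foo, keys %bar } = ();   # hash slice: each key is stored once

my @uniq = sort keys %seen;           # sorted only to make the output stable
print "@uniq\n";
```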
1798
1799=head2 How can I store a multidimensional array in a DBM file?
1800
1801Either stringify the structure yourself (no fun), or else
1802get the MLDBM (which uses Data::Dumper) module from CPAN and layer
1803it on top of either DB_File or GDBM_File.
1804
1805=head2 How can I make my hash remember the order I put elements into it?
1806
1807Use the Tie::IxHash module from CPAN.
1808
46fc3d4c 1809 use Tie::IxHash;
1810 tie(%myhash, 'Tie::IxHash');
1811 for ($i=0; $i<20; $i++) {
1812 $myhash{$i} = 2*$i;
1813 }
1814 @keys = keys %myhash;
1815 # @keys = (0,1,2,3,...)
1816
68dc0745 1817=head2 Why does passing a subroutine an undefined element in a hash create it?
1818
1819If you say something like:
1820
1821 somefunc($hash{"nonesuch key here"});
1822
1823Then that element "autovivifies"; that is, it springs into existence
1824whether you store something there or not. That's because functions
1825get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
1826it has to be ready to write it back into the caller's version.
1827
87275199 1828This has been fixed as of Perl5.004.
68dc0745 1829
1830Normally, merely accessing a key's value for a nonexistent key does
1831I<not> cause that key to be forever there. This is different than
1832awk's behavior.
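
The following sketch, which assumes Perl 5.004 or later, shows the difference between a function that only reads C<$_[0]> and one that assigns to it. The names C<reader> and C<writer> are hypothetical.

```perl
use strict;
use warnings;

my %hash;

# Hypothetical functions: one only looks at its argument,
# the other assigns through the alias in @_.
sub reader { return defined $_[0] ? 1 : 0 }
sub writer { $_[0] = 42; return }

reader($hash{"nonesuch key here"});
my $after_read = exists $hash{"nonesuch key here"} ? 1 : 0;

writer($hash{"created"});
my $after_write = exists $hash{"created"} ? 1 : 0;

print "key exists after read-only call: $after_read\n";
print "key exists after assignment:     $after_write\n";
```

Only the assignment inside C<writer> brings the element into existence; merely passing or inspecting it does not.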
1833
fc36a67e 1834=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 1835
65acb1b1 1836Usually a hash ref, perhaps like this:
1837
1838 $record = {
1839 NAME => "Jason",
1840 EMPNO => 132,
1841 TITLE => "deputy peon",
1842 AGE => 23,
1843 SALARY => 37_000,
1844 PALS => [ "Norbert", "Rhys", "Phineas"],
1845 };
1846
1847References are documented in L<perlref> and L<perlreftut>.
1848Examples of complex data structures are given in L<perldsc> and
1849L<perllol>. Examples of structures and object-oriented classes are
1850in L<perltoot>.
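
As a short sketch of how such a record might be used, the code below reads and updates fields and then files the record in a hash of records keyed by employee number. The C<%by_empno> hash and the extra pal are made up for illustration.

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

# Fields are reached through the reference with ->
print "$record->{NAME} is a $record->{TITLE}\n";
print "First pal: $record->{PALS}[0]\n";

$record->{AGE}++;                       # update a scalar field
push @{ $record->{PALS} }, "Zelda";     # hypothetical extra pal

# A hash of records (hash of hashes), keyed by employee number.
my %by_empno = ( $record->{EMPNO} => $record );
print "Employee 132 is $by_empno{132}{NAME}\n";
```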
68dc0745 1851
1852=head2 How can I use a reference as a hash key?
1853
fe854a6f 1854You can't do this directly, but you could use the standard Tie::RefHash
87275199 1855module distributed with Perl.
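
A minimal sketch of the difference: in an ordinary hash the reference is flattened to a string, while a hash tied to Tie::RefHash hands back a usable reference from keys(). The variable names are hypothetical.

```perl
use strict;
use warnings;
use Tie::RefHash;

my $aref = [ 1, 2, 3 ];

# Ordinary hash: the key becomes a plain string such as "ARRAY(0x...)".
my %plain = ( $aref => 'list' );
my ($plain_key) = keys %plain;

# Tied hash: the key is still a real reference.
tie my %refhash, 'Tie::RefHash';
$refhash{$aref} = 'list';
my ($ref_key) = keys %refhash;

print "plain key is a ", (ref($plain_key) || 'string'), "\n";
print "tied key is a ",  (ref($ref_key)   || 'string'), "\n";
print "last element via tied key: $ref_key->[2]\n";
```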
68dc0745 1856
1857=head1 Data: Misc
1858
1859=head2 How do I handle binary data correctly?
1860
1861Perl is binary clean, so this shouldn't be a problem. For example,
1862this works fine (assuming the files are found):
1863
1864 if (`cat /vmunix` =~ /gzip/) {
1865 print "Your kernel is GNU-zip enabled!\n";
1866 }
1867
d92eb7b0 1868On less elegant (read: Byzantine) systems, however, you have
1869to play tedious games with "text" versus "binary" files. See
1870L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking
1871systems are curses out of Microsoft, who seem to be committed to putting
1872the backward into backward compatibility.
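
Calling binmode() is harmless on systems that don't need it, so portable code can simply always use it. The sketch below writes a few awkward bytes to a temporary file and reads them back; the choice of bytes and the use of File::Temp are assumptions made for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Bytes that text-mode translation tends to mangle: NUL, LF, CR, ^Z, 0xFF.
my $bytes = pack("C*", 0, 10, 13, 26, 255);

my ($fh, $file) = tempfile(UNLINK => 1);
binmode($fh);                            # no newline translation on output
print {$fh} $bytes;
close($fh) or die "close failed: $!";

open(my $in, '<', $file) or die "open failed: $!";
binmode($in);                            # and none on input
my $copy = do { local $/; <$in> };       # slurp the whole file
close($in);

print $copy eq $bytes ? "round trip ok\n" : "data was altered\n";
```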
68dc0745 1873
1874If you're concerned about 8-bit textual data, then see L<perllocale>.
1875
54310121 1876If you want to deal with multibyte characters, however, there are
68dc0745 1877some gotchas. See the section on Regular Expressions.
1878
1879=head2 How do I determine whether a scalar is a number/whole/integer/float?
1880
1881Assuming that you don't care about IEEE notations like "NaN" or
1882"Infinity", you probably just want to use a regular expression.
1883
65acb1b1 1884 if (/\D/) { print "has nondigits\n" }
1885 if (/^\d+$/) { print "is a whole number\n" }
1886 if (/^-?\d+$/) { print "is an integer\n" }
1887 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
1888 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
1889 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
1890 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
1891 { print "is a C float\n" }
68dc0745 1892
5a964f20 1893If you're on a POSIX system, Perl supports the C<POSIX::strtod>
1894function. Its semantics are somewhat cumbersome, so here's a C<getnum>
1895wrapper function for more convenient access. This function takes
1896a string and returns the number it found, or C<undef> for input that
1897isn't a C float. The C<is_numeric> function is a front end to C<getnum>
1898if you just want to say, ``Is this a float?''
1899
1900 sub getnum {
1901 use POSIX qw(strtod);
1902 my $str = shift;
1903 $str =~ s/^\s+//;
1904 $str =~ s/\s+$//;
1905 $! = 0;
1906 my($num, $unparsed) = strtod($str);
1907 if (($str eq '') || ($unparsed != 0) || $!) {
1908 return undef;
1909 } else {
1910 return $num;
1911 }
1912 }
1913
072dc14b 1914 sub is_numeric { defined getnum($_[0]) }
5a964f20 1915
6cecdcac 1916Or you could check out the String::Scanf module on CPAN instead. The
1917POSIX module (part of the standard Perl distribution) provides the
bf4acbe4 1918C<strtod> and C<strtol> functions for converting strings to doubles and longs,
6cecdcac 1919respectively.
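
In list context both functions return the converted number followed by the count of characters they could not parse, which is what C<getnum> above relies on. A small sketch with made-up input strings:

```perl
use strict;
use warnings;
use POSIX qw(strtod strtol);

# Each call returns (number, count of unparsed trailing characters).
my ($int, $int_left) = strtol("42abc", 10);   # stops at "abc"
my ($hex, $hex_left) = strtol("ff", 16);      # base 16, fully parsed
my ($dbl, $dbl_left) = strtod("1e3");         # exponent notation

print "strtol: $int with $int_left characters left over\n";
print "strtol: $hex with $hex_left characters left over\n";
print "strtod: $dbl with $dbl_left characters left over\n";
```

A nonzero second value means the string contained something other than a number.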
68dc0745 1920
1921=head2 How do I keep persistent data across program calls?
1922
1923For some specific applications, you can use one of the DBM modules.
fe854a6f 1924See L<AnyDBM_File>. More generically, you should consult the FreezeThaw
1925or Storable modules from CPAN. Starting from Perl 5.8 Storable is part
1926of the standard distribution. Here's one example using Storable's C<store>
1927and C<retrieve> functions:
65acb1b1 1928
1929 use Storable;
1930 store(\%hash, "filename");
1931
1932 # later on...
1933 $href = retrieve("filename"); # by ref
1934 %hash = %{ retrieve("filename") }; # direct to hash
68dc0745 1935
1936=head2 How do I print out or copy a recursive data structure?
1937
65acb1b1 1938The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
1939for printing out data structures. The Storable module, found on CPAN,
1940provides a function called C<dclone> that recursively copies its argument.
1941
1942 use Storable qw(dclone);
1943 $r2 = dclone($r1);
68dc0745 1944
65acb1b1 1945Here C<$r1> can be a reference to any kind of data structure you'd like.
1946It will be deeply copied. Because C<dclone> takes and returns references,
1947you'd have to add extra punctuation if you had a hash of arrays that
1948you wanted to copy.
68dc0745 1949
65acb1b1 1950 %newhash = %{ dclone(\%oldhash) };
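
To see why the deep copy matters, compare it with plain hash assignment, which copies only the top level. This is a sketch with hypothetical data.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = ( fruits => [ "apple", "pear" ] );   # hypothetical data

my %shallow = %oldhash;                   # copies the array reference only
my %deep    = %{ dclone(\%oldhash) };     # copies the array itself too

push @{ $oldhash{fruits} }, "plum";       # change the original afterwards

print "shallow copy sees ", scalar @{ $shallow{fruits} }, " fruits\n";
print "deep copy sees ",    scalar @{ $deep{fruits} },    " fruits\n";
```

The shallow copy shares the inner array with the original and so picks up the change; the copy made with C<dclone> does not.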
68dc0745 1951
1952=head2 How do I define methods for every class/object?
1953
1954Use the UNIVERSAL class (see L<UNIVERSAL>).
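
Because every class implicitly inherits from UNIVERSAL, a subroutine defined in that package can be called as a method on any object or class name. A minimal sketch follows; the C<class_name> method and the C<Counter> class are hypothetical names invented for this example.

```perl
use strict;
use warnings;

# Hypothetical method, available to every class through UNIVERSAL.
sub UNIVERSAL::class_name {
    my ($self) = @_;
    return ref($self) || $self;     # works for objects and class names
}

{
    package Counter;                # a hypothetical class with no parents
    sub new {
        my ($class) = @_;
        return bless { count => 0 }, $class;
    }
}

my $obj = Counter->new;
print "object method: ", $obj->class_name, "\n";
print "class method:  ", Counter->class_name, "\n";
```

Methods added this way are visible everywhere, so it is usually wise to keep them few and to choose names unlikely to clash.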
1955
1956=head2 How do I verify a credit card checksum?
1957
1958Get the Business::CreditCard module from CPAN.
1959
65acb1b1 1960=head2 How do I pack arrays of doubles or floats for XS code?
1961
1962The kgbpack.c code in the PGPLOT module on CPAN does just this.
1963If you're doing a lot of float or double processing, consider using
1964the PDL module from CPAN instead--it makes number-crunching easy.
1965
68dc0745 1966=head1 AUTHOR AND COPYRIGHT
1967
65acb1b1 1968Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
5a964f20 1969All rights reserved.
1970
5a7beb56 1971This documentation is free; you can redistribute it and/or modify it
1972under the same terms as Perl itself.
5a964f20 1973
1974Irrespective of its distribution, all code examples in this file
1975are hereby placed into the public domain. You are permitted and
1976encouraged to use this code in your own programs for fun
1977or for profit as you see fit. A simple comment in the code giving
1978credit would be courteous but is not required.