=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 1.49 $, $Date: 1999/05/23 20:37:49 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to the manipulation
of data as numbers, dates, strings, arrays, hashes, and miscellaneous
data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

The infinite set that a mathematician thinks of as the real numbers can
only be approximated on a computer, since the computer only has a finite
number of bits to store an infinite number of, um, numbers.

Internally, your computer represents floating-point numbers in binary.
Floating-point numbers read in from a file or appearing as literals
in your program are converted from their decimal floating-point
representation (eg, 19.95) to the internal binary representation.

However, 19.95 can't be precisely represented as a binary
floating-point number, just like 1/3 can't be exactly represented as a
decimal floating-point number. The computer's binary representation
of 19.95, therefore, isn't exactly 19.95.

When a floating-point number gets printed, the binary floating-point
representation is converted back to decimal. These decimal numbers
are displayed in either the format you specify with printf(), or the
current output format for numbers if you use print (see
L<perlvar/"$#">). C<$#> has a different default value in Perl5 than it
did in Perl4. Changing C<$#> yourself is deprecated.

This affects B<all> computer languages that represent decimal
floating-point numbers in binary, not just Perl. Perl provides
arbitrary-precision decimal numbers with the Math::BigFloat module
(part of the standard Perl distribution), but mathematical operations
are consequently slower.

To get rid of the superfluous digits, just use a format (eg,
C<printf("%.2f", 19.95)>) to get the required precision.
See L<perlop/"Floating-point Arithmetic">.
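
To see the effect for yourself, here is a minimal sketch that formats the same stored value at two precisions. The exact digits of the first result assume typical IEEE 754 double-precision floats; the variable names are only illustrative.

```perl
    use strict;
    use warnings;

    my $price = 19.95;

    # Asking for 17 significant digits exposes the binary approximation
    # (typically 19.949999999999999 with IEEE 754 doubles).
    my $raw = sprintf("%.17g", $price);

    # Asking only for the precision you need rounds it back to 19.95.
    my $shown = sprintf("%.2f", $price);

    print "raw:   $raw\n";
    print "shown: $shown\n";
```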

=head2 Why isn't my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur
as literals in your program. If they are read in from somewhere and
assigned, no automatic conversion takes place. You must explicitly
use oct() or hex() if you want the values converted. oct() interprets
both hex ("0x350") numbers and octal ones ("0350" or even without the
leading "0", like "377"), while hex() only converts hexadecimal ones,
with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".

This problem shows up most often when people try using chmod(), mkdir(),
umask(), or sysopen(), which all want permissions in octal.

    chmod(644, $file);  # WRONG -- perl -w catches this
    chmod(0644, $file); # right
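
A short sketch of the explicit conversion, assuming the permission and hex strings arrived as text (for example from a configuration file); the sample values are made up for illustration:

```perl
    use strict;
    use warnings;

    # Strings as they might arrive from a file or the command line.
    my $perm_text = "644";
    my $hex_text  = "ff";

    my $perm = oct($perm_text);   # 420 decimal, the same as the literal 0644
    my $byte = hex($hex_text);    # 255

    printf "%s -> %d decimal (%o octal)\n", $perm_text, $perm, $perm;
    printf "%s -> %d decimal\n", $hex_text, $byte;

    # chmod($perm, $file) would now receive the mode you meant.
```

The same conversion is what you need before handing a user-supplied mode to chmod() or mkdir().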

=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that int() merely truncates toward 0. For rounding to a
certain number of digits, sprintf() or printf() is usually the easiest
route.

    printf("%.3f", 3.1415926535);  # prints 3.142

The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
functions.

    use POSIX;
    $ceil  = ceil(3.5);   # 4
    $floor = floor(3.5);  # 3

In 5.000 to 5.003 Perls, trigonometry was done in the Math::Complex
module. With 5.004, the Math::Trig module (part of the standard perl
distribution) implements the trigonometric functions. Internally it
uses the Math::Complex module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do this.
Perl numbers whose absolute values are integers under 2**31 (on 32 bit
machines) will work pretty much like mathematical integers. Other numbers
are not guaranteed.
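
If you do decide to write your own rounding function, the following is one possible sketch, not a standard Perl function: C<round_half_away()> is a hypothetical name, and it implements round-half-away-from-zero. Because it still works on binary doubles, it inherits the representation issues described above.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: round half away from zero to $places digits.
    # It still operates on binary doubles, so an input such as 1.005
    # (stored as slightly less than 1.005) can round down; for money,
    # consider integer cents or Math::BigFloat instead.
    sub round_half_away {
        my ($x, $places) = @_;
        my $factor  = 10 ** ($places || 0);
        my $scaled  = $x * $factor;
        my $rounded = $scaled < 0 ? -int(-$scaled + 0.5) : int($scaled + 0.5);
        return $rounded / $factor;
    }

    print round_half_away(2.5), "\n";        # 3
    print round_half_away(-2.5), "\n";       # -3
    print round_half_away(0.125, 2), "\n";   # 0.13
```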

=head2 How do I convert bits into ints?

To turn a string of 1s and 0s like C<10110110> into a scalar containing
its binary value, use the pack() and unpack() functions (documented in
L<perlfunc/"pack"> and L<perlfunc/"unpack">):

    $decimal = unpack('C', pack('B8', '10110110'));

This packs the string C<10110110> into an eight-bit binary structure.
This is then unpacked as an unsigned character, which returns its
ordinal value (182).

This does the same thing:

    $decimal = ord(pack('B8', '10110110'));

Here's an example of going the other way:

    $binary_string = unpack('B*', "\x29");
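
For bit strings longer than eight bits, one approach is to left-pad to 32 bits and use the big-endian C<N> template. This is a sketch assuming at most 32 bits; C<bin2dec()> and C<dec2bin()> are hypothetical helper names:

```perl
    use strict;
    use warnings;

    # Left-pad to 32 binary digits, pack them as bits (high bit first),
    # then read the four bytes back as a big-endian unsigned integer.
    sub bin2dec {
        my ($bits) = @_;
        return unpack("N", pack("B32", substr("0" x 32 . $bits, -32)));
    }

    # The reverse: pack as 32 bits, unpack the bit string, and strip
    # leading zeros while keeping at least one digit.
    sub dec2bin {
        my ($n) = @_;
        my $bits = unpack("B32", pack("N", $n));
        $bits =~ s/^0+(?=\d)//;
        return $bits;
    }

    print bin2dec("10110110"), "\n";   # 182
    print dec2bin(182), "\n";          # 10110110
```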

=head2 Why doesn't & work the way I want it to?

The behavior of the bitwise operators depends on whether they're
used on numbers or strings. The operators treat a string as a series
of bits and work with that (the string C<"3"> is the bit pattern
C<00110011>). The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:

    if ( ("\020\020" & "\101\101") =~ /[^\000]/) {
        # true only if some bit is set in both operands
    }
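
The following sketch shows both behaviors side by side, plus the usual fixes: adding 0 to force a numeric value, and testing a string result for non-null bytes. The variable names are illustrative only.

```perl
    use strict;
    use warnings;

    my $num_and = 11 & 3;       # numbers: 1011 & 0011 is 0011, i.e. 3
    my $str_and = "11" & "3";   # strings: "1" & "3" byte by byte gives "1"

    # A value read from input is a string; adding 0 makes it a number
    # so the operator works on its binary value instead of its bytes.
    my $input  = "11";
    my $forced = ($input + 0) & 3;   # 3

    # "\020\020" & "\101\101" is two NUL bytes, which Perl treats as
    # true; test for any non-NUL byte to get the answer you expected.
    my $mask     = "\020\020" & "\101\101";
    my $any_bits = ($mask =~ /[^\000]/) ? 1 : 0;   # 0

    print "$num_and $str_and $forced $any_bits\n";
```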

=head2 How do I multiply matrices?

Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
or the PDL extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

=head2 How can I output Roman numerals?

Get the Roman module from http://www.perl.com/CPAN/modules/by-module/Roman .

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.
5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once--you make your numbers less random, rather
than more.
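
Reseeding doesn't add randomness because C<srand> with a given seed always restarts the same sequence. This sketch (the seed 42 is arbitrary) shows that; it is handy for reproducible tests, but it is not how you would seed a production program:

```perl
    use strict;
    use warnings;

    # Seeding twice with the same (arbitrary) value replays the sequence.
    srand(42);
    my @first = map { rand() } 1 .. 3;

    srand(42);
    my @second = map { rand() } 1 .. 3;

    print "same sequence\n" if "@first" eq "@second";
```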

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-).
http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy of Tom
Phoenix, talks more about this. John von Neumann said, ``Anyone who
attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''

If you want numbers that are more random than what C<rand> with C<srand>
provides, you should also check out the Math::TrulyRandom module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
``Numerical Recipes in C'' at http://www.nr.com/ .

=head1 Data: Dates

=head2 How do I find the week-of-the-year/day-of-the-year?

The day of the year is in the list returned by localtime(), counting
from 0 for January 1st (see L<perlfunc/"localtime">):

    $day_of_year = (localtime(time()))[7];

or more legibly (in 5.004 or higher):

    use Time::localtime;
    $day_of_year = localtime(time())->yday;

You can find the week of the year by dividing this by 7:

    $week_of_year = int($day_of_year / 7);

Of course, this assumes that weeks start at zero. The Date::Calc
module from CPAN has a lot of date calculation functions, including
day of the year, week of the year, and so on. Note that not
all businesses consider ``week 1'' to be the same; for example,
American businesses often consider the first week with a Monday
in it to be Work Week #1, despite ISO 8601, which considers
WW1 to be the first week with a Thursday in it.
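
If you want a week number with a defined starting day rather than a simple division, the POSIX module's strftime() can format one. This is a sketch with fixed example dates: C<%j> is the 1-based day of the year, C<%U> counts weeks starting on Sunday, and C<%W> counts weeks starting on Monday. Some C libraries also offer C<%V> for ISO 8601 weeks, but check your system's strftime(3) before relying on it.

```perl
    use strict;
    use warnings;
    use POSIX qw(strftime);

    # POSIX::strftime() normalizes its arguments first, so it fills in
    # the weekday and day-of-year fields from the calendar date.
    # Arguments: sec, min, hour, mday, mon (0-based), year - 1900.
    my @dec31 = (0, 0, 12, 31, 11, 100);   # 31 Dec 2000, a leap year
    my @jan2  = (0, 0, 12,  2,  0, 100);   # 2 Jan 2000, a Sunday

    my $yday_dec31 = strftime("%j", @dec31);   # "366"
    my $sun_week   = strftime("%U", @jan2);    # "01": weeks start Sunday
    my $mon_week   = strftime("%W", @jan2);    # "00": weeks start Monday

    print "$yday_dec31 $sun_week $mon_week\n";
```

Note that C<%j> is 1-based, so it equals C<(localtime)[7] + 1> for the same date.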

=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }
    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, you'll find that the POSIX module's strftime() function
has been extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such systems,
this is only the first two digits of the four-digit year, and thus cannot
be used to reliably determine the current century or millennium.
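
The arithmetic in get_century() and get_millennium() is easier to see with a plain four-digit year. This sketch uses hypothetical helpers C<century_of()> and C<millennium_of()> that apply the same rounding (centuries and millennia start at year 1, so 2000 is the last year of the 20th century):

```perl
    use strict;
    use warnings;

    # Centuries run from year 1 to year 100, so round up by adding 99
    # before dividing; get_century() adds 1900 + 99 = 1999 to the
    # years-since-1900 value from localtime for the same effect.
    sub century_of {
        my ($year) = @_;
        return int(($year + 99) / 100);
    }

    # Millennia run from year 1 to year 1000.
    sub millennium_of {
        my ($year) = @_;
        return int(($year + 999) / 1000);
    }

    printf "%d: century %d, millennium %d\n",
        $_, century_of($_), millennium_of($_) for 1999, 2000, 2001;
```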
254
92c2ed05 255=head2 How can I compare two dates and find the difference?
68dc0745 256
92c2ed05 257If you're storing your dates as epoch seconds then simply subtract one
258from the other. If you've got a structured date (distinct year, day,
d92eb7b0 259month, hour, minute, seconds values), then for reasons of accessibility,
260simplicity, and efficiency, merely use either timelocal or timegm (from
261the Time::Local module in the standard distribution) to reduce structured
262dates to epoch seconds. However, if you don't know the precise format of
263your dates, then you should probably use either of the Date::Manip and
264Date::Calc modules from CPAN before you go hacking up your own parsing
265routine to handle arbitrary date formats.
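
For example, here is a sketch that counts the days between two dates with timegm(). The dates are arbitrary, and using UTC (timegm rather than timelocal) is an assumption made so that a daylight-saving change can't leave you an hour short:

```perl
    use strict;
    use warnings;
    use Time::Local qw(timegm);

    # timegm() takes (sec, min, hour, mday, mon, year) with a 0-based
    # month, just like the list gmtime() returns.
    my $start = timegm(0, 0, 0, 1, 0, 2000);   # 1 Jan 2000
    my $end   = timegm(0, 0, 0, 1, 2, 2000);   # 1 Mar 2000

    my $seconds = $end - $start;
    my $days    = $seconds / (24 * 60 * 60);

    print "$days days apart\n";   # 60 (2000 is a leap year)
```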

=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
Time::Local module. Otherwise, you should look into the Date::Calc
and Date::Manip modules from CPAN.
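
A sketch of the split-and-convert approach for one fixed layout. The C<YYYY-MM-DD HH:MM:SS> format, the assumption that the times are UTC, and the C<iso_to_epoch()> name are all hypothetical; use timelocal instead of timegm for local times:

```perl
    use strict;
    use warnings;
    use Time::Local qw(timegm);

    # Hypothetical fixed format "YYYY-MM-DD HH:MM:SS", assumed to be UTC.
    sub iso_to_epoch {
        my ($text) = @_;
        my ($year, $mon, $mday, $hour, $min, $sec) =
            $text =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
            or die "unrecognized date: $text\n";
        # timegm() wants a 0-based month.
        return timegm($sec, $min, $hour, $mday, $mon - 1, $year);
    }

    print iso_to_epoch("2000-03-01 12:30:00"), "\n";
```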

=head2 How can I find the Julian Day?

You could use Date::Calc's Delta_Days function and calculate the number
of days from there. Assuming that's what you really want, that is.

Before you immerse yourself too deeply in this, be sure to verify that it
is the I<Julian> Day you really want. Are you really just interested in
a way of getting serial days so that you can do date arithmetic? If you
are interested in performing date arithmetic, this can be done using
either Date::Manip or Date::Calc, without converting to Julian Day first.

There is too much confusion on this issue to cover in this FAQ, but the
term is applied (correctly) to a calendar now supplanted by the Gregorian
Calendar, with the Julian Calendar failing to adjust properly for leap
years on centennial years (among other annoyances). The term is also used
(incorrectly) to mean: [1] days in the Gregorian Calendar; and [2] days
since a particular starting time or `epoch', usually 1970 in the Unix
world and 1980 in the MS-DOS/Windows world. If you find that it is not
the first meaning that you really want, then check out the Date::Manip
and Date::Calc modules. (Thanks to David Cassell for most of this text.)

There is also an example of Julian date calculation that should help you in
http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz
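
If it turns out you really do want the astronomical Julian Day Number (a count of days from 1 January 4713 BC in the proleptic Julian calendar), a widely published integer formula, often attributed to Fliegel and Van Flandern, converts a Gregorian date directly. This is a sketch of that formula, not code from the modules above; C<gregorian_to_jdn()> is a hypothetical name, and the result labels the day that begins at noon UT on the given date:

```perl
    use strict;
    use warnings;

    # Julian Day Number for a Gregorian calendar date (month 1-12).
    sub gregorian_to_jdn {
        my ($year, $month, $day) = @_;
        my $shift = int((14 - $month) / 12);   # 1 for Jan and Feb, else 0
        my $y = $year + 4800 - $shift;         # years since -4800, March-based
        my $m = $month + 12 * $shift - 3;      # 0 = March ... 11 = February
        return $day
             + int((153 * $m + 2) / 5)         # days before month $m
             + 365 * $y
             + int($y / 4) - int($y / 100) + int($y / 400)   # leap days
             - 32045;
    }

    print gregorian_to_jdn(2000, 1, 1), "\n";   # 2451545
```

Subtracting two such numbers gives the days between the dates, which is often what people asking this question actually need.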

=head2 How do I find yesterday's date?

The C<time()> function returns the current time in seconds since the
epoch. Take twenty-four hours off that:

    $yesterday = time() - ( 24 * 60 * 60 );

Then you can pass this to C<localtime()> and get the individual year,
month, day, hour, minute, seconds values.

Note very carefully that the code above assumes that your days are
twenty-four hours each. For most people, there are two days a year
when they aren't: the switch to and from summer time throws this off.
A solution to this issue is offered by Russ Allbery.

    sub yesterday {
        my $now  = defined $_[0] ? $_[0] : time;
        my $then = $now - 60 * 60 * 24;
        my $ndst = (localtime $now)[8] > 0;
        my $tdst = (localtime $then)[8] > 0;
        $then - ($tdst - $ndst) * 60 * 60;
    }
    # Should give you "this time yesterday" in seconds since epoch relative to
    # the first argument or the current time if no argument is given and
    # suitable for passing to localtime or whatever else you need to do with
    # it. $ndst is whether we're currently in daylight savings time; $tdst is
    # whether the point 24 hours ago was in daylight savings time. If $tdst
    # and $ndst are the same, a boundary wasn't crossed, and the correction
    # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
    # from yesterday's time since we gained an extra hour while going off
    # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
    # negative hour (add an hour) to yesterday's time since we lost an hour.
    #
    # All of this is because during those days when one switches off or onto
    # DST, a "day" isn't 24 hours long; it's either 23 or 25.
    #
    # The explicit settings of $ndst and $tdst are necessary because localtime
    # only says it returns the system tm struct, and the system tm struct at
    # least on Solaris doesn't guarantee any particular positive value (like,
    # say, 1) for isdst, just a positive value. And that value can
    # potentially be negative, if DST information isn't available (this sub
    # just treats those cases like no DST).
    #
    # Note that between 2am and 3am on the day after the time zone switches
    # off daylight savings time, the exact hour of "yesterday" corresponding
    # to the current hour is not clearly defined. Note also that if used
    # between 2am and 3am the day after the change to daylight savings time,
    # the result will be between 3am and 4am of the previous day; it's
    # arguable whether this is correct.
    #
    # This sub does not attempt to deal with leap seconds (most things don't).
    #
    # Copyright relinquished 1999 by Russ Allbery <rra@stanford.edu>
    # This code is in the public domain

=head2 Does Perl have a year 2000 problem? Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
Y2K compliant (whatever that means). The programmers you've hired to
use it, however, probably are not.

Long answer: The question itself reveals a misunderstanding of the issue.
Perl is just as Y2K compliant as your pencil--no more, and no less.
Can you use your pencil to write a non-Y2K-compliant memo? Of course
you can. Is that the pencil's fault? Of course it isn't.

The date and time functions supplied with perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines). The year returned
by these functions when used in list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number. To avoid the year 2000 problem simply do not treat the year as
a 2-digit number. It isn't.

When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year. For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001". There's no year 2000 problem here.
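
A small sketch of both contexts, reusing the timestamp from the example above (computed with gmtime, so the result does not depend on your time zone):

```perl
    use strict;
    use warnings;

    my $when = 1005613200;   # the timestamp used in the text above

    # List context: element 5 is years since 1900 (101 here), so add
    # 1900 rather than gluing "19" or "20" onto the front.
    my $year = (gmtime($when))[5] + 1900;   # 2001

    # Scalar context: a string with the full four-digit year.
    my $stamp = gmtime($when);   # "Tue Nov 13 01:00:00 2001"

    print "$year\n$stamp\n";
```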

That doesn't mean that Perl can't be used to create non-Y2K compliant
programs. It can. But so can your pencil. It's the fault of the user,
not the language. At the risk of inflaming the NRA: ``Perl doesn't
break Y2K, people do.'' See http://language.perl.com/news/y2k.html for
a longer exposition.

=head1 Data: Strings

=head2 How do I validate input?

The answer to this question is usually a regular expression, perhaps
with auxiliary logic. See the more specific questions (numbers, mail
addresses, etc.) for details.

=head2 How do I unescape a string?

It depends on just what you mean by ``escape''. URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with:

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.
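
If you also want the common C-style escapes expanded, one sketch is to look them up in a hash inside a C</e> substitution. The C<unescape()> name and the choice of escapes in C<%escape> are assumptions; extend the table as needed:

```perl
    use strict;
    use warnings;

    # Escapes to translate; any other backslashed character is kept
    # literally, just as the one-line substitution above does.
    my %escape = (n => "\n", t => "\t", r => "\r");

    sub unescape {
        my ($s) = @_;
        $s =~ s/\\(.)/exists $escape{$1} ? $escape{$1} : $1/ges;
        return $s;
    }

    print unescape('tab:\there\n');
```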

=head2 How do I remove consecutive pairs of characters?

To turn C<"abbcccd"> into C<"abccd">:

    s/(.)\1/$1/g;      # add /s to include newlines

Here's a solution that turns "abbcccd" to "abcd":

    y///cs;   # y == tr, but shorter :-)

=head2 How do I expand function calls in a string?

This is documented in L<perlref>. In general, this is fraught with
quoting and readability problems, but it is possible. To interpolate
a subroutine call (in list context) into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for
arbitrary expressions:

    print "That yields ${\($n + 5)} widgets\n";

Version 5.004 of Perl had a bug that gave list context to the
expression in C<${...}>, but this is fixed in version 5.005.

See also ``How can I expand variables in text strings?'' in this
section of the FAQ.

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple-character delimiters, something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns, nor can they. For that you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There is
the CPAN module Parse::RecDescent, the standard module Text::Balanced,
the byacc program, the CPAN module Parse::Yapp, and Mark-Jason
Dominus's excellent I<py> tool at http://www.plover.com/~mjd/perl/py/ .
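
As a taste of the Text::Balanced approach, this sketch pulls the first balanced parenthesized group off the front of a string (the sample text is made up):

```perl
    use strict;
    use warnings;
    use Text::Balanced qw(extract_bracketed);

    my $text = "(a (b) c) and the rest";

    # In list context extract_bracketed() returns the balanced chunk,
    # the remaining text, and any skipped prefix.
    my ($chunk, $rest) = extract_bracketed($text, "()");

    print "chunk: $chunk\nrest: $rest\n";
```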

One simple, destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/);
    print join("\n",@$[0..$#$]) if( $$[-1] );

=head2 How do I reverse a string?

Use reverse() in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use Text::Wrap (part of the standard perl distribution):

    use Text::Wrap;
    print wrap("\t", ' ', @paragraphs);

The paragraphs you give to Text::Wrap should not contain embedded
newlines. Text::Wrap doesn't justify the lines (flush-right).
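
Text::Wrap also lets you set the line width through C<$Text::Wrap::columns>. A short sketch with an arbitrary width of 20 and a made-up sentence:

```perl
    use strict;
    use warnings;
    use Text::Wrap qw(wrap $columns);

    # Arbitrary width for the example.
    $columns = 20;

    my $para = "The quick brown fox jumps over the lazy dog near the river bank.";

    # No indent on the first line, two spaces on the following ones.
    my $wrapped = wrap("", "  ", $para);

    print "$wrapped\n";
```

Lines come out at most C<$columns - 1> characters long, including the indentation strings.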

=head2 How can I access/change the first N letters of a string?

There are many ways. If you just want to grab a copy, use
substr():

    $first_byte = substr($a, 0, 1);

If you want to modify part of a string, the simplest way is often to
use substr() as an lvalue:

    substr($a, 0, 3) = "Tom";

Although those with a pattern matching kind of thought process will
likely prefer:

    $a =~ s/^.../Tom/;

=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself. For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively. These
examples assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5       # is it the 5th?
            ? "${2}soever"  # yes, swap
            : $1            # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one.">  You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency. If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> operator like so:

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work. What you can do is wrap a while()
loop around a global pattern match. For example, let's count negative
integers:

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
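
A common alternative idiom counts all the matches at once instead of looping. This sketch reuses the sample string from the example above:

```perl
    use strict;
    use warnings;

    my $string = "-9 55 48 -2 23 -76 4 14 -44";

    # Assigning the match to an empty list runs it in list context;
    # that list assignment, in scalar context, yields how many
    # elements the match returned, i.e. the number of matches.
    my $count = () = $string =~ /-\d+/g;

    print "There are $count negative numbers in the string\n";
```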

=head2 How do I capitalize all the words on one line?

To make the first letter of each word upper case:

    $line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning "C<don't do it>" into "C<Don'T
Do It>". Sometimes you might want this, instead (Suggested by Brian
Foy):

    $string =~ s/ (
                    (^\w)    #at the beginning of the line
                      |      # or
                    (\s\w)   #preceded by whitespace
                  )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;

To make the whole line upper case:

    $line = uc($line);

To force each word to be lower case, with the first letter upper case:

    $line =~ s/(\w+)/\u\L$1/g;

You can (and probably should) enable locale awareness of those
characters by placing a C<use locale> pragma in your program.
See L<perllocale> for endless details on locales.

This is sometimes referred to as putting something into "title
case", but that's not quite accurate. Consider the proper
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.

=head2 How can I split a [character] delimited string except when inside
[character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated
into its different fields. (We'll pretend you said comma-separated, not
comma-delimited, which is different and almost never what you mean.) You
can't use C<split(/,/)> because you shouldn't split if the comma is inside
quotes. For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem. Thankfully, we have Jeffrey Friedl, author of a highly
recommended book on regular expressions, to handle these for us. He
suggests (assuming your string is contained in $text):

    @new = ();
    push(@new, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
      | ([^,]+),?
      | ,
    }gx;
    push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">). Unescaping them is a task addressed earlier in
this section.

Alternatively, the Text::ParseWords module (part of the standard perl
distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);

There's also a Text::CSV module on CPAN.
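
To check what quotewords() gives back for the sample line above, here is a small sketch; passing 0 as the second argument asks Text::ParseWords to drop the quotes from each field:

```perl
    use strict;
    use warnings;
    use Text::ParseWords qw(quotewords);

    my $text = q{SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"};

    # Split on commas that are not inside double quotes.
    my @fields = quotewords(",", 0, $text);

    print scalar(@fields), " fields\n";
    print "[$_]\n" for @fields;
```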

=head2 How do I strip blank space from the beginning/end of a string?

Although the simplest approach would seem to be:

    $string =~ s/^\s*(.*?)\s*$/$1/;

Not only is this unnecessarily slow and destructive, it also fails with
embedded newlines. It is much faster to do this operation in two steps:

    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

Or more nicely written as:

    for ($string) {
        s/^\s+//;
        s/\s+$//;
    }

This idiom takes advantage of the C<foreach> loop's aliasing
behavior to factor out common code. You can do this
on several strings at once, or arrays, or even the
values of a hash if you use a slice:

    # trim whitespace in the scalar, the array,
    # and all the values in the hash
    foreach ($scalar, @array, @hash{keys %hash}) {
        s/^\s+//;
        s/\s+$//;
    }
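
If you trim strings in many places, you might wrap the two substitutions in a function. C<trim()> here is a hypothetical helper, not a built-in; it works on copies, so the caller's variables are left alone:

```perl
    use strict;
    use warnings;

    # Trim copies of the arguments using the same two substitutions.
    sub trim {
        my @out = @_;
        for (@out) {
            s/^\s+//;
            s/\s+$//;
        }
        return wantarray ? @out : $out[0];
    }

    my $clean = trim("  hello, world \n");
    my @list  = trim(" a ", "b  ", "  c");

    print "[$clean] [@list]\n";
```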

=head2 How do I pad a string with blanks or pad a number with zeroes?

(This answer contributed by Uri Guttman, with kibitzing from
Bart Lateur.)

In the following examples, C<$pad_len> is the length to which you wish
to pad the string, C<$text> or C<$num> contains the string to be padded,
and C<$pad_char> contains the padding character. You can use a single
character string constant instead of the C<$pad_char> variable if you
know what it is in advance. And in the same way you can use an integer in
place of C<$pad_len> if you know the pad length in advance.

The simplest method uses the C<sprintf> function. It can pad on the left
or right with blanks and on the left with zeroes and it will not
truncate the result. The C<pack> function can only pad strings on the
right with blanks and it will truncate the result to a maximum length of
C<$pad_len>.

    # Left padding a string with blanks (no truncation):
    $padded = sprintf("%${pad_len}s", $text);

    # Right padding a string with blanks (no truncation):
    $padded = sprintf("%-${pad_len}s", $text);

    # Left padding a number with 0 (no truncation):
    $padded = sprintf("%0${pad_len}d", $num);

    # Right padding a string with blanks using pack (will truncate):
    $padded = pack("A$pad_len",$text);

If you need to pad with a character other than blank or zero you can use
one of the following methods. They all generate a pad string with the
C<x> operator and combine that with C<$text>. These methods do
not truncate C<$text>.

Left and right padding with any character, creating a new string:

    $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
    $padded = $text . $pad_char x ( $pad_len - length( $text ) );

Left and right padding with any character, modifying C<$text> directly:

    substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
    $text .= $pad_char x ( $pad_len - length( $text ) );
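
To see the truncation difference concretely, here is a sketch with arbitrary sample values:

```perl
    use strict;
    use warnings;

    my $text = "abcdefg";   # seven characters
    my $num  = 42;

    my $left   = sprintf("%10s", $text);   # "   abcdefg"
    my $long   = sprintf("%5s", $text);    # "abcdefg": sprintf never truncates
    my $packed = pack("A5", $text);        # "abcde": pack does
    my $zeros  = sprintf("%05d", $num);    # "00042"

    # Padding with an arbitrary character via the x operator.
    my $dots = "." x (10 - length($text)) . $text;   # "...abcdefg"

    print "[$left] [$long] [$packed] [$zeros] [$dots]\n";
```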
65acb1b1 713
68dc0745 714=head2 How do I extract selected columns from a string?
715
716Use substr() or unpack(), both documented in L<perlfunc>.
5a964f20 717If you prefer thinking in terms of columns instead of widths,
718you can use this kind of thing:
719
720 # determine the unpack format needed to split Linux ps output
721 # arguments are cut columns
722 my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
723
724 sub cut2fmt {
725 my(@positions) = @_;
726 my $template = '';
727 my $lastpos = 1;
728 for my $place (@positions) {
729 $template .= "A" . ($place - $lastpos) . " ";
730 $lastpos = $place;
731 }
732 $template .= "A*";
733 return $template;
734 }
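
Here is a sketch of cut2fmt() applied to a made-up fixed-width record. The column positions are 1-based, and C<A> in the resulting template strips trailing blanks from each field:

```perl
    use strict;
    use warnings;

    # cut2fmt() as defined above, repeated so this sketch runs alone.
    sub cut2fmt {
        my (@positions) = @_;
        my $template = '';
        my $lastpos  = 1;
        for my $place (@positions) {
            $template .= "A" . ($place - $lastpos) . " ";
            $lastpos = $place;
        }
        $template .= "A*";
        return $template;
    }

    # Hypothetical record whose fields start in columns 1, 8 and 14.
    my $line = sprintf("%-7s%-6s%s", "alice", "bob", "carol");
    my $fmt  = cut2fmt(8, 14);        # "A7 A6 A*"
    my @cols = unpack($fmt, $line);   # ("alice", "bob", "carol")

    print join("|", @cols), "\n";
```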

=head2 How do I find the soundex value of a string?

Use the standard Text::Soundex module distributed with perl.
But before you do so, you may want to determine whether `soundex' is in
fact what you think it is. Knuth's soundex algorithm compresses words
into a small space, and so it does not necessarily distinguish between
two words which you might want to appear separately. For example, the
last names `Knuth' and `Kant' are both mapped to the soundex code K530.
If Text::Soundex does not do what you are looking for, you might want
to consider the String::Approx module available at CPAN.

=head2 How can I expand variables in text strings?

Let's assume that you have a string like:

    $text = 'this has a $foo in it and a $bar';

If those were both global variables, then this would
suffice:

    $text =~ s/\$(\w+)/${$1}/g;  # no /e needed

But since they are probably lexicals, or at least, they could
be, you'd have to do this:

    $text =~ s/(\$\w+)/$1/eeg;
    die if $@;                  # needed /ee, not /e

It's probably better in the general case to treat those
variables as entries in some special hash. For example:

    %user_defs = (
        foo  => 23,
        bar  => 19,
    );
    $text =~ s/\$(\w+)/$user_defs{$1}/g;
68dc0745 772
92c2ed05 773See also ``How do I expand function calls in a string?'' in this section
46fc3d4c 774of the FAQ.
775
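A variation on the hash approach, sketched below, leaves any C<$name>
that has no entry in the hash untouched instead of replacing it with an
empty string. The C</e> modifier lets the replacement side run a small
Perl expression:

    use strict;
    use warnings;

    my %user_defs = ( foo => 23, bar => 19 );
    my $text = 'this has a $foo in it and a $bar, but not a $baz';

    # replace known names; put unknown ones back unchanged
    $text =~ s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

    print "$text\n";   # this has a 23 in it and a 19, but not a $baz
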
=head2 What's wrong with always quoting "$vars"?

The problem is that those double-quotes force stringification,
coercing numbers and references into strings, even when you
don't want them to be. Think of it this way: double-quote
expansion is used to produce new strings. If you already
have a string, why do you need more?

If you get used to writing odd things like these:

    print "$var";       # BAD
    $new = "$old";      # BAD
    somefunc("$var");   # BAD

You'll be in trouble. Those should (in 99.8% of the cases) be
the simpler and more direct:

    print $var;
    $new = $old;
    somefunc($var);

Otherwise, besides slowing you down, you're going to break code when
the thing in the scalar is actually neither a string nor a number, but
a reference:

    func(\@array);
    sub func {
        my $aref = shift;
        my $oref = "$aref";  # WRONG
    }

You can also get into subtle problems on those few operations in Perl
that actually do care about the difference between a string and a
number, such as the magical C<++> autoincrement operator or the
syscall() function.

Stringification also flattens an array into a single string, joining
its elements with spaces:

    @lines = `command`;
    print "@lines";     # WRONG - extra blanks
    print @lines;       # right

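The following sketch shows what the WRONG line above actually loses: the
quoted copy is just a string that looks like C<ARRAY(0x...)>, so ref() no
longer recognizes it and it can't be dereferenced under C<use strict>. It
also shows that quoting an array joins its elements with C<$"> (a space
by default):

    use strict;
    use warnings;

    my @array = (1, 2, 3);
    my $aref  = \@array;
    my $oref  = "$aref";          # now only a string such as "ARRAY(0x55d0c8)"

    print ref($aref), "\n";                           # ARRAY
    print ref($oref) ? "ref\n" : "plain string\n";    # plain string

    my @lines  = ("one\n", "two\n");
    my $joined = "@lines";        # "one\n two\n" - note the extra blank
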
=head2 Why don't my E<lt>E<lt>HERE documents work?

Check for these three things:

=over 4

=item 1. There must be no space after the << part.

=item 2. There (probably) should be a semicolon at the end.

=item 3. You can't (easily) have any space in front of the tag.

=back

If you want to indent the text in the here document, you
can do this:

    # all in one
    ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
        your text
        goes here
    HERE_TARGET

But the HERE_TARGET must still be flush against the margin.
If you want that indented also, you'll have to quote
in the indentation.

    ($quote = <<'    FINIS') =~ s/^\s+//gm;
            ...we will have peace, when you and all your works have
            perished--and the works of your dark master to whom you
            would deliver us. You are a liar, Saruman, and a corrupter
            of men's hearts.  --Theoden in /usr/src/perl/taint.c
        FINIS
    $quote =~ s/\s*--/\n--/;

A nice general-purpose fixer-upper function for indented here documents
follows. It expects to be called with a here document as its argument.
It looks to see whether each line begins with a common substring, and
if so, strips that off. Otherwise, it takes the amount of leading
white space found on the first line and removes that much off each
subsequent line.

    sub fix {
        local $_ = shift;
        my ($white, $leader);  # common white space and common leading string
        if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
            ($white, $leader) = ($2, quotemeta($1));
        } else {
            ($white, $leader) = (/^(\s+)/, '');
        }
        s/^\s*?$leader(?:$white)?//gm;
        return $_;
    }

This works with leading special strings, dynamically determined:

    $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
        @@@ int
        @@@ runops() {
        @@@     SAVEI32(runlevel);
        @@@     runlevel++;
        @@@     while ( op = (*op->op_ppaddr)() );
        @@@     TAINT_NOT;
        @@@     return 0;
        @@@ }
        MAIN_INTERPRETER_LOOP

Or with a fixed amount of leading white space, with remaining
indentation correctly preserved:

    $poem = fix<<EVER_ON_AND_ON;
       Now far ahead the Road has gone,
          And I must follow, if I can,
       Pursuing it with eager feet,
          Until it joins some larger way
       Where many paths and errands meet.
          And whither then? I cannot say.
                --Bilbo in /usr/src/perl/pp_ctl.c
    EVER_ON_AND_ON

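Newer versions of Perl (5.26 and later) build this in: a here document
introduced with C<E<lt>E<lt>~> may have an indented terminator, and that
much leading white space is removed from every line, so relative
indentation survives. A short sketch:

    use strict;
    use warnings;
    use v5.26;      # <<~ indented here documents need perl 5.26+

    my $text = <<~EOT;
        your text
          goes here
        EOT

    print $text;    # "your text\n  goes here\n"
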
=head1 Data: Arrays

=head2 What is the difference between a list and an array?

An array has a changeable length. A list does not. An array is something
you can push or pop, while a list is a set of values. Some people make
the distinction that a list is a value while an array is a variable.
Subroutines are passed and return lists, you put things into list
context, you initialize arrays with lists, and you foreach() across
a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
in scalar context behave like the number of elements in them, subroutines
access their arguments through the array C<@_>, and push/pop/shift only
work on arrays.

As a side note, there's no such thing as a list in scalar context.
When you say

    $scalar = (2, 5, 7, 9);

you're using the comma operator in scalar context, so it uses the scalar
comma operator. There never was a list there at all! This causes the
last value to be returned: 9.

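This sketch contrasts three cases: the scalar comma operator, an array
evaluated in scalar context, and a list assignment, which is probably
what you wanted if you expected the first element. The C<no warnings
'void'> line only silences the "Useless use of a constant" warnings the
scalar comma triggers:

    use strict;
    use warnings;

    my $scalar;
    {
        no warnings 'void';          # the 2, 5 and 7 are thrown away
        $scalar = (2, 5, 7, 9);      # scalar comma operator: last value
    }
    my @array   = (2, 5, 7, 9);
    my $count   = @array;            # array in scalar context: 4
    my ($first) = (2, 5, 7, 9);      # list assignment: 2

    print "$scalar $count $first\n"; # 9 4 2
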
=head2 What is the difference between $array[1] and @array[1]?

The former is a scalar value, the latter an array slice, which makes
it a list with one (scalar) value. You should use $ when you want a
scalar value (most of the time) and @ when you want a list with one
scalar value in it (very, very rarely; nearly never, in fact).

Sometimes it doesn't make a difference, but sometimes it does.
For example, compare:

    $good[0] = `some program that outputs several lines`;

with

    @bad[0]  = `same program that outputs several lines`;

The first stores all of the program's output in C<$good[0]>. The slice
assignment puts the backticks in list context, so C<$bad[0]> gets only
the first line and the rest is discarded. The B<-w> flag will warn you
about these matters.

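Since backticks need an external program, this sketch swaps in a
hypothetical lines() subroutine that, like backticks, returns one string
in scalar context and a list of lines in list context. (With B<-w>, perl
also warns that C<@bad[0]> is better written as C<$bad[0]>.)

    use strict;

    # lines(): hypothetical stand-in for `some program`
    sub lines { return wantarray ? ("one\n", "two\n") : "one\ntwo\n" }

    my (@good, @bad);
    $good[0] = lines();    # scalar context: the whole output
    @bad[0]  = lines();    # list context: "two\n" is silently dropped

    print $good[0];        # prints both lines
    print $bad[0];         # prints only the first line
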
=head2 How can I remove duplicate elements from a list or array?

There are several possible ways, depending on whether the array is
ordered and whether you wish to preserve the ordering.

=over 4

=item a) If @in is sorted, and you want @out to be sorted:
(this assumes all true values in the array)

    $prev = 'nonesuch';
    @out = grep($_ ne $prev && ($prev = $_), @in);

This is nice in that it doesn't use much extra memory, simulating
uniq(1)'s behavior of removing only adjacent duplicates. It's less
nice in that it won't work with false values like undef, 0, or "";
"0 but true" is ok, though.

=item b) If you don't know whether @in is sorted:

    undef %saw;
    @out = grep(!$saw{$_}++, @in);

=item c) Like (b), but @in contains only small integers:

    undef @saw;
    @out = grep(!$saw[$_]++, @in);

=item d) A way to do (b) without any loops or greps:

    undef %saw;
    @saw{@in} = ();
    @out = sort keys %saw;  # remove sort if undesired

=item e) Like (d), but @in contains only small positive integers:

    undef @ary;
    @ary[@in] = @in;
    @out = grep { defined } @ary;   # skip the gaps between the integers

=back

But perhaps you should have been using a hash all along, eh?

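If your List::Util is version 1.45 or later (the one bundled with perl
5.26 and later is), its C<uniq> function does what (b) does, keeping the
first occurrence of each value in its original order:

    use strict;
    use warnings;
    use List::Util 1.45 qw(uniq);

    my @in  = qw(pear apple pear fig apple);
    my @out = uniq @in;      # first occurrence of each, order kept

    print "@out\n";          # pear apple fig
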
=head2 How can I tell whether a list or array contains a certain element?

Hearing the word "in" is an I<in>dication that you probably should have
used a hash, not a list or array, to store your data. Hashes are
designed to answer this question quickly and efficiently. Arrays aren't.

That being said, there are several ways to approach this. If you
are going to make this query many times over arbitrary string values,
the fastest way is probably to invert the original array and keep an
associative array lying about whose keys are the first array's values.

    @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
    undef %is_blue;
    for (@blues) { $is_blue{$_} = 1 }

Now you can check whether $is_blue{$some_color}. It might have been a
good idea to keep the blues all in a hash in the first place.

If the values are all small integers, you could use a simple indexed
array. This kind of an array will take up less space:

    @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
    undef @is_tiny_prime;
    for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply  @is_tiny_prime[@primes] = (1) x @primes;

Now you check whether $is_tiny_prime[$some_number].

If the values in question are integers instead of strings, you can save
quite a lot of space by using bit strings instead:

    @articles = ( 1..10, 150..2000, 2017 );
    undef $read;
    for (@articles) { vec($read,$_,1) = 1 }

Now check whether C<vec($read,$n,1)> is true for some C<$n>.

Please do not use

    $is_there = grep $_ eq $whatever, @array;

or worse yet

    $is_there = grep /$whatever/, @array;

These are slow (they check every element even if the first matches),
inefficient (same reason), and potentially buggy (what if there are
regex characters in $whatever?). If you're only testing once, then
use:

    $is_there = 0;
    foreach $elt (@array) {
        if ($elt eq $whatever) {
            $is_there = 1;
            last;
        }
    }
    if ($is_there) { ... }

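List::Util's C<any> (version 1.33 and later) packages that early-exit
loop up for you. It stops at the first element for which the block
returns true, and using C<eq> sidesteps the regex-character problem:

    use strict;
    use warnings;
    use List::Util 1.33 qw(any);

    my @array    = qw(azure cerulean teal turquoise);
    my $whatever = 'teal';

    my $is_there = any { $_ eq $whatever } @array;
    print $is_there ? "found\n" : "missing\n";    # found
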
=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?

Use a hash. Here's code to do both and more. It assumes that
each element is unique in a given array:

    @union = @intersection = @difference = ();
    %count = ();
    foreach $element (@array1, @array2) { $count{$element}++ }
    foreach $element (keys %count) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }

Note that this is the I<symmetric difference>, that is, all elements in
either A or in B, but not in both. Think of it as an xor operation.

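If what you want is instead the one-sided difference, the elements of
C<@array1> that are not in C<@array2>, a lookup hash built from
C<@array2> does it in one pass and keeps C<@array1>'s order. The array
contents here are just an example:

    use strict;
    use warnings;

    my @array1 = qw(red green blue yellow);
    my @array2 = qw(green yellow purple);

    my %in_array2 = map { $_ => 1 } @array2;
    my @only_in_1 = grep { !$in_array2{$_} } @array1;

    print "@only_in_1\n";                   # red blue
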
=head2 How do I test whether two arrays or hashes are equal?

The following code works for single-level arrays. It uses a stringwise
comparison, and does not distinguish defined versus undefined empty
strings. Modify if you have other needs.

    $are_equal = compare_arrays(\@frogs, \@toads);

    sub compare_arrays {
        my ($first, $second) = @_;
        local $^W = 0;  # silence spurious -w undef complaints
        return 0 unless @$first == @$second;
        for (my $i = 0; $i < @$first; $i++) {
            return 0 if $first->[$i] ne $second->[$i];
        }
        return 1;
    }

For multilevel structures, you may wish to use an approach more
like this one. It uses the CPAN module FreezeThaw:

    use FreezeThaw qw(cmpStr);
    @a = @b = ( "this", "that", [ "more", "stuff" ] );

    printf "a and b contain %s arrays\n",
        cmpStr(\@a, \@b) == 0
            ? "the same"
            : "different";

This approach also works for comparing hashes. Here
we'll demonstrate two different answers:

    use FreezeThaw qw(cmpStr cmpStrHard);

    %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
    $a{EXTRA} = \%b;
    $b{EXTRA} = \%a;

    printf "a and b contain %s hashes\n",
        cmpStr(\%a, \%b) == 0 ? "the same" : "different";

    printf "a and b contain %s hashes\n",
        cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";

The first reports that both hashes contain the same data,
while the second reports that they do not. Which you prefer is left as
an exercise to the reader.

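For single-level hashes you don't strictly need a CPAN module. The
hypothetical C<compare_hashes> below follows the same stringwise approach
as compare_arrays(), but it does distinguish undef from the empty string:

    use strict;
    use warnings;

    # compare_hashes(\%x, \%y): 1 if both have the same keys and values
    sub compare_hashes {
        my ($first, $second) = @_;
        return 0 unless keys(%$first) == keys(%$second);
        for my $key (keys %$first) {
            return 0 unless exists $second->{$key};
            my ($x, $y) = ($first->{$key}, $second->{$key});
            return 0 if (defined($x) xor defined($y));  # undef vs defined
            return 0 if defined($x) && $x ne $y;
        }
        return 1;
    }

    my %frogs = (green => 1, spotted => undef);
    my %toads = (spotted => undef, green => 1);
    print compare_hashes(\%frogs, \%toads) ? "equal\n" : "different\n";
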
=head2 How do I find the first array element for which a condition is true?

You can use this if you care about the index:

    for ($i = 0; $i < @array; $i++) {
        if ($array[$i] eq "Waldo") {
            $found_index = $i;
            last;
        }
    }

Now C<$found_index> has what you want.

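List::Util's C<first> stops at the first match too. Applied to the array
it returns the element; applied to the list of indices it returns the
index you would otherwise have computed by hand. Both return undef if
nothing matches:

    use strict;
    use warnings;
    use List::Util qw(first);

    my @array = qw(Alice Bob Waldo Carol Waldo);

    my $found       = first { $_ eq "Waldo" } @array;               # "Waldo"
    my $found_index = first { $array[$_] eq "Waldo" } 0 .. $#array; # 2

    print defined $found_index ? "at $found_index\n" : "not found\n";
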
=head2 How do I handle linked lists?

In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either end,
or you can use splice to add and/or remove an arbitrary number of
elements at arbitrary points. Both pop and shift are O(1) operations on
perl's dynamic arrays. In the absence of shifts and pops, push in general
needs to reallocate only on the order of log(N) times while an array
grows to N elements, and unshift will need to copy pointers each time.

If you really, really wanted, you could use structures as described in
L<perldsc> or L<perltoot> and do just what the algorithm book tells you
to do. For example, imagine a list node like this:

    $node = {
        VALUE => 42,
        LINK  => undef,
    };

You could walk the list this way:

    print "List: ";
    for ($node = $head; $node; $node = $node->{LINK}) {
        print $node->{VALUE}, " ";
    }
    print "\n";

You could grow the list this way:

    my ($head, $tail);
    $tail = append($head, 1);       # grow a new head
    for my $value ( 2 .. 10 ) {
        $tail = append($tail, $value);
    }

    sub append {
        my($list, $value) = @_;
        my $node = { VALUE => $value };
        if ($list) {
            $node->{LINK} = $list->{LINK};
            $list->{LINK} = $node;
        } else {
            $_[0] = $node;      # replace caller's version
        }
        return $node;
    }

But again, Perl's built-in arrays are virtually always good enough.

=head2 How do I handle circular lists?

Circular lists could be handled in the traditional fashion with linked
lists, or you could just do something like this with an array:

    unshift(@array, pop(@array));  # the last shall be first
    push(@array, shift(@array));   # and vice versa

=head2 How do I shuffle an array randomly?

Use this:

    # fisher_yates_shuffle( \@array ) :
    # generate a random permutation of @array in place
    sub fisher_yates_shuffle {
        my $array = shift;
        my $i = @$array;
        while ($i--) {              # also safe on an empty array
            my $j = int rand ($i+1);
            next if $i == $j;
            @$array[$i,$j] = @$array[$j,$i];
        }
    }

    fisher_yates_shuffle( \@array );    # permutes @array in place

You've probably seen shuffling algorithms that work using splice,
randomly picking an element of the old array and moving it to the new one:

    srand;
    @new = ();
    @old = 1 .. 10;  # just a demo
    while (@old) {
        push(@new, splice(@old, rand @old, 1));
    }

This is bad because splice is already O(N), and since you do it N times,
you just invented a quadratic algorithm; that is, O(N**2). This does
not scale, although Perl is so efficient that you probably won't notice
this until you have rather largish arrays.

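The List::Util module bundled with perl also provides a C<shuffle>
function. Unlike fisher_yates_shuffle() above, it returns a shuffled copy
rather than working in place:

    use strict;
    use warnings;
    use List::Util qw(shuffle);

    my @array    = (1 .. 10);
    my @shuffled = shuffle(@array);   # @array itself is left alone

    print "@shuffled\n";
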
=head2 How do I process/modify each element of an array?

Use C<for>/C<foreach>:

    for (@lines) {
        s/foo/bar/;     # change that word
        y/XZ/ZX/;       # swap those letters
    }

Here's another; let's compute spherical volumes:

    for (@volumes = @radii) {   # @volumes has changed parts
        $_ **= 3;
        $_ *= (4/3) * 3.14159;  # this will be constant folded
    }

If you want to do the same thing to modify the values of the hash,
older versions of Perl don't let you use the C<values> function, oddly
enough, because it returns copies. You need a slice:

    for $orbit ( @orbits{keys %orbits} ) {
        ($orbit **= 3) *= (4/3) * 3.14159;
    }

Newer versions of Perl make values() return aliases to the hash's values
rather than copies, so there a plain C<for (values %orbits)> loop modifies
the hash as well.

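If you'd rather leave C<@radii> alone and build a new array, map()
expresses the same computation without any aliasing:

    use strict;
    use warnings;

    my @radii   = (1, 2, 3);
    my @volumes = map { (4/3) * 3.14159 * $_ ** 3 } @radii;

    printf "%.2f\n", $_ for @volumes;   # 4.19, 33.51, 113.10
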
=head2 How do I select a random element from an array?

Use the rand() function (see L<perlfunc/rand>):

    # at the top of the program:
    srand;          # not needed for 5.004 and later

    # then later on
    $index   = rand @array;
    $element = $array[$index];

Make sure you I<only call srand once per program, if then>.
If you are calling it more than once (such as before each
call to rand), you're almost certainly doing something wrong.

=head2 How do I permute N elements of a list?

Here's a little program that generates all permutations
of all the words on each line of input. The algorithm embodied
in the permute() function should work on any list:

    #!/usr/bin/perl -n
    # tsc-permute: permute each word of input
    permute([split], []);
    sub permute {
        my @items = @{ $_[0] };
        my @perms = @{ $_[1] };
        unless (@items) {
            print "@perms\n";
        } else {
            my(@newitems,@newperms,$i);
            foreach $i (0 .. $#items) {
                @newitems = @items;
                @newperms = @perms;
                unshift(@newperms, splice(@newitems, $i, 1));
                permute([@newitems], [@newperms]);
            }
        }
    }

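If you want the permutations handed back to you instead of printed, a
variant along these lines collects them as array references.
C<permutations> is a hypothetical name, and the approach (pick each
element in turn, then permute the rest) is the same idea as permute()
above:

    use strict;
    use warnings;

    # permutations(@items): list of array refs, one per permutation
    sub permutations {
        my @items = @_;
        return ([]) unless @items;           # one empty permutation
        my @result;
        for my $i (0 .. $#items) {
            my @rest = @items;
            my ($picked) = splice(@rest, $i, 1);
            push @result, [ $picked, @$_ ] for permutations(@rest);
        }
        return @result;
    }

    print "@$_\n" for permutations(qw(a b c));   # 6 lines: a b c, a c b, ...
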
=head2 How do I sort an array by (anything)?

Supply a comparison function to sort() (described in L<perlfunc/sort>):

    @list = sort { $a <=> $b } @list;

The default sort function is cmp, string comparison, which would
sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<E<lt>=E<gt>>, used above, is
the numerical comparison operator.

If you have a complicated function needed to pull out the part you
want to sort on, then don't do it inside the sort function. Pull it
out first, because the sort BLOCK can be called many times for the
same element. Here's an example of how to pull out the first word
after the first number on each item, and then sort those words
case-insensitively.

    @idx = ();
    for (@data) {
        ($item) = /\d+\s*(\S+)/;
        push @idx, uc($item);
    }
    @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];

Which could also be written this way, using a trick
that's come to be known as the Schwartzian Transform:

    @sorted = map  { $_->[0] }
              sort { $a->[1] cmp $b->[1] }
              map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;

If you need to sort on several fields, the following paradigm is useful.

    @sorted = sort { field1($a) <=> field1($b) ||
                     field2($a) cmp field2($b) ||
                     field3($a) cmp field3($b)
                   } @data;

This can be conveniently combined with precalculation of keys as given
above.

See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for more about
this approach.

See also the question below on sorting hashes.

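Here is the Schwartzian Transform from above run on some made-up data,
so you can see the case-insensitive ordering by the word after the
number:

    use strict;
    use warnings;

    my @data = ("3 pears", "10 Apples", "7 bananas");

    my @sorted = map  { $_->[0] }                           # 3. original
                 sort { $a->[1] cmp $b->[1] }               # 2. sort keys
                 map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] }  # 1. make keys
                 @data;

    print "$_\n" for @sorted;    # 10 Apples, 7 bananas, 3 pears
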
=head2 How do I manipulate arrays of bits?

Use pack() and unpack(), or else vec() and the bitwise operations.

For example, this sets bit N of $vec for every N that appears in @ints:

    $vec = '';
    foreach (@ints) { vec($vec,$_,1) = 1 }

And here's how, given a vector in $vec, you can
get those bits into your @ints array:

    sub bitvec_to_list {
        my $vec = shift;
        my @ints;
        # Find null-byte density then select best algorithm
        if ($vec =~ tr/\0// / length $vec > 0.95) {
            use integer;
            my $i;
            # This method is faster with mostly null-bytes
            while($vec =~ /[^\0]/g ) {
                $i = -9 + 8 * pos $vec;
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
            }
        } else {
            # This method is a fast general algorithm
            use integer;
            my $bits = unpack "b*", $vec;
            push @ints, 0 if $bits =~ s/^(\d)// && $1;
            push @ints, pos $bits while($bits =~ /1/g);
        }
        return \@ints;
    }

This method gets faster the more sparse the bit vector is.
(Courtesy of Tim Bunce and Winfried Koenig.)

Here's a demo of how to use vec():

    # vec demo
    $vector = "\xff\x0f\xef\xfe";
    print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
        unpack("N", $vector), "\n";
    $is_set = vec($vector, 23, 1);
    print "Its bit 23 is ", $is_set ? "set" : "clear", ".\n";
    pvec($vector);

    set_vec(1,1,1);
    set_vec(3,1,1);
    set_vec(23,1,1);

    set_vec(3,1,3);
    set_vec(3,2,3);
    set_vec(3,4,3);
    set_vec(3,4,7);
    set_vec(3,8,3);
    set_vec(3,8,7);

    set_vec(0,32,17);
    set_vec(1,32,17);

    sub set_vec {
        my ($offset, $width, $value) = @_;
        my $vector = '';
        vec($vector, $offset, $width) = $value;
        print "offset=$offset width=$width value=$value\n";
        pvec($vector);
    }

    sub pvec {
        my $vector = shift;
        my $bits = unpack("b*", $vector);

        print "vector length in bytes: ", length($vector), "\n";
        my @bytes = unpack("A8" x length($vector), $bits);
        print "bits are: @bytes\n\n";
    }

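For small vectors the round trip can be written very compactly. This
sketch sets the bits named in C<@ints> with vec() and reads them back
with C<unpack "b*">, which lists the bits in the same low-to-high order
that vec() uses:

    use strict;
    use warnings;

    my @ints = (1, 3, 10);
    my $vec  = '';
    vec($vec, $_, 1) = 1 for @ints;      # bit 10 forces a second byte

    my $bits = unpack "b*", $vec;        # "0101000000100000"
    my @back = grep { substr($bits, $_, 1) } 0 .. length($bits) - 1;

    print length($vec), " bytes: @back\n";   # 2 bytes: 1 3 10
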
=head2 Why does defined() return true on empty arrays and hashes?

The short story is that you should probably only use defined on scalars or
functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
in the 5.004 release or later of Perl for more detail.

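To ask whether an aggregate has anything in it, test it in boolean
context instead. Recent perls go further and treat C<defined(@array)> and
C<defined(%hash)> as fatal errors:

    use strict;
    use warnings;

    my @array;
    my %hash = (key => 'value');

    my $array_state = @array ? "has elements" : "is empty";
    my $hash_state  = %hash  ? "has entries"  : "is empty";
    print "array $array_state, hash $hash_state\n";
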
=head1 Data: Hashes (Associative Arrays)

=head2 How do I process an entire hash?

Use the each() function (see L<perlfunc/each>) if you don't care
whether it's sorted:

    while ( ($key, $value) = each %hash) {
        print "$key = $value\n";
    }

If you want it sorted, you'll have to use foreach() on the result of
sorting the keys, as shown in the question on sorting hashes below.

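For example, a sorted walk over a small made-up hash looks like this:

    use strict;
    use warnings;

    my %hash = (banana => 3, apple => 1, cherry => 2);

    my @report;
    foreach my $key (sort keys %hash) {
        push @report, "$key = $hash{$key}";
    }
    print "$_\n" for @report;   # apple = 1, banana = 3, cherry = 2
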
=head2 What happens if I add or remove keys from a hash while iterating over it?

Don't do that. :-)

[lwall] In Perl 4, you were not allowed to modify a hash at all while
iterating over it. In Perl 5 you can delete from it, but you still
can't add to it, because that might cause a doubling of the hash table,
in which half the entries get copied up to the new top half of the
table, at which point you've totally bamboozled the iterator code.
Even if the table doesn't double, there's no telling whether your new
entry will be inserted before or after the current iterator position.

Either treasure up your changes and make them after the iterator finishes,
or use keys to fetch all the old keys at once, and iterate over the list
of keys.

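Treasuring up changes might look like the following sketch: it records
the new entries in a separate hash while each() runs, then merges them
with a hash slice once the loop is done. The doubled-value rule is only
an example:

    use strict;
    use warnings;

    my %hash = (a => 1, b => 2);
    my %additions;

    while (my ($key, $value) = each %hash) {
        $additions{"${key}2"} = $value * 2;   # don't touch %hash here
    }
    @hash{ keys %additions } = values %additions;   # merge afterwards

    print join(", ", map { "$_=$hash{$_}" } sort keys %hash), "\n";
    # a=1, a2=2, b=2, b2=4
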
=head2 How do I look up a hash element by value?

Create a reverse hash:

    %by_value = reverse %by_key;
    $key = $by_value{$value};

That's not particularly efficient. It would be more space-efficient
to use:

    while (($key, $value) = each %by_key) {
        $by_value{$value} = $key;
    }

If your hash could have repeated values, the methods above will only find
one of the associated keys. This may or may not worry you. If it does
worry you, you can always reverse the hash into a hash of arrays instead:

    while (($key, $value) = each %by_key) {
        push @{$key_list_by_value{$value}}, $key;
    }

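With some made-up data containing a repeated value, the hash-of-arrays
version keeps every key. The keys come out in hash order, so this sketch
sorts them before printing:

    use strict;
    use warnings;

    my %by_key = (apple => 'red', cherry => 'red', banana => 'yellow');
    my %key_list_by_value;

    while (my ($key, $value) = each %by_key) {
        push @{ $key_list_by_value{$value} }, $key;
    }

    my @red = sort @{ $key_list_by_value{red} };
    print "red: @red\n";        # red: apple cherry
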
=head2 How can I know how many entries are in a hash?

If you mean how many keys, then all you have to do is
take the scalar sense of the keys() function:

    $num_keys = scalar keys %hash;

In void context, the keys() function just resets the iterator, which is
faster for tied hashes than would be iterating through the whole
hash, one key-value pair at a time.

=head2 How do I sort a hash (optionally by value instead of key)?

Internally, hashes are stored in a way that prevents you from imposing
an order on key-value pairs. Instead, you have to sort a list of the
keys or values:

    @keys = sort keys %hash;    # sorted by key
    @keys = sort {
                $hash{$a} cmp $hash{$b}
            } keys %hash;       # and by value

Here we'll do a reverse numeric sort by value, and if two values are
identical, sort by length of key, and if that fails, by straight ASCII
comparison of the keys (well, possibly modified by your locale -- see
L<perllocale>).

    @keys = sort {
                $hash{$b} <=> $hash{$a}
                        ||
                length($b) <=> length($a)
                        ||
                      $a cmp $b
            } keys %hash;

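Running the multi-key sort on a small sample hash (the fruit names and
counts are invented) shows the tie-breaking order at work:

    use strict;
    use warnings;

    my %hash = (apple => 2, fig => 5, kiwi => 2, banana => 2,
                date  => 5, pear => 2);

    my @keys = sort {
                   $hash{$b} <=> $hash{$a}      # biggest value first
                           ||
                   length($b) <=> length($a)    # then longest key
                           ||
                         $a cmp $b              # then alphabetically
               } keys %hash;

    print "@keys\n";     # date fig banana apple kiwi pear
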
=head2 How can I always keep my hash sorted?

You can look into using the DB_File module and tie() using the
$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
The Tie::IxHash module from CPAN might also be instructive.

=head2 What's the difference between "delete" and "undef" with hashes?

Hashes are pairs of scalars: the first is the key, the second is the
value. The key will be coerced to a string, although the value can be
any kind of scalar: string, number, or reference. If a key C<$key> is
present in the hash, C<exists($hash{$key})> will return true. The value
for a given key can be C<undef>, in which case C<$hash{$key}> will be
C<undef> while C<exists $hash{$key}> will return true. This corresponds
to (C<$key>, C<undef>) being in the hash.

Pictures help...  here's the C<%ary> table:

          keys  values
        +------+------+
        |  a   |  3   |
        |  x   |  7   |
        |  d   |  0   |
        |  e   |  2   |
        +------+------+

And these conditions hold

        $ary{'a'}                       is true
        $ary{'d'}                       is false
        defined $ary{'d'}               is true
        defined $ary{'a'}               is true
        exists $ary{'a'}                is true (perl5 only)
        grep ($_ eq 'a', keys %ary)     is true

If you now say

        undef $ary{'a'}

your table now reads:

          keys  values
        +------+------+
        |  a   | undef|
        |  x   |  7   |
        |  d   |  0   |
        |  e   |  2   |
        +------+------+

and these conditions now hold; changes in caps:

        $ary{'a'}                       is FALSE
        $ary{'d'}                       is false
        defined $ary{'d'}               is true
        defined $ary{'a'}               is FALSE
        exists $ary{'a'}                is true (perl5 only)
        grep ($_ eq 'a', keys %ary)     is true

Notice the last two: you have an undef value, but a defined key!

Now, consider this:

        delete $ary{'a'}

your table now reads:

          keys  values
        +------+------+
        |  x   |  7   |
        |  d   |  0   |
        |  e   |  2   |
        +------+------+

and these conditions now hold; changes in caps:

        $ary{'a'}                       is false
        $ary{'d'}                       is false
        defined $ary{'d'}               is true
        defined $ary{'a'}               is false
        exists $ary{'a'}                is FALSE (perl5 only)
        grep ($_ eq 'a', keys %ary)     is FALSE

See, the whole entry is gone!

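You can check those tables yourself with a few lines of code. This sketch
builds the same C<%ary> and records defined(), exists(), and the key
count after undef and then after delete:

    use strict;
    use warnings;

    my %ary = (a => 3, x => 7, d => 0, e => 2);

    undef $ary{'a'};
    my @after_undef  = ( defined $ary{'a'} ? 1 : 0,
                         exists  $ary{'a'} ? 1 : 0,
                         scalar keys %ary );        # (0, 1, 4)

    delete $ary{'a'};
    my @after_delete = ( defined $ary{'a'} ? 1 : 0,
                         exists  $ary{'a'} ? 1 : 0,
                         scalar keys %ary );        # (0, 0, 3)

    print "@after_undef | @after_delete\n";         # 0 1 4 | 0 0 3
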
=head2 Why don't my tied hashes make the defined/exists distinction?

That depends on how the tied hash's class implements its EXISTS() and
FETCH() methods. For example, there isn't the concept of undef with hashes
that are tied to DBM* files. This means the true/false tables above
will give different results when used on such a hash. It also means
that exists and defined do the same thing with a DBM* file, and what
they end up doing is not what they do with ordinary hashes.

=head2 How do I reset an each() operation part-way through?

Using C<keys %hash> in scalar context returns the number of keys in
the hash I<and> resets the iterator associated with the hash. You may
need to do this if you use C<last> to exit a loop early so that when you
re-enter it, the hash iterator has been reset.

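In this sketch the loop bails out early, and C<keys> (in scalar context,
via an assignment) rewinds the iterator so the next each() starts from
the beginning again:

    use strict;
    use warnings;

    my %hash = (a => 1, b => 2, c => 3);

    my $first_seen;
    while (my ($key) = each %hash) {
        $first_seen = $key;
        last;                      # leaves the iterator part-way through
    }

    my $count   = keys %hash;      # resets the iterator (and counts: 3)
    my ($again) = each %hash;      # starts over from the beginning
    print $again eq $first_seen ? "reset\n" : "not reset\n";
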
=head2 How can I get the unique keys from two hashes?

First you extract the keys from the hashes into lists, then solve
the "removing duplicates" problem described above. For example:

    %seen = ();
    for $element (keys(%foo), keys(%bar)) {
        $seen{$element}++;
    }
    @uniq = keys %seen;

Or more succinctly:

    @uniq = keys %{{%foo,%bar}};

Or if you really want to save space:

    %seen = ();
    while (defined ($key = each %foo)) {
        $seen{$key}++;
    }
    while (defined ($key = each %bar)) {
        $seen{$key}++;
    }
    @uniq = keys %seen;

=head2 How can I store a multidimensional array in a DBM file?

Either stringify the structure yourself (no fun), or else
get the MLDBM (which uses Data::Dumper) module from CPAN and layer
it on top of either DB_File or GDBM_File.

=head2 How can I make my hash remember the order I put elements into it?

Use the Tie::IxHash module from CPAN.

    use Tie::IxHash;
    tie(%myhash, 'Tie::IxHash');
    for ($i=0; $i<20; $i++) {
        $myhash{$i} = 2*$i;
    }
    @keys = keys %myhash;
    # @keys = (0,1,2,3,...)

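If you can't install Tie::IxHash, a plain hash plus an array that
remembers the order of first insertion covers the simple cases.
C<ordered_store> is a hypothetical helper for this sketch, not part of
any module:

    use strict;
    use warnings;

    my (%value, @order);

    # ordered_store($key, $value): remember new keys in arrival order
    sub ordered_store {
        my ($key, $val) = @_;
        push @order, $key unless exists $value{$key};
        $value{$key} = $val;
    }

    ordered_store($_, 2 * $_) for 5, 3, 9;
    ordered_store(3, 'again');            # updates, keeps original slot

    print join(", ", map { "$_=$value{$_}" } @order), "\n";
    # 5=10, 3=again, 9=18

Deleting a key this way would also mean splicing it out of C<@order>,
which is the sort of bookkeeping Tie::IxHash does for you.
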
=head2 Why does passing a subroutine an undefined element in a hash create it?

If you say something like:

    somefunc($hash{"nonesuch key here"});

Then that element "autovivifies"; that is, it springs into existence
whether you store something there or not. That's because functions
get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
it has to be ready to write it back into the caller's version.

This has been fixed as of perl5.004: the element is now created only if
the subroutine actually assigns to it.

Normally, merely accessing a key's value for a nonexistent key does
I<not> cause that key to be forever there. This is different from
awk's behavior.

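With a perl newer than 5.004 you can watch this happen: passing the
missing element alone leaves the hash empty, and the key appears only
when the subroutine assigns through C<$_[0]>. The two subroutines are
made up for the demonstration:

    use strict;
    use warnings;

    my %hash;
    sub peek  { return defined $_[0] ? 1 : 0 }   # only reads $_[0]
    sub store { $_[0] = 'set' }                   # writes through the alias

    peek($hash{nonesuch});
    my $after_peek = exists $hash{nonesuch} ? 1 : 0;     # 0

    store($hash{nonesuch});
    my $after_store = exists $hash{nonesuch} ? 1 : 0;    # 1
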
=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?

Usually a hash ref, perhaps like this:

    $record = {
        NAME   => "Jason",
        EMPNO  => 132,
        TITLE  => "deputy peon",
        AGE    => 23,
        SALARY => 37_000,
        PALS   => [ "Norbert", "Rhys", "Phineas"],
    };

References are documented in L<perlref> and L<perlreftut>.
Examples of complex data structures are given in L<perldsc> and
L<perllol>. Examples of structures and object-oriented classes are
in L<perltoot>.

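Fields are reached through the arrow operator, and an array of such
records is just an array of hash references. The second record below is
invented for the example:

    use strict;
    use warnings;

    my $record = {
        NAME   => "Jason",
        EMPNO  => 132,
        PALS   => [ "Norbert", "Rhys", "Phineas" ],
    };

    print "$record->{NAME} has ", scalar @{ $record->{PALS} }, " pals\n";
    print "first pal: $record->{PALS}[0]\n";              # Norbert

    my @staff = ( $record, { NAME => "Phineas", EMPNO => 133, PALS => [] } );
    my @names = map { $_->{NAME} } @staff;                # Jason, Phineas
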
=head2 How can I use a reference as a hash key?

You can't do this directly, because a reference used as a hash key is
turned into a string and can no longer be dereferenced. You could use the
standard Tie::RefHash module distributed with perl instead.

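The difference shows up as soon as you read the keys back. A plain hash
hands you the string form of the reference, while a hash tied to
Tie::RefHash hands back the reference itself:

    use strict;
    use warnings;
    use Tie::RefHash;

    my $aref = [1, 2, 3];

    my %plain;
    $plain{$aref} = 'data';
    my ($plain_key) = keys %plain;       # a string like "ARRAY(0x...)"

    tie my %refhash, 'Tie::RefHash';
    $refhash{$aref} = 'data';
    my ($real_key) = keys %refhash;      # the array reference itself

    my $kind = ref($plain_key) ? 'ref' : 'string';
    print "$kind vs ", ref($real_key), "\n";     # string vs ARRAY
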
=head1 Data: Misc

=head2 How do I handle binary data correctly?

Perl is binary clean, so this shouldn't be a problem. For example,
this works fine (assuming the files are found):

    if (`cat /vmunix` =~ /gzip/) {
        print "Your kernel is GNU-zip enabled!\n";
    }

On less elegant (read: Byzantine) systems, however, you have
to play tedious games with "text" versus "binary" files. See
L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking
systems are curses out of Microsoft, who seem to be committed to putting
the backward into backward compatibility.

If you're concerned about 8-bit ASCII data, then see L<perllocale>.

If you want to deal with multibyte characters, however, there are
some gotchas. See the section on Regular Expressions.

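When you do need it, binmode() is a single call on the filehandle. This
sketch writes all 256 byte values to an anonymous temporary file
(opening C<undef> with mode C<+E<gt>> asks perl for one) and checks that
they come back unchanged:

    use strict;
    use warnings;

    my $bytes = join '', map { chr } 0 .. 255;    # every byte value once

    open(my $fh, '+>', undef) or die "can't open temp file: $!";
    binmode($fh);                 # no newline or EOF translation

    print {$fh} $bytes;
    seek($fh, 0, 0) or die "can't rewind: $!";
    my $read = do { local $/; <$fh> };           # slurp it all back
    close($fh);

    print $read eq $bytes ? "identical\n" : "corrupted\n";
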
1705=head2 How do I determine whether a scalar is a number/whole/integer/float?
1706
1707Assuming that you don't care about IEEE notations like "NaN" or
1708"Infinity", you probably just want to use a regular expression.
1709
65acb1b1 1710 if (/\D/) { print "has nondigits\n" }
1711 if (/^\d+$/) { print "is a whole number\n" }
1712 if (/^-?\d+$/) { print "is an integer\n" }
1713 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
1714 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
1715 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number" }
1716 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
1717 { print "a C float" }

If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a C<getnum>
wrapper function for more convenient access. This function takes
a string and returns the number it found, or C<undef> for input that
isn't a C float. The C<is_numeric> function is a front end to C<getnum>
if you just want to say, ``Is this a float?''

    sub getnum {
        use POSIX qw(strtod);
        my $str = shift;
        $str =~ s/^\s+//;          # strip leading whitespace
        $str =~ s/\s+$//;          # strip trailing whitespace
        $! = 0;                    # strtod reports overflow via $!
        my($num, $unparsed) = strtod($str);
        if (($str eq '') || ($unparsed != 0) || $!) {
            return undef;
        } else {
            return $num;
        }
    }

    sub is_numeric { defined getnum($_[0]) }

Or you could check out
http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz
instead. The POSIX module (part of the standard Perl distribution)
provides the C<strtol> and C<strtod> functions for converting strings
to longs and doubles, respectively.
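
A short sketch of calling C<POSIX::strtol> directly. In list context it
returns the converted number together with the count of characters it
could not parse, which is how you tell a clean conversion from a partial
one. The variable names are illustrative.

```perl
use strict;
use warnings;
use POSIX qw(strtol);

# In list context, strtol returns (number, count of unparsed characters).
my ($hex,  $hex_left)  = strtol('ff', 16);     # base 16
my ($dec,  $dec_left)  = strtol('12abc', 10);  # stops at the first non-digit
my ($auto, $auto_left) = strtol('0x1f', 0);    # base 0: the prefix picks the base

printf "ff (base 16)    -> %d, %d unparsed\n", $hex,  $hex_left;
printf "12abc (base 10) -> %d, %d unparsed\n", $dec,  $dec_left;
printf "0x1f (base 0)   -> %d, %d unparsed\n", $auto, $auto_left;
```

A nonzero unparsed count, as for C<12abc>, means the string was not
entirely a number, the same test C<getnum> applies with C<strtod>.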

=head2 How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generically, you should consult the FreezeThaw,
Storable, or Class::Eroot modules from CPAN. Here's one example using
Storable's C<store> and C<retrieve> functions:

    use Storable;
    store(\%hash, "filename");

    # later on...
    $href = retrieve("filename");        # by ref
    %hash = %{ retrieve("filename") };   # direct to hash
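
For the DBM route, a minimal sketch: tie a hash to a database file
through AnyDBM_File, and whatever you store is still there the next time
the file is tied. Which DBM library AnyDBM_File picks depends on your
installation; the file name C<counts> and the loop standing in for two
separate program runs are illustrative only.

```perl
use strict;
use warnings;
use Fcntl;                   # O_RDWR, O_CREAT
use AnyDBM_File;
use File::Temp qw(tempdir);

# A scratch directory, so the DBM's own files are cleaned up afterwards.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/counts";    # illustrative database name

my $runs;
for my $run (1 .. 2) {       # each pass stands in for one program run
    tie my %db, 'AnyDBM_File', $file, O_RDWR|O_CREAT, 0644
        or die "Can't tie $file: $!";
    $db{runs} = ($db{runs} || 0) + 1;   # the previous value comes from disk
    $runs = $db{runs};
    untie %db;                          # flush and close the database
}
print "recorded $runs runs\n";
```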

=head2 How do I print out or copy a recursive data structure?

The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures. The Storable module, found on CPAN,
provides a function called C<dclone> that recursively copies its argument.

    use Storable qw(dclone);
    $r2 = dclone($r1);

Here C<$r1> can be a reference to any kind of data structure you'd like.
It will be deeply copied. Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.

    %newhash = %{ dclone(\%oldhash) };
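
Putting the two modules together, here is a short sketch that prints a
self-referencing hash of arrays with Data::Dumper and then deep-copies it
with C<dclone>; changing the copy leaves the original alone. The hash
contents are made up for the example, and C<$Data::Dumper::Sortkeys> is
set only to make the output order stable.

```perl
use strict;
use warnings;
use Data::Dumper;
use Storable qw(dclone);

# A hash of arrays that also refers to itself, making it recursive.
my %oldhash = (fruits => ['apple', 'pear'], veg => ['leek']);
$oldhash{self} = \%oldhash;

# Print it; the self-reference shows up as a pointer back to $VAR1.
$Data::Dumper::Sortkeys = 1;
print Dumper(\%oldhash);

# Deep-copy it, then modify the copy.
my %newhash = %{ dclone(\%oldhash) };
push @{ $newhash{fruits} }, 'plum';

print "original: @{ $oldhash{fruits} }\n";
print "copy:     @{ $newhash{fruits} }\n";
```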

=head2 How do I define methods for every class/object?

Use the UNIVERSAL class (see L<UNIVERSAL>).
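
A minimal sketch: a subroutine defined in package UNIVERSAL is found by
method lookup for every class, after the class's own C<@ISA> chain has
been searched. The method name C<describe> and the C<Dog> class are
hypothetical, chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical method, available to every class and object.
sub UNIVERSAL::describe {
    my $self = shift;
    return ref($self) ? "an object of class " . ref($self)
                      : "the class $self";
}

package Dog;
sub new { return bless {}, shift }

package main;

my $d = Dog->new;
print $d->describe, "\n";     # an object of class Dog
print Dog->describe, "\n";    # the class Dog
```

Because UNIVERSAL is searched last, a class that defines its own
C<describe> overrides this one.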

=head2 How do I verify a credit card checksum?

Get the Business::CreditCard module from CPAN.
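
If you'd like to see what such a check involves, card numbers carry a
Luhn (mod 10) check digit. The C<luhn_ok> function below is a
hypothetical sketch of that digit check only; it is not the
Business::CreditCard interface, and it doesn't validate card type,
length, or issuer prefix.

```perl
use strict;
use warnings;

# Luhn (mod 10) check: starting from the rightmost digit, double every
# second digit, subtract 9 from any result over 9, and sum everything.
# The number passes if the sum is a multiple of 10.
sub luhn_ok {
    my $number = shift;
    $number =~ s/[\s-]//g;               # allow spaces and dashes
    return 0 unless $number =~ /^[0-9]+$/;
    my ($sum, $double) = (0, 0);
    for my $char (reverse split //, $number) {
        my $digit = $char;
        if ($double) {
            $digit *= 2;
            $digit -= 9 if $digit > 9;
        }
        $sum += $digit;
        $double = !$double;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

for my $card ('4111 1111 1111 1111', '4111 1111 1111 1112') {
    printf "%s: %s\n", $card, luhn_ok($card) ? 'passes' : 'fails';
}
```

Passing the checksum only means the number is well formed, not that the
card exists.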

=head2 How do I pack arrays of doubles or floats for XS code?

The kgbpack.c code in the PGPLOT module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
the PDL module from CPAN instead--it makes number-crunching easy.
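
On the Perl side, the built-in C<pack> can lay a list of numbers out as a
contiguous buffer of native C doubles (template C<d>) or floats
(template C<f>). A minimal sketch follows; how your XS code receives the
buffer (for instance as a C<char *> it then treats as a C<double *>) is
an assumption about your own XS, not something shown here.

```perl
use strict;
use warnings;

my @values = (1.5, -2.25, 3.0);

# Native-format C arrays: 'd' packs doubles, 'f' packs floats.
my $doubles = pack('d*', @values);
my $floats  = pack('f*', @values);

printf "%d values: %d bytes as doubles, %d bytes as floats\n",
    scalar @values, length $doubles, length $floats;

# unpack reverses the conversion (floats may lose precision in general).
my @back = unpack('d*', $doubles);
print "round trip: @back\n";
```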

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
All rights reserved.

When included as part of the Standard Version of Perl, or as part of
its complete documentation whether printed or otherwise, this work
may be distributed only under the terms of Perl's Artistic License.
Any distribution of this file or derivatives thereof I<outside>
of that package requires that special arrangements be made with
the copyright holder.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.