=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 1.40 $, $Date: 1999/01/08 04:26:39 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to the manipulation
of data as numbers, dates, strings, arrays, hashes, and miscellaneous
data issues.
10
11=head1 Data: Numbers
12
46fc3d4c 13=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
14
5a964f20 15The infinite set that a mathematician thinks of as the real numbers can
16only be approximate on a computer, since the computer only has a finite
17number of bits to store an infinite number of, um, numbers.
18
46fc3d4c 19Internally, your computer represents floating-point numbers in binary.
92c2ed05 20Floating-point numbers read in from a file or appearing as literals
21in your program are converted from their decimal floating-point
46fc3d4c 22representation (eg, 19.95) to the internal binary representation.
23
24However, 19.95 can't be precisely represented as a binary
25floating-point number, just like 1/3 can't be exactly represented as a
26decimal floating-point number. The computer's binary representation
27of 19.95, therefore, isn't exactly 19.95.
28
When a floating-point number gets printed, the binary floating-point
representation is converted back to decimal. These decimal numbers
are displayed in either the format you specify with printf(), or the
current output format for numbers (see L<perlvar/"$#">) if you use
print. C<$#> has a different default value in Perl5 than it did in
Perl4. Changing C<$#> yourself is deprecated.
35
36This affects B<all> computer languages that represent decimal
37floating-point numbers in binary, not just Perl. Perl provides
38arbitrary-precision decimal numbers with the Math::BigFloat module
39(part of the standard Perl distribution), but mathematical operations
40are consequently slower.
41
42To get rid of the superfluous digits, just use a format (eg,
43C<printf("%.2f", 19.95)>) to get the required precision.
65acb1b1 44See L<perlop/"Floating-point Arithmetic">.
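
A minimal sketch of the effect described above. The variable names are
made up for illustration; the first line of output shows the stored
binary approximation, the second the rounded form you usually want.

```perl
use strict;
use warnings;

my $price = 19.95;

# Ask for far more digits than normal output shows, to expose the
# binary approximation that is actually stored.
printf "%.20f\n", $price;

# For display, round to the precision you actually need.
my $shown = sprintf "%.2f", $price;
print "$shown\n";
```

The stored value is unchanged by sprintf(); only the string in
C<$shown> is rounded.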
46fc3d4c 45
68dc0745 46=head2 Why isn't my octal data interpreted correctly?
47
48Perl only understands octal and hex numbers as such when they occur
49as literals in your program. If they are read in from somewhere and
50assigned, no automatic conversion takes place. You must explicitly
51use oct() or hex() if you want the values converted. oct() interprets
52both hex ("0x350") numbers and octal ones ("0350" or even without the
53leading "0", like "377"), while hex() only converts hexadecimal ones,
54with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
55
56This problem shows up most often when people try using chmod(), mkdir(),
57umask(), or sysopen(), which all want permissions in octal.
58
59 chmod(644, $file); # WRONG -- perl -w catches this
60 chmod(0644, $file); # right
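
As a sketch of the explicit conversion, assume the permission and mask
arrive as strings (say, from a config file). The variable names here
are hypothetical.

```perl
use strict;
use warnings;

# Strings as they might arrive from a file or the command line.
my $mode_from_file = "0644";
my $mask_from_file = "0x1F";

my $mode = oct($mode_from_file);   # octal string to number
my $mask = oct($mask_from_file);   # oct() also understands a 0x prefix
my $byte = hex("ff");              # hex() accepts bare hex digits

printf "%d %d %d\n", $mode, $mask, $byte;
printf "%o\n", $mode;              # back to octal for display
```

The converted C<$mode> can then be handed to chmod() just like the
literal C<0644>.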
61
65acb1b1 62=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
68dc0745 63
92c2ed05 64Remember that int() merely truncates toward 0. For rounding to a
65certain number of digits, sprintf() or printf() is usually the easiest
66route.
67
68 printf("%.3f", 3.1415926535); # prints 3.142
68dc0745 69
70The POSIX module (part of the standard perl distribution) implements
71ceil(), floor(), and a number of other mathematical and trigonometric
72functions.
73
92c2ed05 74 use POSIX;
75 $ceil = ceil(3.5); # 4
76 $floor = floor(3.5); # 3
77
46fc3d4c 78In 5.000 to 5.003 Perls, trigonometry was done in the Math::Complex
79module. With 5.004, the Math::Trig module (part of the standard perl
80distribution) implements the trigonometric functions. Internally it
81uses the Math::Complex module and some functions can break out from
82the real axis into the complex plane, for example the inverse sine of
832.
68dc0745 84
85Rounding in financial applications can have serious implications, and
86the rounding method used should be specified precisely. In these
87cases, it probably pays not to trust whichever system rounding is
88being used by Perl, but to instead implement the rounding function you
89need yourself.
90
65acb1b1 91To see why, notice how you'll still have an issue on half-way-point
92alternation:
93
94 for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
95
96 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
97 0.8 0.8 0.9 0.9 1.0 1.0
98
99Don't blame Perl. It's the same as in C. IEEE says we have to do this.
100Perl numbers whose absolute values are integers under 2**31 (on 32 bit
101machines) will work pretty much like mathematical integers. Other numbers
102are not guaranteed.
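
If you do write your own rounding, something along these lines is one
possibility. It is only a sketch: C<round_half_away> is a hypothetical
helper that implements a single policy (exact halves go away from
zero), and it assumes amounts are kept as integer cents so that binary
fractions never enter the picture.

```perl
use strict;
use warnings;

# Hypothetical helper: round to the nearest integer, with exact
# halves going away from zero (2.5 -> 3, -2.5 -> -3).
sub round_half_away {
    my ($x) = @_;
    return $x < 0 ? -int(-$x + 0.5) : int($x + 0.5);
}

# Keeping money as integer cents sidesteps binary fractions.
my $cents = 1995;                          # 19.95, held as cents
my $third = round_half_away($cents / 3);   # one third of the amount

print "$third\n";
```

Whether halves should go up, away from zero, or to the nearest even
number is a business decision, which is exactly why it should be
spelled out rather than left to the system.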
103
68dc0745 104=head2 How do I convert bits into ints?
105
To turn a string of 1s and 0s like C<10110110> into a scalar containing
its numeric value, use the pack() function (documented in
L<perlfunc/"pack">) and take the ord() of the resulting byte:

    $decimal = ord(pack('B8', '10110110'));
111
112Here's an example of going the other way:
113
114 $binary_string = join('', unpack('B*', "\x29"));
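
A small round trip, as a sketch. Note that pack('B8', ...) by itself
produces a one-byte string, so ord() is used to get at the number.

```perl
use strict;
use warnings;

my $bits = '10110110';

# pack('B8', ...) builds a one-byte string; ord() gives its numeric value.
my $number = ord(pack('B8', $bits));

# Going back: pack the number into a byte, then unpack it as a bit string.
my $round_trip = unpack('B8', pack('C', $number));

print "$number $round_trip\n";
```

This handles a single byte only; longer bit strings need a wider
template.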
115
65acb1b1 116=head2 Why doesn't & work the way I want it to?
117
118The behavior of binary arithmetic operators depends on whether they're
119used on numbers or strings. The operators treat a string as a series
120of bits and work with that (the string C<"3"> is the bit pattern
121C<00110011>). The operators work with the binary form of a number
122(the number C<3> is treated as the bit pattern C<00000011>).
123
So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).
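
The difference can be seen side by side in a sketch like this one. It
assumes the C<bitwise> feature of newer Perls is not enabled, since
that feature changes what C<&> does with strings.

```perl
use strict;
use warnings;

my $num_and = 11 & 3;        # numeric operands: 1011 & 0011
my $str_and = "11" & "3";    # string operands: bytewise on "1" and "3"

# Input read as text stays a string; adding 0 forces the numeric form.
my ($x, $y) = ("11", "3");
my $forced = ($x + 0) & ($y + 0);

print "$num_and $str_and $forced\n";
```

Forcing numeric context with C<+ 0> is the usual cure when the values
came from a file or from user input.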
127
128Most problems with C<&> and C<|> arise because the programmer thinks
129they have a number but really it's a string. The rest arise because
130the programmer says:
131
132 if ("\020\020" & "\101\101") {
133 # ...
134 }
135
136but a string consisting of two null bytes (the result of C<"\020\020"
137& "\101\101">) is not a false value in Perl. You need:
138
    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # reached only when every byte of the result is a null
    }
142
68dc0745 143=head2 How do I multiply matrices?
144
145Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
146or the PDL extension (also available from CPAN).
147
148=head2 How do I perform an operation on a series of integers?
149
150To call a function on each element in an array, and collect the
151results, use:
152
153 @results = map { my_func($_) } @array;
154
155For example:
156
157 @triple = map { 3 * $_ } @single;
158
159To call a function on each element of an array, but ignore the
160results:
161
162 foreach $iterator (@array) {
65acb1b1 163 some_func($iterator);
68dc0745 164 }
165
166To call a function on each integer in a (small) range, you B<can> use:
167
65acb1b1 168 @results = map { some_func($_) } (5 .. 25);
68dc0745 169
170but you should be aware that the C<..> operator creates an array of
171all integers in the range. This can take a lot of memory for large
172ranges. Instead use:
173
174 @results = ();
175 for ($i=5; $i < 500_005; $i++) {
65acb1b1 176 push(@results, some_func($i));
68dc0745 177 }
178
179=head2 How can I output Roman numerals?
180
181Get the http://www.perl.com/CPAN/modules/by-module/Roman module.
182
183=head2 Why aren't my random numbers random?
184
65acb1b1 185If you're using a version of Perl before 5.004, you must call C<srand>
186once at the start of your program to seed the random number generator.
1875.004 and later automatically call C<srand> at the beginning. Don't
188call C<srand> more than once--you make your numbers less random, rather
189than more.
92c2ed05 190
Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-).
http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy of Tom
Phoenix, talks more about this. John von Neumann said, ``Anyone who
attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''
197
198If you want numbers that are more random than C<rand> with C<srand>
199provides, you should also check out the Math::TrulyRandom module from
200CPAN. It uses the imperfections in your system's timer to generate
201random numbers, but this takes quite a while. If you want a better
92c2ed05 202pseudorandom generator than comes with your operating system, look at
65acb1b1 203``Numerical Recipes in C'' at http://www.nr.com/ .
68dc0745 204
205=head1 Data: Dates
206
207=head2 How do I find the week-of-the-year/day-of-the-year?
208
209The day of the year is in the array returned by localtime() (see
210L<perlfunc/"localtime">):
211
212 $day_of_year = (localtime(time()))[7];
213
214or more legibly (in 5.004 or higher):
215
216 use Time::localtime;
217 $day_of_year = localtime(time())->yday;
218
219You can find the week of the year by dividing this by 7:
220
221 $week_of_year = int($day_of_year / 7);
222
92c2ed05 223Of course, this believes that weeks start at zero. The Date::Calc
224module from CPAN has a lot of date calculation functions, including
5e3006a4 225day of the year, week of the year, and so on. Note that not
65acb1b1 226all businesses consider ``week 1'' to be the same; for example,
227American businesses often consider the first week with a Monday
228in it to be Work Week #1, despite ISO 8601, which considers
229WW1 to be the first week with a Thursday in it.
68dc0745 230
92c2ed05 231=head2 How can I compare two dates and find the difference?
68dc0745 232
92c2ed05 233If you're storing your dates as epoch seconds then simply subtract one
234from the other. If you've got a structured date (distinct year, day,
235month, hour, minute, seconds values) then use one of the Date::Manip
236and Date::Calc modules from CPAN.
68dc0745 237
238=head2 How can I take a string and turn it into epoch seconds?
239
240If it's a regular enough string that it always has the same format,
92c2ed05 241you can split it up and pass the parts to C<timelocal> in the standard
242Time::Local module. Otherwise, you should look into the Date::Calc
243and Date::Manip modules from CPAN.
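
For a regular format, the split-and-convert step might look like the
sketch below. The input format ("YYYY-MM-DD hh:mm:ss", taken as UTC)
is an assumption made for the example; C<timegm> is the UTC sibling of
C<timelocal> in the same module.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumed input format: "YYYY-MM-DD hh:mm:ss", interpreted as UTC.
my $stamp = "2001-11-13 01:00:00";

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "unrecognized date: $stamp";

# timegm() wants the month as 0..11; a four-digit year is taken literally.
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);

# With both dates as epoch seconds, the difference is plain subtraction.
my $earlier = timegm(0, 0, 1, 12, 10, 2001);
my $days    = ($epoch - $earlier) / (24 * 60 * 60);

print "$epoch $days\n";
```

Use C<timelocal> instead if the string is in local time.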
68dc0745 244
245=head2 How can I find the Julian Day?
246
Neither Date::Manip nor Date::Calc deals with Julian days. Instead,
there is an example of Julian date calculation that should help you in
http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz
.
68dc0745 251
65acb1b1 252=head2 How do I find yesterday's date?
253
254The C<time()> function returns the current time in seconds since the
255epoch. Take one day off that:
256
257 $yesterday = time() - ( 24 * 60 * 60 );
258
259Then you can pass this to C<localtime()> and get the individual year,
260month, day, hour, minute, seconds values.
261
5a964f20 262=head2 Does Perl have a year 2000 problem? Is Perl Y2K compliant?
68dc0745 263
65acb1b1 264Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
265Y2K compliant (whatever that means). The programmers you've hired to
266use it, however, probably are not.
267
268Long answer: The question belies a true understanding of the issue.
269Perl is just as Y2K compliant as your pencil--no more, and no less.
270Can you use your pencil to write a non-Y2K-compliant memo? Of course
271you can. Is that the pencil's fault? Of course it isn't.
92c2ed05 272
The date and time functions supplied with perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines). The year returned
by these functions when used in a list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number. To avoid the year 2000 problem simply do not treat the year as
a 2-digit number. It isn't.
68dc0745 280
5a964f20 281When gmtime() and localtime() are used in scalar context they return
68dc0745 282a timestamp string that contains a fully-expanded year. For example,
283C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2842001". There's no year 2000 problem here.
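
Both contexts can be seen in a short sketch, using the same epoch
value as above:

```perl
use strict;
use warnings;

my $when = 1005613200;

# List context: element 5 is the year minus 1900, so add 1900 back.
my $year = (gmtime($when))[5] + 1900;

# Scalar context: a ready-made timestamp with a four-digit year.
my $timestamp = gmtime($when);

print "$year\n$timestamp\n";
```

Adding 1900 is the whole trick; pasting "19" in front of the value is
the bug.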
285
5a964f20 286That doesn't mean that Perl can't be used to create non-Y2K compliant
287programs. It can. But so can your pencil. It's the fault of the user,
288not the language. At the risk of inflaming the NRA: ``Perl doesn't
289break Y2K, people do.'' See http://language.perl.com/news/y2k.html for
290a longer exposition.
291
68dc0745 292=head1 Data: Strings
293
294=head2 How do I validate input?
295
296The answer to this question is usually a regular expression, perhaps
5a964f20 297with auxiliary logic. See the more specific questions (numbers, mail
68dc0745 298addresses, etc.) for details.
299
300=head2 How do I unescape a string?
301
92c2ed05 302It depends just what you mean by ``escape''. URL escapes are dealt
303with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
68dc0745 304character are removed with:
305
306 s/\\(.)/$1/g;
307
92c2ed05 308This won't expand C<"\n"> or C<"\t"> or any other special escapes.
68dc0745 309
310=head2 How do I remove consecutive pairs of characters?
311
92c2ed05 312To turn C<"abbcccd"> into C<"abccd">:
68dc0745 313
314 s/(.)\1/$1/g;
315
316=head2 How do I expand function calls in a string?
317
318This is documented in L<perlref>. In general, this is fraught with
319quoting and readability problems, but it is possible. To interpolate
5a964f20 320a subroutine call (in list context) into a string:
68dc0745 321
322 print "My sub returned @{[mysub(1,2,3)]} that time.\n";
323
324If you prefer scalar context, similar chicanery is also useful for
325arbitrary expressions:
326
327 print "That yields ${\($n + 5)} widgets\n";
328
92c2ed05 329Version 5.004 of Perl had a bug that gave list context to the
330expression in C<${...}>, but this is fixed in version 5.005.
331
332See also ``How can I expand variables in text strings?'' in this
333section of the FAQ.
46fc3d4c 334
68dc0745 335=head2 How do I find matching/nesting anything?
336
92c2ed05 337This isn't something that can be done in one regular expression, no
338matter how complicated. To find something between two single
339characters, a pattern like C</x([^x]*)x/> will get the intervening
340bits in $1. For multiple ones, then something more like
341C</alpha(.*?)omega/> would be needed. But none of these deals with
342nested patterns, nor can they. For that you'll have to write a
343parser.
344
345If you are serious about writing a parser, there are a number of
346modules or oddities that will make your life a lot easier. There is
347the CPAN module Parse::RecDescent, the standard module Text::Balanced,
65acb1b1 348the byacc program, the CPAN module Parse::Yapp, and Mark-Jason
349Dominus's excellent I<py> tool at http://www.plover.com/~mjd/perl/py/
350.
68dc0745 351
92c2ed05 352One simple destructive, inside-out approach that you might try is to
353pull out the smallest nesting parts one at a time:
5a964f20 354
    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }
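
Here is the inside-out idea run on a small made-up string. This sketch
drops the C</g> so that each pass removes exactly one innermost pair
and C<$1> can be collected every time round the loop.

```perl
use strict;
use warnings;

my $text = "aBEGINbBEGINcENDdENDe";
my @pieces;

# Each pass removes one innermost BEGIN...END pair; the capture is
# whatever sat between the two markers.
while ($text =~ s/BEGIN((?:(?!BEGIN)(?!END).)*)END//s) {
    push @pieces, $1;
}

print join(",", @pieces), " left: $text\n";
```

The innermost text comes out first, and the outer text is seen with
the inner part already removed, which is why the approach is called
destructive.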
358
65acb1b1 359A more complicated and sneaky approach is to make Perl's regular
360expression engine do it for you. This is courtesy Dean Inada, and
361rather has the nature of an Obfuscated Perl Contest entry, but it
362really does work:
363
364 # $_ contains the string to parse
365 # BEGIN and END are the opening and closing markers for the
366 # nested text.
367
368 @( = ('(','');
369 @) = (')','');
370 ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
371 @$ = (eval{/$re/},$@!~/unmatched/);
372 print join("\n",@$[0..$#$]) if( $$[-1] );
373
68dc0745 374=head2 How do I reverse a string?
375
5a964f20 376Use reverse() in scalar context, as documented in
68dc0745 377L<perlfunc/reverse>.
378
379 $reversed = reverse $string;
380
381=head2 How do I expand tabs in a string?
382
5a964f20 383You can do it yourself:
68dc0745 384
385 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
386
387Or you can just use the Text::Tabs module (part of the standard perl
388distribution).
389
390 use Text::Tabs;
391 @expanded_lines = expand(@lines_with_tabs);
392
393=head2 How do I reformat a paragraph?
394
395Use Text::Wrap (part of the standard perl distribution):
396
397 use Text::Wrap;
398 print wrap("\t", ' ', @paragraphs);
399
92c2ed05 400The paragraphs you give to Text::Wrap should not contain embedded
46fc3d4c 401newlines. Text::Wrap doesn't justify the lines (flush-right).
402
68dc0745 403=head2 How can I access/change the first N letters of a string?
404
405There are many ways. If you just want to grab a copy, use
92c2ed05 406substr():
68dc0745 407
408 $first_byte = substr($a, 0, 1);
409
410If you want to modify part of a string, the simplest way is often to
411use substr() as an lvalue:
412
413 substr($a, 0, 3) = "Tom";
414
92c2ed05 415Although those with a pattern matching kind of thought process will
416likely prefer:
68dc0745 417
418 $a =~ s/^.../Tom/;
419
420=head2 How do I change the Nth occurrence of something?
421
92c2ed05 422You have to keep track of N yourself. For example, let's say you want
423to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
424C<"whosoever"> or C<"whomsoever">, case insensitively.
68dc0745 425
426 $count = 0;
427 s{((whom?)ever)}{
428 ++$count == 5 # is it the 5th?
429 ? "${2}soever" # yes, swap
430 : $1 # renege and leave it there
431 }igex;
432
5a964f20 433In the more general case, you can use the C</g> modifier in a C<while>
434loop, keeping count of matches.
435
436 $WANT = 3;
437 $count = 0;
438 while (/(\w+)\s+fish\b/gi) {
439 if (++$count == $WANT) {
440 print "The third fish is a $1 one.\n";
441 # Warning: don't `last' out of this loop
442 }
443 }
444
92c2ed05 445That prints out: C<"The third fish is a red one."> You can also use a
5a964f20 446repetition count and repeated pattern like this:
447
448 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
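
Both techniques on a sample sentence, as a sketch (the sentence is
made up for the example):

```perl
use strict;
use warnings;

my $text = "One fish two fish red fish blue fish";
my $want = 3;

# Walk the matches with /g and remember the one we want.
my ($count, $nth) = (0);
while ($text =~ /(\w+)\s+fish\b/gi) {
    $nth = $1 if ++$count == $want;
}

# The same answer from a single pattern with a repetition count.
my ($direct) = $text =~ /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

print "$nth $direct\n";
```

The loop runs to the end of the string rather than leaving early, in
keeping with the warning in the example above.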
449
68dc0745 450=head2 How can I count the number of occurrences of a substring within a string?
451
452There are a number of ways, with varying efficiency: If you want a
453count of a certain single character (X) within a string, you can use the
454C<tr///> function like so:
455
    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";
68dc0745 459
460This is fine if you are just looking for a single character. However,
461if you are trying to count multiple character substrings within a
462larger string, C<tr///> won't work. What you can do is wrap a while()
463loop around a global pattern match. For example, let's count negative
464integers:
465
466 $string = "-9 55 48 -2 23 -76 4 14 -44";
467 while ($string =~ /-\d+/g) { $count++ }
468 print "There are $count negative numbers in the string";
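
If the loop feels heavy, the match list can be counted directly. This
is a sketch of that idiom alongside C<tr///>:

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# Assigning the match list to an empty list, and that to a scalar,
# yields the number of matches with no explicit loop.
my $negatives = () = $string =~ /-\d+/g;

# tr/// returns how many characters it matched.
my $spaces = ($string =~ tr/ //);

print "$negatives negative numbers, $spaces spaces\n";
```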
469
470=head2 How do I capitalize all the words on one line?
471
472To make the first letter of each word upper case:
3fe9a6f1 473
68dc0745 474 $line =~ s/\b(\w)/\U$1/g;
475
46fc3d4c 476This has the strange effect of turning "C<don't do it>" into "C<Don'T
477Do It>". Sometimes you might want this, instead (Suggested by Brian
92c2ed05 478Foy):
46fc3d4c 479
    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;
487
68dc0745 488To make the whole line upper case:
3fe9a6f1 489
68dc0745 490 $line = uc($line);
491
492To force each word to be lower case, with the first letter upper case:
3fe9a6f1 493
68dc0745 494 $line =~ s/(\w+)/\u\L$1/g;
495
5a964f20 496You can (and probably should) enable locale awareness of those
497characters by placing a C<use locale> pragma in your program.
92c2ed05 498See L<perllocale> for endless details on locales.
5a964f20 499
This is sometimes referred to as putting something into "title
case", but that's not quite accurate. Consider the proper
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.
504
68dc0745 505=head2 How can I split a [character] delimited string except when inside
506[character]? (Comma-separated files)
507
508Take the example case of trying to split a string that is comma-separated
509into its different fields. (We'll pretend you said comma-separated, not
510comma-delimited, which is different and almost never what you mean.) You
511can't use C<split(/,/)> because you shouldn't split if the comma is inside
512quotes. For example, take a data line like this:
513
514 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
515
516Due to the restriction of the quotes, this is a fairly complex
517problem. Thankfully, we have Jeffrey Friedl, author of a highly
518recommended book on regular expressions, to handle these for us. He
519suggests (assuming your string is contained in $text):
520
521 @new = ();
522 push(@new, $+) while $text =~ m{
523 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
524 | ([^,]+),?
525 | ,
526 }gx;
527 push(@new, undef) if substr($text,-1,1) eq ',';
528
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">). Unescaping them is a task addressed earlier in
this section.
533
68dc0745 534Alternatively, the Text::ParseWords module (part of the standard perl
535distribution) lets you say:
536
537 use Text::ParseWords;
538 @new = quotewords(",", 0, $text);
539
65acb1b1 540There's also a Text::CSV module on CPAN.
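
As a sketch, here is Text::ParseWords applied to the sample line from
above:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A second argument of 0 means: drop the quotes from the returned fields.
my @fields = quotewords(',', 0, $text);

print scalar(@fields), " fields\n";
print "$_\n" for @fields;
```

The commas inside the quoted fields survive, and the empty C<""> field
comes back as an empty string.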
541
68dc0745 542=head2 How do I strip blank space from the beginning/end of a string?
543
5a964f20 544Although the simplest approach would seem to be:
68dc0745 545
546 $string =~ s/^\s*(.*?)\s*$/$1/;
547
This is unnecessarily slow, destructive, and fails with embedded newlines.
It is much faster to do this in two steps:
68dc0745 550
551 $string =~ s/^\s+//;
552 $string =~ s/\s+$//;
553
554Or more nicely written as:
555
556 for ($string) {
557 s/^\s+//;
558 s/\s+$//;
559 }
560
This idiom takes advantage of the C<foreach> loop's aliasing
behavior to factor out common code. You can do this
on several strings at once, or arrays, or even the
values of a hash if you use a slice:
565
566 # trim whitespace in the scalar, the array,
567 # and all the values in the hash
568 foreach ($scalar, @array, @hash{keys %hash}) {
569 s/^\s+//;
570 s/\s+$//;
571 }
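
A runnable sketch of the same loop with some made-up data, to show
that the originals really are modified in place:

```perl
use strict;
use warnings;

my $scalar = "  padded  ";
my @array  = (" a ", "b\t");
my %hash   = (key => "\n value \n");

# Each pass aliases $_ to the real variable, so s/// edits it in place.
foreach ($scalar, @array, @hash{keys %hash}) {
    s/^\s+//;
    s/\s+$//;
}

print "[$scalar] [@array] [$hash{key}]\n";
```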
572
65acb1b1 573=head2 How do I pad a string with blanks or pad a number with zeroes?
574
575(This answer contributed by Uri Guttman)
576
577In the following examples, C<$pad_len> is the length to which you wish
578to pad the string, C<$text> or C<$num> contains the string to be
579padded, and C<$pad_char> contains the padding character. You can use a
580single character string constant instead of the C<$pad_char> variable
581if you know what it is in advance.
582
The simplest method uses the C<sprintf> function. It can pad on the
left or right with blanks and on the left with zeroes.

    # Left padding with blank:
    $padded = sprintf( "%${pad_len}s", $text ) ;

    # Right padding with blank:
    $padded = sprintf( "%-${pad_len}s", $text ) ;

    # Left padding with 0:
    $padded = sprintf( "%0${pad_len}d", $num ) ;
594
595If you need to pad with a character other than blank or zero you can use
596one of the following methods.
597
598These methods generate a pad string with the C<x> operator and
599concatenate that with the original text.
600
601Left and right padding with any character:
602
603 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text ;
604 $padded = $text . $pad_char x ( $pad_len - length( $text ) ) ;
605
606Or you can left or right pad $text directly:
607
608 $text .= $pad_char x ( $pad_len - length( $text ) ) ;
609 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) ) ;
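
The methods can be compared on concrete values in a sketch like this
(the sample values are arbitrary):

```perl
use strict;
use warnings;

my ($text, $num, $pad_len, $pad_char) = ("ab", 42, 5, "*");

my $left   = sprintf("%${pad_len}s",  $text);   # blanks on the left
my $right  = sprintf("%-${pad_len}s", $text);   # blanks on the right
my $zeroes = sprintf("%0${pad_len}d", $num);    # zeroes on the left

# Any other pad character: build the padding with the x operator.
my $stars = $pad_char x ($pad_len - length($text)) . $text;

print "[$left] [$right] [$zeroes] [$stars]\n";
```

If the text is already longer than C<$pad_len>, the C<x> operator
produces no padding and the text is left at its full length; none of
these methods truncates.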
610
68dc0745 611=head2 How do I extract selected columns from a string?
612
613Use substr() or unpack(), both documented in L<perlfunc>.
5a964f20 614If you prefer thinking in terms of columns instead of widths,
615you can use this kind of thing:
616
617 # determine the unpack format needed to split Linux ps output
618 # arguments are cut columns
619 my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
620
621 sub cut2fmt {
622 my(@positions) = @_;
623 my $template = '';
624 my $lastpos = 1;
625 for my $place (@positions) {
626 $template .= "A" . ($place - $lastpos) . " ";
627 $lastpos = $place;
628 }
629 $template .= "A*";
630 return $template;
631 }
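
To see what the function produces, here is a sketch that feeds its
result to unpack(). The record is a made-up fixed-width line, built
with sprintf() so that the column widths are unambiguous.

```perl
use strict;
use warnings;

sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos = $place;
    }
    return $template . "A*";
}

# Hypothetical record: columns 1-7, columns 8-13, then the rest.
my $line = sprintf("%-7s%-6s%s", "root", "1234", "/sbin/init");

my $fmt  = cut2fmt(8, 14);          # cut before columns 8 and 14
my @cols = unpack($fmt, $line);

print "$fmt\n";
print join("|", @cols), "\n";
```

The C<A> template strips trailing blanks from each field, which is
usually what you want for column data.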
68dc0745 632
633=head2 How do I find the soundex value of a string?
634
635Use the standard Text::Soundex module distributed with perl.
636
637=head2 How can I expand variables in text strings?
638
639Let's assume that you have a string like:
640
641 $text = 'this has a $foo in it and a $bar';
5a964f20 642
643If those were both global variables, then this would
644suffice:
645
65acb1b1 646 $text =~ s/\$(\w+)/${$1}/g; # no /e needed
68dc0745 647
5a964f20 648But since they are probably lexicals, or at least, they could
649be, you'd have to do this:
68dc0745 650
651 $text =~ s/(\$\w+)/$1/eeg;
65acb1b1 652 die if $@; # needed /ee, not /e
68dc0745 653
5a964f20 654It's probably better in the general case to treat those
655variables as entries in some special hash. For example:
656
657 %user_defs = (
658 foo => 23,
659 bar => 19,
660 );
661 $text =~ s/\$(\w+)/$user_defs{$1}/g;
68dc0745 662
92c2ed05 663See also ``How do I expand function calls in a string?'' in this section
46fc3d4c 664of the FAQ.
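
A slightly more defensive sketch of the hash approach leaves unknown
names alone instead of replacing them with nothing. The use of
C<exists> and the C</e> modifier here is one possible design, not the
only one.

```perl
use strict;
use warnings;

my $text = 'this has a $foo in it and a $bar';
my %user_defs = (foo => 23, bar => 19);

# Replace only names that exist in the hash; leave anything else
# (for example a stray '$baz') exactly as it was written.
(my $expanded = $text) =~
    s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$expanded\n";
```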
665
68dc0745 666=head2 What's wrong with always quoting "$vars"?
667
668The problem is that those double-quotes force stringification,
669coercing numbers and references into strings, even when you
65acb1b1 670don't want them to be. Think of it this way: double-quote
671expansion is used to produce new strings. If you already
672have a string, why do you need more?
68dc0745 673
674If you get used to writing odd things like these:
675
676 print "$var"; # BAD
677 $new = "$old"; # BAD
678 somefunc("$var"); # BAD
679
680You'll be in trouble. Those should (in 99.8% of the cases) be
681the simpler and more direct:
682
683 print $var;
684 $new = $old;
685 somefunc($var);
686
687Otherwise, besides slowing you down, you're going to break code when
688the thing in the scalar is actually neither a string nor a number, but
689a reference:
690
691 func(\@array);
692 sub func {
693 my $aref = shift;
694 my $oref = "$aref"; # WRONG
695 }
696
697You can also get into subtle problems on those few operations in Perl
698that actually do care about the difference between a string and a
699number, such as the magical C<++> autoincrement operator or the
700syscall() function.
701
5a964f20 702Stringification also destroys arrays.
703
704 @lines = `command`;
705 print "@lines"; # WRONG - extra blanks
706 print @lines; # right
707
65acb1b1 708=head2 Why don't my E<lt>E<lt>HERE documents work?
68dc0745 709
710Check for these three things:
711
712=over 4
713
714=item 1. There must be no space after the << part.
715
716=item 2. There (probably) should be a semicolon at the end.
717
718=item 3. You can't (easily) have any space in front of the tag.
719
720=back
721
5a964f20 722If you want to indent the text in the here document, you
723can do this:
724
725 # all in one
726 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
727 your text
728 goes here
729 HERE_TARGET
730
731But the HERE_TARGET must still be flush against the margin.
732If you want that indented also, you'll have to quote
733in the indentation.
734
735 ($quote = <<' FINIS') =~ s/^\s+//gm;
736 ...we will have peace, when you and all your works have
737 perished--and the works of your dark master to whom you
738 would deliver us. You are a liar, Saruman, and a corrupter
739 of men's hearts. --Theoden in /usr/src/perl/taint.c
740 FINIS
741 $quote =~ s/\s*--/\n--/;
742
743A nice general-purpose fixer-upper function for indented here documents
744follows. It expects to be called with a here document as its argument.
745It looks to see whether each line begins with a common substring, and
746if so, strips that off. Otherwise, it takes the amount of leading
747white space found on the first line and removes that much off each
748subsequent line.
749
750 sub fix {
751 local $_ = shift;
752 my ($white, $leader); # common white space and common leading string
753 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
754 ($white, $leader) = ($2, quotemeta($1));
755 } else {
756 ($white, $leader) = (/^(\s+)/, '');
757 }
758 s/^\s*?$leader(?:$white)?//gm;
759 return $_;
760 }
761
c8db1d39 762This works with leading special strings, dynamically determined:
5a964f20 763
764 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
765 @@@ int
766 @@@ runops() {
767 @@@ SAVEI32(runlevel);
768 @@@ runlevel++;
769 @@@ while ( op = (*op->op_ppaddr)() ) ;
770 @@@ TAINT_NOT;
771 @@@ return 0;
772 @@@ }
773 MAIN_INTERPRETER_LOOP
774
775Or with a fixed amount of leading white space, with remaining
776indentation correctly preserved:
777
778 $poem = fix<<EVER_ON_AND_ON;
779 Now far ahead the Road has gone,
780 And I must follow, if I can,
781 Pursuing it with eager feet,
782 Until it joins some larger way
783 Where many paths and errands meet.
784 And whither then? I cannot say.
785 --Bilbo in /usr/src/perl/pp_ctl.c
786 EVER_ON_AND_ON
787
68dc0745 788=head1 Data: Arrays
789
65acb1b1 790=head2 What is the difference between a list and an array?
791
792An array has a changeable length. A list does not. An array is something
793you can push or pop, while a list is a set of values. Some people make
794the distinction that a list is a value while an array is a variable.
795Subroutines are passed and return lists, you put things into list
796context, you initialize arrays with lists, and you foreach() across
797a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
798in scalar context behave like the number of elements in them, subroutines
799access their arguments through the array C<@_>, push/pop/shift only work
800on arrays.
801
802As a side note, there's no such thing as a list in scalar context.
803When you say
804
805 $scalar = (2, 5, 7, 9);
806
you're using the comma operator in scalar context, so it evaluates the
left hand side, then evaluates and returns the right hand side. This
causes the last value to be returned: 9.
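
By contrast, a real array in scalar context gives its length. A small
sketch of the usual ways to ask for what you actually meant:

```perl
use strict;
use warnings;

my @array = (2, 5, 7, 9);

my $length = @array;             # an array in scalar context: its size
my $last   = $array[-1];         # the last element, asked for directly
my $count  = () = (2, 5, 7, 9);  # number of items in a list

print "$length $last $count\n";
```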
810
68dc0745 811=head2 What is the difference between $array[1] and @array[1]?
812
813The former is a scalar value, the latter an array slice, which makes
814it a list with one (scalar) value. You should use $ when you want a
815scalar value (most of the time) and @ when you want a list with one
816scalar value in it (very, very rarely; nearly never, in fact).
817
818Sometimes it doesn't make a difference, but sometimes it does.
819For example, compare:
820
821 $good[0] = `some program that outputs several lines`;
822
823with
824
825 @bad[0] = `same program that outputs several lines`;
826
827The B<-w> flag will warn you about these matters.
828
829=head2 How can I extract just the unique elements of an array?
830
831There are several possible ways, depending on whether the array is
832ordered and whether you wish to preserve the ordering.
833
834=over 4
835
836=item a) If @in is sorted, and you want @out to be sorted:
5a964f20 837(this assumes all true values in the array)
68dc0745 838
839 $prev = 'nonesuch';
840 @out = grep($_ ne $prev && ($prev = $_), @in);
841
c8db1d39 842This is nice in that it doesn't use much extra memory, simulating
843uniq(1)'s behavior of removing only adjacent duplicates. It's less
844nice in that it won't work with false values like undef, 0, or "";
845"0 but true" is ok, though.
68dc0745 846
847=item b) If you don't know whether @in is sorted:
848
849 undef %saw;
850 @out = grep(!$saw{$_}++, @in);
851
852=item c) Like (b), but @in contains only small integers:
853
854 @out = grep(!$saw[$_]++, @in);
855
856=item d) A way to do (b) without any loops or greps:
857
858 undef %saw;
859 @saw{@in} = ();
860 @out = sort keys %saw; # remove sort if undesired
861
862=item e) Like (d), but @in contains only small positive integers:
863
864 undef @ary;
865 @ary[@in] = @in;
866 @out = @ary;
867
868=back
869
65acb1b1 870But perhaps you should have been using a hash all along, eh?
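
As a rough illustration, here is variant (b) written for C<use strict>,
with made-up sample data. It keeps the first occurrence of each element
and preserves the original order:

```perl
use strict;
use warnings;

my @in = qw(pear apple pear fig apple);

my %saw;                                 # lexical, so no stale entries to undef
my @out = grep { !$saw{$_}++ } @in;      # keeps the first occurrence of each

print "@out\n";                          # pear apple fig
```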
871
5a964f20 872=head2 How can I tell whether a list or array contains a certain element?
873
874Hearing the word "in" is an I<in>dication that you probably should have
875used a hash, not a list or array, to store your data. Hashes are
876designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 877
5a964f20 878That being said, there are several ways to approach this. If you
879are going to make this query many times over arbitrary string values,
880the fastest way is probably to invert the original array and keep an
68dc0745 881associative array lying about whose keys are the first array's values.
882
883 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
884 undef %is_blue;
885 for (@blues) { $is_blue{$_} = 1 }
886
887Now you can check whether $is_blue{$some_color}. It might have been a
888good idea to keep the blues all in a hash in the first place.
889
890If the values are all small integers, you could use a simple indexed
891array. This kind of an array will take up less space:
892
893 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
894 undef @is_tiny_prime;
895 for (@primes) { $is_tiny_prime[$_] = 1; }
896
897Now you check whether $is_tiny_prime[$some_number].
898
899If the values in question are integers instead of strings, you can save
900quite a lot of space by using bit strings instead:
901
902 @articles = ( 1..10, 150..2000, 2017 );
903 undef $read;
7b8d334a 904 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 905
906Now check whether C<vec($read,$n,1)> is true for some C<$n>.
907
908Please do not use
909
910 $is_there = grep $_ eq $whatever, @array;
911
912or worse yet
913
914 $is_there = grep /$whatever/, @array;
915
916These are slow (they check every element even if the first matches),
917inefficient (same reason), and potentially buggy (what if there are
65acb1b1 918regexp characters in $whatever?). If you're only testing once, then
919use:
920
921 $is_there = 0;
922 foreach $elt (@array) {
923 if ($elt eq $elt_to_find) {
924 $is_there = 1;
925 last;
926 }
927 }
928 if ($is_there) { ... }
68dc0745 929
930=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
931
932Use a hash. Here's code to do both and more. It assumes that
933each element is unique in a given array:
934
935 @union = @intersection = @difference = ();
936 %count = ();
937 foreach $element (@array1, @array2) { $count{$element}++ }
938 foreach $element (keys %count) {
939 push @union, $element;
940 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
941 }
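
Here is the same idea run on two small sample arrays (the data is made up
for illustration). Note that the "difference" computed this way is the
symmetric difference: elements that appear in exactly one of the arrays.

```perl
use strict;
use warnings;

my @array1 = (1, 2, 3, 4);
my @array2 = (3, 4, 5);

my (%count, @union, @intersection, @difference);
$count{$_}++ for @array1, @array2;

# Sorting the keys only makes the output order predictable.
for my $element (sort { $a <=> $b } keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

print "union:        @union\n";          # 1 2 3 4 5
print "intersection: @intersection\n";   # 3 4
print "difference:   @difference\n";     # 1 2 5
```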
942
65acb1b1 943=head2 How do I test whether two arrays or hashes are equal?
944
945The following code works for single-level arrays. It uses a stringwise
946comparison, and does not distinguish defined versus undefined empty
947strings. Modify if you have other needs.
948
949 $are_equal = compare_arrays(\@frogs, \@toads);
950
951 sub compare_arrays {
952 my ($first, $second) = @_;
953 local $^W = 0; # silence spurious -w undef complaints
954 return 0 unless @$first == @$second;
955 for (my $i = 0; $i < @$first; $i++) {
956 return 0 if $first->[$i] ne $second->[$i];
957 }
958 return 1;
959 }
960
961For multilevel structures, you may wish to use an approach more
962like this one. It uses the CPAN module FreezeThaw:
963
964 use FreezeThaw qw(cmpStr);
965 @a = @b = ( "this", "that", [ "more", "stuff" ] );
966
967 printf "a and b contain %s arrays\n",
968 cmpStr(\@a, \@b) == 0
969 ? "the same"
970 : "different";
971
972This approach also works for comparing hashes. Here
973we'll demonstrate two different answers:
974
975 use FreezeThaw qw(cmpStr cmpStrHard);
976
977 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
978 $a{EXTRA} = \%b;
979 $b{EXTRA} = \%a;
980
981 printf "a and b contain %s hashes\n",
982 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
983
984 printf "a and b contain %s hashes\n",
985 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
986
987
988The first reports that both of the hashes contain the same data,
989while the second reports that they do not. Which you prefer is left as
990an exercise to the reader.
991
68dc0745 992=head2 How do I find the first array element for which a condition is true?
993
994You can use this if you care about the index:
995
65acb1b1 996 for ($i= 0; $i < @array; $i++) {
68dc0745 997 if ($array[$i] eq "Waldo") {
998 $found_index = $i;
999 last;
1000 }
1001 }
1002
1003Now C<$found_index> has what you want.
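
If the List::Util module is available (an assumption here; it is bundled
with later Perl releases than the ones this FAQ was written for), its
C<first> function expresses the same search more compactly. A minimal
sketch with made-up data:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = qw(Wenda Odlaw Waldo Woof);

# first() returns the first element for which the block is true, or undef.
my $found_index = first { $array[$_] eq "Waldo" } 0 .. $#array;

# Index 0 is a valid answer but a false value, so test with defined().
print defined $found_index ? "found at $found_index\n" : "not found\n";
```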
1004
1005=head2 How do I handle linked lists?
1006
1007In general, you usually don't need a linked list in Perl, since with
1008regular arrays, you can push and pop or shift and unshift at either end,
5a964f20 1009or you can use splice to add and/or remove an arbitrary number of elements at
1010arbitrary points. Pop and shift are both O(1) operations on perl's
1011dynamic arrays. In the absence of shifts and pops, push in general
1012needs to reallocate on the order of every log(N) times, and unshift will
1013need to copy pointers each time.
68dc0745 1014
1015If you really, really wanted, you could use structures as described in
1016L<perldsc> or L<perltoot> and do just what the algorithm book tells you
65acb1b1 1017to do. For example, imagine a list node like this:
1018
1019 $node = {
1020 VALUE => 42,
1021 LINK => undef,
1022 };
1023
1024You could walk the list this way:
1025
1026 print "List: ";
1027 for ($node = $head; $node; $node = $node->{LINK}) {
1028 print $node->{VALUE}, " ";
1029 }
1030 print "\n";
1031
1032You could grow the list this way:
1033
1034 my ($head, $tail);
1035 $tail = append($head, 1); # grow a new head
1036 for $value ( 2 .. 10 ) {
1037 $tail = append($tail, $value);
1038 }
1039
1040 sub append {
1041 my($list, $value) = @_;
1042 my $node = { VALUE => $value };
1043 if ($list) {
1044 $node->{LINK} = $list->{LINK};
1045 $list->{LINK} = $node;
1046 } else {
1047 $_[0] = $node; # replace caller's version
1048 }
1049 return $node;
1050 }
1051
1052But again, Perl's built-in arrays are virtually always good enough.
68dc0745 1053
1054=head2 How do I handle circular lists?
1055
1056Circular lists could be handled in the traditional fashion with linked
1057lists, or you could just do something like this with an array:
1058
1059 unshift(@array, pop(@array)); # the last shall be first
1060 push(@array, shift(@array)); # and vice versa
1061
1062=head2 How do I shuffle an array randomly?
1063
5a964f20 1064Use this:
1065
1066 # fisher_yates_shuffle( \@array ) :
1067 # generate a random permutation of @array in place
1068 sub fisher_yates_shuffle {
1069 my $array = shift;
1070 return unless @$array; # an empty array would make the loop below run away
1071 for (my $i = @$array; --$i; ) {
1072 my $j = int rand ($i+1);
1073 next if $i == $j;
1074 @$array[$i,$j] = @$array[$j,$i];
1075 }
1076 }
1077
1078 fisher_yates_shuffle( \@array ); # permutes @array in place
1079
1080You've probably seen shuffling algorithms that work using splice,
68dc0745 1081randomly picking another element to swap the current element with:
1082
1083 srand;
1084 @new = ();
1085 @old = 1 .. 10; # just a demo
1086 while (@old) {
1087 push(@new, splice(@old, rand @old, 1));
1088 }
1089
5a964f20 1090This is bad because splice is already O(N), and since you do it N times,
1091you just invented a quadratic algorithm; that is, O(N**2). This does
1092not scale, although Perl is so efficient that you probably won't notice
1093this until you have rather largish arrays.
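
Where the List::Util module is installed (an assumption; it ships with
later Perl releases), its C<shuffle> function can be used instead of
writing your own. Unlike the in-place routine above, it returns a new
list and leaves its argument alone:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @array    = (1 .. 10);
my @shuffled = shuffle(@array);    # returns a new list; @array is unchanged

print "@shuffled\n";
```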
68dc0745 1094
1095=head2 How do I process/modify each element of an array?
1096
1097Use C<for>/C<foreach>:
1098
1099 for (@lines) {
5a964f20 1100 s/foo/bar/; # change that word
1101 y/XZ/ZX/; # swap those letters
68dc0745 1102 }
1103
1104Here's another; let's compute spherical volumes:
1105
5a964f20 1106 for (@volumes = @radii) { # @volumes has changed parts
68dc0745 1107 $_ **= 3;
1108 $_ *= (4/3) * 3.14159; # this will be constant folded
1109 }
1110
5a964f20 1111If you want to do the same thing to modify the values of the hash,
1112you may not use the C<values> function, oddly enough. You need a slice:
1113
1114 for $orbit ( @orbits{keys %orbits} ) {
1115 ($orbit **= 3) *= (4/3) * 3.14159;
1116 }
1117
68dc0745 1118=head2 How do I select a random element from an array?
1119
1120Use the rand() function (see L<perlfunc/rand>):
1121
5a964f20 1122 # at the top of the program:
68dc0745 1123 srand; # not needed for 5.004 and later
5a964f20 1124
1125 # then later on
68dc0745 1126 $index = rand @array;
1127 $element = $array[$index];
1128
5a964f20 1129Make sure you I<only call srand once per program, if then>.
1130If you are calling it more than once (such as before each
1131call to rand), you're almost certainly doing something wrong.
1132
68dc0745 1133=head2 How do I permute N elements of a list?
1134
1135Here's a little program that generates all permutations
1136of all the words on each line of input. The algorithm embodied
5a964f20 1137in the permute() function should work on any list:
68dc0745 1138
1139 #!/usr/bin/perl -n
5a964f20 1140 # tsc-permute: permute each word of input
1141 permute([split], []);
1142 sub permute {
1143 my @items = @{ $_[0] };
1144 my @perms = @{ $_[1] };
1145 unless (@items) {
1146 print "@perms\n";
68dc0745 1147 } else {
5a964f20 1148 my(@newitems,@newperms,$i);
1149 foreach $i (0 .. $#items) {
1150 @newitems = @items;
1151 @newperms = @perms;
1152 unshift(@newperms, splice(@newitems, $i, 1));
1153 permute([@newitems], [@newperms]);
68dc0745 1154 }
1155 }
1156 }
1157
1158=head2 How do I sort an array by (anything)?
1159
1160Supply a comparison function to sort() (described in L<perlfunc/sort>):
1161
1162 @list = sort { $a <=> $b } @list;
1163
1164The default sort function is cmp, string comparison, which would
1165sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<E<lt>=E<gt>>, used above, is
1166the numerical comparison operator.
1167
1168If you have a complicated function needed to pull out the part you
1169want to sort on, then don't do it inside the sort function. Pull it
1170out first, because the sort BLOCK can be called many times for the
1171same element. Here's an example of how to pull out the first word
1172after the first number on each item, and then sort those words
1173case-insensitively.
1174
1175 @idx = ();
1176 for (@data) {
1177 ($item) = /\d+\s*(\S+)/;
1178 push @idx, uc($item);
1179 }
1180 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1181
1182Which could also be written this way, using a trick
1183that's come to be known as the Schwartzian Transform:
1184
1185 @sorted = map { $_->[0] }
1186 sort { $a->[1] cmp $b->[1] }
46fc3d4c 1187 map { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] } @data;
68dc0745 1188
1189If you need to sort on several fields, the following paradigm is useful.
1190
1191 @sorted = sort { field1($a) <=> field1($b) ||
1192 field2($a) cmp field2($b) ||
1193 field3($a) cmp field3($b)
1194 } @data;
1195
1196This can be conveniently combined with precalculation of keys as given
1197above.
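
A small worked example of the multi-field paradigm, using hypothetical
records of the form C<[ name, age ]> in place of the field1() and
field2() accessors:

```perl
use strict;
use warnings;

# Hypothetical records: [ name, age ]
my @data = ( [ "bob", 30 ], [ "alice", 30 ], [ "carol", 25 ] );

my @sorted = sort {
    $b->[1] <=> $a->[1]      # age, oldest first
        ||
    $a->[0] cmp $b->[0]      # then name, alphabetically
} @data;

print join(" ", map { $_->[0] } @sorted), "\n";    # alice bob carol
```

The second comparison is only consulted when the first one returns 0,
that is, when the ages are equal.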
1198
1199See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for more about
1200this approach.
1201
1202See also the question below on sorting hashes.
1203
1204=head2 How do I manipulate arrays of bits?
1205
1206Use pack() and unpack(), or else vec() and the bitwise operations.
1207
1208For example, this sets bit N in $vec for each number N in @ints:
1209
1210 $vec = '';
1211 foreach(@ints) { vec($vec,$_,1) = 1 }
1212
1213And here's how, given a vector in $vec, you can
1214get those bits into your @ints array:
1215
1216 sub bitvec_to_list {
1217 my $vec = shift;
1218 my @ints;
1219 # Find null-byte density then select best algorithm
1220 if ($vec =~ tr/\0// / length $vec > 0.95) {
1221 use integer;
1222 my $i;
1223 # This method is faster with mostly null-bytes
1224 while($vec =~ /[^\0]/g ) {
1225 $i = -9 + 8 * pos $vec;
1226 push @ints, $i if vec($vec, ++$i, 1);
1227 push @ints, $i if vec($vec, ++$i, 1);
1228 push @ints, $i if vec($vec, ++$i, 1);
1229 push @ints, $i if vec($vec, ++$i, 1);
1230 push @ints, $i if vec($vec, ++$i, 1);
1231 push @ints, $i if vec($vec, ++$i, 1);
1232 push @ints, $i if vec($vec, ++$i, 1);
1233 push @ints, $i if vec($vec, ++$i, 1);
1234 }
1235 } else {
1236 # This method is a fast general algorithm
1237 use integer;
1238 my $bits = unpack "b*", $vec;
1239 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1240 push @ints, pos $bits while($bits =~ /1/g);
1241 }
1242 return \@ints;
1243 }
1244
1245This method gets faster the more sparse the bit vector is.
1246(Courtesy of Tim Bunce and Winfried Koenig.)
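
For small vectors, a simpler (if slower) round trip is often enough. This
sketch, with made-up numbers, sets the bits and then recovers them by
testing every position:

```perl
use strict;
use warnings;

my @ints = (1, 3, 8);

my $vec = '';
vec($vec, $_, 1) = 1 for @ints;               # set one bit per number

print unpack("b*", $vec), "\n";               # 0101000010000000

# Test every bit position; simple, though slower than the routine above.
my @back = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;
print "@back\n";                              # 1 3 8
```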
1247
65acb1b1 1248Here's a demo on how to use vec():
1249
1250 # vec demo
1251 $vector = "\xff\x0f\xef\xfe";
1252 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1253 unpack("N", $vector), "\n";
1254 $is_set = vec($vector, 23, 1);
1255 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1256 pvec($vector);
1257
1258 set_vec(1,1,1);
1259 set_vec(3,1,1);
1260 set_vec(23,1,1);
1261
1262 set_vec(3,1,3);
1263 set_vec(3,2,3);
1264 set_vec(3,4,3);
1265 set_vec(3,4,7);
1266 set_vec(3,8,3);
1267 set_vec(3,8,7);
1268
1269 set_vec(0,32,17);
1270 set_vec(1,32,17);
1271
1272 sub set_vec {
1273 my ($offset, $width, $value) = @_;
1274 my $vector = '';
1275 vec($vector, $offset, $width) = $value;
1276 print "offset=$offset width=$width value=$value\n";
1277 pvec($vector);
1278 }
1279
1280 sub pvec {
1281 my $vector = shift;
1282 my $bits = unpack("b*", $vector);
1283 my $i = 0;
1284 my $BASE = 8;
1285
1286 print "vector length in bytes: ", length($vector), "\n";
1287 @bytes = unpack("A8" x length($vector), $bits);
1288 print "bits are: @bytes\n\n";
1289 }
1290
68dc0745 1291=head2 Why does defined() return true on empty arrays and hashes?
1292
65acb1b1 1293The short story is that you should probably only use defined on scalars or
1294functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1295in the 5.004 release or later of Perl for more detail.
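
If what you really want to know is whether an aggregate is empty, test
it in boolean (scalar) context instead. A minimal sketch:

```perl
use strict;
use warnings;

my (@array, %hash);

print "array is empty\n" unless @array;    # scalar context: element count
print "hash is empty\n"  unless %hash;     # false when there are no keys

push @array, 0;                            # one element, even though it is false
print "array has ", scalar(@array), " element\n" if @array;
```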
68dc0745 1296
1297=head1 Data: Hashes (Associative Arrays)
1298
1299=head2 How do I process an entire hash?
1300
1301Use the each() function (see L<perlfunc/each>) if you don't care
1302whether it's sorted:
1303
5a964f20 1304 while ( ($key, $value) = each %hash) {
68dc0745 1305 print "$key = $value\n";
1306 }
1307
1308If you want it sorted, you'll have to use foreach() on the result of
1309sorting the keys as shown in an earlier question.
1310
1311=head2 What happens if I add or remove keys from a hash while iterating over it?
1312
1313Don't do that.
1314
1315=head2 How do I look up a hash element by value?
1316
1317Create a reverse hash:
1318
1319 %by_value = reverse %by_key;
1320 $key = $by_value{$value};
1321
1322That's not particularly efficient. It would be more space-efficient
1323to use:
1324
1325 while (($key, $value) = each %by_key) {
1326 $by_value{$value} = $key;
1327 }
1328
1329If your hash could have repeated values, the methods above will only
1330find one of the associated keys. This may or may not worry you.
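
If it does worry you, one option is to map each value to a list of keys.
This is only a sketch with made-up data:

```perl
use strict;
use warnings;

my %by_key = (apple => "red", cherry => "red", banana => "yellow");

my %by_value;
while (my ($key, $value) = each %by_key) {
    push @{ $by_value{$value} }, $key;     # autovivifies the array ref
}

print join(" ", sort @{ $by_value{red} }), "\n";    # apple cherry
```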
1331
1332=head2 How can I know how many entries are in a hash?
1333
1334If you mean how many keys, then all you have to do is
1335take the scalar sense of the keys() function:
1336
3fe9a6f1 1337 $num_keys = scalar keys %hash;
68dc0745 1338
1339In void context, keys() just resets the iterator, which is faster
1340for tied hashes.
1341
1342=head2 How do I sort a hash (optionally by value instead of key)?
1343
1344Internally, hashes are stored in a way that prevents you from imposing
1345an order on key-value pairs. Instead, you have to sort a list of the
1346keys or values:
1347
1348 @keys = sort keys %hash; # sorted by key
1349 @keys = sort {
1350 $hash{$a} cmp $hash{$b}
1351 } keys %hash; # and by value
1352
1353Here we'll do a reverse numeric sort by value, and if two values are
1354identical, sort by length of key, and if that fails, by straight ASCII
1355comparison of the keys (well, possibly modified by your locale -- see
1356L<perllocale>).
1357
1358 @keys = sort {
1359 $hash{$b} <=> $hash{$a}
1360 ||
1361 length($b) <=> length($a)
1362 ||
1363 $a cmp $b
1364 } keys %hash;
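
Running that comparison on a small made-up hash shows each tie-breaker
taking effect in turn:

```perl
use strict;
use warnings;

my %hash = (kiwi => 5, apple => 3, fig => 3, pear => 1, plum => 1);

my @keys = sort {
    $hash{$b} <=> $hash{$a}          # largest value first
        ||
    length($b) <=> length($a)        # then longer keys first
        ||
    $a cmp $b                        # then plain string order
} keys %hash;

print "@keys\n";                     # kiwi apple fig pear plum
```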
1365
1366=head2 How can I always keep my hash sorted?
1367
1368You can look into using the DB_File module and tie() using the
1369$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
5a964f20 1370The Tie::IxHash module from CPAN might also be instructive.
68dc0745 1371
1372=head2 What's the difference between "delete" and "undef" with hashes?
1373
1374Hashes are pairs of scalars: the first is the key, the second is the
1375value. The key will be coerced to a string, although the value can be
1376any kind of scalar: string, number, or reference. If a key C<$key> is
1377present in C<%hash>, C<exists($hash{$key})> will return true. The value for
1378a given key can be C<undef>, in which case C<$hash{$key}> will be
1379C<undef> while C<exists $hash{$key}> will return true. This corresponds to
1380(C<$key>, C<undef>) being in the hash.
1381
1382Pictures help... here's the C<%ary> table:
1383
1384 keys values
1385 +------+------+
1386 | a | 3 |
1387 | x | 7 |
1388 | d | 0 |
1389 | e | 2 |
1390 +------+------+
1391
1392And these conditions hold
1393
1394 $ary{'a'} is true
1395 $ary{'d'} is false
1396 defined $ary{'d'} is true
1397 defined $ary{'a'} is true
1398 exists $ary{'a'} is true (perl5 only)
1399 grep ($_ eq 'a', keys %ary) is true
1400
1401If you now say
1402
1403 undef $ary{'a'}
1404
1405your table now reads:
1406
1407
1408 keys values
1409 +------+------+
1410 | a | undef|
1411 | x | 7 |
1412 | d | 0 |
1413 | e | 2 |
1414 +------+------+
1415
1416and these conditions now hold; changes in caps:
1417
1418 $ary{'a'} is FALSE
1419 $ary{'d'} is false
1420 defined $ary{'d'} is true
1421 defined $ary{'a'} is FALSE
1422 exists $ary{'a'} is true (perl5 only)
1423 grep ($_ eq 'a', keys %ary) is true
1424
1425Notice the last two: you have an undef value, but a defined key!
1426
1427Now, consider this:
1428
1429 delete $ary{'a'}
1430
1431your table now reads:
1432
1433 keys values
1434 +------+------+
1435 | x | 7 |
1436 | d | 0 |
1437 | e | 2 |
1438 +------+------+
1439
1440and these conditions now hold; changes in caps:
1441
1442 $ary{'a'} is false
1443 $ary{'d'} is false
1444 defined $ary{'d'} is true
1445 defined $ary{'a'} is false
1446 exists $ary{'a'} is FALSE (perl5 only)
1447 grep ($_ eq 'a', keys %ary) is FALSE
1448
1449See, the whole entry is gone!
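
The tables above can be checked with a few lines of code. This sketch
uses the same C<%ary> data:

```perl
use strict;
use warnings;

my %ary = (a => 3, x => 7, d => 0, e => 2);

undef $ary{a};                  # key stays, value becomes undef
my $exists_after_undef  = exists  $ary{a} ? 1 : 0;    # 1
my $defined_after_undef = defined $ary{a} ? 1 : 0;    # 0

delete $ary{a};                 # key and value are both removed
my $exists_after_delete = exists $ary{a} ? 1 : 0;     # 0

print "$exists_after_undef $defined_after_undef $exists_after_delete\n";
print scalar(keys %ary), " keys left\n";              # 3 keys left
```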
1450
1451=head2 Why don't my tied hashes make the defined/exists distinction?
1452
1453They may or may not implement the EXISTS() and DEFINED() methods
1454differently. For example, there isn't the concept of undef with hashes
1455that are tied to DBM* files. This means the true/false tables above
1456will give different results when used on such a hash. It also means
1457that exists and defined do the same thing with a DBM* file, and what
1458they end up doing is not what they do with ordinary hashes.
1459
1460=head2 How do I reset an each() operation part-way through?
1461
5a964f20 1462Using C<keys %hash> in scalar context returns the number of keys in
68dc0745 1463the hash I<and> resets the iterator associated with the hash. You may
1464need to do this if you use C<last> to exit a loop early so that when you
46fc3d4c 1465re-enter it, the hash iterator has been reset.
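
A minimal sketch of the reset, with a made-up hash. The hash is not
modified between the calls, so the iterator starts again from the same
first entry:

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);

my ($first_key) = each %hash;     # advance the iterator by one entry
my $count       = keys %hash;     # scalar context: counts keys and resets
my ($again)     = each %hash;     # starts over from the first entry

print "same key\n" if $first_key eq $again;
```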
68dc0745 1466
1467=head2 How can I get the unique keys from two hashes?
1468
1469First you extract the keys from the hashes into arrays, and then solve
1470the problem of uniquifying an array, described above. For example:
1471
1472 %seen = ();
1473 for $element (keys(%foo), keys(%bar)) {
1474 $seen{$element}++;
1475 }
1476 @uniq = keys %seen;
1477
1478Or more succinctly:
1479
1480 @uniq = keys %{{%foo,%bar}};
1481
1482Or if you really want to save space:
1483
1484 %seen = ();
1485 while (defined ($key = each %foo)) {
1486 $seen{$key}++;
1487 }
1488 while (defined ($key = each %bar)) {
1489 $seen{$key}++;
1490 }
1491 @uniq = keys %seen;
1492
1493=head2 How can I store a multidimensional array in a DBM file?
1494
1495Either stringify the structure yourself (no fun), or else
1496get the MLDBM (which uses Data::Dumper) module from CPAN and layer
1497it on top of either DB_File or GDBM_File.
1498
1499=head2 How can I make my hash remember the order I put elements into it?
1500
1501Use the Tie::IxHash module from CPAN.
1502
46fc3d4c 1503 use Tie::IxHash;
1504 tie(%myhash, 'Tie::IxHash');
1505 for ($i=0; $i<20; $i++) {
1506 $myhash{$i} = 2*$i;
1507 }
1508 @keys = keys %myhash;
1509 # @keys = (0,1,2,3,...)
1510
68dc0745 1511=head2 Why does passing a subroutine an undefined element in a hash create it?
1512
1513If you say something like:
1514
1515 somefunc($hash{"nonesuch key here"});
1516
1517Then that element "autovivifies"; that is, it springs into existence
1518whether you store something there or not. That's because functions
1519get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
1520it has to be ready to write it back into the caller's version.
1521
1522This has been fixed as of perl5.004.
1523
1524Normally, merely accessing a key's value for a nonexistent key does
1525I<not> cause that key to be forever there. This is different from
1526awk's behavior.
1527
fc36a67e 1528=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 1529
65acb1b1 1530Usually a hash ref, perhaps like this:
1531
1532 $record = {
1533 NAME => "Jason",
1534 EMPNO => 132,
1535 TITLE => "deputy peon",
1536 AGE => 23,
1537 SALARY => 37_000,
1538 PALS => [ "Norbert", "Rhys", "Phineas"],
1539 };
1540
1541References are documented in L<perlref> and the upcoming L<perlreftut>.
1542Examples of complex data structures are given in L<perldsc> and
1543L<perllol>. Examples of structures and object-oriented classes are
1544in L<perltoot>.
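
As a brief illustration, fields of such a record are reached through the
arrow operator. The record below is a trimmed copy of the one above:

```perl
use strict;
use warnings;

my $record = {
    NAME => "Jason",
    PALS => [ "Norbert", "Rhys", "Phineas" ],
};

print $record->{NAME}, "\n";                    # Jason
print $record->{PALS}[1], "\n";                 # Rhys
print scalar @{ $record->{PALS} }, " pals\n";   # 3 pals
$record->{AGE} = 23;                            # add a field later
```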
68dc0745 1545
1546=head2 How can I use a reference as a hash key?
1547
1548You can't do this directly, but you could use the standard Tie::RefHash
1549module distributed with perl.
1550
1551=head1 Data: Misc
1552
1553=head2 How do I handle binary data correctly?
1554
1555Perl is binary clean, so this shouldn't be a problem. For example,
1556this works fine (assuming the files are found):
1557
1558 if (`cat /vmunix` =~ /gzip/) {
1559 print "Your kernel is GNU-zip enabled!\n";
1560 }
1561
65acb1b1 1562On some legacy systems, however, you have to play tedious games with
1563"text" versus "binary" files. See L<perlfunc/"binmode">, or the upcoming
1564L<perlopentut> manpage.
68dc0745 1565
1566If you're concerned about 8-bit ASCII data, then see L<perllocale>.
1567
54310121 1568If you want to deal with multibyte characters, however, there are
68dc0745 1569some gotchas. See the section on Regular Expressions.
1570
1571=head2 How do I determine whether a scalar is a number/whole/integer/float?
1572
1573Assuming that you don't care about IEEE notations like "NaN" or
1574"Infinity", you probably just want to use a regular expression.
1575
65acb1b1 1576 if (/\D/) { print "has nondigits\n" }
1577 if (/^\d+$/) { print "is a whole number\n" }
1578 if (/^-?\d+$/) { print "is an integer\n" }
1579 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
1580 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
1581 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number" }
1582 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
1583 { print "a C float" }
68dc0745 1584
5a964f20 1585If you're on a POSIX system, Perl supports the C<POSIX::strtod>
1586function. Its semantics are somewhat cumbersome, so here's a C<getnum>
1587wrapper function for more convenient access. This function takes
1588a string and returns the number it found, or C<undef> for input that
1589isn't a C float. The C<is_numeric> function is a front end to C<getnum>
1590if you just want to say, ``Is this a float?''
1591
1592 sub getnum {
1593 use POSIX qw(strtod);
1594 my $str = shift;
1595 $str =~ s/^\s+//;
1596 $str =~ s/\s+$//;
1597 $! = 0;
1598 my($num, $unparsed) = strtod($str);
1599 if (($str eq '') || ($unparsed != 0) || $!) {
1600 return undef;
1601 } else {
1602 return $num;
1603 }
1604 }
1605
1606 sub is_numeric { defined getnum($_[0]) }
1607
68dc0745 1608Or you could check out
1609http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz
1610instead. The POSIX module (part of the standard Perl distribution)
1611provides the C<strtod> and C<strtol> functions for converting strings to doubles
1612and longs, respectively.
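
Another possibility, assuming the Scalar::Util module is installed (it is
bundled with later Perl releases), is its C<looks_like_number> function,
which applies Perl's own idea of what a number is. The sample strings
are made up:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

for my $str ("42", "-3.5", "1e5", "12abc", "") {
    printf "%-7s %s\n", "'$str'",
        looks_like_number($str) ? "looks numeric" : "not numeric";
}
```

Because it follows Perl's rules, it may accept strings such as
"Infinity" that the patterns above deliberately ignore.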
1613
1614=head2 How do I keep persistent data across program calls?
1615
1616For some specific applications, you can use one of the DBM modules.
65acb1b1 1617See L<AnyDBM_File>. More generically, you should consult the FreezeThaw,
1618Storable, or Class::Eroot modules from CPAN. Here's one example using
1619Storable's C<store> and C<retrieve> functions:
1620
1621 use Storable;
1622 store(\%hash, "filename");
1623
1624 # later on...
1625 $href = retrieve("filename"); # by ref
1626 %hash = %{ retrieve("filename") }; # direct to hash
68dc0745 1627
1628=head2 How do I print out or copy a recursive data structure?
1629
65acb1b1 1630The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
1631for printing out data structures. The Storable module, found on CPAN,
1632provides a function called C<dclone> that recursively copies its argument.
1633
1634 use Storable qw(dclone);
1635 $r2 = dclone($r1);
68dc0745 1636
65acb1b1 1637Where $r1 can be a reference to any kind of data structure you'd like.
1638It will be deeply copied. Because C<dclone> takes and returns references,
1639you'd have to add extra punctuation if you had a hash of arrays that
1640you wanted to copy.
68dc0745 1641
65acb1b1 1642 %newhash = %{ dclone(\%oldhash) };
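
To see why the deep copy matters, compare it with a plain assignment.
In this sketch (sample data made up), the shallow copy still shares the
inner array, while the C<dclone> copy does not:

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = (fruits => [ "apple", "pear" ]);
my %shallow = %oldhash;                    # copies only the top level
my %newhash = %{ dclone(\%oldhash) };      # copies the nested array too

push @{ $oldhash{fruits} }, "fig";

print scalar @{ $shallow{fruits} }, "\n";  # 3: shares the inner array
print scalar @{ $newhash{fruits} }, "\n";  # 2: has its own copy
```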
68dc0745 1643
1644=head2 How do I define methods for every class/object?
1645
1646Use the UNIVERSAL class (see L<UNIVERSAL>).
1647
1648=head2 How do I verify a credit card checksum?
1649
1650Get the Business::CreditCard module from CPAN.
1651
65acb1b1 1652=head2 How do I pack arrays of doubles or floats for XS code?
1653
1654The kgbpack.c code in the PGPLOT module on CPAN does just this.
1655If you're doing a lot of float or double processing, consider using
1656the PDL module from CPAN instead--it makes number-crunching easy.
1657
68dc0745 1658=head1 AUTHOR AND COPYRIGHT
1659
65acb1b1 1660Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
5a964f20 1661All rights reserved.
1662
1663When included as part of the Standard Version of Perl, or as part of
1664its complete documentation whether printed or otherwise, this work
1665may be distributed only under the terms of Perl's Artistic License.
1666Any distribution of this file or derivatives thereof I<outside>
1667of that package require that special arrangements be made with
1668copyright holder.
1669
1670Irrespective of its distribution, all code examples in this file
1671are hereby placed into the public domain. You are permitted and
1672encouraged to use this code in your own programs for fun
1673or for profit as you see fit. A simple comment in the code giving
1674credit would be courteous but is not required.
65acb1b1 1675