68dc0745 1=head1 NAME
2
92c2ed05 3perlfaq4 - Data Manipulation ($Revision: 1.25 $, $Date: 1998/07/16 22:49:55 $)
68dc0745 4
5=head1 DESCRIPTION
6
This section of the FAQ answers questions related to the manipulation
of data as numbers, dates, strings, arrays, hashes, and miscellaneous
data issues.
10
11=head1 Data: Numbers
12
46fc3d4c 13=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
14
The infinite set that a mathematician thinks of as the real numbers can
only be approximated on a computer, since the computer only has a finite
number of bits to store an infinite number of, um, numbers.
18
46fc3d4c 19Internally, your computer represents floating-point numbers in binary.
92c2ed05 20Floating-point numbers read in from a file or appearing as literals
21in your program are converted from their decimal floating-point
46fc3d4c 22representation (eg, 19.95) to the internal binary representation.
23
24However, 19.95 can't be precisely represented as a binary
25floating-point number, just like 1/3 can't be exactly represented as a
26decimal floating-point number. The computer's binary representation
27of 19.95, therefore, isn't exactly 19.95.
28
When a floating-point number gets printed, the binary floating-point
representation is converted back to decimal. These decimal numbers
are displayed in either the format you specify with printf(), or the
current output format for numbers (see L<perlvar/"$#">) if you use
print. C<$#> has a different default value in Perl5 than it did in
Perl4. Changing C<$#> yourself is deprecated.
35
36This affects B<all> computer languages that represent decimal
37floating-point numbers in binary, not just Perl. Perl provides
38arbitrary-precision decimal numbers with the Math::BigFloat module
39(part of the standard Perl distribution), but mathematical operations
40are consequently slower.
41
42To get rid of the superfluous digits, just use a format (eg,
43C<printf("%.2f", 19.95)>) to get the required precision.
5a964f20 44See L<perlop/"Floating-point Arithmetic">.
46fc3d4c 45
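A minimal sketch of the effect described above, assuming ordinary IEEE
double-precision floats (the variable names are made up for illustration).
It prints more digits than a double can hold faithfully, rounds for display
with sprintf(), and compares with a tolerance instead of C<==>:

```perl
use strict;
use warnings;

my $price = 19.95;

# Asking for 20 decimal places exposes the binary approximation.
# The exact digits printed may vary from platform to platform.
printf "%.20f\n", $price;

# Round only for display; sprintf returns the formatted string.
my $shown = sprintf "%.2f", $price;
print "$shown\n";

# Compare floating-point results with a tolerance, not with ==.
my $sum          = 0.1 + 0.2;
my $close_enough = abs($sum - 0.3) < 1e-9;
print "0.1 + 0.2 is close enough to 0.3\n" if $close_enough;
```

The tolerance of 1e-9 is an arbitrary choice here; pick one that suits the
magnitude of your own data.
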
68dc0745 46=head2 Why isn't my octal data interpreted correctly?
47
48Perl only understands octal and hex numbers as such when they occur
49as literals in your program. If they are read in from somewhere and
50assigned, no automatic conversion takes place. You must explicitly
51use oct() or hex() if you want the values converted. oct() interprets
52both hex ("0x350") numbers and octal ones ("0350" or even without the
53leading "0", like "377"), while hex() only converts hexadecimal ones,
54with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
55
56This problem shows up most often when people try using chmod(), mkdir(),
57umask(), or sysopen(), which all want permissions in octal.
58
59 chmod(644, $file); # WRONG -- perl -w catches this
60 chmod(0644, $file); # right
61
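As a short illustration of the conversions described above, here is a
sketch that converts strings as they might arrive from a file or from
user input (the sample values are made up):

```perl
use strict;
use warnings;

# Strings read from outside the program are not treated as octal or hex.
my @inputs = ("0x350", "0350", "377");

# oct() handles a leading 0x as hex and everything else as octal.
my @values = map { oct($_) } @inputs;    # 848, 232, 255
print "@values\n";

# hex() accepts hexadecimal with or without the leading 0x.
my $from_hex = hex("ff");                # 255

# A permission string such as "0644" must go through oct() before chmod().
my $mode = oct("0644");                  # 420 decimal, same as the literal 0644
printf "%d %o\n", $mode, $mode;
```
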
5a964f20 62=head2 Does perl have a round function? What about ceil() and floor()? Trig functions?
68dc0745 63
92c2ed05 64Remember that int() merely truncates toward 0. For rounding to a
65certain number of digits, sprintf() or printf() is usually the easiest
66route.
67
68 printf("%.3f", 3.1415926535); # prints 3.142
68dc0745 69
70The POSIX module (part of the standard perl distribution) implements
71ceil(), floor(), and a number of other mathematical and trigonometric
72functions.
73
92c2ed05 74 use POSIX;
75 $ceil = ceil(3.5); # 4
76 $floor = floor(3.5); # 3
77
46fc3d4c 78In 5.000 to 5.003 Perls, trigonometry was done in the Math::Complex
79module. With 5.004, the Math::Trig module (part of the standard perl
80distribution) implements the trigonometric functions. Internally it
81uses the Math::Complex module and some functions can break out from
82the real axis into the complex plane, for example the inverse sine of
832.
68dc0745 84
85Rounding in financial applications can have serious implications, and
86the rounding method used should be specified precisely. In these
87cases, it probably pays not to trust whichever system rounding is
88being used by Perl, but to instead implement the rounding function you
89need yourself.
90
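If you do write your own rounding function, something along these lines
is one possibility. This is only a sketch of one rule, round half away
from zero to the nearest integer, and the name round_half_away() is made
up for this example. Other rules (such as round half to even) need
different code, and for money it is usually safer to keep amounts as
integer cents in the first place.

```perl
use strict;
use warnings;

# Round to the nearest integer; ties go away from zero.
sub round_half_away {
    my ($n) = @_;
    return int($n + ($n < 0 ? -0.5 : 0.5));
}

print round_half_away(2.5),  "\n";    # 3
print round_half_away(-2.5), "\n";    # -3
print round_half_away(2.4),  "\n";    # 2
```
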
91=head2 How do I convert bits into ints?
92
To turn a string of 1s and 0s like C<10110110> into a scalar containing
its numeric value, use the pack() function (documented in
L<perlfunc/"pack">) and take the ordinal of the resulting byte:

    $decimal = ord(pack('B8', '10110110'));    # 182
98
99Here's an example of going the other way:
100
101 $binary_string = join('', unpack('B*', "\x29"));
102
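For bit strings longer than eight digits, a sketch along the following
lines may help. It assumes the value fits in 32 bits, and the helper
names bin2dec() and dec2bin() are invented for this example:

```perl
use strict;
use warnings;

# Convert a string of up to 32 binary digits to a number.
sub bin2dec {
    my ($bits) = @_;
    # Left-pad to 32 digits, pack as a bit string, then read the four
    # bytes back as a big-endian unsigned 32-bit integer.
    return unpack("N", pack("B32", substr("0" x 32 . $bits, -32)));
}

# Convert a non-negative integer below 2**32 to a binary digit string.
sub dec2bin {
    my ($n) = @_;
    my $bits = unpack("B32", pack("N", $n));
    $bits =~ s/^0+(?=\d)//;    # drop leading zeros but keep one digit
    return $bits;
}

print bin2dec('10110110'), "\n";    # 182
print dec2bin(41), "\n";            # 101001
```
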
103=head2 How do I multiply matrices?
104
105Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
106or the PDL extension (also available from CPAN).
107
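If you only need to multiply a couple of small matrices and would rather
not install a module, a plain-Perl sketch such as this may do. It is not
how the modules above work internally; mat_mul() is a made-up name, and
matrices are assumed to be references to arrays of row references:

```perl
use strict;
use warnings;

# Multiply two matrices given as array-of-arrays references.
sub mat_mul {
    my ($x, $y) = @_;
    my $rows  = @$x;
    my $inner = @$y;
    my $cols  = @{ $y->[0] };
    die "dimension mismatch\n" unless @{ $x->[0] } == $inner;

    my @product;
    for my $i (0 .. $rows - 1) {
        for my $j (0 .. $cols - 1) {
            my $sum = 0;
            for my $k (0 .. $inner - 1) {
                $sum += $x->[$i][$k] * $y->[$k][$j];
            }
            $product[$i][$j] = $sum;
        }
    }
    return \@product;
}

my $p = mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]);
print "@$_\n" for @$p;    # 19 22, then 43 50
```
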
108=head2 How do I perform an operation on a series of integers?
109
110To call a function on each element in an array, and collect the
111results, use:
112
113 @results = map { my_func($_) } @array;
114
115For example:
116
117 @triple = map { 3 * $_ } @single;
118
119To call a function on each element of an array, but ignore the
120results:
121
122 foreach $iterator (@array) {
123 &my_func($iterator);
124 }
125
126To call a function on each integer in a (small) range, you B<can> use:
127
128 @results = map { &my_func($_) } (5 .. 25);
129
but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:
133
134 @results = ();
135 for ($i=5; $i < 500_005; $i++) {
136 push(@results, &my_func($i));
137 }
138
139=head2 How can I output Roman numerals?
140
141Get the http://www.perl.com/CPAN/modules/by-module/Roman module.
142
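If installing a module is not an option, the conversion itself is short.
The following is a self-contained sketch, not the Roman module's own
code; to_roman() is a hypothetical name and it only handles 1 through
3999:

```perl
use strict;
use warnings;

# Convert an integer from 1 to 3999 into upper-case Roman numerals.
sub to_roman {
    my ($n) = @_;
    die "out of range\n" unless $n =~ /^\d+$/ && $n >= 1 && $n <= 3999;

    # Values in descending order, including the subtractive pairs.
    my @table = (
        [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
        [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
        [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'],
        [1,    'I'],
    );

    my $roman = '';
    for my $pair (@table) {
        my ($value, $glyph) = @$pair;
        while ($n >= $value) {
            $roman .= $glyph;
            $n     -= $value;
        }
    }
    return $roman;
}

print to_roman(1998), "\n";    # MCMXCVIII
```
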
143=head2 Why aren't my random numbers random?
144
145The short explanation is that you're getting pseudorandom numbers, not
92c2ed05 146random ones, because computers are good at being predictable and bad
147at being random (despite appearances caused by bugs in your programs
148:-). A longer explanation is available on
149http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy of Tom
150Phoenix. John von Neumann said, ``Anyone who attempts to generate
151random numbers by deterministic means is, of course, living in a state
152of sin.''
153
154You should also check out the Math::TrulyRandom module from CPAN. It
155uses the imperfections in your system's timer to generate random
156numbers, but this takes quite a while. If you want a better
157pseudorandom generator than comes with your operating system, look at
158``Numerical Recipes in C'' at http://nr.harvard.edu/nr/bookc.html .
68dc0745 159
160=head1 Data: Dates
161
162=head2 How do I find the week-of-the-year/day-of-the-year?
163
164The day of the year is in the array returned by localtime() (see
165L<perlfunc/"localtime">):
166
167 $day_of_year = (localtime(time()))[7];
168
169or more legibly (in 5.004 or higher):
170
171 use Time::localtime;
172 $day_of_year = localtime(time())->yday;
173
174You can find the week of the year by dividing this by 7:
175
176 $week_of_year = int($day_of_year / 7);
177
92c2ed05 178Of course, this believes that weeks start at zero. The Date::Calc
179module from CPAN has a lot of date calculation functions, including
180day of the year, week of the year, and so on.
68dc0745 181
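Another possibility, if the conventions of strftime() suit you, is the
standard POSIX module. This sketch uses a fixed timestamp (the epoch
itself, via gmtime()) so the results are predictable. Note that C<%j>
counts days from 001, unlike yday which counts from 0, and that C<%U>
numbers weeks starting on Sunday, with week 00 holding the days before
the first Sunday:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# 1970-01-01 00:00:00 UTC, which was a Thursday.
my @t = gmtime(0);

my $day_of_year  = strftime("%j", @t);    # "001"
my $week_of_year = strftime("%U", @t);    # "00"

print "day $day_of_year, week $week_of_year\n";
```
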
92c2ed05 182=head2 How can I compare two dates and find the difference?
68dc0745 183
92c2ed05 184If you're storing your dates as epoch seconds then simply subtract one
185from the other. If you've got a structured date (distinct year, day,
186month, hour, minute, seconds values) then use one of the Date::Manip
187and Date::Calc modules from CPAN.
68dc0745 188
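For the epoch-seconds case, here is a small sketch. It uses timegm()
from the standard Time::Local module to build the two values so that the
result does not depend on your time zone; the dates are arbitrary:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Arguments are sec, min, hour, day of month, month (0-based), year.
my $start = timegm(0, 0, 0, 1, 0, 1999);    # 1999-01-01 00:00:00 UTC
my $end   = timegm(0, 0, 0, 1, 0, 2000);    # 2000-01-01 00:00:00 UTC

my $seconds = $end - $start;
my $days    = int($seconds / 86_400);

print "$days days apart\n";    # 365 days apart
```
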
189=head2 How can I take a string and turn it into epoch seconds?
190
191If it's a regular enough string that it always has the same format,
92c2ed05 192you can split it up and pass the parts to C<timelocal> in the standard
193Time::Local module. Otherwise, you should look into the Date::Calc
194and Date::Manip modules from CPAN.
68dc0745 195
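Here is a sketch of the split-and-convert approach for one made-up
format, C<YYYY/MM/DD hh:mm:ss>. It uses timegm() rather than timelocal()
so that the result is the same everywhere; use timelocal() with the same
arguments if the string is in local time:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $stamp = "1998/07/16 22:49:55";

# Pull the six fields out with a pattern; die if the format is wrong.
my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ m{^(\d{4})/(\d\d)/(\d\d) (\d\d):(\d\d):(\d\d)$}
    or die "unrecognized date: $stamp\n";

# Months are numbered from 0 in this interface.
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);

print "$epoch\n";
```
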
196=head2 How can I find the Julian Day?
197
Neither Date::Manip nor Date::Calc deals with Julian days. Instead,
there is an example of Julian date calculation that should help you at
http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz .
68dc0745 202
5a964f20 203=head2 Does Perl have a year 2000 problem? Is Perl Y2K compliant?
68dc0745 204
92c2ed05 205Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl
206is Y2K compliant.
207
Long answer: Perl is just as Y2K compliant as your pencil--no more,
and no less. The date and time functions supplied with perl (gmtime
and localtime) supply adequate information to determine the year well
beyond 2000 (2038 is when trouble strikes for 32-bit machines). The
year returned by these functions when used in a list context is the
year minus 1900. For years between 1910 and 1999 this I<happens> to
be a 2-digit decimal number. To avoid the year 2000 problem simply do
not treat the year as a 2-digit number. It isn't.
68dc0745 216
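In code, the rule amounts to adding 1900 rather than gluing "19" onto
the front. A short sketch (the variable names are only for
illustration):

```perl
use strict;
use warnings;

# Element 5 of the list is the year minus 1900.
my @now  = localtime();
my $year = $now[5] + 1900;                   # right
# my $year = "19$now[5]";                    # WRONG: gives "19100" in 2000

# Months count from 0, so add 1 when formatting a date.
my $today = sprintf "%04d-%02d-%02d", $year, $now[4] + 1, $now[3];
print "$today\n";

# A fixed timestamp for comparison: 946684800 is 2000-01-01 00:00:00 UTC.
my $y2k = (gmtime(946_684_800))[5] + 1900;
print "$y2k\n";    # 2000
```
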
5a964f20 217When gmtime() and localtime() are used in scalar context they return
68dc0745 218a timestamp string that contains a fully-expanded year. For example,
219C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2202001". There's no year 2000 problem here.
221
5a964f20 222That doesn't mean that Perl can't be used to create non-Y2K compliant
223programs. It can. But so can your pencil. It's the fault of the user,
224not the language. At the risk of inflaming the NRA: ``Perl doesn't
225break Y2K, people do.'' See http://language.perl.com/news/y2k.html for
226a longer exposition.
227
68dc0745 228=head1 Data: Strings
229
230=head2 How do I validate input?
231
232The answer to this question is usually a regular expression, perhaps
5a964f20 233with auxiliary logic. See the more specific questions (numbers, mail
68dc0745 234addresses, etc.) for details.
235
236=head2 How do I unescape a string?
237
92c2ed05 238It depends just what you mean by ``escape''. URL escapes are dealt
239with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
68dc0745 240character are removed with:
241
242 s/\\(.)/$1/g;
243
92c2ed05 244This won't expand C<"\n"> or C<"\t"> or any other special escapes.
68dc0745 245
246=head2 How do I remove consecutive pairs of characters?
247
92c2ed05 248To turn C<"abbcccd"> into C<"abccd">:
68dc0745 249
250 s/(.)\1/$1/g;
251
252=head2 How do I expand function calls in a string?
253
254This is documented in L<perlref>. In general, this is fraught with
255quoting and readability problems, but it is possible. To interpolate
5a964f20 256a subroutine call (in list context) into a string:
68dc0745 257
258 print "My sub returned @{[mysub(1,2,3)]} that time.\n";
259
260If you prefer scalar context, similar chicanery is also useful for
261arbitrary expressions:
262
263 print "That yields ${\($n + 5)} widgets\n";
264
92c2ed05 265Version 5.004 of Perl had a bug that gave list context to the
266expression in C<${...}>, but this is fixed in version 5.005.
267
268See also ``How can I expand variables in text strings?'' in this
269section of the FAQ.
46fc3d4c 270
68dc0745 271=head2 How do I find matching/nesting anything?
272
This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multi-character delimiters, something more like
C</alpha(.*?)omega/> would be needed. But neither of these deals with
nested patterns, nor can they. For that you'll have to write a
parser.
280
281If you are serious about writing a parser, there are a number of
282modules or oddities that will make your life a lot easier. There is
283the CPAN module Parse::RecDescent, the standard module Text::Balanced,
284the byacc program, and Mark-Jason Dominus's excellent I<py> tool at
285http://www.plover.com/~mjd/perl/py/ .
68dc0745 286
92c2ed05 287One simple destructive, inside-out approach that you might try is to
288pull out the smallest nesting parts one at a time:
5a964f20 289
    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }
293
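The same inside-out idea can be sketched with single-character
delimiters. This example, which is an illustration and not a general
parser, removes one innermost parenthesized group per pass and records
its contents:

```perl
use strict;
use warnings;

my $text = "a (b (c) d) e";
my @found;

# Each pass deletes one group that contains no further parentheses,
# so inner groups are always seen before the groups enclosing them.
while ($text =~ s/\(([^()]*)\)//) {
    push @found, $1;
}

print "[$_]\n" for @found;    # [c], then [b  d]
print "left over: [$text]\n";
```

Because the approach is destructive, work on a copy if you still need
the original string.
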
68dc0745 294=head2 How do I reverse a string?
295
5a964f20 296Use reverse() in scalar context, as documented in
68dc0745 297L<perlfunc/reverse>.
298
299 $reversed = reverse $string;
300
301=head2 How do I expand tabs in a string?
302
5a964f20 303You can do it yourself:
68dc0745 304
305 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
306
307Or you can just use the Text::Tabs module (part of the standard perl
308distribution).
309
310 use Text::Tabs;
311 @expanded_lines = expand(@lines_with_tabs);
312
313=head2 How do I reformat a paragraph?
314
315Use Text::Wrap (part of the standard perl distribution):
316
317 use Text::Wrap;
318 print wrap("\t", ' ', @paragraphs);
319
92c2ed05 320The paragraphs you give to Text::Wrap should not contain embedded
46fc3d4c 321newlines. Text::Wrap doesn't justify the lines (flush-right).
322
68dc0745 323=head2 How can I access/change the first N letters of a string?
324
325There are many ways. If you just want to grab a copy, use
92c2ed05 326substr():
68dc0745 327
328 $first_byte = substr($a, 0, 1);
329
330If you want to modify part of a string, the simplest way is often to
331use substr() as an lvalue:
332
333 substr($a, 0, 3) = "Tom";
334
92c2ed05 335Although those with a pattern matching kind of thought process will
336likely prefer:
68dc0745 337
338 $a =~ s/^.../Tom/;
339
340=head2 How do I change the Nth occurrence of something?
341
92c2ed05 342You have to keep track of N yourself. For example, let's say you want
343to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
344C<"whosoever"> or C<"whomsoever">, case insensitively.
68dc0745 345
346 $count = 0;
347 s{((whom?)ever)}{
348 ++$count == 5 # is it the 5th?
349 ? "${2}soever" # yes, swap
350 : $1 # renege and leave it there
351 }igex;
352
5a964f20 353In the more general case, you can use the C</g> modifier in a C<while>
354loop, keeping count of matches.
355
356 $WANT = 3;
357 $count = 0;
358 while (/(\w+)\s+fish\b/gi) {
359 if (++$count == $WANT) {
360 print "The third fish is a $1 one.\n";
361 # Warning: don't `last' out of this loop
362 }
363 }
364
92c2ed05 365That prints out: C<"The third fish is a red one."> You can also use a
5a964f20 366repetition count and repeated pattern like this:
367
368 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
369
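Both techniques can be tried on a sample line. The input string below is
an assumption chosen to fit the output quoted above:

```perl
use strict;
use warnings;

my $line  = "One fish two fish red fish blue fish";
my $want  = 3;
my $count = 0;
my $nth;

# Walk every match with /g and remember the one we want.
while ($line =~ /(\w+)\s+fish\b/gi) {
    $nth = $1 if ++$count == $want;
}
print "The third fish is a $nth one.\n";

# The repetition-count form skips two fish and captures the third.
my ($same) = $line =~ /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
print "Still $same.\n";
```
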
68dc0745 370=head2 How can I count the number of occurrences of a substring within a string?
371
372There are a number of ways, with varying efficiency: If you want a
373count of a certain single character (X) within a string, you can use the
374C<tr///> function like so:
375
    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";
68dc0745 379
380This is fine if you are just looking for a single character. However,
381if you are trying to count multiple character substrings within a
382larger string, C<tr///> won't work. What you can do is wrap a while()
383loop around a global pattern match. For example, let's count negative
384integers:
385
    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
389
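If you would rather not write the loop, a global match in list context
returns all the matches, and assigning that list to an empty list in
scalar context yields their number. A sketch of both kinds of counting:

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# The empty-list assignment in scalar context counts the matches.
my $negatives = () = $string =~ /-\d+/g;
print "There are $negatives negative numbers in the string\n";

# tr/// with an empty replacement counts single characters
# without changing the string.
my $fours = ($string =~ tr/4//);
print "There are $fours 4 characters in the string\n";
```
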
390=head2 How do I capitalize all the words on one line?
391
392To make the first letter of each word upper case:
3fe9a6f1 393
68dc0745 394 $line =~ s/\b(\w)/\U$1/g;
395
46fc3d4c 396This has the strange effect of turning "C<don't do it>" into "C<Don'T
397Do It>". Sometimes you might want this, instead (Suggested by Brian
92c2ed05 398Foy):
46fc3d4c 399
400 $string =~ s/ (
401 (^\w) #at the beginning of the line
402 | # or
403 (\s\w) #preceded by whitespace
404 )
405 /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;
407
68dc0745 408To make the whole line upper case:
3fe9a6f1 409
68dc0745 410 $line = uc($line);
411
412To force each word to be lower case, with the first letter upper case:
3fe9a6f1 413
68dc0745 414 $line =~ s/(\w+)/\u\L$1/g;
415
5a964f20 416You can (and probably should) enable locale awareness of those
417characters by placing a C<use locale> pragma in your program.
92c2ed05 418See L<perllocale> for endless details on locales.
5a964f20 419
68dc0745 420=head2 How can I split a [character] delimited string except when inside
421[character]? (Comma-separated files)
422
423Take the example case of trying to split a string that is comma-separated
424into its different fields. (We'll pretend you said comma-separated, not
425comma-delimited, which is different and almost never what you mean.) You
426can't use C<split(/,/)> because you shouldn't split if the comma is inside
427quotes. For example, take a data line like this:
428
429 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
430
431Due to the restriction of the quotes, this is a fairly complex
432problem. Thankfully, we have Jeffrey Friedl, author of a highly
433recommended book on regular expressions, to handle these for us. He
434suggests (assuming your string is contained in $text):
435
436 @new = ();
437 push(@new, $+) while $text =~ m{
438 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
439 | ([^,]+),?
440 | ,
441 }gx;
442 push(@new, undef) if substr($text,-1,1) eq ',';
443
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">). Unescaping them is a task addressed earlier in
this section.
448
68dc0745 449Alternatively, the Text::ParseWords module (part of the standard perl
450distribution) lets you say:
451
452 use Text::ParseWords;
453 @new = quotewords(",", 0, $text);
454
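Applied to the sample data line from above, that looks like the
following sketch. The second argument of 0 asks quotewords() to drop the
quotes from the fields it returns:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# Split on commas, but not on commas inside quoted fields.
my @new = quotewords(',', 0, $text);

printf "%d fields; the third is '%s'\n", scalar(@new), $new[2];
```

Be aware that quotewords() also treats single quotes as quoting
characters, which matters if your data contains apostrophes.
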
455=head2 How do I strip blank space from the beginning/end of a string?
456
The simplest approach would seem to be:

    $string =~ s/^\s*(.*?)\s*$/$1/;

However, this is unnecessarily slow, destructive, and fails with embedded newlines.
It is much better and faster to do this in two steps:
68dc0745 463
464 $string =~ s/^\s+//;
465 $string =~ s/\s+$//;
466
467Or more nicely written as:
468
469 for ($string) {
470 s/^\s+//;
471 s/\s+$//;
472 }
473
This idiom takes advantage of the C<for(each)> loop's aliasing
behavior to factor out common code. You can do this
on several strings at once, or arrays, or even the
values of a hash if you use a slice:
478
479 # trim whitespace in the scalar, the array,
480 # and all the values in the hash
481 foreach ($scalar, @array, @hash{keys %hash}) {
482 s/^\s+//;
483 s/\s+$//;
484 }
485
68dc0745 486=head2 How do I extract selected columns from a string?
487
488Use substr() or unpack(), both documented in L<perlfunc>.
5a964f20 489If you prefer thinking in terms of columns instead of widths,
490you can use this kind of thing:
491
492 # determine the unpack format needed to split Linux ps output
493 # arguments are cut columns
494 my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
495
496 sub cut2fmt {
497 my(@positions) = @_;
498 my $template = '';
499 my $lastpos = 1;
500 for my $place (@positions) {
501 $template .= "A" . ($place - $lastpos) . " ";
502 $lastpos = $place;
503 }
504 $template .= "A*";
505 return $template;
506 }
68dc0745 507
508=head2 How do I find the soundex value of a string?
509
510Use the standard Text::Soundex module distributed with perl.
511
512=head2 How can I expand variables in text strings?
513
514Let's assume that you have a string like:
515
516 $text = 'this has a $foo in it and a $bar';
5a964f20 517
518If those were both global variables, then this would
519suffice:
520
68dc0745 521 $text =~ s/\$(\w+)/${$1}/g;
522
5a964f20 523But since they are probably lexicals, or at least, they could
524be, you'd have to do this:
68dc0745 525
526 $text =~ s/(\$\w+)/$1/eeg;
5a964f20 527 die if $@; # needed on /ee, not /e
68dc0745 528
5a964f20 529It's probably better in the general case to treat those
530variables as entries in some special hash. For example:
531
532 %user_defs = (
533 foo => 23,
534 bar => 19,
535 );
536 $text =~ s/\$(\w+)/$user_defs{$1}/g;
68dc0745 537
92c2ed05 538See also ``How do I expand function calls in a string?'' in this section
46fc3d4c 539of the FAQ.
540
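A runnable sketch of the hash approach follows. As an extra precaution
that is not part of the code above, it leaves unknown names untouched
instead of replacing them with an empty string:

```perl
use strict;
use warnings;

my %user_defs = (
    foo => 23,
    bar => 19,
);

my $text = 'this has a $foo in it and a $bar';

# Replace $name only when the hash has an entry for it.
$text =~ s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;
print "$text\n";    # this has a 23 in it and a 19

my $other = 'no $such variable';
$other =~ s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;
print "$other\n";   # unchanged
```
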
68dc0745 541=head2 What's wrong with always quoting "$vars"?
542
543The problem is that those double-quotes force stringification,
544coercing numbers and references into strings, even when you
545don't want them to be.
546
547If you get used to writing odd things like these:
548
549 print "$var"; # BAD
550 $new = "$old"; # BAD
551 somefunc("$var"); # BAD
552
553You'll be in trouble. Those should (in 99.8% of the cases) be
554the simpler and more direct:
555
556 print $var;
557 $new = $old;
558 somefunc($var);
559
560Otherwise, besides slowing you down, you're going to break code when
561the thing in the scalar is actually neither a string nor a number, but
562a reference:
563
564 func(\@array);
565 sub func {
566 my $aref = shift;
567 my $oref = "$aref"; # WRONG
568 }
569
570You can also get into subtle problems on those few operations in Perl
571that actually do care about the difference between a string and a
572number, such as the magical C<++> autoincrement operator or the
573syscall() function.
574
5a964f20 575Stringification also destroys arrays.
576
577 @lines = `command`;
578 print "@lines"; # WRONG - extra blanks
579 print @lines; # right
580
68dc0745 581=head2 Why don't my <<HERE documents work?
582
583Check for these three things:
584
585=over 4
586
587=item 1. There must be no space after the << part.
588
589=item 2. There (probably) should be a semicolon at the end.
590
591=item 3. You can't (easily) have any space in front of the tag.
592
593=back
594
5a964f20 595If you want to indent the text in the here document, you
596can do this:
597
598 # all in one
599 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
600 your text
601 goes here
602 HERE_TARGET
603
604But the HERE_TARGET must still be flush against the margin.
605If you want that indented also, you'll have to quote
606in the indentation.
607
608 ($quote = <<' FINIS') =~ s/^\s+//gm;
609 ...we will have peace, when you and all your works have
610 perished--and the works of your dark master to whom you
611 would deliver us. You are a liar, Saruman, and a corrupter
612 of men's hearts. --Theoden in /usr/src/perl/taint.c
613 FINIS
614 $quote =~ s/\s*--/\n--/;
615
616A nice general-purpose fixer-upper function for indented here documents
617follows. It expects to be called with a here document as its argument.
618It looks to see whether each line begins with a common substring, and
619if so, strips that off. Otherwise, it takes the amount of leading
620white space found on the first line and removes that much off each
621subsequent line.
622
623 sub fix {
624 local $_ = shift;
625 my ($white, $leader); # common white space and common leading string
626 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
627 ($white, $leader) = ($2, quotemeta($1));
628 } else {
629 ($white, $leader) = (/^(\s+)/, '');
630 }
631 s/^\s*?$leader(?:$white)?//gm;
632 return $_;
633 }
634
c8db1d39 635This works with leading special strings, dynamically determined:
5a964f20 636
637 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
638 @@@ int
639 @@@ runops() {
640 @@@ SAVEI32(runlevel);
641 @@@ runlevel++;
642 @@@ while ( op = (*op->op_ppaddr)() ) ;
643 @@@ TAINT_NOT;
644 @@@ return 0;
645 @@@ }
646 MAIN_INTERPRETER_LOOP
647
648Or with a fixed amount of leading white space, with remaining
649indentation correctly preserved:
650
651 $poem = fix<<EVER_ON_AND_ON;
652 Now far ahead the Road has gone,
653 And I must follow, if I can,
654 Pursuing it with eager feet,
655 Until it joins some larger way
656 Where many paths and errands meet.
657 And whither then? I cannot say.
658 --Bilbo in /usr/src/perl/pp_ctl.c
659 EVER_ON_AND_ON
660
68dc0745 661=head1 Data: Arrays
662
663=head2 What is the difference between $array[1] and @array[1]?
664
665The former is a scalar value, the latter an array slice, which makes
666it a list with one (scalar) value. You should use $ when you want a
667scalar value (most of the time) and @ when you want a list with one
668scalar value in it (very, very rarely; nearly never, in fact).
669
670Sometimes it doesn't make a difference, but sometimes it does.
671For example, compare:
672
673 $good[0] = `some program that outputs several lines`;
674
675with
676
677 @bad[0] = `same program that outputs several lines`;
678
679The B<-w> flag will warn you about these matters.
680
681=head2 How can I extract just the unique elements of an array?
682
683There are several possible ways, depending on whether the array is
684ordered and whether you wish to preserve the ordering.
685
686=over 4
687
688=item a) If @in is sorted, and you want @out to be sorted:
5a964f20 689(this assumes all true values in the array)
68dc0745 690
691 $prev = 'nonesuch';
692 @out = grep($_ ne $prev && ($prev = $_), @in);
693
c8db1d39 694This is nice in that it doesn't use much extra memory, simulating
695uniq(1)'s behavior of removing only adjacent duplicates. It's less
696nice in that it won't work with false values like undef, 0, or "";
697"0 but true" is ok, though.
68dc0745 698
699=item b) If you don't know whether @in is sorted:
700
701 undef %saw;
702 @out = grep(!$saw{$_}++, @in);
703
704=item c) Like (b), but @in contains only small integers:
705
706 @out = grep(!$saw[$_]++, @in);
707
708=item d) A way to do (b) without any loops or greps:
709
710 undef %saw;
711 @saw{@in} = ();
712 @out = sort keys %saw; # remove sort if undesired
713
714=item e) Like (d), but @in contains only small positive integers:
715
    undef @ary;
    @ary[@in] = @in;
    @out = grep { defined } @ary;    # skip the gaps left by absent values
719
720=back
721
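As a quick check of approach (b), here is a sketch with made-up data. It
keeps the first occurrence of each element, preserves the original
order, and copes with false values such as 0:

```perl
use strict;
use warnings;

my @in = qw(b a b c a 0 0);

my %saw;
# $saw{$_}++ is false only the first time a value is seen.
my @out = grep { !$saw{$_}++ } @in;

print "@out\n";    # b a c 0
```
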
5a964f20 722=head2 How can I tell whether a list or array contains a certain element?
723
724Hearing the word "in" is an I<in>dication that you probably should have
725used a hash, not a list or array, to store your data. Hashes are
726designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 727
5a964f20 728That being said, there are several ways to approach this. If you
729are going to make this query many times over arbitrary string values,
730the fastest way is probably to invert the original array and keep an
68dc0745 731associative array lying about whose keys are the first array's values.
732
733 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
734 undef %is_blue;
735 for (@blues) { $is_blue{$_} = 1 }
736
737Now you can check whether $is_blue{$some_color}. It might have been a
738good idea to keep the blues all in a hash in the first place.
739
740If the values are all small integers, you could use a simple indexed
741array. This kind of an array will take up less space:
742
743 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
744 undef @is_tiny_prime;
745 for (@primes) { $is_tiny_prime[$_] = 1; }
746
747Now you check whether $is_tiny_prime[$some_number].
748
749If the values in question are integers instead of strings, you can save
750quite a lot of space by using bit strings instead:
751
752 @articles = ( 1..10, 150..2000, 2017 );
753 undef $read;
7b8d334a 754 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 755
756Now check whether C<vec($read,$n,1)> is true for some C<$n>.
757
758Please do not use
759
760 $is_there = grep $_ eq $whatever, @array;
761
762or worse yet
763
764 $is_there = grep /$whatever/, @array;
765
766These are slow (checks every element even if the first matches),
767inefficient (same reason), and potentially buggy (what if there are
768regexp characters in $whatever?).
769
770=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
771
772Use a hash. Here's code to do both and more. It assumes that
773each element is unique in a given array:
774
775 @union = @intersection = @difference = ();
776 %count = ();
777 foreach $element (@array1, @array2) { $count{$element}++ }
778 foreach $element (keys %count) {
779 push @union, $element;
780 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
781 }
782
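Here is the same technique as a runnable sketch with sample data. Note
that what lands in @difference is the symmetric difference, that is, the
elements found in exactly one of the two arrays. The keys are sorted
here only to make the output predictable:

```perl
use strict;
use warnings;

my @array1 = (1, 2, 3);
my @array2 = (2, 3, 4);

my (%count, @union, @intersection, @difference);
$count{$_}++ for @array1, @array2;

for my $element (sort { $a <=> $b } keys %count) {
    push @union, $element;
    # Seen twice means it is in both arrays; once means only one of them.
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

print "union: @union\n";                  # 1 2 3 4
print "intersection: @intersection\n";    # 2 3
print "difference: @difference\n";        # 1 4
```
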
783=head2 How do I find the first array element for which a condition is true?
784
785You can use this if you care about the index:
786
787 for ($i=0; $i < @array; $i++) {
788 if ($array[$i] eq "Waldo") {
789 $found_index = $i;
790 last;
791 }
792 }
793
794Now C<$found_index> has what you want.
795
796=head2 How do I handle linked lists?
797
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either end,
or you can use splice to add and/or remove an arbitrary number of elements at
arbitrary points. Both pop and shift are O(1) operations on perl's
dynamic arrays. In the absence of shifts and pops, push in general
needs to reallocate on the order of every log(N) times, and unshift will
need to copy pointers each time.
68dc0745 805
806If you really, really wanted, you could use structures as described in
807L<perldsc> or L<perltoot> and do just what the algorithm book tells you
808to do.
809
810=head2 How do I handle circular lists?
811
812Circular lists could be handled in the traditional fashion with linked
813lists, or you could just do something like this with an array:
814
815 unshift(@array, pop(@array)); # the last shall be first
816 push(@array, shift(@array)); # and vice versa
817
818=head2 How do I shuffle an array randomly?
819
5a964f20 820Use this:
821
822 # fisher_yates_shuffle( \@array ) :
823 # generate a random permutation of @array in place
824 sub fisher_yates_shuffle {
825 my $array = shift;
826 my $i;
827 for ($i = @$array; --$i; ) {
828 my $j = int rand ($i+1);
829 next if $i == $j;
830 @$array[$i,$j] = @$array[$j,$i];
831 }
832 }
833
834 fisher_yates_shuffle( \@array ); # permutes @array in place
835
You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:
838
839 srand;
840 @new = ();
841 @old = 1 .. 10; # just a demo
842 while (@old) {
843 push(@new, splice(@old, rand @old, 1));
844 }
845
5a964f20 846This is bad because splice is already O(N), and since you do it N times,
847you just invented a quadratic algorithm; that is, O(N**2). This does
848not scale, although Perl is so efficient that you probably won't notice
849this until you have rather largish arrays.
68dc0745 850
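On later perls than the ones this answer was written for, you may not
need to write the shuffle yourself: the List::Util module, bundled with
Perl since the 5.8 series, exports a shuffle() function. A sketch,
assuming such a perl:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @deck = (1 .. 10);

# shuffle() returns a new list in random order; @deck is left untouched.
my @shuffled = shuffle(@deck);

print "@shuffled\n";
```
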
851=head2 How do I process/modify each element of an array?
852
853Use C<for>/C<foreach>:
854
855 for (@lines) {
5a964f20 856 s/foo/bar/; # change that word
857 y/XZ/ZX/; # swap those letters
68dc0745 858 }
859
860Here's another; let's compute spherical volumes:
861
5a964f20 862 for (@volumes = @radii) { # @volumes has changed parts
68dc0745 863 $_ **= 3;
864 $_ *= (4/3) * 3.14159; # this will be constant folded
865 }
866
5a964f20 867If you want to do the same thing to modify the values of the hash,
868you may not use the C<values> function, oddly enough. You need a slice:
869
870 for $orbit ( @orbits{keys %orbits} ) {
871 ($orbit **= 3) *= (4/3) * 3.14159;
872 }
873
68dc0745 874=head2 How do I select a random element from an array?
875
876Use the rand() function (see L<perlfunc/rand>):
877
5a964f20 878 # at the top of the program:
68dc0745 879 srand; # not needed for 5.004 and later
5a964f20 880
881 # then later on
68dc0745 882 $index = rand @array;
883 $element = $array[$index];
884
5a964f20 885Make sure you I<only call srand once per program, if then>.
886If you are calling it more than once (such as before each
887call to rand), you're almost certainly doing something wrong.
888
68dc0745 889=head2 How do I permute N elements of a list?
890
891Here's a little program that generates all permutations
892of all the words on each line of input. The algorithm embodied
5a964f20 893in the permute() function should work on any list:
68dc0745 894
895 #!/usr/bin/perl -n
5a964f20 896 # tsc-permute: permute each word of input
897 permute([split], []);
898 sub permute {
899 my @items = @{ $_[0] };
900 my @perms = @{ $_[1] };
901 unless (@items) {
902 print "@perms\n";
68dc0745 903 } else {
5a964f20 904 my(@newitems,@newperms,$i);
905 foreach $i (0 .. $#items) {
906 @newitems = @items;
907 @newperms = @perms;
908 unshift(@newperms, splice(@newitems, $i, 1));
909 permute([@newitems], [@newperms]);
68dc0745 910 }
911 }
912 }
913
914=head2 How do I sort an array by (anything)?
915
916Supply a comparison function to sort() (described in L<perlfunc/sort>):
917
918 @list = sort { $a <=> $b } @list;
919
920The default sort function is cmp, string comparison, which would
921sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<E<lt>=E<gt>>, used above, is
922the numerical comparison operator.
923
924If you have a complicated function needed to pull out the part you
925want to sort on, then don't do it inside the sort function. Pull it
926out first, because the sort BLOCK can be called many times for the
927same element. Here's an example of how to pull out the first word
928after the first number on each item, and then sort those words
929case-insensitively.
930
931 @idx = ();
932 for (@data) {
933 ($item) = /\d+\s*(\S+)/;
934 push @idx, uc($item);
935 }
936 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
937
938Which could also be written this way, using a trick
939that's come to be known as the Schwartzian Transform:
940
    @sorted = map  { $_->[0] }
              sort { $a->[1] cmp $b->[1] }
              map  { [ $_, uc((/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 944
945If you need to sort on several fields, the following paradigm is useful.
946
947 @sorted = sort { field1($a) <=> field1($b) ||
948 field2($a) cmp field2($b) ||
949 field3($a) cmp field3($b)
950 } @data;
951
952This can be conveniently combined with precalculation of keys as given
953above.
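
As a sketch of that combination, the following precomputes two keys
per record and then sorts on both.  The "name:age" record layout is
purely hypothetical:

    use strict;
    use warnings;

    # Hypothetical records of the form "name:age".
    my @data = ("carol:30", "alice:25", "bob:30", "Dave:25");

    my @sorted =
        map  { $_->[0] }                       # recover the original string
        sort { $a->[1] <=> $b->[1]             # age, numerically
                   ||
               $a->[2] cmp $b->[2] }           # then name, ignoring case
        map  { my ($name, $age) = split /:/;   # extract each key only once
               [ $_, $age, lc $name ] }
        @data;

    print "@sorted\n";

Each key is extracted once per record, no matter how many times the
sort block compares that record.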
954
955See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for more about
956this approach.
957
958See also the question below on sorting hashes.
959
960=head2 How do I manipulate arrays of bits?
961
962Use pack() and unpack(), or else vec() and the bitwise operations.
963
For example, this sets bit N of $vec for every number N stored in @ints:
965
966 $vec = '';
967 foreach(@ints) { vec($vec,$_,1) = 1 }
968
969And here's how, given a vector in $vec, you can
970get those bits into your @ints array:
971
972 sub bitvec_to_list {
973 my $vec = shift;
974 my @ints;
975 # Find null-byte density then select best algorithm
976 if ($vec =~ tr/\0// / length $vec > 0.95) {
977 use integer;
978 my $i;
979 # This method is faster with mostly null-bytes
980 while($vec =~ /[^\0]/g ) {
981 $i = -9 + 8 * pos $vec;
982 push @ints, $i if vec($vec, ++$i, 1);
983 push @ints, $i if vec($vec, ++$i, 1);
984 push @ints, $i if vec($vec, ++$i, 1);
985 push @ints, $i if vec($vec, ++$i, 1);
986 push @ints, $i if vec($vec, ++$i, 1);
987 push @ints, $i if vec($vec, ++$i, 1);
988 push @ints, $i if vec($vec, ++$i, 1);
989 push @ints, $i if vec($vec, ++$i, 1);
990 }
991 } else {
992 # This method is a fast general algorithm
993 use integer;
994 my $bits = unpack "b*", $vec;
995 push @ints, 0 if $bits =~ s/^(\d)// && $1;
996 push @ints, pos $bits while($bits =~ /1/g);
997 }
998 return \@ints;
999 }
1000
1001This method gets faster the more sparse the bit vector is.
1002(Courtesy of Tim Bunce and Winfried Koenig.)
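
If speed doesn't matter, the round trip can be checked with a much
plainer scan.  This is only a sketch with made-up bit positions, not
a replacement for the tuned routine above:

    use strict;
    use warnings;

    my @ints = (1, 3, 4, 9);          # sample bit positions
    my $vec  = '';
    vec($vec, $_, 1) = 1 for @ints;

    # unpack "b*" lists the bits of each byte in ascending order,
    # so character N of the string corresponds to bit N of $vec.
    my $bits = unpack "b*", $vec;

    # A plain (unoptimized) scan over every bit position.
    my @back = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;

    print "$bits\n@back\n";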
1003
1004=head2 Why does defined() return true on empty arrays and hashes?
1005
1006See L<perlfunc/defined> in the 5.004 release or later of Perl.
1007
1008=head1 Data: Hashes (Associative Arrays)
1009
1010=head2 How do I process an entire hash?
1011
1012Use the each() function (see L<perlfunc/each>) if you don't care
1013whether it's sorted:
1014
5a964f20 1015 while ( ($key, $value) = each %hash) {
68dc0745 1016 print "$key = $value\n";
1017 }
1018
1019If you want it sorted, you'll have to use foreach() on the result of
1020sorting the keys as shown in an earlier question.
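
A minimal sketch of the sorted variant, using made-up data:

    use strict;
    use warnings;

    my %hash = (pear => 3, apple => 7, fig => 1);   # sample data

    my @lines;
    foreach my $key (sort keys %hash) {
        push @lines, "$key = $hash{$key}";
    }
    print "$_\n" for @lines;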
1021
1022=head2 What happens if I add or remove keys from a hash while iterating over it?
1023
1024Don't do that.
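
One way to stay out of trouble, sketched here with made-up data, is
to loop over the list returned by C<keys> instead of calling C<each>.
That list is built before the loop body runs, so deleting entries
inside the loop doesn't disturb any iterator:

    use strict;
    use warnings;

    my %hash = (a => 1, b => 2, c => 3, d => 4);    # sample data

    # keys() returns a complete list up front, so the hash can be
    # changed freely while we walk that list.
    foreach my $key (keys %hash) {
        delete $hash{$key} if $hash{$key} % 2;      # drop odd values
    }

    print join(",", sort keys %hash), "\n";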
1025
1026=head2 How do I look up a hash element by value?
1027
1028Create a reverse hash:
1029
1030 %by_value = reverse %by_key;
1031 $key = $by_value{$value};
1032
1033That's not particularly efficient. It would be more space-efficient
1034to use:
1035
1036 while (($key, $value) = each %by_key) {
1037 $by_value{$value} = $key;
1038 }
1039
1040If your hash could have repeated values, the methods above will only
1041find one of the associated keys. This may or may not worry you.
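
If it does worry you, one option is to map each value to a list of
keys.  A sketch with made-up data:

    use strict;
    use warnings;

    my %by_key = (apple => 'red', cherry => 'red', banana => 'yellow');

    # Map each value to a reference to the array of keys carrying it.
    my %by_value;
    while (my ($key, $value) = each %by_key) {
        push @{ $by_value{$value} }, $key;
    }

    my @red = sort @{ $by_value{red} };
    print "@red\n";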
1042
1043=head2 How can I know how many entries are in a hash?
1044
1045If you mean how many keys, then all you have to do is
1046take the scalar sense of the keys() function:
1047
3fe9a6f1 1048 $num_keys = scalar keys %hash;
68dc0745 1049
1050In void context it just resets the iterator, which is faster
1051for tied hashes.
1052
1053=head2 How do I sort a hash (optionally by value instead of key)?
1054
1055Internally, hashes are stored in a way that prevents you from imposing
1056an order on key-value pairs. Instead, you have to sort a list of the
1057keys or values:
1058
1059 @keys = sort keys %hash; # sorted by key
1060 @keys = sort {
1061 $hash{$a} cmp $hash{$b}
1062 } keys %hash; # and by value
1063
Here we'll do a reverse numeric sort by value, and if two values are
identical, sort by length of key, and if that fails, by straight ASCII
comparison of the keys (well, possibly modified by your locale -- see
L<perllocale>).
1068
1069 @keys = sort {
1070 $hash{$b} <=> $hash{$a}
1071 ||
1072 length($b) <=> length($a)
1073 ||
1074 $a cmp $b
1075 } keys %hash;
1076
1077=head2 How can I always keep my hash sorted?
1078
1079You can look into using the DB_File module and tie() using the
1080$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
5a964f20 1081The Tie::IxHash module from CPAN might also be instructive.
68dc0745 1082
1083=head2 What's the difference between "delete" and "undef" with hashes?
1084
Hashes are pairs of scalars: the first is the key, the second is the
value.  The key will be coerced to a string, although the value can be
any kind of scalar: string, number, or reference.  If a key C<$key> is
present in the hash C<%ary>, C<exists($ary{$key})> will return true.
The value for a given key can be C<undef>, in which case C<$ary{$key}>
will be C<undef> while C<exists($ary{$key})> will still return true.
This corresponds to (C<$key>, C<undef>) being in the hash.
1092
1093Pictures help... here's the C<%ary> table:
1094
1095 keys values
1096 +------+------+
1097 | a | 3 |
1098 | x | 7 |
1099 | d | 0 |
1100 | e | 2 |
1101 +------+------+
1102
1103And these conditions hold
1104
1105 $ary{'a'} is true
1106 $ary{'d'} is false
1107 defined $ary{'d'} is true
1108 defined $ary{'a'} is true
1109 exists $ary{'a'} is true (perl5 only)
1110 grep ($_ eq 'a', keys %ary) is true
1111
1112If you now say
1113
1114 undef $ary{'a'}
1115
1116your table now reads:
1117
1118
1119 keys values
1120 +------+------+
1121 | a | undef|
1122 | x | 7 |
1123 | d | 0 |
1124 | e | 2 |
1125 +------+------+
1126
1127and these conditions now hold; changes in caps:
1128
1129 $ary{'a'} is FALSE
1130 $ary{'d'} is false
1131 defined $ary{'d'} is true
1132 defined $ary{'a'} is FALSE
1133 exists $ary{'a'} is true (perl5 only)
1134 grep ($_ eq 'a', keys %ary) is true
1135
1136Notice the last two: you have an undef value, but a defined key!
1137
1138Now, consider this:
1139
1140 delete $ary{'a'}
1141
1142your table now reads:
1143
1144 keys values
1145 +------+------+
1146 | x | 7 |
1147 | d | 0 |
1148 | e | 2 |
1149 +------+------+
1150
1151and these conditions now hold; changes in caps:
1152
1153 $ary{'a'} is false
1154 $ary{'d'} is false
1155 defined $ary{'d'} is true
1156 defined $ary{'a'} is false
1157 exists $ary{'a'} is FALSE (perl5 only)
1158 grep ($_ eq 'a', keys %ary) is FALSE
1159
1160See, the whole entry is gone!
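
The tables can be checked with a short script.  This sketch uses the
same C<%ary> contents as the pictures and records each result as 1
or 0:

    use strict;
    use warnings;

    my %ary = (a => 3, x => 7, d => 0, e => 2);

    undef $ary{'a'};                        # the value goes, the key stays
    my $exists_after_undef  = exists  $ary{'a'} ? 1 : 0;
    my $defined_after_undef = defined $ary{'a'} ? 1 : 0;

    delete $ary{'a'};                       # the whole entry goes
    my $exists_after_delete = exists $ary{'a'} ? 1 : 0;

    print "$exists_after_undef $defined_after_undef $exists_after_delete\n";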
1161
1162=head2 Why don't my tied hashes make the defined/exists distinction?
1163
The class behind the tie may or may not distinguish them: C<exists>
calls its EXISTS() method, while C<defined> tests whatever FETCH()
returns.  For example, there isn't the concept of undef with hashes
that are tied to DBM* files.  This means the true/false tables above
will give different results when used on such a hash.  It also means
that exists and defined do the same thing with a DBM* file, and what
they end up doing is not what they do with ordinary hashes.
1170
1171=head2 How do I reset an each() operation part-way through?
1172
5a964f20 1173Using C<keys %hash> in scalar context returns the number of keys in
68dc0745 1174the hash I<and> resets the iterator associated with the hash. You may
1175need to do this if you use C<last> to exit a loop early so that when you
46fc3d4c 1176re-enter it, the hash iterator has been reset.
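
A small sketch with made-up data; calling C<keys> in void context is
enough if you don't need the count:

    use strict;
    use warnings;

    my %hash = (one => 1, two => 2, three => 3);    # sample data

    my ($first) = each %hash;       # start iterating, then stop early
    keys %hash;                     # void context: just reset the iterator
    my ($again) = each %hash;       # starts from the beginning again

    print "same key\n" if $first eq $again;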
68dc0745 1177
1178=head2 How can I get the unique keys from two hashes?
1179
First extract the keys from the hashes into lists, and then solve the
problem of removing duplicates from a list, described above.  For example:
1182
1183 %seen = ();
1184 for $element (keys(%foo), keys(%bar)) {
1185 $seen{$element}++;
1186 }
1187 @uniq = keys %seen;
1188
1189Or more succinctly:
1190
1191 @uniq = keys %{{%foo,%bar}};
1192
1193Or if you really want to save space:
1194
1195 %seen = ();
1196 while (defined ($key = each %foo)) {
1197 $seen{$key}++;
1198 }
1199 while (defined ($key = each %bar)) {
1200 $seen{$key}++;
1201 }
1202 @uniq = keys %seen;
1203
1204=head2 How can I store a multidimensional array in a DBM file?
1205
1206Either stringify the structure yourself (no fun), or else
1207get the MLDBM (which uses Data::Dumper) module from CPAN and layer
1208it on top of either DB_File or GDBM_File.
1209
1210=head2 How can I make my hash remember the order I put elements into it?
1211
Use the Tie::IxHash module from CPAN.
1213
    use Tie::IxHash;
    tie(%myhash, 'Tie::IxHash');
    for ($i=0; $i<20; $i++) {
        $myhash{$i} = 2*$i;
    }
    @keys = keys %myhash;
    # @keys = (0,1,2,3,...)
1221
68dc0745 1222=head2 Why does passing a subroutine an undefined element in a hash create it?
1223
1224If you say something like:
1225
1226 somefunc($hash{"nonesuch key here"});
1227
1228Then that element "autovivifies"; that is, it springs into existence
1229whether you store something there or not. That's because functions
1230get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
1231it has to be ready to write it back into the caller's version.
1232
1233This has been fixed as of perl5.004.
1234
Normally, merely accessing a key's value for a nonexistent key does
I<not> cause that key to be forever there.  This is different from
awk's behavior.
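
A quick way to convince yourself, sketched with a made-up hash:

    use strict;
    use warnings;

    my %hash = (present => 1);

    my $value   = $hash{absent};                # a plain read of a missing key
    my $created = exists $hash{absent} ? 1 : 0;

    print "keys: @{[ sort keys %hash ]}\n";

Here C<$created> ends up 0: the read left the hash alone.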
1238
fc36a67e 1239=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 1240
1241Use references (documented in L<perlref>). Examples of complex data
1242structures are given in L<perldsc> and L<perllol>. Examples of
1243structures and object-oriented classes are in L<perltoot>.
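
As a taste of what those documents cover, here is a minimal sketch of
a record and an array of records.  The field names are invented for
the example:

    use strict;
    use warnings;

    # A "struct" is commonly modelled as a reference to an anonymous hash.
    my $rec = {
        name  => 'Camel',
        legs  => 4,
        foods => [ 'grass', 'dates' ],      # a nested array
    };

    # An array of such records.
    my @herd = ( $rec, { name => 'Llama', legs => 4, foods => [ 'hay' ] } );

    my $second_food = $rec->{foods}[1];
    my @names       = map { $_->{name} } @herd;

    print "$second_food; @names\n";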
1244
1245=head2 How can I use a reference as a hash key?
1246
You can't do this directly, but you could use the standard Tie::RefHash
module distributed with perl.
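
A minimal sketch of the tied approach; the C<%color_of> hash and its
contents are made up:

    use strict;
    use warnings;
    use Tie::RefHash;

    tie my %color_of, 'Tie::RefHash';

    my $point = [ 1, 2 ];                   # any reference will do
    $color_of{$point} = 'red';

    # With an ordinary hash the keys would come back as plain strings
    # such as "ARRAY(0x...)"; here they are still usable references.
    my ($key) = keys %color_of;
    print "key is a ", ref($key), " holding @$key\n";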
1249
1250=head1 Data: Misc
1251
1252=head2 How do I handle binary data correctly?
1253
1254Perl is binary clean, so this shouldn't be a problem. For example,
1255this works fine (assuming the files are found):
1256
1257 if (`cat /vmunix` =~ /gzip/) {
1258 print "Your kernel is GNU-zip enabled!\n";
1259 }
1260
1261On some systems, however, you have to play tedious games with "text"
1262versus "binary" files. See L<perlfunc/"binmode">.
1263
1264If you're concerned about 8-bit ASCII data, then see L<perllocale>.
1265
54310121 1266If you want to deal with multibyte characters, however, there are
68dc0745 1267some gotchas. See the section on Regular Expressions.
1268
1269=head2 How do I determine whether a scalar is a number/whole/integer/float?
1270
1271Assuming that you don't care about IEEE notations like "NaN" or
1272"Infinity", you probably just want to use a regular expression.
1273
1274 warn "has nondigits" if /\D/;
5a964f20 1275 warn "not a natural number" unless /^\d+$/; # rejects -3
1276 warn "not an integer" unless /^-?\d+$/; # rejects +3
54310121 1277 warn "not an integer" unless /^[+-]?\d+$/;
68dc0745 1278 warn "not a decimal number" unless /^-?\d+\.?\d*$/; # rejects .2
1279 warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
1280 warn "not a C float"
1281 unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
1282
If you're on a POSIX system, Perl supports the C<POSIX::strtod>
1284function. Its semantics are somewhat cumbersome, so here's a C<getnum>
1285wrapper function for more convenient access. This function takes
1286a string and returns the number it found, or C<undef> for input that
1287isn't a C float. The C<is_numeric> function is a front end to C<getnum>
1288if you just want to say, ``Is this a float?''
1289
1290 sub getnum {
1291 use POSIX qw(strtod);
1292 my $str = shift;
1293 $str =~ s/^\s+//;
1294 $str =~ s/\s+$//;
1295 $! = 0;
1296 my($num, $unparsed) = strtod($str);
1297 if (($str eq '') || ($unparsed != 0) || $!) {
1298 return undef;
1299 } else {
1300 return $num;
1301 }
1302 }
1303
    sub is_numeric { defined getnum($_[0]) }
1305
Or you could check out
http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz
instead.  The POSIX module (part of the standard Perl distribution)
provides the C<strtod> and C<strtol> functions for converting strings
to doubles and longs, respectively.
1311
1312=head2 How do I keep persistent data across program calls?
1313
1314For some specific applications, you can use one of the DBM modules.
1315See L<AnyDBM_File>. More generically, you should consult the
1316FreezeThaw, Storable, or Class::Eroot modules from CPAN.
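
For instance, a sketch using Storable's C<store> and C<retrieve>,
assuming the module is installed.  The data is invented, and a
temporary file stands in for the real state file:

    use strict;
    use warnings;
    use Storable   qw(store retrieve);
    use File::Temp qw(tempfile);

    my %config = (name => 'demo', sizes => [ 1, 2, 3 ]);

    # A temporary file stands in for wherever the program keeps its state.
    my ($fh, $file) = tempfile(UNLINK => 1);
    close $fh;

    store(\%config, $file);             # one run would save ...
    my $saved = retrieve($file);        # ... and a later run would load

    print "$saved->{name}: @{ $saved->{sizes} }\n";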
1317
1318=head2 How do I print out or copy a recursive data structure?
1319
1320The Data::Dumper module on CPAN is nice for printing out
1321data structures, and FreezeThaw for copying them. For example:
1322
1323 use FreezeThaw qw(freeze thaw);
1324 $new = thaw freeze $old;
1325
1326Where $old can be (a reference to) any kind of data structure you'd like.
1327It will be deeply copied.
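
For the printing half, here is a small sketch with Data::Dumper,
assuming it is installed.  The self-referential C<$node> is made up
to show how a cycle is reported:

    use strict;
    use warnings;
    use Data::Dumper;

    my $node = { name => 'root' };
    $node->{self} = $node;              # the structure now contains itself

    local $Data::Dumper::Indent = 1;    # a more compact layout
    my $text = Dumper($node);
    print $text;

The dump refers back to C<$VAR1> where the cycle occurs instead of
recursing forever.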
1328
1329=head2 How do I define methods for every class/object?
1330
1331Use the UNIVERSAL class (see L<UNIVERSAL>).
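
A minimal sketch; the C<describe> method and the C<Camel> class are
invented for the example:

    use strict;
    use warnings;

    # Every class implicitly inherits from UNIVERSAL, so a sub defined
    # there can be called as a method on any object or class name.
    sub UNIVERSAL::describe {
        my $self = shift;
        return "I am a " . (ref($self) || $self);
    }

    package Camel;
    sub new { return bless {}, shift }

    package main;
    my $from_object = Camel->new->describe;
    my $from_class  = Camel->describe;
    print "$from_object\n$from_class\n";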
1332
1333=head2 How do I verify a credit card checksum?
1334
1335Get the Business::CreditCard module from CPAN.
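
If you only want to see what such a check involves, card numbers are
conventionally validated with the Luhn (mod 10) checksum.  The
following is a sketch of that algorithm, not the module's code;
C<luhn_ok> is a hypothetical helper name:

    use strict;
    use warnings;

    # luhn_ok() is a hypothetical helper, not part of Business::CreditCard.
    sub luhn_ok {
        my $number = shift;
        $number =~ tr/0-9//cd;              # keep digits only
        return 0 unless length $number;

        my ($sum, $double) = (0, 0);
        foreach my $digit (reverse split //, $number) {
            # Starting from the right, double every second digit.
            my $n = $double ? $digit * 2 : $digit;
            $n -= 9 if $n > 9;              # same as adding the two digits
            $sum   += $n;
            $double = !$double;
        }
        return $sum % 10 == 0 ? 1 : 0;
    }

    print luhn_ok('4111 1111 1111 1111') ? "ok\n" : "bad\n";

A passing checksum only says the number is well formed, not that the
account exists.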
1336
1337=head1 AUTHOR AND COPYRIGHT
1338
5a964f20 1339Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
1340All rights reserved.
1341
1342When included as part of the Standard Version of Perl, or as part of
1343its complete documentation whether printed or otherwise, this work
1344may be distributed only under the terms of Perl's Artistic License.
1345Any distribution of this file or derivatives thereof I<outside>
1346of that package require that special arrangements be made with
1347copyright holder.
1348
1349Irrespective of its distribution, all code examples in this file
1350are hereby placed into the public domain. You are permitted and
1351encouraged to use this code in your own programs for fun
1352or for profit as you see fit. A simple comment in the code giving
1353credit would be courteous but is not required.