=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 1.49 $, $Date: 1999/05/23 20:37:49 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to the manipulation
of data as numbers, dates, strings, arrays, hashes, and miscellaneous
data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

The infinite set that a mathematician thinks of as the real numbers can
only be approximated on a computer, since the computer only has a finite
number of bits to store an infinite number of, um, numbers.

Internally, your computer represents floating-point numbers in binary.
Floating-point numbers read in from a file or appearing as literals
in your program are converted from their decimal floating-point
representation (eg, 19.95) to an internal binary representation.

However, 19.95 can't be precisely represented as a binary
floating-point number, just like 1/3 can't be exactly represented as a
decimal floating-point number.  The computer's binary representation
of 19.95, therefore, isn't exactly 19.95.

When a floating-point number gets printed, the binary floating-point
representation is converted back to decimal.  These decimal numbers
are displayed in either the format you specify with printf(), or the
current output format for numbers.  (See L<perlvar/"$#"> if you use
print.  C<$#> has a different default value in Perl5 than it did in
Perl4.  Changing C<$#> yourself is deprecated.)

This affects B<all> computer languages that represent decimal
floating-point numbers in binary, not just Perl.  Perl provides
arbitrary-precision decimal numbers with the Math::BigFloat module
(part of the standard Perl distribution), but mathematical operations
are consequently slower.

If precision is important, such as when dealing with money, it's good
to work with integers and then divide at the last possible moment.
For example, work in pennies (1995) instead of dollars and cents
(19.95) and divide by 100 at the end.

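The pennies advice might look like the following minimal sketch.  The
variable names and the sample prices are made up for illustration.

```perl
use strict;
use warnings;

# Prices held as integer pennies, so the sum is exact.
my @prices_in_pennies = (1995, 1995, 1995);

my $total_pennies = 0;
$total_pennies += $_ for @prices_in_pennies;    # 5985

# Split into dollars and cents only when formatting for display.
my $display = sprintf("%d.%02d", int($total_pennies / 100), $total_pennies % 100);
print "Total: \$$display\n";                    # Total: $59.85
```

Formatting from the integer parts avoids a floating-point division
altogether; this sketch assumes non-negative amounts.
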
To get rid of the superfluous digits, just use a format (eg,
C<printf("%.2f", 19.95)>) to get the required precision.
See L<perlop/"Floating-point Arithmetic">.

=head2 Why isn't my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur
as literals in your program.  Octal literals in perl must start with
a leading "0" and hexadecimal literals must start with a leading "0x".
If they are read in from somewhere and assigned, no automatic
conversion takes place.  You must explicitly use oct() or hex() if you
want the values converted to decimal.  oct() interprets
both hex ("0x350") numbers and octal ones ("0350" or even without the
leading "0", like "377"), while hex() only converts hexadecimal ones,
with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
The inverse mapping from decimal to octal can be done with either the
"%o" or "%O" sprintf() formats.  To get from decimal to hex try either
the "%x" or the "%X" formats to sprintf().

This problem shows up most often when people try using chmod(), mkdir(),
umask(), or sysopen(), which by widespread tradition typically take
permissions in octal.

    chmod(644,  $file);	# WRONG
    chmod(0644, $file);	# right

Note the mistake in the first line was specifying the decimal literal
644, rather than the intended octal literal 0644.  The problem can
be seen with:

    printf("%#o",644); # prints 01204

Surely you had not intended C<chmod(01204, $file);> - did you?  If you
want to use numeric literals as arguments to chmod() et al. then please
try to express them as octal constants, that is with a leading zero and
with the following digits restricted to the set 0..7.

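When the digits arrive as a string (from a file or the command line,
say) rather than as a literal, the conversions described above could
be sketched like this.  The variable names are illustrative only.

```perl
use strict;
use warnings;

# Strings as they might be read from input.
my $from_octal = oct("0644");     # 420
my $from_hex   = hex("0x1A4");    # 420
my $also_hex   = oct("0x1A4");    # oct() honors the 0x prefix too: 420

# And back again for display.
my $as_octal = sprintf("%o",  420);    # "644"
my $as_hex   = sprintf("%#x", 420);    # "0x1a4"
```

A permission string read from input could then be passed along as
C<chmod(oct($mode_string), $file)>, where C<$mode_string> is a
hypothetical variable holding something like "0644".
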
=head2 Does Perl have a round() function?  What about ceil() and floor()?  Trig functions?

Remember that int() merely truncates toward 0.  For rounding to a
certain number of digits, sprintf() or printf() is usually the easiest
route.

    printf("%.3f", 3.1415926535);	# prints 3.142

The POSIX module (part of the standard Perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
functions.

    use POSIX;
    $ceil   = ceil(3.5);			# 4
    $floor  = floor(3.5);			# 3

In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
module.  With 5.004, the Math::Trig module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the Math::Complex module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely.  In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl.  It's the same as in C.  IEEE says we have to do this.
Perl numbers whose absolute values are integers under 2**31 (on 32 bit
machines) will work pretty much like mathematical integers.  Other numbers
are not guaranteed.

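As one illustration of writing the rounding rule yourself, here is a
minimal sketch of "round half away from zero".  The function name is
made up, and this is only one of several rules an application might
require.

```perl
use strict;
use warnings;

# Round to the nearest integer, with ties going away from zero.
sub round_half_away {
    my ($n) = @_;
    return $n >= 0 ? int($n + 0.5) : -int(-$n + 0.5);
}

print round_half_away(2.5),  "\n";    # 3
print round_half_away(-2.5), "\n";    # -3
print round_half_away(2.4),  "\n";    # 2
```

The tie cases only behave as expected when the input is exactly
representable in binary (as 2.5 is), which is one more reason to keep
money in integer units.
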
=head2 How do I convert bits into ints?

To turn a string of 1s and 0s like C<10110110> into a scalar containing
its binary value, use the pack() and unpack() functions (documented in
L<perlfunc/"pack"> and L<perlfunc/"unpack">):

    $decimal = unpack('C', pack('B8', '10110110'));

This packs the string C<10110110> into an eight bit binary structure.
This is then unpacked as an unsigned character, which returns its
ordinal value (182).  Note the uppercase C<C>: the lowercase C<c>
template unpacks a I<signed> character and would return -74 here.

This does the same thing:

    $decimal = ord(pack('B8', '10110110'));

Here's an example of going the other way:

    $binary_string = unpack('B*', "\x29");

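In Perl 5.6 and later there is also a route that avoids pack()
entirely: oct() understands a leading "0b", and sprintf() has a C<%b>
format.  A short sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $decimal = oct("0b10110110");      # 182
my $binary  = sprintf("%08b", 41);    # "00101001" (41 is 0x29)
```
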
=head2 Why doesn't & work the way I want it to?

The behavior of the bitwise operators depends on whether they're
used on numbers or strings.  The operators treat a string as a series
of bits and work with that (the string C<"3"> is the bit pattern
C<00110011>).  The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>).  Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string.  The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
	# ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl.  You need:

    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
	# ...
    }

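The difference between the numeric and string forms can be seen
directly.  This is a small sketch with made-up variable names; adding
zero is one common way to force a value to be treated as a number.

```perl
use strict;
use warnings;

my $num_and = 11 & 3;        # numeric: 1011 & 0011 is 0011, i.e. 3
my $str_and = "11" & "3";    # string: "1" & "3" is "1"; the result is
                             # only as long as the shorter operand

# Values that arrived as strings can be forced to numbers first.
my ($x, $y) = ("11", "3");
my $forced = (0 + $x) & (0 + $y);    # 3
```
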
=head2 How do I multiply matrices?

Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
or the PDL extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates an array of
all integers in the range.  This can take a lot of memory for large
ranges.  Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of 500,000 integers.

=head2 How can I output Roman numerals?

Get the http://www.perl.com/CPAN/modules/by-module/Roman module.

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.
5.004 and later automatically call C<srand> at the beginning.  Don't
call C<srand> more than once--you make your numbers less random, rather
than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-).
http://www.perl.com/CPAN/doc/FMTEYEWTK/random , courtesy of Tom
Phoenix, talks more about this.  John von Neumann said, ``Anyone who
attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''

If you want numbers that are more random than C<rand> with C<srand>
provide, you should also check out the Math::TrulyRandom module from
CPAN.  It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while.  If you want a better
pseudorandom generator than comes with your operating system, look at
``Numerical Recipes in C'' at http://www.nr.com/ .

=head1 Data: Dates

=head2 How do I find the week-of-the-year/day-of-the-year?

The day of the year is in the array returned by localtime() (see
L<perlfunc/"localtime">):

    $day_of_year = (localtime(time()))[7];

or more legibly (in 5.7.1 or higher):

    use Time::Piece;
    $day_of_year = localtime->day_of_year();

You can find the week of the year by using Time::Piece's strftime():

    $week_of_year = localtime->strftime("%U");
    $iso_week = localtime->strftime("%V");

The difference between %U and %V is that %U assumes that the first day
of week 1 is the first Sunday of the year, whereas ISO 8601:1988 uses
the first week that has at least 4 days in the current year, and with
Monday as the first day of the week.  You can also use %W, which will
return the week of the year with Monday as the first day of week 1.  See
your strftime(3) man page for more details.

=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century    {
	return int((((localtime(shift || time))[5] + 1999))/100);
    }
    sub get_millennium {
	return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, you'll find that the POSIX module's strftime() function
has been extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century".  It isn't, because on most such systems,
this is only the first two digits of the four-digit year, and thus cannot
be used to reliably determine the current century or millennium.

=head2 How can I compare two dates and find the difference?

If you're storing your dates as epoch seconds then simply subtract one
from the other.  If you've got a structured date (distinct year, day,
month, hour, minute, seconds values), then for reasons of accessibility,
simplicity, and efficiency, merely use either timelocal or timegm (from
the Time::Local module in the standard distribution) to reduce structured
dates to epoch seconds.  However, if you don't know the precise format of
your dates, then you should probably use either of the Date::Manip and
Date::Calc modules from CPAN before you go hacking up your own parsing
routine to handle arbitrary date formats.

Also note that the core module Time::Piece overloads the addition and
subtraction operators to provide date calculation options.  See
L<Time::Piece/Date Calculations>.

=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
Time::Local module.  Otherwise, you should look into the Date::Calc
and Date::Manip modules from CPAN.

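For a fixed format, the split-and-convert approach might look like the
sketch below.  The timestamp layout is an assumption made for the
example, and timegm() is used instead of timelocal() so that the
result does not depend on the local time zone.

```perl
use strict;
use warnings;
use Time::Local;    # exports timelocal() and timegm()

my $stamp = "2001-11-13 01:00:00";    # assumed to be in UTC

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "Unrecognized date format: $stamp\n";

# Months are numbered from 0, as in the list returned by gmtime().
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);
print "$epoch\n";    # 1005613200
```

Once both dates are in epoch seconds, their difference is a plain
subtraction.
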
=head2 How can I find the Julian Day?

Use Time::Piece as follows:

    use Time::Piece;
    my $julian_day = localtime->julian_day;
    my $mjd = localtime->mjd;    # modified julian day

Before you immerse yourself too deeply in this, be sure to verify that
it is the I<Julian> Day you really want.  Are you interested in a way
of getting serial days so that you can just tell how many days they
are apart, or so that you can also do other date arithmetic?  If you
are interested in performing date arithmetic, this can be done using
Time::Piece (standard module since Perl 5.8), or by modules
Date::Manip or Date::Calc.

There are too many details and too much confusion on this issue to
cover in this FAQ, but the term is applied (correctly) to a calendar now
supplanted by the Gregorian Calendar, with the Julian Calendar failing
to adjust properly for leap years on centennial years (among other
annoyances).  The term is also used (incorrectly) to mean: [1] days in
the Gregorian Calendar; and [2] days since a particular starting time
or `epoch', usually 1970 in the Unix world and 1980 in the
MS-DOS/Windows world.  If you find that it is not the first meaning
that you really want, then check out the Date::Manip and Date::Calc
modules.  (Thanks to David Cassell for most of this text.)

=head2 How do I find yesterday's date?

The C<time()> function returns the current time in seconds since the
epoch.  Take twenty-four hours off that:

    $yesterday = time() - ( 24 * 60 * 60 );

Then you can pass this to C<localtime()> and get the individual year,
month, day, hour, minute, seconds values.

Alternatively, you can use Time::Piece to subtract a day from the value
returned from C<localtime()>:

    use Time::Piece;
    use Time::Seconds; # imports seconds constants, like ONE_DAY
    my $today = localtime();
    my $yesterday = $today - ONE_DAY;

Note very carefully that the code above assumes that your days are
twenty-four hours each.  For most people, there are two days a year
when they aren't: the switch to and from summer time throws this off.
A solution to this issue is offered by Russ Allbery.

    sub yesterday {
	my $now  = defined $_[0] ? $_[0] : time;
	my $then = $now - 60 * 60 * 24;
	my $ndst = (localtime $now)[8] > 0;
	my $tdst = (localtime $then)[8] > 0;
	$then - ($tdst - $ndst) * 60 * 60;
    }
    # Should give you "this time yesterday" in seconds since epoch relative to
    # the first argument or the current time if no argument is given and
    # suitable for passing to localtime or whatever else you need to do with
    # it.  $ndst is whether we're currently in daylight savings time; $tdst is
    # whether the point 24 hours ago was in daylight savings time.  If $tdst
    # and $ndst are the same, a boundary wasn't crossed, and the correction
    # will subtract 0.  If $tdst is 1 and $ndst is 0, subtract an hour more
    # from yesterday's time since we gained an extra hour while going off
    # daylight savings time.  If $tdst is 0 and $ndst is 1, subtract a
    # negative hour (add an hour) to yesterday's time since we lost an hour.
    #
    # All of this is because during those days when one switches off or onto
    # DST, a "day" isn't 24 hours long; it's either 23 or 25.
    #
    # The explicit settings of $ndst and $tdst are necessary because localtime
    # only says it returns the system tm struct, and the system tm struct at
    # least on Solaris doesn't guarantee any particular positive value (like,
    # say, 1) for isdst, just a positive value.  And that value can
    # potentially be negative, if DST information isn't available (this sub
    # just treats those cases like no DST).
    #
    # Note that between 2am and 3am on the day after the time zone switches
    # off daylight savings time, the exact hour of "yesterday" corresponding
    # to the current hour is not clearly defined.  Note also that if used
    # between 2am and 3am the day after the change to daylight savings time,
    # the result will be between 3am and 4am of the previous day; it's
    # arguable whether this is correct.
    #
    # This sub does not attempt to deal with leap seconds (most things don't).
    #
    # Copyright relinquished 1999 by Russ Allbery <rra@stanford.edu>
    # This code is in the public domain

=head2 Does Perl have a Year 2000 problem?  Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem.  Yes, Perl is
Y2K compliant (whatever that means).  The programmers you've hired to
use it, however, probably are not.

Long answer: The question belies a true understanding of the issue.
Perl is just as Y2K compliant as your pencil--no more, and no less.
Can you use your pencil to write a non-Y2K-compliant memo?  Of course
you can.  Is that the pencil's fault?  Of course it isn't.

The date and time functions supplied with Perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines).  The year returned
by these functions when used in a list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number.  To avoid the year 2000 problem simply do not treat the year as
a 2-digit number.  It isn't.

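In practice that means adding 1900 to the year field rather than
gluing "19" or "20" onto the front of it.  A minimal sketch, using an
arbitrary epoch value that falls in November 2001:

```perl
use strict;
use warnings;

my @fields = gmtime(1005613200);    # list context: sec, min, hour, mday, mon, year, ...
my $year   = $fields[5] + 1900;     # 2001; the raw field is 101
print "The year is $year\n";
```
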
When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year.  For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001".  There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K compliant
programs.  It can.  But so can your pencil.  It's the fault of the user,
not the language.  At the risk of inflaming the NRA: ``Perl doesn't
break Y2K, people do.''  See http://language.perl.com/news/y2k.html for
a longer exposition.

=head1 Data: Strings

=head2 How do I validate input?

The answer to this question is usually a regular expression, perhaps
with auxiliary logic.  See the more specific questions (numbers, mail
addresses, etc.) for details.

=head2 How do I unescape a string?

It depends just what you mean by ``escape''.  URL escapes are dealt
with in L<perlfaq9>.  Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

=head2 How do I remove consecutive pairs of characters?

To turn C<"abbcccd"> into C<"abccd">:

    s/(.)\1/$1/g;	# add /s to include newlines

Here's a solution that turns "abbcccd" to "abcd":

    y///cs;	# y == tr, but shorter :-)

=head2 How do I expand function calls in a string?

This is documented in L<perlref>.  In general, this is fraught with
quoting and readability problems, but it is possible.  To interpolate
a subroutine call (in list context) into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for
arbitrary expressions:

    print "That yields ${\($n + 5)} widgets\n";

Version 5.004 of Perl had a bug that gave list context to the
expression in C<${...}>, but this is fixed in version 5.005.

See also ``How can I expand variables in text strings?'' in this
section of the FAQ.

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated.  To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1.  For multiple ones, then something more like
C</alpha(.*?)omega/> would be needed.  But none of these deals with
nested patterns, nor can they.  For that you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier.  There are
the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
and the byacc program.  Starting with Perl 5.8, Text::Balanced
is part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
	# do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you.  This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/);
    print join("\n",@$[0..$#$]) if( $$[-1] );

=head2 How do I reverse a string?

Use reverse() in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use Text::Wrap (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The paragraphs you give to Text::Wrap should not contain embedded
newlines.  Text::Wrap doesn't justify the lines (flush-right).

=head2 How can I access/change the first N letters of a string?

There are many ways.  If you just want to grab a copy, use
substr():

    $first_byte = substr($a, 0, 1);

If you want to modify part of a string, the simplest way is often to
use substr() as an lvalue:

    substr($a, 0, 3) = "Tom";

Although those with a pattern matching kind of thought process will
likely prefer

    $a =~ s/^.../Tom/;

=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself.  For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively.  These
all assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
	++$count == 5   	# is it the 5th?
	    ? "${2}soever"	# yes, swap
	    : $1		# renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one.">  You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency.  If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character.  However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work.  What you can do is wrap a while()
loop around a global pattern match.  For example, let's count negative
integers:

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;    # start fresh; don't carry over an earlier count
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";

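A more compact way to get the same count is to evaluate the global
match in list context and assign the resulting list to a scalar.  This
is a common idiom rather than anything specific to this answer:

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# The empty list in the middle forces list context on the match;
# assigning that list to a scalar yields the number of matches.
my $count = () = $string =~ /-\d+/g;
print "There are $count negative numbers in the string\n";    # 4
```
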
=head2 How do I capitalize all the words on one line?

To make the first letter of each word upper case:

    $line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning "C<don't do it>" into "C<Don'T
Do It>".  Sometimes you might want this.  Other times you might need a
more thorough solution (Suggested by brian d.  foy):

    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;

To make the whole line upper case:

    $line = uc($line);

To force each word to be lower case, with the first letter upper case:

    $line =~ s/(\w+)/\u\L$1/g;

You can (and probably should) enable locale awareness of those
characters by placing a C<use locale> pragma in your program.
See L<perllocale> for endless details on locales.

This is sometimes referred to as putting something into "title
case", but that's not quite accurate.  Consider the proper
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.

=head2 How can I split a [character] delimited string except when inside
[character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated
into its different fields.  (We'll pretend you said comma-separated, not
comma-delimited, which is different and almost never what you mean.) You
can't use C<split(/,/)> because you shouldn't split if the comma is inside
quotes.  For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem.  Thankfully, we have Jeffrey Friedl, author of a highly
recommended book on regular expressions, to handle these for us.  He
suggests (assuming your string is contained in $text):

     @new = ();
     push(@new, $+) while $text =~ m{
         "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
       | ([^,]+),?
       | ,
     }gx;
     push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).  Unescaping them is a task addressed earlier in
this section.

Alternatively, the Text::ParseWords module (part of the standard Perl
distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);

There's also a Text::CSV (Comma-Separated Values) module on CPAN.

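Applied to the sample data line above, the Text::ParseWords approach
could be exercised like this.  It is only a sketch; real-world CSV
with embedded newlines or doubled quotes is better left to a
dedicated module.

```perl
use strict;
use warnings;
use Text::ParseWords;

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# Second argument 0 means: strip the quotes from the returned fields.
my @new = quotewords(",", 0, $text);

print scalar(@new), " fields\n";
print "Third field: $new[2]\n";    # the embedded comma survives
```
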
=head2 How do I strip blank space from the beginning/end of a string?

Although the simplest approach would seem to be

    $string =~ s/^\s*(.*?)\s*$/$1/;

not only is this unnecessarily slow and destructive, it also fails with
embedded newlines.  It is much faster to do this operation in two steps:

    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

Or more nicely written as:

    for ($string) {
	s/^\s+//;
	s/\s+$//;
    }

This idiom takes advantage of the C<foreach> loop's aliasing
behavior to factor out common code.  You can do this
on several strings at once, or arrays, or even the
values of a hash if you use a slice:

    # trim whitespace in the scalar, the array,
    # and all the values in the hash
    foreach ($scalar, @array, @hash{keys %hash}) {
        s/^\s+//;
        s/\s+$//;
    }

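If you would rather not modify the originals, the same two
substitutions can be wrapped in a small helper that works on copies.
The name C<trim> is made up for this sketch; it is not a Perl
built-in.

```perl
use strict;
use warnings;

# Return trimmed copies, leaving the caller's variables untouched.
sub trim {
    my @copies = @_;
    for (@copies) {
        s/^\s+//;
        s/\s+$//;
    }
    return wantarray ? @copies : $copies[0];
}

my $clean = trim("  hello world \n");     # "hello world"
my @clean = trim(" a ", "b  ", "  c");    # ("a", "b", "c")
```
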
65acb1b1 714=head2 How do I pad a string with blanks or pad a number with zeroes?
715
d92eb7b0 716(This answer contributed by Uri Guttman, with kibitzing from
717Bart Lateur.)
65acb1b1 718
719In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0 720to pad the string, C<$text> or C<$num> contains the string to be padded,
721and C<$pad_char> contains the padding character. You can use a single
722character string constant instead of the C<$pad_char> variable if you
723know what it is in advance. And in the same way you can use an integer in
724place of C<$pad_len> if you know the pad length in advance.
65acb1b1 725
d92eb7b0 726The simplest method uses the C<sprintf> function. It can pad on the left
727or right with blanks and on the left with zeroes and it will not
728truncate the result. The C<pack> function can only pad strings on the
729right with blanks and it will truncate the result to a maximum length of
730C<$pad_len>.
65acb1b1 731
d92eb7b0 732 # Left padding a string with blanks (no truncation):
733 $padded = sprintf("%${pad_len}s", $text);
65acb1b1 734
d92eb7b0 735 # Right padding a string with blanks (no truncation):
736 $padded = sprintf("%-${pad_len}s", $text);
65acb1b1 737
d92eb7b0 738 # Left padding a number with 0 (no truncation):
739 $padded = sprintf("%0${pad_len}d", $num);
65acb1b1 740
d92eb7b0 741 # Right padding a string with blanks using pack (will truncate):
742 $padded = pack("A$pad_len",$text);
65acb1b1 743
d92eb7b0 744If you need to pad with a character other than blank or zero you can use
745one of the following methods. They all generate a pad string with the
746C<x> operator and combine that with C<$text>. These methods do
747not truncate C<$text>.
65acb1b1 748
d92eb7b0 749Left and right padding with any character, creating a new string:
65acb1b1 750
d92eb7b0 751 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
752 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 753
d92eb7b0 754Left and right padding with any character, modifying C<$text> directly:
65acb1b1 755
d92eb7b0 756 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
757 $text .= $pad_char x ( $pad_len - length( $text ) );
65acb1b1 758
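As a quick sanity check, here is a minimal sketch that runs each of the methods above on one sample string. The sample values assigned to C<$text>, C<$num>, C<$pad_len>, and C<$pad_char> are made up for illustration.

```perl
use strict;
use warnings;

# Sample values, chosen only for illustration.
my ($text, $num, $pad_len, $pad_char) = ("perl", 42, 8, "*");

my $left_blank  = sprintf("%${pad_len}s",  $text);   # "    perl"
my $right_blank = sprintf("%-${pad_len}s", $text);   # "perl    "
my $zero_padded = sprintf("%0${pad_len}d", $num);    # "00000042"

# Padding with an arbitrary character via the x operator.
my $left_star  = $pad_char x ( $pad_len - length($text) ) . $text;   # "****perl"
my $right_star = $text . $pad_char x ( $pad_len - length($text) );   # "perl****"

# pack truncates when the text is longer than the field.
my $truncated = pack("A4", "perlfaq");                # "perl"

print "[$_]\n" for $left_blank, $right_blank, $zero_padded,
                   $left_star, $right_star, $truncated;
```

Note that only the C<pack> line loses data; every other method leaves a too-long string untouched.
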
68dc0745 759=head2 How do I extract selected columns from a string?
760
761Use substr() or unpack(), both documented in L<perlfunc>.
5a964f20 762If you prefer thinking in terms of columns instead of widths,
763you can use this kind of thing:
764
765 # determine the unpack format needed to split Linux ps output
766 # arguments are cut columns
767 my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
768
769 sub cut2fmt {
770 my(@positions) = @_;
771 my $template = '';
772 my $lastpos = 1;
773 for my $place (@positions) {
774 $template .= "A" . ($place - $lastpos) . " ";
775 $lastpos = $place;
776 }
777 $template .= "A*";
778 return $template;
779 }
68dc0745 780
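To see the generated template at work, the following sketch feeds it to unpack(). The sample record and its column positions are hypothetical and far simpler than real ps output.

```perl
use strict;
use warnings;

# cut2fmt() as defined in the answer above.
sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos = $place;
    }
    $template .= "A*";
    return $template;
}

# Hypothetical fixed-width record: columns start at 1, 6 and 11.
my $line = "root 42   /sbin/init";
my $fmt  = cut2fmt(6, 11);                       # "A5 A5 A*"
my ($user, $pid, $cmd) = unpack($fmt, $line);
print "$user|$pid|$cmd\n";                       # root|42|/sbin/init
```

The C<A> format strips trailing blanks from each field, which is why left-justified columns come out clean.
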
781=head2 How do I find the soundex value of a string?
782
87275199 783Use the standard Text::Soundex module distributed with Perl.
a6dd486b 784Before you do so, you may want to determine whether `soundex' is in
d92eb7b0 785fact what you think it is. Knuth's soundex algorithm compresses words
786into a small space, and so it does not necessarily distinguish between
787two words which you might want to appear separately. For example, the
788last names `Knuth' and `Kant' are both mapped to the soundex code K530.
789If Text::Soundex does not do what you are looking for, you might want
790to consider the String::Approx module available at CPAN.
68dc0745 791
792=head2 How can I expand variables in text strings?
793
794Let's assume that you have a string like:
795
796 $text = 'this has a $foo in it and a $bar';
5a964f20 797
798If those were both global variables, then this would
799suffice:
800
65acb1b1 801 $text =~ s/\$(\w+)/${$1}/g; # no /e needed
68dc0745 802
5a964f20 803But since they are probably lexicals, or at least, they could
804be, you'd have to do this:
68dc0745 805
806 $text =~ s/(\$\w+)/$1/eeg;
65acb1b1 807 die if $@; # needed /ee, not /e
68dc0745 808
5a964f20 809It's probably better in the general case to treat those
810variables as entries in some special hash. For example:
811
812 %user_defs = (
813 foo => 23,
814 bar => 19,
815 );
816 $text =~ s/\$(\w+)/$user_defs{$1}/g;
68dc0745 817
92c2ed05 818See also ``How do I expand function calls in a string?'' in this section
46fc3d4c 819of the FAQ.
820
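One possible refinement of the hash approach, sketched here under the assumption that unknown names should be left in the text rather than silently replaced by an empty string. The C<$baz> placeholder is made up for the example.

```perl
use strict;
use warnings;

my %user_defs = ( foo => 23, bar => 19 );
my $text = 'this has a $foo in it, a $bar, and an unknown $baz';

# Replace $name only when the name is a known key; leave anything else alone.
$text =~ s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$text\n";   # this has a 23 in it, a 19, and an unknown $baz
```
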
68dc0745 821=head2 What's wrong with always quoting "$vars"?
822
a6dd486b 823The problem is that those double-quotes force stringification--
824coercing numbers and references into strings--even when you
825don't want them to be strings. Think of it this way: double-quote
65acb1b1 826expansion is used to produce new strings. If you already
827have a string, why do you need more?
68dc0745 828
829If you get used to writing odd things like these:
830
831 print "$var"; # BAD
832 $new = "$old"; # BAD
833 somefunc("$var"); # BAD
834
you'll be in trouble. Those should (in 99.8% of the cases) be
the simpler and more direct:
837
838 print $var;
839 $new = $old;
840 somefunc($var);
841
842Otherwise, besides slowing you down, you're going to break code when
843the thing in the scalar is actually neither a string nor a number, but
844a reference:
845
846 func(\@array);
847 sub func {
848 my $aref = shift;
849 my $oref = "$aref"; # WRONG
850 }
851
852You can also get into subtle problems on those few operations in Perl
853that actually do care about the difference between a string and a
854number, such as the magical C<++> autoincrement operator or the
855syscall() function.
856
5a964f20 857Stringification also destroys arrays.
858
859 @lines = `command`;
860 print "@lines"; # WRONG - extra blanks
861 print @lines; # right
862
c47ff5f1 863=head2 Why don't my <<HERE documents work?
68dc0745 864
865Check for these three things:
866
867=over 4
868
869=item 1. There must be no space after the << part.
870
871=item 2. There (probably) should be a semicolon at the end.
872
873=item 3. You can't (easily) have any space in front of the tag.
874
875=back
876
5a964f20 877If you want to indent the text in the here document, you
878can do this:
879
    # all in one
    ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
        your text
        goes here
    HERE_TARGET
885
886But the HERE_TARGET must still be flush against the margin.
887If you want that indented also, you'll have to quote
888in the indentation.
889
    ($quote = <<'    FINIS') =~ s/^\s+//gm;
            ...we will have peace, when you and all your works have
            perished--and the works of your dark master to whom you
            would deliver us. You are a liar, Saruman, and a corrupter
            of men's hearts.  --Theoden in /usr/src/perl/taint.c
        FINIS
    $quote =~ s/\s*--/\n--/;
897
898A nice general-purpose fixer-upper function for indented here documents
899follows. It expects to be called with a here document as its argument.
900It looks to see whether each line begins with a common substring, and
a6dd486b 901if so, strips that substring off. Otherwise, it takes the amount of leading
902whitespace found on the first line and removes that much off each
5a964f20 903subsequent line.
904
905 sub fix {
906 local $_ = shift;
a6dd486b 907 my ($white, $leader); # common whitespace and common leading string
5a964f20 908 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
909 ($white, $leader) = ($2, quotemeta($1));
910 } else {
911 ($white, $leader) = (/^(\s+)/, '');
912 }
913 s/^\s*?$leader(?:$white)?//gm;
914 return $_;
915 }
916
c8db1d39 917This works with leading special strings, dynamically determined:
5a964f20 918
    $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
        @@@ int
        @@@ runops() {
        @@@     SAVEI32(runlevel);
        @@@     runlevel++;
        @@@     while ( op = (*op->op_ppaddr)() );
        @@@     TAINT_NOT;
        @@@     return 0;
        @@@ }
        MAIN_INTERPRETER_LOOP
929
a6dd486b 930Or with a fixed amount of leading whitespace, with remaining
5a964f20 931indentation correctly preserved:
932
    $poem = fix<<EVER_ON_AND_ON;
       Now far ahead the Road has gone,
          And I must follow, if I can,
       Pursuing it with eager feet,
          Until it joins some larger way
       Where many paths and errands meet.
          And whither then? I cannot say.
                --Bilbo in /usr/src/perl/pp_ctl.c
    EVER_ON_AND_ON
942
68dc0745 943=head1 Data: Arrays
944
65acb1b1 945=head2 What is the difference between a list and an array?
946
947An array has a changeable length. A list does not. An array is something
948you can push or pop, while a list is a set of values. Some people make
949the distinction that a list is a value while an array is a variable.
950Subroutines are passed and return lists, you put things into list
951context, you initialize arrays with lists, and you foreach() across
952a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
953in scalar context behave like the number of elements in them, subroutines
a6dd486b 954access their arguments through the array C<@_>, and push/pop/shift only work
65acb1b1 955on arrays.
956
957As a side note, there's no such thing as a list in scalar context.
958When you say
959
960 $scalar = (2, 5, 7, 9);
961
d92eb7b0 962you're using the comma operator in scalar context, so it uses the scalar
963comma operator. There never was a list there at all! This causes the
964last value to be returned: 9.
65acb1b1 965
68dc0745 966=head2 What is the difference between $array[1] and @array[1]?
967
a6dd486b 968The former is a scalar value; the latter an array slice, making
68dc0745 969it a list with one (scalar) value. You should use $ when you want a
970scalar value (most of the time) and @ when you want a list with one
971scalar value in it (very, very rarely; nearly never, in fact).
972
973Sometimes it doesn't make a difference, but sometimes it does.
974For example, compare:
975
976 $good[0] = `some program that outputs several lines`;
977
978with
979
980 @bad[0] = `same program that outputs several lines`;
981
9f1b1f2d 982The C<use warnings> pragma and the B<-w> flag will warn you about these
983matters.
68dc0745 984
d92eb7b0 985=head2 How can I remove duplicate elements from a list or array?
68dc0745 986
987There are several possible ways, depending on whether the array is
988ordered and whether you wish to preserve the ordering.
989
990=over 4
991
551e1d92 992=item a)
993
994If @in is sorted, and you want @out to be sorted:
5a964f20 995(this assumes all true values in the array)
68dc0745 996
a4341a65 997 $prev = "not equal to $in[0]";
3bc5ef3e 998 @out = grep($_ ne $prev && ($prev = $_, 1), @in);
68dc0745 999
c8db1d39 1000This is nice in that it doesn't use much extra memory, simulating
3bc5ef3e 1001uniq(1)'s behavior of removing only adjacent duplicates. The ", 1"
1002guarantees that the expression is true (so that grep picks it up)
1003even if the $_ is 0, "", or undef.
68dc0745 1004
551e1d92 1005=item b)
1006
1007If you don't know whether @in is sorted:
68dc0745 1008
1009 undef %saw;
1010 @out = grep(!$saw{$_}++, @in);
1011
551e1d92 1012=item c)
1013
1014Like (b), but @in contains only small integers:
68dc0745 1015
1016 @out = grep(!$saw[$_]++, @in);
1017
551e1d92 1018=item d)
1019
1020A way to do (b) without any loops or greps:
68dc0745 1021
1022 undef %saw;
1023 @saw{@in} = ();
1024 @out = sort keys %saw; # remove sort if undesired
1025
551e1d92 1026=item e)
1027
1028Like (d), but @in contains only small positive integers:
68dc0745 1029
1030 undef @ary;
1031 @ary[@in] = @in;
87275199 1032 @out = grep {defined} @ary;
68dc0745 1033
1034=back
1035
65acb1b1 1036But perhaps you should have been using a hash all along, eh?
1037
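If you need the duplicates gone but the original order kept, method (b) can be wrapped up as a small function. This is only a sketch; the name C<uniq_in_order> is made up for the example.

```perl
use strict;
use warnings;

# Return the distinct elements of a list, keeping first-seen order.
sub uniq_in_order {
    my %saw;
    return grep { !$saw{$_}++ } @_;
}

my @in  = qw(pear apple pear fig apple);
my @out = uniq_in_order(@in);
print "@out\n";    # pear apple fig
```

Because C<%saw> is a lexical inside the function, there is no need to C<undef> it between calls.
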
ddbc1f16 1038=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1039
1040Hearing the word "in" is an I<in>dication that you probably should have
1041used a hash, not a list or array, to store your data. Hashes are
1042designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1043
5a964f20 1044That being said, there are several ways to approach this. If you
1045are going to make this query many times over arbitrary string values,
1046the fastest way is probably to invert the original array and keep an
68dc0745 1047associative array lying about whose keys are the first array's values.
1048
1049 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1050 undef %is_blue;
1051 for (@blues) { $is_blue{$_} = 1 }
1052
1053Now you can check whether $is_blue{$some_color}. It might have been a
1054good idea to keep the blues all in a hash in the first place.
1055
1056If the values are all small integers, you could use a simple indexed
1057array. This kind of an array will take up less space:
1058
1059 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1060 undef @is_tiny_prime;
d92eb7b0 1061 for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply @is_tiny_prime[@primes] = (1) x @primes;
68dc0745 1063
1064Now you check whether $is_tiny_prime[$some_number].
1065
1066If the values in question are integers instead of strings, you can save
1067quite a lot of space by using bit strings instead:
1068
1069 @articles = ( 1..10, 150..2000, 2017 );
1070 undef $read;
7b8d334a 1071 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1072
1073Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1074
1075Please do not use
1076
a6dd486b 1077 ($is_there) = grep $_ eq $whatever, @array;
68dc0745 1078
1079or worse yet
1080
a6dd486b 1081 ($is_there) = grep /$whatever/, @array;
68dc0745 1082
1083These are slow (checks every element even if the first matches),
1084inefficient (same reason), and potentially buggy (what if there are
d92eb7b0 1085regex characters in $whatever?). If you're only testing once, then
65acb1b1 1086use:
1087
1088 $is_there = 0;
1089 foreach $elt (@array) {
1090 if ($elt eq $elt_to_find) {
1091 $is_there = 1;
1092 last;
1093 }
1094 }
1095 if ($is_there) { ... }
68dc0745 1096
1097=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1098
1099Use a hash. Here's code to do both and more. It assumes that
1100each element is unique in a given array:
1101
1102 @union = @intersection = @difference = ();
1103 %count = ();
1104 foreach $element (@array1, @array2) { $count{$element}++ }
1105 foreach $element (keys %count) {
1106 push @union, $element;
1107 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1108 }
1109
d92eb7b0 1110Note that this is the I<symmetric difference>, that is, all elements in
a6dd486b 1111either A or in B but not in both. Think of it as an xor operation.
d92eb7b0 1112
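If what you actually want is the one-sided difference (elements of the first array that are missing from the second), a variation along these lines may do. It is a sketch with made-up sample data, and like the code above it compares elements as strings.

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e);

# Mark everything in @array2, then keep what @array1 has that @array2 lacks.
my %in_array2;
@in_array2{@array2} = ();
my @only_in_1 = grep { !exists $in_array2{$_} } @array1;

print "@only_in_1\n";    # a b
```
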
65acb1b1 1113=head2 How do I test whether two arrays or hashes are equal?
1114
1115The following code works for single-level arrays. It uses a stringwise
1116comparison, and does not distinguish defined versus undefined empty
1117strings. Modify if you have other needs.
1118
1119 $are_equal = compare_arrays(\@frogs, \@toads);
1120
1121 sub compare_arrays {
1122 my ($first, $second) = @_;
9f1b1f2d 1123 no warnings; # silence spurious -w undef complaints
65acb1b1 1124 return 0 unless @$first == @$second;
1125 for (my $i = 0; $i < @$first; $i++) {
1126 return 0 if $first->[$i] ne $second->[$i];
1127 }
1128 return 1;
1129 }
1130
1131For multilevel structures, you may wish to use an approach more
1132like this one. It uses the CPAN module FreezeThaw:
1133
1134 use FreezeThaw qw(cmpStr);
1135 @a = @b = ( "this", "that", [ "more", "stuff" ] );
1136
1137 printf "a and b contain %s arrays\n",
1138 cmpStr(\@a, \@b) == 0
1139 ? "the same"
1140 : "different";
1141
1142This approach also works for comparing hashes. Here
1143we'll demonstrate two different answers:
1144
1145 use FreezeThaw qw(cmpStr cmpStrHard);
1146
1147 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1148 $a{EXTRA} = \%b;
1149 $b{EXTRA} = \%a;
1150
1151 printf "a and b contain %s hashes\n",
1152 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1153
1154 printf "a and b contain %s hashes\n",
1155 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1156
1157
The first reports that both of the hashes contain the same data,
while the second reports that they do not. Which you prefer is left as
an exercise to the reader.
1161
68dc0745 1162=head2 How do I find the first array element for which a condition is true?
1163
1164You can use this if you care about the index:
1165
65acb1b1 1166 for ($i= 0; $i < @array; $i++) {
68dc0745 1167 if ($array[$i] eq "Waldo") {
1168 $found_index = $i;
1169 last;
1170 }
1171 }
1172
1173Now C<$found_index> has what you want.
1174
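The same search can be written more compactly with the first() function from List::Util, assuming that module is available (it ships with current Perl releases). The sample array is made up.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = qw(Wally Wilma Waldo Wenda Waldo);

# first() returns as soon as the block is true, so later elements are not tested.
# It returns undef when nothing matches.
my $found_index = first { $array[$_] eq "Waldo" } 0 .. $#array;

print defined $found_index ? "found at index $found_index\n" : "not found\n";
```

Test the result with C<defined>, since a match at index 0 is a perfectly good answer that happens to be false.
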
1175=head2 How do I handle linked lists?
1176
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either end,
or you can use splice to add and/or remove an arbitrary number of elements
at arbitrary points. Both pop and shift are O(1) operations on Perl's
dynamic arrays. In the absence of shifts and pops, push in general
needs to reallocate on the order of every log(N) times, and unshift will
need to copy pointers each time.
68dc0745 1184
1185If you really, really wanted, you could use structures as described in
1186L<perldsc> or L<perltoot> and do just what the algorithm book tells you
65acb1b1 1187to do. For example, imagine a list node like this:
1188
1189 $node = {
1190 VALUE => 42,
1191 LINK => undef,
1192 };
1193
1194You could walk the list this way:
1195
1196 print "List: ";
1197 for ($node = $head; $node; $node = $node->{LINK}) {
1198 print $node->{VALUE}, " ";
1199 }
1200 print "\n";
1201
a6dd486b 1202You could add to the list this way:
65acb1b1 1203
1204 my ($head, $tail);
1205 $tail = append($head, 1); # grow a new head
1206 for $value ( 2 .. 10 ) {
1207 $tail = append($tail, $value);
1208 }
1209
1210 sub append {
1211 my($list, $value) = @_;
1212 my $node = { VALUE => $value };
1213 if ($list) {
1214 $node->{LINK} = $list->{LINK};
1215 $list->{LINK} = $node;
1216 } else {
1217 $_[0] = $node; # replace caller's version
1218 }
1219 return $node;
1220 }
1221
But again, Perl's built-ins are virtually always good enough.
68dc0745 1223
1224=head2 How do I handle circular lists?
1225
1226Circular lists could be handled in the traditional fashion with linked
1227lists, or you could just do something like this with an array:
1228
1229 unshift(@array, pop(@array)); # the last shall be first
1230 push(@array, shift(@array)); # and vice versa
1231
1232=head2 How do I shuffle an array randomly?
1233
5a964f20 1234Use this:
1235
    # fisher_yates_shuffle( \@array ) :
    # generate a random permutation of @array in place
    sub fisher_yates_shuffle {
        my $array = shift;
        return unless @$array;    # the loop below misbehaves on an empty array
        my $i;
        for ($i = @$array; --$i; ) {
            my $j = int rand ($i+1);
            @$array[$i,$j] = @$array[$j,$i];
        }
    }
1246
1247 fisher_yates_shuffle( \@array ); # permutes @array in place
1248
You've probably seen shuffling algorithms that work using splice,
randomly picking an element to move from the old array onto a new one:
68dc0745 1251
1252 srand;
1253 @new = ();
1254 @old = 1 .. 10; # just a demo
1255 while (@old) {
1256 push(@new, splice(@old, rand @old, 1));
1257 }
1258
5a964f20 1259This is bad because splice is already O(N), and since you do it N times,
1260you just invented a quadratic algorithm; that is, O(N**2). This does
1261not scale, although Perl is so efficient that you probably won't notice
1262this until you have rather largish arrays.
68dc0745 1263
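If the List::Util module is available (it ships with current Perl releases), its shuffle() function is another option. A minimal sketch:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @deck     = 1 .. 10;
my @shuffled = shuffle(@deck);    # returns a new list; @deck is unchanged

print "@shuffled\n";
```

Unlike the in-place routine above, this returns a shuffled copy, so assign the result back to the original array if that is what you want.
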
1264=head2 How do I process/modify each element of an array?
1265
1266Use C<for>/C<foreach>:
1267
1268 for (@lines) {
5a964f20 1269 s/foo/bar/; # change that word
1270 y/XZ/ZX/; # swap those letters
68dc0745 1271 }
1272
1273Here's another; let's compute spherical volumes:
1274
5a964f20 1275 for (@volumes = @radii) { # @volumes has changed parts
68dc0745 1276 $_ **= 3;
1277 $_ *= (4/3) * 3.14159; # this will be constant folded
1278 }
1279
5a964f20 1280If you want to do the same thing to modify the values of the hash,
1281you may not use the C<values> function, oddly enough. You need a slice:
1282
1283 for $orbit ( @orbits{keys %orbits} ) {
1284 ($orbit **= 3) *= (4/3) * 3.14159;
1285 }
1286
68dc0745 1287=head2 How do I select a random element from an array?
1288
1289Use the rand() function (see L<perlfunc/rand>):
1290
5a964f20 1291 # at the top of the program:
68dc0745 1292 srand; # not needed for 5.004 and later
5a964f20 1293
1294 # then later on
68dc0745 1295 $index = rand @array;
1296 $element = $array[$index];
1297
5a964f20 1298Make sure you I<only call srand once per program, if then>.
1299If you are calling it more than once (such as before each
1300call to rand), you're almost certainly doing something wrong.
1301
68dc0745 1302=head2 How do I permute N elements of a list?
1303
1304Here's a little program that generates all permutations
1305of all the words on each line of input. The algorithm embodied
5a964f20 1306in the permute() function should work on any list:
68dc0745 1307
1308 #!/usr/bin/perl -n
5a964f20 1309 # tsc-permute: permute each word of input
1310 permute([split], []);
1311 sub permute {
1312 my @items = @{ $_[0] };
1313 my @perms = @{ $_[1] };
1314 unless (@items) {
1315 print "@perms\n";
68dc0745 1316 } else {
5a964f20 1317 my(@newitems,@newperms,$i);
1318 foreach $i (0 .. $#items) {
1319 @newitems = @items;
1320 @newperms = @perms;
1321 unshift(@newperms, splice(@newitems, $i, 1));
1322 permute([@newitems], [@newperms]);
68dc0745 1323 }
1324 }
1325 }
1326
1327=head2 How do I sort an array by (anything)?
1328
1329Supply a comparison function to sort() (described in L<perlfunc/sort>):
1330
1331 @list = sort { $a <=> $b } @list;
1332
1333The default sort function is cmp, string comparison, which would
c47ff5f1 1334sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
68dc0745 1335the numerical comparison operator.
1336
1337If you have a complicated function needed to pull out the part you
1338want to sort on, then don't do it inside the sort function. Pull it
1339out first, because the sort BLOCK can be called many times for the
1340same element. Here's an example of how to pull out the first word
1341after the first number on each item, and then sort those words
1342case-insensitively.
1343
1344 @idx = ();
1345 for (@data) {
1346 ($item) = /\d+\s*(\S+)/;
1347 push @idx, uc($item);
1348 }
1349 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1350
a6dd486b 1351which could also be written this way, using a trick
68dc0745 1352that's come to be known as the Schwartzian Transform:
1353
1354 @sorted = map { $_->[0] }
1355 sort { $a->[1] cmp $b->[1] }
d92eb7b0 1356 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 1357
1358If you need to sort on several fields, the following paradigm is useful.
1359
1360 @sorted = sort { field1($a) <=> field1($b) ||
1361 field2($a) cmp field2($b) ||
1362 field3($a) cmp field3($b)
1363 } @data;
1364
1365This can be conveniently combined with precalculation of keys as given
1366above.
1367
1368See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for more about
1369this approach.
1370
1371See also the question below on sorting hashes.
1372
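Here is the Schwartzian Transform from above run on a few made-up items, as a sketch of what the three stages produce.

```perl
use strict;
use warnings;

my @data = ("3 pear", "10 Apple", "7 banana");

# Read bottom to top: build [item, key] pairs, sort on the key,
# then pull the original items back out.
my @sorted = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] }
             map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] } @data;

print join(", ", @sorted), "\n";    # 10 Apple, 7 banana, 3 pear
```

The sort keys here are APPLE, BANANA, and PEAR, each computed exactly once per item.
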
1373=head2 How do I manipulate arrays of bits?
1374
1375Use pack() and unpack(), or else vec() and the bitwise operations.
1376
For example, this sets bit N of $vec for every number N that appears in @ints:
1378
1379 $vec = '';
1380 foreach(@ints) { vec($vec,$_,1) = 1 }
1381
1382And here's how, given a vector in $vec, you can
1383get those bits into your @ints array:
1384
1385 sub bitvec_to_list {
1386 my $vec = shift;
1387 my @ints;
1388 # Find null-byte density then select best algorithm
1389 if ($vec =~ tr/\0// / length $vec > 0.95) {
1390 use integer;
1391 my $i;
1392 # This method is faster with mostly null-bytes
1393 while($vec =~ /[^\0]/g ) {
1394 $i = -9 + 8 * pos $vec;
1395 push @ints, $i if vec($vec, ++$i, 1);
1396 push @ints, $i if vec($vec, ++$i, 1);
1397 push @ints, $i if vec($vec, ++$i, 1);
1398 push @ints, $i if vec($vec, ++$i, 1);
1399 push @ints, $i if vec($vec, ++$i, 1);
1400 push @ints, $i if vec($vec, ++$i, 1);
1401 push @ints, $i if vec($vec, ++$i, 1);
1402 push @ints, $i if vec($vec, ++$i, 1);
1403 }
1404 } else {
1405 # This method is a fast general algorithm
1406 use integer;
1407 my $bits = unpack "b*", $vec;
1408 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1409 push @ints, pos $bits while($bits =~ /1/g);
1410 }
1411 return \@ints;
1412 }
1413
1414This method gets faster the more sparse the bit vector is.
1415(Courtesy of Tim Bunce and Winfried Koenig.)
1416
65acb1b1 1417Here's a demo on how to use vec():
1418
1419 # vec demo
1420 $vector = "\xff\x0f\xef\xfe";
1421 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1422 unpack("N", $vector), "\n";
1423 $is_set = vec($vector, 23, 1);
1424 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1425 pvec($vector);
1426
1427 set_vec(1,1,1);
1428 set_vec(3,1,1);
1429 set_vec(23,1,1);
1430
1431 set_vec(3,1,3);
1432 set_vec(3,2,3);
1433 set_vec(3,4,3);
1434 set_vec(3,4,7);
1435 set_vec(3,8,3);
1436 set_vec(3,8,7);
1437
1438 set_vec(0,32,17);
1439 set_vec(1,32,17);
1440
1441 sub set_vec {
1442 my ($offset, $width, $value) = @_;
1443 my $vector = '';
1444 vec($vector, $offset, $width) = $value;
1445 print "offset=$offset width=$width value=$value\n";
1446 pvec($vector);
1447 }
1448
    sub pvec {
        my $vector = shift;
        my $bits = unpack("b*", $vector);

        print "vector length in bytes: ", length($vector), "\n";
        # split the bit string into one 8-character group per byte
        my @bytes = unpack("A8" x length($vector), $bits);
        print "bits are: @bytes\n\n";
    }
1459
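For a smaller round trip than the demo above, this sketch sets a few bits and reads them back. The numbers in C<@ints> are arbitrary.

```perl
use strict;
use warnings;

my @ints = (1, 3, 10);
my $vec  = '';
vec($vec, $_, 1) = 1 for @ints;

# "b*" lists bits in ascending order within each byte, so bit N is at index N.
my $bits = unpack("b*", $vec);
print "$bits\n";                       # 0101000000100000

# The simple (not the fastest) way back: test every bit position.
my @back = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;
print "@back\n";                       # 1 3 10
```
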
68dc0745 1460=head2 Why does defined() return true on empty arrays and hashes?
1461
65acb1b1 1462The short story is that you should probably only use defined on scalars or
1463functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1464in the 5.004 release or later of Perl for more detail.
68dc0745 1465
1466=head1 Data: Hashes (Associative Arrays)
1467
1468=head2 How do I process an entire hash?
1469
1470Use the each() function (see L<perlfunc/each>) if you don't care
1471whether it's sorted:
1472
5a964f20 1473 while ( ($key, $value) = each %hash) {
68dc0745 1474 print "$key = $value\n";
1475 }
1476
1477If you want it sorted, you'll have to use foreach() on the result of
1478sorting the keys as shown in an earlier question.
1479
1480=head2 What happens if I add or remove keys from a hash while iterating over it?
1481
d92eb7b0 1482Don't do that. :-)
1483
1484[lwall] In Perl 4, you were not allowed to modify a hash at all while
87275199 1485iterating over it. In Perl 5 you can delete from it, but you still
d92eb7b0 1486can't add to it, because that might cause a doubling of the hash table,
1487in which half the entries get copied up to the new top half of the
87275199 1488table, at which point you've totally bamboozled the iterator code.
d92eb7b0 1489Even if the table doesn't double, there's no telling whether your new
1490entry will be inserted before or after the current iterator position.
1491
a6dd486b 1492Either treasure up your changes and make them after the iterator finishes
d92eb7b0 1493or use keys to fetch all the old keys at once, and iterate over the list
1494of keys.
68dc0745 1495
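The second suggestion might look like the following sketch, in which the hash contents are made up. Since keys() hands back the complete list before the loop body runs, adding and deleting inside the loop is safe.

```perl
use strict;
use warnings;

my %stock = ( apples => 0, pears => 4, plums => 0 );

# The key list is fixed before the loop starts, so changes to the
# hash made inside the loop cannot disturb any iterator.
for my $fruit (keys %stock) {
    if ($stock{$fruit} == 0) {
        delete $stock{$fruit};
    } else {
        $stock{"more $fruit"} = $stock{$fruit} * 2;
    }
}

print join(", ", map { "$_=$stock{$_}" } sort keys %stock), "\n";
# more pears=8, pears=4
```
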
1496=head2 How do I look up a hash element by value?
1497
1498Create a reverse hash:
1499
1500 %by_value = reverse %by_key;
1501 $key = $by_value{$value};
1502
1503That's not particularly efficient. It would be more space-efficient
1504to use:
1505
1506 while (($key, $value) = each %by_key) {
1507 $by_value{$value} = $key;
1508 }
1509
d92eb7b0 1510If your hash could have repeated values, the methods above will only find
1511one of the associated keys. This may or may not worry you. If it does
1512worry you, you can always reverse the hash into a hash of arrays instead:
1513
1514 while (($key, $value) = each %by_key) {
1515 push @{$key_list_by_value{$value}}, $key;
1516 }
68dc0745 1517
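With some made-up data, the hash-of-arrays version behaves like this sketch:

```perl
use strict;
use warnings;

my %by_key = ( apple => "red", cherry => "red", lime => "green" );

my %key_list_by_value;
while (my ($key, $value) = each %by_key) {
    push @{ $key_list_by_value{$value} }, $key;
}

# each() returns pairs in no particular order, so sort before printing.
for my $value (sort keys %key_list_by_value) {
    my @keys = sort @{ $key_list_by_value{$value} };
    print "$value: @keys\n";
}
```

This prints C<green: lime> followed by C<red: apple cherry>.
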
1518=head2 How can I know how many entries are in a hash?
1519
1520If you mean how many keys, then all you have to do is
1521take the scalar sense of the keys() function:
1522
3fe9a6f1 1523 $num_keys = scalar keys %hash;
68dc0745 1524
a6dd486b 1525The keys() function also resets the iterator, which in void context is
d92eb7b0 1526faster for tied hashes than would be iterating through the whole
1527hash, one key-value pair at a time.
68dc0745 1528
1529=head2 How do I sort a hash (optionally by value instead of key)?
1530
1531Internally, hashes are stored in a way that prevents you from imposing
1532an order on key-value pairs. Instead, you have to sort a list of the
1533keys or values:
1534
1535 @keys = sort keys %hash; # sorted by key
1536 @keys = sort {
1537 $hash{$a} cmp $hash{$b}
1538 } keys %hash; # and by value
1539
Here we'll do a reverse numeric sort by value, and if two values are
identical, sort by length of key, or if that fails, by straight ASCII
comparison of the keys (well, possibly modified by your locale--see
L<perllocale>).
1544
1545 @keys = sort {
1546 $hash{$b} <=> $hash{$a}
1547 ||
1548 length($b) <=> length($a)
1549 ||
1550 $a cmp $b
1551 } keys %hash;
1552
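To see the tie-breaking in action, here is the same comparison run on a small made-up hash, as a sketch:

```perl
use strict;
use warnings;

my %hash = ( ant => 3, bee => 10, wasp => 3, fly => 1 );

my @keys = sort {
    $hash{$b} <=> $hash{$a}          # larger values first
        ||
    length($b) <=> length($a)        # then longer keys first
        ||
    $a cmp $b                        # then plain string order
} keys %hash;

print "@keys\n";    # bee wasp ant fly
```

The keys C<ant> and C<wasp> share the value 3, so the longer key wins the second comparison.
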
1553=head2 How can I always keep my hash sorted?
1554
1555You can look into using the DB_File module and tie() using the
1556$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
5a964f20 1557The Tie::IxHash module from CPAN might also be instructive.
68dc0745 1558
1559=head2 What's the difference between "delete" and "undef" with hashes?
1560
Hashes are pairs of scalars: the first is the key, the second is the
value. The key will be coerced to a string, although the value can be
any kind of scalar: string, number, or reference. If a key C<$key> is
present in the hash C<%ary>, C<exists($ary{$key})> will return true. The
value for a given key can be C<undef>, in which case C<$ary{$key}> will be
C<undef> while C<exists($ary{$key})> will return true. This corresponds to
(C<$key>, C<undef>) being in the hash.
1568
1569Pictures help... here's the C<%ary> table:
1570
1571 keys values
1572 +------+------+
1573 | a | 3 |
1574 | x | 7 |
1575 | d | 0 |
1576 | e | 2 |
1577 +------+------+
1578
1579And these conditions hold
1580
1581 $ary{'a'} is true
1582 $ary{'d'} is false
1583 defined $ary{'d'} is true
1584 defined $ary{'a'} is true
87275199 1585 exists $ary{'a'} is true (Perl5 only)
68dc0745 1586 grep ($_ eq 'a', keys %ary) is true
1587
1588If you now say
1589
1590 undef $ary{'a'}
1591
1592your table now reads:
1593
1594
1595 keys values
1596 +------+------+
1597 | a | undef|
1598 | x | 7 |
1599 | d | 0 |
1600 | e | 2 |
1601 +------+------+
1602
1603and these conditions now hold; changes in caps:
1604
1605 $ary{'a'} is FALSE
1606 $ary{'d'} is false
1607 defined $ary{'d'} is true
1608 defined $ary{'a'} is FALSE
87275199 1609 exists $ary{'a'} is true (Perl5 only)
68dc0745 1610 grep ($_ eq 'a', keys %ary) is true
1611
1612Notice the last two: you have an undef value, but a defined key!
1613
1614Now, consider this:
1615
1616 delete $ary{'a'}
1617
1618your table now reads:
1619
1620 keys values
1621 +------+------+
1622 | x | 7 |
1623 | d | 0 |
1624 | e | 2 |
1625 +------+------+
1626
1627and these conditions now hold; changes in caps:
1628
1629 $ary{'a'} is false
1630 $ary{'d'} is false
1631 defined $ary{'d'} is true
1632 defined $ary{'a'} is false
87275199 1633 exists $ary{'a'} is FALSE (Perl5 only)
68dc0745 1634 grep ($_ eq 'a', keys %ary) is FALSE
1635
1636See, the whole entry is gone!
1637
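The tables above can be checked with a few lines of code. This is a sketch; the C<*_after_*> variable names are made up for the example.

```perl
use strict;
use warnings;

my %ary = ( a => 3, x => 7, d => 0, e => 2 );

undef $ary{a};     # the key stays, the value becomes undef
my $exists_after_undef  = exists  $ary{a} ? 1 : 0;   # 1
my $defined_after_undef = defined $ary{a} ? 1 : 0;   # 0

delete $ary{a};    # the whole entry goes away
my $exists_after_delete = exists $ary{a} ? 1 : 0;    # 0

print "$exists_after_undef $defined_after_undef $exists_after_delete\n";   # 1 0 0
```
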
1638=head2 Why don't my tied hashes make the defined/exists distinction?
1639
That depends on how the tied class implements its EXISTS() method;
there is no separate DEFINED() method, so defined() simply tests
whatever FETCH() returns. For example, there isn't the concept of undef
with hashes that are tied to DBM* files. This means the true/false tables
above will give different results when used on such a hash. It also means
that exists and defined do the same thing with a DBM* file, and what
they end up doing is not what they do with ordinary hashes.
1646
1647=head2 How do I reset an each() operation part-way through?
1648
5a964f20 1649Using C<keys %hash> in scalar context returns the number of keys in
68dc0745 1650the hash I<and> resets the iterator associated with the hash. You may
1651need to do this if you use C<last> to exit a loop early so that when you
46fc3d4c 1652re-enter it, the hash iterator has been reset.
68dc0745 1653
1654=head2 How can I get the unique keys from two hashes?
1655
d92eb7b0 1656First you extract the keys from the hashes into lists, then solve
1657the "removing duplicates" problem described above. For example:
68dc0745 1658
1659 %seen = ();
1660 for $element (keys(%foo), keys(%bar)) {
1661 $seen{$element}++;
1662 }
1663 @uniq = keys %seen;
1664
1665Or more succinctly:
1666
1667 @uniq = keys %{{%foo,%bar}};
1668
1669Or if you really want to save space:
1670
1671 %seen = ();
1672 while (defined ($key = each %foo)) {
1673 $seen{$key}++;
1674 }
1675 while (defined ($key = each %bar)) {
1676 $seen{$key}++;
1677 }
1678 @uniq = keys %seen;
1679
1680=head2 How can I store a multidimensional array in a DBM file?
1681
1682Either stringify the structure yourself (no fun), or else
1683get the MLDBM (which uses Data::Dumper) module from CPAN and layer
1684it on top of either DB_File or GDBM_File.
1685
1686=head2 How can I make my hash remember the order I put elements into it?
1687
Use the Tie::IxHash module from CPAN.

    use Tie::IxHash;
    tie(%myhash, 'Tie::IxHash');
    for ($i=0; $i<20; $i++) {
        $myhash{$i} = 2*$i;
    }
    @keys = keys %myhash;
    # @keys = (0,1,2,3,...)
1697
68dc0745 1698=head2 Why does passing a subroutine an undefined element in a hash create it?
1699
1700If you say something like:
1701
1702 somefunc($hash{"nonesuch key here"});
1703
1704Then that element "autovivifies"; that is, it springs into existence
1705whether you store something there or not. That's because functions
1706get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
1707it has to be ready to write it back into the caller's version.
1708
87275199 1709This has been fixed as of Perl5.004.
68dc0745 1710
1711Normally, merely accessing a key's value for a nonexistent key does
1712I<not> cause that key to be forever there. This is different from
1713awk's behavior.
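
The following sketch shows the post-5.004 behavior on a reasonably modern perl. The subroutine names C<reader> and C<writer> are hypothetical. An element passed to a subroutine is created only if the subroutine assigns through C<$_[0]>, although a nested lookup still creates the intermediate level:

```perl
use strict;
use warnings;

my %hash;

sub reader { my $copy = $_[0]; return }   # only reads its argument
sub writer { $_[0] = "set";    return }   # assigns through the alias

reader($hash{first});
my $after_read = exists $hash{first} ? 1 : 0;

writer($hash{second});
my $after_write = exists $hash{second} ? 1 : 0;

# A nested lookup still vivifies the intermediate hash.
reader($hash{outer}{inner});
my $outer_made = exists $hash{outer}        ? 1 : 0;
my $inner_made = exists $hash{outer}{inner} ? 1 : 0;

print "read=$after_read write=$after_write ",
      "outer=$outer_made inner=$inner_made\n";
```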
1714
fc36a67e 1715=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 1716
65acb1b1 1717Usually a hash ref, perhaps like this:
1718
1719 $record = {
1720 NAME => "Jason",
1721 EMPNO => 132,
1722 TITLE => "deputy peon",
1723 AGE => 23,
1724 SALARY => 37_000,
1725 PALS => [ "Norbert", "Rhys", "Phineas"],
1726 };
1727
1728References are documented in L<perlref> and the upcoming L<perlreftut>.
1729Examples of complex data structures are given in L<perldsc> and
1730L<perllol>. Examples of structures and object-oriented classes are
1731in L<perltoot>.
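
As a short sketch of how such a record is used, fields are reached with the arrow operator, and a nested array is dereferenced with C<@{ ... }>. The second record and the C<@staff> array are invented for illustration:

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

my $name      = $record->{NAME};        # a scalar field
my $first_pal = $record->{PALS}[0];     # an element of the nested array
my $pal_count = @{ $record->{PALS} };   # the nested array in scalar context

$record->{AGE}++;                       # fields can be updated in place
push @{ $record->{PALS} }, "Zelda";     # and so can nested arrays

# An array of hashes is just a list of such references.
my @staff = ($record, { NAME => "Lena", EMPNO => 133, PALS => [] });
my @names = map { $_->{NAME} } @staff;

print "$name has ", scalar @{ $record->{PALS} }, " pals\n";
print "staff: @names\n";
```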
68dc0745 1732
1733=head2 How can I use a reference as a hash key?
1734
1735You can't do this directly, but you could use the standard Tie::RefHash
87275199 1736module distributed with Perl.
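
A minimal sketch, with made-up variable names: in an ordinary hash, a reference used as a key is flattened to a string such as C<ARRAY(0x...)>, whereas a hash tied to Tie::RefHash hands back the original reference from C<keys>:

```perl
use strict;
use warnings;
use Tie::RefHash;

my $list = [ 1, 2, 3 ];

# Ordinary hash: the key is only a string, not a usable reference.
my %plain = ($list => "alice");
my ($plain_key) = keys %plain;
my $plain_is_ref = ref($plain_key) ? 1 : 0;

# Tied hash: keys come back as real references.
tie my %owner_of, 'Tie::RefHash';
$owner_of{$list} = "alice";
my ($tied_key) = keys %owner_of;
my $tied_is_ref = ref($tied_key) eq 'ARRAY' ? 1 : 0;

push @$tied_key, 4;    # this modifies the original array
print "plain=$plain_is_ref tied=$tied_is_ref size=", scalar(@$list), "\n";
```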
68dc0745 1737
1738=head1 Data: Misc
1739
1740=head2 How do I handle binary data correctly?
1741
1742Perl is binary clean, so this shouldn't be a problem. For example,
1743this works fine (assuming the files are found):
1744
1745 if (`cat /vmunix` =~ /gzip/) {
1746 print "Your kernel is GNU-zip enabled!\n";
1747 }
1748
d92eb7b0 1749On less elegant (read: Byzantine) systems, however, you have
1750to play tedious games with "text" versus "binary" files. See
1751L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking
1752systems are curses out of Microsoft, who seem to be committed to putting
1753the backward into backward compatibility.
68dc0745 1754
1755If you're concerned about 8-bit ASCII data, then see L<perllocale>.
1756
54310121 1757If you want to deal with multibyte characters, however, there are
68dc0745 1758some gotchas. See the section on Regular Expressions.
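
On systems that do distinguish text from binary files, the usual precaution is to call C<binmode> on every filehandle that carries binary data. Here's a sketch that writes all 256 byte values to a temporary file and reads them back. The use of File::Temp and the variable names are choices made for this example only:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Every possible byte value, once each.
my $bytes = join '', map { chr } 0 .. 255;

my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);                     # no newline translation on output
print {$out} $bytes;
close($out) or die "can't close $path: $!";

open(my $in, '<', $path) or die "can't open $path: $!";
binmode($in);                      # nor on input
my $copy = do { local $/; <$in> }; # slurp the whole file
close($in);

print $copy eq $bytes ? "round trip ok\n" : "data was altered\n";
```

On Unix-like systems C<binmode> changes nothing here, so it is harmless to leave it in portable code.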
1759
1760=head2 How do I determine whether a scalar is a number/whole/integer/float?
1761
1762Assuming that you don't care about IEEE notations like "NaN" or
1763"Infinity", you probably just want to use a regular expression.
1764
65acb1b1 1765 if (/\D/) { print "has nondigits\n" }
1766 if (/^\d+$/) { print "is a whole number\n" }
1767 if (/^-?\d+$/) { print "is an integer\n" }
1768 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
1769 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
1770 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
1771 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
1772 { print "a C float\n" }
68dc0745 1773
5a964f20 1774If you're on a POSIX system, Perl supports the C<POSIX::strtod>
1775function. Its semantics are somewhat cumbersome, so here's a C<getnum>
1776wrapper function for more convenient access. This function takes
1777a string and returns the number it found, or C<undef> for input that
1778isn't a C float. The C<is_numeric> function is a front end to C<getnum>
1779if you just want to say, ``Is this a float?''
1780
1781 sub getnum {
1782 use POSIX qw(strtod);
1783 my $str = shift;
1784 $str =~ s/^\s+//;
1785 $str =~ s/\s+$//;
1786 $! = 0;
1787 my($num, $unparsed) = strtod($str);
1788 if (($str eq '') || ($unparsed != 0) || $!) {
1789 return undef;
1790 } else {
1791 return $num;
1792 }
1793 }
1794
072dc14b 1795 sub is_numeric { defined getnum($_[0]) }
5a964f20 1796
6cecdcac 1797Or you could check out the String::Scanf module on CPAN instead. The
1798POSIX module (part of the standard Perl distribution) provides the
bf4acbe4 1799C<strtod> and C<strtol> functions for converting strings to doubles and
6cecdcac 1800longs, respectively.
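
In the same spirit as C<getnum>, here's a sketch of an integer counterpart built on C<strtol>. The name C<getint> and its optional base argument are inventions for this example. In list context C<strtol> returns the number and the count of characters it could not parse:

```perl
use strict;
use warnings;
use POSIX qw(strtol);

# Hypothetical helper: return the integer in $str, or undef if any
# part of the string is not a valid integer in the given base.
sub getint {
    my ($str, $base) = @_;
    $base = 10 unless defined $base;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    return undef if $str eq '';
    $! = 0;
    my ($num, $unparsed) = strtol($str, $base);
    return undef if $unparsed != 0 || $!;
    return $num;
}

for my $try ("123", " 42 ", "12abc", "") {
    my $got = getint($try);
    print "'$try' => ", (defined $got ? $got : "undef"), "\n";
}
print "'ff' in base 16 => ", getint("ff", 16), "\n";
```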
68dc0745 1801
1802=head2 How do I keep persistent data across program calls?
1803
1804For some specific applications, you can use one of the DBM modules.
65acb1b1 1805See L<AnyDBM_File>. More generically, you should consult the FreezeThaw,
83df6a1d 1806Storable, or Class::Eroot modules from CPAN. Starting with Perl 5.8,
1807Storable is part of the standard distribution. Here's one example using
65acb1b1 1808Storable's C<store> and C<retrieve> functions:
1809
1810 use Storable;
1811 store(\%hash, "filename");
1812
1813 # later on...
1814 $href = retrieve("filename"); # by ref
1815 %hash = %{ retrieve("filename") }; # direct to hash
68dc0745 1816
1817=head2 How do I print out or copy a recursive data structure?
1818
65acb1b1 1819The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
1820for printing out data structures. The Storable module, found on CPAN,
1821provides a function called C<dclone> that recursively copies its argument.
1822
1823 use Storable qw(dclone);
1824 $r2 = dclone($r1);
68dc0745 1825
65acb1b1 1826Here C<$r1> can be a reference to any kind of data structure you'd like.
1827It will be deeply copied. Because C<dclone> takes and returns references,
1828you'd have to add extra punctuation if you had a hash of arrays that
1829you wanted to copy.
68dc0745 1830
65acb1b1 1831 %newhash = %{ dclone(\%oldhash) };
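
To tie the two modules together, here's a sketch that prints a self-referential structure with Data::Dumper and then checks that a C<dclone> copy is independent of the original. The structure itself is made up for the example:

```perl
use strict;
use warnings;
use Data::Dumper;
use Storable qw(dclone);

my $r1 = { name => "root", kids => [ { name => "leaf" } ] };
$r1->{self} = $r1;               # a cycle, so the structure is recursive

$Data::Dumper::Indent = 1;       # a more compact layout
print Dumper($r1);               # the cycle is printed as a back-reference

my $r2 = dclone($r1);
$r2->{kids}[0]{name} = "changed";

# The copy's cycle points at the copy, and the original is untouched.
print "original kid: $r1->{kids}[0]{name}\n";
print "copy is self-contained\n" if $r2->{self} == $r2;
```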
68dc0745 1832
1833=head2 How do I define methods for every class/object?
1834
1835Use the UNIVERSAL class (see L<UNIVERSAL>).
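
For instance, a subroutine defined in the UNIVERSAL package can be called on any class or object, because method lookup falls back to UNIVERSAL after the normal inheritance search. The method name C<describe> and the C<Widget> class below are hypothetical:

```perl
use strict;
use warnings;

# Installed once, available as a method everywhere.
sub UNIVERSAL::describe {
    my $self = shift;
    return "I am a " . (ref($self) || $self);
}

package Widget;
sub new { return bless {}, shift }

package main;

my $from_object = Widget->new->describe;   # called on an object
my $from_class  = Widget->describe;        # called on a class name

print "$from_object\n";
print "$from_class\n";
```

Since such a method becomes visible in every class, it's wise to pick names that are unlikely to collide with anyone else's.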
1836
1837=head2 How do I verify a credit card checksum?
1838
1839Get the Business::CreditCard module from CPAN.
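
If you only want to see the idea, card numbers carry a Luhn (mod 10) check digit. The sketch below implements that check in plain Perl. It is not the Business::CreditCard interface, and the function name C<luhn_ok> is made up for this example:

```perl
use strict;
use warnings;

# Return 1 if the digits of $number pass the Luhn check, else 0.
sub luhn_ok {
    my $number = shift;
    $number =~ s/[\s-]//g;                 # allow spaces and dashes
    return 0 unless $number =~ /^\d+$/;

    my $sum    = 0;
    my $double = 0;                        # double every second digit
    for my $digit (reverse split //, $number) {
        my $value = $double ? $digit * 2 : $digit;
        $value -= 9 if $value > 9;         # same as adding the two digits
        $sum   += $value;
        $double = !$double;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

for my $try ("4111 1111 1111 1111", "4111111111111112") {
    print "$try: ", (luhn_ok($try) ? "passes" : "fails"), "\n";
}
```

Passing the check only means the number is well formed. It says nothing about whether the account exists.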
1840
65acb1b1 1841=head2 How do I pack arrays of doubles or floats for XS code?
1842
1843The kgbpack.c code in the PGPLOT module on CPAN does just this.
1844If you're doing a lot of float or double processing, consider using
1845the PDL module from CPAN instead--it makes number-crunching easy.
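
On the Perl side, one simple way to hand a block of doubles to C is to build it with C<pack>. This is a sketch of that approach, not the kgbpack.c code itself. The C<d> template packs native doubles (C<f> would pack native floats):

```perl
use strict;
use warnings;

my @values = (1.5, -2.25, 3.0);

# One string holding the values as consecutive native doubles.
my $packed = pack("d*", @values);

# The size of a double on this machine, measured rather than assumed.
my $bytes_per_double = length(pack("d", 0));

# Going the other way recovers the list.
my @back = unpack("d*", $packed);

print "packed ", scalar(@values), " doubles into ",
      length($packed), " bytes\n";
print "round trip: @back\n";
```

XS code can then fetch the string's buffer (for example with C<SvPV>) and treat it as an array of C<double>.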
1846
68dc0745 1847=head1 AUTHOR AND COPYRIGHT
1848
65acb1b1 1849Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
5a964f20 1850All rights reserved.
1851
1852When included as part of the Standard Version of Perl, or as part of
1853its complete documentation whether printed or otherwise, this work
d92eb7b0 1854may be distributed only under the terms of Perl's Artistic License.
5a964f20 1855Any distribution of this file or derivatives thereof I<outside>
1856of that package require that special arrangements be made with
1857copyright holder.
1858
1859Irrespective of its distribution, all code examples in this file
1860are hereby placed into the public domain. You are permitted and
1861encouraged to use this code in your own programs for fun
1862or for profit as you see fit. A simple comment in the code giving
1863credit would be courteous but is not required.