=head1 NAME

perlfaq4 - Data Manipulation

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

For the long explanation, see David Goldberg's "What Every Computer
Scientist Should Know About Floating-Point Arithmetic"
(L<http://docs.sun.com/source/806-3568/ncg_goldberg.html>).

Internally, your computer represents floating-point numbers in binary.
A binary fraction can only be a sum of powers of two, so many decimal
fractions, such as 0.1, cannot be stored exactly and lose a little
precision in the process. This is a problem with how computers store
numbers and affects all computer languages, not just Perl.

L<perlnumber> shows the gory details of number representations and
conversions.

To limit the number of decimal places in your numbers, you can use the
C<printf> or C<sprintf> function. See
L<perlop/"Floating-point Arithmetic"> for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;

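To see this for yourself, the sketch below prints the value actually
stored for C<0.1 + 0.2> with more digits than C<print> shows by default,
then compares it against C<0.3> with a tolerance instead of C<==>. The
helper C<approx_equal> and its default tolerance of C<1e-9> are
illustrative choices, not something Perl provides; a fixed absolute
tolerance only suits values of modest magnitude.

```perl
use strict;
use warnings;

# 0.1 and 0.2 have no exact binary representation, so their sum is
# not the same double as the one closest to 0.3.
my $sum = 0.1 + 0.2;
printf "%.17g\n", $sum;    # shows digits hidden by the default format
print $sum == 0.3 ? "equal\n" : "not equal\n";

# Compare against a small absolute tolerance instead of using ==.
sub approx_equal {
    my ( $x, $y, $eps ) = @_;
    $eps = 1e-9 unless defined $eps;
    return abs( $x - $y ) < $eps;
}

print approx_equal( $sum, 0.3 ) ? "close enough\n" : "different\n";
```
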
=head2 Why is int() broken?

Your C<int()> is most probably working just fine. It's the numbers that
aren't quite what you think.

First, see the answer to "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly as floating-point
numbers. What you think of above as 'three' is really more like
2.9999999999999995559.

=head2 Why isn't my octal data interpreted correctly?

(contributed by brian d foy)

You're probably trying to convert a string to a number, which Perl only
converts as a decimal number. When Perl converts a string to a number, it
ignores leading spaces and zeroes, then assumes the rest of the digits
are in base 10:

    my $string = '0644';

    print $string + 0;  # prints 644

    print $string + 44; # prints 688, certainly not octal!

This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, C<chmod> on the command line knows that
its first argument is octal because that's what it does:

    %prompt> chmod 644 file

If you want to use the same literal digits (644) in Perl, you have to tell
Perl to treat them as octal numbers either by prefixing the digits with
a C<0> or using C<oct>:

    chmod( 0644, $file );      # right, has leading zero
    chmod( oct(644), $file );  # also correct

The problem comes in when you take your numbers from something that Perl
thinks is a string, such as a command line argument in C<@ARGV>:

    chmod( $ARGV[0], $file );       # wrong, even if "0644"

    chmod( oct($ARGV[0]), $file );  # correct, treat string as octal

You can always check the value you're using by printing it in octal
notation to ensure it matches what you think it should be. Print it
in both octal and decimal format:

    printf "0%o %d", $number, $number;

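The following sketch contrasts the two conversions on the same string.
C<$mode_string> stands in for a value you might read from C<@ARGV> or
a configuration file.

```perl
use strict;
use warnings;

my $mode_string = '0644';    # e.g. a value taken from @ARGV

my $as_decimal = $mode_string + 0;     # numeric conversion is always base 10
my $as_octal   = oct($mode_string);    # oct() reads the digits as base 8

printf "decimal %d, octal 0%o (which is %d in decimal)\n",
    $as_decimal, $as_octal, $as_octal;
```

C<oct> also accepts strings that start with C<0x> or C<0b> and converts
them as hexadecimal or binary, so it is a convenient single entry point
for user-supplied numbers in those notations.
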
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that C<int()> merely truncates toward 0. For rounding to a
certain number of digits, C<sprintf()> or C<printf()> is usually the
easiest route.

    printf("%.3f", 3.1415926535); # prints 3.142

The C<POSIX> module (part of the standard Perl distribution)
implements C<ceil()>, C<floor()>, and a number of other mathematical
and trigonometric functions.

    use POSIX;
    $ceil  = ceil(3.5);   # 4
    $floor = floor(3.5);  # 3

In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
module. With 5.004, the C<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the C<Math::Complex> module, and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whatever rounding the underlying
system happens to provide, but to implement the rounding function you
need yourself.

To see why, notice that numbers at the halfway point do not all round
in the same direction:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do
this. Perl numbers whose absolute values are integers under 2**31 (on
32-bit machines) will work pretty much like mathematical integers.
Other numbers are not guaranteed.

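If you do write your own, a sketch of one policy, round half away from
zero, might look like the following. The name C<round_half_away> is
hypothetical, and the policy is only an example: some financial rules
call for round-half-to-even instead. Because it still does its
arithmetic on binary floating-point values, an input whose decimal
spelling ends in 5 but is stored slightly low (1.005 is a common
example) can still round down; for money, many programs sidestep this
by working in integer cents.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Round to $places decimal places, sending exact halves away from zero.
# This policy is only an example; check what your domain requires.
sub round_half_away {
    my ( $value, $places ) = @_;
    $places = 0 unless defined $places;
    my $factor  = 10**$places;
    my $rounded = floor( abs($value) * $factor + 0.5 ) / $factor;
    return $value < 0 ? -$rounded : $rounded;
}

printf "%s %s %s\n",
    round_half_away(2.5), round_half_away(-2.5), round_half_away( 0.125, 2 );
```
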
=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it. Below are a
few examples of approaches to making common conversions between number
representations. This is intended to be representative rather than
exhaustive.

Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
module from CPAN. The reason you might choose C<Bit::Vector> over the
Perl built-in functions is that it works with numbers of any size,
that it is optimized for speed on some operations, and for at least
some programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using Perl's built-in conversion of C<0x> notation:

    $dec = 0xDEADBEEF;

Using the C<hex> function:

    $dec = hex("DEADBEEF");

Using C<pack>:

    $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();  # to_Dec() reads the bits as signed: -559038737

=item How do I convert from decimal to hexadecimal

Using C<sprintf>:

    $hex = sprintf("%X", 3735928559); # upper case A-F
    $hex = sprintf("%x", 3735928559); # lower case a-f

Using C<unpack>:

    $hex = unpack("H*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And C<Bit::Vector> supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built-in conversion of numbers with leading zeros:

    $dec = 033653337357; # note the leading 0!

Using the C<oct> function:

    $dec = oct("33653337357");

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using C<sprintf>:

    $oct = sprintf("%o", 3735928559);

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the C<0b> notation:

    $number = 0b10110110;

Using C<oct>:

    my $input = "10110110";
    $decimal = oct( "0b$input" );

Using C<pack> and C<ord>:

    $decimal = ord(pack('B8', '10110110'));

Using C<pack> and C<unpack> for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() left-pads the bit string with zeros to 32 characters.

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using C<sprintf> (Perl 5.6+):

    $bin = sprintf("%b", 3735928559);

Using C<unpack>:

    $bin = unpack("B*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back

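The remaining conversions can be built by chaining the pieces above:
parse the input into a number with C<hex> or C<oct>, then format it
with C<sprintf>. Here is a minimal sketch; the function names are made
up for this example. Like the built-ins it relies on, it is limited to
the native integer size of your C<perl>, so reach for C<Bit::Vector>
or C<Math::BigInt> when values get larger.

```perl
use strict;
use warnings;

# Parse with hex()/oct(), then print in the target base with sprintf().
sub hex_to_oct { my ($hex)  = @_; return sprintf '%o', hex $hex }
sub bin_to_hex { my ($bits) = @_; return sprintf '%X', oct "0b$bits" }
sub oct_to_bin { my ($oct)  = @_; return sprintf '%b', oct $oct }

print hex_to_oct('DEADBEEF'), "\n";
print bin_to_hex('10110110'), "\n";
print oct_to_bin('755'),      "\n";
```
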
=head2 Why doesn't & work the way I want it to?

The behavior of the bitwise operators depends on whether they're
used on numbers or strings. With strings, the operators treat each
string as a series of bits and work with that (the string C<"3"> is
the bit pattern C<00110011>). With numbers, the operators work with
the binary form of the number (the number C<3> is the bit pattern
C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. To test whether any bit
survived, check for a byte that isn't null:

    if ( ("\020\020" & "\101\101") =~ /[^\000]/) {
        # ...
    }

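The sketch below shows the same operator giving different answers for
numbers and strings, and one way to force numeric behavior on a value
that arrived as a string, such as a line read from a file. It assumes
the default behavior of C<&>; code that enables the C<bitwise> feature
(available since Perl 5.22) gets separate numeric (C<&>) and string
(C<&.>) operators instead.

```perl
use strict;
use warnings;

my $num_and = 11 & 3;        # numbers: 1011 & 0011 gives 0011
my $str_and = "11" & "3";    # strings: ANDed byte by byte, cut to the shorter length

# A value read from input is a string; adding 0 makes it a number first.
my $from_input = "11";       # stands in for data read at run time
my $forced     = ( $from_input + 0 ) & 3;

print "$num_and $str_and $forced\n";
```
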
=head2 How do I multiply matrices?

Use the C<Math::Matrix> or C<Math::MatrixReal> modules (available from CPAN)
or the C<PDL> extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of half a million integers.

=head2 How can I output Roman numerals?

Get the C<Roman> module from CPAN
(L<http://www.cpan.org/modules/by-module/Roman>).

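If you cannot install a module, the standard subtractive notation is
simple enough to sketch by hand. The sub C<to_roman> below is a
hypothetical helper, not part of the C<Roman> module: it greedily
subtracts the largest value that still fits, and it only handles 1 to
3999.

```perl
use strict;
use warnings;

# Value/letter pairs, largest first, including the subtractive forms.
my @ROMAN = (
    [ 1000, 'M' ], [ 900, 'CM' ], [ 500, 'D' ], [ 400, 'CD' ],
    [ 100,  'C' ], [ 90,  'XC' ], [ 50,  'L' ], [ 40,  'XL' ],
    [ 10,   'X' ], [ 9,   'IX' ], [ 5,   'V' ], [ 4,   'IV' ],
    [ 1,    'I' ],
);

sub to_roman {
    my ($n) = @_;
    die "out of range: $n\n"
        unless $n =~ /^\d+$/ && $n >= 1 && $n <= 3999;
    my $out = '';
    for my $pair (@ROMAN) {
        my ( $value, $letters ) = @$pair;
        while ( $n >= $value ) {    # take this value as often as it fits
            $out .= $letters;
            $n   -= $value;
        }
    }
    return $out;
}

print to_roman(1987), "\n";
```
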
=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once; you make your numbers less random,
rather than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection at L<http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy
of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than what C<rand> with C<srand>
provides, you should also check out the C<Math::TrulyRandom> module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
"Numerical Recipes in C" at L<http://www.nr.com/>.

=head2 How do I get a random number between X and Y?

To get a random number between two values, you can use the C<rand()>
built-in to get a random number between 0 and 1. From there, you shift
that into the range that you want.

C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
what you want to have Perl figure out is a random number in the range
from 0 to the difference between your I<X> and I<Y>.

That is, to get an integer between 10 and 15, inclusive, you want a
random integer from 0 to 5 (inclusive) that you can then add to 10.

    my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_between(50,120)>.

    sub random_int_between {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min) if $min > $max;
        return $min + int rand(1 + $max - $min);
    }

=head1 Data: Dates

=head2 How do I find the day or week of the year?

In list context, C<localtime> returns the day of the year as the
element at index 7, counting from 0 (so January 1 is day 0). Without
an argument, C<localtime> uses the current time.

    $day_of_year = (localtime)[7];

The C<POSIX> module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;
    my $week_of_year = strftime "%W", localtime;

C<%j> counts days from 001, and C<%W> numbers weeks starting on Monday,
with the days before the year's first Monday in week 00.

To get the day or week of the year for any date, use C<POSIX>'s
C<mktime> to get a time in epoch seconds for the argument to
C<localtime>.

    use POSIX qw/mktime strftime/;
    my $week_of_year = strftime "%W",
        localtime( mktime( 0, 0, 0, 18, 11, 87 ) );

The C<Date::Calc> module provides two functions to calculate these.

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year = Day_of_Year( 1987, 12, 18 );
    # in list context, also returns the year the week belongs to
    my ( $week_of_year, $year ) = Week_of_Year( 1987, 12, 18 );

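Here is a short sketch tying these together for one fixed date; it
shows that the C<localtime> index is 0-based while C<%j> is 1-based.
It builds the time at noon, an arbitrary choice made to stay clear of
any daylight-saving change around midnight.

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# mktime takes (sec, min, hour, mday, mon, year) with mon counting
# from 0 and year counting from 1900, like localtime's output.
my $epoch = mktime( 0, 0, 12, 18, 11, 87 );    # 18 December 1987, noon

my $yday_index = ( localtime $epoch )[7];               # 0-based
my $yday_fmt   = strftime( "%j", localtime $epoch );    # 1-based, zero-padded

print "$yday_index $yday_fmt\n";
```
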
=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

Both take an optional time in epoch seconds and count the traditional
way, so the year 2000 is the last year of the 20th century and of the
2nd millennium.

On some systems, the C<POSIX> module's C<strftime()> function has been
extended in a non-standard way to use a C<%C> format, which is
sometimes described as the "century". It isn't: on most such systems,
it is only the first two digits of the four-digit year, and thus
cannot be used to reliably determine the current century or
millennium.

=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as a number (such as epoch
seconds) and then subtract. Life isn't always that simple though. If
you want to work with formatted dates, the C<Date::Manip>,
C<Date::Calc>, or C<DateTime> modules can help you.

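When all you need is the number of days between two calendar dates,
the core C<Time::Local> module is enough. This sketch (C<days_between>
is a made-up name) converts both dates with C<timegm>, which works in
UTC, so daylight-saving changes cannot make a day shorter or longer
than 86,400 seconds.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Whole days from the first (year, month, day) to the second.
# Months are 1-12 here; timegm wants them 0-11.
sub days_between {
    my ( $y1, $m1, $d1, $y2, $m2, $d2 ) = @_;
    my $t1 = timegm( 0, 0, 0, $d1, $m1 - 1, $y1 );
    my $t2 = timegm( 0, 0, 0, $d2, $m2 - 1, $y2 );
    return ( $t2 - $t1 ) / 86400;
}

print days_between( 2000, 2, 28, 2000, 3, 1 ), "\n";    # 2000 was a leap year
```
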
=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
C<Time::Local> module. Otherwise, you should look into the C<Date::Calc>
and C<Date::Manip> modules from CPAN.

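As a sketch of the split-it-up approach, here is a parser for one
fixed layout, C<YYYY-MM-DD HH:MM:SS>. The sub name and the accepted
format are assumptions for illustration; adjust the pattern to match
your data. It treats the string as local time, so use C<timegm> instead
if your timestamps are in UTC.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Parse "2009-07-04 13:45:00" (or with a "T" separator) into epoch seconds.
sub string_to_epoch {
    my ($string) = @_;
    my ( $y, $mon, $d, $h, $min, $s ) =
        $string =~ /^(\d{4})-(\d\d)-(\d\d)[ T](\d\d):(\d\d):(\d\d)$/
        or die "unrecognized date: $string\n";
    # timelocal wants a 0-based month; a four-digit year is taken literally.
    return timelocal( $s, $min, $h, $d, $mon - 1, $y );
}

print scalar localtime( string_to_epoch('2009-07-04 13:45:00') ), "\n";
```
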
=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the C<Time::JulianDay> module available on CPAN. Ensure
that you really want to find a Julian day, though, as many people have
different ideas about Julian days. See
L<http://www.hermetic.ch/cal_stud/jdn.htm> for instance.

You can also try the C<DateTime> module, which can convert a date/time
to a Julian Day.

    $ perl -MDateTime -le'print DateTime->today->jd'
    2453401.5

Or the modified Julian Day:

    $ perl -MDateTime -le'print DateTime->today->mjd'
    53401

Or even the day of the year (which is what some people think of as a
Julian day):

    $ perl -MDateTime -le'print DateTime->today->doy'
    31

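If you only need the astronomical Julian Day for a moment in time, you
can compute it from epoch seconds without a module, because the Unix
epoch (1970-01-01 00:00:00 UTC) is Julian Day 2440587.5 and the
modified Julian Day is the Julian Day minus 2400000.5. A sketch with
hypothetical helper names; like Unix time itself, it ignores leap
seconds.

```perl
use strict;
use warnings;

# Julian Day and modified Julian Day from Unix epoch seconds (UTC).
sub julian_day          { my ($epoch) = @_; return $epoch / 86400 + 2440587.5 }
sub modified_julian_day { my ($epoch) = @_; return julian_day($epoch) - 2400000.5 }

printf "%.1f %d\n", julian_day(0), modified_julian_day(0);
printf "now: %.5f\n", julian_day(time);
```
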
=head2 How do I find yesterday's date?
X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
X<daylight saving time> X<day> X<Today_and_Now> X<localtime>
X<timelocal>

(contributed by brian d foy)

Use one of the Date modules. The C<DateTime> module makes it simple, and
gives you the same time of day, only the day before. Note that
C<< DateTime->now >> works in UTC unless you pass it a C<time_zone>,
such as C<< DateTime->now( time_zone => 'local' ) >>.

    use DateTime;

    my $yesterday = DateTime->now->subtract( days => 1 );

    print "Yesterday was $yesterday\n";

You can also use the C<Date::Calc> module using its C<Today_and_Now>
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.

If you absolutely must do it yourself (or can't use one of the
modules), here's a solution using C<Time::Local>, which comes with
Perl:

    # contributed by Gunnar Hjalmarsson
    use Time::Local;
    my $today = timelocal 0, 0, 12, ( localtime )[3..5];
    my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
    printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;

In this case, you start from noon today and subtract 24 hours. Even if
the length of the calendar day is 23 or 25 hours, you'll still end up
on the previous calendar day, although not at noon. Since you don't
care about the time, the one-hour difference doesn't matter and you
end up with the previous date.

=head2 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?

(contributed by brian d foy)

Perl itself never had a Y2K problem, although that never stopped people
from creating Y2K problems on their own. See the documentation for
C<localtime> for its proper use.

Starting with Perl 5.11, C<localtime> and C<gmtime> can handle dates past
03:14:08 UTC on January 19, 2038, when a 32-bit based time would
overflow. You might still get a warning on a 32-bit C<perl>:

    % perl5.11.2 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
    Integer overflow in hexadecimal number at -e line 1.
    Wed Nov  1 19:42:39 5576711

On a 64-bit C<perl>, you can get even larger dates for those really long
running projects:

    % perl5.11.2 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
    Thu Nov  2 00:42:39 5576711

You're still out of luck if you need to keep track of decaying protons
though.

=head1 Data: Strings

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as C<Regexp::Common>.

Some modules have validation for particular types of input, such
as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
and C<Data::Validate::IP>.

=head2 How do I unescape a string?

It depends on just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

    s/(.)\1/$1/g;

That collapses each pair; to collapse longer runs as well, let the
back-reference repeat with C<s/(.)\1+/$1/g>.

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself:

    my $str = 'Haarlem';   # in the Netherlands
    $str =~ tr///cs;       # Now Harlem, like in New York

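The difference between collapsing pairs and collapsing runs only shows
up when a character repeats three or more times. This sketch applies
all three approaches to the same made-up word:

```perl
use strict;
use warnings;

my $word = 'bookkeeeper';

( my $pairs    = $word ) =~ s/(.)\1/$1/g;     # each adjacent pair becomes one
( my $runs     = $word ) =~ s/(.)\1+/$1/g;    # a run of any length becomes one
( my $squashed = $word ) =~ tr///cs;          # squashes runs, like the line above

print "$pairs $runs $squashed\n";
```

With C<s/(.)\1/$1/g>, the three C<e>s become two, because the match
consumes one pair and the leftover C<e> has no partner.
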
=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference. If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

    print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can have any code we like inside the braces, so we
simply have to end with a scalar reference; how you build it is up to
you. Note that the use of parens creates a list context, so we need
C<scalar> to force the scalar context on the function:

    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

    sub timestamp { my $t = localtime; \$t }

    print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

    use Interpolation E => 'eval';
    print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

    print "The time is " . localtime() . ".\n";

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple-character delimiters, something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
C<< < >> as delimiters, use the CPAN module C<Regexp::Common>, or see
L<perlre/(??{ code })>. For other cases, you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
C<Text::Balanced>; and the C<byacc> program. Starting with Perl 5.8,
C<Text::Balanced> is part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );

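Perl 5.10 and later also support recursive patterns, which go beyond
what formal regular expressions can do. As a sketch, the pattern below
matches one balanced, parenthesized group: C<(?1)> re-enters the first
capture group, and C<[^()]++> is a possessive run of characters that
are not parentheses. It handles only parentheses and knows nothing
about quoting or escapes, so treat it as an illustration rather than a
general parser.

```perl
use strict;
use warnings;
use 5.010;    # recursive (?1) and possessive ++ need Perl 5.10

# One balanced (...) group; (?1) recurses into capture group 1.
my $balanced = qr/( \( (?: [^()]++ | (?1) )* \) )/x;

# In list context, //g returns capture group 1 of every match.
my @groups = 'f(a(b)c) g(d(e(f))) (unclosed' =~ /$balanced/g;
print "$_\n" for @groups;
```
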
=head2 How do I reverse a string?

Use C<reverse()> in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

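A common surprise is that C<print reverse $string> prints the string
unchanged, because C<print> supplies list context and C<reverse> then
reverses a one-element list. The sketch below shows both contexts. Its
last part assumes your text may contain combining characters: it
reverses grapheme clusters (C<\X>) rather than individual code points
so that accents stay attached to their letters.

```perl
use strict;
use warnings;

my $string   = "Perl";
my $reversed = reverse $string;    # scalar context: the characters

my @list = reverse $string;        # list context: a one-element list, unchanged

# "cafe" plus U+0301 COMBINING ACUTE ACCENT on the final "e".
my $accented    = "cafe\x{301}";
my $by_grapheme = join '', reverse $accented =~ /\X/g;
```
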
=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the C<Text::Tabs> module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use C<Text::Wrap> (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", ' ', @paragraphs);

The paragraphs you give to C<Text::Wrap> should not contain embedded
newlines. C<Text::Wrap> doesn't justify the lines (flush-right).

Or use the CPAN module C<Text::Autoformat>. Formatting files can be
easily done by making a shell alias, like so:

    alias fmt="perl -i -MText::Autoformat -n0777 \
        -e 'print autoformat $_, {all=>1}' $*"

See the documentation for C<Text::Autoformat> to appreciate its many
capabilities.

=head2 How can I access or change N characters of a string?

You can access the first characters of a string with C<substr()>.
To get the first character, for example, start at position 0
and grab the string of length 1.

    $string = "Just another Perl Hacker";
    $first_char = substr( $string, 0, 1 );  # 'J'

To change part of a string, you can use the optional fourth
argument, which is the replacement string.

    substr( $string, 13, 4, "Perl 5.8.0" );

You can also use C<substr()> as an lvalue.

    substr( $string, 13, 4 ) = "Perl 5.8.0";

=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself. For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively. These
examples all assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5       # is it the 5th?
            ? "${2}soever"  # yes, swap
            : $1            # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one."> You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

68dc0745 786=head2 How can I count the number of occurrences of a substring within a string?
787
a6dd486b 788There are a number of ways, with varying efficiency. If you want a
68dc0745 789count of a certain single character (X) within a string, you can use the
790C<tr///> function like so:
791
ac9dac7f 792 $string = "ThisXlineXhasXsomeXx'sXinXit";
793 $count = ($string =~ tr/X//);
794 print "There are $count X characters in the string";
68dc0745 795
796This is fine if you are just looking for a single character. However,
797if you are trying to count multiple character substrings within a
798larger string, C<tr///> won't work. What you can do is wrap a while()
799loop around a global pattern match. For example, let's count negative
800integers:
801
ac9dac7f 802 $string = "-9 55 48 -2 23 -76 4 14 -44";
803 	$count = 0;
	while ($string =~ /-\d+/g) { $count++ }
804 print "There are $count negative numbers in the string";
68dc0745 805
881bdbd4 806Another version uses a global match in list context, then assigns the
807result to a scalar, producing a count of the number of matches.
808
809 $count = () = $string =~ /-\d+/g;
810
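Here are the counting approaches side by side in a runnable sketch. The lookahead version covers an extra case not discussed above. A zero-width C<(?=...)> match consumes no characters, so it also counts overlapping occurrences. An ordinary match does not.

```perl
use strict;
use warnings;

my $xs      = "ThisXlineXhasXsomeXx'sXinXit";
my $x_count = ( $xs =~ tr/X// );           # 6: tr/// counts single characters

my $numbers = "-9 55 48 -2 23 -76 4 14 -44";

my $loop_count = 0;                        # start from zero before the loop
while ( $numbers =~ /-\d+/g ) { $loop_count++ }                     # 4

my $list_count = () = $numbers =~ /-\d+/g; # 4: list assignment in scalar context

my $plain   = () = "aaaa" =~ /aa/g;        # 2: each match consumes "aa"
my $overlap = () = "aaaa" =~ /(?=aa)/g;    # 3: the lookahead consumes nothing

print "$x_count $loop_count $list_count $plain $overlap\n";
```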
109f0441 811=head2 How do I capitalize all the words on one line?
812X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>
5a964f20 813
109f0441 814(contributed by brian d foy)
65acb1b1 815
109f0441 816Damian Conway's L<Text::Autoformat> handles all of the thinking
817for you.
369b44b4 818
ac9dac7f 819 use Text::Autoformat;
820 my $x = "Dr. Strangelove or: How I Learned to Stop ".
821 "Worrying and Love the Bomb";
369b44b4 822
ac9dac7f 823 print $x, "\n";
824 for my $style (qw( sentence title highlight )) {
825 print autoformat($x, { case => $style }), "\n";
826 }
369b44b4 827
109f0441 828How do you want to capitalize those words?
829
830 FRED AND BARNEY'S LODGE # all uppercase
831 Fred And Barney's Lodge # title case
832 Fred and Barney's Lodge # highlight case
833
834It's not as easy a problem as it looks. How many words do you think
835are in there? Wait for it... wait for it.... If you answered 5
836you're right. Perl words are groups of C<\w+>, but that's not what
837you want to capitalize. How is Perl supposed to know not to capitalize
838that C<s> after the apostrophe? You could try a regular expression:
839
840 $string =~ s/ (
841 (^\w) #at the beginning of the line
842 | # or
843 (\s\w) #preceded by whitespace
844 )
845 /\U$1/xg;
846
847 $string =~ s/([\w']+)/\u\L$1/g;
848
849Now, what if you don't want to capitalize that "and"? Just use
850L<Text::Autoformat> and get on with the next problem. :)
851
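If you want a rough approximation without a module, one approach is to capitalize every word and then lowercase a hand-picked list of small words, except at the start of the string. The C<title_case> name and the C<%small> word list are assumptions made for illustration. C<Text::Autoformat> uses its own, more careful rules.

```perl
use strict;
use warnings;

# Hypothetical list of words to leave in lowercase (except at the start).
my %small = map { $_ => 1 } qw( a an and at but by for in of on or the to );

sub title_case {
    my ($string) = @_;
    # Words are runs of word characters and apostrophes, so "Barney's" is one word.
    $string =~ s{([\w']+)}{ $small{ lc $1 } ? lc $1 : ucfirst lc $1 }ge;
    # Always capitalize the first word.
    $string =~ s/^([\w']+)/\u$1/;
    return $string;
}

print title_case("FRED AND BARNEY'S LODGE"), "\n";   # Fred and Barney's Lodge
```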
49d635f9 852=head2 How can I split a [character] delimited string except when inside [character]?
68dc0745 853
ac9dac7f 854Several modules can handle this sort of parsing--C<Text::Balanced>,
855C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.
49d635f9 856
857Take the example case of trying to split a string that is
858comma-separated into its different fields. You can't use C<split(/,/)>
859because you shouldn't split if the comma is inside quotes. For
860example, take a data line like this:
68dc0745 861
ac9dac7f 862 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
68dc0745 863
864Due to the restriction of the quotes, this is a fairly complex
197aec24 865problem. Thankfully, we have Jeffrey Friedl, author of
49d635f9 866I<Mastering Regular Expressions>, to handle these for us. He
ac9dac7f 867suggests (assuming your string is contained in C<$text>):
68dc0745 868
ac9dac7f 869 @new = ();
870 push(@new, $+) while $text =~ m{
871 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
872 | ([^,]+),?
873 | ,
874 }gx;
875 push(@new, undef) if substr($text,-1,1) eq ',';
68dc0745 876
46fc3d4c 877If you want to represent quotation marks inside a
878quotation-mark-delimited field, escape them with backslashes (eg,
49d635f9 879C<"like \"this\"">).
46fc3d4c 880
ac9dac7f 881Alternatively, the C<Text::ParseWords> module (part of the standard
882Perl distribution) lets you say:
68dc0745 883
ac9dac7f 884 use Text::ParseWords;
885 @new = quotewords(",", 0, $text);
65acb1b1 886
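To see what Friedl's pattern produces, here it is wrapped in a subroutine and run against the sample line from above. The name C<parse_csv_line> is hypothetical. For real CSV data, one of the modules mentioned earlier is still the safer choice.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the Friedl pattern shown above.
sub parse_csv_line {
    my ($text) = @_;
    my @fields;
    push( @fields, $+ ) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?   # a quoted field, escapes allowed
      | ([^,]+),?                        # an unquoted field
      | ,                                # an empty field
    }gx;
    push( @fields, undef ) if substr( $text, -1, 1 ) eq ',';
    return @fields;
}

my $line = q{SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"};
my @fields = parse_csv_line($line);
print scalar(@fields), " fields; third is '$fields[2]'\n";
# 11 fields; third is 'Cimetrix, Inc'
```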
68dc0745 887=head2 How do I strip blank space from the beginning/end of a string?
888
6670e5e7 889(contributed by brian d foy)
68dc0745 890
6670e5e7 891A substitution can do this for you. For a single line, you want to
892replace all the leading or trailing whitespace with nothing. You
893can do that with a pair of substitutions.
68dc0745 894
6670e5e7 895 s/^\s+//;
896 s/\s+$//;
68dc0745 897
6670e5e7 898You can also write that as a single substitution, although it turns
899out the combined statement is slower than the separate ones. That
900might not matter to you, though.
68dc0745 901
6670e5e7 902 s/^\s+|\s+$//g;
68dc0745 903
6670e5e7 904In this regular expression, the alternation matches either at the
905beginning or the end of the string, since alternation has the lowest
906precedence in a regular expression and each anchor binds only to its own branch. With the C</g> flag, the substitution
907makes all possible matches, so it gets both. Remember, the trailing
908newline matches the C<\s+>, and the C<$> anchor can match to the
909physical end of the string, so the newline disappears too. Just add
910the newline to the output, which has the added benefit of preserving
911"blank" (consisting entirely of whitespace) lines which the C<^\s+>
912would remove all by itself.
68dc0745 913
6670e5e7 914 while( <> )
915 {
916 s/^\s+|\s+$//g;
917 print "$_\n";
918 }
5a964f20 919
6670e5e7 920For a multi-line string, you can apply the regular expression
921to each logical line in the string by adding the C</m> flag (for
922"multi-line"). With the C</m> flag, the C<$> matches I<before> an
923embedded newline, so it doesn't remove it. It still removes the
924newline at the end of the string.
925
ac9dac7f 926 $string =~ s/^\s+|\s+$//gm;
6670e5e7 927
928Remember that lines consisting entirely of whitespace will disappear,
929since the first part of the alternation can match the entire string
930and replace it with nothing. If need to keep embedded blank lines,
931you have to do a little more work. Instead of matching any whitespace
932(since that includes a newline), just match the other whitespace.
933
934 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
5a964f20 935
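If you trim strings often, you might wrap the two-substitution form in a small subroutine. This C<trim> is a hypothetical helper, not a Perl built-in. It works on copies, so the caller's strings stay untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace from each argument.
sub trim {
    my @out = @_;          # copy so the caller's strings are untouched
    for (@out) {
        s/^\s+//;
        s/\s+$//;
    }
    return wantarray ? @out : $out[0];
}

my $clean = trim("   padded value \n");
print "[$clean]\n";    # [padded value]
```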
65acb1b1 936=head2 How do I pad a string with blanks or pad a number with zeroes?
937
65acb1b1 938In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0 939to pad the string, C<$text> or C<$num> contains the string to be padded,
940and C<$pad_char> contains the padding character. You can use a single
941character string constant instead of the C<$pad_char> variable if you
942know what it is in advance. And in the same way you can use an integer in
943place of C<$pad_len> if you know the pad length in advance.
65acb1b1 944
d92eb7b0 945The simplest method uses the C<sprintf> function. It can pad on the left
946or right with blanks and on the left with zeroes and it will not
947truncate the result. The C<pack> function can only pad strings on the
948right with blanks and it will truncate the result to a maximum length of
949C<$pad_len>.
65acb1b1 950
ac9dac7f 951 # Left padding a string with blanks (no truncation):
04d666b1 952 $padded = sprintf("%${pad_len}s", $text);
953 $padded = sprintf("%*s", $pad_len, $text); # same thing
65acb1b1 954
ac9dac7f 955 # Right padding a string with blanks (no truncation):
04d666b1 956 $padded = sprintf("%-${pad_len}s", $text);
957 $padded = sprintf("%-*s", $pad_len, $text); # same thing
65acb1b1 958
ac9dac7f 959 # Left padding a number with 0 (no truncation):
04d666b1 960 $padded = sprintf("%0${pad_len}d", $num);
961 $padded = sprintf("%0*d", $pad_len, $num); # same thing
65acb1b1 962
ac9dac7f 963 # Right padding a string with blanks using pack (will truncate):
964 $padded = pack("A$pad_len",$text);
65acb1b1 965
d92eb7b0 966If you need to pad with a character other than blank or zero you can use
967one of the following methods. They all generate a pad string with the
968C<x> operator and combine that with C<$text>. These methods do
969not truncate C<$text>.
65acb1b1 970
d92eb7b0 971Left and right padding with any character, creating a new string:
65acb1b1 972
ac9dac7f 973 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
974 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 975
d92eb7b0 976Left and right padding with any character, modifying C<$text> directly:
65acb1b1 977
ac9dac7f 978 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
979 $text .= $pad_char x ( $pad_len - length( $text ) );
65acb1b1 980
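Here is a quick runnable check of the C<sprintf> and C<pack> forms, plus a hypothetical C<pad_left> helper for an arbitrary pad character. The helper only repeats the pad character when the string is shorter than the target length, so it never asks C<x> for a negative repeat count.

```perl
use strict;
use warnings;

my $text = 'abc';
my $num  = 42;

my $left   = sprintf "%*s",  6, $text;   # '   abc'
my $right  = sprintf "%-*s", 6, $text;   # 'abc   '
my $zeroes = sprintf "%0*d", 6, $num;    # '000042'
my $packed = pack "A2", $text;           # 'ab' (pack truncates)

# Hypothetical helper: left-pad with any character, never truncating.
sub pad_left {
    my ( $string, $len, $char ) = @_;
    my $missing = $len - length $string;
    return $missing > 0 ? $char x $missing . $string : $string;
}

print pad_left( $text, 6, '.' ), "\n";   # ...abc
```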
68dc0745 981=head2 How do I extract selected columns from a string?
982
e573f903 983(contributed by brian d foy)
984
d12d61cf 985If you know the columns that contain the data, you can
e573f903 986use C<substr> to extract a single column.
987
988 my $column = substr( $line, $start_column, $length );
989
990You can use C<split> if the columns are separated by whitespace or
991some other delimiter, as long as whitespace or the delimiter cannot
992appear as part of the data.
993
994 my $line = ' fred barney betty ';
995 my @columns = split /\s+/, $line;
996 # ( '', 'fred', 'barney', 'betty' );
997
998 my $line = 'fred||barney||betty';
999 my @columns = split /\|/, $line;
1000 # ( 'fred', '', 'barney', '', 'betty' );
1001
1002If you want to work with comma-separated values, don't do this since
1003that format is a bit more complicated. Use one of the modules that
109f0441 1004handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
e573f903 1005C<Text::CSV_PP>.
1006
1007If you want to break apart an entire line of fixed columns, you can use
589a5df2 1008C<unpack> with the A (ASCII) format. By using a number after the format
e573f903 1009specifier, you can denote the column width. See the C<pack> and C<unpack>
1010entries in L<perlfunc> for more details.
1011
1012 	my @fields = unpack( "A8 A8 A8 A16 A4", $line );
1013
1014Note that spaces in the format argument to C<unpack> do not denote literal
1015spaces. If you have space separated data, you may want C<split> instead.
68dc0745 1016
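Here is a small runnable sketch of the C<unpack> approach. The template comes first and the string second. The record is built with C<sprintf> here only so the example is self-contained.

```perl
use strict;
use warnings;

# A fixed-width record: three 8-character columns (built here with sprintf).
my $line = sprintf "%-8s%-8s%-8s", 'fred', 'barney', 'betty';

# TEMPLATE comes first, then the string. "A" strips trailing spaces.
my @fields = unpack( "A8 A8 A8", $line );

print join( '|', @fields ), "\n";   # fred|barney|betty
```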
1017=head2 How do I find the soundex value of a string?
1018
7678cced 1019(contributed by brian d foy)
1020
1021You can use the C<Text::Soundex> module. If you want to do fuzzy or close
ac9dac7f 1022matching, you might also try the C<String::Approx>,
1023C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.
68dc0745 1024
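To show what a soundex code is, here is a hypothetical C<soundex_sketch> that follows the common American Soundex rules. It keeps the first letter, maps the remaining consonants to digits, drops vowels, collapses repeated digits, and pads or cuts the result to four characters. It is not the C<Text::Soundex> implementation, and that module may treat some names differently.

```perl
use strict;
use warnings;

# Hypothetical sketch of the American Soundex rules; not Text::Soundex's code.
my %code;
@code{qw(b f p v)}         = (1) x 4;
@code{qw(c g j k q s x z)} = (2) x 8;
@code{qw(d t)}             = (3) x 2;
$code{l}                   = 4;
@code{qw(m n)}             = (5) x 2;
$code{r}                   = 6;

sub soundex_sketch {
    my @letters = grep { /[a-z]/ } split //, lc shift;
    return '' unless @letters;

    my $first  = shift @letters;
    my $result = uc $first;
    my $prev   = exists $code{$first} ? $code{$first} : '';

    for my $ch (@letters) {
        next if $ch eq 'h' or $ch eq 'w';    # h and w don't separate equal codes
        my $digit = exists $code{$ch} ? $code{$ch} : '';   # vowels get no code
        $result .= $digit if $digit ne '' and $digit ne $prev;
        $prev = $digit;                      # a vowel resets $prev
    }
    return substr( $result . '000', 0, 4 );  # pad or cut to four characters
}

print soundex_sketch($_), "\n" for qw(Robert Rupert Tymczak Pfister);
```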
1025=head2 How can I expand variables in text strings?
1026
e573f903 1027(contributed by brian d foy)
5a964f20 1028
322be77c 1029If you can avoid it, don't, or if you can use a templating system,
c195e131 1030such as C<Text::Template> or C<Template> Toolkit, do that instead. You
1031might even be able to get the job done with C<sprintf> or C<printf>:
1032
1033 my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
322be77c 1034
1035However, for the one-off simple case where I don't want to pull out a
1036full templating system, I'll use a string that has two Perl scalar
1037variables in it. In this example, I want to expand C<$foo> and C<$bar>
c195e131 1038to their values:
e573f903 1039
1040 my $foo = 'Fred';
1041 my $bar = 'Barney';
1042 	my $string = 'Say hello to $foo and $bar';
1043
1044One way I can do this involves the substitution operator and a double
1045C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
1046turns it into C<$foo>. The second C</e> starts with C<$foo> and replaces
1047it with its value. C<$foo>, then, turns into 'Fred', and that's finally
c195e131 1048what's left in the string:
e573f903 1049
1050 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
322be77c 1051
e573f903 1052The C</e> will also silently ignore violations of strict, replacing
c195e131 1053undefined variable names with the empty string. Since I'm using the
109f0441 1054C</e> flag (twice even!), I have all of the same security problems I
c195e131 1055have with C<eval> in its string form. If there's something odd in
1056C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
1057I could get myself in trouble.
1058
1059To get around the security problem, I could also pull the values from
1060a hash instead of evaluating variable names. Using a single C</e>, I
1061can check the hash to ensure the value exists, and if it doesn't, I
1062can replace the missing value with a marker, in this case C<???> to
1063signal that I missed something:
e573f903 1064
1065 my $string = 'This has $foo and $bar';
109f0441 1066
e573f903 1067 my %Replacements = (
1068 foo => 'Fred',
ac9dac7f 1069 );
322be77c 1070
e573f903 1071 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1072 $string =~ s/\$(\w+)/
1073 exists $Replacements{$1} ? $Replacements{$1} : '???'
1074 /eg;
322be77c 1075
e573f903 1076 print $string;
322be77c 1077
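The same hash-based idea fits in a reusable subroutine. This hypothetical C<expand_template> also accepts the braced C<${name}> form, which lets a name run into adjacent text. As above, unknown names become C<???>, and nothing is ever passed to C<eval>.

```perl
use strict;
use warnings;

# Hypothetical helper: expand variable names from a hash, never via eval.
sub expand_template {
    my ( $template, $values ) = @_;
    $template =~ s/
        \$ (?: \{ (\w+) \} | (\w+) )   # a dollar sign, then a braced or bare name
    /
        my $name = defined $1 ? $1 : $2;
        exists $values->{$name} ? $values->{$name} : '???';
    /gex;
    return $template;
}

print expand_template( 'Say hello to ${foo}s and $bar', { foo => 'Fred' } ), "\n";
# Say hello to Freds and ???
```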
68dc0745 1078=head2 What's wrong with always quoting "$vars"?
1079
ac9dac7f 1080The problem is that those double-quotes force
e573f903 1081stringification--coercing numbers and references into strings--even
1082when you don't want them to be strings. Think of it this way:
1083double-quote expansion is used to produce new strings. If you already
1084have a string, why do you need more?
68dc0745 1085
1086If you get used to writing odd things like these:
1087
ac9dac7f 1088 print "$var"; # BAD
1089 $new = "$old"; # BAD
1090 somefunc("$var"); # BAD
68dc0745 1091
1092You'll be in trouble. Those should (in 99.8% of the cases) be
1093the simpler and more direct:
1094
ac9dac7f 1095 print $var;
1096 $new = $old;
1097 somefunc($var);
68dc0745 1098
1099Otherwise, besides slowing you down, you're going to break code when
1100the thing in the scalar is actually neither a string nor a number, but
1101a reference:
1102
ac9dac7f 1103 func(\@array);
1104 sub func {
1105 my $aref = shift;
1106 my $oref = "$aref"; # WRONG
1107 }
68dc0745 1108
1109You can also get into subtle problems on those few operations in Perl
1110that actually do care about the difference between a string and a
1111number, such as the magical C<++> autoincrement operator or the
1112syscall() function.
1113
197aec24 1114Stringification also destroys arrays.
5a964f20 1115
ac9dac7f 1116 @lines = `command`;
1117 print "@lines"; # WRONG - extra blanks
1118 print @lines; # right
5a964f20 1119
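Here is a short sketch of both problems. A quoted reference is no longer a reference, and a quoted array is joined with spaces (the value of C<$">), which shows up as a leading blank on every line after the first.

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );
my $aref  = \@array;
my $oref  = "$aref";            # now just a string like "ARRAY(0x...)"

print ref($aref) ? "reference\n" : "not a reference\n";   # reference
print ref($oref) ? "reference\n" : "not a reference\n";   # not a reference

my @lines  = ( "one\n", "two\n" );
my $joined = "@lines";          # elements joined with a space by default
print $joined;                  # second line starts with an extra blank
```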
04d666b1 1120=head2 Why don't my E<lt>E<lt>HERE documents work?
68dc0745 1121
1122Check for these three things:
1123
1124=over 4
1125
04d666b1 1126=item There must be no space after the E<lt>E<lt> part.
68dc0745 1127
197aec24 1128=item There (probably) should be a semicolon at the end.
68dc0745 1129
197aec24 1130=item You can't (easily) have any space in front of the tag.
68dc0745 1131
1132=back
1133
197aec24 1134If you want to indent the text in the here document, you
5a964f20 1135can do this:
1136
1137 # all in one
1138 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1139 your text
1140 goes here
1141 HERE_TARGET
1142
1143But the HERE_TARGET must still be flush against the margin.
197aec24 1144If you want that indented also, you'll have to quote
5a964f20 1145in the indentation.
1146
1147 ($quote = <<' FINIS') =~ s/^\s+//gm;
1148 ...we will have peace, when you and all your works have
1149 perished--and the works of your dark master to whom you
1150 would deliver us. You are a liar, Saruman, and a corrupter
1151 of men's hearts. --Theoden in /usr/src/perl/taint.c
1152 FINIS
83ded9ee 1153 $quote =~ s/\s+--/\n--/;
5a964f20 1154
1155A nice general-purpose fixer-upper function for indented here documents
1156follows. It expects to be called with a here document as its argument.
1157It looks to see whether each line begins with a common substring, and
a6dd486b 1158if so, strips that substring off. Otherwise, it takes the amount of leading
1159whitespace found on the first line and removes that much off each
5a964f20 1160subsequent line.
1161
1162 sub fix {
1163 local $_ = shift;
a6dd486b 1164 my ($white, $leader); # common whitespace and common leading string
5a964f20 1165 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1166 ($white, $leader) = ($2, quotemeta($1));
1167 } else {
1168 ($white, $leader) = (/^(\s+)/, '');
1169 }
1170 s/^\s*?$leader(?:$white)?//gm;
1171 return $_;
1172 }
1173
c8db1d39 1174This works with leading special strings, dynamically determined:
5a964f20 1175
ac9dac7f 1176 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
5a964f20 1177 @@@ int
1178 @@@ runops() {
1179 @@@ SAVEI32(runlevel);
1180 @@@ runlevel++;
d92eb7b0 1181 @@@ while ( op = (*op->op_ppaddr)() );
5a964f20 1182 @@@ TAINT_NOT;
1183 @@@ return 0;
1184 @@@ }
ac9dac7f 1185 MAIN_INTERPRETER_LOOP
5a964f20 1186
a6dd486b 1187Or with a fixed amount of leading whitespace, with remaining
5a964f20 1188indentation correctly preserved:
1189
ac9dac7f 1190 $poem = fix<<EVER_ON_AND_ON;
5a964f20 1191 Now far ahead the Road has gone,
1192 And I must follow, if I can,
1193 Pursuing it with eager feet,
1194 Until it joins some larger way
1195 Where many paths and errands meet.
1196 And whither then? I cannot say.
1197 --Bilbo in /usr/src/perl/pp_ctl.c
ac9dac7f 1198 EVER_ON_AND_ON
5a964f20 1199
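The C<fix> routine takes its cue from the first line. A different approach, sketched here with a hypothetical C<dedent> subroutine, removes the smallest indentation shared by all non-blank lines. It counts a tab and a space as one character each, so mixed indentation may not line up as you expect.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the indentation common to every non-blank line.
sub dedent {
    my ($text) = @_;
    my $min;
    for my $line ( split /\n/, $text ) {
        next unless $line =~ /\S/;              # blank lines don't count
        my ($indent) = $line =~ /^([ \t]*)/;
        $min = length $indent if !defined $min or length $indent < $min;
    }
    return $text unless $min;                   # nothing to strip
    $text =~ s/^[ \t]{0,$min}//mg;
    return $text;
}

my $poem = dedent(<<'EOT');
    Now far ahead the Road has gone,
      And I must follow, if I can,
    Pursuing it with eager feet,
EOT
print $poem;
```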
68dc0745 1200=head1 Data: Arrays
1201
65acb1b1 1202=head2 What is the difference between a list and an array?
1203
8d2e243f 1204(contributed by brian d foy)
1205
1206A list is a fixed collection of scalars. An array is a variable that
1207holds a variable collection of scalars. An array can supply its collection
1208for list operations, so list operations also work on arrays:
1209
1210 # slices
1211 ( 'dog', 'cat', 'bird' )[2,3];
1212 @animals[2,3];
1213
1214 # iteration
1215 foreach ( qw( dog cat bird ) ) { ... }
1216 foreach ( @animals ) { ... }
1217
1218 my @three = grep { length == 3 } qw( dog cat bird );
1219 my @three = grep { length == 3 } @animals;
d12d61cf 1220
8d2e243f 1221 # supply an argument list
1222 wash_animals( qw( dog cat bird ) );
1223 wash_animals( @animals );
1224
1225Array operations, which change the scalars, rearrange them, or add
1226or remove some scalars, only work on arrays. These can't work on a
1227list, which is fixed. Array operations include C<shift>, C<unshift>,
1228C<push>, C<pop>, and C<splice>.
1229
1230An array can also change its length:
1231
1232 $#animals = 1; # truncate to two elements
1233 $#animals = 10000; # pre-extend to 10,001 elements
1234
1235You can change an array element, but you can't change a list element:
1236
1237 $animals[0] = 'Rottweiler';
1238 qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
1239
1240 foreach ( @animals ) {
1241 s/^d/fr/; # works fine
1242 }
d12d61cf 1243
8d2e243f 1244 foreach ( qw( dog cat bird ) ) {
1245 s/^d/fr/; # Error! Modification of read only value!
1246 }
1247
d12d61cf 1248However, if the list element is itself a variable, it appears that you
8d2e243f 1249can change a list element. But the list element is the variable, not
1250the data. You're not changing the list element, but something the list
d12d61cf 1251element refers to. The list element itself doesn't change: it's still
8d2e243f 1252the same variable.
65acb1b1 1253
8d2e243f 1254You also have to be careful about context. You can assign an array to
1255a scalar to get the number of elements in the array. This only works
1256for arrays, though:
1257
1258 my $count = @animals; # only works with arrays
d12d61cf 1259
8d2e243f 1260If you try to do the same thing with what you think is a list, you
1261get a quite different result. Although it looks like you have a list
1262on the righthand side, Perl actually sees a bunch of scalars separated
1263by a comma:
65acb1b1 1264
8d2e243f 1265 my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
65acb1b1 1266
8d2e243f 1267Since you're assigning to a scalar, the righthand side is in scalar
1268context. The comma operator (yes, it's an operator!) in scalar
1269context evaluates its lefthand side, throws away the result, and
1270evaluates its righthand side and returns the result. In effect,
1271that list-lookalike assigns to C<$scalar> its rightmost value. Many
1272people mess this up because they choose a list-lookalike whose
1273last element is also the count they expect:
1274
1275 my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
65acb1b1 1276
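The context rules above are easy to check. This sketch contrasts an array in scalar context, a list assignment, the C<$count = () = LIST> counting idiom, and the comma operator. It then shows the C<foreach> aliasing case: the list holds variables, so the variables change.

```perl
use strict;
use warnings;

my @animals = qw( dog cat bird );

my $count    = @animals;                       # array in scalar context: 3
my ($first)  = ( 'dog', 'cat', 'bird' );       # list assignment: 'dog'
my $how_many = () = ( 'dog', 'cat', 'bird' );  # count via empty list: 3

my $last;
{
    no warnings 'void';                        # silence "Useless use of a constant"
    $last = ( 'dog', 'cat', 'bird' );          # comma operator: 'bird'
}

# foreach aliases $_ to each list element; these elements are variables.
my ( $x, $y ) = ( 'dog', 'deed' );
s/^d/fr/ for $x, $y;                           # $x is 'frog', $y is 'freed'

print "$count $first $how_many $last $x $y\n";
```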
68dc0745 1277=head2 What is the difference between $array[1] and @array[1]?
1278
8d2e243f 1279(contributed by brian d foy)
1280
1281The difference is the sigil, that special character in front of the
1282array name. The C<$> sigil means "exactly one item", while the C<@>
1283sigil means "zero or more items". The C<$> gets you a single scalar,
1284while the C<@> gets you a list.
68dc0745 1285
8d2e243f 1286The confusion arises because people incorrectly assume that the sigil
1287denotes the variable type.
68dc0745 1288
8d2e243f 1289The C<$array[1]> is a single-element access to the array. It's going
1290to return the item in index 1 (or undef if there is no item there).
1291If you intend to get exactly one element from the array, this is the
1292form you should use.
68dc0745 1293
8d2e243f 1294The C<@array[1]> is an array slice, although it has only one index.
1295You can pull out multiple elements simultaneously by specifying
1296additional indices as a list, like C<@array[1,4,3,0]>.
68dc0745 1297
8d2e243f 1298Using a slice on the lefthand side of the assignment supplies list
d12d61cf 1299context to the righthand side. This can lead to unexpected results.
1300For instance, if you want to read a single line from a filehandle,
8d2e243f 1301assigning to a scalar value is fine:
68dc0745 1302
8d2e243f 1303 $array[1] = <STDIN>;
1304
1305However, in list context, the line input operator returns all of the
1306lines as a list. The first line goes into C<@array[1]> and the rest
1307of the lines mysteriously disappear:
1308
1309 @array[1] = <STDIN>; # most likely not what you want
1310
1311Either the C<use warnings> pragma or the B<-w> flag will warn you when
1312you use an array slice with a single index.
68dc0745 1313
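You can see the difference with an in-memory filehandle, which avoids depending on a real file or on STDIN. The C<no warnings 'syntax'> block only silences the single-index slice warning described above, so the sketch runs quietly.

```perl
use strict;
use warnings;

my $data = "line 1\nline 2\nline 3\n";

# Scalar assignment: <$fh> reads just one line.
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my @array;
$array[1] = <$fh>;
my $next = <$fh>;                 # "line 2\n" is still there
close $fh;

# Slice assignment: list context makes <$fh2> read every line.
open my $fh2, '<', \$data or die "Can't open in-memory file: $!";
my @other;
{
    no warnings 'syntax';         # silence the single-index slice warning
    @other[1] = <$fh2>;
}
my $after = <$fh2>;               # undef: the other lines were discarded
close $fh2;
```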
d92eb7b0 1314=head2 How can I remove duplicate elements from a list or array?
68dc0745 1315
6670e5e7 1316(contributed by brian d foy)
68dc0745 1317
6670e5e7 1318Use a hash. When you think the words "unique" or "duplicated", think
1319"hash keys".
68dc0745 1320
6670e5e7 1321If you don't care about the order of the elements, you could just
1322create the hash then extract the keys. It's not important how you
1323create that hash: just that you use C<keys> to get the unique
1324elements.
551e1d92 1325
ac9dac7f 1326 my %hash = map { $_, 1 } @array;
1327 # or a hash slice: @hash{ @array } = ();
1328 # or a foreach: $hash{$_} = 1 foreach ( @array );
1329
1330 my @unique = keys %hash;
68dc0745 1331
ac9dac7f 1332If you want to use a module, try the C<uniq> function from
1333C<List::MoreUtils>. In list context it returns the unique elements,
1334preserving their order in the list. In scalar context, it returns the
1335number of unique elements.
1336
1337 use List::MoreUtils qw(uniq);
1338
1339 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1340 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
68dc0745 1341
6670e5e7 1342You can also go through each element and skip the ones you've seen
1343before. Use a hash to keep track. The first time the loop sees an
1344element, that element has no key in C<%seen>. The postincrement in the
1345C<next> statement creates the key but returns its old value, C<undef>, which is false, so
1346the loop continues to the C<push>; by then the value for that key is already 1.
1347The next time the loop sees that same element, its key exists in
1348the hash I<and> the value for that key is true (since it's not 0 or
ac9dac7f 1349C<undef>), so C<next> skips that iteration and the loop goes to the
1350next element.
551e1d92 1351
6670e5e7 1352 my @unique = ();
1353 my %seen = ();
68dc0745 1354
6670e5e7 1355 foreach my $elem ( @array )
1356 {
1357 next if $seen{ $elem }++;
1358 push @unique, $elem;
1359 }
68dc0745 1360
6670e5e7 1361You can write this more briefly using a grep, which does the
1362same thing.
68dc0745 1363
ac9dac7f 1364 my %seen = ();
1365 my @unique = grep { ! $seen{ $_ }++ } @array;
65acb1b1 1366
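The hash key doesn't have to be the element itself. Keying on a normalized form, here the lowercased string, removes duplicates that differ only in case while keeping the first spelling seen:

```perl
use strict;
use warnings;

my @words = qw( Apple apple Banana APPLE banana cherry );

# The hash key is the normalized form; the first spelling seen is kept.
my %seen;
my @unique = grep { !$seen{ lc $_ }++ } @words;

print "@unique\n";    # Apple Banana cherry
```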
ddbc1f16 1367=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1368
109f0441 1369(portions of this answer contributed by Anno Siegel and brian d foy)
9e72e4c6 1370
5a964f20 1371Hearing the word "in" is an I<in>dication that you probably should have
1372used a hash, not a list or array, to store your data. Hashes are
1373designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1374
109f0441 1375That being said, there are several ways to approach this. In Perl 5.10
1376and later, you can use the smart match operator to check that an item is
1377contained in an array or a hash:
1378
1379 use 5.010;
1380
1381 if( $item ~~ @array )
1382 {
1383 say "The array contains $item"
1384 }
1385
1386 if( $item ~~ %hash )
1387 {
1388 say "The hash contains $item"
1389 }
1390
1391With earlier versions of Perl, you have to do a bit more work. If you
5a964f20 1392are going to make this query many times over arbitrary string values,
881bdbd4 1393the fastest way is probably to invert the original array and maintain a
109f0441 1394hash whose keys are the first array's values:
68dc0745 1395
ac9dac7f 1396 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1397 %is_blue = ();
1398 for (@blues) { $is_blue{$_} = 1 }
68dc0745 1399
ac9dac7f 1400Now you can check whether C<$is_blue{$some_color}>. It might have
1401been a good idea to keep the blues all in a hash in the first place.
68dc0745 1402
1403If the values are all small integers, you could use a simple indexed
1404array. This kind of an array will take up less space:
1405
ac9dac7f 1406 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1407 @is_tiny_prime = ();
1408 for (@primes) { $is_tiny_prime[$_] = 1 }
1409 	# or simply @is_tiny_prime[@primes] = (1) x @primes;
68dc0745 1410
1411Now you check whether C<$is_tiny_prime[$some_number]> is true.
1412
1413If the values in question are integers instead of strings, you can save
1414quite a lot of space by using bit strings instead:
1415
ac9dac7f 1416 @articles = ( 1..10, 150..2000, 2017 );
1417 undef $read;
1418 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1419
1420Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1421
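Here is a runnable version of the bit-string approach. Each number sets one bit, so the string needs only about one byte per eight possible values, up to the largest number stored.

```perl
use strict;
use warnings;

my @articles = ( 1 .. 10, 150 .. 2000, 2017 );

my $read = '';                          # a bit string, one bit per number
vec( $read, $_, 1 ) = 1 for @articles;

for my $n ( 5, 100, 2017 ) {
    printf "%4d %s\n", $n, vec( $read, $n, 1 ) ? 'read' : 'unread';
}
printf "bit string is %d bytes\n", length $read;
```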
9e72e4c6 1422These methods guarantee fast individual tests but require a re-organization
1423of the original list or array. They only pay off if you have to test
1424multiple values against the same array.
68dc0745 1425
ac9dac7f 1426If you are testing only once, the standard module C<List::Util> exports
9e72e4c6 1427the function C<first> for this purpose. It works by stopping once it
c195e131 1428finds the element. It's written in C for speed, and its Perl equivalent
9e72e4c6 1429looks like this subroutine:
68dc0745 1430
9e72e4c6 1431 sub first (&@) {
1432 my $code = shift;
1433 foreach (@_) {
1434 return $_ if &{$code}();
1435 }
1436 undef;
1437 }
68dc0745 1438
9e72e4c6 1439If speed is of little concern, the common idiom uses grep in scalar context
1440(which returns the number of items that passed its condition) to traverse the
1441entire list. This does have the benefit of telling you how many matches it
1442found, though.
68dc0745 1443
9e72e4c6 1444 my $is_there = grep $_ eq $whatever, @array;
65acb1b1 1445
9e72e4c6 1446If you want to actually extract the matching elements, simply use grep in
1447list context.
68dc0745 1448
9e72e4c6 1449 my @matches = grep $_ eq $whatever, @array;
58103a2e 1450
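There is one caveat with C<first> as a membership test. It returns the matching element itself, so a match on a false value such as C<0> or the empty string looks like a miss. Testing with C<defined> avoids that, as this short sketch shows:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @numbers = ( 3, 0, 7 );

# first() returns the element itself, which can be false (0, '', '0').
my $found = first { $_ == 0 } @numbers;
print "naive test: ",   ( $found         ? 'found' : 'not found' ), "\n";  # not found
print "defined test: ", ( defined $found ? 'found' : 'not found' ), "\n";  # found

# grep in scalar context counts matches instead.
my $how_many = grep { $_ > 2 } @numbers;   # 2
```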
68dc0745 1451=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1452
ac9dac7f 1453Use a hash. Here's code to do both and more. It assumes that each
1454element is unique in a given array:
68dc0745 1455
ac9dac7f 1456 @union = @intersection = @difference = ();
1457 %count = ();
1458 foreach $element (@array1, @array2) { $count{$element}++ }
1459 foreach $element (keys %count) {
1460 push @union, $element;
1461 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1462 }
68dc0745 1463
ac9dac7f 1464Note that this is the I<symmetric difference>, that is, all elements
1465in either A or in B but not in both. Think of it as an xor operation.
d92eb7b0 1466
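If you want the one-sided difference instead (the elements of one array that are missing from the other), a hash built from the other array is enough, and C<grep> keeps the original order. The array names here are only for illustration:

```perl
use strict;
use warnings;

my @a = qw( apple banana cherry date );
my @b = qw( banana date fig );

my %in_a = map { $_ => 1 } @a;
my %in_b = map { $_ => 1 } @b;

my @only_a       = grep { !$in_b{$_} } @a;   # apple cherry
my @only_b       = grep { !$in_a{$_} } @b;   # fig
my @intersection = grep {  $in_b{$_} } @a;   # banana date

print "@only_a | @only_b | @intersection\n";
```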
65acb1b1 1467=head2 How do I test whether two arrays or hashes are equal?
1468
109f0441 1469With Perl 5.10 and later, the smart match operator can give you the answer
1470with the least amount of work:
1471
1472 use 5.010;
1473
1474 if( @array1 ~~ @array2 )
1475 {
1476 say "The arrays are the same";
1477 }
1478
1479 if( %hash1 ~~ %hash2 ) # doesn't check values!
1480 {
1481 say "The hash keys are the same";
1482 }
1483
ac9dac7f 1484The following code works for single-level arrays. It uses a
1485stringwise comparison, and does not distinguish defined versus
1486undefined empty strings. Modify if you have other needs.
65acb1b1 1487
ac9dac7f 1488 $are_equal = compare_arrays(\@frogs, \@toads);
65acb1b1 1489
ac9dac7f 1490 sub compare_arrays {
1491 my ($first, $second) = @_;
1492 no warnings; # silence spurious -w undef complaints
1493 return 0 unless @$first == @$second;
1494 for (my $i = 0; $i < @$first; $i++) {
1495 return 0 if $first->[$i] ne $second->[$i];
1496 }
1497 return 1;
1498 }
65acb1b1 1499
1500For multilevel structures, you may wish to use an approach more
ac9dac7f 1501like this one. It uses the CPAN module C<FreezeThaw>:
65acb1b1 1502
ac9dac7f 1503 use FreezeThaw qw(cmpStr);
1504 @a = @b = ( "this", "that", [ "more", "stuff" ] );
65acb1b1 1505
ac9dac7f 1506 printf "a and b contain %s arrays\n",
1507 cmpStr(\@a, \@b) == 0
1508 ? "the same"
1509 : "different";
65acb1b1 1510
ac9dac7f 1511This approach also works for comparing hashes. Here we'll demonstrate
1512two different answers:
65acb1b1 1513
ac9dac7f 1514 use FreezeThaw qw(cmpStr cmpStrHard);
65acb1b1 1515
ac9dac7f 1516 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1517 $a{EXTRA} = \%b;
1518 $b{EXTRA} = \%a;
65acb1b1 1519
ac9dac7f 1520 printf "a and b contain %s hashes\n",
65acb1b1 1521 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1522
ac9dac7f 1523 printf "a and b contain %s hashes\n",
65acb1b1 1524 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1525
1526
1527The first reports that both hashes contain the same data,
1528while the second reports that they do not. Which you prefer is left as
1529an exercise to the reader.
1530
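Because the smart match on two hashes only compares keys, you may want a check that also compares values. Here is a sketch of a hypothetical C<compare_hashes> for single-level hashes. Like C<compare_arrays>, it compares values as strings, but it does distinguish C<undef> from the empty string.

```perl
use strict;
use warnings;

# Hypothetical helper: same keys and stringwise-equal values (single level).
sub compare_hashes {
    my ( $first, $second ) = @_;
    return 0 unless scalar( keys %$first ) == scalar( keys %$second );
    for my $key ( keys %$first ) {
        return 0 unless exists $second->{$key};
        my ( $x, $y ) = ( $first->{$key}, $second->{$key} );
        return 0 if defined($x) xor defined($y);   # one undef, the other not
        return 0 if defined $x and $x ne $y;
    }
    return 1;
}

my %h1 = ( a => 1, b => 2 );
my %h2 = ( b => 2, a => 1 );
my %h3 = ( a => 1, b => 3 );
print compare_hashes( \%h1, \%h2 ) ? "same\n" : "different\n";   # same
print compare_hashes( \%h1, \%h3 ) ? "same\n" : "different\n";   # different
```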
68dc0745 1531=head2 How do I find the first array element for which a condition is true?
1532
49d635f9 1533To find the first array element which satisfies a condition, you can
ac9dac7f 1534use the C<first()> function in the C<List::Util> module, which comes
1535with Perl 5.8. This example finds the first element that contains
1536"Perl".
49d635f9 1537
1538 use List::Util qw(first);
197aec24 1539
49d635f9 1540 my $element = first { /Perl/ } @array;
197aec24 1541
ac9dac7f 1542If you cannot use C<List::Util>, you can make your own loop to do the
49d635f9 1543same thing. Once you find the element, you stop the loop with last.
1544
1545 my $found;
ac9dac7f 1546 foreach ( @array ) {
6670e5e7 1547 if( /Perl/ ) { $found = $_; last }
49d635f9 1548 }
1549
1550If you want the array index, you can iterate through the indices
1551and check the array element at each index until you find one
1552that satisfies the condition.
1553
197aec24 1554 my( $found, $index ) = ( undef, -1 );
ac9dac7f 1555 	for( my $i = 0; $i < @array; $i++ ) {
1556 if( $array[$i] =~ /Perl/ ) {
6670e5e7 1557 $found = $array[$i];
1558 $index = $i;
1559 last;
1560 }
1561 }
68dc0745 1562
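You can also combine the two ideas and let C<first> search the list of indices instead of the elements. Since index C<0> is false, check the result with C<defined>:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( 'C', 'Python', 'Perl 5', 'Perl 6' );

# Search the list of indices instead of the elements themselves.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print defined $index ? "Found '$array[$index]' at index $index\n" : "Not found\n";
# Found 'Perl 5' at index 2
```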
1563=head2 How do I handle linked lists?
1564
1565In general, you usually don't need a linked list in Perl, since with
ac9dac7f 1566regular arrays, you can push and pop or shift and unshift at either
1567end, or you can use splice to add and/or remove arbitrary number of
ac003c96 1568elements at arbitrary points. Both pop and shift are O(1)
ac9dac7f 1569operations on Perl's dynamic arrays. In the absence of shifts and
1570pops, push in general needs to reallocate only on the order of log(N)
1571times for N pushes, and unshift will need to copy pointers each time.
68dc0745 1572
If you really, really want to, you could use structures as described in
L<perldsc> or L<perltoot> and do just what the algorithm book tells
you to do. For example, imagine a list node like this:

    my $node = {
        VALUE => 42,
        LINK  => undef,
    };

You could walk the list this way:

    print "List: ";
    for ( my $node = $head; $node; $node = $node->{LINK} ) {
        print $node->{VALUE}, " ";
    }
    print "\n";

You could add to the list this way:

    my ($head, $tail);
    $tail = append($head, 1);       # grow a new head
    for my $value ( 2 .. 10 ) {
        $tail = append($tail, $value);
    }

    sub append {
        my($list, $value) = @_;
        my $node = { VALUE => $value };
        if ($list) {
            $node->{LINK} = $list->{LINK};
            $list->{LINK} = $node;
        }
        else {
            $_[0] = $node;      # replace caller's version
        }
        return $node;
    }

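The following self-contained sketch puts those pieces together in the right order. It builds the list with a copy of the C<append()> routine above and then walks it, collecting the values in an array instead of printing them as it goes:

    use strict;
    use warnings;

    sub append {
        my ($list, $value) = @_;
        my $node = { VALUE => $value };
        if ($list) {
            $node->{LINK} = $list->{LINK};
            $list->{LINK} = $node;
        }
        else {
            $_[0] = $node;    # @_ aliases the caller's variable
        }
        return $node;
    }

    my ($head, $tail);
    $tail = append($head, 1);              # also sets $head
    $tail = append($tail, $_) for 2 .. 5;

    my @values;
    for ( my $node = $head; $node; $node = $node->{LINK} ) {
        push @values, $node->{VALUE};
    }
    print "List: @values\n";               # List: 1 2 3 4 5
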
But again, Perl's built-in arrays are virtually always good enough.

=head2 How do I handle circular lists?
X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
X<cycle> X<modulus>

(contributed by brian d foy)

If you want to cycle through an array endlessly, you can increment the
index modulo the number of elements in the array:

    my @array = qw( a b c );
    my $i = 0;

    while( 1 ) {
        print $array[ $i++ % @array ], "\n";
        last if $i > 20;
    }

You can also use C<Tie::Cycle> to use a scalar that always has the
next element of the circular array:

    use Tie::Cycle;

    tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];

    print $cycle; # FFFFFF
    print $cycle; # 000000
    print $cycle; # FFFF00

The C<Array::Iterator::Circular> module creates an iterator object for
circular arrays:

    use Array::Iterator::Circular;

    my $color_iterator = Array::Iterator::Circular->new(
        qw(red green blue orange)
    );

    foreach ( 1 .. 20 ) {
        print $color_iterator->next, "\n";
    }

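If you'd rather not depend on a CPAN module, a closure over the same modulo arithmetic gives you a similar iterator. This is a hand-rolled sketch. C<make_cycle> is a hypothetical helper name and is not part of either module:

    use strict;
    use warnings;

    # Return a code ref that yields the elements of @items in order,
    # starting over after the last one.
    sub make_cycle {
        my @items = @_;
        my $i     = 0;
        return sub { return $items[ $i++ % @items ] };
    }

    my $next_color = make_cycle( qw(red green blue) );
    my @seen = map { $next_color->() } 1 .. 5;
    print "@seen\n";    # red green blue red green
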
=head2 How do I shuffle an array randomly?

If you have Perl 5.8.0 or later, or if you have Scalar-List-Utils
1.03 or later installed, you can say:

    use List::Util 'shuffle';

    my @shuffled = shuffle(@list);

If not, you can use a Fisher-Yates shuffle.

    sub fisher_yates_shuffle {
        my $deck = shift;     # $deck is a reference to an array
        return unless @$deck; # must not be empty!

        my $i = @$deck;
        while (--$i) {
            my $j = int rand ($i+1);
            @$deck[$i,$j] = @$deck[$j,$i];
        }
    }

    # shuffle my mpeg collection
    #
    my @mpeg = <audio/*/*.mp3>;
    fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
    print @mpeg;

Note that the above implementation shuffles an array in place,
unlike C<List::Util::shuffle()>, which takes a list and returns
a new shuffled list.

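To see that difference in practice, this sketch shuffles a made-up C<@list> with C<List::Util::shuffle()>. The original array is left untouched, and the result holds the same elements in some new order:

    use strict;
    use warnings;
    use List::Util qw(shuffle);

    my @list     = ( 1 .. 10 );
    my @shuffled = shuffle(@list);    # returns a new list

    # @list keeps its original order; @shuffled is a permutation of it
    print "original: @list\n";
    print "shuffled: @shuffled\n";
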
You've probably seen shuffling algorithms that work using splice,
randomly picking an element to move from the old list to the new one:

    srand;
    my @new = ();
    my @old = 1 .. 10;  # just a demo
    while (@old) {
        push(@new, splice(@old, rand @old, 1));
    }

This is bad because splice is already O(N), and since you do it N
times, you just invented a quadratic algorithm; that is, O(N**2).
This does not scale, although Perl is so efficient that you probably
won't notice this until you have rather large arrays.

=head2 How do I process/modify each element of an array?

Use C<for>/C<foreach>:

    for (@lines) {
        s/foo/bar/;    # change that word
        tr/XZ/ZX/;     # swap those letters
    }

Here's another; let's compute spherical volumes:

    for (@volumes = @radii) {   # modifies @volumes, not @radii
        $_ **= 3;
        $_ *= (4/3) * 3.14159;  # this will be constant folded
    }

which can also be done with C<map()>, which is made to transform
one list into another:

    @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;

If you want to do the same thing to modify the values of the
hash, you can use the C<values> function. As of Perl 5.6
the values are not copied, so if you modify C<$orbit> (in this
case), you modify the value.

    for my $orbit ( values %orbits ) {
        ($orbit **= 3) *= (4/3) * 3.14159;
    }

Prior to perl 5.6 C<values> returned copies of the values,
so older perl code often contains constructions such as
C<@orbits{keys %orbits}> instead of C<values %orbits> where
the hash is to be modified.

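Because the loop variable is an alias, you can check this behavior directly. In this sketch, C<%orbits> and its contents are made up. Changing each alias returned by C<values> changes the hash itself:

    use strict;
    use warnings;

    my %orbits = ( mercury => 1, venus => 2 );

    # each $_ is an alias to a value stored in %orbits
    $_ *= 10 for values %orbits;

    print "$_ => $orbits{$_}\n" for sort keys %orbits;
    # mercury => 10
    # venus => 20
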
=head2 How do I select a random element from an array?

Use the C<rand()> function (see L<perlfunc/rand>). Perl truncates
the fractional index for you:

    my $index   = rand @array;
    my $element = $array[$index];

Or, simply:

    my $element = $array[ rand @array ];

=head2 How do I permute N elements of a list?
X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
X<The Art of Computer Programming> X<Fischer-Krause>

Use the C<List::Permutor> module on CPAN. If the list is actually an
array, try the C<Algorithm::Permute> module (also on CPAN). It's
written in XS code and is very efficient:

    use Algorithm::Permute;

    my @array = 'a'..'d';
    my $p_iterator = Algorithm::Permute->new ( \@array );

    while (my @perm = $p_iterator->next) {
        print "next permutation: (@perm)\n";
    }

For even faster execution, you could do:

    use Algorithm::Permute;

    my @array = 'a'..'d';

    Algorithm::Permute::permute {
        print "next permutation: (@array)\n";
    } @array;

Here's a little program that generates all permutations of all the
words on each line of input. The algorithm embodied in the
C<permute()> function is discussed in Volume 4 of Knuth's
I<The Art of Computer Programming> and will work on any list:

    #!/usr/bin/perl -n
    # Fischer-Krause ordered permutation generator

    sub permute (&@) {
        my $code = shift;
        my @idx = 0..$#_;
        while ( $code->(@_[@idx]) ) {
            my $p = $#idx;
            --$p while $idx[$p-1] > $idx[$p];
            my $q = $p or return;
            push @idx, reverse splice @idx, $p;
            ++$q while $idx[$p-1] > $idx[$q];
            @idx[$p-1,$q]=@idx[$q,$p-1];
        }
    }

    permute { print "@_\n" } split;

The C<Algorithm::Loops> module also provides the C<NextPermute> and
C<NextPermuteNum> functions which efficiently find all unique permutations
of an array, even if it contains duplicate values, modifying it in-place:
if its elements are in reverse-sorted order then the array is reversed,
making it sorted, and it returns false; otherwise it rearranges the
array into the next permutation and returns true.

C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
you can enumerate all the permutations of C<0..9> like this:

    use Algorithm::Loops qw(NextPermuteNum);

    my @list= 0..9;
    do { print "@list\n" } while NextPermuteNum @list;

=head2 How do I sort an array by (anything)?

Supply a comparison function to sort() (described in L<perlfunc/sort>):

    @list = sort { $a <=> $b } @list;

The default sort function is cmp, string comparison, which would
sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
the numerical comparison operator.

If you have a complicated function needed to pull out the part you
want to sort on, then don't do it inside the sort function. Pull it
out first, because the sort BLOCK can be called many times for the
same element. Here's an example of how to pull out the first word
after the first number on each item, and then sort those words
case-insensitively.

    my @idx = ();
    for (@data) {
        my ($item) = /\d+\s*(\S+)/;
        push @idx, uc($item);
    }
    my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];

which could also be written this way, using a trick
that's come to be known as the Schwartzian Transform:

    my @sorted = map  { $_->[0] }
                 sort { $a->[1] cmp $b->[1] }
                 map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;

If you need to sort on several fields, the following paradigm is useful.

    my @sorted = sort {
        field1($a) <=> field1($b) ||
        field2($a) cmp field2($b) ||
        field3($a) cmp field3($b)
    } @data;

This can be conveniently combined with precalculation of keys as given
above.

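For instance, here is the multi-field paradigm applied to some made-up records. Each record is an array reference holding a name and an age, and the records are sorted by age and then case-insensitively by name:

    use strict;
    use warnings;

    my @people = ( [ 'Bob', 30 ], [ 'alice', 25 ], [ 'Carol', 30 ] );

    # numeric on the age first; the name only breaks ties
    my @sorted = sort {
        $a->[1] <=> $b->[1] ||
        lc($a->[0]) cmp lc($b->[0])
    } @people;

    print join( ", ", map { $_->[0] } @sorted ), "\n";
    # alice, Bob, Carol
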
See the F<sort> article in the "Far More Than You Ever Wanted
To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
more about this approach.

See also the question later in L<perlfaq4> on sorting hashes.

=head2 How do I manipulate arrays of bits?

Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
operations.

For example, you don't have to store individual bits in an array
(which would mean that you're wasting a lot of space). To convert an
array of bits to a string, use C<vec()> to set the right bits. This
sets C<$vec> to have bit N set only if C<$ints[N]> was set:

    @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
    $vec = '';
    foreach( 0 .. $#ints ) {
        vec($vec,$_,1) = 1 if $ints[$_];
    }

The string C<$vec> only grows as far as the highest bit you set,
rounded up to a whole byte. For instance, if you had 16 entries in
C<@ints>, C<$vec> needs at most two bytes to store them (not counting
the scalar variable overhead).

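As a quick check, this sketch packs 16 made-up entries into C<$vec> and unpacks them again with C<unpack("b*")>, which lists bits in the same low-to-high order that C<vec()> uses. Because C<$vec> only extends as far as the highest set bit, the sample sets the last entry so that all 16 bits are kept:

    use strict;
    use warnings;

    my @ints = ( 1,0,0,1, 1,0,0,0, 0,0,0,0, 0,0,0,1 );

    my $vec = '';
    foreach ( 0 .. $#ints ) {
        vec( $vec, $_, 1 ) = 1 if $ints[$_];
    }

    my @back = split //, unpack( "b*", $vec );

    print length($vec), " bytes\n";   # 2 bytes
    print "@back\n";                  # same bits as @ints
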
Here's how, given a vector in C<$vec>, you can get those bits into
your C<@ints> array:

    sub bitvec_to_list {
        my $vec = shift;
        my @ints;
        # Find null-byte density then select best algorithm
        if ($vec =~ tr/\0// / length $vec > 0.95) {
            use integer;
            my $i;

            # This method is faster with mostly null-bytes
            while($vec =~ /[^\0]/g ) {
                $i = -9 + 8 * pos $vec;
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
            }
        }
        else {
            # This method is a fast general algorithm
            use integer;
            my $bits = unpack "b*", $vec;
            push @ints, 0 if $bits =~ s/^(\d)// && $1;
            push @ints, pos $bits while($bits =~ /1/g);
        }

        return \@ints;
    }

This method gets faster the more sparse the bit vector is.
(Courtesy of Tim Bunce and Winfried Koenig.)

You can make the while loop a lot shorter with this suggestion
from Benjamin Goldberg:

    while($vec =~ /[^\0]+/g ) {
        push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
    }

Or use the CPAN module C<Bit::Vector>:

    use Bit::Vector;

    $vector = Bit::Vector->new($num_of_bits);
    $vector->Index_List_Store(@ints);
    @ints = $vector->Index_List_Read();

C<Bit::Vector> provides efficient methods for bit vectors, sets of
small integers and "big int" math.

Here's a more extensive illustration using vec():

    # vec demo
    my $vector = "\xff\x0f\xef\xfe";
    print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
        unpack("N", $vector), "\n";
    my $is_set = vec($vector, 23, 1);
    print "Its bit 23 (counting from 0) is ", $is_set ? "set" : "clear", ".\n";
    pvec($vector);

    set_vec(1,1,1);
    set_vec(3,1,1);
    set_vec(23,1,1);

    set_vec(3,1,3);
    set_vec(3,2,3);
    set_vec(3,4,3);
    set_vec(3,4,7);
    set_vec(3,8,3);
    set_vec(3,8,7);

    set_vec(0,32,17);
    set_vec(1,32,17);

    sub set_vec {
        my ($offset, $width, $value) = @_;
        my $vector = '';
        vec($vector, $offset, $width) = $value;
        print "offset=$offset width=$width value=$value\n";
        pvec($vector);
    }

    sub pvec {
        my $vector = shift;
        my $bits = unpack("b*", $vector);

        print "vector length in bytes: ", length($vector), "\n";
        my @bytes = unpack("A8" x length($vector), $bits);
        print "bits are: @bytes\n\n";
    }

=head2 Why does defined() return true on empty arrays and hashes?

The short story is that you should probably only use C<defined> on scalars or
functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
in the 5.004 release or later of Perl for more detail.

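To ask whether an array or hash has any elements, test it in boolean (or scalar) context instead. Perl 5.22 and later make C<defined(@array)> and C<defined(%hash)> a fatal error, so this sketch avoids them entirely. The variable names are made up:

    use strict;
    use warnings;

    my @empty_array;
    my %empty_hash;
    my @full_array = ( 1, 2, 3 );

    # an array in boolean context is true if it has elements;
    # a hash in boolean context is true if it has any keys
    print "array is empty\n" unless @empty_array;
    print "hash is empty\n"  unless %empty_hash;
    print "array has ", scalar(@full_array), " elements\n";
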
=head1 Data: Hashes (Associative Arrays)

=head2 How do I process an entire hash?

(contributed by brian d foy)

There are a couple of ways that you can process an entire hash. You
can get a list of keys, then go through each key, or grab one
key-value pair at a time.

To go through all of the keys, use the C<keys> function. This extracts
all of the keys of the hash and gives them back to you as a list. You
can then get the value through the particular key you're processing:

    foreach my $key ( keys %hash ) {
        my $value = $hash{$key};
        ...
    }

Once you have the list of keys, you can process that list before you
process the hash elements. For instance, you can sort the keys so you
can process them in lexical order:

    foreach my $key ( sort keys %hash ) {
        my $value = $hash{$key};
        ...
    }

Or, you might want to only process some of the items. If you only want
to deal with the keys that start with C<text:>, you can select just
those using C<grep>:

    foreach my $key ( grep /^text:/, keys %hash ) {
        my $value = $hash{$key};
        ...
    }

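Here is that last pattern as a complete sketch, using a made-up C<%hash>. The filtered keys are also sorted so that the order is predictable:

    use strict;
    use warnings;

    my %hash = ( 'text:a' => 1, 'num:b' => 2, 'text:c' => 3 );

    my @report;
    foreach my $key ( sort { $a cmp $b } grep { /^text:/ } keys %hash ) {
        my $value = $hash{$key};
        push @report, "$key=$value";
    }

    print "@report\n";    # text:a=1 text:c=3
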
If the hash is very large, you might not want to create a long list of
keys. To save some memory, you can grab one key-value pair at a time using
C<each()>, which returns a pair you haven't seen yet:

    while( my( $key, $value ) = each( %hash ) ) {
        ...
    }

The C<each> operator returns the pairs in apparently random order, so if
ordering matters to you, you'll have to stick with the C<keys> method.

The C<each()> operator can be a bit tricky though. If you add keys, or
delete any key other than the one it just returned, while you're using
it, you may skip or re-process some pairs after Perl internally rehashes
all of the elements. Additionally, a hash has only one iterator, so if
you use C<keys>, C<values>, or C<each> on the same hash, you can reset
the iterator and mess up your processing. See the C<each> entry in
L<perlfunc> for more details.

=head2 How do I merge two hashes?
X<hash> X<merge> X<slice, hash>

(contributed by brian d foy)

Before you decide to merge two hashes, you have to decide what to do
if both hashes contain keys that are the same and if you want to leave
the original hashes as they were.

If you want to preserve the original hashes, copy one hash (C<%hash1>)
to a new hash (C<%new_hash>), then add the keys from the other hash
(C<%hash2>) to the new hash. Checking that the key already exists in
C<%new_hash> gives you a chance to decide what to do with the
duplicates:

    my %new_hash = %hash1; # make a copy; leave %hash1 alone

    foreach my $key2 ( keys %hash2 )
    {
        if( exists $new_hash{$key2} )
        {
            warn "Key [$key2] is in both hashes!";
            # handle the duplicate (perhaps only warning)
            ...
            next;
        }
        else
        {
            $new_hash{$key2} = $hash2{$key2};
        }
    }

If you don't want to create a new hash, you can still use this looping
technique; just change the C<%new_hash> to C<%hash1>.

    foreach my $key2 ( keys %hash2 )
    {
        if( exists $hash1{$key2} )
        {
            warn "Key [$key2] is in both hashes!";
            # handle the duplicate (perhaps only warning)
            ...
            next;
        }
        else
        {
            $hash1{$key2} = $hash2{$key2};
        }
    }

If you don't care that one hash overwrites keys and values from the other, you
could just use a hash slice to add one hash to another. In this case, values
from C<%hash2> replace values from C<%hash1> when they have keys in common:

    @hash1{ keys %hash2 } = values %hash2;

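To make the overwriting concrete, this sketch uses two small made-up hashes. The slice assignment copies every pair from C<%hash2> into C<%hash1>, so the shared key C<b> ends up with C<%hash2>'s value. Assigning both hashes to a new hash as one list (C<%merged> below) gives the same result without touching C<%hash1>, since later pairs in a list assignment overwrite earlier ones with the same key:

    use strict;
    use warnings;

    my %hash1 = ( a => 1, b => 2 );
    my %hash2 = ( b => 20, c => 30 );

    my %merged = ( %hash1, %hash2 );        # new hash, %hash1 untouched

    @hash1{ keys %hash2 } = values %hash2;  # modifies %hash1 in place

    # both now contain: a => 1, b => 20, c => 30
    print join( ", ", map { "$_ => $hash1{$_}" } sort keys %hash1 ), "\n";
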
=head2 What happens if I add or remove keys from a hash while iterating over it?

(contributed by brian d foy)

The easy answer is "Don't do that!"

If you iterate through the hash with C<each()>, you can delete the key
most recently returned without worrying about it. If you delete or add
other keys, the iterator may skip or double up on them since perl
may rearrange the hash table. See the
entry for C<each()> in L<perlfunc>.

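For example, this sketch builds a made-up hash of the numbers 1 to 10 and deletes only the key that C<each> has just returned, which is the one case L<perlfunc> documents as safe:

    use strict;
    use warnings;

    my %hash = map { $_ => $_ } 1 .. 10;

    while ( my ( $key, $value ) = each %hash ) {
        # deleting the current key does not disturb the iterator
        delete $hash{$key} if $value % 2;
    }

    print join( " ", sort { $a <=> $b } keys %hash ), "\n";   # 2 4 6 8 10
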
=head2 How do I look up a hash element by value?

Create a reverse hash:

    %by_value = reverse %by_key;
    $key = $by_value{$value};

That's not particularly efficient. It would be more space-efficient
to use:

    while (($key, $value) = each %by_key) {
        $by_value{$value} = $key;
    }

If your hash could have repeated values, the methods above will only find
one of the associated keys. This may or may not worry you. If it does
worry you, you can always reverse the hash into a hash of arrays instead:

    while (($key, $value) = each %by_key) {
        push @{$key_list_by_value{$value}}, $key;
    }

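Here is the hash-of-arrays version as a self-contained sketch with made-up data. The keys for each value are sorted only to make the output predictable, since C<each> returns pairs in no particular order:

    use strict;
    use warnings;

    my %by_key = ( apple => 'red', cherry => 'red', lime => 'green' );

    my %key_list_by_value;
    while ( my ( $key, $value ) = each %by_key ) {
        push @{ $key_list_by_value{$value} }, $key;
    }

    my @red = sort @{ $key_list_by_value{red} };
    print "red: @red\n";    # red: apple cherry
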
=head2 How can I know how many entries are in a hash?

(contributed by brian d foy)

This is very similar to "How do I process an entire hash?", also in
L<perlfaq4>, but a bit simpler in the common cases.

You can use the C<keys()> built-in function in scalar context to find out
how many entries you have in a hash:

    my $key_count = keys %hash; # must be scalar context!

If you want to find out how many entries have a defined value, that's
a bit different. You have to check each value. A C<grep> is handy:

    my $defined_value_count = grep { defined } values %hash;

You can use that same structure to count the entries any way that
you like. If you want the count of the keys with vowels in them,
you just test for that instead:

    my $vowel_count = grep { /[aeiou]/ } keys %hash;

The C<grep> in scalar context returns the count. If you want the list
of matching items, just use it in list context instead:

    my @defined_values = grep { defined } values %hash;

The C<keys()> function also resets the iterator, which means that you may
see strange results if you use this between uses of other hash operators
such as C<each()>.

=head2 How do I sort a hash (optionally by value instead of key)?

(contributed by brian d foy)

To sort a hash, start with the keys. In this example, we give the list of
keys to the sort function which then compares them ASCIIbetically (which
might be affected by your locale settings). The output list has the keys
in ASCIIbetical order. Once we have the keys, we can go through them to
create a report which lists the keys in ASCIIbetical order.

    my @keys = sort { $a cmp $b } keys %hash;

    foreach my $key ( @keys )
    {
        printf "%-20s %6d\n", $key, $hash{$key};
    }

We could get fancier in the C<sort()> block, though. Instead of
comparing the keys, we can compute a value with them and use that
value as the comparison.

For instance, to make our report order case-insensitive, we use
the C<\L> sequence in a double-quoted string to make everything
lowercase. The C<sort()> block then compares the lowercased
values to determine in which order to put the keys.

    my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;

Note: if the computation is expensive or the hash has many elements,
you may want to look at the Schwartzian Transform to cache the
computation results.

If we want to sort by the hash value instead, we use the hash key
to look it up. We still get out a list of keys, but this time they
are ordered by their value.

    my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;

From there we can get more complex. If the hash values are the same,
we can provide a secondary sort on the hash key.

    my @keys = sort {
        $hash{$a} <=> $hash{$b}
            or
        "\L$a" cmp "\L$b"
    } keys %hash;

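Putting the value sort and the tie-breaker together, this sketch uses a made-up C<%hash> of counts. It orders the keys by ascending value and then case-insensitively by key:

    use strict;
    use warnings;

    my %hash = ( cherry => 3, Banana => 1, apple => 3 );

    my @keys = sort {
        $hash{$a} <=> $hash{$b}
            or
        "\L$a" cmp "\L$b"
    } keys %hash;

    print "@keys\n";    # Banana apple cherry
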
=head2 How can I always keep my hash sorted?
X<hash> X<tie> X<sort> X<DB_File> X<Tie::IxHash>

You can look into using the C<DB_File> module and C<tie()> using the
C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
Databases">. The C<Tie::IxHash> module from CPAN might also be
instructive. Although this does keep your hash sorted, you might not
like the slowdown you suffer from the tie interface. Are you sure you
need to do this? :)

=head2 What's the difference between "delete" and "undef" with hashes?

Hashes contain pairs of scalars: the first is the key, the
second is the value. The key will be coerced to a string,
although the value can be any kind of scalar: string,
number, or reference. If a key C<$key> is present in
C<%hash>, C<exists($hash{$key})> will return true. The value
for a given key can be C<undef>, in which case
C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
will return true. This corresponds to (C<$key>, C<undef>)
being in the hash.

Pictures help... Here's the C<%hash> table:

      keys  values
    +------+------+
    |  a   |  3   |
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

And these conditions hold

    $hash{'a'}                       is true
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is true
    exists $hash{'a'}                is true (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is true

If you now say

    undef $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  a   | undef|
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is FALSE
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is FALSE
    exists $hash{'a'}                is true (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is true

Notice the last two: you have an undef value, but a defined key!

Now, consider this:

    delete $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is false
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is false
    exists $hash{'a'}                is FALSE (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is FALSE

See, the whole entry is gone!

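You can confirm the key rows of those tables yourself. This sketch builds the same C<%hash> and records what C<defined> and C<exists> report for C<'a'> after C<undef> and again after C<delete>:

    use strict;
    use warnings;

    my %hash = ( a => 3, x => 7, d => 0, e => 2 );

    undef $hash{'a'};
    my @after_undef = (
        ( defined $hash{'a'} ? 'defined' : 'undefined' ),
        ( exists  $hash{'a'} ? 'exists'  : 'missing'   ),
    );

    delete $hash{'a'};
    my @after_delete = (
        ( defined $hash{'a'} ? 'defined' : 'undefined' ),
        ( exists  $hash{'a'} ? 'exists'  : 'missing'   ),
    );

    print "after undef:  @after_undef\n";    # undefined exists
    print "after delete: @after_delete\n";   # undefined missing
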
=head2 Why don't my tied hashes make the defined/exists distinction?

This depends on the tied hash's implementation of C<EXISTS()>.
For example, there isn't the concept of undef with hashes
that are tied to DBM* files. It also means that C<exists()> and
C<defined()> do the same thing with a DBM* file, and what they
end up doing is not what they do with ordinary hashes.

=head2 How do I reset an each() operation part-way through?

(contributed by brian d foy)

You can use the C<keys> or C<values> functions to reset C<each>. To
simply reset the iterator used by C<each> without doing anything else,
use one of them in void context:

    keys %hash;   # resets iterator, nothing else.
    values %hash; # resets iterator, nothing else.

See the documentation for C<each> in L<perlfunc>.

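Here is the reset in action, as a small sketch with a made-up hash. Because the hash isn't modified in between, C<each> hands back the same first key again after the reset:

    use strict;
    use warnings;

    my %hash = ( a => 1, b => 2, c => 3 );

    my ($first)  = each %hash;
    my ($second) = each %hash;   # iterator has moved on

    keys %hash;                  # reset the iterator

    my ($again) = each %hash;    # starts from the beginning

    print "first: $first, after reset: $again\n";
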
=head2 How can I get the unique keys from two hashes?

First you extract the keys from the hashes into lists, then solve
the "removing duplicates" problem described above. For example:

    my %seen = ();
    for my $element (keys(%foo), keys(%bar)) {
        $seen{$element}++;
    }
    my @uniq = keys %seen;

Or more succinctly:

    my @uniq = keys %{{%foo,%bar}};

Or if you really want to save space:

    my %seen = ();
    while (defined (my $key = each %foo)) {
        $seen{$key}++;
    }
    while (defined (my $key = each %bar)) {
        $seen{$key}++;
    }
    my @uniq = keys %seen;

=head2 How can I store a multidimensional array in a DBM file?

Either stringify the structure yourself (no fun), or else
get the C<MLDBM> module (which uses C<Data::Dumper>) from CPAN and layer
it on top of either C<DB_File> or C<GDBM_File>.

=head2 How can I make my hash remember the order I put elements into it?

Use the C<Tie::IxHash> module from CPAN.

    use Tie::IxHash;

    tie my %myhash, 'Tie::IxHash';

    for (my $i=0; $i<20; $i++) {
        $myhash{$i} = 2*$i;
    }

    my @keys = keys %myhash;
    # @keys = (0,1,2,3,...)

=head2 Why does passing a subroutine an undefined element in a hash create it?

(contributed by brian d foy)

Are you using a really old version of Perl?

Normally, accessing a hash key's value for a nonexistent key will
I<not> create the key.

    my %hash  = ();
    my $value = $hash{ 'foo' };
    print "This won't print\n" if exists $hash{ 'foo' };

Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
Since you could assign directly to C<$_[0]>, Perl had to be ready to
make that assignment so it created the hash key ahead of time:

    my_sub( $hash{ 'foo' } );
    print "This will print before 5.004\n" if exists $hash{ 'foo' };

    sub my_sub {
        # $_[0] = 'bar'; # create hash key in case you do this
        1;
    }

Since Perl 5.004, however, this situation is a special case and Perl
creates the hash key only when you make the assignment:

    my_sub( $hash{ 'foo' } );
    print "This will print, even after 5.004\n" if exists $hash{ 'foo' };

    sub my_sub {
        $_[0] = 'bar';
    }

However, if you want the old behavior (and think carefully about that
because it's a weird side effect), you can pass a hash slice instead.
Perl 5.004 didn't make this a special case:

    my_sub( @hash{ qw/foo/ } );

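You can see the modern behavior with a short sketch. The subroutine names are made up for illustration: one only reads its argument, the other assigns to C<$_[0]>:

    use strict;
    use warnings;

    my %hash;

    sub reads_only { my $arg = $_[0]; return }
    sub assigns    { $_[0] = 'bar' }

    reads_only( $hash{'foo'} );
    my $after_read = exists $hash{'foo'} ? 'created' : 'not created';

    assigns( $hash{'foo'} );
    my $after_assign = exists $hash{'foo'} ? 'created' : 'not created';

    print "after reads_only: $after_read\n";     # not created
    print "after assigns:    $after_assign\n";   # created
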
=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?

Usually a hash ref, perhaps like this:

    $record = {
        NAME   => "Jason",
        EMPNO  => 132,
        TITLE  => "deputy peon",
        AGE    => 23,
        SALARY => 37_000,
        PALS   => [ "Norbert", "Rhys", "Phineas"],
    };

References are documented in L<perlref> and L<perlreftut>.
Examples of complex data structures are given in L<perldsc> and
L<perllol>. Examples of structures and object-oriented classes are
in L<perltoot>.

=head2 How can I use a reference as a hash key?

(contributed by brian d foy and Ben Morrow)

Hash keys are strings, so you can't really use a reference as the key.
When you try to do that, perl turns the reference into its stringified
form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
back the reference from the stringified form, at least without doing
some extra work on your own.

Remember that the entry in the hash will still be there even if
the referenced variable goes out of scope, and that it is entirely
possible for Perl to subsequently allocate a different variable at
the same address. This will mean a new variable might accidentally
be associated with the value for an old one.

If you have Perl 5.10 or later, and you just want to store a value
against the reference for lookup later, you can use the core
C<Hash::Util::FieldHash> module. This will also handle renaming the
keys if you use multiple threads (which causes all variables to be
reallocated at new addresses, changing their stringification), and
garbage-collecting the entries when the referenced variable goes out
of scope.

If you actually need to be able to get a real reference back from
each hash entry, you can use the C<Tie::RefHash> module, which does the
required work for you.

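The difference is easy to see side by side. In this sketch, a plain hash keeps only the stringified form of the key, while a hash tied to C<Tie::RefHash> (which ships with Perl) hands back the real array reference:

    use strict;
    use warnings;
    use Tie::RefHash;

    my $aref = [ 1, 2, 3 ];

    my %plain;
    $plain{$aref} = 'three elements';
    my ($plain_key) = keys %plain;     # a string like "ARRAY(0x...)"

    tie my %refhash, 'Tie::RefHash';
    $refhash{$aref} = 'three elements';
    my ($real_key) = keys %refhash;    # the reference itself

    print ref($plain_key) ? "reference\n" : "string: $plain_key\n";
    print "last element: $real_key->[-1]\n";    # last element: 3
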
=head1 Data: Misc

=head2 How do I handle binary data correctly?

Perl is binary clean, so it can handle binary data just fine.
On Windows or DOS, however, you have to use C<binmode> for binary
files to avoid conversions for line endings. In general, you should
use C<binmode> any time you want to work with binary data.

Also see L<perlfunc/"binmode"> or L<perlopentut>.

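As a small sketch of the C<binmode> advice, the code below writes four raw bytes, including a newline and a carriage return, through a handle in C<binmode> and reads them back unchanged. It uses an in-memory filehandle on a scalar (supported since Perl 5.8) so it doesn't touch the disk. A real file opened the same way should behave the same:

    use strict;
    use warnings;

    my $buffer = '';

    open my $out, '>', \$buffer or die "Can't open in-memory handle: $!";
    binmode $out;                            # no line-ending translation
    print {$out} pack( 'C*', 0x00, 0x0A, 0x0D, 0xFF );
    close $out;

    open my $in, '<', \$buffer or die "Can't open in-memory handle: $!";
    binmode $in;
    my $count = read( $in, my $data, 4 );
    close $in;

    my @bytes = unpack 'C*', $data;
    print "read $count bytes: @bytes\n";     # read 4 bytes: 0 10 13 255
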
If you're concerned about 8-bit textual data then see L<perllocale>.
If you want to deal with multibyte characters, however, there are
some gotchas. See the section on Regular Expressions.

2469=head2 How do I determine whether a scalar is a number/whole/integer/float?
2470
2471Assuming that you don't care about IEEE notations like "NaN" or
2472"Infinity", you probably just want to use a regular expression.
2473
ac9dac7f 2474 if (/\D/) { print "has nondigits\n" }
2475 if (/^\d+$/) { print "is a whole number\n" }
2476 if (/^-?\d+$/) { print "is an integer\n" }
2477 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
2478 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
2479 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
2480 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
881bdbd4 2481 { print "a C float\n" }

There are also some commonly used modules for the task.
L<Scalar::Util> (distributed with Perl since 5.8) provides access to
perl's internal C<looks_like_number> function, which reports whether
a value looks like a number to Perl. L<Data::Types> exports functions
that validate data types using both the above and other regular
expressions. Thirdly, L<Regexp::Common> has regular expressions to
match various types of numbers. All three modules are available from
CPAN.
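
Here is a minimal sketch of C<looks_like_number>; the sample strings are arbitrary. The function follows Perl's own idea of a number, so it accepts exponent notation. It rejects a hex string such as C<0x2A>, which Perl does not convert automatically.

```perl

    use strict;
    use warnings;
    use Scalar::Util qw(looks_like_number);

    for my $value ( '42', '-1.5e3', '0x2A', 'abc', '' ) {
        printf "%-9s %s\n", "'$value'",
            looks_like_number($value) ? 'looks like a number' : 'does not';
    }

```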

If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a
C<getnum> wrapper function for more convenient access. This function
takes a string and returns the number it found, or C<undef> for input
that isn't a C float. The C<is_numeric> function is a front end to
C<getnum> if you just want to say, "Is this a float?"

    use POSIX qw(strtod);

    sub getnum {
        my $str = shift;
        $str =~ s/^\s+//;    # trim leading and trailing whitespace
        $str =~ s/\s+$//;
        $! = 0;              # strtod reports range errors through errno
        my ( $num, $unparsed ) = strtod($str);
        if ( ( $str eq '' ) || ( $unparsed != 0 ) || $! ) {
            return undef;    # empty, trailing junk, or out of range
        }
        else {
            return $num;
        }
    }

    sub is_numeric { defined getnum( $_[0] ) }

Or you could check out the L<String::Scanf> module on CPAN
instead. The C<POSIX> module (part of the standard Perl distribution)
provides the C<strtod> and C<strtol> functions for converting strings
to doubles and longs, respectively.
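
The same approach works for integers with C<strtol>, which also takes a base. The C<getint> function below is a hypothetical helper modeled on C<getnum> above. Like C<getnum>, it returns C<undef> when any part of the string is left unparsed.

```perl

    use strict;
    use warnings;
    use POSIX qw(strtol);

    # Hypothetical helper: parse a whole string as an integer in the
    # given base (default 10); return undef if anything is left over.
    sub getint {
        my ( $str, $base ) = @_;
        $base = 10 unless defined $base;
        $str =~ s/^\s+//;
        $str =~ s/\s+$//;
        $! = 0;
        my ( $num, $unparsed ) = strtol( $str, $base );
        return undef if $str eq '' || $unparsed != 0 || $!;
        return $num;
    }

    print getint('42'), "\n";          # 42
    print getint( 'ff', 16 ), "\n";    # 255
    my $bad = getint('12abc');
    print defined $bad ? "accepted\n" : "rejected\n";    # rejected

```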

=head2 How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generically, you should consult the C<FreezeThaw>
or C<Storable> modules from CPAN. Starting with Perl 5.8, C<Storable> is part
of the standard distribution. Here's one example using C<Storable>'s C<store>
and C<retrieve> functions:

    use Storable;
    store( \%hash, "filename" );

    # later on...
    my $href = retrieve("filename");         # by ref
    my %hash = %{ retrieve("filename") };    # direct to hash
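
Here is a minimal round-trip sketch. A temporary file from C<File::Temp> stands in for C<"filename">, and the data is made up. It uses C<nstore>, the network-order variant of C<store>, so the file can also be read on a machine with a different byte order.

```perl

    use strict;
    use warnings;
    use Storable qw(nstore retrieve);
    use File::Temp qw(tempfile);

    # A temporary file stands in for "filename" above.
    my ( undef, $file ) = tempfile( UNLINK => 1 );

    my %config = ( count => 3, names => [qw(alpha beta)] );

    # Write the structure in network byte order.
    nstore( \%config, $file ) or die "Can't store to $file";

    # ... later, possibly in another run of the program ...
    my $restored = retrieve($file) or die "Can't retrieve from $file";
    print "count=$restored->{count}, names=@{ $restored->{names} }\n";

```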

=head2 How do I print out or copy a recursive data structure?

The C<Data::Dumper> module (on CPAN, and part of the standard
distribution since Perl 5.005) is great for printing out data
structures. The C<Storable> module (on CPAN, and standard since
Perl 5.8) provides a function called C<dclone> that recursively
copies its argument.

    use Storable qw(dclone);
    my $r2 = dclone($r1);

Here C<$r1> can be a reference to any kind of data structure you'd
like, and it will be deeply copied. Because C<dclone> takes and returns
references, you need some extra punctuation to copy a plain hash
(say, a hash of arrays) rather than a reference to one:

    my %newhash = %{ dclone( \%oldhash ) };
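
As a sketch of both modules on a genuinely recursive structure, the following code builds a small, made-up tree whose child points back to its parent. C<Data::Dumper> prints that back-pointer as a reference to C<$VAR1> instead of looping forever. C<dclone> copies the whole thing, and the copy's back-pointer refers to the copy, not the original.

```perl

    use strict;
    use warnings;
    use Data::Dumper;
    use Storable qw(dclone);

    # A small tree whose child points back at its parent: a cycle.
    my %tree = ( name => 'root', children => [ { name => 'leaf' } ] );
    $tree{children}[0]{parent} = \%tree;

    # Sorted keys make the dump output stable between runs.
    local $Data::Dumper::Sortkeys = 1;
    print Dumper( \%tree );

    # dclone copies everything, including the cycle, into new storage.
    my $copy = dclone( \%tree );
    $copy->{children}[0]{name} = 'changed';

    print "original child: $tree{children}[0]{name}\n";    # still leaf
    print "copy points to itself: ",
        ( $copy->{children}[0]{parent} == $copy ? "yes" : "no" ), "\n";

```

Perl's reference counting does not free such cycles by itself. In long-running code, consider C<weaken> from C<Scalar::Util> for the back-pointers.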

=head2 How do I define methods for every class/object?

(contributed by Ben Morrow)

You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
be very careful to consider the consequences of doing this: adding
methods to every object is very likely to have unintended side
effects. If possible, it would be better to have all your objects
inherit from some common base class, or to use an object system like
Moose that supports roles.
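
The sketch below puts the two approaches side by side. All package and method names are hypothetical. The C<UNIVERSAL> method becomes visible to every class in the program, including ones you did not write. The base-class method reaches only the classes that inherit from it.

```perl

    use strict;
    use warnings;

    # The UNIVERSAL way: every class now has this method.
    sub UNIVERSAL::class_name_of { my $self = shift; return ref($self) || $self }

    # The preferred way: a common base class.
    package My::Base;
    sub new      { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub describe { my $self = shift; return ref($self) . " named $self->{name}" }

    package My::Dog;
    our @ISA = ('My::Base');

    package My::Cat;
    our @ISA = ('My::Base');

    package main;

    my $rex = My::Dog->new( name => 'Rex' );
    print $rex->describe, "\n";         # My::Dog named Rex
    print $rex->class_name_of, "\n";    # My::Dog

```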

=head2 How do I verify a credit card checksum?

Get the C<Business::CreditCard> module from CPAN.
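
Credit card numbers end in a check digit computed with the Luhn (mod 10) algorithm. If you only need that checksum test, here is a minimal sketch of it. The name C<luhn_ok> is hypothetical. It checks only the checksum, not card type, length, or issuer, so a module remains the better choice for real validation.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: true if $number passes the Luhn (mod 10) check.
    sub luhn_ok {
        my ($number) = @_;
        $number =~ s/[\s-]//g;    # allow spaces or dashes between groups
        return 0 unless $number =~ /^\d+\z/;
        my ( $sum, $double ) = ( 0, 0 );

        # Walk the digits right to left, doubling every second one.
        for my $digit ( reverse split //, $number ) {
            my $d = $digit;
            if ($double) {
                $d *= 2;
                $d -= 9 if $d > 9;    # same as adding the two digits
            }
            $sum += $d;
            $double = !$double;
        }
        return $sum % 10 == 0;
    }

    for my $n ( '4111 1111 1111 1111', '79927398713', '79927398710' ) {
        printf "%-20s %s\n", $n, luhn_ok($n) ? 'valid' : 'invalid';
    }

```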

=head2 How do I pack arrays of doubles or floats for XS code?

The arrays.h/arrays.c code in the C<PGPLOT> module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
the C<PDL> module from CPAN instead; it makes number-crunching easy.

See L<http://search.cpan.org/dist/PGPLOT> for the code.
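
On the Perl side, C<pack> handles the packing itself. The C<d> template packs native C doubles, and C<f> packs native floats. Either one produces a string whose bytes are laid out as a C array of that type, which C code can then treat as a C<double *> or C<float *>. This minimal sketch uses arbitrary values and covers only the Perl side, not the XS glue.

```perl

    use strict;
    use warnings;

    my @values = ( 1.5, -2.25, 3 );

    # 'd*' packs each value as a native C double; 'f*' as a C float.
    my $doubles = pack 'd*', @values;
    my $floats  = pack 'f*', @values;

    printf "%d values -> %d bytes as doubles, %d bytes as floats\n",
        scalar @values, length $doubles, length $floats;

    # unpack reverses the process, e.g. for arrays filled in by C code.
    my @back = unpack 'd*', $doubles;
    print "@back\n";    # 1.5 -2.25 3

```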

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.