68dc0745 1=head1 NAME
2
e573f903 3perlfaq4 - Data Manipulation ($Revision: 7954 $)
68dc0745 4
5=head1 DESCRIPTION
6
ae3d0b9f 7This section of the FAQ answers questions related to manipulating
8numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
68dc0745 9
10=head1 Data: Numbers
11
46fc3d4c 12=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
13
ac9dac7f 14Internally, your computer represents floating-point numbers in binary.
15Digital (as in powers of two) computers cannot store all numbers
16exactly. Some real numbers lose precision in the process. This is a
17problem with how computers store numbers and affects all computer
18languages, not just Perl.
46fc3d4c 19
L<perlnumber> shows the gory details of number representations and
conversions.

To limit the number of decimal places in your numbers, you can use the
C<printf> or C<sprintf> function. See L<perlop/"Floating Point Arithmetic">
for more details.
49d635f9 26
27 printf "%.2f", 10/3;
197aec24 28
49d635f9 29 my $number = sprintf "%.2f", 10/3;
197aec24 30
32969b6e 31=head2 Why is int() broken?
32
ac9dac7f 33Your C<int()> is most probably working just fine. It's the numbers that
32969b6e 34aren't quite what you think.
35
ac9dac7f 36First, see the answer to "Why am I getting long decimals
32969b6e 37(eg, 19.9499999999999) instead of the numbers I should be getting
38(eg, 19.95)?".
39
40For example, this
41
ac9dac7f 42 print int(0.6/0.2-2), "\n";
32969b6e 43
will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly as floating-point
numbers. What you think of in the above as 'three' is really more like
2.9999999999999995559.
48
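As a hedged illustration of the point above, the following sketch contrasts truncation with rounding on the same expression. The variable names are ours, and the exact digits printed depend on your platform's floating-point format.

```perl
use strict;
use warnings;

# On typical IEEE-754 doubles the quotient is slightly less than 3,
# so the difference is slightly less than 1.
my $raw = 0.6 / 0.2 - 2;

my $truncated = int($raw);               # drops the fractional part
my $rounded   = sprintf("%.0f", $raw);   # rounds to the nearest integer

printf "raw=%.17g truncated=%d rounded=%s\n", $raw, $truncated, $rounded;
```

If you want the nearest integer rather than the integer part, round explicitly instead of relying on C<int()>.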
68dc0745 49=head2 Why isn't my octal data interpreted correctly?
50
49d635f9 51Perl only understands octal and hex numbers as such when they occur as
52literals in your program. Octal literals in perl must start with a
ac9dac7f 53leading C<0> and hexadecimal literals must start with a leading C<0x>.
49d635f9 54If they are read in from somewhere and assigned, no automatic
ac9dac7f 55conversion takes place. You must explicitly use C<oct()> or C<hex()> if you
56want the values converted to decimal. C<oct()> interprets hexadecimal (C<0x350>),
57octal (C<0350> or even without the leading C<0>, like C<377>) and binary
58(C<0b1010>) numbers, while C<hex()> only converts hexadecimal ones, with
59or without a leading C<0x>, such as C<0x255>, C<3A>, C<ff>, or C<deadbeef>.
The inverse mapping from decimal to octal can be done with either the
C<%o> or C<%O> C<sprintf()> formats.
68dc0745 62
ac9dac7f 63This problem shows up most often when people try using C<chmod()>,
64C<mkdir()>, C<umask()>, or C<sysopen()>, which by widespread tradition
65typically take permissions in octal.
68dc0745 66
ac9dac7f 67 chmod(644, $file); # WRONG
68 chmod(0644, $file); # right
68dc0745 69
197aec24 70Note the mistake in the first line was specifying the decimal literal
ac9dac7f 71C<644>, rather than the intended octal literal C<0644>. The problem can
33ce146f 72be seen with:
73
ac9dac7f 74 printf("%#o",644); # prints 01204
33ce146f 75
76Surely you had not intended C<chmod(01204, $file);> - did you? If you
77want to use numeric literals as arguments to chmod() et al. then please
197aec24 78try to express them as octal constants, that is with a leading zero and
ac9dac7f 79with the following digits restricted to the set C<0..7>.
33ce146f 80
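A minimal sketch of the explicit conversion described above, assuming the permission arrives as text (the C<$mode_string> value is a made-up input, for example a line read from a configuration file):

```perl
use strict;
use warnings;

# Hypothetical input: a permission string read from outside the program.
my $mode_string = "0644";

my $mode = oct($mode_string);       # numeric value 420, i.e. octal 644
my $back = sprintf("%o", $mode);    # back to octal digits for display

# oct() also understands 0x and 0b prefixes; hex() takes bare hex digits.
my @values = ( oct("755"), oct("0x1F"), oct("0b101"), hex("ff") );

print "mode=$mode back=$back values=@values\n";
```

You would then pass C<$mode>, not C<$mode_string>, to C<chmod()> and friends.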
65acb1b1 81=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
68dc0745 82
ac9dac7f 83Remember that C<int()> merely truncates toward 0. For rounding to a
84certain number of digits, C<sprintf()> or C<printf()> is usually the
85easiest route.
92c2ed05 86
ac9dac7f 87 printf("%.3f", 3.1415926535); # prints 3.142
68dc0745 88
ac9dac7f 89The C<POSIX> module (part of the standard Perl distribution)
90implements C<ceil()>, C<floor()>, and a number of other mathematical
91and trigonometric functions.
68dc0745 92
ac9dac7f 93 use POSIX;
94 $ceil = ceil(3.5); # 4
95 $floor = floor(3.5); # 3
92c2ed05 96
ac9dac7f 97In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
98module. With 5.004, the C<Math::Trig> module (part of the standard Perl
46fc3d4c 99distribution) implements the trigonometric functions. Internally it
ac9dac7f 100uses the C<Math::Complex> module and some functions can break out from
46fc3d4c 101the real axis into the complex plane, for example the inverse sine of
1022.
68dc0745 103
104Rounding in financial applications can have serious implications, and
105the rounding method used should be specified precisely. In these
106cases, it probably pays not to trust whichever system rounding is
107being used by Perl, but to instead implement the rounding function you
108need yourself.
109
65acb1b1 110To see why, notice how you'll still have an issue on half-way-point
111alternation:
112
ac9dac7f 113 for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
65acb1b1 114
ac9dac7f 115 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
116 0.8 0.8 0.9 0.9 1.0 1.0
65acb1b1 117
ac9dac7f 118Don't blame Perl. It's the same as in C. IEEE says we have to do
119this. Perl numbers whose absolute values are integers under 2**31 (on
12032 bit machines) will work pretty much like mathematical integers.
121Other numbers are not guaranteed.
65acb1b1 122
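One way to write your own rounding, sketched here under the assumption that amounts are kept as non-negative integers (cents, say) rather than as floating-point values. The helper name C<div_round_half_up> is ours, and round-half-up is only one of several possible rules; choose the one your application actually specifies.

```perl
use strict;
use warnings;

# Divide two non-negative integers and round the quotient to the
# nearest integer, with exact halves rounded up. Only integer
# operations are used, so no binary fraction is ever involved.
sub div_round_half_up {
    my ($num, $den) = @_;
    die "denominator must be positive" unless $den > 0;
    my $scaled = 2 * $num + $den;
    return ($scaled - $scaled % (2 * $den)) / (2 * $den);
}

# 1005 tenths of a cent -> 101 cents (100.5 rounds up)
print div_round_half_up(1005, 10), "\n";
# 1000 cents split three ways -> 333 cents each
print div_round_half_up(1000, 3), "\n";
```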
6f0efb17 123=head2 How do I convert between numeric representations/bases/radixes?
68dc0745 124
ac9dac7f 125As always with Perl there is more than one way to do it. Below are a
126few examples of approaches to making common conversions between number
representations. This is intended to be representative rather than
exhaustive.
68dc0745 129
ac9dac7f 130Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
131module from CPAN. The reason you might choose C<Bit::Vector> over the
132perl built in functions is that it works with numbers of ANY size,
133that it is optimized for speed on some operations, and for at least
134some programmers the notation might be familiar.
d92eb7b0 135
818c4caa 136=over 4
137
138=item How do I convert hexadecimal into decimal
d92eb7b0 139
ac9dac7f 140Using perl's built in conversion of C<0x> notation:
6761e064 141
ac9dac7f 142 $dec = 0xDEADBEEF;
7207e29d 143
ac9dac7f 144Using the C<hex> function:
6761e064 145
ac9dac7f 146 $dec = hex("DEADBEEF");
6761e064 147
ac9dac7f 148Using C<pack>:
6761e064 149
ac9dac7f 150 $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
6761e064 151
ac9dac7f 152Using the CPAN module C<Bit::Vector>:
6761e064 153
ac9dac7f 154 use Bit::Vector;
155 $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
156 $dec = $vec->to_Dec();
6761e064 157
818c4caa 158=item How do I convert from decimal to hexadecimal
6761e064 159
ac9dac7f 160Using C<sprintf>:
6761e064 161
ac9dac7f 162 $hex = sprintf("%X", 3735928559); # upper case A-F
163 $hex = sprintf("%x", 3735928559); # lower case a-f
6761e064 164
ac9dac7f 165Using C<unpack>:
6761e064 166
ac9dac7f 167 $hex = unpack("H*", pack("N", 3735928559));
6761e064 168
ac9dac7f 169Using C<Bit::Vector>:
6761e064 170
ac9dac7f 171 use Bit::Vector;
172 $vec = Bit::Vector->new_Dec(32, -559038737);
173 $hex = $vec->to_Hex();
6761e064 174
ac9dac7f 175And C<Bit::Vector> supports odd bit counts:
6761e064 176
ac9dac7f 177 use Bit::Vector;
178 $vec = Bit::Vector->new_Dec(33, 3735928559);
179 $vec->Resize(32); # suppress leading 0 if unwanted
180 $hex = $vec->to_Hex();
6761e064 181
818c4caa 182=item How do I convert from octal to decimal
6761e064 183
184Using Perl's built in conversion of numbers with leading zeros:
185
ac9dac7f 186 $dec = 033653337357; # note the leading 0!
6761e064 187
ac9dac7f 188Using the C<oct> function:
6761e064 189
ac9dac7f 190 $dec = oct("33653337357");
6761e064 191
ac9dac7f 192Using C<Bit::Vector>:
6761e064 193
ac9dac7f 194 use Bit::Vector;
195 $vec = Bit::Vector->new(32);
196 $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
197 $dec = $vec->to_Dec();
6761e064 198
818c4caa 199=item How do I convert from decimal to octal
6761e064 200
ac9dac7f 201Using C<sprintf>:
6761e064 202
ac9dac7f 203 $oct = sprintf("%o", 3735928559);
6761e064 204
ac9dac7f 205Using C<Bit::Vector>:
6761e064 206
ac9dac7f 207 use Bit::Vector;
208 $vec = Bit::Vector->new_Dec(32, -559038737);
209 $oct = reverse join('', $vec->Chunk_List_Read(3));
6761e064 210
818c4caa 211=item How do I convert from binary to decimal
6761e064 212
2c646907 213Perl 5.6 lets you write binary numbers directly with
ac9dac7f 214the C<0b> notation:
2c646907 215
ac9dac7f 216 $number = 0b10110110;
6f0efb17 217
ac9dac7f 218Using C<oct>:
6f0efb17 219
ac9dac7f 220 my $input = "10110110";
221 $decimal = oct( "0b$input" );
2c646907 222
ac9dac7f 223Using C<pack> and C<ord>:
d92eb7b0 224
ac9dac7f 225 $decimal = ord(pack('B8', '10110110'));
68dc0745 226
ac9dac7f 227Using C<pack> and C<unpack> for larger strings:
6761e064 228
ac9dac7f 229 $int = unpack("N", pack("B32",
6761e064 230 substr("0" x 32 . "11110101011011011111011101111", -32)));
ac9dac7f 231 $dec = sprintf("%d", $int);
6761e064 232
ac9dac7f 233 # substr() is used to left pad a 32 character string with zeros.
6761e064 234
ac9dac7f 235Using C<Bit::Vector>:
6761e064 236
    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();
6761e064 239
818c4caa 240=item How do I convert from decimal to binary
6761e064 241
ac9dac7f 242Using C<sprintf> (perl 5.6+):
4dfcc30b 243
ac9dac7f 244 $bin = sprintf("%b", 3735928559);
4dfcc30b 245
ac9dac7f 246Using C<unpack>:
6761e064 247
ac9dac7f 248 $bin = unpack("B*", pack("N", 3735928559));
6761e064 249
ac9dac7f 250Using C<Bit::Vector>:
6761e064 251
ac9dac7f 252 use Bit::Vector;
253 $vec = Bit::Vector->new_Dec(32, -559038737);
254 $bin = $vec->to_Bin();
6761e064 255
256The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
257are left as an exercise to the inclined reader.
68dc0745 258
818c4caa 259=back
68dc0745 260
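For the remaining transformations, one possible approach is to go through a plain number: read the input with C<hex> or C<oct>, then format it with C<sprintf>. This is only a sketch using small example values.

```perl
use strict;
use warnings;

my $bin_from_hex = sprintf("%b", hex("ff"));            # hex -> bin
my $hex_from_bin = sprintf("%x", oct("0b10110110"));    # bin -> hex
my $oct_from_hex = sprintf("%o", hex("1F"));            # hex -> oct

print "$bin_from_hex $hex_from_bin $oct_from_hex\n";
```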
65acb1b1 261=head2 Why doesn't & work the way I want it to?
262
The behavior of the bitwise operators (C<&>, C<|>, C<^>) depends on
whether they're used on numbers or strings. On strings, the operators
treat each string as a series of bits and work with that (the string
C<"3"> is the bit pattern C<00110011>). On numbers, the operators work
with the binary form of the number (the number C<3> is treated as the
bit pattern C<00000011>).
268
269So, saying C<11 & 3> performs the "and" operation on numbers (yielding
49d635f9 270C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
65acb1b1 271(yielding C<"1">).
272
273Most problems with C<&> and C<|> arise because the programmer thinks
274they have a number but really it's a string. The rest arise because
275the programmer says:
276
ac9dac7f 277 if ("\020\020" & "\101\101") {
278 # ...
279 }
65acb1b1 280
281but a string consisting of two null bytes (the result of C<"\020\020"
282& "\101\101">) is not a false value in Perl. You need:
283
ac9dac7f 284 if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
285 # ...
286 }
65acb1b1 287
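If the values arrived as text but you meant them as numbers, one common fix is to force numeric context by adding zero. The sketch below uses made-up values in C<$left> and C<$right>. It assumes the classic operator behavior described above, without the C<bitwise> feature of newer perls.

```perl
use strict;
use warnings;

# Hypothetical values that arrived as text, e.g. from a file or a form.
my ($left, $right) = ("11", "3");

# Done first, while both values are still plain strings.
my $as_strings = $left & $right;                # "and" on the characters

# Adding zero forces each operand to be treated as a number.
my $as_numbers = ($left + 0) & ($right + 0);    # "and" on the numbers

print "strings: $as_strings, numbers: $as_numbers\n";
```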
68dc0745 288=head2 How do I multiply matrices?
289
290Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
291or the PDL extension (also available from CPAN).
292
293=head2 How do I perform an operation on a series of integers?
294
295To call a function on each element in an array, and collect the
296results, use:
297
ac9dac7f 298 @results = map { my_func($_) } @array;
68dc0745 299
300For example:
301
ac9dac7f 302 @triple = map { 3 * $_ } @single;
68dc0745 303
304To call a function on each element of an array, but ignore the
305results:
306
ac9dac7f 307 foreach $iterator (@array) {
308 some_func($iterator);
309 }
68dc0745 310
311To call a function on each integer in a (small) range, you B<can> use:
312
ac9dac7f 313 @results = map { some_func($_) } (5 .. 25);
68dc0745 314
315but you should be aware that the C<..> operator creates an array of
316all integers in the range. This can take a lot of memory for large
317ranges. Instead use:
318
ac9dac7f 319 @results = ();
320 for ($i=5; $i < 500_005; $i++) {
321 push(@results, some_func($i));
322 }
68dc0745 323
This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.
326
ac9dac7f 327 for my $i (5 .. 500_005) {
328 push(@results, some_func($i));
329 }
87275199 330
331will not create a list of 500,000 integers.
332
68dc0745 333=head2 How can I output Roman numerals?
334
Get the C<Roman> module from http://www.cpan.org/modules/by-module/Roman .
68dc0745 336
337=head2 Why aren't my random numbers random?
338
65acb1b1 339If you're using a version of Perl before 5.004, you must call C<srand>
340once at the start of your program to seed the random number generator.
49d635f9 341
5cd0b561 342 BEGIN { srand() if $] < 5.004 }
49d635f9 343
65acb1b1 3445.004 and later automatically call C<srand> at the beginning. Don't
ac9dac7f 345call C<srand> more than once--you make your numbers less random,
346rather than more.
92c2ed05 347
Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection at http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy
of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than what C<rand> with C<srand>
provides, you should also check out the C<Math::TrulyRandom> module from
65acb1b1 358CPAN. It uses the imperfections in your system's timer to generate
359random numbers, but this takes quite a while. If you want a better
92c2ed05 360pseudorandom generator than comes with your operating system, look at
b432a672 361"Numerical Recipes in C" at http://www.nr.com/ .
68dc0745 362
881bdbd4 363=head2 How do I get a random number between X and Y?
364
To get a random number between two values, you can use the
C<rand()> builtin to get a random number between 0 and 1, and
then scale and shift it into the range you want.
367
793f5136 368C<rand($x)> returns a number such that
369C<< 0 <= rand($x) < $x >>. Thus what you want to have perl
370figure out is a random number in the range from 0 to the
371difference between your I<X> and I<Y>.
372
373That is, to get a number between 10 and 15, inclusive, you
374want a random number between 0 and 5 that you can then add
375to 10.
376
500071f4 377 my $number = 10 + int rand( 15-10+1 );
793f5136 378
379Hence you derive the following simple function to abstract
380that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_between(50,120)>.
382
ac9dac7f 383 sub random_int_between {
500071f4 384 my($min, $max) = @_;
385 # Assumes that the two arguments are integers themselves!
386 return $min if $min == $max;
387 ($min, $max) = ($max, $min) if $min > $max;
388 return $min + int rand(1 + $max - $min);
389 }
881bdbd4 390
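The same shift-the-range idea works for non-integers. The following is a sketch only; C<random_float_between> is a name we made up, and the upper bound is excluded because C<rand($x)> never returns C<$x> itself.

```perl
use strict;
use warnings;

# Return a floating-point number $n with $min <= $n < $max.
sub random_float_between {
    my ($min, $max) = @_;
    return $min if $min == $max;    # rand(0) would behave like rand(1)
    ($min, $max) = ($max, $min) if $min > $max;
    return $min + rand($max - $min);
}

my $sample = random_float_between(10, 15);
print "$sample\n";
```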
68dc0745 391=head1 Data: Dates
392
5cd0b561 393=head2 How do I find the day or week of the year?
68dc0745 394
The C<localtime> function returns the day of the year as element 7 of
its list, counting from 0 for January 1. Without an argument
C<localtime> uses the current time.
68dc0745 397
a05e4845 398 $day_of_year = (localtime)[7];
ffc145e8 399
ac9dac7f 400The C<POSIX> module can also format a date as the day of the year or
5cd0b561 401week of the year.
68dc0745 402
5cd0b561 403 use POSIX qw/strftime/;
404 my $day_of_year = strftime "%j", localtime;
405 my $week_of_year = strftime "%W", localtime;
406
To get the day or week of the year for any date, use C<POSIX>'s
C<mktime> to get a time in epoch seconds for the argument to localtime.
ffc145e8 409
ac9dac7f 410 use POSIX qw/mktime strftime/;
6670e5e7 411 my $week_of_year = strftime "%W",
ac9dac7f 412 localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
5cd0b561 413
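The same C<mktime> call also works with the C<%j> format, which gives the day of the year. This sketch uses noon rather than midnight, purely as a precaution against daylight-saving oddities near midnight.

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# 18 December 1987 at noon local time. Months are 0-based and years
# are counted from 1900, as with localtime.
my $epoch = mktime(0, 0, 12, 18, 11, 87);

my $day_of_year = strftime("%j", localtime($epoch));

print "$day_of_year\n";
```

Note that C<%j> counts from 001, whereas C<(localtime)[7]> counts from 0.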
ac9dac7f 414The C<Date::Calc> module provides two functions to calculate these.
5cd0b561 415
    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );
ffc145e8 419
d92eb7b0 420=head2 How do I find the current century or millennium?
421
422Use the following simple functions:
423
ac9dac7f 424 sub get_century {
425 return int((((localtime(shift || time))[5] + 1999))/100);
426 }
6670e5e7 427
ac9dac7f 428 sub get_millennium {
429 return 1+int((((localtime(shift || time))[5] + 1899))/1000);
430 }
d92eb7b0 431
ac9dac7f 432On some systems, the C<POSIX> module's C<strftime()> function has been
433extended in a non-standard way to use a C<%C> format, which they
434sometimes claim is the "century". It isn't, because on most such
435systems, this is only the first two digits of the four-digit year, and
436thus cannot be used to reliably determine the current century or
437millennium.
d92eb7b0 438
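The arithmetic behind those two functions can be checked with plain years. In this sketch, C<century_of_year> and C<millennium_of_year> are hypothetical helpers that take a four-digit year instead of an epoch time.

```perl
use strict;
use warnings;

# Both helpers take a four-digit year (an assumption of this sketch).
sub century_of_year    { my ($year) = @_; return int( ($year + 99)  / 100 ) }
sub millennium_of_year { my ($year) = @_; return int( ($year + 999) / 1000 ) }

for my $year (1999, 2000, 2001) {
    printf "%d: century %d, millennium %d\n",
        $year, century_of_year($year), millennium_of_year($year);
}
```

The year 2000 still belongs to the 20th century and the 2nd millennium; the new ones begin in 2001.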
92c2ed05 439=head2 How can I compare two dates and find the difference?
68dc0745 440
b68463f7 441(contributed by brian d foy)
442
ac9dac7f 443You could just store all your dates as a number and then subtract.
444Life isn't always that simple though. If you want to work with
445formatted dates, the C<Date::Manip>, C<Date::Calc>, or C<DateTime>
446modules can help you.
68dc0745 447
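For the simple store-as-a-number case, here is a sketch using the standard C<Time::Local> module. It works in UTC through C<timegm> so that daylight-saving changes cannot distort the day count; C<date_to_epoch> is a helper name of our own.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert a calendar date to epoch seconds at midnight UTC.
sub date_to_epoch {
    my ($year, $month, $day) = @_;
    return timegm(0, 0, 0, $day, $month - 1, $year);
}

my $seconds = date_to_epoch(1988, 1, 1) - date_to_epoch(1987, 12, 18);
my $days    = $seconds / 86_400;

print "$days days\n";
```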
448=head2 How can I take a string and turn it into epoch seconds?
449
450If it's a regular enough string that it always has the same format,
92c2ed05 451you can split it up and pass the parts to C<timelocal> in the standard
ac9dac7f 452C<Time::Local> module. Otherwise, you should look into the C<Date::Calc>
453and C<Date::Manip> modules from CPAN.
68dc0745 454
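A minimal sketch of the split-it-up approach, assuming a fixed C<YYYY-MM-DD hh:mm:ss> layout that we chose for illustration. It uses C<timegm>, the UTC sibling of C<timelocal> in the same module, so the result does not depend on the local time zone; use C<timelocal> if the string is in local time.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical fixed format: "YYYY-MM-DD hh:mm:ss", taken to be UTC.
sub parse_timestamp {
    my ($text) = @_;
    my ($year, $mon, $mday, $hour, $min, $sec) =
        $text =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
        or return;    # not in the expected format
    return timegm($sec, $min, $hour, $mday, $mon - 1, $year);
}

my $epoch = parse_timestamp("2001-11-13 01:00:00");
print "$epoch\n";
```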
455=head2 How can I find the Julian Day?
456
7678cced 457(contributed by brian d foy and Dave Cross)
458
ac9dac7f 459You can use the C<Time::JulianDay> module available on CPAN. Ensure
460that you really want to find a Julian day, though, as many people have
7678cced 461different ideas about Julian days. See
462http://www.hermetic.ch/cal_stud/jdn.htm for instance.
463
ac9dac7f 464You can also try the C<DateTime> module, which can convert a date/time
7678cced 465to a Julian Day.
466
ac9dac7f 467 $ perl -MDateTime -le'print DateTime->today->jd'
468 2453401.5
7678cced 469
470Or the modified Julian Day
471
ac9dac7f 472 $ perl -MDateTime -le'print DateTime->today->mjd'
473 53401
7678cced 474
475Or even the day of the year (which is what some people think of as a
476Julian day)
477
ac9dac7f 478 $ perl -MDateTime -le'print DateTime->today->doy'
479 31
be94a901 480
65acb1b1 481=head2 How do I find yesterday's date?
482
6670e5e7 483(contributed by brian d foy)
49d635f9 484
Use one of the Date modules. The C<DateTime> module makes it simple, and
gives you the same time of day, only the day before.
49d635f9 487
6670e5e7 488 use DateTime;
58103a2e 489
6670e5e7 490 my $yesterday = DateTime->now->subtract( days => 1 );
58103a2e 491
6670e5e7 492 print "Yesterday was $yesterday\n";
49d635f9 493
You can also use the C<Date::Calc> module using its C<Today_and_Now>
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";
58103a2e 502
6670e5e7 503Most people try to use the time rather than the calendar to figure out
504dates, but that assumes that days are twenty-four hours each. For
505most people, there are two days a year when they aren't: the switch to
506and from summer time throws this off. Let the modules do the work.
d92eb7b0 507
ac9dac7f 508=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
68dc0745 509
65acb1b1 510Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
ac9dac7f 511Y2K compliant (whatever that means). The programmers you've hired to
65acb1b1 512use it, however, probably are not.
513
514Long answer: The question belies a true understanding of the issue.
515Perl is just as Y2K compliant as your pencil--no more, and no less.
516Can you use your pencil to write a non-Y2K-compliant memo? Of course
517you can. Is that the pencil's fault? Of course it isn't.
92c2ed05 518
87275199 519The date and time functions supplied with Perl (gmtime and localtime)
65acb1b1 520supply adequate information to determine the year well beyond 2000
521(2038 is when trouble strikes for 32-bit machines). The year returned
90fdbbb7 522by these functions when used in a list context is the year minus 1900.
65acb1b1 523For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
524number. To avoid the year 2000 problem simply do not treat the year as
525a 2-digit number. It isn't.
68dc0745 526
5a964f20 527When gmtime() and localtime() are used in scalar context they return
68dc0745 528a timestamp string that contains a fully-expanded year. For example,
529C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
5302001". There's no year 2000 problem here.
531
5a964f20 532That doesn't mean that Perl can't be used to create non-Y2K compliant
533programs. It can. But so can your pencil. It's the fault of the user,
b432a672 534not the language. At the risk of inflaming the NRA: "Perl doesn't
535break Y2K, people do." See http://www.perl.org/about/y2k.html for
5a964f20 536a longer exposition.
537
68dc0745 538=head1 Data: Strings
539
540=head2 How do I validate input?
541
6670e5e7 542(contributed by brian d foy)
543
544There are many ways to ensure that values are what you expect or
545want to accept. Besides the specific examples that we cover in the
546perlfaq, you can also look at the modules with "Assert" and "Validate"
547in their names, along with other modules such as C<Regexp::Common>.
548
549Some modules have validation for particular types of input, such
550as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
551and C<Data::Validate::IP>.
68dc0745 552
553=head2 How do I unescape a string?
554
It depends on just what you mean by "escape". URL escapes are dealt
92c2ed05 556with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
a6dd486b 557character are removed with
68dc0745 558
ac9dac7f 559 s/\\(.)/$1/g;
68dc0745 560
92c2ed05 561This won't expand C<"\n"> or C<"\t"> or any other special escapes.
68dc0745 562
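If you do want a few of the special escapes expanded, one possible approach is a lookup table. The table contents and the C<unescape> name below are our own choices for this sketch, not a complete treatment of every escape Perl knows.

```perl
use strict;
use warnings;

# Escapes this sketch chooses to expand; extend the table as needed.
my %expansion = ( n => "\n", t => "\t", r => "\r" );

sub unescape {
    my ($text) = @_;
    # A backslash followed by a known letter becomes the control
    # character; any other escaped character stands for itself.
    $text =~ s/\\(.)/exists $expansion{$1} ? $expansion{$1} : $1/ges;
    return $text;
}

print unescape('tab:\t backslash:\\\\ done\n');
```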
563=head2 How do I remove consecutive pairs of characters?
564
6670e5e7 565(contributed by brian d foy)
566
567You can use the substitution operator to find pairs of characters (or
568runs of characters) and replace them with a single instance. In this
569substitution, we find a character in C<(.)>. The memory parentheses
570store the matched character in the back-reference C<\1> and we use
571that to require that the same thing immediately follow it. We replace
572that part of the string with the character in C<$1>.
68dc0745 573
ac9dac7f 574 s/(.)\1/$1/g;
d92eb7b0 575
6670e5e7 576We can also use the transliteration operator, C<tr///>. In this
577example, the search list side of our C<tr///> contains nothing, but
578the C<c> option complements that so it contains everything. The
579replacement list also contains nothing, so the transliteration is
580almost a no-op since it won't do any replacements (or more exactly,
581replace the character with itself). However, the C<s> option squashes
582duplicated and consecutive characters in the string so a character
does not show up next to itself:
d92eb7b0 584
6670e5e7 585 my $str = 'Haarlem'; # in the Netherlands
ac9dac7f 586 $str =~ tr///cs; # Now Harlem, like in New York
68dc0745 587
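The difference between collapsing pairs and collapsing whole runs shows up once a character repeats three or more times. The following sketch compares the two on small made-up strings, and uses the explicit-range form of the squeeze, C<tr/a-zA-Z//s>.

```perl
use strict;
use warnings;

my $pairs = "aaab";
(my $pairs_only = $pairs) =~ s/(.)\1/$1/g;     # each pair becomes one char
(my $whole_runs = $pairs) =~ s/(.)\1+/$1/g;    # each run becomes one char

my $word = "bookkeeper";
(my $squeezed = $word) =~ tr/a-zA-Z//s;        # squeeze repeated letters

print "$pairs_only $whole_runs $squeezed\n";
```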
588=head2 How do I expand function calls in a string?
589
6670e5e7 590(contributed by brian d foy)
591
592This is documented in L<perlref>, and although it's not the easiest
593thing to read, it does work. In each of these examples, we call the
58103a2e 594function inside the braces used to dereference a reference. If we
5ae37c3f 595have more than one return value, we can construct and dereference an
6670e5e7 596anonymous array. In this case, we call the function in list context.
597
58103a2e 598 print "The time values are @{ [localtime] }.\n";
6670e5e7 599
600If we want to call the function in scalar context, we have to do a bit
601more work. We can really have any code we like inside the braces, so
602we simply have to end with the scalar reference, although how you do
e573f903 603that is up to you, and you can use code inside the braces. Note that
604the use of parens creates a list context, so we need C<scalar> to
605force the scalar context on the function:
68dc0745 606
    print "The time is ${\(scalar localtime)}.\n";
58103a2e 608
6670e5e7 609 print "The time is ${ my $x = localtime; \$x }.\n";
58103a2e 610
6670e5e7 611If your function already returns a reference, you don't need to create
612the reference yourself.
613
614 sub timestamp { my $t = localtime; \$t }
58103a2e 615
6670e5e7 616 print "The time is ${ timestamp() }.\n";
58103a2e 617
618The C<Interpolation> module can also do a lot of magic for you. You can
619specify a variable name, in this case C<E>, to set up a tied hash that
620does the interpolation for you. It has several other methods to do this
621as well.
622
623 use Interpolation E => 'eval';
624 print "The time values are $E{localtime()}.\n";
625
626In most cases, it is probably easier to simply use string concatenation,
627which also forces scalar context.
6670e5e7 628
ac9dac7f 629 print "The time is " . localtime() . ".\n";
68dc0745 630
68dc0745 631=head2 How do I find matching/nesting anything?
632
92c2ed05 633This isn't something that can be done in one regular expression, no
634matter how complicated. To find something between two single
635characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multi-character delimiters, something more like
C</alpha(.*?)omega/> would be needed. But neither of these deals with
6670e5e7 638nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
639C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
640L<perlre/(??{ code })>. For other cases, you'll have to write a
641parser.
92c2ed05 642
643If you are serious about writing a parser, there are a number of
6a2af475 644modules or oddities that will make your life a lot easier. There are
ac9dac7f 645the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
C<Text::Balanced>; and the C<byacc> program. Starting with perl 5.8,
C<Text::Balanced> is part of the standard distribution.
68dc0745 648
92c2ed05 649One simple destructive, inside-out approach that you might try is to
650pull out the smallest nesting parts one at a time:
5a964f20 651
ac9dac7f 652 while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
653 # do something with $1
654 }
5a964f20 655
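A worked run of the inside-out idea, as a sketch: because C<s///g> may remove several innermost pairs in one pass, this variant uses C</e> to record every captured piece rather than relying on C<$1> after the substitution. The sample text and the C<@pieces> array are ours.

```perl
use strict;
use warnings;

my $text = "BEGINaBEGINbENDcBEGINdENDeEND";
my @pieces;

# Each pass removes every innermost BEGIN...END pair; /e lets the
# replacement code record the captured text before deleting it.
1 while $text =~ s/BEGIN((?:(?!BEGIN)(?!END).)*)END/push @pieces, $1; ''/gse;

print join(",", @pieces), "\n";
```

The innermost pieces come out first, and the enclosing text follows with the inner pairs already removed.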
65acb1b1 656A more complicated and sneaky approach is to make Perl's regular
657expression engine do it for you. This is courtesy Dean Inada, and
658rather has the nature of an Obfuscated Perl Contest entry, but it
659really does work:
660
ac9dac7f 661 # $_ contains the string to parse
662 # BEGIN and END are the opening and closing markers for the
663 # nested text.
c47ff5f1 664
ac9dac7f 665 @( = ('(','');
666 @) = (')','');
667 ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
668 @$ = (eval{/$re/},$@!~/unmatched/i);
669 print join("\n",@$[0..$#$]) if( $$[-1] );
65acb1b1 670
68dc0745 671=head2 How do I reverse a string?
672
ac9dac7f 673Use C<reverse()> in scalar context, as documented in
68dc0745 674L<perlfunc/reverse>.
675
ac9dac7f 676 $reversed = reverse $string;
68dc0745 677
678=head2 How do I expand tabs in a string?
679
5a964f20 680You can do it yourself:
68dc0745 681
ac9dac7f 682 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
68dc0745 683
ac9dac7f 684Or you can just use the C<Text::Tabs> module (part of the standard Perl
68dc0745 685distribution).
686
ac9dac7f 687 use Text::Tabs;
688 @expanded_lines = expand(@lines_with_tabs);
68dc0745 689
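To see that the two approaches agree, here is a small sketch that runs both on the same made-up line, assuming the default tab stop of 8 columns in C<Text::Tabs>.

```perl
use strict;
use warnings;
use Text::Tabs;    # exports expand(); tab stops are every 8 columns by default

my $line = "a\tbc\t\td";

# The hand-rolled loop, applied to a copy of the line.
my $by_hand = $line;
1 while $by_hand =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

# The module version, called on a single string.
my $by_module = expand($line);

print length($by_hand), " ", length($by_module), "\n";
```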
690=head2 How do I reformat a paragraph?
691
ac9dac7f 692Use C<Text::Wrap> (part of the standard Perl distribution):
68dc0745 693
ac9dac7f 694 use Text::Wrap;
695 print wrap("\t", ' ', @paragraphs);
68dc0745 696
ac9dac7f 697The paragraphs you give to C<Text::Wrap> should not contain embedded
698newlines. C<Text::Wrap> doesn't justify the lines (flush-right).
46fc3d4c 699
ac9dac7f 700Or use the CPAN module C<Text::Autoformat>. Formatting files can be
701easily done by making a shell alias, like so:
bc06af74 702
ac9dac7f 703 alias fmt="perl -i -MText::Autoformat -n0777 \
704 -e 'print autoformat $_, {all=>1}' $*"
bc06af74 705
ac9dac7f 706See the documentation for C<Text::Autoformat> to appreciate its many
bc06af74 707capabilities.
708
49d635f9 709=head2 How can I access or change N characters of a string?
68dc0745 710
49d635f9 711You can access the first characters of a string with substr().
712To get the first character, for example, start at position 0
197aec24 713and grab the string of length 1.
68dc0745 714
68dc0745 715
49d635f9 716 $string = "Just another Perl Hacker";
ac9dac7f 717 $first_char = substr( $string, 0, 1 ); # 'J'
68dc0745 718
49d635f9 719To change part of a string, you can use the optional fourth
720argument which is the replacement string.
68dc0745 721
ac9dac7f 722 substr( $string, 13, 4, "Perl 5.8.0" );
197aec24 723
49d635f9 724You can also use substr() as an lvalue.
68dc0745 725
ac9dac7f 726 substr( $string, 13, 4 ) = "Perl 5.8.0";
197aec24 727
68dc0745 728=head2 How do I change the Nth occurrence of something?
729
92c2ed05 730You have to keep track of N yourself. For example, let's say you want
731to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
d92eb7b0 732C<"whosoever"> or C<"whomsoever">, case insensitively. These
733all assume that $_ contains the string to be altered.
68dc0745 734
ac9dac7f 735 $count = 0;
736 s{((whom?)ever)}{
737 ++$count == 5 # is it the 5th?
738 ? "${2}soever" # yes, swap
739 : $1 # renege and leave it there
740 }ige;
68dc0745 741
5a964f20 742In the more general case, you can use the C</g> modifier in a C<while>
743loop, keeping count of matches.
744
ac9dac7f 745 $WANT = 3;
746 $count = 0;
747 $_ = "One fish two fish red fish blue fish";
748 while (/(\w+)\s+fish\b/gi) {
749 if (++$count == $WANT) {
750 print "The third fish is a $1 one.\n";
751 }
752 }
5a964f20 753
92c2ed05 754That prints out: C<"The third fish is a red one."> You can also use a
5a964f20 755repetition count and repeated pattern like this:
756
ac9dac7f 757 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
5a964f20 758
68dc0745 759=head2 How can I count the number of occurrences of a substring within a string?
760
a6dd486b 761There are a number of ways, with varying efficiency. If you want a
68dc0745 762count of a certain single character (X) within a string, you can use the
763C<tr///> function like so:
764
ac9dac7f 765 $string = "ThisXlineXhasXsomeXx'sXinXit";
766 $count = ($string =~ tr/X//);
767 print "There are $count X characters in the string";
68dc0745 768
769This is fine if you are just looking for a single character. However,
770if you are trying to count multiple character substrings within a
771larger string, C<tr///> won't work. What you can do is wrap a while()
772loop around a global pattern match. For example, let's count negative
773integers:
774
ac9dac7f 775 $string = "-9 55 48 -2 23 -76 4 14 -44";
776 while ($string =~ /-\d+/g) { $count++ }
777 print "There are $count negative numbers in the string";
68dc0745 778
881bdbd4 779Another version uses a global match in list context, then assigns the
780result to a scalar, producing a count of the number of matches.
781
782 $count = () = $string =~ /-\d+/g;
783
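For a fixed, literal substring, a loop around C<index()> is another option that avoids the regular expression engine. This is a sketch that counts non-overlapping occurrences; C<count_substring> is a name we introduce here.

```perl
use strict;
use warnings;

# Count non-overlapping occurrences of a literal substring.
sub count_substring {
    my ($haystack, $needle) = @_;
    return 0 if $needle eq '';
    my ($count, $pos) = (0, 0);
    while ( ($pos = index($haystack, $needle, $pos)) != -1 ) {
        $count++;
        $pos += length $needle;    # step past this occurrence
    }
    return $count;
}

print count_substring("One fish two fish red fish blue fish", "fish"), "\n";
```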
68dc0745 784=head2 How do I capitalize all the words on one line?
785
786To make the first letter of each word upper case:
3fe9a6f1 787
ac9dac7f 788 $line =~ s/\b(\w)/\U$1/g;
68dc0745 789
46fc3d4c 790This has the strange effect of turning "C<don't do it>" into "C<Don'T
a6dd486b 791Do It>". Sometimes you might want this. Other times you might need a
24f1ba9b 792more thorough solution (Suggested by brian d foy):
46fc3d4c 793
ac9dac7f 794 $string =~ s/ (
795 (^\w) #at the beginning of the line
796 | # or
797 (\s\w) #preceded by whitespace
798 )
799 /\U$1/xg;
800
801 $string =~ s/([\w']+)/\u\L$1/g;
46fc3d4c 802
68dc0745 803To make the whole line upper case:
3fe9a6f1 804
ac9dac7f 805 $line = uc($line);
68dc0745 806
807To force each word to be lower case, with the first letter upper case:
3fe9a6f1 808
ac9dac7f 809 $line =~ s/(\w+)/\u\L$1/g;
68dc0745 810
5a964f20 811You can (and probably should) enable locale awareness of those
812characters by placing a C<use locale> pragma in your program.
92c2ed05 813See L<perllocale> for endless details on locales.
5a964f20 814
65acb1b1 815This is sometimes referred to as putting something into "title
d92eb7b0 816case", but that's not quite accurate. Consider the proper
65acb1b1 817capitalization of the movie I<Dr. Strangelove or: How I Learned to
818Stop Worrying and Love the Bomb>, for example.
819
369b44b4 820Damian Conway's L<Text::Autoformat> module provides some smart
821case transformations:
822
ac9dac7f 823 use Text::Autoformat;
824 my $x = "Dr. Strangelove or: How I Learned to Stop ".
825 "Worrying and Love the Bomb";
369b44b4 826
ac9dac7f 827 print $x, "\n";
828 for my $style (qw( sentence title highlight )) {
829 print autoformat($x, { case => $style }), "\n";
830 }
369b44b4 831
49d635f9 832=head2 How can I split a [character] delimited string except when inside [character]?
68dc0745 833
ac9dac7f 834Several modules can handle this sort of parsing--C<Text::Balanced>,
835C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.
49d635f9 836
837Take the example case of trying to split a string that is
838comma-separated into its different fields. You can't use C<split(/,/)>
839because you shouldn't split if the comma is inside quotes. For
840example, take a data line like this:
68dc0745 841
ac9dac7f 842 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
68dc0745 843
844Due to the restriction of the quotes, this is a fairly complex
197aec24 845problem. Thankfully, we have Jeffrey Friedl, author of
49d635f9 846I<Mastering Regular Expressions>, to handle these for us. He
ac9dac7f 847suggests (assuming your string is contained in C<$text>):
68dc0745 848
ac9dac7f 849 @new = ();
850 push(@new, $+) while $text =~ m{
851 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
852 | ([^,]+),?
853 | ,
854 }gx;
855 push(@new, undef) if substr($text,-1,1) eq ',';
68dc0745 856
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).
46fc3d4c 860
ac9dac7f 861Alternatively, the C<Text::ParseWords> module (part of the standard
862Perl distribution) lets you say:
68dc0745 863
ac9dac7f 864 use Text::ParseWords;
865 @new = quotewords(",", 0, $text);
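
To make that concrete, here is a small sketch that runs C<quotewords>
on the sample data line from above. The field count and values are what
that particular line should produce; other data may need different
handling.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# The second argument, 0, means "do not keep the quotes in the output".
my @new = quotewords( ",", 0, $text );

printf "%d fields\n", scalar @new;
print "Field 3 is: $new[2]\n";
print "Last field is: $new[-1]\n";
```

The commas inside the quoted fields survive, because C<quotewords>
splits only on delimiters that appear outside quotes.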
65acb1b1 866
68dc0745 867=head2 How do I strip blank space from the beginning/end of a string?
868
6670e5e7 869(contributed by brian d foy)
68dc0745 870
6670e5e7 871A substitution can do this for you. For a single line, you want to
872replace all the leading or trailing whitespace with nothing. You
873can do that with a pair of substitutions.
68dc0745 874
6670e5e7 875 s/^\s+//;
876 s/\s+$//;
68dc0745 877
6670e5e7 878You can also write that as a single substitution, although it turns
879out the combined statement is slower than the separate ones. That
880might not matter to you, though.
68dc0745 881
6670e5e7 882 s/^\s+|\s+$//g;
68dc0745 883
In this regular expression, the alternation matches either at the
beginning or the end of the string since alternation has a lower
precedence than the anchors. With the C</g> flag, the substitution
887makes all possible matches, so it gets both. Remember, the trailing
888newline matches the C<\s+>, and the C<$> anchor can match to the
889physical end of the string, so the newline disappears too. Just add
890the newline to the output, which has the added benefit of preserving
891"blank" (consisting entirely of whitespace) lines which the C<^\s+>
892would remove all by itself.
68dc0745 893
6670e5e7 894 while( <> )
895 {
896 s/^\s+|\s+$//g;
897 print "$_\n";
898 }
5a964f20 899
6670e5e7 900For a multi-line string, you can apply the regular expression
901to each logical line in the string by adding the C</m> flag (for
902"multi-line"). With the C</m> flag, the C<$> matches I<before> an
903embedded newline, so it doesn't remove it. It still removes the
904newline at the end of the string.
905
ac9dac7f 906 $string =~ s/^\s+|\s+$//gm;
6670e5e7 907
908Remember that lines consisting entirely of whitespace will disappear,
909since the first part of the alternation can match the entire string
and replace it with nothing. If you need to keep embedded blank lines,
911you have to do a little more work. Instead of matching any whitespace
912(since that includes a newline), just match the other whitespace.
913
914 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
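
If you do this often, you might wrap the pair of substitutions in a
small subroutine. The C<trim> name below is a hypothetical helper, not
a Perl built-in; it returns a trimmed copy and leaves its argument
alone.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a trimmed copy of its argument.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;    # strip leading whitespace
    $string =~ s/\s+$//;    # strip trailing whitespace, newline included
    return $string;
}

my $raw   = "  \t hello world \n";
my $clean = trim($raw);
print "[$clean]\n";
```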
5a964f20 915
65acb1b1 916=head2 How do I pad a string with blanks or pad a number with zeroes?
917
65acb1b1 918In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0 919to pad the string, C<$text> or C<$num> contains the string to be padded,
920and C<$pad_char> contains the padding character. You can use a single
921character string constant instead of the C<$pad_char> variable if you
922know what it is in advance. And in the same way you can use an integer in
923place of C<$pad_len> if you know the pad length in advance.
65acb1b1 924
d92eb7b0 925The simplest method uses the C<sprintf> function. It can pad on the left
926or right with blanks and on the left with zeroes and it will not
927truncate the result. The C<pack> function can only pad strings on the
928right with blanks and it will truncate the result to a maximum length of
929C<$pad_len>.
65acb1b1 930
ac9dac7f 931 # Left padding a string with blanks (no truncation):
04d666b1 932 $padded = sprintf("%${pad_len}s", $text);
933 $padded = sprintf("%*s", $pad_len, $text); # same thing
65acb1b1 934
ac9dac7f 935 # Right padding a string with blanks (no truncation):
04d666b1 936 $padded = sprintf("%-${pad_len}s", $text);
937 $padded = sprintf("%-*s", $pad_len, $text); # same thing
65acb1b1 938
ac9dac7f 939 # Left padding a number with 0 (no truncation):
04d666b1 940 $padded = sprintf("%0${pad_len}d", $num);
941 $padded = sprintf("%0*d", $pad_len, $num); # same thing
65acb1b1 942
ac9dac7f 943 # Right padding a string with blanks using pack (will truncate):
944 $padded = pack("A$pad_len",$text);
65acb1b1 945
d92eb7b0 946If you need to pad with a character other than blank or zero you can use
947one of the following methods. They all generate a pad string with the
948C<x> operator and combine that with C<$text>. These methods do
949not truncate C<$text>.
65acb1b1 950
d92eb7b0 951Left and right padding with any character, creating a new string:
65acb1b1 952
ac9dac7f 953 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
954 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 955
d92eb7b0 956Left and right padding with any character, modifying C<$text> directly:
65acb1b1 957
ac9dac7f 958 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
959 $text .= $pad_char x ( $pad_len - length( $text ) );
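
Here is a short worked example of the methods above with concrete
values. The sample values (C<"abc">, C<42>, a pad length of 6, and
C<*> as the pad character) are chosen only for illustration.

```perl
use strict;
use warnings;

my ( $text, $num, $pad_len, $pad_char ) = ( "abc", 42, 6, "*" );

my $left   = sprintf( "%*s",  $pad_len, $text );    # three blanks, then abc
my $right  = sprintf( "%-*s", $pad_len, $text );    # abc, then three blanks
my $zeroes = sprintf( "%0*d", $pad_len, $num );     # 000042
my $stars  = $pad_char x ( $pad_len - length $text ) . $text;    # ***abc

# pack truncates to the given width; sprintf never truncates.
my $packed  = pack( "A3", "abcdef" );       # abc
my $printed = sprintf( "%3s", "abcdef" );   # abcdef

print "[$_]\n" for $left, $right, $zeroes, $stars, $packed, $printed;
```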
65acb1b1 960
68dc0745 961=head2 How do I extract selected columns from a string?
962
e573f903 963(contributed by brian d foy)
964
If you know the columns that contain the data, you can
use C<substr> to extract a single column.
967
968 my $column = substr( $line, $start_column, $length );
969
970You can use C<split> if the columns are separated by whitespace or
971some other delimiter, as long as whitespace or the delimiter cannot
972appear as part of the data.
973
974 my $line = ' fred barney betty ';
975 my @columns = split /\s+/, $line;
976 # ( '', 'fred', 'barney', 'betty' );
977
978 my $line = 'fred||barney||betty';
979 my @columns = split /\|/, $line;
980 # ( 'fred', '', 'barney', '', 'betty' );
981
If you want to work with comma-separated values, don't do this since
that format is a bit more complicated. Use one of the modules that
handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
C<Text::CSV_PP>.
986
If you want to break apart an entire line of fixed columns, you can use
C<unpack> with the A (ASCII) format. By using a number after the format
specifier, you can denote the column width. See the C<pack> and C<unpack>
entries in L<perlfunc> for more details.

    my @fields = unpack( "A8 A8 A8 A16 A4", $line );
993
994Note that spaces in the format argument to C<unpack> do not denote literal
995spaces. If you have space separated data, you may want C<split> instead.
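
As a small illustration of fixed-width extraction, the sketch below
builds a record with C<sprintf> and takes it apart again. The column
widths and field names are made up for the example.

```perl
use strict;
use warnings;

# Build a fixed-width record: 8 + 8 + 4 characters, padded with spaces.
my $line = sprintf( "%-8s%-8s%-4s", "fred", "barney", "42" );

# The template comes first, then the data. "A" strips trailing spaces.
my ( $first, $second, $age ) = unpack( "A8 A8 A4", $line );

print "$first|$second|$age\n";
```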
68dc0745 996
997=head2 How do I find the soundex value of a string?
998
7678cced 999(contributed by brian d foy)
1000
You can use the C<Text::Soundex> module. If you want to do fuzzy or close
matching, you might also try the C<String::Approx>,
C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.
68dc0745 1004
1005=head2 How can I expand variables in text strings?
1006
e573f903 1007(contributed by brian d foy)
5a964f20 1008
e573f903 1009For example, I'll use a string that has two Perl scalar variables
1010in it. In this example, I want to expand C<$foo> and C<$bar> to
their values.
1012
1013 my $foo = 'Fred';
1014 my $bar = 'Barney';
1015 $string = 'Say hello to $foo and $bar';
1016
1017One way I can do this involves the substitution operator and a double
1018C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
1019turns it into C<$foo>. The second /e starts with C<$foo> and replaces
1020it with its value. C<$foo>, then, turns into 'Fred', and that's finally
1021what's left in the string.
1022
1023 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1024
1025The C</e> will also silently ignore violations of strict, replacing
1026undefined variable names with the empty string.
1027
1028I could also pull the values from a hash instead of evaluating
1029variable names. Using a single C</e>, I can check the hash to ensure
1030the value exists, and if it doesn't, I can replace the missing value
1031with a marker, in this case C<???> to signal that I missed something:
1032
1033 my $string = 'This has $foo and $bar';
1034
1035 my %Replacements = (
1036 foo => 'Fred',
ac9dac7f 1037 );
e573f903 1038
1039 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1040 $string =~ s/\$(\w+)/
1041 exists $Replacements{$1} ? $Replacements{$1} : '???'
1042 /eg;
1043
1044 print $string;
1045
68dc0745 1046=head2 What's wrong with always quoting "$vars"?
1047
ac9dac7f 1048The problem is that those double-quotes force
e573f903 1049stringification--coercing numbers and references into strings--even
1050when you don't want them to be strings. Think of it this way:
1051double-quote expansion is used to produce new strings. If you already
1052have a string, why do you need more?
68dc0745 1053
1054If you get used to writing odd things like these:
1055
ac9dac7f 1056 print "$var"; # BAD
1057 $new = "$old"; # BAD
1058 somefunc("$var"); # BAD
68dc0745 1059
you'll be in trouble. Those should (in 99.8% of the cases) be
1061the simpler and more direct:
1062
ac9dac7f 1063 print $var;
1064 $new = $old;
1065 somefunc($var);
68dc0745 1066
1067Otherwise, besides slowing you down, you're going to break code when
1068the thing in the scalar is actually neither a string nor a number, but
1069a reference:
1070
ac9dac7f 1071 func(\@array);
1072 sub func {
1073 my $aref = shift;
1074 my $oref = "$aref"; # WRONG
1075 }
68dc0745 1076
1077You can also get into subtle problems on those few operations in Perl
1078that actually do care about the difference between a string and a
1079number, such as the magical C<++> autoincrement operator or the
1080syscall() function.
1081
197aec24 1082Stringification also destroys arrays.
5a964f20 1083
ac9dac7f 1084 @lines = `command`;
1085 print "@lines"; # WRONG - extra blanks
1086 print @lines; # right
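
The reference problem is easy to demonstrate. In this sketch (variable
names are illustrative), the quoted copy is only text that looks like a
reference, so it can no longer be dereferenced:

```perl
use strict;
use warnings;

my $aref = [ 1, 2, 3 ];
my $copy = $aref;      # still a reference to the same array
my $str  = "$aref";    # just text such as "ARRAY(0x...)"

print ref($copy) ? "copy is a reference\n" : "copy is a plain string\n";
print ref($str)  ? "str is a reference\n"  : "str is a plain string\n";

# Dereferencing the string dies under "use strict 'refs'".
my $ok = eval { my @items = @$str; 1 };
print "dereferencing the string failed\n" unless $ok;
```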
5a964f20 1087
04d666b1 1088=head2 Why don't my E<lt>E<lt>HERE documents work?
68dc0745 1089
1090Check for these three things:
1091
1092=over 4
1093
04d666b1 1094=item There must be no space after the E<lt>E<lt> part.
68dc0745 1095
197aec24 1096=item There (probably) should be a semicolon at the end.
68dc0745 1097
197aec24 1098=item You can't (easily) have any space in front of the tag.
68dc0745 1099
1100=back
1101
197aec24 1102If you want to indent the text in the here document, you
5a964f20 1103can do this:
1104
1105 # all in one
1106 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1107 your text
1108 goes here
1109 HERE_TARGET
1110
1111But the HERE_TARGET must still be flush against the margin.
197aec24 1112If you want that indented also, you'll have to quote
5a964f20 1113in the indentation.
1114
1115 ($quote = <<' FINIS') =~ s/^\s+//gm;
1116 ...we will have peace, when you and all your works have
1117 perished--and the works of your dark master to whom you
1118 would deliver us. You are a liar, Saruman, and a corrupter
1119 of men's hearts. --Theoden in /usr/src/perl/taint.c
1120 FINIS
83ded9ee 1121 $quote =~ s/\s+--/\n--/;
5a964f20 1122
1123A nice general-purpose fixer-upper function for indented here documents
1124follows. It expects to be called with a here document as its argument.
1125It looks to see whether each line begins with a common substring, and
a6dd486b 1126if so, strips that substring off. Otherwise, it takes the amount of leading
1127whitespace found on the first line and removes that much off each
5a964f20 1128subsequent line.
1129
1130 sub fix {
1131 local $_ = shift;
a6dd486b 1132 my ($white, $leader); # common whitespace and common leading string
5a964f20 1133 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1134 ($white, $leader) = ($2, quotemeta($1));
1135 } else {
1136 ($white, $leader) = (/^(\s+)/, '');
1137 }
1138 s/^\s*?$leader(?:$white)?//gm;
1139 return $_;
1140 }
1141
c8db1d39 1142This works with leading special strings, dynamically determined:
5a964f20 1143
ac9dac7f 1144 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
5a964f20 1145 @@@ int
1146 @@@ runops() {
1147 @@@ SAVEI32(runlevel);
1148 @@@ runlevel++;
d92eb7b0 1149 @@@ while ( op = (*op->op_ppaddr)() );
5a964f20 1150 @@@ TAINT_NOT;
1151 @@@ return 0;
1152 @@@ }
ac9dac7f 1153 MAIN_INTERPRETER_LOOP
5a964f20 1154
a6dd486b 1155Or with a fixed amount of leading whitespace, with remaining
5a964f20 1156indentation correctly preserved:
1157
ac9dac7f 1158 $poem = fix<<EVER_ON_AND_ON;
5a964f20 1159 Now far ahead the Road has gone,
1160 And I must follow, if I can,
1161 Pursuing it with eager feet,
1162 Until it joins some larger way
1163 Where many paths and errands meet.
1164 And whither then? I cannot say.
1165 --Bilbo in /usr/src/perl/pp_ctl.c
ac9dac7f 1166 EVER_ON_AND_ON
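
If you can assume Perl 5.26 or later (an assumption that goes beyond
the versions this answer was written for), the indented here-document
syntax C<<< <<~ >>> handles the common case directly: the terminator
may be indented, and the whitespace found before it is stripped from
each line. A minimal sketch:

```perl
use strict;
use warnings;
use v5.26;    # the <<~ syntax needs Perl 5.26 or later

sub quote {
    my $text = <<~"END_QUOTE";
        Now far ahead the Road has gone,
          And I must follow, if I can,
        END_QUOTE
    return $text;
}

print quote();
```

Indentation beyond that of the terminator is kept, so the second line
above stays indented by two spaces.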
5a964f20 1167
68dc0745 1168=head1 Data: Arrays
1169
65acb1b1 1170=head2 What is the difference between a list and an array?
1171
ac9dac7f 1172An array has a changeable length. A list does not. An array is
1173something you can push or pop, while a list is a set of values. Some
1174people make the distinction that a list is a value while an array is a
1175variable. Subroutines are passed and return lists, you put things into
1176list context, you initialize arrays with lists, and you C<foreach()>
1177across a list. C<@> variables are arrays, anonymous arrays are
1178arrays, arrays in scalar context behave like the number of elements in
1179them, subroutines access their arguments through the array C<@_>, and
1180C<push>/C<pop>/C<shift> only work on arrays.
65acb1b1 1181
1182As a side note, there's no such thing as a list in scalar context.
1183When you say
1184
ac9dac7f 1185 $scalar = (2, 5, 7, 9);
65acb1b1 1186
d92eb7b0 1187you're using the comma operator in scalar context, so it uses the scalar
ac9dac7f 1188comma operator. There never was a list there at all! This causes the
d92eb7b0 1189last value to be returned: 9.
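
The following sketch contrasts a few related constructs that are easy
to confuse. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @array = ( 2, 5, 7, 9 );

my $count    = @array;                   # array in scalar context: 4
my ($first)  = ( 2, 5, 7, 9 );           # list assignment: 2
my $how_many = () = ( 2, 5, 7, 9 );      # count from a list assignment: 4
my $last     = ( 2, 5, 7, 9 )[-1];       # list slice: 9

print "$count $first $how_many $last\n";
```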
65acb1b1 1190
68dc0745 1191=head2 What is the difference between $array[1] and @array[1]?
1192
a6dd486b 1193The former is a scalar value; the latter an array slice, making
68dc0745 1194it a list with one (scalar) value. You should use $ when you want a
1195scalar value (most of the time) and @ when you want a list with one
1196scalar value in it (very, very rarely; nearly never, in fact).
1197
1198Sometimes it doesn't make a difference, but sometimes it does.
1199For example, compare:
1200
ac9dac7f 1201 $good[0] = `some program that outputs several lines`;
68dc0745 1202
1203with
1204
ac9dac7f 1205 @bad[0] = `same program that outputs several lines`;
68dc0745 1206
197aec24 1207The C<use warnings> pragma and the B<-w> flag will warn you about these
9f1b1f2d 1208matters.
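
The difference is the context imposed on the right-hand side. This
sketch uses a hypothetical C<context> subroutine that reports how it
was called:

```perl
use strict;
use warnings;

# Hypothetical helper that reports the context it was called in.
sub context { return wantarray ? "list" : "scalar" }

my ( @good, @bad );
$good[0] = context();    # scalar assignment: right side in scalar context
{
    no warnings 'syntax';    # silence the "better written as $bad[0]" warning
    @bad[0] = context();     # slice assignment: right side in list context
}

print "good saw $good[0] context, bad saw $bad[0] context\n";
```

With backticks, scalar context yields all the output as one string,
while list context yields one element per line, of which the slice
keeps only the first.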
68dc0745 1209
d92eb7b0 1210=head2 How can I remove duplicate elements from a list or array?
68dc0745 1211
6670e5e7 1212(contributed by brian d foy)
68dc0745 1213
6670e5e7 1214Use a hash. When you think the words "unique" or "duplicated", think
1215"hash keys".
68dc0745 1216
6670e5e7 1217If you don't care about the order of the elements, you could just
1218create the hash then extract the keys. It's not important how you
1219create that hash: just that you use C<keys> to get the unique
1220elements.
551e1d92 1221
ac9dac7f 1222 my %hash = map { $_, 1 } @array;
1223 # or a hash slice: @hash{ @array } = ();
1224 # or a foreach: $hash{$_} = 1 foreach ( @array );
1225
1226 my @unique = keys %hash;
68dc0745 1227
ac9dac7f 1228If you want to use a module, try the C<uniq> function from
1229C<List::MoreUtils>. In list context it returns the unique elements,
1230preserving their order in the list. In scalar context, it returns the
1231number of unique elements.
1232
1233 use List::MoreUtils qw(uniq);
1234
1235 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1236 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
68dc0745 1237
6670e5e7 1238You can also go through each element and skip the ones you've seen
1239before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in C<%seen>. The C<next> statement
creates the key and immediately uses its value, which is C<undef>, so
the loop continues to the C<push> and increments the value for that
key. The next time the loop sees that same element, its key exists in
the hash I<and> the value for that key is true (since it's not 0 or
C<undef>), so the C<next> skips that iteration and the loop goes to the
1246next element.
551e1d92 1247
6670e5e7 1248 my @unique = ();
1249 my %seen = ();
68dc0745 1250
6670e5e7 1251 foreach my $elem ( @array )
1252 {
1253 next if $seen{ $elem }++;
1254 push @unique, $elem;
1255 }
68dc0745 1256
6670e5e7 1257You can write this more briefly using a grep, which does the
1258same thing.
68dc0745 1259
ac9dac7f 1260 my %seen = ();
1261 my @unique = grep { ! $seen{ $_ }++ } @array;
65acb1b1 1262
ddbc1f16 1263=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1264
9e72e4c6 1265(portions of this answer contributed by Anno Siegel)
1266
5a964f20 1267Hearing the word "in" is an I<in>dication that you probably should have
1268used a hash, not a list or array, to store your data. Hashes are
1269designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1270
5a964f20 1271That being said, there are several ways to approach this. If you
1272are going to make this query many times over arbitrary string values,
881bdbd4 1273the fastest way is probably to invert the original array and maintain a
1274hash whose keys are the first array's values.
68dc0745 1275
ac9dac7f 1276 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1277 %is_blue = ();
1278 for (@blues) { $is_blue{$_} = 1 }
68dc0745 1279
ac9dac7f 1280Now you can check whether C<$is_blue{$some_color}>. It might have
1281been a good idea to keep the blues all in a hash in the first place.
68dc0745 1282
1283If the values are all small integers, you could use a simple indexed
1284array. This kind of an array will take up less space:
1285
ac9dac7f 1286 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1287 @is_tiny_prime = ();
1288 for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply  @is_tiny_prime[@primes] = (1) x @primes;
68dc0745 1290
Now you check whether C<$is_tiny_prime[$some_number]>.
1292
1293If the values in question are integers instead of strings, you can save
1294quite a lot of space by using bit strings instead:
1295
ac9dac7f 1296 @articles = ( 1..10, 150..2000, 2017 );
1297 undef $read;
1298 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1299
1300Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1301
9e72e4c6 1302These methods guarantee fast individual tests but require a re-organization
1303of the original list or array. They only pay off if you have to test
1304multiple values against the same array.
68dc0745 1305
ac9dac7f 1306If you are testing only once, the standard module C<List::Util> exports
9e72e4c6 1307the function C<first> for this purpose. It works by stopping once it
finds the element. It's written in C for speed, and its Perl equivalent
1309looks like this subroutine:
68dc0745 1310
9e72e4c6 1311 sub first (&@) {
1312 my $code = shift;
1313 foreach (@_) {
1314 return $_ if &{$code}();
1315 }
1316 undef;
1317 }
68dc0745 1318
9e72e4c6 1319If speed is of little concern, the common idiom uses grep in scalar context
1320(which returns the number of items that passed its condition) to traverse the
1321entire list. This does have the benefit of telling you how many matches it
1322found, though.
68dc0745 1323
9e72e4c6 1324 my $is_there = grep $_ eq $whatever, @array;
65acb1b1 1325
9e72e4c6 1326If you want to actually extract the matching elements, simply use grep in
1327list context.
68dc0745 1328
9e72e4c6 1329 my @matches = grep $_ eq $whatever, @array;
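
As a usage sketch for a one-off test with C<first> from C<List::Util>
(the sample data and variable names are illustrative):

```perl
use strict;
use warnings;
use List::Util qw(first);

my @blues = qw(azure cerulean teal turquoise lapis-lazuli);

# first() returns the matching element itself, or undef if none matches,
# so test with defined() in case the element is a false value like 0 or "".
my $hit  = first { $_ eq 'teal' } @blues;
my $miss = first { $_ eq 'crimson' } @blues;

print "teal is ",    ( defined $hit  ? "" : "not " ), "a blue\n";
print "crimson is ", ( defined $miss ? "" : "not " ), "a blue\n";

# grep in scalar context counts every match instead of stopping early.
my $count = grep { $_ eq 'teal' } @blues;
print "teal appears $count time(s)\n";
```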
58103a2e 1330
68dc0745 1331=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1332
ac9dac7f 1333Use a hash. Here's code to do both and more. It assumes that each
1334element is unique in a given array:
68dc0745 1335
ac9dac7f 1336 @union = @intersection = @difference = ();
1337 %count = ();
1338 foreach $element (@array1, @array2) { $count{$element}++ }
1339 foreach $element (keys %count) {
1340 push @union, $element;
1341 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1342 }
68dc0745 1343
ac9dac7f 1344Note that this is the I<symmetric difference>, that is, all elements
1345in either A or in B but not in both. Think of it as an xor operation.
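
If you want the one-sided difference instead (elements of the first
array that are not in the second), a lookup hash built from the second
array is one way to get it. This is a sketch with made-up data; like
the code above, it compares elements as strings.

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e f);

# Index the second array so membership tests are hash lookups.
my %in_array2 = map { $_ => 1 } @array2;

my @only_in_1    = grep { !$in_array2{$_} } @array1;
my @intersection = grep {  $in_array2{$_} } @array1;

print "only in first: @only_in_1\n";
print "in both: @intersection\n";
```

Because C<grep> walks C<@array1> in order, both results keep the order
of the first array, which C<keys %count> does not.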
d92eb7b0 1346
65acb1b1 1347=head2 How do I test whether two arrays or hashes are equal?
1348
ac9dac7f 1349The following code works for single-level arrays. It uses a
1350stringwise comparison, and does not distinguish defined versus
1351undefined empty strings. Modify if you have other needs.
65acb1b1 1352
ac9dac7f 1353 $are_equal = compare_arrays(\@frogs, \@toads);
65acb1b1 1354
ac9dac7f 1355 sub compare_arrays {
1356 my ($first, $second) = @_;
1357 no warnings; # silence spurious -w undef complaints
1358 return 0 unless @$first == @$second;
1359 for (my $i = 0; $i < @$first; $i++) {
1360 return 0 if $first->[$i] ne $second->[$i];
1361 }
1362 return 1;
1363 }
65acb1b1 1364
1365For multilevel structures, you may wish to use an approach more
ac9dac7f 1366like this one. It uses the CPAN module C<FreezeThaw>:
65acb1b1 1367
ac9dac7f 1368 use FreezeThaw qw(cmpStr);
1369 @a = @b = ( "this", "that", [ "more", "stuff" ] );
65acb1b1 1370
ac9dac7f 1371 printf "a and b contain %s arrays\n",
1372 cmpStr(\@a, \@b) == 0
1373 ? "the same"
1374 : "different";
65acb1b1 1375
ac9dac7f 1376This approach also works for comparing hashes. Here we'll demonstrate
1377two different answers:
65acb1b1 1378
ac9dac7f 1379 use FreezeThaw qw(cmpStr cmpStrHard);
65acb1b1 1380
ac9dac7f 1381 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1382 $a{EXTRA} = \%b;
1383 $b{EXTRA} = \%a;
65acb1b1 1384
ac9dac7f 1385 printf "a and b contain %s hashes\n",
65acb1b1 1386 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1387
ac9dac7f 1388 printf "a and b contain %s hashes\n",
65acb1b1 1389 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1390
1391
The first reports that both hashes contain the same data,
1393while the second reports that they do not. Which you prefer is left as
1394an exercise to the reader.
1395
68dc0745 1396=head2 How do I find the first array element for which a condition is true?
1397
49d635f9 1398To find the first array element which satisfies a condition, you can
ac9dac7f 1399use the C<first()> function in the C<List::Util> module, which comes
1400with Perl 5.8. This example finds the first element that contains
1401"Perl".
49d635f9 1402
1403 use List::Util qw(first);
197aec24 1404
49d635f9 1405 my $element = first { /Perl/ } @array;
197aec24 1406
ac9dac7f 1407If you cannot use C<List::Util>, you can make your own loop to do the
49d635f9 1408same thing. Once you find the element, you stop the loop with last.
1409
1410 my $found;
ac9dac7f 1411 foreach ( @array ) {
6670e5e7 1412 if( /Perl/ ) { $found = $_; last }
49d635f9 1413 }
1414
1415If you want the array index, you can iterate through the indices
1416and check the array element at each index until you find one
1417that satisfies the condition.
1418
197aec24 1419 my( $found, $index ) = ( undef, -1 );
    for( my $i = 0; $i < @array; $i++ ) {
1421 if( $array[$i] =~ /Perl/ ) {
6670e5e7 1422 $found = $array[$i];
1423 $index = $i;
1424 last;
1425 }
1426 }
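
If C<List::Util> is available, you can get the index more compactly by
searching the indices instead of the elements. This is a sketch with
sample data, not part of the original answer:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( "Python", "Ruby", "Perl 5", "Perl 6" );

# Search the indices rather than the elements; the block looks up
# each element by its index. The result is undef if nothing matches.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print "first match at index $index: $array[$index]\n" if defined $index;
```

Test the result with C<defined>, since index 0 is a valid answer but a
false value.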
68dc0745 1427
1428=head2 How do I handle linked lists?
1429
1430In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either
end, or you can use splice to add and/or remove an arbitrary number of
elements at arbitrary points. Both pop and shift are O(1)
operations on Perl's dynamic arrays. In the absence of shifts and
pops, push in general needs to reallocate on the order of every log(N)
times, and unshift will need to copy pointers each time.
68dc0745 1437
1438If you really, really wanted, you could use structures as described in
ac9dac7f 1439L<perldsc> or L<perltoot> and do just what the algorithm book tells
1440you to do. For example, imagine a list node like this:
65acb1b1 1441
ac9dac7f 1442 $node = {
1443 VALUE => 42,
1444 LINK => undef,
1445 };
65acb1b1 1446
1447You could walk the list this way:
1448
ac9dac7f 1449 print "List: ";
1450 for ($node = $head; $node; $node = $node->{LINK}) {
1451 print $node->{VALUE}, " ";
1452 }
1453 print "\n";
65acb1b1 1454
a6dd486b 1455You could add to the list this way:
65acb1b1 1456
ac9dac7f 1457 my ($head, $tail);
1458 $tail = append($head, 1); # grow a new head
1459 for $value ( 2 .. 10 ) {
1460 $tail = append($tail, $value);
1461 }
65acb1b1 1462
ac9dac7f 1463 sub append {
1464 my($list, $value) = @_;
1465 my $node = { VALUE => $value };
1466 if ($list) {
1467 $node->{LINK} = $list->{LINK};
1468 $list->{LINK} = $node;
1469 }
1470 else {
1471 $_[0] = $node; # replace caller's version
1472 }
1473 return $node;
1474 }
65acb1b1 1475
But again, Perl's built-in arrays are virtually always good enough.
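
For comparison, here is a sketch of the mid-list insertion and removal
that usually motivate a linked list, done with C<splice> on a plain
array (the sample data is illustrative):

```perl
use strict;
use warnings;

my @list = ( 1, 2, 4, 5 );

# Insert 3 before index 2: offset 2, remove 0 elements, add one.
splice( @list, 2, 0, 3 );
print "after insert: @list\n";

# Remove two elements starting at index 1; splice returns what it removed.
my @removed = splice( @list, 1, 2 );
print "removed: @removed, left: @list\n";
```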
68dc0745 1477
1478=head2 How do I handle circular lists?
1479
1480Circular lists could be handled in the traditional fashion with linked
1481lists, or you could just do something like this with an array:
1482
ac9dac7f 1483 unshift(@array, pop(@array)); # the last shall be first
1484 push(@array, shift(@array)); # and vice versa
1485
1486You can also use C<Tie::Cycle>:
1487
1488 use Tie::Cycle;
1489
1490 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1491
1492 print $cycle; # FFFFFF
1493 print $cycle; # 000000
1494 print $cycle; # FFFF00
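
If you would rather not rearrange the array or tie a variable, another
option is to index the array modulo its length. The closure below is a
hypothetical helper written for this sketch:

```perl
use strict;
use warnings;

my @colors = qw( FFFFFF 000000 FFFF00 );

# Hypothetical closure that hands out elements round-robin.
my $i = 0;
my $next_color = sub { return $colors[ $i++ % @colors ] };

my @picked = map { $next_color->() } 1 .. 5;
print "@picked\n";
```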
68dc0745 1495
1496=head2 How do I shuffle an array randomly?
1497
45bbf655 1498If you either have Perl 5.8.0 or later installed, or if you have
1499Scalar-List-Utils 1.03 or later installed, you can say:
1500
ac9dac7f 1501 use List::Util 'shuffle';
45bbf655 1502
1503 @shuffled = shuffle(@list);
1504
f05bbc40 1505If not, you can use a Fisher-Yates shuffle.
5a964f20 1506
    sub fisher_yates_shuffle {
        my $deck = shift;  # $deck is a reference to an array
        return unless @$deck;  # an empty array would make --$i negative
        my $i = @$deck;
        while (--$i) {
            my $j = int rand ($i+1);
            @$deck[$i,$j] = @$deck[$j,$i];
        }
    }
5a964f20 1515
ac9dac7f 1516 # shuffle my mpeg collection
1517 #
1518 my @mpeg = <audio/*/*.mp3>;
1519 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1520 print @mpeg;
5a964f20 1521
45bbf655 1522Note that the above implementation shuffles an array in place,
ac9dac7f 1523unlike the C<List::Util::shuffle()> which takes a list and returns
45bbf655 1524a new shuffled list.
1525
You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:
68dc0745 1528
ac9dac7f 1529 srand;
1530 @new = ();
1531 @old = 1 .. 10; # just a demo
1532 while (@old) {
1533 push(@new, splice(@old, rand @old, 1));
1534 }
68dc0745 1535
ac9dac7f 1536This is bad because splice is already O(N), and since you do it N
1537times, you just invented a quadratic algorithm; that is, O(N**2).
1538This does not scale, although Perl is so efficient that you probably
1539won't notice this until you have rather largish arrays.
68dc0745 1540
1541=head2 How do I process/modify each element of an array?
1542
1543Use C<for>/C<foreach>:
1544
ac9dac7f 1545 for (@lines) {
6670e5e7 1546 s/foo/bar/; # change that word
1547 tr/XZ/ZX/; # swap those letters
ac9dac7f 1548 }
68dc0745 1549
1550Here's another; let's compute spherical volumes:
1551
ac9dac7f 1552 for (@volumes = @radii) { # @volumes has changed parts
6670e5e7 1553 $_ **= 3;
1554 $_ *= (4/3) * 3.14159; # this will be constant folded
ac9dac7f 1555 }
197aec24 1556
ac9dac7f 1557which can also be done with C<map()> which is made to transform
49d635f9 1558one list into another:
1559
1560 @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
68dc0745 1561
If you want to do the same thing to modify the values of a
hash, you can use the C<values> function. As of Perl 5.6
1564the values are not copied, so if you modify $orbit (in this
1565case), you modify the value.
5a964f20 1566
ac9dac7f 1567 for $orbit ( values %orbits ) {
6670e5e7 1568 ($orbit **= 3) *= (4/3) * 3.14159;
ac9dac7f 1569 }
818c4caa 1570
76817d6d 1571Prior to perl 5.6 C<values> returned copies of the values,
1572so older perl code often contains constructions such as
1573C<@orbits{keys %orbits}> instead of C<values %orbits> where
1574the hash is to be modified.
818c4caa 1575
68dc0745 1576=head2 How do I select a random element from an array?
1577
ac9dac7f 1578Use the C<rand()> function (see L<perlfunc/rand>):
68dc0745 1579
ac9dac7f 1580 $index = rand @array;
1581 $element = $array[$index];
68dc0745 1582
793f5136 1583Or, simply:
ac9dac7f 1584
1585 my $element = $array[ rand @array ];
5a964f20 1586
68dc0745 1587=head2 How do I permute N elements of a list?
1588
ac9dac7f 1589Use the C<List::Permutor> module on CPAN. If the list is actually an
1590array, try the C<Algorithm::Permute> module (also on CPAN). It's
1591written in XS code and is very efficient.
49d635f9 1592
1593 use Algorithm::Permute;
1594 my @array = 'a'..'d';
1595 my $p_iterator = Algorithm::Permute->new ( \@array );
1596 while (my @perm = $p_iterator->next) {
1597 print "next permutation: (@perm)\n";
ac9dac7f 1598 }
49d635f9 1599
197aec24 1600For even faster execution, you could do:
1601
ac9dac7f 1602 use Algorithm::Permute;
1603 my @array = 'a'..'d';
1604 Algorithm::Permute::permute {
1605 print "next permutation: (@array)\n";
1606 } @array;
197aec24 1607
49d635f9 1608Here's a little program that generates all permutations of
1609all the words on each line of input. The algorithm embodied
ac9dac7f 1610in the C<permute()> function is discussed in Volume 4 (still
49d635f9 1611unpublished) of Knuth's I<The Art of Computer Programming>
1612and will work on any list:
1613
1614 #!/usr/bin/perl -n
1615 # Fischer-Krause ordered permutation generator
1616
1617 sub permute (&@) {
1618 my $code = shift;
1619 my @idx = 0..$#_;
1620 while ( $code->(@_[@idx]) ) {
1621 my $p = $#idx;
1622 --$p while $idx[$p-1] > $idx[$p];
1623 my $q = $p or return;
1624 push @idx, reverse splice @idx, $p;
1625 ++$q while $idx[$p-1] > $idx[$q];
1626 @idx[$p-1,$q]=@idx[$q,$p-1];
1627 }
68dc0745 1628 }
68dc0745 1629
49d635f9 1630 permute {print"@_\n"} split;
b8d2732a 1631
68dc0745 1632=head2 How do I sort an array by (anything)?
1633
1634Supply a comparison function to sort() (described in L<perlfunc/sort>):
1635
ac9dac7f 1636 @list = sort { $a <=> $b } @list;
68dc0745 1637
1638The default sort function is cmp, string comparison, which would
c47ff5f1 1639sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
68dc0745 1640the numerical comparison operator.
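
To see the difference between the two comparisons side by side, here is a small sketch (the variable names are illustrative only):

```perl
use strict;
use warnings;

my @numbers = (10, 2, 1);

my @as_strings = sort @numbers;                 # default: cmp
my @as_numbers = sort { $a <=> $b } @numbers;   # numeric comparison

print "string order:  @as_strings\n";   # 1 10 2
print "numeric order: @as_numbers\n";   # 1 2 10
```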
1641
1642If you have a complicated function needed to pull out the part you
1643want to sort on, then don't do it inside the sort function. Pull it
1644out first, because the sort BLOCK can be called many times for the
1645same element. Here's an example of how to pull out the first word
1646after the first number on each item, and then sort those words
1647case-insensitively.
1648
ac9dac7f 1649 @idx = ();
1650 for (@data) {
1651 ($item) = /\d+\s*(\S+)/;
1652 push @idx, uc($item);
1653 }
1654 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
68dc0745 1655
a6dd486b 1656which could also be written this way, using a trick
68dc0745 1657that's come to be known as the Schwartzian Transform:
1658
ac9dac7f 1659 @sorted = map { $_->[0] }
1660 sort { $a->[1] cmp $b->[1] }
1661 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 1662
1663If you need to sort on several fields, the following paradigm is useful.
1664
ac9dac7f 1665 @sorted = sort {
1666 field1($a) <=> field1($b) ||
1667 field2($a) cmp field2($b) ||
1668 field3($a) cmp field3($b)
1669 } @data;
68dc0745 1670
1671This can be conveniently combined with precalculation of keys as given
1672above.
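
As a sketch of that combination, the following assumes hypothetical records of the form C<name:age>. Each record's keys are computed once, and the multi-field comparison then works on the cached values:

```perl
use strict;
use warnings;

# Hypothetical records of the form "name:age".
my @data = ("carol:30", "alice:25", "Bob:30", "dave:25");

my @sorted =
    map  { $_->[0] }                # 3. recover the original item
    sort {
        $a->[1] <=> $b->[1]         # 2. age first, numerically,
            ||
        $a->[2] cmp $b->[2]         #    then name, ignoring case
    }
    map  {                          # 1. compute each key only once
        my ($name, $age) = split /:/;
        [ $_, $age, lc $name ];
    } @data;

print "$_\n" for @sorted;
```

With this data the records come out as alice:25, dave:25, Bob:30, carol:30.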
1673
379e39d7 1674See the F<sort> article in the "Far More Than You Ever Wanted
49d635f9 1675To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
06a5f41f 1676more about this approach.
68dc0745 1677
ac9dac7f 1678See also the question later in L<perlfaq4> on sorting hashes.
68dc0745 1679
1680=head2 How do I manipulate arrays of bits?
1681
ac9dac7f 1682Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
1683operations.
1684
1685For example, this sets C<$vec> to have bit N set if C<$ints[N]> was
1686set:
1687
1688 $vec = '';
1689 foreach(@ints) { vec($vec,$_,1) = 1 }
1690
1691Here's how, given a vector in C<$vec>, you can get those bits into your
1692C<@ints> array:
1693
1694 sub bitvec_to_list {
1695 my $vec = shift;
1696 my @ints;
1697 # Find null-byte density then select best algorithm
1698 if ($vec =~ tr/\0// / length $vec > 0.95) {
1699 use integer;
1700 my $i;
1701
1702 # This method is faster with mostly null-bytes
1703 while($vec =~ /[^\0]/g ) {
1704 $i = -9 + 8 * pos $vec;
1705 push @ints, $i if vec($vec, ++$i, 1);
1706 push @ints, $i if vec($vec, ++$i, 1);
1707 push @ints, $i if vec($vec, ++$i, 1);
1708 push @ints, $i if vec($vec, ++$i, 1);
1709 push @ints, $i if vec($vec, ++$i, 1);
1710 push @ints, $i if vec($vec, ++$i, 1);
1711 push @ints, $i if vec($vec, ++$i, 1);
1712 push @ints, $i if vec($vec, ++$i, 1);
1713 }
1714 }
1715 else {
1716 # This method is a fast general algorithm
1717 use integer;
1718 my $bits = unpack "b*", $vec;
1719 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1720 push @ints, pos $bits while($bits =~ /1/g);
1721 }
1722
1723 return \@ints;
1724 }
68dc0745 1725
1726This method gets faster the more sparse the bit vector is.
1727(Courtesy of Tim Bunce and Winfried Koenig.)
1728
76817d6d 1729You can make the while loop a lot shorter with this suggestion
1730from Benjamin Goldberg:
1731
1732 while($vec =~ /[^\0]+/g ) {
ac9dac7f 1733 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1734 }
76817d6d 1735
ac9dac7f 1736Or use the CPAN module C<Bit::Vector>:
cc30d1a7 1737
ac9dac7f 1738 $vector = Bit::Vector->new($num_of_bits);
1739 $vector->Index_List_Store(@ints);
1740 @ints = $vector->Index_List_Read();
cc30d1a7 1741
ac9dac7f 1742C<Bit::Vector> provides efficient methods for bit vectors, sets of
1743small integers, and "big int" math.
cc30d1a7 1744
1745Here's a more extensive illustration using vec():
65acb1b1 1746
ac9dac7f 1747 # vec demo
1748 $vector = "\xff\x0f\xef\xfe";
1749 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
65acb1b1 1750 unpack("N", $vector), "\n";
ac9dac7f 1751 $is_set = vec($vector, 23, 1);
1752 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
65acb1b1 1753 pvec($vector);
65acb1b1 1754
ac9dac7f 1755 set_vec(1,1,1);
1756 set_vec(3,1,1);
1757 set_vec(23,1,1);
1758
1759 set_vec(3,1,3);
1760 set_vec(3,2,3);
1761 set_vec(3,4,3);
1762 set_vec(3,4,7);
1763 set_vec(3,8,3);
1764 set_vec(3,8,7);
1765
1766 set_vec(0,32,17);
1767 set_vec(1,32,17);
1768
1769 sub set_vec {
1770 my ($offset, $width, $value) = @_;
1771 my $vector = '';
1772 vec($vector, $offset, $width) = $value;
1773 print "offset=$offset width=$width value=$value\n";
1774 pvec($vector);
1775 }
65acb1b1 1776
ac9dac7f 1777 sub pvec {
1778 my $vector = shift;
1779 my $bits = unpack("b*", $vector);
1780 my $i = 0;
1781 my $BASE = 8;
1782
1783 print "vector length in bytes: ", length($vector), "\n";
1784 @bytes = unpack("A8" x length($vector), $bits);
1785 print "bits are: @bytes\n\n";
1786 }
65acb1b1 1787
68dc0745 1788=head2 Why does defined() return true on empty arrays and hashes?
1789
65acb1b1 1790The short story is that you should probably only use defined on scalars or
1791functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1792in the 5.004 release or later of Perl for more detail.
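
If what you really want to know is whether an aggregate holds anything, the usual approach is to evaluate it in boolean context instead. A minimal sketch:

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# An array in boolean context yields its element count, and a
# hash is true only if it has at least one key.
print "array is empty\n" unless @array;
print "hash is empty\n"  unless %hash;

push @array, 'x';
$hash{key} = undef;    # even an undef value makes the hash non-empty

print "array has ", scalar(@array), " element(s)\n" if @array;
print "hash has ", scalar(keys %hash), " key(s)\n"  if %hash;
```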
68dc0745 1793
1794=head1 Data: Hashes (Associative Arrays)
1795
1796=head2 How do I process an entire hash?
1797
1798Use the each() function (see L<perlfunc/each>) if you don't care
1799whether it's sorted:
1800
ac9dac7f 1801 while ( ($key, $value) = each %hash) {
1802 print "$key = $value\n";
1803 }
68dc0745 1804
1805If you want it sorted, you'll have to use foreach() on the result of
1806sorting the keys as shown in an earlier question.
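
For instance, a sorted traversal might look like this sketch (the hash contents are made up for illustration):

```perl
use strict;
use warnings;

my %hash = (pear => 3, apple => 7, fig => 1);

my @lines;
foreach my $key (sort keys %hash) {
    push @lines, "$key = $hash{$key}";
}
print "$_\n" for @lines;
```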
1807
1808=head2 What happens if I add or remove keys from a hash while iterating over it?
1809
28b41a80 1810(contributed by brian d foy)
d92eb7b0 1811
28b41a80 1812The easy answer is "Don't do that!"
d92eb7b0 1813
28b41a80 1814If you iterate through the hash with each(), you can delete the key
1815most recently returned without worrying about it. If you delete or add
1816other keys, the iterator may skip or double up on them since perl
1817may rearrange the hash table. See the
1818entry for C<each()> in L<perlfunc>.
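
If you do need to remove several entries, one way to sidestep the problem is to loop over the list returned by C<keys> rather than using C<each>. The following is a sketch with made-up data, not the only possible approach:

```perl
use strict;
use warnings;

my %stock = (apples => 0, pears => 4, plums => 0, figs => 2);

# keys() builds the complete list of keys before the loop starts,
# so deleting entries inside the loop cannot confuse an iterator.
for my $item (keys %stock) {
    delete $stock{$item} if $stock{$item} == 0;
}

print join(", ", sort keys %stock), "\n";
```

The price is a temporary list holding every key, which matters only for very large hashes.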
68dc0745 1819
1820=head2 How do I look up a hash element by value?
1821
1822Create a reverse hash:
1823
ac9dac7f 1824 %by_value = reverse %by_key;
1825 $key = $by_value{$value};
68dc0745 1826
1827That's not particularly efficient. It would be more space-efficient
1828to use:
1829
ac9dac7f 1830 while (($key, $value) = each %by_key) {
1831 $by_value{$value} = $key;
1832 }
68dc0745 1833
d92eb7b0 1834If your hash could have repeated values, the methods above will only find
1835one of the associated keys. This may or may not worry you. If it does
1836worry you, you can always reverse the hash into a hash of arrays instead:
1837
ac9dac7f 1838 while (($key, $value) = each %by_key) {
1839 push @{$key_list_by_value{$value}}, $key;
1840 }
68dc0745 1841
1842=head2 How can I know how many entries are in a hash?
1843
1844If you mean how many keys, then all you have to do is
875e5c2f 1845use the keys() function in a scalar context:
68dc0745 1846
875e5c2f 1847 $num_keys = keys %hash;
68dc0745 1848
197aec24 1849The keys() function also resets the iterator, which means that you may
1850see strange results if you use this between uses of other hash operators
875e5c2f 1851such as each().
68dc0745 1852
1853=head2 How do I sort a hash (optionally by value instead of key)?
1854
a05e4845 1855(contributed by brian d foy)
1856
1857To sort a hash, start with the keys. In this example, we give the list of
1858keys to the sort function which then compares them ASCIIbetically (which
1859might be affected by your locale settings). The output list has the keys
1860in ASCIIbetical order. Once we have the keys, we can go through them to
1861create a report which lists the keys in ASCIIbetical order.
1862
1863 my @keys = sort { $a cmp $b } keys %hash;
58103a2e 1864
a05e4845 1865 foreach my $key ( @keys )
1866 {
1867 printf "%-20s %6d\n", $key, $hash{$key};
1868 }
1869
58103a2e 1870We could get more fancy in the C<sort()> block though. Instead of
a05e4845 1871comparing the keys, we can compute a value with them and use that
58103a2e 1872value as the comparison.
a05e4845 1873
1874For instance, to make our report order case-insensitive, we use
58103a2e 1875the C<\L> sequence in a double-quoted string to make everything
a05e4845 1876lowercase. The C<sort()> block then compares the lowercased
1877values to determine in which order to put the keys.
1878
1879 my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
58103a2e 1880
a05e4845 1881Note: if the computation is expensive or the hash has many elements,
58103a2e 1882you may want to look at the Schwartzian Transform to cache the
a05e4845 1883computation results.
1884
1885If we want to sort by the hash value instead, we use the hash key
1886to look it up. We still get out a list of keys, but this time they
1887are ordered by their value.
1888
1889 my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
1890
1891From there we can get more complex. If the hash values are the same,
1892we can provide a secondary sort on the hash key.
1893
58103a2e 1894 my @keys = sort {
1895 $hash{$a} <=> $hash{$b}
a05e4845 1896 or
1897 "\L$a" cmp "\L$b"
1898 } keys %hash;
68dc0745 1899
1900=head2 How can I always keep my hash sorted?
ac9dac7f 1901X<hash tie sort DB_File Tie::IxHash>
68dc0745 1902
ac9dac7f 1903You can look into using the C<DB_File> module and C<tie()> using the
1904C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
1905Databases">. The C<Tie::IxHash> module from CPAN might also be
1906instructive. Although this does keep your hash sorted, you might not
1907like the slowdown you suffer from the tie interface. Are you sure you
1908need to do this? :)
68dc0745 1909
1910=head2 What's the difference between "delete" and "undef" with hashes?
1911
92993692 1912Hashes contain pairs of scalars: the first is the key, the
1913second is the value. The key will be coerced to a string,
1914although the value can be any kind of scalar: string,
ac9dac7f 1915number, or reference. If a key C<$key> is present in
92993692 1916%hash, C<exists($hash{$key})> will return true. The value
1917for a given key can be C<undef>, in which case
1918C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
1919will return true. This corresponds to (C<$key>, C<undef>)
1920being in the hash.
68dc0745 1921
ac9dac7f 1922Pictures help... here's the C<%hash> table:
68dc0745 1923
1924 keys values
1925 +------+------+
1926 | a | 3 |
1927 | x | 7 |
1928 | d | 0 |
1929 | e | 2 |
1930 +------+------+
1931
1932And these conditions hold
1933
92993692 1934 $hash{'a'} is true
1935 $hash{'d'} is false
1936 defined $hash{'d'} is true
1937 defined $hash{'a'} is true
1938 exists $hash{'a'} is true (Perl5 only)
1939 grep ($_ eq 'a', keys %hash) is true
68dc0745 1940
1941If you now say
1942
92993692 1943 undef $hash{'a'}
68dc0745 1944
1945your table now reads:
1946
1947
1948 keys values
1949 +------+------+
1950 | a | undef|
1951 | x | 7 |
1952 | d | 0 |
1953 | e | 2 |
1954 +------+------+
1955
1956and these conditions now hold; changes in caps:
1957
92993692 1958 $hash{'a'} is FALSE
1959 $hash{'d'} is false
1960 defined $hash{'d'} is true
1961 defined $hash{'a'} is FALSE
1962 exists $hash{'a'} is true (Perl5 only)
1963 grep ($_ eq 'a', keys %hash) is true
68dc0745 1964
1965Notice the last two: you have an undef value, but a defined key!
1966
1967Now, consider this:
1968
92993692 1969 delete $hash{'a'}
68dc0745 1970
1971your table now reads:
1972
1973 keys values
1974 +------+------+
1975 | x | 7 |
1976 | d | 0 |
1977 | e | 2 |
1978 +------+------+
1979
1980and these conditions now hold; changes in caps:
1981
92993692 1982 $hash{'a'} is false
1983 $hash{'d'} is false
1984 defined $hash{'d'} is true
1985 defined $hash{'a'} is false
1986 exists $hash{'a'} is FALSE (Perl5 only)
1987 grep ($_ eq 'a', keys %hash) is FALSE
68dc0745 1988
1989See, the whole entry is gone!
1990
1991=head2 Why don't my tied hashes make the defined/exists distinction?
1992
92993692 1993This depends on the tied hash's implementation of EXISTS().
1994For example, there isn't the concept of undef with hashes
1995that are tied to DBM* files. It also means that exists() and
1996defined() do the same thing with a DBM* file, and what they
1997end up doing is not what they do with ordinary hashes.
68dc0745 1998
1999=head2 How do I reset an each() operation part-way through?
2000
5a964f20 2001Using C<keys %hash> in scalar context returns the number of keys in
68dc0745 2002the hash I<and> resets the iterator associated with the hash. You may
ac9dac7f 2003need to do this if you use C<last> to exit a loop early so that when
2004you re-enter it, the hash iterator has been reset.
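
A small sketch of the reset in action (the hash contents are illustrative):

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);

# Leave the loop early; the iterator now points partway into the hash.
while (my ($key, $value) = each %hash) {
    last;
}

# keys in scalar context returns the count and resets the iterator.
my $count = keys %hash;

# Because of the reset, this loop visits every pair again.
my $seen = 0;
while (my ($key) = each %hash) {
    $seen++;
}
print "key count $count, pairs seen after reset: $seen\n";
```

Without the call to C<keys>, the second loop would pick up where the first one left off and see only the remaining pairs.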
68dc0745 2005
2006=head2 How can I get the unique keys from two hashes?
2007
d92eb7b0 2008First you extract the keys from the hashes into lists, then solve
2009the "removing duplicates" problem described above. For example:
68dc0745 2010
ac9dac7f 2011 %seen = ();
2012 for $element (keys(%foo), keys(%bar)) {
2013 $seen{$element}++;
2014 }
2015 @uniq = keys %seen;
68dc0745 2016
2017Or more succinctly:
2018
ac9dac7f 2019 @uniq = keys %{{%foo,%bar}};
68dc0745 2020
2021Or if you really want to save space:
2022
ac9dac7f 2023 %seen = ();
2024 while (defined ($key = each %foo)) {
2025 $seen{$key}++;
2026 }
2027 while (defined ($key = each %bar)) {
2028 $seen{$key}++;
2029 }
2030 @uniq = keys %seen;
68dc0745 2031
2032=head2 How can I store a multidimensional array in a DBM file?
2033
2034Either stringify the structure yourself (no fun), or else
2035get the MLDBM (which uses Data::Dumper) module from CPAN and layer
2036it on top of either DB_File or GDBM_File.
2037
2038=head2 How can I make my hash remember the order I put elements into it?
2039
ac9dac7f 2040Use the C<Tie::IxHash> module from CPAN.
68dc0745 2041
ac9dac7f 2042 use Tie::IxHash;
2043
2044 tie my %myhash, 'Tie::IxHash';
2045
2046 for (my $i=0; $i<20; $i++) {
2047 $myhash{$i} = 2*$i;
2048 }
2049
2050 my @keys = keys %myhash;
2051 # @keys = (0,1,2,3,...)
46fc3d4c 2052
68dc0745 2053=head2 Why does passing a subroutine an undefined element in a hash create it?
2054
2055If you say something like:
2056
ac9dac7f 2057 somefunc($hash{"nonesuch key here"});
68dc0745 2058
2059Then that element "autovivifies"; that is, it springs into existence
2060whether you store something there or not. That's because functions
2061get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
2062it has to be ready to write it back into the caller's version.
2063
87275199 2064This has been fixed as of Perl5.004.
68dc0745 2065
2066Normally, merely accessing a key's value for a nonexistent key does
2067I<not> cause that key to be forever there. This is different from
2068awk's behavior.
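
You can check the behavior on your own perl with a sketch like the following. The subroutine and key names are hypothetical; on Perl 5.004 or later, only the element that is actually assigned through C<$_[0]> should end up in the hash.

```perl
use strict;
use warnings;

sub reader { my $copy = $_[0]; return }   # only reads its argument
sub writer { $_[0] = 'stored'; return }   # assigns through the alias

my %hash;

my $value = $hash{plain_read};            # an ordinary read
reader($hash{passed_to_reader});
writer($hash{passed_to_writer});

for my $key (qw(plain_read passed_to_reader passed_to_writer)) {
    printf "%-17s %s\n", $key, exists $hash{$key} ? "exists" : "absent";
}
```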
2069
fc36a67e 2070=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 2071
65acb1b1 2072Usually a hash ref, perhaps like this:
2073
ac9dac7f 2074 $record = {
2075 NAME => "Jason",
2076 EMPNO => 132,
2077 TITLE => "deputy peon",
2078 AGE => 23,
2079 SALARY => 37_000,
2080 PALS => [ "Norbert", "Rhys", "Phineas"],
2081 };
65acb1b1 2082
2083References are documented in L<perlref> and L<perlreftut>.
2084Examples of complex data structures are given in L<perldsc> and
2085L<perllol>. Examples of structures and object-oriented classes are
2086in L<perltoot>.
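
Getting the data back out uses the arrow operator. This sketch repeats a trimmed version of the record above; the second employee is made up for illustration:

```perl
use strict;
use warnings;

my $record = {
    NAME  => "Jason",
    EMPNO => 132,
    PALS  => [ "Norbert", "Rhys", "Phineas" ],
};

print "Name: $record->{NAME}\n";
print "First pal: $record->{PALS}[0]\n";
print "All pals: @{ $record->{PALS} }\n";

# An array of such records gives the "array of hashes" from the question.
my @staff = ($record, { NAME => "Norbert", EMPNO => 97, PALS => [] });
my @names = map { $_->{NAME} } @staff;
print "Staff: @names\n";
```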
68dc0745 2087
2088=head2 How can I use a reference as a hash key?
2089
9e72e4c6 2090(contributed by brian d foy)
2091
2092Hash keys are strings, so you can't really use a reference as the key.
2093When you try to do that, perl turns the reference into its stringified
ac9dac7f 2094form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
2095back the reference from the stringified form, at least without doing
2096some extra work on your own. Also remember that hash keys must be
2097unique, but two different variables can store the same reference (and
2098those variables can change later).
9e72e4c6 2099
ac9dac7f 2100The C<Tie::RefHash> module, which is distributed with perl, might be
2101what you want. It handles that extra work.
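
A minimal sketch of C<Tie::RefHash> in use (the hash and variable names are illustrative):

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %color_of, 'Tie::RefHash';

my $point = [ 3, 4 ];
$color_of{$point} = 'red';

# With the tie in place, keys() hands back real references,
# not strings such as "ARRAY(0x...)".
for my $key (keys %color_of) {
    print "(@$key) is $color_of{$key}\n";
}
```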
68dc0745 2102
2103=head1 Data: Misc
2104
2105=head2 How do I handle binary data correctly?
2106
ac9dac7f 2107Perl is binary clean, so it can handle binary data just fine.
e573f903 2108On Windows or DOS, however, you have to use C<binmode> for binary
ac9dac7f 2109files to avoid conversions for line endings. In general, you should
2110use C<binmode> any time you want to work with binary data.
68dc0745 2111
ac9dac7f 2112Also see L<perlfunc/"binmode"> or L<perlopentut>.
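
As an illustration, here is a sketch of a subroutine that reads a whole file as raw bytes. The name C<slurp_binary> is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return the entire contents of a file as bytes.
sub slurp_binary {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    binmode $fh;      # no line-ending translation
    local $/;         # slurp mode: read everything at once
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}
```

Calling C<binmode> does no harm on systems that don't distinguish text files from binary ones, so the same code can be used everywhere.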
68dc0745 2113
ac9dac7f 2114If you're concerned about 8-bit textual data then see L<perllocale>.
54310121 2115If you want to deal with multibyte characters, however, there are
68dc0745 2116some gotchas. See the section on Regular Expressions.
2117
2118=head2 How do I determine whether a scalar is a number/whole/integer/float?
2119
2120Assuming that you don't care about IEEE notations like "NaN" or
2121"Infinity", you probably just want to use a regular expression.
2122
ac9dac7f 2123 if (/\D/) { print "has nondigits\n" }
2124 if (/^\d+$/) { print "is a whole number\n" }
2125 if (/^-?\d+$/) { print "is an integer\n" }
2126 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
2127 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
2128 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
2129 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
881bdbd4 2130 { print "a C float\n" }
68dc0745 2131
f0d19b68 2132There are also some commonly used modules for the task.
2133L<Scalar::Util> (distributed with 5.8) provides access to perl's
ac9dac7f 2134internal function C<looks_like_number> for determining whether a
2135variable looks like a number. L<Data::Types> exports functions that
2136validate data types using both the above and other regular
2137expressions. Thirdly, there is C<Regexp::Common> which has regular
2138expressions to match various types of numbers. Those three modules are
2139available from the CPAN.
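
For example, C<looks_like_number> can be tried on a few sample strings like this (the strings are illustrative):

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

for my $candidate ("42", "-3.14", "1e6", "0x1F", "12abc", "") {
    printf "%-8s %s\n", "'$candidate'",
        looks_like_number($candidate) ? "looks like a number"
                                      : "does not";
}
```

The first three strings are accepted. The hex-style string, the string with trailing letters, and the empty string are not.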
f0d19b68 2140
2141If you're on a POSIX system, Perl supports the C<POSIX::strtod>
ac9dac7f 2142function. Its semantics are somewhat cumbersome, so here's a
2143C<getnum> wrapper function for more convenient access. This function
2144takes a string and returns the number it found, or C<undef> for input
2145that isn't a C float. The C<is_numeric> function is a front end to
2146C<getnum> if you just want to say, "Is this a float?"
2147
2148 sub getnum {
2149 use POSIX qw(strtod);
2150 my $str = shift;
2151 $str =~ s/^\s+//;
2152 $str =~ s/\s+$//;
2153 $! = 0;
2154 my($num, $unparsed) = strtod($str);
2155 if (($str eq '') || ($unparsed != 0) || $!) {
2156 return undef;
2157 }
2158 else {
2159 return $num;
2160 }
2161 }
5a964f20 2162
ac9dac7f 2163 sub is_numeric { defined getnum($_[0]) }
5a964f20 2164
f0d19b68 2165Or you could check out the L<String::Scanf> module on the CPAN
ac9dac7f 2166instead. The C<POSIX> module (part of the standard Perl distribution)
2167provides the C<strtod> and C<strtol> functions for converting strings to
2168doubles and longs, respectively.
68dc0745 2169
2170=head2 How do I keep persistent data across program calls?
2171
2172For some specific applications, you can use one of the DBM modules.
ac9dac7f 2173See L<AnyDBM_File>. More generically, you should consult the C<FreezeThaw>
2174or C<Storable> modules from CPAN. Starting from Perl 5.8 C<Storable> is part
2175of the standard distribution. Here's one example using C<Storable>'s C<store>
fe854a6f 2176and C<retrieve> functions:
65acb1b1 2177
ac9dac7f 2178 use Storable;
2179 store(\%hash, "filename");
65acb1b1 2180
ac9dac7f 2181 # later on...
2182 $href = retrieve("filename"); # by ref
2183 %hash = %{ retrieve("filename") }; # direct to hash
68dc0745 2184
2185=head2 How do I print out or copy a recursive data structure?
2186
ac9dac7f 2187The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
2188for printing out data structures. The C<Storable> module on CPAN (or the
6f82c03a 21895.8 release of Perl) provides a function called C<dclone> that recursively
2190copies its argument.
65acb1b1 2191
ac9dac7f 2192 use Storable qw(dclone);
2193 $r2 = dclone($r1);
68dc0745 2194
ac9dac7f 2195Where C<$r1> can be a reference to any kind of data structure you'd like.
65acb1b1 2196It will be deeply copied. Because C<dclone> takes and returns references,
2197you'd have to add extra punctuation if you had a hash of arrays that
2198you wanted to copy.
68dc0745 2199
ac9dac7f 2200 %newhash = %{ dclone(\%oldhash) };
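
To see why the deep copy matters, compare it with a plain assignment in this sketch (the data is made up):

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = ( fruits => [ 'apple', 'pear' ] );

my %shallow = %oldhash;                 # copies only the top level
my %deep    = %{ dclone(\%oldhash) };   # copies the nested array too

push @{ $oldhash{fruits} }, 'fig';

print "shallow: @{ $shallow{fruits} }\n";   # shares the changed array
print "deep:    @{ $deep{fruits} }\n";      # unaffected by the push
```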
68dc0745 2201
2202=head2 How do I define methods for every class/object?
2203
ac9dac7f 2204Use the C<UNIVERSAL> class (see L<UNIVERSAL>).
68dc0745 2205
2206=head2 How do I verify a credit card checksum?
2207
ac9dac7f 2208Get the C<Business::CreditCard> module from CPAN.
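
If you're curious what such a check involves, card numbers carry a Luhn check digit, which can be verified with a sketch like the one below. The subroutine name C<luhn_ok> is hypothetical and this is not the module's own code. It tells you only that the digits are self-consistent, not that the card exists.

```perl
use strict;
use warnings;

# Illustrative only: true if the digits in $number pass the Luhn check.
sub luhn_ok {
    my ($number) = @_;
    (my $digits = $number) =~ tr/0-9//cd;   # keep digits only
    return 0 unless length $digits;

    my $sum    = 0;
    my $double = 0;               # the rightmost digit is not doubled
    for my $digit (reverse split //, $digits) {
        my $d = $digit;
        if ($double) {
            $d *= 2;
            $d -= 9 if $d > 9;    # same as adding the two digits
        }
        $sum += $d;
        $double = !$double;
    }
    return $sum % 10 == 0;
}

print luhn_ok("4111 1111 1111 1111") ? "passes\n" : "fails\n";
```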
68dc0745 2209
65acb1b1 2210=head2 How do I pack arrays of doubles or floats for XS code?
2211
ac9dac7f 2212The kgbpack.c code in the C<PGPLOT> module on CPAN does just this.
65acb1b1 2213If you're doing a lot of float or double processing, consider using
ac9dac7f 2214the C<PDL> module from CPAN instead--it makes number-crunching easy.
65acb1b1 2215
500071f4 2216=head1 REVISION
2217
e573f903 2218Revision: $Revision: 7954 $
500071f4 2219
e573f903 2220Date: $Date: 2006-10-16 22:07:58 +0200 (lun, 16 oct 2006) $
500071f4 2221
2222See L<perlfaq> for source control details and availability.
2223
68dc0745 2224=head1 AUTHOR AND COPYRIGHT
2225
58103a2e 2226Copyright (c) 1997-2006 Tom Christiansen, Nathan Torkington, and
7678cced 2227other authors as noted. All rights reserved.
5a964f20 2228
5a7beb56 2229This documentation is free; you can redistribute it and/or modify it
2230under the same terms as Perl itself.
5a964f20 2231
2232Irrespective of its distribution, all code examples in this file
2233are hereby placed into the public domain. You are permitted and
2234encouraged to use this code in your own programs for fun
2235or for profit as you see fit. A simple comment in the code giving
2236credit would be courteous but is not required.