=head1 DESCRIPTION
-The section of the FAQ answers question related to the manipulation
+This section of the FAQ answers questions related to the manipulation
of data as numbers, dates, strings, arrays, hashes, and miscellaneous
data issues.
=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
The infinite set that a mathematician thinks of as the real numbers can
-only be approximate on a computer, since the computer only has a finite
+only be approximated on a computer, since the computer only has a finite
number of bits to store an infinite number of, um, numbers.
Internally, your computer represents floating-point numbers in binary.
Floating-point numbers read in from a file or appearing as literals
in your program are converted from their decimal floating-point
-representation (eg, 19.95) to the internal binary representation.
+representation (eg, 19.95) to an internal binary representation.
However, 19.95 can't be precisely represented as a binary
floating-point number, just like 1/3 can't be exactly represented as a
decimal floating-point number.
When a floating-point number gets printed, the binary floating-point
representation is converted back to decimal. These decimal numbers
are displayed in either the format you specify with printf(), or the
-current output format for numbers (see L<perlvar/"$#"> if you use
+current output format for numbers. (See L<perlvar/"$#"> if you use
print. C<$#> has a different default value in Perl5 than it did in
Perl4. Changing C<$#> yourself is deprecated.)
(part of the standard Perl distribution), but mathematical operations
are consequently slower.
+If precision is important, such as when dealing with money, it's good
+to work with integers and then divide at the last possible moment.
+For example, work in pennies (1995) instead of dollars and cents
+(19.95) and divide by 100 at the end.
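Here is a minimal sketch of the integer approach. It assumes non-negative prices supplied as decimal strings, and the sub name `total_in_pennies` is hypothetical. Each price is scaled to whole pennies exactly once, with sprintf() rounding away any floating-point residue, and all further arithmetic is done on integers:

```perl
use strict;
use warnings;

# Hypothetical helper: add up prices (strings like "19.95") as integer pennies.
# Assumes non-negative amounts.
sub total_in_pennies {
    my $pennies = 0;
    for my $price (@_) {
        # A product like $price * 100 may land slightly off a whole number,
        # so round to the nearest penny once, before summing.
        $pennies += sprintf("%.0f", $price * 100);
    }
    return $pennies;
}

my $pennies = total_in_pennies("19.95", "0.10", "0.20");

# Convert back to dollars and cents only for display.
printf "Total: %d.%02d\n", int($pennies / 100), $pennies % 100;
```

The division by 100 happens only inside the final printf, so rounding error cannot accumulate across the additions.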
+
To get rid of the superfluous digits, just use a format (eg,
C<printf("%.2f", 19.95)>) to get the required precision.
See L<perlop/"Floating-point Arithmetic">.
$ceil = ceil(3.5); # 4
$floor = floor(3.5); # 3
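The difference between these functions and Perl's built-in int() shows up with negative numbers. int() truncates toward zero, while floor() always rounds down and ceil() always rounds up. Here is a short illustration using the same POSIX functions:

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

# For positive numbers int() and floor() agree; for negative ones they differ.
for my $n (3.5, -3.5) {
    printf "%5s: int=%d floor=%d ceil=%d\n", $n, int($n), floor($n), ceil($n);
}
```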
-In 5.000 to 5.003 Perls, trigonometry was done in the Math::Complex
+In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
module. With 5.004, the Math::Trig module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the Math::Complex module and some functions can break out from
Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-).
-http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy of Tom
-Phoenix, talks more about this.. John von Neumann said, ``Anyone who
+http://www.perl.com/CPAN/doc/FMTEYEWTK/random , courtesy of Tom
+Phoenix, talks more about this. John von Neumann said, ``Anyone who
attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''
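Perl's rand() is exactly such a deterministic generator. Seeding it with the same value reproduces the same sequence, which is handy for repeatable tests. It is also why rand() should not be used for anything security-sensitive. Here is a small sketch, in which the seed 42 and the sub name `draws` are arbitrary choices for illustration:

```perl
use strict;
use warnings;

# Draw $count integers in 0..99 after seeding the generator with $seed.
sub draws {
    my ($seed, $count) = @_;
    srand($seed);
    return map { int rand 100 } 1 .. $count;
}

my @first  = draws(42, 5);
my @second = draws(42, 5);
print "@first\n@second\n";   # the two lines are identical
```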
$day_of_year = (localtime(time()))[7];
-or more legibly (in 5.004 or higher):
+or more legibly, with the Time::Piece module (bundled with Perl since 5.10):
- use Time::localtime;
- $day_of_year = localtime(time())->yday;
+ use Time::Piece;
+ $day_of_year = localtime->day_of_year();
-You can find the week of the year by dividing this by 7:
+You can find the week of the year by using Time::Piece's strftime():
- $week_of_year = int($day_of_year / 7);
+ $week_of_year = localtime->strftime("%U");
+ $iso_week = localtime->strftime("%V");
-Of course, this believes that weeks start at zero. The Date::Calc
-module from CPAN has a lot of date calculation functions, including
-day of the year, week of the year, and so on. Note that not
-all businesses consider ``week 1'' to be the same; for example,
-American businesses often consider the first week with a Monday
-in it to be Work Week #1, despite ISO 8601, which considers
-WW1 to be the first week with a Thursday in it.
+The difference between %U and %V is that %U counts the first Sunday of
+the year as the first day of week 1, whereas %V follows ISO 8601:1988:
+weeks start on Monday, and week 1 is the first week that has at least
+four days in the new year. You can also use %W, which counts the first
+Monday of the year as the first day of week 1. See your strftime(3) man
+page for more details.
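To see the conventions disagree, pick a date near the start of a year. The sketch below assumes Time::Piece is available. It parses 2021-01-03, a Sunday, and prints its day of the year, its %U and %W week numbers, and its ISO 8601 week number. The ISO number comes from Time::Piece's week() method rather than %V, on the assumption that your platform's strftime(3) may not support %V:

```perl
use strict;
use warnings;
use Time::Piece;

# 2021-01-03 is a Sunday: the first Sunday of 2021, but before its first Monday.
my $t = Time::Piece->strptime("2021-01-03", "%Y-%m-%d");

printf "day of year (0-based): %d\n", $t->day_of_year;
printf "%%U (Sunday weeks):    %s\n", $t->strftime("%U");
printf "%%W (Monday weeks):    %s\n", $t->strftime("%W");
printf "ISO 8601 week:         %d\n", $t->week;   # still week 53 of 2020
```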
=head2 How do I find the current century or millennium?
Date::Calc modules from CPAN before you go hacking up your own parsing
routine to handle arbitrary date formats.
+Also note that the core module Time::Piece overloads the addition and
+subtraction operators to provide date calculation options. See
+L<Time::Piece/Date Calculations>.
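For example, adding a number of seconds to a Time::Piece object yields another Time::Piece object. This sketch uses the ONE_DAY constant exported by Time::Seconds. The date is parsed with strptime(), which treats it as UTC, so no daylight saving change can interfere:

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;   # exports ONE_DAY, ONE_WEEK, ...

my $start = Time::Piece->strptime("2024-02-25", "%Y-%m-%d");   # parsed as UTC
my $later = $start + 7 * ONE_DAY;                               # crosses Feb 29

print $start->ymd, " + 7 days = ", $later->ymd, "\n";
```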
+
=head2 How can I take a string and turn it into epoch seconds?
If it's a regular enough string that it always has the same format,
=head2 How can I find the Julian Day?
-Use the Time::JulianDay module (part of the Time-modules bundle
-available from CPAN.)
+Use Time::Piece as follows:
+
+ use Time::Piece;
+ my $julian_day = localtime->julian_day;
+ my $mjd = localtime->mjd; # modified julian day
-Before you immerse yourself too deeply in this, be sure to verify that it
-is the I<Julian> Day you really want. Are they really just interested in
-a way of getting serial days so that they can do date arithmetic? If you
+Before you immerse yourself too deeply in this, be sure to verify that
+it is the I<Julian> Day you really want. Are you interested in a way
+of getting serial days so that you can just tell how many days apart
+two dates are, or so that you can also do other date arithmetic? If you
are interested in performing date arithmetic, this can be done using
-either Date::Manip or Date::Calc, without converting to Julian Day first.
-
-There is too much confusion on this issue to cover in this FAQ, but the
-term is applied (correctly) to a calendar now supplanted by the Gregorian
-Calendar, with the Julian Calendar failing to adjust properly for leap
-years on centennial years (among other annoyances). The term is also used
-(incorrectly) to mean: [1] days in the Gregorian Calendar; and [2] days
-since a particular starting time or `epoch', usually 1970 in the Unix
-world and 1980 in the MS-DOS/Windows world. If you find that it is not
-the first meaning that you really want, then check out the Date::Manip
-and Date::Calc modules. (Thanks to David Cassell for most of this text.)
+Time::Piece (a standard module since Perl 5.10), or with the
+Date::Manip or Date::Calc modules.
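If all you need is the number of days between two dates, you can subtract one Time::Piece object from another. The result is a Time::Seconds object, and its days() method does the conversion. Here is a minimal sketch, with both dates parsed as UTC:

```perl
use strict;
use warnings;
use Time::Piece;

my $from = Time::Piece->strptime("2024-02-01", "%Y-%m-%d");
my $to   = Time::Piece->strptime("2024-03-01", "%Y-%m-%d");

my $diff = $to - $from;          # a Time::Seconds object
printf "%d days apart\n", $diff->days;
```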
+
+There are too many details and too much confusion on this issue to cover
+in this FAQ, but the term is applied (correctly) to a calendar now
+supplanted by the Gregorian Calendar, with the Julian Calendar failing
+to adjust properly for leap years on centennial years (among other
+annoyances). The term is also used (incorrectly) to mean: [1] days in
+the Gregorian Calendar; and [2] days since a particular starting time
+or `epoch', usually 1970 in the Unix world and 1980 in the
+MS-DOS/Windows world. If you find that it is not the first meaning
+that you really want, then check out the Date::Manip and Date::Calc
+modules. (Thanks to David Cassell for most of this text.)
=head2 How do I find yesterday's date?
Then you can pass this to C<localtime()> and get the individual year,
month, day, hour, minute, seconds values.
+Alternatively, you can use Time::Piece to subtract a day from the value
+returned from C<localtime()>:
+
+ use Time::Piece;
+ use Time::Seconds; # imports seconds constants, like ONE_DAY
+ my $today = localtime();
+ my $yesterday = $today - ONE_DAY;
+
Note very carefully that the code above assumes that your days are
twenty-four hours each. For most people, there are two days a year
when they aren't: the switch to and from summer time throws this off.
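One common workaround is to do the subtraction from noon local time. Even on a 23- or 25-hour day, noon minus 24 hours still falls on the previous calendar day. The sketch below uses the core Time::Local module, and the sub name `yesterday_ymd` is hypothetical:

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical helper: return (year, month, day) of the day before the given
# epoch time, computed from local noon so that a DST change cannot skip or
# repeat a date.
sub yesterday_ymd {
    my ($when) = @_;
    my ($mday, $mon, $year) = (localtime($when))[3, 4, 5];
    my $noon = timelocal(0, 0, 12, $mday, $mon, $year + 1900);
    my ($y_mday, $y_mon, $y_year) = (localtime($noon - 24 * 60 * 60))[3, 4, 5];
    return ($y_year + 1900, $y_mon + 1, $y_mday);
}

printf "%04d-%02d-%02d\n", yesterday_ymd(time());
```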
It depends just what you mean by ``escape''. URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
-character are removed with:
+character are removed with
s/\\(.)/$1/g;
If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
-and the byacc program.
+and the byacc program. Starting from Perl 5.8, Text::Balanced
+is part of the standard distribution.
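As a small taste of Text::Balanced, its extract_bracketed() function pulls one complete bracketed group, nesting included, off the front of a string. In list context it also returns the unparsed remainder:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(a (b c) d) and the rest";

# In list context: (extracted text, remaining text, skipped prefix)
my ($group, $rest) = extract_bracketed($text, "()");

print "group: $group\n";
print "rest:  $rest\n";
```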
One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:
substr($a, 0, 3) = "Tom";
Although those with a pattern matching kind of thought process will
-likely prefer:
+likely prefer
$a =~ s/^.../Tom/;
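Both forms above modify the string in place. Perl also has a four-argument form of substr() that does the same replacement without using substr() as an lvalue. All three approaches give the same result here:

```perl
use strict;
use warnings;

my $lvalue = "Bob's car";
substr($lvalue, 0, 3) = "Tom";        # lvalue substr

my $four_arg = "Bob's car";
substr($four_arg, 0, 3, "Tom");       # 4-argument substr

my $regex = "Bob's car";
$regex =~ s/^.../Tom/;                # pattern substitution

print "$lvalue\n$four_arg\n$regex\n";  # each line reads "Tom's car"
```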
=head2 How can I count the number of occurrences of a substring within a string?
-There are a number of ways, with varying efficiency: If you want a
+There are a number of ways, with varying efficiency. If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:
$line =~ s/\b(\w)/\U$1/g;
This has the strange effect of turning "C<don't do it>" into "C<Don'T
-Do It>". Sometimes you might want this, instead (Suggested by brian d.
-foy):
+Do It>". Sometimes you might want this. Other times you might need a
+more thorough solution (Suggested by brian d. foy):
$string =~ s/ (
(^\w) #at the beginning of the line
use Text::ParseWords;
@new = quotewords(",", 0, $text);
-There's also a Text::CSV module on CPAN.
+There's also a Text::CSV (Comma-Separated Values) module on CPAN.
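With a keep argument of 0, quotewords() splits on the delimiter pattern while respecting double quotes, and it strips the quotes from the fields. For example, a comma inside a quoted field stays part of that field:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text   = q{apple,"banana, ripe",cherry};
my @fields = quotewords(",", 0, $text);

print join("|", @fields), "\n";   # apple|banana, ripe|cherry
```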
=head2 How do I strip blank space from the beginning/end of a string?
-Although the simplest approach would seem to be:
+Although the simplest approach would seem to be
$string =~ s/^\s*(.*?)\s*$/$1/;
-Not only is this unnecessarily slow and destructive, it also fails with
+not only is this unnecessarily slow and destructive, it also fails with
embedded newlines. It is much faster to do this operation in two steps:
$string =~ s/^\s+//;
$string =~ s/\s+$//;
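The two substitutions can be wrapped in a small helper. In this sketch the name `trim` is our own choice, not a built-in. The helper works on copies of its arguments, so the caller's variables are left unchanged:

```perl
use strict;
use warnings;

# Hypothetical helper: return copies of the arguments with leading and
# trailing whitespace removed, one anchored substitution per end.
sub trim {
    my @out = @_;          # copy, so the caller's strings are not modified
    for (@out) {
        s/^\s+//;
        s/\s+$//;
    }
    return wantarray ? @out : $out[0];
}

my $clean = trim("  two words \n");
print "[$clean]\n";   # [two words]
```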
=head2 How do I find the soundex value of a string?
Use the standard Text::Soundex module distributed with Perl.
-But before you do so, you may want to determine whether `soundex' is in
+Before you do so, you may want to determine whether `soundex' is in
fact what you think it is. Knuth's soundex algorithm compresses words
into a small space, and so it does not necessarily distinguish between
two words which you might want to appear separately. For example, the
=head2 What's wrong with always quoting "$vars"?
-The problem is that those double-quotes force stringification,
-coercing numbers and references into strings, even when you
-don't want them to be. Think of it this way: double-quote
+The problem is that those double-quotes force stringification--
+coercing numbers and references into strings--even when you
+don't want them to be strings. Think of it this way: double-quote
expansion is used to produce new strings. If you already
have a string, why do you need more?
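Here is a quick demonstration of the reference case. Once a reference has been interpolated, what you have is only a string that looks like ARRAY(0x...), and it can no longer be dereferenced:

```perl
use strict;
use warnings;

my @list   = (1, 2, 3);
my $ref    = \@list;
my $quoted = "$ref";          # forced stringification

print ref($ref), "\n";        # ARRAY
print ref($quoted) eq '' ? "plain string\n" : "still a reference\n";
# @$quoted would die under "use strict": it is not a reference any more
```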
A nice general-purpose fixer-upper function for indented here documents
follows. It expects to be called with a here document as its argument.
It looks to see whether each line begins with a common substring, and
-if so, strips that off. Otherwise, it takes the amount of leading
-white space found on the first line and removes that much off each
+if so, strips that substring off. Otherwise, it takes the amount of leading
+whitespace found on the first line and removes that much off each
subsequent line.
sub fix {
local $_ = shift;
- my ($white, $leader); # common white space and common leading string
+ my ($white, $leader); # common whitespace and common leading string
if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
($white, $leader) = ($2, quotemeta($1));
} else {
($white, $leader) = (/^(\s+)/, '');
}
s/^\s*?$leader(?:$white)?//gm;
return $_;
}
MAIN_INTERPRETER_LOOP
-Or with a fixed amount of leading white space, with remaining
+Or with a fixed amount of leading whitespace, with remaining
indentation correctly preserved:
$poem = fix<<EVER_ON_AND_ON;
context, you initialize arrays with lists, and you foreach() across
a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
in scalar context behave like the number of elements in them, subroutines
-access their arguments through the array C<@_>, push/pop/shift only work
+access their arguments through the array C<@_>, and push/pop/shift only work
on arrays.
As a side note, there's no such thing as a list in scalar context.
=head2 What is the difference between $array[1] and @array[1]?
-The former is a scalar value, the latter an array slice, which makes
+The former is a scalar value; the latter an array slice, making
it a list with one (scalar) value. You should use $ when you want a
scalar value (most of the time) and @ when you want a list with one
scalar value in it (very, very rarely; nearly never, in fact).
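The difference matters most on the right-hand side of an assignment. C<$array[1] = ...> provides scalar context, while C<@array[1] = ...> provides list context. The hypothetical sub `context` below simply reports which context it was called in:

```perl
use strict;
use warnings;

# Hypothetical helper: report the calling context.
sub context { return wantarray ? "list" : "scalar" }

my @array;
$array[1] = context();            # scalar assignment

my @slice;
{
    no warnings 'syntax';         # silence "better written as $slice[1]"
    @slice[1] = context();        # one-element slice: list assignment
}

print "$array[1] $slice[1]\n";    # scalar list
```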
=over 4
-=item a) If @in is sorted, and you want @out to be sorted:
+=item a)
+
+If @in is sorted, and you want @out to be sorted:
- $prev = 'nonesuch';
- @out = grep($_ ne $prev && ($prev = $_), @in);
+ $prev = "not equal to $in[0]";
+ @out = grep($_ ne $prev && ($prev = $_, 1), @in);
This is nice in that it doesn't use much extra memory, simulating
-uniq(1)'s behavior of removing only adjacent duplicates. It's less
-nice in that it won't work with false values like undef, 0, or "";
-"0 but true" is OK, though.
+uniq(1)'s behavior of removing only adjacent duplicates. The ", 1"
+guarantees that the expression is true (so that grep picks it up)
+even if $_ is 0, "", or undef.
+
+=item b)
+
-=item b) If you don't know whether @in is sorted:
+If you don't know whether @in is sorted:
undef %saw;
@out = grep(!$saw{$_}++, @in);
-=item c) Like (b), but @in contains only small integers:
+=item c)
+
+Like (b), but @in contains only small integers:
@out = grep(!$saw[$_]++, @in);
-=item d) A way to do (b) without any loops or greps:
+=item d)
+
+A way to do (b) without any loops or greps:
undef %saw;
@saw{@in} = ();
@out = sort keys %saw; # remove sort if undesired
-=item e) Like (d), but @in contains only small positive integers:
+=item e)
+
+Like (d), but @in contains only small positive integers:
undef @ary;
@ary[@in] = @in;
Please do not use
- $is_there = grep $_ eq $whatever, @array;
+ ($is_there) = grep $_ eq $whatever, @array;
or worse yet
- $is_there = grep /$whatever/, @array;
+ ($is_there) = grep /$whatever/, @array;
These are slow (checks every element even if the first matches),
inefficient (same reason), and potentially buggy (what if there are
}
Note that this is the I<symmetric difference>, that is, all elements in
-either A or in B, but not in both. Think of it as an xor operation.
+either A or in B but not in both. Think of it as an xor operation.
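Here is a self-contained sketch of the same counting idea; the sub name is hypothetical. Each array is first reduced to its distinct elements. An element counted twice must therefore be in both A and B, and anything counted once belongs to the symmetric difference:

```perl
use strict;
use warnings;

# Hypothetical helper: elements in exactly one of the two arrays.
sub symmetric_difference {
    my ($first, $second) = @_;
    my %in_first  = map { $_ => 1 } @$first;    # distinct elements of A
    my %in_second = map { $_ => 1 } @$second;   # distinct elements of B
    my %count;
    $count{$_}++ for keys %in_first, keys %in_second;
    # A count of 1 means "in A or in B, but not in both" (xor)
    return sort { $a cmp $b } grep { $count{$_} == 1 } keys %count;
}

my @xor = symmetric_difference([1, 2, 2, 3], [3, 4]);
print "@xor\n";   # 1 2 4
```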
=head2 How do I test whether two arrays or hashes are equal?
}
print "\n";
-You could grow the list this way:
+You could add to the list this way:
my ($head, $tail);
$tail = append($head, 1); # grow a new head
return unless @$array;  # an empty array would make the loop below run forever
my $i;
for ($i = @$array; --$i; ) {
my $j = int rand ($i+1);
- next if $i == $j;
@$array[$i,$j] = @$array[$j,$i];
}
}
fisher_yates_shuffle( \@array ); # permutes @array in place
You've probably seen shuffling algorithms that work using splice,
-randomly picking another element to swap the current element with:
+randomly picking another element to swap the current element with
srand;
@new = ();
}
@sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
-Which could also be written this way, using a trick
+which could also be written this way, using a trick
that's come to be known as the Schwartzian Transform:
@sorted = map { $_->[0] }
Even if the table doesn't double, there's no telling whether your new
entry will be inserted before or after the current iterator position.
-Either treasure up your changes and make them after the iterator finishes,
+Either treasure up your changes and make them after the iterator finishes
or use keys to fetch all the old keys at once, and iterate over the list
of keys.
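The second option works because keys() builds the complete list of keys before the loop body runs. Adding or deleting entries inside the loop therefore does not disturb it. Here is a minimal sketch that renames every key to upper case:

```perl
use strict;
use warnings;

my %hash = (apple => 1, banana => 2, cherry => 3);

# keys %hash is evaluated once, up front, so changing %hash inside is safe
for my $key (keys %hash) {
    $hash{ uc $key } = delete $hash{$key};
}

print join(",", sort keys %hash), "\n";   # APPLE,BANANA,CHERRY
```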
$num_keys = scalar keys %hash;
-In void context, the keys() function just resets the iterator, which is
+The keys() function also resets the iterator. Calling it in void context
does only that, with no other overhead, which for tied hashes is faster
than iterating through the whole hash, one key-value pair at a time.
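For example, suppose an each() loop was abandoned partway through. Calling keys() in void context puts the iterator back at the start, so the next each() call returns the first key again:

```perl
use strict;
use warnings;

my %hash = (x => 1, y => 2, z => 3);

my ($first)  = each %hash;   # start iterating...
my @skipped  = each %hash;   # ...and advance once more
keys %hash;                  # reset the iterator (void context)
my ($again)  = each %hash;

print $first eq $again ? "iterator was reset\n" : "not reset\n";
```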
} keys %hash; # and by value
Here we'll do a reverse numeric sort by value, and if two values are
-identical, sort by length of key, and if that fails, by straight ASCII
-comparison of the keys (well, possibly modified by your locale -- see
+identical, sort by length of key, or if that fails, by straight ASCII
+comparison of the keys (well, possibly modified by your locale--see
L<perllocale>).
@keys = sort {
For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generically, you should consult the FreezeThaw,
-Storable, or Class::Eroot modules from CPAN. Here's one example using
+Storable, or Class::Eroot modules from CPAN. Starting from Perl 5.8,
+Storable is part of the standard distribution. Here's one example using
Storable's C<store> and C<retrieve> functions:
use Storable;