=head1 NAME
-perlfaq4 - Data Manipulation ($Revision: 1.43 $, $Date: 2003/02/23 20:25:09 $)
+perlfaq4 - Data Manipulation ($Revision: 1.60 $, $Date: 2005/02/14 18:24:01 $)
=head1 DESCRIPTION
my $number = sprintf "%.2f", 10/3;
+=head2 Why is int() broken?
+
+Your int() is most probably working just fine. It's the numbers that
+aren't quite what you think.
+
+First, see the above item "Why am I getting long decimals
+(eg, 19.9499999999999) instead of the numbers I should be getting
+(eg, 19.95)?".
+
+For example, this
+
+ print int(0.6/0.2-2), "\n";
+
+will on most computers print 0, not 1, because even such simple
+numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
+numbers. What you think of in the above as 'three' is really more like
+2.9999999999999995559.
+
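If what you want is the nearest integer rather than truncation, one common workaround is to round with sprintf() instead of calling int(). The following is a minimal sketch; the variable names are illustrative, and the digits printed by the first line may differ on perls built with long doubles.

```perl
use strict;
use warnings;

# On typical IEEE-754 double builds the quotient is slightly less than 3,
# so the difference is slightly less than 1.
my $almost_one = 0.6/0.2 - 2;

# Show more digits than print() normally does.
printf "%.20g\n", $almost_one;

# int() truncates toward zero; sprintf("%.0f") rounds to the nearest integer.
my $truncated = int($almost_one);
my $rounded   = sprintf("%.0f", $almost_one);

print "truncated: $truncated\n";
print "rounded:   $rounded\n";
```

Note that sprintf() returns a string, which perl converts back to a number when you use it as one.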
=head2 Why isn't my octal data interpreted correctly?
Perl only understands octal and hex numbers as such when they occur as
machines) will work pretty much like mathematical integers. Other numbers
are not guaranteed.
-=head2 How do I convert between numeric representations?
+=head2 How do I convert between numeric representations/bases/radixes?
As always with Perl there is more than one way to do it. Below
are a few examples of approaches to making common conversions
Using perl's built in conversion of 0x notation:
- $int = 0xDEADBEEF;
- $dec = sprintf("%d", $int);
+ $dec = 0xDEADBEEF;
Using the hex function:
- $int = hex("DEADBEEF");
- $dec = sprintf("%d", $int);
+ $dec = hex("DEADBEEF");
Using pack:
- $int = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
- $dec = sprintf("%d", $int);
+ $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
Using the CPAN module Bit::Vector:
Using sprintf:
- $hex = sprintf("%X", 3735928559);
+ $hex = sprintf("%X", 3735928559); # upper case A-F
+ $hex = sprintf("%x", 3735928559); # lower case a-f
-Using unpack
+Using unpack:
$hex = unpack("H*", pack("N", 3735928559));
-Using Bit::Vector
+Using Bit::Vector:
use Bit::Vector;
$vec = Bit::Vector->new_Dec(32, -559038737);
Using Perl's built in conversion of numbers with leading zeros:
- $int = 033653337357; # note the leading 0!
- $dec = sprintf("%d", $int);
+ $dec = 033653337357; # note the leading 0!
Using the oct function:
- $int = oct("33653337357");
- $dec = sprintf("%d", $int);
+ $dec = oct("33653337357");
Using Bit::Vector:
$oct = sprintf("%o", 3735928559);
-Using Bit::Vector
+Using Bit::Vector:
use Bit::Vector;
$vec = Bit::Vector->new_Dec(32, -559038737);
Perl 5.6 lets you write binary numbers directly with
the 0b notation:
- $number = 0b10110110;
+ $number = 0b10110110;
+
+Using oct:
+
+ my $input = "10110110";
+ $decimal = oct( "0b$input" );
-Using pack and ord
+Using pack and ord:
$decimal = ord(pack('B8', '10110110'));
-Using pack and unpack for larger strings
+Using pack and unpack for larger strings:
$int = unpack("N", pack("B32",
substr("0" x 32 . "11110101011011011111011101111", -32)));
=item How do I convert from decimal to binary
-Using unpack;
+Using sprintf (perl 5.6+):
+
+ $bin = sprintf("%b", 3735928559);
+
+Using unpack:
$bin = unpack("B*", pack("N", 3735928559));
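As a quick cross-check of the conversions above, the following sketch formats one number in three bases with sprintf() and converts each string back with hex() and oct(). The variable names are illustrative only, and C<%b> assumes perl 5.6 or later.

```perl
use strict;
use warnings;

my $number = 3735928559;

# Number to digits in another base, using sprintf.
my $hex = sprintf("%x", $number);   # hexadecimal digits, lower case
my $oct = sprintf("%o", $number);   # octal digits
my $bin = sprintf("%b", $number);   # binary digits (perl 5.6+)

# Digits back to a number. oct() understands a 0b prefix
# as well as plain octal digits.
my $from_hex = hex($hex);
my $from_oct = oct($oct);
my $from_bin = oct("0b$bin");

print "$hex $oct $bin\n";
print "round trip ok\n"
    if $from_hex == $number && $from_oct == $number && $from_bin == $number;
```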
=head2 How do I get a random number between X and Y?
-Use the following simple function. It selects a random integer between
-(and possibly including!) the two given integers, e.g.,
-C<random_int_in(50,120)>
+C<rand($x)> returns a number such that
+C<< 0 <= rand($x) < $x >>. Thus what you want to have perl
+figure out is a random number in the range from 0 to the
+difference between your I<X> and I<Y>.
+
+That is, to get a number between 10 and 15, inclusive, you
+want a random number between 0 and 5 that you can then add
+to 10.
+
+ my $number = 10 + int rand( 15-10+1 );
+
+Hence you derive the following simple function to abstract
+that. It selects a random integer between the two given
+integers (inclusive). For example: C<random_int_in(50,120)>.
sub random_int_in ($$) {
my($min, $max) = @_;
argument localtime uses the current time.
$day_of_year = (localtime)[7];
-
+
The POSIX module can also format a date as the day of the year or
week of the year.
To get the day of year for any date, use the Time::Local module to get
a time in epoch seconds for the argument to localtime.
-
+
use POSIX qw/strftime/;
use Time::Local;
my $week_of_year = strftime "%W",
use Date::Calc;
my $day_of_year = Day_of_Year( 1987, 12, 18 );
my $week_of_year = Week_of_Year( 1987, 12, 18 );
-
+
=head2 How do I find the current century or millennium?
Use the following simple functions:
return 1+int((((localtime(shift || time))[5] + 1899))/1000);
}
-You can also use the POSIX strftime() function which may be a bit
-slower but is easier to read and maintain.
-
- use POSIX qw/strftime/;
-
- my $week_of_the_year = strftime "%W", localtime;
- my $day_of_the_year = strftime "%j", localtime;
-
On some systems, the POSIX module's strftime() function has
been extended in a non-standard way to use a C<%C> format,
which they sometimes claim is the "century". It isn't,
=head2 How can I find the Julian Day?
-Use the Time::JulianDay module (part of the Time-modules bundle
-available from CPAN.)
-
-Before you immerse yourself too deeply in this, be sure to verify that
-it is the I<Julian> Day you really want. Are you interested in a way
-of getting serial days so that you just can tell how many days they
-are apart or so that you can do also other date arithmetic? If you
-are interested in performing date arithmetic, this can be done using
-modules Date::Manip or Date::Calc.
-
-There is too many details and much confusion on this issue to cover in
-this FAQ, but the term is applied (correctly) to a calendar now
-supplanted by the Gregorian Calendar, with the Julian Calendar failing
-to adjust properly for leap years on centennial years (among other
-annoyances). The term is also used (incorrectly) to mean: [1] days in
-the Gregorian Calendar; and [2] days since a particular starting time
-or `epoch', usually 1970 in the Unix world and 1980 in the
-MS-DOS/Windows world. If you find that it is not the first meaning
-that you really want, then check out the Date::Manip and Date::Calc
-modules. (Thanks to David Cassell for most of this text.)
+(contributed by brian d foy and Dave Cross)
+
+You can use the Time::JulianDay module available on CPAN. Ensure that
+you really want to find a Julian day, though, as many people have
+different ideas about Julian days. See
+http://www.hermetic.ch/cal_stud/jdn.htm for instance.
+
+You can also try the DateTime module, which can convert a date/time
+to a Julian Day.
+
+ $ perl -MDateTime -le'print DateTime->today->jd'
+ 2453401.5
+
+Or the modified Julian Day:
+
+ $ perl -MDateTime -le'print DateTime->today->mjd'
+ 53401
+
+Or even the day of the year (which is what some people think of as a
+Julian day):
+
+ $ perl -MDateTime -le'print DateTime->today->doy'
+ 31
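If DateTime is not installed, the same three values can be derived from epoch seconds with core modules only. This is a sketch, assuming a Unix-style epoch of 1970-01-01 00:00:00 UTC (which is Julian Day 2440587.5); the function names are made up for this example and do not come from any module.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Julian Day 2440587.5 corresponds to 1970-01-01 00:00:00 UTC.
sub julian_day          { return $_[0] / 86400 + 2440587.5 }

# The modified Julian Day is the Julian Day minus 2400000.5.
sub modified_julian_day { return julian_day($_[0]) - 2400000.5 }

# gmtime's yday field counts from 0, so add 1.
sub day_of_year         { return (gmtime($_[0]))[7] + 1 }

# Midnight UTC on 31 January 2005 (months count from 0).
my $epoch = timegm(0, 0, 0, 31, 0, 2005);

print julian_day($epoch), "\n";
print modified_julian_day($epoch), "\n";
print day_of_year($epoch), "\n";
```

For that date the three lines match the DateTime output shown above.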
=head2 How do I find yesterday's date?
That doesn't mean that Perl can't be used to create non-Y2K compliant
programs. It can. But so can your pencil. It's the fault of the user,
not the language. At the risk of inflaming the NRA: ``Perl doesn't
-break Y2K, people do.'' See http://language.perl.com/news/y2k.html for
+break Y2K, people do.'' See http://www.perl.org/about/y2k.html for
a longer exposition.
=head1 Data: Strings
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
-See also ``How can I expand variables in text strings?'' in this
-section of the FAQ.
-
=head2 How do I find matching/nesting anything?
This isn't something that can be done in one regular expression, no
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.
+Damian Conway's L<Text::Autoformat> module provides some smart
+case transformations:
+
+ use Text::Autoformat;
+ my $x = "Dr. Strangelove or: How I Learned to Stop ".
+ "Worrying and Love the Bomb";
+
+ print $x, "\n";
+ for my $style (qw( sentence title highlight ))
+ {
+ print autoformat($x, { case => $style }), "\n";
+ }
+
=head2 How can I split a [character] delimited string except when inside [character]?
Several modules can handle this sort of parsing---Text::Balanced,
-Text::CVS, Text::CVS_XS, and Text::ParseWords, among others.
+Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
Take the example case of trying to split a string that is
comma-separated into its different fields. You can't use C<split(/,/)>
=head2 How do I find the soundex value of a string?
-Use the standard Text::Soundex module distributed with Perl.
-Before you do so, you may want to determine whether `soundex' is in
-fact what you think it is. Knuth's soundex algorithm compresses words
-into a small space, and so it does not necessarily distinguish between
-two words which you might want to appear separately. For example, the
-last names `Knuth' and `Kant' are both mapped to the soundex code K530.
-If Text::Soundex does not do what you are looking for, you might want
-to consider the String::Approx module available at CPAN.
+(contributed by brian d foy)
+
+You can use the Text::Soundex module. If you want to do fuzzy or close
+matching, you might also try the String::Approx, Text::Metaphone,
+and Text::DoubleMetaphone modules.
=head2 How can I expand variables in text strings?
-Let's assume that you have a string like:
+Let's assume that you have a string that contains placeholder
+variables.
$text = 'this has a $foo in it and a $bar';
-If those were both global variables, then this would
-suffice:
+You can use a substitution with a double evaluation. The
+first /e turns C<$1> into C<$foo>, and the second /e turns
+C<$foo> into its value. You may want to wrap this in an
+C<eval>: if you try to get the value of an undeclared variable
+while running under C<use strict>, you get a fatal error.
- $text =~ s/\$(\w+)/${$1}/g; # no /e needed
-
-But since they are probably lexicals, or at least, they could
-be, you'd have to do this:
-
- $text =~ s/(\$\w+)/$1/eeg;
- die if $@; # needed /ee, not /e
+ eval { $text =~ s/(\$\w+)/$1/eeg };
+ die if $@;
It's probably better in the general case to treat those
variables as entries in some special hash. For example:
);
$text =~ s/\$(\w+)/$user_defs{$1}/g;
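One thing to watch with the hash approach is a placeholder that has no entry: the plain substitution replaces it with an empty string. The following sketch leaves unknown placeholders untouched instead. It assumes a C<%user_defs> hash like the one above; the values and the C<$unknown> placeholder are made up for illustration.

```perl
use strict;
use warnings;

my %user_defs = ( foo => 23, bar => 19 );   # illustrative values
my $text = 'this has a $foo in it, a $bar, and an $unknown';

# With /e the replacement is code: use the hash entry when there is one,
# otherwise put the original placeholder text back.
( my $expanded = $text ) =~
    s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$expanded\n";
```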
-See also ``How do I expand function calls in a string?'' in this section
-of the FAQ.
-
=head2 What's wrong with always quoting "$vars"?
The problem is that those double-quotes force stringification--
Use the rand() function (see L<perlfunc/rand>):
- # at the top of the program:
- srand; # not needed for 5.004 and later
-
- # then later on
$index = rand @array;
$element = $array[$index];
-Make sure you I<only call srand once per program, if then>.
-If you are calling it more than once (such as before each
-call to rand), you're almost certainly doing something wrong.
+Or, simply:
+
+ my $element = $array[ rand @array ];
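The subscript works because C<rand @array> returns a fractional number that is at least 0 and less than the number of elements, and perl truncates a fractional subscript to an integer. A quick check, using an arbitrary three-element list:

```perl
use strict;
use warnings;

my @array = qw(red green blue);   # arbitrary sample data
my %count;

# rand @array is fractional; the subscript truncates it to 0, 1, or 2,
# so every pick is a real element of the array.
for ( 1 .. 1000 ) {
    my $element = $array[ rand @array ];
    $count{$element}++;
}

print "$_: $count{$_}\n" for sort keys %count;
```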
=head2 How do I permute N elements of a list?
This can be conveniently combined with precalculation of keys as given
above.
-See the F<sort> artitcle article in the "Far More Than You Ever Wanted
+See the F<sort> article in the "Far More Than You Ever Wanted
To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
more about this approach.
=head2 What happens if I add or remove keys from a hash while iterating over it?
-Don't do that. :-)
+(contributed by brian d foy)
-[lwall] In Perl 4, you were not allowed to modify a hash at all while
-iterating over it. In Perl 5 you can delete from it, but you still
-can't add to it, because that might cause a doubling of the hash table,
-in which half the entries get copied up to the new top half of the
-table, at which point you've totally bamboozled the iterator code.
-Even if the table doesn't double, there's no telling whether your new
-entry will be inserted before or after the current iterator position.
+The easy answer is "Don't do that!"
-Either treasure up your changes and make them after the iterator finishes
-or use keys to fetch all the old keys at once, and iterate over the list
-of keys.
+If you iterate through the hash with each(), you can delete the key
+most recently returned without worrying about it. If you delete or add
+other keys, the iterator may skip or double up on them since perl
+may rearrange the hash table. See the
+entry for C<each()> in L<perlfunc>.
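A short sketch of the two patterns that stay out of trouble: deleting only the key that each() just returned, and looping over a list from keys() when you need to add entries. The hash contents are arbitrary sample data.

```perl
use strict;
use warnings;

my %hash = map { $_ => $_ * $_ } 1 .. 10;

# Safe: delete only the key that each() most recently returned.
while ( my ( $key, $value ) = each %hash ) {
    delete $hash{$key} if $key % 2;     # drop the odd keys
}

# Also safe: keys() builds the complete list before the loop starts,
# so entries added inside the loop are not visited by it.
for my $key ( keys %hash ) {
    $hash{ $key + 100 } = $hash{$key};
}

print join( ", ", sort { $a <=> $b } keys %hash ), "\n";
```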
=head2 How do I look up a hash element by value?
if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
{ print "a C float\n" }
-You can also use the L<Data::Types|Data::Types> module on
-the CPAN, which exports functions that validate data types
-using these and other regular expressions, or you can use
-the C<Regexp::Common> module from CPAN which has regular
-expressions to match various types of numbers.
-
-If you're on a POSIX system, Perl's supports the C<POSIX::strtod>
+There are also some commonly used modules for the task.
+L<Scalar::Util> (distributed with 5.8) provides access to perl's
+internal function C<looks_like_number> for determining
+whether a variable looks like a number. L<Data::Types>
+exports functions that validate data types using both the
+above and other regular expressions. Thirdly, there is
+C<Regexp::Common> which has regular expressions to match
+various types of numbers. Those three modules are available
+from the CPAN.
+
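As an illustration of the first of those modules, the sketch below runs C<looks_like_number> over a few sample strings chosen for this example. Note that a hex-looking string is not a number by this test, which is consistent with how perl treats such strings in numeric context.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my %verdict;
for my $candidate ( "42", "-1.5e3", "0x10", "abc" ) {
    $verdict{$candidate} =
        looks_like_number($candidate) ? "number" : "not a number";
    print "$candidate: $verdict{$candidate}\n";
}
```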
+If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a C<getnum>
wrapper function for more convenient access. This function takes
a string and returns the number it found, or C<undef> for input that
sub is_numeric { defined getnum($_[0]) }
-Or you could check out the L<String::Scanf|String::Scanf> module on the CPAN
+Or you could check out the L<String::Scanf> module on the CPAN
instead. The POSIX module (part of the standard Perl distribution) provides
the C<strtod> and C<strtol> functions for converting strings to doubles and longs,
respectively.
=head2 How do I print out or copy a recursive data structure?
The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
-for printing out data structures. The Storable module, found on CPAN,
-provides a function called C<dclone> that recursively copies its argument.
+for printing out data structures. The Storable module on CPAN (or the
+5.8 release of Perl) provides a function called C<dclone> that recursively
+copies its argument.
use Storable qw(dclone);
$r2 = dclone($r1);
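To see that the copy really is independent, and that a self-reference is copied as a self-reference, you can try something like the following sketch. The structure and its field names are invented for this example.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $r1 = { name => "original", list => [ 1, 2, 3 ] };
$r1->{self} = $r1;                  # make the structure recursive

my $r2 = dclone($r1);

# Changing the copy leaves the original alone.
push @{ $r2->{list} }, 4;

print scalar( @{ $r1->{list} } ), "\n";
print scalar( @{ $r2->{list} } ), "\n";

# The cycle in the copy points at the copy, not at the original.
print "cycle preserved\n" if $r2->{self} == $r2;
```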
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington.
-All rights reserved.
+Copyright (c) 1997-2005 Tom Christiansen, Nathan Torkington, and
+other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.