=head1 NAME
-perlfaq4 - Data Manipulation ($Revision: 1.24 $, $Date: 2002/05/20 16:50:08 $)
+perlfaq4 - Data Manipulation ($Revision: 1.37 $, $Date: 2002/11/13 06:04:00 $)
=head1 DESCRIPTION
=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
-The infinite set that a mathematician thinks of as the real numbers can
-only be approximated on a computer, since the computer only has a finite
-number of bits to store an infinite number of, um, numbers.
-
-Internally, your computer represents floating-point numbers in binary.
-Floating-point numbers read in from a file or appearing as literals
-in your program are converted from their decimal floating-point
-representation (eg, 19.95) to an internal binary representation.
-
-However, 19.95 can't be precisely represented as a binary
-floating-point number, just like 1/3 can't be exactly represented as a
-decimal floating-point number. The computer's binary representation
-of 19.95, therefore, isn't exactly 19.95.
-
-When a floating-point number gets printed, the binary floating-point
-representation is converted back to decimal. These decimal numbers
-are displayed in either the format you specify with printf(), or the
-current output format for numbers. (See L<perlvar/"$#"> if you use
-print. C<$#> has a different default value in Perl5 than it did in
-Perl4. Changing C<$#> yourself is deprecated.)
-
-This affects B<all> computer languages that represent decimal
-floating-point numbers in binary, not just Perl. Perl provides
-arbitrary-precision decimal numbers with the Math::BigFloat module
-(part of the standard Perl distribution), but mathematical operations
-are consequently slower.
-
-If precision is important, such as when dealing with money, it's good
-to work with integers and then divide at the last possible moment.
-For example, work in pennies (1995) instead of dollars and cents
-(19.95) and divide by 100 at the end.
-
-To get rid of the superfluous digits, just use a format (eg,
-C<printf("%.2f", 19.95)>) to get the required precision.
-See L<perlop/"Floating-point Arithmetic">.
+Internally, your computer represents floating-point numbers
+in binary. Digital (as in powers of two) computers cannot
+store all numbers exactly. Some real numbers lose precision
+in the process. This is a problem with how computers store
+numbers and affects all computer languages, not just Perl.
+L<perlnumber> shows the gory details of number
+representations and conversions.
+
+To limit the number of decimal places in your numbers, you
+can use the printf or sprintf function. See
+L<perlop/"Floating-point Arithmetic"> for more details.
+
+ printf "%.2f", 10/3;
+
+ my $number = sprintf "%.2f", 10/3;
+
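As a rough illustration of the points above, the sketch below asks for more digits than the value really has, rounds for display with sprintf, and keeps a money total in integer cents. The variable names are illustrative only, and the exact trailing digits of the raw value depend on the platform's floating-point format.

```perl
use strict;
use warnings;

my $price = 19.95;

# Ask for far more digits than 19.95 has; the stored binary value shows through.
my $raw = sprintf "%.20f", $price;

# Round only when formatting for display.
my $shown = sprintf "%.2f", $price;

# For money, keep integer cents and split into dollars and cents at the end.
my $cents = 1995 * 3;
my $total = sprintf "%d.%02d", int( $cents / 100 ), $cents % 100;

print "raw:   $raw\n";
print "shown: $shown\n";
print "total: $total\n";
```

Rounding at output time keeps the stored value untouched, so repeated arithmetic does not accumulate rounding done for display.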
=head2 Why isn't my octal data interpreted correctly?
-Perl only understands octal and hex numbers as such when they occur
-as literals in your program. Octal literals in perl must start with
-a leading "0" and hexadecimal literals must start with a leading "0x".
-If they are read in from somewhere and assigned, no automatic
-conversion takes place. You must explicitly use oct() or hex() if you
-want the values converted to decimal. oct() interprets
-both hex ("0x350") numbers and octal ones ("0350" or even without the
-leading "0", like "377"), while hex() only converts hexadecimal ones,
-with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
+Perl only understands octal and hex numbers as such when they occur as
+literals in your program. Octal literals in perl must start with a
+leading "0" and hexadecimal literals must start with a leading "0x".
+If they are read in from somewhere and assigned, no automatic
+conversion takes place. You must explicitly use oct() or hex() if you
+want the values converted to decimal. oct() interprets hex ("0x350"),
+octal ("0350" or even without the leading "0", like "377") and binary
+("0b1010") numbers, while hex() only converts hexadecimal ones, with
+or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
The inverse mapping from decimal to octal can be done with either the
-"%o" or "%O" sprintf() formats. To get from decimal to hex try either
-the "%x" or the "%X" formats to sprintf().
+"%o" or "%O" sprintf() formats.
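A minimal sketch of the difference between a literal and a string that merely looks octal. The C<$input> value stands in for data read from a file or the command line; it is an assumption for illustration.

```perl
use strict;
use warnings;

my $input = "0644";             # e.g. a permission string read from a config file

my $as_decimal = $input + 0;    # 644: the leading zero is ignored in a string
my $as_octal   = oct($input);   # 420: the number a literal 0644 would give

my $from_hex = oct("0x1F");     # 31
my $from_bin = oct("0b1010");   # 10
my $hex_bare = hex("ff");       # 255

my $back = sprintf "%o", $as_octal;   # "644"

print "$as_decimal $as_octal $from_hex $from_bin $hex_bare $back\n";
```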
This problem shows up most often when people try using chmod(), mkdir(),
umask(), or sysopen(), which by widespread tradition typically take
optimized for speed on some operations, and for at least some
programmers the notation might be familiar.
-=item B<How do I convert hexadecimal into decimal:>
+=over 4
+
+=item How do I convert hexadecimal into decimal
Using perl's built in conversion of 0x notation:
$vec = Bit::Vector->new_Hex(32, "DEADBEEF");
$dec = $vec->to_Dec();
-=item B<How do I convert from decimal to hexadecimal:>
+=item How do I convert from decimal to hexadecimal
Using sprintf:
$vec->Resize(32); # suppress leading 0 if unwanted
$hex = $vec->to_Hex();
-=item B<How do I convert from octal to decimal:>
+=item How do I convert from octal to decimal
Using Perl's built in conversion of numbers with leading zeros:
$vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
$dec = $vec->to_Dec();
-=item B<How do I convert from decimal to octal:>
+=item How do I convert from decimal to octal
Using sprintf:
$vec = Bit::Vector->new_Dec(32, -559038737);
$oct = reverse join('', $vec->Chunk_List_Read(3));
-=item B<How do I convert from binary to decimal:>
+=item How do I convert from binary to decimal
Perl 5.6 lets you write binary numbers directly with
the 0b notation:
$vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
$dec = $vec->to_Dec();
-=item B<How do I convert from decimal to binary:>
+=item How do I convert from decimal to binary
Using unpack:
The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.
+=back
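One possible way to handle the remaining transformations with built-ins only is to go through a plain number: parse with hex() or oct(), then format with sprintf. This is a sketch, not the only approach, and it assumes the value fits in a native integer.

```perl
use strict;
use warnings;

my $dec = hex("DEADBEEF");        # hex string to number

my $hex = sprintf "%X", $dec;     # number back to hex
my $oct = sprintf "%o", $dec;     # hex -> oct, via the number
my $bin = sprintf "%b", $dec;     # hex -> bin; %b needs Perl 5.6 or later

my $from_oct = oct($oct);         # oct() treats a bare digit string as octal
my $from_bin = oct("0b$bin");     # the 0b prefix tells oct() to read binary

print "$hex $oct $bin\n";
```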
=head2 Why doesn't & work the way I want it to?
(the number C<3> is treated as the bit pattern C<00000011>).
So, saying C<11 & 3> performs the "and" operation on numbers (yielding
-C<1>). Saying C<"11" & "3"> performs the "and" operation on strings
+C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).
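The two behaviours can be seen side by side in a short sketch. The C<$input> variable is a stand-in for a value that arrived as a string, and the code assumes the C<bitwise> feature of newer Perls is not enabled.

```perl
use strict;
use warnings;

my $numeric = 11 & 3;        # 1011 & 0011, a numeric "and"
my $stringy = "11" & "3";    # bytewise "and" of the characters "1" and "3"

my $input  = "11";                  # e.g. a value read from a file
my $forced = ( $input + 0 ) & 3;    # adding 0 forces the numeric operation

print "$numeric '$stringy' $forced\n";
```

Adding zero (or using int()) is a common way to make sure a value read as text is treated as a number before a bitwise operator sees it.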
Most problems with C<&> and C<|> arise because the programmer thinks
If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.
+
+ BEGIN { srand() if $] < 5.004 }
+
5.004 and later automatically call C<srand> at the beginning. Don't
-call C<srand> more than once--you make your numbers less random, rather
+call C<srand> more than once---you make your numbers less random, rather
than more.
Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
-F<random> artitcle in the "Far More Than You Ever Wanted To Know"
-collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz , courtesy of
+F<random> article in the "Far More Than You Ever Wanted To Know"
+collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of
Tom Phoenix, talks more about this. John von Neumann said, ``Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''
return 1+int((((localtime(shift || time))[5] + 1899))/1000);
}
-On some systems, you'll find that the POSIX module's strftime() function
-has been extended in a non-standard way to use a C<%C> format, which they
-sometimes claim is the "century". It isn't, because on most such systems,
-this is only the first two digits of the four-digit year, and thus cannot
-be used to reliably determine the current century or millennium.
+You can also use the POSIX strftime() function which may be a bit
+slower but is easier to read and maintain.
+
+ use POSIX qw/strftime/;
+
+ my $week_of_the_year = strftime "%W", localtime;
+ my $day_of_the_year = strftime "%j", localtime;
+
+On some systems, the POSIX module's strftime() function has
+been extended in a non-standard way to use a C<%C> format,
+which they sometimes claim is the "century". It isn't,
+because on most such systems, this is only the first two
+digits of the four-digit year, and thus cannot be used to
+reliably determine the current century or millennium.
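Since C<%C> cannot be relied on, one option is to take the four-digit year from C<%Y> and do the arithmetic yourself. The helper names below are hypothetical; the formula follows the same convention as the sub shown above, in which the year 2000 still belongs to the 20th century.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helpers: century and millennium from a four-digit year.
sub century_of    { my $year = shift; return 1 + int( ( $year - 1 ) / 100 ) }
sub millennium_of { my $year = shift; return 1 + int( ( $year - 1 ) / 1000 ) }

my $year = strftime "%Y", localtime;

printf "Year %d is in century %d, millennium %d\n",
    $year, century_of($year), millennium_of($year);
```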
=head2 How can I compare two dates and find the difference?
=head2 How do I find yesterday's date?
-The C<time()> function returns the current time in seconds since the
-epoch. Take twenty-four hours off that:
-
- $yesterday = time() - ( 24 * 60 * 60 );
+If you only need to find the date (and not the same time), you
+can use the Date::Calc module.
-Then you can pass this to C<localtime()> and get the individual year,
-month, day, hour, minute, seconds values.
+ use Date::Calc qw(Today Add_Delta_Days);
+
+ my @date = Add_Delta_Days( Today(), -1 );
+
+ print "@date\n";
-Note very carefully that the code above assumes that your days are
-twenty-four hours each. For most people, there are two days a year
-when they aren't: the switch to and from summer time throws this off.
-A solution to this issue is offered by Russ Allbery.
+Most people try to use the time rather than the calendar to
+figure out dates, but that assumes that your days are
+twenty-four hours each. For most people, there are two days
+a year when they aren't: the switch to and from summer time
+throws this off. Russ Allbery offers this solution.
sub yesterday {
- my $now = defined $_[0] ? $_[0] : time;
- my $then = $now - 60 * 60 * 24;
- my $ndst = (localtime $now)[8] > 0;
- my $tdst = (localtime $then)[8] > 0;
- $then - ($tdst - $ndst) * 60 * 60;
- }
- # Should give you "this time yesterday" in seconds since epoch relative to
- # the first argument or the current time if no argument is given and
- # suitable for passing to localtime or whatever else you need to do with
- # it. $ndst is whether we're currently in daylight savings time; $tdst is
- # whether the point 24 hours ago was in daylight savings time. If $tdst
- # and $ndst are the same, a boundary wasn't crossed, and the correction
- # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
- # from yesterday's time since we gained an extra hour while going off
- # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
- # negative hour (add an hour) to yesterday's time since we lost an hour.
- #
- # All of this is because during those days when one switches off or onto
- # DST, a "day" isn't 24 hours long; it's either 23 or 25.
- #
- # The explicit settings of $ndst and $tdst are necessary because localtime
- # only says it returns the system tm struct, and the system tm struct at
- # least on Solaris doesn't guarantee any particular positive value (like,
- # say, 1) for isdst, just a positive value. And that value can
- # potentially be negative, if DST information isn't available (this sub
- # just treats those cases like no DST).
- #
- # Note that between 2am and 3am on the day after the time zone switches
- # off daylight savings time, the exact hour of "yesterday" corresponding
- # to the current hour is not clearly defined. Note also that if used
- # between 2am and 3am the day after the change to daylight savings time,
- # the result will be between 3am and 4am of the previous day; it's
- # arguable whether this is correct.
- #
- # This sub does not attempt to deal with leap seconds (most things don't).
- #
- # Copyright relinquished 1999 by Russ Allbery <rra@stanford.edu>
- # This code is in the public domain
+ my $now = defined $_[0] ? $_[0] : time;
+ my $then = $now - 60 * 60 * 24;
+ my $ndst = (localtime $now)[8] > 0;
+ my $tdst = (localtime $then)[8] > 0;
+ $then - ($tdst - $ndst) * 60 * 60;
+ }
+
+This should give you "this time yesterday" in seconds since epoch relative to
+the first argument or the current time if no argument is given and
+suitable for passing to localtime or whatever else you need to do with
+it. $ndst is whether we're currently in daylight savings time; $tdst is
+whether the point 24 hours ago was in daylight savings time. If $tdst
+and $ndst are the same, a boundary wasn't crossed, and the correction
+will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
+from yesterday's time since we gained an extra hour while going off
+daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
+negative hour (add an hour) to yesterday's time since we lost an hour.
+
+All of this is because during those days when one switches off or onto
+DST, a "day" isn't 24 hours long; it's either 23 or 25.
+
+The explicit settings of $ndst and $tdst are necessary because localtime
+only says it returns the system tm struct, and the system tm struct at
+least on Solaris doesn't guarantee any particular positive value (like,
+say, 1) for isdst, just a positive value. And that value can
+potentially be negative, if DST information isn't available (this sub
+just treats those cases like no DST).
+
+Note that between 2am and 3am on the day after the time zone switches
+off daylight savings time, the exact hour of "yesterday" corresponding
+to the current hour is not clearly defined. Note also that if used
+between 2am and 3am the day after the change to daylight savings time,
+the result will be between 3am and 4am of the previous day; it's
+arguable whether this is correct.
+
+This sub does not attempt to deal with leap seconds (most things don't).
+
+
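If only the calendar date is needed and Date::Calc is not available, a different technique (not the one above) is to start from local noon and subtract 24 hours. The sketch below uses only the core POSIX module; the name C<yesterday_date> is hypothetical, and the approach assumes DST shifts of no more than a few hours.

```perl
use strict;
use warnings;
use POSIX qw(strftime mktime);

# Hypothetical helper: returns yesterday's calendar date as YYYY-MM-DD.
sub yesterday_date {
    my $now = defined $_[0] ? $_[0] : time;
    my @t   = localtime $now;

    # Local noon on the same day; isdst = -1 asks mktime to work out DST.
    my $noon = mktime( 0, 0, 12, $t[3], $t[4], $t[5], 0, 0, -1 );

    # Noon minus 24 hours stays inside the previous calendar day even when
    # a DST change makes that day 23 or 25 hours long.
    return strftime "%Y-%m-%d", localtime( $noon - 24 * 60 * 60 );
}

print yesterday_date(), "\n";
```

Unlike yesterday(), this does not try to return the same time of day; it only identifies the previous date.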
=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
-If you prefer scalar context, similar chicanery is also useful for
-arbitrary expressions:
-
- print "That yields ${\($n + 5)} widgets\n";
-
-Version 5.004 of Perl had a bug that gave list context to the
-expression in C<${...}>, but this is fixed in version 5.005.
-
See also ``How can I expand variables in text strings?'' in this
section of the FAQ.
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple ones, then something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
-nested patterns, nor can they. For that you'll have to write a
-parser.
+nested patterns. For balanced expressions using C<(>, C<{>, C<[>
+or C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
+L<perlre/(??{ code })>. For other cases, you'll have to write a parser.
If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
See the documentation for Text::Autoformat to appreciate its many
capabilities.
-=head2 How can I access/change the first N letters of a string?
-
-There are many ways. If you just want to grab a copy, use
-substr():
+=head2 How can I access or change N characters of a string?
- $first_byte = substr($a, 0, 1);
+You can access the first characters of a string with substr().
+To get the first character, for example, start at position 0
+and grab the string of length 1.
-If you want to modify part of a string, the simplest way is often to
-use substr() as an lvalue:
- substr($a, 0, 3) = "Tom";
+ $string = "Just another Perl Hacker";
+ $first_char = substr( $string, 0, 1 ); # 'J'
-Although those with a pattern matching kind of thought process will
-likely prefer
+To change part of a string, you can use the optional fourth
+argument which is the replacement string.
- $a =~ s/^.../Tom/;
+ substr( $string, 13, 4, "Perl 5.8.0" );
+
+You can also use substr() as an lvalue.
+ substr( $string, 13, 4 ) = "Perl 5.8.0";
+
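A short sketch pulling the forms of substr() together. The extra variables are illustrative; note that the four-argument form returns the text it replaced.

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

my $first_char = substr( $string, 0, 1 );    # 'J'
my $last_word  = substr( $string, -6 );      # negative offset counts from the end

# Four-argument form: replaces in place and returns the old text.
my $old = substr( $string, 13, 4, "Perl 5.8.0" );

# Lvalue form: assign to the selected part of the string.
substr( $string, 0, 4 ) = "Yet";

print "$first_char | $last_word | $old | $string\n";
```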
=head2 How do I change the Nth occurrence of something?
You have to keep track of N yourself. For example, let's say you want
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.
-=head2 How can I split a [character] delimited string except when inside
-[character]? (Comma-separated files)
+=head2 How can I split a [character] delimited string except when inside [character]?
-Take the example case of trying to split a string that is comma-separated
-into its different fields. (We'll pretend you said comma-separated, not
-comma-delimited, which is different and almost never what you mean.) You
-can't use C<split(/,/)> because you shouldn't split if the comma is inside
-quotes. For example, take a data line like this:
+Several modules can handle this sort of parsing---Text::Balanced,
+Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
+
+Take the example case of trying to split a string that is
+comma-separated into its different fields. You can't use C<split(/,/)>
+because you shouldn't split if the comma is inside quotes. For
+example, take a data line like this:
SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex
-problem. Thankfully, we have Jeffrey Friedl, author of a highly
-recommended book on regular expressions, to handle these for us. He
+problem. Thankfully, we have Jeffrey Friedl, author of
+I<Mastering Regular Expressions>, to handle these for us. He
suggests (assuming your string is contained in $text):
@new = ();
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
-C<"like \"this\"">. Unescaping them is a task addressed earlier in
-this section.
+C<"like \"this\"">.
Alternatively, the Text::ParseWords module (part of the standard Perl
distribution) lets you say:
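As an illustration, and not necessarily the FAQ's own snippet, here is how the sample data line above might be split with quotewords() from Text::ParseWords. The variable names are assumptions; the second argument of 0 asks for the quotes to be removed from the returned fields.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# Split on commas, but not on commas inside quoted fields.
my @new = quotewords( ',', 0, $text );

print scalar(@new), " fields\n";
print "[$_]\n" for @new;
```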
=head2 How do I find the first array element for which a condition is true?
-You can use this if you care about the index:
-
- for ($i= 0; $i < @array; $i++) {
- if ($array[$i] eq "Waldo") {
- $found_index = $i;
- last;
+To find the first array element which satisfies a condition, you can
+use the first() function in the List::Util module, which comes with
+Perl 5.8. This example finds the first element that contains "Perl".
+
+ use List::Util qw(first);
+
+ my $element = first { /Perl/ } @array;
+
+If you cannot use List::Util, you can make your own loop to do the
+same thing. Once you find the element, you stop the loop with last.
+
+ my $found;
+ foreach my $element ( @array )
+ {
+ if( $element =~ /Perl/ ) { $found = $element; last }
+ }
+
+If you want the array index, you can iterate through the indices
+and check the array element at each index until you find one
+that satisfies the condition.
+
+ my( $found, $index ) = ( undef, -1 );
+ for( my $i = 0; $i < @array; $i++ )
+ {
+ if( $array[$i] =~ /Perl/ )
+ {
+ $found = $array[$i];
+ $index = $i;
+ last;
+ }
}
- }
-
-Now C<$found_index> has what you want.
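The same first() function can also find the index if it is given the list of indices instead of the elements. This is a sketch with made-up sample data; it assumes List::Util is installed (it ships with Perl 5.8 and later).

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( "awk", "sed", "Perl 5.8", "Perl 6" );

# First matching element.
my $element = first { /Perl/ } @array;

# First matching index: iterate over 0 .. $#array and test each element.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

# first() returns undef when nothing matches.
my $missing = first { /Python/ } @array;

print "element=$element index=$index\n";
print "no match\n" unless defined $missing;
```

Test the index with defined() rather than for truth, since a match at position 0 is a false value.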
=head2 How do I handle linked lists?
$_ **= 3;
$_ *= (4/3) * 3.14159; # this will be constant folded
}
+
+which can also be done with map() which is made to transform
+one list into another:
+
+ @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
If you want to do the same thing to modify the values of the
hash, you can use the C<values> function. As of Perl 5.6
for $orbit ( values %orbits ) {
($orbit **= 3) *= (4/3) * 3.14159;
}
-
+
Prior to perl 5.6 C<values> returned copies of the values,
so older perl code often contains constructions such as
C<@orbits{keys %orbits}> instead of C<values %orbits> where
the hash is to be modified.
-
+
=head2 How do I select a random element from an array?
Use the rand() function (see L<perlfunc/rand>):
=head2 How do I permute N elements of a list?
-Here's a little program that generates all permutations
-of all the words on each line of input. The algorithm embodied
-in the permute() function should work on any list:
-
- #!/usr/bin/perl -n
- # tsc-permute: permute each word of input
- permute([split], []);
- sub permute {
- my @items = @{ $_[0] };
- my @perms = @{ $_[1] };
- unless (@items) {
- print "@perms\n";
- } else {
- my(@newitems,@newperms,$i);
- foreach $i (0 .. $#items) {
- @newitems = @items;
- @newperms = @perms;
- unshift(@newperms, splice(@newitems, $i, 1));
- permute([@newitems], [@newperms]);
- }
+Use the List::Permutor module on CPAN. If the list is
+actually an array, try the Algorithm::Permute module (also
+on CPAN). It's written in XS code and is very efficient.
+
+ use Algorithm::Permute;
+ my @array = 'a'..'d';
+ my $p_iterator = Algorithm::Permute->new ( \@array );
+ while (my @perm = $p_iterator->next) {
+ print "next permutation: (@perm)\n";
+ }
+
+Here's a little program that generates all permutations of
+all the words on each line of input. The algorithm embodied
+in the permute() function is discussed in Volume 4 (still
+unpublished) of Knuth's I<The Art of Computer Programming>
+and will work on any list:
+
+ #!/usr/bin/perl -n
+ # Fischer-Krause ordered permutation generator
+
+ sub permute (&@) {
+ my $code = shift;
+ my @idx = 0..$#_;
+ while ( $code->(@_[@idx]) ) {
+ my $p = $#idx;
+ --$p while $idx[$p-1] > $idx[$p];
+ my $q = $p or return;
+ push @idx, reverse splice @idx, $p;
+ ++$q while $idx[$p-1] > $idx[$q];
+ @idx[$p-1,$q]=@idx[$q,$p-1];
+ }
}
- }
-Unfortunately, this algorithm is very inefficient. The Algorithm::Permute
-module from CPAN runs at least an order of magnitude faster. If you don't
-have a C compiler (or a binary distribution of Algorithm::Permute), then
-you can use List::Permutor which is written in pure Perl, and is still
-several times faster than the algorithm above.
+ permute {print"@_\n"} split;
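The callback passed to permute() must return a true value for the loop to continue, which print happens to do. The sketch below repeats the sub so that it runs on its own and collects the permutations into an array instead of printing them; C<@seen> is an illustrative name, and the list passed in is assumed to be non-empty.

```perl
use strict;
use warnings;

# Same generator as above, repeated so this example is self-contained.
sub permute (&@) {
    my $code = shift;
    my @idx  = 0 .. $#_;
    while ( $code->( @_[@idx] ) ) {
        my $p = $#idx;
        --$p while $idx[ $p - 1 ] > $idx[$p];
        my $q = $p or return;
        push @idx, reverse splice @idx, $p;
        ++$q while $idx[ $p - 1 ] > $idx[$q];
        @idx[ $p - 1, $q ] = @idx[ $q, $p - 1 ];
    }
}

my @seen;

# The trailing 1 keeps the callback true so the generator keeps going.
permute { push @seen, "@_"; 1 } 1, 2, 3;

print scalar(@seen), " permutations\n";
print "$_\n" for @seen;
```

Returning a false value from the block is a simple way to stop early, for example after the first permutation that satisfies some test.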
=head2 How do I sort an array by (anything)?
above.
See the F<sort> article in the "Far More Than You Ever Wanted
-To Know" collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz for
+To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
more about this approach.
See also the question below on sorting hashes.
Use the Tie::IxHash from CPAN.
use Tie::IxHash;
- tie(%myhash, Tie::IxHash);
- for ($i=0; $i<20; $i++) {
+ tie my %myhash, 'Tie::IxHash';
+ for (my $i=0; $i<20; $i++) {
$myhash{$i} = 2*$i;
}
- @keys = keys %myhash;
+ my @keys = keys %myhash;
# @keys = (0,1,2,3,...)
=head2 Why does passing a subroutine an undefined element in a hash create it?
On less elegant (read: Byzantine) systems, however, you have
to play tedious games with "text" versus "binary" files. See
-L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking
-systems are curses out of Microsoft, who seem to be committed to putting
-the backward into backward compatibility.
+L<perlfunc/"binmode"> or L<perlopentut>.
If you're concerned about 8-bit ASCII data, then see L<perllocale>.
You can also use the L<Data::Types|Data::Types> module on
the CPAN, which exports functions that validate data types
-using these and other regular expressions.
+using these and other regular expressions, or you can use
+the C<Regexp::Common> module from CPAN which has regular
+expressions to match various types of numbers.
If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a C<getnum>