=head1 NAME
-perlfaq4 - Data Manipulation ($Revision: 1.37 $, $Date: 2002/11/13 06:04:00 $)
+perlfaq4 - Data Manipulation ($Revision: 1.54 $, $Date: 2003/11/30 00:50:08 $)
=head1 DESCRIPTION
To limit the number of decimal places in your numbers, you
can use the printf or sprintf function. See the
-L<perlop|"Floating Point Arithmetic"> for more details.
+L<"Floating Point Arithmetic"|perlop> for more details.
printf "%.2f", 10/3;
-
+
my $number = sprintf "%.2f", 10/3;
-
+
+=head2 Why is int() broken?
+
+Your int() is most probably working just fine. It's the numbers that
+aren't quite what you think.
+
+First, see the above item "Why am I getting long decimals
+(eg, 19.9499999999999) instead of the numbers I should be getting
+(eg, 19.95)?".
+
+For example, this
+
+ print int(0.6/0.2-2), "\n";
+
+will on most computers print 0, not 1, because even such simple
+numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
+numbers. What you think of in the above as 'three' is really more like
+2.9999999999999995559.
+
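As a minimal sketch of the difference between truncating and rounding, the following compares int() with sprintf() on the same expression. The variable names are illustrative only, and the exact stored value depends on your platform's floating-point format.

```perl
use strict;
use warnings;

my $value     = 0.6/0.2 - 2;              # mathematically 1
my $truncated = int($value);              # drops the fractional part
my $rounded   = sprintf("%.0f", $value);  # rounds to the nearest integer

printf "stored value: %.17g\n", $value;
print  "int(): $truncated, sprintf(): $rounded\n";
```

If you want the nearest integer rather than the integer part, sprintf() is usually the safer choice.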
=head2 Why isn't my octal data interpreted correctly?
Perl only understands octal and hex numbers as such when they occur as
"%o" or "%O" sprintf() formats.
This problem shows up most often when people try using chmod(), mkdir(),
-umask(), or sysopen(), which by widespread tradition typically take
+umask(), or sysopen(), which by widespread tradition typically take
permissions in octal.
chmod(644, $file); # WRONG
chmod(0644, $file); # right
-Note the mistake in the first line was specifying the decimal literal
+Note the mistake in the first line was specifying the decimal literal
644, rather than the intended octal literal 0644. The problem can
be seen with:
Surely you had not intended C<chmod(01204, $file);> - did you? If you
want to use numeric literals as arguments to chmod() et al. then please
-try to express them as octal constants, that is with a leading zero and
+try to express them as octal constants, that is with a leading zero and
with the following digits restricted to the set 0..7.
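A short sketch of the pitfall, using illustrative variable names: printing both literals in octal shows what chmod() would actually receive, and oct() converts a mode that arrives as a string (for example from a configuration file).

```perl
use strict;
use warnings;

my $wrong = 644;     # decimal literal
my $right = 0644;    # octal literal, decimal 420

printf "644  is %o in octal\n", $wrong;    # 1204
printf "0644 is %o in octal\n", $right;    # 644

# A mode read as a string is not an octal literal; convert it with oct().
my $from_string = oct("644");              # 420, the same value as 0644
print "oct('644') is $from_string\n";
```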
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
- 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
+ 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
0.8 0.8 0.9 0.9 1.0 1.0
Don't blame Perl. It's the same as in C. IEEE says we have to do this.
machines) will work pretty much like mathematical integers. Other numbers
are not guaranteed.
-=head2 How do I convert between numeric representations?
+=head2 How do I convert between numeric representations/bases/radixes?
As always with Perl there is more than one way to do it. Below
are a few examples of approaches to making common conversions
Using perl's built in conversion of 0x notation:
- $int = 0xDEADBEEF;
- $dec = sprintf("%d", $int);
+ $dec = 0xDEADBEEF;
Using the hex function:
- $int = hex("DEADBEEF");
- $dec = sprintf("%d", $int);
+ $dec = hex("DEADBEEF");
Using pack:
- $int = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
- $dec = sprintf("%d", $int);
+ $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
Using the CPAN module Bit::Vector:
=item How do I convert from decimal to hexadecimal
-Using sprint:
+Using sprintf:
- $hex = sprintf("%X", 3735928559);
+ $hex = sprintf("%X", 3735928559); # upper case A-F
+ $hex = sprintf("%x", 3735928559); # lower case a-f
-Using unpack
+Using unpack:
$hex = unpack("H*", pack("N", 3735928559));
-Using Bit::Vector
+Using Bit::Vector:
use Bit::Vector;
$vec = Bit::Vector->new_Dec(32, -559038737);
Using Perl's built in conversion of numbers with leading zeros:
- $int = 033653337357; # note the leading 0!
- $dec = sprintf("%d", $int);
+ $dec = 033653337357; # note the leading 0!
Using the oct function:
- $int = oct("33653337357");
- $dec = sprintf("%d", $int);
+ $dec = oct("33653337357");
Using Bit::Vector:
$oct = sprintf("%o", 3735928559);
-Using Bit::Vector
+Using Bit::Vector:
use Bit::Vector;
$vec = Bit::Vector->new_Dec(32, -559038737);
Perl 5.6 lets you write binary numbers directly with
the 0b notation:
- $number = 0b10110110;
+ $number = 0b10110110;
+
+Using oct:
-Using pack and ord
+ my $input = "10110110";
+ $decimal = oct( "0b$input" );
+
+Using pack and ord:
$decimal = ord(pack('B8', '10110110'));
-Using pack and unpack for larger strings
+Using pack and unpack for larger strings:
$int = unpack("N", pack("B32",
substr("0" x 32 . "11110101011011011111011101111", -32)));
=item How do I convert from decimal to binary
-Using unpack;
+Using sprintf (perl 5.6+):
+
+ $bin = sprintf("%b", 3735928559);
+
+Using unpack:
$bin = unpack("B*", pack("N", 3735928559));
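The conversions above can be checked against each other with a round trip. This is only a sketch with illustrative variable names: it formats one number in each base with sprintf() and converts it back with hex() and oct(). The C<%b> format assumes perl 5.6 or later.

```perl
use strict;
use warnings;

my $number = 3735928559;

my $hex = sprintf("%x", $number);    # deadbeef
my $oct = sprintf("%o", $number);    # 33653337357
my $bin = sprintf("%b", $number);    # 32 binary digits

# hex() and oct() turn the strings back into numbers.
# oct() also understands a leading "0b" (and "0x").
my $from_hex = hex($hex);
my $from_oct = oct($oct);
my $from_bin = oct("0b$bin");

print "$hex $oct $bin\n";
print "round trip ok\n"
    if $from_hex == $number && $from_oct == $number && $from_bin == $number;
```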
If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.
- BEGIN { srand() if $[ < 5.004 }
+ BEGIN { srand() if $] < 5.004 }
5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once---you make your numbers less random, rather
=head2 How do I get a random number between X and Y?
-Use the following simple function. It selects a random integer between
-(and possibly including!) the two given integers, e.g.,
-C<random_int_in(50,120)>
+C<rand($x)> returns a number such that
+C<< 0 <= rand($x) < $x >>. Thus what you want to have perl
+figure out is a random number in the range from 0 to the
+difference between your I<X> and I<Y>.
+
+That is, to get a number between 10 and 15, inclusive, you
+want a random number between 0 and 5 that you can then add
+to 10.
+
+ my $number = 10 + int rand( 15-10+1 );
+
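One way to convince yourself that the formula covers both endpoints and nothing outside them is to draw many numbers and record which values turn up. This is a sketch only; the hash C<%seen> and the sample size are arbitrary choices.

```perl
use strict;
use warnings;

my ($x, $y) = (10, 15);
my %seen;

# Draw many numbers and count how often each value appears.
for (1 .. 10_000) {
    my $number = $x + int rand( $y - $x + 1 );
    $seen{$number}++;
}

print join(" ", sort { $a <=> $b } keys %seen), "\n";
```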
+Hence you derive the following simple function to abstract
+that. It selects a random integer between the two given
+integers (inclusive). For example: C<random_int_in(50,120)>.
sub random_int_in ($$) {
my($min, $max) = @_;
=head1 Data: Dates
-=head2 How do I find the week-of-the-year/day-of-the-year?
+=head2 How do I find the day or week of the year?
+
+The localtime function returns the day of the year. Without an
+argument localtime uses the current time.
-The day of the year is in the array returned by localtime() (see
-L<perlfunc/"localtime">):
+ $day_of_year = (localtime)[7];
- $day_of_year = (localtime(time()))[7];
+The POSIX module can also format a date as the day of the year or
+week of the year.
+
+ use POSIX qw/strftime/;
+ my $day_of_year = strftime "%j", localtime;
+ my $week_of_year = strftime "%W", localtime;
+
+To get the day or week of the year for any date, use the Time::Local
+module to get a time in epoch seconds for the argument to localtime.
+
+ use POSIX qw/strftime/;
+ use Time::Local;
+ my $week_of_year = strftime "%W",
+ localtime( timelocal( 0, 0, 0, 18, 11, 1987 ) );
+
+The Date::Calc module provides two functions to calculate these.
+
+ use Date::Calc qw(Day_of_Year Week_of_Year);
+ my $day_of_year = Day_of_Year( 1987, 12, 18 );
+ my $week_of_year = Week_of_Year( 1987, 12, 18 );
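Note that the two core approaches count differently: element 7 of localtime's list is zero-based (January 1st is 0), while strftime's C<%j> is one-based. The sketch below shows both for the same date, using only modules shipped with Perl. Noon is chosen here merely to stay clear of daylight-saving transitions.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local;

# Noon on 18 December 1987 (months are 0-based, so 11 is December).
my @date = localtime( timelocal( 0, 0, 12, 18, 11, 1987 ) );

my $yday        = $date[7];                # 0-based day of the year
my $day_of_year = strftime "%j", @date;    # 1-based, zero padded

print "localtime element 7: $yday, strftime %j: $day_of_year\n";
```

Add 1 to the localtime value if you need it to agree with C<%j>.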
=head2 How do I find the current century or millennium?
Use the following simple functions:
- sub get_century {
+ sub get_century {
return int((((localtime(shift || time))[5] + 1999))/100);
- }
- sub get_millennium {
+ }
+ sub get_millennium {
return 1+int((((localtime(shift || time))[5] + 1899))/1000);
- }
-
-You can also use the POSIX strftime() function which may be a bit
-slower but is easier to read and maintain.
-
- use POSIX qw/strftime/;
-
- my $week_of_the_year = strftime "%W", localtime;
- my $day_of_the_year = strftime "%j", localtime;
+ }
On some systems, the POSIX module's strftime() function has
been extended in a non-standard way to use a C<%C> format,
can use the Date::Calc module.
use Date::Calc qw(Today Add_Delta_Days);
-
+
my @date = Add_Delta_Days( Today(), -1 );
-
+
print "@date\n";
Most people try to use the time rather than the calendar to
my $tdst = (localtime $then)[8] > 0;
$then - ($tdst - $ndst) * 60 * 60;
}
-
+
Should give you "this time yesterday" in seconds since epoch relative to
the first argument or the current time if no argument is given and
suitable for passing to localtime or whatever else you need to do with
while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
# do something with $1
- }
+ }
A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
You can access the first characters of a string with substr().
To get the first character, for example, start at position 0
-and grab the string of length 1.
+and grab the string of length 1.
$string = "Just another Perl Hacker";
argument which is the replacement string.
substr( $string, 13, 4, "Perl 5.8.0" );
-
+
You can also use substr() as an lvalue.
substr( $string, 13, 4 ) = "Perl 5.8.0";
-
+
=head2 How do I change the Nth occurrence of something?
You have to keep track of N yourself. For example, let's say you want
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.
+Damian Conway's L<Text::Autoformat> module provides some smart
+case transformations:
+
+ use Text::Autoformat;
+ my $x = "Dr. Strangelove or: How I Learned to Stop ".
+ "Worrying and Love the Bomb";
+
+ print $x, "\n";
+ for my $style (qw( sentence title highlight ))
+ {
+ print autoformat($x, { case => $style }), "\n";
+ }
+
=head2 How can I split a [character] delimited string except when inside [character]?
Several modules can handle this sort of parsing---Text::Balanced,
SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex
-problem. Thankfully, we have Jeffrey Friedl, author of
+problem. Thankfully, we have Jeffrey Friedl, author of
I<Mastering Regular Expressions>, to handle these for us. He
suggests (assuming your string is contained in $text):
This idiom takes advantage of the C<foreach> loop's aliasing
behavior to factor out common code. You can do this
-on several strings at once, or arrays, or even the
+on several strings at once, or arrays, or even the
values of a hash if you use a slice:
- # trim whitespace in the scalar, the array,
+ # trim whitespace in the scalar, the array,
# and all the values in the hash
foreach ($scalar, @array, @hash{keys %hash}) {
s/^\s+//;
=head2 How do I pad a string with blanks or pad a number with zeroes?
-(This answer contributed by Uri Guttman, with kibitzing from
-Bart Lateur.)
-
In the following examples, C<$pad_len> is the length to which you wish
to pad the string, C<$text> or C<$num> contains the string to be padded,
and C<$pad_char> contains the padding character. You can use a single
C<$pad_len>.
# Left padding a string with blanks (no truncation):
- $padded = sprintf("%${pad_len}s", $text);
+ $padded = sprintf("%${pad_len}s", $text);
+ $padded = sprintf("%*s", $pad_len, $text); # same thing
# Right padding a string with blanks (no truncation):
- $padded = sprintf("%-${pad_len}s", $text);
+ $padded = sprintf("%-${pad_len}s", $text);
+ $padded = sprintf("%-*s", $pad_len, $text); # same thing
- # Left padding a number with 0 (no truncation):
- $padded = sprintf("%0${pad_len}d", $num);
+ # Left padding a number with 0 (no truncation):
+ $padded = sprintf("%0${pad_len}d", $num);
+ $padded = sprintf("%0*d", $pad_len, $num); # same thing
# Right padding a string with blanks using pack (will truncate):
$padded = pack("A$pad_len",$text);
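To make the sprintf() forms concrete, here is a small sketch with sample values (the values themselves are arbitrary). The C<*> in each format takes the width from the argument list, so no variable needs to be interpolated into the format string.

```perl
use strict;
use warnings;

my ($pad_len, $text, $num) = (8, "perl", 42);

my $left  = sprintf("%*s",  $pad_len, $text);    # "    perl"
my $right = sprintf("%-*s", $pad_len, $text);    # "perl    "
my $zero  = sprintf("%0*d", $pad_len, $num);     # "00000042"

print "[$left] [$right] [$zero]\n";
```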
=head2 How do I extract selected columns from a string?
Use substr() or unpack(), both documented in L<perlfunc>.
-If you prefer thinking in terms of columns instead of widths,
+If you prefer thinking in terms of columns instead of widths,
you can use this kind of thing:
# determine the unpack format needed to split Linux ps output
# arguments are cut columns
my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
- sub cut2fmt {
+ sub cut2fmt {
my(@positions) = @_;
my $template = '';
my $lastpos = 1;
for my $place (@positions) {
- $template .= "A" . ($place - $lastpos) . " ";
+ $template .= "A" . ($place - $lastpos) . " ";
$lastpos = $place;
}
$template .= "A*";
It's probably better in the general case to treat those
variables as entries in some special hash. For example:
- %user_defs = (
+ %user_defs = (
foo => 23,
bar => 19,
);
The problem is that those double-quotes force stringification--
coercing numbers and references into strings--even when you
don't want them to be strings. Think of it this way: double-quote
-expansion is used to produce new strings. If you already
+expansion is used to produce new strings. If you already
have a string, why do you need more?
If you get used to writing odd things like these:
number, such as the magical C<++> autoincrement operator or the
syscall() function.
-Stringification also destroys arrays.
+Stringification also destroys arrays.
@lines = `command`;
print "@lines"; # WRONG - extra blanks
print @lines; # right
-=head2 Why don't my <<HERE documents work?
+=head2 Why don't my E<lt>E<lt>HERE documents work?
Check for these three things:
=over 4
-=item 1. There must be no space after the << part.
+=item There must be no space after the E<lt>E<lt> part.
-=item 2. There (probably) should be a semicolon at the end.
+=item There (probably) should be a semicolon at the end.
-=item 3. You can't (easily) have any space in front of the tag.
+=item You can't (easily) have any space in front of the tag.
=back
-If you want to indent the text in the here document, you
+If you want to indent the text in the here document, you
can do this:
# all in one
HERE_TARGET
But the HERE_TARGET must still be flush against the margin.
-If you want that indented also, you'll have to quote
+If you want that indented also, you'll have to quote
in the indentation.
($quote = <<' FINIS') =~ s/^\s+//gm;
@bad[0] = `same program that outputs several lines`;
-The C<use warnings> pragma and the B<-w> flag will warn you about these
+The C<use warnings> pragma and the B<-w> flag will warn you about these
matters.
=head2 How can I remove duplicate elements from a list or array?
@a = @b = ( "this", "that", [ "more", "stuff" ] );
printf "a and b contain %s arrays\n",
- cmpStr(\@a, \@b) == 0
- ? "the same"
+ cmpStr(\@a, \@b) == 0
+ ? "the same"
: "different";
This approach also works for comparing hashes. Here
%a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
$a{EXTRA} = \%b;
- $b{EXTRA} = \%a;
+ $b{EXTRA} = \%a;
printf "a and b contain %s hashes\n",
cmpStr(\%a, \%b) == 0 ? "the same" : "different";
Perl 5.8. This example finds the first element that contains "Perl".
use List::Util qw(first);
-
+
my $element = first { /Perl/ } @array;
-
+
If you cannot use List::Util, you can make your own loop to do the
same thing. Once you find the element, you stop the loop with last.
and check the array element at each index until you find one
that satisfies the condition.
- my( $found, $i ) = ( undef, -1 );
- for( $i = 0; $i < @array; $i++ )
+ my( $found, $index ) = ( undef, -1 );
+ for( my $i = 0; $i < @array; $i++ )
{
- if( $array[$i] =~ /Perl/ )
- {
+ if( $array[$i] =~ /Perl/ )
+ {
$found = $array[$i];
- $index = $i;
+ $index = $i;
last;
}
}
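If List::Util is available, the same index search can also be written with C<first> over the list of indices instead of the elements. This is a sketch with sample data; the array contents are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ("awk", "sed", "Perl 5", "Perl 6");

# Search the indices rather than the elements, so the position is kept.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

if ( defined $index ) {
    print "found '$array[$index]' at index $index\n";
}
```

Test the result with defined(), since a match at index 0 is a false value.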
$_ **= 3;
$_ *= (4/3) * 3.14159; # this will be constant folded
}
-
+
which can also be done with map() which is made to transform
one list into another:
case), you modify the value.
for $orbit ( values %orbits ) {
- ($orbit **= 3) *= (4/3) * 3.14159;
+ ($orbit **= 3) *= (4/3) * 3.14159;
}
Prior to perl 5.6 C<values> returned copies of the values,
Use the rand() function (see L<perlfunc/rand>):
- # at the top of the program:
- srand; # not needed for 5.004 and later
-
- # then later on
$index = rand @array;
$element = $array[$index];
-Make sure you I<only call srand once per program, if then>.
-If you are calling it more than once (such as before each
-call to rand), you're almost certainly doing something wrong.
+Or, simply:
+ my $element = $array[ rand @array ];
=head2 How do I permute N elements of a list?
print "next permutation: (@perm)\n";
}
+For even faster execution, you could do:
+
+ use Algorithm::Permute;
+ my @array = 'a'..'d';
+ Algorithm::Permute::permute {
+ print "next permutation: (@array)\n";
+ } @array;
+
Here's a little program that generates all permutations of
all the words on each line of input. The algorithm embodied
in the permute() function is discussed in Volume 4 (still
This can be conveniently combined with precalculation of keys as given
above.
-See the F<sort> artitcle article in the "Far More Than You Ever Wanted
+See the F<sort> article in the "Far More Than You Ever Wanted
To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
more about this approach.
@ints = $vector->Index_List_Read();
Bit::Vector provides efficient methods for bit vector, sets of small integers
-and "big int" math.
+and "big int" math.
Here's a more extensive illustration using vec():
# vec demo
$vector = "\xff\x0f\xef\xfe";
- print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
+ print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
unpack("N", $vector), "\n";
$is_set = vec($vector, 23, 1);
print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
set_vec(0,32,17);
set_vec(1,32,17);
- sub set_vec {
+ sub set_vec {
my ($offset, $width, $value) = @_;
my $vector = '';
vec($vector, $offset, $width) = $value;
print "vector length in bytes: ", length($vector), "\n";
@bytes = unpack("A8" x length($vector), $bits);
print "bits are: @bytes\n\n";
- }
+ }
=head2 Why does defined() return true on empty arrays and hashes?
$num_keys = keys %hash;
-The keys() function also resets the iterator, which means that you may
-see strange results if you use this between uses of other hash operators
+The keys() function also resets the iterator, which means that you may
+see strange results if you use this between uses of other hash operators
such as each().
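The iterator reset can be seen directly in a short sketch (the hash contents are arbitrary): after keys() is called, each() starts again from the first pair.

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

my ($first) = each %hash;    # advances the iterator by one pair
my $num_keys = keys %hash;   # counts the keys and resets the iterator
my ($again) = each %hash;    # starts from the beginning again

print "first: $first, after keys(): $again, count: $num_keys\n";

keys %hash;                  # reset once more before any later loop
```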
=head2 How do I sort a hash (optionally by value instead of key)?
Use the Tie::IxHash from CPAN.
use Tie::IxHash;
- tie my %myhash, Tie::IxHash;
+ tie my %myhash, 'Tie::IxHash';
for (my $i=0; $i<20; $i++) {
$myhash{$i} = 2*$i;
}
if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
{ print "a C float\n" }
-You can also use the L<Data::Types|Data::Types> module on
-the CPAN, which exports functions that validate data types
-using these and other regular expressions, or you can use
-the C<Regexp::Common> module from CPAN which has regular
-expressions to match various types of numbers.
-
-If you're on a POSIX system, Perl's supports the C<POSIX::strtod>
+There are also some commonly used modules for the task.
+L<Scalar::Util> (distributed with 5.8) provides access to perl's
+internal function C<looks_like_number> for determining
+whether a variable looks like a number. L<Data::Types>
+exports functions that validate data types using both the
+above and other regular expressions. Thirdly, there is
+C<Regexp::Common> which has regular expressions to match
+various types of numbers. Those three modules are available
+from the CPAN.
+
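As a minimal sketch of the Scalar::Util approach, the loop below classifies a few sample strings. The sample values are illustrative; note that a hexadecimal-looking string such as "0x1F" is not treated as a number.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

for my $candidate ( "42", "-3.14", "1e5", "0x1F", "abc" ) {
    my $verdict = looks_like_number($candidate) ? "number" : "not a number";
    print "'$candidate' is $verdict\n";
}
```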
+If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a C<getnum>
wrapper function for more convenient access. This function takes
a string and returns the number it found, or C<undef> for input that
return undef;
} else {
return $num;
- }
- }
+ }
+ }
- sub is_numeric { defined getnum($_[0]) }
+ sub is_numeric { defined getnum($_[0]) }
-Or you could check out the L<String::Scanf|String::Scanf> module on the CPAN
+Or you could check out the L<String::Scanf> module on the CPAN
instead. The POSIX module (part of the standard Perl distribution) provides
the C<strtod> and C<strtol> for converting strings to double and longs,
respectively.
of the standard distribution. Here's one example using Storable's C<store>
and C<retrieve> functions:
- use Storable;
+ use Storable;
store(\%hash, "filename");
- # later on...
+ # later on...
$href = retrieve("filename"); # by ref
%hash = %{ retrieve("filename") }; # direct to hash
=head2 How do I print out or copy a recursive data structure?
The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
-for printing out data structures. The Storable module, found on CPAN,
-provides a function called C<dclone> that recursively copies its argument.
+for printing out data structures. The Storable module on CPAN (or the
+5.8 release of Perl) provides a function called C<dclone> that recursively
+copies its argument.
- use Storable qw(dclone);
+ use Storable qw(dclone);
$r2 = dclone($r1);
Where $r1 can be a reference to any kind of data structure you'd like.
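A brief sketch of what "recursively copies" means in practice, with a made-up structure: changing the nested array in the copy leaves the original untouched, which a plain hash copy would not guarantee.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $r1 = { name => "original", list => [ 1, 2, 3 ] };
my $r2 = dclone($r1);

# The nested array was copied too, so this does not affect $r1.
push @{ $r2->{list} }, 4;

print "original has ", scalar @{ $r1->{list} }, " elements\n";
print "copy has ",     scalar @{ $r2->{list} }, " elements\n";
```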