=head1 NAME
-perlfaq4 - Data Manipulation ($Revision: 3606 $)
+perlfaq4 - Data Manipulation ($Revision: 6816 $)
=head1 DESCRIPTION
=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
-Internally, your computer represents floating-point numbers
-in binary. Digital (as in powers of two) computers cannot
-store all numbers exactly. Some real numbers lose precision
-in the process. This is a problem with how computers store
-numbers and affects all computer languages, not just Perl.
+Internally, your computer represents floating-point numbers in binary.
+Digital (as in powers of two) computers cannot store all numbers
+exactly. Some real numbers lose precision in the process. This is a
+problem with how computers store numbers and affects all computer
+languages, not just Perl.
-L<perlnumber> show the gory details of number
-representations and conversions.
+L<perlnumber> show the gory details of number representations and
+conversions.
-To limit the number of decimal places in your numbers, you
-can use the printf or sprintf function. See the
-L<"Floating Point Arithmetic"|perlop> for more details.
+To limit the number of decimal places in your numbers, you can use the
+printf or sprintf function. See the L<"Floating Point
+Arithmetic"|perlop> for more details.
printf "%.2f", 10/3;
=head2 Why is int() broken?
-Your int() is most probably working just fine. It's the numbers that
+Your C<int()> is most probably working just fine. It's the numbers that
aren't quite what you think.
-First, see the above item "Why am I getting long decimals
+First, see the answer to "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".
For example, this
- print int(0.6/0.2-2), "\n";
+ print int(0.6/0.2-2), "\n";
will in most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be presented exactly by floating-point
Perl only understands octal and hex numbers as such when they occur as
literals in your program. Octal literals in perl must start with a
-leading "0" and hexadecimal literals must start with a leading "0x".
+leading C<0> and hexadecimal literals must start with a leading C<0x>.
If they are read in from somewhere and assigned, no automatic
-conversion takes place. You must explicitly use oct() or hex() if you
-want the values converted to decimal. oct() interprets hex ("0x350"),
-octal ("0350" or even without the leading "0", like "377") and binary
-("0b1010") numbers, while hex() only converts hexadecimal ones, with
-or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
+conversion takes place. You must explicitly use C<oct()> or C<hex()> if you
+want the values converted to decimal. C<oct()> interprets hexadecimal (C<0x350>),
+octal (C<0350> or even without the leading C<0>, like C<377>) and binary
+(C<0b1010>) numbers, while C<hex()> only converts hexadecimal ones, with
+or without a leading C<0x>, such as C<0x255>, C<3A>, C<ff>, or C<deadbeef>.
The inverse mapping from decimal to octal can be done with either the
-"%o" or "%O" sprintf() formats.
+<%o> or C<%O> C<sprintf()> formats.
-This problem shows up most often when people try using chmod(), mkdir(),
-umask(), or sysopen(), which by widespread tradition typically take
-permissions in octal.
+This problem shows up most often when people try using C<chmod()>,
+C<mkdir()>, C<umask()>, or C<sysopen()>, which by widespread tradition
+typically take permissions in octal.
- chmod(644, $file); # WRONG
- chmod(0644, $file); # right
+ chmod(644, $file); # WRONG
+ chmod(0644, $file); # right
Note the mistake in the first line was specifying the decimal literal
-644, rather than the intended octal literal 0644. The problem can
+C<644>, rather than the intended octal literal C<0644>. The problem can
be seen with:
- printf("%#o",644); # prints 01204
+ printf("%#o",644); # prints 01204
Surely you had not intended C<chmod(01204, $file);> - did you? If you
want to use numeric literals as arguments to chmod() et al. then please
try to express them as octal constants, that is with a leading zero and
-with the following digits restricted to the set 0..7.
+with the following digits restricted to the set C<0..7>.
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
-Remember that int() merely truncates toward 0. For rounding to a
-certain number of digits, sprintf() or printf() is usually the easiest
-route.
+Remember that C<int()> merely truncates toward 0. For rounding to a
+certain number of digits, C<sprintf()> or C<printf()> is usually the
+easiest route.
- printf("%.3f", 3.1415926535); # prints 3.142
+ printf("%.3f", 3.1415926535); # prints 3.142
-The POSIX module (part of the standard Perl distribution) implements
-ceil(), floor(), and a number of other mathematical and trigonometric
-functions.
+The C<POSIX> module (part of the standard Perl distribution)
+implements C<ceil()>, C<floor()>, and a number of other mathematical
+and trigonometric functions.
- use POSIX;
- $ceil = ceil(3.5); # 4
- $floor = floor(3.5); # 3
+ use POSIX;
+ $ceil = ceil(3.5); # 4
+ $floor = floor(3.5); # 3
-In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
-module. With 5.004, the Math::Trig module (part of the standard Perl
+In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
+module. With 5.004, the C<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
-uses the Math::Complex module and some functions can break out from
+uses the C<Math::Complex> module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.
To see why, notice how you'll still have an issue on half-way-point
alternation:
- for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
+ for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
- 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
- 0.8 0.8 0.9 0.9 1.0 1.0
+ 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
+ 0.8 0.8 0.9 0.9 1.0 1.0
-Don't blame Perl. It's the same as in C. IEEE says we have to do this.
-Perl numbers whose absolute values are integers under 2**31 (on 32 bit
-machines) will work pretty much like mathematical integers. Other numbers
-are not guaranteed.
+Don't blame Perl. It's the same as in C. IEEE says we have to do
+this. Perl numbers whose absolute values are integers under 2**31 (on
+32 bit machines) will work pretty much like mathematical integers.
+Other numbers are not guaranteed.
=head2 How do I convert between numeric representations/bases/radixes?
-As always with Perl there is more than one way to do it. Below
-are a few examples of approaches to making common conversions
-between number representations. This is intended to be representational
-rather than exhaustive.
+As always with Perl there is more than one way to do it. Below are a
+few examples of approaches to making common conversions between number
+representations. This is intended to be representational rather than
+exhaustive.
-Some of the examples below use the Bit::Vector module from CPAN.
-The reason you might choose Bit::Vector over the perl built in
-functions is that it works with numbers of ANY size, that it is
-optimized for speed on some operations, and for at least some
-programmers the notation might be familiar.
+Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
+module from CPAN. The reason you might choose C<Bit::Vector> over the
+perl built in functions is that it works with numbers of ANY size,
+that it is optimized for speed on some operations, and for at least
+some programmers the notation might be familiar.
=over 4
=item How do I convert hexadecimal into decimal
-Using perl's built in conversion of 0x notation:
+Using perl's built in conversion of C<0x> notation:
- $dec = 0xDEADBEEF;
+ $dec = 0xDEADBEEF;
-Using the hex function:
+Using the C<hex> function:
- $dec = hex("DEADBEEF");
+ $dec = hex("DEADBEEF");
-Using pack:
+Using C<pack>:
- $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
+ $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
-Using the CPAN module Bit::Vector:
+Using the CPAN module C<Bit::Vector>:
- use Bit::Vector;
- $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
- $dec = $vec->to_Dec();
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
+ $dec = $vec->to_Dec();
=item How do I convert from decimal to hexadecimal
-Using sprintf:
+Using C<sprintf>:
- $hex = sprintf("%X", 3735928559); # upper case A-F
- $hex = sprintf("%x", 3735928559); # lower case a-f
+ $hex = sprintf("%X", 3735928559); # upper case A-F
+ $hex = sprintf("%x", 3735928559); # lower case a-f
-Using unpack:
+Using C<unpack>:
- $hex = unpack("H*", pack("N", 3735928559));
+ $hex = unpack("H*", pack("N", 3735928559));
-Using Bit::Vector:
+Using C<Bit::Vector>:
- use Bit::Vector;
- $vec = Bit::Vector->new_Dec(32, -559038737);
- $hex = $vec->to_Hex();
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(32, -559038737);
+ $hex = $vec->to_Hex();
-And Bit::Vector supports odd bit counts:
+And C<Bit::Vector> supports odd bit counts:
- use Bit::Vector;
- $vec = Bit::Vector->new_Dec(33, 3735928559);
- $vec->Resize(32); # suppress leading 0 if unwanted
- $hex = $vec->to_Hex();
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(33, 3735928559);
+ $vec->Resize(32); # suppress leading 0 if unwanted
+ $hex = $vec->to_Hex();
=item How do I convert from octal to decimal
Using Perl's built in conversion of numbers with leading zeros:
- $dec = 033653337357; # note the leading 0!
+ $dec = 033653337357; # note the leading 0!
-Using the oct function:
+Using the C<oct> function:
- $dec = oct("33653337357");
+ $dec = oct("33653337357");
-Using Bit::Vector:
+Using C<Bit::Vector>:
- use Bit::Vector;
- $vec = Bit::Vector->new(32);
- $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
- $dec = $vec->to_Dec();
+ use Bit::Vector;
+ $vec = Bit::Vector->new(32);
+ $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
+ $dec = $vec->to_Dec();
=item How do I convert from decimal to octal
-Using sprintf:
+Using C<sprintf>:
- $oct = sprintf("%o", 3735928559);
+ $oct = sprintf("%o", 3735928559);
-Using Bit::Vector:
+Using C<Bit::Vector>:
- use Bit::Vector;
- $vec = Bit::Vector->new_Dec(32, -559038737);
- $oct = reverse join('', $vec->Chunk_List_Read(3));
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(32, -559038737);
+ $oct = reverse join('', $vec->Chunk_List_Read(3));
=item How do I convert from binary to decimal
Perl 5.6 lets you write binary numbers directly with
-the 0b notation:
+the C<0b> notation:
- $number = 0b10110110;
+ $number = 0b10110110;
-Using oct:
+Using C<oct>:
- my $input = "10110110";
- $decimal = oct( "0b$input" );
+ my $input = "10110110";
+ $decimal = oct( "0b$input" );
-Using pack and ord:
+Using C<pack> and C<ord>:
- $decimal = ord(pack('B8', '10110110'));
+ $decimal = ord(pack('B8', '10110110'));
-Using pack and unpack for larger strings:
+Using C<pack> and C<unpack> for larger strings:
- $int = unpack("N", pack("B32",
+ $int = unpack("N", pack("B32",
substr("0" x 32 . "11110101011011011111011101111", -32)));
- $dec = sprintf("%d", $int);
+ $dec = sprintf("%d", $int);
- # substr() is used to left pad a 32 character string with zeros.
+ # substr() is used to left pad a 32 character string with zeros.
-Using Bit::Vector:
+Using C<Bit::Vector>:
- $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
- $dec = $vec->to_Dec();
+ $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
+ $dec = $vec->to_Dec();
=item How do I convert from decimal to binary
-Using sprintf (perl 5.6+):
+Using C<sprintf> (perl 5.6+):
- $bin = sprintf("%b", 3735928559);
+ $bin = sprintf("%b", 3735928559);
-Using unpack:
+Using C<unpack>:
- $bin = unpack("B*", pack("N", 3735928559));
+ $bin = unpack("B*", pack("N", 3735928559));
-Using Bit::Vector:
+Using C<Bit::Vector>:
- use Bit::Vector;
- $vec = Bit::Vector->new_Dec(32, -559038737);
- $bin = $vec->to_Bin();
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(32, -559038737);
+ $bin = $vec->to_Bin();
The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.
they have a number but really it's a string. The rest arise because
the programmer says:
- if ("\020\020" & "\101\101") {
- # ...
- }
+ if ("\020\020" & "\101\101") {
+ # ...
+ }
but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:
- if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
- # ...
- }
+ if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
+ # ...
+ }
=head2 How do I multiply matrices?
To call a function on each element in an array, and collect the
results, use:
- @results = map { my_func($_) } @array;
+ @results = map { my_func($_) } @array;
For example:
- @triple = map { 3 * $_ } @single;
+ @triple = map { 3 * $_ } @single;
To call a function on each element of an array, but ignore the
results:
- foreach $iterator (@array) {
- some_func($iterator);
- }
+ foreach $iterator (@array) {
+ some_func($iterator);
+ }
To call a function on each integer in a (small) range, you B<can> use:
- @results = map { some_func($_) } (5 .. 25);
+ @results = map { some_func($_) } (5 .. 25);
but you should be aware that the C<..> operator creates an array of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:
- @results = ();
- for ($i=5; $i < 500_005; $i++) {
- push(@results, some_func($i));
- }
+ @results = ();
+ for ($i=5; $i < 500_005; $i++) {
+ push(@results, some_func($i));
+ }
This situation has been fixed in Perl5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.
- for my $i (5 .. 500_005) {
- push(@results, some_func($i));
- }
+ for my $i (5 .. 500_005) {
+ push(@results, some_func($i));
+ }
will not create a list of 500,000 integers.
BEGIN { srand() if $] < 5.004 }
5.004 and later automatically call C<srand> at the beginning. Don't
-call C<srand> more than once---you make your numbers less random, rather
-than more.
+call C<srand> more than once--you make your numbers less random,
+rather than more.
Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). see the
F<random> article in the "Far More Than You Ever Wanted To Know"
-collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of
-Tom Phoenix, talks more about this. John von Neumann said, "Anyone
+collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy
+of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."
If you want numbers that are more random than C<rand> with C<srand>
-provides, you should also check out the Math::TrulyRandom module from
+provides, you should also check out the C<Math::TrulyRandom> module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
that. It selects a random integer between the two given
integers (inclusive), For example: C<random_int_between(50,120)>.
- sub random_int_between ($$) {
+ sub random_int_between {
my($min, $max) = @_;
# Assumes that the two arguments are integers themselves!
return $min if $min == $max;
$day_of_year = (localtime)[7];
-The POSIX module can also format a date as the day of the year or
+The C<POSIX> module can also format a date as the day of the year or
week of the year.
use POSIX qw/strftime/;
my $day_of_year = strftime "%j", localtime;
my $week_of_year = strftime "%W", localtime;
-To get the day of year for any date, use the Time::Local module to get
+To get the day of year for any date, use C<POSIX>'s C<mktime> to get
a time in epoch seconds for the argument to localtime.
- use POSIX qw/strftime/;
- use Time::Local;
+ use POSIX qw/mktime strftime/;
my $week_of_year = strftime "%W",
- localtime( timelocal( 0, 0, 0, 18, 11, 1987 ) );
+ localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
-The Date::Calc module provides two functions to calculate these.
+The C<Date::Calc> module provides two functions to calculate these.
use Date::Calc;
my $day_of_year = Day_of_Year( 1987, 12, 18 );
Use the following simple functions:
- sub get_century {
- return int((((localtime(shift || time))[5] + 1999))/100);
- }
+ sub get_century {
+ return int((((localtime(shift || time))[5] + 1999))/100);
+ }
- sub get_millennium {
- return 1+int((((localtime(shift || time))[5] + 1899))/1000);
- }
+ sub get_millennium {
+ return 1+int((((localtime(shift || time))[5] + 1899))/1000);
+ }
-On some systems, the POSIX module's strftime() function has
-been extended in a non-standard way to use a C<%C> format,
-which they sometimes claim is the "century". It isn't,
-because on most such systems, this is only the first two
-digits of the four-digit year, and thus cannot be used to
-reliably determine the current century or millennium.
+On some systems, the C<POSIX> module's C<strftime()> function has been
+extended in a non-standard way to use a C<%C> format, which they
+sometimes claim is the "century". It isn't, because on most such
+systems, this is only the first two digits of the four-digit year, and
+thus cannot be used to reliably determine the current century or
+millennium.
=head2 How can I compare two dates and find the difference?
(contributed by brian d foy)
-You could just store all your dates as a number and then subtract. Life
-isn't always that simple though. If you want to work with formatted
-dates, the Date::Manip, Date::Calc, or DateTime modules can help you.
-
+You could just store all your dates as a number and then subtract.
+Life isn't always that simple though. If you want to work with
+formatted dates, the C<Date::Manip>, C<Date::Calc>, or C<DateTime>
+modules can help you.
=head2 How can I take a string and turn it into epoch seconds?
If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
-Time::Local module. Otherwise, you should look into the Date::Calc
-and Date::Manip modules from CPAN.
+C<Time::Local> module. Otherwise, you should look into the C<Date::Calc>
+and C<Date::Manip> modules from CPAN.
=head2 How can I find the Julian Day?
(contributed by brian d foy and Dave Cross)
-You can use the Time::JulianDay module available on CPAN. Ensure that
-you really want to find a Julian day, though, as many people have
+You can use the C<Time::JulianDay> module available on CPAN. Ensure
+that you really want to find a Julian day, though, as many people have
different ideas about Julian days. See
http://www.hermetic.ch/cal_stud/jdn.htm for instance.
-You can also try the DateTime module, which can convert a date/time
+You can also try the C<DateTime> module, which can convert a date/time
to a Julian Day.
- $ perl -MDateTime -le'print DateTime->today->jd'
- 2453401.5
+ $ perl -MDateTime -le'print DateTime->today->jd'
+ 2453401.5
Or the modified Julian Day
- $ perl -MDateTime -le'print DateTime->today->mjd'
- 53401
+ $ perl -MDateTime -le'print DateTime->today->mjd'
+ 53401
Or even the day of the year (which is what some people think of as a
Julian day)
- $ perl -MDateTime -le'print DateTime->today->doy'
- 31
+ $ perl -MDateTime -le'print DateTime->today->doy'
+ 31
=head2 How do I find yesterday's date?
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.
-=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
+=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
-Y2K compliant (whatever that means). The programmers you've hired to
+Y2K compliant (whatever that means). The programmers you've hired to
use it, however, probably are not.
Long answer: The question belies a true understanding of the issue.
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with
- s/\\(.)/$1/g;
+ s/\\(.)/$1/g;
This won't expand C<"\n"> or C<"\t"> or any other special escapes.
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.
- s/(.)\1/$1/g;
+ s/(.)\1/$1/g;
We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
does not show up next to itself
my $str = 'Haarlem'; # in the Netherlands
- $str =~ tr///cs; # Now Harlem, like in New York
+ $str =~ tr///cs; # Now Harlem, like in New York
=head2 How do I expand function calls in a string?
In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.
- print "The time is " . localtime . ".\n";
+ print "The time is " . localtime() . ".\n";
=head2 How do I find matching/nesting anything?
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple ones, then something more like
-C</alpha(.*?)omega/> would be needed. But none of these deals with
+C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
L<perlre/(??{ code })>. For other cases, you'll have to write a
If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
-the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
-and the byacc program. Starting from perl 5.8 the Text::Balanced is
-part of the standard distribution.
+the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
+C<Text::Balanced>; and the C<byacc> program. Starting from perl 5.8
+the C<Text::Balanced> is part of the standard distribution.
One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:
- while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
- # do something with $1
- }
+ while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
+ # do something with $1
+ }
A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:
- # $_ contains the string to parse
- # BEGIN and END are the opening and closing markers for the
- # nested text.
+ # $_ contains the string to parse
+ # BEGIN and END are the opening and closing markers for the
+ # nested text.
- @( = ('(','');
- @) = (')','');
- ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
- @$ = (eval{/$re/},$@!~/unmatched/i);
- print join("\n",@$[0..$#$]) if( $$[-1] );
+ @( = ('(','');
+ @) = (')','');
+ ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
+ @$ = (eval{/$re/},$@!~/unmatched/i);
+ print join("\n",@$[0..$#$]) if( $$[-1] );
=head2 How do I reverse a string?
-Use reverse() in scalar context, as documented in
+Use C<reverse()> in scalar context, as documented in
L<perlfunc/reverse>.
- $reversed = reverse $string;
+ $reversed = reverse $string;
=head2 How do I expand tabs in a string?
You can do it yourself:
- 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
+ 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
-Or you can just use the Text::Tabs module (part of the standard Perl
+Or you can just use the C<Text::Tabs> module (part of the standard Perl
distribution).
- use Text::Tabs;
- @expanded_lines = expand(@lines_with_tabs);
+ use Text::Tabs;
+ @expanded_lines = expand(@lines_with_tabs);
=head2 How do I reformat a paragraph?
-Use Text::Wrap (part of the standard Perl distribution):
+Use C<Text::Wrap> (part of the standard Perl distribution):
- use Text::Wrap;
- print wrap("\t", ' ', @paragraphs);
+ use Text::Wrap;
+ print wrap("\t", ' ', @paragraphs);
-The paragraphs you give to Text::Wrap should not contain embedded
-newlines. Text::Wrap doesn't justify the lines (flush-right).
+The paragraphs you give to C<Text::Wrap> should not contain embedded
+newlines. C<Text::Wrap> doesn't justify the lines (flush-right).
-Or use the CPAN module Text::Autoformat. Formatting files can be easily
-done by making a shell alias, like so:
+Or use the CPAN module C<Text::Autoformat>. Formatting files can be
+easily done by making a shell alias, like so:
- alias fmt="perl -i -MText::Autoformat -n0777 \
- -e 'print autoformat $_, {all=>1}' $*"
+ alias fmt="perl -i -MText::Autoformat -n0777 \
+ -e 'print autoformat $_, {all=>1}' $*"
-See the documentation for Text::Autoformat to appreciate its many
+See the documentation for C<Text::Autoformat> to appreciate its many
capabilities.
=head2 How can I access or change N characters of a string?
$string = "Just another Perl Hacker";
- $first_char = substr( $string, 0, 1 ); # 'J'
+ $first_char = substr( $string, 0, 1 ); # 'J'
To change part of a string, you can use the optional fourth
argument which is the replacement string.
- substr( $string, 13, 4, "Perl 5.8.0" );
+ substr( $string, 13, 4, "Perl 5.8.0" );
You can also use substr() as an lvalue.
- substr( $string, 13, 4 ) = "Perl 5.8.0";
+ substr( $string, 13, 4 ) = "Perl 5.8.0";
=head2 How do I change the Nth occurrence of something?
C<"whosoever"> or C<"whomsoever">, case insensitively. These
all assume that $_ contains the string to be altered.
- $count = 0;
- s{((whom?)ever)}{
- ++$count == 5 # is it the 5th?
- ? "${2}soever" # yes, swap
- : $1 # renege and leave it there
- }ige;
+ $count = 0;
+ s{((whom?)ever)}{
+ ++$count == 5 # is it the 5th?
+ ? "${2}soever" # yes, swap
+ : $1 # renege and leave it there
+ }ige;
In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.
- $WANT = 3;
- $count = 0;
- $_ = "One fish two fish red fish blue fish";
- while (/(\w+)\s+fish\b/gi) {
- if (++$count == $WANT) {
- print "The third fish is a $1 one.\n";
- }
- }
+ $WANT = 3;
+ $count = 0;
+ $_ = "One fish two fish red fish blue fish";
+ while (/(\w+)\s+fish\b/gi) {
+ if (++$count == $WANT) {
+ print "The third fish is a $1 one.\n";
+ }
+ }
That prints out: C<"The third fish is a red one."> You can also use a
repetition count and repeated pattern like this:
- /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
+ /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
=head2 How can I count the number of occurrences of a substring within a string?
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:
- $string = "ThisXlineXhasXsomeXx'sXinXit";
- $count = ($string =~ tr/X//);
- print "There are $count X characters in the string";
+ $string = "ThisXlineXhasXsomeXx'sXinXit";
+ $count = ($string =~ tr/X//);
+ print "There are $count X characters in the string";
This is fine if you are just looking for a single character. However,
if you are trying to count multiple character substrings within a
loop around a global pattern match. For example, let's count negative
integers:
- $string = "-9 55 48 -2 23 -76 4 14 -44";
- while ($string =~ /-\d+/g) { $count++ }
- print "There are $count negative numbers in the string";
+ $string = "-9 55 48 -2 23 -76 4 14 -44";
+ while ($string =~ /-\d+/g) { $count++ }
+ print "There are $count negative numbers in the string";
Another version uses a global match in list context, then assigns the
result to a scalar, producing a count of the number of matches.
To make the first letter of each word upper case:
- $line =~ s/\b(\w)/\U$1/g;
+ $line =~ s/\b(\w)/\U$1/g;
This has the strange effect of turning "C<don't do it>" into "C<Don'T
Do It>". Sometimes you might want this. Other times you might need a
more thorough solution (Suggested by brian d foy):
- $string =~ s/ (
- (^\w) #at the beginning of the line
- | # or
- (\s\w) #preceded by whitespace
- )
- /\U$1/xg;
- $string =~ /([\w']+)/\u\L$1/g;
+ $string =~ s/ (
+ (^\w) #at the beginning of the line
+ | # or
+ (\s\w) #preceded by whitespace
+ )
+ /\U$1/xg;
+
+ $string =~ s/([\w']+)/\u\L$1/g;
To make the whole line upper case:
- $line = uc($line);
+ $line = uc($line);
To force each word to be lower case, with the first letter upper case:
- $line =~ s/(\w+)/\u\L$1/g;
+ $line =~ s/(\w+)/\u\L$1/g;
You can (and probably should) enable locale awareness of those
characters by placing a C<use locale> pragma in your program.
Damian Conway's L<Text::Autoformat> module provides some smart
case transformations:
- use Text::Autoformat;
- my $x = "Dr. Strangelove or: How I Learned to Stop ".
- "Worrying and Love the Bomb";
+ use Text::Autoformat;
+ my $x = "Dr. Strangelove or: How I Learned to Stop ".
+ "Worrying and Love the Bomb";
- print $x, "\n";
- for my $style (qw( sentence title highlight ))
- {
- print autoformat($x, { case => $style }), "\n";
- }
+ print $x, "\n";
+ for my $style (qw( sentence title highlight )) {
+ print autoformat($x, { case => $style }), "\n";
+ }
=head2 How can I split a [character] delimited string except when inside [character]?
-Several modules can handle this sort of pasing---Text::Balanced,
-Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
+Several modules can handle this sort of parsing--C<Text::Balanced>,
+C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.
Take the example case of trying to split a string that is
comma-separated into its different fields. You can't use C<split(/,/)>
because you shouldn't split if the comma is inside quotes. For
example, take a data line like this:
- SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
+ SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex
problem. Thankfully, we have Jeffrey Friedl, author of
I<Mastering Regular Expressions>, to handle these for us. He
-suggests (assuming your string is contained in $text):
+suggests (assuming your string is contained in C<$text>):
- @new = ();
- push(@new, $+) while $text =~ m{
- "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
- | ([^,]+),?
- | ,
- }gx;
- push(@new, undef) if substr($text,-1,1) eq ',';
+ @new = ();
+ push(@new, $+) while $text =~ m{
+ "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
+ | ([^,]+),?
+ | ,
+ }gx;
+ push(@new, undef) if substr($text,-1,1) eq ',';
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">.
-Alternatively, the Text::ParseWords module (part of the standard Perl
-distribution) lets you say:
-
- use Text::ParseWords;
- @new = quotewords(",", 0, $text);
+Alternatively, the C<Text::ParseWords> module (part of the standard
+Perl distribution) lets you say:
-There's also a Text::CSV (Comma-Separated Values) module on CPAN.
+ use Text::ParseWords;
+ @new = quotewords(",", 0, $text);
=head2 How do I strip blank space from the beginning/end of a string?
embedded newline, so it doesn't remove it. It still removes the
newline at the end of the string.
- $string =~ s/^\s+|\s+$//gm;
+ $string =~ s/^\s+|\s+$//gm;
Remember that lines consisting entirely of whitespace will disappear,
since the first part of the alternation can match the entire string
right with blanks and it will truncate the result to a maximum length of
C<$pad_len>.
- # Left padding a string with blanks (no truncation):
+ # Left padding a string with blanks (no truncation):
$padded = sprintf("%${pad_len}s", $text);
$padded = sprintf("%*s", $pad_len, $text); # same thing
- # Right padding a string with blanks (no truncation):
+ # Right padding a string with blanks (no truncation):
$padded = sprintf("%-${pad_len}s", $text);
$padded = sprintf("%-*s", $pad_len, $text); # same thing
- # Left padding a number with 0 (no truncation):
+ # Left padding a number with 0 (no truncation):
$padded = sprintf("%0${pad_len}d", $num);
$padded = sprintf("%0*d", $pad_len, $num); # same thing
- # Right padding a string with blanks using pack (will truncate):
- $padded = pack("A$pad_len",$text);
+ # Right padding a string with blanks using pack (will truncate):
+ $padded = pack("A$pad_len",$text);
If you need to pad with a character other than blank or zero you can use
one of the following methods. They all generate a pad string with the
Left and right padding with any character, creating a new string:
- $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
- $padded = $text . $pad_char x ( $pad_len - length( $text ) );
+ $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
+ $padded = $text . $pad_char x ( $pad_len - length( $text ) );
Left and right padding with any character, modifying C<$text> directly:
- substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
- $text .= $pad_char x ( $pad_len - length( $text ) );
+ substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
+ $text .= $pad_char x ( $pad_len - length( $text ) );
=head2 How do I extract selected columns from a string?
-Use substr() or unpack(), both documented in L<perlfunc>.
+Use C<substr()> or C<unpack()>, both documented in L<perlfunc>.
If you prefer thinking in terms of columns instead of widths,
you can use this kind of thing:
- # determine the unpack format needed to split Linux ps output
- # arguments are cut columns
- my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
-
- sub cut2fmt {
- my(@positions) = @_;
- my $template = '';
- my $lastpos = 1;
- for my $place (@positions) {
- $template .= "A" . ($place - $lastpos) . " ";
- $lastpos = $place;
- }
- $template .= "A*";
- return $template;
- }
+ # determine the unpack format needed to split Linux ps output
+ # arguments are cut columns
+ my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
+
+ sub cut2fmt {
+ my(@positions) = @_;
+ my $template = '';
+ my $lastpos = 1;
+ for my $place (@positions) {
+ $template .= "A" . ($place - $lastpos) . " ";
+ $lastpos = $place;
+ }
+ $template .= "A*";
+ return $template;
+ }
=head2 How do I find the soundex value of a string?
(contributed by brian d foy)
You can use the Text::Soundex module. If you want to do fuzzy or close
-matching, you might also try the String::Approx, and Text::Metaphone,
-and Text::DoubleMetaphone modules.
+matching, you might also try the C<String::Approx>, and
+C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.
=head2 How can I expand variables in text strings?
Let's assume that you have a string that contains placeholder
variables.
- $text = 'this has a $foo in it and a $bar';
+ $text = 'this has a $foo in it and a $bar';
You can use a substitution with a double evaluation. The
first /e turns C<$1> into C<$foo>, and the second /e turns
C<eval>: if you try to get the value of an undeclared variable
while running under C<use strict>, you get a fatal error.
- eval { $text =~ s/(\$\w+)/$1/eeg };
- die if $@;
+ eval { $text =~ s/(\$\w+)/$1/eeg };
+ die if $@;
It's probably better in the general case to treat those
variables as entries in some special hash. For example:
- %user_defs = (
- foo => 23,
- bar => 19,
- );
- $text =~ s/\$(\w+)/$user_defs{$1}/g;
+ %user_defs = (
+ foo => 23,
+ bar => 19,
+ );
+ $text =~ s/\$(\w+)/$user_defs{$1}/g;
=head2 What's wrong with always quoting "$vars"?
-The problem is that those double-quotes force stringification--
-coercing numbers and references into strings--even when you
-don't want them to be strings. Think of it this way: double-quote
-expansion is used to produce new strings. If you already
-have a string, why do you need more?
+The problem is that those double-quotes force
+stringification--coercing numbers and references into
+strings--even when you don't want them to be strings. Think
+of it this way: double-quote expansion is used to produce
+new strings. If you already have a string, why do you need
+more?
If you get used to writing odd things like these:
- print "$var"; # BAD
- $new = "$old"; # BAD
- somefunc("$var"); # BAD
+ print "$var"; # BAD
+ $new = "$old"; # BAD
+ somefunc("$var"); # BAD
You'll be in trouble. Those should (in 99.8% of the cases) be
the simpler and more direct:
- print $var;
- $new = $old;
- somefunc($var);
+ print $var;
+ $new = $old;
+ somefunc($var);
Otherwise, besides slowing you down, you're going to break code when
the thing in the scalar is actually neither a string nor a number, but
a reference:
- func(\@array);
- sub func {
- my $aref = shift;
- my $oref = "$aref"; # WRONG
- }
+ func(\@array);
+ sub func {
+ my $aref = shift;
+ my $oref = "$aref"; # WRONG
+ }
You can also get into subtle problems on those few operations in Perl
that actually do care about the difference between a string and a
Stringification also destroys arrays.
- @lines = `command`;
- print "@lines"; # WRONG - extra blanks
- print @lines; # right
+ @lines = `command`;
+ print "@lines"; # WRONG - extra blanks
+ print @lines; # right
=head2 Why don't my E<lt>E<lt>HERE documents work?
This works with leading special strings, dynamically determined:
- $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
+ $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
@@@ int
@@@ runops() {
@@@ SAVEI32(runlevel);
@@@ TAINT_NOT;
@@@ return 0;
@@@ }
- MAIN_INTERPRETER_LOOP
+ MAIN_INTERPRETER_LOOP
Or with a fixed amount of leading whitespace, with remaining
indentation correctly preserved:
- $poem = fix<<EVER_ON_AND_ON;
+ $poem = fix<<EVER_ON_AND_ON;
Now far ahead the Road has gone,
And I must follow, if I can,
Pursuing it with eager feet,
Where many paths and errands meet.
And whither then? I cannot say.
--Bilbo in /usr/src/perl/pp_ctl.c
- EVER_ON_AND_ON
+ EVER_ON_AND_ON
=head1 Data: Arrays
=head2 What is the difference between a list and an array?
-An array has a changeable length. A list does not. An array is something
-you can push or pop, while a list is a set of values. Some people make
-the distinction that a list is a value while an array is a variable.
-Subroutines are passed and return lists, you put things into list
-context, you initialize arrays with lists, and you foreach() across
-a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
-in scalar context behave like the number of elements in them, subroutines
-access their arguments through the array C<@_>, and push/pop/shift only work
-on arrays.
+An array has a changeable length. A list does not. An array is
+something you can push or pop, while a list is a set of values. Some
+people make the distinction that a list is a value while an array is a
+variable. Subroutines are passed and return lists, you put things into
+list context, you initialize arrays with lists, and you C<foreach()>
+across a list. C<@> variables are arrays, anonymous arrays are
+arrays, arrays in scalar context behave like the number of elements in
+them, subroutines access their arguments through the array C<@_>, and
+C<push>/C<pop>/C<shift> only work on arrays.
As a side note, there's no such thing as a list in scalar context.
When you say
- $scalar = (2, 5, 7, 9);
+ $scalar = (2, 5, 7, 9);
you're using the comma operator in scalar context, so it uses the scalar
-comma operator. There never was a list there at all! This causes the
+comma operator. There never was a list there at all! This causes the
last value to be returned: 9.
=head2 What is the difference between $array[1] and @array[1]?
Sometimes it doesn't make a difference, but sometimes it does.
For example, compare:
- $good[0] = `some program that outputs several lines`;
+ $good[0] = `some program that outputs several lines`;
with
- @bad[0] = `same program that outputs several lines`;
+ @bad[0] = `same program that outputs several lines`;
The C<use warnings> pragma and the B<-w> flag will warn you about these
matters.
create that hash: just that you use C<keys> to get the unique
elements.
- my %hash = map { $_, 1 } @array;
- # or a hash slice: @hash{ @array } = ();
- # or a foreach: $hash{$_} = 1 foreach ( @array );
+ my %hash = map { $_, 1 } @array;
+ # or a hash slice: @hash{ @array } = ();
+ # or a foreach: $hash{$_} = 1 foreach ( @array );
+
+ my @unique = keys %hash;
- my @unique = keys %hash;
+If you want to use a module, try the C<uniq> function from
+C<List::MoreUtils>. In list context it returns the unique elements,
+preserving their order in the list. In scalar context, it returns the
+number of unique elements.
+
+ use List::MoreUtils qw(uniq);
+
+ my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
+ my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
the loop continues to the C<push> and increments the value for that
key. The next time the loop sees that same element, its key exists in
the hash I<and> the value for that key is true (since it's not 0 or
-undef), so the next skips that iteration and the loop goes to the next
-element.
+C<undef>), so the next skips that iteration and the loop goes to the
+next element.
my @unique = ();
my %seen = ();
You can write this more briefly using a grep, which does the
same thing.
- my %seen = ();
- my @unique = grep { ! $seen{ $_ }++ } @array;
+ my %seen = ();
+ my @unique = grep { ! $seen{ $_ }++ } @array;
=head2 How can I tell whether a certain element is contained in a list or array?
the fastest way is probably to invert the original array and maintain a
hash whose keys are the first array's values.
- @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
- %is_blue = ();
- for (@blues) { $is_blue{$_} = 1 }
+ @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
+ %is_blue = ();
+ for (@blues) { $is_blue{$_} = 1 }
-Now you can check whether $is_blue{$some_color}. It might have been a
-good idea to keep the blues all in a hash in the first place.
+Now you can check whether C<$is_blue{$some_color}>. It might have
+been a good idea to keep the blues all in a hash in the first place.
If the values are all small integers, you could use a simple indexed
array. This kind of an array will take up less space:
- @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
- @is_tiny_prime = ();
- for (@primes) { $is_tiny_prime[$_] = 1 }
- # or simply @istiny_prime[@primes] = (1) x @primes;
+ @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
+ @is_tiny_prime = ();
+ for (@primes) { $is_tiny_prime[$_] = 1 }
+ # or simply @istiny_prime[@primes] = (1) x @primes;
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save
quite a lot of space by using bit strings instead:
- @articles = ( 1..10, 150..2000, 2017 );
- undef $read;
- for (@articles) { vec($read,$_,1) = 1 }
+ @articles = ( 1..10, 150..2000, 2017 );
+ undef $read;
+ for (@articles) { vec($read,$_,1) = 1 }
Now check whether C<vec($read,$n,1)> is true for some C<$n>.
of the original list or array. They only pay off if you have to test
multiple values against the same array.
-If you are testing only once, the standard module List::Util exports
+If you are testing only once, the standard module C<List::Util> exports
the function C<first> for this purpose. It works by stopping once it
finds the element. It's written in C for speed, and its Perl equivalant
looks like this subroutine:
=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
-Use a hash. Here's code to do both and more. It assumes that
-each element is unique in a given array:
+Use a hash. Here's code to do both and more. It assumes that each
+element is unique in a given array:
- @union = @intersection = @difference = ();
- %count = ();
- foreach $element (@array1, @array2) { $count{$element}++ }
- foreach $element (keys %count) {
- push @union, $element;
- push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
- }
+ @union = @intersection = @difference = ();
+ %count = ();
+ foreach $element (@array1, @array2) { $count{$element}++ }
+ foreach $element (keys %count) {
+ push @union, $element;
+ push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
+ }
-Note that this is the I<symmetric difference>, that is, all elements in
-either A or in B but not in both. Think of it as an xor operation.
+Note that this is the I<symmetric difference>, that is, all elements
+in either A or in B but not in both. Think of it as an xor operation.
=head2 How do I test whether two arrays or hashes are equal?
-The following code works for single-level arrays. It uses a stringwise
-comparison, and does not distinguish defined versus undefined empty
-strings. Modify if you have other needs.
+The following code works for single-level arrays. It uses a
+stringwise comparison, and does not distinguish defined versus
+undefined empty strings. Modify if you have other needs.
- $are_equal = compare_arrays(\@frogs, \@toads);
+ $are_equal = compare_arrays(\@frogs, \@toads);
- sub compare_arrays {
- my ($first, $second) = @_;
- no warnings; # silence spurious -w undef complaints
- return 0 unless @$first == @$second;
- for (my $i = 0; $i < @$first; $i++) {
- return 0 if $first->[$i] ne $second->[$i];
- }
- return 1;
- }
+ sub compare_arrays {
+ my ($first, $second) = @_;
+ no warnings; # silence spurious -w undef complaints
+ return 0 unless @$first == @$second;
+ for (my $i = 0; $i < @$first; $i++) {
+ return 0 if $first->[$i] ne $second->[$i];
+ }
+ return 1;
+ }
For multilevel structures, you may wish to use an approach more
-like this one. It uses the CPAN module FreezeThaw:
+like this one. It uses the CPAN module C<FreezeThaw>:
- use FreezeThaw qw(cmpStr);
- @a = @b = ( "this", "that", [ "more", "stuff" ] );
+ use FreezeThaw qw(cmpStr);
+ @a = @b = ( "this", "that", [ "more", "stuff" ] );
- printf "a and b contain %s arrays\n",
- cmpStr(\@a, \@b) == 0
- ? "the same"
- : "different";
+ printf "a and b contain %s arrays\n",
+ cmpStr(\@a, \@b) == 0
+ ? "the same"
+ : "different";
-This approach also works for comparing hashes. Here
-we'll demonstrate two different answers:
+This approach also works for comparing hashes. Here we'll demonstrate
+two different answers:
- use FreezeThaw qw(cmpStr cmpStrHard);
+ use FreezeThaw qw(cmpStr cmpStrHard);
- %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
- $a{EXTRA} = \%b;
- $b{EXTRA} = \%a;
+ %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
+ $a{EXTRA} = \%b;
+ $b{EXTRA} = \%a;
- printf "a and b contain %s hashes\n",
+ printf "a and b contain %s hashes\n",
cmpStr(\%a, \%b) == 0 ? "the same" : "different";
- printf "a and b contain %s hashes\n",
+ printf "a and b contain %s hashes\n",
cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
=head2 How do I find the first array element for which a condition is true?
To find the first array element which satisfies a condition, you can
-use the first() function in the List::Util module, which comes with
-Perl 5.8. This example finds the first element that contains "Perl".
+use the C<first()> function in the C<List::Util> module, which comes
+with Perl 5.8. This example finds the first element that contains
+"Perl".
use List::Util qw(first);
my $element = first { /Perl/ } @array;
-If you cannot use List::Util, you can make your own loop to do the
+If you cannot use C<List::Util>, you can make your own loop to do the
same thing. Once you find the element, you stop the loop with last.
my $found;
- foreach ( @array )
- {
+ foreach ( @array ) {
if( /Perl/ ) { $found = $_; last }
}
that satisfies the condition.
my( $found, $index ) = ( undef, -1 );
- for( $i = 0; $i < @array; $i++ )
- {
- if( $array[$i] =~ /Perl/ )
- {
+ for( $i = 0; $i < @array; $i++ ) {
+ if( $array[$i] =~ /Perl/ ) {
$found = $array[$i];
$index = $i;
last;
=head2 How do I handle linked lists?
In general, you usually don't need a linked list in Perl, since with
-regular arrays, you can push and pop or shift and unshift at either end,
-or you can use splice to add and/or remove arbitrary number of elements at
-arbitrary points. Both pop and shift are both O(1) operations on Perl's
-dynamic arrays. In the absence of shifts and pops, push in general
-needs to reallocate on the order every log(N) times, and unshift will
-need to copy pointers each time.
+regular arrays, you can push and pop or shift and unshift at either
+end, or you can use splice to add and/or remove arbitrary number of
+elements at arbitrary points. Both pop and shift are both O(1)
+operations on Perl's dynamic arrays. In the absence of shifts and
+pops, push in general needs to reallocate on the order every log(N)
+times, and unshift will need to copy pointers each time.
If you really, really wanted, you could use structures as described in
-L<perldsc> or L<perltoot> and do just what the algorithm book tells you
-to do. For example, imagine a list node like this:
+L<perldsc> or L<perltoot> and do just what the algorithm book tells
+you to do. For example, imagine a list node like this:
- $node = {
- VALUE => 42,
- LINK => undef,
- };
+ $node = {
+ VALUE => 42,
+ LINK => undef,
+ };
You could walk the list this way:
- print "List: ";
- for ($node = $head; $node; $node = $node->{LINK}) {
- print $node->{VALUE}, " ";
- }
- print "\n";
+ print "List: ";
+ for ($node = $head; $node; $node = $node->{LINK}) {
+ print $node->{VALUE}, " ";
+ }
+ print "\n";
You could add to the list this way:
- my ($head, $tail);
- $tail = append($head, 1); # grow a new head
- for $value ( 2 .. 10 ) {
- $tail = append($tail, $value);
- }
+ my ($head, $tail);
+ $tail = append($head, 1); # grow a new head
+ for $value ( 2 .. 10 ) {
+ $tail = append($tail, $value);
+ }
- sub append {
- my($list, $value) = @_;
- my $node = { VALUE => $value };
- if ($list) {
- $node->{LINK} = $list->{LINK};
- $list->{LINK} = $node;
- } else {
- $_[0] = $node; # replace caller's version
- }
- return $node;
- }
+ sub append {
+ my($list, $value) = @_;
+ my $node = { VALUE => $value };
+ if ($list) {
+ $node->{LINK} = $list->{LINK};
+ $list->{LINK} = $node;
+ }
+ else {
+ $_[0] = $node; # replace caller's version
+ }
+ return $node;
+ }
But again, Perl's built-in are virtually always good enough.
Circular lists could be handled in the traditional fashion with linked
lists, or you could just do something like this with an array:
- unshift(@array, pop(@array)); # the last shall be first
- push(@array, shift(@array)); # and vice versa
+ unshift(@array, pop(@array)); # the last shall be first
+ push(@array, shift(@array)); # and vice versa
+
+You can also use C<Tie::Cycle>:
+
+ use Tie::Cycle;
+
+ tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
+
+ print $cycle; # FFFFFF
+ print $cycle; # 000000
+ print $cycle; # FFFF00
=head2 How do I shuffle an array randomly?
If you either have Perl 5.8.0 or later installed, or if you have
Scalar-List-Utils 1.03 or later installed, you can say:
- use List::Util 'shuffle';
+ use List::Util 'shuffle';
@shuffled = shuffle(@list);
If not, you can use a Fisher-Yates shuffle.
- sub fisher_yates_shuffle {
- my $deck = shift; # $deck is a reference to an array
- my $i = @$deck;
- while (--$i) {
- my $j = int rand ($i+1);
- @$deck[$i,$j] = @$deck[$j,$i];
- }
- }
+ sub fisher_yates_shuffle {
+ my $deck = shift; # $deck is a reference to an array
+ my $i = @$deck;
+ while (--$i) {
+ my $j = int rand ($i+1);
+ @$deck[$i,$j] = @$deck[$j,$i];
+ }
+ }
- # shuffle my mpeg collection
- #
- my @mpeg = <audio/*/*.mp3>;
- fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
- print @mpeg;
+ # shuffle my mpeg collection
+ #
+ my @mpeg = <audio/*/*.mp3>;
+ fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
+ print @mpeg;
Note that the above implementation shuffles an array in place,
-unlike the List::Util::shuffle() which takes a list and returns
+unlike the C<List::Util::shuffle()> which takes a list and returns
a new shuffled list.
You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with
- srand;
- @new = ();
- @old = 1 .. 10; # just a demo
- while (@old) {
- push(@new, splice(@old, rand @old, 1));
- }
+ srand;
+ @new = ();
+ @old = 1 .. 10; # just a demo
+ while (@old) {
+ push(@new, splice(@old, rand @old, 1));
+ }
-This is bad because splice is already O(N), and since you do it N times,
-you just invented a quadratic algorithm; that is, O(N**2). This does
-not scale, although Perl is so efficient that you probably won't notice
-this until you have rather largish arrays.
+This is bad because splice is already O(N), and since you do it N
+times, you just invented a quadratic algorithm; that is, O(N**2).
+This does not scale, although Perl is so efficient that you probably
+won't notice this until you have rather largish arrays.
=head2 How do I process/modify each element of an array?
Use C<for>/C<foreach>:
- for (@lines) {
+ for (@lines) {
s/foo/bar/; # change that word
tr/XZ/ZX/; # swap those letters
- }
+ }
Here's another; let's compute spherical volumes:
- for (@volumes = @radii) { # @volumes has changed parts
+ for (@volumes = @radii) { # @volumes has changed parts
$_ **= 3;
$_ *= (4/3) * 3.14159; # this will be constant folded
- }
+ }
-which can also be done with map() which is made to transform
+which can also be done with C<map()> which is made to transform
one list into another:
@volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
the values are not copied, so if you modify $orbit (in this
case), you modify the value.
- for $orbit ( values %orbits ) {
+ for $orbit ( values %orbits ) {
($orbit **= 3) *= (4/3) * 3.14159;
- }
+ }
Prior to perl 5.6 C<values> returned copies of the values,
so older perl code often contains constructions such as
=head2 How do I select a random element from an array?
-Use the rand() function (see L<perlfunc/rand>):
+Use the C<rand()> function (see L<perlfunc/rand>):
- $index = rand @array;
- $element = $array[$index];
+ $index = rand @array;
+ $element = $array[$index];
Or, simply:
- my $element = $array[ rand @array ];
+
+ my $element = $array[ rand @array ];
=head2 How do I permute N elements of a list?
-Use the List::Permutor module on CPAN. If the list is
-actually an array, try the Algorithm::Permute module (also
-on CPAN). It's written in XS code and is very efficient.
+Use the C<List::Permutor> module on CPAN. If the list is actually an
+array, try the C<Algorithm::Permute> module (also on CPAN). It's
+written in XS code and is very efficient.
use Algorithm::Permute;
my @array = 'a'..'d';
my $p_iterator = Algorithm::Permute->new ( \@array );
while (my @perm = $p_iterator->next) {
print "next permutation: (@perm)\n";
- }
+ }
For even faster execution, you could do:
- use Algorithm::Permute;
- my @array = 'a'..'d';
- Algorithm::Permute::permute {
- print "next permutation: (@array)\n";
- } @array;
+ use Algorithm::Permute;
+ my @array = 'a'..'d';
+ Algorithm::Permute::permute {
+ print "next permutation: (@array)\n";
+ } @array;
Here's a little program that generates all permutations of
all the words on each line of input. The algorithm embodied
-in the permute() function is discussed in Volume 4 (still
+in the C<permute()> function is discussed in Volume 4 (still
unpublished) of Knuth's I<The Art of Computer Programming>
and will work on any list:
Supply a comparison function to sort() (described in L<perlfunc/sort>):
- @list = sort { $a <=> $b } @list;
+ @list = sort { $a <=> $b } @list;
The default sort function is cmp, string comparison, which would
sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
after the first number on each item, and then sort those words
case-insensitively.
- @idx = ();
- for (@data) {
- ($item) = /\d+\s*(\S+)/;
- push @idx, uc($item);
- }
- @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
+ @idx = ();
+ for (@data) {
+ ($item) = /\d+\s*(\S+)/;
+ push @idx, uc($item);
+ }
+ @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
which could also be written this way, using a trick
that's come to be known as the Schwartzian Transform:
- @sorted = map { $_->[0] }
- sort { $a->[1] cmp $b->[1] }
- map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
+ @sorted = map { $_->[0] }
+ sort { $a->[1] cmp $b->[1] }
+ map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
If you need to sort on several fields, the following paradigm is useful.
- @sorted = sort { field1($a) <=> field1($b) ||
- field2($a) cmp field2($b) ||
- field3($a) cmp field3($b)
- } @data;
+ @sorted = sort {
+ field1($a) <=> field1($b) ||
+ field2($a) cmp field2($b) ||
+ field3($a) cmp field3($b)
+ } @data;
This can be conveniently combined with precalculation of keys as given
above.
To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
more about this approach.
-See also the question below on sorting hashes.
+See also the question later in L<perlfaq4> on sorting hashes.
=head2 How do I manipulate arrays of bits?
-Use pack() and unpack(), or else vec() and the bitwise operations.
-
-For example, this sets $vec to have bit N set if $ints[N] was set:
-
- $vec = '';
- foreach(@ints) { vec($vec,$_,1) = 1 }
-
-Here's how, given a vector in $vec, you can
-get those bits into your @ints array:
-
- sub bitvec_to_list {
- my $vec = shift;
- my @ints;
- # Find null-byte density then select best algorithm
- if ($vec =~ tr/\0// / length $vec > 0.95) {
- use integer;
- my $i;
- # This method is faster with mostly null-bytes
- while($vec =~ /[^\0]/g ) {
- $i = -9 + 8 * pos $vec;
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- }
- } else {
- # This method is a fast general algorithm
- use integer;
- my $bits = unpack "b*", $vec;
- push @ints, 0 if $bits =~ s/^(\d)// && $1;
- push @ints, pos $bits while($bits =~ /1/g);
- }
- return \@ints;
- }
+Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
+operations.
+
+For example, this sets C<$vec> to have bit N set if C<$ints[N]> was
+set:
+
+ $vec = '';
+ foreach(@ints) { vec($vec,$_,1) = 1 }
+
+Here's how, given a vector in C<$vec>, you can get those bits into your
+C<@ints> array:
+
+ sub bitvec_to_list {
+ my $vec = shift;
+ my @ints;
+ # Find null-byte density then select best algorithm
+ if ($vec =~ tr/\0// / length $vec > 0.95) {
+ use integer;
+ my $i;
+
+ # This method is faster with mostly null-bytes
+ while($vec =~ /[^\0]/g ) {
+ $i = -9 + 8 * pos $vec;
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ push @ints, $i if vec($vec, ++$i, 1);
+ }
+ }
+ else {
+ # This method is a fast general algorithm
+ use integer;
+ my $bits = unpack "b*", $vec;
+ push @ints, 0 if $bits =~ s/^(\d)// && $1;
+ push @ints, pos $bits while($bits =~ /1/g);
+ }
+
+ return \@ints;
+ }
This method gets faster the more sparse the bit vector is.
(Courtesy of Tim Bunce and Winfried Koenig.)
from Benjamin Goldberg:
while($vec =~ /[^\0]+/g ) {
- push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
- }
+ push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
+ }
-Or use the CPAN module Bit::Vector:
+Or use the CPAN module C<Bit::Vector>:
- $vector = Bit::Vector->new($num_of_bits);
- $vector->Index_List_Store(@ints);
- @ints = $vector->Index_List_Read();
+ $vector = Bit::Vector->new($num_of_bits);
+ $vector->Index_List_Store(@ints);
+ @ints = $vector->Index_List_Read();
-Bit::Vector provides efficient methods for bit vector, sets of small integers
-and "big int" math.
+C<Bit::Vector> provides efficient methods for bit vector, sets of
+small integers and "big int" math.
Here's a more extensive illustration using vec():
- # vec demo
- $vector = "\xff\x0f\xef\xfe";
- print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
+ # vec demo
+ $vector = "\xff\x0f\xef\xfe";
+ print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
unpack("N", $vector), "\n";
- $is_set = vec($vector, 23, 1);
- print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
- pvec($vector);
-
- set_vec(1,1,1);
- set_vec(3,1,1);
- set_vec(23,1,1);
-
- set_vec(3,1,3);
- set_vec(3,2,3);
- set_vec(3,4,3);
- set_vec(3,4,7);
- set_vec(3,8,3);
- set_vec(3,8,7);
-
- set_vec(0,32,17);
- set_vec(1,32,17);
-
- sub set_vec {
- my ($offset, $width, $value) = @_;
- my $vector = '';
- vec($vector, $offset, $width) = $value;
- print "offset=$offset width=$width value=$value\n";
+ $is_set = vec($vector, 23, 1);
+ print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
pvec($vector);
- }
- sub pvec {
- my $vector = shift;
- my $bits = unpack("b*", $vector);
- my $i = 0;
- my $BASE = 8;
+ set_vec(1,1,1);
+ set_vec(3,1,1);
+ set_vec(23,1,1);
+
+ set_vec(3,1,3);
+ set_vec(3,2,3);
+ set_vec(3,4,3);
+ set_vec(3,4,7);
+ set_vec(3,8,3);
+ set_vec(3,8,7);
+
+ set_vec(0,32,17);
+ set_vec(1,32,17);
+
+ sub set_vec {
+ my ($offset, $width, $value) = @_;
+ my $vector = '';
+ vec($vector, $offset, $width) = $value;
+ print "offset=$offset width=$width value=$value\n";
+ pvec($vector);
+ }
- print "vector length in bytes: ", length($vector), "\n";
- @bytes = unpack("A8" x length($vector), $bits);
- print "bits are: @bytes\n\n";
- }
+ sub pvec {
+ my $vector = shift;
+ my $bits = unpack("b*", $vector);
+ my $i = 0;
+ my $BASE = 8;
+
+ print "vector length in bytes: ", length($vector), "\n";
+ @bytes = unpack("A8" x length($vector), $bits);
+ print "bits are: @bytes\n\n";
+ }
=head2 Why does defined() return true on empty arrays and hashes?
Use the each() function (see L<perlfunc/each>) if you don't care
whether it's sorted:
- while ( ($key, $value) = each %hash) {
- print "$key = $value\n";
- }
+ while ( ($key, $value) = each %hash) {
+ print "$key = $value\n";
+ }
If you want it sorted, you'll have to use foreach() on the result of
sorting the keys as shown in an earlier question.
Create a reverse hash:
- %by_value = reverse %by_key;
- $key = $by_value{$value};
+ %by_value = reverse %by_key;
+ $key = $by_value{$value};
That's not particularly efficient. It would be more space-efficient
to use:
- while (($key, $value) = each %by_key) {
- $by_value{$value} = $key;
- }
+ while (($key, $value) = each %by_key) {
+ $by_value{$value} = $key;
+ }
If your hash could have repeated values, the methods above will only find
one of the associated keys. This may or may not worry you. If it does
worry you, you can always reverse the hash into a hash of arrays instead:
- while (($key, $value) = each %by_key) {
- push @{$key_list_by_value{$value}}, $key;
- }
+ while (($key, $value) = each %by_key) {
+ push @{$key_list_by_value{$value}}, $key;
+ }
=head2 How can I know how many entries are in a hash?
} keys %hash;
=head2 How can I always keep my hash sorted?
+X<hash tie sort DB_File Tie::IxHash>
-You can look into using the DB_File module and tie() using the
-$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
-The Tie::IxHash module from CPAN might also be instructive.
+You can look into using the C<DB_File> module and C<tie()> using the
+C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
+Databases">. The C<Tie::IxHash> module from CPAN might also be
+instructive. Although this does keep your hash sorted, you might not
+like the slow down you suffer from the tie interface. Are you sure you
+need to do this? :)
=head2 What's the difference between "delete" and "undef" with hashes?
Hashes contain pairs of scalars: the first is the key, the
second is the value. The key will be coerced to a string,
although the value can be any kind of scalar: string,
-number, or reference. If a key $key is present in
+number, or reference. If a key C<$key> is present in
%hash, C<exists($hash{$key})> will return true. The value
for a given key can be C<undef>, in which case
C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
will return true. This corresponds to (C<$key>, C<undef>)
being in the hash.
-Pictures help... here's the %hash table:
+Pictures help... here's the C<%hash> table:
keys values
+------+------+
Using C<keys %hash> in scalar context returns the number of keys in
the hash I<and> resets the iterator associated with the hash. You may
-need to do this if you use C<last> to exit a loop early so that when you
-re-enter it, the hash iterator has been reset.
+need to do this if you use C<last> to exit a loop early so that when
+you re-enter it, the hash iterator has been reset.
=head2 How can I get the unique keys from two hashes?
First you extract the keys from the hashes into lists, then solve
the "removing duplicates" problem described above. For example:
- %seen = ();
- for $element (keys(%foo), keys(%bar)) {
- $seen{$element}++;
- }
- @uniq = keys %seen;
+ %seen = ();
+ for $element (keys(%foo), keys(%bar)) {
+ $seen{$element}++;
+ }
+ @uniq = keys %seen;
Or more succinctly:
- @uniq = keys %{{%foo,%bar}};
+ @uniq = keys %{{%foo,%bar}};
Or if you really want to save space:
- %seen = ();
- while (defined ($key = each %foo)) {
- $seen{$key}++;
- }
- while (defined ($key = each %bar)) {
- $seen{$key}++;
- }
- @uniq = keys %seen;
+ %seen = ();
+ while (defined ($key = each %foo)) {
+ $seen{$key}++;
+ }
+ while (defined ($key = each %bar)) {
+ $seen{$key}++;
+ }
+ @uniq = keys %seen;
=head2 How can I store a multidimensional array in a DBM file?
=head2 How can I make my hash remember the order I put elements into it?
-Use the Tie::IxHash from CPAN.
+Use the C<Tie::IxHash> from CPAN.
- use Tie::IxHash;
- tie my %myhash, 'Tie::IxHash';
- for (my $i=0; $i<20; $i++) {
- $myhash{$i} = 2*$i;
- }
- my @keys = keys %myhash;
- # @keys = (0,1,2,3,...)
+ use Tie::IxHash;
+
+ tie my %myhash, 'Tie::IxHash';
+
+ for (my $i=0; $i<20; $i++) {
+ $myhash{$i} = 2*$i;
+ }
+
+ my @keys = keys %myhash;
+ # @keys = (0,1,2,3,...)
=head2 Why does passing a subroutine an undefined element in a hash create it?
If you say something like:
- somefunc($hash{"nonesuch key here"});
+ somefunc($hash{"nonesuch key here"});
Then that element "autovivifies"; that is, it springs into existence
whether you store something there or not. That's because functions
Usually a hash ref, perhaps like this:
- $record = {
- NAME => "Jason",
- EMPNO => 132,
- TITLE => "deputy peon",
- AGE => 23,
- SALARY => 37_000,
- PALS => [ "Norbert", "Rhys", "Phineas"],
- };
+ $record = {
+ NAME => "Jason",
+ EMPNO => 132,
+ TITLE => "deputy peon",
+ AGE => 23,
+ SALARY => 37_000,
+ PALS => [ "Norbert", "Rhys", "Phineas"],
+ };
References are documented in L<perlref> and the upcoming L<perlreftut>.
Examples of complex data structures are given in L<perldsc> and
Hash keys are strings, so you can't really use a reference as the key.
When you try to do that, perl turns the reference into its stringified
-form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get back
-the reference from the stringified form, at least without doing some
-extra work on your own. Also remember that hash keys must be unique, but
-two different variables can store the same reference (and those variables
-can change later).
+form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
+back the reference from the stringified form, at least without doing
+some extra work on your own. Also remember that hash keys must be
+unique, but two different variables can store the same reference (and
+those variables can change later).
-The Tie::RefHash module, which is distributed with perl, might be what
-you want. It handles that extra work.
+The C<Tie::RefHash> module, which is distributed with perl, might be
+what you want. It handles that extra work.
=head1 Data: Misc
=head2 How do I handle binary data correctly?
-Perl is binary clean, so this shouldn't be a problem. For example,
-this works fine (assuming the files are found):
-
- if (`cat /vmunix` =~ /gzip/) {
- print "Your kernel is GNU-zip enabled!\n";
- }
-
-On less elegant (read: Byzantine) systems, however, you have
-to play tedious games with "text" versus "binary" files. See
-L<perlfunc/"binmode"> or L<perlopentut>.
+Perl is binary clean, so it can handle binary data just fine.
+On Windows or DOS, however, you have to use C<binmode> for binary
+files to avoid conversions for line endings. In general, you should
+use C<binmode> any time you want to work with binary data.
-If you're concerned about 8-bit ASCII data, then see L<perllocale>.
+Also see L<perlfunc/"binmode"> or L<perlopentut>.
+If you're concerned about 8-bit textual data then see L<perllocale>.
If you want to deal with multibyte characters, however, there are
some gotchas. See the section on Regular Expressions.
Assuming that you don't care about IEEE notations like "NaN" or
"Infinity", you probably just want to use a regular expression.
- if (/\D/) { print "has nondigits\n" }
- if (/^\d+$/) { print "is a whole number\n" }
- if (/^-?\d+$/) { print "is an integer\n" }
- if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
- if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
- if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
- if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
+ if (/\D/) { print "has nondigits\n" }
+ if (/^\d+$/) { print "is a whole number\n" }
+ if (/^-?\d+$/) { print "is an integer\n" }
+ if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
+ if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
+ if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
+ if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
{ print "a C float\n" }
There are also some commonly used modules for the task.
L<Scalar::Util> (distributed with 5.8) provides access to perl's
-internal function C<looks_like_number> for determining
-whether a variable looks like a number. L<Data::Types>
-exports functions that validate data types using both the
-above and other regular expressions. Thirdly, there is
-C<Regexp::Common> which has regular expressions to match
-various types of numbers. Those three modules are available
-from the CPAN.
+internal function C<looks_like_number> for determining whether a
+variable looks like a number. L<Data::Types> exports functions that
+validate data types using both the above and other regular
+expressions. Thirdly, there is C<Regexp::Common> which has regular
+expressions to match various types of numbers. Those three modules are
+available from the CPAN.
If you're on a POSIX system, Perl supports the C<POSIX::strtod>
-function. Its semantics are somewhat cumbersome, so here's a C<getnum>
-wrapper function for more convenient access. This function takes
-a string and returns the number it found, or C<undef> for input that
-isn't a C float. The C<is_numeric> function is a front end to C<getnum>
-if you just want to say, "Is this a float?"
-
- sub getnum {
- use POSIX qw(strtod);
- my $str = shift;
- $str =~ s/^\s+//;
- $str =~ s/\s+$//;
- $! = 0;
- my($num, $unparsed) = strtod($str);
- if (($str eq '') || ($unparsed != 0) || $!) {
- return undef;
- } else {
- return $num;
- }
- }
+function. Its semantics are somewhat cumbersome, so here's a
+C<getnum> wrapper function for more convenient access. This function
+takes a string and returns the number it found, or C<undef> for input
+that isn't a C float. The C<is_numeric> function is a front end to
+C<getnum> if you just want to say, "Is this a float?"
+
+ sub getnum {
+ use POSIX qw(strtod);
+ my $str = shift;
+ $str =~ s/^\s+//;
+ $str =~ s/\s+$//;
+ $! = 0;
+ my($num, $unparsed) = strtod($str);
+ if (($str eq '') || ($unparsed != 0) || $!) {
+ return undef;
+ }
+ else {
+ return $num;
+ }
+ }
- sub is_numeric { defined getnum($_[0]) }
+ sub is_numeric { defined getnum($_[0]) }
Or you could check out the L<String::Scanf> module on the CPAN
-instead. The POSIX module (part of the standard Perl distribution) provides
-the C<strtod> and C<strtol> for converting strings to double and longs,
-respectively.
+instead. The C<POSIX> module (part of the standard Perl distribution)
+provides the C<strtod> and C<strtol> for converting strings to double
+and longs, respectively.
=head2 How do I keep persistent data across program calls?
For some specific applications, you can use one of the DBM modules.
-See L<AnyDBM_File>. More generically, you should consult the FreezeThaw
-or Storable modules from CPAN. Starting from Perl 5.8 Storable is part
-of the standard distribution. Here's one example using Storable's C<store>
+See L<AnyDBM_File>. More generically, you should consult the C<FreezeThaw>
+or C<Storable> modules from CPAN. Starting from Perl 5.8 C<Storable> is part
+of the standard distribution. Here's one example using C<Storable>'s C<store>
and C<retrieve> functions:
- use Storable;
- store(\%hash, "filename");
+ use Storable;
+ store(\%hash, "filename");
- # later on...
- $href = retrieve("filename"); # by ref
- %hash = %{ retrieve("filename") }; # direct to hash
+ # later on...
+ $href = retrieve("filename"); # by ref
+ %hash = %{ retrieve("filename") }; # direct to hash
=head2 How do I print out or copy a recursive data structure?
-The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
-for printing out data structures. The Storable module on CPAN (or the
+The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
+for printing out data structures. The C<Storable> module on CPAN (or the
5.8 release of Perl), provides a function called C<dclone> that recursively
copies its argument.
- use Storable qw(dclone);
- $r2 = dclone($r1);
+ use Storable qw(dclone);
+ $r2 = dclone($r1);
-Where $r1 can be a reference to any kind of data structure you'd like.
+Where C<$r1> can be a reference to any kind of data structure you'd like.
It will be deeply copied. Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.
- %newhash = %{ dclone(\%oldhash) };
+ %newhash = %{ dclone(\%oldhash) };
=head2 How do I define methods for every class/object?
-Use the UNIVERSAL class (see L<UNIVERSAL>).
+Use the C<UNIVERSAL> class (see L<UNIVERSAL>).
=head2 How do I verify a credit card checksum?
-Get the Business::CreditCard module from CPAN.
+Get the C<Business::CreditCard> module from CPAN.
=head2 How do I pack arrays of doubles or floats for XS code?
-The kgbpack.c code in the PGPLOT module on CPAN does just this.
+The kgbpack.c code in the C<PGPLOT> module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
-the PDL module from CPAN instead--it makes number-crunching easy.
+the C<PDL> module from CPAN instead--it makes number-crunching easy.
=head1 REVISION
-Revision: $Revision: 3606 $
+Revision: $Revision: 6816 $
-Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $
+Date: $Date: 2006-08-20 21:20:03 +0200 (dim, 20 aoĆ» 2006) $
See L<perlfaq> for source control details and availability.
=head1 NAME
-perlfaq6 - Regular Expressions ($Revision: 3606 $)
+perlfaq6 - Regular Expressions ($Revision: 6479 $)
=head1 DESCRIPTION
Describe what you're doing and how you're doing it, using normal Perl
comments.
- # turn the line into the first word, a colon, and the
- # number of characters on the rest of the line
- s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;
+ # turn the line into the first word, a colon, and the
+ # number of characters on the rest of the line
+ s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;
=item Comments Inside the Regex
C</x> lets you turn this:
- s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;
+ s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;
into this:
- s{ < # opening angle bracket
- (?: # Non-backreffing grouping paren
- [^>'"] * # 0 or more things that are neither > nor ' nor "
- | # or else
- ".*?" # a section between double quotes (stingy match)
- | # or else
- '.*?' # a section between single quotes (stingy match)
- ) + # all occurring one or more times
- > # closing angle bracket
- }{}gsx; # replace with nothing, i.e. delete
+ s{ < # opening angle bracket
+ (?: # Non-backreffing grouping paren
+ [^>'"] * # 0 or more things that are neither > nor ' nor "
+ | # or else
+ ".*?" # a section between double quotes (stingy match)
+ | # or else
+ '.*?' # a section between single quotes (stingy match)
+ ) + # all occurring one or more times
+ > # closing angle bracket
+ }{}gsx; # replace with nothing, i.e. delete
It's still not quite so clear as prose, but it is very useful for
describing the meaning of each part of the pattern.
delimiters. Selecting another delimiter can avoid quoting the
delimiter within the pattern:
- s/\/usr\/local/\/usr\/share/g; # bad delimiter choice
- s#/usr/local#/usr/share#g; # better
+ s/\/usr\/local/\/usr\/share/g; # bad delimiter choice
+ s#/usr/local#/usr/share#g; # better
=back
than the default, or else we won't actually ever have a multiline
record read in.
- $/ = ''; # read in more whole paragraph, not just one line
- while ( <> ) {
- while ( /\b([\w'-]+)(\s+\1)+\b/gi ) { # word starts alpha
- print "Duplicate $1 at paragraph $.\n";
+ $/ = ''; # read in more whole paragraph, not just one line
+ while ( <> ) {
+ while ( /\b([\w'-]+)(\s+\1)+\b/gi ) { # word starts alpha
+ print "Duplicate $1 at paragraph $.\n";
+ }
}
- }
Here's code that finds sentences that begin with "From " (which would
be mangled by many mailers):
- $/ = ''; # read in more whole paragraph, not just one line
- while ( <> ) {
- while ( /^From /gm ) { # /m makes ^ match next to \n
- print "leading from in paragraph $.\n";
+ $/ = ''; # read in more whole paragraph, not just one line
+ while ( <> ) {
+ while ( /^From /gm ) { # /m makes ^ match next to \n
+ print "leading from in paragraph $.\n";
+ }
}
- }
Here's code that finds everything between START and END in a paragraph:
- undef $/; # read in whole file, not just one line or paragraph
- while ( <> ) {
- while ( /START(.*?)END/sgm ) { # /s makes . cross line boundaries
- print "$1\n";
+ undef $/; # read in whole file, not just one line or paragraph
+ while ( <> ) {
+ while ( /START(.*?)END/sgm ) { # /s makes . cross line boundaries
+ print "$1\n";
+ }
}
- }
=head2 How can I pull out lines between two patterns that are themselves on different lines?
X<..>
You can use Perl's somewhat exotic C<..> operator (documented in
L<perlop>):
- perl -ne 'print if /START/ .. /END/' file1 file2 ...
+ perl -ne 'print if /START/ .. /END/' file1 file2 ...
If you wanted text and not lines, you would use
- perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
+ perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
But if you want nested occurrences of C<START> through C<END>, you'll
run up against the problem described in the question in this section
Here's another example of using C<..>:
- while (<>) {
- $in_header = 1 .. /^$/;
- $in_body = /^$/ .. eof();
+ while (<>) {
+ $in_header = 1 .. /^$/;
+ $in_body = /^$/ .. eof();
# now choose between them
- } continue {
- reset if eof(); # fix $.
- }
+ } continue {
+ reset if eof(); # fix $.
+ }
=head2 I put a regular expression into $/ but it didn't work. What's wrong?
X<$/, regexes in> X<$INPUT_RECORD_SEPARATOR, regexes in>
If you have File::Stream, this is easy.
- use File::Stream;
- my $stream = File::Stream->new(
- $filehandle,
- separator => qr/\s*,\s*/,
- );
+ use File::Stream;
+
+ my $stream = File::Stream->new(
+ $filehandle,
+ separator => qr/\s*,\s*/,
+ );
- print "$_\n" while <$stream>;
+ print "$_\n" while <$stream>;
If you don't have File::Stream, you have to do a little more work.
a buffer. After you add to the buffer, you check if you have a
complete line (using your regular expression).
- local $_ = "";
- while( sysread FH, $_, 8192, length ) {
- while( s/^((?s).*?)your_pattern/ ) {
- my $record = $1;
- # do stuff here.
- }
- }
+ local $_ = "";
+ while( sysread FH, $_, 8192, length ) {
+ while( s/^((?s).*?)your_pattern/ ) {
+ my $record = $1;
+ # do stuff here.
+ }
+ }
You can do the same thing with foreach and a match using the
c flag and the \G anchor, if you do not mind your entire file
being in memory at the end.
- local $_ = "";
- while( sysread FH, $_, 8192, length ) {
- foreach my $record ( m/\G((?s).*?)your_pattern/gc ) {
- # do stuff here.
- }
- substr( $_, 0, pos ) = "" if pos;
- }
+ local $_ = "";
+ while( sysread FH, $_, 8192, length ) {
+ foreach my $record ( m/\G((?s).*?)your_pattern/gc ) {
+ # do stuff here.
+ }
+ substr( $_, 0, pos ) = "" if pos;
+ }
=head2 How do I substitute case insensitively on the LHS while preserving case on the RHS?
Here's a lovely Perlish solution by Larry Rosler. It exploits
properties of bitwise xor on ASCII strings.
- $_= "this is a TEsT case";
+ $_= "this is a TEsT case";
- $old = 'test';
- $new = 'success';
+ $old = 'test';
+ $new = 'success';
- s{(\Q$old\E)}
- { uc $new | (uc $1 ^ $1) .
- (uc(substr $1, -1) ^ substr $1, -1) x
- (length($new) - length $1)
- }egi;
+ s{(\Q$old\E)}
+ { uc $new | (uc $1 ^ $1) .
+ (uc(substr $1, -1) ^ substr $1, -1) x
+ (length($new) - length $1)
+ }egi;
- print;
+ print;
And here it is as a subroutine, modeled after the above:
- sub preserve_case($$) {
- my ($old, $new) = @_;
- my $mask = uc $old ^ $old;
+ sub preserve_case($$) {
+ my ($old, $new) = @_;
+ my $mask = uc $old ^ $old;
- uc $new | $mask .
- substr($mask, -1) x (length($new) - length($old))
+ uc $new | $mask .
+ substr($mask, -1) x (length($new) - length($old))
}
- $a = "this is a TEsT case";
- $a =~ s/(test)/preserve_case($1, "success")/egi;
- print "$a\n";
+ $a = "this is a TEsT case";
+ $a =~ s/(test)/preserve_case($1, "success")/egi;
+ print "$a\n";
This prints:
- this is a SUcCESS case
+ this is a SUcCESS case
As an alternative, to keep the case of the replacement word if it is
longer than the original, you can use this code, by Jeff Pinyan:
- sub preserve_case {
- my ($from, $to) = @_;
- my ($lf, $lt) = map length, @_;
+ sub preserve_case {
+ my ($from, $to) = @_;
+ my ($lf, $lt) = map length, @_;
- if ($lt < $lf) { $from = substr $from, 0, $lt }
- else { $from .= substr $to, $lf }
+ if ($lt < $lf) { $from = substr $from, 0, $lt }
+ else { $from .= substr $to, $lf }
- return uc $to | ($from ^ uc $from);
- }
+ return uc $to | ($from ^ uc $from);
+ }
This changes the sentence to "this is a SUcCess case."
If the substitution has more characters than the string being substituted,
the case of the last character is used for the rest of the substitution.
- # Original by Nathan Torkington, massaged by Jeffrey Friedl
- #
- sub preserve_case($$)
- {
- my ($old, $new) = @_;
- my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc
- my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new));
- my ($len) = $oldlen < $newlen ? $oldlen : $newlen;
-
- for ($i = 0; $i < $len; $i++) {
- if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) {
- $state = 0;
- } elsif (lc $c eq $c) {
- substr($new, $i, 1) = lc(substr($new, $i, 1));
- $state = 1;
- } else {
- substr($new, $i, 1) = uc(substr($new, $i, 1));
- $state = 2;
- }
- }
- # finish up with any remaining new (for when new is longer than old)
- if ($newlen > $oldlen) {
- if ($state == 1) {
- substr($new, $oldlen) = lc(substr($new, $oldlen));
- } elsif ($state == 2) {
- substr($new, $oldlen) = uc(substr($new, $oldlen));
- }
- }
- return $new;
- }
+ # Original by Nathan Torkington, massaged by Jeffrey Friedl
+ #
+ sub preserve_case($$)
+ {
+ my ($old, $new) = @_;
+ my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc
+ my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new));
+ my ($len) = $oldlen < $newlen ? $oldlen : $newlen;
+
+ for ($i = 0; $i < $len; $i++) {
+ if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) {
+ $state = 0;
+ } elsif (lc $c eq $c) {
+ substr($new, $i, 1) = lc(substr($new, $i, 1));
+ $state = 1;
+ } else {
+ substr($new, $i, 1) = uc(substr($new, $i, 1));
+ $state = 2;
+ }
+ }
+ # finish up with any remaining new (for when new is longer than old)
+ if ($newlen > $oldlen) {
+ if ($state == 1) {
+ substr($new, $oldlen) = lc(substr($new, $oldlen));
+ } elsif ($state == 2) {
+ substr($new, $oldlen) = uc(substr($new, $oldlen));
+ }
+ }
+ return $new;
+ }
=head2 How can I make C<\w> match national character sets?
X<\w>
also that any regex special characters will be acted on unless you
precede the substitution with \Q. Here's an example:
- $string = "Placido P. Octopus";
- $regex = "P.";
+ $string = "Placido P. Octopus";
+ $regex = "P.";
- $string =~ s/$regex/Polyp/;
- # $string is now "Polypacido P. Octopus"
+ $string =~ s/$regex/Polyp/;
+ # $string is now "Polypacido P. Octopus"
Because C<.> is special in regular expressions, and can match any
single character, the regex C<P.> here has matched the <Pl> in the
To escape the special meaning of C<.>, we use C<\Q>:
- $string = "Placido P. Octopus";
- $regex = "P.";
+ $string = "Placido P. Octopus";
+ $regex = "P.";
- $string =~ s/\Q$regex/Polyp/;
- # $string is now "Placido Polyp Octopus"
+ $string =~ s/\Q$regex/Polyp/;
+ # $string is now "Placido Polyp Octopus"
The use of C<\Q> causes the <.> in the regex to be treated as a
regular character, so that C<P.> matches a C<P> followed by a dot.
For example, here's a "paragrep" program:
- $/ = ''; # paragraph mode
- $pat = shift;
- while (<>) {
- print if /$pat/o;
- }
+ $/ = ''; # paragraph mode
+ $pat = shift;
+ while (<>) {
+ print if /$pat/o;
+ }
=head2 How do I use a regular expression to strip C style comments from a file?
While this actually can be done, it's much harder than you'd think.
For example, this one-liner
- perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
+ perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
will work in many but not all cases. You see, it's too simple-minded for
certain kinds of C programs, in particular, those with what appear to be
comments in quoted strings. For that, you'd need something like this,
created by Jeffrey Friedl and later modified by Fred Curtis.
- $/ = undef;
- $_ = <>;
- s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
- print;
+ $/ = undef;
+ $_ = <>;
+ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
+ print;
This could, of course, be more legibly written with the C</x> modifier, adding
whitespace and comments. Here it is expanded, courtesy of Fred Curtis.
A slight modification also removes C++ comments:
- s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
+ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
=head2 Can I use Perl regular expressions to match balanced text?
X<regex, matching balanced test> X<regexp, matching balanced test>
An example:
- $s1 = $s2 = "I am very very cold";
- $s1 =~ s/ve.*y //; # I am cold
- $s2 =~ s/ve.*?y //; # I am very cold
+ $s1 = $s2 = "I am very very cold";
+ $s1 =~ s/ve.*y //; # I am cold
+ $s2 =~ s/ve.*?y //; # I am very cold
Notice how the second substitution stopped matching as soon as it
encountered "y ". The C<*?> quantifier effectively tells the regular
Use the split function:
- while (<>) {
- foreach $word ( split ) {
- # do something with $word here
+ while (<>) {
+ foreach $word ( split ) {
+ # do something with $word here
+ }
}
- }
Note that this isn't really a word in the English sense; it's just
chunks of consecutive non-whitespace characters.
To work with only alphanumeric sequences (including underscores), you
might consider
- while (<>) {
- foreach $word (m/(\w+)/g) {
- # do something with $word here
+ while (<>) {
+ foreach $word (m/(\w+)/g) {
+ # do something with $word here
+ }
}
- }
=head2 How can I print out a word-frequency or line-frequency summary?
apostrophes, rather than the non-whitespace chunk idea of a word given
in the previous question:
- while (<>) {
- while ( /(\b[^\W_\d][\w'-]+\b)/g ) { # misses "`sheep'"
- $seen{$1}++;
+ while (<>) {
+ while ( /(\b[^\W_\d][\w'-]+\b)/g ) { # misses "`sheep'"
+ $seen{$1}++;
+ }
}
- }
- while ( ($word, $count) = each %seen ) {
- print "$count $word\n";
- }
+
+ while ( ($word, $count) = each %seen ) {
+ print "$count $word\n";
+ }
If you wanted to do the same thing for lines, you wouldn't need a
regular expression:
- while (<>) {
- $seen{$_}++;
- }
- while ( ($line, $count) = each %seen ) {
- print "$count $line";
- }
+ while (<>) {
+ $seen{$_}++;
+ }
+
+ while ( ($line, $count) = each %seen ) {
+ print "$count $line";
+ }
If you want these output in a sorted order, see L<perlfaq4>: "How do I
sort a hash (optionally by value instead of key)?".
the regular expression for every iteration of the foreach()
loop since it has no way to know what $pattern will be.
- @patterns = qw( foo bar baz );
+ @patterns = qw( foo bar baz );
- LINE: while( <> )
- {
+ LINE: while( <DATA> )
+ {
foreach $pattern ( @patterns )
{
- print if /\b$pattern\b/i;
- next LINE;
- }
+ if( /\b$pattern\b/i )
+ {
+ print;
+ next LINE;
+ }
+ }
}
The qr// operator showed up in perl 5.005. It compiles a
its pre-compiled form. The rest of the script is the same,
but faster.
- @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
+ @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
- LINE: while( <> )
- {
+ LINE: while( <> )
+ {
foreach $pattern ( @patterns )
{
- print if /\b$pattern\b/i;
- next LINE;
- }
+ print if /\b$pattern\b/i;
+ next LINE;
+ }
}
In some cases, you may be able to make several patterns into
$regex = join '|', qw( foo bar baz );
- LINE: while( <> )
- {
+ LINE: while( <> )
+ {
print if /\b(?:$regex)\b/i;
}
such as in a tokenizer. Jeffrey Friedl offers this example
which works in 5.004 or later.
- while (<>) {
- chomp;
- PARSER: {
- m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; };
- m/ \G( \w+ )/gcx && do { print "word: $1\n"; redo; };
- m/ \G( \s+ )/gcx && do { print "space: $1\n"; redo; };
- m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; };
- }
- }
+ while (<>) {
+ chomp;
+ PARSER: {
+ m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; };
+ m/ \G( \w+ )/gcx && do { print "word: $1\n"; redo; };
+ m/ \G( \s+ )/gcx && do { print "space: $1\n"; redo; };
+ m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; };
+ }
+ }
For each line, the PARSER loop first tries to match a series
of digits followed by a word boundary. This match has to
=head2 How can I match strings with multibyte characters?
X<regex, and multibyte characters> X<regexp, and multibyte characters>
-X<regular expression, and multibyte characters>
+X<regular expression, and multibyte characters> X<martian> X<encoding, Martian>
Starting from Perl 5.6 Perl has had some level of multibyte character
support. Perl 5.8 or later is recommended. Supported multibyte
Here are a few ways, all painful, to deal with it:
- $martian =~ s/([A-Z][A-Z])/ $1 /g; # Make sure adjacent "martian"
- # bytes are no longer adjacent.
- print "found GX!\n" if $martian =~ /GX/;
+ # Make sure adjacent "martian" bytes are no longer adjacent.
+ $martian =~ s/([A-Z][A-Z])/ $1 /g;
+
+ print "found GX!\n" if $martian =~ /GX/;
Or like this:
- @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g;
- # above is conceptually similar to: @chars = $text =~ m/(.)/g;
- #
- foreach $char (@chars) {
- print "found GX!\n", last if $char eq 'GX';
- }
+ @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g;
+ # above is conceptually similar to: @chars = $text =~ m/(.)/g;
+ #
+ foreach $char (@chars) {
+ print "found GX!\n", last if $char eq 'GX';
+ }
Or like this:
- while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) { # \G probably unneeded
- print "found GX!\n", last if $1 eq 'GX';
- }
+ while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) { # \G probably unneeded
+ print "found GX!\n", last if $1 eq 'GX';
+ }
Here's another, slightly less painful, way to do it from Benjamin
Goldberg, who uses a zero-width negative look-behind assertion.
print "found GX!\n" if $martian =~ m/
- (?<![A-Z])
- (?:[A-Z][A-Z])*?
- GX
+ (?<![A-Z])
+ (?:[A-Z][A-Z])*?
+ GX
/x;
This succeeds if the "martian" character GX is in the string, and fails
It does have the drawback of putting the wrong thing in $-[0] and $+[0],
but this usually can be worked around.
-=head2 How do I match a pattern that is supplied by the user?
+=head2 How do I match a regular expression that's in a variable?
+X<regex, in variable> X<eval> X<regex> X<quotemeta> X<\Q, regex>
+X<\E, regex>, X<qr//>
-Well, if it's really a pattern, then just use
+(contributed by brian d foy)
- chomp($pattern = <STDIN>);
- if ($line =~ /$pattern/) { }
+We don't have to hard-code patterns into the match operator (or
+anything else that works with regular expressions). We can put the
+pattern in a variable for later use.
-Alternatively, since you have no guarantee that your user entered
-a valid regular expression, trap the exception this way:
+The match operator is a double quote context, so you can interpolate
+your variable just like a double quoted string. In this case, you
+read the regular expression as user input and store it in C<$regex>.
+Once you have the pattern in C<$regex>, you use that variable in the
+match operator.
- if (eval { $line =~ /$pattern/ }) { }
+ chomp( my $regex = <STDIN> );
-If all you really want is to search for a string, not a pattern,
-then you should either use the index() function, which is made for
-string searching, or, if you can't be disabused of using a pattern
-match on a non-pattern, then be sure to use C<\Q>...C<\E>, documented
-in L<perlre>.
+ if( $string =~ m/$regex/ ) { ... }
- $pattern = <STDIN>;
+Any regular expression special characters in C<$regex> are still
+special, and the pattern still has to be valid or Perl will complain.
+For instance, in this pattern there is an unpaired parenthesis.
- open (FILE, $input) or die "Couldn't open input $input: $!; aborting";
- while (<FILE>) {
- print if /\Q$pattern\E/;
- }
- close FILE;
+ my $regex = "Unmatched ( paren";
+
+ "Two parens to bind them all" =~ m/$regex/;
+
+When Perl compiles the regular expression, it treats the parenthesis
+as the start of a memory match. When it doesn't find the closing
+parenthesis, it complains:
+
+ Unmatched ( in regex; marked by <-- HERE in m/Unmatched ( <-- HERE paren/ at script line 3.
+
+You can get around this in several ways depending on our situation.
+First, if you don't want any of the characters in the string to be
+special, you can escape them with C<quotemeta> before you use the string.
+
+ chomp( my $regex = <STDIN> );
+ $regex = quotemeta( $regex );
+
+ if( $string =~ m/$regex/ ) { ... }
+
+You can also do this directly in the match operator using the C<\Q>
+and C<\E> sequences. The C<\Q> tells Perl where to start escaping
+special characters, and the C<\E> tells it where to stop (see L<perlop>
+for more details).
+
+ chomp( my $regex = <STDIN> );
+
+ if( $string =~ m/\Q$regex\E/ ) { ... }
+
+Alternately, you can use C<qr//>, the regular expression quote operator (see
+L<perlop> for more details). It quotes and perhaps compiles the pattern,
+and you can apply regular expression flags to the pattern.
+
+ chomp( my $input = <STDIN> );
+
+ my $regex = qr/$input/is;
+
+ $string =~ m/$regex/ # same as m/$input/is;
+
+You might also want to trap any errors by wrapping an C<eval> block
+around the whole thing.
+
+ chomp( my $input = <STDIN> );
+
+ eval {
+ if( $string =~ m/\Q$input\E/ ) { ... }
+ };
+ warn $@ if $@;
+
+Or...
+
+ my $regex = eval { qr/$input/is };
+ if( defined $regex ) {
+ $string =~ m/$regex/;
+ }
+ else {
+ warn $@;
+ }
=head1 REVISION
-Revision: $Revision: 3606 $
+Revision: $Revision: 6479 $
-Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $
+Date: $Date: 2006-06-07 09:48:12 +0200 (mer, 07 jun 2006) $
See L<perlfaq> for source control details and availability.
=head1 NAME
-perlfaq8 - System Interaction ($Revision: 3606 $)
+perlfaq8 - System Interaction ($Revision: 6628 $)
=head1 DESCRIPTION
know that they have an ANSI terminal that understands
color, you can use the Term::ANSIColor module from CPAN:
- use Term::ANSIColor;
- print color("red"), "Stop!\n", color("reset");
- print color("green"), "Go!\n", color("reset");
+ use Term::ANSIColor;
+ print color("red"), "Stop!\n", color("reset");
+ print color("green"), "Go!\n", color("reset");
Or like this:
- use Term::ANSIColor qw(:constants);
- print RED, "Stop!\n", RESET;
- print GREEN, "Go!\n", RESET;
+ use Term::ANSIColor qw(:constants);
+ print RED, "Stop!\n", RESET;
+ print GREEN, "Go!\n", RESET;
=head2 How do I read just one key without waiting for a return key?
L<perlfunc/getc>, but as you see, that's already getting you into
portability snags.
- open(TTY, "+</dev/tty") or die "no tty: $!";
- system "stty cbreak </dev/tty >/dev/tty 2>&1";
- $key = getc(TTY); # perhaps this works
- # OR ELSE
- sysread(TTY, $key, 1); # probably this does
- system "stty -cbreak </dev/tty >/dev/tty 2>&1";
+ open(TTY, "+</dev/tty") or die "no tty: $!";
+ system "stty cbreak </dev/tty >/dev/tty 2>&1";
+ $key = getc(TTY); # perhaps this works
+ # OR ELSE
+ sysread(TTY, $key, 1); # probably this does
+ system "stty -cbreak </dev/tty >/dev/tty 2>&1";
The Term::ReadKey module from CPAN offers an easy-to-use interface that
should be more efficient than shelling out to B<stty> for each key.
It even includes limited support for Windows.
- use Term::ReadKey;
- ReadMode('cbreak');
- $key = ReadKey(0);
- ReadMode('normal');
+ use Term::ReadKey;
+ ReadMode('cbreak');
+ $key = ReadKey(0);
+ ReadMode('normal');
However, using the code requires that you have a working C compiler
and can use it to build and install a CPAN module. Here's a solution
using the standard POSIX module, which is already on your systems
(assuming your system supports POSIX).
- use HotKey;
- $key = readkey();
+ use HotKey;
+ $key = readkey();
And here's the HotKey module, which hides the somewhat mystifying calls
to manipulate the POSIX termios structures.
- # HotKey.pm
- package HotKey;
+ # HotKey.pm
+ package HotKey;
- @ISA = qw(Exporter);
- @EXPORT = qw(cbreak cooked readkey);
+ @ISA = qw(Exporter);
+ @EXPORT = qw(cbreak cooked readkey);
- use strict;
- use POSIX qw(:termios_h);
- my ($term, $oterm, $echo, $noecho, $fd_stdin);
+ use strict;
+ use POSIX qw(:termios_h);
+ my ($term, $oterm, $echo, $noecho, $fd_stdin);
- $fd_stdin = fileno(STDIN);
- $term = POSIX::Termios->new();
- $term->getattr($fd_stdin);
- $oterm = $term->getlflag();
+ $fd_stdin = fileno(STDIN);
+ $term = POSIX::Termios->new();
+ $term->getattr($fd_stdin);
+ $oterm = $term->getlflag();
- $echo = ECHO | ECHOK | ICANON;
- $noecho = $oterm & ~$echo;
+ $echo = ECHO | ECHOK | ICANON;
+ $noecho = $oterm & ~$echo;
- sub cbreak {
- $term->setlflag($noecho); # ok, so i don't want echo either
- $term->setcc(VTIME, 1);
- $term->setattr($fd_stdin, TCSANOW);
- }
+ sub cbreak {
+ $term->setlflag($noecho); # ok, so i don't want echo either
+ $term->setcc(VTIME, 1);
+ $term->setattr($fd_stdin, TCSANOW);
+ }
- sub cooked {
- $term->setlflag($oterm);
- $term->setcc(VTIME, 0);
- $term->setattr($fd_stdin, TCSANOW);
- }
+ sub cooked {
+ $term->setlflag($oterm);
+ $term->setcc(VTIME, 0);
+ $term->setattr($fd_stdin, TCSANOW);
+ }
- sub readkey {
- my $key = '';
- cbreak();
- sysread(STDIN, $key, 1);
- cooked();
- return $key;
- }
+ sub readkey {
+ my $key = '';
+ cbreak();
+ sysread(STDIN, $key, 1);
+ cooked();
+ return $key;
+ }
- END { cooked() }
+ END { cooked() }
- 1;
+ 1;
=head2 How do I check whether input is ready on the keyboard?
Term::ReadKey module from CPAN, passing it an argument of -1 to indicate
not to block:
- use Term::ReadKey;
+ use Term::ReadKey;
- ReadMode('cbreak');
+ ReadMode('cbreak');
- if (defined ($char = ReadKey(-1)) ) {
- # input was waiting and it was $char
- } else {
- # no input was waiting
- }
+ if (defined ($char = ReadKey(-1)) ) {
+ # input was waiting and it was $char
+ } else {
+ # no input was waiting
+ }
- ReadMode('normal'); # restore normal tty settings
+ ReadMode('normal'); # restore normal tty settings
=head2 How do I clear the screen?
If you only have do so infrequently, use C<system>:
- system("clear");
+ system("clear");
If you have to do this a lot, save the clear string
so you can print it 100 times without calling a program
100 times:
- $clear_string = `clear`;
- print $clear_string;
+ $clear_string = `clear`;
+ print $clear_string;
If you're planning on doing other screen manipulations, like cursor
positions, etc, you might wish to use Term::Cap module:
- use Term::Cap;
- $terminal = Term::Cap->Tgetent( {OSPEED => 9600} );
- $clear_string = $terminal->Tputs('cl');
+ use Term::Cap;
+ $terminal = Term::Cap->Tgetent( {OSPEED => 9600} );
+ $clear_string = $terminal->Tputs('cl');
=head2 How do I get the screen size?
you can use it to fetch the width and height in characters
and in pixels:
- use Term::ReadKey;
- ($wchar, $hchar, $wpixels, $hpixels) = GetTerminalSize();
+ use Term::ReadKey;
+ ($wchar, $hchar, $wpixels, $hpixels) = GetTerminalSize();
This is more portable than the raw C<ioctl>, but not as
illustrative:
- require 'sys/ioctl.ph';
- die "no TIOCGWINSZ " unless defined &TIOCGWINSZ;
- open(TTY, "+</dev/tty") or die "No tty: $!";
- unless (ioctl(TTY, &TIOCGWINSZ, $winsize='')) {
- die sprintf "$0: ioctl TIOCGWINSZ (%08x: $!)\n", &TIOCGWINSZ;
- }
- ($row, $col, $xpixel, $ypixel) = unpack('S4', $winsize);
- print "(row,col) = ($row,$col)";
- print " (xpixel,ypixel) = ($xpixel,$ypixel)" if $xpixel || $ypixel;
- print "\n";
+ require 'sys/ioctl.ph';
+ die "no TIOCGWINSZ " unless defined &TIOCGWINSZ;
+ open(TTY, "+</dev/tty") or die "No tty: $!";
+ unless (ioctl(TTY, &TIOCGWINSZ, $winsize='')) {
+ die sprintf "$0: ioctl TIOCGWINSZ (%08x: $!)\n", &TIOCGWINSZ;
+ }
+ ($row, $col, $xpixel, $ypixel) = unpack('S4', $winsize);
+ print "(row,col) = ($row,$col)";
+ print " (xpixel,ypixel) = ($xpixel,$ypixel)" if $xpixel || $ypixel;
+ print "\n";
=head2 How do I ask the user for a password?
You can also do this for most systems using the Term::ReadKey module
from CPAN, which is easier to use and in theory more portable.
- use Term::ReadKey;
+ use Term::ReadKey;
- ReadMode('noecho');
- $password = ReadLine(0);
+ ReadMode('noecho');
+ $password = ReadLine(0);
=head2 How do I read and write the serial port?
give the numeric values you want directly, using octal ("\015"), hex
("0x0D"), or as a control-character specification ("\cM").
- print DEV "atv1\012"; # wrong, for some devices
- print DEV "atv1\015"; # right, for some devices
+ print DEV "atv1\012"; # wrong, for some devices
+ print DEV "atv1\015"; # right, for some devices
Even though with normal text files a "\n" will do the trick, there is
still no unified scheme for terminating a line that is portable
and L<perlfunc/select>, or L<perlfaq5>, "How do I flush/unbuffer an
output filehandle? Why must I do this?"):
- $oldh = select(DEV);
- $| = 1;
- select($oldh);
+ $oldh = select(DEV);
+ $| = 1;
+ select($oldh);
You'll also see code that does this without a temporary variable, as in
- select((select(DEV), $| = 1)[0]);
+ select((select(DEV), $| = 1)[0]);
Or if you don't mind pulling in a few thousand lines
of code just because you're afraid of a little $| variable:
- use IO::Handle;
- DEV->autoflush(1);
+ use IO::Handle;
+ DEV->autoflush(1);
As mentioned in the previous item, this still doesn't work when using
socket I/O between Unix and Macintosh. You'll need to hard code your
=back
While trying to read from his caller-id box, the notorious Jamie Zawinski
-<jwz@netscape.com>, after much gnashing of teeth and fighting with sysread,
+C<< <jwz@netscape.com> >>, after much gnashing of teeth and fighting with sysread,
sysopen, POSIX's tcgetattr business, and various other functions that
go bump in the night, finally came up with this:
- sub open_modem {
- use IPC::Open2;
- my $stty = `/bin/stty -g`;
- open2( \*MODEM_IN, \*MODEM_OUT, "cu -l$modem_device -s2400 2>&1");
- # starting cu hoses /dev/tty's stty settings, even when it has
- # been opened on a pipe...
- system("/bin/stty $stty");
- $_ = <MODEM_IN>;
- chomp;
- if ( !m/^Connected/ ) {
- print STDERR "$0: cu printed `$_' instead of `Connected'\n";
+ sub open_modem {
+ use IPC::Open2;
+ my $stty = `/bin/stty -g`;
+ open2( \*MODEM_IN, \*MODEM_OUT, "cu -l$modem_device -s2400 2>&1");
+ # starting cu hoses /dev/tty's stty settings, even when it has
+ # been opened on a pipe...
+ system("/bin/stty $stty");
+ $_ = <MODEM_IN>;
+ chomp;
+ if ( !m/^Connected/ ) {
+ print STDERR "$0: cu printed `$_' instead of `Connected'\n";
+ }
}
- }
=head2 How do I decode encrypted password files?
You could also use
- system("cmd &")
+ system("cmd &")
or you could use fork as documented in L<perlfunc/"fork">, with
further examples in L<perlipc>. Some things to be aware of, if you're
You have to be prepared to "reap" the child process when it finishes.
- $SIG{CHLD} = sub { wait };
+ $SIG{CHLD} = sub { wait };
- $SIG{CHLD} = 'IGNORE';
+ $SIG{CHLD} = 'IGNORE';
You can also use a double fork. You immediately wait() for your
first child, and the init daemon will wait() for your grandchild once
it exits.
unless ($pid = fork) {
- unless (fork) {
- exec "what you really wanna do";
- die "exec failed!";
- }
- exit 0;
- }
- waitpid($pid,0);
-
+ unless (fork) {
+ exec "what you really wanna do";
+ die "exec failed!";
+ }
+ exit 0;
+ }
+ waitpid($pid, 0);
See L<perlipc/"Signals"> for other examples of code to do this.
Zombies are not an issue with C<system("prog &")>.
*after* the signal has been caught, rather than while it is being caught.
Previous versions of this answer were incorrect.
-
=head2 How do I modify the shadow password file on a Unix system?
If perl was installed correctly and your shadow library was written
However, if all you want to do is change your time zone, you can
probably get away with setting an environment variable:
- $ENV{TZ} = "MST7MDT"; # unixish
- $ENV{'SYS$TIMEZONE_DIFFERENTIAL'}="-5" # vms
- system "trn comp.lang.perl.misc";
+ $ENV{TZ} = "MST7MDT"; # unixish
+ $ENV{'SYS$TIMEZONE_DIFFERENTIAL'}="-5" # vms
+ system "trn comp.lang.perl.misc";
=head2 How can I sleep() or alarm() for under a second?
a system call like gettimeofday(2), then you may be able to do
something like this:
- require 'sys/syscall.ph';
+ require 'sys/syscall.ph';
- $TIMEVAL_T = "LL";
+ $TIMEVAL_T = "LL";
- $done = $start = pack($TIMEVAL_T, ());
+ $done = $start = pack($TIMEVAL_T, ());
- syscall(&SYS_gettimeofday, $start, 0) != -1
- or die "gettimeofday: $!";
+ syscall(&SYS_gettimeofday, $start, 0) != -1
+ or die "gettimeofday: $!";
- ##########################
- # DO YOUR OPERATION HERE #
- ##########################
+ ##########################
+ # DO YOUR OPERATION HERE #
+ ##########################
- syscall( &SYS_gettimeofday, $done, 0) != -1
- or die "gettimeofday: $!";
+ syscall( &SYS_gettimeofday, $done, 0) != -1
+ or die "gettimeofday: $!";
- @start = unpack($TIMEVAL_T, $start);
- @done = unpack($TIMEVAL_T, $done);
+ @start = unpack($TIMEVAL_T, $start);
+ @done = unpack($TIMEVAL_T, $done);
- # fix microseconds
- for ($done[1], $start[1]) { $_ /= 1_000_000 }
+ # fix microseconds
+ for ($done[1], $start[1]) { $_ /= 1_000_000 }
- $delta_time = sprintf "%.4f", ($done[0] + $done[1] )
- -
- ($start[0] + $start[1] );
+ $delta_time = sprintf "%.4f", ($done[0] + $done[1] )
+ -
+ ($start[0] + $start[1] );
=head2 How can I do an atexit() or setjmp()/longjmp()? (Exception handling)
For example, you can use this to make sure your filter program
managed to finish its output without filling up the disk:
- END {
- close(STDOUT) || die "stdout close failed: $!";
- }
+ END {
+ close(STDOUT) || die "stdout close failed: $!";
+ }
The END block isn't called when untrapped signals kill the program,
though, so if you use END blocks you should also use
L<perlfunc>).
Remember to check the modules that came with your distribution, and
-CPAN as well---someone may already have written a module to do it. On
+CPAN as well--someone may already have written a module to do it. On
Windows, try Win32::API. On Macs, try Mac::Carbon. If no module
has an interface to the C function, you can inline a bit of C in your
Perl source with Inline::C.
but the hard ones like F<ioctl.h> nearly always need to hand-edited.
Here's how to install the *.ph files:
- 1. become super-user
- 2. cd /usr/include
- 3. h2ph *.h */*.h
+ 1. become super-user
+ 2. cd /usr/include
+ 3. h2ph *.h */*.h
If your system supports dynamic loading, for reasons of portability and
sanity you probably ought to use h2xs (also part of the standard perl
the high 8 bits are the actual exit value). Backticks (``) run a
command and return what it sent to STDOUT.
- $exit_status = system("mail-users");
- $output_string = `ls`;
+ $exit_status = system("mail-users");
+ $output_string = `ls`;
=head2 How can I capture STDERR from an external command?
There are three basic ways of running external commands:
- system $cmd; # using system()
- $output = `$cmd`; # using backticks (``)
- open (PIPE, "cmd |"); # using open()
+ system $cmd; # using system()
+ $output = `$cmd`; # using backticks (``)
+ open (PIPE, "cmd |"); # using open()
With system(), both STDOUT and STDERR will go the same place as the
script's STDOUT and STDERR, unless the system() command redirects them.
To capture a program's STDOUT, but discard its STDERR:
- use IPC::Open3;
- use File::Spec;
- use Symbol qw(gensym);
- open(NULL, ">", File::Spec->devnull);
- my $pid = open3(gensym, \*PH, ">&NULL", "cmd");
- while( <PH> ) { }
- waitpid($pid, 0);
+ use IPC::Open3;
+ use File::Spec;
+ use Symbol qw(gensym);
+ open(NULL, ">", File::Spec->devnull);
+ my $pid = open3(gensym, \*PH, ">&NULL", "cmd");
+ while( <PH> ) { }
+ waitpid($pid, 0);
To capture a program's STDERR, but discard its STDOUT:
- use IPC::Open3;
- use File::Spec;
- use Symbol qw(gensym);
- open(NULL, ">", File::Spec->devnull);
- my $pid = open3(gensym, ">&NULL", \*PH, "cmd");
- while( <PH> ) { }
- waitpid($pid, 0);
+ use IPC::Open3;
+ use File::Spec;
+ use Symbol qw(gensym);
+ open(NULL, ">", File::Spec->devnull);
+ my $pid = open3(gensym, ">&NULL", \*PH, "cmd");
+ while( <PH> ) { }
+ waitpid($pid, 0);
To capture a program's STDERR, and let its STDOUT go to our own STDERR:
- use IPC::Open3;
- use Symbol qw(gensym);
- my $pid = open3(gensym, ">&STDERR", \*PH, "cmd");
- while( <PH> ) { }
- waitpid($pid, 0);
+ use IPC::Open3;
+ use Symbol qw(gensym);
+ my $pid = open3(gensym, ">&STDERR", \*PH, "cmd");
+ while( <PH> ) { }
+ waitpid($pid, 0);
To read both a command's STDOUT and its STDERR separately, you can
redirect them to temp files, let the command run, then read the temp
files:
- use IPC::Open3;
- use Symbol qw(gensym);
- use IO::File;
- local *CATCHOUT = IO::File->new_tmpfile;
- local *CATCHERR = IO::File->new_tmpfile;
- my $pid = open3(gensym, ">&CATCHOUT", ">&CATCHERR", "cmd");
- waitpid($pid, 0);
- seek $_, 0, 0 for \*CATCHOUT, \*CATCHERR;
- while( <CATCHOUT> ) {}
- while( <CATCHERR> ) {}
+ use IPC::Open3;
+ use Symbol qw(gensym);
+ use IO::File;
+ local *CATCHOUT = IO::File->new_tmpfile;
+ local *CATCHERR = IO::File->new_tmpfile;
+ my $pid = open3(gensym, ">&CATCHOUT", ">&CATCHERR", "cmd");
+ waitpid($pid, 0);
+ seek $_, 0, 0 for \*CATCHOUT, \*CATCHERR;
+ while( <CATCHOUT> ) {}
+ while( <CATCHERR> ) {}
But there's no real need for *both* to be tempfiles... the following
should work just as well, without deadlocking:
- use IPC::Open3;
- use Symbol qw(gensym);
- use IO::File;
- local *CATCHERR = IO::File->new_tmpfile;
- my $pid = open3(gensym, \*CATCHOUT, ">&CATCHERR", "cmd");
- while( <CATCHOUT> ) {}
- waitpid($pid, 0);
- seek CATCHERR, 0, 0;
- while( <CATCHERR> ) {}
+ use IPC::Open3;
+ use Symbol qw(gensym);
+ use IO::File;
+ local *CATCHERR = IO::File->new_tmpfile;
+ my $pid = open3(gensym, \*CATCHOUT, ">&CATCHERR", "cmd");
+ while( <CATCHOUT> ) {}
+ waitpid($pid, 0);
+ seek CATCHERR, 0, 0;
+ while( <CATCHERR> ) {}
And it'll be faster, too, since we can begin processing the program's
stdout immediately, rather than waiting for the program to finish.
With any of these, you can change file descriptors before the call:
- open(STDOUT, ">logfile");
- system("ls");
+ open(STDOUT, ">logfile");
+ system("ls");
or you can use Bourne shell file-descriptor redirection:
- $output = `$cmd 2>some_file`;
- open (PIPE, "cmd 2>some_file |");
+ $output = `$cmd 2>some_file`;
+ open (PIPE, "cmd 2>some_file |");
You can also use file-descriptor redirection to make STDERR a
duplicate of STDOUT:
- $output = `$cmd 2>&1`;
- open (PIPE, "cmd 2>&1 |");
+ $output = `$cmd 2>&1`;
+ open (PIPE, "cmd 2>&1 |");
Note that you I<cannot> simply open STDERR to be a dup of STDOUT
in your Perl program and avoid calling the shell to do the redirection.
This doesn't work:
- open(STDERR, ">&STDOUT");
- $alloutput = `cmd args`; # stderr still escapes
+ open(STDERR, ">&STDOUT");
+ $alloutput = `cmd args`; # stderr still escapes
This fails because the open() makes STDERR go to where STDOUT was
going at the time of the open(). The backticks then make STDOUT go to
Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz . To
capture a command's STDERR and STDOUT together:
- $output = `cmd 2>&1`; # either with backticks
- $pid = open(PH, "cmd 2>&1 |"); # or with an open pipe
- while (<PH>) { } # plus a read
+ $output = `cmd 2>&1`; # either with backticks
+ $pid = open(PH, "cmd 2>&1 |"); # or with an open pipe
+ while (<PH>) { } # plus a read
To capture a command's STDOUT but discard its STDERR:
- $output = `cmd 2>/dev/null`; # either with backticks
- $pid = open(PH, "cmd 2>/dev/null |"); # or with an open pipe
- while (<PH>) { } # plus a read
+ $output = `cmd 2>/dev/null`; # either with backticks
+ $pid = open(PH, "cmd 2>/dev/null |"); # or with an open pipe
+ while (<PH>) { } # plus a read
To capture a command's STDERR but discard its STDOUT:
- $output = `cmd 2>&1 1>/dev/null`; # either with backticks
- $pid = open(PH, "cmd 2>&1 1>/dev/null |"); # or with an open pipe
- while (<PH>) { } # plus a read
+ $output = `cmd 2>&1 1>/dev/null`; # either with backticks
+ $pid = open(PH, "cmd 2>&1 1>/dev/null |"); # or with an open pipe
+ while (<PH>) { } # plus a read
To exchange a command's STDOUT and STDERR in order to capture the STDERR
but leave its STDOUT to come out our old STDERR:
- $output = `cmd 3>&1 1>&2 2>&3 3>&-`; # either with backticks
- $pid = open(PH, "cmd 3>&1 1>&2 2>&3 3>&-|");# or with an open pipe
- while (<PH>) { } # plus a read
+ $output = `cmd 3>&1 1>&2 2>&3 3>&-`; # either with backticks
+ $pid = open(PH, "cmd 3>&1 1>&2 2>&3 3>&-|");# or with an open pipe
+ while (<PH>) { } # plus a read
To read both a command's STDOUT and its STDERR separately, it's easiest
to redirect them separately to files, and then read from those files
when the program is done:
- system("program args 1>program.stdout 2>program.stderr");
+ system("program args 1>program.stdout 2>program.stderr");
Ordering is important in all these examples. That's because the shell
processes file descriptor redirections in strictly left to right order.
- system("prog args 1>tmpfile 2>&1");
- system("prog args 2>&1 1>tmpfile");
+ system("prog args 1>tmpfile 2>&1");
+ system("prog args 2>&1 1>tmpfile");
The first command sends both standard out and standard error to the
temporary file. The second command sends only the old standard output
Consider this line:
- `cat /etc/termcap`;
+ `cat /etc/termcap`;
You forgot to check C<$?> to see whether the program even ran
correctly. Even if you wrote
- print `cat /etc/termcap`;
+ print `cat /etc/termcap`;
this code could and probably should be written as
- system("cat /etc/termcap") == 0
+ system("cat /etc/termcap") == 0
or die "cat program failed!";
which will get the output quickly (as it is generated, instead of only
at the end) and also check the return value.
-system() also provides direct control over whether shell wildcard
+C<system> also provides direct control over whether shell wildcard
processing may take place, whereas backticks do not.
=head2 How can I call backticks without shell processing?
This is a bit tricky. You can't simply write the command
like this:
- @ok = `grep @opts '$search_string' @filenames`;
+ @ok = `grep @opts '$search_string' @filenames`;
As of Perl 5.8.0, you can use open() with multiple arguments.
Just like the list forms of system() and exec(), no shell
escapes happen.
- open( GREP, "-|", 'grep', @opts, $search_string, @filenames );
- chomp(@ok = <GREP>);
- close GREP;
+ open( GREP, "-|", 'grep', @opts, $search_string, @filenames );
+ chomp(@ok = <GREP>);
+ close GREP;
You can also:
- my @ok = ();
- if (open(GREP, "-|")) {
- while (<GREP>) {
- chomp;
- push(@ok, $_);
- }
- close GREP;
- } else {
- exec 'grep', @opts, $search_string, @filenames;
- }
+ my @ok = ();
+ if (open(GREP, "-|")) {
+ while (<GREP>) {
+ chomp;
+ push(@ok, $_);
+ }
+ close GREP;
+ } else {
+ exec 'grep', @opts, $search_string, @filenames;
+ }
Just as with system(), no shell escapes happen when you exec() a list.
Further examples of this can be found in L<perlipc/"Safe Pipe Opens">.
Try keeping around the seekpointer and go there, like this:
- $where = tell(LOG);
- seek(LOG, $where, 0);
+ $where = tell(LOG);
+ seek(LOG, $where, 0);
=item 2
the initial telnet handshaking, then the standard dual-process
approach will suffice:
- use IO::Socket; # new in 5.004
- $handle = IO::Socket::INET->new('www.perl.com:80')
- || die "can't connect to port 80 on www.perl.com: $!";
- $handle->autoflush(1);
- if (fork()) { # XXX: undef means failure
- select($handle);
- print while <STDIN>; # everything from stdin to socket
- } else {
- print while <$handle>; # everything from socket to stdout
- }
- close $handle;
- exit;
+ use IO::Socket; # new in 5.004
+ $handle = IO::Socket::INET->new('www.perl.com:80')
+ or die "can't connect to port 80 on www.perl.com: $!";
+ $handle->autoflush(1);
+ if (fork()) { # XXX: undef means failure
+ select($handle);
+ print while <STDIN>; # everything from stdin to socket
+ } else {
+ print while <$handle>; # everything from socket to stdout
+ }
+ close $handle;
+ exit;
=head2 How can I write expect in Perl?
operating systems, though. Daemon programs like sendmail place their
state there, as in:
- $0 = "orcus [accepting connections]";
+ $0 = "orcus [accepting connections]";
=head2 I {changed directory, modified my environment} in a perl script. How come the change disappeared when I exited the script? How do I get my changes to be visible?
Background yourself like this:
- fork && exit;
+ fork && exit;
=back
=head2 How do I find out if I'm running interactively or not?
-Good question. Sometimes C<-t STDIN> and C<-t STDOUT> can give clues,
+Good question. Sometimes C<-t STDIN> and C<-t STDOUT> can give clues,
sometimes not.
- if (-t STDIN && -t STDOUT) {
- print "Now what? ";
- }
+ if (-t STDIN && -t STDOUT) {
+ print "Now what? ";
+ }
On POSIX systems, you can test whether your own process group matches
the current process group of your controlling terminal as follows:
- use POSIX qw/getpgrp tcgetpgrp/;
-
- # Some POSIX systems, such as Linux, can be
- # without a /dev/tty at boot time.
- if (!open(TTY, "/dev/tty")) {
- print "no tty\n";
- } else {
- $tpgrp = tcgetpgrp(fileno(*TTY));
- $pgrp = getpgrp();
- if ($tpgrp == $pgrp) {
- print "foreground\n";
- } else {
- print "background\n";
- }
- }
+ use POSIX qw/getpgrp tcgetpgrp/;
+
+ # Some POSIX systems, such as Linux, can be
+ # without a /dev/tty at boot time.
+ if (!open(TTY, "/dev/tty")) {
+ print "no tty\n";
+ } else {
+ $tpgrp = tcgetpgrp(fileno(*TTY));
+ $pgrp = getpgrp();
+ if ($tpgrp == $pgrp) {
+ print "foreground\n";
+ } else {
+ print "background\n";
+ }
+ }
=head2 How do I timeout a slow event?
sample code) and then have a signal handler for the INT signal that
passes the signal on to the subprocess. Or you can check for it:
- $rc = system($cmd);
- if ($rc & 127) { die "signal death" }
+ $rc = system($cmd);
+ if ($rc & 127) { die "signal death" }
=head2 How do I open a file without blocking?
O_NDELAY or O_NONBLOCK flag from the Fcntl module in conjunction with
sysopen():
- use Fcntl;
- sysopen(FH, "/foo/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644)
- or die "can't open /foo/somefile: $!":
+ use Fcntl;
+ sysopen(FH, "/foo/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644)
+ or die "can't open /foo/somefile: $!":
=head2 How do I tell the difference between errors from the shell and perl?
The easiest way is to have a module also named CPAN do it for you.
This module comes with perl version 5.004 and later.
- $ perl -MCPAN -e shell
+ $ perl -MCPAN -e shell
- cpan shell -- CPAN exploration and modules installation (v1.59_54)
- ReadLine support enabled
+ cpan shell -- CPAN exploration and modules installation (v1.59_54)
+ ReadLine support enabled
- cpan> install Some::Module
+ cpan> install Some::Module
To manually install the CPAN module, or any well-behaved CPAN module
for that matter, follow these steps:
=item 2
- perl Makefile.PL
+ perl Makefile.PL
=item 3
- make
+ make
=item 4
- make test
+ make test
=item 5
- make install
+ make install
=back
Perl offers several different ways to include code from one file into
another. Here are the deltas between the various inclusion constructs:
- 1) do $file is like eval `cat $file`, except the former
+ 1) do $file is like eval `cat $file`, except the former
1.1: searches @INC and updates %INC.
1.2: bequeaths an *unrelated* lexical scope on the eval'ed code.
- 2) require $file is like do $file, except the former
+ 2) require $file is like do $file, except the former
2.1: checks for redundant loading, skipping already loaded files.
2.2: raises an exception on failure to find, compile, or execute $file.
- 3) require Module is like require "Module.pm", except the former
+ 3) require Module is like require "Module.pm", except the former
3.1: translates each "::" into your system's directory separator.
3.2: primes the parser to disambiguate class Module as an indirect object.
- 4) use Module is like require Module, except the former
+ 4) use Module is like require Module, except the former
4.1: loads the module at compile time, not run-time.
4.2: imports symbols and semantics from that package to the current one.
When you build modules, use the PREFIX and LIB options when generating
Makefiles:
- perl Makefile.PL PREFIX=/mydir/perl LIB=/mydir/perl/lib
+ perl Makefile.PL PREFIX=/mydir/perl LIB=/mydir/perl/lib
then either set the PERL5LIB environment variable before you run
scripts that use the modules/libraries (see L<perlrun>) or say
- use lib '/mydir/perl/lib';
+ use lib '/mydir/perl/lib';
This is almost the same as
- BEGIN {
+ BEGIN {
unshift(@INC, '/mydir/perl/lib');
- }
+ }
except that the lib module checks for machine-dependent subdirectories.
See Perl's L<lib> for more information.
=head2 How do I add the directory my program lives in to the module/library search path?
- use FindBin;
- use lib "$FindBin::Bin";
- use your_own_modules;
+ use FindBin;
+ use lib "$FindBin::Bin";
+ use your_own_modules;
=head2 How do I add a directory to my include path (@INC) at runtime?
Here are the suggested ways of modifying your include path:
- the PERLLIB environment variable
- the PERL5LIB environment variable
- the perl -Idir command line flag
- the use lib pragma, as in
- use lib "$ENV{HOME}/myown_perllib";
+ the PERLLIB environment variable
+ the PERL5LIB environment variable
+ the perl -Idir command line flag
+ the use lib pragma, as in
+ use lib "$ENV{HOME}/myown_perllib";
The latter is particularly useful because it knows about machine
dependent architectures. The lib.pm pragmatic module was first
=head1 REVISION
-Revision: $Revision: 3606 $
+Revision: $Revision: 6628 $
-Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $
+Date: $Date: 2006-07-09 14:46:14 +0200 (dim, 09 jui 2006) $
See L<perlfaq> for source control details and availability.