of the latest perlfaq to comp.lang.perl.misc.
You can view the source tree at
-https://svn.perl.org/modules/perlfaq/trunk/ (which is outside of the
-main Perl source tree). The SVN repository notes all changes to the FAQ
+https://github.com/briandfoy/perlfaq (which is outside of the
+main Perl source tree). The git repository notes all changes to the FAQ
and holds the latest version of the working documents and may vary
significantly from the version distributed with the latest version of
Perl. Check the repository before sending your corrections.
answers. If you'd like to help review and update the answers, check out
comp.lang.perl.misc.
+You can also fork the git repository for the perlfaq and send a pull
+request so the main repository can pull your changes. The repository
+is at:
+
+ https://github.com/briandfoy/perlfaq
+
=head2 What will happen if you mail your Perl programming problems to the authors?
The perlfaq-workers like to keep all traffic on the perlfaq-workers list
=head1 AUTHOR AND COPYRIGHT
-Tom Christiansen wrote the original version of this document.
+Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
+other authors as noted. All rights reserved.
+
+Tom Christiansen wrote the original version of this document.
brian d foy C<< <bdfoy@cpan.org> >> wrote this version. See the
individual perlfaq documents for additional copyright information.
This document is available under the same terms as Perl itself. Code
examples in all the perlfaq documents are in the public domain. Use
-them as you see fit (and at your own risk with no warranty from anyone).
+them as you see fit and at your own risk with no warranty from anyone.
=head1 Table of Contents
=item *
+How do I merge two hashes?
+
+=item *
+
What happens if I add or remove keys from a hash while iterating over it?
=item *
Why do I get weird spaces when I print an array of lines?
+=item *
+
+How do I traverse a directory tree?
+
+=item *
+
+How do I delete a directory tree?
+
+=item *
+
+How do I copy an entire directory?
+
=back
=item *
+How do I match XML, HTML, or other nasty, ugly things with a regex?
+
+=item *
+
I put a regular expression into $/ but it didn't work. What's wrong?
=item *
=item *
-How can I find out my current package?
+How can I find out my current or calling package?
=item *
=head1 NAME
-perlfaq1 - General Questions About Perl ($Revision: 10427 $)
+perlfaq1 - General Questions About Perl
=head1 DESCRIPTION
(contributed by brian d foy)
There is often a matter of opinion and taste, and there isn't any one
-answer that fits anyone. In general, you want to use either the current
+answer that fits everyone. In general, you want to use either the current
stable release, or the stable release immediately prior to that one.
Currently, those are perl5.10.x and perl5.8.x, respectively.
=item *
-There is no Perl 6 release scheduled, but it will be available when
-it's ready. Stay tuned, but don't worry that you'll have to change
+There is no Perl 6 release scheduled, but it will be available when
+it's ready. Stay tuned, but don't worry that you'll have to change
major versions of Perl; no one is going to take Perl 5 away from you.
=item *
One bit. Oh, you weren't talking ASCII? :-) Larry now uses "Perl" to
signify the language proper and "perl" the implementation of it, i.e.
the current interpreter. Hence Tom's quip that "Nothing but perl can
-parse Perl."
+parse Perl."
Before the first edition of I<Programming perl>, people commonly
referred to the language as "perl", and its name appeared that way in
previously used the phrase with many subjects ("Just another x hacker,"),
so to distinguish his JAPH, he started to write them as Perl programs:
- print "Just another Perl hacker, ";
-
-Note the trailing comma and space, which allows the addition of other
-JAxH clauses for his many other interests.
+ print "Just another Perl hacker,";
Other people picked up on this and started to write clever or obfuscated
programs to produce the same output, spinning things quickly out of
=head1 REVISION
-Revision: $Revision: 10427 $
+Revision: $Revision$
-Date: $Date: 2007-12-14 00:39:01 +0100 (Fri, 14 Dec 2007) $
+Date: $Date$
See L<perlfaq> for source control details and availability.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and
+Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it
=head1 NAME
-perlfaq2 - Obtaining and Learning about Perl ($Revision: 10144 $)
+perlfaq2 - Obtaining and Learning about Perl
=head1 DESCRIPTION
The standard release of perl (the one maintained by the perl
development team) is distributed only in source code form. You
-can find this at http://www.cpan.org/src/latest.tar.gz , which
-is in a standard Internet format (a gzipped archive in POSIX tar format).
+can find the latest releases at http://www.cpan.org/src/README.html .
Perl builds and runs on a bewildering number of platforms. Virtually
all known and current Unix derivatives are supported (perl's native
platform), as are other systems like VMS, DOS, OS/2, Windows,
QNX, BeOS, OS X, MPE/iX and the Amiga.
-Binary distributions for some proprietary platforms, including
-Apple systems, can be found http://www.cpan.org/ports/ directory.
-Because these are not part of the standard distribution, they may
-and in fact do differ from the base perl port in a variety of ways.
-You'll have to check their respective release notes to see just
-what the differences are. These differences can be either positive
-(e.g. extensions for the features of the particular platform that
-are not supported in the source release of perl) or negative (e.g.
-might be based upon a less current source release of perl).
+Binary distributions for some proprietary platforms can be found in
+the http://www.cpan.org/ports/ directory. Because these are not part
+of the standard distribution, they may (and in fact do) differ from
+the base perl port in a variety of ways. You'll have to check their
+respective release notes to see just what the differences are. These
+differences can be either positive (e.g. extensions for the features
+of the particular platform that are not supported in the source
+release of perl) or negative (e.g. the port might be based upon a less
+current source release of perl).
=head2 How can I get a binary version of perl?
-For Windows, ActiveState provides a pre-built Perl for free:
+(contributed by brian d foy)
+
+ActiveState: Windows, Linux, Mac OS X, Solaris, AIX and HP-UX
http://www.activestate.com/
-Sunfreeware.com provides binaries for many utilities, including
-Perl, for Solaris on both Intel and SPARC hardware:
+Sunfreeware.com: Solaris 2.5 to Solaris 10 (SPARC and x86)
http://www.sunfreeware.com/
-If you don't have a C compiler because your vendor for whatever
-reasons did not include one with your system, the best thing to do is
-grab a binary version of gcc from the net and use that to compile perl
-with. CPAN only has binaries for systems that are terribly hard to
-get free compilers for, not for Unix systems.
-
-Some URLs that might help you are:
+Strawberry Perl: Windows, Perl 5.8.8 and 5.10.0
- http://www.cpan.org/ports/
- http://www.perl.com/pub/language/info/software.html
+ http://www.strawberryperl.com
+
+IndigoPerl: Windows
-Someone looking for a perl for Win16 might look to Laszlo Molnar's
-djgpp port in http://www.cpan.org/ports/#msdos , which comes with
-clear installation instructions.
+ http://indigostar.com/
=head2 I don't have a C compiler. How can I build my own Perl interpreter?
first. Consult the Usenet FAQs for your operating system for
information on where to get such a binary version.
-You might look around the net for a pre-built binary of Perl (or a
+You might look around the net for a pre-built binary of Perl (or a
C compiler!) that meets your needs, though:
For Windows, Vanilla Perl ( http://vanillaperl.com/ ) and Strawberry Perl
-( http://strawberryperl.com/ ) come with a
+( http://strawberryperl.com/ ) come with a
bundled C compiler. ActivePerl is a pre-compiled version of Perl
ready-to-use.
-For Sun systems, SunFreeware.com provides binaries of most popular
+For Sun systems, SunFreeware.com provides binaries of most popular
applications, including compilers and Perl.
=head2 I copied the perl binary from one machine to another, but scripts don't work.
Several groups devoted to the Perl language are on Usenet:
- comp.lang.perl.announce Moderated announcement group
- comp.lang.perl.misc High traffic general Perl discussion
- comp.lang.perl.moderated Moderated discussion group
- comp.lang.perl.modules Use and development of Perl modules
- comp.lang.perl.tk Using Tk (and X) from Perl
-
- comp.infosystems.www.authoring.cgi Writing CGI scripts for the Web.
+ comp.lang.perl.announce Moderated announcement group
+ comp.lang.perl.misc High traffic general Perl discussion
+ comp.lang.perl.moderated Moderated discussion group
+ comp.lang.perl.modules Use and development of Perl modules
+ comp.lang.perl.tk Using Tk (and X) from Perl
Some years ago, comp.lang.perl was divided into those groups, and
comp.lang.perl itself officially removed. While that group may still
Perl 5 Pocket Reference
by Johan Vromans
- ISBN 0-596-00032-4 [3rd edition May 2000]
- http://www.oreilly.com/catalog/perlpr3/
+ ISBN 0-596-00374-9 [4th edition July 2002]
+ http://www.oreilly.com/catalog/perlpr4/
=item Tutorials
Learning Perl
by Randal L. Schwartz, Tom Phoenix, and brian d foy
- ISBN 0-596-10105-8 [4th edition July 2005]
- http://www.oreilly.com/catalog/learnperl4/
+ ISBN 0-596-52010-7 [5th edition June 2008]
+ http://oreilly.com/catalog/9780596520106/
Intermediate Perl (the "Alpaca Book")
by Randal L. Schwartz and brian d foy, with Tom Phoenix (foreword by Damian Conway)
The Google search engine now carries archived and searchable newsgroup
content.
-http://groups.google.com/groups?group=comp.lang.perl.misc
+http://groups.google.com/group/comp.lang.perl.misc/topics
If you have a question, you can be sure someone has already asked the
same question at some point on c.l.p.m. It requires some time and patience
=head2 Where do I send bug reports?
-If you are reporting a bug in the perl interpreter or the modules
-shipped with Perl, use the I<perlbug> program in the Perl distribution or
-mail your report to perlbug@perl.org or at http://rt.perl.org/perlbug/ .
+(contributed by brian d foy)
+
+First, ensure that you've found an actual bug. Second, ensure you've
+found an actual bug.
+
+If you've found a bug with the perl interpreter or one of the modules
+in the standard library (those that come with Perl), you can use the
+C<perlbug> utility that comes with Perl (>= 5.004). It collects
+information about your installation to include with your message, then
+sends the message to the right place.
-For Perl modules, you can submit bug reports to the Request Tracker set
-up at http://rt.cpan.org .
+To determine if a module came with your version of Perl, you can
+use the C<Module::CoreList> module. It has the information about
+the modules (with their versions) included with each release of Perl.
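A minimal sketch of that check, assuming a perl that ships C<Module::CoreList> (it comes with perl 5.10 and later). C<first_release> returns the first perl version that included a module, or C<undef> if the module has never shipped with perl. The two module names are only examples.

```perl
use strict;
use warnings;
use Module::CoreList;

# For each module, report whether it ever shipped with perl.
# first_release() returns the first perl version that included the
# module, or undef if it has never been part of the core.
for my $module (qw(Time::Local Business::ISBN)) {
    my $version = Module::CoreList->first_release($module);
    if (defined $version) {
        print "$module first shipped with perl $version: use perlbug\n";
    }
    else {
        print "$module is not a core module: use its CPAN bug tracker\n";
    }
}
```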
-If you are posting a bug with a non-standard port (see the answer to
-"What platforms is perl available for?"), a binary distribution, or a
-non-standard module (such as Tk, CGI, etc), then please see the
-documentation that came with it to determine the correct place to post
-bugs.
+Every CPAN module has a bug tracker set up in RT, http://rt.cpan.org .
+You can submit bugs to RT either through its web interface or by
+email. To email a bug report, send it to
+bug-E<lt>distribution-nameE<gt>@rt.cpan.org . For example, if you
+wanted to report a bug in C<Business::ISBN>, you could send a message to
+bug-Business-ISBN@rt.cpan.org .
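To make the naming convention concrete, here is a small sketch. The C<rt_bug_address> helper is hypothetical, and it assumes the distribution name is the module name with each C<::> turned into C<->, which is common on CPAN but not guaranteed, so check the distribution's real name before sending mail.

```perl
use strict;
use warnings;

# Hypothetical helper: build the RT bug-report address for the
# distribution named after a module. Assumes the usual CPAN naming,
# where "::" in the module name becomes "-" in the distribution name.
sub rt_bug_address {
    my ($module) = @_;
    (my $dist = $module) =~ s/::/-/g;
    return "bug-$dist\@rt.cpan.org";
}

print rt_bug_address('Business::ISBN'), "\n";   # bug-Business-ISBN@rt.cpan.org
```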
-Read the perlbug(1) man page (perl5.004 or later) for more information.
+Some modules might have special reporting requirements, such as a
+SourceForge or Google Code tracking system, so you should check the
+module documentation too.
=head2 What is perl.com? Perl Mongers? pm.org? perl.org? cpan.org?
=head1 REVISION
-Revision: $Revision: 10144 $
+Revision: $Revision$
-Date: $Date: 2007-10-31 13:50:01 +0100 (Wed, 31 Oct 2007) $
+Date: $Date$
See L<perlfaq> for source control details and availability.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and
+Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it
=head1 NAME
-perlfaq3 - Programming Tools ($Revision: 10127 $)
+perlfaq3 - Programming Tools
=head1 DESCRIPTION
Zoidberg is a similar project and provides a shell written in perl,
configured in perl and operated in perl. It is intended as a login shell
-and development environment. It can be found at http://zoidberg.sf.net/
+and development environment. It can be found at
+http://pardus-larus.student.utwente.nl/~pardus/projects/zoidberg/
or your local CPAN mirror.
The Shell.pm module (distributed with Perl) makes Perl try commands
=head2 How do I find which modules are installed on my system?
-You can use the ExtUtils::Installed module to show all installed
-distributions, although it can take awhile to do its magic. The
-standard library which comes with Perl just shows up as "Perl" (although
-you can get those with Module::CoreList).
+From the command line, you can use the C<cpan> command's C<-l> switch:
+
+ $ cpan -l
+
+You can also use C<cpan>'s C<-a> switch to create an autobundle file
+that C<CPAN.pm> understands and can use to re-install every module:
+
+ $ cpan -a
+
+Inside a Perl program, you can use the ExtUtils::Installed module to
+show all installed distributions, although it can take a while to do
+its magic. The standard library which comes with Perl just shows up
+as "Perl" (although you can get those with Module::CoreList).
use ExtUtils::Installed;
use File::Find::Rule;
- my @files = File::Find::Rule->file()->name( '*.pm' )->in( @INC );
+ my @files = File::Find::Rule->
+ extras({follow => 1})->
+ file()->
+ name( '*.pm' )->
+ in( @INC )
+ ;
If you do not have that module, you can do the same thing
with File::Find which is part of the standard library.
- use File::Find;
- my @files;
-
- find(
- sub {
- push @files, $File::Find::name
- if -f $File::Find::name && /\.pm$/
- },
-
- @INC
- );
+ use File::Find;
+ my @files;
+
+ find(
+ {
+ wanted => sub {
+ push @files, $File::Find::fullname
+ if -f $File::Find::fullname && /\.pm$/
+ },
+ follow => 1,
+ follow_skip => 2,
+ },
+ @INC
+ );
print join "\n", @files;
If you cannot read the documentation, the module might not
have any (in rare cases).
- prompt% perldoc Module::Name
+ $ perldoc Module::Name
You can also try to include the module in a one-liner to see if
perl finds it.
- perl -MModule::Name -e1
+ $ perl -MModule::Name -e1
=head2 How do I debug my Perl programs?
=head2 How do I profile my Perl programs?
-You should get the Devel::DProf module from the standard distribution
-(or separately on CPAN) and also use Benchmark.pm from the standard
-distribution. The Benchmark module lets you time specific portions of
-your code, while Devel::DProf gives detailed breakdowns of where your
-code spends its time.
+(contributed by brian d foy, updated Fri Jul 25 12:22:26 PDT 2008)
+
+The C<Devel> namespace has several modules which you can use to
+profile your Perl programs. The C<Devel::DProf> module comes with Perl
+and you can invoke it with the C<-d> switch:
+
+ perl -d:DProf program.pl
+
+After running your program under C<DProf>, you'll get a F<tmon.out> file
+with the profile data. To look at the data, you can turn it into a
+human-readable report with the C<dprofpp> program that comes with
+C<Devel::DProf>.
+
+ dprofpp
-Here's a sample use of Benchmark:
+You can also do the profiling and reporting in one step with the C<-p>
+switch to C<dprofpp>:
- use Benchmark;
+ dprofpp -p program.pl
- @junk = `cat /etc/motd`;
- $count = 10_000;
+The C<Devel::NYTProf> module (the New York Times Profiler) does both
+statement and subroutine profiling. It's available from CPAN and you
+also invoke it with the C<-d> switch:
- timethese($count, {
- 'map' => sub { my @a = @junk;
- map { s/a/b/ } @a;
- return @a },
- 'for' => sub { my @a = @junk;
- for (@a) { s/a/b/ };
- return @a },
- });
+ perl -d:NYTProf some_perl.pl
-This is what it prints (on one machine--your results will be dependent
-on your hardware, operating system, and the load on your machine):
+Like C<DProf>, it creates a database of the profile information that you
+can turn into reports. The C<nytprofhtml> command turns the data into
+an HTML report similar to the C<Devel::Cover> report:
- Benchmark: timing 10000 iterations of for, map...
- for: 4 secs ( 3.97 usr 0.01 sys = 3.98 cpu)
- map: 6 secs ( 4.97 usr 0.00 sys = 4.97 cpu)
+ nytprofhtml
-Be aware that a good benchmark is very hard to write. It only tests the
-data you give it and proves little about the differing complexities
-of contrasting algorithms.
+CPAN has several other profilers that you can invoke in the same
+fashion. You might also be interested in using the C<Benchmark> module
+to measure and compare code snippets.
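As a rough sketch of C<Benchmark>, this compares two ways of running a substitution over a copy of an array. The sample data is made up for illustration. Passing a negative count to C<cmpthese> runs each snippet for at least that many CPU seconds, then prints a table of relative rates. Keep in mind that a benchmark only measures the data you give it and says little about how the approaches scale.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data; benchmark with data that looks like your own.
my @junk = map { "line $_ has an a in it\n" } 1 .. 200;

# Run each snippet for at least 1 CPU second, then print a table
# comparing their rates. cmpthese also returns that table as rows.
my $table = cmpthese(-1, {
    'map' => sub {
        my @a = map { my $s = $_; $s =~ s/a/b/; $s } @junk;
        return @a;
    },
    'for' => sub {
        my @a = @junk;
        s/a/b/ for @a;
        return @a;
    },
});
```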
+
+You can read more about profiling in I<Programming Perl>, chapter 20,
+or I<Mastering Perl>, chapter 5.
+
+L<perldebguts> documents creating a custom debugger if you need to
+create a special sort of profiler. brian d foy describes the process
+in I<The Perl Journal>, "Creating a Perl Debugger",
+http://www.ddj.com/184404522 , and "Profiling in Perl"
+http://www.ddj.com/184404580 .
+
+Perl.com has two interesting articles on profiling: "Profiling Perl",
+by Simon Cozens, http://www.perl.com/lpt/a/850 and "Debugging and
+Profiling mod_perl Applications", by Frank Wiles,
+http://www.perl.com/pub/a/2006/02/09/debug_mod_perl.html .
+
+Randal L. Schwartz writes about profiling in "Speeding up Your Perl
+Programs" for I<Unix Review>,
+http://www.stonehenge.com/merlyn/UnixReview/col49.html , and "Profiling
+in Template Toolkit via Overriding" for I<Linux Magazine>,
+http://www.stonehenge.com/merlyn/LinuxMag/col75.html .
=head2 How do I cross-reference my Perl programs?
OptiPerl is a Windows IDE with simulated CGI environment, including
debugger and syntax highlighting editor.
+=item Padre
+
+http://padre.perlide.org/
+
+Padre is a cross-platform IDE for Perl, written in Perl and using
+wxWidgets to provide a native look and feel. It's open source under
+the Artistic License.
+
=item PerlBuilder
http://www.solutionsoft.com/perl.htm
-PerlBuidler is an integrated development environment for Windows that
+PerlBuilder is an integrated development environment for Windows that
supports Perl development.
=item visiPerl+
to use Perl as the scripting language. nvi is not alone in this,
though: at least also vim and vile offer an embedded Perl.
-The following are Win32 multilanguage editor/IDESs that support Perl:
+The following are Win32 multilanguage editor/IDEs that support Perl:
=over 4
http://www.slickedit.com/
+=item ConTEXT
+
+http://www.contexteditor.org/
+
=back
There is also a toyedit Text widget based editor written in Perl
=item Ksh
-from the MKS Toolkit ( http://www.mks.com/ ), or the Bourne shell of
+from the MKS Toolkit ( http://www.mkssoftware.com/ ), or the Bourne shell of
the U/WIN environment ( http://www.research.att.com/sw/tools/uwin/ )
=item Tcsh
=back
MKS and U/WIN are commercial (U/WIN is free for educational and
-research purposes), Cygwin is covered by the GNU Public License (but
-that shouldn't matter for Perl use). The Cygwin, MKS, and U/WIN all
-contain (in addition to the shells) a comprehensive set of standard
-UNIX toolkit utilities.
+research purposes), Cygwin is covered by the GNU General Public
+License (but that shouldn't matter for Perl use). The Cygwin, MKS,
+and U/WIN all contain (in addition to the shells) a comprehensive set
+of standard UNIX toolkit utilities.
If you're transferring text files between Unix and Windows using FTP
be sure to transfer them in ASCII mode so the ends of lines are
=back
-Pepper and Pe are programming language sensitive text editors for Mac
-OS X and BeOS respectively ( http://www.hekkelman.com/ ).
-
=head2 Where can I get Perl macros for vi?
For a complete version of Tom Christiansen's vi configuration file,
=item Wx
-This is a Perl binding for the cross-platform wxWidgets toolkit
-L<http://www.wxwidgets.org>. It works under Unix, Win32 and Mac OS X,
+This is a Perl binding for the cross-platform wxWidgets toolkit
+( http://www.wxwidgets.org ). It works under Unix, Win32 and Mac OS X,
using native widgets (Gtk under Unix). The interface follows the C++
interface closely, but the documentation is a little sparse for someone
who doesn't know the library, mostly just referring you to the C++
=item Gtk and Gtk2
-These are Perl bindings for the Gtk toolkit L<http://www.gtk.org>. The
+These are Perl bindings for the Gtk toolkit ( http://www.gtk.org ). The
interface changed significantly between versions 1 and 2 so they have
separate Perl modules. It runs under Unix, Win32 and Mac OS X (currently
it requires an X server on Mac OS, but a 'native' port is underway), and
=item CamelBones
-CamelBones L<http://camelbones.sourceforge.net> is a Perl interface to
+CamelBones ( http://camelbones.sourceforge.net ) is a Perl interface to
Mac OS X's Cocoa GUI toolkit, and as such can be used to produce native
GUIs on Mac OS X. It's not on CPAN, as it requires frameworks that
CPAN.pm doesn't know how to install, but installation is via the
cannot be reclaimed or reused even if they go out of scope. It is
reserved in case the variables come back into scope. Memory allocated
to global variables can be reused (within your program) by using
-undef()ing and/or delete().
+undef() and/or delete().
On most operating systems, memory allocated to a program can never be
returned to the system. That's why long-running programs sometimes re-
=head1 REVISION
-Revision: $Revision: 10127 $
+Revision: $Revision$
-Date: $Date: 2007-10-27 21:40:20 +0200 (Sat, 27 Oct 2007) $
+Date: $Date$
See L<perlfaq> for source control details and availability.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and
+Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it
=head1 NAME
-perlfaq4 - Data Manipulation ($Revision: 10394 $)
+perlfaq4 - Data Manipulation
=head1 DESCRIPTION
=head2 Why isn't my octal data interpreted correctly?
-Perl only understands octal and hex numbers as such when they occur as
-literals in your program. Octal literals in perl must start with a
-leading C<0> and hexadecimal literals must start with a leading C<0x>.
-If they are read in from somewhere and assigned, no automatic
-conversion takes place. You must explicitly use C<oct()> or C<hex()> if you
-want the values converted to decimal. C<oct()> interprets hexadecimal (C<0x350>),
-octal (C<0350> or even without the leading C<0>, like C<377>) and binary
-(C<0b1010>) numbers, while C<hex()> only converts hexadecimal ones, with
-or without a leading C<0x>, such as C<0x255>, C<3A>, C<ff>, or C<deadbeef>.
-The inverse mapping from decimal to octal can be done with either the
-<%o> or C<%O> C<sprintf()> formats.
+(contributed by brian d foy)
+
+You're probably trying to convert a string to a number, which Perl only
+converts as a decimal number. When Perl converts a string to a number, it
+ignores leading spaces and zeroes, then assumes the rest of the digits
+are in base 10:
+
+ my $string = '0644';
+
+ print $string + 0; # prints 644
+
+ print $string + 44; # prints 688, certainly not octal!
+
+This problem usually involves one of the Perl built-ins that has the
+same name as a Unix command that uses octal numbers as arguments on the
+command line. In this example, C<chmod> on the command line knows that
+its first argument is octal because that's what it does:
+
+ $ chmod 644 file
+
+If you want to use the same literal digits (644) in Perl, you have to tell
+Perl to treat them as octal numbers either by prefixing the digits with
+a C<0> or using C<oct>:
+
+ chmod( 0644, $file); # right, has leading zero
+ chmod( oct(644), $file ); # also correct
-This problem shows up most often when people try using C<chmod()>,
-C<mkdir()>, C<umask()>, or C<sysopen()>, which by widespread tradition
-typically take permissions in octal.
+The problem comes in when you take your numbers from something that Perl
+thinks is a string, such as a command line argument in C<@ARGV>:
- chmod(644, $file); # WRONG
- chmod(0644, $file); # right
+ chmod( $ARGV[0], $file); # wrong, even if "0644"
-Note the mistake in the first line was specifying the decimal literal
-C<644>, rather than the intended octal literal C<0644>. The problem can
-be seen with:
+ chmod( oct($ARGV[0]), $file ); # correct, treat string as octal
- printf("%#o",644); # prints 01204
+You can always check the value you're using by printing it in both
+octal and decimal notation to make sure it matches what you think it
+should be:
-Surely you had not intended C<chmod(01204, $file);> - did you? If you
-want to use numeric literals as arguments to chmod() et al. then please
-try to express them as octal constants, that is with a leading zero and
-with the following digits restricted to the set C<0..7>.
+ printf "0%o %d", $number, $number;
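Seeing both conversions side by side can help. This sketch (the variable names are only for illustration) turns the same string into a number once in numeric context and once with C<oct>:

```perl
use strict;
use warnings;

my $string = '0644';

# Numeric context always reads the string as decimal, leading zero or not.
my $as_decimal = $string + 0;    # 644

# oct() reads the string as octal (it also understands "0x" and "0b").
my $as_octal = oct($string);     # 420, the same value as the literal 0644

printf "%d %d %o\n", $as_decimal, $as_octal, $as_octal;   # 644 420 644
```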
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
=head2 How do I get a random number between X and Y?
To get a random number between two values, you can use the C<rand()>
-builtin to get a random number between 0 and 1. From there, you shift
+built-in to get a random number between 0 and 1. From there, you shift
that into the range that you want.
C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
That is, to get a number between 10 and 15, inclusive, you want a
random number between 0 and 5 that you can then add to 10.
- my $number = 10 + int rand( 15-10+1 );
+ my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
31
=head2 How do I find yesterday's date?
+X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
+X<daylight saving time> X<day> X<Today_and_Now> X<localtime>
+X<timelocal>
(contributed by brian d foy)
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.
+If you absolutely must do it yourself (or can't use one of the
+modules), here's a solution using C<Time::Local>, which comes with
+Perl:
+
+ # contributed by Gunnar Hjalmarsson
+ use Time::Local;
+ my $today = timelocal 0, 0, 12, ( localtime )[3..5];
+ my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
+ printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
+
+In this case, you measure the day starting at noon, and subtract 24
+hours. Even if the length of the calendar day is 23 or 25 hours,
+you'll still end up on the previous calendar day, although not at
+noon. Since you don't care about the time, the one hour difference
+doesn't matter and you end up with the previous date.
+
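Wrapped up as a function, the same noon-based approach can be checked against dates where the answer is known, such as a leap day and a year boundary. The C<day_before> name and its year/month/day arguments are a hypothetical interface for this sketch; it uses only C<Time::Local> and C<localtime>, as above.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical helper: return "YYYY-MM-DD" for the day before the given
# date. $year is a four-digit year and $month runs from 1 to 12.
sub day_before {
    my ($year, $month, $day) = @_;

    # Start at noon so that a 23- or 25-hour day (daylight saving
    # changes) still lands somewhere on the previous calendar day.
    my $noon = timelocal(0, 0, 12, $day, $month - 1, $year);
    my ($d, $m, $y) = (localtime($noon - 86_400))[3 .. 5];

    return sprintf "%d-%02d-%02d", $y + 1900, $m + 1, $d;
}

print day_before(2008, 3, 1), "\n";   # 2008-02-29 (a leap year)
```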
=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
break Y2K, people do." See http://www.perl.org/about/y2k.html for
a longer exposition.
-=head2 Does Perl have a Year 2038 problem?
-
-No, all of Perl's built in date and time functions and modules will
-work to about 2 billion years before and after 1970.
-
-Many systems cannot count time past the year 2038. Older versions of
-Perl were dependent on the system to do date calculation and thus
-shared their 2038 bug.
-
=head1 Data: Strings
=head2 How do I validate input?
$count = () = $string =~ /-\d+/g;
-=head2 How do I capitalize all the words on one line?
-
-To make the first letter of each word upper case:
-
- $line =~ s/\b(\w)/\U$1/g;
-
-This has the strange effect of turning "C<don't do it>" into "C<Don'T
-Do It>". Sometimes you might want this. Other times you might need a
-more thorough solution (Suggested by brian d foy):
-
- $string =~ s/ (
- (^\w) #at the beginning of the line
- | # or
- (\s\w) #preceded by whitespace
- )
- /\U$1/xg;
-
- $string =~ s/([\w']+)/\u\L$1/g;
-
-To make the whole line upper case:
-
- $line = uc($line);
+=head2 Does Perl have a Year 2038 problem?
-To force each word to be lower case, with the first letter upper case:
+No, all of Perl's built-in date and time functions and modules will
+work to about 2 billion years before and after 1970.
- $line =~ s/(\w+)/\u\L$1/g;
+Many systems cannot count time past the year 2038. Older versions of
+Perl were dependent on the system to do date calculation and thus
+shared their 2038 bug.
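A quick way to check your own perl is to ask for a time just past the 32-bit limit. This sketch assumes a modern perl whose time functions do not depend on a 32-bit system C<time_t> (the core Y2038 fixes arrived in perl 5.12):

```perl
use strict;
use warnings;

# 2**31 seconds after the epoch is one second past the largest value a
# signed 32-bit time_t can hold.
print scalar gmtime(2**31), "\n";   # Tue Jan 19 03:14:08 2038

# Dates far beyond that still work when perl does its own time math.
my $year = (gmtime(2**32))[5] + 1900;
print "$year\n";                    # 2106
```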
-You can (and probably should) enable locale awareness of those
-characters by placing a C<use locale> pragma in your program.
-See L<perllocale> for endless details on locales.
+=head2 How do I capitalize all the words on one line?
+X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>
-This is sometimes referred to as putting something into "title
-case", but that's not quite accurate. Consider the proper
-capitalization of the movie I<Dr. Strangelove or: How I Learned to
-Stop Worrying and Love the Bomb>, for example.
+(contributed by brian d foy)
-Damian Conway's L<Text::Autoformat> module provides some smart
-case transformations:
+Damian Conway's L<Text::Autoformat> handles all of the thinking
+for you.
use Text::Autoformat;
my $x = "Dr. Strangelove or: How I Learned to Stop ".
print autoformat($x, { case => $style }), "\n";
}
+How do you want to capitalize those words?
+
+ FRED AND BARNEY'S LODGE # all uppercase
+ Fred And Barney's Lodge # title case
+ Fred and Barney's Lodge # highlight case
+
+It's not as easy a problem as it looks. How many words do you think
+are in there? Wait for it... wait for it... If you answered 5,
+you're right. Perl words are groups of C<\w+>, but that's not what
+you want to capitalize. How is Perl supposed to know not to capitalize
+that C<s> after the apostrophe? You could try a regular expression:
+
+ $string =~ s/ (
+ (^\w) #at the beginning of the line
+ | # or
+ (\s\w) #preceded by whitespace
+ )
+ /\U$1/xg;
+
+ $string =~ s/([\w']+)/\u\L$1/g;
+
+Now, what if you don't want to capitalize that "and"? Just use
+L<Text::Autoformat> and get on with the next problem. :)
+
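The difference the apostrophe makes is easy to demonstrate. In this sketch, the first substitution treats every C<\w+> run as a word, while the second allows apostrophes inside a word, as in the answer's last substitution. Both still capitalize "And", which is the case the answer suggests leaving to L<Text::Autoformat>.

```perl
use strict;
use warnings;

my $name = "FRED AND BARNEY'S LODGE";

# Naive: each \w+ run is a "word", so the s after the apostrophe is
# treated as a word of its own and gets capitalized.
(my $naive = $name) =~ s/(\w+)/\u\L$1/g;

# Better: let apostrophes be part of a word.
(my $better = $name) =~ s/([\w']+)/\u\L$1/g;

print "$naive\n";    # Fred And Barney'S Lodge
print "$better\n";   # Fred And Barney's Lodge
```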
=head2 How can I split a [character] delimited string except when inside [character]?
Several modules can handle this sort of parsing--C<Text::Balanced>,
If you want to work with comma-separated values, don't do this since
that format is a bit more complicated. Use one of the modules that
-handle that fornat, such as C<Text::CSV>, C<Text::CSV_XS>, or
+handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
C<Text::CSV_PP>.
If you want to break apart an entire line of fixed columns, you can use
The C</e> will also silently ignore violations of strict, replacing
undefined variable names with the empty string. Since I'm using the
-C</e> flag (twice even!), I have all of the same security problems I
+C</e> flag (twice even!), I have all of the same security problems I
have with C<eval> in its string form. If there's something odd in
C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
I could get myself in trouble.
signal that I missed something:
my $string = 'This has $foo and $bar';
-
+
my %Replacements = (
foo => 'Fred',
);
=head2 How can I tell whether a certain element is contained in a list or array?
-(portions of this answer contributed by Anno Siegel)
+(portions of this answer contributed by Anno Siegel and brian d foy)
Hearing the word "in" is an I<in>dication that you probably should have
used a hash, not a list or array, to store your data. Hashes are
designed to answer this question quickly and efficiently. Arrays aren't.
-That being said, there are several ways to approach this. If you
+That being said, there are several ways to approach this. In Perl 5.10
+and later, you can use the smart match operator to check that an item is
+contained in an array or a hash:
+
+ use 5.010;
+
+ if( $item ~~ @array )
+ {
+ say "The array contains $item"
+ }
+
+ if( $item ~~ %hash )
+ {
+ say "The hash contains $item"
+ }
+
+With earlier versions of Perl, you have to do a bit more work. If you
are going to make this query many times over arbitrary string values,
the fastest way is probably to invert the original array and maintain a
-hash whose keys are the first array's values.
+hash whose keys are the first array's values:
@blues = qw/azure cerulean teal turquoise lapis-lazuli/;
%is_blue = ();
=head2 How do I test whether two arrays or hashes are equal?
+With Perl 5.10 and later, the smart match operator can give you the answer
+with the least amount of work:
+
+ use 5.010;
+
+ if( @array1 ~~ @array2 )
+ {
+ say "The arrays are the same";
+ }
+
+ if( %hash1 ~~ %hash2 ) # doesn't check values!
+ {
+ say "The hash keys are the same";
+ }
+
The following code works for single-level arrays. It uses a
stringwise comparison, and does not distinguish defined versus
undefined empty strings. Modify if you have other needs.
But again, Perl's built-in are virtually always good enough.
=head2 How do I handle circular lists?
+X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
+X<cycle> X<modulus>
-Circular lists could be handled in the traditional fashion with linked
-lists, or you could just do something like this with an array:
+(contributed by brian d foy)
+
+If you want to cycle through an array endlessly, you can increment the
+index modulo the number of elements in the array:
- unshift(@array, pop(@array)); # the last shall be first
- push(@array, shift(@array)); # and vice versa
+ my @array = qw( a b c );
+ my $i = 0;
+
+ while( 1 ) {
+ print $array[ $i++ % @array ], "\n";
+ last if $i > 20;
+ }
-You can also use C<Tie::Cycle>:
+You can also use C<Tie::Cycle> to use a scalar that always has the
+next element of the circular array:
use Tie::Cycle;
print $cycle; # 000000
print $cycle; # FFFF00
+The C<Array::Iterator::Circular> module creates an iterator object for
+circular arrays:
+
+ use Array::Iterator::Circular;
+
+ my $color_iterator = Array::Iterator::Circular->new(
+ qw(red green blue orange)
+ );
+
+ foreach ( 1 .. 20 ) {
+ print $color_iterator->next, "\n";
+ }
+
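+If you'd rather not install a module, a closure can keep track of the
+index for you by using the same modulus trick. This is only a sketch.
+C<make_cycle> is a made-up name and is not part of any module:
+
+ use strict;
+ use warnings;
+
+ # make_cycle is a hypothetical helper: it returns a closure that
+ # hands back the next element, starting over after the last one
+ sub make_cycle {
+     my @items = @_;
+     die "make_cycle needs at least one item\n" unless @items;
+     my $i = 0;
+     return sub { return $items[ $i++ % @items ] };
+ }
+
+ my $next_color = make_cycle( qw(red green blue) );
+ print $next_color->(), "\n" for 1 .. 5;   # red, green, blue, red, green
+
+The guard matters because the C<%> operator dies with an "Illegal
+modulus zero" error when the array is empty.
+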
=head2 How do I shuffle an array randomly?
If you either have Perl 5.8.0 or later installed, or if you have
sub fisher_yates_shuffle {
my $deck = shift; # $deck is a reference to an array
+ return unless @$deck; # must not be empty!
+
my $i = @$deck;
while (--$i) {
my $j = int rand ($i+1);
you can enumerate all the permutations of C<0..9> like this:
use Algorithm::Loops qw(NextPermuteNum);
-
+
my @list= 0..9;
do { print "@list\n" } while NextPermuteNum @list;
Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
operations.
-For example, this sets C<$vec> to have bit N set if C<$ints[N]> was
-set:
+For example, you don't have to store individual bits in an array
+(which would mean that you're wasting a lot of space). To convert an
+array of bits to a string, use C<vec()> to set the right bits. This
+sets C<$vec> to have bit N set only if C<$ints[N]> was set:
+ @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
$vec = '';
- foreach(@ints) { vec($vec,$_,1) = 1 }
+ foreach( 0 .. $#ints ) {
+ vec($vec,$_,1) = 1 if $ints[$_];
+ }
-Here's how, given a vector in C<$vec>, you can get those bits into your
-C<@ints> array:
+The string C<$vec> only takes up as many bits as it needs. For
+instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
+bytes to store them (not counting the scalar variable overhead).
+
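+You can check this yourself. The following sketch builds C<$vec> from
+sixteen made-up bits. It then calls C<unpack> with the C<b*> template,
+which reads the bits within each byte in the same ascending order
+that C<vec()> uses, to turn the string back into ones and zeros:
+
+ use strict;
+ use warnings;
+
+ # sixteen sample bits; bits 0, 3, 4, and 15 are set
+ my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0,  0, 0, 0, 0, 0, 0, 0, 1 );
+
+ my $vec = '';
+ foreach ( 0 .. $#ints ) {
+     vec( $vec, $_, 1 ) = 1 if $ints[$_];
+ }
+
+ print length($vec), "\n";         # 2
+ print unpack( 'b*', $vec ), "\n"; # 1001100000000001
+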
+Here's how, given a vector in C<$vec>, you can get those bits into
+your C<@ints> array:
sub bitvec_to_list {
my $vec = shift;
}
Once you have the list of keys, you can process that list before you
-process the hashh elements. For instance, you can sort the keys so you
+process the hash elements. For instance, you can sort the keys so you
can process them in lexical order:
foreach my $key ( sort keys %hash ) {
}
If the hash is very large, you might not want to create a long list of
-keys. To save some memory, you can grab on key-value pair at a time using
+keys. To save some memory, you can grab one key-value pair at a time using
C<each()>, which returns a pair you haven't seen yet:
while( my( $key, $value ) = each( %hash ) ) {
the iterator and mess up your processing. See the C<each> entry in
L<perlfunc> for more details.
+=head2 How do I merge two hashes?
+X<hash> X<merge> X<slice, hash>
+
+(contributed by brian d foy)
+
+Before you decide to merge two hashes, you have to decide what to do
+if both hashes contain keys that are the same and if you want to leave
+the original hashes as they were.
+
+If you want to preserve the original hashes, copy one hash (C<%hash1>)
+to a new hash (C<%new_hash>), then add the keys from the other hash
+(C<%hash2>) to the new hash. Checking that the key already exists in
+C<%new_hash> gives you a chance to decide what to do with the
+duplicates:
+
+ my %new_hash = %hash1; # make a copy; leave %hash1 alone
+
+ foreach my $key2 ( keys %hash2 )
+ {
+ if( exists $new_hash{$key2} )
+ {
+ warn "Key [$key2] is in both hashes!";
+ # handle the duplicate here (perhaps only warning)
+ next;
+ }
+ else
+ {
+ $new_hash{$key2} = $hash2{$key2};
+ }
+ }
+
+If you don't want to create a new hash, you can still use this looping
+technique; just change C<%new_hash> to C<%hash1>:
+
+ foreach my $key2 ( keys %hash2 )
+ {
+ if( exists $hash1{$key2} )
+ {
+ warn "Key [$key2] is in both hashes!";
+ # handle the duplicate here (perhaps only warning)
+ next;
+ }
+ else
+ {
+ $hash1{$key2} = $hash2{$key2};
+ }
+ }
+
+If you don't care that one hash overwrites keys and values from the other, you
+could just use a hash slice to add one hash to another. In this case, values
+from C<%hash2> replace values from C<%hash1> when they have keys in common:
+
+ @hash1{ keys %hash2 } = values %hash2;
+
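+If you want the same "last one wins" behavior without touching either
+original hash, you can build a new hash from a list of both. Later
+key-value pairs in the list replace earlier ones, so the values from
+C<%hash2> win. The sample data here is made up:
+
+ use strict;
+ use warnings;
+
+ my %hash1 = ( a => 1, b => 2 );
+ my %hash2 = ( b => 20, c => 30 );
+
+ # %hash2 comes second, so its value for 'b' replaces the one from %hash1
+ my %merged = ( %hash1, %hash2 );
+
+ print join( ', ', map { "$_=$merged{$_}" } sort keys %merged ), "\n";
+ # a=1, b=20, c=30
+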
=head2 What happens if I add or remove keys from a hash while iterating over it?
(contributed by brian d foy)
=head2 How can I know how many entries are in a hash?
-If you mean how many keys, then all you have to do is
-use the keys() function in a scalar context:
+(contributed by brian d foy)
+
+This is very similar to "How do I process an entire hash?", also in
+L<perlfaq4>, but a bit simpler in the common cases.
+
+You can use the C<keys()> built-in function in scalar context to find out
+how many entries you have in a hash:
- $num_keys = keys %hash;
+ my $key_count = keys %hash; # must be scalar context!
+
+If you want to find out how many entries have a defined value, that's
+a bit different. You have to check each value. A C<grep> is handy:
+
+ my $defined_value_count = grep { defined } values %hash;
-The keys() function also resets the iterator, which means that you may
+You can use that same structure to count the entries any way that
+you like. If you want the count of the keys with vowels in them,
+you just test for that instead:
+
+ my $vowel_count = grep { /[aeiou]/ } keys %hash;
+
+The C<grep> in scalar context returns the count. If you want the list
+of matching items, just use it in list context instead:
+
+ my @defined_values = grep { defined } values %hash;
+
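+Here are those counts side by side for a small, made-up hash, so you
+can see how they differ:
+
+ use strict;
+ use warnings;
+
+ my %hash = ( apple => 1, kiwi => undef, xyz => 3 );
+
+ my $key_count           = keys %hash;                      # 3
+ my $defined_value_count = grep { defined } values %hash;   # 2
+ my $vowel_count         = grep { /[aeiou]/ } keys %hash;   # 2 (apple, kiwi)
+
+ print "$key_count $defined_value_count $vowel_count\n";    # 3 2 2
+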
+The C<keys()> function also resets the iterator, which means that you may
see strange results if you use this between uses of other hash operators
-such as each().
+such as C<each()>.
=head2 How do I sort a hash (optionally by value instead of key)?
foreach my $key ( @keys )
{
- printf "%-20s %6d\n", $key, $hash{$value};
+ printf "%-20s %6d\n", $key, $hash{$key};
}
We could get more fancy in the C<sort()> block though. Instead of
=head2 Why does passing a subroutine an undefined element in a hash create it?
-If you say something like:
+(contributed by brian d foy)
+
+Are you using a really old version of Perl?
+
+Normally, accessing a hash key's value for a nonexistent key will
+I<not> create the key.
+
+ my %hash = ();
+ my $value = $hash{ 'foo' };
+ print "This won't print\n" if exists $hash{ 'foo' };
+
+Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
+Since you could assign directly to C<$_[0]>, Perl had to be ready to
+make that assignment so it created the hash key ahead of time:
+
+ my_sub( $hash{ 'foo' } );
+ print "This will print before 5.004\n" if exists $hash{ 'foo' };
- somefunc($hash{"nonesuch key here"});
+ sub my_sub {
+ # $_[0] = 'bar'; # create hash key in case you do this
+ 1;
+ }
+
+Since Perl 5.004, however, this situation is a special case and Perl
+creates the hash key only when you make the assignment:
-Then that element "autovivifies"; that is, it springs into existence
-whether you store something there or not. That's because functions
-get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
-it has to be ready to write it back into the caller's version.
+ my_sub( $hash{ 'foo' } );
+ print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
+
+ sub my_sub {
+ $_[0] = 'bar';
+ }
-This has been fixed as of Perl5.004.
+However, if you want the old behavior (and think carefully about that
+because it's a weird side effect), you can pass a hash slice instead.
+Perl 5.004 didn't make this a special case:
-Normally, merely accessing a key's value for a nonexistent key does
-I<not> cause that key to be forever there. This is different than
-awk's behavior.
+ my_sub( @hash{ qw/foo/ } );
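+
+With any modern Perl, one short program shows both cases. The
+subroutine names C<look> and C<store> are made up for this example.
+Only the one that assigns to C<$_[0]> creates the key:
+
+ use strict;
+ use warnings;
+
+ my %hash;
+
+ sub look  { my $value = $_[0]; return }   # only reads its argument
+ sub store { $_[0] = 'bar'; return }       # assigns through the alias
+
+ look( $hash{foo} );
+ print exists $hash{foo} ? "foo exists\n" : "no foo yet\n";   # no foo yet
+
+ store( $hash{foo} );
+ print exists $hash{foo} ? "foo exists\n" : "no foo yet\n";   # foo exists
+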
=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
=head2 How can I use a reference as a hash key?
-(contributed by brian d foy)
+(contributed by brian d foy and Ben Morrow)
Hash keys are strings, so you can't really use a reference as the key.
When you try to do that, perl turns the reference into its stringified
form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
back the reference from the stringified form, at least without doing
-some extra work on your own. Also remember that hash keys must be
-unique, but two different variables can store the same reference (and
-those variables can change later).
-
-The C<Tie::RefHash> module, which is distributed with perl, might be
-what you want. It handles that extra work.
+some extra work on your own.
+
+Remember that the entry in the hash will still be there even if
+the referenced variable goes out of scope, and that it is entirely
+possible for Perl to subsequently allocate a different variable at
+the same address. This will mean a new variable might accidentally
+be associated with the value for an old one.
+
+If you have Perl 5.10 or later, and you just want to store a value
+against the reference for lookup later, you can use the core
+C<Hash::Util::FieldHash> module. This will also handle renaming the
+keys if you use multiple threads (which causes all variables to be
+reallocated at new addresses, changing their stringification), and
+garbage-collecting the entries when the referenced variable goes out
+of scope.
+
+If you actually need to be able to get a real reference back from
+each hash entry, you can use the C<Tie::RefHash> module, which does the
+required work for you.
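+
+Here's a short sketch of C<Tie::RefHash> in use, with made-up data.
+Once the hash is tied, C<keys> gives back the real references instead
+of strings like C<ARRAY(0x...)>, so you can dereference them:
+
+ use strict;
+ use warnings;
+ use Tie::RefHash;
+
+ tie my %label, 'Tie::RefHash';
+
+ my $numbers = [ 1, 2, 3 ];
+ $label{$numbers} = 'some numbers';
+
+ foreach my $key ( keys %label ) {
+     # $key is still an array reference, so it can be dereferenced
+     print ref($key), " [@$key] => $label{$key}\n";   # ARRAY [1 2 3] => some numbers
+ }
+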
=head1 Data: Misc
=head2 How do I define methods for every class/object?
-Use the C<UNIVERSAL> class (see L<UNIVERSAL>).
+(contributed by Ben Morrow)
+
+You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
+be very careful to consider the consequences of doing this: adding
+methods to every object is very likely to have unintended
+consequences. If possible, it would be better to have all your objects
+inherit from some common base class, or to use an object system like
+C<Moose> that supports roles.
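+
+As a sketch of the base-class approach, put the shared method in one
+class and have the other classes inherit from it. The package names
+here are hypothetical:
+
+ use strict;
+ use warnings;
+
+ package My::Base;
+ sub describe { my $self = shift; return ref($self) . ' object' }
+
+ package My::Dog;
+ our @ISA = ('My::Base');   # inherit describe() from My::Base
+ sub new { my $class = shift; return bless {}, $class }
+
+ package main;
+
+ print My::Dog->new->describe, "\n";   # My::Dog object
+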
=head2 How do I verify a credit card checksum?
=head2 How do I pack arrays of doubles or floats for XS code?
-The kgbpack.c code in the C<PGPLOT> module on CPAN does just this.
+The arrays.h/arrays.c code in the C<PGPLOT> module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
the C<PDL> module from CPAN instead--it makes number-crunching easy.
+See L<http://search.cpan.org/dist/PGPLOT> for the code.
+
=head1 REVISION
-Revision: $Revision: 10394 $
+Revision: $Revision$
-Date: $Date: 2007-12-09 18:47:15 +0100 (Sun, 09 Dec 2007) $
+Date: $Date$
See L<perlfaq> for source control details and availability.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and
+Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it
=head1 NAME
-perlfaq5 - Files and Formats ($Revision: 10126 $)
+perlfaq5 - Files and Formats
=head1 DESCRIPTION
=head2 How do I flush/unbuffer an output filehandle? Why must I do this?
X<flush> X<buffer> X<unbuffer> X<autoflush>
-Perl does not support truly unbuffered output (except insofar as you
-can C<syswrite(OUT, $char, 1)>), although it does support is "command
-buffering", in which a physical write is performed after every output
-command.
-
-The C standard I/O library (stdio) normally buffers characters sent to
-devices so that there isn't a system call for each byte. In most stdio
-implementations, the type of output buffering and the size of the
-buffer varies according to the type of device. Perl's C<print()> and
-C<write()> functions normally buffer output, while C<syswrite()>
-bypasses buffering all together.
-
-If you want your output to be sent immediately when you execute
-C<print()> or C<write()> (for instance, for some network protocols),
-you must set the handle's autoflush flag. This flag is the Perl
-variable C<$|> and when it is set to a true value, Perl will flush the
-handle's buffer after each C<print()> or C<write()>. Setting C<$|>
-affects buffering only for the currently selected default filehandle.
-You choose this handle with the one argument C<select()> call (see
-L<perlvar/$E<verbar>> and L<perlfunc/select>).
-
-Use C<select()> to choose the desired handle, then set its
-per-filehandle variables.
-
- $old_fh = select(OUTPUT_HANDLE);
- $| = 1;
- select($old_fh);
+(contributed by brian d foy)
-Some modules offer object-oriented access to handles and their
-variables, although they may be overkill if this is the only thing you
-do with them. You can use C<IO::Handle>:
+You might like to read Mark Jason Dominus's "Suffering From Buffering"
+at http://perl.plover.com/FAQs/Buffering.html .
- use IO::Handle;
- open my( $printer ), ">", "/dev/printer"); # but is this?
- $printer->autoflush(1);
+Perl normally buffers output so it doesn't make a system call for every
+bit of output. By saving up output, it makes fewer expensive system calls.
+For instance, in this little bit of code, you want to print a dot to the
+screen for every line you process to watch the progress of your program.
+Instead of seeing a dot for every line, Perl buffers the output and you
+have a long wait before you see a row of 50 dots all at once:
+
+ # long wait, then row of dots all at once
+ while( <> ) {
+ print ".";
+ print "\n" unless ++$count % 50;
+
+ #... expensive line processing operations
+ }
+
+To get around this, you have to unbuffer the output filehandle, in this
+case, C<STDOUT>. You can set the special variable C<$|> to a true value
+(mnemonic: making your filehandles "piping hot"):
+
+ $|++;
+
+ # dot shown immediately
+ while( <> ) {
+ print ".";
+ print "\n" unless ++$count % 50;
+
+ #... expensive line processing operations
+ }
+
+The C<$|> is one of the per-filehandle special variables, so each
+filehandle has its own copy of its value. If you want to merge
+standard output and standard error for instance, you have to unbuffer
+each (although STDERR might be unbuffered by default):
+
+ {
+ my $previous_default = select(STDOUT); # save previous default
+ $|++; # autoflush STDOUT
+ select(STDERR);
+ $|++; # autoflush STDERR, to be sure
+ select($previous_default); # restore previous default
+ }
-or C<IO::Socket> (which inherits from C<IO::Handle>):
+ # now should alternate . and +
+ while( 1 )
+ {
+ sleep 1;
+ print STDOUT ".";
+ print STDERR "+";
+ print STDOUT "\n" unless ++$count % 25;
+ }
+
+Besides the C<$|> special variable, you can use C<binmode> to give
+your filehandle a C<:unix> layer, which is unbuffered:
+
+ binmode( STDOUT, ":unix" );
- use IO::Socket; # this one is kinda a pipe?
- my $sock = IO::Socket::INET->new( 'www.example.com:80' );
+ while( 1 ) {
+ sleep 1;
+ print ".";
+ print "\n" unless ++$count % 50;
+ }
- $sock->autoflush();
+For more information on output layers, see the entries for C<binmode>
+and C<open> in L<perlfunc>, and the C<PerlIO> module documentation.
-You can also flush an C<IO::Handle> object without setting
-C<autoflush>. Call the C<flush> method to flush the buffer yourself:
+If you are using C<IO::Handle> or one of its subclasses, you can
+call the C<autoflush> method to change the settings of the
+filehandle:
use IO::Handle;
- open my( $printer ), ">", "/dev/printer");
- $printer->flush; # one time flush
+ open my $io_fh, '>', 'output.txt' or die "Can't open output.txt: $!";
+ $io_fh->autoflush(1);
+
+The C<IO::Handle> objects also have a C<flush> method. You can flush
+the buffer any time you want without turning on autoflush:
+
+ $io_fh->flush;
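+
+To watch the buffer at work, you can write to a temporary file and
+check its size on disk. In this sketch, the file comes from
+C<File::Temp>, and the sizes assume nothing else writes to it. The
+first C<print> stays in Perl's buffer until C<flush> runs. After
+C<autoflush(1)>, each C<print> reaches the file right away:
+
+ use strict;
+ use warnings;
+ use IO::Handle;                    # autoflush() and flush() methods
+ use File::Temp ();
+
+ my $fh       = File::Temp->new;    # temporary file, removed at exit
+ my $filename = $fh->filename;
+
+ print $fh "hello\n";               # held in Perl's output buffer
+ my $before = -s $filename || 0;    # usually 0: nothing written yet
+
+ $fh->flush;                        # write out the buffer now
+ my $after = -s $filename || 0;     # 6
+
+ $fh->autoflush(1);                 # flush after every print from now on
+ print $fh "world\n";
+ my $auto = -s $filename || 0;      # 12, with no explicit flush
+
+ print "before=$before after=$after auto=$auto\n";
+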
=head2 How do I change, delete, or insert a line in a file, or append to the beginning of a file?
X<file, editing>
open my $in, '<', $file or die "Can't read old file: $!";
open my $out, '>', "$file.new" or die "Can't write new file: $!";
- print "# Add this line to the top\n"; # <--- HERE'S THE MAGIC
+ print $out "# Add this line to the top\n"; # <--- HERE'S THE MAGIC
while( <$in> )
{
open my $in, '<', $file or die "Can't read old file: $!";
open my $out, '>', "$file.new" or die "Can't write new file: $!";
- print "# Add this line to the top\n";
+ print $out "# Add this line to the top\n";
while( <$in> )
{
{
print $out $_;
}
-
+
To skip lines, use the looping controls. The C<next> in this example
skips comment lines, and the C<last> stops all processing once it
encounters either C<__END__> or C<__DATA__>.
C<.c.orig> file.
=head2 How can I copy a file?
-X<copy> X<file, copy>
+X<copy> X<file, copy> X<File::Copy>
(contributed by brian d foy)
-Use the File::Copy module. It comes with Perl and can do a
+Use the C<File::Copy> module. It comes with Perl and can do a
true copy across file systems, and it does its magic in
a portable fashion.
copy( $original, $new_copy ) or die "Copy failed: $!";
-If you can't use File::Copy, you'll have to do the work yourself:
+If you can't use C<File::Copy>, you'll have to do the work yourself:
open the original file, open the destination file, then print
-to the destination file as you read the original.
+to the destination file as you read the original. You also have to
+remember to copy the permissions, owner, and group to the new file.
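+
+Here's a minimal sketch of doing it by hand. The C<copy_by_hand> name
+is made up, and the function copies only the contents and the
+permission bits. Changing the owner and group with C<chown> usually
+requires superuser privileges, so this sketch leaves that step out:
+
+ use strict;
+ use warnings;
+
+ # copy_by_hand is a hypothetical helper, not a replacement for File::Copy
+ sub copy_by_hand {
+     my( $from, $to ) = @_;
+
+     open my $in,  '<:raw', $from or die "Can't read $from: $!";
+     open my $out, '>:raw', $to   or die "Can't write $to: $!";
+
+     # copy in fixed-size chunks so large files don't fill memory
+     while( read( $in, my $buffer, 64 * 1024 ) ) {
+         print $out $buffer or die "Can't write $to: $!";
+     }
+     close $out or die "Can't close $to: $!";
+
+     # copy the permission bits (stat field 2 is the mode)
+     my $mode = ( stat $from )[2] & 07777;
+     chmod $mode, $to or die "Can't chmod $to: $!";
+     return 1;
+ }
+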
=head2 How do I make a temporary file name?
X<file, temporary>
If you don't need to know the name of the file, you can use C<open()>
-with C<undef> in place of the file name. The C<open()> function
-creates an anonymous temporary file.
+with C<undef> in place of the file name. In Perl 5.8 or later, the
+C<open()> function creates an anonymous temporary file:
open my $tmp, '+>', undef or die $!;
return ();
}
}
-
+
}
=head2 How can I manipulate fixed-record-length files?
See L<perlform/"Accessing Formatting Internals"> for an C<swrite()> function.
=head2 How can I open a filehandle to a string?
-X<string>, X<open>, X<IO::Scalar>, X<filehandle>
+X<string> X<open> X<IO::String> X<filehandle>
(contributed by Peter J. Holzer, hjp-usenet2@hjp.at)
-Since Perl 5.8.0, you can pass a reference to a scalar instead of the
-filename to create a file handle which you can used to read from or write to
-a string:
+Since Perl 5.8.0 a file handle referring to a string can be created by
+calling open with a reference to that string instead of the filename.
+This file handle can then be used to read from or write to the string:
open(my $fh, '>', \$string) or die "Could not open string for writing";
print $fh "foo\n";
=head2 How can I translate tildes (~) in a filename?
X<tilde> X<tilde expansion>
-Use the <> (glob()) operator, documented in L<perlfunc>. Older
-versions of Perl require that you have a shell installed that groks
-tildes. Recent perl versions have this feature built in. The
-File::KGlob module (available from CPAN) gives more portable glob
-functionality.
+Use the E<lt>E<gt> (C<glob()>) operator, documented in L<perlfunc>.
+Versions of Perl older than 5.6 require that you have a shell
+installed that groks tildes. Later versions of Perl have this feature
+built in. The C<File::KGlob> module (available from CPAN) gives more
+portable glob functionality.
Within Perl, you may use this directly:
=head2 All I want to do is append a small amount of text to the end of a file. Do I still have to use locking?
X<append> X<file, append>
-If you are on a system that correctly implements flock() and you use the
-example appending code from "perldoc -f flock" everything will be OK
-even if the OS you are on doesn't implement append mode correctly (if
-such a system exists.) So if you are happy to restrict yourself to OSs
-that implement flock() (and that's not really much of a restriction)
-then that is what you should do.
+If you are on a system that correctly implements C<flock> and you use
+the example appending code from "perldoc -f flock" everything will be
+OK even if the OS you are on doesn't implement append mode correctly
+(if such a system exists). So if you are happy to restrict yourself to
+OSs that implement C<flock> (and that's not really much of a
+restriction) then that is what you should do.
If you know you are only going to use a system that does correctly
-implement appending (i.e. not Win32) then you can omit the seek() from
-the code in the previous answer.
-
-If you know you are only writing code to run on an OS and filesystem that
-does implement append mode correctly (a local filesystem on a modern
-Unix for example), and you keep the file in block-buffered mode and you
-write less than one buffer-full of output between each manual flushing
-of the buffer then each bufferload is almost guaranteed to be written to
-the end of the file in one chunk without getting intermingled with
-anyone else's output. You can also use the syswrite() function which is
-simply a wrapper around your systems write(2) system call.
+implement appending (i.e. not Win32) then you can omit the C<seek>
+from the code in the previous answer.
+
+If you know you are only writing code to run on an OS and filesystem
+that does implement append mode correctly (a local filesystem on a
+modern Unix for example), and you keep the file in block-buffered mode
+and you write less than one buffer-full of output between each manual
+flushing of the buffer then each bufferload is almost guaranteed to be
+written to the end of the file in one chunk without getting
+intermingled with anyone else's output. You can also use the
+C<syswrite> function which is simply a wrapper around your system's
+C<write(2)> system call.
There is still a small theoretical chance that a signal will interrupt
-the system level write() operation before completion. There is also a
-possibility that some STDIO implementations may call multiple system
-level write()s even if the buffer was empty to start. There may be some
-systems where this probability is reduced to zero.
+the system level C<write()> operation before completion. There is also
+a possibility that some STDIO implementations may call multiple system
+level C<write()>s even if the buffer was empty to start. There may be
+some systems where this probability is reduced to zero, and this is
+not a concern when using C<:perlio> instead of your system's STDIO.
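+
+Here's a sketch of that locking approach, modeled on the appending
+example in L<perlfunc/flock>. The C<append_line> name is an assumption
+for illustration:
+
+ use strict;
+ use warnings;
+ use Fcntl qw(:flock SEEK_END);
+
+ # append_line is a hypothetical helper
+ sub append_line {
+     my( $file, $text ) = @_;
+
+     open my $fh, '>>', $file or die "Can't open $file: $!";
+     flock( $fh, LOCK_EX )    or die "Can't lock $file: $!";
+
+     # someone may have appended while we waited for the lock
+     seek( $fh, 0, SEEK_END ) or die "Can't seek $file: $!";
+
+     print $fh $text;
+     close $fh or die "Can't close $file: $!";   # flushes, then unlocks
+ }
+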
=head2 How do I randomly update a binary file?
X<file, binary patch>
use File::Slurp;
$all_of_it = read_file($filename); # entire file in scalar
- @all_lines = read_file($filename); # one line perl element
+ @all_lines = read_file($filename); # one line per element
The customary Perl approach for processing all the lines in a file is to
do so one line at a time:
C<close()> function from the C<POSIX> module:
use POSIX ();
-
+
POSIX::close( $fd );
-
+
This should rarely be necessary, as the Perl C<close()> function is to be
used for things that Perl opened itself, even if it was a dup of a
numeric descriptor as with C<MHCONTEXT> above. But if you really have
=head2 How do I select a random line from a file?
X<file, selecting a random line>
-Here's an algorithm from the Camel Book:
+Short of loading the file into a database or pre-indexing the lines in
+the file, there are a couple of things that you can do.
+
+Here's a reservoir-sampling algorithm from the Camel Book:
srand;
rand($.) < 1 && ($line = $_) while <>;
in. You can find a proof of this method in I<The Art of Computer
Programming>, Volume 2, Section 3.4.2, by Donald E. Knuth.
-You can use the File::Random module which provides a function
+You can use the C<File::Random> module which provides a function
for that algorithm:
use File::Random qw/random_line/;
my $line = random_line($filename);
-Another way is to use the Tie::File module, which treats the entire
+Another way is to use the C<Tie::File> module, which treats the entire
file as an array. Simply access a random array element.
=head2 Why do I get weird spaces when I print an array of lines?
-Saying
+(contributed by brian d foy)
+
+If you are seeing spaces between the elements of your array when
+you print the array, you are probably interpolating the array in
+double quotes:
+
+ my @animals = qw(camel llama alpaca vicuna);
+ print "animals are: @animals\n";
- print "@lines\n";
+It's the double quotes, not the C<print>, doing this. Whenever you
+interpolate an array in a double quote context, Perl joins the
+elements with spaces (or whatever is in C<$">, which is a space by
+default):
-joins together the elements of C<@lines> with a space between them.
-If C<@lines> were C<("little", "fluffy", "clouds")> then the above
-statement would print
+ animals are: camel llama alpaca vicuna
- little fluffy clouds
+This is different than printing the array without the interpolation:
-but if each element of C<@lines> was a line of text, ending a newline
-character C<("little\n", "fluffy\n", "clouds\n")> then it would print:
+ my @animals = qw(camel llama alpaca vicuna);
+ print "animals are: ", @animals, "\n";
- little
- fluffy
- clouds
+Now the output doesn't have the spaces between the elements because
+the elements of C<@animals> simply become part of the list to
+C<print>:
-If your array contains lines, just print them:
+ animals are: camelllamaalpacavicuna
+
+You might notice this when each of the elements of your array ends with
+a newline. You expect to print one element per line, but notice that
+every line after the first is indented:
+
+ this is a line
+ this is another line
+ this is the third line
+
+That extra space comes from the interpolation of the array. If you
+don't want to put anything between your array elements, don't use the
+array in double quotes. You can send it to print without them:
print @lines;
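+
+If you do want something between the elements, you can set C<$">
+yourself. Using C<local> inside a block keeps the change from leaking
+into the rest of your program:
+
+ use strict;
+ use warnings;
+
+ my @animals = qw(camel llama alpaca vicuna);
+
+ {
+     local $" = ', ';   # separator used when interpolating arrays
+     print "animals are: @animals\n";   # animals are: camel, llama, alpaca, vicuna
+ }
+
+ print "animals are: @animals\n";       # back to single spaces
+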
+=head2 How do I traverse a directory tree?
+
+(contributed by brian d foy)
+
+The C<File::Find> module, which comes with Perl, does all of the hard
+work to traverse a directory structure. You simply call the C<find>
+subroutine with a callback subroutine and the directories you want to
+traverse:
+
+ use File::Find;
+
+ find( \&wanted, @directories );
+
+ sub wanted {
+ # full path in $File::Find::name
+ # just filename in $_
+ # ... do whatever you want to do ...
+ }
+
+The C<File::Find::Closures> module, which you can download from CPAN, provides
+many ready-to-use subroutines that you can use with C<File::Find>.
+
+The C<File::Finder> module, which you can download from CPAN, can help you
+create the callback subroutine using something closer to the syntax of
+the C<find> command-line utility:
+
+ use File::Find;
+ use File::Finder;
+
+ my $deep_dirs = File::Finder->depth->type('d')->ls->exec('rmdir','{}');
+
+ find( $deep_dirs->as_options, @places );
+
+The C<File::Find::Rule> module, which you can download from CPAN, has
+a similar interface, but does the traversal for you too:
+
+ use File::Find::Rule;
+
+ my @files = File::Find::Rule->file()
+ ->name( '*.pm' )
+ ->in( @INC );
+
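+As a concrete version of the C<wanted> callback, this sketch collects
+the plain files whose names end in a given suffix. The
+C<files_ending_in> name is hypothetical:
+
+ use strict;
+ use warnings;
+ use File::Find;
+
+ # files_ending_in is a hypothetical helper that returns the full
+ # paths of the plain files under @dirs whose names end in $suffix
+ sub files_ending_in {
+     my( $suffix, @dirs ) = @_;
+     my @found;
+
+     find( sub {
+         # $_ is just the file name; $File::Find::name is the full path
+         push @found, $File::Find::name if -f $_ && /\Q$suffix\E\z/;
+     }, @dirs );
+
+     return @found;
+ }
+
+ print "$_\n" for files_ending_in( '.txt', '.' );
+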
+=head2 How do I delete a directory tree?
+
+(contributed by brian d foy)
+
+If you have an empty directory, you can use Perl's built-in C<rmdir>. If
+the directory is not empty (so, it still has files or subdirectories), you either
+have to empty it yourself (a lot of work) or use a module to help you.
+
+The C<File::Path> module, which comes with Perl, has a C<rmtree> function that
+can take care of all of the hard work for you:
+
+ use File::Path qw(rmtree);
+
+ rmtree( \@directories, 0, 0 );
+
+The first argument to C<rmtree> is either a string representing a directory path
+or an array reference. The second argument controls progress messages, and the
+third argument controls the handling of files you don't have permissions to
+delete. See the C<File::Path> module for the details.
+
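+Here's a sketch that builds a small, throwaway tree with C<File::Temp>
+and C<mkpath> and then removes it with C<rmtree>. The directory names
+are made up. Don't point C<rmtree> at anything you want to keep,
+because it doesn't ask for confirmation:
+
+ use strict;
+ use warnings;
+ use File::Path qw(mkpath rmtree);
+ use File::Temp qw(tempdir);
+
+ my $base = tempdir( CLEANUP => 1 );
+ my $tree = "$base/scratch";
+
+ mkpath( "$tree/a/b/c" );                  # nested directories
+ open my $fh, '>', "$tree/a/b/file.txt" or die "Can't create file: $!";
+ close $fh;
+
+ rmtree( [ $tree ], 0, 0 );                # quiet; don't skip read-only files
+ print -d $tree ? "still there\n" : "gone\n";   # gone
+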
+=head2 How do I copy an entire directory?
+
+(contributed by Shlomi Fish)
+
+To do the equivalent of C<cp -R> (i.e. copy an entire directory tree
+recursively) in portable Perl, you'll either need to write something yourself
+or find a good CPAN module such as L<File::Copy::Recursive>.
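+
+If you can't install a module, here's a rough sketch of the
+write-it-yourself route using only modules that come with Perl. The
+C<copy_tree> name is made up. Unlike a module such as
+L<File::Copy::Recursive>, it doesn't preserve symbolic links or
+permissions:
+
+ use strict;
+ use warnings;
+ use File::Find;
+ use File::Copy qw(copy);
+ use File::Path qw(mkpath);
+ use File::Spec;
+
+ # copy_tree is a hypothetical helper: directories are recreated and
+ # plain files are copied; everything else is skipped
+ sub copy_tree {
+     my( $from, $to ) = @_;
+
+     find( { no_chdir => 1, wanted => sub {
+         my $source = $File::Find::name;
+         my $rel    = File::Spec->abs2rel( $source, $from );
+         my $target = ( $rel eq '.' || $rel eq '' )
+             ? $to
+             : File::Spec->catfile( $to, $rel );
+
+         if( -d $source ) {
+             mkpath( $target ) unless -d $target;
+         }
+         elsif( -f $source ) {
+             copy( $source, $target ) or die "Can't copy $source: $!";
+         }
+     } }, $from );
+
+     return 1;
+ }
+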
=head1 REVISION
-Revision: $Revision: 10126 $
+Revision: $Revision$
-Date: $Date: 2007-10-27 21:29:20 +0200 (Sat, 27 Oct 2007) $
+Date: $Date$
See L<perlfaq> for source control details and availability.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and
+Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it
=head1 NAME
-perlfaq6 - Regular Expressions ($Revision: 10126 $)
+perlfaq6 - Regular Expressions
=head1 DESCRIPTION
than the default, or else we won't actually ever have a multiline
record read in.
- $/ = ''; # read in more whole paragraph, not just one line
+ $/ = ''; # read in whole paragraph, not just one line
while ( <> ) {
while ( /\b([\w'-]+)(\s+\1)+\b/gi ) { # word starts alpha
print "Duplicate $1 at paragraph $.\n";
Here's code that finds sentences that begin with "From " (which would
be mangled by many mailers):
- $/ = ''; # read in more whole paragraph, not just one line
+ $/ = ''; # read in whole paragraph, not just one line
while ( <> ) {
while ( /^From /gm ) { # /m makes ^ match next to \n
print "leading from in paragraph $.\n";
$. = 0 if eof; # fix $.
}
+=head2 How do I match XML, HTML, or other nasty, ugly things with a regex?
+X<regex, XML> X<regex, HTML> X<XML> X<HTML> X<pain> X<frustration>
+X<sucking out, will to live>
+
+(contributed by brian d foy)
+
+If you just want to get work done, use a module and forget about the
+regular expressions. The C<XML::Parser> and C<HTML::Parser> modules
+are good starts, although each namespace has other parsing modules
+specialized for certain tasks and different ways of doing it. Start at
+CPAN Search ( http://search.cpan.org ) and wonder at all the work people
+have done for you already! :)
+
+The problem with things such as XML is that they have balanced text
+containing multiple levels of balanced text, but sometimes it isn't
+balanced text, as in an empty tag (C<< <br/> >>, for instance). Even then,
+things can occur out-of-order. Just when you think you've got a
+pattern that matches your input, someone throws you a curveball.
+
+If you'd like to do it the hard way, scratching and clawing your way
+toward a right answer but constantly being disappointed, besieged by
+bug reports, and weary from the inordinate amount of time you have to
+spend reinventing a triangular wheel, then there are several things
+you can try before you give up in frustration:
+
+=over 4
+
+=item * Solve the balanced text problem from another question in L<perlfaq6>
+
+=item * Try the recursive regex features in Perl 5.10 and later. See L<perlre>
+
+=item * Try defining a grammar using Perl 5.10's C<(?DEFINE)> feature.
+
+=item * Break the problem down into sub-problems instead of trying to use a single regex
+
+=item * Convince everyone not to use XML or HTML in the first place
+
+=back
+
+Good luck!
+
=head2 I put a regular expression into $/ but it didn't work. What's wrong?
X<$/, regexes in> X<$INPUT_RECORD_SEPARATOR, regexes in>
X<$RS, regexes in>
-$/ has to be a string. You can use these examples if you really need to
+$/ has to be a string. You can use these examples if you really need to
do this.
If you have File::Stream, this is easy.
If you don't have File::Stream, you have to do a little more work.
-You can use the four argument form of sysread to continually add to
+You can use the four-argument form of sysread to continually add to
a buffer. After you add to the buffer, you check if you have a
complete line (using your regular expression).
local $_ = "";
while( sysread FH, $_, 8192, length ) {
- while( s/^((?s).*?)your_pattern/ ) {
+ while( s/^((?s).*?)your_pattern// ) {
my $record = $1;
# do stuff here.
}
}
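Below is a self-contained sketch of the four-argument C<sysread> technique. The temporary file, its contents, and the separator pattern (a line of three or more dashes, standing in for C<your_pattern>) are assumptions chosen for illustration. Substitute your own record separator.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumed sample input: records separated by lines of three or more dashes
my ( $tmp, $filename ) = tempfile( UNLINK => 1 );
print {$tmp} "alpha\n---\nbeta\n----\ngamma\n";
close $tmp or die "Can't close $filename: $!";

open my $fh, '<', $filename or die "Can't open $filename: $!";

my @records;
my $buffer = '';

# Four-argument sysread appends up to 8192 bytes at the end of $buffer
while ( sysread $fh, $buffer, 8192, length $buffer ) {
    # Remove and save each complete record now sitting in the buffer
    while ( $buffer =~ s/^((?s).*?)\n-{3,}\n// ) {
        push @records, $1;
    }
}
close $fh;

# Anything after the last separator is the final record
if ( length $buffer ) {
    chomp $buffer;
    push @records, $buffer;
}

print "Record: [$_]\n" for @records;
```

The inner C<while> matters: a single read may bring in several complete records, or only part of one, so the loop drains every complete record and leaves any partial one in the buffer for the next read.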
- You can do the same thing with foreach and a match using the
- c flag and the \G anchor, if you do not mind your entire file
- being in memory at the end.
+You can do the same thing with foreach and a match using the
+c flag and the \G anchor, if you do not mind your entire file
+being in memory at the end.
local $_ = "";
while( sysread FH, $_, 8192, length ) {
substr($mask, -1) x (length($new) - length($old))
}
- $a = "this is a TEsT case";
- $a =~ s/(test)/preserve_case($1, "success")/egi;
- print "$a\n";
+ $string = "this is a TEsT case";
+ $string =~ s/(test)/preserve_case($1, "success")/egi;
+ print "$string\n";
This prints:
prints the lines of input that match it:
my $pattern = shift @ARGV;
-
+
while( <> ) {
print if m/$pattern/;
}
time, then reuse that for subsequent iterations:
my $pattern = shift @ARGV;
-
+
while( <> ) {
print if m/$pattern/o; # useful for Perl < 5.6
}
later, you should only see C<re> report that for the first iteration.
use re 'debug';
-
+
$regex = 'Perl';
foreach ( qw(Perl Java Ruby Python) ) {
print STDERR "-" x 73, "\n";
)
}{defined $2 ? $2 : ""}gxse;
-A slight modification also removes C++ comments, as long as they are not
-spread over multiple lines using a continuation character):
+A slight modification also removes C++ comments, possibly spanning multiple lines
+using a continuation character:
- s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
+ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
=head2 Can I use Perl regular expressions to match balanced text?
X<regex, matching balanced test> X<regexp, matching balanced test>
-X<regular expression, matching balanced test>
-
-Historically, Perl regular expressions were not capable of matching
-balanced text. As of more recent versions of perl including 5.6.1
-experimental features have been added that make it possible to do this.
-Look at the documentation for the (??{ }) construct in recent perlre manual
-pages to see an example of matching balanced parentheses. Be sure to take
-special notice of the warnings present in the manual before making use
-of this feature.
-
-CPAN contains many modules that can be useful for matching text
-depending on the context. Damian Conway provides some useful
-patterns in Regexp::Common. The module Text::Balanced provides a
-general solution to this problem.
-
-One of the common applications of balanced text matching is working
-with XML and HTML. There are many modules available that support
-these needs. Two examples are HTML::Parser and XML::Parser. There
-are many others.
-
-An elaborate subroutine (for 7-bit ASCII only) to pull out balanced
-and possibly nested single chars, like C<`> and C<'>, C<{> and C<}>,
-or C<(> and C<)> can be found in
-http://www.cpan.org/authors/id/TOMC/scripts/pull_quotes.gz .
-
-The C::Scan module from CPAN also contains such subs for internal use,
-but they are undocumented.
+X<regular expression, matching balanced text> X<possessive> X<PARNO>
+X<Text::Balanced> X<Regexp::Common> X<backtracking> X<recursion>
+
+(contributed by brian d foy)
+
+Your first try should probably be the C<Text::Balanced> module, which
+has been in the Perl standard library since Perl 5.8. It has a variety of
+functions to deal with tricky text. The C<Regexp::Common> module can
+also help by providing canned patterns you can use.
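Here is a short sketch using C<Text::Balanced>. Its C<extract_bracketed> function pulls the first balanced, possibly nested, group off the front of a string. The sample text is borrowed from the angle-bracket example in this answer, and the choice of C<< '<>' >> as the delimiter set is just for this illustration.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = '<another group <nested once <nested twice> > > and more';

# In list context, extract_bracketed returns the balanced text,
# the remaining text, and any skipped prefix (whitespace by default)
my ( $extracted, $remainder ) = extract_bracketed( $text, '<>' );

print "Extracted: $extracted\n";
print "Remainder: $remainder\n";
```

Because the module tracks nesting for you, the inner brackets do not end the match early.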
+
+As of Perl 5.10, you can match balanced text with regular expressions
+using recursive patterns. Before Perl 5.10, you had to resort to
+various tricks such as using Perl code in C<(??{})> sequences.
+
+Here's an example using a recursive regular expression. The goal is to
+capture all of the text within angle brackets, including the text in
+nested angle brackets. This sample text has two "major" groups: a
+group with one level of nesting and a group with two levels of
+nesting. There are five total groups in angle brackets:
+
+ I have some <brackets in <nested brackets> > and
+ <another group <nested once <nested twice> > >
+ and that's it.
+
+The regular expression to match the balanced text uses two new (to
+Perl 5.10) regular expression features. These are covered in L<perlre>
+and this example is a modified version of one in that documentation.
+
+First, adding the new possessive C<+> to any quantifier finds the
+longest match and does not backtrack. That's important since you want
+to handle any angle brackets through the recursion, not backtracking.
+The group C<< [^<>]++ >> finds one or more non-angle brackets without
+backtracking.
+
+Second, the new C<(?PARNO)> refers to the sub-pattern in the
+particular capture buffer given by C<PARNO>. In the following regex,
+the first capture buffer finds (and remembers) the balanced text, and
+you need that same pattern within the first buffer to get past the
+nested text. That's the recursive part. The C<(?1)> uses the pattern
+in the outer capture buffer as an independent part of the regex.
+
+Putting it all together, you have:
+
+ #!/usr/local/bin/perl5.10.0
+
+ my $string =<<"HERE";
+ I have some <brackets in <nested brackets> > and
+ <another group <nested once <nested twice> > >
+ and that's it.
+ HERE
+
+ my @groups = $string =~ m/
+ ( # start of capture buffer 1
+ < # match an opening angle bracket
+ (?:
+ [^<>]++ # one or more non angle brackets, non backtracking
+ |
+ (?1) # found < or >, so recurse to capture buffer 1
+ )*
+ > # match a closing angle bracket
+ ) # end of capture buffer 1
+ /xg;
+
+ $" = "\n\t";
+ print "Found:\n\t@groups\n";
+
+The output shows that Perl found the two major groups:
+
+ Found:
+ <brackets in <nested brackets> >
+ <another group <nested once <nested twice> > >
+
+With a little extra work, you can get all of the groups in angle
+brackets even if they are in other angle brackets too. Each time you
+get a balanced match, remove its outer delimiter (that's the one you
+just matched so don't match it again) and add it to a queue of strings
+to process. Keep doing that until you get no matches:
+
+ #!/usr/local/bin/perl5.10.0
+
+ my @queue =<<"HERE";
+ I have some <brackets in <nested brackets> > and
+ <another group <nested once <nested twice> > >
+ and that's it.
+ HERE
+
+ my $regex = qr/
+ ( # start of bracket 1
+ < # match an opening angle bracket
+ (?:
+ [^<>]++ # one or more non angle brackets, non backtracking
+ |
+ (?1) # recurse to bracket 1
+ )*
+ > # match a closing angle bracket
+ ) # end of bracket 1
+ /x;
+
+ $" = "\n\t";
+
+ while( @queue )
+ {
+ my $string = shift @queue;
+
+ my @groups = $string =~ m/$regex/g;
+ print "Found:\n\t@groups\n\n" if @groups;
+
+ unshift @queue, map { s/^<//; s/>$//; $_ } @groups;
+ }
+
+The output shows all of the groups. The outermost matches show up
+first and the nested matches show up later:
+
+ Found:
+ <brackets in <nested brackets> >
+ <another group <nested once <nested twice> > >
+
+ Found:
+ <nested brackets>
+
+ Found:
+ <nested once <nested twice> >
+
+ Found:
+ <nested twice>
=head2 What does it mean that regexes are greedy? How can I get around it?
X<greedy> X<greediness>
Avoid asking Perl to compile a regular expression every time
you want to match it. In this example, perl must recompile
-the regular expression for every iteration of the foreach()
+the regular expression for every iteration of the C<foreach>
loop since it has no way to know what $pattern will be.
@patterns = qw( foo bar baz );
}
}
-The qr// operator showed up in perl 5.005. It compiles a
+The C<qr//> operator showed up in perl 5.005. It compiles a
regular expression, but doesn't apply it. When you use the
pre-compiled version of the regex, perl does less work. In
-this example, I inserted a map() to turn each pattern into
+this example, I inserted a C<map> to turn each pattern into
its pre-compiled form. The rest of the script is the same,
but faster.
{
foreach $pattern ( @patterns )
{
- print if /$pattern/i;
- next LINE;
+ if( /$pattern/ )
+ {
+ print;
+ next LINE;
+ }
}
}
print if /\b(?:$regex)\b/i;
}
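Here is a self-contained version of the C<qr//> approach. The sample lines and the word-boundary patterns are assumptions for illustration. The point is that each pattern is compiled once, before the loop, and then reused for every line.

```perl
use strict;
use warnings;

# Compile each pattern once with qr//, up front
my @patterns = map { qr/\b$_\b/i } qw( foo bar baz );

my @lines = ( "Foo is here", "nothing to see", "a BAZ line", "foobar" );
my @matched;

LINE: foreach my $line ( @lines ) {
    foreach my $pattern ( @patterns ) {
        if ( $line =~ $pattern ) {
            push @matched, $line;
            next LINE;    # stop at the first pattern that matches
        }
    }
}

print "$_\n" for @matched;
```

The word boundaries keep C<foobar> from matching either C<foo> or C<bar>.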
-For more details on regular expression efficiency, see Mastering
-Regular Expressions by Jeffrey Freidl. He explains how regular
+For more details on regular expression efficiency, see I<Mastering
+Regular Expressions> by Jeffrey Friedl. He explains how regular
expression engines work and why some patterns are surprisingly
inefficient. Once you understand how perl applies regular
expressions, you can tune them for individual situations.
essentially the same information, but without the risk of excessive
string copying.
+Perl 5.10 added three special variables, C<${^MATCH}>, C<${^PREMATCH}>, and
+C<${^POSTMATCH}> to do the same job but without the global performance
+penalty. Perl 5.10 only sets these variables if you compile or execute the
+regular expression with the C</p> modifier.
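Here is a minimal sketch of the C</p> modifier, assuming Perl 5.10 or later. The sample string is arbitrary. The values are copied into ordinary variables inside the block because the match variables are dynamically scoped to the enclosing block.

```perl
use strict;
use warnings;
use 5.010;

my $string = 'The quick brown fox';
my ( $before, $matched, $after );

# The /p modifier asks Perl to set the ${^...} match variables
if ( $string =~ /quick/p ) {
    ( $before, $matched, $after ) = ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
    say "Before: [$before]";
    say "Match:  [$matched]";
    say "After:  [$after]";
}
```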
+
=head2 What good is C<\G> in a regular expression?
X<\G>
=head1 REVISION
-Revision: $Revision: 10126 $
+Revision: $Revision$
-Date: $Date: 2007-10-27 21:29:20 +0200 (Sat, 27 Oct 2007) $
+Date: $Date$
See L<perlfaq> for source control details and availability.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and
+Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it
=head1 NAME
-perlfaq7 - General Perl Language Issues ($Revision: 10100 $)
+perlfaq7 - General Perl Language Issues
=head1 DESCRIPTION
if ($whoops) { exit 1 }
@nums = (1, 2, 3);
-
+
if ($whoops) {
exit 1;
}
just such situations as the one above.
Another operator with surprising precedence is exponentiation. It
-binds more tightly even than unary minus, making C<-2**2> product a
+binds more tightly even than unary minus, making C<-2**2> produce a
negative not a positive four. It is also right-associating, meaning
that C<2**3**2> is two raised to the ninth power, not eight squared.
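A quick way to see both rules is to evaluate the expressions directly. The variable names here are only for illustration.

```perl
use strict;
use warnings;

# ** binds tighter than unary minus: this is -(2**2)
my $negative = -2**2;

# Parenthesize to square the negative number instead
my $positive = (-2)**2;

# ** is right-associative: this is 2**(3**2), or 2**9
my $power = 2**3**2;

print "$negative $positive $power\n";    # -4 4 512
```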
$person = {}; # new anonymous hash
$person->{AGE} = 24; # set field AGE to 24
$person->{NAME} = "Nat"; # set field NAME to "Nat"
-
+
If you're looking for something a bit more rigorous, try L<perltoot>.
=head2 How do I create a module?
(contributed by brian d foy)
+The full answer to this can be found at
+http://cpan.org/modules/04pause.html#takeover
+
The easiest way to take over a module is to have the current
module maintainer either make you a co-maintainer or transfer
the module to you.
current maintainer. The PAUSE admins will also try to reach the
maintainer.
-=item
+=item
Post a public message in a heavily trafficked site announcing your
intention to take over the module.
=item
Wait a bit. The PAUSE admins don't want to act too quickly in case
-the current maintainer is on holiday. If there's no response to
+the current maintainer is on holiday. If there's no response to
private communication or the public post, a PAUSE admin can transfer
it to you.
=back
=head2 How do I create a class?
+X<class, creation> X<package>
-See L<perltoot> for an introduction to classes and objects, as well as
-L<perlobj> and L<perlbot>.
+(contributed by brian d foy)
+
+In Perl, a class is just a package, and methods are just subroutines.
+Perl doesn't get more formal than that and lets you set up the package
+just the way that you like it (that is, it doesn't set up anything for
+you).
+
+The Perl documentation has several tutorials that cover class
+creation, including L<perlboot> (Barnyard Object Oriented Tutorial),
+L<perltoot> (Tom's Object Oriented Tutorial), L<perlbot> (Bag o'
+Object Tricks), and L<perlobj>.
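Here is a minimal sketch of what that means in practice: a package whose subroutines act as methods, with a constructor that blesses a hash reference. The C<Counter> class, its C<start> argument, and its methods are hypothetical names chosen only for illustration. The C<//> operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class: a class is just a package

# Constructor: bless a hash reference into the class
sub new {
    my ( $class, %args ) = @_;
    my $self = { count => $args{start} // 0 };
    return bless $self, $class;
}

# Methods are ordinary subroutines that receive the object first
sub increment {
    my $self = shift;
    return ++$self->{count};
}

sub count { $_[0]{count} }

package main;

my $counter = Counter->new( start => 10 );
$counter->increment for 1 .. 3;
print ref($counter), " now at ", $counter->count, "\n";
```

Nothing here is special syntax. C<bless> associates the reference with the package, and the arrow operator looks up subroutines in that package.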
=head2 How can I tell if a variable is tainted?
hard-to-explain meaning. Usually, closures are implemented in Perl as
anonymous subroutines with lasting references to lexical variables
outside their own scopes. These lexicals magically refer to the
-variables that were around when the subroutine was defined (deep
+variables that were around when the subroutine was defined (deep
binding).
Closures are most often used in programming languages where you can
my $addpiece = shift;
return sub { shift() + $addpiece };
}
-
+
$f1 = make_adder(20);
$f2 = make_adder(555);
(contributed by brian d foy)
-Perl doesn't have "static" variables, which can only be accessed from
-the function in which they are declared. You can get the same effect
-with lexical variables, though.
+In Perl 5.10, declare the variable with C<state>, which you enable
+with C<use feature 'state'> (or C<use 5.010>). The C<state>
+declaration creates a lexical variable that persists between calls
+to the subroutine:
+
+ use feature 'state';
+
+ sub counter { state $count = 1; $count++ }
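Here is a runnable version of the C<state> counter, assuming Perl 5.10 or later. Each call returns the current value and then increments it.

```perl
use strict;
use warnings;
use feature 'state';

sub counter {
    # Initialized only on the first call; the value survives between calls
    state $count = 1;
    return $count++;
}

my @values = map { counter() } 1 .. 3;
print "@values\n";    # 1 2 3
```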
You can fake a static variable by using a lexical variable which goes
out of scope. In this example, you define the subroutine C<counter>, and
my $count = 1;
sub counter { $count++ }
}
-
+
my $start = counter();
-
+
.... # code that calls counter();
-
+
my $end = counter();
In the previous example, you created a function-private variable
sub visible {
print "var has value $var\n";
}
-
+
sub dynamic {
local $var = 'local'; # new temporary value for the still-global
visible(); # variable called $var
}
-
+
sub lexical {
my $var = 'private'; # new private variable, $var
visible(); # (invisible outside of sub scope)
}
-
+
$var = 'global';
-
+
visible(); # prints global
dynamic(); # prints local
lexical(); # prints global
=head2 What's the difference between calling a function as &foo and foo()?
-When you call a function as C<&foo>, you allow that function access to
-your current @_ values, and you bypass prototypes.
-The function doesn't get an empty @_--it gets yours! While not
-strictly speaking a bug (it's documented that way in L<perlsub>), it
-would be hard to consider this a feature in most cases.
+(contributed by brian d foy)
+
+Calling a subroutine as C<&foo> with no trailing parentheses ignores
+the prototype of C<foo> and passes it the current value of the argument
+list, C<@_>. Here's an example; the C<bar> subroutine calls C<&foo>,
+which prints its argument list:
+
+ sub bar { &foo }
+
+ sub foo { print "Args in foo are: @_\n" }
+
+ bar( qw( a b c ) );
+
+When you call C<bar> with arguments, you see that C<foo> got the same C<@_>:
-When you call your function as C<&foo()>, then you I<do> get a new @_,
-but prototyping is still circumvented.
+ Args in foo are: a b c
-Normally, you want to call a function using C<foo()>. You may only
-omit the parentheses if the function is already known to the compiler
-because it already saw the definition (C<use> but not C<require>),
-or via a forward reference or C<use subs> declaration. Even in this
-case, you get a clean @_ without any of the old values leaking through
-where they don't belong.
+Calling the subroutine with trailing parentheses, with or without arguments,
+does not use the current C<@_> and respects the subroutine prototype. Changing
+the example to put parentheses after the call to C<foo> changes the program:
+
+ sub bar { &foo() }
+
+ sub foo { print "Args in foo are: @_\n" }
+
+ bar( qw( a b c ) );
+
+Now the output shows that C<foo> doesn't get the C<@_> from its caller.
+
+ Args in foo are:
+
+The main use of the C<@_> pass-through feature is to write subroutines
+whose main job it is to call other subroutines for you. For further
+details, see L<perlsub>.
=head2 How do I create a switch or case statement?
+In Perl 5.10, use the C<given-when> construct described in L<perlsyn>:
+
+ use 5.010;
+
+ given ( $string ) {
+ when( 'Fred' ) { say "I found Fred!" }
+ when( 'Barney' ) { say "I found Barney!" }
+ when( /Bamm-?Bamm/ ) { say "I found Bamm-Bamm!" }
+ default { say "I don't recognize the name!" }
+ };
+
If one wants to use pure Perl and to be compatible with Perl versions
-prior to 5.10, the general answer is to write a construct like this:
+prior to 5.10, the general answer is to use C<if-elsif-else>:
for ($variable_to_test) {
if (/pat1/) { } # do something
"done" => sub { die "See ya!" },
"mad" => \&angry,
);
-
+
print "How are you? ";
chomp($string = <STDIN>);
if ($commands{$string}) {
print "No such command: $string\n";
}
-Note that starting from version 5.10, Perl has now a native switch
-statement. See L<perlsyn>.
-
Starting from Perl 5.8, a source filter module, C<Switch>, can also be
used to get switch and case. Its use is now discouraged, because it's
not fully compatible with the native switch of Perl 5.10, and because,
Make sure to read about creating modules in L<perlmod> and
the perils of indirect objects in L<perlobj/"Method Invocation">.
-=head2 How can I find out my current package?
+=head2 How can I find out my current or calling package?
-If you're just a random program, you can do this to find
-out what the currently compiled package is:
+(contributed by brian d foy)
- my $packname = __PACKAGE__;
+To find the package you are currently in, use the special literal
+C<__PACKAGE__>, as documented in L<perldata>. You can only use the
+special literals as separate tokens, so you can't interpolate them
+into strings like you can with variables:
-But, if you're a method and you want to print an error message
-that includes the kind of object you were called on (which is
-not necessarily the same as the one in which you were compiled):
+ my $current_package = __PACKAGE__;
+ print "I am in package $current_package\n";
- sub amethod {
- my $self = shift;
- my $class = ref($self) || $self;
- warn "called me from a $class object";
- }
+This is different from finding out the package an object is blessed
+into, which might not be the current package. For that, use C<blessed>
+from C<Scalar::Util>, part of the Standard Library since Perl 5.8:
+
+ use Scalar::Util qw(blessed);
+ my $object_package = blessed( $object );
+
+Most of the time, however, you shouldn't care what package an object is
+blessed into, as long as it claims to inherit from that class:
-=head2 How can I comment out a large block of perl code?
+ my $is_right_class = eval { $object->isa( $package ) }; # true or false
+
+If you want to find the package calling your code, perhaps to give better
+diagnostics as C<Carp> does, use the C<caller> built-in:
+
+ sub foo {
+ my @args = ...;
+ my( $package, $filename, $line ) = caller;
+
+ print "I was called from package $package\n";
+ }
+
+By default, your program starts in package C<main>, so you should
+always be in some package unless someone uses the C<package> built-in
+with no namespace. See the C<package> entry in L<perlfunc> for the
+details of empty packages.
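The sketch below puts C<__PACKAGE__>, C<blessed>, C<isa>, and C<caller> side by side. The C<Animal> and C<Dog> classes and the C<who_called> subroutine are hypothetical, defined only for this illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

package Animal;    # hypothetical classes for illustration
sub new { my $class = shift; return bless {}, $class }

# caller() in list context returns the calling package, file, and line
sub who_called {
    my ( $package, $filename, $line ) = caller;
    return $package;
}

package Dog;
our @ISA = ('Animal');

package main;

my $current   = __PACKAGE__;                              # main
my $dog       = Dog->new;
my $blessed   = blessed($dog);                            # Dog, not Animal
my $is_animal = eval { $dog->isa('Animal') } ? 1 : 0;     # inherits, so 1
my $caller    = Animal::who_called();                     # main

print "current=$current blessed=$blessed isa=$is_animal caller=$caller\n";
```

Note that C<Dog> inherits C<new> from C<Animal>, yet C<blessed> still reports C<Dog>. That is why checking C<isa> is usually the better test.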
+
+=head2 How can I comment out a large block of Perl code?
+
+(contributed by brian d foy)
-You can use embedded POD to discard it. Enclose the blocks you want
-to comment out in POD markers. The <=begin> directive marks a section
-for a specific formatter. Use the C<comment> format, which no formatter
-should claim to understand (by policy). Mark the end of the block
-with <=end>.
+The quick-and-dirty way to comment out more than one line of Perl is
+to surround those lines with Pod directives. You have to put these
+directives at the beginning of the line and somewhere where Perl
+expects a new statement (so not in the middle of statements like the #
+comments). You end the comment with C<=cut>, ending the Pod section:
+
+ =pod
+
+ my $object = NotGonnaHappen->new();
+
+ ignored_sub();
+
+ $wont_be_assigned = 37;
+
+ =cut
+
+The quick-and-dirty method only works well when you don't plan to
+leave the commented code in the source. If a Pod parser comes along,
+your multiline comment is going to show up in the Pod translation.
+A better way hides it from Pod parsers as well.
+
+The C<=begin> directive can mark a section for a particular purpose.
+If the Pod parser doesn't want to handle it, it just ignores it. Label
+the comments with C<comment>. End the comment using C<=end> with the
+same label. You still need the C<=cut> to go back to Perl code from
+the Pod comment:
- # program is here
-
=begin comment
-
- all of this stuff
-
- here will be ignored
- by everyone
-
+
+ my $object = NotGonnaHappen->new();
+
+ ignored_sub();
+
+ $wont_be_assigned = 37;
+
=end comment
-
- =cut
-
- # program continues
-The pod directives cannot go just anywhere. You must put a
-pod directive where the parser is expecting a new statement,
-not just in the middle of an expression or some other
-arbitrary grammar production.
+ =cut
-See L<perlpod> for more details.
+For more information on Pod, check out L<perlpod> and L<perlpodspec>.
=head2 How do I clear a package?
=head1 REVISION
-Revision: $Revision: 10100 $
+Revision: $Revision$
-Date: $Date: 2007-10-21 20:59:30 +0200 (Sun, 21 Oct 2007) $
+Date: $Date$
See L<perlfaq> for source control details and availability.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and
+Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it
=head1 NAME
-perlfaq8 - System Interaction ($Revision: 10183 $)
+perlfaq8 - System Interaction
=head1 DESCRIPTION
binary was built for.
=head2 How come exec() doesn't return?
+X<exec> X<system> X<fork> X<open> X<pipe>
-Because that's what it does: it replaces your currently running
-program with a different one. If you want to keep going (as is
-probably the case if you're asking this question) use system()
-instead.
+(contributed by brian d foy)
+
+The C<exec> function's job is to turn your process into another
+command and never to return. If that's not what you want to do, don't
+use C<exec>. :)
+
+If you want to run an external command and still keep your Perl process
+going, look at a piped C<open>, C<fork>, or C<system>.
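Here is a small sketch of the two most common alternatives, using the running perl (C<$^X>) as a stand-in external command so the example does not depend on other programs. The list form of piped C<open> assumes a Unix-like system; older Perls on Windows do not support it.

```perl
use strict;
use warnings;

# system() runs the command, waits for it, and then returns to your program
my $status = system( $^X, '-e', 'exit 3' );
die "Could not run command: $!" if $status == -1;
my $exit_code = $status >> 8;    # the high byte of the wait status is the exit code

# A piped open also keeps your process going and lets you read the output
open my $pipe, '-|', $^X, '-e', 'print qq(hello\n)'
    or die "Can't start command: $!";
my $output = <$pipe>;
close $pipe;

print "exit code: $exit_code, output: $output";
print "still running after both commands\n";
```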
=head2 How do I do fancy stuff with the keyboard/screen/mouse?
=head2 How do I clear the screen?
-If you only have do so infrequently, use C<system>:
+(contributed by brian d foy)
- system("clear");
+To clear the screen, you just have to print the special sequence
+that tells the terminal to clear the screen. Once you have that
+sequence, output it when you want to clear the screen.
-If you have to do this a lot, save the clear string
-so you can print it 100 times without calling a program
-100 times:
+You can use the C<Term::ANSIScreen> module to get the special
+sequence. Import the C<cls> function (or the C<:screen> tag):
- $clear_string = `clear`;
- print $clear_string;
+ use Term::ANSIScreen qw(cls);
+ my $clear_screen = cls();
+
+ print $clear_screen;
-If you're planning on doing other screen manipulations, like cursor
-positions, etc, you might wish to use Term::Cap module:
+The C<Term::Cap> module can also get the special sequence if you want
+to deal with the low-level details of terminal control. The C<Tputs>
+method returns the string for the given capability:
use Term::Cap;
- $terminal = Term::Cap->Tgetent( {OSPEED => 9600} );
+
+ $terminal = Term::Cap->Tgetent( { OSPEED => 9600 } );
$clear_string = $terminal->Tputs('cl');
+ print $clear_string;
+
+On Windows, you can use the C<Win32::Console> module. After creating
+an object for the output filehandle you want to affect, call the
+C<Cls> method:
+
+ use Win32::Console;
+
+ $OUT = Win32::Console->new(STD_OUTPUT_HANDLE);
+ $OUT->Cls;
+
+If you have a command-line program that does the job, you can call
+it in backticks to capture whatever it outputs so you can use it
+later:
+
+ $clear_string = `clear`;
+
+ print $clear_string;
+
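If you know the terminal understands ANSI escape sequences (an assumption, though most modern Unix terminal emulators do), you can also build the sequence by hand without any module. C<\e[H> moves the cursor to the top-left corner, and C<\e[2J> erases the display.

```perl
use strict;
use warnings;

# ANSI sequences: ESC [ H (cursor home) and ESC [ 2 J (erase display)
my $clear_string = "\e[H\e[2J";

# Only emit the sequence when writing to a terminal, not to a file or pipe
print $clear_string if -t STDOUT;
```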
=head2 How do I get the screen size?
If you have Term::ReadKey module installed from CPAN,
=head2 How do I start a process in the background?
-Several modules can start other processes that do not block
-your Perl program. You can use IPC::Open3, Parallel::Jobs,
-IPC::Run, and some of the POE modules. See CPAN for more
-details.
+(contributed by brian d foy)
-You could also use
+There's not a single way to run code in the background so you don't
+have to wait for it to finish before your program moves on to other
+tasks. Process management depends on your particular operating system,
+and many of the techniques are in L<perlipc>.
+
+Several CPAN modules may be able to help, including IPC::Open2 or
+IPC::Open3, IPC::Run, Parallel::Jobs, Parallel::ForkManager, POE,
+Proc::Background, and Win32::Process. There are many other modules you
+might use, so check those namespaces for other options too.
+
+If you are on a unix-like system, you might be able to get away with a
+system call where you put an C<&> on the end of the command:
system("cmd &")
-or you could use fork as documented in L<perlfunc/"fork">, with
-further examples in L<perlipc>. Some things to be aware of, if you're
-on a Unix-like system:
+You can also try using C<fork>, as described in L<perlfunc> (although
+this is the same thing that many of the modules will do for you).
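Here is a minimal C<fork> sketch, assuming a Unix-like system. The C<sleep> and the exit code of 7 are placeholders for whatever background work your child process actually does.

```perl
use strict;
use warnings;

# fork returns the child's PID to the parent, 0 to the child, undef on failure
my $pid = fork;
die "Can't fork: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: do the background work here, then exit so the child never
    # falls through into the parent's code
    sleep 1;
    exit 7;
}

# Parent: keeps going right away while the child runs
print "Started child $pid; doing other work\n";

# Later, reap the child so it doesn't linger as a zombie
waitpid( $pid, 0 );
my $child_exit = $? >> 8;
print "Child exited with $child_exit\n";
```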
=over 4
=head2 What's the difference between require and use?
-Perl offers several different ways to include code from one file into
-another. Here are the deltas between the various inclusion constructs:
+(contributed by brian d foy)
+
+Perl runs the C<require> statement at run-time. Once Perl loads, compiles,
+and runs the file, it doesn't do anything else. The C<use> statement
+is the same as a C<require> run at compile-time, but Perl also calls the
+C<import> method for the loaded package. These two are the same:
+
+ use MODULE qw(import list);
- 1) do $file is like eval `cat $file`, except the former
- 1.1: searches @INC and updates %INC.
- 1.2: bequeaths an *unrelated* lexical scope on the eval'ed code.
+ BEGIN {
+ require MODULE;
+ MODULE->import(import list);
+ }
+
+However, you can suppress the C<import> by using an explicit, empty
+import list. Both of these still happen at compile-time:
- 2) require $file is like do $file, except the former
- 2.1: checks for redundant loading, skipping already loaded files.
- 2.2: raises an exception on failure to find, compile, or execute $file.
+ use MODULE ();
- 3) require Module is like require "Module.pm", except the former
- 3.1: translates each "::" into your system's directory separator.
- 3.2: primes the parser to disambiguate class Module as an indirect object.
+ BEGIN {
+ require MODULE;
+ }
- 4) use Module is like require Module, except the former
- 4.1: loads the module at compile time, not run-time.
- 4.2: imports symbols and semantics from that package to the current one.
+Since C<use> will also call the C<import> method, the actual value
+for C<MODULE> must be a bareword. That is, C<use> cannot load files
+by name, although C<require> can:
-In general, you usually want C<use> and a proper Perl module.
+ require "$ENV{HOME}/lib/Foo.pm"; # no @INC searching!
+
+See the entry for C<use> in L<perlfunc> for more details.
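To see the difference in action, the following sketch does by hand at run-time what C<use> does at compile time. It uses C<List::Util> only because it ships with Perl; any module with an C<import> method would do.

```perl
use strict;
use warnings;

# require loads and compiles List::Util at run-time, but imports nothing
require List::Util;

# The fully qualified name works as soon as the module is loaded
my $total = List::Util::sum( 1, 2, 3 );

# Calling import by hand does the other half of what 'use' does
List::Util->import('max');
my $largest = max( 4, 9, 2 );    # parentheses needed: max wasn't known at compile time

# %INC records the loaded file, so a second require does nothing
my $loaded = exists $INC{'List/Util.pm'} ? 'yes' : 'no';

print "sum=$total max=$largest loaded=$loaded\n";
```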
=head2 How do I keep my own module/library directory?
When you build modules, tell Perl where to install the modules.
-For C<Makefile.PL>-based distributions, use the PREFIX and LIB options
+For C<Makefile.PL>-based distributions, use the INSTALL_BASE option
when generating Makefiles:
- perl Makefile.PL PREFIX=/mydir/perl LIB=/mydir/perl/lib
+ perl Makefile.PL INSTALL_BASE=/mydir/perl
You can set this in your CPAN.pm configuration so modules automatically install
in your private library directory when you use the CPAN.pm shell:
% cpan
- cpan> o conf makepl_arg PREFIX=/mydir/perl,LIB=/mydir/perl/lib
+ cpan> o conf makepl_arg INSTALL_BASE=/mydir/perl
cpan> o conf commit
For C<Build.PL>-based distributions, use the --install_base option:
- perl Build.PL --install_base /mydir/perl
+ perl Build.PL --install_base /mydir/perl
You can configure CPAN.pm to automatically use this option too:
cpan> o conf mbuild_arg --install_base /mydir/perl
cpan> o conf commit
+INSTALL_BASE tells these tools to put your modules into
+F</mydir/perl/lib/perl5>. See L</"How do I add a directory to my
+include path (@INC) at runtime?"> for details on how to run your newly
+installed modules.
+
+There is one caveat with INSTALL_BASE, though, since it acts
+differently from the PREFIX and LIB settings that older versions of
+ExtUtils::MakeMaker advocated. INSTALL_BASE does not support
+installing modules for multiple versions of Perl or different
+architectures under the same directory. You should consider whether you
+really want that and, if you do, use the older PREFIX and LIB
+settings. See the ExtUtils::MakeMaker documentation for more details.
+
=head2 How do I add the directory my program lives in to the module/library search path?
(contributed by brian d foy)
at compile time:
use lib $directory;
-
+
The trick in this task is to find the directory. Before your script does
anything else (such as a C<chdir>), you can get the current working
directory with the C<Cwd> module, which comes with Perl:
use Cwd;
our $directory = cwd;
}
-
+
use lib $directory;
-
+
You can do a similar thing with the value of C<$0>, which holds the
script name. That might hold a relative path, but C<rel2abs> can turn
-it into an absolute path. Once you have the
+it into an absolute path. Once you have the absolute path, C<dirname>
+from C<File::Basename> gives you its directory:
- BEGIN {
+ BEGIN {
use File::Spec::Functions qw(rel2abs);
use File::Basename qw(dirname);
-
+
my $path = rel2abs( $0 );
our $directory = dirname( $path );
}
-
+
use lib $directory;
-The C<FindBin> module, which comes with Perl, might work. It searches
-through C<$ENV{PATH}> (so your script has to be in one of those
-directories). You can then use that directory (in C<$FindBin::Bin>)
-to locate nearby directories you want to add:
+The C<FindBin> module, which comes with Perl, might work. It finds the
+directory of the currently running script and puts it in C<$Bin>, which
+you can then use to construct the right library path:
- use FindBin;
- use lib "$FindBin::Bin/../lib";
+ use FindBin qw($Bin);
+ use lib "$Bin/../lib";
=head2 How do I add a directory to my include path (@INC) at runtime?
=head1 REVISION
-Revision: $Revision: 10183 $
+Revision: $Revision$
-Date: $Date: 2007-11-07 09:35:12 +0100 (Wed, 07 Nov 2007) $
+Date: $Date$
See L<perlfaq> for source control details and availability.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and
+Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it
=head1 NAME
-perlfaq9 - Networking ($Revision: 8539 $)
+perlfaq9 - Networking
=head1 DESCRIPTION
http://www.perl.org/CGI_MetaFAQ.html
-
=head2 How can I get better error messages from a CGI program?
Use the CGI::Carp module. It replaces C<warn> and C<die>, plus the
more verbose and safer versions. It still sends them to the normal
server error log.
- use CGI::Carp;
- warn "This is a complaint";
- die "But this one is serious";
+ use CGI::Carp;
+ warn "This is a complaint";
+ die "But this one is serious";
The following use of CGI::Carp also redirects errors to a file of your choice,
placed in a BEGIN block to catch compile-time warnings as well:
- BEGIN {
- use CGI::Carp qw(carpout);
- open(LOG, ">>/var/local/cgi-logs/mycgi-log")
- or die "Unable to append to mycgi-log: $!\n";
- carpout(*LOG);
- }
+ BEGIN {
+ use CGI::Carp qw(carpout);
+ open(LOG, ">>/var/local/cgi-logs/mycgi-log")
+ or die "Unable to append to mycgi-log: $!\n";
+ carpout(*LOG);
+ }
You can even arrange for fatal errors to go back to the client browser,
which is nice for your own debugging, but might confuse the end user.
- use CGI::Carp qw(fatalsToBrowser);
- die "Bad error here";
+ use CGI::Carp qw(fatalsToBrowser);
+ die "Bad error here";
Even if the error happens before you get the HTTP header out, the module
will try to take care of this to avoid the dreaded server 500 errors.
Here's one "simple-minded" approach that works for most files:
- #!/usr/bin/perl -p0777
- s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
+ #!/usr/bin/perl -p0777
+ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
If you want a more complete solution, see the 3-stage striphtml
program in
Here are some tricky cases that you should think about when picking
a solution:
- <IMG SRC = "foo.gif" ALT = "A > B">
+ <IMG SRC = "foo.gif" ALT = "A > B">
- <IMG SRC = "foo.gif"
+ <IMG SRC = "foo.gif"
ALT = "A > B">
- <!-- <A comment> -->
+ <!-- <A comment> -->
- <script>if (a<b && a>c)</script>
+ <script>if (a<b && a>c)</script>
- <# Just data #>
+ <# Just data #>
- <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
+ <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
If HTML comments include other tags, those solutions would also break
on text like this:
- <!-- This section commented out.
- <B>You can't see me!</B>
- -->
+ <!-- This section commented out.
+ <B>You can't see me!</B>
+ -->
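To experiment with the tricky cases above, here is a minimal sketch that wraps the same "simple-minded" substitution in a subroutine. C<strip_tags> is a hypothetical name, not a module function. It copes with quoted attribute values that contain C<< > >>, but the comment case still fools it: C<< <!-- <A comment> --> >> comes out as C<< --> >> with a leading space.

```perl
use strict;
use warnings;

# Remove anything that looks like a tag. A quoted attribute value is
# consumed as a unit, so a ">" inside quotes doesn't end the tag. This is
# the same regex as the one-liner, with the same blind spots.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    return $html;
}

print strip_tags(q{<p>Hello, <b>world</b>!</p>}), "\n";
print strip_tags(q{x<IMG SRC = "foo.gif" ALT = "A > B">y}), "\n";
print strip_tags(q{<!-- <A comment> -->}), "\n";
```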
=head2 How do I extract URLs?
module based approaches but only extracts URLs from anchors where the first
attribute is HREF and there are no other attributes.
- #!/usr/bin/perl -n00
- # qxurl - tchrist@perl.com
- print "$2\n" while m{
- < \s*
- A \s+ HREF \s* = \s* (["']) (.*?) \1
- \s* >
- }gsix;
-
+ #!/usr/bin/perl -n00
+ # qxurl - tchrist@perl.com
+ print "$2\n" while m{
+ < \s*
+ A \s+ HREF \s* = \s* (["']) (.*?) \1
+ \s* >
+ }gsix;
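If you want the URLs inside a program instead of printed by a one-liner, a minimal sketch is to run the same pattern in a loop and collect the second capture. The name C<extract_hrefs> is hypothetical, and the sketch keeps qxurl's limitation: an anchor such as C<< <a class="x" href="..."> >> is skipped because HREF is not its first attribute.

```perl
use strict;
use warnings;

# Same pattern as qxurl: only finds <A HREF="..."> anchors where HREF is
# the first attribute and the tag has no other attributes.
sub extract_hrefs {
    my ($html) = @_;
    my @urls;
    while ( $html =~ m{
            < \s*
            A \s+ HREF \s* = \s* (["']) (.*?) \1
            \s* >
        }gsix ) {
        push @urls, $2;
    }
    return @urls;
}

my $html = q{<a href="http://www.example.com/">one</a>
<A HREF='/local/page.html'>two</A>
<a class="x" href="http://skipped.example.com/">three</a>};

print "$_\n" for extract_hrefs($html);
```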
=head2 How do I download a file from the user's machine? How do I open a file on another machine?
start_form,
"What's your favorite animal? ",
- popup_menu(
- -name => 'animal',
+ popup_menu(
+ -name => 'animal',
-values => [ qw( Llama Alpaca Camel Ram ) ]
),
- submit,
-
- end_form,
- end_html;
+ submit,
+ end_form,
+ end_html;
=head2 How do I fetch an HTML file?
-One approach, if you have the lynx text-based HTML browser installed
-on your system, is this:
+(contributed by brian d foy)
+
+Use the libwww-perl distribution. The C<LWP::Simple> module can fetch web
+resources and give their content back to you as a string:
- $html_code = `lynx -source $url`;
- $text_data = `lynx -dump $url`;
+ use LWP::Simple qw(get);
-The libwww-perl (LWP) modules from CPAN provide a more powerful way
-to do this. They don't require lynx, but like lynx, can still work
-through proxies:
+ my $html = get( "http://www.example.com/index.html" );
- # simplest version
- use LWP::Simple;
- $content = get($URL);
+It can also store the resource directly in a file:
- # or print HTML from a URL
- use LWP::Simple;
- getprint "http://www.linpro.no/lwp/";
+ use LWP::Simple qw(getstore);
- # or print ASCII from HTML from a URL
- # also need HTML-Tree package from CPAN
- use LWP::Simple;
- use HTML::Parser;
- use HTML::FormatText;
- my ($html, $ascii);
- $html = get("http://www.perl.com/");
- defined $html
- or die "Can't fetch HTML from http://www.perl.com/";
- $ascii = HTML::FormatText->new->format(parse_html($html));
- print $ascii;
+ getstore( "http://www.example.com/index.html", "foo.html" );
+
+If you need to do something more complicated, you can use the
+C<LWP::UserAgent> module to create your own user agent (e.g. a browser)
+to get the job done. If you want to simulate an interactive web
+browser, you can use the C<WWW::Mechanize> module.
=head2 How do I automate an HTML form submission?
If you're submitting values using the GET method, create a URL and encode
the form using the C<query_form> method:
- use LWP::Simple;
- use URI::URL;
+ use LWP::Simple;
+ use URI::URL;
- my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
- $url->query_form(module => 'DB_File', readme => 1);
- $content = get($url);
+ my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
+ $url->query_form(module => 'DB_File', readme => 1);
+ $content = get($url);
If you're using the POST method, create your own user agent and encode
the content appropriately.
- use HTTP::Request::Common qw(POST);
- use LWP::UserAgent;
+ use HTTP::Request::Common qw(POST);
+ use LWP::UserAgent;
- $ua = LWP::UserAgent->new();
- my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
- [ module => 'DB_File', readme => 1 ];
- $content = $ua->request($req)->as_string;
+ $ua = LWP::UserAgent->new();
+ my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
+ [ module => 'DB_File', readme => 1 ];
+ $content = $ua->request($req)->as_string;
=head2 How do I decode or create those %-encodings on the web?
+X<URI> X<CGI.pm> X<CGI> X<URI::Escape> X<RFC 2396>
+
+(contributed by brian d foy)
+
+Those C<%> encodings handle reserved characters in URIs, as described
+in RFC 2396, Section 2. This encoding replaces the reserved character
+with the hexadecimal representation of the character's number from
+the US-ASCII table. For instance, a colon, C<:>, becomes C<%3A>.
+
+In CGI scripts, you don't have to worry about decoding URIs if you are
+using C<CGI.pm>. You shouldn't have to process the URI yourself,
+either on the way in or the way out.
-If you are writing a CGI script, you should be using the CGI.pm module
-that comes with perl, or some other equivalent module. The CGI module
-automatically decodes queries for you, and provides an escape()
-function to handle encoding.
+If you have to encode a string yourself, remember that you should
+never try to encode an already-composed URI. You need to escape the
+components separately, then put them together. To encode a string, you
+can use the C<URI::Escape> module. The C<uri_escape> function
+returns the escaped string:
-The best source of detailed information on URI encoding is RFC 2396.
-Basically, the following substitutions do it:
+ use URI::Escape qw(uri_escape uri_unescape);
+
+ my $original = "Colon : Hash # Percent %";
- s/([^\w()'*~!.-])/sprintf '%%%02x', ord $1/eg; # encode
+ my $escaped = uri_escape( $original );
- s/%([A-Fa-f\d]{2})/chr hex $1/eg; # decode
- s/%([[:xdigit:]]{2})/chr hex $1/eg; # same thing
+ print "$escaped\n"; # 'Colon%20%3A%20Hash%20%23%20Percent%20%25'
-However, you should only apply them to individual URI components, not
-the entire URI, otherwise you'll lose information and generally mess
-things up. If that didn't explain it, don't worry. Just go read
-section 2 of the RFC, it's probably the best explanation there is.
+To decode the string, use the C<uri_unescape> function:
-RFC 2396 also contains a lot of other useful information, including a
-regexp for breaking any arbitrary URI into components (Appendix B).
+ my $unescaped = uri_unescape( $escaped );
+
+ print $unescaped; # back to original
+
+If you wanted to do it yourself, you simply need to replace the
+reserved characters with their encodings. A global substitution
+is one way to do it:
+
+ # encode
+ $string =~ s/([^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%02X", ord $1 /eg;
+
+ # decode
+ $string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;
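A minimal sketch that puts those two substitutions into helper subroutines and follows the advice about escaping components separately: only the query value is escaped, then the pieces are joined. C<escape_component> and C<unescape_component> are hypothetical names, and the sketch assumes byte strings; a string holding wide characters should be encoded to UTF-8 first.

```perl
use strict;
use warnings;

# Percent-encode every byte outside RFC 2396's unreserved set
# (letters, digits, and - _ . ! ~ * ' ( )). Assumes a byte string.
sub escape_component {
    my ($string) = @_;
    $string =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf '%%%02X', ord $1/eg;
    return $string;
}

# Turn each %XX sequence back into the byte it stands for.
sub unescape_component {
    my ($string) = @_;
    $string =~ s/%([A-Fa-f0-9]{2})/chr hex $1/eg;
    return $string;
}

my $original = "Colon : Hash # Percent %";
my $escaped  = escape_component($original);
print "$escaped\n";    # Colon%20%3A%20Hash%20%23%20Percent%20%25
print unescape_component($escaped) eq $original ? "round trip ok\n" : "mismatch\n";

# Escape the component first, then assemble the URI around it.
my $url = 'http://www.example.com/search?q=' . escape_component('a&b=c');
print "$url\n";        # http://www.example.com/search?q=a%26b%3Dc
```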
=head2 How do I redirect to another page?
Use of CGI.pm is strongly recommended. This example shows redirection
with a complete URL. This redirection is handled by the web browser.
- use CGI qw/:standard/;
-
- my $url = 'http://www.cpan.org/';
- print redirect($url);
+ use CGI qw/:standard/;
+ my $url = 'http://www.cpan.org/';
+ print redirect($url);
This example shows a redirection with an absolute URL path. This
redirection is handled by the local web server.
- my $url = '/CPAN/index.html';
- print redirect($url);
-
+ my $url = '/CPAN/index.html';
+ print redirect($url);
If you print the headers yourself, it could look like this (the final
"\n" is shown separately for clarity), using either a complete URL
or an absolute URL path.
- print "Location: $url\n"; # CGI response header
- print "\n"; # end of headers
-
+ print "Location: $url\n"; # CGI response header
+ print "\n"; # end of headers
=head2 How do I put a password on my web pages?
a DBI compatible driver. HTTPD::UserAdmin supports files used by the
"Basic" and "Digest" authentication schemes. Here's an example:
- use HTTPD::UserAdmin ();
- HTTPD::UserAdmin
+ use HTTPD::UserAdmin ();
+ HTTPD::UserAdmin
->new(DB => "/foo/.htpasswd")
->add($username => $password);
For a quick-and-dirty solution, try this solution derived
from L<perlfunc/split>:
- $/ = '';
- $header = <MSG>;
- $header =~ s/\n\s+/ /g; # merge continuation lines
- %head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header );
+ $/ = '';
+ $header = <MSG>;
+ $header =~ s/\n\s+/ /g; # merge continuation lines
+ %head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header );
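Here is a minimal sketch of that approach as a subroutine that reads from any filehandle, tried out on an in-memory file. The name C<read_headers> is hypothetical. It stores the Unix From line under the string key C<UNIX_FROM_LINE> (the bareword in the example above) and trims trailing whitespace from each value. Like the example above, it keeps only the last of any repeated header.

```perl
use strict;
use warnings;

# Read the header block (everything up to the first blank line) from a
# filehandle and return it as name/value pairs. Repeated headers such
# as Received overwrite each other, as in the FAQ example.
sub read_headers {
    my ($fh) = @_;
    local $/ = '';                          # paragraph mode
    my $header = <$fh>;
    return unless defined $header;
    $header =~ s/\n\s+/ /g;                 # merge continuation lines
    my ( $from_line, %head ) = split /^([-\w]+):\s*/m, $header;
    s/\s+\z// for $from_line, values %head;    # drop trailing newlines
    return ( UNIX_FROM_LINE => $from_line, %head );
}

my $message = "From someone\@example.com Thu Jan  1 00:00:00 2009\n"
            . "Subject: A long subject\n"
            . "  that continues here\n"
            . "To: you\@example.com\n"
            . "\n"
            . "Body text.\n";

open my $fh, '<', \$message or die "Can't open in-memory file: $!";
my %head = read_headers($fh);
print "$head{Subject}\n";
```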
That solution doesn't do well if, for example, you're trying to
maintain all the Received lines. A more complete approach is to use
folding whitespace, or any other obsolete or non-essential elements.
This I<just> matches the address itself:
- my $atom = qr{[a-zA-Z0-9_!#\$\%&'*+/=?\^`{}~|\-]+};
- my $dot_atom = qr{$atom(?:\.$atom)*};
- my $quoted = qr{"(?:\\[^\r\n]|[^\\"])*"};
- my $local = qr{(?:$dot_atom|$quoted)};
- my $quotedpair = qr{\\[\x00-\x09\x0B-\x0c\x0e-\x7e]};
- my $domain_lit = qr{\[(?:$quotedpair|[\x21-\x5a\x5e-\x7e])*\]};
- my $domain = qr{(?:$dot_atom|$domain_lit)};
- my $addr_spec = qr{$local\@$domain};
+ my $atom = qr{[a-zA-Z0-9_!#\$\%&'*+/=?\^`{}~|\-]+};
+ my $dot_atom = qr{$atom(?:\.$atom)*};
+ my $quoted = qr{"(?:\\[^\r\n]|[^\\"])*"};
+ my $local = qr{(?:$dot_atom|$quoted)};
+ my $quotedpair = qr{\\[\x00-\x09\x0B-\x0c\x0e-\x7e]};
+ my $domain_lit = qr{\[(?:$quotedpair|[\x21-\x5a\x5e-\x7e])*\]};
+ my $domain = qr{(?:$dot_atom|$domain_lit)};
+ my $addr_spec = qr{$local\@$domain};
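A minimal sketch that wraps these patterns in a test subroutine. C<looks_like_address> is a hypothetical name. It anchors with C<\A> and C<\z> rather than C<^> and C<$>, because C<$> would also accept an address followed by a trailing newline.

```perl
use strict;
use warnings;

# The addr-spec pieces from the answer, unchanged.
my $atom       = qr{[a-zA-Z0-9_!#\$\%&'*+/=?\^`{}~|\-]+};
my $dot_atom   = qr{$atom(?:\.$atom)*};
my $quoted     = qr{"(?:\\[^\r\n]|[^\\"])*"};
my $local      = qr{(?:$dot_atom|$quoted)};
my $quotedpair = qr{\\[\x00-\x09\x0B-\x0c\x0e-\x7e]};
my $domain_lit = qr{\[(?:$quotedpair|[\x21-\x5a\x5e-\x7e])*\]};
my $domain     = qr{(?:$dot_atom|$domain_lit)};
my $addr_spec  = qr{$local\@$domain};

# True if the whole string is an addr-spec. \A and \z are used instead
# of ^ and $ so that a trailing newline is not accepted.
sub looks_like_address {
    my ($candidate) = @_;
    return $candidate =~ /\A$addr_spec\z/ ? 1 : 0;
}

for my $candidate ( 'user@example.com', '"odd user"@example.com',
                    'user@[192.0.2.1]', 'no at sign', 'user@' ) {
    printf "%-25s %s\n", $candidate,
        looks_like_address($candidate) ? 'valid' : 'rejected';
}
```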
Just match an address against C</^${addr_spec}$/> to see if it follows
the RFC2822 specification. However, because it is impossible to be
The MIME-Base64 package (available from CPAN, and included with Perl
since 5.8) handles this as well as the MIME/QP encoding. Decoding
Base64 becomes as simple as:
- use MIME::Base64;
- $decoded = decode_base64($encoded);
+ use MIME::Base64;
+ $decoded = decode_base64($encoded);
The MIME-Tools package (available from CPAN) supports extraction with
decoding of BASE64 encoded attachments and content directly from email
a more direct approach is to use the unpack() function's "u"
format after minor transliterations:
- tr#A-Za-z0-9+/##cd; # remove non-base64 chars
- tr#A-Za-z0-9+/# -_#; # convert to uuencoded format
- $len = pack("c", 32 + 0.75*length); # compute length byte
- print unpack("u", $len . $_); # uudecode and print
+ tr#A-Za-z0-9+/##cd; # remove non-base64 chars
+ tr#A-Za-z0-9+/# -_#; # convert to uuencoded format
+ $len = pack("c", 32 + 0.75*length); # compute length byte
+ print unpack("u", $len . $_); # uudecode and print
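The unpack trick only handles one short line at a time: the uuencode length byte can describe at most 63 bytes, so a longer line does not decode correctly. A minimal sketch, assuming MIME-style input with 76-character lines (57 bytes each), that applies the trick line by line and checks the result against C<decode_base64> from MIME::Base64. The subroutine names are hypothetical.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Decode one line of Base64 with the unpack "u" trick. The uuencode
# length byte can describe at most 63 bytes, so this only works for
# short lines such as MIME's 76-character lines (57 bytes each).
sub uu_decode_base64_line {
    my ($line) = @_;
    $line =~ tr#A-Za-z0-9+/##cd;       # remove non-base64 chars, including '='
    return '' unless length $line;
    $line =~ tr#A-Za-z0-9+/# -_#;      # convert to uuencoded format
    my $len = pack 'c', 32 + int( 0.75 * length($line) );   # compute length byte
    return unpack 'u', $len . $line;   # uudecode
}

# Decode a whole MIME-style Base64 body one line at a time.
sub uu_decode_base64 {
    my ($encoded) = @_;
    return join '', map { uu_decode_base64_line($_) } split /\n/, $encoded;
}

my $text    = "The quick brown fox jumps over the lazy dog. " x 5;
my $encoded = encode_base64($text);    # 76-character lines
print uu_decode_base64($encoded) eq decode_base64($encoded)
    ? "same result as decode_base64\n"
    : "results differ\n";
```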
=head2 How do I return the user's mail address?
Sys::Hostname module (which is part of the standard perl distribution),
you can probably try using something like this:
- use Sys::Hostname;
- $address = sprintf('%s@%s', scalar getpwuid($<), hostname);
+ use Sys::Hostname;
+ $address = sprintf('%s@%s', scalar getpwuid($<), hostname);
Company policies on mail address can mean that this generates addresses
that the company's mail system will not accept, so you should ask for
Use the C<sendmail> program directly:
- open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")
- or die "Can't fork for sendmail: $!\n";
- print SENDMAIL <<"EOF";
- From: User Originating Mail <me\@host>
- To: Final Destination <you\@otherhost>
- Subject: A relevant subject line
+ open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")
+ or die "Can't fork for sendmail: $!\n";
+ print SENDMAIL <<"EOF";
+ From: User Originating Mail <me\@host>
+ To: Final Destination <you\@otherhost>
+ Subject: A relevant subject line
- Body of the message goes here after the blank line
- in as many lines as you like.
- EOF
- close(SENDMAIL) or warn "sendmail didn't close nicely";
+ Body of the message goes here after the blank line
+ in as many lines as you like.
+ EOF
+ close(SENDMAIL) or warn "sendmail didn't close nicely";
The B<-oi> option prevents sendmail from interpreting a line consisting
of a single dot as "end of message". The B<-t> option says to use the
Or you might be able to use the CPAN module Mail::Mailer:
- use Mail::Mailer;
+ use Mail::Mailer;
- $mailer = Mail::Mailer->new();
- $mailer->open({ From => $from_address,
- To => $to_address,
- Subject => $subject,
- })
- or die "Can't open: $!\n";
- print $mailer $body;
- $mailer->close();
+ $mailer = Mail::Mailer->new();
+ $mailer->open({ From => $from_address,
+ To => $to_address,
+ Subject => $subject,
+ })
+ or die "Can't open: $!\n";
+ print $mailer $body;
+ $mailer->close();
The Mail::Internet module uses Net::SMTP which is less Unix-centric than
Mail::Mailer, but less reliable. Avoid raw SMTP commands. There
This answer is extracted directly from the MIME::Lite documentation.
Create a multipart message (i.e., one with attachments).
- use MIME::Lite;
+ use MIME::Lite;
- ### Create a new multipart message:
- $msg = MIME::Lite->new(
- From =>'me@myhost.com',
- To =>'you@yourhost.com',
- Cc =>'some@other.com, some@more.com',
- Subject =>'A message with 2 parts...',
- Type =>'multipart/mixed'
- );
+ ### Create a new multipart message:
+ $msg = MIME::Lite->new(
+ From =>'me@myhost.com',
+ To =>'you@yourhost.com',
+ Cc =>'some@other.com, some@more.com',
+ Subject =>'A message with 2 parts...',
+ Type =>'multipart/mixed'
+ );
- ### Add parts (each "attach" has same arguments as "new"):
- $msg->attach(Type =>'TEXT',
- Data =>"Here's the GIF file you wanted"
- );
- $msg->attach(Type =>'image/gif',
- Path =>'aaa000123.gif',
- Filename =>'logo.gif'
- );
+ ### Add parts (each "attach" has same arguments as "new"):
+ $msg->attach(Type =>'TEXT',
+ Data =>"Here's the GIF file you wanted"
+ );
+ $msg->attach(Type =>'image/gif',
+ Path =>'aaa000123.gif',
+ Filename =>'logo.gif'
+ );
- $text = $msg->as_string;
+ $text = $msg->as_string;
MIME::Lite also includes a method for sending these things.
- $msg->send;
+ $msg->send;
This defaults to using L<sendmail> but can be customized to use
SMTP via L<Net::SMTP>.
of the MailTools package), often a module is overkill. Here's a
mail sorter.
- #!/usr/bin/perl
-
- my(@msgs, @sub);
- my $msgno = -1;
- $/ = ''; # paragraph reads
- while (<>) {
- if (/^From /m) {
- /^Subject:\s*(?:Re:\s*)*(.*)/mi;
- $sub[++$msgno] = lc($1) || '';
- }
- $msgs[$msgno] .= $_;
- }
- for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {
- print $msgs[$i];
- }
+ #!/usr/bin/perl
+
+ my(@msgs, @sub);
+ my $msgno = -1;
+ $/ = ''; # paragraph reads
+ while (<>) {
+ if (/^From /m) {
+ /^Subject:\s*(?:Re:\s*)*(.*)/mi;
+ $sub[++$msgno] = lc($1) || '';
+ }
+ $msgs[$msgno] .= $_;
+ }
+ for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {
+ print $msgs[$i];
+ }
Or more succinctly,
- #!/usr/bin/perl -n00
- # bysub2 - awkish sort-by-subject
- BEGIN { $msgno = -1 }
- $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;
- $msg[$msgno] .= $_;
- END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }
+ #!/usr/bin/perl -n00
+ # bysub2 - awkish sort-by-subject
+ BEGIN { $msgno = -1 }
+ $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;
+ $msg[$msgno] .= $_;
+ END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }
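If you want to sort a mailbox from inside a larger program, a minimal sketch is to wrap the first version in a subroutine that takes a filehandle. The name C<sort_by_subject> is hypothetical. It treats a message without a Subject line as having an empty subject, and it skips any text before the first C<From > line instead of storing it at index -1, which would trigger an error on an empty array.

```perl
use strict;
use warnings;

# Read an mbox-style mailbox in paragraph mode and return its messages
# sorted by subject, ignoring leading "Re:" and case. Messages with the
# same subject keep their original order. Like the scripts above, it
# assumes a paragraph with a line starting "From " begins a new message.
sub sort_by_subject {
    my ($fh) = @_;
    local $/ = '';                           # paragraph reads
    my ( @msgs, @sub );
    my $msgno = -1;
    while ( my $para = <$fh> ) {
        if ( $para =~ /^From /m ) {
            my ($subject) = $para =~ /^Subject:\s*(?:Re:\s*)*(.*)/mi;
            $sub[ ++$msgno ] = lc( defined $subject ? $subject : '' );
        }
        next if $msgno < 0;                  # text before the first message
        $msgs[$msgno] .= $para;
    }
    return @msgs[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } 0 .. $#msgs ];
}

my $mbox = "From a\@example.com Thu Jan  1 00:00:00 2009\n"
         . "Subject: Zebra\n\n"
         . "First body.\n\n"
         . "From b\@example.com Thu Jan  1 00:00:00 2009\n"
         . "Subject: Re: apple\n\n"
         . "Second body.\n";

open my $fh, '<', \$mbox or die "Can't open in-memory mailbox: $!";
my @sorted = sort_by_subject($fh);
print @sorted;
```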
=head2 How do I find out my hostname, domainname, or IP address?
X<hostname, domainname, IP address, host, domain, hostfqdn, inet_ntoa,
form (a.b.c.d) that most people expect, use the C<inet_ntoa> function
from the C<Socket> module, which also comes with perl.
- use Socket;
+ use Socket;
- my $address = inet_ntoa(
- scalar gethostbyname( $host || 'localhost' )
- );
+ my $address = inet_ntoa(
+ scalar gethostbyname( $host || 'localhost' )
+ );
=head2 How do I fetch a news article or the active newsgroups?
Use the Net::NNTP or News::NNTPClient modules, both available from CPAN.
This can make tasks like fetching the newsgroup list as simple as
- perl -MNews::NNTPClient
- -e 'print News::NNTPClient->new->list("newsgroups")'
+ perl -MNews::NNTPClient \
+ -e 'print News::NNTPClient->new->list("newsgroups")'
=head2 How do I fetch/put an FTP file?
=head1 REVISION
-Revision: $Revision: 8539 $
+Revision: $Revision$
-Date: $Date: 2007-01-11 00:07:14 +0100 (Thu, 11 Jan 2007) $
+Date: $Date$
See L<perlfaq> for source control details and availability.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and
+Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it