From: Steffen Mueller Date: Thu, 13 Aug 2009 12:32:00 +0000 (+0200) Subject: Merge the updated perlfaq from the perlfaq repository X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=109f04419ad154407413aa733c313fd77c1e12ca;p=p5sagit%2Fp5-mst-13.2.git Merge the updated perlfaq from the perlfaq repository --- diff --git a/pod/perlfaq.pod b/pod/perlfaq.pod index f5e7a0d..5862823 100644 --- a/pod/perlfaq.pod +++ b/pod/perlfaq.pod @@ -24,8 +24,8 @@ at http://faq.perl.org/ . The perlfaq-workers periodically post extracts of the latest perlfaq to comp.lang.perl.misc. You can view the source tree at -https://svn.perl.org/modules/perlfaq/trunk/ (which is outside of the -main Perl source tree). The SVN repository notes all changes to the FAQ +https://github.com/briandfoy/perlfaq (which is outside of the +main Perl source tree). The git repository notes all changes to the FAQ and holds the latest version of the working documents and may vary significantly from the version distributed with the latest version of Perl. Check the repository before sending your corrections. @@ -43,6 +43,12 @@ The perlfaq server posts extracts of the perlfaq to that newsgroup every answers. If you'd like to help review and update the answers, check out comp.lang.perl.misc. +You can also fork the git repository for the perlfaq and send a pull +request so the main repository can pull your changes. The repository +is at: + + https://github.com/briandfoy/perlfaq + =head2 What will happen if you mail your Perl programming problems to the authors? The perlfaq-workers like to keep all traffic on the perlfaq-workers list @@ -67,13 +73,16 @@ and the perlfaq notes those contributions wherever appropriate. =head1 AUTHOR AND COPYRIGHT -Tom Christiansen wrote the original version of this document. +Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and +other authors as noted. All rights reserved. + +Tom Christiansen wrote the original version of this document. 
brian d foy C<< >> wrote this version. See the individual perlfaq documents for additional copyright information. This document is available under the same terms as Perl itself. Code examples in all the perlfaq documents are in the public domain. Use -them as you see fit (and at your own risk with no warranty from anyone). +them as you see fit and at your own risk with no warranty from anyone. =head1 Table of Contents @@ -610,6 +619,10 @@ How do I process an entire hash? =item * +How do I merge two hashes? + +=item * + What happens if I add or remove keys from a hash while iterating over it? =item * @@ -857,6 +870,18 @@ How do I select a random line from a file? Why do I get weird spaces when I print an array of lines? +=item * + +How do I traverse a directory tree? + +=item * + +How do I delete a directory tree? + +=item * + +How do I copy an entire directory? + =back @@ -880,6 +905,10 @@ How can I pull out lines between two patterns that are themselves on different l =item * +How do I match XML, HTML, or other nasty, ugly things with a regex? + +=item * + I put a regular expression into $/ but it didn't work. What's wrong? =item * @@ -1069,7 +1098,7 @@ Why can't a method included in this same file be found? =item * -How can I find out my current package? +How can I find out my current or calling package? =item * diff --git a/pod/perlfaq1.pod b/pod/perlfaq1.pod index ba3d64c..ec016f6 100644 --- a/pod/perlfaq1.pod +++ b/pod/perlfaq1.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq1 - General Questions About Perl ($Revision: 10427 $) +perlfaq1 - General Questions About Perl =head1 DESCRIPTION @@ -59,7 +59,7 @@ users the informal support will more than suffice. See the answer to (contributed by brian d foy) There is often a matter of opinion and taste, and there isn't any one -answer that fits anyone. In general, you want to use either the current +answer that fits everyone. 
In general, you want to use either the current stable release, or the stable release immediately prior to that one. Currently, those are perl5.10.x and perl5.8.x, respectively. @@ -107,8 +107,8 @@ as its whitewashed bones have fractured or eroded. =item * -There is no Perl 6 release scheduled, but it will be available when -it's ready. Stay tuned, but don't worry that you'll have to change +There is no Perl 6 release scheduled, but it will be available when +it's ready. Stay tuned, but don't worry that you'll have to change major versions of Perl; no one is going to take Perl 5 away from you. =item * @@ -285,7 +285,7 @@ will sleep easier, too--Wall Street programs not withstanding. :-) One bit. Oh, you weren't talking ASCII? :-) Larry now uses "Perl" to signify the language proper and "perl" the implementation of it, i.e. the current interpreter. Hence Tom's quip that "Nothing but perl can -parse Perl." +parse Perl." Before the first edition of I, people commonly referred to the language as "perl", and its name appeared that way in @@ -335,10 +335,7 @@ to sign email and usenet messages starting in the late 1980s. He previously used the phrase with many subjects ("Just another x hacker,"), so to distinguish his JAPH, he started to write them as Perl programs: - print "Just another Perl hacker, "; - -Note the trailing comma and space, which allows the addition of other -JAxH clauses for his many other interests. + print "Just another Perl hacker,"; Other people picked up on this and started to write clever or obfuscated programs to produce the same output, spinning things quickly out of @@ -400,15 +397,15 @@ You might find these links useful: =head1 REVISION -Revision: $Revision: 10427 $ +Revision: $Revision$ -Date: $Date: 2007-12-14 00:39:01 +0100 (Fri, 14 Dec 2007) $ +Date: $Date$ See L for source control details and availability. 
=head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and +Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved. This documentation is free; you can redistribute it and/or modify it diff --git a/pod/perlfaq2.pod b/pod/perlfaq2.pod index 6b4d3da..f578d41 100644 --- a/pod/perlfaq2.pod +++ b/pod/perlfaq2.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq2 - Obtaining and Learning about Perl ($Revision: 10144 $) +perlfaq2 - Obtaining and Learning about Perl =head1 DESCRIPTION @@ -12,49 +12,42 @@ related matters. The standard release of perl (the one maintained by the perl development team) is distributed only in source code form. You -can find this at http://www.cpan.org/src/latest.tar.gz , which -is in a standard Internet format (a gzipped archive in POSIX tar format). +can find the latest releases at http://www.cpan.org/src/README.html . Perl builds and runs on a bewildering number of platforms. Virtually all known and current Unix derivatives are supported (perl's native platform), as are other systems like VMS, DOS, OS/2, Windows, QNX, BeOS, OS X, MPE/iX and the Amiga. -Binary distributions for some proprietary platforms, including -Apple systems, can be found http://www.cpan.org/ports/ directory. -Because these are not part of the standard distribution, they may -and in fact do differ from the base perl port in a variety of ways. -You'll have to check their respective release notes to see just -what the differences are. These differences can be either positive -(e.g. extensions for the features of the particular platform that -are not supported in the source release of perl) or negative (e.g. -might be based upon a less current source release of perl). +Binary distributions for some proprietary platforms can be found +http://www.cpan.org/ports/ directory. 
Because these are not part of +the standard distribution, they may and in fact do differ from the +base perl port in a variety of ways. You'll have to check their +respective release notes to see just what the differences are. These +differences can be either positive (e.g. extensions for the features +of the particular platform that are not supported in the source +release of perl) or negative (e.g. might be based upon a less current +source release of perl). =head2 How can I get a binary version of perl? -For Windows, ActiveState provides a pre-built Perl for free: +(contributed by brian d foy) + +ActiveState: Windows, Linux, Mac OS X, Solaris, AIX and HP-UX http://www.activestate.com/ -Sunfreeware.com provides binaries for many utilities, including -Perl, for Solaris on both Intel and SPARC hardware: +Sunfreeware.com: Solaris 2.5 to Solaris 10 (SPARC and x86) http://www.sunfreeware.com/ -If you don't have a C compiler because your vendor for whatever -reasons did not include one with your system, the best thing to do is -grab a binary version of gcc from the net and use that to compile perl -with. CPAN only has binaries for systems that are terribly hard to -get free compilers for, not for Unix systems. - -Some URLs that might help you are: +Strawberry Perl: Windows, Perl 5.8.8 and 5.10.0 - http://www.cpan.org/ports/ - http://www.perl.com/pub/language/info/software.html + http://www.strawberryperl.com + +IndigoPerl: Windows -Someone looking for a perl for Win16 might look to Laszlo Molnar's -djgpp port in http://www.cpan.org/ports/#msdos , which comes with -clear installation instructions. + http://indigostar.com/ =head2 I don't have a C compiler. How can I build my own Perl interpreter? @@ -65,15 +58,15 @@ What you need to do is get a binary version of gcc for your system first. Consult the Usenet FAQs for your operating system for information on where to get such a binary version. 
-You might look around the net for a pre-built binary of Perl (or a +You might look around the net for a pre-built binary of Perl (or a C compiler!) that meets your needs, though: For Windows, Vanilla Perl ( http://vanillaperl.com/ ) and Strawberry Perl -( http://strawberryperl.com/ ) come with a +( http://strawberryperl.com/ ) come with a bundled C compiler. ActivePerl is a pre-compiled version of Perl ready-to-use. -For Sun systems, SunFreeware.com provides binaries of most popular +For Sun systems, SunFreeware.com provides binaries of most popular applications, including compilers and Perl. =head2 I copied the perl binary from one machine to another, but scripts don't work. @@ -187,13 +180,11 @@ by the time you read this. These URLs might also be useful: Several groups devoted to the Perl language are on Usenet: - comp.lang.perl.announce Moderated announcement group - comp.lang.perl.misc High traffic general Perl discussion - comp.lang.perl.moderated Moderated discussion group - comp.lang.perl.modules Use and development of Perl modules - comp.lang.perl.tk Using Tk (and X) from Perl - - comp.infosystems.www.authoring.cgi Writing CGI scripts for the Web. + comp.lang.perl.announce Moderated announcement group + comp.lang.perl.misc High traffic general Perl discussion + comp.lang.perl.moderated Moderated discussion group + comp.lang.perl.modules Use and development of Perl modules + comp.lang.perl.tk Using Tk (and X) from Perl Some years ago, comp.lang.perl was divided into those groups, and comp.lang.perl itself officially removed. While that group may still @@ -298,8 +289,8 @@ Recommended books on (or mostly on) Perl follow. Perl 5 Pocket Reference by Johan Vromans - ISBN 0-596-00032-4 [3rd edition May 2000] - http://www.oreilly.com/catalog/perlpr3/ + ISBN 0-596-00374-9 [4th edition July 2002] + http://www.oreilly.com/catalog/perlpr4/ =item Tutorials @@ -315,8 +306,8 @@ Recommended books on (or mostly on) Perl follow. Learning Perl by Randal L. 
Schwartz, Tom Phoenix, and brian d foy - ISBN 0-596-10105-8 [4th edition July 2005] - http://www.oreilly.com/catalog/learnperl4/ + ISBN 0-596-52010-7 [5th edition June 2008] + http://oreilly.com/catalog/9780596520106/ Intermediate Perl (the "Alpaca Book") by Randal L. Schwartz and brian d foy, with Tom Phoenix (foreword by Damian Conway) @@ -455,7 +446,7 @@ A comprehensive list of Perl related mailing lists can be found at: The Google search engine now carries archived and searchable newsgroup content. -http://groups.google.com/groups?group=comp.lang.perl.misc +http://groups.google.com/group/comp.lang.perl.misc/topics If you have a question, you can be sure someone has already asked the same question at some point on c.l.p.m. It requires some time and patience @@ -485,20 +476,31 @@ also all come with perl. =head2 Where do I send bug reports? -If you are reporting a bug in the perl interpreter or the modules -shipped with Perl, use the I<perlbug> program in the Perl distribution or -mail your report to perlbug@perl.org or at http://rt.perl.org/perlbug/ . +(contributed by brian d foy) + +First, ensure that you've found an actual bug. Second, ensure you've +found an actual bug. + +If you've found a bug with the perl interpreter or one of the modules +in the standard library (those that come with Perl), you can use the +C<perlbug> utility that comes with Perl (>= 5.004). It collects +information about your installation to include with your message, then +sends the message to the right place. -For Perl modules, you can submit bug reports to the Request Tracker set -up at http://rt.cpan.org . +To determine if a module came with your version of Perl, you can +use the C<Module::CoreList> module. It has the information about +the modules (with their versions) included with each release of Perl. 
-If you are posting a bug with a non-standard port (see the answer to -"What platforms is perl available for?"), a binary distribution, or a -non-standard module (such as Tk, CGI, etc), then please see the -documentation that came with it to determine the correct place to post -bugs. +Every CPAN module has a bug tracker set up in RT, http://rt.cpan.org . +You can submit bugs to RT either through its web interface or by +email. To email a bug report, send it to +bug-E<lt>distribution-nameE<gt>@rt.cpan.org . For example, if you +wanted to report a bug in C<Business::ISBN>, you could send a message to +bug-Business-ISBN@rt.cpan.org . -Read the perlbug(1) man page (perl5.004 or later) for more information. +Some modules might have special reporting requirements, such as a +Sourceforge or Google Code tracking system, so you should check the +module documentation too. =head2 What is perl.com? Perl Mongers? pm.org? perl.org? cpan.org? @@ -529,15 +531,15 @@ the I question earlier in this document. =head1 REVISION -Revision: $Revision: 10144 $ +Revision: $Revision$ -Date: $Date: 2007-10-31 13:50:01 +0100 (Wed, 31 Oct 2007) $ +Date: $Date$ See L for source control details and availability. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and +Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved. This documentation is free; you can redistribute it and/or modify it diff --git a/pod/perlfaq3.pod b/pod/perlfaq3.pod index 7b58df8..2fa78fc 100644 --- a/pod/perlfaq3.pod +++ b/pod/perlfaq3.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq3 - Programming Tools ($Revision: 10127 $) +perlfaq3 - Programming Tools =head1 DESCRIPTION @@ -51,7 +51,8 @@ http://sourceforge.net/projects/psh/ . Zoidberg is a similar project and provides a shell written in perl, configured in perl and operated in perl. It is intended as a login shell -and development environment. 
It can be found at http://zoidberg.sf.net/ +and development environment. It can be found at +http://pardus-larus.student.utwente.nl/~pardus/projects/zoidberg/ or your local CPAN mirror. The Shell.pm module (distributed with Perl) makes Perl try commands @@ -61,10 +62,19 @@ be what you want. =head2 How do I find which modules are installed on my system? -You can use the ExtUtils::Installed module to show all installed -distributions, although it can take awhile to do its magic. The -standard library which comes with Perl just shows up as "Perl" (although -you can get those with Module::CoreList). +From the command line, you can use the C<cpan> command's C<-l> switch: + + $ cpan -l + +You can also use C<cpan>'s C<-a> switch to create an autobundle file +that C<CPAN.pm> understands and can use to re-install every module: + + $ cpan -a + +Inside a Perl program, you can use the ExtUtils::Installed module to +show all installed distributions, although it can take awhile to do +its magic. The standard library which comes with Perl just shows up +as "Perl" (although you can get those with Module::CoreList). use ExtUtils::Installed; @@ -76,22 +86,30 @@ can use File::Find::Rule. use File::Find::Rule; - my @files = File::Find::Rule->file()->name( '*.pm' )->in( @INC ); + my @files = File::Find::Rule-> + extras({follow => 1})-> + file()-> + name( '*.pm' )-> + in( @INC ) + ; If you do not have that module, you can do the same thing with File::Find which is part of the standard library. - use File::Find; - my @files; - - find( - sub { - push @files, $File::Find::name - if -f $File::Find::name && /\.pm$/ - }, - - @INC - ); + use File::Find; + my @files; + + find( + { + wanted => sub { + push @files, $File::Find::fullname + if -f $File::Find::fullname && /\.pm$/ + }, + follow => 1, + follow_skip => 2, + }, + @INC + ); print join "\n", @files; @@ -101,12 +119,12 @@ read the documentation the module is most likely installed. 
If you cannot read the documentation, the module might not have any (in rare cases). - prompt% perldoc Module::Name + $ perldoc Module::Name You can also try to include the module in a one-liner to see if perl finds it. - perl -MModule::Name -e1 + $ perl -MModule::Name -e1 =head2 How do I debug my Perl programs? @@ -148,38 +166,61 @@ from Activestate (Windows and Mac OS X), or EPIC (most platforms). =head2 How do I profile my Perl programs? -You should get the Devel::DProf module from the standard distribution -(or separately on CPAN) and also use Benchmark.pm from the standard -distribution. The Benchmark module lets you time specific portions of -your code, while Devel::DProf gives detailed breakdowns of where your -code spends its time. +(contributed by brian d foy, updated Fri Jul 25 12:22:26 PDT 2008) + +The C namespace has several modules which you can use to +profile your Perl programs. The C module comes with Perl +and you can invoke it with the C<-d> switch: + + perl -d:DProf program.pl + +After running your program under C, you'll get a F file +with the profile data. To look at the data, you can turn it into a +human-readable report with the C program that comes with +C. + + dprofpp -Here's a sample use of Benchmark: +You can also do the profiling and reporting in one step with the C<-p> +switch to : - use Benchmark; + dprofpp -p program.pl - @junk = `cat /etc/motd`; - $count = 10_000; +The C (New York Times Profiler) does both statement +and subroutine profiling. It's available from CPAN and you also invoke +it with the C<-d> switch: - timethese($count, { - 'map' => sub { my @a = @junk; - map { s/a/b/ } @a; - return @a }, - 'for' => sub { my @a = @junk; - for (@a) { s/a/b/ }; - return @a }, - }); + perl -d:NYTProf some_perl.pl -This is what it prints (on one machine--your results will be dependent -on your hardware, operating system, and the load on your machine): +Like C, it creates a database of the profile information that you +can turn into reports. 
The C<nytprofhtml> command turns the data into +an HTML report similar to the C report: - Benchmark: timing 10000 iterations of for, map... - for: 4 secs ( 3.97 usr 0.01 sys = 3.98 cpu) - map: 6 secs ( 4.97 usr 0.00 sys = 4.97 cpu) + nytprofhtml -Be aware that a good benchmark is very hard to write. It only tests the -data you give it and proves little about the differing complexities -of contrasting algorithms. +CPAN has several other profilers that you can invoke in the same +fashion. You might also be interested in using the C to +measure and compare code snippets. + +You can read more about profiling in I, chapter 20, +or I, chapter 5. + +L documents creating a custom debugger if you need to +create a special sort of profiler. brian d foy describes the process +in I, "Creating a Perl Debugger", +http://www.ddj.com/184404522 , and "Profiling in Perl" +http://www.ddj.com/184404580 . + +Perl.com has two interesting articles on profiling: "Profiling Perl", +by Simon Cozens, http://www.perl.com/lpt/a/850 and "Debugging and +Profiling mod_perl Applications", by Frank Wiles, +http://www.perl.com/pub/a/2006/02/09/debug_mod_perl.html . + +Randal L. Schwartz writes about profiling in "Speeding up Your Perl +Programs" for I, +http://www.stonehenge.com/merlyn/UnixReview/col49.html , and "Profiling +in Template Toolkit via Overriding" for I, +http://www.stonehenge.com/merlyn/LinuxMag/col75.html . =head2 How do I cross-reference my Perl programs? @@ -281,11 +322,19 @@ http://www.optiperl.com/ OptiPerl is a Windows IDE with simulated CGI environment, including debugger and syntax highlighting editor. +=item Padre + +http://padre.perlide.org/ + +Padre is a cross-platform IDE for Perl written in Perl using wxWidgets +to provide a native look and feel. It's open source under the Artistic +License. 
+ =item PerlBuilder http://www.solutionsoft.com/perl.htm -PerlBuidler is an integrated development environment for Windows that +PerlBuilder is an integrated development environment for Windows that supports Perl development. =item visiPerl+ @@ -379,7 +428,7 @@ incarnation of it, and secondly because you can embed Perl inside it to use Perl as the scripting language. nvi is not alone in this, though: at least also vim and vile offer an embedded Perl. -The following are Win32 multilanguage editor/IDESs that support Perl: +The following are Win32 multilanguage editor/IDEs that support Perl: =over 4 @@ -395,6 +444,10 @@ http://www.MultiEdit.com/ http://www.slickedit.com/ +=item ConTEXT + +http://www.contexteditor.org/ + =back There is also a toyedit Text widget based editor written in Perl @@ -415,7 +468,7 @@ from the Cygwin package ( http://sources.redhat.com/cygwin/ ) =item Ksh -from the MKS Toolkit ( http://www.mks.com/ ), or the Bourne shell of +from the MKS Toolkit ( http://www.mkssoftware.com/ ), or the Bourne shell of the U/WIN environment ( http://www.research.att.com/sw/tools/uwin/ ) =item Tcsh @@ -430,10 +483,10 @@ http://www.zsh.org/ =back MKS and U/WIN are commercial (U/WIN is free for educational and -research purposes), Cygwin is covered by the GNU Public License (but -that shouldn't matter for Perl use). The Cygwin, MKS, and U/WIN all -contain (in addition to the shells) a comprehensive set of standard -UNIX toolkit utilities. +research purposes), Cygwin is covered by the GNU General Public +License (but that shouldn't matter for Perl use). The Cygwin, MKS, +and U/WIN all contain (in addition to the shells) a comprehensive set +of standard UNIX toolkit utilities. 
If you're transferring text files between Unix and Windows using FTP be sure to transfer them in ASCII mode so the ends of lines are @@ -465,9 +518,6 @@ are text editors for Mac OS that have a Perl sensitivity mode =back -Pepper and Pe are programming language sensitive text editors for Mac -OS X and BeOS respectively ( http://www.hekkelman.com/ ). - =head2 Where can I get Perl macros for vi? For a complete version of Tom Christiansen's vi configuration file, @@ -519,8 +569,8 @@ simple gui. It hasn't been updated in a while. =item Wx -This is a Perl binding for the cross-platform wxWidgets toolkit -L. It works under Unix, Win32 and Mac OS X, +This is a Perl binding for the cross-platform wxWidgets toolkit +( http://www.wxwidgets.org ). It works under Unix, Win32 and Mac OS X, using native widgets (Gtk under Unix). The interface follows the C++ interface closely, but the documentation is a little sparse for someone who doesn't know the library, mostly just referring you to the C++ @@ -528,7 +578,7 @@ documentation. =item Gtk and Gtk2 -These are Perl bindings for the Gtk toolkit L. The +These are Perl bindings for the Gtk toolkit ( http://www.gtk.org ). The interface changed significantly between versions 1 and 2 so they have separate Perl modules. It runs under Unix, Win32 and Mac OS X (currently it requires an X server on Mac OS, but a 'native' port is underway), and @@ -547,7 +597,7 @@ require familiarity with the C Win32 APIs, or reference to MSDN. =item CamelBones -CamelBones L is a Perl interface to +CamelBones ( http://camelbones.sourceforge.net ) is a Perl interface to Mac OS X's Cocoa GUI toolkit, and as such can be used to produce native GUIs on Mac OS X. It's not on CPAN, as it requires frameworks that CPAN.pm doesn't know how to install, but installation is via the @@ -745,7 +795,7 @@ You usually can't. Memory allocated to lexicals (i.e. my() variables) cannot be reclaimed or reused even if they go out of scope. 
It is reserved in case the variables come back into scope. Memory allocated to global variables can be reused (within your program) by using -undef()ing and/or delete(). +undef() and/or delete(). On most operating systems, memory allocated to a program can never be returned to the system. That's why long-running programs sometimes re- @@ -1033,15 +1083,15 @@ to process and install a Perl distribution. =head1 REVISION -Revision: $Revision: 10127 $ +Revision: $Revision$ -Date: $Date: 2007-10-27 21:40:20 +0200 (Sat, 27 Oct 2007) $ +Date: $Date$ See L for source control details and availability. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and +Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved. This documentation is free; you can redistribute it and/or modify it diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod index 326ec91..1c06b0b 100644 --- a/pod/perlfaq4.pod +++ b/pod/perlfaq4.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq4 - Data Manipulation ($Revision: 10394 $) +perlfaq4 - Data Manipulation =head1 DESCRIPTION @@ -48,35 +48,45 @@ numbers. What you think in the above as 'three' is really more like =head2 Why isn't my octal data interpreted correctly? -Perl only understands octal and hex numbers as such when they occur as -literals in your program. Octal literals in perl must start with a -leading C<0> and hexadecimal literals must start with a leading C<0x>. -If they are read in from somewhere and assigned, no automatic -conversion takes place. You must explicitly use C or C if you -want the values converted to decimal. C interprets hexadecimal (C<0x350>), -octal (C<0350> or even without the leading C<0>, like C<377>) and binary -(C<0b1010>) numbers, while C only converts hexadecimal ones, with -or without a leading C<0x>, such as C<0x255>, C<3A>, C, or C. -The inverse mapping from decimal to octal can be done with either the -<%o> or C<%O> C formats. 
+(contributed by brian d foy) + +You're probably trying to convert a string to a number, which Perl only +converts as a decimal number. When Perl converts a string to a number, it +ignores leading spaces and zeroes, then assumes the rest of the digits +are in base 10: + + my $string = '0644'; + + print $string + 0; # prints 644 + + print $string + 44; # prints 688, certainly not octal! + +This problem usually involves one of the Perl built-ins that has the +same name as a unix command that uses octal numbers as arguments on the +command line. In this example, C<chmod> on the command line knows that +its first argument is octal because that's what it does: + + %prompt> chmod 644 file + +If you want to use the same literal digits (644) in Perl, you have to tell +Perl to treat them as octal numbers either by prefixing the digits with +a C<0> or using C<oct>: + + chmod( 0644, $file); # right, has leading zero + chmod( oct(644), $file ); # also correct -This problem shows up most often when people try using C, -C, C, or C, which by widespread tradition -typically take permissions in octal. +The problem comes in when you take your numbers from something that Perl +thinks is a string, such as a command line argument in C<@ARGV>: - chmod(644, $file); # WRONG - chmod(0644, $file); # right + chmod( $ARGV[0], $file); # wrong, even if "0644" -Note the mistake in the first line was specifying the decimal literal -C<644>, rather than the intended octal literal C<0644>. The problem can -be seen with: + chmod( oct($ARGV[0]), $file ); # correct, treat string as octal - printf("%#o",644); # prints 01204 +You can always check the value you're using by printing it in octal +notation to ensure it matches what you think it should be. Print it +in octal and decimal format: -Surely you had not intended C - did you? If you -want to use numeric literals as arguments to chmod() et al. 
then please -try to express them as octal constants, that is with a leading zero and -with the following digits restricted to the set C<0..7>. + printf "0%o %d", $number, $number; =head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions? @@ -363,7 +373,7 @@ pseudorandom generator than comes with your operating system, look at =head2 How do I get a random number between X and Y? To get a random number between two values, you can use the C -builtin to get a random number between 0 and 1. From there, you shift +built-in to get a random number between 0 and 1. From there, you shift that into the range that you want. C returns a number such that C<< 0 <= rand($x) < $x >>. Thus @@ -373,7 +383,7 @@ from 0 to the difference between your I and I. That is, to get a number between 10 and 15, inclusive, you want a random number between 0 and 5 that you can then add to 10. - my $number = 10 + int rand( 15-10+1 ); + my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 ) Hence you derive the following simple function to abstract that. It selects a random integer between the two given @@ -478,6 +488,9 @@ Julian day) 31 =head2 How do I find yesterday's date? +X X X X X +X X X X +X (contributed by brian d foy) @@ -504,6 +517,22 @@ dates, but that assumes that days are twenty-four hours each. For most people, there are two days a year when they aren't: the switch to and from summer time throws this off. Let the modules do the work. +If you absolutely must do it yourself (or can't use one of the +modules), here's a solution using C, which comes with +Perl: + + # contributed by Gunnar Hjalmarsson + use Time::Local; + my $today = timelocal 0, 0, 12, ( localtime )[3..5]; + my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; + printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d; + +In this case, you measure the day starting at noon, and subtract 24 +hours. 
Even if the length of the calendar day is 23 or 25 hours, +you'll still end up on the previous calendar day, although not at +noon. Since you don't care about the time, the one hour difference +doesn't matter and you end up with the previous date. + =head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant? Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is @@ -533,15 +562,6 @@ not the language. At the risk of inflaming the NRA: "Perl doesn't break Y2K, people do." See http://www.perl.org/about/y2k.html for a longer exposition. -=head2 Does Perl have a Year 2038 problem? - -No, all of Perl's built in date and time functions and modules will -work to about 2 billion years before and after 1970. - -Many systems cannot count time past the year 2038. Older versions of -Perl were dependent on the system to do date calculation and thus -shared their 2038 bug. - =head1 Data: Strings =head2 How do I validate input? @@ -788,44 +808,22 @@ result to a scalar, producing a count of the number of matches. $count = () = $string =~ /-\d+/g; -=head2 How do I capitalize all the words on one line? - -To make the first letter of each word upper case: - - $line =~ s/\b(\w)/\U$1/g; - -This has the strange effect of turning "C" into "C". Sometimes you might want this. Other times you might need a -more thorough solution (Suggested by brian d foy): - - $string =~ s/ ( - (^\w) #at the beginning of the line - | # or - (\s\w) #preceded by whitespace - ) - /\U$1/xg; - - $string =~ s/([\w']+)/\u\L$1/g; - -To make the whole line upper case: - - $line = uc($line); +=head2 Does Perl have a Year 2038 problem? -To force each word to be lower case, with the first letter upper case: +No, all of Perl's built in date and time functions and modules will +work to about 2 billion years before and after 1970. - $line =~ s/(\w+)/\u\L$1/g; +Many systems cannot count time past the year 2038. 
Older versions of +Perl were dependent on the system to do date calculation and thus +shared their 2038 bug. -You can (and probably should) enable locale awareness of those -characters by placing a C pragma in your program. -See L for endless details on locales. +=head2 How do I capitalize all the words on one line? +X X X X -This is sometimes referred to as putting something into "title -case", but that's not quite accurate. Consider the proper -capitalization of the movie I, for example. +(contributed by brian d foy) -Damian Conway's L module provides some smart -case transformations: +Damian Conway's L handles all of the thinking +for you. use Text::Autoformat; my $x = "Dr. Strangelove or: How I Learned to Stop ". @@ -836,6 +834,30 @@ case transformations: print autoformat($x, { case => $style }), "\n"; } +How do you want to capitalize those words? + + FRED AND BARNEY'S LODGE # all uppercase + Fred And Barney's Lodge # title case + Fred and Barney's Lodge # highlight case + +It's not as easy a problem as it looks. How many words do you think +are in there? Wait for it... wait for it.... If you answered 5 +you're right. Perl words are groups of C<\w+>, but that's not what +you want to capitalize. How is Perl supposed to know not to capitalize +that C after the apostrophe? You could try a regular expression: + + $string =~ s/ ( + (^\w) #at the beginning of the line + | # or + (\s\w) #preceded by whitespace + ) + /\U$1/xg; + + $string =~ s/([\w']+)/\u\L$1/g; + +Now, what if you don't want to capitalize that "and"? Just use +L and get on with the next problem. :) + =head2 How can I split a [character] delimited string except when inside [character]? Several modules can handle this sort of parsing--C, @@ -988,7 +1010,7 @@ appear as part of the data. If you want to work with comma-separated values, don't do this since that format is a bit more complicated. Use one of the modules that -handle that fornat, such as C, C, or +handle that format, such as C, C, or C. 
If you want to break apart an entire line of fixed columns, you can use @@ -1038,7 +1060,7 @@ what's left in the string: The C will also silently ignore violations of strict, replacing undefined variable names with the empty string. Since I'm using the -C flag (twice even!), I have all of the same security problems I +C flag (twice even!), I have all of the same security problems I have with C in its string form. If there's something odd in C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then I could get myself in trouble. @@ -1050,7 +1072,7 @@ can replace the missing value with a marker, in this case C to signal that I missed something: my $string = 'This has $foo and $bar'; - + my %Replacements = ( foo => 'Fred', ); @@ -1281,16 +1303,32 @@ same thing. =head2 How can I tell whether a certain element is contained in a list or array? -(portions of this answer contributed by Anno Siegel) +(portions of this answer contributed by Anno Siegel and brian d foy) Hearing the word "in" is an Idication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren't. -That being said, there are several ways to approach this. If you +That being said, there are several ways to approach this. In Perl 5.10 +and later, you can use the smart match operator to check that an item is +contained in an array or a hash: + + use 5.010; + + if( $item ~~ @array ) + { + say "The array contains $item" + } + + if( $item ~~ %hash ) + { + say "The hash contains $item" + } + +With earlier versions of Perl, you have to do a bit more work. If you are going to make this query many times over arbitrary string values, the fastest way is probably to invert the original array and maintain a -hash whose keys are the first array's values. 
+hash whose keys are the first array's values: @blues = qw/azure cerulean teal turquoise lapis-lazuli/; %is_blue = (); @@ -1365,6 +1403,21 @@ in either A or in B but not in both. Think of it as an xor operation. =head2 How do I test whether two arrays or hashes are equal? +With Perl 5.10 and later, the smart match operator can give you the answer +with the least amount of work: + + use 5.010; + + if( @array1 ~~ @array2 ) + { + say "The arrays are the same"; + } + + if( %hash1 ~~ %hash2 ) # doesn't check values! + { + say "The hash keys are the same"; + } + The following code works for single-level arrays. It uses a stringwise comparison, and does not distinguish defined versus undefined empty strings. Modify if you have other needs. @@ -1495,14 +1548,24 @@ You could add to the list this way: But again, Perl's built-in are virtually always good enough. =head2 How do I handle circular lists? +X X X X +X X -Circular lists could be handled in the traditional fashion with linked -lists, or you could just do something like this with an array: +(contributed by brian d foy) + +If you want to cycle through an array endlessly, you can increment the +index modulo the number of elements in the array: - unshift(@array, pop(@array)); # the last shall be first - push(@array, shift(@array)); # and vice versa + my @array = qw( a b c ); + my $i = 0; + + while( 1 ) { + print $array[ $i++ % @array ], "\n"; + last if $i > 20; + } -You can also use C: +You can also use C to use a scalar that always has the +next element of the circular array: use Tie::Cycle; @@ -1512,6 +1575,19 @@ You can also use C: print $cycle; # 000000 print $cycle; # FFFF00 +The C creates an iterator object for +circular arrays: + + use Array::Iterator::Circular; + + my $color_iterator = Array::Iterator::Circular->new( + qw(red green blue orange) + ); + + foreach ( 1 .. 20 ) { + print $color_iterator->next, "\n"; + } + =head2 How do I shuffle an array randomly?
If you either have Perl 5.8.0 or later installed, or if you have @@ -1525,6 +1601,8 @@ If not, you can use a Fisher-Yates shuffle. sub fisher_yates_shuffle { my $deck = shift; # $deck is a reference to an array + return unless @$deck; # must not be empty! + my $i = @$deck; while (--$i) { my $j = int rand ($i+1); @@ -1664,7 +1742,7 @@ C uses string order and C numeric order, so you can enumerate all the permutations of C<0..9> like this: use Algorithm::Loops qw(NextPermuteNum); - + my @list= 0..9; do { print "@list\n" } while NextPermuteNum @list; @@ -1721,14 +1799,23 @@ See also the question later in L on sorting hashes. Use C and C, or else C and the bitwise operations. -For example, this sets C<$vec> to have bit N set if C<$ints[N]> was -set: +For example, you don't have to store individual bits in an array +(which would mean that you're wasting a lot of space). To convert an +array of bits to a string, use C to set the right bits. This +sets C<$vec> to have bit N set only if C<$ints[N]> was set: + @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... ) $vec = ''; - foreach(@ints) { vec($vec,$_,1) = 1 } + foreach( 0 .. $#ints ) { + vec($vec,$_,1) = 1 if $ints[$_]; + } -Here's how, given a vector in C<$vec>, you can get those bits into your -C<@ints> array: +The string C<$vec> only takes up as many bits as it needs. For +instance, if you had 16 entries in C<@ints>, C<$vec> only needs two +bytes to store them (not counting the scalar variable overhead). + +Here's how, given a vector in C<$vec>, you can get those bits into +your C<@ints> array: sub bitvec_to_list { my $vec = shift; @@ -1850,7 +1937,7 @@ can then get the value through the particular key you're processing: } Once you have the list of keys, you can process that list before you -process the hashh elements. For instance, you can sort the keys so you +process the hash elements. 
For instance, you can sort the keys so you can process them in lexical order: foreach my $key ( sort keys %hash ) { @@ -1868,7 +1955,7 @@ those using C: } If the hash is very large, you might not want to create a long list of -keys. To save some memory, you can grab on key-value pair at a time using +keys. To save some memory, you can grab one key-value pair at a time using C, which returns a pair you haven't seen yet: while( my( $key, $value ) = each( %hash ) ) { @@ -1886,6 +1973,62 @@ you use C, C, or C on the same hash, you can reset the iterator and mess up your processing. See the C entry in L for more details. +=head2 How do I merge two hashes? +X X X + +(contributed by brian d foy) + +Before you decide to merge two hashes, you have to decide what to do +if both hashes contain keys that are the same and if you want to leave +the original hashes as they were. + +If you want to preserve the original hashes, copy one hash (C<%hash1>) +to a new hash (C<%new_hash>), then add the keys from the other hash +(C<%hash2>) to the new hash. Checking that the key already exists in +C<%new_hash> gives you a chance to decide what to do with the +duplicates: + + my %new_hash = %hash1; # make a copy; leave %hash1 alone + + foreach my $key2 ( keys %hash2 ) + { + if( exists $new_hash{$key2} ) + { + warn "Key [$key2] is in both hashes!"; + # handle the duplicate (perhaps only warning) + ... + next; + } + else + { + $new_hash{$key2} = $hash2{$key2}; + } + } + +If you don't want to create a new hash, you can still use this looping +technique; just change the C<%new_hash> to C<%hash1>. + + foreach my $key2 ( keys %hash2 ) + { + if( exists $hash1{$key2} ) + { + warn "Key [$key2] is in both hashes!"; + # handle the duplicate (perhaps only warning) + ... + next; + } + else + { + $hash1{$key2} = $hash2{$key2}; + } + } + +If you don't care that one hash overwrites keys and values from the other, you +could just use a hash slice to add one hash to another.
In this case, values +from C<%hash2> replace values from C<%hash1> when they have keys in common: + + @hash1{ keys %hash2 } = values %hash2; + =head2 What happens if I add or remove keys from a hash while iterating over it? (contributed by brian d foy) @@ -1922,14 +2065,35 @@ worry you, you can always reverse the hash into a hash of arrays instead: =head2 How can I know how many entries are in a hash? -If you mean how many keys, then all you have to do is -use the keys() function in a scalar context: +(contributed by brian d foy) + +This is very similar to "How do I process an entire hash?", also in +L, but a bit simpler in the common cases. + +You can use the C built-in function in scalar context to find out +how many entries you have in a hash: - $num_keys = keys %hash; + my $key_count = keys %hash; # must be scalar context! + +If you want to find out how many entries have a defined value, that's +a bit different. You have to check each value. A C is handy: + + my $defined_value_count = grep { defined } values %hash; -The keys() function also resets the iterator, which means that you may +You can use that same structure to count the entries any way that +you like. If you want the count of the keys with vowels in them, +you just test for that instead: + + my $vowel_count = grep { /[aeiou]/ } keys %hash; + +The C in scalar context returns the count. If you want the list +of matching items, just use it in list context instead: + + my @defined_values = grep { defined } values %hash; + +The C function also resets the iterator, which means that you may see strange results if you use this between uses of other hash operators -such as each(). +such as C. =head2 How do I sort a hash (optionally by value instead of key)? @@ -1945,7 +2109,7 @@ create a report which lists the keys in ASCIIbetical order. foreach my $key ( @keys ) { - printf "%-20s %6d\n", $key, $hash{$value}; + printf "%-20s %6d\n", $key, $hash{$key}; } We could get more fancy in the C block though.
Instead of @@ -2139,20 +2303,44 @@ Use the C from CPAN. =head2 Why does passing a subroutine an undefined element in a hash create it? -If you say something like: +(contributed by brian d foy) + +Are you using a really old version of Perl? + +Normally, accessing a hash key's value for a nonexistent key will +I<not> create the key. + + my %hash = (); + my $value = $hash{ 'foo' }; + print "This won't print\n" if exists $hash{ 'foo' }; + +Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though. +Since you could assign directly to C<$_[0]>, Perl had to be ready to +make that assignment so it created the hash key ahead of time: + + my_sub( $hash{ 'foo' } ); + print "This will print before 5.004\n" if exists $hash{ 'foo' }; - somefunc($hash{"nonesuch key here"}); + sub my_sub { + # $_[0] = 'bar'; # create hash key in case you do this + 1; + } + +Since Perl 5.004, however, this situation is a special case and Perl +creates the hash key only when you make the assignment: -Then that element "autovivifies"; that is, it springs into existence -whether you store something there or not. That's because functions -get scalars passed in by reference. If somefunc() modifies C<$_[0]>, -it has to be ready to write it back into the caller's version. + my_sub( $hash{ 'foo' } ); + print "This will print, even after 5.004\n" if exists $hash{ 'foo' }; + + sub my_sub { + $_[0] = 'bar'; + } -This has been fixed as of Perl5.004. +However, if you want the old behavior (and think carefully about that +because it's a weird side effect), you can pass a hash slice instead. +Perl 5.004 didn't make this a special case: -Normally, merely accessing a key's value for a nonexistent key does -I<not> cause that key to be forever there. This is different than -awk's behavior. + my_sub( @hash{ qw/foo/ } ); =head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays? @@ -2174,18 +2362,31 @@ in L. =head2 How can I use a reference as a hash key?
-(contributed by brian d foy) +(contributed by brian d foy and Ben Morrow) Hash keys are strings, so you can't really use a reference as the key. When you try to do that, perl turns the reference into its stringified form (for instance, C). From there you can't get back the reference from the stringified form, at least without doing -some extra work on your own. Also remember that hash keys must be -unique, but two different variables can store the same reference (and -those variables can change later). - -The C module, which is distributed with perl, might be -what you want. It handles that extra work. +some extra work on your own. + +Remember that the entry in the hash will still be there even if +the referenced variable goes out of scope, and that it is entirely +possible for Perl to subsequently allocate a different variable at +the same address. This will mean a new variable might accidentally +be associated with the value for an old. + +If you have Perl 5.10 or later, and you just want to store a value +against the reference for lookup later, you can use the core +Hash::Util::FieldHash module. This will also handle renaming the +keys if you use multiple threads (which causes all variables to be +reallocated at new addresses, changing their stringification), and +garbage-collecting the entries when the referenced variable goes out +of scope. + +If you actually need to be able to get a real reference back from +each hash entry, you can use the Tie::RefHash module, which does the +required work for you. =head1 Data: Misc @@ -2288,7 +2489,14 @@ you wanted to copy. =head2 How do I define methods for every class/object? -Use the C class (see L). +(contributed by Ben Morrow) + +You can use the C class (see L). However, please +be very careful to consider the consequences of doing this: adding +methods to every object is very likely to have unintended +consequences.
If possible, it would be better to have all your objects +inherit from some common base class, or to use an object system like +Moose that supports roles. =head2 How do I verify a credit card checksum? @@ -2296,21 +2504,23 @@ Get the C module from CPAN. =head2 How do I pack arrays of doubles or floats for XS code? -The kgbpack.c code in the C module on CPAN does just this. +The arrays.h/arrays.c code in the C module on CPAN does just this. If you're doing a lot of float or double processing, consider using the C module from CPAN instead--it makes number-crunching easy. +See L for the code. + =head1 REVISION -Revision: $Revision: 10394 $ +Revision: $Revision$ -Date: $Date: 2007-12-09 18:47:15 +0100 (Sun, 09 Dec 2007) $ +Date: $Date$ See L for source control details and availability. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and +Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved. This documentation is free; you can redistribute it and/or modify it diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod index e6c1dcc..3e652bd 100644 --- a/pod/perlfaq5.pod +++ b/pod/perlfaq5.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq5 - Files and Formats ($Revision: 10126 $) +perlfaq5 - Files and Formats =head1 DESCRIPTION @@ -10,56 +10,88 @@ formats, and footers. =head2 How do I flush/unbuffer an output filehandle? Why must I do this? X X X X -Perl does not support truly unbuffered output (except insofar as you -can C), although it does support is "command -buffering", in which a physical write is performed after every output -command. - -The C standard I/O library (stdio) normally buffers characters sent to -devices so that there isn't a system call for each byte. In most stdio -implementations, the type of output buffering and the size of the -buffer varies according to the type of device. Perl's C and -C functions normally buffer output, while C -bypasses buffering all together.
- -If you want your output to be sent immediately when you execute -C or C (for instance, for some network protocols), -you must set the handle's autoflush flag. This flag is the Perl -variable C<$|> and when it is set to a true value, Perl will flush the -handle's buffer after each C or C. Setting C<$|> -affects buffering only for the currently selected default filehandle. -You choose this handle with the one argument C call (see -L> and L). - -Use C to choose the desired handle, then set its -per-filehandle variables. - - $old_fh = select(OUTPUT_HANDLE); - $| = 1; - select($old_fh); +(contributed by brian d foy) -Some modules offer object-oriented access to handles and their -variables, although they may be overkill if this is the only thing you -do with them. You can use C: +You might like to read Mark Jason Dominus's "Suffering From Buffering" +at http://perl.plover.com/FAQs/Buffering.html . - use IO::Handle; - open my( $printer ), ">", "/dev/printer"); # but is this? - $printer->autoflush(1); +Perl normally buffers output so it doesn't make a system call for every +bit of output. By saving up output, it makes fewer expensive system calls. +For instance, in this little bit of code, you want to print a dot to the +screen for every line you process to watch the progress of your program. +Instead of seeing a dot for every line, Perl buffers the output and you +have a long wait before you see a row of 50 dots all at once: + + # long wait, then row of dots all at once + while( <> ) { + print "."; + print "\n" unless ++$count % 50; + + #... expensive line processing operations + } + +To get around this, you have to unbuffer the output filehandle, in this +case, C. You can set the special variable C<$|> to a true value +(mnemonic: making your filehandles "piping hot"): + + $|++; + + # dot shown immediately + while( <> ) { + print "."; + print "\n" unless ++$count % 50; + + #... 
expensive line processing operations + } + +The C<$|> is one of the per-filehandle special variables, so each +filehandle has its own copy of its value. If you want to merge +standard output and standard error for instance, you have to unbuffer +each (although STDERR might be unbuffered by default): + + { + my $previous_default = select(STDOUT); # save previous default + $|++; # autoflush STDOUT + select(STDERR); + $|++; # autoflush STDERR, to be sure + select($previous_default); # restore previous default + } -or C (which inherits from C): + # now should alternate . and + + while( 1 ) + { + sleep 1; + print STDOUT "."; + print STDERR "+"; + print STDOUT "\n" unless ++$count % 25; + } + +Besides the C<$|> special variable, you can use C to give +your filehandle a C<:unix> layer, which is unbuffered: + + binmode( STDOUT, ":unix" ); - use IO::Socket; # this one is kinda a pipe? - my $sock = IO::Socket::INET->new( 'www.example.com:80' ); + while( 1 ) { + sleep 1; + print "."; + print "\n" unless ++$count % 50; + } - $sock->autoflush(); +For more information on output layers, see the entries for C +and C in L, and the C module documentation. -You can also flush an C object without setting -C. Call the C method to flush the buffer yourself: +If you are using C or one of its subclasses, you can +call the C method to change the settings of the +filehandle: use IO::Handle; - open my( $printer ), ">", "/dev/printer"); - $printer->flush; # one time flush + open my( $io_fh ), ">", "output.txt"; + $io_fh->autoflush(1); + +The C objects also have a C method. You can flush +the buffer any time you want without auto-buffering + $io_fh->flush; =head2 How do I change, delete, or insert a line in a file, or append to the beginning of a file? X @@ -95,7 +127,7 @@ the loop that prints the existing lines. 
open my $in, '<', $file or die "Can't read old file: $!"; open my $out, '>', "$file.new" or die "Can't write new file: $!"; - print "# Add this line to the top\n"; # <--- HERE'S THE MAGIC + print $out "# Add this line to the top\n"; # <--- HERE'S THE MAGIC while( <$in> ) { @@ -112,7 +144,7 @@ be sure that you're supposed to do that on every line! open my $in, '<', $file or die "Can't read old file: $!"; open my $out, '>', "$file.new" or die "Can't write new file: $!"; - print "# Add this line to the top\n"; + print $out "# Add this line to the top\n"; while( <$in> ) { @@ -141,7 +173,7 @@ print it. After that, read the rest of the lines and print those: { print $out $_; } - + To skip lines, use the looping controls. The C in this example skips comment lines, and the C stops all processing once it encounters either C<__END__> or C<__DATA__>. @@ -270,11 +302,11 @@ leaving a backup of the original data from each file in a new C<.c.orig> file. =head2 How can I copy a file? -X X +X X X (contributed by brian d foy) -Use the File::Copy module. It comes with Perl and can do a +Use the C module. It comes with Perl and can do a true copy across file systems, and it does its magic in a portable fashion. @@ -282,16 +314,17 @@ a portable fashion. copy( $original, $new_copy ) or die "Copy failed: $!"; -If you can't use File::Copy, you'll have to do the work yourself: +If you can't use C, you'll have to do the work yourself: open the original file, open the destination file, then print -to the destination file as you read the original. +to the destination file as you read the original. You also have to +remember to copy the permissions, owner, and group to the new file. =head2 How do I make a temporary file name? X If you don't need to know the name of the file, you can use C -with C in place of the file name. The C function -creates an anonymous temporary file. +with C in place of the file name. 
In Perl 5.8 or later, the +C function creates an anonymous temporary file: open my $tmp, '+>', undef or die $!; @@ -340,7 +373,7 @@ temporary files in one process, use a counter: return (); } } - + } =head2 How can I manipulate fixed-record-length files? @@ -524,13 +557,13 @@ X See L for an C function. =head2 How can I open a filehandle to a string? -X, X, X, X +X X X X (contributed by Peter J. Holzer, hjp-usenet2@hjp.at) -Since Perl 5.8.0, you can pass a reference to a scalar instead of the -filename to create a file handle which you can used to read from or write to -a string: +Since Perl 5.8.0 a file handle referring to a string can be created by +calling open with a reference to that string instead of the filename. +This file handle can then be used to read from or write to the string: open(my $fh, '>', \$string) or die "Could not open string for writing"; print $fh "foo\n"; @@ -581,11 +614,11 @@ It is easier to see with comments: =head2 How can I translate tildes (~) in a filename? X X -Use the <> (glob()) operator, documented in L. Older -versions of Perl require that you have a shell installed that groks -tildes. Recent perl versions have this feature built in. The -File::KGlob module (available from CPAN) gives more portable glob -functionality. +Use the EE (C) operator, documented in L. +Versions of Perl older than 5.6 require that you have a shell +installed that groks tildes. Later versions of Perl have this feature +built in. The C module (available from CPAN) gives more +portable glob functionality. Within Perl, you may use this directly: @@ -826,31 +859,33 @@ If the count doesn't impress your friends, then the code might. :-) =head2 All I want to do is append a small amount of text to the end of a file. Do I still have to use locking? 
X X -If you are on a system that correctly implements flock() and you use the -example appending code from "perldoc -f flock" everything will be OK -even if the OS you are on doesn't implement append mode correctly (if -such a system exists.) So if you are happy to restrict yourself to OSs -that implement flock() (and that's not really much of a restriction) -then that is what you should do. +If you are on a system that correctly implements C and you use +the example appending code from "perldoc -f flock" everything will be +OK even if the OS you are on doesn't implement append mode correctly +(if such a system exists.) So if you are happy to restrict yourself to +OSs that implement C (and that's not really much of a +restriction) then that is what you should do. If you know you are only going to use a system that does correctly -implement appending (i.e. not Win32) then you can omit the seek() from -the code in the previous answer. - -If you know you are only writing code to run on an OS and filesystem that -does implement append mode correctly (a local filesystem on a modern -Unix for example), and you keep the file in block-buffered mode and you -write less than one buffer-full of output between each manual flushing -of the buffer then each bufferload is almost guaranteed to be written to -the end of the file in one chunk without getting intermingled with -anyone else's output. You can also use the syswrite() function which is -simply a wrapper around your systems write(2) system call. +implement appending (i.e. not Win32) then you can omit the C +from the code in the previous answer. 
+ +If you know you are only writing code to run on an OS and filesystem +that does implement append mode correctly (a local filesystem on a +modern Unix for example), and you keep the file in block-buffered mode +and you write less than one buffer-full of output between each manual +flushing of the buffer then each bufferload is almost guaranteed to be +written to the end of the file in one chunk without getting +intermingled with anyone else's output. You can also use the +C function which is simply a wrapper around your system's +C system call. There is still a small theoretical chance that a signal will interrupt -the system level write() operation before completion. There is also a -possibility that some STDIO implementations may call multiple system -level write()s even if the buffer was empty to start. There may be some -systems where this probability is reduced to zero. +the system level C operation before completion. There is also +a possibility that some STDIO implementations may call multiple system +level Cs even if the buffer was empty to start. There may be +some systems where this probability is reduced to zero, and this is +not a concern when using C<:perlio> instead of your system's STDIO. =head2 How do I randomly update a binary file? X @@ -955,7 +990,7 @@ You can use the File::Slurp module to do it in one step. use File::Slurp; $all_of_it = read_file($filename); # entire file in scalar - @all_lines = read_file($filename); # one line perl element + @all_lines = read_file($filename); # one line per element The customary Perl approach for processing all the lines in a file is to do so one line at a time: @@ -1201,9 +1236,9 @@ filehandle (perhaps you used C), you can use the C function from the C module: use POSIX (); - + POSIX::close( $fd ); - + This should rarely be necessary, as the Perl C function is to be used for things that Perl opened itself, even if it was a dup of a numeric descriptor as with C above. 
But if you really have @@ -1263,7 +1298,10 @@ the permissions of the file govern whether you're allowed to. =head2 How do I select a random line from a file? X -Here's an algorithm from the Camel Book: +Short of loading the file into a database or pre-indexing the lines in +the file, there are a couple of things that you can do. + +Here's a reservoir-sampling algorithm from the Camel Book: srand; rand($.) < 1 && ($line = $_) while <>; @@ -1272,49 +1310,138 @@ This has a significant advantage in space over reading the whole file in. You can find a proof of this method in I, Volume 2, Section 3.4.2, by Donald E. Knuth. -You can use the File::Random module which provides a function +You can use the C module which provides a function for that algorithm: use File::Random qw/random_line/; my $line = random_line($filename); -Another way is to use the Tie::File module, which treats the entire +Another way is to use the C module, which treats the entire file as an array. Simply access a random array element. =head2 Why do I get weird spaces when I print an array of lines? -Saying +(contributed by brian d foy) + +If you are seeing spaces between the elements of your array when +you print the array, you are probably interpolating the array in +double quotes: + + my @animals = qw(camel llama alpaca vicuna); + print "animals are: @animals\n"; - print "@lines\n"; +It's the double quotes, not the C, doing this. Whenever you +interpolate an array in a double quote context, Perl joins the +elements with spaces (or whatever is in C<$">, which is a space by +default): -joins together the elements of C<@lines> with a space between them. 
-If C<@lines> were C<("little", "fluffy", "clouds")> then the above -statement would print + animals are: camel llama alpaca vicuna - little fluffy clouds +This is different than printing the array without the interpolation: -but if each element of C<@lines> was a line of text, ending a newline -character C<("little\n", "fluffy\n", "clouds\n")> then it would print: + my @animals = qw(camel llama alpaca vicuna); + print "animals are: ", @animals, "\n"; - little - fluffy - clouds +Now the output doesn't have the spaces between the elements because +the elements of C<@animals> simply become part of the list to +C: -If your array contains lines, just print them: + animals are: camelllamaalpacavicuna + +You might notice this when each of the elements of C<@array> end with +a newline. You expect to print one element per line, but notice that +every line after the first is indented: + + this is a line + this is another line + this is the third line + +That extra space comes from the interpolation of the array. If you +don't want to put anything between your array elements, don't use the +array in double quotes. You can send it to print without them: print @lines; +=head2 How do I traverse a directory tree? + +(contributed by brian d foy) + +The C module, which comes with Perl, does all of the hard +work to traverse a directory structure. It comes with Perl. You simply +call the C subroutine with a callback subroutine and the +directories you want to traverse: + + use File::Find; + + find( \&wanted, @directories ); + + sub wanted { + # full path in $File::Find::name + # just filename in $_ + ... do whatever you want to do ... + } + +The C, which you can download from CPAN, provides +many ready-to-use subroutines that you can use with C. 
+The C, which you can download from CPAN, can help you +create the callback subroutine using something closer to the syntax of +the C command-line utility: + + use File::Find; + use File::Finder; + + my $deep_dirs = File::Finder->depth->type('d')->ls->exec('rmdir','{}'); + + find( $deep_dirs->as_options, @places ); + +The C module, which you can download from CPAN, has +a similar interface, but does the traversal for you too: + + use File::Find::Rule; + + my @files = File::Find::Rule->file() + ->name( '*.pm' ) + ->in( @INC ); + +=head2 How do I delete a directory tree? + +(contributed by brian d foy) + +If you have an empty directory, you can use Perl's built-in C. If +the directory is not empty (so, it contains files or subdirectories), you either +have to empty it yourself (a lot of work) or use a module to help you. + +The C module, which comes with Perl, has a C which +can take care of all of the hard work for you: + + use File::Path qw(rmtree); + + rmtree( \@directories, 0, 0 ); + +The first argument to C is either a string representing a directory path +or an array reference. The second argument controls progress messages, and the +third argument controls the handling of files you don't have permissions to +delete. See the C module for the details. + +=head2 How do I copy an entire directory? + +(contributed by Shlomi Fish) + +To do the equivalent of C (i.e. copy an entire directory tree +recursively) in portable Perl, you'll either need to write something yourself +or find a good CPAN module such as L. =head1 REVISION -Revision: $Revision: 10126 $ +Revision: $Revision$ -Date: $Date: 2007-10-27 21:29:20 +0200 (Sat, 27 Oct 2007) $ +Date: $Date$ See L for source control details and availability. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and +Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod index 6bf1428..daa8402 100644 --- a/pod/perlfaq6.pod +++ b/pod/perlfaq6.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq6 - Regular Expressions ($Revision: 10126 $) +perlfaq6 - Regular Expressions =head1 DESCRIPTION @@ -97,7 +97,7 @@ to newlines. But it's imperative that $/ be set to something other than the default, or else we won't actually ever have a multiline record read in. - $/ = ''; # read in more whole paragraph, not just one line + $/ = ''; # read in whole paragraph, not just one line while ( <> ) { while ( /\b([\w'-]+)(\s+\1)+\b/gi ) { # word starts alpha print "Duplicate $1 at paragraph $.\n"; @@ -107,7 +107,7 @@ record read in. Here's code that finds sentences that begin with "From " (which would be mangled by many mailers): - $/ = ''; # read in more whole paragraph, not just one line + $/ = ''; # read in whole paragraph, not just one line while ( <> ) { while ( /^From /gm ) { # /m makes ^ match next to \n print "leading from in paragraph $.\n"; @@ -149,11 +149,52 @@ Here's another example of using C<..>: $. = 0 if eof; # fix $. } +=head2 How do I match XML, HTML, or other nasty, ugly things with a regex? +X X X X X X +X + +(contributed by brian d foy) + +If you just want to get work done, use a module and forget about the +regular expressions. The C<XML::Parser> and C<HTML::Parser> modules +are good starts, although each namespace has other parsing modules +specialized for certain tasks and different ways of doing it. Start at +CPAN Search ( http://search.cpan.org ) and wonder at all the work people +have done for you already! :) + +The problem with things such as XML is that they have balanced text +containing multiple levels of balanced text, but sometimes it isn't +balanced text, as in an empty tag (C<< <br/> >>, for instance). Even then, +things can occur out-of-order. Just when you think you've got a +pattern that matches your input, someone throws you a curveball. + +If you'd like to do it the hard way, scratching and clawing your way +toward a right answer but constantly being disappointed, besieged by +bug reports, and weary from the inordinate amount of time you have to +spend reinventing a triangular wheel, then there are several things +you can try before you give up in frustration: + +=over 4 + +=item * Solve the balanced text problem from another question in L<perlfaq6> + +=item * Try the recursive regex features in Perl 5.10 and later. See L<perlre> + +=item * Try defining a grammar using Perl 5.10's C<(?DEFINE)> feature. + +=item * Break the problem down into sub-problems instead of trying to use a single regex + +=item * Convince everyone not to use XML or HTML in the first place + +=back + +Good luck! + =head2 I put a regular expression into $/ but it didn't work. What's wrong? X<$/, regexes in> X<$INPUT_RECORD_SEPARATOR, regexes in> X<$RS, regexes in> -$/ has to be a string. You can use these examples if you really need to +$/ has to be a string. You can use these examples if you really need to do this. If you have File::Stream, this is easy. @@ -169,21 +210,21 @@ If you have File::Stream, this is easy. If you don't have File::Stream, you have to do a little more work. -You can use the four argument form of sysread to continually add to +You can use the four-argument form of sysread to continually add to a buffer. After you add to the buffer, you check if you have a complete line (using your regular expression). local $_ = ""; while( sysread FH, $_, 8192, length ) { - while( s/^((?s).*?)your_pattern/ ) { + while( s/^((?s).*?)your_pattern// ) { my $record = $1; # do stuff here. } } - You can do the same thing with foreach and a match using the - c flag and the \G anchor, if you do not mind your entire file - being in memory at the end.
+You can do the same thing with foreach and a match using the +c flag and the \G anchor, if you do not mind your entire file +being in memory at the end. local $_ = ""; while( sysread FH, $_, 8192, length ) { @@ -224,9 +265,9 @@ And here it is as a subroutine, modeled after the above: substr($mask, -1) x (length($new) - length($old)) } - $a = "this is a TEsT case"; - $a =~ s/(test)/preserve_case($1, "success")/egi; - print "$a\n"; + $string = "this is a TEsT case"; + $string =~ s/(test)/preserve_case($1, "success")/egi; + print "$string\n"; This prints: @@ -356,7 +397,7 @@ This example takes a regular expression from the argument list and prints the lines of input that match it: my $pattern = shift @ARGV; - + while( <> ) { print if m/$pattern/; } @@ -367,7 +408,7 @@ would prevent this by telling Perl to compile the pattern the first time, then reuse that for subsequent iterations: my $pattern = shift @ARGV; - + while( <> ) { print if m/$pattern/o; # useful for Perl < 5.6 } @@ -387,7 +428,7 @@ compiling the regular expression on each iteration. With Perl 5.6 or later, you should only see C report that for the first iteration. use re 'debug'; - + $regex = 'Perl'; foreach ( qw(Perl Java Ruby Python) ) { print STDERR "-" x 73, "\n"; @@ -452,40 +493,138 @@ whitespace and comments. Here it is expanded, courtesy of Fred Curtis. ) }{defined $2 ? $2 : ""}gxse; -A slight modification also removes C++ comments, as long as they are not -spread over multiple lines using a continuation character): +A slight modification also removes C++ comments, possibly spanning multiple lines +using a continuation character: - s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse; + s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse; =head2 Can I use Perl regular expressions to match balanced text? 
X X -X - -Historically, Perl regular expressions were not capable of matching -balanced text. As of more recent versions of perl including 5.6.1 -experimental features have been added that make it possible to do this. -Look at the documentation for the (??{ }) construct in recent perlre manual -pages to see an example of matching balanced parentheses. Be sure to take -special notice of the warnings present in the manual before making use -of this feature. - -CPAN contains many modules that can be useful for matching text -depending on the context. Damian Conway provides some useful -patterns in Regexp::Common. The module Text::Balanced provides a -general solution to this problem. - -One of the common applications of balanced text matching is working -with XML and HTML. There are many modules available that support -these needs. Two examples are HTML::Parser and XML::Parser. There -are many others. - -An elaborate subroutine (for 7-bit ASCII only) to pull out balanced -and possibly nested single chars, like C<`> and C<'>, C<{> and C<}>, -or C<(> and C<)> can be found in -http://www.cpan.org/authors/id/TOMC/scripts/pull_quotes.gz . - -The C::Scan module from CPAN also contains such subs for internal use, -but they are undocumented. +X X X +X X X X + +(contributed by brian d foy) + +Your first try should probably be the C<Text::Balanced> module, which +is in the Perl standard library since Perl 5.8. It has a variety of +functions to deal with tricky text. The C<Regexp::Common> module can +also help by providing canned patterns you can use. + +As of Perl 5.10, you can match balanced text with regular expressions +using recursive patterns. Before Perl 5.10, you had to resort to +various tricks such as using Perl code in C<(??{})> sequences. + +Here's an example using a recursive regular expression. The goal is to +capture all of the text within angle brackets, including the text in +nested angle brackets.
This sample text has two "major" groups: a +group with one level of nesting and a group with two levels of +nesting. There are five total groups in angle brackets: + + I have some <brackets in <nested brackets> > and + <another group <nested once <nested twice> > > + and that's it. + +The regular expression to match the balanced text uses two new (to +Perl 5.10) regular expression features. These are covered in L<perlre> +and this example is a modified version of one in that documentation. + +First, adding the new possessive C<+> to any quantifier finds the +longest match and does not backtrack. That's important since you want +to handle any angle brackets through the recursion, not backtracking. +The group C<< [^<>]++ >> finds one or more non-angle brackets without +backtracking. + +Second, the new C<(?PARNO)> refers to the sub-pattern in the +particular capture buffer given by C<PARNO>. In the following regex, +the first capture buffer finds (and remembers) the balanced text, and +you need that same pattern within the first buffer to get past the +nested text. That's the recursive part. The C<(?1)> uses the pattern +in the outer capture buffer as an independent part of the regex. + +Putting it all together, you have: + + #!/usr/local/bin/perl5.10.0 + + my $string =<<"HERE"; + I have some <brackets in <nested brackets> > and + <another group <nested once <nested twice> > > + and that's it. + HERE + + my @groups = $string =~ m/ + ( # start of capture buffer 1 + < # match an opening angle bracket + (?: + [^<>]++ # one or more non angle brackets, non backtracking + | + (?1) # found < or >, so recurse to capture buffer 1 + )* + > # match a closing angle bracket + ) # end of capture buffer 1 + /xg; + + $" = "\n\t"; + print "Found:\n\t@groups\n"; + +The output shows that Perl found the two major groups: + + Found: + <brackets in <nested brackets> > + <another group <nested once <nested twice> > > + +With a little extra work, you can get all of the groups in angle +brackets even if they are in other angle brackets too. Each time you +get a balanced match, remove its outer delimiter (that's the one you +just matched so don't match it again) and add it to a queue of strings +to process.
Keep doing that until you get no matches: + + #!/usr/local/bin/perl5.10.0 + + my @queue =<<"HERE"; + I have some <brackets in <nested brackets> > and + <another group <nested once <nested twice> > > + and that's it. + HERE + + my $regex = qr/ + ( # start of bracket 1 + < # match an opening angle bracket + (?: + [^<>]++ # one or more non angle brackets, non backtracking + | + (?1) # recurse to bracket 1 + )* + > # match a closing angle bracket + ) # end of bracket 1 + /x; + + $" = "\n\t"; + + while( @queue ) + { + my $string = shift @queue; + + my @groups = $string =~ m/$regex/g; + print "Found:\n\t@groups\n\n" if @groups; + + unshift @queue, map { s/^<//; s/>$//; $_ } @groups; + } + +The output shows all of the groups. The outermost matches show up +first and the nested matches show up later: + + Found: + <brackets in <nested brackets> > + <another group <nested once <nested twice> > > + + Found: + <nested brackets> + + Found: + <nested once <nested twice> > + + Found: + <nested twice> =head2 What does it mean that regexes are greedy? How can I get around it? X X @@ -575,7 +714,7 @@ X Avoid asking Perl to compile a regular expression every time you want to match it. In this example, perl must recompile -the regular expression for every iteration of the foreach() +the regular expression for every iteration of the C<foreach> loop since it has no way to know what $pattern will be. @patterns = qw( foo bar baz ); @@ -592,10 +731,10 @@ loop since it has no way to know what $pattern will be. } } -The qr// operator showed up in perl 5.005. It compiles a +The C<qr//> operator showed up in perl 5.005. It compiles a regular expression, but doesn't apply it. When you use the pre-compiled version of the regex, perl does less work. In -this example, I inserted a map() to turn each pattern into +this example, I inserted a C<map> to turn each pattern into its pre-compiled form. The rest of the script is the same, but faster. @@ -605,8 +744,11 @@ but faster. { foreach $pattern ( @patterns ) { - print if /$pattern/i; - next LINE; + if( /$pattern/ ) + { + print; + next LINE; + } } } @@ -621,8 +763,8 @@ backtracking though.
print if /\b(?:$regex)\b/i; } -For more details on regular expression efficiency, see Mastering -Regular Expressions by Jeffrey Freidl. He explains how regular +For more details on regular expression efficiency, see I<Mastering +Regular Expressions> by Jeffrey Freidl. He explains how regular expressions engine work and why some patterns are surprisingly inefficient. Once you understand how perl applies regular expressions, you can tune them for individual situations. @@ -706,6 +848,11 @@ of each match (see perlvar for the full story), so they give you essentially the same information, but without the risk of excessive string copying. +Perl 5.10 added three specials, C<${^MATCH}>, C<${^PREMATCH}>, and +C<${^POSTMATCH}> to do the same job but without the global performance +penalty. Perl 5.10 only sets these variables if you compile or execute the +regular expression with the C</p> modifier. + =head2 What good is C<\G> in a regular expression? X<\G> @@ -983,15 +1130,15 @@ Or... =head1 REVISION -Revision: $Revision: 10126 $ +Revision: $Revision$ -Date: $Date: 2007-10-27 21:29:20 +0200 (Sat, 27 Oct 2007) $ +Date: $Date$ See L<perlfaq> for source control details and availability. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and +Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved. This documentation is free; you can redistribute it and/or modify it diff --git a/pod/perlfaq7.pod b/pod/perlfaq7.pod index 5f4e39c..4c0c2f1 100644 --- a/pod/perlfaq7.pod +++ b/pod/perlfaq7.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq7 - General Perl Language Issues ($Revision: 10100 $) +perlfaq7 - General Perl Language Issues =head1 DESCRIPTION @@ -63,7 +63,7 @@ one-liners: if ($whoops) { exit 1 } @nums = (1, 2, 3); - + if ($whoops) { exit 1; } @@ -153,7 +153,7 @@ deliberately have precedence lower than that of list operators for just such situations as the one above. Another operator with surprising precedence is exponentiation. It -binds more tightly even than unary minus, making C<-2**2> product a +binds more tightly even than unary minus, making C<-2**2> produce a negative not a positive four. It is also right-associating, meaning that C<2**3**2> is two raised to the ninth power, not eight squared. @@ -172,7 +172,7 @@ Here's an example: $person = {}; # new anonymous hash $person->{AGE} = 24; # set field AGE to 24 $person->{NAME} = "Nat"; # set field NAME to "Nat" - + If you're looking for something a bit more rigorous, try L<perltoot>. =head2 How do I create a module? @@ -202,6 +202,9 @@ distributions. (contributed by brian d foy) +The full answer to this can be found at +http://cpan.org/modules/04pause.html#takeover + The easiest way to take over a module is to have the current module maintainer either make you a co-maintainer or transfer the module to you.
@@ -223,7 +226,7 @@ Write to modules@perl.org explaining what you did to contact the current maintainer. The PAUSE admins will also try to reach the maintainer. -=item +=item Post a public message in a heavily trafficked site announcing your intention to take over the module. @@ -231,16 +234,26 @@ intention to take over the module. =item Wait a bit. The PAUSE admins don't want to act too quickly in case -the current maintainer is on holiday. If there's no response to +the current maintainer is on holiday. If there's no response to private communication or the public post, a PAUSE admin can transfer it to you. =back =head2 How do I create a class? +X X -See L<perltoot> for an introduction to classes and objects, as well as -L<perlobj> and L<perlbot>. +(contributed by brian d foy) + +In Perl, a class is just a package, and methods are just subroutines. +Perl doesn't get more formal than that and lets you set up the package +just the way that you like it (that is, it doesn't set up anything for +you). + +The Perl documentation has several tutorials that cover class +creation, including L<perlboot> (Barnyard Object Oriented Tutorial), +L<perltoot> (Tom's Object Oriented Tutorial), L<perlbot> (Bag o' +Object Tricks), and L<perlobj>. =head2 How can I tell if a variable is tainted? @@ -256,7 +269,7 @@ I<Closure> is a computer science term with a precise but hard-to-explain meaning. Usually, closures are implemented in Perl as anonymous subroutines with lasting references to lexical variables outside their own scopes. These lexicals magically refer to the -variables that were around when the subroutine was defined (deep +variables that were around when the subroutine was defined (deep binding). Closures are most often used in programming languages where you can @@ -291,7 +304,7 @@ value that the lexical had when the function was created.
my $addpiece = shift; return sub { shift() + $addpiece }; } - + $f1 = make_adder(20); $f2 = make_adder(555); @@ -486,9 +499,11 @@ You could also investigate the can() method in the UNIVERSAL class (contributed by brian d foy) -Perl doesn't have "static" variables, which can only be accessed from -the function in which they are declared. You can get the same effect -with lexical variables, though. +In Perl 5.10, declare the variable with C<state>. The C<state> +declaration creates the lexical variable that persists between calls +to the subroutine: + + sub counter { state $count = 1; $count++ } You can fake a static variable by using a lexical variable which goes out of scope. In this example, you define the subroutine C<counter>, and @@ -508,11 +523,11 @@ C. my $count = 1; sub counter { $count++ } } - + my $start = counter(); - + .... # code that calls counter(); - + my $end = counter(); In the previous example, you created a function-private variable @@ -559,19 +574,19 @@ For instance: sub visible { print "var has value $var\n"; } - + sub dynamic { local $var = 'local'; # new temporary value for the still-global visible(); # variable called $var } - + sub lexical { my $var = 'private'; # new private variable, $var visible(); # (invisible outside of sub scope) } - + $var = 'global'; - + visible(); # prints global dynamic(); # prints local lexical(); # prints global @@ -670,26 +685,56 @@ see L. =head2 What's the difference between calling a function as &foo and foo()? -When you call a function as C<&foo>, you allow that function access to -your current @_ values, and you bypass prototypes. -The function doesn't get an empty @_--it gets yours! While not -strictly speaking a bug (it's documented that way in L<perlsub>), it -would be hard to consider this a feature in most cases. +(contributed by brian d foy) + +Calling a subroutine as C<&foo> with no trailing parentheses ignores +the prototype of C<foo> and passes it the current value of the argument +list, C<@_>.
Here's an example; the C<bar> subroutine calls C<&foo>, +which prints its arguments list: + + sub bar { &foo } + + sub foo { print "Args in foo are: @_\n" } + + bar( qw( a b c ) ); + +When you call C<bar> with arguments, you see that C<foo> got the same C<@_>: -When you call your function as C<&foo()>, then you I get a new @_, -but prototyping is still circumvented. + Args in foo are: a b c -Normally, you want to call a function using C<foo()>. You may only -omit the parentheses if the function is already known to the compiler -because it already saw the definition (C<use> but not C<require>), -or via a forward reference or C<use subs> declaration. Even in this -case, you get a clean @_ without any of the old values leaking through -where they don't belong. +Calling the subroutine with trailing parentheses, with or without arguments, +does not use the current C<@_> and respects the subroutine prototype. Changing +the example to put parentheses after the call to C<foo> changes the program: + + sub bar { &foo() } + + sub foo { print "Args in foo are: @_\n" } + + bar( qw( a b c ) ); + +Now the output shows that C<foo> doesn't get the C<@_> from its caller. + + Args in foo are: + +The main use of the C<@_> pass-through feature is to write subroutines +whose main job it is to call other subroutines for you. For further +details, see L<perlsub>. =head2 How do I create a switch or case statement? +In Perl 5.10, use the C<given-when> construct described in L<perlsyn>: + + use 5.010; + + given ( $string ) { + when( 'Fred' ) { say "I found Fred!" } + when( 'Barney' ) { say "I found Barney!" } + when( /Bamm-?Bamm/ ) { say "I found Bamm-Bamm!" } + default { say "I don't recognize the name!" } + }; + If one wants to use pure Perl and to be compatible with Perl versions -prior to 5.10, the general answer is to write a construct like this: +prior to 5.10, the general answer is to use C<if-elsif-else>: for ($variable_to_test) { if (/pat1/) { } # do something @@ -758,7 +803,7 @@ A totally different approach is to create a hash of function references.
"done" => sub { die "See ya!" }, "mad" => \&angry, ); - + print "How are you? "; chomp($string = <STDIN>); if ($commands{$string}) { @@ -767,9 +812,6 @@ A totally different approach is to create a hash of function references. print "No such command: $string\n"; } -Note that starting from version 5.10, Perl has now a native switch -statement. See L<perlsyn>. - Starting from Perl 5.8, a source filter module, C<Switch>, can also be used to get switch and case. Its use is now discouraged, because it's not fully compatible with the native switch of Perl 5.10, and because, @@ -807,52 +849,89 @@ L. Make sure to read about creating modules in L and the perils of indirect objects in L. -=head2 How can I find out my current package? +=head2 How can I find out my current or calling package? -If you're just a random program, you can do this to find -out what the currently compiled package is: +(contributed by brian d foy) - my $packname = __PACKAGE__; +To find the package you are currently in, use the special literal +C<__PACKAGE__>, as documented in L<perldata>. You can only use the +special literals as separate tokens, so you can't interpolate them +into strings like you can with variables: -But, if you're a method and you want to print an error message -that includes the kind of object you were called on (which is -not necessarily the same as the one in which you were compiled): + my $current_package = __PACKAGE__; + print "I am in package $current_package\n"; - sub amethod { - my $self = shift; - my $class = ref($self) || $self; - warn "called me from a $class object"; - } +This is different from finding out the package an object is blessed +into, which might not be the current package.
For that, use C<blessed> +from C<Scalar::Util>, part of the Standard Library since Perl 5.8: + + use Scalar::Util qw(blessed); + my $object_package = blessed( $object ); + +Most of the time, you shouldn't care what package an object is blessed +into, however, as long as it claims to inherit from that class: -=head2 How can I comment out a large block of perl code? + my $is_right_class = eval { $object->isa( $package ) }; # true or false + +If you want to find the package calling your code, perhaps to give better +diagnostics as C<Carp> does, use the C<caller> built-in: + + sub foo { + my @args = ...; + my( $package, $filename, $line ) = caller; + + print "I was called from package $package\n"; + } + +By default, your program starts in package C<main>, so you should +always be in some package unless someone uses the C<package> built-in +with no namespace. See the C<package> entry in L<perlfunc> for the +details of empty packages. + +=head2 How can I comment out a large block of Perl code? + +(contributed by brian d foy) -You can use embedded POD to discard it. Enclose the blocks you want -to comment out in POD markers. The <=begin> directive marks a section -for a specific formatter. Use the C<comment> format, which no formatter -should claim to understand (by policy). Mark the end of the block -with <=end>. +The quick-and-dirty way to comment out more than one line of Perl is +to surround those lines with Pod directives. You have to put these +directives at the beginning of the line and somewhere where Perl +expects a new statement (so not in the middle of statements like the # +comments). You end the comment with C<=cut>, ending the Pod section: + + =pod + + my $object = NotGonnaHappen->new(); + + ignored_sub(); + + $wont_be_assigned = 37; + + =cut + +The quick-and-dirty method only works well when you don't plan to +leave the commented code in the source. If a Pod parser comes along, +your multiline comment is going to show up in the Pod translation. +A better way hides it from Pod parsers as well. + +The C<=begin> directive can mark a section for a particular purpose. +If the Pod parser doesn't want to handle it, it just ignores it. Label +the comments with C<comment>. End the comment using C<=end> with the +same label. You still need the C<=cut> to go back to Perl code from +the Pod comment: - # program is here - =begin comment - - all of this stuff - - here will be ignored - by everyone - + + my $object = NotGonnaHappen->new(); + + ignored_sub(); + + $wont_be_assigned = 37; + =end comment - - =cut - - # program continues -The pod directives cannot go just anywhere. You must put a -pod directive where the parser is expecting a new statement, -not just in the middle of an expression or some other -arbitrary grammar production.
+ =cut -See L<perlpod> for more details. +For more information on Pod, check out L<perlpod> and L<perlpodspec>. =head2 How do I clear a package? @@ -1011,15 +1090,15 @@ where you expect it so you need to adjust your shebang line. =head1 REVISION -Revision: $Revision: 10100 $ +Revision: $Revision$ -Date: $Date: 2007-10-21 20:59:30 +0200 (Sun, 21 Oct 2007) $ +Date: $Date$ See L<perlfaq> for source control details and availability. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and +Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved. This documentation is free; you can redistribute it and/or modify it diff --git a/pod/perlfaq8.pod b/pod/perlfaq8.pod index 7def972..78039f9 100644 --- a/pod/perlfaq8.pod +++ b/pod/perlfaq8.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq8 - System Interaction ($Revision: 10183 $) +perlfaq8 - System Interaction =head1 DESCRIPTION @@ -20,11 +20,16 @@ the name of the operating system (not its release number) that your perl binary was built for. =head2 How come exec() doesn't return? +X X X X X -Because that's what it does: it replaces your currently running -program with a different one. If you want to keep going (as is -probably the case if you're asking this question) use system() -instead. +(contributed by brian d foy) + +The C<exec> function's job is to turn your process into another +command and never to return. If that's not what you want to do, don't +use C<exec>. :) + +If you want to run an external command and still keep your Perl process +going, look at a piped C<open>, C<fork>, or C<system>. =head2 How do I do fancy stuff with the keyboard/screen/mouse? @@ -169,24 +174,50 @@ not to block: =head2 How do I clear the screen? -If you only have do so infrequently, use C<system>: +(contributed by brian d foy) - system("clear"); +To clear the screen, you just have to print the special sequence +that tells the terminal to clear the screen. Once you have that +sequence, output it when you want to clear the screen.
-If you have to do this a lot, save the clear string -so you can print it 100 times without calling a program -100 times: +You can use the C<Term::ANSIScreen> module to get the special +sequence. Import the C<cls> function (or the C<:screen> tag): - $clear_string = `clear`; - print $clear_string; + use Term::ANSIScreen qw(cls); + my $clear_screen = cls(); + + print $clear_screen; -If you're planning on doing other screen manipulations, like cursor -positions, etc, you might wish to use Term::Cap module: +The C<Term::Cap> module can also get the special sequence if you want +to deal with the low-level details of terminal control. The C<Tputs> +method returns the string for the given capability: use Term::Cap; - $terminal = Term::Cap->Tgetent( {OSPEED => 9600} ); + + $terminal = Term::Cap->Tgetent( { OSPEED => 9600 } ); $clear_string = $terminal->Tputs('cl'); + print $clear_string; + +On Windows, you can use the C<Win32::Console> module. After creating +an object for the output filehandle you want to affect, call the +C<Cls> method: + + use Win32::Console; + + $OUT = Win32::Console->new(STD_OUTPUT_HANDLE); + my $clear_string = $OUT->Cls; + + print $clear_string; + +If you have a command-line program that does the job, you can call +it in backticks to capture whatever it outputs so you can use it +later: + + $clear_string = `clear`; + + print $clear_string; + =head2 How do I get the screen size? If you have Term::ReadKey module installed from CPAN, @@ -346,18 +377,25 @@ passwd(1), for example). =head2 How do I start a process in the background? -Several modules can start other processes that do not block -your Perl program. You can use IPC::Open3, Parallel::Jobs, -IPC::Run, and some of the POE modules. See CPAN for more -details. +(contributed by brian d foy) -You could also use +There's not a single way to run code in the background so you don't +have to wait for it to finish before your program moves on to other +tasks. Process management depends on your particular operating system, +and many of the techniques are in L<perlipc>.
+ +Several CPAN modules may be able to help, including IPC::Open2 or +IPC::Open3, IPC::Run, Parallel::Jobs, Parallel::ForkManager, POE, +Proc::Background, and Win32::Process. There are many other modules you +might use, so check those namespaces for other options too. + +If you are on a unix-like system, you might be able to get away with a +system call where you put an C<&> on the end of the command: system("cmd &") -or you could use fork as documented in L, with -further examples in L. Some things to be aware of, if you're -on a Unix-like system: +You can also try using C<fork>, as described in L<perlipc> (although +this is the same thing that many of the modules will do for you). =over 4 @@ -1181,46 +1219,56 @@ and use?". =head2 What's the difference between require and use? -Perl offers several different ways to include code from one file into -another. Here are the deltas between the various inclusion constructs: +(contributed by brian d foy) + +Perl runs C<require> statement at run-time. Once Perl loads, compiles, +and runs the file, it doesn't do anything else. The C<use> statement +is the same as a C<require> run at compile-time, but Perl also calls the +C<import> method for the loaded package. These two are the same: + + use MODULE qw(import list); - 1) do $file is like eval `cat $file`, except the former - 1.1: searches @INC and updates %INC. - 1.2: bequeaths an *unrelated* lexical scope on the eval'ed code. + BEGIN { + require MODULE; + MODULE->import(import list); + } + +However, you can suppress the C<import> by using an explicit, empty +import list. Both of these still happen at compile-time: - 2) require $file is like do $file, except the former - 2.1: checks for redundant loading, skipping already loaded files. - 2.2: raises an exception on failure to find, compile, or execute $file. + use MODULE (); - 3) require Module is like require "Module.pm", except the former - 3.1: translates each "::" into your system's directory separator.
- 3.2: primes the parser to disambiguate class Module as an indirect object. + BEGIN { + require MODULE; + } - 4) use Module is like require Module, except the former - 4.1: loads the module at compile time, not run-time. - 4.2: imports symbols and semantics from that package to the current one. +Since C<use> will also call the C<import> method, the actual value +for C<MODULE> must be a bareword. That is, C<use> cannot load files +by name, although C<require> can: -In general, you usually want C<use> and a proper Perl module. + require "$ENV{HOME}/lib/Foo.pm"; # no @INC searching! + +See the entry for C<use> in L<perlfunc> for more details. =head2 How do I keep my own module/library directory? When you build modules, tell Perl where to install the modules. -For C<Makefile.PL>-based distributions, use the PREFIX and LIB options +For C<Makefile.PL>-based distributions, use the INSTALL_BASE option when generating Makefiles: - perl Makefile.PL PREFIX=/mydir/perl LIB=/mydir/perl/lib + perl Makefile.PL INSTALL_BASE=/mydir/perl You can set this in your CPAN.pm configuration so modules automatically install in your private library directory when you use the CPAN.pm shell: % cpan - cpan> o conf makepl_arg PREFIX=/mydir/perl,LIB=/mydir/perl/lib + cpan> o conf makepl_arg INSTALL_BASE=/mydir/perl cpan> o conf commit For C<Build.PL>-based distributions, use the --install_base option: - perl Build.PL --install_base /mydir/perl + perl Build.PL --install_base /mydir/perl You can configure CPAN.pm to automatically use this option too: @@ -1228,6 +1276,19 @@ You can configure CPAN.pm to automatically use this option too: cpan> o conf mbuild_arg --install_base /mydir/perl cpan> o conf commit +INSTALL_BASE tells these tools to put your modules into +F</mydir/perl/lib/perl5>. See L for details on how to run your newly +installed modules. + +There is one caveat with INSTALL_BASE, though, since it acts +differently than the PREFIX and LIB settings that older versions of +ExtUtils::MakeMaker advocated.
INSTALL_BASE does not support +installing modules for multiple versions of Perl or different +architectures under the same directory. You should consider whether you +really want that, and if you do, use the older PREFIX and LIB +settings. See the ExtUtils::MakeMaker documentation for more details. + =head2 How do I add the directory my program lives in to the module/library search path? (contributed by brian d foy) @@ -1237,7 +1298,7 @@ for any other directory. You might if you know the directory at compile time: use lib $directory; - + The trick in this task is to find the directory. Before your script does anything else (such as a C), you can get the current working directory with the C module, which comes with Perl: @@ -1246,30 +1307,28 @@ directory with the C module, which comes with Perl: use Cwd; our $directory = cwd; } - + use lib $directory; - + You can do a similar thing with the value of C<$0>, which holds the script name. That might hold a relative path, but C can turn -it into an absolute path. Once you have the +it into an absolute path. Once you have the - BEGIN { + BEGIN { use File::Spec::Functions qw(rel2abs); use File::Basename qw(dirname); - + my $path = rel2abs( $0 ); our $directory = dirname( $path ); } - + use lib $directory; -The C module, which comes with Perl, might work. It searches -through C<$ENV{PATH}> (so your script has to be in one of those -directories). You can then use that directory (in C<$FindBin::Bin>) -to locate nearby directories you want to add: +The C module, which comes with Perl, might work. It finds the +directory of the currently running script and puts it in C<$Bin>, which +you can then use to construct the right library path: - use FindBin; - use lib "$FindBin::Bin/../lib"; + use FindBin qw($Bin); =head2 How do I add a directory to my include path (@INC) at runtime? @@ -1310,15 +1369,15 @@ but other times it is not. Modern programs C instead.
=head1 REVISION -Revision: $Revision: 10183 $ +Revision: $Revision$ -Date: $Date: 2007-11-07 09:35:12 +0100 (Wed, 07 Nov 2007) $ +Date: $Date$ See L for source control details and availability. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and +Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved. This documentation is free; you can redistribute it and/or modify it diff --git a/pod/perlfaq9.pod b/pod/perlfaq9.pod index 8ac3d0e..ce0cf07 100644 --- a/pod/perlfaq9.pod +++ b/pod/perlfaq9.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq9 - Networking ($Revision: 8539 $) +perlfaq9 - Networking =head1 DESCRIPTION @@ -65,7 +65,6 @@ listed in the CGI Meta FAQ: http://www.perl.org/CGI_MetaFAQ.html - =head2 How can I get better error messages from a CGI program? Use the CGI::Carp module. It replaces C and C, plus the @@ -73,25 +72,25 @@ normal Carp modules C, C, and C functions with more verbose and safer versions. It still sends them to the normal server error log. - use CGI::Carp; - warn "This is a complaint"; - die "But this one is serious"; + use CGI::Carp; + warn "This is a complaint"; + die "But this one is serious"; The following use of CGI::Carp also redirects errors to a file of your choice, placed in a BEGIN block to catch compile-time warnings as well: - BEGIN { - use CGI::Carp qw(carpout); - open(LOG, ">>/var/local/cgi-logs/mycgi-log") - or die "Unable to append to mycgi-log: $!\n"; - carpout(*LOG); - } + BEGIN { + use CGI::Carp qw(carpout); + open(LOG, ">>/var/local/cgi-logs/mycgi-log") + or die "Unable to append to mycgi-log: $!\n"; + carpout(*LOG); + } You can even arrange for fatal errors to go back to the client browser, which is nice for your own debugging, but might confuse the end user. 
- use CGI::Carp qw(fatalsToBrowser); - die "Bad error here"; + use CGI::Carp qw(fatalsToBrowser); + die "Bad error here"; Even if the error happens before you get the HTTP header out, the module will try to take care of this to avoid the dreaded server 500 errors. @@ -114,8 +113,8 @@ entities--like C<<> for example. Here's one "simple-minded" approach, that works for most files: - #!/usr/bin/perl -p0777 - s/<(?:[^>'"]*|(['"]).*?\1)*>//gs + #!/usr/bin/perl -p0777 + s/<(?:[^>'"]*|(['"]).*?\1)*>//gs If you want a more complete solution, see the 3-stage striphtml program in @@ -125,25 +124,25 @@ http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz Here are some tricky cases that you should think about when picking a solution: - A > B + A > B - A > B - + - + - <# Just data #> + <# Just data #> - >>>>>>>>>>> ]]> + >>>>>>>>>>> ]]> If HTML comments include other tags, those solutions would also break on text like this: - + =head2 How do I extract URLs? @@ -163,14 +162,13 @@ solution from Tom Christiansen runs 100 times faster than most module based approaches but only extracts URLs from anchors where the first attribute is HREF and there are no other attributes. - #!/usr/bin/perl -n00 - # qxurl - tchrist@perl.com - print "$2\n" while m{ - < \s* - A \s+ HREF \s* = \s* (["']) (.*?) \1 - \s* > - }gsix; - + #!/usr/bin/perl -n00 + # qxurl - tchrist@perl.com + print "$2\n" while m{ + < \s* + A \s+ HREF \s* = \s* (["']) (.*?) \1 + \s* > + }gsix; =head2 How do I download a file from the user's machine? How do I open a file on another machine? @@ -200,47 +198,36 @@ examples. start_form, "What's your favorite animal? ", - popup_menu( - -name => 'animal', + popup_menu( + -name => 'animal', -values => [ qw( Llama Alpaca Camel Ram ) ] ), - submit, - - end_form, - end_html; + submit, + end_form, + end_html; =head2 How do I fetch an HTML file? 
-One approach, if you have the lynx text-based HTML browser installed -on your system, is this: +(contributed by brian d foy) + +Use the libwww-perl distribution. The C module can fetch web +resources and give their content back to you as a string: - $html_code = `lynx -source $url`; - $text_data = `lynx -dump $url`; + use LWP::Simple qw(get); -The libwww-perl (LWP) modules from CPAN provide a more powerful way -to do this. They don't require lynx, but like lynx, can still work -through proxies: + my $html = get( "http://www.example.com/index.html" ); - # simplest version - use LWP::Simple; - $content = get($URL); +It can also store the resource directly in a file: - # or print HTML from a URL - use LWP::Simple; - getprint "http://www.linpro.no/lwp/"; + use LWP::Simple qw(getstore); - # or print ASCII from HTML from a URL - # also need HTML-Tree package from CPAN - use LWP::Simple; - use HTML::Parser; - use HTML::FormatText; - my ($html, $ascii); - $html = get("http://www.perl.com/"); - defined $html - or die "Can't fetch HTML from http://www.perl.com/"; - $ascii = HTML::FormatText->new->format(parse_html($html)); - print $ascii; + getstore( "http://www.example.com/index.html", "foo.html" ); + +If you need to do something more complicated, you can use +C module to create your own user-agent (e.g. browser) +to get the job done. If you want to simulate an interactive web +browser, you can use the C module. =head2 How do I automate an HTML form submission? @@ -251,46 +238,65 @@ documentation for all the details. 
If you're submitting values using the GET method, create a URL and encode the form using the C method: - use LWP::Simple; - use URI::URL; + use LWP::Simple; + use URI::URL; - my $url = url('http://www.perl.com/cgi-bin/cpan_mod'); - $url->query_form(module => 'DB_File', readme => 1); - $content = get($url); + my $url = url('http://www.perl.com/cgi-bin/cpan_mod'); + $url->query_form(module => 'DB_File', readme => 1); + $content = get($url); If you're using the POST method, create your own user agent and encode the content appropriately. - use HTTP::Request::Common qw(POST); - use LWP::UserAgent; + use HTTP::Request::Common qw(POST); + use LWP::UserAgent; - $ua = LWP::UserAgent->new(); - my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod', - [ module => 'DB_File', readme => 1 ]; - $content = $ua->request($req)->as_string; + $ua = LWP::UserAgent->new(); + my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod', + [ module => 'DB_File', readme => 1 ]; + $content = $ua->request($req)->as_string; =head2 How do I decode or create those %-encodings on the web? +X X X X X + +(contributed by brian d foy) + +Those C<%> encodings handle reserved characters in URIs, as described +in RFC 2396, Section 2. This encoding replaces the reserved character +with the hexadecimal representation of the character's number from +the US-ASCII table. For instance, a colon, C<:>, becomes C<%3A>. + +In CGI scripts, you don't have to worry about decoding URIs if you are +using C. You shouldn't have to process the URI yourself, +either on the way in or the way out. -If you are writing a CGI script, you should be using the CGI.pm module -that comes with perl, or some other equivalent module. The CGI module -automatically decodes queries for you, and provides an escape() -function to handle encoding. +If you have to encode a string yourself, remember that you should +never try to encode an already-composed URI. You need to escape the +components separately then put them together. 
To encode a string, you +can use the C module. The C function +returns the escaped string: -The best source of detailed information on URI encoding is RFC 2396. -Basically, the following substitutions do it: + my $original = "Colon : Hash # Percent %"; - s/([^\w()'*~!.-])/sprintf '%%%02x', ord $1/eg; # encode + my $escaped = uri_escape( $original ); - s/%([A-Fa-f\d]{2})/chr hex $1/eg; # decode - s/%([[:xdigit:]]{2})/chr hex $1/eg; # same thing + print "$escaped\n"; # 'Colon%20%3A%20Hash%20%23%20Percent%20%25' -However, you should only apply them to individual URI components, not -the entire URI, otherwise you'll lose information and generally mess -things up. If that didn't explain it, don't worry. Just go read -section 2 of the RFC, it's probably the best explanation there is. +To decode the string, use the C function: -RFC 2396 also contains a lot of other useful information, including a -regexp for breaking any arbitrary URI into components (Appendix B). + my $unescaped = uri_unescape( $escaped ); + + print $unescaped; # back to original + +If you wanted to do it yourself, you simply need to replace the +reserved characters with their encodings. A global substitution +is one way to do it: + + # encode + $string =~ s/([^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%02X", ord $1 /eg; + + # decode + $string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg; =head2 How do I redirect to another page? @@ -304,26 +310,23 @@ allow relative URLs in either case. Use of CGI.pm is strongly recommended. This example shows redirection with a complete URL. This redirection is handled by the web browser.
- my $url = '/CPAN/index.html'; - print redirect($url); - + my $url = '/CPAN/index.html'; + print redirect($url); But if coded directly, it could be as follows (the final "\n" is shown separately, for clarity), using either a complete URL or an absolute URLpath. - print "Location: $url\n"; # CGI response header - print "\n"; # end of headers - + print "Location: $url\n"; # CGI response header + print "\n"; # end of headers =head2 How do I put a password on my web pages? @@ -341,8 +344,8 @@ stored. Databases may be text, dbm, Berkeley DB or any database with a DBI compatible driver. HTTPD::UserAdmin supports files used by the "Basic" and "Digest" authentication schemes. Here's an example: - use HTTPD::UserAdmin (); - HTTPD::UserAdmin + use HTTPD::UserAdmin (); + HTTPD::UserAdmin ->new(DB => "/foo/.htpasswd") ->add($username => $password); @@ -357,10 +360,10 @@ See the security references listed in the CGI Meta FAQ For a quick-and-dirty solution, try this solution derived from L: - $/ = ''; - $header = ; - $header =~ s/\n\s+/ /g; # merge continuation lines - %head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header ); + $/ = ''; + $header = ; + $header =~ s/\n\s+/ /g; # merge continuation lines + %head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header ); That solution doesn't do well if, for example, you're trying to maintain all the Received lines. A more complete approach is to use @@ -429,14 +432,14 @@ following will match valid RFC-2822 addresses that do not have comments, folding whitespace, or any other obsolete or non-essential elements. 
This I matches the address itself: - my $atom = qr{[a-zA-Z0-9_!#\$\%&'*+/=?\^`{}~|\-]+}; - my $dot_atom = qr{$atom(?:\.$atom)*}; - my $quoted = qr{"(?:\\[^\r\n]|[^\\"])*"}; - my $local = qr{(?:$dot_atom|$quoted)}; - my $quotedpair = qr{\\[\x00-\x09\x0B-\x0c\x0e-\x7e]}; - my $domain_lit = qr{\[(?:$quotedpair|[\x21-\x5a\x5e-\x7e])*\]}; - my $domain = qr{(?:$dot_atom|$domain_lit)}; - my $addr_spec = qr{$local\@$domain}; + my $atom = qr{[a-zA-Z0-9_!#\$\%&'*+/=?\^`{}~|\-]+}; + my $dot_atom = qr{$atom(?:\.$atom)*}; + my $quoted = qr{"(?:\\[^\r\n]|[^\\"])*"}; + my $local = qr{(?:$dot_atom|$quoted)}; + my $quotedpair = qr{\\[\x00-\x09\x0B-\x0c\x0e-\x7e]}; + my $domain_lit = qr{\[(?:$quotedpair|[\x21-\x5a\x5e-\x7e])*\]}; + my $domain = qr{(?:$dot_atom|$domain_lit)}; + my $addr_spec = qr{$local\@$domain}; Just match an address against C to see if it follows the RFC2822 specification. However, because it is impossible to be @@ -464,8 +467,8 @@ with the characters reversed, one added or subtracted to each digit, etc. The MIME-Base64 package (available from CPAN) handles this as well as the MIME/QP encoding. Decoding BASE64 becomes as simple as: - use MIME::Base64; - $decoded = decode_base64($encoded); + use MIME::Base64; + $decoded = decode_base64($encoded); The MIME-Tools package (available from CPAN) supports extraction with decoding of BASE64 encoded attachments and content directly from email @@ -475,10 +478,10 @@ If the string to decode is short (less than 84 bytes long) a more direct approach is to use the unpack() function's "u" format after minor transliterations: - tr#A-Za-z0-9+/##cd; # remove non-base64 chars - tr#A-Za-z0-9+/# -_#; # convert to uuencoded format - $len = pack("c", 32 + 0.75*length); # compute length byte - print unpack("u", $len . $_); # uudecode and print + tr#A-Za-z0-9+/##cd; # remove non-base64 chars + tr#A-Za-z0-9+/# -_#; # convert to uuencoded format + $len = pack("c", 32 + 0.75*length); # compute length byte + print unpack("u", $len . 
$_); # uudecode and print =head2 How do I return the user's mail address? @@ -486,8 +489,8 @@ On systems that support getpwuid, the $< variable, and the Sys::Hostname module (which is part of the standard perl distribution), you can probably try using something like this: - use Sys::Hostname; - $address = sprintf('%s@%s', scalar getpwuid($<), hostname); + use Sys::Hostname; + $address = sprintf('%s@%s', scalar getpwuid($<), hostname); Company policies on mail address can mean that this generates addresses that the company's mail system will not accept, so you should ask for @@ -504,17 +507,17 @@ Again, the best way is often just to ask the user. Use the C program directly: - open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq") - or die "Can't fork for sendmail: $!\n"; - print SENDMAIL <<"EOF"; - From: User Originating Mail - To: Final Destination - Subject: A relevant subject line + open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq") + or die "Can't fork for sendmail: $!\n"; + print SENDMAIL <<"EOF"; + From: User Originating Mail + To: Final Destination + Subject: A relevant subject line - Body of the message goes here after the blank line - in as many lines as you like. - EOF - close(SENDMAIL) or warn "sendmail didn't close nicely"; + Body of the message goes here after the blank line + in as many lines as you like. + EOF + close(SENDMAIL) or warn "sendmail didn't close nicely"; The B<-oi> option prevents sendmail from interpreting a line consisting of a single dot as "end of message". The B<-t> option says to use the @@ -530,16 +533,16 @@ probably sendmail. 
Or you might be able use the CPAN module Mail::Mailer: - use Mail::Mailer; + use Mail::Mailer; - $mailer = Mail::Mailer->new(); - $mailer->open({ From => $from_address, - To => $to_address, - Subject => $subject, - }) - or die "Can't open: $!\n"; - print $mailer $body; - $mailer->close(); + $mailer = Mail::Mailer->new(); + $mailer->open({ From => $from_address, + To => $to_address, + Subject => $subject, + }) + or die "Can't open: $!\n"; + print $mailer $body; + $mailer->close(); The Mail::Internet module uses Net::SMTP which is less Unix-centric than Mail::Mailer, but less reliable. Avoid raw SMTP commands. There @@ -551,31 +554,31 @@ include queuing, MX records, and security. This answer is extracted directly from the MIME::Lite documentation. Create a multipart message (i.e., one with attachments). - use MIME::Lite; + use MIME::Lite; - ### Create a new multipart message: - $msg = MIME::Lite->new( - From =>'me@myhost.com', - To =>'you@yourhost.com', - Cc =>'some@other.com, some@more.com', - Subject =>'A message with 2 parts...', - Type =>'multipart/mixed' - ); + ### Create a new multipart message: + $msg = MIME::Lite->new( + From =>'me@myhost.com', + To =>'you@yourhost.com', + Cc =>'some@other.com, some@more.com', + Subject =>'A message with 2 parts...', + Type =>'multipart/mixed' + ); - ### Add parts (each "attach" has same arguments as "new"): - $msg->attach(Type =>'TEXT', - Data =>"Here's the GIF file you wanted" - ); - $msg->attach(Type =>'image/gif', - Path =>'aaa000123.gif', - Filename =>'logo.gif' - ); + ### Add parts (each "attach" has same arguments as "new"): + $msg->attach(Type =>'TEXT', + Data =>"Here's the GIF file you wanted" + ); + $msg->attach(Type =>'image/gif', + Path =>'aaa000123.gif', + Filename =>'logo.gif' + ); - $text = $msg->as_string; + $text = $msg->as_string; MIME::Lite also includes a method for sending these things. - $msg->send; + $msg->send; This defaults to using L but can be customized to use SMTP via L. 
@@ -587,30 +590,30 @@ MailFolder package) or the Mail::Internet module from CPAN (part of the MailTools package), often a module is overkill. Here's a mail sorter. - #!/usr/bin/perl - - my(@msgs, @sub); - my $msgno = -1; - $/ = ''; # paragraph reads - while (<>) { - if (/^From /m) { - /^Subject:\s*(?:Re:\s*)*(.*)/mi; - $sub[++$msgno] = lc($1) || ''; - } - $msgs[$msgno] .= $_; - } - for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) { - print $msgs[$i]; - } + #!/usr/bin/perl + + my(@msgs, @sub); + my $msgno = -1; + $/ = ''; # paragraph reads + while (<>) { + if (/^From /m) { + /^Subject:\s*(?:Re:\s*)*(.*)/mi; + $sub[++$msgno] = lc($1) || ''; + } + $msgs[$msgno] .= $_; + } + for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) { + print $msgs[$i]; + } Or more succinctly, - #!/usr/bin/perl -n00 - # bysub2 - awkish sort-by-subject - BEGIN { $msgno = -1 } - $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m; - $msg[$msgno] .= $_; - END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] } + #!/usr/bin/perl -n00 + # bysub2 - awkish sort-by-subject + BEGIN { $msgno = -1 } + $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m; + $msg[$msgno] .= $_; + END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] } =head2 How do I find out my hostname, domainname, or IP address? X function from the module, which also comes with perl. - use Socket; + use Socket; - my $address = inet_ntoa( - scalar gethostbyname( $host || 'localhost' ) - ); + my $address = inet_ntoa( + scalar gethostbyname( $host || 'localhost' ) + ); =head2 How do I fetch a news article or the active newsgroups? Use the Net::NNTP or News::NNTPClient modules, both available from CPAN. 
This can make tasks like fetching the newsgroup list as simple as - perl -MNews::NNTPClient - -e 'print News::NNTPClient->new->list("newsgroups")' + perl -MNews::NNTPClient + -e 'print News::NNTPClient->new->list("newsgroups")' =head2 How do I fetch/put an FTP file? @@ -666,15 +669,15 @@ http://search.cpan.org/search?query=RPC&mode=all ). =head1 REVISION -Revision: $Revision: 8539 $ +Revision: $Revision$ -Date: $Date: 2007-01-11 00:07:14 +0100 (Thu, 11 Jan 2007) $ +Date: $Date$ See L for source control details and availability. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and +Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved. This documentation is free; you can redistribute it and/or modify it