From: Gurusamy Sarathy Date: Mon, 24 May 1999 06:26:48 +0000 (+0000) Subject: perlfaq update from Tom Christiansen X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=d92eb7b0e84a41728b3fbb642691f159dbe28882;p=p5sagit%2Fp5-mst-13.2.git perlfaq update from Tom Christiansen p4raw-id: //depot/perl@3459 --- diff --git a/pod/perlfaq.pod b/pod/perlfaq.pod index cb35493..56cf3d7 100644 --- a/pod/perlfaq.pod +++ b/pod/perlfaq.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq - frequently asked questions about Perl ($Date: 1999/01/08 05:54:52 $) +perlfaq - frequently asked questions about Perl ($Date: 1999/05/23 20:38:02 $) =head1 DESCRIPTION @@ -199,6 +199,8 @@ miscellaneous data issues. =item * How do I find the week-of-the-year/day-of-the-year? +=item * How do I find the current century or millennium? + =item * How can I compare two dates and find the difference? =item * How can I take a string and turn it into epoch seconds? @@ -254,7 +256,7 @@ miscellaneous data issues. =item * What is the difference between $array[1] and @array[1]? -=item * How can I extract just the unique elements of an array? +=item * How can I remove duplicate elements from a list or array? =item * How can I tell whether a list or array contains a certain element? @@ -381,6 +383,8 @@ I/O and the "f" issues: filehandles, flushing, formats and footers. =item * How do I print to more than one file at once? +=item * How can I read in an entire file all at once? + =item * How can I read in a file by paragraphs? =item * How can I read a single character from a file? From the keyboard? @@ -426,7 +430,7 @@ Pattern matching and regular expressions. =item * How can I match a locale-smart version of C? -=item * How can I quote a variable to use in a regexp? +=item * How can I quote a variable to use in a regex? =item * What is C really for? @@ -434,7 +438,7 @@ Pattern matching and regular expressions. =item * Can I use Perl regular expressions to match balanced text? 
-=item * What does it mean that regexps are greedy? How can I get around it? +=item * What does it mean that regexes are greedy? How can I get around it? =item * How do I process each word on each line? @@ -450,7 +454,7 @@ Pattern matching and regular expressions. =item * What good is C<\G> in a regular expression? -=item * Are Perl regexps DFAs or NFAs? Are they POSIX compliant? +=item * Are Perl regexes DFAs or NFAs? Are they POSIX compliant? =item * What's wrong with using grep or map in a void context? @@ -470,7 +474,7 @@ other sections. =item * Can I get a BNF/yacc/RE for the Perl language? -=item * What are all these $@%* punctuation signs, and how do I know when to use them? +=item * What are all these $@%&* punctuation signs, and how do I know when to use them? =item * Do I always/never have to quote my strings or use semicolons and commas? @@ -494,7 +498,7 @@ other sections. =item * What is variable suicide and how can I prevent it? -=item * How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regexp}? +=item * How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regex}? =item * How do I create a static variable? @@ -522,6 +526,8 @@ other sections. =item * How do I clear a package? +=item * How can I use a variable as a variable name? + =back @@ -620,7 +626,7 @@ Interprocess communication (IPC), control over the user-interface =item * How do I open a file without blocking? -=item * How do I install a CPAN module? +=item * How do I install a module from CPAN? =item * What's the difference between require and use? @@ -758,6 +764,15 @@ in respect of this information or its use. =over 4 +=item 23/May/99 + +Extensive updates from the net in preparation for 5.006 release. + +=item 13/April/99 + +More minor touch-ups. Added new question at the end +of perlfaq7 on variable names within variables. + =item 7/January/99 Small touchups here and there. 
Added all questions in this @@ -816,4 +831,3 @@ This is the initial release of version 3 of the FAQ; consequently there have been no changes since its initial release. =back - diff --git a/pod/perlfaq1.pod b/pod/perlfaq1.pod index d4cac42..7566bf5 100644 --- a/pod/perlfaq1.pod +++ b/pod/perlfaq1.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq1 - General Questions About Perl ($Revision: 1.20 $, $Date: 1999/01/08 04:22:09 $) +perlfaq1 - General Questions About Perl ($Revision: 1.23 $, $Date: 1999/05/23 16:08:30 $) =head1 DESCRIPTION @@ -56,13 +56,13 @@ You should definitely use version 5. Version 4 is old, limited, and no longer maintained; its last patch (4.036) was in 1992, long ago and far away. Sure, it's stable, but so is anything that's dead; in fact, perl4 had been called a dead, flea-bitten camel carcass. The most recent -production release is 5.005_02 (although 5.004_04 is still supported). -The most cutting-edge development release is 5.005_54. Further references +production release is 5.005_03 (although 5.004_05 is still supported). +The most cutting-edge development release is 5.005_57. Further references to the Perl language in this document refer to the production release -unless otherwise specified. There may be one or more official bug -fixes for 5.005_02 by the time you read this, and also perhaps some -experimental versions on the way to the next release. All releases -prior to 5.004 were subject to buffer overruns, a grave security issue. +unless otherwise specified. There may be one or more official bug fixes +by the time you read this, and also perhaps some experimental versions +on the way to the next release. All releases prior to 5.004 were subject +to buffer overruns, a grave security issue. =head2 What are perl4 and perl5? @@ -96,7 +96,7 @@ found in release 5. Written in nominally portable C++, Topaz hopes to maintain 100% source-compatibility with previous releases of Perl but to run significantly faster and smaller. 
The Topaz team hopes to provide an XS compatibility interface to allow most XS modules to work unchanged, -albeit perhaps without the efficiency that the new interface uowld allow. +albeit perhaps without the efficiency that the new interface would allow. New features in Topaz are as yet undetermined, and will be addressed once compatibility and performance goals are met. @@ -309,11 +309,11 @@ as soon as possible. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington. +Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved. When included as an integrated part of the Standard Distribution -of Perl or of its documentation (printed or otherwise), this work is +of Perl or of its documentation (printed or otherwise), this works is covered under Perl's Artistic Licence. For separate distributions of all or part of this FAQ outside of that, see L. @@ -322,4 +322,3 @@ domain. You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required. - diff --git a/pod/perlfaq2.pod b/pod/perlfaq2.pod index 32970af..26865c7 100644 --- a/pod/perlfaq2.pod +++ b/pod/perlfaq2.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq2 - Obtaining and Learning about Perl ($Revision: 1.30 $, $Date: 1998/12/29 19:43:32 $) +perlfaq2 - Obtaining and Learning about Perl ($Revision: 1.31 $, $Date: 1999/04/14 03:46:19 $) =head1 DESCRIPTION @@ -45,8 +45,10 @@ Some URLs that might help you are: http://www.perl.com/latest/ http://www.perl.com/CPAN/ports/ -If you want information on proprietary systems. A simple installation -guide for MS-DOS is available at http://www.cs.ruu.nl/~piet/perl5dos.html +Someone looking for a Perl for Win16 might look to LMOLNAR's djgpp +port in http://www.perl.com/CPAN/ports/msdos/ , which comes with clear +installation instructions. 
A simple installation guide for MS-DOS using +IlyaZ's OS/2 port is available at http://www.cs.ruu.nl/~piet/perl5dos.html and similarly for Windows 3.1 at http://www.cs.ruu.nl/~piet/perlwin3.html . =head2 I don't have a C compiler on my system. How can I compile perl? @@ -364,7 +366,7 @@ let perlfaq-suggestions@perl.com know. =head2 Where can I buy a commercial version of Perl? -In a real sense, Perl already I commercial software: It has a licence +In a real sense, Perl already I commercial software: It has a license that you can grab and carefully read to your manager. It is distributed in releases and comes in well-defined packages. There is a very large user community and an extensive literature. The comp.lang.perl.* @@ -379,7 +381,7 @@ purchase order from a company whom they can sue should anything go awry. Or maybe they need very serious hand-holding and contractual obligations. Shrink-wrapped CDs with perl on them are available from several sources if that will help. For example, many perl books carry a perl distribution -on them, as do the O'Reily Perl Resource Kits (in both the Unix flavor +on them, as do the O'Reilly Perl Resource Kits (in both the Unix flavor and in the proprietary Microsoft flavor); the free Unix distributions also all come with Perl. @@ -447,8 +449,8 @@ Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington. All rights reserved. When included as an integrated part of the Standard Distribution -of Perl or of its documentation (printed or otherwise), this work is -covered under Perl's Artistic Licence. For separate distributions of +of Perl or of its documentation (printed or otherwise), this works is +covered under Perl's Artistic License. For separate distributions of all or part of this FAQ outside of that, see L. Irrespective of its distribution, all code examples here are public @@ -456,4 +458,3 @@ domain. 
You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required. - diff --git a/pod/perlfaq3.pod b/pod/perlfaq3.pod index a811c3c..4e56a54 100644 --- a/pod/perlfaq3.pod +++ b/pod/perlfaq3.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq3 - Programming Tools ($Revision: 1.33 $, $Date: 1998/12/29 20:12:12 $) +perlfaq3 - Programming Tools ($Revision: 1.38 $, $Date: 1999/05/23 16:08:30 $) =head1 DESCRIPTION @@ -19,7 +19,7 @@ Have you read the appropriate man pages? Here's a brief index: Objects perlref, perlmod, perlobj, perltie Data Structures perlref, perllol, perldsc Modules perlmod, perlmodlib, perlsub - Regexps perlre, perlfunc, perlop, perllocale + Regexes perlre, perlfunc, perlop, perllocale Moving to perl5 perltrap, perl Linking w/C perlxstut, perlxs, perlcall, perlguts, perlembed Various http://www.perl.com/CPAN/doc/FMTEYEWTK/index.html @@ -130,7 +130,7 @@ can provide significant assistance. Tom swears by the following settings in vi and its clones: set ai sw=4 - map ^O {^M}^[O^T + map! ^O {^M}^[O^T Now put that in your F<.exrc> file (replacing the caret characters with control characters) and away you go. In insert mode, ^T is @@ -147,31 +147,42 @@ results are not particularly satisfying for sophisticated code. The a2ps at http://www.infres.enst.fr/~demaille/a2ps/ does lots of things related to generating nicely printed output of documents. -=head2 Is there a etags/ctags for perl? +=head2 Is there a ctags for Perl? -With respect to the source code for the Perl interpreter, yes. -There has been support for etags in the source for a long time. -Ctags was introduced in v5.005_54 (and probably 5.005_03). -After building perl, type 'make etags' or 'make ctags' and both -sets of tag files will be built. 
- -Now, if you're looking to build a tag file for perl code, then there's -a simple one at +There's a simple one at http://www.perl.com/CPAN/authors/id/TOMC/scripts/ptags.gz which may do the trick. And if not, it's easy to hack into what you want. =head2 Is there an IDE or Windows Perl Editor? -If you're on Unix, you already have an IDE -- Unix itself. -You just have to learn the toolbox. If you're not, then you -probably don't have a toolbox, so may need something else. - -PerlBuilder (XXX URL to follow) is an integrated development -environment for Windows that supports Perl development. Perl programs -are just plain text, though, so you could download emacs for Windows -(XXX) or vim for win32 (http://www.cs.vu.nl/~tmgil/vi.html). If -you're transferring Windows files to Unix, be sure to transfer in -ASCII mode so the ends of lines are appropriately converted. +If you're on Unix, you already have an IDE -- Unix itself. This powerful +IDE derives from its interoperability, flexibility, and configurability. +If you really want to get a feel for Unix-qua-IDE, the best thing to do +is to find some high-powered programmer whose native language is Unix. +Find someone who has been at this for many years, and just sit back +and watch them at work. They have created their own IDE, one that +suits their own tastes and aptitudes. Quietly observe them edit files, +move them around, compile them, debug them, test them, etc. The entire +development *is* integrated, like a top-of-the-line German sports car: +functional, powerful, and elegant. You will be absolutely astonished +at the speed and ease exhibited by the native speaker of Unix in his +home territory. The art and skill of a virtuoso can only be seen to be +believed. 
That is the path to mastery -- all these cobbled little IDEs +are expensive toys designed to sell a flashy demo using cheap tricks, +and being optimized for immediate but shallow understanding rather than +enduring use, are but a dim palimpsest of real tools. + +In short, you just have to learn the toolbox. However, if you're not +on Unix, then your vendor probably didn't bother to provide you with +a proper toolbox on the so-called complete system that you forked out +your hard-earned cash on. + +PerlBuilder (XXX URL to follow) is an integrated development environment +for Windows that supports Perl development. Perl programs are just plain +text, though, so you could download emacs for Windows (???) or a vi clone +(vim) which runs on for win32 (http://www.cs.vu.nl/~tmgil/vi.html). +If you're transferring Windows files to Unix, be sure to transfer in +ASCII mode so the ends of lines are appropriately mangled. =head2 Where can I get Perl macros for vi? @@ -192,7 +203,7 @@ which contains a cperl-mode that color-codes keywords, provides context-sensitive help, and other nifty things. Note that the perl-mode of emacs will have fits with C<"main'foo"> -(single quote), and mess up the indentation and hilighting. You +(single quote), and mess up the indentation and highlighting. You are probably using C<"main::foo"> in new Perl code anyway, so this shouldn't be an issue. @@ -368,7 +379,7 @@ care. See http://www.perl.com/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/ . A non-free, commercial product, ``The Velocity Engine for Perl'', -(http://www.binevolve.com/ or +(http://www.binevolve.com/ or http://www.binevolve.com/bine/vep) might also be worth looking at. It will allow you to increase the performance of your perl scripts, upto 25 times faster than normal CGI perl by running in persistent perl mode, or 4 to 5 times faster without any @@ -404,12 +415,12 @@ your code, but none can definitively conceal it (this is true of every language, not just Perl). 
If you're concerned about people profiting from your code, then the -bottom line is that nothing but a restrictive licence will give you +bottom line is that nothing but a restrictive license will give you legal security. License your software and pepper it with threatening statements like ``This is unpublished proprietary software of XYZ Corp. Your access to it does not give you permission to use it blah blah blah.'' We are not lawyers, of course, so you should see a lawyer if -you want to be sure your licence's wording will stand up in court. +you want to be sure your license's wording will stand up in court. =head2 How can I compile my Perl program into byte code or C? @@ -435,7 +446,7 @@ because as currently written, all programs are prepared for a full eval() statement. You can tremendously reduce this cost by building a shared I library and linking against that. See the F podfile in the perl source distribution for details. If -you link your main perl binary with this, it will make it miniscule. +you link your main perl binary with this, it will make it minuscule. For example, on one author's system, F is only 11k in size! @@ -470,13 +481,12 @@ F file in the source distribution for more information). The Win95/NT installation, when using the ActiveState port of Perl, will modify the Registry to associate the C<.pl> extension with the -perl interpreter. If you install another port (Gurusamy Sarathy's is -the recommended Win95/NT port), or (eventually) build your own -Win95/NT Perl using a Windows port of gcc (e.g., with cygwin32 or -mingw32), then you'll have to modify the Registry yourself. In -addition to associating C<.pl> with the interpreter, NT people can -use: C to let them run the program -C merely by typing C. +perl interpreter. If you install another port, perhaps even building +your own Win95/NT Perl from the standard sources by using a Windows port +of gcc (e.g., with cygwin32 or mingw32), then you'll have to modify +the Registry yourself. 
In addition to associating C<.pl> with the +interpreter, NT people can use: C to let them +run the program C merely by typing C. Macintosh perl scripts will have the appropriate Creator and Type, so that double-clicking them will invoke the perl application. @@ -570,7 +580,7 @@ when it runs fine on the command line'', see these sources: http://www.boutell.com/faq/ CGI FAQ - http://www.webthing.com/tutorials/cgifaq.html + http://www.webthing.com/page.cgi/cgifaq HTTP Spec http://www.w3.org/pub/WWW/Protocols/HTTP/ @@ -585,7 +595,6 @@ when it runs fine on the command line'', see these sources: CGI Security FAQ http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt -Also take a look at L =head2 Where can I learn about object-oriented Perl programming? @@ -641,8 +650,8 @@ Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington. All rights reserved. When included as an integrated part of the Standard Distribution -of Perl or of its documentation (printed or otherwise), this work is -covered under Perl's Artistic Licence. For separate distributions of +of Perl or of its documentation (printed or otherwise), this works is +covered under Perl's Artistic License. For separate distributions of all or part of this FAQ outside of that, see L. Irrespective of its distribution, all code examples here are public @@ -650,4 +659,3 @@ domain. You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required. - diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod index 92aee2c..700c42a 100644 --- a/pod/perlfaq4.pod +++ b/pod/perlfaq4.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq4 - Data Manipulation ($Revision: 1.40 $, $Date: 1999/01/08 04:26:39 $) +perlfaq4 - Data Manipulation ($Revision: 1.49 $, $Date: 1999/05/23 20:37:49 $) =head1 DESCRIPTION @@ -104,14 +104,21 @@ are not guaranteed. 
=head2 How do I convert bits into ints? To turn a string of 1s and 0s like C<10110110> into a scalar containing -its binary value, use the pack() function (documented in -L): +its binary value, use the pack() and unpack() functions (documented in +L): - $decimal = pack('B8', '10110110'); + $decimal = unpack('C', pack('B8', '10110110')); + +This packs the string C<10110110> into an eight bit binary structure. +This is then unpacked as an unsigned character, which returns its ordinal value. + +This does the same thing: + + $decimal = ord(pack('B8', '10110110')); Here's an example of going the other way: - $binary_string = join('', unpack('B*', "\x29")); + $binary_string = unpack('B*', "\x29"); =head2 Why doesn't & work the way I want it to? @@ -228,12 +235,34 @@ American businesses often consider the first week with a Monday in it to be Work Week #1, despite ISO 8601, which considers WW1 to be the first week with a Thursday in it. +=head2 How do I find the current century or millennium? + +Use the following simple functions: + + sub get_century { + return int((((localtime(shift || time))[5] + 1999))/100); + } + sub get_millennium { + return 1+int((((localtime(shift || time))[5] + 1899))/1000); + } + +On some systems, you'll find that the POSIX module's strftime() function +has been extended in a non-standard way to use a C<%C> format, which they +sometimes claim is the "century". It isn't, because on most such systems, +this is only the first two digits of the four-digit year, and thus cannot +be used to reliably determine the current century or millennium. + =head2 How can I compare two dates and find the difference? If you're storing your dates as epoch seconds then simply subtract one from the other. If you've got a structured date (distinct year, day, -month, hour, minute, seconds values) then use one of the Date::Manip -and Date::Calc modules from CPAN. 
+month, hour, minute, seconds values), then for reasons of accessibility, +simplicity, and efficiency, merely use either timelocal or timegm (from +the Time::Local module in the standard distribution) to reduce structured +dates to epoch seconds. However, if you don't know the precise format of +your dates, then you should probably use either of the Date::Manip and +Date::Calc modules from CPAN before you go hacking up your own parsing +routine to handle arbitrary date formats. =head2 How can I take a string and turn it into epoch seconds? @@ -244,22 +273,83 @@ and Date::Manip modules from CPAN. =head2 How can I find the Julian Day? -Neither Date::Manip nor Date::Calc deal with Julian days. Instead, -there is an example of Julian date calculation that should help you in -Time::JulianDay (part of the Time-modules bundle) which can be found at -http://www.perl.com/CPAN/modules/by-module/Time/. +You could use Date::Calc's Delta_Days function and calculate the number +of days from there. Assuming that's what you really want, that is. + +Before you immerse yourself too deeply in this, be sure to verify that it +is the I<Julian> Day you really want. Are you really just interested in +a way of getting serial days so that you can do date arithmetic? If you +are interested in performing date arithmetic, this can be done using +either Date::Manip or Date::Calc, without converting to Julian Day first. + +There is too much confusion on this issue to cover in this FAQ, but the +term is applied (correctly) to a calendar now supplanted by the Gregorian +Calendar, with the Julian Calendar failing to adjust properly for leap +years on centennial years (among other annoyances). The term is also used +(incorrectly) to mean: [1] days in the Gregorian Calendar; and [2] days +since a particular starting time or `epoch', usually 1970 in the Unix +world and 1980 in the MS-DOS/Windows world. 
If you find that it is not +the first meaning that you really want, then check out the Date::Manip +and Date::Calc modules. (Thanks to David Cassell for most of this text.) +There is also an example of Julian date calculation that should help you in +http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz =head2 How do I find yesterday's date? The C<time()> function returns the current time in seconds since the -epoch. Take one day off that: +epoch. Take twenty-four hours off that: $yesterday = time() - ( 24 * 60 * 60 ); Then you can pass this to C<localtime()> and get the individual year, month, day, hour, minute, seconds values. +Note very carefully that the code above assumes that your days are +twenty-four hours each. For most people, there are two days a year +when they aren't: the switch to and from summer time throws this off. +A solution to this issue is offered by Russ Allbery. + + sub yesterday { + my $now = defined $_[0] ? $_[0] : time; + my $then = $now - 60 * 60 * 24; + my $ndst = (localtime $now)[8] > 0; + my $tdst = (localtime $then)[8] > 0; + $then - ($tdst - $ndst) * 60 * 60; + } + # Should give you "this time yesterday" in seconds since epoch relative to + # the first argument or the current time if no argument is given and + # suitable for passing to localtime or whatever else you need to do with + # it. $ndst is whether we're currently in daylight savings time; $tdst is + # whether the point 24 hours ago was in daylight savings time. If $tdst + # and $ndst are the same, a boundary wasn't crossed, and the correction + # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more + # from yesterday's time since we gained an extra hour while going off + # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a + # negative hour (add an hour) to yesterday's time since we lost an hour. + # + # All of this is because during those days when one switches off or onto + # DST, a "day" isn't 24 hours long; it's either 23 or 25. 
+ # + # The explicit settings of $ndst and $tdst are necessary because localtime + # only says it returns the system tm struct, and the system tm struct at + # least on Solaris doesn't guarantee any particular positive value (like, + # say, 1) for isdst, just a positive value. And that value can + # potentially be negative, if DST information isn't available (this sub + # just treats those cases like no DST). + # + # Note that between 2am and 3am on the day after the time zone switches + # off daylight savings time, the exact hour of "yesterday" corresponding + # to the current hour is not clearly defined. Note also that if used + # between 2am and 3am the day after the change to daylight savings time, + # the result will be between 3am and 4am of the previous day; it's + # arguable whether this is correct. + # + # This sub does not attempt to deal with leap seconds (most things don't). + # + # Copyright relinquished 1999 by Russ Allbery + # This code is in the public domain + =head2 Does Perl have a year 2000 problem? Is Perl Y2K compliant? Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is @@ -312,7 +402,11 @@ This won't expand C<"\n"> or C<"\t"> or any other special escapes. To turn C<"abbcccd"> into C<"abccd">: - s/(.)\1/$1/g; + s/(.)\1/$1/g; # add /s to include newlines + +Here's a solution that turns "abbcccd" to "abcd": + + y///cs; # y == tr, but shorter :-) =head2 How do I expand function calls in a string? @@ -353,7 +447,7 @@ Dominus's excellent I<py> tool at http://www.plover.com/~mjd/perl/py/ One simple destructive, inside-out approach that you might try is to pull out the smallest nesting parts one at a time: - while (s//BEGIN((?:(?!BEGIN)(?!END).)*)END/gs) { + while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) { # do something with $1 } @@ -422,24 +516,25 @@ likely prefer: You have to keep track of N yourself. 
For example, let's say you want to change the fifth occurrence of C<"whoever"> or C<"whomever"> into -C<"whosoever"> or C<"whomsoever">, case insensitively. +C<"whosoever"> or C<"whomsoever">, case insensitively. These +all assume that $_ contains the string to be altered. $count = 0; s{((whom?)ever)}{ ++$count == 5 # is it the 5th? ? "${2}soever" # yes, swap : $1 # renege and leave it there - }igex; + }ige; In the more general case, you can use the C</g> modifier in a C<while> loop, keeping count of matches. $WANT = 3; $count = 0; + $_ = "One fish two fish red fish blue fish"; while (/(\w+)\s+fish\b/gi) { if (++$count == $WANT) { print "The third fish is a $1 one.\n"; - # Warning: don't `last' out of this loop } } @@ -456,7 +551,7 @@ C<tr///> function like so: $string = "ThisXlineXhasXsomeXx'sXinXit"; $count = ($string =~ tr/X//); - print "There are $count X charcters in the string"; + print "There are $count X characters in the string"; This is fine if you are just looking for a single character. However, if you are trying to count multiple character substrings within a @@ -499,7 +594,7 @@ characters by placing a C<use locale> pragma in your program. See L<perllocale> for endless details on locales. This is sometimes referred to as putting something into "title -case", but that's not quite accurate. Consdier the proper +case", but that's not quite accurate. Consider the proper capitalization of the movie I<Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb>, for example. @@ -546,8 +641,8 @@ Although the simplest approach would seem to be: $string =~ s/^\s*(.*?)\s*$/$1/; -This is unnecessarily slow, destructive, and fails with embedded newlines. -It is much better faster to do this in two steps: +Not only is this unnecessarily slow and destructive, it also fails with +embedded newlines. It is much faster to do this operation in two steps: $string =~ s/^\s+//; $string =~ s/\s+$//; @@ -562,7 +657,7 @@ Or more nicely written as: This idiom takes advantage of the C<foreach> loop's aliasing behavior to factor out common code. 
You can do this on several strings at once, or arrays, or even the -values of a hash if you use a slide: +values of a hash if you use a slice: # trim whitespace in the scalar, the array, # and all the values in the hash @@ -573,41 +668,48 @@ values of a hash if you use a slide: =head2 How do I pad a string with blanks or pad a number with zeroes? -(This answer contributed by Uri Guttman) +(This answer contributed by Uri Guttman, with kibitzing from +Bart Lateur.) In the following examples, C<$pad_len> is the length to which you wish -to pad the string, C<$text> or C<$num> contains the string to be -padded, and C<$pad_char> contains the padding character. You can use a -single character string constant instead of the C<$pad_char> variable -if you know what it is in advance. +to pad the string, C<$text> or C<$num> contains the string to be padded, +and C<$pad_char> contains the padding character. You can use a single +character string constant instead of the C<$pad_char> variable if you +know what it is in advance. And in the same way you can use an integer in +place of C<$pad_len> if you know the pad length in advance. -The simplest method use the C<sprintf> function. It can pad on the -left or right with blanks and on the left with zeroes. +The simplest method uses the C<sprintf> function. It can pad on the left +or right with blanks and on the left with zeroes and it will not +truncate the result. The C<pack> function can only pad strings on the +right with blanks and it will truncate the result to a maximum length of +C<$pad_len>. 
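The difference between the two padding functions described above (sprintf pads without truncating, pack with the "A" template pads on the right and truncates) can be seen in a small sketch. The values of $pad_len, $short, and $long are hypothetical and chosen only for illustration; they are not part of the patch.

```perl
use strict;
use warnings;

# Hypothetical sample values, for illustration only.
my $pad_len = 6;
my $short   = "abc";
my $long    = "abcdefghij";

# sprintf pads to at least $pad_len characters and never truncates.
my $left   = sprintf("%${pad_len}s",  $short);   # blanks on the left
my $right  = sprintf("%-${pad_len}s", $short);   # blanks on the right
my $zeroes = sprintf("%0${pad_len}d", 42);       # zeroes on the left
my $kept   = sprintf("%-${pad_len}s", $long);    # longer input is left alone

# pack's "A" template pads on the right with blanks and cuts the
# result down to exactly $pad_len characters.
my $packed = pack("A$pad_len", $short);
my $cut    = pack("A$pad_len", $long);

print "[$_]\n" for $left, $right, $zeroes, $kept, $packed, $cut;
```

If silent truncation would lose data, sprintf is likely the safer choice; pack fits better when a fixed-width field is what is actually wanted.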
- # Left padding with blank: - $padded = sprintf( "%${pad_len}s", $text ) ; + # Left padding a string with blanks (no truncation): + $padded = sprintf("%${pad_len}s", $text); - # Right padding with blank: - $padded = sprintf( "%${pad_len}s", $text ) ; + # Right padding a string with blanks (no truncation): + $padded = sprintf("%-${pad_len}s", $text); - # Left padding with 0: - $padded = sprintf( "%0${pad_len}d", $num ) ; + # Left padding a number with 0 (no truncation): + $padded = sprintf("%0${pad_len}d", $num); -If you need to pad with a character other than blank or zero you can use -one of the following methods. + # Right padding a string with blanks using pack (will truncate): + $padded = pack("A$pad_len",$text); -These methods generate a pad string with the C<x> operator and -concatenate that with the original text. +If you need to pad with a character other than blank or zero you can use +one of the following methods. They all generate a pad string with the +C<x> operator and combine that with C<$text>. These methods do +not truncate C<$text>. -Left and right padding with any character: +Left and right padding with any character, creating a new string: - $padded = $pad_char x ( $pad_len - length( $text ) ) . $text ; - $padded = $text . $pad_char x ( $pad_len - length( $text ) ) ; + $padded = $pad_char x ( $pad_len - length( $text ) ) . $text; + $padded = $text . $pad_char x ( $pad_len - length( $text ) ); -Or you can left or right pad $text directly: +Left and right padding with any character, modifying C<$text> directly: - $text .= $pad_char x ( $pad_len - length( $text ) ) ; - substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) ) ; + substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) ); + $text .= $pad_char x ( $pad_len - length( $text ) ); =head2 How do I extract selected columns from a string? @@ -634,6 +736,13 @@ you can use this kind of thing: =head2 How do I find the soundex value of a string? 
Use the standard Text::Soundex module distributed with perl. +But before you do so, you may want to determine whether `soundex' is in +fact what you think it is. Knuth's soundex algorithm compresses words +into a small space, and so it does not necessarily distinguish between +two words which you might want to appear separately. For example, the +last names `Knuth' and `Kant' are both mapped to the soundex code K530. +If Text::Soundex does not do what you are looking for, you might want +to consider the String::Approx module available at CPAN. =head2 How can I expand variables in text strings? @@ -767,7 +876,7 @@ This works with leading special strings, dynamically determined: @@@ runops() { @@@ SAVEI32(runlevel); @@@ runlevel++; - @@@ while ( op = (*op->op_ppaddr)() ) ; + @@@ while ( op = (*op->op_ppaddr)() ); @@@ TAINT_NOT; @@@ return 0; @@@ } @@ -805,9 +914,9 @@ When you say $scalar = (2, 5, 7, 9); -you're using the comma operator in scalar context, so it evaluates the -left hand side, then evaluates and returns the left hand side. This -causes the last value to be returned: 9. +you're using the comma operator in scalar context, so it uses the scalar +comma operator. There never was a list there at all! This causes the +last value to be returned: 9. =head2 What is the difference between $array[1] and @array[1]? @@ -827,7 +936,7 @@ with The B<-w> flag will warn you about these matters. -=head2 How can I extract just the unique elements of an array? +=head2 How can I remove duplicate elements from a list or array? There are several possible ways, depending on whether the array is ordered and whether you wish to preserve the ordering. @@ -893,7 +1002,8 @@ array. 
This kind of an array will take up less space: @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31); undef @is_tiny_prime; - for (@primes) { $is_tiny_prime[$_] = 1; } + for (@primes) { $is_tiny_prime[$_] = 1 } + # or simply @is_tiny_prime[@primes] = (1) x @primes; Now you check whether $is_tiny_prime[$some_number]. @@ -916,7 +1026,7 @@ or worse yet These are slow (checks every element even if the first matches), inefficient (same reason), and potentially buggy (what if there are -regexp characters in $whatever?). If you're only testing once, then +regex characters in $whatever?). If you're only testing once, then use: $is_there = 0; @@ -941,6 +1051,9 @@ each element is unique in a given array: push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element; } +Note that this is the I<symmetric difference>, that is, all elements in +either A or in B, but not in both. Think of it as an xor operation. + =head2 How do I test whether two arrays or hashes are equal? The following code works for single-level arrays. It uses a stringwise @@ -1078,7 +1191,7 @@ Use this: fisher_yates_shuffle( \@array ); # permutes @array in place -You've probably seen shuffling algorithms that works using splice, +You've probably seen shuffling algorithms that work using splice, randomly picking another element to swap the current element with: srand; @@ -1185,7 +1298,7 @@ that's come to be known as the Schwartzian Transform: @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } - map { [ $_, uc((/\d+\s*(\S+)/ )[0] ] } @data; + map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data; If you need to sort on several fields, the following paradigm is useful. @@ -1311,7 +1424,19 @@ sorting the keys as shown in an earlier question. =head2 What happens if I add or remove keys from a hash while iterating over it? -Don't do that. +Don't do that. :-) + +[lwall] In Perl 4, you were not allowed to modify a hash at all while +iterating over it.
In Perl 5 you can delete from it, but you still +can't add to it, because that might cause a doubling of the hash table, +in which half the entries get copied up to the new top half of the +table, at which point you've totally bamboozled the iterator code. +Even if the table doesn't double, there's no telling whether your new +entry will be inserted before or after the current iterator position. + +Either treasure up your changes and make them after the iterator finishes, +or use keys to fetch all the old keys at once, and iterate over the list +of keys. =head2 How do I look up a hash element by value? @@ -1327,8 +1452,13 @@ to use: $by_value{$value} = $key; } -If your hash could have repeated values, the methods above will only -find one of the associated keys. This may or may not worry you. +If your hash could have repeated values, the methods above will only find +one of the associated keys. This may or may not worry you. If it does +worry you, you can always reverse the hash into a hash of arrays instead: + + while (($key, $value) = each %by_key) { + push @{$key_list_by_value{$value}}, $key; + } =head2 How can I know how many entries are in a hash? @@ -1337,8 +1467,9 @@ take the scalar sense of the keys() function: $num_keys = scalar keys %hash; -In void context it just resets the iterator, which is faster -for tied hashes. +In void context, the keys() function just resets the iterator, which is +faster for tied hashes than would be iterating through the whole +hash, one key-value pair at a time. =head2 How do I sort a hash (optionally by value instead of key)? @@ -1467,8 +1598,8 @@ re-enter it, the hash iterator has been reset. =head2 How can I get the unique keys from two hashes? -First you extract the keys from the hashes into arrays, and then solve -the uniquifying the array problem described above. For example: +First you extract the keys from the hashes into lists, then solve +the "removing duplicates" problem described above.
For example: %seen = (); for $element (keys(%foo), keys(%bar)) { @@ -1560,9 +1691,11 @@ this works fine (assuming the files are found): print "Your kernel is GNU-zip enabled!\n"; } -On some legacy systems, however, you have to play tedious games with -"text" versus "binary" files. See L, or the upcoming -L manpage. +On less elegant (read: Byzantine) systems, however, you have +to play tedious games with "text" versus "binary" files. See +L or L. Most of these ancient-thinking +systems are curses out of Microsoft, who seem to be committed to putting +the backward into backward compatibility. If you're concerned about 8-bit ASCII data, then see L. @@ -1606,10 +1739,10 @@ if you just want to say, ``Is this a float?'' sub is_numeric { defined &getnum } -Or you could check out String::Scanf which can be found at -http://www.perl.com/CPAN/modules/by-module/String/. -The POSIX module (part of the standard Perl distribution) provides -the C and C for converting strings to double +Or you could check out +http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz +instead. The POSIX module (part of the standard Perl distribution) +provides the C and C for converting strings to double and longs, respectively. =head2 How do I keep persistent data across program calls? @@ -1663,7 +1796,7 @@ All rights reserved. When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic Licence. +may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof I of that package require that special arrangements be made with copyright holder. @@ -1673,4 +1806,3 @@ are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. 
A simple comment in the code giving credit would be courteous but is not required. - diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod index 99c25b7..1e8252b 100644 --- a/pod/perlfaq5.pod +++ b/pod/perlfaq5.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq5 - Files and Formats ($Revision: 1.34 $, $Date: 1999/01/08 05:46:13 $) +perlfaq5 - Files and Formats ($Revision: 1.38 $, $Date: 1999/05/23 16:08:30 $) =head1 DESCRIPTION @@ -69,7 +69,7 @@ or even this: Note the bizarrely hardcoded carriage return and newline in their octal equivalents. This is the ONLY way (currently) to assure a proper flush -on all platforms, including Macintosh. That the way things work in +on all platforms, including Macintosh. That's the way things work in network programming: you really should specify the exact bit pattern on the network line terminator. In practice, C<"\n\n"> often works, but this is not portable. @@ -491,8 +491,12 @@ I gives you read-write access: open(FH, "+> /path/name"); # WRONG (almost always) Whoops. You should instead use this, which will fail if the file -doesn't exist. Using "E" always clobbers or creates. -Using "E" never does either. The "+" doesn't change this. +doesn't exist. + + open(FH, "+< /path/name"); # open for update + +Using "E" always clobbers or creates. Using "E" never does +either. The "+" doesn't change this. Here are examples of many kinds of file opens. Those using sysopen() all assume @@ -606,10 +610,14 @@ For more information, see also the new L if you have it =head2 How can I reliably rename a file? -Well, usually you just use Perl's rename() function. But that may -not work everywhere, in particular, renaming files across file systems. -If your operating system supports a mv(1) program or its moral equivalent, -this works: +Well, usually you just use Perl's rename() function. But that may not +work everywhere, in particular, renaming files across file systems. 
+Some sub-Unix systems have broken ports that corrupt the semantics of +rename() -- for example, WinNT does this right, but Win95 and Win98 +are broken. (The last two parts are not surprising, but the first is. :-) + +If your operating system supports a proper mv(1) program or its moral +equivalent, this works: rename($old, $new) or system("mv", $old, $new); @@ -643,11 +651,25 @@ filehandle be open for writing (or appending, or read/writing). =item 3 -Some versions of flock() can't lock files over a network (e.g. on NFS -file systems), so you'd need to force the use of fcntl(2) when you -build Perl. See the flock entry of L, and the F -file in the source distribution for information on building Perl to do -this. +Some versions of flock() can't lock files over a network (e.g. on NFS file +systems), so you'd need to force the use of fcntl(2) when you build Perl. +But even this is dubious at best. See the flock entry of L, +and the F file in the source distribution for information on +building Perl to do this. + +Two potentially non-obvious but traditional flock semantics are that +it waits indefinitely until the lock is granted, and that its locks +I. Such discretionary locks are more flexible, but +offer fewer guarantees. This means that files locked with flock() may +be modified by programs that do not also use flock(). Cars that stop +for red lights get on well with each other, but not with cars that don't +stop for red lights. See the perlport manpage, your port's specific +documentation, or your system-specific local manpages for details. It's +best to assume traditional behavior if you're writing portable programs. +(But if you're not, you should as always feel perfectly free to write +for your own system's idiosyncrasies (sometimes called "features"). +Slavish adherence to portability concerns shouldn't get in the way of +your getting your job done.) For more information on file locking, see also L if you have it (new for 5.006). 
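Editorial illustration, not part of the commit: the flock() semantics described in the hunk above (a blocking request waits until the lock is granted, and the lock is only advisory) can be sketched as follows. The helper names append_locked() and try_lock() are hypothetical, and the sketch assumes a Perl recent enough for lexical filehandles and three-argument open(), which postdates the 5.005-era text being patched here.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);            # imports LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN
use File::Temp qw(tempfile);

# Hypothetical helper: append one line to $path under an exclusive lock.
sub append_locked {
    my ($path, $line) = @_;
    open(my $fh, '>>', $path) or die "can't open $path: $!";
    # Traditional semantics: this call waits until the lock is granted.
    flock($fh, LOCK_EX)       or die "can't lock $path: $!";
    print {$fh} $line         or die "can't write $path: $!";
    # Closing the handle flushes the data and releases the lock.
    close($fh)                or die "can't close $path: $!";
    return 1;
}

# Hypothetical helper: ask for the lock without waiting.
# Returns 1 if the lock was granted, 0 if someone else holds it.
sub try_lock {
    my ($fh) = @_;
    return flock($fh, LOCK_EX | LOCK_NB) ? 1 : 0;
}

# Usage: two cooperating writers, one after the other.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);
append_locked($tmp_path, "first\n");
append_locked($tmp_path, "second\n");
```

Because the lock is advisory, a program that writes to the same file without calling flock() is not stopped by it; the sketch only protects writers that all go through append_locked().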
@@ -797,6 +819,59 @@ at http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz, which is written in Perl and offers much greater functionality than the stock version. +=head2 How can I read in an entire file all at once? + +The customary Perl approach for processing all the lines in a file is to +do so one line at a time: + + open (INPUT, $file) || die "can't open $file: $!"; + while (<INPUT>) { + chomp; + # do something with $_ + } + close(INPUT) || die "can't close $file: $!"; + +This is tremendously more efficient than reading the entire file into +memory as an array of lines and then processing it one element at a time, +which is often -- if not almost always -- the wrong approach. Whenever +you see someone do this: + + @lines = <INPUT>; + +You should think long and hard about why you need everything loaded +at once. It's just not a scalable solution. You might also find it +more fun to use the standard DB_File module's $DB_RECNO bindings, +which allow you to tie an array to a file so that accessing an element +of the array actually accesses the corresponding line in the file. + +On very rare occasion, you may have an algorithm that demands that +the entire file be in memory at once as one scalar. The simplest solution +to that is: + + $var = `cat $file`; + +Being in scalar context, you get the whole thing. In list context, +you'd get a list of all the lines: + + @lines = `cat $file`; + +This tiny but expedient solution is neat, clean, and portable to all +systems that you've bothered to install decent tools on, even if you are +a Prisoner of Bill. For those die-hard PoBs who've paid their billtax +and refuse to use the toolbox, or who like writing complicated code for +job security, you can of course read the file manually. + + { + local(*INPUT, $/); + open (INPUT, $file) || die "can't open $file: $!"; + $var = <INPUT>; + } + +That temporarily undefs your record separator, and will automatically +close the file at block exit.
If the file is already open, just use this: + + $var = do { local $/; <INPUT> }; + =head2 How can I read in a file by paragraphs? Use the C<$/> variable (see L<perlvar> for details). You can either @@ -1043,6 +1118,14 @@ to, you may be able to do this: $rc = syscall(&SYS_close, $fd + 0); # must force numeric die "can't sysclose $fd: $!" if $rc == -1; +Or just use the fdopen(3S) feature of open(): + + { + local *F; + open F, "<&=$fd" or die "Cannot reopen fd=$fd: $!"; + close F; + } + =head2 Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work? Whoops! You just put a tab and a formfeed into that filename! @@ -1121,8 +1204,8 @@ Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington. All rights reserved. When included as an integrated part of the Standard Distribution -of Perl or of its documentation (printed or otherwise), this work is -covered under Perl's Artistic Licence. For separate distributions of +of Perl or of its documentation (printed or otherwise), this work is +covered under Perl's Artistic License. For separate distributions of all or part of this FAQ outside of that, see L<perlfaq>. Irrespective of its distribution, all code examples here are public @@ -1130,4 +1213,3 @@ domain. You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required. - diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod index 234570d..de6093a 100644 --- a/pod/perlfaq6.pod +++ b/pod/perlfaq6.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq6 - Regexps ($Revision: 1.25 $, $Date: 1999/01/08 04:50:47 $) +perlfaq6 - Regexes ($Revision: 1.27 $, $Date: 1999/05/23 16:08:30 $) =head1 DESCRIPTION @@ -18,7 +18,7 @@ understandable. =over 4 -=item Comments Outside the Regexp +=item Comments Outside the Regex Describe what you're doing and how you're doing it, using normal Perl comments. @@ -27,9 +27,9 @@ comments.
# number of characters on the rest of the line s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg; -=item Comments Inside the Regexp +=item Comments Inside the Regex -The C</x> modifier causes whitespace to be ignored in a regexp pattern +The C</x> modifier causes whitespace to be ignored in a regex pattern (except in a character class), and also allows you to use normal comments there, too. As you can imagine, whitespace and comments help a lot. @@ -177,11 +177,46 @@ appear within a certain time. =head2 How do I substitute case insensitively on the LHS, but preserving case on the RHS? -It depends on what you mean by "preserving case". The following -script makes the substitution have the same case, letter by letter, as -the original. If the substitution has more characters than the string -being substituted, the case of the last character is used for the rest -of the substitution. +Here's a lovely Perlish solution by Larry Rosler. It exploits +properties of bitwise xor on ASCII strings. + + $_= "this is a TEsT case"; + + $old = 'test'; + $new = 'success'; + + s{(\Q$old\E)} + { uc $new | (uc $1 ^ $1) . + (uc(substr $1, -1) ^ substr $1, -1) x + (length($new) - length $1) + }egi; + + print; + +And here it is as a subroutine, modelled after the above: + + sub preserve_case($$) { + my ($old, $new) = @_; + my $mask = uc $old ^ $old; + + uc $new | $mask . + substr($mask, -1) x (length($new) - length($old)) + } + + $a = "this is a TEsT case"; + $a =~ s/(test)/preserve_case($1, "success")/egi; + print "$a\n"; + +This prints: + + this is a SUcCESS case + +Just to show that C programmers can write C in any programming language, +if you prefer a more C-like solution, the following script makes the +substitution have the same case, letter by letter, as the original. +(It also happens to run about 240% slower than the Perlish solution runs.) +If the substitution has more characters than the string being substituted, +the case of the last character is used for the rest of the substitution.
# Original by Nathan Torkington, massaged by Jeffrey Friedl # @@ -214,14 +249,6 @@ of the substitution. return $new; } - $a = "this is a TEsT case"; - $a =~ s/(test)/preserve_case($1, "success")/gie; - print "$a\n"; - -This prints: - - this is a SUcCESS case - =head2 How can I make C<\w> match national character sets? See L. @@ -232,41 +259,41 @@ One alphabetic character would be C, no matter what locale you're in. Non-alphabetics would be C (assuming you don't consider an underscore a letter). -=head2 How can I quote a variable to use in a regexp? +=head2 How can I quote a variable to use in a regex? The Perl parser will expand $variable and @variable references in regular expressions unless the delimiter is a single quote. Remember, too, that the right-hand side of a C substitution is considered a double-quoted string (see L for more details). Remember -also that any regexp special characters will be acted on unless you +also that any regex special characters will be acted on unless you precede the substitution with \Q. Here's an example: $string = "to die?"; $lhs = "die?"; - $rhs = "sleep no more"; + $rhs = "sleep, no more"; $string =~ s/\Q$lhs/$rhs/; # $string is now "to sleep no more" -Without the \Q, the regexp would also spuriously match "di". +Without the \Q, the regex would also spuriously match "di". =head2 What is C really for? Using a variable in a regular expression match forces a re-evaluation (and perhaps recompilation) each time through. The C modifier -locks in the regexp the first time it's used. This always happens in a +locks in the regex the first time it's used. This always happens in a constant regular expression, and in fact, the pattern was compiled into the internal format at the same time your entire program was. 
Use of C</o> is irrelevant unless variable interpolation is used in -the pattern, and if so, the regexp engine will neither know nor care +the pattern, and if so, the regex engine will neither know nor care whether the variables change after the pattern is evaluated the I<very first> time. C</o> is often used to gain an extra measure of efficiency by not performing subsequent evaluations when you know it won't matter (because you know the variables won't change), or more rarely, when -you don't want the regexp to notice if they do. +you don't want the regex to notice if they do. For example, here's a "paragrep" program: @@ -286,23 +313,66 @@ For example, this one-liner will work in many but not all cases. You see, it's too simple-minded for certain kinds of C programs, in particular, those with what appear to be comments in quoted strings. For that, you'd need something like this, -created by Jeffrey Friedl: +created by Jeffrey Friedl and later modified by Fred Curtis. $/ = undef; $_ = <>; - s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|\n+|.[^/"'\\]*)#$2#g; + s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#$2#gs; print; This could, of course, be more legibly written with the C</x> modifier, adding -whitespace and comments. +whitespace and comments. Here it is expanded, courtesy of Fred Curtis. + + s{ + /\* ## Start of /* ... */ comment + [^*]*\*+ ## Non-* followed by 1-or-more *'s + ( + [^/*][^*]*\*+ + )* ## 0-or-more things which don't start with / + ## but do end with '*' + / ## End of /* ... */ comment + + | ## OR various things which aren't comments: + + ( + " ## Start of " ... " string + ( + \\. ## Escaped char + | ## OR + [^"\\] ## Non "\ + )* + " ## End of " ... " string + + | ## OR + + ' ## Start of ' ... ' string + ( + \\. ## Escaped char + | ## OR + [^'\\] ## Non '\ + )* + ' ## End of ' ... ' string + + | ## OR + + .
## Anything other char + [^/"'\\]* ## Chars which doesn't start a comment, string or escape + ) + }{$2}gxs; + +A slight modification also removes C++ comments: + + s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#$2#gs; =head2 Can I use Perl regular expressions to match balanced text? Although Perl regular expressions are more powerful than "mathematical" regular expressions, because they feature conveniences like backreferences -(C<\1> and its ilk), they still aren't powerful enough. You still need -to use non-regexp techniques to parse balanced text, such as the text -enclosed between matching parentheses or braces, for example. +(C<\1> and its ilk), they still aren't powerful enough -- with +the possible exception of bizarre and experimental features in the +development-track releases of Perl. You still need to use non-regex +techniques to parse balanced text, such as the text enclosed between +matching parentheses or braces, for example. An elaborate subroutine (for 7-bit ASCII only) to pull out balanced and possibly nested single chars, like C<`> and C<'>, C<{> and C<}>, @@ -312,9 +382,9 @@ http://www.perl.com/CPAN/authors/id/TOMC/scripts/pull_quotes.gz . The C::Scan module from CPAN contains such subs for internal usage, but they are undocumented. -=head2 What does it mean that regexps are greedy? How can I get around it? +=head2 What does it mean that regexes are greedy? How can I get around it? -Most people mean that greedy regexps match as much as they can. +Most people mean that greedy regexes match as much as they can. Technically speaking, it's actually the quantifiers (C, C<*>, C<+>, C<{}>) that are greedy rather than the whole pattern; Perl prefers local greed and immediate gratification to overall greed. To get non-greedy @@ -422,7 +492,7 @@ characters. Neither is correct. C<\b> is the place between a C<\w> character and a C<\W> character (that is, C<\b> is the edge of a "word"). 
It's a zero-width assertion, just like C<^>, C<$>, and all the other anchors, so it doesn't consume any characters. L -describes the behaviour of all the regexp metacharacters. +describes the behavior of all the regex metacharacters. Here are examples of the incorrect application of C<\b>, with fixes: @@ -446,8 +516,8 @@ not "this" or "island". Because once Perl sees that you need one of these variables anywhere in the program, it has to provide them on each and every pattern match. The same mechanism that handles these provides for the use of $1, $2, -etc., so you pay the same price for each regexp that contains capturing -parentheses. But if you never use $&, etc., in your script, then regexps +etc., so you pay the same price for each regex that contains capturing +parentheses. But if you never use $&, etc., in your script, then regexes I capturing parentheses won't be penalized. So avoid $&, $', and $` if you can, but if you can't, once you've used them at all, use them at will because you've already paid the price. Remember that some @@ -515,7 +585,7 @@ Of course, that could have been written as But then you lose the vertical alignment of the regular expressions. -=head2 Are Perl regexps DFAs or NFAs? Are they POSIX compliant? +=head2 Are Perl regexes DFAs or NFAs? Are they POSIX compliant? While it's true that Perl's regular expressions resemble the DFAs (deterministic finite automata) of the egrep(1) program, they are in @@ -620,7 +690,7 @@ All rights reserved. When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic Licence. +may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof I of that package require that special arrangements be made with copyright holder. @@ -630,4 +700,3 @@ are hereby placed into the public domain. 
You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required. - diff --git a/pod/perlfaq7.pod b/pod/perlfaq7.pod index a4ea872..070d965 100644 --- a/pod/perlfaq7.pod +++ b/pod/perlfaq7.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq7 - Perl Language Issues ($Revision: 1.24 $, $Date: 1999/01/08 05:32:11 $) +perlfaq7 - Perl Language Issues ($Revision: 1.28 $, $Date: 1999/05/23 20:36:18 $) =head1 DESCRIPTION @@ -18,19 +18,17 @@ In the words of Chaim Frenkel: "Perl's grammar can not be reduced to BNF. The work of parsing perl is distributed between yacc, the lexer, smoke and mirrors." -=head2 What are all these $@%* punctuation signs, and how do I know when to use them? +=head2 What are all these $@%&* punctuation signs, and how do I know when to use them? They are type specifiers, as detailed in L: $ for scalar values (number, string or reference) @ for arrays % for hashes (associative arrays) + & for subroutines (aka functions, procedures, methods) * for all types of that symbol name. In version 4 you used them like pointers, but in modern perls you can just use references. -While there are a few places where you don't actually need these type -specifiers, you should always use them. - A couple of others that you're likely to encounter that aren't really type specifiers are: @@ -180,7 +178,7 @@ own module. Make sure to change the names appropriately. # if using RCS/CVS, this next line may be preferred, # but beware two-digit versions. - $VERSION = do{my@r=q$Revision: 1.24 $=~/\d+/g;sprintf '%d.'.'%02d'x$#r,@r}; + $VERSION = do{my@r=q$Revision: 1.28 $=~/\d+/g;sprintf '%d.'.'%02d'x$#r,@r}; @ISA = qw(Exporter); @EXPORT = qw(&func1 &func2 &func3); @@ -330,12 +328,13 @@ harder. Take this code: print "Finally $f\n"; The $f that has "bar" added to it three times should be a new C<$f> -(C should create a new local variable each time through the -loop). 
It isn't, however. This is a bug, and will be fixed. +(C should create a new local variable each time through the loop). +It isn't, however. This was a bug, now fixed in the latest releases +(tested against 5.004_05, 5.005_03, and 5.005_56). -=head2 How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regexp}? +=head2 How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regex}? -With the exception of regexps, you need to pass references to these +With the exception of regexes, you need to pass references to these objects. See L for this particular question, and L for information on references. @@ -391,28 +390,42 @@ If you're planning on generating new filehandles, you could do this: $fh = openit('< /etc/motd'); print <$fh>; -=item Passing Regexps +=item Passing Regexes + +To pass regexes around, you'll need to be using a release of Perl +sufficiently recent as to support the C construct, pass around +strings and use an exception-trapping eval, or else be very, very clever. -To pass regexps around, you'll need to either use one of the highly -experimental regular expression modules from CPAN (Nick Ing-Simmons's -Regexp or Ilya Zakharevich's Devel::Regexp), pass around strings -and use an exception-trapping eval, or else be very, very clever. -Here's an example of how to pass in a string to be regexp compared: +Here's an example of how to pass in a string to be regex compared +using C: sub compare($$) { - my ($val1, $regexp) = @_; - my $retval = eval { $val =~ /$regexp/ }; + my ($val1, $regex) = @_; + my $retval = $val1 =~ /$regex/; + return $retval; + } + $match = compare("old McDonald", qr/d.*D/i); + +Notice how C allows flags at the end. That pattern was compiled +at compile time, although it was executed later. The nifty C +notation wasn't introduced until the 5.005 release. Before that, you +had to approach this problem much less intuitively. 
For example, here +it is again if you don't have C<qr//>: + + sub compare($$) { + my ($val1, $regex) = @_; + my $retval = eval { $val1 =~ /$regex/ }; die if $@; return $retval; } - $match = compare("old McDonald", q/d.*D/); + $match = compare("old McDonald", q/(?i)d.*D/); Make sure you never say something like this: - return eval "\$val =~ /$regexp/"; # WRONG + return eval "\$val =~ /$regex/"; # WRONG -or someone can sneak shell escapes into the regexp due to the double +or someone can sneak shell escapes into the regex due to the double interpolation of the eval and the double-quoted string. For example: $pattern_of_evil = 'danger ${ system("rm -rf * &") } danger'; @@ -630,7 +643,7 @@ where they don't belong. This is explained in more depth in the L<perlsyn>. Briefly, there's no official case statement, because of the variety of tests possible in Perl (numeric comparison, string comparison, glob comparison, -regexp matching, overloaded comparisons, ...). Larry couldn't decide +regex matching, overloaded comparisons, ...). Larry couldn't decide how best to do this, so he left it out, even though it's been on the wish list since perl1. @@ -826,6 +839,106 @@ Use this code, provided by Mark-Jason Dominus: Or, if you're using a recent release of Perl, you can just use the Symbol::delete_package() function instead. +=head2 How can I use a variable as a variable name? + +Beginners often think they want to have a variable contain the name +of a variable. + + $fred = 23; + $varname = "fred"; + ++$$varname; # $fred now 24 + +This works I<sometimes>, but it is a very bad idea for two reasons. + +The first reason is that they I<only work on global variables>. +That means above that if $fred is a lexical variable created with my(), +that the code won't work at all: you'll accidentally access the global +and skip right over the private lexical altogether. Global variables +are bad because they can easily collide accidentally and in general make +for non-scalable and confusing code.
+ +Symbolic references are forbidden under the C<use strict 'refs'> pragma. +They are not true references and consequently are not reference counted +or garbage collected. + +The other reason why using a variable to hold the name of another +variable is a bad idea is that the question often stems from a lack of +understanding of Perl data structures, particularly hashes. By using +symbolic references, you are just using the package's symbol-table hash +(like C<%main::>) instead of a user-defined hash. The solution is to +use your own hash or a real reference instead. + + $fred = 23; + $varname = "fred"; + $USER_VARS{$varname}++; # not $$varname++ + +There we're using the %USER_VARS hash instead of symbolic references. +Sometimes this comes up in reading strings from the user with variable +references and wanting to expand them to the values of your perl +program's variables. This is also a bad idea because it conflates the +program-addressable namespace and the user-addressable one. Instead of +reading a string and expanding it to the actual contents of your program's +own variables: + + $str = 'this has a $fred and $barney in it'; + $str =~ s/(\$\w+)/$1/eeg; # need double eval + +Instead, it would be better to keep a hash around like %USER_VARS and have +variable references actually refer to entries in that hash: + + $str =~ s/\$(\w+)/$USER_VARS{$1}/g; # no /e here at all + +That's faster, cleaner, and safer than the previous approach. Of course, +you don't need to use a dollar sign. You could use your own scheme to +make it less confusing, like bracketed percent symbols, etc. + + $str = 'this has a %fred% and %barney% in it'; + $str =~ s/%(\w+)%/$USER_VARS{$1}/g; # no /e here at all + +Another reason that folks sometimes think they want a variable to contain +the name of a variable is because they don't know how to build proper +data structures using hashes.
For example, let's say they wanted two +hashes in their program: %fred and %barney, and to use another scalar +variable to refer to those by name. + + $name = "fred"; + $$name{WIFE} = "wilma"; # set %fred + + $name = "barney"; + $$name{WIFE} = "betty"; # set %barney + +This is still a symbolic reference, and is still saddled with the +problems enumerated above. It would be far better to write: + + $folks{"fred"}{WIFE} = "wilma"; + $folks{"barney"}{WIFE} = "betty"; + +And just use a multilevel hash to start with. + +The only times that you absolutely I use symbolic references are +when you really must refer to the symbol table. This may be because it's +something that you can't take a real reference to, such as a format name. +Doing so may also be important for method calls, since these always go +through the symbol table for resolution. + +In those cases, you would turn off C temporarily so you +can play around with the symbol table. For example: + + @colors = qw(red blue green yellow orange purple violet); + for my $name (@colors) { + no strict 'refs'; # renege for the block + *$name = sub { "@_" }; + } + +All those functions (red(), blue(), green(), etc.) appear to be separate, +but the real code in the closure actually was compiled only once. + +So, sometimes you might want to use symbolic references to directly +manipulate the symbol table. This doesn't matter for formats, handles, and +subroutines, because they are always global -- you can't use my() on them. +But for scalars, arrays, and hashes -- and usually for subroutines -- +you probably want to use hard references only. + =head1 AUTHOR AND COPYRIGHT Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington. @@ -833,7 +946,7 @@ All rights reserved. When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic Licence.
+may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof I of that package require that special arrangements be made with copyright holder. @@ -843,4 +956,3 @@ are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required. - diff --git a/pod/perlfaq8.pod b/pod/perlfaq8.pod index 9ef41af..26efa3f 100644 --- a/pod/perlfaq8.pod +++ b/pod/perlfaq8.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq8 - System Interaction ($Revision: 1.36 $, $Date: 1999/01/08 05:36:34 $) +perlfaq8 - System Interaction ($Revision: 1.39 $, $Date: 1999/05/23 18:37:57 $) =head1 DESCRIPTION @@ -15,8 +15,9 @@ contain more detailed information on the vagaries of your perl. =head2 How do I find out which operating system I'm running under? -The $^O variable ($OSNAME if you use English) contains the operating -system that your perl binary was built for. +The $^O variable ($OSNAME if you use English) contains an indication of +the name of the operating system (not its release number) that your perl +binary was built for. =head2 How come exec() doesn't return? @@ -74,7 +75,7 @@ Or like this: =head2 How do I read just one key without waiting for a return key? Controlling input buffering is a remarkably system-dependent matter. -If most systems, you can just use the B command as shown in +On many systems, you can just use the B command as shown in L, but as you see, that's already getting you into portability snags. @@ -167,7 +168,7 @@ not to block: =head2 How do I clear the screen? -If you only have to so infrequently, use C: +If you only have to do so infrequently, use C: system("clear"); @@ -421,7 +422,7 @@ properly, the getpw*() functions described in L should in theory provide (read-only) access to entries in the shadow password file.
To change the file, make a new shadow password file (the format varies from system to system - see L for specifics) and use -pwd_mkdb(8) to install it (see L for more details). +pwd_mkdb(8) to install it (see L for more details). =head2 How do I set the time and date? @@ -461,7 +462,7 @@ something like this: $done = $start = pack($TIMEVAL_T, ()); - syscall( &SYS_gettimeofday, $start, 0) != -1 + syscall(&SYS_gettimeofday, $start, 0) != -1 or die "gettimeofday: $!"; ########################## @@ -699,7 +700,7 @@ case the fork()/exec() description still applies. Strictly speaking, nothing. Stylistically speaking, it's not a good way to write maintainable code because backticks have a (potentially -humungous) return value, and you're ignoring it. It's may also not be very +humongous) return value, and you're ignoring it. It may also not be very efficient, because you have to read in all the lines of output, allocate memory for them, and then throw it away. Too often people are lulled to writing: @@ -725,7 +726,7 @@ In most cases, this could and probably should be written as system("cat /etc/termcap") == 0 or die "cat program failed!"; -Which will get the output quickly (as its generated, instead of only +Which will get the output quickly (as it is generated, instead of only at the end) and also check the return value. system() also provides direct control over whether shell wildcard @@ -751,8 +752,14 @@ You have to do this: } Just as with system(), no shell escapes happen when you exec() a list. +Further examples of this can be found in L. -There are more examples of this L. +Note that if you're stuck on Microsoft, no solution to this vexing issue +is even possible. Even if Perl were to emulate fork(), you'd still +be hosed, because Microsoft gives no argc/argv-style API. Their API +always reparses from a single string, which is fundamentally wrong, +but you're not likely to get the Gods of Redmond to acknowledge this +and fix it for you.
=head2 Why can't my script read from STDIN after I gave it EOF (^D on Unix, ^Z on MS-DOS)? @@ -970,12 +977,15 @@ sysopen(): sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644) or die "can't open /tmp/somefile: $!"; -=head2 How do I install a CPAN module? + -The easiest way is to have the CPAN module do it for you. This module -comes with perl version 5.004 and later. To manually install the CPAN -module, or any well-behaved CPAN module for that matter, follow these -steps: + +=head2 How do I install a module from CPAN? + +The easiest way is to have a module also named CPAN do it for you. +This module comes with perl version 5.004 and later. To manually install +the CPAN module, or any well-behaved CPAN module for that matter, follow +these steps: =over 4 @@ -1085,7 +1095,7 @@ All rights reserved. When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic Licence. +may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof I of that package require that special arrangements be made with copyright holder. @@ -1095,4 +1105,3 @@ are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required. - diff --git a/pod/perlfaq9.pod b/pod/perlfaq9.pod index 6536064..91d432e 100644 --- a/pod/perlfaq9.pod +++ b/pod/perlfaq9.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq9 - Networking ($Revision: 1.24 $, $Date: 1999/01/08 05:39:48 $) +perlfaq9 - Networking ($Revision: 1.26 $, $Date: 1999/05/23 16:08:30 $) =head1 DESCRIPTION @@ -20,7 +20,7 @@ may not be so well received.
The useful FAQs and related documents are: CGI FAQ - http://www.webthing.com/tutorials/cgifaq.html + http://www.webthing.com/page.cgi/cgifaq Web FAQ http://www.boutell.com/faq/ @@ -100,7 +100,7 @@ a solution: A > B - A > B @@ -131,12 +131,11 @@ A quick but imperfect approach is }gsix; This version does not adjust relative URLs, understand alternate -bases, deal with HTML comments, deal with HREF and NAME attributes in -the same tag, or accept URLs themselves as arguments. It also runs -about 100x faster than a more "complete" solution using the LWP suite -of modules, such as the -http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz -program. +bases, deal with HTML comments, deal with HREF and NAME attributes +in the same tag, understand extra qualifiers like TARGET, or accept +URLs themselves as arguments. It also runs about 100x faster than a +more "complete" solution using the LWP suite of modules, such as the +http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program. =head2 How do I download a file from the user's machine? How do I open a file on another machine? @@ -159,8 +158,9 @@ on your system, is this: $html_code = `lynx -source $url`; $text_data = `lynx -dump $url`; -The libwww-perl (LWP) modules from CPAN provide a more powerful way to -do this. They work through proxies, and don't require lynx: +The libwww-perl (LWP) modules from CPAN provide a more powerful way +to do this. They don't require lynx, but like lynx, can still work +through proxies: # simplest version use LWP::Simple; @@ -213,7 +213,7 @@ Here's an example of decoding: $string =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge; Encoding is a bit harder, because you can't just blindly change -all the non-alphanumeric characters (C<\W>) into their hex escapes. +all the non-alphanumunder characters (C<\W>) into their hex escapes. It's important that characters with special meaning like C and C I be translated.
Probably the easiest way to get this right is to avoid reinventing the wheel and just use the URI::Escape module, @@ -236,9 +236,21 @@ because of "optimizations" that servers do. print "Location: $url\n\n"; exit; -To be correct to the spec, each of those C<"\n"> -should really each be C<"\015\012">, but unless you're -stuck on MacOS, you probably won't notice. +To target a particular frame in a frameset, include the "Window-target:" +in the header. + + print < + + EOF + +To be correct to the spec, each of those virtual newlines should really be +physical C<"\015\012"> sequences by the time you hit the client browser. +Except for NPH scripts, though, that local newline should get translated +by your server into standard form, so you shouldn't have a problem +here, even if you are stuck on MacOS. Everybody else probably won't +even notice. =head2 How do I put a password on my web pages? @@ -329,7 +341,7 @@ RFC-822 (the mail header standard) compliant, and addresses that aren't deliverable which are compliant. Many are tempted to try to eliminate many frequently-invalid -mail addresses with a simple regexp, such as +mail addresses with a simple regex, such as C. It's a very bad idea. However, this also throws out many valid ones, and says nothing about potential deliverability, so is not suggested. Instead, see @@ -423,7 +435,12 @@ the message into the queue. This last option means your message won't be immediately delivered, so leave it out if you want immediate delivery. -Or use the CPAN module Mail::Mailer: +Alternate, less convenient approaches include calling mail (sometimes +called mailx) directly or simply opening up port 25 and having an +intimate conversation between just you and the remote SMTP daemon, +probably sendmail.
+ +Or you might be able to use the CPAN module Mail::Mailer: use Mail::Mailer; @@ -438,34 +455,17 @@ Or use the CPAN module Mail::Mailer: The Mail::Internet module uses Net::SMTP which is less Unix-centric than Mail::Mailer, but less reliable. Avoid raw SMTP commands. There -are many reasons to use a mail transport agent like sendmail. These +are many reasons to use a mail transport agent like sendmail. These include queueing, MX records, and security. =head2 How do I read mail? -Use the Mail::Folder module from CPAN (part of the MailFolder package) or -the Mail::Internet module from CPAN (also part of the MailTools package). - - # sending mail - use Mail::Internet; - use Mail::Header; - # say which mail host to use - $ENV{SMTPHOSTS} = 'mail.frii.com'; - # create headers - $header = new Mail::Header; - $header->add('From', 'gnat@frii.com'); - $header->add('Subject', 'Testing'); - $header->add('To', 'gnat@frii.com'); - # create body - $body = 'This is a test, ignore'; - # create mail object - $mail = new Mail::Internet(undef, Header => $header, Body => \[$body]); - # send it - $mail->smtpsend or die; - -Often a module is overkill, though. Here's a mail sorter. - - #!/usr/bin/perl +While you could use the Mail::Folder module from CPAN (part of the +MailFolder package) or the Mail::Internet module from CPAN (also part +of the MailTools package), often a module is overkill. Here's a +mail sorter. + + #!/usr/bin/perl # bysub1 - simple sort by subject my(@msgs, @sub); my $msgno = -1; @@ -476,12 +476,12 @@ Often a module is overkill, though. Here's a mail sorter. $sub[++$msgno] = lc($1) || ''; } $msgs[$msgno] .= $_; - } + } for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) { print $msgs[$i]; } -Or more succinctly, +Or more succinctly, #!/usr/bin/perl -n00 # bysub2 - awkish sort-by-subject @@ -541,7 +541,7 @@ All rights reserved.
When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic Licence. +may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof I of that package require that special arrangements be made with copyright holder. @@ -551,4 +551,3 @@ are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required. -