From: Tom Christiansen Date: Sat, 13 Jun 1998 22:19:32 +0000 (-0600) Subject: documentation update from tchrist X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=5a964f204835a8014f4ba86fc91884cff958ac67;p=p5sagit%2Fp5-mst-13.2.git documentation update from tchrist Message-Id: <199806140419.WAA20549@chthon.perl.com> Subject: doc patches p4raw-id: //depot/perl@1132 --- diff --git a/pod/perl.pod b/pod/perl.pod index a7e02f6..4c9808f 100644 --- a/pod/perl.pod +++ b/pod/perl.pod @@ -265,7 +265,9 @@ Perl developers, please write to >. The B<-w> switch produces some lovely diagnostics. -See L for explanations of all Perl's diagnostics. +See L for explanations of all Perl's diagnostics. The C pragma automatically turns Perl's normally terse warnings +and errors into these longer forms. Compilation errors will tell you the line number of the error, with an indication of the next token or token type that was to be examined. diff --git a/pod/perlbook.pod b/pod/perlbook.pod index f5bf99d..0ff2cd5 100644 --- a/pod/perlbook.pod +++ b/pod/perlbook.pod @@ -11,15 +11,17 @@ web-connected, you can even mosey on over to http://www.ora.com/ for an online order form. I is a reference work that covers -nearly all of Perl, while I is a -tutorial that covers the most frequently used subset of the language, -and I is an indepth study of complex topics -including the internals of perl. You might also check out the very -handy, inexpensive, and compact I, especially -when the thought of lugging the 676-page Camel around doesn't make much -sense. I, by Jeffrey Friedl, is a -reference work that covers the art and implementation of regular -expressions in various languages including Perl. +nearly all of Perl; I is a tutorial that +covers the most frequently used subset of the language; and I is an in-depth study of complex topics including the +internals of perl. 
You might also check out the very handy, inexpensive, +and compact I, especially when the thought of +lugging the 676-page Camel around doesn't make much sense. I, by Jeffrey Friedl, is a reference work that covers +the art and implementation of regular expressions in various languages +including Perl. Currently published quarterly by Jon Orwant, I is the first and only periodical devoted to All Things Perl. +See http://www.tpj.com/ for information. Programming Perl, Second Edition (the Camel Book): ISBN 1-56592-149-6 (English) @@ -27,7 +29,10 @@ expressions in various languages including Perl. Learning Perl, Second Edition (the Llama Book): ISBN 1-56592-284-0 (English) - Advanced Perl Programming: + Learning Perl on Win32 Systems (the Gecko Book): + ISBN 1-56592-324-3 (English) + + Advanced Perl Programming (the Panther Book): ISBN 1-56592-220-4 (English) Perl 5 Desktop Reference (the reference card): diff --git a/pod/perldata.pod b/pod/perldata.pod index dc2975a..58c1123 100644 --- a/pod/perldata.pod +++ b/pod/perldata.pod @@ -22,15 +22,15 @@ that's deprecated); all but the last are interpreted as names of packages, to locate the namespace in which to look up the final identifier (see L for details). It's possible to substitute for a simple identifier an expression -which produces a reference to the value at runtime; this is +that produces a reference to the value at runtime; this is described in more detail below, and in L. There are also special variables whose names don't follow these rules, so that they don't accidentally collide with one of your -normal variables. Strings which match parenthesized parts of a +normal variables. Strings that match parenthesized parts of a regular expression are saved under names containing only digits after the C<$> (see L and L). 
In addition, several special -variables which provide windows into the inner working of Perl have names +variables that provide windows into the inner working of Perl have names containing punctuation characters (see L). Scalar values are always named with '$', even when referring to a scalar @@ -81,7 +81,7 @@ that returns a reference to an object of that type. For a description of this, see L. Names that start with a digit may contain only more digits. Names -which do not start with a letter, underscore, or digit are limited to +that do not start with a letter, underscore, or digit are limited to one character, e.g., C<$%> or C<$$>. (Most of these one character names have a predefined significance to Perl. For instance, C<$$> is the current process id.) @@ -171,13 +171,15 @@ numbers count as 0, just as they do in B: That's usually preferable because otherwise you won't treat IEEE notations like C or C properly. At other times you might prefer to -use a regular expression to check whether data is numeric. See L -for details on regular expressions. +use the POSIX::strtod function or a regular expression to check whether +data is numeric. See L for details on regular expressions. warn "has nondigits" if /\D/; - warn "not a whole number" unless /^\d+$/; - warn "not an integer" unless /^[+-]?\d+$/ - warn "not a decimal number" unless /^[+-]?\d+\.?\d*$/ + warn "not a natural number" unless /^\d+$/; # rejects -3 + warn "not an integer" unless /^-?\d+$/; # rejects +3 + warn "not an integer" unless /^[+-]?\d+$/; + warn "not a decimal number" unless /^-?\d+\.?\d*$/; # rejects .2 + warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/; warn "not a C float" unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/; @@ -189,7 +191,7 @@ length of the array. Shortening an array by this method destroys intervening values. Lengthening an array that was previously shortened I recovers the values that were in those elements. 
(It used to in Perl 4, but we had to break this to make sure destructors were -called when expected.) You can also gain some measure of efficiency by +called when expected.) You can also gain some minuscule measure of efficiency by pre-extending an array that is going to get big. (You can also extend an array by assigning to an element that is off the end of the array.) You can truncate an array down to nothing by assigning the null list () @@ -200,7 +202,8 @@ to it. The following are equivalent: If you evaluate a named array in a scalar context, it returns the length of the array. (Note that this is not true of lists, which return the -last value, like the C comma operator.) The following is always true: +last value, like the C comma operator, nor of built-in functions, which return +whatever they feel like returning.) The following is always true: scalar(@whatever) == $#whatever - $[ + 1; @@ -216,7 +219,7 @@ left to doubt: $element_count = scalar(@whatever); -If you evaluate a hash in a scalar context, it returns a value which is +If you evaluate a hash in a scalar context, it returns a value that is true if and only if the hash contains any key/value pairs. (If there are any key/value pairs, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated @@ -227,6 +230,11 @@ scalar context reveals "1/16", which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen.) +You can preallocate space for a hash by assigning to the keys() function. 
+This rounds up the allocated buckets to the next power of two: + + keys(%users) = 1000; # allocate 1024 buckets + =head2 Scalar value constructors Numeric literals are specified in any of the customary floating point or @@ -288,7 +296,7 @@ Three special literals are __FILE__, __LINE__, and __PACKAGE__, which represent the current filename, line number, and package name at that point in your program. They may be used only as separate tokens; they will not be interpolated into strings. If there is no current package -(due to a C directive), __PACKAGE__ is the undefined value. +(due to an empty C directive), __PACKAGE__ is the undefined value. The tokens __END__ and __DATA__ may be used to indicate the logical end of the script before the actual end of file. Any following text is @@ -421,14 +429,14 @@ list literal, so that you can say: LISTs do automatic interpolation of sublists. That is, when a LIST is evaluated, each element of the list is evaluated in a list context, and the resulting list value is interpolated into LIST just as if each -individual element were a member of LIST. Thus arrays lose their +individual element were a member of LIST. Thus arrays and hashes lose their identity in a LIST--the list - (@foo,@bar,&SomeSub) + (@foo,@bar,&SomeSub,%glarch) contains all the elements of @foo followed by all the elements of @bar, -followed by all the elements returned by the subroutine named SomeSub when -it's called in a list context. +followed by all the elements returned by the subroutine named SomeSub +called in a list context, followed by the key/value pairs of %glarch. To make a list reference that does I interpolate, see L. The null list is represented by (). Interpolating it in a list @@ -476,7 +484,7 @@ which when assigned produces a 0, which is interpreted as FALSE. 
The final element may be an array or a hash: ($a, $b, @rest) = split; - local($a, $b, %rest) = @_; + my($a, $b, %rest) = @_; You can actually put an array or hash anywhere in the list, but the first one in the list will soak up all the values, and anything after it will get @@ -498,7 +506,7 @@ key/value pairs. That's why it's good to use references sometimes. It is often more readable to use the C<=E> operator between key/value pairs. The C<=E> operator is mostly just a more visually distinctive synonym for a comma, but it also arranges for its left-hand operand to be -interpreted as a string, if it's a bareword which would be a legal identifier. +interpreted as a string--if it's a bareword that would be a legal identifier. This makes it nice for initializing hashes: %map = ( @@ -535,13 +543,28 @@ Perl uses an internal type called a I to hold an entire symbol table entry. The type prefix of a typeglob is a C<*>, because it represents all types. This used to be the preferred way to pass arrays and hashes by reference into a function, but now that -we have real references, this is seldom needed. It also used to be the -preferred way to pass filehandles into a function, but now -that we have the *foo{THING} notation it isn't often needed for that, -either. It is still needed to pass new filehandles into functions -(*HANDLE{IO} only works if HANDLE has already been used). +we have real references, this is seldom needed. + +The main use of typeglobs in modern Perl is to create symbol table aliases. +This assignment: + + *this = *that; + +makes $this an alias for $that, @this an alias for @that, %this an alias +for %that, &this an alias for &that, etc. Much safer is to use a reference. +This: -If you need to use a typeglob to save away a filehandle, do it this way: + local *Here::blue = \$There::green; + +temporarily makes $Here::blue an alias for $There::green, but doesn't +make @Here::blue an alias for @There::green, or %Here::blue an alias for +%There::green, etc. 
See L for more examples +of this. Strange though this may seem, this is the basis for the whole +module import/export system. + +Another use for typeglobs is to pass filehandles into a function or +to create new filehandles. If you need to use a typeglob to save away +a filehandle, do it this way: $fh = *STDOUT; @@ -549,18 +572,32 @@ or perhaps as a real reference, like this: $fh = \*STDOUT; -This is also a way to create a local filehandle. For example: +See L for examples of using these as indirect filehandles +in functions. + +Typeglobs are also a way to create a local filehandle using the local() +operator. These last until their block is exited, but may be passed back. +For example: sub newopen { my $path = shift; local *FH; # not my! - open (FH, $path) || return undef; + open (FH, $path) or return undef; return *FH; } $fh = newopen('/etc/passwd'); -Another way to create local filehandles is with IO::Handle and its ilk, -see the bottom of L. +Now that we have the *foo{THING} notation, typeglobs aren't used as much +for filehandle manipulations, although they're still needed to pass brand +new file and directory handles into or out of functions. That's because +*HANDLE{IO} only works if HANDLE has already been used as a handle. +In other words, *FH can be used to create new symbol table entries, +but *foo{THING} cannot. + +Another way to create anonymous filehandles is with the IO::Handle +module and its ilk. These modules have the advantage of not hiding +different types of the same name during the local(). See the bottom of +L for an example. See L, L, and L for more -discussion on typeglobs. +discussion on typeglobs and the *foo{THING} syntax. diff --git a/pod/perldsc.pod b/pod/perldsc.pod index cd689e3..d0cc335 100644 --- a/pod/perldsc.pod +++ b/pod/perldsc.pod @@ -64,8 +64,8 @@ sections on each of the following: =back -But for now, let's look at some of the general issues common to all -of these types of data structures. 
+But for now, let's look at general issues common to all +these types of data structures. =head1 REFERENCES @@ -461,7 +461,7 @@ types of data structures. $a cmp $b } keys %HoL ) { - print "$family: ", join(", ", sort @{ $HoL{$family}), "\n"; + print "$family: ", join(", ", sort @{ $HoL{$family} }), "\n"; } =head1 LISTS OF HASHES @@ -614,7 +614,7 @@ types of data structures. # append new members to an existing family %new_folks = ( wife => "wilma", - pet => "dino"; + pet => "dino", ); for $what (keys %new_folks) { diff --git a/pod/perlfaq.pod b/pod/perlfaq.pod index 2213a0f..370d9cb 100644 --- a/pod/perlfaq.pod +++ b/pod/perlfaq.pod @@ -141,7 +141,7 @@ and added mass commenting to L. Added Net::Telnet, fixed backticks, added reader/writer pair to telnet question, added FindBin, grouped module questions together in L. Expanded caveats for the simple URL extractor, gave LWP example, added CGI security -question, expanded on the email address answer in L. +question, expanded on the mail address answer in L. =item 25/March/97 diff --git a/pod/perlfaq1.pod b/pod/perlfaq1.pod index a9a5fd4..34caab8 100644 --- a/pod/perlfaq1.pod +++ b/pod/perlfaq1.pod @@ -77,6 +77,8 @@ To avoid the "what language is perl5?" confusion, some people prefer to simply use "perl" to refer to the latest version of perl and avoid using "perl5" altogether. It's not really that big a deal, though. +See L for a history of Perl revisions. + =head2 How stable is Perl? Production releases, which incorporate bug fixes and new functionality, @@ -92,10 +94,10 @@ and the rare new keyword). =head2 Is Perl difficult to learn? -Perl is easy to start learning -- and easy to keep learning. It looks -like most programming languages you're likely to have had experience +No, Perl is easy to start learning -- and easy to keep learning. 
It looks +like most programming languages you're likely to have experience with, so if you've ever written an C program, an awk script, a shell -script, or even an Excel macro, you're already part way there. +script, or even a BASIC program, you're already part way there. Most tasks only require a small subset of the Perl language. One of the guiding mottos for Perl development is "there's more than one way @@ -220,7 +222,7 @@ sometimes helpful to point out that delivery times may be reduced using Perl, as compared to other languages. If you have a project which has a bottleneck, especially in terms of -translation, or testing, Perl almost certainly will provide a viable, +translation or testing, Perl almost certainly will provide a viable, and quick solution. In conjunction with any persuasion effort, you should not fail to point out that Perl is used, quite extensively, and with extremely reliable and valuable results, at many large computer @@ -245,5 +247,18 @@ behind). Several important bugs were fixed from the 5.000 through =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997 Tom Christiansen and Nathan Torkington. -All rights reserved. See L for distribution information. +Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. +All rights reserved. + +When included as part of the Standard Version of Perl, or as part of +its complete documentation whether printed or otherwise, this work +may be distributed only under the terms of Perl's Artistic License. +Any distribution of this file or derivatives thereof I +of that package require that special arrangements be made with +copyright holder. + +Irrespective of its distribution, all code examples in this file +are hereby placed into the public domain. You are permitted and +encouraged to use this code in your own programs for fun +or for profit as you see fit. A simple comment in the code giving +credit would be courteous but is not required. 
diff --git a/pod/perlfaq2.pod b/pod/perlfaq2.pod index 0f73eea..4d873c7 100644 --- a/pod/perlfaq2.pod +++ b/pod/perlfaq2.pod @@ -19,7 +19,7 @@ as well as Windows NT, Plan 9, VMS, QNX, OS/2, and the Amiga. Binary distributions for various platforms can be found http://www.perl.com/CPAN/ports/ directory. Some of these ports (especially -the ones that are not part of the standard sources) may behave differently +the ones not part of the standard sources) may behave differently than what is documented in the standard source documentation. These differences can be either positive (e.g. extensions for the features of the particular platform that are not supported in the source release of perl) @@ -27,7 +27,7 @@ or negative (e.g. might be based upon a less current source release of perl). A useful FAQ for Win32 Perl users is: http://www.endcontsw.com/people/evangelo/Perl_for_Win32_FAQ.html -[This FAQ is seriously outdated as of Jan 1998--it is only relevant to +[This FAQ is seriously outdated as of May 1998--it is only relevant to the perl that ActiveState distributes, especially where it describes various inadequacies and differences with the standard perl extension build support.] @@ -119,13 +119,13 @@ Certainly not. Larry expects that he'll be certified before Perl is. =head2 Where can I get information on Perl? -The complete Perl documentation is available with the perl -distribution. If you have perl installed locally, you probably have -the documentation installed as well: type C if you're on a -system resembling Unix. This will lead you to other important man -pages. If you're not on a Unix system, access to the documentation -will be different; for example, it might be only in HTML format. But -all proper perl installations have fully-accessible documentation. +The complete Perl documentation is available with the perl distribution. 
+If you have perl installed locally, you probably have the documentation +installed as well: type C if you're on a system resembling Unix. +This will lead you to other important man pages, including how to set your +$MANPATH. If you're not on a Unix system, access to the documentation +will be different; for example, it might be only in HTML format. But all +proper perl installations have fully-accessible documentation. You might also try C in case your system doesn't have a proper man command, or it's been misinstalled. If that doesn't @@ -136,10 +136,6 @@ complete documentation in various formats, including native pod, troff, html, and plain text. There's also a web page at http://www.perl.com/perl/info/documentation.html that might help. -It's also worth noting that there's a PDF version of the complete -documentation for perl available in the CPAN/authors/id/BMIDD -directory. - Many good books have been written about Perl -- see the section below for more details. @@ -150,14 +146,18 @@ following groups: comp.lang.perl.announce Moderated announcement group comp.lang.perl.misc Very busy group about Perl in general + comp.lang.perl.moderated Moderated discussion group comp.lang.perl.modules Use and development of Perl modules comp.lang.perl.tk Using Tk (and X) from Perl comp.infosystems.www.authoring.cgi Writing CGI scripts for the Web. +Actually, the moderated group hasn't passed yet, but we're +keeping our fingers crossed. + There is also USENET gateway to the mailing list used by the crack Perl development team (perl5-porters) at -news://genetics.upenn.edu/perl.porters-gw/ . +news://news.perl.com/perl.porters-gw/ . =head2 Where should I post source code? @@ -167,6 +167,10 @@ cross-post to alt.sources, please make sure it follows their posting standards, including setting the Followup-To header line to NOT include alt.sources; see their FAQ for details. +If you're just looking for software, first use Alta Vista, Deja News, and +search CPAN. 
This is faster and more productive than just posting +a request. + =head2 Perl Books A number of books on Perl and/or CGI programming are available. A few of @@ -182,71 +186,104 @@ fourth printing. Authors: Larry Wall, Tom Christiansen, and Randal Schwartz ISBN 1-56592-149-6 (English) ISBN 4-89052-384-7 (Japanese) - (French and German translations in progress) + (French, German, and Italian translations also available) Note that O'Reilly books are color-coded: turquoise (some would call it teal) covers indicate perl5 coverage, while magenta (some would call it pink) covers indicate perl4 only. Check the cover color before you buy! +If you're already a hard-core systems programmer, then the Camel Book +might suffice for you to learn Perl from. But if you're not, check +out I by Randal and Tom. The second edition of "Llama +Book" has a blue cover, and is updated for the 5.004 release of Perl. + +If you're not an accidental programmer, but a more serious and possibly +even degreed computer scientist who doesn't need as much hand-holding as +we try to provide in the Llama or its defurred cousin the Gecko, please +check out the delightful book, I, +written by Nigel Chapman. + +You can order O'Reilly books directly from O'Reilly & Associates, +1-800-998-9938. Local/overseas is 1-707-829-0515. If you can +locate an O'Reilly order form, you can also fax to 1-707-829-0104. +See http://www.ora.com/ on the Web. + What follows is a list of the books that the FAQ authors found personally useful. Your mileage may (but, we hope, probably won't) vary. -If you're already a hard-core systems programmer, then the Camel Book -just might suffice for you to learn Perl from. But if you're not, -check out the "Llama Book". It currently doesn't cover perl5, but the -2nd edition is nearly done and should be out by summer 97: +Recommended books on (or muchly on) Perl are the following. +Those marked with a star may be ordered from O'Reilly. 
- Learning Perl (the Llama Book): - Author: Randal Schwartz, with intro by Larry Wall - ISBN 1-56592-042-2 (English) - ISBN 4-89502-678-1 (Japanese) - ISBN 2-84177-005-2 (French) - ISBN 3-930673-08-8 (German) +=over -Another stand-out book in the turquoise O'Reilly Perl line is the "Hip -Owls" book. It covers regular expressions inside and out, with quite a -bit devoted exclusively to Perl: +=item References - Mastering Regular Expressions (the Cute Owls Book): - Author: Jeffrey Friedl - ISBN 1-56592-257-3 + *Programming Perl + by Larry Wall, Tom Christiansen, and Randal L. Schwartz -You can order any of these books from O'Reilly & Associates, -1-800-998-9938. Local/overseas is 1-707-829-0515. If you can locate -an O'Reilly order form, you can also fax to 1-707-829-0104. See -http://www.ora.com/ on the Web. + *Perl 5 Desktop Reference + By Johan Vromans -Recommended Perl books that are not from O'Reilly are the following: +=item Tutorials + + *Learning Perl [2nd edition] + by Randal L. Schwartz and Tom Christiansen - Cross-Platform Perl, (for Unix and Windows NT) - Author: Eric F. Johnson - ISBN: 1-55851-483-X + *Learning Perl on Win32 Systems + by Randal L. Schwartz, Erik Olson, and Tom Christiansen, + with foreword by Larry Wall - How to Set up and Maintain a World Wide Web Site, (2nd edition) - Author: Lincoln Stein, M.D., Ph.D. - ISBN: 0-201-63462-7 + Perl: The Programmer's Companion + by Nigel Chapman - CGI Programming in C & Perl, - Author: Thomas Boutell - ISBN: 0-201-42219-0 + Cross-Platform Perl + by Eric F. Johnson -Note that some of these address specific application areas (e.g. the -Web) and are not general-purpose programming books. 
+ MacPerl: Power and Ease + by Vicki Brown and Chris Nandor, foreword by Matthias Neeracher -=head2 Perl in Magazines +=item Task-Oriented + + *The Perl Cookbook + by Tom Christiansen and Nathan Torkington + with foreword by Larry Wall + + Perl5 Interactive Course [2nd edition] + by Jon Orwant + + *Advanced Perl Programming + by Sriram Srinivasan -The Perl Journal is the first and only magazine dedicated to Perl. -It is published (on paper, not online) quarterly by Jon Orwant -(orwant@tpj.com), editor. Subscription information is at http://tpj.com -or via email to subscriptions@tpj.com. + Effective Perl Programming + by Joseph Hall -Beyond this, two other magazines that frequently carry high-quality -articles on Perl are Web Techniques (see -http://www.webtechniques.com/) and Unix Review -(http://www.unixreview.com/). Randal Schwartz's Web Technique's -columns are available on the web at -http://www.stonehenge.com/merlyn/WebTechniques/ . +=item Special Topics + + *Mastering Regular Expressions + by Jeffrey Friedl + + How to Set up and Maintain a World Wide Web Site [2nd edition] + by Lincoln Stein + +=back + +=head2 Perl in Magazines + +The first and only periodical devoted to All Things Perl, I contains tutorials, demonstrations, case studies, +announcements, contests, and much more. TPJ has columns on web +development, databases, Win32 Perl, graphical programming, regular +expressions, and networking, and sponsors the Obfuscated Perl Contest. +It is published quarterly by Jon Orwant. See http://www.tpj.com/ or +send mail to subscriptions@tpj.com. + +Beyond this, magazines that frequently carry high-quality articles +on Perl are I (see http://www.webtechniques.com/), +I at www.performance-computing.com, and Usenix's +newsletter/magazine to its members, I, at http://www.usenix.org/. +Randal's Web Technique's columns are available on the web at +http://www.stonehenge.com/merlyn/WebTechniques/. 
=head2 Perl on the Net: FTP and WWW Access @@ -297,13 +334,13 @@ for information on subscribing. =item NTPerl This list is used to discuss issues involving Win32 Perl 5 (Windows NT -and Win95). Subscribe by emailing ListManager@ActiveWare.com with the +and Win95). Subscribe by mailing ListManager@ActiveWare.com with the message body: subscribe Perl-Win32-Users The list software, also written in perl, will automatically determine -your address, and subscribe you automatically. To unsubscribe, email +your address, and subscribe you automatically. To unsubscribe, mail the following in the message body to the same address like so: unsubscribe Perl-Win32-Users @@ -314,7 +351,7 @@ to join or leave this list. =item Perl-Packrats Discussion related to archiving of perl materials, particularly the -Comprehensive Perl Archive Network (CPAN). Subscribe by emailing +Comprehensive Perl Archive Network (CPAN). Subscribe by mailing majordomo@cis.ufl.edu: subscribe perl-packrats @@ -348,13 +385,14 @@ let perlfaq-suggestions@perl.com know. =head2 Perl Training -While some large training companies offer their own courses on Perl, -you may prefer to contact individuals near and dear to the heart of -Perl development. Two well-known members of the Perl development team -who offer such things are Tom Christiansen -and Randal Schwartz , plus their -respective minions, who offer a variety of professional tutorials -and seminars on Perl. These courses include large public seminars, +While some large training companies offer their own courses on +Perl, you may prefer to contact individuals near and dear to the +heart of Perl development. Two well-known members of the Perl +development team head companies which offer such things are +Tom Christiansen and Randal Schwartz +, plus their respective +minions, who offer a variety of professional tutorials and +seminars on Perl. These courses include large public seminars, private corporate training, and fly-ins to Colorado and Oregon. 
See http://www.perl.com/perl/info/training.html for more details. @@ -406,8 +444,8 @@ For more information, contact the The Perl Clinic: =head2 Where do I send bug reports? If you are reporting a bug in the perl interpreter or the modules -shipped with perl, use the perlbug program in the perl distribution or -email your report to perlbug@perl.com. +shipped with perl, use the I program in the perl distribution or +mail your report to perlbug@perl.com. If you are posting a bug with a non-standard port (see the answer to "What platforms is Perl available for?"), a binary distribution, or a @@ -415,10 +453,18 @@ non-standard module (such as Tk, CGI, etc), then please see the documentation that came with it to determine the correct place to post bugs. -Read the perlbug man page (perl5.004 or later) for more information. +Read the perlbug(1) man page (perl5.004 or later) for more information. =head2 What is perl.com? perl.org? The Perl Institute? +The perl.com domain is Tom Christiansen's domain. He created it as a +public service long before perl.org came about. Despite the name, it's a +pretty non-commercial site meant to be a clearinghouse for information +about all things Perlian, accepting no paid advertisements, bouncy +happy gifs, or silly java applets on its pages. The Perl Home Page at +http://www.perl.com/ is currently hosted on a T3 line courtesy of Songline +Systems, a software-oriented subsidiary of O'Reilly and Associates. + perl.org is the official vehicle for The Perl Institute. The motto of TPI is "helping people help Perl help people" (or something like that). It's a non-profit organization supporting development, @@ -426,12 +472,6 @@ documentation, and dissemination of perl. Current directors of TPI include Larry Wall, Tom Christiansen, and Randal Schwartz, whom you may have heard of somewhere else around here. -The perl.com domain is Tom Christiansen's domain. He created it as a -public service long before perl.org came about. 
It's the original PBS -of the Perl world, a clearinghouse for information about all things -Perlian, accepting no paid advertisements, glossy gifs, or (gasp!) -java applets on its pages. - =head2 How do I learn about object-oriented Perl programming? L (distributed with 5.004 or later) is a good place to start. @@ -440,5 +480,18 @@ while L has some excellent tips and tricks. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997 Tom Christiansen and Nathan Torkington. -All rights reserved. See L for distribution information. +Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. +All rights reserved. + +When included as part of the Standard Version of Perl, or as part of +its complete documentation whether printed or otherwise, this work +may be distributed only under the terms of Perl's Artistic License. +Any distribution of this file or derivatives thereof I +of that package require that special arrangements be made with +copyright holder. + +Irrespective of its distribution, all code examples in this file +are hereby placed into the public domain. You are permitted and +encouraged to use this code in your own programs for fun +or for profit as you see fit. A simple comment in the code giving +credit would be courteous but is not required. diff --git a/pod/perlfaq3.pod b/pod/perlfaq3.pod index 7a30759..31e669f 100644 --- a/pod/perlfaq3.pod +++ b/pod/perlfaq3.pod @@ -13,10 +13,13 @@ Have you looked at CPAN (see L)? The chances are that someone has already written a module that can solve your problem. Have you read the appropriate man pages? 
Here's a brief index: + Basics perldata, perlvar, perlsyn, perlop, perlsub + Execution perlrun, perldebug + Functions perlfunc Objects perlref, perlmod, perlobj, perltie Data Structures perlref, perllol, perldsc Modules perlmod, perlmodlib, perlsub - Regexps perlre, perlfunc, perlop + Regexps perlre, perlfunc, perlop, perllocale Moving to perl5 perltrap, perl Linking w/C perlxstut, perlxs, perlcall, perlguts, perlembed Various http://www.perl.com/CPAN/doc/FMTEYEWTK/index.html @@ -65,8 +68,8 @@ breakdowns of where your code spends its time. =head2 How do I cross-reference my Perl programs? The B::Xref module, shipped with the new, alpha-release Perl compiler -(not the general distribution), can be used to generate -cross-reference reports for Perl programs. +(not the general distribution prior to the 5.005 release), can be used +to generate cross-reference reports for Perl programs. perl -MO=Xref[,OPTIONS] foo.pl @@ -99,10 +102,10 @@ the trick. =head2 Where can I get Perl macros for vi? For a complete version of Tom Christiansen's vi configuration file, -see ftp://ftp.perl.com/pub/vi/toms.exrc, the standard benchmark file -for vi emulators. This runs best with nvi, the current version of vi -out of Berkeley, which incidentally can be built with an embedded Perl -interpreter -- see http://www.perl.com/CPAN/src/misc . +see http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/toms.exrc, +the standard benchmark file for vi emulators. This runs best with nvi, +the current version of vi out of Berkeley, which incidentally can be built +with an embedded Perl interpreter -- see http://www.perl.com/CPAN/src/misc. =head2 Where can I get perl-mode for emacs? @@ -121,25 +124,23 @@ should be using "main::foo", anyway. =head2 How can I use curses with Perl? The Curses module from CPAN provides a dynamically loadable object -module interface to a curses library. +module interface to a curses library. 
A small demo can be found at the +directory http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/rep; +this program repeats a command and updates the screen as needed, rendering +B similar to B. =head2 How can I use X or Tk with Perl? -Tk is a completely Perl-based, object-oriented interface to the Tk -toolkit that doesn't force you to use Tcl just to get at Tk. Sx is an -interface to the Athena Widget set. Both are available from CPAN. +Tk is a completely Perl-based, object-oriented interface to the Tk toolkit +that doesn't force you to use Tcl just to get at Tk. Sx is an interface +to the Athena Widget set. Both are available from CPAN. See the +directory http://www.perl.com/CPAN/modules/by-category/08_User_Interfaces/ =head2 How can I generate simple menus without using CGI or Tk? The http://www.perl.com/CPAN/authors/id/SKUNZ/perlmenu.v4.0.tar.gz module, which is curses-based, can help with this. -=head2 Can I dynamically load C routines into Perl? - -If your system architecture supports it, then the standard perl -on your system should also provide you with this via the -DynaLoader module. Read L for details. - =head2 What is undump? See the next questions. @@ -225,9 +226,9 @@ No, Perl's garbage collection system takes care of this. =head2 How can I free an array or hash so my program shrinks? -You can't. Memory the system allocates to a program will never be -returned to the system. That's why long-running programs sometimes -re-exec themselves. +You can't. Memory the system allocates to a program will in practice +never be returned to the system. That's why long-running programs +sometimes re-exec themselves. However, judicious use of my() on your variables will help make sure that they go out of scope so that Perl can free up their storage for @@ -266,6 +267,8 @@ Both of these solutions can have far-reaching effects on your system and on the way you write your CGI scripts, so investigate them with care. 
+See http://www.perl.com/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/. + =head2 How can I hide the source for my Perl program? Delete it. :-) Seriously, there are a number of (mostly @@ -319,9 +322,10 @@ save little more than compilation time, leaving execution no more than (like several times faster), but this takes some tweaking of your code. -Malcolm will be in charge of the 5.005 release of Perl itself -to try to unify and merge his compiler and multithreading work into -the main release. +The 5.005 release of Perl itself, whose main goal is merging the various +non-Unix ports back into the one Perl source, will also have preliminary +(strictly beta) support for Malcolm's compiler and his light-weight +processes (sometimes called "threads"). You'll probably be astonished to learn that the current version of the compiler generates a compiled form of your script whose executable is @@ -334,6 +338,16 @@ you link your main perl binary with this, it will make it miniscule. For example, on one author's system, /usr/bin/perl is only 11k in size! +In general, the compiler will do nothing to make a Perl program smaller, +faster, more portable, or more secure. In fact, it will usually hurt +all of those. The executable will be bigger, your VM system may take +longer to load the whole thing, the binary is fragile and hard to fix, +and compilation never stopped software piracy in the form of crackers, +viruses, or bootleggers. The real advantage of the compiler is merely +packaging, and once you see the size of what it makes (well, unless +you use a shared I), you'll probably want a complete +Perl install anyway. + =head2 How can I get '#!perl' to work on [MS-DOS,NT,...]? For OS/2 just use @@ -365,12 +379,12 @@ Yes. Read L for more information. Some examples follow. (These assume standard Unix shell quoting rules.)
# sum first and last fields - perl -lane 'print $F[0] + $F[-1]' + perl -lane 'print $F[0] + $F[-1]' * # identify text files perl -le 'for(@ARGV) {print if -f && -T _}' * - # remove comments from C program + # remove (most) comments from C program perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c # make file a month younger than today, defeating reaper daemons @@ -433,21 +447,28 @@ books. For problems and questions related to the web, like "Why do I get 500 Errors" or "Why doesn't it run from the browser right when it runs fine on the command line", see these sources: - The Idiot's Guide to Solving Perl/CGI Problems, by Tom Christiansen - http://www.perl.com/perl/faq/idiots-guide.html + WWW Security FAQ + http://www.w3.org/Security/Faq/ - Frequently Asked Questions about CGI Programming, by Nick Kew - ftp://rtfm.mit.edu/pub/usenet/news.answers/www/cgi-faq - http://www3.pair.com/webthing/docs/cgi/faqs/cgifaq.shtml + Web FAQ + http://www.boutell.com/faq/ - Perl/CGI programming FAQ, by Shishir Gundavaram and Tom Christiansen - http://www.perl.com/perl/faq/perl-cgi-faq.html + CGI FAQ + http://www.webthing.com/page.cgi/cgifaq - The WWW Security FAQ, by Lincoln Stein - http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html + HTTP Spec + http://www.w3.org/pub/WWW/Protocols/HTTP/ + + HTML Spec + http://www.w3.org/TR/REC-html40/ + http://www.w3.org/pub/WWW/MarkUp/ + + CGI Spec + http://www.w3.org/CGI/ + + CGI Security FAQ + http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt - World Wide Web FAQ, by Thomas Boutell - http://www.boutell.com/faq/ =head2 Where can I learn about object-oriented Perl programming? @@ -499,6 +520,18 @@ information, see L. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997 Tom Christiansen and Nathan Torkington. -All rights reserved. See L for distribution information. - +Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. +All rights reserved. 
+ +When included as part of the Standard Version of Perl, or as part of +its complete documentation whether printed or otherwise, this work +may be distributed only under the terms of Perl's Artistic License. +Any distribution of this file or derivatives thereof I +of that package require that special arrangements be made with +copyright holder. + +Irrespective of its distribution, all code examples in this file +are hereby placed into the public domain. You are permitted and +encouraged to use this code in your own programs for fun +or for profit as you see fit. A simple comment in the code giving +credit would be courteous but is not required. diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod index 4c38d90..8c57db3 100644 --- a/pod/perlfaq4.pod +++ b/pod/perlfaq4.pod @@ -12,6 +12,10 @@ data issues. =head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)? +The infinite set that a mathematician thinks of as the real numbers can +only be approximate on a computer, since the computer only has a finite +number of bits to store an infinite number of, um, numbers. + Internally, your computer represents floating-point numbers in binary. Floating-point numbers read in from a file, or appearing as literals in your program, are converted from their decimal floating-point @@ -37,6 +41,7 @@ are consequently slower. To get rid of the superfluous digits, just use a format (eg, C) to get the required precision. +See L. =head2 Why isn't my octal data interpreted correctly? @@ -54,11 +59,10 @@ umask(), or sysopen(), which all want permissions in octal. chmod(644, $file); # WRONG -- perl -w catches this chmod(0644, $file); # right -=head2 Does perl have a round function? What about ceil() and floor()? -Trig functions? +=head2 Does perl have a round function? What about ceil() and floor()? Trig functions? -For rounding to a certain number of digits, sprintf() or printf() is -usually the easiest route. 
+Remember that int() merely truncates toward 0. For rounding to a certain +number of digits, sprintf() or printf() is usually the easiest route. The POSIX module (part of the standard perl distribution) implements ceil(), floor(), and a number of other mathematical and trigonometric @@ -131,11 +135,13 @@ Get the http://www.perl.com/CPAN/modules/by-module/Roman module. =head2 Why aren't my random numbers random? +John von Neumann said, ``Anyone who attempts to generate random numbers by +deterministic means is, of course, living in a state of sin.'' + The short explanation is that you're getting pseudorandom numbers, not -random ones, because that's how these things work. A longer -explanation is available on -http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy of Tom -Phoenix. +random ones, because that's how these things work. A longer explanation +is available on http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy +of Tom Phoenix. You should also check out the Math::TrulyRandom module from CPAN. @@ -177,27 +183,33 @@ Instead, there is an example of Julian date calculation in http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz, which should help. -=head2 Does Perl have a year 2000 problem? +=head2 Does Perl have a year 2000 problem? Is Perl Y2K compliant? -Not unless you use Perl to create one. The date and time functions -supplied with perl (gmtime and localtime) supply adequate information -to determine the year well beyond 2000 (2038 is when trouble strikes). -The year returned by these functions when used in an array context is -the year minus 1900. For years between 1910 and 1999 this I -to be a 2-digit decimal number. To avoid the year 2000 problem simply -do not treat the year as a 2-digit number. It isn't. +Perl is just as Y2K compliant as your pencil--no more, and no less. 
+The date and time functions supplied with perl (gmtime and localtime) +supply adequate information to determine the year well beyond 2000 (2038 +is when trouble strikes). The year returned by these functions when used +in an array context is the year minus 1900. For years between 1910 and +1999 this I to be a 2-digit decimal number. To avoid the year +2000 problem simply do not treat the year as a 2-digit number. It isn't. -When gmtime() and localtime() are used in a scalar context they return +When gmtime() and localtime() are used in scalar context they return a timestamp string that contains a fully-expanded year. For example, C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00 2001". There's no year 2000 problem here. +That doesn't mean that Perl can't be used to create non-Y2K compliant +programs. It can. But so can your pencil. It's the fault of the user, +not the language. At the risk of inflaming the NRA: ``Perl doesn't +break Y2K, people do.'' See http://language.perl.com/news/y2k.html for +a longer exposition. + =head1 Data: Strings =head2 How do I validate input? The answer to this question is usually a regular expression, perhaps -with auxiliary logic. See the more specific questions (numbers, email +with auxiliary logic. See the more specific questions (numbers, mail addresses, etc.) for details. =head2 How do I unescape a string? @@ -220,7 +232,7 @@ To turn "abbcccd" into "abccd": This is documented in L. In general, this is fraught with quoting and readability problems, but it is possible. To interpolate -a subroutine call (in a list context) into a string: +a subroutine call (in list context) into a string: print "My sub returned @{[mysub(1,2,3)]} that time.\n"; @@ -241,16 +253,23 @@ multiple ones, then something more like C would be needed. But none of these deals with nested patterns, nor can they. For that you'll have to write a parser. 
+One destructive, inside-out approach that you might try is to pull +out the smallest nesting parts one at a time: + + while (s/BEGIN(.*?)END//gs) { + # do something with $1 + } + =head2 How do I reverse a string? -Use reverse() in a scalar context, as documented in +Use reverse() in scalar context, as documented in L. $reversed = reverse $string; =head2 How do I expand tabs in a string? -You can do it the old-fashioned way: +You can do it yourself: 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e; @@ -299,6 +318,23 @@ into "whosoever" or "whomsoever", case insensitively. : $1 # renege and leave it there }igex; +In the more general case, you can use the C</g> modifier in a C<while> +loop, keeping count of matches. + + $WANT = 3; + $count = 0; + while (/(\w+)\s+fish\b/gi) { + if (++$count == $WANT) { + print "The third fish is a $1 one.\n"; + # Warning: don't `last' out of this loop + } + } + +That prints out: "The third fish is a red one." You can also use a +repetition count and repeated pattern like this: + + /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i; + =head2 How can I count the number of occurrences of a substring within a string? There are a number of ways, with varying efficiency: If you want a @@ -345,6 +381,10 @@ To force each word to be lower case, with the first letter upper case: $line =~ s/(\w+)/\u\L$1/g; +You can (and probably should) enable locale awareness of those +characters by placing a C<use locale> pragma in your program. +See L for endless details. + =head2 How can I split a [character] delimited string except when inside [character]? (Comma-separated files) @@ -382,11 +422,12 @@ distribution) lets you say: =head2 How do I strip blank space from the beginning/end of a string? -The simplest approach, albeit not the fastest, is probably like this: +Although the simplest approach would seem to be: $string =~ s/^\s*(.*?)\s*$/$1/; -It would be faster to do this in two steps: +This is unnecessarily slow, destructive, and fails with embedded newlines.
+It is much faster to do this in two steps: $string =~ s/^\s+//; $string =~ s/\s+$//; @@ -398,9 +439,39 @@ Or more nicely written as: s/\s+$//; } +This idiom takes advantage of the for(each) loop's aliasing +behavior to factor out common code. You can do this +on several strings at once, or arrays, or even the +values of a hash if you use a slice: + + # trim whitespace in the scalar, the array, + # and all the values in the hash + foreach ($scalar, @array, @hash{keys %hash}) { + s/^\s+//; + s/\s+$//; + } + =head2 How do I extract selected columns from a string? Use substr() or unpack(), both documented in L. +If you prefer thinking in terms of columns instead of widths, +you can use this kind of thing: + + # determine the unpack format needed to split Linux ps output + # arguments are cut columns + my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72); + + sub cut2fmt { + my(@positions) = @_; + my $template = ''; + my $lastpos = 1; + for my $place (@positions) { + $template .= "A" . ($place - $lastpos) . " "; + $lastpos = $place; + } + $template .= "A*"; + return $template; + } =head2 How do I find the soundex value of a string? @@ -411,15 +482,26 @@ Use the standard Text::Soundex module distributed with perl. Let's assume that you have a string like: $text = 'this has a $foo in it and a $bar'; + +If those were both global variables, then this would +suffice: + $text =~ s/\$(\w+)/${$1}/g; -Before version 5 of perl, this had to be done with a double-eval -substitution: +But since they are probably lexicals, or at least, they could +be, you'd have to do this: $text =~ s/(\$\w+)/$1/eeg; + die if $@; # needed on /ee, not /e -Which is bizarre enough that you'll probably actually need an EEG -afterwards. :-) +It's probably better in the general case to treat those +variables as entries in some special hash.
For example: + + %user_defs = ( + foo => 23, + bar => 19, + ); + $text =~ s/\$(\w+)/$user_defs{$1}/g; See also "How do I expand function calls in a string?" in this section of the FAQ. @@ -458,6 +540,12 @@ that actually do care about the difference between a string and a number, such as the magical C<++> autoincrement operator or the syscall() function. +Stringification also destroys arrays. + + @lines = `command`; + print "@lines"; # WRONG - extra blanks + print @lines; # right + =head2 Why don't my <op_ppaddr)() ) ; + @@@ TAINT_NOT; + @@@ return 0; + @@@ } + MAIN_INTERPRETER_LOOP + +Or with a fixed amount of leading white space, with remaining +indentation correctly preserved: + + $poem = fix<dication that you probably should have +used a hash, not a list or array, to store your data. Hashes are +designed to answer this question quickly and efficiently. Arrays aren't. -There are several ways to approach this. If you are going to make -this query many times and the values are arbitrary strings, the -fastest way is probably to invert the original array and keep an +That being said, there are several ways to approach this. If you +are going to make this query many times over arbitrary string values, +the fastest way is probably to invert the original array and keep an associative array lying about whose keys are the first array's values. @blues = qw/azure cerulean teal turquoise lapis-lazuli/; @@ -605,8 +764,11 @@ Now C<$found_index> has what you want. In general, you usually don't need a linked list in Perl, since with regular arrays, you can push and pop or shift and unshift at either end, -or you can use splice to add and/or remove arbitrary number of elements -at arbitrary points. +or you can use splice to add and/or remove arbitrary number of elements at +arbitrary points. Both pop and shift are both O(1) operations on perl's +dynamic arrays. 
In the absence of shifts and pops, push in general +needs to reallocate on the order of every log(N) times, and unshift will +need to copy pointers each time. If you really, really wanted, you could use structures as described in L or L and do just what the algorithm book tells you @@ -622,7 +784,23 @@ lists, or you could just do something like this with an array: =head2 How do I shuffle an array randomly? -Here's a shuffling algorithm which works its way through the list, +Use this: + + # fisher_yates_shuffle( \@array ) : + # generate a random permutation of @array in place + sub fisher_yates_shuffle { + my $array = shift; + my $i; + for ($i = @$array; --$i; ) { + my $j = int rand ($i+1); + next if $i == $j; + @$array[$i,$j] = @$array[$j,$i]; + } + } + + fisher_yates_shuffle( \@array ); # permutes @array in place + +You've probably seen shuffling algorithms that work using splice, randomly picking another element to swap the current element with: srand; @@ -632,65 +810,70 @@ randomly picking another element to swap the current element with: push(@new, splice(@old, rand @old, 1)); } -For large arrays, this avoids a lot of the reshuffling: - - srand; - @new = (); - @old = 1 .. 10000; # just a demo - for( @old ){ - my $r = rand @new+1; - push(@new,$new[$r]); - $new[$r] = $_; - } +This is bad because splice is already O(N), and since you do it N times, +you just invented a quadratic algorithm; that is, O(N**2). This does +not scale, although Perl is so efficient that you probably won't notice +this until you have rather largish arrays. =head2 How do I process/modify each element of an array?
Use C/C: for (@lines) { - s/foo/bar/; - tr[a-z][A-Z]; + s/foo/bar/; # change that word + y/XZ/ZX/; # swap those letters } Here's another; let's compute spherical volumes: - for (@radii) { + for (@volumes = @radii) { # @volumes has changed parts $_ **= 3; $_ *= (4/3) * 3.14159; # this will be constant folded } +If you want to do the same thing to modify the values of the hash, +you may not use the C function, oddly enough. You need a slice: + + for $orbit ( @orbits{keys %orbits} ) { + ($orbit **= 3) *= (4/3) * 3.14159; + } + =head2 How do I select a random element from an array? Use the rand() function (see L): + # at the top of the program: srand; # not needed for 5.004 and later + + # then later on $index = rand @array; $element = $array[$index]; +Make sure you I. +If you are calling it more than once (such as before each +call to rand), you're almost certainly doing something wrong. + =head2 How do I permute N elements of a list? Here's a little program that generates all permutations of all the words on each line of input. The algorithm embodied -in the permut() function should work on any list: +in the permute() function should work on any list: #!/usr/bin/perl -n - # permute - tchrist@perl.com - permut([split], []); - sub permut { - my @head = @{ $_[0] }; - my @tail = @{ $_[1] }; - unless (@head) { - # stop recursing when there are no elements in the head - print "@tail\n"; + # tsc-permute: permute each word of input + permute([split], []); + sub permute { + my @items = @{ $_[0] }; + my @perms = @{ $_[1] }; + unless (@items) { + print "@perms\n"; } else { - # for all elements in @head, move one from @head to @tail - # and call permut() on the new @head and @tail - my(@newhead,@newtail,$i); - foreach $i (0 .. $#head) { - @newhead = @head; - @newtail = @tail; - unshift(@newtail, splice(@newhead, $i, 1)); - permut([@newhead], [@newtail]); + my(@newitems,@newperms,$i); + foreach $i (0 .. 
$#items) { + @newitems = @items; + @newperms = @perms; + unshift(@newperms, splice(@newitems, $i, 1)); + permute([@newitems], [@newperms]); } } } @@ -796,7 +979,7 @@ See L in the 5.004 release or later of Perl. Use the each() function (see L) if you don't care whether it's sorted: - while (($key,$value) = each %hash) { + while ( ($key, $value) = each %hash) { print "$key = $value\n"; } @@ -862,6 +1045,7 @@ L). You can look into using the DB_File module and tie() using the $DB_BTREE hash bindings as documented in L. +The Tie::IxHash module from CPAN might also be instructive. =head2 What's the difference between "delete" and "undef" with hashes? @@ -953,7 +1137,7 @@ they end up doing is not what they do with ordinary hashes. =head2 How do I reset an each() operation part-way through? -Using C in a scalar context returns the number of keys in +Using C in scalar context returns the number of keys in the hash I resets the iterator associated with the hash. You may need to do this if you use C to exit a loop early so that when you re-enter it, the hash iterator has been reset. @@ -1055,14 +1239,37 @@ Assuming that you don't care about IEEE notations like "NaN" or "Infinity", you probably just want to use a regular expression. warn "has nondigits" if /\D/; - warn "not a whole number" unless /^\d+$/; - warn "not an integer" unless /^-?\d+$/; # reject +3 + warn "not a natural number" unless /^\d+$/; # rejects -3 + warn "not an integer" unless /^-?\d+$/; # rejects +3 warn "not an integer" unless /^[+-]?\d+$/; warn "not a decimal number" unless /^-?\d+\.?\d*$/; # rejects .2 warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/; warn "not a C float" unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/; +If you're on a POSIX system, Perl's supports the C +function. Its semantics are somewhat cumbersome, so here's a C +wrapper function for more convenient access. This function takes +a string and returns the number it found, or C for input that +isn't a C float. 
The C<is_numeric> function is a front end to C<getnum> +if you just want to say, ``Is this a float?'' + + sub getnum { + use POSIX qw(strtod); + my $str = shift; + $str =~ s/^\s+//; + $str =~ s/\s+$//; + $! = 0; + my($num, $unparsed) = strtod($str); + if (($str eq '') || ($unparsed != 0) || $!) { + return undef; + } else { + return $num; + } + } + + sub is_numeric { defined getnum($_[0]) } + Or you could check out http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz instead. The POSIX module (part of the standard Perl distribution) @@ -1096,6 +1303,18 @@ Get the Business::CreditCard module from CPAN. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997 Tom Christiansen and Nathan Torkington. -All rights reserved. See L for distribution information. - +Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. +All rights reserved. + +When included as part of the Standard Version of Perl, or as part of +its complete documentation whether printed or otherwise, this work +may be distributed only under the terms of Perl's Artistic License. +Any distribution of this file or derivatives thereof I +of that package require that special arrangements be made with +copyright holder. + +Irrespective of its distribution, all code examples in this file +are hereby placed into the public domain. You are permitted and +encouraged to use this code in your own programs for fun +or for profit as you see fit. A simple comment in the code giving +credit would be courteous but is not required. diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod index 5d71f64..41c46a3 100644 --- a/pod/perlfaq5.pod +++ b/pod/perlfaq5.pod @@ -7,7 +7,7 @@ perlfaq5 - Files and Formats ($Revision: 1.22 $, $Date: 1997/04/24 22:44:02 $) This section deals with I/O and the "f" issues: filehandles, flushing, formats, and footers. -=head2 How do I flush/unbuffer a filehandle? Why must I do this? +=head2 How do I flush/unbuffer an output filehandle? Why must I do this?
The C standard I/O library (stdio) normally buffers characters sent to devices. This is done for efficiency reasons, so that there isn't a @@ -15,7 +15,7 @@ system call for each byte. Any time you use print() or write() in Perl, you go though this buffering. syswrite() circumvents stdio and buffering. -In most stdio implementations, the type of buffering and the size of +In most stdio implementations, the type of output buffering and the size of the buffer varies according to the type of device. Disk files are block buffered, often with a buffer size of more than 2k. Pipes and sockets are often buffered with a buffer size between 1/2 and 2k. Serial devices @@ -29,10 +29,23 @@ command. This isn't as hard on your system as unbuffering, but does get the output where you want it when you want it. If you expect characters to get to your device when you print them there, -you'll want to autoflush its handle, as in the older: +you'll want to autoflush its handle. +Use select() and the C<$|> variable to control autoflushing +(see L and L): + + $old_fh = select(OUTPUT_HANDLE); + $| = 1; + select($old_fh); + +Or using the traditional idiom: + + select((select(OUTPUT_HANDLE), $| = 1)[0]); + +Or if you don't mind slowly loading several thousand lines of module code +just because you're afraid of the C<$|> variable: use FileHandle; - open(DEV, "<+/dev/tty"); # ceci n'est pas une pipe + open(DEV, "+</dev/tty"); # ceci n'est pas une pipe DEV->autoflush(1); or the newer IO::* modules: @@ -50,24 +63,18 @@ or even this: die "$!" unless $sock; $sock->autoflush(); - $sock->print("GET /\015\012"); - $document = join('', $sock->getlines()); + print $sock "GET / HTTP/1.0" . "\015\012" x 2; + $document = join('', <$sock>); print "DOC IS: $document\n"; -Note the hardcoded carriage return and newline in their octal -equivalents. This is the ONLY way (currently) to assure a proper -flush on all platforms, including Macintosh. +Note the bizarrely hardcoded carriage return and newline in their octal +equivalents.
This is the ONLY way (currently) to assure a proper flush +on all platforms, including Macintosh. That's the way things work in +network programming: you really should specify the exact bit pattern +of the network line terminator. In practice, C<"\n\n"> often works, +but this is not portable. -You can use select() and the C<$|> variable to control autoflushing -(see L and L): - - $oldh = select(DEV); - $| = 1; - select($oldh); - -You'll also see code that does this without a temporary variable, as in - - select((select(DEV), $| = 1)[0]); +See L for other examples of fetching URLs over the web. =head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file? @@ -78,13 +85,15 @@ bytes. In general, there's no direct way for Perl to seek to a particular line of a file, insert text into a file, or remove text from a file. -(There are exceptions in special circumstances. Replacing a sequence -of bytes with another sequence of the same length is one. Another is -using the C<$DB_RECNO> array bindings as documented in L. -Yet another is manipulating files with all lines the same length.) +(There are exceptions in special circumstances. You can add or remove at +the very end of the file. Another is replacing a sequence of bytes with +another sequence of the same length. Another is using the C<$DB_RECNO> +array bindings as documented in L. Yet another is manipulating +files with all lines the same length.) The general solution is to create a temporary copy of the text file with -the changes you want, then copy that over the original. +the changes you want, then copy that over the original. This assumes +no locking. $old = $file; $new = "$file.tmp.$$"; @@ -157,45 +166,73 @@ proper text file, so this may report one fewer line than you expect. } close FILE; +This assumes no funny games with newline translations. + =head2 How do I make a temporary file name? -Use the process ID and/or the current time-value.
If you need to have -many temporary files in one process, use a counter: +Use the C<new_tmpfile> class method from the IO::File module to get a +filehandle opened for reading and writing. Use this if you don't +need to know the file's name. - BEGIN { use IO::File; + $fh = IO::File->new_tmpfile() + or die "Unable to make new temporary file: $!"; + +Or you can use the C<tmpnam> function from the POSIX module to get a +filename that you then open yourself. Use this if you do need to know +the file's name. + + use Fcntl; + use POSIX qw(tmpnam); + + # try new temporary filenames until we get one that didn't already + # exist; the check should be unnecessary, but you can't be too careful + do { $name = tmpnam() } + until sysopen(FH, $name, O_RDWR|O_CREAT|O_EXCL); + + # install atexit-style handler so that when we exit or die, + # we automatically delete this temporary file + END { unlink($name) or die "Couldn't unlink $name : $!" } + + # now go on to use the file ... + +If you're committed to doing this by hand, use the process ID and/or +the current time-value. If you need to have many temporary files in +one process, use a counter: + + BEGIN { use Fcntl; my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMP} || $ENV{TEMP}; my $base_name = sprintf("%s/%d-%d-0000", $temp_dir, $$, time()); sub temp_file { - my $fh = undef; + local *FH; my $count = 0; - until (defined($fh) || $count > 100) { + until (defined(fileno(FH)) || $count++ > 100) { $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e; - $fh = IO::File->new($base_name, O_WRONLY|O_EXCL|O_CREAT, 0644) + sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT); } - if (defined($fh)) { - return ($fh, $base_name); + if (defined(fileno(FH))) { + return (*FH, $base_name); } else { return (); } } } -Or you could simply use IO::Handle::new_tmpfile. - =head2 How can I manipulate fixed-record-length files? -The most efficient way is using pack() and unpack(). This is faster -than using substr().
Here is a sample chunk of code to break up and -put back together again some fixed-format input lines, in this case -from the output of a normal, Berkeley-style ps: +The most efficient way is using pack() and unpack(). This is faster than +using substr() when taking many, many strings. It is slower for just a few. + +Here is a sample chunk of code to break up and put back together again +some fixed-format input lines, in this case from the output of a normal, +Berkeley-style ps: # sample input line: # 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what $PS_T = 'A6 A4 A7 A5 A*'; open(PS, "ps|"); - $_ = <PS>; print; + print scalar <PS>; while (<PS>) { ($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_); for $var (qw!pid tt stat time command!) { @@ -205,61 +242,174 @@ from the output of a normal, Berkeley-style ps: "\n"; } +We've used C<$$var> in a way that is forbidden by C<use strict 'refs'>. +That is, we've promoted a string to a scalar variable reference using +symbolic references. This is ok in small programs, but doesn't scale +well. It also only works on global variables, not lexicals. + =head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles? -You may have some success with typeglobs, as we always had to use -in days of old: +The fastest, simplest, and most direct way is to localize the typeglob +of the filehandle in question: - local(*FH); + local *TmpHandle; -But while still supported, that isn't the best to go about getting -local filehandles. Typeglobs have their drawbacks. You may well want -to use the C module, which creates new filehandles for you -(see L): +Typeglobs are fast (especially compared with the alternatives) and +reasonably easy to use, but they also have one subtle drawback. If you +had, for example, a function named TmpHandle(), or a variable named +%TmpHandle, you just hid it from yourself.
- use FileHandle; sub findme { - my $fh = FileHandle->new(); - open($fh, "</etc/hosts") or die "no /etc/hosts: $!"; - while (<$fh>) { + local *HostFile; + open(HostFile, "</etc/hosts") or die "no /etc/hosts: $!"; + while (<HostFile>) { print if /\b127\.(0\.0\.)?1\b/; } - # $fh automatically closes/disappears here + # *HostFile automatically closes/disappears here + } + +Here's how to use this in a loop to open and store a bunch of +filehandles. We'll use as values of the hash an ordered +pair to make it easy to sort the hash in insertion order. + + @names = qw(motd termcap passwd hosts); + my $i = 0; + foreach $filename (@names) { + local *FH; + open(FH, "/etc/$filename") || die "$filename: $!"; + $file{$filename} = [ $i++, *FH ]; } -Internally, Perl believes filehandles to be of class IO::Handle. You -may use that module directly if you'd like (see L<IO::Handle>), or -one of its more specific derived classes. + # Using the filehandles in the array + foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) { + my $fh = $file{$name}[1]; + my $line = <$fh>; + print "$name $. $line"; + } + +If you want to create many anonymous handles, you should check out the +Symbol, FileHandle, or IO::Handle (etc.) modules.
Here's the equivalent +code with Symbol::gensym, which is reasonably light-weight: + + foreach $filename (@names) { + use Symbol; + my $fh = gensym(); + open($fh, "/etc/$filename") || die "open /etc/$filename: $!"; + $file{$filename} = [ $i++, $fh ]; + } -Once you have IO::File or FileHandle objects, you can pass them -between subroutines or store them in hashes as you would any other -scalar values: +Or here it is using the semi-object-oriented FileHandle, which certainly isn't +light-weight: use FileHandle; - # Storing filehandles in a hash and array foreach $filename (@names) { - my $fh = new FileHandle($filename) or die; - $file{$filename} = $fh; - push(@files, $fh); + my $fh = FileHandle->new("/etc/$filename") or die "$filename: $!"; + $file{$filename} = [ $i++, $fh ]; } - # Using the filehandles in the array - foreach $file (@files) { - print $file "Testing\n"; +Please understand that whether the filehandle happens to be a (probably +localized) typeglob or an anonymous handle from one of the modules +in no way affects the bizarre rules for managing indirect handles. +See the next question. + +=head2 How can I use a filehandle indirectly? + +An indirect filehandle is the use of something other than a symbol +in a place where a filehandle is expected. Here are ways +to get those: + + $fh = SOME_FH; # bareword is strict-subs hostile + $fh = "SOME_FH"; # strict-refs hostile; same package only + $fh = *SOME_FH; # typeglob + $fh = \*SOME_FH; # ref to typeglob (bless-able) + $fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob + +Or you can use the C<new> method from the FileHandle or IO modules to +create an anonymous filehandle, store that in a scalar variable, +and use it as though it were a normal filehandle. + + use FileHandle; + $fh = FileHandle->new(); + + use IO::Handle; # 5.004 or higher + $fh = IO::Handle->new(); + +Then use any of those as you would a normal filehandle. Anywhere that +Perl is expecting a filehandle, an indirect filehandle may be used +instead.
An indirect filehandle is just a scalar variable that contains +a filehandle. Functions like C<print>, C<open>, C<seek>, or +the C<E<lt>FHE<gt>> diamond operator will accept either a named filehandle +or a scalar variable containing one: + + ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR); + print $ofh "Type it: "; + $got = <$ifh>; + print $efh "What was that: $got"; + +If you're passing a filehandle to a function, you can write +the function in two ways: + + sub accept_fh { + my $fh = shift; + print $fh "Sending to indirect filehandle\n"; } - # You have to do the { } ugliness when you're specifying the - # filehandle by anything other than a simple scalar variable. - print { $files[2] } "Testing\n"; +Or it can localize a typeglob and use the filehandle directly: - # Passing filehandles to subroutines - sub debug { - my $filehandle = shift; - printf $filehandle "DEBUG: ", @_; + sub accept_fh { + local *FH = shift; + print FH "Sending to localized filehandle\n"; } - debug($fh, "Testing\n"); +Both styles work with either objects or typeglobs of real filehandles. +(They might also work with strings under some circumstances, but this +is risky.) + + accept_fh(*STDOUT); + accept_fh($handle); + +In the examples above, we assigned the filehandle to a scalar variable +before using it. That is because only simple scalar variables, +not expressions or subscripts into hashes or arrays, can be used with +built-ins like C<print>, C<printf>, or the diamond operator. These are +illegal and won't even compile: + + @fd = (*STDIN, *STDOUT, *STDERR); + print $fd[1] "Type it: "; # WRONG + $got = <$fd[0]> # WRONG + print $fd[2] "What was that: $got"; # WRONG + +With C<print> and C<printf>, you get around this by using a block and +an expression where you would place the filehandle: + + print { $fd[1] } "funny stuff\n"; + printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559; + # Pity the poor deadbeef. + +That block is a proper block like any other, so you can put more +complicated code there.
This sends the message out to one of two places: + + $ok = -x "/bin/cat"; + print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n"; + print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n"; + +This approach of treating C<print> and C<printf> like object method +calls doesn't work for the diamond operator. That's because it's a +real operator, not just a function with a comma-less argument. Assuming +you've been storing typeglobs in your structure as we did above, you +can use the built-in function named C<readline> to read a record just +as C<E<lt>E<gt>> does. Given the initialization shown above for @fd, this +would work, but only because readline() requires a typeglob. It doesn't +work with objects or strings, which might be a bug we haven't fixed yet. + + $got = readline($fd[0]); + +Let it be noted that the flakiness of indirect filehandles is not +related to whether they're strings, typeglobs, objects, or anything else. +It's the syntax of the fundamental operators. Playing the object +game doesn't help you at all here. =head2 How can I set up a footer format to be used with write()? @@ -326,23 +476,72 @@ Within Perl, you may use this directly: : ( $ENV{HOME} || $ENV{LOGDIR} ) }ex; -=head2 How come when I open the file read-write it wipes it out? +=head2 How come when I open a file read-write it wipes it out? Because you're using something like this, which truncates the file and I<then> gives you read-write access: - open(FH, "+> /path/name"); # WRONG + open(FH, "+> /path/name"); # WRONG (almost always) Whoops. You should instead use this, which will fail if the file -doesn't exist. +doesn't exist. Using "E<gt>" always clobbers or creates. +Using "E<lt>" never does either. The "+" doesn't change this. - open(FH, "+< /path/name"); # open for update +Here are examples of many kinds of file opens. Those using sysopen() +all assume -If this is an issue, try: + use Fcntl; - sysopen(FH, "/path/name", O_RDWR|O_CREAT, 0644); +To open file for reading: -Error checking is left as an exercise for the reader.
+ open(FH, "< $path") || die $!; + sysopen(FH, $path, O_RDONLY) || die $!; + +To open file for writing, create new file if needed or else truncate old file: + + open(FH, "> $path") || die $!; + sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT) || die $!; + sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT, 0666) || die $!; + +To open file for writing, create new file, file must not exist: + + sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT) || die $!; + sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT, 0666) || die $!; + +To open file for appending, create if necessary: + + open(FH, ">> $path") || die $!; + sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT) || die $!; + sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT, 0666) || die $!; + +To open file for appending, file must exist: + + sysopen(FH, $path, O_WRONLY|O_APPEND) || die $!; + +To open file for update, file must exist: + + open(FH, "+< $path") || die $!; + sysopen(FH, $path, O_RDWR) || die $!; + +To open file for update, create file if necessary: + + sysopen(FH, $path, O_RDWR|O_CREAT) || die $!; + sysopen(FH, $path, O_RDWR|O_CREAT, 0666) || die $!; + +To open file for update, file must not exist: + + sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT) || die $!; + sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT, 0666) || die $!; + +To open a file without blocking, creating if necessary: + + sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT) + or die "can't open /tmp/somefile: $!"; + +Be warned that neither creation nor deletion of files is guaranteed to +be an atomic operation over NFS. That is, two processes might both +successfully create or unlink the same file! Therefore O_EXCL +isn't so exclusive as you might wish. =head2 Why do I sometimes get an "Argument list too long" when I use <*>? @@ -398,6 +597,8 @@ then delete the old one. This isn't really the same semantics as a real rename(), though, which preserves metainformation like permissions, timestamps, inode info, etc. +Newer versions of File::Copy export a move() function.
+ =head2 How can I lock a file? Perl's builtin flock() function (see L<perlfunc> for details) will call @@ -428,10 +629,6 @@ this. =back -The CPAN module File::Lock offers similar functionality and (if you -have dynamic loading) won't require you to rebuild perl if your -flock() can't lock network files. - =head2 What can't I just open(FH, ">file.lock")? A common bit of code B<NOT TO USE> is this: @@ -443,7 +640,7 @@ This is a classic race condition: you take two steps to do something which must be done in one. That's why computer hardware provides an atomic test-and-set instruction. In theory, this "ought" to work: - sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT, 0644) + sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT) or die "can't open file.lock: $!"; except that lamentably, file creation (and deletion) is not atomic @@ -454,11 +651,14 @@ these tend to involve busy-wait, which is also subdesirable. =head2 I still don't get locking. I just want to increment the number in the file. How can I do this? Didn't anyone ever tell you web-page hit counters were useless? +They don't count the number of hits, they're a waste of time, and they serve +only to stroke the writer's vanity. Better to pick a random number. +It's more realistic. -Anyway, this is what to do: +Anyway, this is what you can do if you can't help yourself. use Fcntl; - sysopen(FH, "numfile", O_RDWR|O_CREAT, 0644) or die "can't open numfile: $!"; + sysopen(FH, "numfile", O_RDWR|O_CREAT) or die "can't open numfile: $!"; flock(FH, 2) or die "can't flock numfile: $!"; $num = <FH> || 0; seek(FH, 0, 0) or die "can't rewind numfile: $!"; @@ -496,10 +696,6 @@ like this: Locking and error checking are left as an exercise for the reader. Don't forget them, or you'll be quite sorry. -Don't forget to set binmode() under DOS-like platforms when operating -on files that have anything other than straight text in them. See the -docs on open() and on binmode() for more details. - =head2 How do I get a file's timestamp in perl?
If you want to retrieve the time at which the file was last read, @@ -558,13 +754,18 @@ of the multiplexing: open (FH, "| tee file1 file2 file3"); -Otherwise you'll have to write your own multiplexing print function -- -or your own tee program -- or use Tom Christiansen's, at -http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz, which is -written in Perl. +Or even: + + # make STDOUT go to three files, plus original STDOUT + open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n"; + print "whatever\n" or die "Writing: $!\n"; + close(STDOUT) or die "Closing: $!\n"; -In theory a IO::Tee class could be written, but to date we haven't -seen such. +Otherwise you'll have to write your own multiplexing print +function -- or your own tee program -- or use Tom Christiansen's, +at http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz, which is +written in Perl and offers much greater functionality +than the stock version. =head2 How can I read in a file by paragraphs? @@ -691,7 +892,11 @@ file that worked. =head2 How can I tell if there's a character waiting on a filehandle? -You should check out the Frequently Asked Questions list in +The very first thing you should do is look into getting the Term::ReadKey +extension from CPAN. It now even has limited support for closed, proprietary +(read: not open systems, not POSIX, not Unix, etc.) systems. + +You should also check out the Frequently Asked Questions list in comp.unix.* for things like this: the answer is essentially the same. It's very system dependent. Here's one solution that works on BSD systems: @@ -702,29 +907,47 @@ systems: return $nfd = select($rin,undef,undef,0); } -You should look into getting the Term::ReadKey extension from CPAN. +If you want to find out how many characters are waiting, +there's also the FIONREAD ioctl call to be looked at. -=head2 How do I open a file without blocking? +The I<h2ph> tool that comes with Perl tries to convert C include +files to Perl code, which can be C<require>d.
FIONREAD ends +up defined as a function in the I<sys/ioctl.ph> file: -You need to use the O_NDELAY or O_NONBLOCK flag from the Fcntl module -in conjunction with sysopen(): + require 'sys/ioctl.ph'; - use Fcntl; - sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644) - or die "can't open /tmp/somefile: $!": + $size = pack("L", 0); + ioctl(FH, FIONREAD(), $size) or die "Couldn't call ioctl: $!\n"; + $size = unpack("L", $size); -=head2 How do I create a file only if it doesn't exist? +If I<h2ph> wasn't installed or doesn't work for you, you can +I<grep> the include files by hand: -You need to use the O_CREAT and O_EXCL flags from the Fcntl module in -conjunction with sysopen(): + % grep FIONREAD /usr/include/*/* + /usr/include/asm/ioctls.h:#define FIONREAD 0x541B - use Fcntl; - sysopen(FH, "/tmp/somefile", O_WRONLY|O_EXCL|O_CREAT, 0644) - or die "can't open /tmp/somefile: $!": +Or write a small C program using the editor of champions: -Be warned that neither creation nor deletion of files is guaranteed to -be an atomic operation over NFS. That is, two processes might both -successful create or unlink the same file! + % cat > fionread.c + #include <sys/ioctl.h> + main() { + printf("%#08x\n", FIONREAD); + } + ^D + % cc -o fionread fionread.c + % ./fionread + 0x4004667f + +And then hard-code it, leaving porting as an exercise to your successor. + + $FIONREAD = 0x4004667f; # XXX: opsys dependent + + $size = pack("L", 0); + ioctl(FH, $FIONREAD, $size) or die "Couldn't call ioctl: $!\n"; + $size = unpack("L", $size); + +FIONREAD requires a filehandle connected to a stream, meaning sockets, +pipes, and tty devices work, but I<not> files. =head2 How do I do a C<tail -f> in perl? @@ -765,7 +988,12 @@ Or even with a literal numeric descriptor: $fd = $ENV{MHCONTEXTFD}; open(MHCONTEXT, "<&=$fd"); # like fdopen(3S) -Error checking has been left as an exercise for the reader. +Note that "E<lt>&STDIN" makes a copy, but "E<lt>&=STDIN" makes +an alias. That means if you close an aliased handle, all +aliases become inaccessible.
This is not true with +a copied one. + +Error checking, as always, has been left as an exercise for the reader. =head2 How do I close a file descriptor by number? @@ -797,7 +1025,7 @@ awk, Tcl, Java, or Python, just to mention a few. Because even on non-Unix ports, Perl's glob function follows standard Unix globbing semantics. You'll need C<glob("*")> to get all (non-hidden) -files. +files. This makes glob() portable. =head2 Why does Perl let me delete read-only files? Why does C<-i> clobber protected files? Isn't this a bug in Perl? @@ -821,10 +1049,23 @@ Here's an algorithm from the Camel Book: rand($.) < 1 && ($line = $_) while <>; This has a significant advantage in space over reading the whole -file in. +file in. A simple proof by induction is available upon +request if you doubt its correctness. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997 Tom Christiansen and Nathan Torkington. -All rights reserved. See L<perlfaq> for distribution information. - +Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. +All rights reserved. + +When included as part of the Standard Version of Perl, or as part of +its complete documentation whether printed or otherwise, this work +may be distributed only under the terms of Perl's Artistic License. +Any distribution of this file or derivatives thereof I<outside> +of that package require that special arrangements be made with +copyright holder. + +Irrespective of its distribution, all code examples in this file +are hereby placed into the public domain. You are permitted and +encouraged to use this code in your own programs for fun +or for profit as you see fit. A simple comment in the code giving +credit would be courteous but is not required. diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod index 535e464..0e94f9b 100644 --- a/pod/perlfaq6.pod +++ b/pod/perlfaq6.pod @@ -25,7 +25,7 @@ comments. # turn the line into the first word, a colon, and the # number of characters on the rest of the line - s/^(\w+)(.*)/ lc($1) . ":" .
length($2) /ge; + s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg; =item Comments Inside the Regexp @@ -69,8 +69,9 @@ delimiter within the pattern: =head2 I'm having trouble matching over more than one line. What's wrong? -Either you don't have newlines in your string, or you aren't using the -correct modifier(s) on your pattern. +Either you don't have more than one line in the string you're looking at +(probably), or else you aren't using the correct modifier(s) on your +pattern (possibly). There are many ways to get multiline data into a string. If you want it to happen automatically while reading input, you'll want to set $/ @@ -94,7 +95,7 @@ record read in. $/ = ''; # read in more whole paragraph, not just one line while ( <> ) { - while ( /\b(\w\S+)(\s+\1)+\b/gi ) { + while ( /\b([\w'-]+)(\s+\1)+\b/gi ) { # word starts alpha print "Duplicate $1 at paragraph $.\n"; } } @@ -133,6 +134,16 @@ But if you want nested occurrences of C through C, you'll run up against the problem described in the question in this section on matching balanced text. +Here's another example of using C<..>: + + while (<>) { + $in_header = 1 .. /^$/; + $in_body = /^$/ .. eof(); + # now choose between them + } continue { + reset if eof(); # fix $. + } + =head2 I put a regular expression into $/ but it didn't work. What's wrong? $/ must be a string, not a regular expression. Awk has to be better @@ -211,7 +222,7 @@ This prints: this is a SUcCESS case -=head2 How can I make C<\w> match accented characters? +=head2 How can I make C<\w> match national character sets? See L. @@ -600,6 +611,18 @@ all mixed. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997 Tom Christiansen and Nathan Torkington. -All rights reserved. See L for distribution information. - +Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. +All rights reserved. 
+ +When included as part of the Standard Version of Perl, or as part of +its complete documentation whether printed or otherwise, this work +may be distributed only under the terms of Perl's Artistic License. +Any distribution of this file or derivatives thereof I<outside> +of that package require that special arrangements be made with +copyright holder. + +Irrespective of its distribution, all code examples in this file +are hereby placed into the public domain. You are permitted and +encouraged to use this code in your own programs for fun +or for profit as you see fit. A simple comment in the code giving +credit would be courteous but is not required. diff --git a/pod/perlfaq7.pod b/pod/perlfaq7.pod index d62ee36..291ca36 100644 --- a/pod/perlfaq7.pod +++ b/pod/perlfaq7.pod @@ -713,5 +713,18 @@ Use embedded POD to discard it: =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997 Tom Christiansen and Nathan Torkington. -All rights reserved. See L<perlfaq> for distribution information. +Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. +All rights reserved. + +When included as part of the Standard Version of Perl, or as part of +its complete documentation whether printed or otherwise, this work +may be distributed only under the terms of Perl's Artistic License. +Any distribution of this file or derivatives thereof I<outside> +of that package require that special arrangements be made with +copyright holder. + +Irrespective of its distribution, all code examples in this file +are hereby placed into the public domain. You are permitted and +encouraged to use this code in your own programs for fun +or for profit as you see fit. A simple comment in the code giving +credit would be courteous but is not required. diff --git a/pod/perlfaq8.pod b/pod/perlfaq8.pod index dbc1bcd..f4addd8 100644 --- a/pod/perlfaq8.pod +++ b/pod/perlfaq8.pod @@ -848,5 +848,18 @@ included with the 5.002 release of Perl. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
-All rights reserved. See L<perlfaq> for distribution information. +Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. +All rights reserved. + +When included as part of the Standard Version of Perl, or as part of +its complete documentation whether printed or otherwise, this work +may be distributed only under the terms of Perl's Artistic License. +Any distribution of this file or derivatives thereof I<outside> +of that package require that special arrangements be made with +copyright holder. + +Irrespective of its distribution, all code examples in this file +are hereby placed into the public domain. You are permitted and +encouraged to use this code in your own programs for fun +or for profit as you see fit. A simple comment in the code giving +credit would be courteous but is not required. diff --git a/pod/perlfaq9.pod b/pod/perlfaq9.pod index aa942c2..deeb650 100644 --- a/pod/perlfaq9.pod +++ b/pod/perlfaq9.pod @@ -166,7 +166,7 @@ C or C calls. In addition to tainting, never use the single-argument form of system() or exec(). Instead, supply the command and arguments as a list, which prevents shell globbing. -=head2 How do I parse an email header? +=head2 How do I parse a mail header? For a quick-and-dirty solution, try this solution derived from page 222 of the 2nd edition of "Programming Perl": @@ -193,33 +193,33 @@ CGI.pm or CGI_Lite.pm (available from CPAN), or if you're trapped in the module-free land of perl1 .. perl4, you might look into cgi-lib.pl (available from http://www.bio.cam.ac.uk/web/form.html). -=head2 How do I check a valid email address? +=head2 How do I check a valid mail address? You can't. Without sending mail to the address and seeing whether it bounces (and even then you face the halting problem), you cannot determine whether -an email address is valid. Even if you apply the email header +a mail address is valid.
Even if you apply the mail header standard, you can have problems, because there are deliverable addresses that aren't RFC-822 (the mail header standard) compliant, and addresses that aren't deliverable which are compliant. -Many are tempted to try to eliminate many frequently-invalid email +Many are tempted to try to eliminate many frequently-invalid mail addresses with a simple regexp, such as C. However, this also throws out many valid ones, and says nothing about potential deliverability, so is not suggested. Instead, see http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz , which actually checks against the full RFC spec (except for nested -comments), looks for addresses you may not wish to accept email to +comments), looks for addresses you may not wish to accept mail to (say, Bill Clinton or your postmaster), and then makes sure that the hostname given can be looked up in DNS. It's not fast, but it works. Here's an alternative strategy used by many CGI script authors: Check -the email address with a simple regexp (such as the one above). If +the mail address with a simple regexp (such as the one above). If the regexp matched the address, accept the address. If the regexp didn't match the address, request confirmation from the user that the -email address they entered was correct. +mail address they entered was correct. =head2 How do I decode a MIME/BASE64 string? @@ -237,7 +237,7 @@ format after minor transliterations: $len = pack("c", 32 + 0.75*length); # compute length byte print unpack("u", $len . $_); # uudecode and print -=head2 How do I return the user's email address? +=head2 How do I return the user's mail address? 
On systems that support getpwuid, the $E<lt> variable and the Sys::Hostname module (which is part of the standard perl distribution), @@ -246,9 +246,9 @@ you can probably try using something like this: use Sys::Hostname; $address = sprintf('%s@%s', scalar getpwuid($<), hostname); -Company policies on email address can mean that this generates addresses -that the company's email system will not accept, so you should ask for -users' email addresses when this matters. Furthermore, not all systems +Company policies on mail addresses can mean that this generates addresses +that the company's mail system will not accept, so you should ask for +users' mail addresses when this matters. Furthermore, not all systems on which Perl runs are so forthcoming with this information as is Unix. The Mail::Util module from CPAN (part of the MailTools package) provides a @@ -326,6 +326,18 @@ CPAN). No ONC::RPC module is known. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997 Tom Christiansen and Nathan Torkington. -All rights reserved. See L<perlfaq> for distribution information. - +Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. +All rights reserved. + +When included as part of the Standard Version of Perl, or as part of +its complete documentation whether printed or otherwise, this work +may be distributed only under the terms of Perl's Artistic License. +Any distribution of this file or derivatives thereof I<outside> +of that package require that special arrangements be made with +copyright holder. + +Irrespective of its distribution, all code examples in this file +are hereby placed into the public domain. You are permitted and +encouraged to use this code in your own programs for fun +or for profit as you see fit. A simple comment in the code giving +credit would be courteous but is not required. diff --git a/pod/perlform.pod b/pod/perlform.pod index 0b2a68c..6b65e04 100644 --- a/pod/perlform.pod +++ b/pod/perlform.pod @@ -233,11 +233,11 @@ of the page, however wide it is."
You have to specify where it goes. The truly desperate can generate their own format on the fly, based on the current number of columns, and then eval() it: - $format = "format STDOUT = \n"; - . '^' . '<' x $cols . "\n"; - . '$entry' . "\n"; - . "\t^" . "<" x ($cols-8) . "~~\n"; - . '$entry' . "\n"; + $format = "format STDOUT = \n" + . '^' . '<' x $cols . "\n" + . '$entry' . "\n" + . "\t^" . "<" x ($cols-8) . "~~\n" + . '$entry' . "\n" . ".\n"; print $format if $Debugging; eval $format; @@ -295,7 +295,7 @@ For example: print "Wow, I just stored `$^A' in the accumulator!\n"; -Or to make an swrite() subroutine which is to write() what sprintf() +Or to make an swrite() subroutine, which is to write() what sprintf() is to printf(), do this: use Carp; @@ -315,18 +315,18 @@ is to printf(), do this: =head1 WARNINGS -The lone dot that ends a format can also prematurely end an email +The lone dot that ends a format can also prematurely end a mail message passing through a misconfigured Internet mailer (and based on experience, such misconfiguration is the rule, not the exception). So -when sending format code through email, you should indent it so that +when sending format code through mail, you should indent it so that the format-ending dot is not on the left margin; this will prevent -email cutoff. +SMTP cutoff. Lexical variables (declared with "my") are not visible within a format unless the format is declared within the scope of the lexical variable. (They weren't visible at all before version 5.001.) -Formats are the only part of Perl which unconditionally use information +Formats are the only part of Perl that unconditionally use information from a program's locale; if a program's environment specifies an LC_NUMERIC locale, it is always used to specify the decimal point character in formatted output. 
Perl ignores all other aspects of locale diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod index e867a0c..25a97ff 100644 --- a/pod/perlfunc.pod +++ b/pod/perlfunc.pod @@ -1,4 +1,3 @@ - =head1 NAME perlfunc - Perl builtin functions @@ -53,26 +52,36 @@ nonabortive failure is generally indicated in a scalar context by returning the undefined value, and in a list context by returning the null list. -Remember the following rule: - -=over 8 - -=item I - -=back - +Remember the following important rule: There is B<no rule> that relates +the behavior of an expression in list context to its behavior in scalar +context, or vice versa. It might do two totally different things. Each operator and function decides which sort of value it would be most appropriate to return in a scalar context. Some operators return the -length of the list that would have been returned in a list context. Some +length of the list that would have been returned in list context. Some operators return the first value in the list. Some operators return the last value in the list. Some operators return a count of successful operations. In general, they do what you want, unless you want consistency. +A named array in scalar context is quite different from what would at +first glance appear to be a list in scalar context. You can't get a list +like C<(1,2,3)> into being in scalar context, because the compiler knows +the context at compile time. It would generate the scalar comma operator +there, not the list construction version of the comma. That means it +was never a list to start with. + +In general, functions in Perl that serve as wrappers for system calls +of the same name (like chown(2), fork(2), closedir(2), etc.) all return +true when they succeed and C<undef> otherwise, as is usually mentioned +in the descriptions below. This is different from the C interfaces, +which return -1 on failure. Exceptions to this rule are wait(), +waitpid(), and syscall(). System calls also set the special C<$!> +variable on failure.
Other functions do not, except accidentally. + =head2 Perl Functions by Category Here are Perl's functions (including things that look like -functions, like some of the keywords and named operators) +functions, like some keywords and named operators) arranged by category. Some functions appear in more than one place. @@ -189,7 +198,7 @@ C, C, C, C, C, C, C, C, C, C, C, C * - C<sub> was a keyword in perl4, but in perl5 it is an -operator which can be used in expressions. +operator, which can be used in expressions. =item Functions obsoleted in perl5 @@ -254,7 +263,7 @@ operator may be any of: The interpretation of the file permission operators C<-r>, C<-R>, C<-w>, C<-W>, C<-x>, and C<-X> is based solely on the mode of the file and the uids and gids of the user. There may be other reasons you can't actually -read, write or execute the file. Also note that, for the superuser, +read, write, or execute the file, such as AFS access control lists. Also note that, for the superuser, C<-r>, C<-R>, C<-w>, and C<-W> always return 1, and C<-x> and C<-X> return 1 if any execute bit is set in the mode. Scripts run by the superuser may thus need to do a stat() to determine the actual mode of the @@ -265,7 +274,7 @@ Example: while (<>) { chop; next unless -f $_; # ignore specials - ... + #... } Note that C<-s/a/b/> does not do a negated substitution. Saying @@ -274,7 +283,7 @@ following a minus are interpreted as file tests. The C<-T> and C<-B> switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or -characters with the high bit set. If too many odd characters (E<gt>30%) +characters with the high bit set. If too many strange characters (E<gt>30%) are found, it's a C<-B> file, otherwise it's a C<-T> file. Also, any file containing null in the first block is considered a binary file. If C<-T> or C<-B> is used on a filehandle, the current stdio buffer is examined @@ -336,17 +345,18 @@ and sleep() calls.
If you want to use alarm() to time out a system call you need to use an eval/die pair. You can't rely on the alarm causing the system call to -fail with $! set to EINTR because Perl sets up signal handlers to -restart system calls on some systems. Using eval/die always works. +fail with C<$!> set to EINTR because Perl sets up signal handlers to +restart system calls on some systems. Using eval/die always works, +modulo the caveats given in L. eval { - local $SIG{ALRM} = sub { die "alarm\n" }; # NB \n required + local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required alarm $timeout; $nread = sysread SOCKET, $buffer, $size; alarm 0; }; - die if $@ && $@ ne "alarm\n"; # propagate errors if ($@) { + die unless $@ eq "alarm\n"; # propagate unexpected errors # timed out } else { @@ -378,7 +388,7 @@ translated to CR LF on output. Binmode has no effect under Unix; in MS-DOS and similarly archaic systems, it may be imperative--otherwise your MS-DOS-damaged C library may mangle your file. The key distinction between systems that need binmode and those that don't is their text file -formats. Systems like Unix and Plan9 that delimit lines with a single +formats. Systems like Unix, MacOS, and Plan9 that delimit lines with a single character, and that encode that character in C as '\n', do not need C. The rest need it. If FILEHANDLE is an expression, the value is taken as the name of the filehandle. @@ -392,17 +402,17 @@ an object in the CLASSNAME package--or the current package if no CLASSNAME is specified, which is often the case. It returns the reference for convenience, because a bless() is often the last thing in a constructor. Always use the two-argument version if the function doing the blessing -might be inherited by a derived class. See L for more about the -blessing (and blessings) of objects. +might be inherited by a derived class. See L and L +for more about the blessing (and blessings) of objects. 
=item caller EXPR =item caller -Returns the context of the current subroutine call. In a scalar context, +Returns the context of the current subroutine call. In scalar context, returns the caller's package name if there is a caller, that is, if we're in a subroutine or eval() or require(), and the undefined value -otherwise. In a list context, returns +otherwise. In list context, returns ($package, $filename, $line) = caller; @@ -464,7 +474,7 @@ VARIABLE is omitted, it chomps $_. Example: while (<>) { chomp; # avoid \n on last field @array = split(/:/); - ... + # ... } You can actually chomp anything that's an lvalue, including an assignment: @@ -490,7 +500,7 @@ Example: while (<>) { chop; # avoid \n on last field @array = split(/:/); - ... + #... } You can actually chop anything that's an lvalue, including an assignment: @@ -517,13 +527,13 @@ Here's an example that looks up nonnumeric uids in the passwd file: print "User: "; chop($user = ); - print "Files: " + print "Files: "; chop($pattern = ); ($login,$pass,$uid,$gid) = getpwnam($user) or die "$user not in passwd file"; - @ary = <${pattern}>; # expand filenames + @ary = glob($pattern); # expand filenames chown $uid, $gid, @ary; On most systems, you are not allowed to change the ownership of the @@ -544,12 +554,12 @@ If NUMBER is omitted, uses $_. =item chroot -This function works as the system call by the same name: it makes the +This function works like the system call by the same name: it makes the named directory the new root directory for all further pathnames that -begin with a "/" by your process and all of its children. (It doesn't +begin with a "/" by your process and all its children. (It doesn't change your current working directory, which is unaffected.) For security reasons, this call is restricted to the superuser. If FILENAME is -omitted, does chroot to $_. +omitted, does a chroot to $_. =item close FILEHANDLE @@ -565,26 +575,32 @@ counter ($.), while the implicit close done by open() does not. 
If the file handle came from a piped open C will additionally return FALSE if one of the other system calls involved fails or if the program exits with non-zero status. (If the only problem was that the -program exited non-zero $! will be set to 0.) Also, closing a pipe will -wait for the process executing on the pipe to complete, in case you +program exited non-zero $! will be set to 0.) Also, closing a pipe +waits for the process executing on the pipe to complete, in case you want to look at the output of the pipe afterwards. Closing a pipe explicitly also puts the exit status value of the command into C<$?>. + Example: open(OUTPUT, '|sort >foo') # pipe to sort or die "Can't start sort: $!"; - ... # print stuff to output + #... # print stuff to output close OUTPUT # wait for sort to finish or warn $! ? "Error closing sort pipe: $!" : "Exit status $? from sort"; open(INPUT, 'foo') # get sort's results or die "Can't open 'foo' for input: $!"; -FILEHANDLE may be an expression whose value gives the real filehandle name. +FILEHANDLE may be an expression whose value can be used as an indirect +filehandle, usually the real filehandle name. =item closedir DIRHANDLE -Closes a directory opened by opendir(). +Closes a directory opened by opendir() and returns the success of that +system call. + +DIRHANDLE may be an expression whose value can be used as an indirect +dirhandle, usually the real dirhandle name. =item connect SOCKET,NAME @@ -624,7 +640,7 @@ to check the condition at the top of the loop. =item cos EXPR -Returns the cosine of EXPR (expressed in radians). If EXPR is omitted +Returns the cosine of EXPR (expressed in radians). If EXPR is omitted, takes cosine of $_. For the inverse cosine operation, you may use the POSIX::acos() @@ -725,10 +741,10 @@ doesn't I indicate an exceptional condition: pop() returns C when its argument is an empty array, I when the element to return happens to be C. -You may also use defined() to check whether a subroutine exists. 
On -the other hand, use of defined() upon aggregates (hashes and arrays) -is not guaranteed to produce intuitive results, and should probably be -avoided. +You may also use defined() to check whether a subroutine exists, by +saying C without parentheses. On the other hand, use +of defined() upon aggregates (hashes and arrays) is not guaranteed to +produce intuitive results, and should probably be avoided. When used on a hash element, it tells you whether the value is defined, not whether the key exists in the hash. Use L for the latter @@ -749,7 +765,7 @@ defined values. For example, if you say "ab" =~ /a(.*)b/; -the pattern match succeeds, and $1 is defined, despite the fact that it +The pattern match succeeds, and $1 is defined, despite the fact that it matched "nothing". But it didn't really match nothing--rather, it matched something that happened to be 0 characters long. This is all very above-board and honest. When a function returns an undefined value, @@ -768,11 +784,12 @@ should instead use a simple test for size: if (%a_hash) { print "has hash members\n" } Using undef() on these, however, does clear their memory and then report -them as not defined anymore, but you shoudln't do that unless you don't +them as not defined anymore, but you shouldn't do that unless you don't plan to use them again, because it saves time when you load them up -again to have memory already ready to be filled. +again to have memory already ready to be filled. The normal way to +free up space used by an aggregate is to assign the empty list. -This counterintuitive behaviour of defined() on aggregates may be +This counterintuitive behavior of defined() on aggregates may be changed, fixed, or broken in a future release of Perl. See also L, L, L. @@ -796,20 +813,20 @@ And so does this: delete @HASH{keys %HASH} -(But both of these are slower than the undef() command.) 
Note that the -EXPR can be arbitrarily complicated as long as the final operation is a -hash element lookup or hash slice: +(But both of these are slower than just assigning the empty list, or +using undef().) Note that the EXPR can be arbitrarily complicated as +long as the final operation is a hash element lookup or hash slice: delete $ref->[$x][$y]{$key}; delete @{$ref->[$x][$y]}{$key1, $key2, @morekeys}; =item die LIST -Outside of an eval(), prints the value of LIST to C and exits with +Outside an eval(), prints the value of LIST to C and exits with the current value of C<$!> (errno). If C<$!> is 0, exits with the value of C<($? EE 8)> (backtick `command` status). If C<($? EE 8)> is 0, exits with 255. Inside an eval(), the error message is stuffed into -C<$@>, and the eval() is terminated with the undefined value; this makes +C<$@> and the eval() is terminated with the undefined value. This makes die() the way to raise an exception. Equivalent examples: @@ -842,7 +859,7 @@ This is useful for propagating exceptions: If $@ is empty then the string "Died" is used. -You can arrange for a callback to be called just before the die() does +You can arrange for a callback to be run just before the die() does its deed, by setting the C<$SIG{__DIE__}> hook. The associated handler will be called with the error text and can change the error message, if it sees fit, by calling die() again. See L for details on @@ -879,7 +896,7 @@ is just like scalar eval `cat stat.pl`; -except that it's more efficient, more concise, keeps track of the +except that it's more efficient and concise, keeps track of the current filename for error messages, and searches all the B<-I> libraries if the file isn't in the current directory (see also the @INC array in L). It is also different in how @@ -889,9 +906,21 @@ reparse the file every time you call it, so you probably don't want to do this inside a loop. 
Note that inclusion of library modules is better done with the -use() and require() operators, which also do error checking +use() and require() operators, which also do automatic error checking and raise an exception if there's a problem. +You might like to use C<do> to read in a program configuration +file. Manual error checking can be done this way: + + # read in config files: system first, then user + for $file ("/share/prog/defaults.rc", "$ENV{HOME}/.someprogrc") { + unless ($return = do $file) { + warn "couldn't parse $file: $@" if $@; + warn "couldn't do $file: $!" unless defined $return; + warn "couldn't run $file" unless $return; + } + } + =item dump LABEL This causes an immediate core dump. Primarily this is so that you can @@ -900,7 +929,7 @@ after having initialized all your variables at the beginning of the program. When the new binary is executed it will begin by executing a C<goto LABEL> (with all the restrictions that C<goto> suffers). Think of it as a goto with an intervening core dump and reincarnation. If LABEL -is omitted, restarts the program from the top. WARNING: any files +is omitted, restarts the program from the top. WARNING: Any files opened at the time of the dump will NOT be open any more when the program is reincarnated, with possible resulting confusion on the part of Perl. See also B<-u> option in L<perlrun>. @@ -925,18 +954,22 @@ Example: QUICKSTART: Getopt('f'); +This operator is largely obsolete, partly because it's very hard to +convert a core file into an executable, and because the real perl-to-C +compiler has superseded it. + =item each HASH -When called in a list context, returns a 2-element list consisting of the +When called in list context, returns a 2-element list consisting of the key and value for the next element of a hash, so that you can iterate over -it. When called in a scalar context, returns the key for only the next +it. When called in scalar context, returns the key for only the "next" element in the hash.
(Note: Keys may be "0" or "", which are logically false; you may wish to avoid constructs like C for this reason.) Entries are returned in an apparently random order. When the hash is entirely read, a null array is returned in list context (which when -assigned produces a FALSE (0) value), and C is returned in a +assigned produces a FALSE (0) value), and C in scalar context. The next call to each() after that will start iterating again. There is a single iterator for each hash, shared by all each(), keys(), and values() function calls in the program; it can be reset by @@ -961,14 +994,14 @@ See also keys() and values(). Returns 1 if the next read on FILEHANDLE will return end of file, or if FILEHANDLE is not open. FILEHANDLE may be an expression whose value -gives the real filehandle name. (Note that this function actually -reads a character and then ungetc()s it, so it is not very useful in an +gives the real filehandle. (Note that this function actually +reads a character and then ungetc()s it, so isn't very useful in an interactive context.) Do not read from a terminal file (or call C on it) after end-of-file is reached. Filetypes such as terminals may lose the end-of-file condition if you do. An C without an argument uses the last file read as argument. -Empty parentheses () may be used to indicate the pseudo file formed of +Using C with empty parentheses is very different. It indicates the pseudo file formed of the files listed on the command line, i.e., C is reasonable to use inside a CE)> loop to detect the end of only the last file. Use C or eof without the parentheses to test @@ -976,13 +1009,15 @@ I file in a while (EE) loop. Examples: # reset line numbering on each input file while (<>) { + next if /^\s*#/; # skip comments print "$.\t$_"; - close(ARGV) if (eof); # Not eof(). + } continue { + close ARGV if eof; # Not eof()! 
} # insert dashes just before last line of last file while (<>) { - if (eof()) { + if (eof()) { # check for end of current file print "--------------\n"; close(ARGV); # close or break; is needed if we # are reading from the terminal @@ -991,7 +1026,7 @@ I file in a while (EE) loop. Examples: } Practical hint: you almost never need to use C in Perl, because the -input operators return undef when they run out of data. +input operators return C when they run out of data. =item eval EXPR @@ -999,7 +1034,7 @@ input operators return undef when they run out of data. In the first form, the return value of EXPR is parsed and executed as if it were a little Perl program. The value of the expression (which is itself -determined within a scalar context) is first parsed, and if there are no +determined within scalar context) is first parsed, and if there weren't any errors, executed in the context of the current Perl program, so that any variable settings or subroutine and format definitions remain afterwards. Note that the value is parsed every time the eval executes. If EXPR is @@ -1017,9 +1052,9 @@ The final semicolon, if any, may be omitted from the value of EXPR or within the BLOCK. In both forms, the value returned is the value of the last expression -evaluated inside the mini-program, or a return statement may be used, just +evaluated inside the mini-program; a return statement may be also used, just as with subroutines. The expression providing the return value is evaluated -in void, scalar or array context, depending on the context of the eval itself. +in void, scalar, or list context, depending on the context of the eval itself. See L for more on how the evaluation context can be determined. 
If there is a syntax error or runtime error, or a die() statement is @@ -1047,7 +1082,7 @@ Examples: eval '$answer = $a / $b'; warn $@ if $@; # a compile-time error - eval { $answer = }; + eval { $answer = }; # WRONG # a run-time error eval '$answer ='; # sets $@ @@ -1079,7 +1114,7 @@ being looked at when: eval '$x'; # CASE 3 eval { $x }; # CASE 4 - eval "\$$x++" # CASE 5 + eval "\$$x++"; # CASE 5 $$x++; # CASE 6 Cases 1 and 2 above behave identically: they run the code contained in @@ -1103,24 +1138,24 @@ returns FALSE only if the command does not exist I it is executed directly instead of via your system's command shell (see below). Since it's a common mistake to use system() instead of exec(), Perl -warns you if there is a following statement which isn't die(), warn() +warns you if there is a following statement which isn't die(), warn(), or exit() (if C<-w> is set - but you always do that). If you I want to follow an exec() with some other statement, you can use one of these styles to avoid the warning: - exec ('foo') or print STDERR "couldn't exec foo"; - { exec ('foo') }; print STDERR "couldn't exec foo"; + exec ('foo') or print STDERR "couldn't exec foo: $!"; + { exec ('foo') }; print STDERR "couldn't exec foo: $!"; -If there is more than one argument in LIST, or if LIST is an array with -more than one value, calls execvp(3) with the arguments in LIST. If -there is only one scalar argument, the argument is checked for shell -metacharacters, and if there are any, the entire argument is passed to -the system's command shell for parsing (this is C on Unix -platforms, but varies on other platforms). If there are no shell -metacharacters in the argument, it is split into words and passed -directly to execvp(), which is more efficient. Note: exec() and -system() do not flush your output buffer, so you may need to set C<$|> -to avoid lost output. 
Examples: +If there is more than one argument in LIST, or if LIST is an array +with more than one value, calls execvp(3) with the arguments in LIST. +If there is only one scalar argument or an array with one element in it, +the argument is checked for shell metacharacters, and if there are any, +the entire argument is passed to the system's command shell for parsing +(this is C</bin/sh -c> on Unix platforms, but varies on other platforms). +If there are no shell metacharacters in the argument, it is split into +words and passed directly to execvp(), which is more efficient. Note: +exec() and system() do not flush your output buffer, so you may need to +set C<$|> to avoid lost output. Examples: exec '/bin/echo', 'Your arguments are: ', @ARGV; exec "sort $outfile | uniq"; @@ -1143,6 +1178,21 @@ When the arguments get executed via the system shell, results will be subject to its quirks and capabilities. See L for details. +Using an indirect object with C<exec> or C<system> is also more secure. +This usage forces interpretation of the arguments as a multivalued list, +even if the list had just one argument. That way you're safe from the +shell expanding wildcards or splitting up words with whitespace in them. + + @args = ( "echo surprise" ); + + system @args; # subject to shell escapes if @args == 1 + system { $args[0] } @args; # safe even with one-arg list + +The first version, the one without the indirect object, ran the I<echo> +program, passing it C<"surprise"> as an argument. The second version +didn't--it tried to run a program literally called I<"echo surprise">, +didn't find it, and set C<$?> to a non-zero value indicating failure. + =item exists EXPR Returns TRUE if the specified hash key exists in its hash array, even @@ -1158,7 +1208,13 @@ it exists, but the reverse doesn't necessarily hold true. Note that the EXPR can be arbitrarily complicated as long as the final operation is a hash key lookup: - if (exists $ref->[$x][$y]{$key}) { ... } + if (exists $ref->{"A"}{"B"}{$key}) { ...
} + +Although the last element will not spring into existence just because its +existence was tested, intervening ones will. Thus C<$ref-E<gt>{"A"}> +and C<$ref-E<gt>{"A"}{"B"}> will spring into existence due to the existence +test for a $key element. This autovivification may be fixed in a later +release. =item exit EXPR @@ -1179,6 +1235,8 @@ You shouldn't use exit() to abort a subroutine if there's any chance that someone might want to trap whatever error happened. Use die() instead, which can be trapped by an eval(). +All C<END> blocks are run at exit time. See L<perlsub> for details. + =item exp EXPR =item exp @@ -1193,18 +1251,36 @@ Implements the fcntl(2) function. You'll probably have to say use Fcntl; first to get the correct function definitions. Argument processing and -value return works just like ioctl() below. Note that fcntl() will produce -a fatal error if used on a machine that doesn't implement fcntl(2). +value return works just like ioctl() below. For example: use Fcntl; - fcntl($filehandle, F_GETLK, $packed_return_buffer); + fcntl($filehandle, F_GETFL, $packed_return_buffer) + or die "can't fcntl F_GETFL: $!"; + +You don't have to check for C<defined> on the return from +fcntl. Like ioctl, it maps a 0 return from the system +call into "0 but true" in Perl. This string is true in +boolean context and 0 in numeric context. It is also +exempt from the normal B<-w> warnings on improper numeric +conversions. + +Note that fcntl() will produce a fatal error if used on a machine that +doesn't implement fcntl(2). =item fileno FILEHANDLE Returns the file descriptor for a filehandle. This is useful for -constructing bitmaps for select(). If FILEHANDLE is an expression, the -value is taken as the name of the filehandle. +constructing bitmaps for select() and low-level POSIX tty-handling +operations. If FILEHANDLE is an expression, the value is taken as +an indirect filehandle, generally its name.
+ +You can use this to find out whether two handles refer to the +same underlying descriptor: + + if (fileno(THIS) == fileno(THAT)) { + print "THIS and THAT are dups\n"; + } =item flock FILEHANDLE,OPERATION @@ -1215,10 +1291,11 @@ is Perl's portable file locking interface, although it locks only entire files, not records. On many platforms (including most versions or clones of Unix), locks -established by flock() are B. This means that files -locked with flock() may be modified by programs which do not also use -flock(). Windows NT and OS/2, however, are among the platforms which -supply mandatory locking. See your local documentation for details. +established by flock() are B. Such discretionary locks +are more flexible, but offer fewer guarantees. This means that files +locked with flock() may be modified by programs that do not also use +flock(). Windows NT and OS/2 are among the platforms which +enforce mandatory locking. See your local documentation for details. OPERATION is one of LOCK_SH, LOCK_EX, or LOCK_UN, possibly combined with LOCK_NB. These constants are traditionally valued 1, 2, 8 and 4, but @@ -1271,8 +1348,9 @@ See also L for other flock() examples. =item fork -Does a fork(2) system call. Returns the child pid to the parent process -and 0 to the child process, or C if the fork is unsuccessful. +Does a fork(2) system call. Returns the child pid to the parent process, +0 to the child process, or C if the fork is unsuccessful. + Note: unflushed buffers remain unflushed in both processes, which means you may need to set C<$|> ($AUTOFLUSH in English) or call the autoflush() method of IO::Handle to avoid duplicate output. @@ -1302,7 +1380,7 @@ moribund children. 
Note that if your forked child inherits system file descriptors like STDIN and STDOUT that are actually connected by a pipe or socket, even -if you exit, the remote server (such as, say, httpd or rsh) won't think +if you exit, then the remote server (such as, say, httpd or rsh) won't think you're done. You should reopen those to /dev/null if it's any issue. =item format @@ -1322,10 +1400,9 @@ example: See L for many details and examples. - =item formline PICTURE,LIST -This is an internal function used by Cs, though you may call it +This is an internal function used by Cs, though you may call it, too. It formats (see L) a list of values according to the contents of PICTURE, placing the output into the format output accumulator, C<$^A> (or $ACCUMULATOR in English). @@ -1372,14 +1449,15 @@ Determination of whether $BSD_STYLE should be set is left as an exercise to the reader. The POSIX::getattr() function can do this more portably on systems -alleging POSIX compliance. +purporting POSIX compliance. See also the C module from your nearest CPAN site; details on CPAN can be found on L. =item getlogin -Returns the current login from F, if any. If null, use -getpwuid(). +Implements the C library function of the same name, which on most +systems returns the current login from F, if any. If null, +use getpwuid(). $login = getlogin || getpwuid($<) || "Kilroy"; @@ -1476,7 +1554,7 @@ machine that doesn't implement getpriority(2). =item endservent These routines perform the same functions as their counterparts in the -system library. Within a list context, the return values from the +system library. In list context, the return values from the various get routines are as follows: ($name,$passwd,$uid,$gid, @@ -1489,17 +1567,17 @@ various get routines are as follows: (If the entry doesn't exist you get a null list.) 
-Within a scalar context, you get the name, unless the function was a +In scalar context, you get the name, unless the function was a lookup by name, in which case you get the other thing, whatever it is. (If the entry doesn't exist you get the undefined value.) For example: - $uid = getpwnam - $name = getpwuid - $name = getpwent - $gid = getgrnam - $name = getgrgid - $name = getgrent - etc. + $uid = getpwnam($name); + $name = getpwuid($num); + $name = getpwent(); + $gid = getgrnam($name); + $name = getgrgid($num); + $name = getgrent(); + #etc. In I<getpw*()> the fields $quota, $comment, and $expire are special cases in the sense that in many systems they are unsupported. If the @@ -1529,6 +1607,20 @@ by saying something like: ($a,$b,$c,$d) = unpack('C4',$addr[0]); +If you get tired of remembering which element of the return list contains +which return value, by-name interfaces are also provided in modules: +File::stat, Net::hostent, Net::netent, Net::protoent, Net::servent, +Time::gmtime, Time::localtime, and User::grent. These override the +normal built-ins, replacing them with versions that return objects with +the appropriate names for each field. For example: + + use File::stat; + use User::pwent; + $is_his = (stat($filename)->uid == getpwnam($whoever)->uid); + +Even though it looks like they're the same method calls (uid), +they aren't, because a File::stat object is different from a User::pwent object. + =item getsockname SOCKET Returns the packed sockaddr address of this end of the SOCKET connection. @@ -1539,13 +1631,13 @@ Returns the packed sockaddr address of this end of the SOCKET connection. =item getsockopt SOCKET,LEVEL,OPTNAME -Returns the socket option requested, or undefined if there is an error. +Returns the socket option requested, or undef if there is an error. =item glob EXPR =item glob -Returns the value of EXPR with filename expansions such as a shell would +Returns the value of EXPR with filename expansions such as the standard Unix shell /bin/sh would
This is the internal function implementing the C*.cE> operator, but you can use it directly. If EXPR is omitted, $_ is used. The C*.cE> operator is discussed in more detail in @@ -1568,7 +1660,7 @@ years since 1900, I simply the last two digits of the year. If EXPR is omitted, does C. -In a scalar context, returns the ctime(3) value: +In scalar context, returns the ctime(3) value: $now_string = gmtime; # e.g., "Thu Oct 13 04:54:34 1994" @@ -1648,7 +1740,7 @@ see L.) If EXPR is omitted, uses $_. =item import -There is no builtin import() function. It is merely an ordinary +There is no builtin import() function. It is just an ordinary method (subroutine) defined (or inherited) by modules that wish to export names to another module. The use() function calls the import() method for the package used. See also L, L, and L. @@ -1668,6 +1760,10 @@ one less than the base, ordinarily -1. =item int Returns the integer portion of EXPR. If EXPR is omitted, uses $_. +You should not use this for rounding, because it truncates +towards 0, and because machine representations of floating point +numbers can sometimes produce counterintuitive results. Usually sprintf() or printf(), +or the POSIX::floor or POSIX::ceil functions, would serve you better. =item ioctl FILEHANDLE,FUNCTION,SCALAR @@ -1678,7 +1774,7 @@ Implements the ioctl(2) function. You'll probably have to say first to get the correct function definitions. If F doesn't exist or doesn't have the correct definitions you'll have to roll your own, based on your C header files such as Fsys/ioctl.hE>. -(There is a Perl script called B that comes with the Perl kit which +(There is a Perl script called B that comes with the Perl kit that may help you in this, but it's nontrivial.) SCALAR will be read and/or written depending on the FUNCTION--a pointer to the string value of SCALAR will be passed as the third argument of the actual ioctl call. 
(If SCALAR @@ -1714,6 +1810,9 @@ system: ($retval = ioctl(...)) || ($retval = -1); printf "System returned %d\n", $retval; +The special string "0 but true" is exempt from B<-w> complaints +about improper numeric conversions. + =item join EXPR,LIST Joins the separate strings of LIST into a single string with @@ -1749,7 +1848,7 @@ or how about sorted by key: To sort an array by value, you'll need to use a C<sort> function. Here's a descending numeric sort of a hash by its values: - foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash)) { + foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) { printf "%4d %s\n", $hash{$key}, $key; } @@ -1760,7 +1859,8 @@ an array by assigning a larger number to $#array.) If you say keys %hash = 200; -then C<%hash> will have at least 200 buckets allocated for it. These +then C<%hash> will have at least 200 buckets allocated for it--256 of them, in fact, since +it rounds up to the next power of two. These buckets will be retained even if you do C<%hash = ()>, use C<undef %hash> if you want to free the storage while C<%hash> is still in scope. You can't shrink the number of buckets allocated for the hash using @@ -1793,7 +1893,7 @@ C block, if any, is not executed: LINE: while (<STDIN>) { last LINE if /^$/; # exit when done with header - ... + #... } See also L for an illustration of how C, C, and @@ -1823,13 +1923,13 @@ If EXPR is omitted, uses $_. =item length -Returns the length in characters of the value of EXPR. If EXPR is +Returns the length in bytes of the value of EXPR. If EXPR is omitted, returns length of $_. =item link OLDFILE,NEWFILE -Creates a new filename linked to the old filename. Returns 1 for -success, 0 otherwise. +Creates a new filename linked to the old filename. Returns TRUE for +success, FALSE otherwise. =item listen SOCKET,QUEUESIZE @@ -1838,10 +1938,10 @@ it succeeded, FALSE otherwise. See example in L, or C. If more than one value is listed, the -list must be placed in parentheses.
See L for details, including issues with tied arrays and hashes. +A local modifies the listed variables to be local to the enclosing +block, file, or eval. If more than one value is listed, the list must +be placed in parentheses. See L +for details, including issues with tied arrays and hashes. You really probably want to be using my() instead, because local() isn't what most people think of as "local". See L). -In a scalar context, returns the ctime(3) value: +In scalar context, returns the ctime(3) value: $now_string = localtime; # e.g., "Thu Oct 13 04:54:34 1994" @@ -1873,9 +1973,9 @@ instead a Perl builtin. Also see the Time::Local module, and the strftime(3) and mktime(3) function available via the POSIX module. To get somewhat similar but locale dependent date strings, set up your locale environment variables appropriately (please see L) -and try for example +and try for example: - use POSIX qw(strftime) + use POSIX qw(strftime); $now_string = strftime "%a %b %e %H:%M:%S %Y", localtime; Note that the C<%a> and C<%b>, the short forms of the day of the week @@ -1885,7 +1985,7 @@ and the month of the year, may not necessarily be three characters wide. =item log -Returns logarithm (base I) of EXPR. If EXPR is omitted, returns log +Returns the natural logarithm (base I) of EXPR. If EXPR is omitted, returns log of $_. =item lstat FILEHANDLE @@ -1894,9 +1994,10 @@ of $_. =item lstat -Does the same thing as the stat() function, but stats a symbolic link -instead of the file the symbolic link points to. If symbolic links are -unimplemented on your system, a normal stat() is done. +Does the same thing as the stat() function (including setting the +special C<_> filehandle) but stats a symbolic link instead of the file +the symbolic link points to. If symbolic links are unimplemented on +your system, a normal stat() is done. If EXPR is omitted, stats $_. @@ -1935,13 +2036,13 @@ original list for which the BLOCK or EXPR evaluates to true. 
=item mkdir FILENAME,MODE Creates the directory specified by FILENAME, with permissions specified -by MODE (as modified by umask). If it succeeds it returns 1, otherwise -it returns 0 and sets C<$!> (errno). +by MODE (as modified by umask). If it succeeds it returns TRUE, otherwise +it returns FALSE and sets C<$!> (errno). =item msgctl ID,CMD,ARG Calls the System V IPC function msgctl(2). If CMD is &IPC_STAT, then ARG -must be a variable which will hold the returned msqid_ds structure. +must be a variable that will hold the returned msqid_ds structure. Returns like ioctl: the undefined value for error, "0 but true" for zero, or the actual return value otherwise. @@ -1969,7 +2070,7 @@ an error. =item my EXPR A "my" declares the listed variables to be local (lexically) to the -enclosing block, subroutine, C, or C'd file. If +enclosing block, file, or C. If more than one value is listed, the list must be placed in parentheses. See L for details. @@ -1982,7 +2083,7 @@ the next iteration of the loop: LINE: while () { next LINE if /^#/; # discard comments - ... + #... } Note that if there were a C block on the above, it would get @@ -2030,17 +2131,20 @@ output. If the filename begins with '>>', the file is opened for appending. You can put a '+' in front of the '>' or '<' to indicate that you want both read and write access to the file; thus '+<' is almost always preferred for read/write updates--the '+>' mode would clobber the -file first. The prefix and the filename may be separated with spaces. +file first. You can't usually use either read-write mode for updating +textfiles, since they have variable length records. See the B<-i> +switch in L for a better approach. + +The prefix and the filename may be separated with spaces. These various prefixes correspond to the fopen(3) modes of 'r', 'r+', 'w', 'w+', 'a', and 'a+'. 
-If the filename begins with "|", the filename is interpreted as a command -to which output is to be piped, and if the filename ends with a "|", the -filename is interpreted See L for more -examples of this. as command which pipes input to us. (You may not have -a raw open() to a command that pipes both in I out, but see -L, L, and L -for alternatives.) +If the filename begins with "|", the filename is interpreted as a +command to which output is to be piped, and if the filename ends with a +"|", the filename is interpreted as a command that pipes input to us. See L +for more examples of this. (You are not allowed to open() to a command +that pipes both in and out, but see L, L, +and L for alternatives.) Opening '-' opens STDIN and opening '>-' opens STDOUT. Open returns nonzero upon success, the undefined value otherwise. If the open @@ -2051,15 +2155,15 @@ If you're unfortunate enough to be running Perl on a system that distinguishes between text files and binary files (modern operating systems don't care), then you should check out L for tips for dealing with this. The key distinction between systems that need binmode -and those that don't is their text file formats. Systems like Unix and -Plan9 that delimit lines with a single character, and that encode that +and those that don't is their text file formats. Systems like Unix, MacOS, and +Plan9, which delimit lines with a single character, and which encode that character in C as '\n', do not need C. The rest need it. When opening a file, it's usually a bad idea to continue normal execution if the request failed, so C is frequently used in connection with C. Even if C won't do what you want (say, in a CGI script, where you want to make a nicely formatted error message (but there are -modules which can help with that problem)) you should always check +modules that can help with that problem)) you should always check the return value from opening a file.
The infrequent exception is when working with an unopened filehandle is actually what you want to do. @@ -2088,25 +2192,26 @@ Examples: } sub process { - local($filename, $input) = @_; + my($filename, $input) = @_; $input++; # this is a string increment unless (open($input, $filename)) { print STDERR "Can't open $filename: $!\n"; return; } + local $_; while (<$input>) { # note use of indirection if (/^#include "(.*)"/) { process($1, $input); next; } - ... # whatever + #... # whatever } } You may also, in the Bourne shell tradition, specify an EXPR beginning with "E&", in which case the rest of the string is interpreted as the -name of a filehandle (or file descriptor, if numeric) which is to be +name of a filehandle (or file descriptor, if numeric) to be duped and opened. You may use & after E, EE, E, +E, +EE, and +E. The mode you specify should match the mode of the original filehandle. @@ -2116,8 +2221,8 @@ Here is a script that saves, redirects, and restores STDOUT and STDERR: #!/usr/bin/perl - open(SAVEOUT, ">&STDOUT"); - open(SAVEERR, ">&STDERR"); + open(OLDOUT, ">&STDOUT"); + open(OLDERR, ">&STDERR"); open(STDOUT, ">foo.out") || die "Can't redirect stdout"; open(STDERR, ">&STDOUT") || die "Can't dup stdout"; @@ -2131,8 +2236,8 @@ STDERR: close(STDOUT); close(STDERR); - open(STDOUT, ">&SAVEOUT"); - open(STDERR, ">&SAVEERR"); + open(STDOUT, ">&OLDOUT"); + open(STDERR, ">&OLDERR"); print STDOUT "stdout 2\n"; print STDERR "stderr 2\n"; @@ -2165,21 +2270,47 @@ The following pairs are more or less equivalent: See L for more examples of this. -NOTE: On any operation which may do a fork, unflushed buffers remain +NOTE: On any operation that may do a fork, any unflushed buffers remain unflushed in both processes, which means you may need to set C<$|> to avoid duplicate output. Closing any piped filehandle causes the parent process to wait for the child to finish, and returns the status value in C<$?>. 
+The filename passed to open will have leading and trailing +whitespace deleted, and the normal redirection characters +honored. This property, known as "magic open", +can often be used to good effect. A user could specify a filename of +"rsh cat file |", or you could change certain filenames as needed: + + $filename =~ s/(.*\.gz)\s*$/gzip -dc < $1|/; + open(FH, $filename) or die "Can't open $filename: $!"; + +However, to open a file with arbitrary weird characters in it, it's +necessary to protect any leading and trailing whitespace: + + $file =~ s#^(\s)#./$1#; + open(FOO, "< $file\0"); + +If you want a "real" C open() (see L on your system), then you +should use the sysopen() function, which involves no such magic. This is +another way to protect your filenames from interpretation. For example: + + use IO::Handle; + sysopen(HANDLE, $path, O_RDWR|O_CREAT|O_EXCL) + or die "sysopen $path: $!"; + $oldfh = select(HANDLE); $| = 1; select($oldfh); + print HANDLE "stuff $$\n"; + seek(HANDLE, 0, 0); + print "File contains: ", <HANDLE>; + Using the constructor from the IO::Handle package (or one of its -subclasses, such as IO::File or IO::Socket), -you can generate anonymous filehandles which have the scope of whatever -variables hold references to them, and automatically close whenever -and however you leave that scope: +subclasses, such as IO::File or IO::Socket), you can generate anonymous +filehandles that have the scope of whatever variables hold references to +them, and automatically close whenever and however you leave that scope: use IO::File; - ... + #... sub read_myfile_munged { my $ALL = shift; my $handle = new IO::File; @@ -2191,26 +2322,6 @@ and however you leave that scope: $first; # Or here. } -The filename that is passed to open will have leading and trailing -whitespace deleted.
To open a file with arbitrary weird -characters in it, it's necessary to protect any leading and trailing -whitespace thusly: - - $file =~ s#^(\s)#./$1#; - open(FOO, "< $file\0"); - -If you want a "real" C open() (see L on your system), then -you should use the sysopen() function. This is another way to -protect your filenames from interpretation. For example: - - use IO::Handle; - sysopen(HANDLE, $path, O_RDWR|O_CREAT|O_EXCL, 0700) - or die "sysopen $path: $!"; - HANDLE->autoflush(1); - HANDLE->print("stuff $$\n"); - seek(HANDLE, 0, 0); - print "File contains: ", ; - See L for some details about mixing reading and writing. =item opendir DIRHANDLE,EXPR @@ -2283,7 +2394,7 @@ follows: X Back up a byte. @ Null fill to absolute position. -Each letter may optionally be followed by a number which gives a repeat +Each letter may optionally be followed by a number giving a repeat count. With all types except "a", "A", "b", "B", "h", "H", and "P" the pack function will gobble up that many values from the LIST. A * for the repeat count means to use however many items are left. The "a" and "A" @@ -2340,6 +2451,8 @@ Examples: The same template may generally also be used in the unpack function. +=item package + =item package NAMESPACE Declares the compilation unit as being in the given namespace. The scope @@ -2350,12 +2463,16 @@ statement affects only dynamic variables--including those you've used local() on--but I lexical variables created with my(). Typically it would be the first declaration in a file to be included by the C or C operator. You can switch into a package in more than one place; -it influences merely which symbol table is used by the compiler for the +it merely influences which symbol table is used by the compiler for the rest of that block. You can refer to variables and filehandles in other packages by prefixing the identifier with the package name and a double colon: C<$Package::Variable>. If the package name is null, the C
package as assumed. That is, C<$::sail> is equivalent to C<$main::sail>. +If NAMESPACE is omitted, then there is no current package, and all +identifiers must be fully qualified or lexicals. This is stricter +than C, since it also extends to function names. + See L for more information about packages, modules, and classes. See L for other scoping issues. @@ -2408,11 +2525,11 @@ token is a term, it may be misinterpreted as an operator unless you interpose a + or put parentheses around the arguments.) If FILEHANDLE is omitted, prints by default to standard output (or to the last selected output channel--see L). If LIST is also omitted, prints $_ to -STDOUT. To set the default output channel to something other than +the currently selected output channel. To set the default output channel to something other than STDOUT use the select operation. Note that, because print takes a -LIST, anything in the LIST is evaluated in a list context, and any +LIST, anything in the LIST is evaluated in list context, and any subroutine that you call will have one or more of its expressions -evaluated in a list context. Also be careful not to follow the print +evaluated in list context. Also be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print--interpose a + or put parentheses around all the arguments. @@ -2434,7 +2551,7 @@ in effect, the character used for the decimal point in formatted real numbers is affected by the LC_NUMERIC locale. See L. Don't fall into the trap of using a printf() when a simple -print() would do. The print() is more efficient, and less +print() would do. The print() is more efficient and less error prone. =item prototype FUNCTION @@ -2507,15 +2624,15 @@ specified FILEHANDLE. Returns the number of bytes actually read, or undef if there was an error. SCALAR will be grown or shrunk to the length actually read. 
An OFFSET may be specified to place the read data at some other place than the beginning of the string. This call -is actually implemented in terms of stdio's fread call. To get a true -read system call, see sysread(). +is actually implemented in terms of stdio's fread(3) call. To get a true +read(2) system call, see sysread(). =item readdir DIRHANDLE Returns the next directory entry for a directory opened by opendir(). -If used in a list context, returns all the rest of the entries in the +If used in list context, returns all the rest of the entries in the directory. If there are no more entries, returns an undefined value in -a scalar context or a null list in a list context. +scalar context or a null list in list context. If you're planning to filetest the return values out of a readdir(), you'd better prepend the directory in question. Otherwise, because we didn't @@ -2527,7 +2644,7 @@ chdir() there, it would have been testing the wrong file. =item readline EXPR -Reads from the file handle EXPR. In scalar context, a single line +Reads from the filehandle whose typeglob is contained in EXPR. In scalar context, a single line is read and returned. In list context, reads until end-of-file is reached and returns a list of lines (however you've defined lines with $/ or $INPUT_RECORD_SEPARATOR). @@ -2535,6 +2652,9 @@ This is the internal function implementing the CEXPRE> operator, but you can use it directly. The CEXPRE> operator is discussed in more detail in L. + $line = ; + $line = readline(*STDIN); # same thing + =item readlink EXPR =item readlink @@ -2546,7 +2666,7 @@ omitted, uses $_. =item readpipe EXPR -EXPR is interpolated and then executed as a system command. +EXPR is executed as a system command. The collected standard output of the command is returned. In scalar context, it comes back as a single (potentially multi-line) string. 
In list context, returns a list of lines @@ -2584,7 +2704,7 @@ themselves about what was just input: $front = $_; while () { if (/}/) { # end of comment? - s|^|$front{|; + s|^|$front\{|; redo LINE; } } @@ -2617,7 +2737,7 @@ name is returned instead. You can think of ref() as a typeof() operator. if (ref($r) eq "HASH") { print "r is a reference to a hash.\n"; } - if (!ref ($r) { + if (!ref($r)) { print "r is not a reference at all.\n"; } @@ -2642,9 +2762,9 @@ essentially just a variety of eval(). Has semantics similar to the following subroutine: sub require { - local($filename) = @_; + my($filename) = @_; return 1 if $INC{$filename}; - local($realfilename,$result); + my($realfilename,$result); ITER: { foreach $prefix (@INC) { $realfilename = "$prefix/$filename"; @@ -2658,7 +2778,7 @@ subroutine: die $@ if $@; die "$filename did not return true value" unless $result; $INC{$filename} = $realfilename; - $result; + return $result; } Note that the file will not be included twice under the same specified @@ -2675,20 +2795,20 @@ modules does not risk altering your namespace. In other words, if you try this: - require Foo::Bar ; # a splendid bareword + require Foo::Bar; # a splendid bareword The require function will actually look for the "Foo/Bar.pm" file in the directories specified in the @INC array. -But if you try this : +But if you try this: $class = 'Foo::Bar'; - require $class ; # $class is not a bareword -or - require "Foo::Bar" ; # not a bareword because of the "" + require $class; # $class is not a bareword + #or + require "Foo::Bar"; # not a bareword because of the "" The require function will look for the "Foo::Bar" file in the @INC array and -will complain about not finding "Foo::Bar" there. In this case you can do : +will complain about not finding "Foo::Bar" there. In this case you can do: eval "require $class"; @@ -2720,20 +2840,20 @@ so you'll probably want to use them instead. See L. 
=item return -Returns from a subroutine, eval(), or do FILE with the value of the -given EXPR. Evaluation of EXPR may be in a list, scalar, or void +Returns from a subroutine, eval(), or C with the value +given in EXPR. Evaluation of EXPR may be in list, scalar, or void context, depending on how the return value will be used, and the context may vary from one execution to the next (see wantarray()). If no EXPR -is given, returns an empty list in a list context, an undefined value in -a scalar context, or nothing in a void context. +is given, returns an empty list in list context, an undefined value in +scalar context, or nothing in a void context. (Note that in the absence of a return, a subroutine, eval, or do FILE will automatically return the value of the last expression evaluated.) =item reverse LIST -In a list context, returns a list value consisting of the elements -of LIST in the opposite order. In a scalar context, concatenates the +In list context, returns a list value consisting of the elements +of LIST in the opposite order. In scalar context, concatenates the elements of LIST, and returns a string value consisting of those bytes, but in the opposite order. @@ -2767,8 +2887,8 @@ last occurrence at or before that position. =item rmdir -Deletes the directory specified by FILENAME if it is empty. If it -succeeds it returns 1, otherwise it returns 0 and sets C<$!> (errno). If +Deletes the directory specified by FILENAME if that directory is empty. If it +succeeds it returns TRUE, otherwise it returns FALSE and sets C<$!> (errno). If FILENAME is omitted, uses $_. =item s/// @@ -2777,13 +2897,13 @@ The substitution operator. See L. =item scalar EXPR -Forces EXPR to be interpreted in a scalar context and returns the value +Forces EXPR to be interpreted in scalar context and returns the value of EXPR. 
@counts = ( scalar @a, scalar @b, scalar @c ); There is no equivalent operator to force an expression to -be interpolated in a list context because it's in practice never +be interpolated in list context because it's in practice never needed. If you really wanted to do so, however, you could use the construction C<@{[ (some expression) ]}>, but usually a simple C<(some expression)> suffices. @@ -2875,8 +2995,8 @@ If you want to select on many filehandles you might wish to write a subroutine: sub fhbits { - local(@fhlist) = split(' ',$_[0]); - local($bits); + my(@fhlist) = split(' ',$_[0]); + my($bits); for (@fhlist) { vec($bits,fileno($_),1) = 1; } @@ -2894,7 +3014,7 @@ or to block until something becomes ready just do this $nfound = select($rout=$rin, $wout=$win, $eout=$ein, undef); Most systems do not bother to return anything useful in $timeleft, so -calling select() in a scalar context just returns $nfound. +calling select() in scalar context just returns $nfound. Any of the bit masks can also be undef. The timeout, if specified, is in seconds, which may be fractional. Note: not all implementations are @@ -2905,13 +3025,14 @@ You can effect a sleep of 250 milliseconds this way: select(undef, undef, undef, 0.25); -B: Do not attempt to mix buffered I/O (like read() or EFHE) -with select(). You have to use sysread() instead. +B: One should not attempt to mix buffered I/O (like read() +or EFHE) with select(), except as permitted by POSIX, and even +then only on POSIX systems. You have to use sysread() instead. =item semctl ID,SEMNUM,CMD,ARG Calls the System V IPC function semctl. If CMD is &IPC_STAT or -&GETALL, then ARG must be a variable which will hold the returned +&GETALL, then ARG must be a variable that will hold the returned semid_ds structure or semaphore value array. Returns like ioctl: the undefined value for error, "0 but true" for zero, or the actual return value otherwise. @@ -2984,7 +3105,7 @@ right end. 
=item shmctl ID,CMD,ARG Calls the System V IPC function shmctl. If CMD is &IPC_STAT, then ARG -must be a variable which will hold the returned shmid_ds structure. +must be a variable that will hold the returned shmid_ds structure. Returns like ioctl: the undefined value for error, "0 but true" for zero, or the actual return value otherwise. @@ -2999,7 +3120,7 @@ segment id, or the undefined value if there is an error. Reads or writes the System V shared memory segment ID starting at position POS for size SIZE by attaching to it, copying in/out, and -detaching from it. When reading, VAR must be a variable which will +detaching from it. When reading, VAR must be a variable that will hold the data read. When writing, if STRING is too long, only SIZE bytes are used; if STRING is too short, nulls are written to fill out SIZE bytes. Return TRUE if successful, or FALSE if there is an error. @@ -3009,6 +3130,16 @@ SIZE bytes. Return TRUE if successful, or FALSE if there is an error. Shuts down a socket connection in the manner indicated by HOW, which has the same interpretation as in the system call of the same name. + shutdown(SOCKET, 0); # I/we have stopped reading data + shutdown(SOCKET, 1); # I/we have stopped writing data + shutdown(SOCKET, 2); # I/we have stopped using this socket + +This is useful with sockets when you want to tell the other +side you're done writing but not done reading, or vice versa. +It's also a more insistent form of close because it also +disables the filedescriptor in any forked copies in other +processes. + =item sin EXPR =item sin @@ -3033,7 +3164,9 @@ using alarm(). On some older systems, it may sleep up to a full second less than what you requested, depending on how it counts seconds. Most modern systems -always sleep the full amount. +always sleep the full amount. They may appear to sleep longer than that, +however, because your process might not be scheduled right away in a +busy multitasking system. 
For delays of finer granularity than one second, you may use Perl's syscall() interface to access setitimer(2) if your system supports it, @@ -3055,6 +3188,16 @@ specified type. DOMAIN, TYPE, and PROTOCOL are specified the same as for the system call of the same name. If unimplemented, yields a fatal error. Returns TRUE if successful. +Some systems defined pipe() in terms of socketpair, in which a call +to C is essentially: + + use Socket; + socketpair(Rdr, Wtr, AF_UNIX, SOCK_STREAM, PF_UNSPEC); + shutdown(Rdr, 1); # no more writing for reader + shutdown(Wtr, 0); # no more reading for writer + +See L for an example of socketpair use. + =item sort SUBNAME LIST =item sort BLOCK LIST @@ -3186,8 +3329,8 @@ sanity checks in the interest of speed. =item splice ARRAY,OFFSET Removes the elements designated by OFFSET and LENGTH from an array, and -replaces them with the elements of LIST, if any. In a list context, -returns the elements removed from the array. In a scalar context, +replaces them with the elements of LIST, if any. In list context, +returns the elements removed from the array. In scalar context, returns the last element removed, or C if no elements are removed. The array grows or shrinks as necessary. If LENGTH is omitted, removes everything from OFFSET onward. The following @@ -3197,13 +3340,13 @@ equivalences hold (assuming C<$[ == 0>): pop(@a) splice(@a,-1) shift(@a) splice(@a,0,1) unshift(@a,$x,$y) splice(@a,0,0,$x,$y) - $a[$x] = $y splice(@a,$x,1,$y); + $a[$x] = $y splice(@a,$x,1,$y) Example, assuming array lengths are passed before arrays: sub aeq { # compare two list values - local(@a) = splice(@_,0,shift); - local(@b) = splice(@_,0,shift); + my(@a) = splice(@_,0,shift); + my(@b) = splice(@_,0,shift); return 0 unless @a == @b; # same len? while (@a) { return 0 if pop(@a) ne pop(@b); @@ -3220,19 +3363,21 @@ Example, assuming array lengths are passed before arrays: =item split -Splits a string into an array of strings, and returns it. 
+Splits a string into an array of strings, and returns it. By default, +empty leading fields are preserved, and empty trailing ones are deleted. -If not in a list context, returns the number of fields found and splits into -the @_ array. (In a list context, you can force the split into @_ by +If not in list context, returns the number of fields found and splits into +the @_ array. (In list context, you can force the split into @_ by using C as the pattern delimiters, but it still returns the list -value.) The use of implicit split to @_ is deprecated, however. +value.) The use of implicit split to @_ is deprecated, however, because +it clobbers your subroutine arguments. If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.) -If LIMIT is specified and is positive, splits into no more than that +If LIMIT is specified and positive, splits into no more than that many fields (though it may split into fewer). If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of pop() would do well to remember). If LIMIT is negative, it is @@ -3286,11 +3431,10 @@ really does a C internally. Example: - open(passwd, '/etc/passwd'); - while () { - ($login, $passwd, $uid, $gid, $gcos, - $home, $shell) = split(/:/); - ... + open(PASSWD, '/etc/passwd'); + while () { + ($login, $passwd, $uid, $gid, $gcos,$home, $shell) = split(/:/); + #... } (Note that $shell above will still have a newline on it. See L, @@ -3302,7 +3446,7 @@ Returns a string formatted by the usual printf conventions of the C library function sprintf(). See L or L on your system for an explanation of the general principles. 
-Perl does all of its own sprintf() formatting -- it emulates the C +Perl does its own sprintf() formatting -- it emulates the C function sprintf(), but it doesn't use it (except for floating-point numbers, and even then only the standard modifiers are allowed). As a result, any non-standard extensions in your local sprintf() are not @@ -3485,7 +3629,7 @@ the rarest character is selected, based on some static frequency tables constructed from some C programs and English text. Only those places that contain this "rarest" character are examined.) -For example, here is a loop which inserts index producing entries +For example, here is a loop that inserts index producing entries before any line containing a certain pattern: while (<>) { @@ -3493,11 +3637,11 @@ before any line containing a certain pattern: print ".IX foo\n" if /\bfoo\b/; print ".IX bar\n" if /\bbar\b/; print ".IX blurfl\n" if /\bblurfl\b/; - ... + # ... print; } -In searching for /\bfoo\b/, only those locations in $_ that contain "f" +In searching for C, only those locations in $_ that contain "f" will be looked at, because "f" is rarer than "o". In general, this is a big win except in pathological cases. The only question is whether it saves you more time than it took to build the linked list in the @@ -3549,7 +3693,7 @@ that far from the end of the string. If LEN is omitted, returns everything to the end of the string. If LEN is negative, leaves that many characters off the end of the string. -If you specify a substring which is partly outside the string, the part +If you specify a substring that is partly outside the string, the part within the string is returned. If the substring is totally outside the string a warning is produced. @@ -3573,7 +3717,7 @@ Returns 1 for success, 0 otherwise. On systems that don't support symbolic links, produces a fatal error at run time. 
To check for that, use eval: - $symlink_exists = (eval {symlink("","")};, $@ eq ''); + $symlink_exists = eval { symlink("",""); 1 }; =item syscall LIST @@ -3589,7 +3733,7 @@ because Perl has to assume that any string pointer might be written through. If your integer arguments are not literals and have never been interpreted in a numeric context, you may need to add 0 to them to force them to look -like numbers. +like numbers. This emulates the syswrite() function (or vice versa): require 'syscall.ph'; # may need to run h2ph $s = "hi there\n"; @@ -3624,11 +3768,26 @@ system-dependent; they are available via the standard module C. However, for historical reasons, some values are universal: zero means read-only, one means write-only, and two means read/write. -If the file named by FILENAME does not exist and the C call -creates it (typically because MODE includes the O_CREAT flag), then -the value of PERMS specifies the permissions of the newly created -file. If PERMS is omitted, the default value is 0666, which allows -read and write for all. This default is reasonable: see C. +If the file named by FILENAME does not exist and the C call creates +it (typically because MODE includes the O_CREAT flag), then the value of +PERMS specifies the permissions of the newly created file. If you omit +the PERMS argument to C, Perl uses the octal value C<0666>. +These permission values need to be in octal, and are modified by your +process's current C. The C value is a number representing +disabled permissions bits--if your C were 027 (group can't write; +others can't read, write, or execute), then passing C 0666 would +create a file with mode 0640 (C<0666 &~ 027> is 0640). + +If you find this C talk confusing, here's some advice: supply a +creation mode of 0666 for regular files and one of 0777 for directories +(in C) and executable files. 
This gives users the freedom of +choice: if they want protected files, they might choose process umasks +of 022, 027, or even the particularly antisocial mask of 077. Programs +should rarely if ever make policy decisions better left to the user. +The exception to this is when writing files that should be kept private: +mail files, web browser cookies, I<.rhosts> files, and so on. In short, +seldom if ever use 0644 as argument to C because that takes +away the user's option to have a more permissive umask. Better to omit it. The IO::File module provides a more object-oriented approach, if you're into that kind of thing. @@ -3692,40 +3851,16 @@ program they're running doesn't actually interrupt your program. system(@args) == 0 or die "system @args failed: $?" -Here's a more elaborate example of analysing the return value from -system() on a Unix system to check for all possibilities, including for -signals and core dumps. +You can check all the failure possibilities by inspecting +C<$?> like this: - $! = 0; - $rc = system @args; - printf "system(%s) returned %#04x: ", "@args", $rc; - if ($rc == 0) { - print "ran with normal exit\n"; - } - elsif ($rc == 0xff00) { - # Note that $! can be an empty string if the command that - # system() tried to execute was not found, not executable, etc. - # These errors occur in the child process after system() has - # forked, so the errno value is not visible in the parent. - printf "command failed: %s\n", ($! || "Unknown system() error"); - } - elsif (($rc & 0xff) == 0) { - $rc >>= 8; - print "ran with non-zero exit status $rc\n"; - } - else { - print "ran with "; - if ($rc & 0x80) { - $rc &= ~0x80; - print "core dump from "; - } - print "signal $rc\n" - } - $ok = ($rc != 0); + $exit_value = $? >> 8; + $signal_num = $? & 127; + $dumped_core = $? & 128; When the arguments get executed via the system shell, results will be subject to its quirks and capabilities. See L -for details. +and L for details. 
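The three expressions for picking apart C<$?> can be tried in a small self-contained script. This is only a sketch under stated assumptions: it uses C<$^X> (the path of the running perl) as a stand-in for an arbitrary command, assumes a Unix-like system, and the C<$? == -1> check reflects how current perls report a command that could not be started at all.

```perl
use strict;
use warnings;

# Assumption: $^X (the running perl) stands in for any external command.
my @args = ($^X, '-e', 'exit 3');
system(@args);

if ($? == -1) {
    # The command could not be started at all; $! holds the reason.
    die "failed to execute @args: $!\n";
}

my $exit_value  = $? >> 8;    # exit status of the child
my $signal_num  = $? & 127;   # signal that killed it, if any
my $dumped_core = $? & 128;   # nonzero if a core dump was produced

printf "exit=%d signal=%d core=%s\n",
    $exit_value, $signal_num, $dumped_core ? "yes" : "no";
```

Checking the three values separately lets the caller tell an ordinary non-zero exit from death by signal, which a bare test of the system() return value cannot do.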
=item syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET @@ -3819,7 +3954,7 @@ For further details see L, L. =item tied VARIABLE Returns a reference to the object underlying VARIABLE (the same value -that was originally returned by the tie() call which bound the variable +that was originally returned by the tie() call that bound the variable to a package.) Returns the undefined value if VARIABLE isn't tied to a package. @@ -3904,6 +4039,8 @@ parameter. Examples: select undef, undef, undef, 0.25; ($a, $b, undef, $c) = &foo; # Ignore third value returned +Note that this is a unary operator, not a list operator. + =item unlink LIST =item unlink @@ -3926,12 +4063,12 @@ If LIST is omitted, uses $_. Unpack does the reverse of pack: it takes a string representing a structure and expands it out into a list value, returning the array -value. (In a scalar context, it returns merely the first value +value. (In scalar context, it returns merely the first value produced.) The TEMPLATE has the same format as in the pack function. Here's a subroutine that does substring: sub substr { - local($what,$where,$howmuch) = @_; + my($what,$where,$howmuch) = @_; unpack("x$where a$howmuch", $what); } @@ -3989,7 +4126,7 @@ If the first argument to C is a number, it is treated as a version number instead of a module name. If the version of the Perl interpreter is less than VERSION, then an error message is printed and Perl exits immediately. This is often useful if you need to check the current -Perl version before Cing library modules which have changed in +Perl version before Cing library modules that have changed in incompatible ways from older versions of Perl. (We try not to do this more than we have to.) 
@@ -4010,7 +4147,7 @@ If you don't want your namespace altered, explicitly supply an empty list: That is exactly equivalent to - BEGIN { require Module; } + BEGIN { require Module } If the VERSION argument is present between Module and LIST, then the C will call the VERSION method in class Module with the given @@ -4028,9 +4165,10 @@ are also implemented this way. Currently implemented pragmas are: use strict qw(subs vars refs); use subs qw(afunc blurfl); -These pseudo-modules import semantics into the current block scope, unlike -ordinary modules, which import symbols into the current package (which are -effective through the end of the file). +Some of these pseudo-modules import semantics into the current +block scope (like C or C), unlike ordinary modules, +which import symbols into the current package (which are effective +through the end of the file). There's a corresponding "no" command that unimports meanings imported by use, i.e., it calls C instead of C. @@ -4115,7 +4253,7 @@ of the deceased process, or -1 if there is no such child process. The status is returned in C<$?>. If you say use POSIX ":sys_wait_h"; - ... + #... waitpid(-1,&WNOHANG); then you can do a non-blocking wait for any process. Non-blocking wait @@ -4125,6 +4263,8 @@ FLAGS of 0 is implemented everywhere. (Perl emulates the system call by remembering the status values of processes that have exited but have not been harvested by the Perl script yet.) +See L for other examples. + =item wantarray Returns TRUE if the context of the currently executing subroutine is @@ -4184,7 +4324,7 @@ examples. =item write -Writes a formatted record (possibly multi-line) to the specified file, +Writes a formatted record (possibly multi-line) to the specified FILEHANDLE, using the format associated with that file. 
By default the format for a file is the one having the same name as the filehandle, but the format for the current output channel (see the select() function) may be set diff --git a/pod/perlipc.pod b/pod/perlipc.pod index 6581896..09b011e 100644 --- a/pod/perlipc.pod +++ b/pod/perlipc.pod @@ -163,7 +163,7 @@ systems, mkfifo(1). These may not be in your normal path. if ( system('mknod', $path, 'p') && system('mkfifo', $path) ) { - die "mk{nod,fifo} $path failed; + die "mk{nod,fifo} $path failed"; } @@ -196,6 +196,33 @@ to find out whether anyone (or anything) has accidentally removed our fifo. sleep 2; # to avoid dup signals } +=head2 WARNING + +By installing Perl code to deal with signals, you're exposing yourself +to danger from two things. First, few system library functions are +re-entrant. If the signal interrupts while Perl is executing one function +(like malloc(3) or printf(3)), and your signal handler then calls the +same function again, you could get unpredictable behavior--often, a +core dump. Second, Perl isn't itself re-entrant at the lowest levels. +If the signal interrupts Perl while Perl is changing its own internal +data structures, similarly unpredictable behaviour may result. + +There are two things you can do, knowing this: be paranoid or be +pragmatic. The paranoid approach is to do as little as possible in your +signal handler. Set an existing integer variable that already has a +value, and return. This doesn't help you if you're in a slow system call, +which will just restart. That means you have to C to longjmp(3) out +of the handler. Even this is a little cavalier for the true paranoiac, +who avoids C in a handler because the system I out to get you. +The pragmatic approach is to say ``I know the risks, but prefer the +convenience'', and to do anything you want in your signal handler, +prepared to clean up core dumps now and again. 
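The paranoid approach described in the WARNING might look something like the following. It is a sketch under stated assumptions: $got_signal is a hypothetical variable name, SIGUSR1 is assumed to exist (so this is for Unixish systems only), and the script signals itself purely so that it is self-contained.

```perl
#!/usr/bin/perl -w
use strict;

# $got_signal is a hypothetical flag name. It already holds an integer
# before the handler is installed, so the handler only overwrites an
# existing value and returns.
my $got_signal = 0;
$SIG{USR1} = sub { $got_signal = 1 };

# Send ourselves the signal (assumes a system that has SIGUSR1).
kill 'USR1', $$ or die "can't signal myself: $!";

# All real work happens out here in the main flow, never in the handler.
my $tries = 0;
until ($got_signal or $tries++ >= 50) {
    select(undef, undef, undef, 0.1);   # brief nap between checks
}
print $got_signal ? "saw SIGUSR1\n" : "no signal seen\n";
```

The handler touches nothing but one existing scalar; the printing and any cleanup are left to the main program, which polls the flag at its leisure.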
+ +To forbid signal handlers altogether would bar you from +many interesting programs, including virtually everything in this manpage, +since you could no longer even write SIGCHLD handlers. Their dodginess +is expected to be addressed in the 5.005 release. + =head1 Using open() for IPC @@ -224,7 +251,7 @@ If one can be sure that a particular program is a Perl script that is expecting filenames in @ARGV, the clever programmer can write something like this: - $ program f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile + % program f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile and irrespective of which shell it's called from, the Perl program will read from the file F, the process F, standard input (F @@ -254,18 +281,27 @@ while readers of bogus commands return just a quick end of file, writers to bogus command will trigger a signal they'd better be prepared to handle. Consider: - open(FH, "|bogus"); - print FH "bang\n"; - close FH; + open(FH, "|bogus") or die "can't fork: $!"; + print FH "bang\n" or die "can't write: $!"; + close FH or die "can't close: $!"; + +That won't blow up until the close, and it will blow up with a SIGPIPE. +To catch it, you could use this: + + $SIG{PIPE} = 'IGNORE'; + open(FH, "|bogus") or die "can't fork: $!"; + print FH "bang\n" or die "can't write: $!"; + close FH or die "can't close: status=$?"; =head2 Filehandles -Both the main process and the child process share the same STDIN, -STDOUT and STDERR filehandles. If both processes try to access them -at once, strange things can happen. You may want to close or reopen -the filehandles for the child. You can get around this by opening -your pipe with open(), but on some systems this means that the child -process cannot outlive the parent. +Both the main process and any child processes it forks share the same +STDIN, STDOUT, and STDERR filehandles. If both processes try to access +them at once, strange things can happen. You'll certainly want to flush any +stdio output buffers before forking. 
You may also want to close +or reopen the filehandles for the child. You can get around this by +opening your pipe with open(), but on some systems this means that the +child process cannot outlive the parent. =head2 Background Processes @@ -281,9 +317,15 @@ details). =head2 Complete Dissociation of Child from Parent In some cases (starting server processes, for instance) you'll want to -complete dissociate the child process from the parent. The following -process is reported to work on most Unixish systems. Non-Unix users -should check their Your_OS::Process module for other solutions. +completely dissociate the child process from the parent. The easiest +way is to use: + + use POSIX qw(setsid); + setsid() or die "Can't start a new session: $!"; + +However, you may not be on POSIX. The following process is reported +to work on most Unixish systems. Non-Unix users should check their +Your_OS::Process module for other solutions. =over 4 @@ -307,6 +349,13 @@ Background yourself like this: fork && exit; +=item * + +Ignore hangup signals in case you're running on a shell that doesn't +automatically no-hup you: + + $SIG{HUP} = 'IGNORE'; # or whatever you'd like + =back =head2 Safe Pipe Opens @@ -416,7 +465,7 @@ awkward select() loop and wouldn't allow you to use normal Perl input operations. If you look at its source, you'll see that open2() uses low-level -primitives like Unix pipe() and exec() to create all the connections. +primitives like Unix pipe() and exec() calls to create all the connections. While it might have been slightly more efficient by using socketpair(), it would have then been even less portable than it already is. 
The open2() and open3() functions are unlikely to work anywhere except on a Unix @@ -426,7 +475,7 @@ Here's an example of using open2(): use FileHandle; use IPC::Open2; - $pid = open2( \*Reader, \*Writer, "cat -u -n" ); + $pid = open2(*Reader, *Writer, "cat -u -n" ); Writer->autoflush(); # default here, actually print Writer "stuff\n"; $got = <Reader>; @@ -457,6 +506,74 @@ and interact() functions. Find the library (and we hope its successor F) at your nearest CPAN archive as detailed in the SEE ALSO section below. +=head2 Bidirectional Communication with Yourself + +If you want, you may make low-level pipe() and fork() calls +to stitch this together by hand. This example only +talks to itself, but you could reopen the appropriate +handles to STDIN and STDOUT and call other processes. + + #!/usr/bin/perl -w + # pipe1 - bidirectional communication using two pipe pairs + # designed for the socketpair-challenged + use IO::Handle; # thousands of lines just for autoflush :-( + pipe(PARENT_RDR, CHILD_WTR); # XXX: failure? + pipe(CHILD_RDR, PARENT_WTR); # XXX: failure? + CHILD_WTR->autoflush(1); + PARENT_WTR->autoflush(1); + + if ($pid = fork) { + close PARENT_RDR; close PARENT_WTR; + print CHILD_WTR "Parent Pid $$ is sending this\n"; + chomp($line = <CHILD_RDR>); + print "Parent Pid $$ just read this: `$line'\n"; + close CHILD_RDR; close CHILD_WTR; + waitpid($pid,0); + } else { + die "cannot fork: $!" unless defined $pid; + close CHILD_RDR; close CHILD_WTR; + chomp($line = <PARENT_RDR>); + print "Child Pid $$ just read this: `$line'\n"; + print PARENT_WTR "Child Pid $$ is sending this\n"; + close PARENT_RDR; close PARENT_WTR; + exit; + } + +But you don't actually have to make two pipe calls. If you +have the socketpair() system call, it will do this all for you. 
+ #!/usr/bin/perl -w + # pipe2 - bidirectional communication using socketpair + # "the best ones always go both ways" + + use Socket; + use IO::Handle; # thousands of lines just for autoflush :-( + # We say AF_UNIX because although *_LOCAL is the + # POSIX 1003.1g form of the constant, many machines + # still don't have it. + socketpair(CHILD, PARENT, AF_UNIX, SOCK_STREAM, PF_UNSPEC) + or die "socketpair: $!"; + + CHILD->autoflush(1); + PARENT->autoflush(1); + + if ($pid = fork) { + close PARENT; + print CHILD "Parent Pid $$ is sending this\n"; + chomp($line = <CHILD>); + print "Parent Pid $$ just read this: `$line'\n"; + close CHILD; + waitpid($pid,0); + } else { + die "cannot fork: $!" unless defined $pid; + close CHILD; + chomp($line = <PARENT>); + print "Child Pid $$ just read this: `$line'\n"; + print PARENT "Child Pid $$ is sending this\n"; + close PARENT; + exit; + } + =head1 Sockets: Client/Server Communication While not limited to Unix-derived operating systems (e.g., WinSock on PCs @@ -487,6 +604,17 @@ knows the other has finished when a "\n" is received) or multi-line messages and responses that end with a period on an empty line ("\n.\n" terminates a message/response). +=head2 Internet Line Terminators + +The Internet line terminator is "\015\012". Under ASCII variants of +Unix, that could usually be written as "\r\n", but under other systems, +"\r\n" might at times be "\015\015\012", "\012\012\015", or something +completely different. The standards specify writing "\015\012" to be +conformant (be strict in what you provide), but they also recommend +accepting a lone "\012" on input (but be lenient in what you require). +We haven't always been very good about that in the code in this manpage, +but unless you're on a Mac, you'll probably be ok. + =head2 Internet TCP Clients and Servers Use Internet-domain sockets when you want to do client-server @@ -495,7 +623,6 @@ communication that might extend to machines outside of your own system. 
Here's a sample TCP client using Internet-domain sockets: #!/usr/bin/perl -w - require 5.002; use strict; use Socket; my ($remote,$port, $iaddr, $paddr, $proto, $line); @@ -525,11 +652,11 @@ or firewall machine), you should fill this in with your real address instead. #!/usr/bin/perl -Tw - require 5.002; use strict; BEGIN { $ENV{PATH} = '/usr/ucb:/bin' } use Socket; use Carp; + my $EOL = "\015\012"; sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" } @@ -558,7 +685,7 @@ instead. at port $port"; print Client "Hello there, $name, it's now ", - scalar localtime, "\n"; + scalar localtime, $EOL; } And here's a multithreaded version. It's multithreaded in that @@ -567,11 +694,11 @@ handle the client request so that the master server can quickly go back to service a new client. #!/usr/bin/perl -Tw - require 5.002; use strict; BEGIN { $ENV{PATH} = '/usr/ucb:/bin' } use Socket; use Carp; + my $EOL = "\015\012"; sub spawn; # forward declaration sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" } @@ -612,8 +739,8 @@ go back to service a new client. at port $port"; spawn sub { - print "Hello there, $name, it's now ", scalar localtime, "\n"; - exec '/usr/games/fortune' + print "Hello there, $name, it's now ", scalar localtime, $EOL; + exec '/usr/games/fortune' # XXX: `wrong' line terminators or confess "can't exec fortune: $!"; }; @@ -661,7 +788,6 @@ service on a number of different machines and shows how far their clocks differ from the system on which it's being run: #!/usr/bin/perl -w - require 5.002; use strict; use Socket; @@ -698,7 +824,7 @@ want to. Unix-domain sockets are local to the current host, and are often used internally to implement pipes. Unlike Internet domain sockets, Unix domain sockets can show up in the file system with an ls(1) listing. 
- $ ls -l /dev/log + % ls -l /dev/log srw-rw-rw- 1 root 0 Oct 31 07:23 /dev/log You can test for these with Perl's B<-S> file test: @@ -710,7 +836,6 @@ You can test for these with Perl's B<-S> file test: Here's a sample Unix-domain client: #!/usr/bin/perl -w - require 5.002; use Socket; use strict; my ($rendezvous, $line); @@ -723,15 +848,17 @@ Here's a sample Unix-domain client: } exit; -And here's a corresponding server. +And here's a corresponding server. You don't have to worry about silly +network terminators here because Unix domain sockets are guaranteed +to be on the localhost, and thus everything works right. #!/usr/bin/perl -Tw - require 5.002; use strict; use Socket; use Carp; BEGIN { $ENV{PATH} = '/usr/ucb:/bin' } + sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" } my $NAME = '/tmp/catsock'; my $uaddr = sockaddr_un($NAME); @@ -744,8 +871,17 @@ And here's a corresponding server. logmsg "server started on $NAME"; + my $waitedpid; + + sub REAPER { + $waitedpid = wait; + $SIG{CHLD} = \&REAPER; # loathe sysV + logmsg "reaped $waitedpid" . ($? ? " with exit $?" : ''); + } + $SIG{CHLD} = \&REAPER; + for ( $waitedpid = 0; accept(Client,Server) || $waitedpid; $waitedpid = 0, close Client) @@ -866,6 +1002,8 @@ something to the server before fetching the server's response. use IO::Socket; unless (@ARGV > 1) { die "usage: $0 host document ..." } $host = shift(@ARGV); + $EOL = "\015\012"; + $BLANK = $EOL x 2; foreach $document ( @ARGV ) { $remote = IO::Socket::INET->new( Proto => "tcp", PeerAddr => $host, @@ -873,7 +1011,7 @@ something to the server before fetching the server's response. ); unless ($remote) { die "cannot connect to http daemon on $host" } $remote->autoflush(1); - print $remote "GET $document HTTP/1.0\n\n"; + print $remote "GET $document HTTP/1.0" . $BLANK; while ( <$remote> ) { print } close $remote; } @@ -900,7 +1038,7 @@ such a request. 
Here's an example of running that program, which we'll call I: - shell_prompt$ webget www.perl.com /guanaco.html + % webget www.perl.com /guanaco.html HTTP/1.1 404 File Not Found Date: Thu, 08 May 1997 18:02:32 GMT Server: Apache/1.2b6 @@ -935,9 +1073,8 @@ simultaneously copies everything from standard input to the socket. To accomplish the same thing using just one process would be I harder, because it's easier to code two processes to do one thing than it is to code one process to do two things. (This keep-it-simple principle -is one of the cornerstones of the Unix philosophy, and good software -engineering as well, which is probably why it's spread to other systems -as well.) +is a cornerstone of the Unix philosophy, and good software engineering as +well, which is probably why it's spread to other systems.) Here's the code: @@ -997,7 +1134,7 @@ well. =head1 TCP Servers with IO::Socket -Setting up server is little bit more involved than running a client. +As always, setting up a server is a little bit more involved than running a client. The model is that the server creates a special kind of socket that does nothing but listen on a particular port for incoming connections. It does this by calling the Cnew()> method with @@ -1111,7 +1248,6 @@ with TCP, you'd have to use a different socket handle for each host. #!/usr/bin/perl -w use strict; - require 5.002; use Socket; use Sys::Hostname; @@ -1239,22 +1375,15 @@ on CPAN. =head1 NOTES -If you are running under version 5.000 (dubious) or 5.001, you can still -use most of the examples in this document. You may have to remove the -C and some of the my() statements for 5.000, and for both -you'll have to load in version 1.2 or older of the F module, which -is included in I. - -Most of these routines quietly but politely return C when they fail -instead of causing your program to die right then and there due to an -uncaught exception. (Actually, some of the new I conversion -functions croak() on bad arguments.) 
It is therefore essential -that you should check the return values of these functions. Always begin -your socket programs this way for optimal success, and don't forget to add -B<-T> taint checking flag to the pound-bang line for servers: +Most of these routines quietly but politely return C when they +fail instead of causing your program to die right then and there due to +an uncaught exception. (Actually, some of the new I conversion +functions croak() on bad arguments.) It is therefore essential to +check return values from these functions. Always begin your socket +programs this way for optimal success, and don't forget to add B<-T> +taint checking flag to the #! line for servers: - #!/usr/bin/perl -w - require 5.002; + #!/usr/bin/perl -Tw use strict; use sigtrap; use Socket; @@ -1268,14 +1397,14 @@ signals and to stick with simple TCP and UDP socket operations; e.g., don't try to pass open file descriptors over a local UDP datagram socket if you want your code to stand a chance of being portable. -Because few vendors provide C libraries that are safely re-entrant, -the prudent programmer will do little else within a handler beyond -setting a numeric variable that already exists; or, if locked into -a slow (restarting) system call, using die() to raise an exception -and longjmp(3) out. In fact, even these may in some cases cause a -core dump. It's probably best to avoid signals except where they are -absolutely inevitable. This perilous problems will be addressed in a -future release of Perl. +As mentioned in the signals section, because few vendors provide C +libraries that are safely re-entrant, the prudent programmer will do +little else within a handler beyond setting a numeric variable that +already exists; or, if locked into a slow (restarting) system call, +using die() to raise an exception and longjmp(3) out. In fact, even +these may in some cases cause a core dump. It's probably best to avoid +signals except where they are absolutely inevitable. 
This +will be addressed in a future release of Perl. =head1 AUTHOR @@ -1287,10 +1416,10 @@ version and suggestions from the Perl Porters. There's a lot more to networking than this, but this should get you started. -For intrepid programmers, the classic textbook I -by Richard Stevens (published by Addison-Wesley). Note that most books -on networking address networking from the perspective of a C programmer; -translation to Perl is left as an exercise for the reader. +For intrepid programmers, the indispensable textbook is I by W. Richard Stevens (published by Addison-Wesley). Note +that most books on networking address networking from the perspective of +a C programmer; translation to Perl is left as an exercise for the reader. The IO::Socket(3) manpage describes the object library, and the Socket(3) manpage describes the low-level interface to sockets. Besides the obvious diff --git a/pod/perllocale.pod b/pod/perllocale.pod index 2a08835..f4ca0dd 100644 --- a/pod/perllocale.pod +++ b/pod/perllocale.pod @@ -4,17 +4,18 @@ perllocale - Perl locale handling (internationalization and localization) =head1 DESCRIPTION -Perl supports language-specific notions of data such as "is this a -letter", "what is the uppercase equivalent of this letter", and "which -of these letters comes first". These are important issues, especially -for languages other than English - but also for English: it would be -very naïve to think that C defines all the "letters". Perl -is also aware that some character other than '.' may be preferred as a -decimal point, and that output date representations may be -language-specific. The process of making an application take account of -its users' preferences in such matters is called B -(often abbreviated as B); telling such an application about a -particular set of preferences is known as B (B). 
+Perl supports language-specific notions of data such as "is this +a letter", "what is the uppercase equivalent of this letter", and +"which of these letters comes first". These are important issues, +especially for languages other than English--but also for English: it +would be naïve to imagine that C defines all the "letters" +needed to write in English. Perl is also aware that some character other +than '.' may be preferred as a decimal point, and that output date +representations may be language-specific. The process of making an +application take account of its users' preferences in such matters is +called B (often abbreviated as B); telling +such an application about a particular set of preferences is known as +B (B). Perl can understand language-specific data via the standardized (ISO C, XPG4, POSIX 1.c) method called "the locale system". The locale system is @@ -22,13 +23,13 @@ controlled per application using one pragma, one function call, and several environment variables. B: This feature is new in Perl 5.004, and does not apply unless an -application specifically requests it - see L. +application specifically requests it--see L. The one exception is that write() now B uses the current locale - see L<"NOTES">. =head1 PREPARING TO USE LOCALES -If Perl applications are to be able to understand and present your data +If Perl applications are to understand and present your data correctly according to a locale of your choice, B of the following must be true: @@ -42,15 +43,15 @@ its C library. =item * -B. You, or +B. You, or your system administrator, must make sure that this is the case. The available locales, the location in which they are kept, and the manner -in which they are installed, vary from system to system. 
Some systems -provide only a few, hard-wired, locales, and do not allow more to be -added; others allow you to add "canned" locales provided by the system -supplier; still others allow you or the system administrator to define +in which they are installed all vary from system to system. Some systems +provide only a few, hard-wired locales and do not allow more to be +added. Others allow you to add "canned" locales provided by the system +supplier. Still others allow you or the system administrator to define and add arbitrary locales. (You may have to ask your supplier to -provide canned locales which are not delivered with your operating +provide canned locales that are not delivered with your operating system.) Read your system documentation for further illumination. =item * @@ -71,8 +72,8 @@ appropriate, and B of the following must be true: =item * B) -must be correctly set up>, either by yourself, or by the person who set -up your system account, at the time the application is started. +must be correctly set up> at the time the application is started, either +by yourself or by whoever set up your system account. =item * @@ -94,16 +95,16 @@ pragma tells Perl to use the current locale for some operations: B (C, C, C, C, and C) and the POSIX string collation functions strcoll() and strxfrm() use -C. sort() is also affected if it is used without an -explicit comparison function because it uses C by default. +C. sort() is also affected if used without an +explicit comparison function, because it uses C by default. -B C and C are unaffected by the locale: they always +B C and C are unaffected by locale: they always perform a byte-by-byte comparison of their scalar operands. What's more, if C finds that its operands are equal according to the collation sequence specified by the current locale, it goes on to perform a byte-by-byte comparison, and only returns I<0> (equal) if the operands are bit-for-bit identical. 
If you really want to know whether -two strings - which C and C may consider different - are equal +two strings--which C and C may consider different--are equal as far as collation in the locale is concerned, see the discussion in L. @@ -126,10 +127,10 @@ B (strftime()) uses C. C, C, and so on, are discussed further in L. -The default behavior returns with S> or on reaching the -end of the enclosing block. +The default behavior is restored with the S> pragma, or +upon reaching the end of block enclosing C. -Note that the string result of any operation that uses locale +The string result of any operation that uses locale information is tainted, as it is possible for a locale to be untrustworthy. See L<"SECURITY">. @@ -173,17 +174,17 @@ the current locale for the category. You can use this value as the second argument in a subsequent call to setlocale(). If a second argument is given and it corresponds to a valid locale, the locale for the category is set to that value, and the function returns the -now-current locale value. You can use this in a subsequent call to +now-current locale value. You can then use this in yet another call to setlocale(). (In some implementations, the return value may sometimes -differ from the value you gave as the second argument - think of it as -an alias for the value that you gave.) +differ from the value you gave as the second argument--think of it as +an alias for the value you gave.) As the example shows, if the second argument is an empty string, the category's locale is returned to the default specified by the corresponding environment variables. Generally, this results in a -return to the default which was in force when Perl started up: changes +return to the default that was in force when Perl started up: changes to the environment made by the application after startup may or may not -be noticed, depending on the implementation of your system's C library. +be noticed, depending on your system's C library. 
If the second argument does not correspond to a valid locale, the locale for the category is not changed, and the function returns I. @@ -192,10 +193,9 @@ For further information about the categories, consult L. =head2 Finding locales -For the locales available in your system, also consult L -and see whether it leads you to the list of the available locales -(search for the I section). If that fails, try the following -command lines: +For locales available in your system, consult also L to +see whether it leads to the list of available locales (search for the +I section). If that fails, try the following command lines: locale -a @@ -215,25 +215,25 @@ and see whether they list something resembling these english german russian english.iso88591 german.iso88591 russian.iso88595 -Sadly, even though the calling interface for setlocale() has been -standardized, the names of the locales and the directories where the -configuration is, have not. The basic form of the name is -IB<.>I, but the latter parts -after the I are not always present. The I and the -I are usually from the standards B and B, -respectively, the two-letter abbreviations for the countries and the -languages of the world. The I part often mentions some B character set, the Latin codesets. For example the C is the so-called "Western codeset" that can be used to encode -most of the Western European languages. Again, sadly, as you can see, -there are several ways to write even the name of that one standard. +Sadly, even though the calling interface for setlocale() has +been standardized, names of locales and the directories where the +configuration resides have not been. The basic form of the name is +IB<.>I, but the latter parts after +I are not always present. The I and I are +usually from the standards B and B, the two-letter +abbreviations for the countries and the languages of the world, +respectively. The I part often mentions some B +character set, the Latin codesets. 
For example, C is the +so-called "Western codeset" that can be used to encode most Western +European languages. Again, there are several ways to write even the +name of that one standard. Lamentably. Two special locales are worth particular mention: "C" and "POSIX". Currently these are effectively the same locale: the difference is -mainly that the first one is defined by the C standard and the second by -the POSIX standard. What they define is the B in which +mainly that the first one is defined by the C standard, the second by +the POSIX standard. They define the B in which every program starts in the absence of locale information in its -environment. (The default default locale, if you will.) Its language +environment. (The I default locale, if you will.) Its language is (American) English and its character codeset ASCII. B: Not all systems have the "POSIX" locale (not all systems are @@ -242,7 +242,7 @@ default locale. =head2 LOCALE PROBLEMS -You may meet the following warning message at Perl startup: +You may encounter the following warning message at Perl startup: perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: @@ -251,83 +251,80 @@ You may meet the following warning message at Perl startup: are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). -This means that your locale settings were that LC_ALL equals "En_US" -and LANG exists but has no value. Perl tried to believe you but it -could not. Instead Perl gave up and fell back to the "C" locale, the -default locale that is supposed to work no matter what. This usually -means either or both of the two problems: either your locale settings -were wrong, they talk of locales your system has never heard of, or -that the locale installation in your system has problems, for example -some system files are broken or missing. For the problems there are -quick and temporary fixes and more thorough and lasting fixes. 
+This means that your locale settings had LC_ALL set to "En_US" and +LANG exists but has no value. Perl tried to believe you but could not. +Instead, Perl gave up and fell back to the "C" locale, the default locale +that is supposed to work no matter what. This usually means your locale +settings were wrong, they mention locales your system has never heard +of, or the locale installation in your system has problems (for example, +some system files are broken or missing). There are quick and temporary +fixes to these problems, as well as more thorough and lasting fixes. =head2 Temporarily fixing locale problems -The two quickest fixes are either to make Perl be silent about any +The two quickest fixes are either to render Perl silent about any locale inconsistencies or to run Perl under the default locale "C". Perl's moaning about locale problems can be silenced by setting the environment variable PERL_BADLANG to a non-zero value, for example "1". This method really just sweeps the problem under the carpet: you tell Perl to shut up even when Perl sees that something is wrong. Do -not be surprised if later something locale-dependent works funny. +not be surprised if later something locale-dependent misbehaves. Perl can be run under the "C" locale by setting the environment -variable LC_ALL to "C". This method is perhaps a bit more civilised -than the PERL_BADLANG one but please note that setting the LC_ALL (or -the other locale variables) may affect also other programs, not just -Perl. Especially external programs run from within Perl will see +variable LC_ALL to "C". This method is perhaps a bit more civilized +than the PERL_BADLANG approach, but setting LC_ALL (or +other locale variables) may affect other programs as well, not just +Perl. In particular, external programs run from within Perl will see these changes. If you make the new settings permanent (read on), all -the programs you run will see the changes. 
See L for for -the full list of all the environment variables and L -for their effects in Perl. The effects in other programs are quite -easily deducible: for example the variable LC_COLLATE may well affect -your "sort" program (or whatever the program that arranges `records' +programs you run see the changes. See L for +the full list of relevant environment variables and L +for their effects in Perl. Effects in other programs are +easily deducible. For example, the variable LC_COLLATE may well affect +your B<sort> program (or whatever the program that arranges `records' alphabetically in your system is called). -You can first try out changing these variables temporarily and if the -new settings seem to help then put the settings into the startup files -of your environment. Please consult your local documentation for the -exact details but very shortly for UNIXish systems: in Bourneish -shells (sh, ksh, bash, zsh) for example +You can test out changing these variables temporarily, and if the +new settings seem to help, put those settings into your shell startup +files. Consult your local documentation for the exact details. For example, in +Bourne-like shells (B<sh>, B<ksh>, B<bash>, B<zsh>): LC_ALL=en_US.ISO8859-1 export LC_ALL -We assume here that we saw with the above discussed commands the -locale "en_US.ISO8859-1" and decided to try that instead of the above -faulty locale "En_US" -- and in Cshish shells (csh, tcsh) +This assumes that we saw the locale "en_US.ISO8859-1" using the commands +discussed above. We decided to try that instead of the above faulty +locale "En_US"--and in Cshish shells (B<csh>, B<tcsh>) setenv LC_ALL en_US.ISO8859-1 -If you do not know what shell you have, please consult your local +If you do not know what shell you have, consult your local helpdesk or the equivalent.
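Before exporting a new LC_ALL value, it can help to ask the C library whether it accepts the locale name at all. The following is a minimal sketch, not part of the patched documentation: the helper name locale_is_usable and the default candidate list are hypothetical, and the only calls assumed are setlocale() and LC_ALL from the POSIX module.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

# Hypothetical helper: true if the C library accepts $name for all categories.
sub locale_is_usable {
    my ($name) = @_;
    my $saved = setlocale(LC_ALL);                 # one argument: query only
    my $ok    = defined setlocale(LC_ALL, $name);  # undef means "rejected"
    setlocale(LC_ALL, $saved) if defined $saved;   # restore the previous locale
    return $ok;
}

# Candidate names come from the command line, else from an illustrative list.
my @candidates = @ARGV ? @ARGV : ("C", "POSIX", "en_US.ISO8859-1");

for my $candidate (@candidates) {
    printf "%-20s %s\n", $candidate,
        locale_is_usable($candidate) ? "accepted" : "rejected";
}
```

A name reported as rejected here is likely to trigger the startup warning if placed in LC_ALL. Some C libraries accept names rather liberally, so an accepted name is a hint and not a guarantee.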
=head2 Permanently fixing locale problems -Then the slower but better fixes: the misconfiguration of your own -environment variables you may be able to fix yourself; the +The slower but superior fixes are permanent ones: you may be able to fix +the misconfiguration of your own environment variables yourself. The mis(sing)configuration of the whole system's locales usually requires the help of your friendly system administrator. -First, see earlier in this document about L. That -tells how you can find which locales really are supported and more -importantly, installed, in your system. In our example error message -the environment variables affecting the locale are listed in the order -of decreasing importance and unset variables do not matter, therefore -in the above error message the LC_ALL being "En_US" must have been the -bad choice. Always try fixing first the locale settings listed first. +First, see earlier in this document about L. That tells +how to find which locales are really supported--and more importantly, +installed--on your system. In our example error message, environment +variables affecting the locale are listed in the order of decreasing +importance (and unset variables do not matter). Therefore, having +LC_ALL set to "En_US" must have been the bad choice, as shown by the +error message. First try fixing locale settings listed first. -Second, if you see with the listed commands something B (for -example prefix matches do not count and case usually matters) like -"En_US" (without the quotes), then you should be okay because you are -using a locale name that should be installed and available in your -system. In this case skip forward to L. +Second, if using the listed commands you see something B +(prefix matches do not count and case usually counts) like "En_US" +without the quotes, then you should be okay because you are using a +locale name that should be installed and available in your system. +In this case, see L.
-=head2 Permantently fixing your locale configuration +=head2 Permanently fixing your locale configuration -This is the case when for example you see +This is when you see something like: perl: warning: Please check that your locale settings: LC_ALL = "En_US", @@ -335,21 +332,21 @@ This is the case when for example you see are supported and installed on your system. but then cannot see that "En_US" listed by the above-mentioned -commands. You may see things like "en_US.ISO8859-1" but that is not -the same thing. In this case you might try running under a locale -that you could list and somehow matches with what you tried. The +commands. You may see things like "en_US.ISO8859-1", but that isn't +the same. In this case, try running under a locale +that you can list and which somehow matches what you tried. The rules for matching locale names are a bit vague because -standardisation is weak in this area. See again the L about the general rules. +standardization is weak in this area. See again the L about general rules. -=head2 Permanently fixing the system locale configuration +=head2 Permanently fixing system locale configuration -Please contact your system administrator and tell her the exact error -message you get and ask her to read this same documentation you are -now reading. She should be able to check whether there is something -wrong with the locale configuration of the system. The L section is unfortunately a bit vague about the exact commands -and places because these things are not that standardised. +Contact a system administrator (preferably your own) and report the exact +error message you get, and ask them to read this same documentation you +are now reading. They should be able to check whether there is something +wrong with the locale configuration of the system. The L +section is unfortunately a bit vague about the exact commands and places +because these things are not that standardized. 
=head2 The localeconv function @@ -357,7 +354,7 @@ The POSIX::localeconv() function allows you to get particulars of the locale-dependent numeric formatting information specified by the current C and C locales. (If you just want the name of the current locale for a particular category, use POSIX::setlocale() -with a single parameter - see L.) +with a single parameter--see L.) use POSIX qw(locale_h); @@ -370,16 +367,15 @@ with a single parameter - see L.) } localeconv() takes no arguments, and returns B a hash. -The keys of this hash are formatting variable names such as -C and C; the values are the corresponding -values. See L for a longer example, which lists -all the categories an implementation might be expected to provide; some -provide more and others fewer, however. Note that you don't need C: as a function with the job of querying the locale, localeconv() -always observes the current locale. +The keys of this hash are variable names for formatting, such as +C and C. The values are the corresponding, +er, values. See L for a longer example listing +the categories an implementation might be expected to provide; some +provide more and others fewer, however. You don't need an explicit C, because localeconv() always observes the current locale. -Here's a simple-minded example program which rewrites its command line -parameters as integers formatted correctly in the current locale: +Here's a simple-minded example program that rewrites its command-line +parameters as integers correctly formatted in the current locale: # See comments in previous example require 5.004; @@ -404,20 +400,20 @@ parameters as integers formatted correctly in the current locale: =head1 LOCALE CATEGORIES -The subsections which follow describe basic locale categories. As well -as these, there are some combination categories which allow the -manipulation of more than one basic category at a time. See -L<"ENVIRONMENT"> for a discussion of these. 
+The following subsections describe basic locale categories. Beyond these, +some combination categories allow manipulation of more than one +basic category at a time. See L<"ENVIRONMENT"> for a discussion of these. =head2 Category LC_COLLATE: Collation -When in the scope of S>, Perl looks to the C -environment variable to determine the application's notions on the -collation (ordering) of characters. ('b' follows 'a' in Latin -alphabets, but where do 'E' and 'E' belong?) +In the scope of S>, Perl looks to the C +environment variable to determine the application's notions on collation +(ordering) of characters. For example, 'b' follows 'a' in Latin +alphabets, but where do 'E' and 'E' belong? And while +'color' follows 'chocolate' in English, what about in Spanish? -Here is a code snippet that will tell you what are the alphanumeric -characters in the current locale, in the locale order: +Here is a code snippet to tell what alphanumeric +characters are in the current locale, in that locale's order: use locale; print +(sort grep /\w/, map { chr() } 0..255), "\n"; @@ -435,7 +431,7 @@ first example is useful for natural text. As noted in L, C compares according to the current collation locale when C is in effect, but falls back to a -byte-by-byte comparison for strings which the locale says are equal. You +byte-by-byte comparison for strings that the locale says are equal. You can use POSIX::strcoll() if you don't want this fall-back: use POSIX qw(strcoll); @@ -443,10 +439,10 @@ can use POSIX::strcoll() if you don't want this fall-back: !strcoll("space and case ignored", "SpaceAndCaseIgnored"); $equal_in_locale will be true if the collation locale specifies a -dictionary-like ordering which ignores space characters completely, and +dictionary-like ordering that ignores space characters completely and which folds case. 
-If you have a single string which you want to check for "equality in +If you have a single string that you want to check for "equality in locale" against several others, you might think you could gain a little efficiency by using POSIX::strxfrm() in conjunction with C: @@ -462,30 +458,30 @@ efficiency by using POSIX::strxfrm() in conjunction with C: strxfrm() takes a string and maps it into a transformed string for use in byte-by-byte comparisons against other transformed strings during collation. "Under the hood", locale-affected Perl comparison operators -call strxfrm() for both their operands, then do a byte-by-byte -comparison of the transformed strings. By calling strxfrm() explicitly, +call strxfrm() for both operands, then do a byte-by-byte +comparison of the transformed strings. By calling strxfrm() explicitly and using a non locale-affected comparison, the example attempts to save -a couple of transformations. In fact, it doesn't save anything: Perl +a couple of transformations. But in fact, it doesn't save anything: Perl magic (see L) creates the transformed version of a -string the first time it's needed in a comparison, then keeps it around +string the first time it's needed in a comparison, then keeps this version around in case it's needed again. An example rewritten the easy way with C runs just about as fast. It also copes with null characters embedded in strings; if you call strxfrm() directly, it treats the first -null it finds as a terminator. And don't expect the transformed strings -it produces to be portable across systems - or even from one revision +null it finds as a terminator. Don't expect the transformed strings +it produces to be portable across systems--or even from one revision of your operating system to the next. In short, don't call strxfrm() directly: let Perl do it for you.
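Letting Perl do the transformation can be sketched as an ordinary sort inside the scope of the locale pragma. This is an illustrative example only: the word list is made up, and the collated order is deliberately not shown because it depends on whatever LC_COLLATE locale is active when the script runs.

```perl
use strict;
use warnings;

my @words = qw(zebra Apple apple Zebra mango);   # sample data for illustration

# Outside the scope of "use locale", sort compares byte by byte.
my @bytewise = sort @words;

my @collated;
{
    use locale;    # cmp, and therefore sort, now consults LC_COLLATE
    @collated = sort { $a cmp $b } @words;
}

print "bytewise: @bytewise\n";   # Apple Zebra apple mango zebra
print "collated: @collated\n";   # order depends on the current locale
```

Under the "C" locale both lines print the same order. Under a dictionary-style locale the second line typically groups words that differ only in case, with no explicit strxfrm() call in sight.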
-Note: C isn't shown in some of these examples, as it isn't +Note: C isn't shown in some of these examples because it isn't needed: strcoll() and strxfrm() exist only to generate locale-dependent results, and so always obey the current C locale. =head2 Category LC_CTYPE: Character Types -When in the scope of S>, Perl obeys the C locale +In the scope of S>, Perl obeys the C locale setting. This controls the application's notion of which characters are alphabetic. This affects Perl's C<\w> regular expression metanotation, -which stands for alphanumeric characters - that is, alphabetic and +which stands for alphanumeric characters--that is, alphabetic and numeric characters. (Consult L for more information about regular expressions.) Thanks to C, depending on your locale setting, characters like 'E', 'E', 'E', and @@ -493,35 +489,34 @@ setting, characters like 'E', 'E', 'E', and The C locale also provides the map used in transliterating characters between lower and uppercase. This affects the case-mapping -functions - lc(), lcfirst, uc() and ucfirst(); case-mapping -interpolation with C<\l>, C<\L>, C<\u> or C<\U> in double-quoted strings -and in C substitutions; and case-independent regular expression +functions--lc(), lcfirst, uc(), and ucfirst(); case-mapping +interpolation with C<\l>, C<\L>, C<\u>, or C<\U> in double-quoted strings +and C substitutions; and case-independent regular expression pattern matching using the C modifier. -Finally, C affects the POSIX character-class test functions - -isalpha(), islower() and so on. For example, if you move from the "C" -locale to a 7-bit Scandinavian one, you may find - possibly to your -surprise - that "|" moves from the ispunct() class to isalpha(). +Finally, C affects the POSIX character-class test +functions--isalpha(), islower(), and so on. For example, if you move +from the "C" locale to a 7-bit Scandinavian one, you may find--possibly +to your surprise--that "|" moves from the ispunct() class to isalpha(). 
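Because LC_CTYPE can change what counts as alphabetic, input that must contain only plain letters and digits is safest checked without any help from the locale. The sketch below uses a technique the text does not: it spells the character class out explicitly rather than relying on C<\w>, and it turns the locale pragma off for the match. The function name is_plain_word and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical validator: accept only unaccented letters, digits, underscore.
sub is_plain_word {
    my ($text) = @_;
    no locale;    # make sure LC_CTYPE plays no part in this match
    return $text =~ /\A[A-Za-z0-9_]+\z/ ? 1 : 0;
}

for my $candidate ("restart_2", "rm|ls", "caf\xE9") {
    printf "%-10s %s\n", $candidate,
        is_plain_word($candidate) ? "accepted" : "rejected";
}
```

Only the first string is accepted. The explicit class gives the same answer whichever locale is in force, which is the property a command parser needs.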
B A broken or malicious C locale definition may result in clearly ineligible characters being considered to be alphanumeric by -your application. For strict matching of (unaccented) letters and -digits - for example, in command strings - locale-aware applications +your application. For strict matching of (mundane) letters and +digits--for example, in command strings--locale-aware applications should use C<\w> inside a C block. See L<"SECURITY">. =head2 Category LC_NUMERIC: Numeric Formatting -When in the scope of S>, Perl obeys the C -locale information, which controls application's idea of how numbers -should be formatted for human readability by the printf(), sprintf(), -and write() functions. String to numeric conversion by the -POSIX::strtod() function is also affected. In most implementations the -only effect is to change the character used for the decimal point - -perhaps from '.' to ',': these functions aren't aware of such niceties -as thousands separation and so on. (See L if -you care about these things.) - -Note that output produced by print() is B affected by the +In the scope of S>, Perl obeys the C locale +information, which controls an application's idea of how numbers should +be formatted for human readability by the printf(), sprintf(), and +write() functions. String-to-numeric conversion by the POSIX::strtod() +function is also affected. In most implementations the only effect is to +change the character used for the decimal point--perhaps from '.' to ','. +These functions aren't aware of such niceties as thousands separation and +so on. (See L if you care about these things.) + +Output produced by print() is B affected by the current locale: it is independent of whether C or C is in effect, and corresponds to what you'd get from printf() in the "C" locale. 
The same is true for Perl's internal conversions @@ -543,23 +538,23 @@ between numeric and string formats: =head2 Category LC_MONETARY: Formatting of monetary amounts -The C standard defines the C category, but no function that -is affected by its contents. (Those with experience of standards +The C standard defines the C category, but no function +that is affected by its contents. (Those with experience of standards committees will recognize that the working group decided to punt on the issue.) Consequently, Perl takes no notice of it. If you really want -to use C, you can query its contents - see L - and use the information that it returns in your -application's own formatting of currency amounts. However, you may well -find that the information, though voluminous and complex, does not quite -meet your requirements: currency formatting is a hard nut to crack. +to use C, you can query its contents--see L--and use the information that it returns in your application's +own formatting of currency amounts. However, you may well find that +the information, voluminous and complex though it may be, still does not +quite meet your requirements: currency formatting is a hard nut to crack. =head2 LC_TIME -The output produced by POSIX::strftime(), which builds a formatted +Output produced by POSIX::strftime(), which builds a formatted human-readable date/time string, is affected by the current C locale. Thus, in a French locale, the output produced by the C<%B> format element (full month name) for the first month of the year would -be "janvier". Here's how to get a list of the long month names in the +be "janvier". 
Here's how to get a list of long month names in the current locale: use POSIX qw(strftime); @@ -568,24 +563,24 @@ current locale: strftime("%B", 0, 0, 0, 1, $_, 96); } -Note: C isn't needed in this example: as a function which +Note: C isn't needed in this example: as a function that exists only to generate locale-dependent results, strftime() always obeys the current C locale. =head2 Other categories -The remaining locale category, C (possibly supplemented by -others in particular implementations) is not currently used by Perl - -except possibly to affect the behavior of library functions called by -extensions which are not part of the standard Perl distribution. +The remaining locale category, C (possibly supplemented +by others in particular implementations) is not currently used by +Perl--except possibly to affect the behavior of library functions called +by extensions outside the standard Perl distribution. =head1 SECURITY -While the main discussion of Perl security issues can be found in +Although the main discussion of Perl security issues can be found in L, a discussion of Perl's locale handling would be incomplete if it did not draw your attention to locale-dependent security issues. -Locales - particularly on systems which allow unprivileged users to -build their own locales - are untrustworthy. A malicious (or just plain +Locales--particularly on systems that allow unprivileged users to +build their own locales--are untrustworthy. A malicious (or just plain broken) locale can make a locale-aware application give unexpected results. Here are a few possibilities: @@ -594,7 +589,7 @@ results. Here are a few possibilities: =item * Regular expression checks for safe file names or mail addresses using -C<\w> may be spoofed by an C locale which claims that +C<\w> may be spoofed by an C locale that claims that characters such as "E" and "|" are alphanumeric. 
=item * @@ -618,32 +613,32 @@ A sneaky C locale could result in the names of students with =item * -An application which takes the trouble to use the information in +An application that takes the trouble to use information in C may format debits as if they were credits and vice versa -if that locale has been subverted. Or it make may make payments in US +if that locale has been subverted. Or it might make payments in US dollars instead of Hong Kong dollars. =item * The date and day names in dates formatted by strftime() could be manipulated to advantage by a malicious user able to subvert the -C locale. ("Look - it says I wasn't in the building on +C locale. ("Look--it says I wasn't in the building on Sunday.") =back Such dangers are not peculiar to the locale system: any aspect of an -application's environment which may maliciously be modified presents +application's environment which may be modified maliciously presents similar challenges. Similarly, they are not specific to Perl: any -programming language which allows you to write programs which take +programming language that allows you to write programs that take account of their environment exposes you to these issues. -Perl cannot protect you from all of the possibilities shown in the -examples - there is no substitute for your own vigilance - but, when +Perl cannot protect you from all possibilities shown in the +examples--there is no substitute for your own vigilance--but, when C is in effect, Perl uses the tainting mechanism (see -L) to mark string results which become locale-dependent, and +L) to mark string results that become locale-dependent, and which may be untrustworthy in consequence. Here is a summary of the -tainting behavior of operators and functions which may be affected by +tainting behavior of operators and functions that may be affected by the locale: =over 4 @@ -661,11 +656,11 @@ C is in effect. Scalar true/false result never tainted. 
-Subpatterns, either delivered as an array-context result, or as $1 etc. +Subpatterns, either delivered as a list-context result or as $1 etc. are tainted if C is in effect, and the subpattern regular expression contains C<\w> (to match an alphanumeric character), C<\W> (non-alphanumeric character), C<\s> (white-space character), or C<\S> -(non white-space character). The matched pattern variable, $&, $` +(non white-space character). The matched-pattern variable, $&, $` (pre-match), $' (post-match), and $+ (last match) are also tainted if C is in effect and the regular expression contains C<\w>, C<\W>, C<\s>, or C<\S>. @@ -673,8 +668,8 @@ C<\W>, C<\s>, or C<\S>. =item B (C): Has the same behavior as the match operator. Also, the left -operand of C<=~> becomes tainted when C in effect, -if it is modified as a result of a substitution based on a regular +operand of C<=~> becomes tainted when C is in effect +if modified as a result of a substitution based on a regular expression match involving C<\w>, C<\W>, C<\s>, or C<\S>; or of case-mapping with C<\l>, C<\L>, C<\u> or C<\U>. @@ -718,8 +713,8 @@ when taint checks are enabled. or warn "Open of $untainted_output_file failed: $!\n"; The program can be made to run by "laundering" the tainted value through -a regular expression: the second example - which still ignores locale -information - runs, creating the file named on its command line +a regular expression: the second example--which still ignores locale +information--runs, creating the file named on its command line if it can. #/usr/local/bin/perl -T @@ -731,7 +726,7 @@ if it can.
open(F, ">$untainted_output_file") or warn "Open of $untainted_output_file failed: $!\n"; -Compare this with a very similar program which is locale-aware: +Compare this with a similar but locale-aware program: #/usr/local/bin/perl -T @@ -744,7 +739,7 @@ Compare this with a very similar program which is locale-aware: or warn "Open of $localized_output_file failed: $!\n"; This third program fails to run because $& is tainted: it is the result -of a match involving C<\w> when C is in effect. +of a match involving C<\w> while C is in effect. =head1 ENVIRONMENT @@ -754,10 +749,10 @@ of a match involving C<\w> when C is in effect. A string that can suppress Perl's warning about failed locale settings at startup. Failure can occur if the locale support in the operating -system is lacking (broken) in some way - or if you mistyped the name of +system is lacking (broken) in some way--or if you mistyped the name of a locale when you set up your environment. If this environment variable -is absent, or has a value which does not evaluate to integer zero - that -is "0" or "" - Perl will complain about locale setting failures. +is absent, or has a value that does not evaluate to integer zero--that +is, "0" or ""--Perl will complain about locale setting failures. B: PERL_BADLANG only gives you a way to hide the warning message. The message tells about some problem in your system's locale support, @@ -773,7 +768,7 @@ for controlling an application's opinion on data. =item LC_ALL -C is the "override-all" locale environment variable. If it is +C is the "override-all" locale environment variable. If set, it overrides all the rest of the locale environment variables. =item LC_CTYPE @@ -819,23 +814,22 @@ category-specific C. =head2 Backward compatibility Versions of Perl prior to 5.004 B ignored locale information, -generally behaving as if something similar to the C<"C"> locale (see -L) was always in force, even if the program -environment suggested otherwise. 
By default, Perl still behaves this -way so as to maintain backward compatibility. If you want a Perl -application to pay attention to locale information, you B use -the S> pragma (see L) to -instruct it to do so. +generally behaving as if something similar to the C<"C"> locale were +always in force, even if the program environment suggested otherwise +(see L). By default, Perl still behaves this +way for backward compatibility. If you want a Perl application to pay +attention to locale information, you B use the S> +pragma (see L) to instruct it to do so. Versions of Perl from 5.002 to 5.003 did use the C -information if that was available, that is, C<\w> did understand what -are the letters according to the locale environment variables. +information if available; that is, C<\w> did understand what +were the letters according to the locale environment variables. The problem was that the user had no control over the feature: if the C library supported locales, Perl used them. =head2 I18N:Collate obsolete -In versions of Perl prior to 5.004 per-locale collation was possible +In versions of Perl prior to 5.004, per-locale collation was possible using the C library module. This module is now mildly obsolete and should be avoided in new applications. The C functionality is now integrated into the Perl core language: One can @@ -856,7 +850,7 @@ system's implementation of the locale system than by Perl. =head2 write() and LC_NUMERIC -Formats are the only part of Perl which unconditionally use information +Formats are the only part of Perl that unconditionally use information from a program's locale; if a program's environment specifies an LC_NUMERIC locale, it is always used to specify the decimal point character in formatted output. Formatted output cannot be controlled by @@ -869,7 +863,7 @@ structure. There is a large collection of locale definitions at C. You should be aware that it is unsupported, and is not claimed to be fit for any purpose. 
If your -system allows the installation of arbitrary locales, you may find the +system allows installation of arbitrary locales, you may find the definitions useful as they are, or as a basis for the development of your own locales. @@ -895,12 +889,12 @@ standard we've got. This may be construed as a bug. =head2 Broken systems -In certain system environments the operating system's locale support +In certain systems, the operating system's locale support is broken and cannot be fixed or used by Perl. Such deficiencies can and will result in mysterious hangs and/or Perl core dumps when the C is in effect. When confronted with such a system, please report in excruciating detail to >, and -complain to your vendor: maybe some bug fixes exist for these problems +complain to your vendor: bug fixes may exist for these problems in your operating system. Sometimes such bug fixes are called an operating system upgrade. @@ -941,6 +935,7 @@ L =head1 HISTORY Jarkko Hietaniemi's original F heavily hacked by Dominic -Dunlop, assisted by the perl5-porters. +Dunlop, assisted by the perl5-porters. Prose worked over a bit by +Tom Christiansen. -Last update: Mon Nov 17 22:48:48 EET 1997 +Last update: Thu Jun 11 08:44:13 MDT 1998 diff --git a/pod/perllol.pod b/pod/perllol.pod index 1de3b1a..0e6796b 100644 --- a/pod/perllol.pod +++ b/pod/perllol.pod @@ -26,7 +26,7 @@ a declaration of the array: bart Now you should be very careful that the outer bracket type -is a round one, that is, parentheses. That's because you're assigning to +is a round one, that is, a parenthesis. That's because you're assigning to an @list, so you need parentheses. 
If you wanted there I to be an @LoL, but rather just a reference to it, you could do something more like this: @@ -144,17 +144,7 @@ you'd have to do something like this: push @$ref_to_LoL, [ split ]; } -Actually, if you were using strict, you'd have to declare not only -$ref_to_LoL as you had to declare @LoL, but you'd I having to -initialize it to a reference to an empty list. (This was a bug in -perl version 5.001m that's been fixed for the 5.002 release.) - - my $ref_to_LoL = []; - while (<>) { - push @$ref_to_LoL, [ split ]; - } - -Ok, now you can add new rows. What about adding new columns? If you're +Now you can add new rows. What about adding new columns? If you're dealing with just matrices, it's often easiest to use simple assignment: for $x (1 .. 10) { @@ -310,4 +300,4 @@ perldata(1), perlref(1), perldsc(1) Tom Christiansen > -Last udpate: Sat Oct 7 19:35:26 MDT 1995 +Last update: Thu Jun 4 16:16:23 MDT 1998 diff --git a/pod/perlmod.pod b/pod/perlmod.pod index 942f216..2a0f6fe 100644 --- a/pod/perlmod.pod +++ b/pod/perlmod.pod @@ -7,19 +7,20 @@ perlmod - Perl modules (packages and symbol tables) =head2 Packages Perl provides a mechanism for alternative namespaces to protect packages -from stomping on each other's variables. In fact, apart from certain -magical variables, there's really no such thing as a global variable -in Perl. The package statement declares the compilation unit as +from stomping on each other's variables. In fact, there's really no such +thing as a global variable in Perl (although some identifiers default +to the main package instead of the current one). The package statement +declares the compilation unit as being in the given namespace. The scope of the package declaration is from the declaration itself through the end of the enclosing block, C, C, or end of file, whichever comes first (the same scope as the my() and local() operators). All further unqualified dynamic -identifiers will be in this namespace. 
A package statement affects -only dynamic variables--including those you've used local() on--but +identifiers will be in this namespace. A package statement only affects +dynamic variables--including those you've used local() on--but I lexical variables created with my(). Typically it would be the first declaration in a file to be included by the C or C operator. You can switch into a package in more than one place; -it influences merely which symbol table is used by the compiler for the +it merely influences which symbol table is used by the compiler for the rest of that block. You can refer to variables and filehandles in other packages by prefixing the identifier with the package name and a double colon: C<$Package::Variable>. If the package name is null, the C
@@ -39,13 +40,13 @@ It would treat package C as a totally separate global package. Only identifiers starting with letters (or underscore) are stored in a package's symbol table. All other symbols are kept in package C<main>, -including all of the punctuation variables like $_. In addition, the -identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are -forced to be in package C<main>, even when used for other purposes than -their builtin one. Note also that, if you have a package called C<m>, -C<s>, or C<y>, then you can't use the qualified form of an identifier -because it will be interpreted instead as a pattern match, a substitution, -or a transliteration. +including all of the punctuation variables like $_. In addition, when +unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, +INC, and SIG are forced to be in package C<main>, even when used for other +purposes than their builtin one. Note also that, if you have a package +called C<m>, C<s>, or C<y>, then you can't use the qualified form of an +identifier because it will be interpreted instead as a pattern match, +a substitution, or a transliteration. (Variables beginning with underscore used to be forced into package main, but we decided it was more useful for package writers to be able @@ -85,62 +86,29 @@ table lookups at compile time: local $main::{foo} = $main::{bar}; You can use this to print out all the variables in a package, for -instance. Here is F<dumpvar.pl> from the Perl library: - - package dumpvar; - sub main::dumpvar { - ($package) = @_; - local(*stab) = eval("*${package}::"); - while (($key,$val) = each(%stab)) { - local(*entry) = $val; - if (defined $entry) { - print "\$$key = '$entry'\n"; - } - - if (defined @entry) { - print "\@$key = (\n"; - foreach $num ($[ .. $#entry) { - print " $num\t'",$entry[$num],"'\n"; - } - print ")\n"; - } - - if ($key ne "${package}::" && defined %entry) { - print "\%$key = (\n"; - foreach $key (sort keys(%entry)) { - print " $key\t'",$entry{$key},"'\n"; - } - print ")\n"; - } - } - } - -Note that even though the subroutine is compiled in package C<dumpvar>, -the name of the subroutine is qualified so that its name is inserted into -package C<main>
. While popular many years ago, this is now considered -very poor style; in general, you should be writing modules and using the -normal export mechanism instead of hammering someone else's namespace, -even main's. +instance. The standard F library and the CPAN module +Devel::Symdump make use of this. Assignment to a typeglob performs an aliasing operation, i.e., *dick = *richard; -causes variables, subroutines, and file handles accessible via the -identifier C to also be accessible via the identifier C. If -you want to alias only a particular variable or subroutine, you can -assign a reference instead: +causes variables, subroutines, formats, and file and directory handles +accessible via the identifier C also to be accessible via the +identifier C. If you want to alias only a particular variable or +subroutine, you can assign a reference instead: *dick = \$richard; -makes $richard and $dick the same variable, but leaves +Which makes $richard and $dick the same variable, but leaves @richard and @dick as separate arrays. Tricky, eh? This mechanism may be used to pass and return cheap references into or from subroutines if you won't want to copy the whole -thing. +thing. It only works when assigning to dynamic variables, not +lexicals. - %some_hash = (); + %some_hash = (); # can't be my() *some_hash = fn( \%another_hash ); sub fn { local *hashsym = shift; @@ -161,14 +129,15 @@ Another use of symbol tables is for making "constant" scalars. *PI = \3.14159265358979; Now you cannot alter $PI, which is probably a good thing all in all. -This isn't the same as a constant subroutine (one prototyped to -take no arguments and to return a constant expression), which is -subject to optimization at compile-time. This isn't. See L -for details on these. +This isn't the same as a constant subroutine, which is subject to +optimization at compile-time. This isn't. A constant subroutine is one +prototyped to take no arguments and to return a constant expression. 
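A minimal sketch of the distinction drawn above, contrasting a read-only scalar made through a typeglob with a constant subroutine. The names `$PI` and `E` are chosen only for illustration, and the text of the run-time error may vary between perl versions.

```perl
use strict;
use warnings;

our $PI;                          # package variable, so the glob alias is visible
*PI = \3.14159265358979;          # $PI now aliases a read-only literal

sub E () { 2.718281828459045 }    # empty prototype plus constant body: a constant sub

# An attempt to assign to $PI is caught only at run time.
my $changed = eval { $PI = 3; 1 };
print "could not alter \$PI: $@" unless $changed;

printf "PI=%.5f E=%.5f\n", $PI, E;
```

The scalar form still interpolates into strings as `$PI`, which a constant subroutine cannot do; the subroutine form is the one the compiler can fold into the surrounding expression.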
+See L for details on these. The C pragma is a +convenient shorthand for these. You can say C<*foo{PACKAGE}> and C<*foo{NAME}> to find out what name and package the *foo symbol table entry comes from. This may be useful -in a subroutine which is passed typeglobs as arguments +in a subroutine that gets passed typeglobs as arguments: sub identify_typeglob { my $glob = shift; @@ -200,27 +169,32 @@ files in time to be visible to the rest of the file. Once a C has run, it is immediately undefined and any code it used is returned to Perl's memory pool. This means you can't ever explicitly call a C. -An C subroutine is executed as late as possible, that is, when the -interpreter is being exited, even if it is exiting as a result of a -die() function. (But not if it's is being blown out of the water by a -signal--you have to trap that yourself (if you can).) You may have -multiple C blocks within a file--they will execute in reverse -order of definition; that is: last in, first out (LIFO). +An C subroutine is executed as late as possible, that is, when +the interpreter is being exited, even if it is exiting as a result of +a die() function. (But not if it's polymorphing into another program +via C, or being blown out of the water by a signal--you have to +trap that yourself (if you can).) You may have multiple C blocks +within a file--they will execute in reverse order of definition; that is: +last in, first out (LIFO). -Inside an C subroutine C<$?> contains the value that the script is +Inside an C subroutine, C<$?> contains the value that the script is going to pass to C. You can modify C<$?> to change the exit value of the script. Beware of changing C<$?> by accident (e.g. by running something via C). -Note that when you use the B<-n> and B<-p> switches to Perl, C -and C work just as they do in B, as a degenerate case. +Note that when you use the B<-n> and B<-p> switches to Perl, C and +C work just as they do in B, as a degenerate case. 
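The ordering rules for these blocks can be sketched as follows. This is an illustrative script only; the `@order` log is a hypothetical name introduced for the example.

```perl
use strict;
use warnings;

our @order;                        # hypothetical log of what ran, in sequence

BEGIN { push @order, 'BEGIN' }     # runs as soon as it has been compiled

# END blocks run at interpreter exit, last defined first (LIFO).
END { print "END 1 (defined first, runs last)\n" }
END {
    print "END 2 (defined last, runs first)\n";
    print "order before exit: @order\n";
}

push @order, 'main';               # ordinary run-time code
```

Run as a plain script, this should print the message from the second END block before the message from the first, with the BEGIN entry logged ahead of the run-time code.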
As currently +implemented (and subject to change, since it's inconvenient at best), +both C I C blocks are run when you use the B<-c> switch +for a compile-only syntax check, although your main code is not. =head2 Perl Classes There is no special class syntax in Perl, but a package may function -as a class if it provides subroutines that function as methods. Such a -package may also derive some of its methods from another class package -by listing the other package name in its @ISA array. +as a class if it provides subroutines to act as methods. Such a +package may also derive some of its methods from another class (package) +by listing the other package name in its global @ISA array (which +must be a package global, not a lexical). For more on this, see L and L. @@ -310,11 +284,11 @@ or This is exactly equivalent to - BEGIN { require "Module.pm"; import Module; } + BEGIN { require Module; import Module; } or - BEGIN { require "Module.pm"; import Module LIST; } + BEGIN { require Module; import Module LIST; } As a special case @@ -322,7 +296,7 @@ As a special case is exactly equivalent to - BEGIN { require "Module.pm"; } + BEGIN { require Module; } All Perl module files have the extension F<.pm>. C assumes this so that you don't have to spell out "F" in quotes. This also @@ -331,6 +305,19 @@ Module names are also capitalized unless they're functioning as pragmas, "Pragmas" are in effect compiler directives, and are sometimes called "pragmatic modules" (or even "pragmata" if you're a classicist). +The two statements: + + require SomeModule; + require "SomeModule.pm"; + +differ from each other in two ways. In the first case, any double +colons in the module name, such as C, are translated +into your system's directory separator, usually "/". The second +case does not, and would have to be specified literally. 
The other difference +is that seeing the first C clues in the compiler that uses of +indirect object notation involving "SomeModule", as in C<$ob = purge SomeModule>, +are method calls, not function calls. (Yes, this really can make a difference.) + Because the C statement implies a C block, the importation of semantics happens at the moment the C statement is compiled, before the rest of the file is compiled. This is how it is able @@ -348,7 +335,11 @@ instead of C. With require you can get into this problem: require Cwd; # make Cwd:: accessible $here = getcwd(); # oops! no main::getcwd() -In general C is recommended over C. +In general, C is recommended over C, +because it determines module availability at compile time, not in the +middle of your program's execution. An exception would be if two modules +each tried to C each other, and each also called a function from +that other module. In that case, it's easy to use Cs instead. Perl packages may be nested inside other package names, so we can have package names containing C<::>. 
But if we used that package name diff --git a/pod/perlmodlib.pod b/pod/perlmodlib.pod index 6e4da5e..9511f55 100644 --- a/pod/perlmodlib.pod +++ b/pod/perlmodlib.pod @@ -271,7 +271,7 @@ supply object methods for filehandles =item FindBin -locate directory of original perl script +locate directory of original Perl script =item GDBM_File @@ -368,7 +368,7 @@ by-name interface to Perl's builtin getserv*() functions =item Opcode -disable named opcodes when compiling or running perl code +disable named opcodes when compiling or running Perl code =item Pod::Text @@ -400,7 +400,7 @@ load functions only on demand =item Shell -run shell commands transparently within perl +run shell commands transparently within Perl =item Socket @@ -432,7 +432,7 @@ interface to various C packages =item Test::Harness -run perl standard test scripts with statistics +run Perl standard test scripts with statistics =item Text::Abbrev @@ -503,7 +503,7 @@ by-name interface to Perl's builtin getpw*() functions To find out I the modules installed on your system, including those without documentation or outside the standard release, do this: - find `perl -e 'print "@INC"'` -name '*.pm' -print + % find `perl -e 'print "@INC"'` -name '*.pm' -print They should all have their own documentation installed and accessible via your system man(1) command. If that fails, try the I program. @@ -762,7 +762,7 @@ Avoid C<$r-EClass::func()> where using C<@ISA=qw(... Class ...)> and C<$r-Efunc()> would work (see L for more details). Use autosplit so little used or newly added functions won't be a -burden to programs which don't use them. Add test functions to +burden to programs that don't use them. Add test functions to the module after __END__ either using AutoSplit or by saying: eval join('',) || die $@ unless caller(); @@ -779,12 +779,12 @@ information in objects. Always use B<-w>. Try to C (or C). Remember that you can add C to individual blocks -of code which need less strictness. Always use B<-w>. 
Always use B<-w>! +of code that need less strictness. Always use B<-w>. Always use B<-w>! Follow the guidelines in the perlstyle(1) manual. =item Some simple style guidelines -The perlstyle manual supplied with perl has many helpful points. +The perlstyle manual supplied with Perl has many helpful points. Coding style is a matter of personal taste. Many people evolve their style over several years as they learn what helps them write and @@ -804,7 +804,7 @@ use mixed case with no underscores (need to be short and portable). You may find it helpful to use letter case to indicate the scope or nature of a variable. For example: - $ALL_CAPS_HERE constants only (beware clashes with perl vars) + $ALL_CAPS_HERE constants only (beware clashes with Perl vars) $Some_Caps_Here package-wide global/static $no_caps_here function scope my() or local() variables @@ -934,7 +934,7 @@ GPL and The Artistic Licence (see the files README, Copying, and Artistic). Larry has good reasons for NOT just using the GNU GPL. My personal recommendation, out of respect for Larry, Perl, and the -perl community at large is to state something simply like: +Perl community at large is to state something simply like: Copyright (c) 1995 Your Name. All rights reserved. This program is free software; you can redistribute it and/or @@ -969,7 +969,7 @@ If possible you should place the module into a major ftp archive and include details of its location in your announcement. Some notes about ftp archives: Please use a long descriptive file -name which includes the version number. Most incoming directories +name that includes the version number. Most incoming directories will not be readable/listable, i.e., you won't be able to see your file after uploading it. Remember to send your email notification message as soon as possible after uploading else your file may get @@ -1019,7 +1019,7 @@ there is no need to convert a .pl file into a Module for just that. =item Consider the implications. 
-All the perl applications which make use of the script will need to +All Perl applications that make use of the script will need to be changed (slightly) if the script is converted into a module. Is it worth it unless you plan to make other changes at the same time? @@ -1062,7 +1062,7 @@ Don't delete the original .pl file till the new .pm one works! =item Complete applications rarely belong in the Perl Module Library. -=item Many applications contain some perl code which could be reused. +=item Many applications contain some Perl code that could be reused. Help save the world! Share your code in a form that makes it easy to reuse. @@ -1076,9 +1076,9 @@ to reuse. fragment of code built on top of the reusable modules. In these cases the application could invoked as: - perl -e 'use Module::Name; method(@ARGV)' ... + % perl -e 'use Module::Name; method(@ARGV)' ... or - perl -mModule::Name ... (in perl5.002 or higher) + % perl -mModule::Name ... (in perl5.002 or higher) =back diff --git a/pod/perlobj.pod b/pod/perlobj.pod index 3d7bee8..f10fbdf 100644 --- a/pod/perlobj.pod +++ b/pod/perlobj.pod @@ -44,12 +44,28 @@ constructor: package Critter; sub new { bless {} } -The C<{}> constructs a reference to an anonymous hash containing no -key/value pairs. The bless() takes that reference and tells the object -it references that it's now a Critter, and returns the reference. -This is for convenience, because the referenced object itself knows that -it has been blessed, and the reference to it could have been returned -directly, like this: +That word C isn't special. You could have written +a construct this way, too: + + package Critter; + sub spawn { bless {} } + +In fact, this might even be preferable, because the C++ programmers won't +be tricked into thinking that C works in Perl as it does in C++. +It doesn't. We recommend that you name your constructors whatever +makes sense in the context of the problem you're solving. 
For example, +constructors in the Tk extension to Perl are named after the widgets +they create. + +One thing that's different about Perl constructors compared with those in +C++ is that in Perl, they have to allocate their own memory. (The other +thing is that they don't automatically call overridden base-class +constructors.) The C<{}> allocates an anonymous hash containing no +key/value pairs, and returns it. The bless() takes that reference and +tells the object it references that it's now a Critter, and returns +the reference. This is for convenience, because the referenced object +itself knows that it has been blessed, and the reference to it could +have been returned directly, like this: sub new { my $self = {}; @@ -61,7 +77,7 @@ In fact, you often see such a thing in more complicated constructors that wish to call methods in the class as part of the construction: sub new { - my $self = {} + my $self = {}; bless $self; $self->initialize(); return $self; @@ -75,7 +91,7 @@ so that your constructors may be inherited: sub new { my $class = shift; my $self = {}; - bless $self, $class + bless $self, $class; $self->initialize(); return $self; } @@ -89,7 +105,7 @@ object into: my $this = shift; my $class = ref($this) || $this; my $self = {}; - bless $self, $class + bless $self, $class; $self->initialize(); return $self; } @@ -103,7 +119,8 @@ A constructor may re-bless a referenced object currently belonging to another class, but then the new class is responsible for all cleanup later. The previous blessing is forgotten, as an object may belong to only one class at a time. (Although of course it's free to -inherit methods from many classes.) +inherit methods from many classes.) If you find yourself having to +do this, the parent class is probably misbehaving, though. A clarification: Perl objects are blessed. References are not. Objects know which package they belong to. References do not. 
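The inheritable constructor and the point about blessing belonging to the object rather than to the reference can be tried out with a small sketch. The `Spider` subclass and the `legs` attribute are hypothetical, added only to make the example self-contained.

```perl
use strict;
use warnings;

package Critter;

sub new {
    my $this  = shift;
    my $class = ref($this) || $this;   # accept a class name or an existing object
    my $self  = {};
    bless $self, $class;               # two-argument form, so subclasses inherit new()
    $self->initialize();
    return $self;
}

sub initialize { $_[0]{legs} = 4 }     # 'legs' is an illustrative attribute only

package Spider;
our @ISA = ('Critter');                # inherit new() from Critter
sub initialize { $_[0]{legs} = 8 }

package main;

my $spider = Spider->new;
my $alias  = $spider;                  # a second reference to the same object
my $clone  = $spider->new;             # invoking new() on an object also works
print ref($alias), " has $alias->{legs} legs\n";
```

Both `$spider` and `$alias` report the class `Spider`, because the blessing is recorded on the hash they point to and not on either reference.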
The bless() @@ -124,7 +141,7 @@ Unlike say C++, Perl doesn't provide any special syntax for class definitions. You use a package as a class by putting method definitions into the class. -There is a special array within each package called @ISA which says +There is a special array within each package called @ISA, which says where else to look for a method if you can't find it in the current package. This is how Perl implements inheritance. Each element of the @ISA array is just the name of another package that happens to be a @@ -132,33 +149,44 @@ class package. The classes are searched (depth first) for missing methods in the order that they occur in @ISA. The classes accessible through @ISA are known as base classes of the current class. +All classes implicitly inherit from class C as their +last base class. Several commonly used methods are automatically +supplied in the UNIVERSAL class; see L<"Default UNIVERSAL methods"> for +more details. + If a missing method is found in one of the base classes, it is cached in the current class for efficiency. Changing @ISA or defining new subroutines invalidates the cache and causes Perl to do the lookup again. -If a method isn't found, but an AUTOLOAD routine is found, then -that is called on behalf of the missing method. - -If neither a method nor an AUTOLOAD routine is found in @ISA, then one -last try is made for the method (or an AUTOLOAD routine) in a class -called UNIVERSAL. (Several commonly used methods are automatically -supplied in the UNIVERSAL class; see L<"Default UNIVERSAL methods"> for -more details.) If that doesn't work, Perl finally gives up and -complains. - -Perl classes do only method inheritance. Data inheritance is left -up to the class itself. By and large, this is not a problem in Perl, -because most classes model the attributes of their object using -an anonymous hash, which serves as its own little namespace to be -carved up by the various classes that might want to do something -with the object. 
+If neither the current class, its named base classes, nor the UNIVERSAL +class contains the requested method, these three places are searched +all over again, this time looking for a method named AUTOLOAD(). If an +AUTOLOAD is found, this method is called on behalf of the missing method, +setting the package global $AUTOLOAD to be the fully qualified name of +the method that was intended to be called. + +If none of that works, Perl finally gives up and complains. + +Perl classes do method inheritance only. Data inheritance is left up +to the class itself. By and large, this is not a problem in Perl, +because most classes model the attributes of their object using an +anonymous hash, which serves as its own little namespace to be carved up +by the various classes that might want to do something with the object. +The only problem with this is that you can't be sure that you aren't using +a piece of the hash that is already used. A reasonable workaround +is to prefix your fieldname in the hash with the package name. + + sub bump { + my $self = shift; + $self->{ __PACKAGE__ . ".count"}++; + } =head2 A Method is Simply a Subroutine Unlike say C++, Perl doesn't provide any special syntax for method definition. (It does provide a little syntax for method invocation though. More on that later.) A method expects its first argument -to be the object or package it is being invoked on. There are just two +to be the object (reference) or package (string) it is being invoked on. There are just two types of methods, which we'll call class and instance. (Sometimes you'll hear these called static and virtual, in honor of the two C++ method types they most closely resemble.) @@ -291,7 +319,7 @@ allows the ability to check what a reference points to. Example use UNIVERSAL qw(isa); if(isa($ref, 'ARRAY')) { - ... + #... } =item can(METHOD) @@ -352,18 +380,47 @@ elsewhere. 
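To make the AUTOLOAD fallback described earlier concrete, here is a rough sketch of a class whose accessors are all served by AUTOLOAD. The `Gadget` class and its `color` field are hypothetical and exist only for this example.

```perl
use strict;
use warnings;

package Gadget;                          # hypothetical class, for illustration only

our $AUTOLOAD;                           # set by Perl before AUTOLOAD is called

sub new { my $class = shift; return bless { color => 'red' }, $class }

sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;  # "Gadget::color" becomes "color"
    return if $name eq 'DESTROY';        # object destruction also lands here
    die "no method '$name' in class " . (ref($self) || $self) . "\n"
        unless ref $self and exists $self->{$name};
    return $self->{$name};
}

package main;

my $g = Gadget->new;
print "color: ", $g->color, "\n";        # no color() exists, so AUTOLOAD answers
my $ok = eval { $g->weight; 1 };         # no such field, so AUTOLOAD dies
print "weight failed: $@" unless $ok;
```

Checking for `DESTROY` matters because Perl looks for a destructor through the same lookup path, and an AUTOLOAD that dies on unknown names would otherwise complain every time an object goes away.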
=head2 WARNING -An indirect object is limited to a name, a scalar variable, or a block, -because it would have to do too much lookahead otherwise, just like any -other postfix dereference in the language. The left side of -E<gt> is not so -limited, because it's an infix operator, not a postfix operator. +While indirect object syntax may well be appealing to English speakers and +to C++ programmers, be not seduced! It suffers from two grave problems. -That means that in the following, A and B are equivalent to each other, and -C and D are equivalent, but A/B and C/D are different: +The first problem is that an indirect object is limited to a name, +a scalar variable, or a block, because it would have to do too much +lookahead otherwise, just like any other postfix dereference in the +language. (These are the same quirky rules as are used for the filehandle +slot in functions like C and C.) This can lead to horribly +confusing precedence problems, as in these next two lines: - A: method $obref->{"fieldname"} - B: (method $obref)->{"fieldname"} - C: $obref->{"fieldname"}->method() - D: method {$obref->{"fieldname"}} + move $obj->{FIELD}; # probably wrong! + move $ary[$i]; # probably wrong! + +Those actually parse as the very surprising: + + $obj->move->{FIELD}; # Well, lookee here + $ary->move->[$i]; # Didn't expect this one, eh? + +Rather than what you might have expected: + + $obj->{FIELD}->move(); # You should be so lucky. + $ary[$i]->move; # Yeah, sure. + +The left side of ``-E<gt>'' is not so limited, because it's an infix operator, +not a postfix operator. + +As if that weren't bad enough, think about this: Perl must guess I whether C and C above are functions or methods. +Usually Perl gets it right, but when it doesn't, you get a function +call compiled as a method, or vice versa. This can introduce subtle +bugs that are hard to unravel. 
For example, calling a method C +in indirect notation--as C++ programmers are so wont to do--can +be miscompiled into a subroutine call if there's already a C +function in scope. You'd end up calling the current package's C +as a subroutine, rather than the desired class's method. The compiler +tries to cheat by remembering bareword Cs, but the grief if it +messes up just isn't worth the years of debugging it would likely take +you to track such subtle bugs down. + +The infix arrow notation using ``C<-E<gt>>'' doesn't suffer from either +of these disturbing ambiguities, so we recommend you use it exclusively. =head2 Summary @@ -470,6 +527,11 @@ C<-DDEBUGGING> was enabled during perl build time. A more complete garbage collection strategy will be implemented at a future date. +In the meantime, the best solution is to create a non-recursive container +class that holds a pointer to the self-referential data structure. +Define a DESTROY method for the containing object's class that manually +breaks the circularities in the self-referential structure. + =head1 SEE ALSO A kinder, gentler tutorial on object-oriented programming in Perl can diff --git a/pod/perlop.pod b/pod/perlop.pod index cae38eb..fe6ba1e 100644 --- a/pod/perlop.pod +++ b/pod/perlop.pod @@ -38,6 +38,8 @@ operate on scalar values only, not array values. In the following sections, these operators are covered in precedence order. +Many operators can be overloaded for objects. See L. + =head1 DESCRIPTION =head2 Terms and List Operators (Leftward) @@ -114,7 +116,7 @@ The auto-increment operator has a little extra builtin magic to it. If you increment a variable that is numeric, or that has ever been used in a numeric context, you get a normal increment. 
If, however, the variable has been used in only string contexts since it was set, and -has a value that is not null and matches the pattern +has a value that is not the empty string and matches the pattern C, the increment is done as a string, preserving each character within its range, with carry: @@ -144,8 +146,9 @@ starts with a plus or minus, a string starting with the opposite sign is returned. One effect of these rules is that C<-bareword> is equivalent to C<"-bareword">. -Unary "~" performs bitwise negation, i.e., 1's complement. -(See also L and L.) +Unary "~" performs bitwise negation, i.e., 1's complement. For example, +C<0666 &~ 027> is 0640. (See also L and L.) Unary "+" has no effect whatsoever, even on strings. It is useful syntactically for separating a function name from a parenthesized expression @@ -184,16 +187,18 @@ operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is C<$a> minus the largest multiple of C<$b> that is not greater than C<$a>. If C<$b> is negative, then C<$a % $b> is C<$a> minus the smallest multiple of C<$b> that is not less than C<$a> (i.e. the -result will be less than or equal to zero). +result will be less than or equal to zero). If C is +in effect, the native hardware will be used instead of this rule, +which may be construed as a bug that will be fixed at some point. -Note than when C is in scope "%" give you direct access +Note that when C is in scope, "%" gives you direct access to the modulus operator as implemented by your C compiler. This operator is not as well defined for negative operands, but it will execute faster. -Binary "x" is the repetition operator. In a scalar context, it +Binary "x" is the repetition operator. In scalar context, it returns a string consisting of the left operand repeated the number of -times specified by the right operand. In a list context, if the left +times specified by the right operand. In list context, if the left operand is a list in parentheses, it repeats the list. 
 print '-' x 80; # print row of dashes @@ -336,11 +341,18 @@ way to find out the home directory (assuming it's not "0") might be: $home = $ENV{'HOME'} || $ENV{'LOGDIR'} || (getpwuid($<))[7] || die "You're homeless!\n"; -As more readable alternatives to C<&&> and C<||>, Perl provides "and" and -"or" operators (see below). The short-circuit behavior is identical. The -precedence of "and" and "or" is much lower, however, so that you can -safely use them after a list operator without the need for -parentheses: +In particular, this means that you shouldn't use this +for selecting between two aggregates for assignment: + + @a = @b || @c; # this is wrong + @a = scalar(@b) || @c; # really meant this + @a = @b ? @b : @c; # this works fine, though + +As more readable alternatives to C<&&> and C<||> when used for +control flow, Perl provides C and C operators (see below). +The short-circuit behavior is identical. The precedence of "and" and +"or" is much lower, however, so that you can safely use them after a +list operator without the need for parentheses: unlink "alpha", "beta", "gamma" or gripe(), next LINE; @@ -350,10 +362,12 @@ With the C-style operators that would have been written like this: unlink("alpha", "beta", "gamma") || (gripe(), next LINE); -=head2 Range Operator +Using "or" for assignment is unlikely to do what you want; see below. + +=head2 Range Operators Binary ".." is the range operator, which is really two different -operators depending on the context. In a list context, it returns an +operators depending on the context. In list context, it returns an array of values counting (by ones) from the left value to the right value. This is useful for writing C loops and for doing slice operations on arrays. Be aware that under the current implementation, @@ -364,7 +378,7 @@ write something like this: # code } -In a scalar context, ".." returns a boolean value. +In scalar context, ".." returns a boolean value. 
The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of B, B, and various editors. Each ".." operator maintains its own boolean state. It is false as long as its left operand is false. @@ -378,7 +392,7 @@ If you don't want it to test the right operand till the next evaluation operand is not evaluated while the operator is in the "false" state, and the left operand is not evaluated while the operator is in the "true" state. The precedence is a little lower than || and &&. The value -returned is either the null string for false, or a sequence number +returned is either the empty string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered. The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you @@ -394,13 +408,22 @@ As a scalar operator: next line if (1 .. /^$/); # skip header lines s/^/> / if (/^$/ .. eof()); # quote body + # parse mail messages + while (<>) { + $in_header = 1 .. /^$/; + $in_body = /^$/ .. eof(); + # do something based on those + } continue { + close ARGV if eof; # reset $. each file + } + As a list operator: for (101 .. 200) { print; } # print $_ 100 times @foo = @foo[0 .. $#foo]; # an expensive no-op @foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items -The range operator (in a list context) makes use of the magical +The range operator (in list context) makes use of the magical auto-increment algorithm if the operands are strings. You can say @@ -443,6 +466,19 @@ legal lvalues (meaning that you can assign to them): This is not necessarily guaranteed to contribute to the readability of your program. +Because this operator produces an assignable result, using assignments +without parentheses will get you in trouble. For example, this: + + $a % 2 ? $a += 10 : $a += 2 + +Really means this: + + (($a % 2) ? ($a += 10) : $a) += 2 + +Rather than this: + + ($a % 2) ? 
($a += 10) : ($a += 2) + =head2 Assignment Operators "=" is the ordinary assignment operator. @@ -485,11 +521,11 @@ is equivalent to =head2 Comma Operator -Binary "," is the comma operator. In a scalar context it evaluates +Binary "," is the comma operator. In scalar context it evaluates its left argument, throws that value away, then evaluates its right argument and returns that value. This is just like C's comma operator. -In a list context, it's just the list argument separator, and inserts +In list context, it's just the list argument separator, and inserts both its arguments into the list. The =E digraph is mostly just a synonym for the comma operator. It's useful for @@ -524,9 +560,27 @@ expression is evaluated only if the left expression is true. =head2 Logical or and Exclusive Or Binary "or" returns the logical disjunction of the two surrounding -expressions. It's equivalent to || except for the very low -precedence. This means that it short-circuits: i.e., the right -expression is evaluated only if the left expression is false. +expressions. It's equivalent to || except for the very low precedence. +This makes it useful for control flow + + print FH $data or die "Can't write to FH: $!"; + +This means that it short-circuits: i.e., the right expression is evaluated +only if the left expression is false. Due to its precedence, you should +probably avoid using this for assignment, only for control flow. + + $a = $b or $c; # bug: this is wrong + ($a = $b) or $c; # really means this + $a = $b || $c; # better written this way + +However, when it's a list context assignment and you're trying to use +"||" for control flow, you probably need "or" so that the assignment +takes higher precedence. + + @info = stat($file) || die; # oops, scalar sense of stat! + @info = stat($file) or die; # better, now @info gets its due + +Then again, you could always use parentheses. Binary "xor" returns the exclusive-OR of the two surrounding expressions. 
It cannot short circuit, of course. @@ -586,7 +640,7 @@ or "C<@>" are interpolated, as are the following sequences. Within a transliteration, the first ten of these sequences may be used. \t tab (HT, TAB) - \n newline (LF, NL) + \n newline (NL) \r return (CR) \f form feed (FF) \b backspace (BS) @@ -606,6 +660,20 @@ a transliteration, the first ten of these sequences may be used. If C is in effect, the case map used by C<\l>, C<\L>, C<\u> and C<\U> is taken from the current locale. See L. +All systems use the virtual C<"\n"> to represent a line terminator, +called a "newline". There is no such thing as an unvarying, physical +newline character. It is an illusion that the operating system, +device drivers, C libraries, and Perl all conspire to preserve. Not all +systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example, +on a Mac, these are reversed, and on systems without a line terminator, +printing C<"\n"> may emit no actual data. In general, use C<"\n"> when +you mean a "newline" for your system, but use the literal ASCII when you +need an exact character. For example, most networking protocols expect +and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators, +and although they often accept just C<"\012">, they seldom tolerate just +C<"\015">. If you get in the habit of using C<"\n"> for networking, +you may be burned some day. + You cannot include a literal C<$> or C<@> within a C<\Q> sequence. An unescaped C<$> or C<@> interpolates the corresponding variable, while escaping will cause the literal string C<\$> to be inserted. @@ -637,6 +705,14 @@ optimization when you want to see only the first occurrence of something in each file of a set of files, for instance. Only C patterns local to the current package are reset. + while (<>) { + if (?^$?) { + # blank line between header and body + } + } continue { + reset if eof; # clear ?? 
status for next file + } + This usage is vaguely deprecated, and may be removed in some future version of Perl. @@ -644,13 +720,13 @@ version of Perl. =item /PATTERN/cgimosx -Searches a string for a pattern match, and in a scalar context returns +Searches a string for a pattern match, and in scalar context returns true (1) or false (''). If no string is specified via the C<=~> or C operator, the $_ string is searched. (The string specified with C<=~> need not be an lvalue--it may be the result of an expression evaluation, but remember the C<=~> binds rather tightly.) See also L. -See L for discussion of additional considerations which apply +See L for discussion of additional considerations that apply when C is in effect. Options are: @@ -680,8 +756,8 @@ interpolating won't change over the life of the script. However, mentioning C constitutes a promise that you won't change the variables in the pattern. If you change them, Perl won't even notice. -If the PATTERN evaluates to a null string, the last -successfully matched regular expression is used instead. +If the PATTERN evaluates to the empty string, the last +I matched regular expression is used instead. If used in a context that requires a list value, a pattern match returns a list consisting of the subexpressions matched by the parentheses in the @@ -714,12 +790,12 @@ the pattern matched. The C modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on -the context. In a list context, it returns a list of all the +the context. In list context, it returns a list of all the substrings matched by all the parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern. 
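The two list-context cases just described can be seen side by side in a short sketch; the sample string and variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = "x=1, y=22, z=333";          # sample input, made up for this sketch

my @numbers = $text =~ /\d+/g;          # no parentheses: every whole match
my @fields  = $text =~ /(\w)=(\d+)/g;   # with parentheses: all captures, flattened
my %value   = @fields;                  # the name/number pairs load straight into a hash

print "numbers: @numbers\n";            # numbers: 1 22 333
print "fields:  @fields\n";             # fields:  x 1 y 22 z 333
```

Because the captures come back as one flat list, a pattern with two groups yields pairs, which is why assigning the result to a hash works here.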
-In a scalar context, C iterates through the string, returning TRUE +In scalar context, C iterates through the string, returning TRUE each time it matches, and FALSE when it eventually runs out of matches. (In other words, it remembers where it left off last time and restarts the search at that point. You can actually find the current match @@ -823,17 +899,53 @@ A double-quoted, interpolated string. =item `STRING` -A string which is interpolated and then executed as a system command. -The collected standard output of the command is returned. In scalar -context, it comes back as a single (potentially multi-line) string. -In list context, returns a list of lines (however you've defined lines -with $/ or $INPUT_RECORD_SEPARATOR). +A string which is (possibly) interpolated and then executed as a system +command with C or its equivalent. Shell wildcards, pipes, +and redirections will be honored. The collected standard output of the +command is returned; standard error is unaffected. In scalar context, +it comes back as a single (potentially multi-line) string. In list +context, returns a list of lines (however you've defined lines with $/ +or $INPUT_RECORD_SEPARATOR). + +Because backticks do not affect standard error, use shell file descriptor +syntax (assuming the shell supports this) if you care to address this. 
+To capture a command's STDERR and STDOUT together: - $today = qx{ date }; + $output = `cmd 2>&1`; + +To capture a command's STDOUT but discard its STDERR: + + $output = `cmd 2>/dev/null`; + +To capture a command's STDERR but discard its STDOUT (ordering is +important here): + + $output = `cmd 2>&1 1>/dev/null`; + +To exchange a command's STDOUT and STDERR in order to capture the STDERR +but leave its STDOUT to come out the old STDERR: + + $output = `cmd 3>&1 1>&2 2>&3 3>&-`; + +To read both a command's STDOUT and its STDERR separately, it's easiest +and safest to redirect them separately to files, and then read from those +files when the program is done: + + system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr"); + +Using single-quote as a delimiter protects the command from Perl's +double-quote interpolation, passing it on to the shell instead: + + $perl_info = qx(ps $$); # that's Perl's $$ + $shell_info = qx'ps $$'; # that's the new shell's $$ + +Note that how the string gets evaluated is entirely subject to the command +interpreter on your system. On most platforms, you will have to protect +shell metacharacters if you want them treated literally. This is in +practice difficult to do, as it's unclear how to escape which characters. +See L for a clean and safe example of a manual fork() and exec() +to emulate backticks safely. -Note that how the string gets evaluated is entirely subject to the -command interpreter on your system. On most platforms, you will have -to protect shell metacharacters if you want them treated literally. On some platforms (notably DOS-like ones), the shell may not be capable of dealing with multiline commands, so putting newlines in the string may not get you what you want. You may be able to evaluate @@ -846,8 +958,14 @@ of the command line. You must ensure your strings don't exceed this limit after any necessary interpolations. See the platform-specific release notes for more details about your particular environment. 
-Also realize that using this operator frequently leads to unportable -programs. +Using this operator can lead to programs that are difficult to port, +because the shell commands called vary between systems, and may in +fact not be present at all. As one example, the C command under +the POSIX shell is very different from the C command under DOS. +That doesn't mean you should go out of your way to avoid backticks +when they're the right way to get something done. Perl was made to be +a glue language, and one of the things it glues together is commands. +Just understand what you're getting yourself into. See L<"I/O Operators"> for more discussion. @@ -858,13 +976,16 @@ whitespace as the word delimiters. It is exactly equivalent to split(' ', q/STRING/); +This equivalency means that if used in scalar context, you'll get split's +(unfortunate) scalar context behavior, complete with mysterious warnings. + Some frequently seen examples: use POSIX qw( setlocale localeconv ) @EXPORT = qw( foo bar baz ); A common mistake is to try to separate the words with comma or to put -comments into a multi-line qw-string. For this reason the C<-w> +comments into a multi-line C-string. For this reason the C<-w> switch produce warnings if the STRING contains the "," or the "#" character. @@ -876,7 +997,7 @@ made. Otherwise it returns false (specifically, the empty string). If no string is specified via the C<=~> or C operator, the C<$_> variable is searched and modified. (The string specified with C<=~> must -be a scalar variable, an array element, a hash element, or an assignment +be scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.) If the delimiter chosen is single quote, no variable interpolation is @@ -885,9 +1006,9 @@ PATTERN contains a $ that looks like a variable rather than an end-of-string test, the variable will be interpolated into the pattern at run-time. 
If you want the pattern compiled only once the first time the variable is interpolated, use the C option. If the pattern -evaluates to a null string, the last successfully executed regular +evaluates to the empty string, the last successfully executed regular expression is used instead. See L for further explanation on these. -See L for discussion of additional considerations which apply +See L for discussion of additional considerations that apply when C is in effect. Options are: @@ -920,9 +1041,9 @@ Examples: s/Login: $foo/Login: $bar/; # run-time pattern - ($foo = $bar) =~ s/this/that/; + ($foo = $bar) =~ s/this/that/; # copy first, then change - $count = ($paragraph =~ s/Mister\b/Mr./g); + $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count $_ = 'abc123xyz'; s/\d+/$&*2/e; # yields 'abc246xyz' @@ -933,18 +1054,27 @@ Examples: s/%(.)/$percent{$1} || $&/ge; # expr now, so /e s/^=(\w+)/&pod($1)/ge; # use function call + # expand variables in $_, but dynamics only, using + # symbolic dereferencing + s/\$(\w+)/${$1}/g; + # /e's can even nest; this will expand - # simple embedded variables in $_ + # any embedded scalar variable (including lexicals) in $_ s/(\$\w+)/$1/eeg; - # Delete C comments. + # Delete (most) C comments. $program =~ s { /\* # Match the opening delimiter. .*? # Match a minimal number of characters. \*/ # Match the closing delimiter. } []gsx; - s/^\s*(.*?)\s*$/$1/; # trim white space + s/^\s*(.*?)\s*$/$1/; # trim white space in $_, expensively + + for ($variable) { # trim white space in $variable, cheap + s/^\s+//; + s/\s+$//; + } s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields @@ -998,7 +1128,7 @@ character. If the C modifier is used, the REPLACEMENTLIST is always interpreted exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter than the SEARCHLIST, the final character is replicated till it is long -enough. If the REPLACEMENTLIST is null, the SEARCHLIST is replicated. +enough. 
If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This latter is useful for counting characters in a class or for squashing character sequences in a class. @@ -1045,8 +1175,8 @@ There are several I/O operators you should know about. A string enclosed by backticks (grave accents) first undergoes variable substitution just like a double quoted string. It is then interpreted as a command, and the output of that command is the value -of the pseudo-literal, like in a shell. In a scalar context, a single -string consisting of all the output is returned. In a list context, +of the pseudo-literal, like in a shell. In scalar context, a single +string consisting of all the output is returned. In list context, a list of values is returned, one for each line of output. (You can set C<$/> to use a different line terminator.) The command is executed each time the pseudo-literal is evaluated. The status value of the @@ -1066,7 +1196,7 @@ situation where an automatic assignment happens. I the input symbol is the only thing inside the conditional of a C or C loop, the value is automatically assigned to the variable C<$_>. In these loop constructs, the assigned value (whether assignment -is automatic or explcit) is then tested to see if it is defined. +is automatic or explicit) is then tested to see if it is defined. The defined test avoids problems where line has a string value that would be treated as false by perl e.g. "" or "0" with no trailing newline. (This may seem like an odd thing to you, but you'll use the @@ -1086,12 +1216,12 @@ and this also behaves similarly, but avoids the use of $_ : while (my $line = ) { print $line } If you really mean such values to terminate the loop they should be -tested for explcitly: +tested for explicitly: while (($_ = ) ne '0') { ... } while () { last unless $_; ... 
} -In other boolean contexts CIE> without explcit C +In other boolean contexts, CIE> without explicit C test or comparison will solicit a warning if C<-w> is in effect. The filehandles STDIN, STDOUT, and STDERR are predefined. (The @@ -1109,7 +1239,7 @@ The null filehandle EE is special and can be used to emulate the behavior of B and B. Input from EE comes either from standard input, or from each file listed on the command line. Here's how it works: the first time EE is evaluated, the @ARGV array is -checked, and if it is null, C<$ARGV[0]> is set to "-", which when opened +checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames. The loop @@ -1136,10 +1266,19 @@ doesn't work because it treats EARGVE as non-magical.) You can modify @ARGV before the first EE as long as the array ends up containing the list of filenames you really want. Line numbers (C<$.>) continue as if the input were one big happy file. (But see example -under eof() for how to reset line numbers on each file.) +under C for how to reset line numbers on each file.) + +If you want to set @ARGV to your own list of files, go right ahead. +This sets @ARGV to all plain text files if no @ARGV was given: + + @ARGV = grep { -f && -T } glob('*') unless @ARGV; -If you want to set @ARGV to your own list of files, go right ahead. If -you want to pass switches into your script, you can use one of the +You can even set them to pipe commands. For example, this automatically +filters compressed arguments through B: + + @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV; + +If you want to pass switches into your script, you can use one of the Getopts modules or put a loop on the front like this: while ($_ = $ARGV[0], /^-/) { @@ -1147,10 +1286,11 @@ Getopts modules or put a loop on the front like this: last if /^--$/; if (/^-D(.*)/) { $debug = $1 } if (/^-v/) { $verbose++ } - ... # other switches + # ... 
# other switches } + while (<>) { - ... # code for each line + # ... # code for each line } The EE symbol will return C for end-of-file only once. @@ -1159,22 +1299,28 @@ If you call it again after this it will assume you are processing another If the string inside the angle brackets is a reference to a scalar variable (e.g., E$fooE), then that variable contains the name of the -filehandle to input from, or a reference to the same. For example: +filehandle to input from, or its typeglob, or a reference to the same. For example: $fh = \*STDIN; $line = <$fh>; -If the string inside angle brackets is not a filehandle or a scalar -variable containing a filehandle name or reference, then it is interpreted -as a filename pattern to be globbed, and either a list of filenames or the -next filename in the list is returned, depending on context. One level of -$ interpretation is done first, but you can't say C$fooE> -because that's an indirect filehandle as explained in the previous -paragraph. (In older versions of Perl, programmers would insert curly -brackets to force interpretation as a filename glob: C${foo}E>. -These days, it's considered cleaner to call the internal function directly -as C, which is probably the right way to have done it in the -first place.) Example: +If what's within the angle brackets is neither a filehandle nor a simple +scalar variable containing a filehandle name, typeglob, or typeglob +reference, it is interpreted as a filename pattern to be globbed, and +either a list of filenames or the next filename in the list is returned, +depending on context. This distinction is determined on syntactic +grounds alone. That means C$xE> is always a readline from +an indirect handle, but C$hash{key}E> is always a glob. +That's because $x is a simple scalar variable, but C<$hash{key}> is +not--it's a hash element. 
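A minimal sketch of working with that syntactic distinction, assuming a Perl recent enough to support in-memory filehandles (5.8 or later, which is newer than the release this patch targets). The hash names and the sample data are hypothetical.

```perl
use strict;
use warnings;

# In-memory filehandle so the sketch needs no real file (assumes Perl 5.8+).
my $data = "first line\nsecond line\n";
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";

# A simple scalar inside angle brackets: a readline on the indirect handle.
my $first = <$fh>;

# A hash element inside angle brackets would be parsed as a glob,
# so read through readline() when the handle is stored in a hash.
my %handle_for = (input => $fh);
my $second = readline($handle_for{input});

# For filename patterns held in a hash, calling glob() states the intent.
my %pattern_for = (c_sources => '*.c');
my @c_files = glob($pattern_for{c_sources});

print $first, $second;
print scalar(@c_files), " C source file(s) in the current directory\n";
```

Spelling out readline() and glob() avoids depending on how the angle brackets happen to be parsed.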
+ +One level of double-quote interpretation is done first, but you can't +say C$fooE> because that's an indirect filehandle as explained +in the previous paragraph. (In older versions of Perl, programmers +would insert curly brackets to force interpretation as a filename glob: +C${foo}E>. These days, it's considered cleaner to call the +internal function directly as C, which is probably the right +way to have done it in the first place.) Example: while (<*.c>) { chmod 0644, $_; @@ -1202,7 +1348,7 @@ long" errors (unless you've installed tcsh(1L) as F). A glob evaluates its (embedded) argument only when it is starting a new list. All values must be read before it will start over. In a list context this isn't important, because you automatically get them all -anyway. In a scalar context, however, the operator returns the next value +anyway. In scalar context, however, the operator returns the next value each time it is called, or a C value if you've just run out. As for filehandles an automatic C is generated when the glob occurs in the test part of a C or C - because legal glob returns @@ -1229,7 +1375,7 @@ to become confused with the indirect filehandle notation. =head2 Constant Folding Like C, Perl does a certain amount of expression evaluation at -compile time, whenever it determines that all of the arguments to an +compile time, whenever it determines that all arguments to an operator are static and have no side effects. In particular, string concatenation happens at compile time between literals that don't do variable substitution. Backslash interpretation also happens at @@ -1242,7 +1388,7 @@ and this all reduces to one string internally. Likewise, if you say foreach $file (@filenames) { - if (-s $file > 5 + 100 * 2**16) { ... } + if (-s $file > 5 + 100 * 2**16) { } } the compiler will precompute the number that @@ -1299,7 +1445,7 @@ However, C still has meaning for them. By default, their results are interpreted as unsigned integers. 
However, if C<use integer> is in effect, their results are interpreted as signed integers. For example, C<~0> usually evaluates -to a large integral value. However, C<use integer; ~0> is -1. +to a large integral value. However, C<use integer; ~0> is -1 on twos-complement machines. =head2 Floating-point Arithmetic @@ -1308,6 +1454,27 @@ similar ways to provide rounding or truncation at a certain number of decimal places. For rounding to a certain number of digits, sprintf() or printf() is usually the easiest route. +Floating-point numbers are only approximations to what a mathematician +would call real numbers. There are infinitely more reals than floats, +so some corners must be cut. For example: + + printf "%.20g\n", 123456789123456789; + # produces 123456789123456784 + +Testing for exact floating-point equality or inequality is +not a good idea. Here's a (relatively expensive) work-around to compare +whether two floating-point numbers are equal to a particular number of +decimal places. See Knuth, volume II, for a more robust treatment of +this topic. + + sub fp_equal { + my ($X, $Y, $POINTS) = @_; + my ($tX, $tY); + $tX = sprintf("%.${POINTS}g", $X); + $tY = sprintf("%.${POINTS}g", $Y); + return $tX eq $tY; + } + The POSIX module (part of the standard perl distribution) implements ceil(), floor(), and a number of other mathematical and trigonometric functions. The Math::Complex module (part of the standard perl @@ -1320,3 +1487,17 @@ the rounding method used should be specified precisely. In these cases, it probably pays not to trust whichever system rounding is being used by Perl, but to instead implement the rounding function you need yourself. + +=head2 Bigger Numbers + +The standard Math::BigInt and Math::BigFloat modules provide +variable precision arithmetic and overloaded operators. +At the cost of some space and considerable speed, they +avoid the normal pitfalls associated with limited-precision +representations.
+ + use Math::BigInt; + $x = Math::BigInt->new('123456789123456789'); + print $x * $x; + + # prints +15241578780673678515622620750190521 diff --git a/pod/perlre.pod b/pod/perlre.pod index da32f87..927d088 100644 --- a/pod/perlre.pod +++ b/pod/perlre.pod @@ -10,7 +10,7 @@ operations, plus various examples of the same, see C and C in L. The matching operations can have various modifiers. The modifiers -which relate to the interpretation of the regular expression inside +that relate to the interpretation of the regular expression inside are listed below. For the modifiers that alter the behaviour of the operation, see L and L. @@ -34,8 +34,8 @@ line anywhere within the string, Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which it normally would not match. -The /s and /m modifiers both override the C<$*> setting. That is, no matter -what C<$*> contains, /s (without /m) will force "^" to match only at the +The C and C modifiers both override the C<$*> setting. That is, no matter +what C<$*> contains, C without C will force "^" to match only at the beginning of the string and "$" to match only at the end (or just before a newline at the end) of the string. Together, as /ms, they let the "." match any character whatsoever, while yet allowing "^" and "$" to match, @@ -58,18 +58,19 @@ backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The C<#> character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real -whitespace or C<#> characters in the pattern that you'll have to either +whitespace or C<#> characters in the pattern (outside of a character +class, where they are unaffected by C), that you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making Perl's regular expressions more readable. 
Note that you have to be careful not to include the pattern delimiter in the comment--perl has no way of knowing you did -not intend to close the pattern early. See the C comment deletion code +not intend to close the pattern early. See the C-comment deletion code in L. =head2 Regular Expressions The patterns used in pattern matching are regular expressions such as -those supplied in the Version 8 regexp routines. (In fact, the +those supplied in the Version 8 regex routines. (In fact, the routines are derived (distantly) from Henry Spencer's freely redistributable reimplementation of the V8 routines.) See L for details. @@ -146,7 +147,7 @@ also work: \L lowercase till \E (think vi) \U uppercase till \E (think vi) \E end case modification (think vi) - \Q quote (disable) regexp metacharacters till \E + \Q quote (disable) pattern metacharacters till \E If C is in effect, the case map used by C<\l>, C<\L>, C<\u> and C<\U> is taken from the current locale. See L. @@ -165,7 +166,7 @@ In addition, Perl defines the following: \d Match a digit character \D Match a non-digit character -Note that C<\w> matches a single alphanumeric character, not a whole +A C<\w> matches a single alphanumeric character, not a whole word. To match a word you'd need to say C<\w+>. If C is in effect, the list of alphabetic characters generated by C<\w> is taken from the current locale. See L. You may use C<\w>, C<\W>, @@ -185,7 +186,7 @@ has a C<\w> on one side of it and a C<\W> on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a C<\W>. (Within character classes C<\b> represents backspace rather than a word boundary.) The C<\A> and C<\Z> are -just like "^" and "$" except that they won't match multiple times when the +just like "^" and "$", except that they won't match multiple times when the C modifier is used, while "^" and "$" will match at every internal line boundary. 
To match the actual end of the string, not ignoring newline, you can use C<\Z(?!\n)>. The C<\G> assertion can be used to chain global @@ -193,7 +194,7 @@ matches (using C), as described in L. It is also useful when writing C-like scanners, when you have several -regexps which you want to match against consequent substrings of your +patterns that you want to match against consequent substrings of your string, see the previous reference. The actual location where C<\G> will match can also be influenced by using C as an lvalue. See L. @@ -233,21 +234,21 @@ Once perl sees that you need one of C<$&>, C<$`> or C<$'> anywhere in the program, it has to provide them on each and every pattern match. This can slow your program down. The same mechanism that handles these provides for the use of $1, $2, etc., so you pay the same price -for each regexp that contains capturing parentheses. But if you never -use $&, etc., in your script, then regexps I capturing +for each pattern that contains capturing parentheses. But if you never +use $&, etc., in your script, then patterns I capturing parentheses won't be penalized. So avoid $&, $', and $` if you can, but if you can't (and some algorithms really appreciate them), once you've used them once, use them at will, because you've already paid -the price. +the price. As of 5.005, $& is not so costly as the other two. -You will note that all backslashed metacharacters in Perl are +Backslashed metacharacters in Perl are alphanumeric, such as C<\b>, C<\w>, C<\n>. Unlike some other regular expression languages, there are no backslashed symbols that aren't alphanumeric. So anything that looks like \\, \(, \), \E, \E, \{, or \} is always interpreted as a literal character, not a metacharacter. This was once used in a common idiom to disable or quote the special meanings of regular expression metacharacters in a -string that you want to use for a pattern. Simply quote all the +string that you want to use for a pattern. 
Simply quote all non-alphanumeric characters: $pattern =~ s/(\W)/\\$1/g; @@ -271,31 +272,32 @@ function of the extension. Several extensions are already supported: A comment. The text is ignored. If the C switch is used to enable whitespace formatting, a simple C<#> will suffice. -=item C<(?:regexp)> +=item C<(?:pattern)> -This groups things like "()" but doesn't make backreferences like "()" does. So +This is for clustering, not capturing; it groups subexpressions like +"()", but doesn't make backreferences as "()" does. So - split(/\b(?:a|b|c)\b/) + @fields = split(/\b(?:a|b|c)\b/) is like - split(/\b(a|b|c)\b/) + @fields = split(/\b(a|b|c)\b/) but doesn't spit out extra fields. -=item C<(?=regexp)> +=item C<(?=pattern)> A zero-width positive lookahead assertion. For example, C matches a word followed by a tab, without including the tab in C<$&>. -=item C<(?!regexp)> +=item C<(?!pattern)> A zero-width negative lookahead assertion. For example C matches any occurrence of "foo" that isn't followed by "bar". Note however that lookahead and lookbehind are NOT the same thing. You cannot use this for lookbehind. -If you are looking for a "bar" which isn't preceded by a "foo", C +If you are looking for a "bar" that isn't preceded by a "foo", C will not do what you want. That's because the C<(?!foo)> is just saying that the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will match. You would have to do something like C for that. We @@ -307,13 +309,13 @@ Sometimes it's still easier just to say: For lookbehind see below. -=item C<(?<=regexp)> +=item C<(?E=pattern)> -A zero-width positive lookbehind assertion. For example, C +A zero-width positive lookbehind assertion. For example, C=\t)\w+/> matches a word following a tab, without including the tab in C<$&>. Works only for fixed-width lookbehind. -=item C<(? +=item C<(? A zero-width negative lookbehind assertion. For example C matches any occurrence of "foo" that isn't following "bar". 
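The lookahead and lookbehind assertions above can be compared side by side in a short sketch. The sample string and the variable names are assumptions chosen for illustration; the counting idiom C<() = ...> forces list context so that every match is counted.

```perl
use strict;
use warnings;

# Hypothetical sample string for the illustration.
my $s = "foobar barfoo bazbar";

# Negative lookahead: "foo" that is not followed by "bar".
my $foo_not_before_bar = () = $s =~ /foo(?!bar)/g;

# Negative lookbehind: "bar" that is not preceded by "foo".
my $bar_not_after_foo = () = $s =~ /(?<!foo)bar/g;

# Positive lookbehind: "bar" that is preceded by "baz".
my $bar_after_baz = () = $s =~ /(?<=baz)bar/g;

# The pitfall described above: a lookahead placed before "bar" only
# inspects the text starting at "bar", so "foobar" still matches.
my $pitfall = ("foobar" =~ /(?!foo)bar/) ? 1 : 0;

print "$foo_not_before_bar $bar_not_after_foo $bar_after_baz $pitfall\n";
```

Only the lookbehind form rejects the "bar" in "foobar", which is why lookahead cannot stand in for lookbehind.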
@@ -325,70 +327,80 @@ Experimental "evaluate any Perl code" zero-width assertion. Always succeeds. C is not interpolated. Currently the rules to determine where the C ends are somewhat convoluted. -=item C<(?Eregexp)> +B: This is a grave security risk for arbitrarily interpolated +patterns. It introduces security holes in previously safe programs. +A fix to Perl, and to this documentation, will be forthcoming prior +to the actual 5.005 release. -An "independend" subexpression. Matches the substring which a -I C would match if anchored at the given position, +=item C<(?Epattern)> + +An "independent" subexpression. Matches the substring that a +I C would match if anchored at the given position, B. Say, C<^(?Ea*)ab> will never match, since C<(?Ea*)> (anchored -at the beginning of string, as above) will match I the characters +at the beginning of string, as above) will match I characters C at the beginning of string, leaving no C for C to match. In contrast, C will match the same as C, since the match of the subgroup C is influenced by the following group C (see L<"Backtracking">). In particular, C inside C will match less characters that a standalone C, since this makes the tail match. -Note that a similar effect to C<(?Eregexp)> may be achieved by +An effect similar to C<(?Epattern)> may be achieved by - (?=(regexp))\1 + (?=(pattern))\1 since the lookahead is in I<"logical"> context, thus matches the same substring as a standalone C. The following C<\1> eats the matched string, thus making a zero-length assertion into an analogue of -C<(?>...)>. (The difference of these two constructions is that the -second one uses a catching group, thus shifts ordinals of +C<(?>...)>. (The difference between these two constructs is that the +second one uses a catching group, thus shifting ordinals of backreferences in the rest of a regular expression.) -This construction is very useful for optimizations of "eternal" -matches, since it will not backtrack (see L<"Backtracking">). 
Say, +This construct is useful for optimizations of "eternal" +matches, because it will not backtrack (see L<"Backtracking">). - / \( ( + m{ \( ( [^()]+ | \( [^()]* \) )+ - \) /x - -will match a nonempty group with matching two-or-less-level-deep -parentheses. It is very efficient in finding such groups. However, -if there is no such group, it is going to take forever (on reasonably -long string), since there are so many different ways to split a long -string into several substrings (this is essentially what C<(.+)+> is -doing, and this is a subpattern of the above pattern). Say, on -C<((()aaaaaaaaaaaaaaaaaa> the above pattern detects no-match in 5sec -(on kitchentop'96 processor), and each extra letter doubles this time. - -However, a tiny modification of this - - / \( ( + \) + }x + +That will efficiently match a nonempty group with matching +two-or-less-level-deep parentheses. However, if there is no such group, +it will take virtually forever on a long string. That's because there are +so many different ways to split a long string into several substrings. +This is essentially what C<(.+)+> is doing, and this is a subpattern +of the above pattern. Consider that C<((()aaaaaaaaaaaaaaaaaa> on the +pattern above detects no-match in several seconds, but that each extra +letter doubles this time. This exponential performance will make it +appear that your program has hung. + +However, a tiny modification of this pattern + + m{ \( ( (?> [^()]+ ) | \( [^()]* \) )+ - \) /x + \) + }x -which uses (?>...) matches exactly when the above one does (it is a -good excercise to check this), but finishes in a fourth of the above -time on a similar string with 1000000 Cs. +which uses C<(?E...)> matches exactly when the one above does (verifying +this yourself would be a productive exercise), but finishes in a fourth +the time when used on a similar string with 1000000 Cs. 
Be aware, +however, that this pattern currently triggers a warning message under +B<-w> saying it C<"matches the null string many times">): -Note that on simple groups like the above C<(?> [^()]+ )> a similar +On simple groups, such as the pattern C<(?> [^()]+ )>, a comparable effect may be achieved by negative lookahead, as in C<[^()]+ (?! [^()] )>. This was only 4 times slower on a string with 1000000 Cs. -=item C<(?(condition)yes-regexp|no-regexp)> +=item C<(?(condition)yes-pattern|no-pattern)> -=item C<(?(condition)yes-regexp)> +=item C<(?(condition)yes-pattern)> Conditional expression. C<(condition)> should be either an integer in parentheses (which is valid if the corresponding pair of parentheses @@ -396,14 +408,15 @@ matched), or lookahead/lookbehind/evaluate zero-width assertion. Say, - / ( \( )? + m{ ( \( )? [^()]+ - (?(1) \) )/x + (?(1) \) ) + }x matches a chunk of non-parentheses, possibly included in parentheses themselves. -=item C<(?imstx)> +=item C<(?imsx)> One or more embedded pattern-match modifiers. This is particularly useful for patterns that are specified in a table somewhere, some of @@ -412,15 +425,14 @@ insensitive ones need to include merely C<(?i)> at the front of the pattern. For example: $pattern = "foobar"; - if ( /$pattern/i ) + if ( /$pattern/i ) { } # more flexible: $pattern = "(?i)foobar"; - if ( /$pattern/ ) + if ( /$pattern/ ) { } -Note that these modifiers are localized inside an enclosing group (if -any). Say, +These modifiers are localized inside an enclosing group (if any). Say, ( (?i) blah ) \s+ \1 @@ -430,15 +442,15 @@ case. =back -The specific choice of question mark for this and the new minimal -matching construct was because 1) question mark is pretty rare in older -regular expressions, and 2) whenever you see one, you should stop -and "question" exactly what is going on. That's psychology... 
+A question mark was chosen for this and for the new minimal-matching +construct because 1) question mark is pretty rare in older regular +expressions, and 2) whenever you see one, you should stop and "question" +exactly what is going on. That's psychology... =head2 Backtracking A fundamental feature of regular expression matching involves the -notion called I. which is currently used (when needed) +notion called I, which is currently used (when needed) by all regular expression quantifiers, namely C<*>, C<*?>, C<+>, C<+?>, C<{n,m}>, and C<{n,m}?>. @@ -539,7 +551,8 @@ As you see, this can be a bit tricky. It's important to realize that a regular expression is merely a set of assertions that gives a definition of success. There may be 0, 1, or several different ways that the definition might succeed against a particular string. And if there are -multiple ways it might succeed, you need to understand backtracking to know which variety of success you will achieve. +multiple ways it might succeed, you need to understand backtracking to +know which variety of success you will achieve. When using lookahead assertions and negations, this can all get even tricker. Imagine you'd like to find a sequence of non-digits not @@ -578,14 +591,13 @@ non-digits, you have something that's not 123?" If the pattern matcher had let C<\D*> expand to "ABC", this would have caused the whole pattern to fail. The search engine will initially match C<\D*> with "ABC". Then it will -try to match C<(?!123> with "123" which, of course, fails. But because +try to match C<(?!123> with "123", which of course fails. But because a quantifier (C<\D*>) has been used in the regular expression, the search engine can backtrack and retry the match differently in the hope of matching the complete regular expression. 
-Well now, -the pattern really, I wants to succeed, so it uses the -standard regexp back-off-and-retry and lets C<\D*> expand to just "AB" this +The pattern really, I wants to succeed, so it uses the +standard pattern back-off-and-retry and lets C<\D*> expand to just "AB" this time. Now there's indeed something following "AB" that is not "123". It's in fact "C123", which suffices. @@ -601,7 +613,7 @@ you'd expect; that is, case 5 will fail, but case 6 succeeds: 6: got ABC -In other words, the two zero-width assertions next to each other work like +In other words, the two zero-width assertions next to each other work as though they're ANDed together, just as you'd use any builtin assertions: C matches only if you're at the beginning of the line AND the end of the line simultaneously. The deeper underlying truth is that juxtaposition in @@ -621,31 +633,31 @@ And if you used C<*>'s instead of limiting it to 0 through 5 matches, then it would take literally forever--or until you ran out of stack space. A powerful tool for optimizing such beasts is "independent" groups, -which do not backtrace (see Lregexp)>>). Note also that +which do not backtrace (see Lpattern)>>). Note also that zero-length lookahead/lookbehind assertions will not backtrace to make the tail match, since they are in "logical" context: only the fact whether they match or not is considered relevant. For an example where side-effects of a lookahead I have influenced the -following match, see Lregexp)>>. +following match, see Lpattern)>>. =head2 Version 8 Regular Expressions -In case you're not familiar with the "regular" Version 8 regexp +In case you're not familiar with the "regular" Version 8 regex routines, here are the pattern-matching rules not described above. Any single character matches itself, unless it is a I with a special meaning described here or above. 
You can cause -characters which normally function as metacharacters to be interpreted +characters that normally function as metacharacters to be interpreted literally by prefixing them with a "\" (e.g., "\." matches a ".", not any character; "\\" matches a "\"). A series of characters matches that series of characters in the target string, so the pattern C would match "blurfl" in the target string. You can specify a character class, by enclosing a list of characters -in C<[]>, which will match any one of the characters in the list. If the +in C<[]>, which will match any one character from the list. If the first character after the "[" is "^", the class matches any character not in the list. Within a list, the "-" character is used to specify a -range, so that C represents all the characters between "a" and "z", +range, so that C represents all characters between "a" and "z", inclusive. If you want "-" itself to be a member of a class, put it at the start or end of the list, or escape it with a backslash. (The following all specify the same class of three characters: C<[-az]>, @@ -663,7 +675,7 @@ character except "\n" (unless you use C). You can specify a series of alternatives for a pattern using "|" to separate them, so that C will match any of "fee", "fie", -or "foe" in the target string (as would C). Note that the +or "foe" in the target string (as would C). The first alternative includes everything from the last pattern delimiter ("(", "[", or the beginning of the pattern) up to the first "|", and the last alternative contains everything from the last "|" to the next @@ -671,7 +683,7 @@ pattern delimiter. For this reason, it's common practice to include alternatives in parentheses, to minimize confusion about where they start and end. -Note that alternatives are tried from left to right, so the first +Alternatives are tried from left to right, so the first alternative found for which the entire expression matches, is the one that is chosen. 
This means that alternatives are not necessarily greedy. For example: when matching C against "barefoot", only the "foo" @@ -679,23 +691,23 @@ part will match, as that is the first alternative tried, and it successfully matches the target string. (This might not seem important, but it is important when you are capturing matched text using parentheses.) -Also note that "|" is interpreted as a literal within square brackets, +Also remember that "|" is interpreted as a literal within square brackets, so if you write C<[fee|fie|foe]> you're really only matching C<[feio|]>. Within a pattern, you may designate subpatterns for later reference by enclosing them in parentheses, and you may refer back to the Ith subpattern later in the pattern using the metacharacter \I. Subpatterns are numbered based on the left to right order of their -opening parenthesis. Note that a backreference matches whatever +opening parenthesis. A backreference matches whatever actually matched the subpattern in the string being examined, not the rules for that subpattern. Therefore, C<(0|0x)\d*\s\1\d*> will -match "0x1234 0x4321",but not "0x1234 01234", because subpattern 1 +match "0x1234 0x4321", but not "0x1234 01234", because subpattern 1 actually matched "0x", even though the rule C<0|0x> could potentially match the leading 0 in the second number. =head2 WARNING on \1 vs $1 -Some people get too used to writing things like +Some people get too used to writing things like: $pattern =~ s/(\W)/\\\1/g; @@ -707,7 +719,7 @@ meaning of C<\1> is kludged in for C. However, if you get into the habit of doing that, you get yourself into trouble if you then add an C modifier. - s/(\d+)/ \1 + 1 /eg; + s/(\d+)/ \1 + 1 /eg; # causes warning under -w Or if you try to do @@ -726,4 +738,4 @@ L. L. -"Mastering Regular Expressions" (see L) by Jeffrey Friedl. +I (see L) by Jeffrey Friedl. 
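The matching rules touched by the perlre.pod hunks above (left-to-right alternation, backing off after a failed lookahead, backreferences, and $1 on the replacement side of a substitution) can be checked with a short script. This is a minimal sketch and not text from the patch: the variable names are invented for illustration, and the alternation foo|foot is assumed to be the pattern that the "barefoot" example refers to.

```perl
use strict;
use warnings;

# Alternatives are tried left to right, so "foo" is chosen before "foot".
# (The pattern foo|foot is an assumption; the patch text elides it.)
my ($alt) = "barefoot" =~ /(foo|foot)/;

# \D* first grabs "ABC"; the negative lookahead then fails because "123"
# follows, so the engine backs off and lets \D* keep only "AB".
my ($prefix) = "ABC123" =~ /^(\D*)(?!123)/;

# A backreference matches the text group 1 actually matched ("0x"),
# not the rule 0|0x that produced it.
my $backref      = qr/(0|0x)\d*\s\1\d*/;
my $same_prefix  = "0x1234 0x4321" =~ $backref ? 1 : 0;
my $mixed_prefix = "0x1234 01234"  =~ $backref ? 1 : 0;

# On the replacement side of s///e, write $1 rather than \1.
(my $bumped = "room 41") =~ s/(\d+)/$1 + 1/e;

print "alternation kept: $alt\n";
print "lookahead left:   $prefix\n";
print "same prefix:      $same_prefix\n";
print "mixed prefix:     $mixed_prefix\n";
print "incremented:      $bumped\n";
```

Capturing groups are used here instead of $& so that the script shows exactly which part of the string each construct consumed.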
diff --git a/pod/perlref.pod b/pod/perlref.pod index 34c071f..50400b7 100644 --- a/pod/perlref.pod +++ b/pod/perlref.pod @@ -5,13 +5,13 @@ perlref - Perl references and nested data structures =head1 DESCRIPTION Before release 5 of Perl it was difficult to represent complex data -structures, because all references had to be symbolic, and even that was -difficult to do when you wanted to refer to a variable rather than a -symbol table entry. Perl not only makes it easier to use symbolic -references to variables, but lets you have "hard" references to any piece -of data. Any scalar may hold a hard reference. Because arrays and hashes -contain scalars, you can now easily build arrays of arrays, arrays of -hashes, hashes of arrays, arrays of hashes of functions, and so on. +structures, because all references had to be symbolic--and even then +it was difficult to refer to a variable instead of a symbol table entry. +Perl now not only makes it easier to use symbolic references to variables, +but also lets you have "hard" references to any piece of data or code. +Any scalar may hold a hard reference. Because arrays and hashes contain +scalars, you can now easily build arrays of arrays, arrays of hashes, +hashes of arrays, arrays of hashes of functions, and so on. Hard references are smart--they keep track of reference counts for you, automatically freeing the thing referred to when its reference count goes @@ -32,7 +32,7 @@ them that; references are confusing enough without useless synonyms.) In contrast, hard references are more like hard links in a Unix file system: They are used to access an underlying object without concern for what its (other) name is. When the word "reference" is used without an -adjective, like in the following paragraph, it usually is talking about a +adjective, as in the following paragraph, it is usually talking about a hard reference. References are easy to use in Perl. 
There is just one overriding @@ -41,7 +41,9 @@ scalar is holding a reference, it always behaves as a simple scalar. It doesn't magically start being an array or hash or subroutine; you have to tell it explicitly to do so, by dereferencing it. -References can be constructed in several ways. +=head2 Making References + +References can be created in several ways. =over 4 @@ -60,20 +62,20 @@ reference that the backslash returned. Here are some examples: $coderef = \&handler; $globref = \*foo; -It isn't possible to create a true reference to an IO handle (filehandle or -dirhandle) using the backslash operator. See the explanation of the -*foo{THING} syntax below. (However, you're apt to find Perl code -out there using globrefs as though they were IO handles, which is -grandfathered into continued functioning.) +It isn't possible to create a true reference to an IO handle (filehandle +or dirhandle) using the backslash operator. The most you can get is a +reference to a typeglob, which is actually a complete symbol table entry. +But see the explanation of the C<*foo{THING}> syntax below. However, +you can still use type globs and globrefs as though they were IO handles. =item 2. -A reference to an anonymous array can be constructed using square +A reference to an anonymous array can be created using square brackets: $arrayref = [1, 2, ['a', 'b', 'c']]; -Here we've constructed a reference to an anonymous array of three elements +Here we've created a reference to an anonymous array of three elements whose final element is itself a reference to another anonymous array of three elements. (The multidimensional syntax described later can be used to access this. For example, after the above, C<$arrayref-E[2][1]> would have @@ -91,7 +93,7 @@ of C<@foo>, not a reference to C<@foo> itself. Likewise for C<%foo>. =item 3. 
-A reference to an anonymous hash can be constructed using curly +A reference to an anonymous hash can be created using curly brackets: $hashref = { @@ -99,7 +101,7 @@ brackets: 'Clyde' => 'Bonnie', }; -Anonymous hash and array constructors can be intermixed freely to +Anonymous hash and array composers like these can be intermixed freely to produce as complicated a structure as you want. The multidimensional syntax described below works for these too. The values above are literals, but variables and expressions would work just as well, because @@ -131,7 +133,7 @@ the expression to mean either the HASH reference, or the BLOCK. =item 4. -A reference to an anonymous subroutine can be constructed by using +A reference to an anonymous subroutine can be created by using C without a subname: $coderef = sub { print "Boink!\n" }; @@ -139,7 +141,7 @@ C without a subname: Note the presence of the semicolon. Except for the fact that the code inside isn't executed immediately, a C is not so much a declaration as it is an operator, like C or C. (However, no -matter how many times you execute that line (unless you're in an +matter how many times you execute that particular line (unless you're in an C), C<$coderef> will still have a reference to the I anonymous subroutine.) @@ -183,7 +185,7 @@ newprint() I the fact that the "my $x" has seemingly gone out of scope by the time the anonymous subroutine runs. That's what closure is all about. -This applies to only lexical variables, by the way. Dynamic variables +This applies only to lexical variables, by the way. Dynamic variables continue to work as they have always worked. Closure is not something that most Perl programmers need trouble themselves about to begin with. @@ -194,11 +196,23 @@ Perl objects are just references to a special kind of object that happens to kno which package it's associated with. Constructors are just special subroutines that know how to create that association. 
They do so by starting with an ordinary reference, and it remains an ordinary reference -even while it's also being an object. Constructors are customarily -named new(), but don't have to be: +even while it's also being an object. Constructors are often +named new() and called indirectly: $objref = new Doggie (Tail => 'short', Ears => 'long'); +But don't have to be: + + $objref = Doggie->new(Tail => 'short', Ears => 'long'); + + use Term::Cap; + $terminal = Term::Cap->Tgetent( { OSPEED => 9600 }); + + use Tk; + $main = MainWindow->new(); + $menubar = $main->Frame(-relief => "raised", + -borderwidth => 2) + =item 6. References of the appropriate type can spring into existence if you @@ -230,36 +244,34 @@ except in the case of scalars. *foo{SCALAR} returns a reference to an anonymous scalar if $foo hasn't been used yet. This might change in a future release. -The use of *foo{IO} is the best way to pass bareword filehandles into or -out of subroutines, or to store them in larger data structures. +*foo{IO} is an alternative to the \*HANDLE mechanism given in +L for passing filehandles +into or out of subroutines, or storing into larger data structures. +Its disadvantage is that it won't create a new filehandle for you. +Its advantage is that you have no risk of clobbering more than you want +to with a typeglob assignment, although if you assign to a scalar instead +of a typeglob, you're ok. + splutter(*STDOUT); splutter(*STDOUT{IO}); + sub splutter { my $fh = shift; print $fh "her um well a hmmm\n"; } + $rec = get_rec(*STDIN); $rec = get_rec(*STDIN{IO}); + sub get_rec { my $fh = shift; return scalar <$fh>; } -Beware, though, that you can't do this with a routine which is going to -open the filehandle for you, because *HANDLE{IO} will be undef if HANDLE -hasn't been used yet. Use \*HANDLE for that sort of thing instead. - -Using \*HANDLE (or *HANDLE) is another way to use and store non-bareword -filehandles (before perl version 5.002 it was the only way). 
The two -methods are largely interchangeable, you can do - - splutter(\*STDOUT); - $rec = get_rec(\*STDIN); - -with the above subroutine definitions. - =back +=head2 Using References + That's it for creating references. By now you're probably dying to know how to use references to get back to your long-lost data. There are several basic methods. @@ -347,6 +359,7 @@ statement, C<$array[$x]> may have been undefined. If so, it's automatically defined with a hash reference so that we can look up C<{"foo"}> in it. Likewise C<$array[$x]-E{"foo"}> will automatically get defined with an array reference so that we can look up C<[0]> in it. +This process is called I. One more thing here. The arrow is optional I brackets subscripts, so you can shrink the above down to @@ -376,8 +389,8 @@ civility though. The ref() operator may be used to determine what type of thing the reference is pointing to. See L. -The bless() operator may be used to associate a reference with a package -functioning as an object class. See L. +The bless() operator may be used to associate the object a reference +points to with a package functioning as an object class. See L. A typeglob may be dereferenced the same way a reference can, because the dereference syntax always indicates the kind of reference desired. @@ -430,11 +443,11 @@ block. An inner block may countermand that with no strict 'refs'; -Only package variables are visible to symbolic references. Lexical -variables (declared with my()) aren't in a symbol table, and thus are -invisible to this mechanism. For example: +Only package variables (globals, even if localized) are visible to +symbolic references. Lexical variables (declared with my()) aren't in +a symbol table, and thus are invisible to this mechanism. For example: - local($value) = 10; + local $value = 10; $ref = \$value; { my $value = 20; @@ -498,6 +511,79 @@ The B<-w> switch will warn you if it interprets a reserved word as a string. 
But it will no longer warn you about using lowercase words, because the string is effectively quoted. +=head2 Function Templates + +As explained above, a closure is an anonymous function with access to the +lexical variables visible when that function was compiled. It retains +access to those variables even though it doesn't get run until later, +such as in a signal handler or a Tk callback. + +Using a closure as a function template allows us to generate many functions +that act similarly. Suppose you wanted functions named after the colors +that generated HTML font changes for the various colors: + + print "Be ", red("careful"), "with that ", green("light"); + +The red() and green() functions would be very similar. To create these, +we'll assign a closure to a typeglob of the name of the function we're +trying to build. + + @colors = qw(red blue green yellow orange purple violet); + for my $name (@colors) { + no strict 'refs'; # allow symbol table manipulation + *$name = *{uc $name} = sub { "@_" }; + } + +Now all those different functions appear to exist independently. You can +call red(), RED(), blue(), BLUE(), green(), etc. This technique saves on +both compile time and memory use, and is less error-prone as well, since +syntax checks happen at compile time. It's critical that any variables in +the anonymous subroutine be lexicals in order to create a proper closure. +That's the reason for the C on the loop iteration variable. + +This is one of the only places where giving a prototype to a closure makes +much sense. If you wanted to impose scalar context on the arguments of +these functions (probably not a wise idea for this particular example), +you could have written it this way instead: + + *$name = sub ($) { "$_[0]" }; + +However, since prototype checking happens at compile time, the assignment +above happens too late to be of much use. 
You could address this by +putting the whole loop of assignments within a BEGIN block, forcing it +to occur during compilation. + +Access to lexicals that change over time--like those in the C loop +above--only works with closures, not general subroutines. In the general +case, then, named subroutines do not nest properly, although anonymous +ones do. If you are accustomed to using nested subroutines in other +programming languages with their own private variables, you'll have to +work at it a bit in Perl. The intuitive coding of this kind of thing +incurs mysterious warnings about ``will not stay shared''. For example, +this won't work: + + sub outer { + my $x = $_[0] + 35; + sub inner { return $x * 19 } # WRONG + return $x + inner(); + } + +A work-around is the following: + + sub outer { + my $x = $_[0] + 35; + local *inner = sub { return $x * 19 }; + return $x + inner(); + } + +Now inner() can only be called from within outer(), because of the +temporary assignment of the closure (anonymous subroutine). But when +it is called, it has normal access to the lexical variable $x from the scope +of outer(). + +This has the interesting effect of creating a function local to another +function, something not normally supported in Perl. + =head1 WARNING You may not (usefully) use a reference as the key to a hash. It will be @@ -515,6 +601,8 @@ more like And then at least you can use the values(), which will be real refs, instead of the keys(), which won't. +The standard Tie::RefHash module provides a convenient workaround to this. + =head1 SEE ALSO Besides the obvious documents, source code can be instructive. @@ -522,5 +610,5 @@ Some rather pathological examples of the use of references can be found in the F regression test in the Perl source directory. See also L and L for how to use references to create -complex data structures, and L for how to use them to create -objects. +complex data structures, and L, L, and L +for how to use them to create objects. 
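The two closure techniques added to perlref.pod above (function templates installed through typeglobs, and a function made local to another function) can be tried together in one script. This is a minimal sketch and not text from the patch: the FONT markup in the template body is an assumption, since the patch as shown only has "@_" there, and the color list is shortened.

```perl
use strict;
use warnings;

# Build one function per color from a single closure template.
# The FONT markup is an assumed body, used only for illustration.
my @colors = qw(red blue green);
for my $name (@colors) {
    no strict 'refs';    # allow symbol table manipulation
    *$name = *{uc $name} = sub { "<FONT COLOR='${name}'>@_</FONT>" };
}

# A function local to another function: the typeglob assignment is
# undone when outer() returns, yet inner() still sees the lexical $x.
sub outer {
    my $x = $_[0] + 35;
    local *inner = sub { return $x * 19 };
    return $x + inner();
}

my $lower = red("careful");     # lowercase name
my $upper = GREEN("light");     # uppercase alias from the same template
my $total = outer(1);           # (1 + 35) + (1 + 35) * 19

print "$lower\n$upper\n$total\n";
```

The loop variable is declared with my so that each generated subroutine captures its own $name. With a package variable, every function would see whatever value the variable held at call time.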
diff --git a/pod/perlrun.pod b/pod/perlrun.pod index 84ce270..72c772e 100644 --- a/pod/perlrun.pod +++ b/pod/perlrun.pod @@ -512,14 +512,15 @@ support that). =item B<-u> causes Perl to dump core after compiling your script. You can then -take this core dump and turn it into an executable file by using the +in theory take this core dump and turn it into an executable file by using the B program (not supplied). This speeds startup at the expense of some disk space (which you can minimize by stripping the executable). (Still, a "hello world" executable comes out to about 200K on my machine.) If you want to execute a portion of your script before dumping, use the dump() operator instead. Note: availability of B is platform specific and may not be available for a specific port of -Perl. +Perl. It has been superseded by the new perl-to-C compiler, which is more +portable, even though it's still only considered beta. =item B<-U> diff --git a/pod/perlsec.pod b/pod/perlsec.pod index 3fd9034..4a743c7 100644 --- a/pod/perlsec.pod +++ b/pod/perlsec.pod @@ -225,15 +225,14 @@ never call the shell at all. } else { my @temp = ($EUID, $EGID); $EUID = $UID; - $EGID = $GID; # XXX: initgroups() not called + $EGID = $GID; # initgroups() also called! # Make sure privs are really gone ($EUID, $EGID) = @temp; - die "Can't drop privileges" unless - $UID == $EUID and - $GID eq $EGID; # String test + die "Can't drop privileges" + unless $UID == $EUID && $GID eq $EGID; $ENV{PATH} = "/bin:/usr/bin"; - exec 'myprog', 'arg1', 'arg2' or - die "can't exec myprog: $!"; + exec 'myprog', 'arg1', 'arg2' + or die "can't exec myprog: $!"; } A similar strategy would work for wildcard expansion via C, although @@ -320,9 +319,10 @@ First of all, however, you I take away read permission, because the source code has to be readable in order to be compiled and interpreted. (That doesn't mean that a CGI script's source is readable by people on the web, though.) 
So you have to leave the -permissions at the socially friendly 0755 level. +permissions at the socially friendly 0755 level. This lets +people on your local system only see your source. -Some people regard this as a security problem. If your program does +Some people mistakenly regard this as a security problem. If your program does insecure things, and relies on people not knowing how to exploit those insecurities, it is not secure. It is often possible for someone to determine the insecure things and exploit them without viewing the @@ -345,3 +345,7 @@ statements like "This is unpublished proprietary software of XYZ Corp. Your access to it does not give you permission to use it blah blah blah." You should see a lawyer to be sure your licence's wording will stand up in court. + +=head1 SEE ALSO + +L for its description of cleaning up environment variables. diff --git a/pod/perlsub.pod b/pod/perlsub.pod index 1d7660c..5baff89 100644 --- a/pod/perlsub.pod +++ b/pod/perlsub.pod @@ -14,7 +14,8 @@ To declare subroutines: To define an anonymous subroutine at runtime: - $subref = sub BLOCK; + $subref = sub BLOCK; # no proto + $subref = sub (PROTO) BLOCK; # with proto To import subroutines: @@ -24,7 +25,7 @@ To call subroutines: NAME(LIST); # & is optional with parentheses. NAME LIST; # Parentheses optional if predeclared/imported. - &NAME; # Passes current @_ to subroutine. + &NAME; # Makes current @_ visible to called subroutine. =head1 DESCRIPTION @@ -33,7 +34,7 @@ may be located anywhere in the main program, loaded in from other files via the C, C, or C keywords, or even generated on the fly using C or anonymous subroutines (closures). You can even call a function indirectly using a variable containing its name or a CODE reference -to it, as in C<$var = \&function>. +to it. 
The Perl model for function call and return values is simple: all functions are passed as parameters one single flat list of scalars, and @@ -190,6 +191,14 @@ disables any prototype checking on the arguments you do provide. This is partly for historical reasons, and partly for having a convenient way to cheat if you know what you're doing. See the section on Prototypes below. +Functions whose names are in all upper case are reserved to the Perl core, +just as are modules whose names are in all lower case. A function in +all capitals is a loosely-held convention meaning it will be called +indirectly by the run-time system itself. Functions that do special, +pre-defined things include BEGIN, END, AUTOLOAD, and DESTROY--plus all the +functions mentioned in L. The 5.005 release adds INIT +to this list. + =head2 Private Variables via my() Synopsis: @@ -207,11 +216,20 @@ must be placed in parentheses. All listed elements must be legal lvalues. Only alphanumeric identifiers may be lexically scoped--magical builtins like $/ must currently be localized with "local" instead. -Unlike dynamic variables created by the "local" statement, lexical +Unlike dynamic variables created by the "local" operator, lexical variables declared with "my" are totally hidden from the outside world, including any called subroutines (even if it's the same subroutine called from itself or elsewhere--every call gets its own copy). +This doesn't mean that a my() variable declared in a statically +I lexical scope would be invisible. Only the dynamic scopes +are cut off. For example, the bumpx() function below has access to the +lexical $x variable because both the my and the sub occurred at the same +scope, presumably the file scope. + + my $x = 10; + sub bumpx { $x++ } + (An eval(), however, can see the lexical variables of the scope it is being evaluated in so long as the names aren't hidden by declarations within the eval() itself. See L.) 
@@ -236,7 +254,7 @@ The "my" is simply a modifier on something you might assign to. So when you do assign to the variables in its argument list, the "my" doesn't change whether those variables are viewed as a scalar or an array. So - my ($foo) = ; + my ($foo) = ; # WRONG? my @FOO = ; both supply a list context to the right-hand side, while @@ -245,7 +263,7 @@ both supply a list context to the right-hand side, while supplies a scalar context. But the following declares only one variable: - my $foo, $bar = 1; + my $foo, $bar = 1; # WRONG That has the same effect as @@ -342,13 +360,13 @@ lexical of the same name is also visible: That will print out 20 and 10. -You may declare "my" variables at the outermost scope of a file to -hide any such identifiers totally from the outside world. This is similar +You may declare "my" variables at the outermost scope of a file to hide +any such identifiers totally from the outside world. This is similar to C's static variables at the file level. To do this with a subroutine -requires the use of a closure (anonymous function). If a block (such as -an eval(), function, or C) wants to create a private subroutine -that cannot be called from outside that block, it can declare a lexical -variable containing an anonymous sub reference: +requires the use of a closure (anonymous function with lexical access). +If a block (such as an eval(), function, or C) wants to create +a private subroutine that cannot be called from outside that block, +it can declare a lexical variable containing an anonymous sub reference: my $secret_version = '1.001-beta'; my $secret_sub = sub { print $secret_version }; @@ -363,13 +381,28 @@ unqualified and unqualifiable. This does not work with object methods, however; all object methods have to be in the symbol table of some package to be found. -Just because the lexical variable is lexically (also called statically) -scoped doesn't mean that within a function it works like a C static. 
It -normally works more like a C auto. But here's a mechanism for giving a -function private variables with both lexical scoping and a static -lifetime. If you do want to create something like C's static variables, -just enclose the whole function in an extra block, and put the -static variable outside the function but in the block. +=head2 Persistent Private Variables + +Just because a lexical variable is lexically (also called statically) +scoped to its enclosing block, eval, or do FILE, this doesn't mean that +within a function it works like a C static. It normally works more +like a C auto, but with implicit garbage collection. + +Unlike local variables in C or C++, Perl's lexical variables don't +necessarily get recycled just because their scope has exited. +If something more permanent is still aware of the lexical, it will +stick around. So long as something else references a lexical, that +lexical won't be freed--which is as it should be. You wouldn't want +memory being freed until you were done using it, or kept around once you +were done. Automatic garbage collection takes care of this for you. + +This means that you can pass back or save away references to lexical +variables, whereas to return a pointer to a C auto is a grave error. +It also gives us a way to simulate C's function statics. Here's a +mechanism for giving a function private variables with both lexical +scoping and a static lifetime. If you do want to create something like +C's static variables, just enclose the whole function in an extra block, +and put the static variable outside the function but in the block. { my $secret_val = 0; @@ -397,6 +430,12 @@ starts to run: See L about the BEGIN function. +If declared at the outermost scope, the file scope, then lexicals work +somewhat like C's file statics. They are available to all functions in +that same file declared below them, but are inaccessible from outside of +the file. 
This is sometimes used in modules to create private variables +for the whole module. + =head2 Temporary Values via local() B: In general, you should be using "my" instead of "local", because @@ -419,11 +458,12 @@ Synopsis: local *merlyn = 'randal'; # SAME THING: promote 'randal' to *randal local *merlyn = \$randal; # just alias $merlyn, not @merlyn etc -A local() modifies its listed variables to be local to the enclosing -block, (or subroutine, C, or C) and I. A local() just gives temporary values to global -(meaning package) variables. This is known as dynamic scoping. Lexical -scoping is done with "my", which works more like C's auto declarations. +A local() modifies its listed variables to be "local" to the enclosing +block, eval, or C--and to I. +A local() just gives temporary values to global (meaning package) +variables. It does not create a local variable. This is known as +dynamic scoping. Lexical scoping is done with "my", which works more +like C's auto declarations. If more than one variable is given to local(), they must be placed in parentheses. All listed elements must be legal lvalues. This operator works @@ -541,9 +581,6 @@ Perl will print This is a test only a test. The array has 6 elements: 0, 1, 2, undef, undef, 5 -In short, be careful when manipulating the containers for composite types -whose elements have been localized. - =head2 Passing Symbol Table Entries (typeglobs) [Note: The mechanism described in this section was originally the only @@ -586,6 +623,77 @@ mechanism will merge all the array values so that you can't extract out the individual arrays. For more on typeglobs, see L. +=head2 When to Still Use local() + +Despite the existence of my(), there are still three places where the +local() operator still shines. In fact, in these three places, you +I use C instead of C. + +=over + +=item 1. You need to give a global variable a temporary value, especially $_. 
+ +The global variables, like @ARGV or the punctuation variables, must be +localized with local(). This block reads in I, and splits +it up into chunks separated by lines of equal signs, which are placed +in @Fields. + + { + local @ARGV = ("/etc/motd"); + local $/ = undef; + local $_ = <>; + @Fields = split /^\s*=+\s*$/; + } + +In particular, it's important to localize $_ in any routine that assigns +to it. Look out for implicit assignments in C conditionals. + +=item 2. You need to create a local file or directory handle or a local function. + +A function that needs a filehandle of its own must use +local() on a complete typeglob. This can be used to create new symbol +table entries: + + sub ioqueue { + local (*READER, *WRITER); # not my! + pipe (READER, WRITER) or die "pipe: $!"; + return (*READER, *WRITER); + } + ($head, $tail) = ioqueue(); + +See the Symbol module for a way to create anonymous symbol table +entries. + +Because assignment of a reference to a typeglob creates an alias, this +can be used to create what is effectively a local function, or at least, +a local alias. + + { + local *grow = \&shrink; # only until this block exits + grow(); # really calls shrink() + move(); # if move() grow()s, it shrink()s too + } + grow(); # get the real grow() again + +See L for more about manipulating +functions by name in this way. + +=item 3. You want to temporarily change just one element of an array or hash. + +You can localize just one element of an aggregate. Usually this +is done on dynamics: + + { + local $SIG{INT} = 'IGNORE'; + funct(); # uninterruptible + } + # interruptibility automatically restored here + +But it also works on lexically declared aggregates. Prior to 5.005, +this operation could on occasion misbehave. + +=back + =head2 Pass by Reference If you want to pass more than one array or hash into a function--or @@ -852,7 +960,7 @@ without C<&> or C. Calls made using C<&> or C are never inlined. 
(See constant.pm for an easy way to declare most constants.) -All of the following functions would be inlined. +The following functions would all be inlined: sub pi () { 3.14159 } # Not exact, but close. sub PI () { 4 * atan2 1, 1 } # As good as it gets, @@ -881,7 +989,7 @@ All of the following functions would be inlined. sub N_FACTORIAL () { $prod } } -If you redefine a subroutine which was eligible for inlining you'll get +If you redefine a subroutine that was eligible for inlining, you'll get a mandatory warning. (You can use this warning to tell whether or not a particular subroutine is considered constant.) The warning is considered severe enough not to be optional because previously compiled @@ -991,7 +1099,7 @@ library. If you call a subroutine that is undefined, you would ordinarily get an immediate fatal error complaining that the subroutine doesn't exist. (Likewise for subroutines being used as methods, when the method -doesn't exist in any of the base classes of the class package.) If, +doesn't exist in any base class of the class package.) If, however, there is an C subroutine defined in the package or packages that were searched for the original subroutine, then that C subroutine is called with the arguments that would have been @@ -1036,7 +1144,6 @@ functions to perl code in L. =head1 SEE ALSO -See L for more on references. See L if you'd -like to learn about calling C subroutines from perl. See -L to learn about bundling up your functions in -separate files. +See L for more about references and closures. See L if +you'd like to learn about calling C subroutines from perl. See L +to learn about bundling up your functions in separate files. diff --git a/pod/perlsyn.pod b/pod/perlsyn.pod index 1d0f5d6..7c93257 100644 --- a/pod/perlsyn.pod +++ b/pod/perlsyn.pod @@ -95,10 +95,25 @@ conditional is evaluated. This is so that you can write loops like: ... } until $line eq ".\n"; -See L. 
Note also that the loop control -statements described later will I work in this construct, because -modifiers don't take loop labels. Sorry. You can always wrap -another block around it to do that sort of thing. +See L. Note also that the loop control statements described +later will I<NOT> work in this construct, because modifiers don't take +loop labels. Sorry. You can always put another block inside of it +(for C<next>) or around it (for C<last>) to do that sort of thing. +For next, just double the braces: + + do {{ + next if $x == $y; + # do something here + }} until $x++ > $z; + +For last, you have to be more elaborate: + + LOOP: { + do { + last if $x == $y**2; + # do something here + } while $x++ <= $z; + } =head2 Compound statements @@ -201,31 +216,34 @@ which is Perl short-hand for the more explicitly written version: # now process $line } -Or here's a simpleminded Pascal comment stripper (warning: assumes no -{ or } in strings). +Note that if there were a C<continue> block on the above code, it would get +executed even on discarded lines. This is often used to reset line counters +or C<?pat?> one-time matches. - LINE: while () { - while (s|({.*}.*){.*}|$1 |) {} - s|{.*}| |; - if (s|{.*| |) { - $front = $_; - while () { - if (/}/) { # end of comment? - s|^|$front{|; - redo LINE; - } - } - } - print; + # inspired by :1,$g/fred/s//WILMA/ + while (<>) { + ?(fred)? && s//WILMA $1 WILMA/; + ?(barney)? && s//BETTY $1 BETTY/; + ?(homer)? && s//MARGE $1 MARGE/; + } continue { + print "$ARGV $.: $_"; + close ARGV if eof(); # reset $. + reset if eof(); # reset ?pat? } -Note that if there were a C block on the above code, it would get -executed even on discarded lines. - If the word C is replaced by the word C, the sense of the test is reversed, but the conditional is still tested before the first iteration. +The loop control statements don't work in an C<if> or C<unless>, since +they aren't loops. You can double the braces to make them such, though.
+ + if (/pattern/) {{ + next if /fred/; + next if /barney/; + # do something here + }} + The form C, available in Perl 4, is no longer available. Replace any occurrence of C by C. @@ -276,11 +294,12 @@ if you have subroutine or format declarations within the loop which refer to it.) The C keyword is actually a synonym for the C keyword, so -you can use C for readability or C for brevity. If VAR is -omitted, $_ is set to each value. If any element of LIST is an lvalue, -you can modify it by modifying VAR inside the loop. That's because -the C loop index variable is an implicit alias for each item -in the list that you're looping over. +you can use C<foreach> for readability or C<for> for brevity. (Or because +the Bourne shell is more familiar to you than I<csh>, so writing C<for> +comes more naturally.) If VAR is omitted, $_ is set to each value. +If any element of LIST is an lvalue, you can modify it by modifying VAR +inside the loop. That's because the C<foreach> loop index variable is +an implicit alias for each item in the list that you're looping over. If any part of LIST is an array, C will get very confused if you add or remove elements within the loop body, for example with @@ -409,18 +428,6 @@ or $nothing = 1; } -or, using experimental C of regular expressions -(see L), - - / ^abc (?{ $abc = 1 }) - | - ^def (?{ $def = 1 }) - | - ^xyz (?{ $xyz = 1 }) - | - (?{ $nothing = 1 }) - /x; - or even, horrors, if (/^abc/) @@ -432,7 +439,6 @@ or even, horrors, else { $nothing = 1 } - A common idiom for a switch statement is to use C's aliasing to make a temporary assignment to $_ for convenient matching: @@ -447,7 +453,7 @@ Another interesting approach to a switch statement is arrange for a C block to return the proper value: $amode = do { - if ($flag & O_RDONLY) { "r" } + if ($flag & O_RDONLY) { "r" } # XXX: isn't this 0? elsif ($flag & O_WRONLY) { ($flag & O_APPEND) ?
"a" : "w" } elsif ($flag & O_RDWR) { if ($flag & O_CREAT) { "w+" } @@ -455,6 +461,38 @@ for a C block to return the proper value: } }; +Or + + print do { + ($flags & O_WRONLY) ? "write-only" : + ($flags & O_RDWR) ? "read-write" : + "read-only"; + }; + +Or if you are certainly that all the C<&&> clauses are true, you can use +something like this, which "switches" on the value of the +HTTP_USER_AGENT envariable. + + #!/usr/bin/perl + # pick out jargon file page based on browser + $dir = 'http://www.wins.uva.nl/~mes/jargon'; + for ($ENV{HTTP_USER_AGENT}) { + $page = /Mac/ && 'm/Macintrash.html' + || /Win(dows )?NT/ && 'e/evilandrude.html' + || /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html' + || /Linux/ && 'l/Linux.html' + || /HP-UX/ && 'h/HP-SUX.html' + || /SunOS/ && 's/ScumOS.html' + || 'a/AppendixB.html'; + } + print "Location: $dir/$page\015\012\015\012"; + +That kind of switch statement only works when you know the C<&&> clauses +will be true. If you don't, the previous C example should be used. + +You might also consider writing a hash instead of synthesizing a switch +statement. + =head2 Goto Although not for the faint of heart, Perl does support a C statement. @@ -539,8 +577,8 @@ of code. =head2 Plain Old Comments (Not!) -Much like the C preprocessor, perl can process line directives. Using -this, one can control perl's idea of filenames and line numbers in +Much like the C preprocessor, Perl can process line directives. Using +this, one can control Perl's idea of filenames and line numbers in error or warning messages (especially for strings that are processed with eval()). The syntax for this mechanism is the same as for most C preprocessors: it matches the regular expression diff --git a/pod/perltie.pod b/pod/perltie.pod index da4fbe9..cae0a15 100644 --- a/pod/perltie.pod +++ b/pod/perltie.pod @@ -23,7 +23,7 @@ Now you can. The tie() function binds a variable to a class (package) that will provide the implementation for access methods for that variable. 
Once this magic has been performed, accessing a tied variable automatically triggers -method calls in the proper class. All of the complexity of the class is +method calls in the proper class. The complexity of the class is hidden behind magic methods calls. The method names are in ALL CAPS, which is a convention that Perl uses to indicate that they're called implicitly rather than explicitly--just like the BEGIN() and END() diff --git a/pod/perltoot.pod b/pod/perltoot.pod index 90ef81a..c77a971 100644 --- a/pod/perltoot.pod +++ b/pod/perltoot.pod @@ -1753,27 +1753,25 @@ L, and L. -=head1 COPYRIGHT +=head1 AUTHOR AND COPYRIGHT + +Copyright (c) 1997, 1998 Tom Christiansen +All rights reserved. + +When included as part of the Standard Version of Perl, or as part of +its complete documentation whether printed or otherwise, this work +may be distributed only under the terms of Perl's Artistic License. +Any distribution of this file or derivatives thereof I +of that package require that special arrangements be made with +copyright holder. -I I hate to have to say this, but recent unpleasant -experiences have mandated its inclusion: - - Copyright 1996 Tom Christiansen. All Rights Reserved. - -This work derives in part from the second edition of I. -Although destined for release as a manpage with the standard Perl -distribution, it is not public domain (nor is any of Perl and its docset: -publishers beware). It's expected to someday make its way into a revision -of the Camel Book. While it is copyright by me with all rights reserved, -permission is granted to freely distribute verbatim copies of this -document provided that no modifications outside of formatting be made, -and that this notice remain intact. You are permitted and encouraged to -use its code and derivatives thereof in your own source code for fun or -for profit as you see fit. 
But so help me, if in six months I find some -book out there with a hacked-up version of this material in it claiming to -be written by someone else, I'll tell all the world that you're a jerk. -Furthermore, your lawyer will meet my lawyer (or O'Reilly's) over lunch -to arrange for you to receive your just deserts. Count on it. +Irrespective of its distribution, all code examples in this file +are hereby placed into the public domain. You are permitted and +encouraged to use this code in your own programs for fun +or for profit as you see fit. A simple comment in the code giving +credit would be courteous but is not required. + +=head1 COPYRIGHT =head2 Acknowledgments diff --git a/pod/perlvar.pod b/pod/perlvar.pod index d9edffa..1a12011 100644 --- a/pod/perlvar.pod +++ b/pod/perlvar.pod @@ -6,15 +6,15 @@ perlvar - Perl predefined variables =head2 Predefined Names -The following names have special meaning to Perl. Most of the +The following names have special meaning to Perl. Most punctuation names have reasonable mnemonics, or analogues in one of -the shells. Nevertheless, if you wish to use the long variable names, +the shells. Nevertheless, if you wish to use long variable names, you just need to say use English; at the top of your program. This will alias all the short names to the -long names in the current package. Some of them even have medium names, +long names in the current package. Some even have medium names, generally borrowed from B. To go a step further, those variables that depend on the currently @@ -28,7 +28,7 @@ after which you may use either method HANDLE EXPR -or +or more safely, HANDLE->method(EXPR) @@ -112,11 +112,11 @@ test. Note that outside of a C test, this will not happen. =over 8 -=item $EIE +=item $EIE Contains the subpattern from the corresponding set of parentheses in the last pattern matched, not counting patterns matched in nested -blocks that have been exited already. (Mnemonic: like \digit.) +blocks that have been exited already. 
(Mnemonic: like \digits.) These variables are all read-only. =item $MATCH @@ -176,7 +176,8 @@ is 0. (Mnemonic: * matches multiple things.) Note that this variable influences the interpretation of only "C<^>" and "C<$>". A literal newline can be searched for even when C<$* == 0>. -Use of "C<$*>" is deprecated in modern perls. +Use of "C<$*>" is deprecated in modern Perls, supplanted by +the C</s> and C</m> modifiers on pattern matching. =item input_line_number HANDLE EXPR @@ -427,12 +428,11 @@ L. =item $? The status returned by the last pipe close, backtick (C<``>) command, -or system() operator. Note that this is the status word returned by -the wait() system call (or else is made up to look like it). Thus, -the exit value of the subprocess is actually (C<$? E<gt>E<gt> 8>), and -C<$? & 255> gives which signal, if any, the process died from, and -whether there was a core dump. (Mnemonic: similar to B<sh> and -B<ksh>.) +or system() operator. Note that this is the status word returned by the +wait() system call (or else is made up to look like it). Thus, the exit +value of the subprocess is actually (C<$? E<gt>E<gt> 8>), and C<$? & 127> +gives which signal, if any, the process died from, and C<$? & 128> reports +whether there was a core dump. (Mnemonic: similar to B<sh> and B<ksh>.) Additionally, if the C<h_errno> variable is supported in C, its value is returned via $? if any of the C<gethost*()> functions fail.