X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlfaq5.pod;h=feb66a45cdc9e089d7b6a78196f9fb435fc15ff4;hb=1e6da01743571311bdbed539271d576c28f4c2ec;hp=41c46a3dae9e964eefab5d43c9298c7000945ecb;hpb=5a964f204835a8014f4ba86fc91884cff958ac67;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod index 41c46a3..feb66a4 100644 --- a/pod/perlfaq5.pod +++ b/pod/perlfaq5.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq5 - Files and Formats ($Revision: 1.22 $, $Date: 1997/04/24 22:44:02 $) +perlfaq5 - Files and Formats ($Revision: 1.38 $, $Date: 1999/05/23 16:08:30 $) =head1 DESCRIPTION @@ -69,7 +69,7 @@ or even this: Note the bizarrely hardcoded carriage return and newline in their octal equivalents. This is the ONLY way (currently) to assure a proper flush -on all platforms, including Macintosh. That the way things work in +on all platforms, including Macintosh. That's the way things work in network programming: you really should specify the exact bit pattern on the network line terminator. In practice, C<"\n\n"> often works, but this is not portable. @@ -78,12 +78,15 @@ See L for other examples of fetching URLs over the web. =head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file? +Those are operations of a text editor. Perl is not a text editor. +Perl is a programming language. You have to decompose the problem into +low-level calls to read, write, open, close, and seek. + Although humans have an easy time thinking of a text file as being a -sequence of lines that operates much like a stack of playing cards -- -or punch cards -- computers usually see the text file as a sequence of -bytes. In general, there's no direct way for Perl to seek to a -particular line of a file, insert text into a file, or remove text -from a file. 
+sequence of lines that operates much like a stack of playing cards -- or +punch cards -- computers usually see the text file as a sequence of bytes. +In general, there's no direct way for Perl to seek to a particular line +of a file, insert text into a file, or remove text from a file. (There are exceptions in special circumstances. You can add or remove at the very end of the file. Another is replacing a sequence of bytes with @@ -97,7 +100,7 @@ no locking. $old = $file; $new = "$file.tmp.$$"; - $bak = "$file.bak"; + $bak = "$file.orig"; open(OLD, "< $old") or die "can't open $old: $!"; open(NEW, "> $new") or die "can't open $new: $!"; @@ -124,7 +127,7 @@ platform-specific documentation that came with your port. perl -pi -e 's/(^\s+test\s+)\d+/ $1 . ++$count /e' t/op/taint.t # form a script - local($^I, @ARGV) = ('.bak', glob("*.c")); + local($^I, @ARGV) = ('.orig', glob("*.c")); while (<>) { if ($. == 1) { print "This line should appear at the top of each file\n"; @@ -174,9 +177,9 @@ Use the C class method from the IO::File module to get a filehandle opened for reading and writing. Use this if you don't need to know the file's name. - use IO::File; + use IO::File; $fh = IO::File->new_tmpfile() - or die "Unable to make new temporary file: $!"; + or die "Unable to make new temporary file: $!"; Or you can use the C function from the POSIX module to get a filename that you then open yourself. Use this if you do need to know @@ -222,7 +225,7 @@ one process, use a counter: =head2 How can I manipulate fixed-record-length files? The most efficient way is using pack() and unpack(). This is faster than -using substr() when take many, many strings. It is slower for just a few. +using substr() when taking many, many strings. It is slower for just a few. Here is a sample chunk of code to break up and put back together again some fixed-format input lines, in this case from the output of a normal, @@ -288,7 +291,11 @@ pair to make it easy to sort the hash in insertion order. 
print "$name $. $line"; } -If you want to create many, anonymous handles, you should check out the +For passing filehandles to functions, the easiest way is to +preface them with a star, as in func(*STDIN). See L for details. + +If you want to create many anonymous handles, you should check out the Symbol, FileHandle, or IO::Handle (etc.) modules. Here's the equivalent code with Symbol::gensym, which is reasonably light-weight: @@ -299,8 +306,8 @@ code with Symbol::gensym, which is reasonably light-weight: $file{$filename} = [ $i++, $fh ]; } -Or here using the semi-object-oriented FileHandle, which certainly isn't -light-weight: +Or here using the semi-object-oriented FileHandle module, which certainly +isn't light-weight: use FileHandle; @@ -339,8 +346,8 @@ and use it as though it were a normal filehandle. Then use any of those as you would a normal filehandle. Anywhere that Perl is expecting a filehandle, an indirect filehandle may be used instead. An indirect filehandle is just a scalar variable that contains -a filehandle. Functions like C, C, C, or the functions or -the CFHE> diamond operator will accept either a read filehandle +a filehandle. Functions like C, C, C, or +the C<< >> diamond operator will accept either a read filehandle or a scalar variable containing one: ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR); @@ -348,7 +355,7 @@ or a scalar variable containing one: $got = <$ifh> print $efh "What was that: $got"; -Of you're passing a filehandle to a function, you can write +If you're passing a filehandle to a function, you can write the function in two ways: sub accept_fh { @@ -400,7 +407,7 @@ calls doesn't work for the diamond operator. That's because it's a real operator, not just a function with a comma-less argument. Assuming you've been storing typeglobs in your structure as we did above, you can use the built-in function named C to reads a record just -as CE> does. Given the initialization shown above for @fd, this +as C<< <> >> does. 
Given the initialization shown above for @fd, this would work, but only because readline() require a typeglob. It doesn't work with objects or strings, which might be a bug we haven't fixed yet. @@ -418,7 +425,7 @@ techniques to make it possible for the intrepid hacker. =head2 How can I write() into a string? -See L for an swrite() function. +See L for an swrite() function. =head2 How can I output my numbers with commas added? @@ -426,7 +433,7 @@ This one will do it for you: sub commify { local $_ = shift; - 1 while s/^(-?\d+)(\d{3})/$1,$2/; + 1 while s/^([-+]?\d+)(\d{3})/$1,$2/; return $_; } @@ -437,7 +444,7 @@ This one will do it for you: You can't just: - s/^(-?\d+)(\d{3})/$1,$2/g; + s/^([-+]?\d+)(\d{3})/$1,$2/g; because you have to put the comma in and then recalculate your position. @@ -451,12 +458,12 @@ whatever: my $input = shift; $input = reverse $input; $input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g; - return reverse $input; + return scalar reverse $input; } =head2 How can I translate tildes (~) in a filename? -Use the EE (glob()) operator, documented in L. This +Use the <> (glob()) operator, documented in L. This requires that you have a shell installed that groks tildes, meaning csh or tcsh or (some versions of) ksh, and thus may have portability problems. The Glob::KGlob module (available from CPAN) gives more @@ -484,8 +491,12 @@ I gives you read-write access: open(FH, "+> /path/name"); # WRONG (almost always) Whoops. You should instead use this, which will fail if the file -doesn't exist. Using "E" always clobbers or creates. -Using "E" never does either. The "+" doesn't change this. +doesn't exist. + + open(FH, "+< /path/name"); # open for update + +Using ">" always clobbers or creates. Using "<" never does +either. The "+" doesn't change this. Here are examples of many kinds of file opens. Those using sysopen() all assume @@ -543,17 +554,20 @@ be an atomic operation over NFS. That is, two processes might both successful create or unlink the same file! 
Therefore O_EXCL isn't so exclusive as you might wish. +See also the new L if you have it (new for 5.6). + =head2 Why do I sometimes get an "Argument list too long" when I use <*>? -The CE> operator performs a globbing operation (see above). -By default glob() forks csh(1) to do the actual glob expansion, but +The C<< <> >> operator performs a globbing operation (see above). +In Perl versions earlier than v5.6.0, the internal glob() operator forks +csh(1) to do the actual glob expansion, but csh can't handle more than 127 items and so gives the error message C. People who installed tcsh as csh won't have this problem, but their users may be surprised by it. -To get around this, either do the glob yourself with Cs and -patterns, or use a module like Glob::KGlob, one that doesn't use the -shell to do globbing. +To get around this, either upgrade to Perl v5.6.0 or later, do the glob +yourself with readdir() and patterns, or use a module like Glob::KGlob, +one that doesn't use the shell to do globbing. =head2 Is there a leak/bug in glob()? @@ -562,7 +576,7 @@ use the glob() function or its angle-bracket alias in a scalar context, you may cause a leak and/or unpredictable behavior. It's best therefore to use glob() only in list context. -=head2 How can I open a file with a leading "E" or trailing blanks? +=head2 How can I open a file with a leading ">" or trailing blanks? Normally perl ignores trailing blanks in filenames, and interprets certain leading characters (or a trailing "|") to mean something @@ -572,22 +586,39 @@ trailing null byte on the name to make perl leave it alone: sub safe_filename { local $_ = shift; - return m#^/# - ? "$_\0" - : "./$_\0"; + s#^([^./])#./$1#; + $_ .= "\0"; + return $_; } - $fn = safe_filename("<< $fn") or "couldn't open $fn: $!"; + $badpath = "<< $fn") or "couldn't open $badpath: $!"; -You could also use the sysopen() function (see L). +This assumes that you are using POSIX (portable operating systems +interface) paths. 
If you are on a closed, non-portable, proprietary +system, you may have to adjust the C<"./"> above. + +It would be a lot clearer to use sysopen(), though: + + use Fcntl; + $badpath = "<< if you have it +(new for 5.6). =head2 How can I reliably rename a file? -Well, usually you just use Perl's rename() function. But that may -not work everywhere, in particular, renaming files across file systems. -If your operating system supports a mv(1) program or its moral equivalent, -this works: +Well, usually you just use Perl's rename() function. But that may not +work everywhere, in particular, renaming files across file systems. +Some sub-Unix systems have broken ports that corrupt the semantics of +rename() -- for example, WinNT does this right, but Win95 and Win98 +are broken. (The last two parts are not surprising, but the first is. :-) + +If your operating system supports a proper mv(1) program or its moral +equivalent, this works: rename($old, $new) or system("mv", $old, $new); @@ -597,7 +628,7 @@ then delete the old one. This isn't really the same semantics as a real rename(), though, which preserves metainformation like permissions, timestamps, inode info, etc. -The newer version of File::Copy export a move() function. +The newer version of File::Copy exports a move() function. =head2 How can I lock a file? @@ -621,15 +652,32 @@ filehandle be open for writing (or appending, or read/writing). =item 3 -Some versions of flock() can't lock files over a network (e.g. on NFS -file systems), so you'd need to force the use of fcntl(2) when you -build Perl. See the flock entry of L, and the F -file in the source distribution for information on building Perl to do -this. +Some versions of flock() can't lock files over a network (e.g. on NFS file +systems), so you'd need to force the use of fcntl(2) when you build Perl. +But even this is dubious at best. See the flock entry of L, +and the F file in the source distribution for information on +building Perl to do this. 
+ +Two potentially non-obvious but traditional flock semantics are that +it waits indefinitely until the lock is granted, and that its locks +I. Such discretionary locks are more flexible, but +offer fewer guarantees. This means that files locked with flock() may +be modified by programs that do not also use flock(). Cars that stop +for red lights get on well with each other, but not with cars that don't +stop for red lights. See the perlport manpage, your port's specific +documentation, or your system-specific local manpages for details. It's +best to assume traditional behavior if you're writing portable programs. +(But if you're not, you should as always feel perfectly free to write +for your own system's idiosyncrasies (sometimes called "features"). +Slavish adherence to portability concerns shouldn't get in the way of +your getting your job done.) + +For more information on file locking, see also L if you have it (new for 5.6). =back -=head2 What can't I just open(FH, ">file.lock")? +=head2 Why can't I just open(FH, ">file.lock")? A common bit of code B is this: @@ -645,7 +693,7 @@ atomic test-and-set instruction. In theory, this "ought" to work: except that lamentably, file creation (and deletion) is not atomic over NFS, so this won't work (at least, not every time) over the net. -Various schemes involving involving link() have been suggested, but +Various schemes involving link() have been suggested, but these tend to involve busy-wait, which is also subdesirable. =head2 I still don't get locking. I just want to increment the number in the file. How can I do this? @@ -657,14 +705,15 @@ It's more realistic. Anyway, this is what you can do if you can't help yourself. 
- use Fcntl; + use Fcntl ':flock'; sysopen(FH, "numfile", O_RDWR|O_CREAT) or die "can't open numfile: $!"; - flock(FH, 2) or die "can't flock numfile: $!"; + flock(FH, LOCK_EX) or die "can't flock numfile: $!"; $num = <FH> || 0; seek(FH, 0, 0) or die "can't rewind numfile: $!"; truncate(FH, 0) or die "can't truncate numfile: $!"; (print FH $num+1, "\n") or die "can't write numfile: $!"; - # DO NOT UNLOCK THIS UNTIL YOU CLOSE + # Perl as of 5.004 automatically flushes before unlocking + flock(FH, LOCK_UN) or die "can't flock numfile: $!"; close FH or die "can't close numfile: $!"; Here's a much better web-page hit counter: @@ -689,7 +738,7 @@ like this: seek(FH, $recno * $RECSIZE, 0); read(FH, $record, $RECSIZE) == $RECSIZE || die "can't read record $recno: $!"; # munge the record - seek(FH, $recno * $RECSIZE, 0); + seek(FH, -$RECSIZE, 1); print FH $record; close FH; @@ -710,17 +759,21 @@ into human-readable form. Here's an example: $write_secs = (stat($file))[9]; - print "file $file updated at ", scalar(localtime($file)), "\n"; + printf "file %s updated at %s\n", $file, + scalar localtime($write_secs); If you prefer something more legible, use the File::stat module (part of the standard distribution in version 5.004 and later): + # error checking left as an exercise for reader. use File::stat; use Time::localtime; $date_string = ctime(stat($file)->mtime); print "file $file updated at $date_string\n"; -Error checking is left as an exercise for the reader. +The POSIX::strftime() approach has the benefit of being, +in theory, independent of the current locale. See L +for details. =head2 How do I set a file's timestamp in perl? @@ -736,7 +789,7 @@ of them. ($atime, $mtime) = (stat($timestamp))[8,9]; utime $atime, $mtime, @ARGV; -Error checking is left as an exercise for the reader. +Error checking is, as usual, left as an exercise for the reader. Note that utime() currently doesn't work correctly with Win95/NT ports. A bug has been reported. 
Check it carefully before using @@ -767,13 +820,68 @@ at http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz, which is written in Perl and offers much greater functionality than the stock version. +=head2 How can I read in an entire file all at once? + +The customary Perl approach for processing all the lines in a file is to +do so one line at a time: + + open (INPUT, $file) || die "can't open $file: $!"; + while (<INPUT>) { + chomp; + # do something with $_ + } + close(INPUT) || die "can't close $file: $!"; + +This is tremendously more efficient than reading the entire file into +memory as an array of lines and then processing it one element at a time, +which is often -- if not almost always -- the wrong approach. Whenever +you see someone do this: + + @lines = <INPUT>; + +You should think long and hard about why you need everything loaded +at once. It's just not a scalable solution. You might also find it +more fun to use the standard DB_File module's $DB_RECNO bindings, +which allow you to tie an array to a file so that accessing an element +the array actually accesses the corresponding line in the file. + +On very rare occasion, you may have an algorithm that demands that +the entire file be in memory at once as one scalar. The simplest solution +to that is: + + $var = `cat $file`; + +Being in scalar context, you get the whole thing. In list context, +you'd get a list of all the lines: + + @lines = `cat $file`; + +This tiny but expedient solution is neat, clean, and portable to +all systems on which decent tools have been installed. For those +who prefer not to use the toolbox, you can of course read the file +manually, although this makes for more complicated code. + + { + local(*INPUT, $/); + open (INPUT, $file) || die "can't open $file: $!"; + $var = <INPUT>; + } + +That temporarily undefs your record separator, and will automatically +close the file at block exit. 
If the file is already open, just use this: + + $var = do { local $/; <INPUT> }; + =head2 How can I read in a file by paragraphs? -Use the C<$\> variable (see L for details). You can either +Use the C<$/> variable (see L for details). You can either set it to C<""> to eliminate empty paragraphs (C<"abc\n\n\n\ndef">, for instance, gets treated as two paragraphs and not three), or C<"\n\n"> to accept empty paragraphs. +Note that a blank line must have no blanks in it. Thus C<"fred\n +\nstuff\n\n"> is one paragraph, but C<"fred\n\nstuff\n\n"> is two. + =head2 How can I read a single character from a file? From the keyboard? You can use the builtin C function for most filehandles, but @@ -781,8 +889,9 @@ it won't (easily) work on a terminal device. For STDIN, either use the Term::ReadKey module from CPAN, or use the sample code in L. -If your system supports POSIX, you can use the following code, which -you'll note turns off echo processing as well. +If your system supports the portable operating system programming +interface (POSIX), you can use the following code, which you'll note +turns off echo processing as well. #!/usr/bin/perl -w use strict; @@ -833,7 +942,8 @@ you'll note turns off echo processing as well. END { cooked() } -The Term::ReadKey module from CPAN may be easier to use: +The Term::ReadKey module from CPAN may be easier to use. Recent version +include also support for non-portable systems as well. use Term::ReadKey; open(TTY, " reports the following: +For legacy DOS systems, Dan Carson reports the following: To put the PC in "raw" mode, use ioctl with some magic numbers gleaned from msdos.c (Perl source file) and Ralf Brown's interrupt list (comes @@ -890,11 +1000,12 @@ table: This is all trial and error I did a long time ago, I hope I'm reading the file that worked. -=head2 How can I tell if there's a character waiting on a filehandle? +=head2 How can I tell whether there's a character waiting on a filehandle? 
The very first thing you should do is look into getting the Term::ReadKey -extension from CPAN. It now even has limited support for closed, proprietary -(read: not open systems, not POSIX, not Unix, etc) systems. +extension from CPAN. As we mentioned earlier, it now even has limited +support for non-portable (read: not open systems, closed, proprietary, +not POSIX, not Unix, etc) systems. You should also check out the Frequently Asked Questions list in comp.unix.* for things like this: the answer is essentially the same. @@ -907,12 +1018,11 @@ systems: return $nfd = select($rin,undef,undef,0); } -If you want to find out how many characters are waiting, -there's also the FIONREAD ioctl call to be looked at. - -The I<h2ph> tool that comes with Perl tries to convert C include -files to Perl code, which can be C<require>d. FIONREAD ends -up defined as a function in the I<sys/ioctl.ph> file: +If you want to find out how many characters are waiting, there's +also the FIONREAD ioctl call to be looked at. The I<h2ph> tool that +comes with Perl tries to convert C include files to Perl code, which +can be C<require>d. FIONREAD ends up defined as a function in the +I<sys/ioctl.ph> file: require 'sys/ioctl.ph'; @@ -934,7 +1044,7 @@ Or write a small C program using the editor of champions: printf("%#08x\n", FIONREAD); } ^D - % cc -o fionread fionread + % cc -o fionread fionread.c % ./fionread 0x4004667f @@ -975,6 +1085,8 @@ the clearerr() method, which can remove the end of file condition on a filehandle. The method: read until end of file, clearerr(), read some more. Lather, rinse, repeat. +There's also a File::Tail module from CPAN. + =head2 How do I dup() a filehandle in Perl? If you check L, you'll see that several of the ways @@ -988,7 +1100,7 @@ Or even with a literal numeric descriptor: $fd = $ENV{MHCONTEXTFD}; open(MHCONTEXT, "<&=$fd"); # like fdopen(3S) -Note that "E&STDIN" makes a copy, but "E&=STDIN" make +Note that "<&STDIN" makes a copy, but "<&=STDIN" make an alias. 
That means if you close an aliased handle, all aliases become inaccessible. This is not true with a copied one. @@ -1006,6 +1118,14 @@ to, you may be able to do this: $rc = syscall(&SYS_close, $fd + 0); # must force numeric die "can't sysclose $fd: $!" unless $rc == -1; +Or just use the fdopen(3S) feature of open(): + + { + local *F; + open F, "<&=$fd" or die "Cannot reopen fd=$fd: $!"; + close F; + } + =head2 Why can't I use "C:\temp\foo" in DOS paths? What doesn't `C:\temp\foo.exe` work? Whoops! You just put a tab and a formfeed into that filename! @@ -1013,19 +1133,22 @@ Remember that within double quoted strings ("like\this"), the backslash is an escape character. The full list of these is in L. Unsurprisingly, you don't have a file called "c:(tab)emp(formfeed)oo" or -"c:(tab)emp(formfeed)oo.exe" on your DOS filesystem. +"c:(tab)emp(formfeed)oo.exe" on your legacy DOS filesystem. Either single-quote your strings, or (preferably) use forward slashes. Since all DOS and Windows versions since something like MS-DOS 2.0 or so have treated C and C<\> the same in a path, you might as well use the one that doesn't clash with Perl -- or the POSIX shell, ANSI C and C++, -awk, Tcl, Java, or Python, just to mention a few. +awk, Tcl, Java, or Python, just to mention a few. POSIX paths +are more portable, too. =head2 Why doesn't glob("*.*") get all the files? Because even on non-Unix ports, Perl's glob function follows standard Unix globbing semantics. You'll need C to get all (non-hidden) -files. This makes glob() portable. +files. This makes glob() portable even to legacy systems. Your +port may include proprietary globbing functions as well. Check its +documentation for details. =head2 Why does Perl let me delete read-only files? Why does C<-i> clobber protected files? Isn't this a bug in Perl? @@ -1052,20 +1175,41 @@ This has a significant advantage in space over reading the whole file in. 
A simple proof by induction is available upon request if you doubt its correctness. +=head2 Why do I get weird spaces when I print an array of lines? + +Saying + + print "@lines\n"; + +joins together the elements of C<@lines> with a space between them. +If C<@lines> were C<("little", "fluffy", "clouds")> then the above +statement would print: + + little fluffy clouds + +but if each element of C<@lines> was a line of text, ending a newline +character C<("little\n", "fluffy\n", "clouds\n")> then it would print: + + little + fluffy + clouds + +If your array contains lines, just print them: + + print @lines; + =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. +Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington. All rights reserved. -When included as part of the Standard Version of Perl, or as part of -its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic License. -Any distribution of this file or derivatives thereof I -of that package require that special arrangements be made with -copyright holder. - -Irrespective of its distribution, all code examples in this file -are hereby placed into the public domain. You are permitted and -encouraged to use this code in your own programs for fun -or for profit as you see fit. A simple comment in the code giving -credit would be courteous but is not required. +When included as an integrated part of the Standard Distribution +of Perl or of its documentation (printed or otherwise), this works is +covered under Perl's Artistic License. For separate distributions of +all or part of this FAQ outside of that, see L. + +Irrespective of its distribution, all code examples here are in the public +domain. You are permitted and encouraged to use this code and any +derivatives thereof in your own programs for fun or for profit as you +see fit. 
A simple comment in the code giving credit to the FAQ would +be courteous but is not required.