X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlfaq5.pod;h=83e34942c06c732ebf69bc762c831d3eecdf5706;hb=351f32542342b92f7303dec0a812c5301714120f;hp=80aad9402d9a54842b29c52f0f5cc4a16c3092ab;hpb=d2321c93212ec55efb2273c89a29b5ee3b388e1b;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod
index 80aad94..83e3494 100644
--- a/pod/perlfaq5.pod
+++ b/pod/perlfaq5.pod
@@ -1,6 +1,6 @@
 =head1 NAME
 
-perlfaq5 - Files and Formats ($Revision: 1.9 $, $Date: 2002/02/11 19:30:21 $)
+perlfaq5 - Files and Formats ($Revision: 1.18 $, $Date: 2002/05/30 07:04:25 $)
 
 =head1 DESCRIPTION
 
@@ -9,152 +9,61 @@ formats, and footers.
 
 =head2 How do I flush/unbuffer an output filehandle? Why must I do this?
 
-The C standard I/O library (stdio) normally buffers characters sent to
-devices. This is done for efficiency reasons so that there isn't a
-system call for each byte. Any time you use print() or write() in
-Perl, you go though this buffering. syswrite() circumvents stdio and
-buffering.
-
-In most stdio implementations, the type of output buffering and the size of
-the buffer varies according to the type of device. Disk files are block
-buffered, often with a buffer size of more than 2k. Pipes and sockets
-are often buffered with a buffer size between 1/2 and 2k. Serial devices
-(e.g. modems, terminals) are normally line-buffered, and stdio sends
-the entire line when it gets the newline.
-
-Perl does not support truly unbuffered output (except insofar as you can
-C). What it does instead support is "command
-buffering", in which a physical write is performed after every output
-command. This isn't as hard on your system as unbuffering, but does
-get the output where you want it when you want it.
-
-If you expect characters to get to your device when you print them there,
-you'll want to autoflush its handle.
-Use select() and the C<$|> variable to control autoflushing
-(see L and L):
+Perl does not support truly unbuffered output (except
+insofar as you can C), although it
+does support is "command buffering", in which a physical
+write is performed after every output command.
+
+The C standard I/O library (stdio) normally buffers
+characters sent to devices so that there isn't a system call
+for each byte. In most stdio implementations, the type of
+output buffering and the size of the buffer varies according
+to the type of device. Perl's print() and write() functions
+normally buffer output, while syswrite() bypasses buffering
+all together.
+
+If you want your output to be sent immediately when you
+execute print() or write() (for instance, for some network
+protocols), you must set the handle's autoflush flag. This
+flag is the Perl variable $| and when it is set to a true
+value, Perl will flush the handle's buffer after each
+print() or write(). Setting $| affects buffering only for
+the currently selected default file handle. You choose this
+handle with the one argument select() call (see
+L and L).
+
+Use select() to choose the desired handle, then set its
+per-filehandle variables.
 
     $old_fh = select(OUTPUT_HANDLE);
     $| = 1;
     select($old_fh);
 
-Or using the traditional idiom:
+Some idioms can handle this in a single statement:
 
     select((select(OUTPUT_HANDLE), $| = 1)[0]);
 
-Or if don't mind slowly loading several thousand lines of module code
-just because you're afraid of the C<$|> variable:
+    $| = 1, select $_ for select OUTPUT_HANDLE;
 
-    use FileHandle;
-    open(DEV, "+autoflush(1);
-
-or the newer IO::* modules:
+Some modules offer object-oriented access to handles and their
+variables, although they may be overkill if this is the only
+thing you do with them. You can use IO::Handle:
 
     use IO::Handle;
     open(DEV, ">/dev/printer"); # but is this?
     DEV->autoflush(1);
 
-or even this:
+or IO::Socket:
 
     use IO::Socket; # this one is kinda a pipe?
-    $sock = IO::Socket::INET->new(PeerAddr => 'www.perl.com',
-                                  PeerPort => 'http(80)',
-                                  Proto => 'tcp');
-    die "$!" unless $sock;
+    my $sock = IO::Socket::INET->new( 'www.example.com:80' ) ;
 
     $sock->autoflush();
-    print $sock "GET / HTTP/1.0" . "\015\012" x 2;
-    $document = join('', <$sock>);
-    print "DOC IS: $document\n";
-
-Note the bizarrely hard coded carriage return and newline in their octal
-equivalents. This is the ONLY way (currently) to assure a proper flush
-on all platforms, including Macintosh. That's the way things work in
-network programming: you really should specify the exact bit pattern
-on the network line terminator. In practice, C<"\n\n"> often works,
-but this is not portable.
-
-See L for other examples of fetching URLs over the web.
 
 =head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file?
 
-Those are operations of a text editor. Perl is not a text editor.
-Perl is a programming language. You have to decompose the problem into
-low-level calls to read, write, open, close, and seek.
-
-Although humans have an easy time thinking of a text file as being a
-sequence of lines that operates much like a stack of playing cards--or
-punch cards--computers usually see the text file as a sequence of bytes.
-In general, there's no direct way for Perl to seek to a particular line
-of a file, insert text into a file, or remove text from a file.
-
-(There are exceptions in special circumstances. You can add or remove
-data at the very end of the file. A sequence of bytes can be replaced
-with another sequence of the same length. The C<$DB_RECNO> array
-bindings as documented in L also provide a direct way of
-modifying a file. Files where all lines are the same length are also
-easy to alter.)
-
-The general solution is to create a temporary copy of the text file with
-the changes you want, then copy that over the original. This assumes
-no locking.
-
-    $old = $file;
-    $new = "$file.tmp.$$";
-    $bak = "$file.orig";
-
-    open(OLD, "< $old") or die "can't open $old: $!";
-    open(NEW, "> $new") or die "can't open $new: $!";
-
-    # Correct typos, preserving case
-    while () {
-        s/\b(p)earl\b/${1}erl/i;
-        (print NEW $_) or die "can't write to $new: $!";
-    }
-
-    close(OLD) or die "can't close $old: $!";
-    close(NEW) or die "can't close $new: $!";
-
-    rename($old, $bak) or die "can't rename $old to $bak: $!";
-    rename($new, $old) or die "can't rename $new to $old: $!";
-
-Perl can do this sort of thing for you automatically with the C<-i>
-command-line switch or the closely-related C<$^I> variable (see
-L for more details). Note that
-C<-i> may require a suffix on some non-Unix systems; see the
-platform-specific documentation that came with your port.
-
-    # Renumber a series of tests from the command line
-    perl -pi -e 's/(^\s+test\s+)\d+/ $1 . ++$count /e' t/op/taint.t
-
-    # form a script
-    local($^I, @ARGV) = ('.orig', glob("*.c"));
-    while (<>) {
-        if ($. == 1) {
-            print "This line should appear at the top of each file\n";
-        }
-        s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case
-        print;
-        close ARGV if eof; # Reset $.
-    }
-
-If you need to seek to an arbitrary line of a file that changes
-infrequently, you could build up an index of byte positions of where
-the line ends are in the file. If the file is large, an index of
-every tenth or hundredth line end would allow you to seek and read
-fairly efficiently. If the file is sorted, try the look.pl library
-(part of the standard perl distribution).
-
-In the unique case of deleting lines at the end of a file, you
-can use tell() and truncate(). The following code snippet deletes
-the last line of a file without making a copy or reading the
-whole file into memory:
-
-    open (FH, "+< $file");
-    while ( ) { $addr = tell(FH) unless eof(FH) }
-    truncate(FH, $addr);
-
-Error checking is left as an exercise for the reader.
+Use the Tie::File module, which is included in the standard
+distribution since Perl 5.8.0.
 
 =head2 How do I count the number of lines in a file?
 
@@ -247,74 +156,31 @@ well. It also only works on global variables, not lexicals.
 
 =head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles?
 
-The fastest, simplest, and most direct way is to localize the typeglob
-of the filehandle in question:
+As of perl5.6, open() autovivifies file and directory handles
+as references if you pass it an uninitialized scalar variable.
+You can then pass these references just like any other scalar,
+and use them in the place of named handles.
 
-    local *TmpHandle;
+    open my $fh, $file_name;
 
-Typeglobs are fast (especially compared with the alternatives) and
-reasonably easy to use, but they also have one subtle drawback. If you
-had, for example, a function named TmpHandle(), or a variable named
-%TmpHandle, you just hid it from yourself.
+    open local $fh, $file_name;
 
-    sub findme {
-        local *HostFile;
-        open(HostFile, ") {
-            print if /\b127\.(0\.0\.)?1\b/;
-        }
-        # *HostFile automatically closes/disappears here
-    }
+    print $fh "Hello World!\n";
 
-Here's how to use typeglobs in a loop to open and store a bunch of
-filehandles. We'll use as values of the hash an ordered
-pair to make it easy to sort the hash in insertion order.
+    process_file( $fh );
 
-    @names = qw(motd termcap passwd hosts);
-    my $i = 0;
-    foreach $filename (@names) {
-        local *FH;
-        open(FH, "/etc/$filename") || die "$filename: $!";
-        $file{$filename} = [ $i++, *FH ];
-    }
-
-    # Using the filehandles in the array
-    foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) {
-        my $fh = $file{$name}[1];
-        my $line = <$fh>;
-        print "$name $. $line";
-    }
+Before perl5.6, you had to deal with various typeglob idioms
+which you may see in older code.
 
-For passing filehandles to functions, the easiest way is to
-preface them with a star, as in func(*STDIN).
-See L for details.
+    open FILE, "> $filename";
+    process_typeglob( *FILE );
+    process_reference( \*FILE );
 
-If you want to create many anonymous handles, you should check out the
-Symbol, FileHandle, or IO::Handle (etc.) modules. Here's the equivalent
-code with Symbol::gensym, which is reasonably light-weight:
-
-    foreach $filename (@names) {
-        use Symbol;
-        my $fh = gensym();
-        open($fh, "/etc/$filename") || die "open /etc/$filename: $!";
-        $file{$filename} = [ $i++, $fh ];
-    }
+    sub process_typeglob { local *FH = shift; print FH "Typeglob!" }
+    sub process_reference { local $fh = shift; print $fh "Reference!" }
 
-Here's using the semi-object-oriented FileHandle module, which certainly
-isn't light-weight:
-
-    use FileHandle;
-
-    foreach $filename (@names) {
-        my $fh = FileHandle->new("/etc/$filename") or die "$filename: $!";
-        $file{$filename} = [ $i++, $fh ];
-    }
-
-Please understand that whether the filehandle happens to be a (probably
-localized) typeglob or an anonymous handle from one of the modules
-in no way affects the bizarre rules for managing indirect handles.
-See the next question.
+If you want to create many anonymous handles, you should
+check out the Symbol or IO::Handle modules.
 
 =head2 How can I use a filehandle indirectly?
 
@@ -328,13 +194,10 @@ to get indirect filehandles:
     $fh = \*SOME_FH; # ref to typeglob (bless-able)
     $fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob
 
-Or, you can use the C method from the FileHandle or IO modules to
+Or, you can use the C method from one of the IO::* modules to
 create an anonymous filehandle, store that in a scalar variable,
 and use it as though it were a normal filehandle.
 
-    use FileHandle;
-    $fh = FileHandle->new();
-
     use IO::Handle; # 5.004 or higher
     $fh = IO::Handle->new();
 
@@ -342,7 +205,7 @@ Then use any of those as you would a normal filehandle.
 Anywhere that Perl is expecting a filehandle, an indirect filehandle
 may be used instead. An indirect filehandle is just a scalar variable
 that contains a filehandle. Functions like C, C, C, or
-the C<< >> diamond operator will accept either a read filehandle
+the C<< >> diamond operator will accept either a named filehandle
 or a scalar variable containing one:
 
     ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
@@ -402,9 +265,9 @@ This approach of treating C and C like object methods calls
 doesn't work for the diamond operator. That's because it's a
 real operator, not just a function with a comma-less argument. Assuming
 you've been storing typeglobs in your structure as we did above, you
-can use the built-in function named C to reads a record just
+can use the built-in function named C to read a record just
 as C<< <> >> does. Given the initialization shown above for @fd, this
-would work, but only because readline() require a typeglob. It doesn't
+would work, but only because readline() requires a typeglob. It doesn't
 work with objects or strings, which might be a bug we haven't fixed yet.
 
     $got = readline($fd[0]);
@@ -425,37 +288,23 @@ See L for an swrite() function.
 
 =head2 How can I output my numbers with commas added?
 
-This one will do it for you:
-
-    sub commify {
-        my $number = shift;
-        1 while ($number =~ s/^([-+]?\d+)(\d{3})/$1,$2/);
-        return $number;
-    }
-
-    $n = 23659019423.2331;
-    print "GOT: ", commify($n), "\n";
-
-    GOT: 23,659,019,423.2331
-
-You can't just:
-
-    s/^([-+]?\d+)(\d{3})/$1,$2/g;
+This one from Benjamin Goldberg will do it for you:
 
-because you have to put the comma in and then recalculate your
-position.
+    s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
 
-Alternatively, this code commifies all numbers in a line regardless of
-whether they have decimal portions, are preceded by + or -, or
-whatever:
+or written verbosely:
 
-    # from Andrew Johnson
-    sub commify {
-        my $input = shift;
-        $input = reverse $input;
-        $input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;
-        return scalar reverse $input;
-    }
+    s/(
+        ^[-+]? # beginning of number.
+        \d{1,3}? # first digits before first comma
+        (?= # followed by, (but not included in the match) :
+            (?>(?:\d{3})+) # some positive multiple of three digits.
+            (?!\d) # an *exact* multiple, not x * 3 + 1 or whatever.
+        )
+        | # or:
+        \G\d{3} # after the last group, get three digits
+        (?=\d) # but they have to have more digits after them.
+    )/$1,/xg;
 
 =head2 How can I translate tildes (~) in a filename?
 
@@ -576,35 +425,23 @@ best therefore to use glob() only in list context.
 
 Normally perl ignores trailing blanks in filenames, and interprets
 certain leading characters (or a trailing "|") to mean something
-special. To avoid this, you might want to use a routine like the one below.
-It turns incomplete pathnames into explicit relative ones, and tacks a
-trailing null byte on the name to make perl leave it alone:
-
-    sub safe_filename {
-        local $_ = shift;
-        s#^([^./])#./$1#;
-        $_ .= "\0";
-        return $_;
-    }
+special.
 
-    $badpath = "<< $fn") or "couldn't open $badpath: $!";
+The three argument form of open() lets you specify the mode
+separately from the filename. The open() function treats
+special mode characters and whitespace in the filename as
+literals
 
-This assumes that you are using POSIX (portable operating systems
-interface) paths. If you are on a closed, non-portable, proprietary
-system, you may have to adjust the C<"./"> above.
+    open FILE, "<", " file "; # filename is " file "
+    open FILE, ">", ">file"; # filename is ">file"
 
-It would be a lot clearer to use sysopen(), though:
+It may be a lot clearer to use sysopen(), though:
 
     use Fcntl;
     $badpath = "<< if you have it
-(new for 5.6).
-
 =head2 How can I reliably rename a file?
 
 If your operating system supports a proper mv(1) utility or its functional
@@ -763,14 +600,17 @@ Don't forget them or you'll be quite sorry.
 
 =head2 How do I get a file's timestamp in perl?
 
-If you want to retrieve the time at which the file was last read,
-written, or had its meta-data (owner, etc) changed, you use the B<-M>,
-B<-A>, or B<-C> file test operations as documented in L. These
-retrieve the age of the file (measured against the start-time of your
-program) in days as a floating point number. To retrieve the "raw"
-time in seconds since the epoch, you would call the stat function,
-then use localtime(), gmtime(), or POSIX::strftime() to convert this
-into human-readable form.
+If you want to retrieve the time at which the file was last
+read, written, or had its meta-data (owner, etc) changed,
+you use the B<-M>, B<-A>, or B<-C> file test operations as
+documented in L. These retrieve the age of the
+file (measured against the start-time of your program) in
+days as a floating point number. Some platforms may not have
+all of these times. See L for details. To
+retrieve the "raw" time in seconds since the epoch, you
+would call the stat function, then use localtime(),
+gmtime(), or POSIX::strftime() to convert this into
+human-readable form.
 
 Here's an example:
 
@@ -855,27 +695,14 @@ you see someone do this:
 
     @lines = ;
 
-you should think long and hard about why you need everything loaded
-at once. It's just not a scalable solution. You might also find it
-more fun to use the standard DB_File module's $DB_RECNO bindings,
-which allow you to tie an array to a file so that accessing an element
-the array actually accesses the corresponding line in the file.
-
-On very rare occasion, you may have an algorithm that demands that
-the entire file be in memory at once as one scalar. The simplest solution
-to that is
-
-    $var = `cat $file`;
+you should think long and hard about why you need everything loaded at
+once. It's just not a scalable solution. You might also find it more
+fun to use the standard Tie::File module, or the DB_File module's
+$DB_RECNO bindings, which allow you to tie an array to a file so that
+accessing an element the array actually accesses the corresponding
+line in the file.
 
-Being in scalar context, you get the whole thing. In list context,
-you'd get a list of all the lines:
-
-    @lines = `cat $file`;
-
-This tiny but expedient solution is neat, clean, and portable to
-all systems on which decent tools have been installed. For those
-who prefer not to use the toolbox, you can of course read the file
-manually, although this makes for more complicated code.
+You can read the entire filehandle contents into a scalar.
 
     {
         local(*INPUT, $/);
@@ -888,6 +715,13 @@ close the file at block exit. If the file is already open, just use this:
 
     $var = do { local $/; };
 
+For ordinary files you can also use the read function.
+
+    read( INPUT, $var, -s INPUT );
+
+The third argument tests the byte size of the data on the INPUT filehandle
+and reads that many bytes into the buffer $var.
+
 =head2 How can I read in a file by paragraphs?
 
 Use the C<$/> variable (see L for details). You can either
@@ -1096,7 +930,7 @@ Or, just use the fdopen(3S) feature of open():
         close F;
     }
 
-=head2 Why can't I use "C:\temp\foo" in DOS paths? What doesn't `C:\temp\foo.exe` work?
+=head2 Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work?
 
 Whoops! You just put a tab and a formfeed into that filename!
 Remember that within double quoted strings ("like\this"), the