X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlfaq5.pod;h=31db204129115cf2b4b8a5da0ec9d100f0424d17;hb=af9e49b40a4cc2d6c0d5ebad7e84fb62143b24e1;hp=8c4aa2f0c9274937a93c55eacf9e2d662d28a9a7;hpb=b3b08c80675caa159d2328f3eb67ef1f9440082b;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod
index 8c4aa2f..31db204 100644
--- a/pod/perlfaq5.pod
+++ b/pod/perlfaq5.pod
@@ -1,6 +1,6 @@
 =head1 NAME
 
-perlfaq5 - Files and Formats ($Revision: 1.9 $, $Date: 2002/02/11 19:30:21 $)
+perlfaq5 - Files and Formats ($Revision: 1.38 $, $Date: 2005/10/13 19:49:13 $)
 
 =head1 DESCRIPTION
 
@@ -8,159 +8,67 @@ This section deals with I/O and the "f" issues: filehandles, flushing,
 formats, and footers.
 
 =head2 How do I flush/unbuffer an output filehandle? Why must I do this?
-
-The C standard I/O library (stdio) normally buffers characters sent to
-devices. This is done for efficiency reasons so that there isn't a
-system call for each byte. Any time you use print() or write() in
-Perl, you go though this buffering. syswrite() circumvents stdio and
-buffering.
-
-In most stdio implementations, the type of output buffering and the size of
-the buffer varies according to the type of device. Disk files are block
-buffered, often with a buffer size of more than 2k. Pipes and sockets
-are often buffered with a buffer size between 1/2 and 2k. Serial devices
-(e.g. modems, terminals) are normally line-buffered, and stdio sends
-the entire line when it gets the newline.
-
-Perl does not support truly unbuffered output (except insofar as you can
-C). What it does instead support is "command
-buffering", in which a physical write is performed after every output
-command. This isn't as hard on your system as unbuffering, but does
-get the output where you want it when you want it.
-
-If you expect characters to get to your device when you print them there,
-you'll want to autoflush its handle.
-Use select() and the C<$|> variable to control autoflushing
-(see L and L):
+X X X X
+
+Perl does not support truly unbuffered output (except
+insofar as you can C), although it
+does support is "command buffering", in which a physical
+write is performed after every output command.
+
+The C standard I/O library (stdio) normally buffers
+characters sent to devices so that there isn't a system call
+for each byte. In most stdio implementations, the type of
+output buffering and the size of the buffer varies according
+to the type of device. Perl's print() and write() functions
+normally buffer output, while syswrite() bypasses buffering
+all together.
+
+If you want your output to be sent immediately when you
+execute print() or write() (for instance, for some network
+protocols), you must set the handle's autoflush flag. This
+flag is the Perl variable $| and when it is set to a true
+value, Perl will flush the handle's buffer after each
+print() or write(). Setting $| affects buffering only for
+the currently selected default file handle. You choose this
+handle with the one argument select() call (see
+L> and L).
+
+Use select() to choose the desired handle, then set its
+per-filehandle variables.
 
     $old_fh = select(OUTPUT_HANDLE);
     $| = 1;
     select($old_fh);
 
-Or using the traditional idiom:
+Some idioms can handle this in a single statement:
 
     select((select(OUTPUT_HANDLE), $| = 1)[0]);
 
-Or if don't mind slowly loading several thousand lines of module code
-just because you're afraid of the C<$|> variable:
-
-    use FileHandle;
-    open(DEV, "+autoflush(1);
+    $| = 1, select $_ for select OUTPUT_HANDLE;
 
-or the newer IO::* modules:
+Some modules offer object-oriented access to handles and their
+variables, although they may be overkill if this is the only
+thing you do with them. You can use IO::Handle:
 
     use IO::Handle;
     open(DEV, ">/dev/printer"); # but is this?
     DEV->autoflush(1);
 
-or even this:
+or IO::Socket:
 
     use IO::Socket; # this one is kinda a pipe?
-    $sock = IO::Socket::INET->new(PeerAddr => 'www.perl.com',
-                                  PeerPort => 'http(80)',
-                                  Proto => 'tcp');
-    die "$!" unless $sock;
+    my $sock = IO::Socket::INET->new( 'www.example.com:80' ) ;
 
     $sock->autoflush();
-    print $sock "GET / HTTP/1.0" . "\015\012" x 2;
-    $document = join('', <$sock>);
-    print "DOC IS: $document\n";
-
-Note the bizarrely hard coded carriage return and newline in their octal
-equivalents. This is the ONLY way (currently) to assure a proper flush
-on all platforms, including Macintosh. That's the way things work in
-network programming: you really should specify the exact bit pattern
-on the network line terminator. In practice, C<"\n\n"> often works,
-but this is not portable.
-
-See L for other examples of fetching URLs over the web.
 
 =head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file?
+X
 
-The short answer is to use the Tie::File module, which is included
-in the standard distribution since Perl 5.8.0.
-
-The long answer is that those are operations of a text editor. Perl
-is not a text editor. Perl is a programming language. You have to
-decompose the problem into low-level calls to read, write, open,
-close, and seek.
-
-Although humans have an easy time thinking of a text file as being a
-sequence of lines that operates much like a stack of playing cards--or
-punch cards--computers usually see the text file as a sequence of bytes.
-In general, there's no direct way for Perl to seek to a particular line
-of a file, insert text into a file, or remove text from a file.
-
-(There are exceptions in special circumstances. You can add or remove
-data at the very end of the file. A sequence of bytes can be replaced
-with another sequence of the same length. The C<$DB_RECNO> array
-bindings as documented in L also provide a direct way of
-modifying a file. Files where all lines are the same length are also
-easy to alter.)
-
-The general solution is to create a temporary copy of the text file with
-the changes you want, then copy that over the original. This assumes
-no locking.
-
-    $old = $file;
-    $new = "$file.tmp.$$";
-    $bak = "$file.orig";
-
-    open(OLD, "< $old") or die "can't open $old: $!";
-    open(NEW, "> $new") or die "can't open $new: $!";
-
-    # Correct typos, preserving case
-    while () {
-        s/\b(p)earl\b/${1}erl/i;
-        (print NEW $_) or die "can't write to $new: $!";
-    }
-
-    close(OLD) or die "can't close $old: $!";
-    close(NEW) or die "can't close $new: $!";
-
-    rename($old, $bak) or die "can't rename $old to $bak: $!";
-    rename($new, $old) or die "can't rename $new to $old: $!";
-
-Perl can do this sort of thing for you automatically with the C<-i>
-command-line switch or the closely-related C<$^I> variable (see
-L for more details). Note that
-C<-i> may require a suffix on some non-Unix systems; see the
-platform-specific documentation that came with your port.
-
-    # Renumber a series of tests from the command line
-    perl -pi -e 's/(^\s+test\s+)\d+/ $1 . ++$count /e' t/op/taint.t
-
-    # form a script
-    local($^I, @ARGV) = ('.orig', glob("*.c"));
-    while (<>) {
-        if ($. == 1) {
-            print "This line should appear at the top of each file\n";
-        }
-        s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case
-        print;
-        close ARGV if eof; # Reset $.
-    }
-
-If you need to seek to an arbitrary line of a file that changes
-infrequently, you could build up an index of byte positions of where
-the line ends are in the file. If the file is large, an index of
-every tenth or hundredth line end would allow you to seek and read
-fairly efficiently. If the file is sorted, try the look.pl library
-(part of the standard perl distribution).
-
-In the unique case of deleting lines at the end of a file, you
-can use tell() and truncate().
-The following code snippet deletes
-the last line of a file without making a copy or reading the
-whole file into memory:
-
-    open (FH, "+< $file");
-    while ( ) { $addr = tell(FH) unless eof(FH) }
-    truncate(FH, $addr);
-
-Error checking is left as an exercise for the reader.
+Use the Tie::File module, which is included in the standard
+distribution since Perl 5.8.0.
 
 =head2 How do I count the number of lines in a file?
+X X X
 
 One fairly efficient way is to count newlines in the file. The
 following program uses a feature of tr///, as documented in L.
@@ -176,11 +84,61 @@ proper text file, so this may report one fewer line than you expect.
 
 This assumes no funny games with newline translations.
 
+=head2 How can I use Perl's C<-i> option from within a program?
+X<-i> X
+
+C<-i> sets the value of Perl's C<$^I> variable, which in turn affects
+the behavior of C<< <> >>; see L for more details. By
+modifying the appropriate variables directly, you can get the same
+behavior within a larger program. For example:
+
+    # ...
+    {
+        local($^I, @ARGV) = ('.orig', glob("*.c"));
+        while (<>) {
+            if ($. == 1) {
+                print "This line should appear at the top of each file\n";
+            }
+            s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case
+            print;
+            close ARGV if eof; # Reset $.
+        }
+    }
+    # $^I and @ARGV return to their old values here
+
+This block modifies all the C<.c> files in the current directory,
+leaving a backup of the original data from each file in a new
+C<.c.orig> file.
+
+=head2 How can I copy a file?
+X X
+
+(contributed by brian d foy)
+
+Use the File::Copy module. It comes with Perl and can do a
+true copy across file systems, and it does its magic in
+a portable fashion.
+
+    use File::Copy;
+
+    copy( $original, $new_copy ) or die "Copy failed: $!";
+
+If you can't use File::Copy, you'll have to do the work yourself:
+open the original file, open the destination file, then print
+to the destination file as you read the original.
+
 =head2 How do I make a temporary file name?
+X
+
+If you don't need to know the name of the file, you can use C
+with C in place of the file name. The C function
+creates an anonymous temporary file.
 
-Use the File::Temp module, see L for more information.
+    open my $tmp, '+>', undef or die $!;
 
-    use File::Temp qw/ tempfile tempdir /;
+Otherwise, you can use the File::Temp module.
+
+    use File::Temp qw/ tempfile tempdir /;
 
     $dir = tempdir( CLEANUP => 1 );
     ($fh, $filename) = tempfile( DIR => $dir );
@@ -211,6 +169,7 @@ temporary files in one process, use a counter:
             my $count = 0;
             until (defined(fileno(FH)) || $count++ > 100) {
                 $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
+                # O_EXCL is required for security reasons.
                 sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT);
             }
             if (defined(fileno(FH))
@@ -222,9 +181,12 @@ temporary files in one process, use a counter:
     }
 
 =head2 How can I manipulate fixed-record-length files?
+X X
 
-The most efficient way is using pack() and unpack(). This is faster than
-using substr() when taking many, many strings. It is slower for just a few.
+The most efficient way is using L and
+L. This is faster than using
+L when taking many, many strings. It is
+slower for just a few.
 
 Here is a sample chunk of code to break up and put back together again
 some fixed-format input lines, in this case from the output of a normal,
@@ -232,95 +194,55 @@ Berkeley-style ps:
 
     # sample input line:
     # 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what
-    $PS_T = 'A6 A4 A7 A5 A*';
-    open(PS, "ps|");
-    print scalar ;
-    while () {
-        ($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_);
-        for $var (qw!pid tt stat time command!)
-        {
-            print "$var: <$$var>\n";
+    my $PS_T = 'A6 A4 A7 A5 A*';
+    open my $ps, '-|', 'ps';
+    print scalar <$ps>;
+    my @fields = qw( pid tt stat time command );
+    while (<$ps>) {
+        my %process;
+        @process{@fields} = unpack($PS_T, $_);
+        for my $field ( @fields ) {
+            print "$field: <$process{$field}>\n";
         }
-        print 'line=', pack($PS_T, $pid, $tt, $stat, $time, $command),
-            "\n";
+        print 'line=', pack($PS_T, @process{@fields} ), "\n";
     }
 
-We've used C<$$var> in a way that forbidden by C.
-That is, we've promoted a string to a scalar variable reference using
-symbolic references. This is okay in small programs, but doesn't scale
-well. It also only works on global variables, not lexicals.
+We've used a hash slice in order to easily handle the fields of each row.
+Storing the keys in an array means it's easy to operate on them as a
+group or loop over them with for. It also avoids polluting the program
+with global variables and using symbolic references.
 
 =head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles?
+X X X
 
-The fastest, simplest, and most direct way is to localize the typeglob
-of the filehandle in question:
-
-    local *TmpHandle;
-
-Typeglobs are fast (especially compared with the alternatives) and
-reasonably easy to use, but they also have one subtle drawback. If you
-had, for example, a function named TmpHandle(), or a variable named
-%TmpHandle, you just hid it from yourself.
-
-    sub findme {
-        local *HostFile;
-        open(HostFile, ") {
-            print if /\b127\.(0\.0\.)?1\b/;
-        }
-        # *HostFile automatically closes/disappears here
-    }
-
-Here's how to use typeglobs in a loop to open and store a bunch of
-filehandles. We'll use as values of the hash an ordered
-pair to make it easy to sort the hash in insertion order.
+As of perl5.6, open() autovivifies file and directory handles
+as references if you pass it an uninitialized scalar variable.
+You can then pass these references just like any other scalar,
+and use them in the place of named handles.
 
-    @names = qw(motd termcap passwd hosts);
-    my $i = 0;
-    foreach $filename (@names) {
-        local *FH;
-        open(FH, "/etc/$filename") || die "$filename: $!";
-        $file{$filename} = [ $i++, *FH ];
-    }
+    open my $fh, $file_name;
 
-    # Using the filehandles in the array
-    foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) {
-        my $fh = $file{$name}[1];
-        my $line = <$fh>;
-        print "$name $. $line";
-    }
+    open local $fh, $file_name;
 
-For passing filehandles to functions, the easiest way is to
-preface them with a star, as in func(*STDIN).
-See L for details.
+    print $fh "Hello World!\n";
 
-If you want to create many anonymous handles, you should check out the
-Symbol, FileHandle, or IO::Handle (etc.) modules. Here's the equivalent
-code with Symbol::gensym, which is reasonably light-weight:
+    process_file( $fh );
 
-    foreach $filename (@names) {
-        use Symbol;
-        my $fh = gensym();
-        open($fh, "/etc/$filename") || die "open /etc/$filename: $!";
-        $file{$filename} = [ $i++, $fh ];
-    }
+Before perl5.6, you had to deal with various typeglob idioms
+which you may see in older code.
 
-Here's using the semi-object-oriented FileHandle module, which certainly
-isn't light-weight:
+    open FILE, "> $filename";
+    process_typeglob( *FILE );
+    process_reference( \*FILE );
 
-    use FileHandle;
+    sub process_typeglob { local *FH = shift; print FH "Typeglob!" }
+    sub process_reference { local $fh = shift; print $fh "Reference!" }
 
-    foreach $filename (@names) {
-        my $fh = FileHandle->new("/etc/$filename") or die "$filename: $!";
-        $file{$filename} = [ $i++, $fh ];
-    }
-
-Please understand that whether the filehandle happens to be a (probably
-localized) typeglob or an anonymous handle from one of the modules
-in no way affects the bizarre rules for managing indirect handles.
-See the next question.
+If you want to create many anonymous handles, you should
+check out the Symbol or IO::Handle modules.
 
 =head2 How can I use a filehandle indirectly?
+X
 
 An indirect filehandle is using something other than a symbol
 in a place that a filehandle is expected. Here are ways
@@ -332,13 +254,10 @@ to get indirect filehandles:
     $fh = \*SOME_FH; # ref to typeglob (bless-able)
     $fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob
 
-Or, you can use the C method from the FileHandle or IO modules to
+Or, you can use the C method from one of the IO::* modules to
 create an anonymous filehandle, store that in a scalar variable,
 and use it as though it were a normal filehandle.
 
-    use FileHandle;
-    $fh = FileHandle->new();
-
     use IO::Handle; # 5.004 or higher
     $fh = IO::Handle->new();
 
@@ -346,7 +265,7 @@ Then use any of those as you would a normal filehandle. Anywhere that
 Perl is expecting a filehandle, an indirect filehandle may be used
 instead. An indirect filehandle is just a scalar variable that contains
 a filehandle. Functions like C, C, C, or
-the C<< >> diamond operator will accept either a read filehandle
+the C<< >> diamond operator will accept either a named filehandle
 or a scalar variable containing one:
 
     ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
@@ -398,17 +317,17 @@ an expression where you would place the filehandle:
 That block is a proper block like any other, so you can put more
 complicated code there. This sends the message out to one of two places:
 
-    $ok = -x "/bin/cat";
+    $ok = -x "/bin/cat";
     print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
-    print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
+    print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
 
 This approach of treating C and C like object methods calls
 doesn't work for the diamond operator. That's because it's a
 real operator, not just a function with a comma-less argument.
 Assuming you've been storing typeglobs in your structure as we did above, you
-can use the built-in function named C to reads a record just
+can use the built-in function named C to read a record just
 as C<< <> >> does. Given the initialization shown above for @fd, this
-would work, but only because readline() require a typeglob. It doesn't
+would work, but only because readline() requires a typeglob. It doesn't
 work with objects or strings, which might be a bug we haven't fixed yet.
 
     $got = readline($fd[0]);
@@ -419,49 +338,54 @@ It's the syntax of the fundamental operators.
 Playing the object game doesn't help you at all here.
 
 =head2 How can I set up a footer format to be used with write()?
+X