X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlfaq5.pod;h=49a348a81c7a8e8890936426cbcd61ed200f0a6a;hb=0dfdcd8a63a82bd61087d84a6f130e03a4b20ed9;hp=521cb6f263ad92bcd3d8a5c9e7c256dad617bafe;hpb=30852c57b1a5333a45e16bf8bdc51e232a28eed0;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod index 521cb6f..49a348a 100644 --- a/pod/perlfaq5.pod +++ b/pod/perlfaq5.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq5 - Files and Formats ($Revision: 1.14 $, $Date: 2002/04/07 18:33:45 $) +perlfaq5 - Files and Formats ($Revision: 1.30 $, $Date: 2003/11/23 08:07:46 $) =head1 DESCRIPTION @@ -9,72 +9,56 @@ formats, and footers. =head2 How do I flush/unbuffer an output filehandle? Why must I do this? -The C standard I/O library (stdio) normally buffers characters sent to -devices. This is done for efficiency reasons so that there isn't a -system call for each byte. Any time you use print() or write() in -Perl, you go though this buffering. syswrite() circumvents stdio and -buffering. - -In most stdio implementations, the type of output buffering and the size of -the buffer varies according to the type of device. Disk files are block -buffered, often with a buffer size of more than 2k. Pipes and sockets -are often buffered with a buffer size between 1/2 and 2k. Serial devices -(e.g. modems, terminals) are normally line-buffered, and stdio sends -the entire line when it gets the newline. - -Perl does not support truly unbuffered output (except insofar as you can -C). What it does instead support is "command -buffering", in which a physical write is performed after every output -command. This isn't as hard on your system as unbuffering, but does -get the output where you want it when you want it. - -If you expect characters to get to your device when you print them there, -you'll want to autoflush its handle. 
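To turn on autoflush for a handle that is not the default one, you briefly select it. This sketch assumes Perl 5.8 or later, where an in-memory filehandle (the illustrative `$out`) can stand in for a printer, terminal, or socket. It enables autoflush on that handle and then checks that the default output handle was put back.

```perl
use strict;
use warnings;

# Assumption: an in-memory filehandle (Perl 5.8+) stands in for a real device.
my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory handle: $!";

my $original = select();    # remember the current default output handle

# Select $out, set its $| flag, then reselect the handle that the inner
# select() returned; the list slice [0] picks out that old handle.
select((select($out), $| = 1)[0]);

my $restored = (select() eq $original) ? 1 : 0;

# $| always reports on the currently selected handle, so peek at $out's flag.
my $previous     = select($out);
my $autoflush_on = $|;
select($previous);

print "default handle restored: $restored; autoflush on \$out: $autoflush_on\n";
```

The one-statement form works because the inner select() runs before the assignment to `$|`, and the outer select() then receives the old handle.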
-Use select() and the C<$|> variable to control autoflushing -(see L and L): +Perl does not support truly unbuffered output (except +insofar as you can C), although it +does support "command buffering", in which a physical +write is performed after every output command. + +The C standard I/O library (stdio) normally buffers +characters sent to devices so that there isn't a system call +for each byte. In most stdio implementations, the type of +output buffering and the size of the buffer vary according +to the type of device. Perl's print() and write() functions +normally buffer output, while syswrite() bypasses buffering +altogether. + +If you want your output to be sent immediately when you +execute print() or write() (for instance, for some network +protocols), you must set the handle's autoflush flag. This +flag is the Perl variable $|, and when it is set to a true +value, Perl will flush the handle's buffer after each +print() or write(). Setting $| affects buffering only for +the currently selected default file handle. You choose this +handle with the one-argument select() call (see +L> and L). + +Use select() to choose the desired handle, then set its +per-filehandle variables. $old_fh = select(OUTPUT_HANDLE); $| = 1; select($old_fh); -Or using the traditional idiom: +Some idioms can handle this in a single statement: select((select(OUTPUT_HANDLE), $| = 1)[0]); -Or if don't mind slowly loading several thousand lines of module code -just because you're afraid of the C<$|> variable: + $| = 1, select $_ for select OUTPUT_HANDLE; - use FileHandle; - open(DEV, "+autoflush(1); - -or the newer IO::* modules: +Some modules offer object-oriented access to handles and their +variables, although they may be overkill if this is the only +thing you do with them. You can use IO::Handle: use IO::Handle; open(DEV, ">/dev/printer"); # but is this? DEV->autoflush(1); -or even this: +or IO::Socket: use IO::Socket; # this one is kinda a pipe?
- $sock = IO::Socket::INET->new(PeerAddr => 'www.perl.com', - PeerPort => 'http(80)', - Proto => 'tcp'); - die "$!" unless $sock; + my $sock = IO::Socket::INET->new( 'www.example.com:80' ) ; $sock->autoflush(); - print $sock "GET / HTTP/1.0" . "\015\012" x 2; - $document = join('', <$sock>); - print "DOC IS: $document\n"; - -Note the bizarrely hard coded carriage return and newline in their octal -equivalents. This is the ONLY way (currently) to assure a proper flush -on all platforms, including Macintosh. That's the way things work in -network programming: you really should specify the exact bit pattern -on the network line terminator. In practice, C<"\n\n"> often works, -but this is not portable. - -See L for other examples of fetching URLs over the web. =head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file? @@ -97,11 +81,36 @@ proper text file, so this may report one fewer line than you expect. This assumes no funny games with newline translations. +=head2 How can I use Perl's C<-i> option from within a program? + +C<-i> sets the value of Perl's C<$^I> variable, which in turn affects +the behavior of C<< <> >>; see L for more details. By +modifying the appropriate variables directly, you can get the same +behavior within a larger program. For example: + + # ... + { + local($^I, @ARGV) = ('.orig', glob("*.c")); + while (<>) { + if ($. == 1) { + print "This line should appear at the top of each file\n"; + } + s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case + print; + close ARGV if eof; # Reset $. + } + } + # $^I and @ARGV return to their old values here + +This block modifies all the C<.c> files in the current directory, +leaving a backup of the original data from each file in a new +C<.c.orig> file. + =head2 How do I make a temporary file name? Use the File::Temp module, see L for more information. 
- use File::Temp qw/ tempfile tempdir /; + use File::Temp qw/ tempfile tempdir /; $dir = tempdir( CLEANUP => 1 ); ($fh, $filename) = tempfile( DIR => $dir ); @@ -132,6 +141,7 @@ temporary files in one process, use a counter: my $count = 0; until (defined(fileno(FH)) || $count++ > 100) { $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e; + # O_EXCL is required for security reasons. sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT); } if (defined(fileno(FH)) @@ -144,8 +154,10 @@ temporary files in one process, use a counter: =head2 How can I manipulate fixed-record-length files? -The most efficient way is using pack() and unpack(). This is faster than -using substr() when taking many, many strings. It is slower for just a few. +The most efficient way is using L<pack()|perlfunc/"pack"> and +L<unpack()|perlfunc/"unpack">. This is faster than using +L<substr()|perlfunc/"substr"> when taking many, many strings. It is +slower for just a few. Here is a sample chunk of code to break up and put back together again some fixed-format input lines, in this case from the output of a normal, @@ -153,93 +165,51 @@ Berkeley-style ps: # sample input line: # 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what - $PS_T = 'A6 A4 A7 A5 A*'; - open(PS, "ps|"); - print scalar ; - while () { - ($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_); - for $var (qw!pid tt stat time command!) { - print "$var: <$$var>\n"; + my $PS_T = 'A6 A4 A7 A5 A*'; + open my $ps, '-|', 'ps' or die "Cannot run ps: $!"; + print scalar <$ps>; + my @fields = qw( pid tt stat time command ); + while (<$ps>) { + my %process; + @process{@fields} = unpack($PS_T, $_); + for my $field ( @fields ) { + print "$field: <$process{$field}>\n"; } - print 'line=', pack($PS_T, $pid, $tt, $stat, $time, $command), - "\n"; + print 'line=', pack($PS_T, @process{@fields} ), "\n"; } -We've used C<$$var> in a way that forbidden by C. -That is, we've promoted a string to a scalar variable reference using -symbolic references. This is okay in small programs, but doesn't scale -well. 
It also only works on global variables, not lexicals.
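To try the pack()/unpack() approach without a running ps(1), the sketch below builds one fixed-width record with pack() from made-up field values (the values and the short command string are hypothetical) and splits it back apart into a hash slice using the same template.

```perl
use strict;
use warnings;

# Same template as the ps example: five fixed-width, space-padded fields.
my $PS_T   = 'A6 A4 A7 A5 A*';
my @fields = qw( pid tt stat time command );

# Build a fixed-width record with pack(); 'A' pads each field with spaces.
my $line = pack($PS_T, '15158', 'p5', 'T', '0:00', 'perl now-what');

# unpack() with 'A' strips the trailing padding again.
my %process;
@process{@fields} = unpack($PS_T, $line);

for my $field (@fields) {
    print "$field: <$process{$field}>\n";
}
```

Because 'A' pads on pack() and strips trailing spaces on unpack(), the values survive the round trip unchanged.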
+We've used a hash slice in order to easily handle the fields of each row. +Storing the keys in an array means it's easy to operate on them as a +group or loop over them with for. It also avoids polluting the program +with global variables and using symbolic references. =head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles? -The fastest, simplest, and most direct way is to localize the typeglob -of the filehandle in question: +As of perl5.6, open() autovivifies file and directory handles +as references if you pass it an uninitialized scalar variable. +You can then pass these references just like any other scalar, +and use them in the place of named handles. - local *TmpHandle; + open my $fh, $file_name; -Typeglobs are fast (especially compared with the alternatives) and -reasonably easy to use, but they also have one subtle drawback. If you -had, for example, a function named TmpHandle(), or a variable named -%TmpHandle, you just hid it from yourself. + open local $fh, $file_name; - sub findme { - local *HostFile; - open(HostFile, ") { - print if /\b127\.(0\.0\.)?1\b/; - } - # *HostFile automatically closes/disappears here - } + print $fh "Hello World!\n"; -Here's how to use typeglobs in a loop to open and store a bunch of -filehandles. We'll use as values of the hash an ordered -pair to make it easy to sort the hash in insertion order. + process_file( $fh ); - @names = qw(motd termcap passwd hosts); - my $i = 0; - foreach $filename (@names) { - local *FH; - open(FH, "/etc/$filename") || die "$filename: $!"; - $file{$filename} = [ $i++, *FH ]; - } +Before perl5.6, you had to deal with various typeglob idioms +which you may see in older code. - # Using the filehandles in the array - foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) { - my $fh = $file{$name}[1]; - my $line = <$fh>; - print "$name $. 
$line"; - } - -For passing filehandles to functions, the easiest way is to -preface them with a star, as in func(*STDIN). -See L for details. + open FILE, "> $filename"; + process_typeglob( *FILE ); + process_reference( \*FILE ); -If you want to create many anonymous handles, you should check out the -Symbol, FileHandle, or IO::Handle (etc.) modules. Here's the equivalent -code with Symbol::gensym, which is reasonably light-weight: - - foreach $filename (@names) { - use Symbol; - my $fh = gensym(); - open($fh, "/etc/$filename") || die "open /etc/$filename: $!"; - $file{$filename} = [ $i++, $fh ]; - } + sub process_typeglob { local *FH = shift; print FH "Typeglob!" } + sub process_reference { my $fh = shift; print $fh "Reference!" } -Here's using the semi-object-oriented FileHandle module, which certainly -isn't light-weight: - - use FileHandle; - - foreach $filename (@names) { - my $fh = FileHandle->new("/etc/$filename") or die "$filename: $!"; - $file{$filename} = [ $i++, $fh ]; - } - -Please understand that whether the filehandle happens to be a (probably -localized) typeglob or an anonymous handle from one of the modules -in no way affects the bizarre rules for managing indirect handles. -See the next question. +If you want to create many anonymous handles, you should +check out the Symbol or IO::Handle modules. =head2 How can I use a filehandle indirectly? @@ -253,13 +223,10 @@ to get indirect filehandles: $fh = \*SOME_FH; # ref to typeglob (bless-able) $fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob -Or, you can use the C<new> method from the FileHandle or IO modules to +Or, you can use the C<new> method from one of the IO::* modules to create an anonymous filehandle, store that in a scalar variable, and use it as though it were a normal filehandle. - use FileHandle; - $fh = FileHandle->new(); - use IO::Handle; # 5.004 or higher $fh = IO::Handle->new(); @@ -267,7 +234,7 @@ Then use any of those as you would a normal filehandle.
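As a sketch of the lexical-handle style, assuming Perl 5.6 or later for autovivified handles and 5.8 or later for the in-memory files used here in place of real files: the `greet` helper is hypothetical. It accepts any handle held in a scalar, and an array of handles is simply an array of such scalars.

```perl
use strict;
use warnings;

# Hypothetical helper: writes a greeting to whatever handle it is given.
sub greet {
    my ($fh, $name) = @_;
    print {$fh} "Hello, $name!\n";
}

# Each open(my $fh, ...) autovivifies a fresh, distinct handle.
my @buffers = ('', '');
my @handles;
for my $i (0 .. $#buffers) {
    open(my $fh, '>', \$buffers[$i]) or die "open buffer $i: $!";
    push @handles, $fh;    # an array of filehandles is just an array of scalars
}

greet($handles[0], 'first');
greet($handles[1], 'second');
close($_) for @handles;    # flush and close each handle

print "buffer $_: $buffers[$_]" for 0 .. $#buffers;
```

Wrapping the handle in braces, as in `print {$fh} ...`, makes it unambiguous that the scalar is the filehandle and not the first item to print.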
Anywhere that Perl is expecting a filehandle, an indirect filehandle may be used instead. An indirect filehandle is just a scalar variable that contains a filehandle. Functions like C, C, C, or -the C<< >> diamond operator will accept either a read filehandle +the C<< >> diamond operator will accept either a named filehandle or a scalar variable containing one: ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR); @@ -319,17 +286,17 @@ an expression where you would place the filehandle: That block is a proper block like any other, so you can put more complicated code there. This sends the message out to one of two places: - $ok = -x "/bin/cat"; + $ok = -x "/bin/cat"; print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n"; - print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n"; + print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n"; This approach of treating C and C like object methods calls doesn't work for the diamond operator. That's because it's a real operator, not just a function with a comma-less argument. Assuming you've been storing typeglobs in your structure as we did above, you -can use the built-in function named C<readline> to reads a record just +can use the built-in function named C<readline> to read a record just as C<< <> >> does. Given the initialization shown above for @fd, this -would work, but only because readline() require a typeglob. It doesn't +would work, but only because readline() requires a typeglob. It doesn't work with objects or strings, which might be a bug we haven't fixed yet. $got = readline($fd[0]); @@ -350,11 +317,19 @@ See L for an swrite() function. =head2 How can I output my numbers with commas added?
-This one from Benjamin Goldberg will do it for you: +This subroutine will add commas to your number: + + sub commify { + local $_ = shift; + 1 while s/^([-+]?\d+)(\d{3})/$1,$2/; + return $_; + } + +This regex from Benjamin Goldberg will add commas to numbers: s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g; -or written verbosely: +It is easier to see with comments: s/( ^[-+]? # beginning of number. @@ -398,7 +373,7 @@ I gives you read-write access: open(FH, "+> /path/name"); # WRONG (almost always) Whoops. You should instead use this, which will fail if the file -doesn't exist. +doesn't exist. open(FH, "+< /path/name"); # open for update @@ -453,8 +428,8 @@ To open file for update, file must not exist: To open a file without blocking, creating if necessary: - sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT) - or die "can't open /tmp/somefile: $!": + sysopen(FH, "/foo/somefile", O_WRONLY|O_NDELAY|O_CREAT) + or die "can't open /foo/somefile: $!"; Be warned that neither creation nor deletion of files is guaranteed to be an atomic operation over NFS. That is, two processes might both @@ -463,7 +438,7 @@ isn't as exclusive as you might wish. See also the new L if you have it (new for 5.6). -=head2 Why do I sometimes get an "Argument list too long" when I use <*>? +=head2 Why do I sometimes get an "Argument list too long" when I use E<lt>*E<gt>? The C<< <> >> operator performs a globbing operation (see above). In Perl versions earlier than v5.6.0, the internal glob() operator forks @@ -487,11 +462,11 @@ best therefore to use glob() only in list context. Normally perl ignores trailing blanks in filenames, and interprets certain leading characters (or a trailing "|") to mean something -special. +special. The three argument form of open() lets you specify the mode separately from the filename.
The open() function treats -special mode characters and whitespace in the filename as +special mode characters and whitespace in the filename as literals open FILE, "<", " file "; # filename is " file " @@ -506,8 +481,8 @@ It may be a lot clearer to use sysopen(), though: =head2 How can I reliably rename a file? -If your operating system supports a proper mv(1) utility or its functional -equivalent, this works: +If your operating system supports a proper mv(1) utility or its +functional equivalent, this works: rename($old, $new) or system("mv", $old, $new); @@ -561,12 +536,12 @@ for your own system's idiosyncrasies (sometimes called "features"). Slavish adherence to portability concerns shouldn't get in the way of your getting your job done.) -For more information on file locking, see also +For more information on file locking, see also L if you have it (new for 5.6). =back -=head2 Why can't I just open(FH, ">file.lock")? +=head2 Why can't I just open(FH, "E<gt>file.lock")? A common bit of code B<NOT TO USE> is this: @@ -578,7 +553,7 @@ which must be done in one. That's why computer hardware provides an atomic test-and-set instruction. In theory, this "ought" to work: sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT) - or die "can't open file.lock: $!": + or die "can't open file.lock: $!"; except that lamentably, file creation (and deletion) is not atomic over NFS, so this won't work (at least, not every time) over the net. @@ -715,30 +690,22 @@ utime() on those platforms. =head2 How do I print to more than one file at once? -If you only have to do this once, you can do this: +To connect one filehandle to several output filehandles, +you can use the IO::Tee or Tie::FileHandle::Multiplex modules.
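If a module is not an option, a small hand-rolled loop can do the multiplexing. The `print_all` helper here is hypothetical and is not the IO::Tee interface; in-memory handles (Perl 5.8+) stand in for real output files.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of IO::Tee: print the same list to every
# handle and report whether all of the prints succeeded.
sub print_all {
    my ($handles, @list) = @_;
    my $ok = 1;
    for my $fh (@$handles) {
        print {$fh} @list or $ok = 0;
    }
    return $ok;
}

# In-memory handles stand in for three real output files.
my ($one, $two, $three) = ('', '', '');
my @outputs;
for my $ref (\$one, \$two, \$three) {
    open(my $fh, '>', $ref) or die "open: $!";
    push @outputs, $fh;
}

print_all(\@outputs, "whatever\n") or die "a print failed";
close($_) or die "close: $!" for @outputs;
```

Checking each print() and close() individually matters here, because a failure on one destination would otherwise go unnoticed while the others succeed.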
- for $fh (FH1, FH2, FH3) { print $fh "whatever\n" } - -To connect up to one filehandle to several output filehandles, it's -easiest to use the tee(1) program if you have it, and let it take care -of the multiplexing: +If you only have to do this once, you can print individually +to each filehandle. - open (FH, "| tee file1 file2 file3"); + for $fh (FH1, FH2, FH3) { print $fh "whatever\n" } -Or even: +=head2 How can I read in an entire file all at once? - # make STDOUT go to three files, plus original STDOUT - open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n"; - print "whatever\n" or die "Writing: $!\n"; - close(STDOUT) or die "Closing: $!\n"; +You can use the File::Slurp module to do it in one step. -Otherwise you'll have to write your own multiplexing print -function--or your own tee program--or use Tom Christiansen's, -at http://www.cpan.org/authors/id/TOMC/scripts/tct.gz , which is -written in Perl and offers much greater functionality -than the stock version. + use File::Slurp; -=head2 How can I read in an entire file all at once? + $all_of_it = read_file($filename); # entire file in scalar + @all_lines = read_file($filename); # one line per element The customary Perl approach for processing all the lines in a file is to do so one line at a time: @@ -747,7 +714,7 @@ do so one line at a time: while (<INPUT>) { chomp; # do something with $_ - } + } close(INPUT) || die "can't close $file: $!"; This is tremendously more efficient than reading the entire file into @@ -772,7 +739,7 @@ You can read the entire filehandle contents into a scalar. $var = ; } -That temporarily undefs your record separator, and will automatically +That temporarily undefs your record separator, and will automatically close the file at block exit.
If the file is already open, just use this: $var = do { local $/; }; @@ -791,7 +758,7 @@ set it to C<""> to eliminate empty paragraphs (C<"abc\n\n\n\ndef">, for instance, gets treated as two paragraphs and not three), or C<"\n\n"> to accept empty paragraphs. -Note that a blank line must have no blanks in it. Thus +Note that a blank line must have no blanks in it. Thus S> is one paragraph, but C<"fred\n\nstuff\n\n"> is two. =head2 How can I read a single character from a file? From the keyboard? @@ -958,7 +925,7 @@ There's also a File::Tail module from CPAN. If you check L, you'll see that several of the ways to call open() should do the trick. For example: - open(LOG, ">>/tmp/logfile"); + open(LOG, ">>/foo/logfile"); open(STDERR, ">&LOG"); Or even with a literal numeric descriptor: @@ -968,7 +935,7 @@ Or even with a literal numeric descriptor: Note that "<&STDIN" makes a copy, but "<&=STDIN" make an alias. That means if you close an aliased handle, all -aliases become inaccessible. This is not true with +aliases become inaccessible. This is not true with a copied one. Error checking, as always, has been left as an exercise for the reader. @@ -986,8 +953,8 @@ to, you may be able to do this: Or, just use the fdopen(3S) feature of open(): - { - local *F; + { + local *F; open F, "<&=$fd" or die "Cannot reopen fd=$fd: $!"; close F; } @@ -1020,7 +987,7 @@ documentation for details. This is elaborately and painstakingly described in the F article in the "Far More Than You Ever Wanted To -Know" collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz . +Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz . The executive summary: learn how your filesystem works. The permissions on a file say what can happen to the data in that file. @@ -1037,9 +1004,18 @@ Here's an algorithm from the Camel Book: srand; rand($.) < 1 && ($line = $_) while <>; -This has a significant advantage in space over reading the whole -file in. 
A simple proof by induction is available upon -request if you doubt the algorithm's correctness. +This has a significant advantage in space over reading the whole file +in. You can find a proof of this method in I<The Art of Computer Programming>, Volume 2, Section 3.4.2, by Donald E. Knuth. + +You can use the File::Random module, which provides a function +for that algorithm: + use File::Random qw/random_line/; + my $line = random_line($filename); + +Another way is to use the Tie::File module, which treats the entire +file as an array. Simply access a random array element. =head2 Why do I get weird spaces when I print an array of lines?