=head1 NAME
-perlfaq5 - Files and Formats ($Revision: 1.14 $, $Date: 2002/04/07 18:33:45 $)
+perlfaq5 - Files and Formats ($Revision: 1.36 $, $Date: 2005/04/22 19:04:48 $)
=head1 DESCRIPTION
=head2 How do I flush/unbuffer an output filehandle? Why must I do this?
-The C standard I/O library (stdio) normally buffers characters sent to
-devices. This is done for efficiency reasons so that there isn't a
-system call for each byte. Any time you use print() or write() in
-Perl, you go though this buffering. syswrite() circumvents stdio and
-buffering.
-
-In most stdio implementations, the type of output buffering and the size of
-the buffer varies according to the type of device. Disk files are block
-buffered, often with a buffer size of more than 2k. Pipes and sockets
-are often buffered with a buffer size between 1/2 and 2k. Serial devices
-(e.g. modems, terminals) are normally line-buffered, and stdio sends
-the entire line when it gets the newline.
-
-Perl does not support truly unbuffered output (except insofar as you can
-C<syswrite(OUT, $char, 1)>). What it does instead support is "command
-buffering", in which a physical write is performed after every output
-command. This isn't as hard on your system as unbuffering, but does
-get the output where you want it when you want it.
-
-If you expect characters to get to your device when you print them there,
-you'll want to autoflush its handle.
-Use select() and the C<$|> variable to control autoflushing
-(see L<perlvar/$|> and L<perlfunc/select>):
+Perl does not support truly unbuffered output (except
+insofar as you can C<syswrite(OUT, $char, 1)>), although it
+does support is "command buffering", in which a physical
+write is performed after every output command.
+
+The C standard I/O library (stdio) normally buffers
+characters sent to devices so that there isn't a system call
+for each byte. In most stdio implementations, the type of
+output buffering and the size of the buffer varies according
+to the type of device. Perl's print() and write() functions
+normally buffer output, while syswrite() bypasses buffering
+all together.
+
+If you want your output to be sent immediately when you
+execute print() or write() (for instance, for some network
+protocols), you must set the handle's autoflush flag. This
+flag is the Perl variable $| and when it is set to a true
+value, Perl will flush the handle's buffer after each
+print() or write(). Setting $| affects buffering only for
+the currently selected default file handle. You choose this
+handle with the one argument select() call (see
+L<perlvar/$E<verbar>> and L<perlfunc/select>).
+
+Use select() to choose the desired handle, then set its
+per-filehandle variables.
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
-Or using the traditional idiom:
+Some idioms can handle this in a single statement:
select((select(OUTPUT_HANDLE), $| = 1)[0]);
-Or if don't mind slowly loading several thousand lines of module code
-just because you're afraid of the C<$|> variable:
+ $| = 1, select $_ for select OUTPUT_HANDLE;
- use FileHandle;
- open(DEV, "+</dev/tty"); # ceci n'est pas une pipe
- DEV->autoflush(1);
-
-or the newer IO::* modules:
+Some modules offer object-oriented access to handles and their
+variables, although they may be overkill if this is the only
+thing you do with them. You can use IO::Handle:
use IO::Handle;
open(DEV, ">/dev/printer"); # but is this?
DEV->autoflush(1);
-or even this:
+or IO::Socket:
use IO::Socket; # this one is kinda a pipe?
- $sock = IO::Socket::INET->new(PeerAddr => 'www.perl.com',
- PeerPort => 'http(80)',
- Proto => 'tcp');
- die "$!" unless $sock;
+ my $sock = IO::Socket::INET->new( 'www.example.com:80' ) ;
$sock->autoflush();
- print $sock "GET / HTTP/1.0" . "\015\012" x 2;
- $document = join('', <$sock>);
- print "DOC IS: $document\n";
-
-Note the bizarrely hard coded carriage return and newline in their octal
-equivalents. This is the ONLY way (currently) to assure a proper flush
-on all platforms, including Macintosh. That's the way things work in
-network programming: you really should specify the exact bit pattern
-on the network line terminator. In practice, C<"\n\n"> often works,
-but this is not portable.
-
-See L<perlfaq9> for other examples of fetching URLs over the web.
=head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file?
This assumes no funny games with newline translations.
+=head2 How can I use Perl's C<-i> option from within a program?
+
+C<-i> sets the value of Perl's C<$^I> variable, which in turn affects
+the behavior of C<< <> >>; see L<perlrun> for more details. By
+modifying the appropriate variables directly, you can get the same
+behavior within a larger program. For example:
+
+ # ...
+ {
+ local($^I, @ARGV) = ('.orig', glob("*.c"));
+ while (<>) {
+ if ($. == 1) {
+ print "This line should appear at the top of each file\n";
+ }
+ s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case
+ print;
+ close ARGV if eof; # Reset $.
+ }
+ }
+ # $^I and @ARGV return to their old values here
+
+This block modifies all the C<.c> files in the current directory,
+leaving a backup of the original data from each file in a new
+C<.c.orig> file.
+
+=head2 How can I copy a file?
+
+(contributed by brian d foy)
+
+Use the File::Copy module. It comes with Perl and can do a
+true copy across file systems, and it does its magic in
+a portable fashion.
+
+ use File::Copy;
+
+ copy( $original, $new_copy ) or die "Copy failed: $!";
+
+If you can't use File::Copy, you'll have to do the work yourself:
+open the original file, open the destination file, then print
+to the destination file as you read the original.
+
=head2 How do I make a temporary file name?
-Use the File::Temp module, see L<File::Temp> for more information.
+If you don't need to know the name of the file, you can use C<open()>
+with C<undef> in place of the file name. The C<open()> function
+creates an anonymous temporary file.
+
+ open my $tmp, '+>', undef or die $!;
+
+Otherwise, you can use the File::Temp module.
- use File::Temp qw/ tempfile tempdir /;
+ use File::Temp qw/ tempfile tempdir /;
$dir = tempdir( CLEANUP => 1 );
($fh, $filename) = tempfile( DIR => $dir );
my $count = 0;
until (defined(fileno(FH)) || $count++ > 100) {
$base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
+ # O_EXCL is required for security reasons.
sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT);
}
if (defined(fileno(FH))
=head2 How can I manipulate fixed-record-length files?
-The most efficient way is using pack() and unpack(). This is faster than
-using substr() when taking many, many strings. It is slower for just a few.
+The most efficient way is using L<pack()|perlfunc/"pack"> and
+L<unpack()|perlfunc/"unpack">. This is faster than using
+L<substr()|perlfunc/"substr"> when taking many, many strings. It is
+slower for just a few.
Here is a sample chunk of code to break up and put back together again
some fixed-format input lines, in this case from the output of a normal,
# sample input line:
# 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what
- $PS_T = 'A6 A4 A7 A5 A*';
- open(PS, "ps|");
- print scalar <PS>;
- while (<PS>) {
- ($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_);
- for $var (qw!pid tt stat time command!) {
- print "$var: <$$var>\n";
+ my $PS_T = 'A6 A4 A7 A5 A*';
+ open my $ps, '-|', 'ps';
+ print scalar <$ps>;
+ my @fields = qw( pid tt stat time command );
+ while (<$ps>) {
+ my %process;
+ @process{@fields} = unpack($PS_T, $_);
+ for my $field ( @fields ) {
+ print "$field: <$process{$field}>\n";
}
- print 'line=', pack($PS_T, $pid, $tt, $stat, $time, $command),
- "\n";
+ print 'line=', pack($PS_T, @process{@fields} ), "\n";
}
-We've used C<$$var> in a way that forbidden by C<use strict 'refs'>.
-That is, we've promoted a string to a scalar variable reference using
-symbolic references. This is okay in small programs, but doesn't scale
-well. It also only works on global variables, not lexicals.
+We've used a hash slice in order to easily handle the fields of each row.
+Storing the keys in an array means it's easy to operate on them as a
+group or loop over them with for. It also avoids polluting the program
+with global variables and using symbolic references.
=head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles?
-The fastest, simplest, and most direct way is to localize the typeglob
-of the filehandle in question:
+As of perl5.6, open() autovivifies file and directory handles
+as references if you pass it an uninitialized scalar variable.
+You can then pass these references just like any other scalar,
+and use them in the place of named handles.
- local *TmpHandle;
+ open my $fh, $file_name;
-Typeglobs are fast (especially compared with the alternatives) and
-reasonably easy to use, but they also have one subtle drawback. If you
-had, for example, a function named TmpHandle(), or a variable named
-%TmpHandle, you just hid it from yourself.
+ open local $fh, $file_name;
- sub findme {
- local *HostFile;
- open(HostFile, "</etc/hosts") or die "no /etc/hosts: $!";
- local $_; # <- VERY IMPORTANT
- while (<HostFile>) {
- print if /\b127\.(0\.0\.)?1\b/;
- }
- # *HostFile automatically closes/disappears here
- }
+ print $fh "Hello World!\n";
-Here's how to use typeglobs in a loop to open and store a bunch of
-filehandles. We'll use as values of the hash an ordered
-pair to make it easy to sort the hash in insertion order.
+ process_file( $fh );
- @names = qw(motd termcap passwd hosts);
- my $i = 0;
- foreach $filename (@names) {
- local *FH;
- open(FH, "/etc/$filename") || die "$filename: $!";
- $file{$filename} = [ $i++, *FH ];
- }
+Before perl5.6, you had to deal with various typeglob idioms
+which you may see in older code.
- # Using the filehandles in the array
- foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) {
- my $fh = $file{$name}[1];
- my $line = <$fh>;
- print "$name $. $line";
- }
+ open FILE, "> $filename";
+ process_typeglob( *FILE );
+ process_reference( \*FILE );
-For passing filehandles to functions, the easiest way is to
-preface them with a star, as in func(*STDIN).
-See L<perlfaq7/"Passing Filehandles"> for details.
+ sub process_typeglob { local *FH = shift; print FH "Typeglob!" }
+ sub process_reference { local $fh = shift; print $fh "Reference!" }
-If you want to create many anonymous handles, you should check out the
-Symbol, FileHandle, or IO::Handle (etc.) modules. Here's the equivalent
-code with Symbol::gensym, which is reasonably light-weight:
-
- foreach $filename (@names) {
- use Symbol;
- my $fh = gensym();
- open($fh, "/etc/$filename") || die "open /etc/$filename: $!";
- $file{$filename} = [ $i++, $fh ];
- }
-
-Here's using the semi-object-oriented FileHandle module, which certainly
-isn't light-weight:
-
- use FileHandle;
-
- foreach $filename (@names) {
- my $fh = FileHandle->new("/etc/$filename") or die "$filename: $!";
- $file{$filename} = [ $i++, $fh ];
- }
-
-Please understand that whether the filehandle happens to be a (probably
-localized) typeglob or an anonymous handle from one of the modules
-in no way affects the bizarre rules for managing indirect handles.
-See the next question.
+If you want to create many anonymous handles, you should
+check out the Symbol or IO::Handle modules.
=head2 How can I use a filehandle indirectly?
$fh = \*SOME_FH; # ref to typeglob (bless-able)
$fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob
-Or, you can use the C<new> method from the FileHandle or IO modules to
+Or, you can use the C<new> method from one of the IO::* modules to
create an anonymous filehandle, store that in a scalar variable,
and use it as though it were a normal filehandle.
- use FileHandle;
- $fh = FileHandle->new();
-
use IO::Handle; # 5.004 or higher
$fh = IO::Handle->new();
Perl is expecting a filehandle, an indirect filehandle may be used
instead. An indirect filehandle is just a scalar variable that contains
a filehandle. Functions like C<print>, C<open>, C<seek>, or
-the C<< <FH> >> diamond operator will accept either a read filehandle
+the C<< <FH> >> diamond operator will accept either a named filehandle
or a scalar variable containing one:
($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
That block is a proper block like any other, so you can put more
complicated code there. This sends the message out to one of two places:
- $ok = -x "/bin/cat";
+ $ok = -x "/bin/cat";
print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
- print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
+ print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
This approach of treating C<print> and C<printf> like object methods
calls doesn't work for the diamond operator. That's because it's a
real operator, not just a function with a comma-less argument. Assuming
you've been storing typeglobs in your structure as we did above, you
-can use the built-in function named C<readline> to reads a record just
+can use the built-in function named C<readline> to read a record just
as C<< <> >> does. Given the initialization shown above for @fd, this
-would work, but only because readline() require a typeglob. It doesn't
+would work, but only because readline() requires a typeglob. It doesn't
work with objects or strings, which might be a bug we haven't fixed yet.
$got = readline($fd[0]);
=head2 How can I output my numbers with commas added?
-This one from Benjamin Goldberg will do it for you:
+This subroutine will add commas to your number:
+
+ sub commify {
+ local $_ = shift;
+ 1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
+ return $_;
+ }
+
+This regex from Benjamin Goldberg will add commas to numbers:
s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
-or written verbosely:
+It is easier to see with comments:
s/(
^[-+]? # beginning of number.
- \d{1,3}? # first digits before first comma
+ \d+? # first digits before first comma
(?= # followed by, (but not included in the match) :
(?>(?:\d{3})+) # some positive multiple of three digits.
(?!\d) # an *exact* multiple, not x * 3 + 1 or whatever.
open(FH, "+> /path/name"); # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
-doesn't exist.
+doesn't exist.
open(FH, "+< /path/name"); # open for update
To open a file without blocking, creating if necessary:
- sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT)
- or die "can't open /tmp/somefile: $!":
+ sysopen(FH, "/foo/somefile", O_WRONLY|O_NDELAY|O_CREAT)
+ or die "can't open /foo/somefile: $!":
Be warned that neither creation nor deletion of files is guaranteed to
be an atomic operation over NFS. That is, two processes might both
See also the new L<perlopentut> if you have it (new for 5.6).
-=head2 Why do I sometimes get an "Argument list too long" when I use <*>?
+=head2 Why do I sometimes get an "Argument list too long" when I use E<lt>*E<gt>?
The C<< <> >> operator performs a globbing operation (see above).
In Perl versions earlier than v5.6.0, the internal glob() operator forks
Normally perl ignores trailing blanks in filenames, and interprets
certain leading characters (or a trailing "|") to mean something
-special.
+special.
The three argument form of open() lets you specify the mode
separately from the filename. The open() function treats
-special mode characters and whitespace in the filename as
+special mode characters and whitespace in the filename as
literals
open FILE, "<", " file "; # filename is " file "
=head2 How can I reliably rename a file?
-If your operating system supports a proper mv(1) utility or its functional
-equivalent, this works:
+If your operating system supports a proper mv(1) utility or its
+functional equivalent, this works:
rename($old, $new) or system("mv", $old, $new);
Slavish adherence to portability concerns shouldn't get in the way of
your getting your job done.)
-For more information on file locking, see also
+For more information on file locking, see also
L<perlopentut/"File Locking"> if you have it (new for 5.6).
=back
-=head2 Why can't I just open(FH, ">file.lock")?
+=head2 Why can't I just open(FH, "E<gt>file.lock")?
A common bit of code B<NOT TO USE> is this:
atomic test-and-set instruction. In theory, this "ought" to work:
sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT)
- or die "can't open file.lock: $!":
+ or die "can't open file.lock: $!";
except that lamentably, file creation (and deletion) is not atomic
over NFS, so this won't work (at least, not every time) over the net.
Error checking is, as usual, left as an exercise for the reader.
-Note that utime() currently doesn't work correctly with Win95/NT
-ports. A bug has been reported. Check it carefully before using
-utime() on those platforms.
+The perldoc for utime also has an example that has the same
+effect as touch(1) on files that I<already exist>.
-=head2 How do I print to more than one file at once?
+Certain file systems have a limited ability to store the times
+on a file at the expected level of precision. For example, the
+FAT and HPFS filesystem are unable to create dates on files with
+a finer granularity than two seconds. This is a limitation of
+the filesystems, not of utime().
-If you only have to do this once, you can do this:
+=head2 How do I print to more than one file at once?
- for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
+To connect one filehandle to several output filehandles,
+you can use the IO::Tee or Tie::FileHandle::Multiplex modules.
-To connect up to one filehandle to several output filehandles, it's
-easiest to use the tee(1) program if you have it, and let it take care
-of the multiplexing:
+If you only have to do this once, you can print individually
+to each filehandle.
- open (FH, "| tee file1 file2 file3");
+ for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
-Or even:
+=head2 How can I read in an entire file all at once?
- # make STDOUT go to three files, plus original STDOUT
- open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n";
- print "whatever\n" or die "Writing: $!\n";
- close(STDOUT) or die "Closing: $!\n";
+You can use the File::Slurp module to do it in one step.
-Otherwise you'll have to write your own multiplexing print
-function--or your own tee program--or use Tom Christiansen's,
-at http://www.cpan.org/authors/id/TOMC/scripts/tct.gz , which is
-written in Perl and offers much greater functionality
-than the stock version.
+ use File::Slurp;
-=head2 How can I read in an entire file all at once?
+ $all_of_it = read_file($filename); # entire file in scalar
+ @all_lines = read_file($filename); # one line perl element
The customary Perl approach for processing all the lines in a file is to
do so one line at a time:
while (<INPUT>) {
chomp;
# do something with $_
- }
+ }
close(INPUT) || die "can't close $file: $!";
This is tremendously more efficient than reading the entire file into
$var = <INPUT>;
}
-That temporarily undefs your record separator, and will automatically
+That temporarily undefs your record separator, and will automatically
close the file at block exit. If the file is already open, just use this:
$var = do { local $/; <INPUT> };
for instance, gets treated as two paragraphs and not three), or
C<"\n\n"> to accept empty paragraphs.
-Note that a blank line must have no blanks in it. Thus
+Note that a blank line must have no blanks in it. Thus
S<C<"fred\n \nstuff\n\n">> is one paragraph, but C<"fred\n\nstuff\n\n"> is two.
=head2 How can I read a single character from a file? From the keyboard?
If you check L<perlfunc/open>, you'll see that several of the ways
to call open() should do the trick. For example:
- open(LOG, ">>/tmp/logfile");
+ open(LOG, ">>/foo/logfile");
open(STDERR, ">&LOG");
Or even with a literal numeric descriptor:
Note that "<&STDIN" makes a copy, but "<&=STDIN" make
an alias. That means if you close an aliased handle, all
-aliases become inaccessible. This is not true with
+aliases become inaccessible. This is not true with
a copied one.
Error checking, as always, has been left as an exercise for the reader.
Or, just use the fdopen(3S) feature of open():
- {
- local *F;
+ {
+ local *F;
open F, "<&=$fd" or die "Cannot reopen fd=$fd: $!";
close F;
}
This is elaborately and painstakingly described in the
F<file-dir-perms> article in the "Far More Than You Ever Wanted To
-Know" collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz .
+Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz .
The executive summary: learn how your filesystem works. The
permissions on a file say what can happen to the data in that file.
srand;
rand($.) < 1 && ($line = $_) while <>;
-This has a significant advantage in space over reading the whole
-file in. A simple proof by induction is available upon
-request if you doubt the algorithm's correctness.
+This has a significant advantage in space over reading the whole file
+in. You can find a proof of this method in I<The Art of Computer
+Programming>, Volume 2, Section 3.4.2, by Donald E. Knuth.
+
+You can use the File::Random module which provides a function
+for that algorithm:
+
+ use File::Random qw/random_line/;
+ my $line = random_line($filename);
+
+Another way is to use the Tie::File module, which treats the entire
+file as an array. Simply access a random array element.
=head2 Why do I get weird spaces when I print an array of lines?
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington.
-All rights reserved.
+Copyright (c) 1997-2005 Tom Christiansen, Nathan Torkington, and
+other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.