=head1 NAME

perlfaq5 - Files and Formats ($Revision: 1.19 $)

=head1 DESCRIPTION

This section deals with I/O and the "f" issues: filehandles, flushing,
formats, and footers.

=head2 How do I flush/unbuffer a filehandle? Why must I do this?

The C standard I/O library (stdio) normally buffers characters sent to
devices. This is done for efficiency reasons, so that there isn't a
system call for each byte. Any time you use print() or write() in
Perl, you go through this buffering. syswrite() circumvents stdio and
buffering.

In most stdio implementations, the type of buffering and the size of
the buffer vary according to the type of device. Disk files are block
buffered, often with a buffer size of more than 2k. Pipes and sockets
are often buffered with a buffer size between 1/2 and 2k. Serial devices
(e.g. modems, terminals) are normally line-buffered, and stdio sends
the entire line when it gets the newline.

Perl does not support truly unbuffered output (except insofar as you can
C<syswrite(OUT, $char, 1)>). What it does instead support is "command
buffering", in which a physical write is performed after every output
command. This isn't as hard on your system as unbuffering, but does
get the output where you want it when you want it.

If you expect characters to get to your device when you print them there,
you'll want to autoflush its handle, as in the older:

    use FileHandle;
    open(DEV, "+</dev/tty");      # ceci n'est pas une pipe
    DEV->autoflush(1);

or the newer IO::* modules:

    use IO::Handle;
    open(DEV, ">/dev/printer");   # but is this?
    DEV->autoflush(1);

or even this:

    use IO::Socket;               # this one is kinda a pipe?
    $sock = IO::Socket::INET->new(PeerAddr => 'www.perl.com',
                                  PeerPort => 'http(80)',
                                  Proto    => 'tcp');
    die "$!" unless $sock;

    $sock->autoflush();
    $sock->print("GET /\015\012");
    $document = join('', $sock->getlines());
    print "DOC IS: $document\n";

Note the hardcoded carriage return and newline in their octal
equivalents. This is the ONLY way (currently) to assure a proper
flush on all platforms, including Macintosh.

You can use select() and the C<$|> variable to control autoflushing
(see L<perlvar/$|> and L<perlfunc/select>):

    $oldh = select(DEV);
    $| = 1;
    select($oldh);

You'll also see code that does this without a temporary variable, as in

    select((select(DEV), $| = 1)[0]);

=head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file?

Although humans have an easy time thinking of a text file as being a
sequence of lines that operates much like a stack of playing cards --
or punch cards -- computers usually see the text file as a sequence of
bytes. In general, there's no direct way for Perl to seek to a
particular line of a file, insert text into a file, or remove text
from a file.

(There are exceptions in special circumstances. Replacing a sequence
of bytes with another sequence of the same length is one. Another is
using the C<$DB_RECNO> array bindings as documented in L<DB_File>.
Yet another is manipulating files with all lines the same length.)

The general solution is to create a temporary copy of the text file with
the changes you want, then copy that over the original.

    $old = $file;
    $new = "$file.tmp.$$";
    $bak = "$file.bak";

    open(OLD, "< $old")         or die "can't open $old: $!";
    open(NEW, "> $new")         or die "can't open $new: $!";

    # Correct typos, preserving case
    while (<OLD>) {
        s/\b(p)earl\b/${1}erl/i;
        (print NEW $_)          or die "can't write to $new: $!";
    }

    close(OLD)                  or die "can't close $old: $!";
    close(NEW)                  or die "can't close $new: $!";

    rename($old, $bak)          or die "can't rename $old to $bak: $!";
    rename($new, $old)          or die "can't rename $new to $old: $!";

Perl can do this sort of thing for you automatically with the C<-i>
command-line switch or the closely-related C<$^I> variable (see
L<perlrun> for more details). Note that
C<-i> may require a suffix on some non-Unix systems; see the
platform-specific documentation that came with your port.

    # Renumber a series of tests from the command line
    perl -pi -e 's/(^\s+test\s+)\d+/ $1 . ++$count /e' t/op/taint.t

    # From a script
    local($^I, @ARGV) = ('.bak', glob("*.c"));
    while (<>) {
        if ($. == 1) {
            print "This line should appear at the top of each file\n";
        }
        s/\b(p)earl\b/${1}erl/i;        # Correct typos, preserving case
        print;
        close ARGV if eof;              # Reset $.
    }

If you need to seek to an arbitrary line of a file that changes
infrequently, you could build up an index of byte positions of where
the line ends are in the file. If the file is large, an index of
every tenth or hundredth line end would allow you to seek and read
fairly efficiently. If the file is sorted, try the look.pl library
(part of the standard perl distribution).

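The index idea can be sketched roughly as follows. This is only an
illustration, not a library routine: C<build_index> and C<line_at> are
names made up here, and the code assumes a perl recent enough to have
lexical filehandles and three-argument open().

```perl
use strict;

# Hypothetical helper: record the byte offset at which each line starts.
sub build_index {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "can't open $path: $!";
    my @offsets = (0);              # line 1 starts at byte 0
    while (<$fh>) {
        push @offsets, tell($fh);   # where the following line would start
    }
    pop @offsets;                   # the last entry is EOF, not a line start
    close($fh);
    return \@offsets;
}

# Hypothetical helper: fetch line $n (1-based) with one seek and one read.
sub line_at {
    my ($path, $offsets, $n) = @_;
    return undef if $n < 1 || $n > @$offsets;
    open(my $fh, "<", $path) or die "can't open $path: $!";
    seek($fh, $offsets->[$n - 1], 0) or die "can't seek in $path: $!";
    my $line = <$fh>;
    close($fh);
    return $line;
}
```

For a large file you might keep only every tenth or hundredth offset
and read forward from the nearest one. The index has to be rebuilt
whenever the file changes.
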
In the unique case of deleting lines at the end of a file, you
can use tell() and truncate(). The following code snippet deletes
the last line of a file without making a copy or reading the
whole file into memory:

    open (FH, "+< $file");
    while ( <FH> ) { $addr = tell(FH) unless eof(FH) }
    truncate(FH, $addr);

Error checking is left as an exercise for the reader.

=head2 How do I count the number of lines in a file?

One fairly efficient way is to count newlines in the file. The
following program uses a feature of tr///, as documented in L<perlop>.
If your text file doesn't end with a newline, then it's not really a
proper text file, so this may report one fewer line than you expect.

    $lines = 0;
    open(FILE, $filename) or die "Can't open `$filename': $!";
    while (sysread FILE, $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }
    close FILE;

=head2 How do I make a temporary file name?

Use the process ID and/or the current time-value. If you need to have
many temporary files in one process, use a counter:

    BEGIN {
        use IO::File;
        use Fcntl;
        my $temp_dir  = -d '/tmp' ? '/tmp' : $ENV{TMP} || $ENV{TEMP};
        my $base_name = sprintf("%s/%d-%d-0000", $temp_dir, $$, time());
        sub temp_file {
            my ($fh, $count) = (undef, 0);
            # give up after about 100 attempts rather than loop forever
            until (defined($fh) || $count++ > 100) {
                $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
                $fh = IO::File->new($base_name, O_WRONLY|O_EXCL|O_CREAT, 0644);
            }
            return defined($fh) ? ($fh, $base_name) : ();
        }
    }

Or you could simply use IO::File::new_tmpfile.

=head2 How can I manipulate fixed-record-length files?

The most efficient way is using pack() and unpack(). This is faster
than using substr(). Here is a sample chunk of code to break up and
put back together again some fixed-format input lines, in this case
from the output of a normal, Berkeley-style ps:

    # sample input line:
    # 15158 p5  T      0:00 perl /home/tchrist/scripts/now-what
    $PS_T = 'A6 A4 A7 A5 A*';
    open(PS, "ps|");
    $_ = <PS>; print;                   # pass the header line through
    while (<PS>) {
        ($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_);
        for $var (qw!pid tt stat time command!) {
            print "$var: <$$var>\n";
        }
        print 'line=', pack($PS_T, $pid, $tt, $stat, $time, $command),
            "\n";
    }

=head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles?

You may have some success with typeglobs, as we always had to use
in days of old:

    local(*FH);

But while still supported, that isn't the best way to go about getting
local filehandles. Typeglobs have their drawbacks. You may well want
to use the C<FileHandle> module, which creates new filehandles for you
(see L<FileHandle>):

    use FileHandle;
    sub findme {
        my $fh = FileHandle->new();
        open($fh, "</etc/hosts") or die "no /etc/hosts: $!";
        while (<$fh>) {
            print if /\b127\.(0\.0\.)?1\b/;
        }
        # $fh automatically closes/disappears here
    }

Internally, Perl believes filehandles to be of class IO::Handle. You
may use that module directly if you'd like (see L<IO::Handle>), or
one of its more specific derived classes.

=head2 How can I set up a footer format to be used with write()?

There's no built-in way to do this, but L<perlform> has a couple of
techniques to make it possible for the intrepid hacker.

=head2 How can I write() into a string?

See L<perlform> for an swrite() function.

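For convenience, here is a sketch along the lines of the swrite()
described in L<perlform>. It relies on the builtin formline() and the
format accumulator C<$^A>; the picture and arguments in the usage line
are made up for illustration.

```perl
use strict;
use Carp;

# Format the arguments according to a picture and return the result
# as a string instead of writing it to a filehandle.
sub swrite {
    croak "usage: swrite PICTURE ARGS" unless @_;
    my $format = shift;
    $^A = "";                   # clear the format accumulator
    formline($format, @_);      # append the formatted text to $^A
    my $out = $^A;
    $^A = "";                   # leave the accumulator clean
    return $out;
}

# A left-justified 6-column field, a space, a right-justified 5-column field
my $string = swrite('@<<<<< @>>>>' . "\n", "item", 42);
```

The picture is single-quoted so that Perl does not try to interpolate
the C<@> fields as arrays.
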
=head2 How can I output my numbers with commas added?

This one will do it for you:

    sub commify {
        local $_ = shift;
        1 while s/^(-?\d+)(\d{3})/$1,$2/;
        return $_;
    }

    $n = 23659019423.2331;
    print "GOT: ", commify($n), "\n";

    GOT: 23,659,019,423.2331

You can't just:

    s/^(-?\d+)(\d{3})/$1,$2/g;

because you have to put the comma in and then recalculate your
position.

=head2 How can I translate tildes (~) in a filename?

Use the E<lt>E<gt> (glob()) operator, documented in L<perlfunc>. This
requires that you have a shell installed that groks tildes, meaning
csh or tcsh or (some versions of) ksh, and thus may have portability
problems. The Glob::KGlob module (available from CPAN) gives more
portable glob functionality.

Within Perl, you may use this directly:

    $filename =~ s{
      ^ ~             # find a leading tilde
      (               # save this in $1
          [^/]        # a non-slash character
                *     # repeated 0 or more times (0 means me)
      )
    }{
      $1
          ? (getpwnam($1))[7]
          : ( $ENV{HOME} || $ENV{LOGDIR} )
    }ex;

=head2 How come when I open the file read-write it wipes it out?

Because you're using something like this, which truncates the file and
I<then> gives you read-write access:

    open(FH, "+> /path/name");  # WRONG

Whoops. You should instead use this, which will fail if the file
doesn't exist.

    open(FH, "+< /path/name");  # open for update

If this is an issue, try:

    use Fcntl;
    sysopen(FH, "/path/name", O_RDWR|O_CREAT, 0644);

Error checking is left as an exercise for the reader.

=head2 Why do I sometimes get an "Argument list too long" when I use <*>?

The C<E<lt>E<gt>> operator performs a globbing operation (see above).
By default glob() forks csh(1) to do the actual glob expansion, but
csh can't handle more than 127 items and so gives the error message
C<Argument list too long>. People who installed tcsh as csh won't
have this problem, but their users may be surprised by it.

To get around this, either do the glob yourself with C<DirHandle>s and
patterns, or use a module like Glob::KGlob, one that doesn't use the
shell to do globbing.

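Doing the glob yourself might look something like the sketch below.
It uses the builtin opendir() and readdir() rather than a shell; the
DirHandle module wraps the same calls in an object interface.
C<match_files> is a name invented for this example.

```perl
use strict;

# Hypothetical helper: return the names in $dir that match $pattern,
# skipping dot files the way a shell glob of "*" would.
sub match_files {
    my ($dir, $pattern) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    my @names = sort grep { !/^\./ && /$pattern/ } readdir($dh);
    closedir($dh);
    return @names;
}

# Roughly what <*.c> would return for the current directory
my @c_files = match_files(".", qr/\.c$/);
```

Note that the names come back without the directory prefix, and that
the pattern is a regular expression, not a shell wildcard.
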
=head2 Is there a leak/bug in glob()?

Due to the current implementation on some operating systems, when you
use the glob() function or its angle-bracket alias in a scalar
context, you may cause a leak and/or unpredictable behavior. It's
best therefore to use glob() only in list context.

=head2 How can I open a file with a leading "E<gt>" or trailing blanks?

Normally perl ignores trailing blanks in filenames, and interprets
certain leading characters (or a trailing "|") to mean something
special. To avoid this, you might want to use a routine like this.
It makes incomplete pathnames into explicit relative ones, and tacks a
trailing null byte on the name to make perl leave it alone:

    sub safe_filename {
        local $_ = shift;
        return m#^/#
                ? "$_\0"
                : "./$_\0";
    }

    $fn = safe_filename("<<<something really wicked ");
    open(FH, "> $fn") or die "couldn't open $fn: $!";

You could also use the sysopen() function (see L<perlfunc/sysopen>).

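A sketch of the sysopen() route, which takes the filename literally
because the mode is passed as a separate argument. C<open_literal> is
a made-up name, and the flags shown (create or truncate for writing)
are just one plausible choice.

```perl
use strict;
use Fcntl;

# Hypothetical helper: create or truncate a file whose name may contain
# characters that open() would treat as mode markers or strip off.
sub open_literal {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY|O_CREAT|O_TRUNC, 0644)
        or die "can't open $path: $!";
    return $fh;
}
```
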
=head2 How can I reliably rename a file?

Well, usually you just use Perl's rename() function. But that may
not work everywhere, in particular, renaming files across file systems.
If your operating system supports a mv(1) program or its moral equivalent,
this works:

    rename($old, $new) or system("mv", $old, $new);

It may be more compelling to use the File::Copy module instead. You
just copy the file to the new name (checking return values),
then delete the old one. This isn't really the same semantics as a
real rename(), though, which preserves metainformation like
permissions, timestamps, inode info, etc.

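The copy-then-delete approach could be sketched like this, using
copy() from File::Copy. The helper name C<rename_or_copy> is invented
here. As noted above, the copy does not carry over all the
metainformation that a real rename() would.

```perl
use strict;
use File::Copy;

# Hypothetical helper: rename if possible, otherwise copy and delete.
sub rename_or_copy {
    my ($old, $new) = @_;
    return 1 if rename($old, $new);         # the cheap case
    copy($old, $new) or die "can't copy $old to $new: $!";
    unlink($old)     or die "can't remove $old: $!";
    return 1;
}
```

File::Copy also provides a move() function that follows the same
rename-first strategy.
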
=head2 How can I lock a file?

Perl's built-in flock() function (see L<perlfunc> for details) will call
flock(2) if that exists, fcntl(2) if it doesn't (on perl version 5.004 and
later), and lockf(3) if neither of the two previous system calls exists.
On some systems, it may even use a different form of native locking.
Here are some gotchas with Perl's flock():

=over 4

=item 1

Produces a fatal error if none of the three system calls (or their
close equivalent) exists.

=item 2

lockf(3) does not provide shared locking, and requires that the
filehandle be open for writing (or appending, or read/writing).

=item 3

Some versions of flock() can't lock files over a network (e.g. on NFS
file systems), so you'd need to force the use of fcntl(2) when you
build Perl. See the flock entry of L<perlfunc>, and the F<INSTALL>
file in the source distribution for information on building Perl to do
this.

=back

The CPAN module File::Lock offers similar functionality and (if you
have dynamic loading) won't require you to rebuild perl if your
flock() can't lock network files.

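As a rough illustration of flock() in use, the sketch below appends a
line to a file under an exclusive lock, taking the symbolic constants
from Fcntl's C<:flock> tag. C<locked_append> is a hypothetical helper,
and the code assumes a local filesystem where flock() works.

```perl
use strict;
use Fcntl qw(:flock);           # LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN

# Hypothetical helper: append one line to $path under an exclusive lock.
sub locked_append {
    my ($path, $line) = @_;
    open(my $fh, ">>", $path)   or die "can't open $path: $!";
    flock($fh, LOCK_EX)         or die "can't lock $path: $!";
    seek($fh, 0, 2)             or die "can't seek to end of $path: $!";
    (print {$fh} $line)         or die "can't write to $path: $!";
    close($fh)                  or die "can't close $path: $!";  # releases the lock
    return 1;
}
```

The seek() after the lock is there in case another process appended
to the file while this one was waiting.
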
=head2 Why can't I just open(FH, ">file.lock")?

A common bit of code B<NOT TO USE> is this:

    sleep(3) while -e "file.lock";      # PLEASE DO NOT USE
    open(LCK, "> file.lock");           # THIS BROKEN CODE

This is a classic race condition: you take two steps to do something
which must be done in one. That's why computer hardware provides an
atomic test-and-set instruction. In theory, this "ought" to work:

    sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT, 0644)
        or die "can't open file.lock: $!";

except that lamentably, file creation (and deletion) is not atomic
over NFS, so this won't work (at least, not every time) over the net.
Various schemes involving link() have been suggested, but
these tend to involve busy-wait, which is also subdesirable.

=head2 I still don't get locking. I just want to increment the number in the file. How can I do this?

Didn't anyone ever tell you web-page hit counters were useless?

Anyway, this is what to do:

    use Fcntl;
    sysopen(FH, "numfile", O_RDWR|O_CREAT, 0644) or die "can't open numfile: $!";
    flock(FH, 2)                                 or die "can't flock numfile: $!";
    $num = <FH> || 0;
    seek(FH, 0, 0)                               or die "can't rewind numfile: $!";
    truncate(FH, 0)                              or die "can't truncate numfile: $!";
    (print FH $num+1, "\n")                      or die "can't write numfile: $!";
    # DO NOT UNLOCK THIS UNTIL YOU CLOSE
    close FH                                     or die "can't close numfile: $!";

Here's a much better web-page hit counter:

    $hits = int( (time() - 850_000_000) / rand(1_000) );

If the count doesn't impress your friends, then the code might. :-)

=head2 How do I randomly update a binary file?

If you're just trying to patch a binary, in many cases something as
simple as this works:

    perl -i -pe 's{window manager}{window mangler}g' /usr/bin/emacs

However, if you have fixed sized records, then you might do something more
like this:

    $RECSIZE = 220; # size of record, in bytes
    $recno   = 37;  # which record to update
    open(FH, "+<somewhere") || die "can't update somewhere: $!";
    seek(FH, $recno * $RECSIZE, 0);
    read(FH, $record, $RECSIZE) == $RECSIZE || die "can't read record $recno: $!";
    # munge the record
    seek(FH, $recno * $RECSIZE, 0);
    print FH $record;
    close FH;

Locking and error checking are left as an exercise for the reader.
Don't forget them, or you'll be quite sorry.

Don't forget to set binmode() under DOS-like platforms when operating
on files that have anything other than straight text in them. See the
docs on open() and on binmode() for more details.

=head2 How do I get a file's timestamp in perl?

If you want to retrieve the time at which the file was last read,
written, or had its meta-data (owner, etc) changed, you use the B<-A>,
B<-M>, or B<-C> filetest operations as documented in L<perlfunc>. These
retrieve the age of the file (measured against the start-time of your
program) in days as a floating point number. To retrieve the "raw"
time in seconds since the epoch, you would call the stat function,
then use localtime(), gmtime(), or POSIX::strftime() to convert this
into human-readable form.

Here's an example:

    $write_secs = (stat($file))[9];
    print "file $file updated at ", scalar(localtime($write_secs)), "\n";

If you prefer something more legible, use the File::stat module
(part of the standard distribution in version 5.004 and later):

    use File::stat;
    use Time::localtime;
    $date_string = ctime(stat($file)->mtime);
    print "file $file updated at $date_string\n";

Error checking is left as an exercise for the reader.

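If you want control over the layout of the date, the
POSIX::strftime() route mentioned above might be sketched as follows.
C<mtime_string> is a made-up helper, and the format string is only one
possible choice.

```perl
use strict;
use POSIX qw(strftime);

# Hypothetical helper: format a file's modification time as a string.
sub mtime_string {
    my ($path) = @_;
    my @st = stat($path) or die "can't stat $path: $!";
    # element 9 of the stat list is the modification time in epoch seconds
    return strftime("%Y-%m-%d %H:%M:%S", localtime($st[9]));
}
```
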
=head2 How do I set a file's timestamp in perl?

You use the utime() function documented in L<perlfunc/utime>.
By way of example, here's a little program that copies the
read and write times from its first argument to all the rest
of them.

    if (@ARGV < 2) {
        die "usage: cptimes timestamp_file other_files ...\n";
    }
    $timestamp = shift;
    ($atime, $mtime) = (stat($timestamp))[8,9];
    utime $atime, $mtime, @ARGV;

Error checking is left as an exercise for the reader.

Note that utime() currently doesn't work correctly with Win95/NT
ports. A bug has been reported. Check it carefully before using
it on those platforms.

=head2 How do I print to more than one file at once?

If you only have to do this once, you can do this:

    for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }

To connect one filehandle to several output filehandles, it's
easiest to use the tee(1) program if you have it, and let it take care
of the multiplexing:

    open (FH, "| tee file1 file2 file3");

Otherwise you'll have to write your own multiplexing print function --
or your own tee program -- or use Tom Christiansen's, at
http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz, which is
written in Perl.

In theory an IO::Tee class could be written, but to date we haven't
seen such.

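A multiplexing print function need not be elaborate. Here is one
possible sketch; C<print_to_all> is a name chosen for this example,
and the handles are passed as an array reference.

```perl
use strict;

# Hypothetical helper: print the same text to every handle in the list.
# Returns the number of handles written to.
sub print_to_all {
    my ($handles, @text) = @_;
    for my $fh (@$handles) {
        (print {$fh} @text) or die "can't write: $!";
    }
    return scalar @$handles;
}
```

You might call it as C<print_to_all([\*STDOUT, \*LOG], "whatever\n")>,
where LOG is a handle you have already opened.
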
=head2 How can I read in a file by paragraphs?

Use the C<$/> variable (see L<perlvar> for details). You can either
set it to C<""> to eliminate empty paragraphs (C<"abc\n\n\n\ndef">,
for instance, gets treated as two paragraphs and not three), or
C<"\n\n"> to accept empty paragraphs.

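A small sketch of paragraph mode, setting the input record separator
C<$/> to the empty string. C<read_paragraphs> is a hypothetical helper
name; C<local> keeps the change from leaking into the rest of the
program.

```perl
use strict;

# Hypothetical helper: return the paragraphs of a file as a list.
sub read_paragraphs {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "can't open $path: $!";
    local $/ = "";              # paragraph mode: runs of blank lines end a record
    my @paras = <$fh>;
    close($fh);
    return @paras;
}
```

Each paragraph comes back with its trailing newlines collapsed to two,
except possibly the last one in the file.
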
=head2 How can I read a single character from a file? From the keyboard?

You can use the builtin C<getc()> function for most filehandles, but
it won't (easily) work on a terminal device. For STDIN, either use
the Term::ReadKey module from CPAN, or use the sample code in
L<perlfunc/getc>.

If your system supports POSIX, you can use the following code, which
you'll note turns off echo processing as well.

    use strict;
    use POSIX qw(:termios_h);

    my ($term, $oterm, $echo, $noecho, $fd_stdin);

    $fd_stdin = fileno(STDIN);

    $term     = POSIX::Termios->new();
    $term->getattr($fd_stdin);
    $oterm    = $term->getlflag();

    $echo     = ECHO | ECHOK | ICANON;
    $noecho   = $oterm & ~$echo;

    sub cbreak {                        # no echo, no line buffering
        $term->setlflag($noecho);
        $term->setcc(VTIME, 1);
        $term->setattr($fd_stdin, TCSANOW);
    }

    sub cooked {                        # restore the saved settings
        $term->setlflag($oterm);
        $term->setcc(VTIME, 0);
        $term->setattr($fd_stdin, TCSANOW);
    }

    sub getone {                        # read one key without echo
        my $key = '';
        cbreak();
        sysread(STDIN, $key, 1);
        cooked();
        return $key;
    }

    END { cooked() }

The Term::ReadKey module from CPAN may be easier to use:

    use Term::ReadKey;
    open(TTY, "</dev/tty");
    print "Gimme a char: ";
    ReadMode "raw";
    $key = ReadKey 0, *TTY;
    ReadMode "normal";
    printf "\nYou said %s, char number %03d\n",
        $key, ord $key;

For DOS systems, Dan Carson <dbc@tc.fluke.COM> reports the following:

To put the PC in "raw" mode, use ioctl with some magic numbers gleaned
from msdos.c (Perl source file) and Ralf Brown's interrupt list (comes
across the net every so often):

    $old_ioctl = ioctl(STDIN,0,0);     # Gets device info
    $old_ioctl &= 0xff;
    ioctl(STDIN,1,$old_ioctl | 32);    # Writes it back, setting bit 5

Then to read a single character:

    sysread(STDIN,$c,1);               # Read a single character

And to put the PC back to "cooked" mode:

    ioctl(STDIN,1,$old_ioctl);         # Sets it back to cooked mode.

So now you have $c. If C<ord($c) == 0>, you have a two byte code, which
means you hit a special key. Read another byte with C<sysread(STDIN,$c,1)>,
and that value tells you what combination it was according to this
table:

    # PC 2-byte keycodes = ^@ + the following:

    # 10-19   ALT QWERTYUIOP
    # 1E-26   ALT ASDFGHJKL
    # 4F-53   END,DOWN,PgDn,Ins,Del
    # 73-77   CTR LEFT,RIGHT,END,PgDn,HOME
    # 78-83   ALT 1234567890-=

This is all trial and error I did a long time ago; I hope I'm reading the
file that worked.

=head2 How can I tell if there's a character waiting on a filehandle?

You should check out the Frequently Asked Questions list in
comp.unix.* for things like this: the answer is essentially the same.
It's very system dependent. Here's one solution that works on BSD
systems:

    sub key_ready {
        my($rin, $nfd);
        vec($rin, fileno(STDIN), 1) = 1;
        return $nfd = select($rin,undef,undef,0);
    }

You should look into getting the Term::ReadKey extension from CPAN.

=head2 How do I open a file without blocking?

You need to use the O_NDELAY or O_NONBLOCK flag from the Fcntl module
in conjunction with sysopen():

    use Fcntl;
    sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644)
        or die "can't open /tmp/somefile: $!";

=head2 How do I create a file only if it doesn't exist?

You need to use the O_CREAT and O_EXCL flags from the Fcntl module in
conjunction with sysopen():

    use Fcntl;
    sysopen(FH, "/tmp/somefile", O_WRONLY|O_EXCL|O_CREAT, 0644)
        or die "can't open /tmp/somefile: $!";

Be warned that neither creation nor deletion of files is guaranteed to
be an atomic operation over NFS. That is, two processes might both
successfully create or unlink the same file!

=head2 How do I do a C<tail -f> in perl?

First try

    seek(GWFILE, 0, 1);

The statement C<seek(GWFILE, 0, 1)> doesn't change the current position,
but it does clear the end-of-file condition on the handle, so that the
next <GWFILE> makes Perl try again to read something.

If that doesn't work (it relies on features of your stdio implementation),
then you need something more like this:

    for (;;) {
        for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {
            # search for some stuff and put it into files
        }
        # sleep for a while
        seek(GWFILE, $curpos, 0);  # seek to where we had been
    }

If this still doesn't work, look into the POSIX module. POSIX defines
the clearerr() method, which can remove the end of file condition on a
filehandle. The method: read until end of file, clearerr(), read some
more. Lather, rinse, repeat.

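The first approach above, wrapped in a function, might look like this
sketch. C<read_new_lines> is a hypothetical name, and as the text
warns, whether the seek() trick is enough depends on your I/O
implementation.

```perl
use strict;

# Hypothetical helper: return whatever has been appended to $fh since
# the last call.  Returns an empty list if nothing new has arrived.
sub read_new_lines {
    my ($fh) = @_;
    seek($fh, 0, 1);            # no movement, but clears the EOF condition
    my @lines = <$fh>;          # read up to the current end of file
    return @lines;
}
```

A C<tail -f> loop would call this repeatedly, sleeping for a while
between calls.
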
=head2 How do I dup() a filehandle in Perl?

If you check L<perlfunc/open>, you'll see that several of the ways
to call open() should do the trick. For example:

    open(LOG, ">>/tmp/logfile");
    open(STDERR, ">&LOG");

Or even with a literal numeric descriptor:

    $fd = $ENV{MHCONTEXTFD};
    open(MHCONTEXT, "<&=$fd");  # like fdopen(3S)

Error checking has been left as an exercise for the reader.

=head2 How do I close a file descriptor by number?

This should rarely be necessary, as the Perl close() function is to be
used for things that Perl opened itself, even if it was a dup of a
numeric descriptor, as with MHCONTEXT above. But if you really have
to, you may be able to do this:

    require 'sys/syscall.ph';
    $rc = syscall(&SYS_close, $fd + 0);  # must force numeric
    die "can't sysclose $fd: $!" if $rc == -1;

=head2 Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work?

Whoops! You just put a tab and a formfeed into that filename!
Remember that within double quoted strings ("like\this"), the
backslash is an escape character. The full list of these is in
L<perlop/Quote and Quote-like Operators>. Unsurprisingly, you don't
have a file called "c:(tab)emp(formfeed)oo" or
"c:(tab)emp(formfeed)oo.exe" on your DOS filesystem.

Either single-quote your strings, or (preferably) use forward slashes.
Since all DOS and Windows versions since something like MS-DOS 2.0 or so
have treated C</> and C<\> the same in a path, you might as well use the
one that doesn't clash with Perl -- or the POSIX shell, ANSI C and C++,
awk, Tcl, Java, or Python, just to mention a few.

=head2 Why doesn't glob("*.*") get all the files?

Because even on non-Unix ports, Perl's glob function follows standard
Unix globbing semantics. You'll need C<glob("*")> to get all (non-hidden)
files.

=head2 Why does Perl let me delete read-only files? Why does C<-i> clobber protected files? Isn't this a bug in Perl?

This is elaborately and painstakingly described in the "Far More Than
You Ever Wanted To Know" in
http://www.perl.com/CPAN/doc/FMTEYEWTK/file-dir-perms .

The executive summary: learn how your filesystem works. The
permissions on a file say what can happen to the data in that file.
The permissions on a directory say what can happen to the list of
files in that directory. If you delete a file, you're removing its
name from the directory (so the operation depends on the permissions
of the directory, not of the file). If you try to write to the file,
the permissions of the file govern whether you're allowed to.

=head2 How do I select a random line from a file?

Here's an algorithm from the Camel Book:

    srand;
    rand($.) < 1 && ($line = $_) while <>;

This has a significant advantage in space over reading the whole
file in.

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
All rights reserved. See L<perlfaq> for distribution information.