=head1 NAME

perlfaq5 - Files and Formats ($Revision: 1.38 $, $Date: 1999/05/23 16:08:30 $)

=head1 DESCRIPTION

This section deals with I/O and the "f" issues: filehandles, flushing,
formats, and footers.
9
5a964f20 10=head2 How do I flush/unbuffer an output filehandle? Why must I do this?
68dc0745 11
The C standard I/O library (stdio) normally buffers characters sent to
devices. This is done for efficiency reasons so that there isn't a
system call for each byte. Any time you use print() or write() in
Perl, you go through this buffering. syswrite() circumvents stdio and
buffering.
17
In most stdio implementations, the type of output buffering and the size of
the buffer vary according to the type of device. Disk files are block
buffered, often with a buffer size of more than 2k. Pipes and sockets
are often buffered with a buffer size between 1/2 and 2k. Serial devices
(e.g. modems, terminals) are normally line-buffered, and stdio sends
the entire line when it gets the newline.
24
25Perl does not support truly unbuffered output (except insofar as you can
26C<syswrite(OUT, $char, 1)>). What it does instead support is "command
27buffering", in which a physical write is performed after every output
28command. This isn't as hard on your system as unbuffering, but does
29get the output where you want it when you want it.
30
31If you expect characters to get to your device when you print them there,
5a964f20 32you'll want to autoflush its handle.
33Use select() and the C<$|> variable to control autoflushing
34(see L<perlvar/$|> and L<perlfunc/select>):
35
36 $old_fh = select(OUTPUT_HANDLE);
37 $| = 1;
38 select($old_fh);
39
40Or using the traditional idiom:
41
42 select((select(OUTPUT_HANDLE), $| = 1)[0]);
43
Or, if you don't mind slowly loading several thousand lines of module code
just because you're afraid of the C<$|> variable:
68dc0745 46
47 use FileHandle;
5a964f20 48 open(DEV, "+</dev/tty"); # ceci n'est pas une pipe
68dc0745 49 DEV->autoflush(1);
50
51or the newer IO::* modules:
52
53 use IO::Handle;
54 open(DEV, ">/dev/printer"); # but is this?
55 DEV->autoflush(1);
56
57or even this:
58
59 use IO::Socket; # this one is kinda a pipe?
60 $sock = IO::Socket::INET->new(PeerAddr => 'www.perl.com',
61 PeerPort => 'http(80)',
62 Proto => 'tcp');
63 die "$!" unless $sock;
64
65 $sock->autoflush();
5a964f20 66 print $sock "GET / HTTP/1.0" . "\015\012" x 2;
67 $document = join('', <$sock>);
68dc0745 68 print "DOC IS: $document\n";
69
Note the bizarrely hardcoded carriage return and newline in their octal
equivalents. This is the ONLY way (currently) to assure a proper flush
on all platforms, including Macintosh. That's the way things work in
network programming: you really should specify the exact bit pattern
of the network line terminator. In practice, C<"\n\n"> often works,
but this is not portable.
68dc0745 76
5a964f20 77See L<perlfaq9> for other examples of fetching URLs over the web.
68dc0745 78
79=head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file?
80
65acb1b1 81Those are operations of a text editor. Perl is not a text editor.
82Perl is a programming language. You have to decompose the problem into
83low-level calls to read, write, open, close, and seek.
84
68dc0745 85Although humans have an easy time thinking of a text file as being a
a6dd486b 86sequence of lines that operates much like a stack of playing cards--or
87punch cards--computers usually see the text file as a sequence of bytes.
65acb1b1 88In general, there's no direct way for Perl to seek to a particular line
89of a file, insert text into a file, or remove text from a file.
68dc0745 90
a6dd486b 91(There are exceptions in special circumstances. You can add or remove
92data at the very end of the file. A sequence of bytes can be replaced
93with another sequence of the same length. The C<$DB_RECNO> array
94bindings as documented in L<DB_File> also provide a direct way of
95modifying a file. Files where all lines are the same length are also
96easy to alter.)
68dc0745 97
98The general solution is to create a temporary copy of the text file with
5a964f20 99the changes you want, then copy that over the original. This assumes
100no locking.
68dc0745 101
102 $old = $file;
103 $new = "$file.tmp.$$";
65acb1b1 104 $bak = "$file.orig";
68dc0745 105
106 open(OLD, "< $old") or die "can't open $old: $!";
107 open(NEW, "> $new") or die "can't open $new: $!";
108
109 # Correct typos, preserving case
110 while (<OLD>) {
111 s/\b(p)earl\b/${1}erl/i;
112 (print NEW $_) or die "can't write to $new: $!";
113 }
114
115 close(OLD) or die "can't close $old: $!";
116 close(NEW) or die "can't close $new: $!";
117
118 rename($old, $bak) or die "can't rename $old to $bak: $!";
119 rename($new, $old) or die "can't rename $new to $old: $!";
120
121Perl can do this sort of thing for you automatically with the C<-i>
46fc3d4c 122command-line switch or the closely-related C<$^I> variable (see
68dc0745 123L<perlrun> for more details). Note that
124C<-i> may require a suffix on some non-Unix systems; see the
125platform-specific documentation that came with your port.
126
    # Renumber a series of tests from the command line
    perl -pi -e 's/(^\s+test\s+)\d+/ $1 . ++$count /e' t/op/taint.t

    # from a script
    local($^I, @ARGV) = ('.orig', glob("*.c"));
    while (<>) {
        if ($. == 1) {
            print "This line should appear at the top of each file\n";
        }
        s/\b(p)earl\b/${1}erl/i;        # Correct typos, preserving case
        print;
        close ARGV if eof;              # Reset $.
    }
140
141If you need to seek to an arbitrary line of a file that changes
142infrequently, you could build up an index of byte positions of where
143the line ends are in the file. If the file is large, an index of
144every tenth or hundredth line end would allow you to seek and read
145fairly efficiently. If the file is sorted, try the look.pl library
146(part of the standard perl distribution).
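
The index idea above can be sketched roughly as follows. This is only an illustration, not code from the standard distribution: the helper names build_line_index() and read_line() are made up for the example, lines are counted from zero, and the file is assumed not to change between building the index and using it.

```perl
use strict;
use warnings;

# Record the byte offset at which every $step-th line starts.
# build_line_index() and read_line() are hypothetical helper names.
sub build_line_index {
    my ($path, $step) = @_;
    $step ||= 1;
    open(my $fh, '<', $path) or die "can't open $path: $!";
    my @index = (0);                          # line 0 starts at byte 0
    while (<$fh>) {
        push @index, tell($fh) if $. % $step == 0;
    }
    close($fh) or die "can't close $path: $!";
    return \@index;
}

# Fetch line $n (counting from 0): seek to the nearest indexed line
# at or before it, then read forward the remaining few lines.
sub read_line {
    my ($path, $index, $step, $n) = @_;
    my $slot = int($n / $step);
    return undef if $slot > $#$index;
    open(my $fh, '<', $path) or die "can't open $path: $!";
    seek($fh, $index->[$slot], 0) or die "can't seek in $path: $!";
    my $line;
    for (0 .. $n % $step) {
        $line = <$fh>;
        last unless defined $line;            # ran off the end of the file
    }
    close($fh);
    return $line;
}
```

With a step of 100, the index holds one offset per hundred lines, and fetching any line costs one seek plus at most a hundred reads.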
147
148In the unique case of deleting lines at the end of a file, you
149can use tell() and truncate(). The following code snippet deletes
150the last line of a file without making a copy or reading the
151whole file into memory:
152
153 open (FH, "+< $file");
54310121 154 while ( <FH> ) { $addr = tell(FH) unless eof(FH) }
68dc0745 155 truncate(FH, $addr);
156
157Error checking is left as an exercise for the reader.
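
For readers who would rather not do the exercise, here is one possible version with the checks filled in. It is a sketch under the same assumptions as the snippet above (no locking, and the file fits the tell()/truncate() model); chop_last_line() is a name invented for this example.

```perl
use strict;
use warnings;

# Remove the last line of $path in place, without copying the file.
# chop_last_line() is a hypothetical helper name.
sub chop_last_line {
    my ($path) = @_;
    open(my $fh, '+<', $path) or die "can't open $path: $!";
    my $addr = 0;                             # where the last line begins
    while (<$fh>) {
        $addr = tell($fh) unless eof($fh);
    }
    truncate($fh, $addr) or die "can't truncate $path: $!";
    close($fh)           or die "can't close $path: $!";
    return $addr;                             # new size of the file in bytes
}
```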
158
159=head2 How do I count the number of lines in a file?
160
161One fairly efficient way is to count newlines in the file. The
162following program uses a feature of tr///, as documented in L<perlop>.
163If your text file doesn't end with a newline, then it's not really a
164proper text file, so this may report one fewer line than you expect.
165
166 $lines = 0;
167 open(FILE, $filename) or die "Can't open `$filename': $!";
168 while (sysread FILE, $buffer, 4096) {
169 $lines += ($buffer =~ tr/\n//);
170 }
171 close FILE;
172
5a964f20 173This assumes no funny games with newline translations.
174
68dc0745 175=head2 How do I make a temporary file name?
176
5a964f20 177Use the C<new_tmpfile> class method from the IO::File module to get a
a6dd486b 178filehandle opened for reading and writing. Use it if you don't
179need to know the file's name:
68dc0745 180
65acb1b1 181 use IO::File;
5a964f20 182 $fh = IO::File->new_tmpfile()
65acb1b1 183 or die "Unable to make new temporary file: $!";
5a964f20 184
a6dd486b 185If you do need to know the file's name, you can use the C<tmpnam>
186function from the POSIX module to get a filename that you then open
187yourself:
188
5a964f20 189
190 use Fcntl;
191 use POSIX qw(tmpnam);
192
193 # try new temporary filenames until we get one that didn't already
194 # exist; the check should be unnecessary, but you can't be too careful
195 do { $name = tmpnam() }
196 until sysopen(FH, $name, O_RDWR|O_CREAT|O_EXCL);
197
198 # install atexit-style handler so that when we exit or die,
199 # we automatically delete this temporary file
200 END { unlink($name) or die "Couldn't unlink $name : $!" }
201
202 # now go on to use the file ...
203
a6dd486b 204If you're committed to creating a temporary file by hand, use the
205process ID and/or the current time-value. If you need to have many
206temporary files in one process, use a counter:
5a964f20 207
    BEGIN {
        use Fcntl;
        my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMP} || $ENV{TEMP};
        my $base_name = sprintf("%s/%d-%d-0000", $temp_dir, $$, time());
        sub temp_file {
            local *FH;
            my $count = 0;
            until (defined(fileno(FH)) || $count++ > 100) {
                $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
                sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT);
            }
            if (defined(fileno(FH))) {
                return (*FH, $base_name);
            } else {
                return ();
            }
        }
    }
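
If the File::Temp module is available on your system (it is not mentioned above, so treat this as an alternative rather than the recommended method), it wraps up the same create-exclusively-and-retry logic. A minimal sketch, in which the template name 'scratchXXXXXX' is arbitrary:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() picks the name and opens the file in one step, so there is
# no gap between choosing a name and creating the file.
# TMPDIR => 1 places it in the system temporary directory;
# UNLINK => 1 asks for it to be removed when the program exits.
my ($fh, $name) = tempfile('scratchXXXXXX', TMPDIR => 1, UNLINK => 1);

(print $fh "some scratch data\n") or die "can't write to $name: $!";
close($fh)                        or die "can't close $name: $!";
```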
226
68dc0745 227=head2 How can I manipulate fixed-record-length files?
228
5a964f20 229The most efficient way is using pack() and unpack(). This is faster than
65acb1b1 230using substr() when taking many, many strings. It is slower for just a few.
5a964f20 231
232Here is a sample chunk of code to break up and put back together again
233some fixed-format input lines, in this case from the output of a normal,
234Berkeley-style ps:
68dc0745 235
236 # sample input line:
237 # 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what
238 $PS_T = 'A6 A4 A7 A5 A*';
239 open(PS, "ps|");
5a964f20 240 print scalar <PS>;
68dc0745 241 while (<PS>) {
242 ($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_);
243 for $var (qw!pid tt stat time command!) {
244 print "$var: <$$var>\n";
245 }
246 print 'line=', pack($PS_T, $pid, $tt, $stat, $time, $command),
247 "\n";
248 }
249
We've used C<$$var> in a way that is forbidden by C<use strict 'refs'>.
That is, we've promoted a string to a scalar variable reference using
symbolic references. This is ok in small programs, but doesn't scale
well. It also only works on global variables, not lexicals.
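
One way to avoid the symbolic references is to unpack into a hash slice instead of separate variables. The following is a sketch that reuses the same template; parse_ps_line() is a name invented for the example, and the sample line is built with pack() so that its columns are sure to match the template.

```perl
use strict;
use warnings;

my $PS_T   = 'A6 A4 A7 A5 A*';
my @fields = qw(pid tt stat time command);

# Unpack one fixed-width line into a hash keyed by field name.
# parse_ps_line() is a hypothetical helper name.
sub parse_ps_line {
    my ($line) = @_;
    my %rec;
    @rec{@fields} = unpack($PS_T, $line);
    return \%rec;
}

# Build a sample record with the same template, then take it apart again.
my $line = pack($PS_T, 15158, 'p5', 'T', '0:00', 'perl now-what');
my $rec  = parse_ps_line($line);
print "$_: <$rec->{$_}>\n" for @fields;
```

This runs under C<use strict> and works just as well with lexical variables.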
254
68dc0745 255=head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles?
256
5a964f20 257The fastest, simplest, and most direct way is to localize the typeglob
258of the filehandle in question:
68dc0745 259
5a964f20 260 local *TmpHandle;
68dc0745 261
5a964f20 262Typeglobs are fast (especially compared with the alternatives) and
263reasonably easy to use, but they also have one subtle drawback. If you
264had, for example, a function named TmpHandle(), or a variable named
265%TmpHandle, you just hid it from yourself.
68dc0745 266
68dc0745 267 sub findme {
5a964f20 268 local *HostFile;
269 open(HostFile, "</etc/hosts") or die "no /etc/hosts: $!";
270 local $_; # <- VERY IMPORTANT
271 while (<HostFile>) {
68dc0745 272 print if /\b127\.(0\.0\.)?1\b/;
273 }
5a964f20 274 # *HostFile automatically closes/disappears here
275 }
276
a6dd486b 277Here's how to use typeglobs in a loop to open and store a bunch of
5a964f20 278filehandles. We'll use as values of the hash an ordered
279pair to make it easy to sort the hash in insertion order.
280
281 @names = qw(motd termcap passwd hosts);
282 my $i = 0;
283 foreach $filename (@names) {
284 local *FH;
285 open(FH, "/etc/$filename") || die "$filename: $!";
286 $file{$filename} = [ $i++, *FH ];
68dc0745 287 }
288
5a964f20 289 # Using the filehandles in the array
290 foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) {
291 my $fh = $file{$name}[1];
292 my $line = <$fh>;
293 print "$name $. $line";
294 }
295
c8db1d39 296For passing filehandles to functions, the easiest way is to
13a2d996 297preface them with a star, as in func(*STDIN).
298See L<perlfaq7/"Passing Filehandles"> for details.
c8db1d39 299
65acb1b1 300If you want to create many anonymous handles, you should check out the
5a964f20 301Symbol, FileHandle, or IO::Handle (etc.) modules. Here's the equivalent
302code with Symbol::gensym, which is reasonably light-weight:
303
304 foreach $filename (@names) {
305 use Symbol;
306 my $fh = gensym();
307 open($fh, "/etc/$filename") || die "open /etc/$filename: $!";
308 $file{$filename} = [ $i++, $fh ];
309 }
68dc0745 310
a6dd486b 311Here's using the semi-object-oriented FileHandle module, which certainly
65acb1b1 312isn't light-weight:
46fc3d4c 313
314 use FileHandle;
315
46fc3d4c 316 foreach $filename (@names) {
5a964f20 317 my $fh = FileHandle->new("/etc/$filename") or die "$filename: $!";
318 $file{$filename} = [ $i++, $fh ];
46fc3d4c 319 }
320
5a964f20 321Please understand that whether the filehandle happens to be a (probably
a6dd486b 322localized) typeglob or an anonymous handle from one of the modules
5a964f20 323in no way affects the bizarre rules for managing indirect handles.
324See the next question.
325
326=head2 How can I use a filehandle indirectly?
327
Using an indirect filehandle means using something other than a symbol
in a place where a filehandle is expected. Here are ways
to get indirect filehandles:
5a964f20 331
332 $fh = SOME_FH; # bareword is strict-subs hostile
333 $fh = "SOME_FH"; # strict-refs hostile; same package only
334 $fh = *SOME_FH; # typeglob
335 $fh = \*SOME_FH; # ref to typeglob (bless-able)
336 $fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob
337
a6dd486b 338Or, you can use the C<new> method from the FileHandle or IO modules to
5a964f20 339create an anonymous filehandle, store that in a scalar variable,
340and use it as though it were a normal filehandle.
341
342 use FileHandle;
343 $fh = FileHandle->new();
344
345 use IO::Handle; # 5.004 or higher
346 $fh = IO::Handle->new();
347
Then use any of those as you would a normal filehandle. Anywhere that
Perl is expecting a filehandle, an indirect filehandle may be used
instead. An indirect filehandle is just a scalar variable that contains
a filehandle. Functions like C<print>, C<open>, C<seek>, or
the C<< <FH> >> diamond operator will accept either a real filehandle
or a scalar variable containing one:
354
    ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
    print $ofh "Type it: ";
    $got = <$ifh>;
    print $efh "What was that: $got";
359
368c9434 360If you're passing a filehandle to a function, you can write
5a964f20 361the function in two ways:
362
363 sub accept_fh {
364 my $fh = shift;
365 print $fh "Sending to indirect filehandle\n";
46fc3d4c 366 }
367
5a964f20 368Or it can localize a typeglob and use the filehandle directly:
46fc3d4c 369
5a964f20 370 sub accept_fh {
371 local *FH = shift;
372 print FH "Sending to localized filehandle\n";
46fc3d4c 373 }
374
5a964f20 375Both styles work with either objects or typeglobs of real filehandles.
376(They might also work with strings under some circumstances, but this
377is risky.)
378
379 accept_fh(*STDOUT);
380 accept_fh($handle);
381
In the examples above, we assigned the filehandle to a scalar variable
before using it. That is because only simple scalar variables, not
expressions or subscripts of hashes or arrays, can be used with
built-ins like C<print>, C<printf>, or the diamond operator. Using
something other than a simple scalar variable as a filehandle is
illegal and won't even compile:
388
389 @fd = (*STDIN, *STDOUT, *STDERR);
390 print $fd[1] "Type it: "; # WRONG
391 $got = <$fd[0]> # WRONG
392 print $fd[2] "What was that: $got"; # WRONG
393
394With C<print> and C<printf>, you get around this by using a block and
395an expression where you would place the filehandle:
396
397 print { $fd[1] } "funny stuff\n";
398 printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559;
399 # Pity the poor deadbeef.
400
401That block is a proper block like any other, so you can put more
402complicated code there. This sends the message out to one of two places:
403
404 $ok = -x "/bin/cat";
405 print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
406 print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
407
This approach of treating C<print> and C<printf> like object method
calls doesn't work for the diamond operator. That's because it's a
real operator, not just a function with a comma-less argument. Assuming
you've been storing typeglobs in your structure as we did above, you
can use the built-in function named C<readline> to read a record just
as C<< <> >> does. Given the initialization shown above for @fd, this
would work, but only because readline() requires a typeglob. It doesn't
work with objects or strings, which might be a bug we haven't fixed yet.
416
417 $got = readline($fd[0]);
418
419Let it be noted that the flakiness of indirect filehandles is not
420related to whether they're strings, typeglobs, objects, or anything else.
421It's the syntax of the fundamental operators. Playing the object
422game doesn't help you at all here.
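
Putting the two workarounds together, a pair of small wrappers might look like the sketch below. The names say_to() and read_from() are invented for the example, and the handles are assumed to be typeglobs stored in an array, as in the @fd examples above.

```perl
use strict;
use warnings;

# Print $msg to the handle stored at $slot of the array.
# The block form lets print take an expression as its filehandle.
sub say_to {
    my ($handles, $slot, $msg) = @_;
    print { $handles->[$slot] } $msg;
}

# Read one record from a handle stored in an array element.
# The diamond operator can't take a subscript, so call readline().
sub read_from {
    my ($handles, $slot) = @_;
    return readline($handles->[$slot]);
}

my @fd = (*STDIN, *STDOUT, *STDERR);
say_to(\@fd, 1, "printed through \$fd[1]\n");
```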
46fc3d4c 423
68dc0745 424=head2 How can I set up a footer format to be used with write()?
425
54310121 426There's no builtin way to do this, but L<perlform> has a couple of
68dc0745 427techniques to make it possible for the intrepid hacker.
428
429=head2 How can I write() into a string?
430
65acb1b1 431See L<perlform/"Accessing Formatting Internals"> for an swrite() function.
68dc0745 432
433=head2 How can I output my numbers with commas added?
434
435This one will do it for you:
436
437 sub commify {
438 local $_ = shift;
65acb1b1 439 1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
68dc0745 440 return $_;
441 }
442
443 $n = 23659019423.2331;
444 print "GOT: ", commify($n), "\n";
445
446 GOT: 23,659,019,423.2331
447
448You can't just:
449
65acb1b1 450 s/^([-+]?\d+)(\d{3})/$1,$2/g;
68dc0745 451
452because you have to put the comma in and then recalculate your
453position.
454
a6dd486b 455Alternatively, this code commifies all numbers in a line regardless of
46fc3d4c 456whether they have decimal portions, are preceded by + or -, or
457whatever:
458
459 # from Andrew Johnson <ajohnson@gpu.srv.ualberta.ca>
460 sub commify {
461 my $input = shift;
462 $input = reverse $input;
463 $input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;
65acb1b1 464 return scalar reverse $input;
46fc3d4c 465 }
466
68dc0745 467=head2 How can I translate tildes (~) in a filename?
468
Use the <> (glob()) operator, documented in L<perlfunc>. Older
versions of Perl require that you have a shell installed that groks
tildes. Recent perl versions have this feature built in. The
File::KGlob module (available from CPAN) gives more portable glob
functionality.
68dc0745 474
475Within Perl, you may use this directly:
476
477 $filename =~ s{
478 ^ ~ # find a leading tilde
479 ( # save this in $1
480 [^/] # a non-slash character
481 * # repeated 0 or more times (0 means me)
482 )
483 }{
484 $1
485 ? (getpwnam($1))[7]
486 : ( $ENV{HOME} || $ENV{LOGDIR} )
487 }ex;
488
5a964f20 489=head2 How come when I open a file read-write it wipes it out?
68dc0745 490
491Because you're using something like this, which truncates the file and
492I<then> gives you read-write access:
493
5a964f20 494 open(FH, "+> /path/name"); # WRONG (almost always)
68dc0745 495
496Whoops. You should instead use this, which will fail if the file
d92eb7b0 497doesn't exist.
498
499 open(FH, "+< /path/name"); # open for update
500
c47ff5f1 501Using ">" always clobbers or creates. Using "<" never does
d92eb7b0 502either. The "+" doesn't change this.
68dc0745 503
5a964f20 504Here are examples of many kinds of file opens. Those using sysopen()
505all assume
68dc0745 506
5a964f20 507 use Fcntl;
68dc0745 508
5a964f20 509To open file for reading:
68dc0745 510
5a964f20 511 open(FH, "< $path") || die $!;
512 sysopen(FH, $path, O_RDONLY) || die $!;
513
514To open file for writing, create new file if needed or else truncate old file:
515
516 open(FH, "> $path") || die $!;
517 sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT) || die $!;
518 sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT, 0666) || die $!;
519
520To open file for writing, create new file, file must not exist:
521
522 sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT) || die $!;
523 sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT, 0666) || die $!;
524
525To open file for appending, create if necessary:
526
527 open(FH, ">> $path") || die $!;
528 sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT) || die $!;
529 sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT, 0666) || die $!;
530
531To open file for appending, file must exist:
532
533 sysopen(FH, $path, O_WRONLY|O_APPEND) || die $!;
534
535To open file for update, file must exist:
536
537 open(FH, "+< $path") || die $!;
538 sysopen(FH, $path, O_RDWR) || die $!;
539
540To open file for update, create file if necessary:
541
542 sysopen(FH, $path, O_RDWR|O_CREAT) || die $!;
543 sysopen(FH, $path, O_RDWR|O_CREAT, 0666) || die $!;
544
545To open file for update, file must not exist:
546
547 sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT) || die $!;
548 sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT, 0666) || die $!;
549
550To open a file without blocking, creating if necessary:
551
    sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT)
        or die "can't open /tmp/somefile: $!";
554
555Be warned that neither creation nor deletion of files is guaranteed to
556be an atomic operation over NFS. That is, two processes might both
a6dd486b 557successfully create or unlink the same file! Therefore O_EXCL
558isn't as exclusive as you might wish.
68dc0745 559
87275199 560See also the new L<perlopentut> if you have it (new for 5.6).
65acb1b1 561
c47ff5f1 562=head2 Why do I sometimes get an "Argument list too long" when I use <*>?
68dc0745 563
c47ff5f1 564The C<< <> >> operator performs a globbing operation (see above).
3a4b19e4 565In Perl versions earlier than v5.6.0, the internal glob() operator forks
566csh(1) to do the actual glob expansion, but
68dc0745 567csh can't handle more than 127 items and so gives the error message
568C<Argument list too long>. People who installed tcsh as csh won't
569have this problem, but their users may be surprised by it.
570
To get around this, either upgrade to Perl v5.6.0 or later, do the glob
yourself with readdir() and patterns, or use a module like File::KGlob,
one that doesn't use the shell to do globbing.
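
Doing the glob yourself with readdir() can be as simple as the following sketch. The helper name match_in_dir() is made up, and note that it takes a regular expression rather than a shell-style wildcard.

```perl
use strict;
use warnings;

# Return the sorted names in $dir that match $pattern, with no shell
# involved. Dot files are skipped, as a shell glob of "*" would do.
# match_in_dir() is a hypothetical helper name.
sub match_in_dir {
    my ($dir, $pattern) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    my @names = sort grep { !/^\./ && /$pattern/ } readdir($dh);
    closedir($dh);
    return @names;
}

my @c_files = match_in_dir('.', qr/\.c$/);    # roughly what <*.c> returns
```

Because the list is built inside perl, its length is limited only by available memory.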
68dc0745 574
575=head2 Is there a leak/bug in glob()?
576
577Due to the current implementation on some operating systems, when you
578use the glob() function or its angle-bracket alias in a scalar
a6dd486b 579context, you may cause a memory leak and/or unpredictable behavior. It's
68dc0745 580best therefore to use glob() only in list context.
581
c47ff5f1 582=head2 How can I open a file with a leading ">" or trailing blanks?
68dc0745 583
584Normally perl ignores trailing blanks in filenames, and interprets
585certain leading characters (or a trailing "|") to mean something
a6dd486b 586special. To avoid this, you might want to use a routine like the one below.
587It turns incomplete pathnames into explicit relative ones, and tacks a
68dc0745 588trailing null byte on the name to make perl leave it alone:
589
    sub safe_filename {
        local $_ = shift;
        s#^([^./])#./$1#;
        $_ .= "\0";
        return $_;
    }

    $badpath = "<<<something really wicked ";
    $fn = safe_filename($badpath);
    open(FH, "> $fn") or die "couldn't open $badpath: $!";
600
601This assumes that you are using POSIX (portable operating systems
602interface) paths. If you are on a closed, non-portable, proprietary
603system, you may have to adjust the C<"./"> above.
604
605It would be a lot clearer to use sysopen(), though:
606
607 use Fcntl;
608 $badpath = "<<<something really wicked ";
a6dd486b 609 sysopen (FH, $badpath, O_WRONLY | O_CREAT | O_TRUNC)
65acb1b1 610 or die "can't open $badpath: $!";
68dc0745 611
65acb1b1 612For more information, see also the new L<perlopentut> if you have it
87275199 613(new for 5.6).
68dc0745 614
615=head2 How can I reliably rename a file?
616
a6dd486b 617Well, usually you just use Perl's rename() function. That may not
618work everywhere, though, particularly when renaming files across file systems.
d92eb7b0 619Some sub-Unix systems have broken ports that corrupt the semantics of
a6dd486b 620rename()--for example, WinNT does this right, but Win95 and Win98
d92eb7b0 621are broken. (The last two parts are not surprising, but the first is. :-)
622
623If your operating system supports a proper mv(1) program or its moral
624equivalent, this works:
68dc0745 625
626 rename($old, $new) or system("mv", $old, $new);
627
It may be more compelling to use the File::Copy module instead. You
just copy the file to the new name (checking return values),
then delete the old one. This isn't really the same semantically as a
real rename(), though, which preserves metainformation like
permissions, timestamps, inode info, etc.

Newer versions of File::Copy export a move() function.
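
A minimal sketch of the move() approach, assuming your File::Copy is recent enough to export it; rename_file() is just an illustrative wrapper name.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Rename $old to $new. move() falls back to copy-and-delete when a
# plain rename isn't possible, for example across file systems.
# rename_file() is a hypothetical wrapper name.
sub rename_file {
    my ($old, $new) = @_;
    move($old, $new) or die "can't move $old to $new: $!";
    return 1;
}
```

When the fallback is used, the same caveat about lost metainformation applies as for copying by hand.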
5a964f20 635
68dc0745 636=head2 How can I lock a file?
637
54310121 638Perl's builtin flock() function (see L<perlfunc> for details) will call
68dc0745 639flock(2) if that exists, fcntl(2) if it doesn't (on perl version 5.004 and
640later), and lockf(3) if neither of the two previous system calls exists.
641On some systems, it may even use a different form of native locking.
642Here are some gotchas with Perl's flock():
643
644=over 4
645
646=item 1
647
648Produces a fatal error if none of the three system calls (or their
649close equivalent) exists.
650
651=item 2
652
653lockf(3) does not provide shared locking, and requires that the
654filehandle be open for writing (or appending, or read/writing).
655
656=item 3
657
d92eb7b0 658Some versions of flock() can't lock files over a network (e.g. on NFS file
659systems), so you'd need to force the use of fcntl(2) when you build Perl.
a6dd486b 660But even this is dubious at best. See the flock entry of L<perlfunc>
d92eb7b0 661and the F<INSTALL> file in the source distribution for information on
662building Perl to do this.
663
664Two potentially non-obvious but traditional flock semantics are that
a6dd486b 665it waits indefinitely until the lock is granted, and that its locks are
d92eb7b0 666I<merely advisory>. Such discretionary locks are more flexible, but
667offer fewer guarantees. This means that files locked with flock() may
668be modified by programs that do not also use flock(). Cars that stop
669for red lights get on well with each other, but not with cars that don't
670stop for red lights. See the perlport manpage, your port's specific
671documentation, or your system-specific local manpages for details. It's
672best to assume traditional behavior if you're writing portable programs.
a6dd486b 673(If you're not, you should as always feel perfectly free to write
d92eb7b0 674for your own system's idiosyncrasies (sometimes called "features").
675Slavish adherence to portability concerns shouldn't get in the way of
676your getting your job done.)
68dc0745 677
13a2d996 678For more information on file locking, see also
679L<perlopentut/"File Locking"> if you have it (new for 5.6).
65acb1b1 680
68dc0745 681=back
682
65acb1b1 683=head2 Why can't I just open(FH, ">file.lock")?
68dc0745 684
685A common bit of code B<NOT TO USE> is this:
686
687 sleep(3) while -e "file.lock"; # PLEASE DO NOT USE
688 open(LCK, "> file.lock"); # THIS BROKEN CODE
689
690This is a classic race condition: you take two steps to do something
691which must be done in one. That's why computer hardware provides an
692atomic test-and-set instruction. In theory, this "ought" to work:
693
    sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT)
        or die "can't open file.lock: $!";
696
697except that lamentably, file creation (and deletion) is not atomic
698over NFS, so this won't work (at least, not every time) over the net.
65acb1b1 699Various schemes involving link() have been suggested, but
46fc3d4c 700these tend to involve busy-wait, which is also subdesirable.
68dc0745 701
fc36a67e 702=head2 I still don't get locking. I just want to increment the number in the file. How can I do this?
68dc0745 703
46fc3d4c 704Didn't anyone ever tell you web-page hit counters were useless?
5a964f20 705They don't count number of hits, they're a waste of time, and they serve
a6dd486b 706only to stroke the writer's vanity. It's better to pick a random number;
707they're more realistic.
68dc0745 708
5a964f20 709Anyway, this is what you can do if you can't help yourself.
68dc0745 710
e2c57c3e 711 use Fcntl qw(:DEFAULT :flock);
5a964f20 712 sysopen(FH, "numfile", O_RDWR|O_CREAT) or die "can't open numfile: $!";
65acb1b1 713 flock(FH, LOCK_EX) or die "can't flock numfile: $!";
68dc0745 714 $num = <FH> || 0;
715 seek(FH, 0, 0) or die "can't rewind numfile: $!";
716 truncate(FH, 0) or die "can't truncate numfile: $!";
717 (print FH $num+1, "\n") or die "can't write numfile: $!";
68dc0745 718 close FH or die "can't close numfile: $!";
719
46fc3d4c 720Here's a much better web-page hit counter:
68dc0745 721
722 $hits = int( (time() - 850_000_000) / rand(1_000) );
723
724If the count doesn't impress your friends, then the code might. :-)
725
f52f3be2 726=head2 All I want to do is append a small amount of text to the end of a file. Do I still have to use locking?
05caf3a7 727
If you are on a system that correctly implements flock() and you use the
example appending code from "perldoc -f flock" everything will be OK
even if the OS you are on doesn't implement append mode correctly (if
such a system exists). So if you are happy to restrict yourself to OSs
that implement flock() (and that's not really much of a restriction)
then that is what you should do.
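
For reference, locked appending along those lines looks roughly like this. It is a sketch in the spirit of the flock documentation rather than a copy of it, and append_locked() is a name invented here.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);

# Append $text to $path while holding an exclusive lock.
# append_locked() is a hypothetical helper name.
sub append_locked {
    my ($path, $text) = @_;
    open(my $fh, '>>', $path) or die "can't open $path: $!";
    flock($fh, LOCK_EX)       or die "can't lock $path: $!";
    # Someone may have appended while we waited for the lock,
    # so move to the current end of the file before writing.
    seek($fh, 0, SEEK_END)    or die "can't seek in $path: $!";
    (print $fh $text)         or die "can't write to $path: $!";
    # close() flushes the buffer and then releases the lock.
    close($fh)                or die "can't close $path: $!";
    return 1;
}
```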
734
735If you know you are only going to use a system that does correctly
736implement appending (i.e. not Win32) then you can omit the seek() from
737the above code.
738
If you know you are only writing code to run on an OS and filesystem that
does implement append mode correctly (a local filesystem on a modern
Unix for example), and you keep the file in block-buffered mode and you
write less than one buffer-full of output between each manual flushing
of the buffer, then each bufferload is almost guaranteed to be written to
the end of the file in one chunk without getting intermingled with
anyone else's output. You can also use the syswrite() function, which is
simply a wrapper around your system's write(2) system call.
747
748There is still a small theoretical chance that a signal will interrupt
749the system level write() operation before completion. There is also a
750possibility that some STDIO implementations may call multiple system
751level write()s even if the buffer was empty to start. There may be some
752systems where this probability is reduced to zero.
753
68dc0745 754=head2 How do I randomly update a binary file?
755
756If you're just trying to patch a binary, in many cases something as
757simple as this works:
758
759 perl -i -pe 's{window manager}{window mangler}g' /usr/bin/emacs
760
761However, if you have fixed sized records, then you might do something more
762like this:
763
764 $RECSIZE = 220; # size of record, in bytes
765 $recno = 37; # which record to update
766 open(FH, "+<somewhere") || die "can't update somewhere: $!";
767 seek(FH, $recno * $RECSIZE, 0);
768 read(FH, $record, $RECSIZE) == $RECSIZE || die "can't read record $recno: $!";
769 # munge the record
65acb1b1 770 seek(FH, -$RECSIZE, 1);
68dc0745 771 print FH $record;
772 close FH;
773
774Locking and error checking are left as an exercise for the reader.
a6dd486b 775Don't forget them or you'll be quite sorry.
68dc0745 776
68dc0745 777=head2 How do I get a file's timestamp in perl?
778
If you want to retrieve the time at which the file was last read,
written, or had its meta-data (owner, etc) changed, you use the B<-A>,
B<-M>, or B<-C> filetest operations as documented in L<perlfunc>. These
retrieve the age of the file (measured against the start-time of your
program) in days as a floating point number. To retrieve the "raw"
time in seconds since the epoch, you would call the stat function,
then use localtime(), gmtime(), or POSIX::strftime() to convert this
into human-readable form.
787
788Here's an example:
789
790 $write_secs = (stat($file))[9];
c8db1d39 791 printf "file %s updated at %s\n", $file,
792 scalar localtime($write_secs);
68dc0745 793
794If you prefer something more legible, use the File::stat module
795(part of the standard distribution in version 5.004 and later):
796
65acb1b1 797 # error checking left as an exercise for reader.
68dc0745 798 use File::stat;
799 use Time::localtime;
800 $date_string = ctime(stat($file)->mtime);
801 print "file $file updated at $date_string\n";
802
65acb1b1 803The POSIX::strftime() approach has the benefit of being,
804in theory, independent of the current locale. See L<perllocale>
805for details.
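
The POSIX::strftime() approach might look something like the sketch
below. The helper name C<mtime_string()> and the particular format
string are choices made for this example only.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format a file's modification time as
# "YYYY-MM-DD HH:MM:SS" in the local time zone.
sub mtime_string {
    my ($file) = @_;
    my @st = stat($file) or die "can't stat $file: $!";
    return strftime("%Y-%m-%d %H:%M:%S", localtime($st[9]));
}
```

Swap localtime() for gmtime() if you would rather report UTC.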

=head2 How do I set a file's timestamp in perl?

You use the utime() function documented in L<perlfunc/utime>.
By way of example, here's a little program that copies the
read and write times from its first argument to all the rest
of them.

    if (@ARGV < 2) {
        die "usage: cptimes timestamp_file other_files ...\n";
    }
    $timestamp = shift;
    ($atime, $mtime) = (stat($timestamp))[8,9];
    utime $atime, $mtime, @ARGV;

Error checking is, as usual, left as an exercise for the reader.

Note that utime() currently doesn't work correctly with Win95/NT
ports. A bug has been reported. Check it carefully before using
utime() on those platforms.
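
If you would like a head start on that exercise, the sketch below wraps
the same stat() and utime() calls in a subroutine that complains when
something goes wrong. C<copy_times()> is a name invented for this
example.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the access and modification times of $source
# onto each file in @targets, dying on the first failure.
sub copy_times {
    my ($source, @targets) = @_;
    my ($atime, $mtime) = (stat($source))[8, 9];
    defined $mtime or die "can't stat $source: $!";
    for my $file (@targets) {
        # utime() returns the number of files it successfully changed.
        utime($atime, $mtime, $file) == 1
            or die "can't set times on $file: $!";
    }
    return scalar @targets;
}
```

Handling the files one at a time costs a few extra calls but lets the
error message name the file that failed.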

=head2 How do I print to more than one file at once?

If you only have to do this once, you can do this:

    for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }

To connect one filehandle to several output filehandles, it's
easiest to use the tee(1) program if you have it, and let it take care
of the multiplexing:

    open (FH, "| tee file1 file2 file3");

Or even:

    # make STDOUT go to three files, plus original STDOUT
    open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n";
    print "whatever\n"                       or die "Writing: $!\n";
    close(STDOUT)                            or die "Closing: $!\n";

Otherwise you'll have to write your own multiplexing print
function--or your own tee program--or use Tom Christiansen's,
at http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz , which is
written in Perl and offers much greater functionality
than the stock version.
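
A home-grown multiplexing print function need not be elaborate. The
sketch below takes a reference to an array of already-open handles; the
name C<print_to_all()> is invented for this example and is not part of
any module.

```perl
use strict;
use warnings;

# Hypothetical helper: print the same data to every handle in the list.
sub print_to_all {
    my ($handles, @data) = @_;
    for my $fh (@$handles) {
        print {$fh} @data or die "can't write: $!";
    }
    return scalar @$handles;
}
```

You might call it as C<print_to_all([\*FH1, \*FH2, \*STDOUT], "whatever\n")>.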

=head2 How can I read in an entire file all at once?

The customary Perl approach for processing all the lines in a file is to
do so one line at a time:

    open (INPUT, $file) || die "can't open $file: $!";
    while (<INPUT>) {
        chomp;
        # do something with $_
    }
    close(INPUT) || die "can't close $file: $!";

This is tremendously more efficient than reading the entire file into
memory as an array of lines and then processing it one element at a time,
which is often--if not almost always--the wrong approach. Whenever
you see someone do this:

    @lines = <INPUT>;

you should think long and hard about why you need everything loaded
at once. It's just not a scalable solution. You might also find it
more fun to use the standard DB_File module's $DB_RECNO bindings,
which allow you to tie an array to a file so that accessing an element
of the array actually accesses the corresponding line in the file.

On very rare occasion, you may have an algorithm that demands that
the entire file be in memory at once as one scalar. The simplest solution
to that is

    $var = `cat $file`;

Being in scalar context, you get the whole thing. In list context,
you'd get a list of all the lines:

    @lines = `cat $file`;

This tiny but expedient solution is neat, clean, and portable to
all systems on which decent tools have been installed. For those
who prefer not to use the toolbox, you can of course read the file
manually, although this makes for more complicated code.

    {
        local(*INPUT, $/);
        open (INPUT, $file) || die "can't open $file: $!";
        $var = <INPUT>;
    }

That temporarily undefs your record separator, and will automatically
close the file at block exit. If the file is already open, just use this:

    $var = do { local $/; <INPUT> };

=head2 How can I read in a file by paragraphs?

Use the C<$/> variable (see L<perlvar> for details). You can either
set it to C<""> to eliminate empty paragraphs (C<"abc\n\n\n\ndef">,
for instance, gets treated as two paragraphs and not three), or
C<"\n\n"> to accept empty paragraphs.

Note that a blank line must have no blanks in it. Thus
C<"fred\n \nstuff\n\n"> is one paragraph, but C<"fred\n\nstuff\n\n"> is two.
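
To make the C<""> setting concrete, here is a small sketch that collects
the paragraphs from an already-open handle. C<read_paragraphs()> is a
name invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the paragraphs read from an open filehandle.
sub read_paragraphs {
    my ($fh) = @_;
    local $/ = "";          # paragraph mode: a run of blank lines is one separator
    my @paragraphs = <$fh>;
    chomp @paragraphs;      # in paragraph mode, chomp removes all trailing newlines
    return @paragraphs;
}
```

Because C<$/> is localized, the caller's record separator is back to
whatever it was as soon as the subroutine returns.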

=head2 How can I read a single character from a file? From the keyboard?

You can use the builtin C<getc()> function for most filehandles, but
it won't (easily) work on a terminal device. For STDIN, either use
the Term::ReadKey module from CPAN or use the sample code in
L<perlfunc/getc>.

If your system supports the portable operating system programming
interface (POSIX), you can use the following code, which you'll note
turns off echo processing as well.

    #!/usr/bin/perl -w
    use strict;
    $| = 1;
    for (1..4) {
        my $got;
        print "gimme: ";
        $got = getone();
        print "--> $got\n";
    }
    exit;

    BEGIN {
        use POSIX qw(:termios_h);

        my ($term, $oterm, $echo, $noecho, $fd_stdin);

        $fd_stdin = fileno(STDIN);

        $term     = POSIX::Termios->new();
        $term->getattr($fd_stdin);
        $oterm    = $term->getlflag();

        $echo     = ECHO | ECHOK | ICANON;
        $noecho   = $oterm & ~$echo;

        sub cbreak {
            $term->setlflag($noecho);
            $term->setcc(VTIME, 1);
            $term->setattr($fd_stdin, TCSANOW);
        }

        sub cooked {
            $term->setlflag($oterm);
            $term->setcc(VTIME, 0);
            $term->setattr($fd_stdin, TCSANOW);
        }

        sub getone {
            my $key = '';
            cbreak();
            sysread(STDIN, $key, 1);
            cooked();
            return $key;
        }

    }

    END { cooked() }

The Term::ReadKey module from CPAN may be easier to use. Recent versions
also include support for non-portable systems.

    use Term::ReadKey;
    open(TTY, "</dev/tty");
    print "Gimme a char: ";
    ReadMode "raw";
    $key = ReadKey 0, *TTY;
    ReadMode "normal";
    printf "\nYou said %s, char number %03d\n",
        $key, ord $key;

=head2 How can I tell whether there's a character waiting on a filehandle?

The very first thing you should do is look into getting the Term::ReadKey
extension from CPAN. As we mentioned earlier, it now even has limited
support for non-portable (read: not open systems, closed, proprietary,
not POSIX, not Unix, etc) systems.

You should also check out the Frequently Asked Questions list in
comp.unix.* for things like this: the answer is essentially the same.
It's very system dependent. Here's one solution that works on BSD
systems:

    sub key_ready {
        my($rin, $nfd);
        vec($rin, fileno(STDIN), 1) = 1;
        return $nfd = select($rin,undef,undef,0);
    }
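
The same poll can be written with the standard IO::Select module, which
hides the bit-vector bookkeeping. This is only a sketch: the name
C<handle_ready()> is invented here, and it relies on the same underlying
select() call, so it is every bit as system dependent (on some non-Unix
ports select() only works on sockets).

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: true if a read on $fh would not block right now.
sub handle_ready {
    my ($fh) = @_;
    my $sel   = IO::Select->new($fh);
    my @ready = $sel->can_read(0);    # timeout 0 means poll and return at once
    return scalar @ready;
}
```

Calling C<handle_ready(\*STDIN)> does the same job as key_ready() above.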

If you want to find out how many characters are waiting, there's
also the FIONREAD ioctl call to be looked at. The I<h2ph> tool that
comes with Perl tries to convert C include files to Perl code, which
can be C<require>d. FIONREAD ends up defined as a function in the
I<sys/ioctl.ph> file:

    require 'sys/ioctl.ph';

    $size = pack("L", 0);
    ioctl(FH, FIONREAD(), $size) or die "Couldn't call ioctl: $!\n";
    $size = unpack("L", $size);

If I<h2ph> wasn't installed or doesn't work for you, you can
I<grep> the include files by hand:

    % grep FIONREAD /usr/include/*/*
    /usr/include/asm/ioctls.h:#define FIONREAD 0x541B

Or write a small C program using the editor of champions:

    % cat > fionread.c
    #include <stdio.h>
    #include <sys/ioctl.h>
    int main(void) {
        printf("%#08x\n", FIONREAD);
        return 0;
    }
    ^D
    % cc -o fionread fionread.c
    % ./fionread
    0x4004667f

And then hard-code it, leaving porting as an exercise to your successor.

    $FIONREAD = 0x4004667f; # XXX: opsys dependent

    $size = pack("L", 0);
    ioctl(FH, $FIONREAD, $size) or die "Couldn't call ioctl: $!\n";
    $size = unpack("L", $size);

FIONREAD requires a filehandle connected to a stream, meaning that sockets,
pipes, and tty devices work, but I<not> files.

=head2 How do I do a C<tail -f> in perl?

First try

    seek(GWFILE, 0, 1);

The statement C<seek(GWFILE, 0, 1)> doesn't change the current position,
but it does clear the end-of-file condition on the handle, so that the
next <GWFILE> makes Perl try again to read something.

If that doesn't work (it relies on features of your stdio implementation),
then you need something more like this:

    for (;;) {
        for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {
            # search for some stuff and put it into files
        }
        # sleep for a while
        seek(GWFILE, $curpos, 0); # seek to where we had been
    }

If this still doesn't work, look into the POSIX module. POSIX defines
the clearerr() method, which can remove the end of file condition on a
filehandle. The method: read until end of file, clearerr(), read some
more. Lather, rinse, repeat.

There's also a File::Tail module from CPAN.
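
The first suggestion, seeking to the current position to clear
end-of-file, can be wrapped up as a small subroutine. This is only a
sketch, C<read_new_lines()> is a name invented for this example, and
like the bare seek() it depends on how your I/O layer behaves.

```perl
use strict;
use warnings;

# Hypothetical helper: return whatever has been appended since the last
# call, then clear end-of-file so that a later read will try again.
sub read_new_lines {
    my ($fh) = @_;
    my @lines = <$fh>;
    seek($fh, 0, 1) or die "can't seek: $!";   # same position, EOF cleared
    return @lines;
}
```

A follower would call this in a loop, sleeping for a while whenever it
returns an empty list.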

=head2 How do I dup() a filehandle in Perl?

If you check L<perlfunc/open>, you'll see that several of the ways
to call open() should do the trick. For example:

    open(LOG, ">>/tmp/logfile");
    open(STDERR, ">&LOG");

Or even with a literal numeric descriptor:

    $fd = $ENV{MHCONTEXTFD};
    open(MHCONTEXT, "<&=$fd"); # like fdopen(3S)

Note that "<&STDIN" makes a copy, but "<&=STDIN" makes
an alias. That means if you close an aliased handle, all
aliases become inaccessible. This is not true with
a copied one.

Error checking, as always, has been left as an exercise for the reader.
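
With the error checking filled in, a copy-style dup might look like the
sketch below. It assumes a Perl that accepts the three-argument form of
open() with a handle as the last argument; C<dup_for_writing()> is a
name invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return a new write handle that is a dup (a copy,
# not an alias) of the already-open handle $fh.
sub dup_for_writing {
    my ($fh) = @_;
    open(my $copy, '>&', $fh) or die "can't dup handle: $!";
    return $copy;
}
```

Because this is a copy rather than an alias, closing the original
handle leaves the duplicate usable.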

=head2 How do I close a file descriptor by number?

This should rarely be necessary, as the Perl close() function is to be
used for things that Perl opened itself, even if it was a dup of a
numeric descriptor as with MHCONTEXT above. But if you really have
to, you may be able to do this:

    require 'sys/syscall.ph';
    $rc = syscall(&SYS_close, $fd + 0);  # must force numeric
    die "can't sysclose $fd: $!" if $rc == -1;  # syscall returns -1 on failure

Or, just use the fdopen(3S) feature of open():

    {
        local *F;
        open F, "<&=$fd" or die "Cannot reopen fd=$fd: $!";
        close F;
    }
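
Another route, not mentioned above, is the close() function in the
standard POSIX module, which takes a descriptor number directly. The
sketch below assumes your platform's POSIX module provides it;
C<close_fd()> is a name invented for this example.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: close a raw descriptor, dying if the close fails.
sub close_fd {
    my ($fd) = @_;
    # POSIX::close() returns undef on failure.
    defined POSIX::close($fd) or die "can't close fd $fd: $!";
    return 1;
}
```

As with the syscall() approach, don't use this on a descriptor that a
Perl filehandle still owns; use Perl's own close() for those.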

=head2 Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work?

Whoops! You just put a tab and a formfeed into that filename!
Remember that within double quoted strings ("like\this"), the
backslash is an escape character. The full list of these is in
L<perlop/Quote and Quote-like Operators>. Unsurprisingly, you don't
have a file called "c:(tab)emp(formfeed)oo" or
"c:(tab)emp(formfeed)oo.exe" on your legacy DOS filesystem.

Either single-quote your strings, or (preferably) use forward slashes.
Since all DOS and Windows versions since something like MS-DOS 2.0 or so
have treated C</> and C<\> the same in a path, you might as well use the
one that doesn't clash with Perl--or the POSIX shell, ANSI C and C++,
awk, Tcl, Java, or Python, just to mention a few. POSIX paths
are more portable, too.
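
If you want to see the damage for yourself, compare the lengths of the
three spellings. The variable names here are made up for the
illustration.

```perl
use strict;
use warnings;

my $broken = "C:\temp\foo";     # \t becomes a tab and \f a formfeed
my $quoted = 'C:\temp\foo';     # single quotes leave the backslashes alone
my $slashy = "C:/temp/foo";     # forward slashes need no escaping

printf "%d %d %d\n", length($broken), length($quoted), length($slashy);
```

This prints C<9 11 11>: the double-quoted version has lost two
characters because each backslash pair collapsed into a single control
character.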

=head2 Why doesn't glob("*.*") get all the files?

Because even on non-Unix ports, Perl's glob function follows standard
Unix globbing semantics. You'll need C<glob("*")> to get all (non-hidden)
files. This makes glob() portable even to legacy systems. Your
port may include proprietary globbing functions as well. Check its
documentation for details.

=head2 Why does Perl let me delete read-only files? Why does C<-i> clobber protected files? Isn't this a bug in Perl?

This is elaborately and painstakingly described in the "Far More Than
You Ever Wanted To Know" in
http://www.perl.com/CPAN/doc/FMTEYEWTK/file-dir-perms .

The executive summary: learn how your filesystem works. The
permissions on a file say what can happen to the data in that file.
The permissions on a directory say what can happen to the list of
files in that directory. If you delete a file, you're removing its
name from the directory (so the operation depends on the permissions
of the directory, not of the file). If you try to write to the file,
the permissions of the file govern whether you're allowed to.

=head2 How do I select a random line from a file?

Here's an algorithm from the Camel Book:

    srand;
    rand($.) < 1 && ($line = $_) while <>;

This has a significant advantage in space over reading the whole
file in. A simple proof by induction is available upon
request if you doubt the algorithm's correctness.
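
Spelled out at greater length, the one-liner does the following. The
subroutine name C<random_line()> is invented for this sketch. The
induction step is short: line number n is kept with probability 1/n,
and each earlier line, which by assumption was held with probability
1/(n-1), survives with probability (1/(n-1)) * (1 - 1/n) = 1/n.

```perl
use strict;
use warnings;

# Hypothetical helper: return one line chosen uniformly at random from $fh,
# or undef if the handle yields no lines.
sub random_line {
    my ($fh) = @_;
    my ($line, $count) = (undef, 0);
    while (my $candidate = <$fh>) {
        $count++;
        # Keep line number $count with probability 1/$count.
        $line = $candidate if rand($count) < 1;
    }
    return $line;
}
```

Only one line is ever held in memory, which is where the space
advantage comes from.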

=head2 Why do I get weird spaces when I print an array of lines?

Saying

    print "@lines\n";

joins together the elements of C<@lines> with a space between them.
If C<@lines> were C<("little", "fluffy", "clouds")> then the above
statement would print

    little fluffy clouds

but if each element of C<@lines> was a line of text, ending with a newline
character C<("little\n", "fluffy\n", "clouds\n")> then it would print:

    little
     fluffy
     clouds

If your array contains lines, just print them:

    print @lines;
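
The space comes from the C<$"> variable, which holds the separator Perl
uses when an array is interpolated into a double-quoted string. The
sketch below, with variable names made up for the illustration, shows
the three cases side by side.

```perl
use strict;
use warnings;

my @lines = ("little\n", "fluffy\n", "clouds\n");

my $interpolated = "@lines";           # elements joined with $" (a space by default)
my $plain        = join("", @lines);   # the same text that print @lines sends out

# Emptying $" for one expression makes interpolation behave like a plain join.
my $no_separator = do { local $" = ""; "@lines" };
```

Here C<$interpolated> has a space at the start of its second and third
lines, while C<$plain> and C<$no_separator> are identical.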

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
All rights reserved.

When included as an integrated part of the Standard Distribution
of Perl or of its documentation (printed or otherwise), this work is
covered under Perl's Artistic License. For separate distributions of
all or part of this FAQ outside of that, see L<perlfaq>.

Irrespective of its distribution, all code examples here are in the public
domain. You are permitted and encouraged to use this code and any
derivatives thereof in your own programs for fun or for profit as you
see fit. A simple comment in the code giving credit to the FAQ would
be courteous but is not required.