=head1 NAME

perlfaq5 - Files and Formats ($Revision: 1.38 $, $Date: 1999/05/23 16:08:30 $)

=head1 DESCRIPTION

This section deals with I/O and the "f" issues: filehandles, flushing,
formats, and footers.

=head2 How do I flush/unbuffer an output filehandle? Why must I do this?

The C standard I/O library (stdio) normally buffers characters sent to
devices. This is done for efficiency reasons so that there isn't a
system call for each byte. Any time you use print() or write() in
Perl, you go through this buffering. syswrite() circumvents stdio and
buffering.

In most stdio implementations, the type of output buffering and the size of
the buffer vary according to the type of device. Disk files are block
buffered, often with a buffer size of more than 2k. Pipes and sockets
are often buffered with a buffer size between 1/2 and 2k. Serial devices
(e.g. modems, terminals) are normally line-buffered, and stdio sends
the entire line when it gets the newline.

Perl does not support truly unbuffered output (except insofar as you can
C<syswrite(OUT, $char, 1)>). What it does instead support is "command
buffering", in which a physical write is performed after every output
command. This isn't as hard on your system as unbuffering, but does
get the output where you want it when you want it.

If you expect characters to get to your device when you print them there,
you'll want to autoflush its handle.
Use select() and the C<$|> variable to control autoflushing
(see L<perlvar/$|> and L<perlfunc/select>):

    $old_fh = select(OUTPUT_HANDLE);
    $| = 1;
    select($old_fh);

Or using the traditional idiom:

    select((select(OUTPUT_HANDLE), $| = 1)[0]);

Or if you don't mind slowly loading several thousand lines of module code
just because you're afraid of the C<$|> variable:

    use FileHandle;
    open(DEV, "+</dev/tty");      # ceci n'est pas une pipe
    DEV->autoflush(1);

or the newer IO::* modules:

    use IO::Handle;
    open(DEV, ">/dev/printer");   # but is this?
    DEV->autoflush(1);

or even this:

    use IO::Socket;               # this one is kinda a pipe?
    $sock = IO::Socket::INET->new(PeerAddr => 'www.perl.com',
                                  PeerPort => 'http(80)',
                                  Proto    => 'tcp');
    die "$!" unless $sock;

    $sock->autoflush();
    print $sock "GET / HTTP/1.0" . "\015\012" x 2;
    $document = join('', <$sock>);
    print "DOC IS: $document\n";

Note the bizarrely hardcoded carriage return and newline in their octal
equivalents. This is the ONLY way (currently) to assure a proper flush
on all platforms, including Macintosh. That's the way things work in
network programming: you really should specify the exact bit pattern
of the network line terminator. In practice, C<"\n\n"> often works,
but this is not portable.

See L<perlfaq9> for other examples of fetching URLs over the web.

=head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file?

Those are operations of a text editor. Perl is not a text editor.
Perl is a programming language. You have to decompose the problem into
low-level calls to read, write, open, close, and seek.

Although humans have an easy time thinking of a text file as being a
sequence of lines that operates much like a stack of playing cards--or
punch cards--computers usually see the text file as a sequence of bytes.
In general, there's no direct way for Perl to seek to a particular line
of a file, insert text into a file, or remove text from a file.

(There are exceptions in special circumstances. You can add or remove
data at the very end of the file. A sequence of bytes can be replaced
with another sequence of the same length. The C<$DB_RECNO> array
bindings as documented in L<DB_File> also provide a direct way of
modifying a file. Files where all lines are the same length are also
easy to alter.)

The general solution is to create a temporary copy of the text file with
the changes you want, then copy that over the original. This assumes
no locking.

    $old = $file;
    $new = "$file.tmp.$$";
    $bak = "$file.orig";

    open(OLD, "< $old") or die "can't open $old: $!";
    open(NEW, "> $new") or die "can't open $new: $!";

    # Correct typos, preserving case
    while (<OLD>) {
        s/\b(p)earl\b/${1}erl/i;
        (print NEW $_) or die "can't write to $new: $!";
    }

    close(OLD) or die "can't close $old: $!";
    close(NEW) or die "can't close $new: $!";

    rename($old, $bak) or die "can't rename $old to $bak: $!";
    rename($new, $old) or die "can't rename $new to $old: $!";

Perl can do this sort of thing for you automatically with the C<-i>
command-line switch or the closely-related C<$^I> variable (see
L<perlrun> for more details). Note that
C<-i> may require a suffix on some non-Unix systems; see the
platform-specific documentation that came with your port.

    # Renumber a series of tests from the command line
    perl -pi -e 's/(^\s+test\s+)\d+/ $1 . ++$count /e' t/op/taint.t

    # from a script
    local($^I, @ARGV) = ('.orig', glob("*.c"));
    while (<>) {
        if ($. == 1) {
            print "This line should appear at the top of each file\n";
        }
        s/\b(p)earl\b/${1}erl/i;        # Correct typos, preserving case
        print;
        close ARGV if eof;              # Reset $.
    }

If you need to seek to an arbitrary line of a file that changes
infrequently, you could build up an index of byte positions of where
the line ends are in the file. If the file is large, an index of
every tenth or hundredth line end would allow you to seek and read
fairly efficiently. If the file is sorted, try the look.pl library
(part of the standard perl distribution).
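The indexing idea can be sketched roughly as follows. This is only an illustration under some assumptions: build_index() and line_at() are names invented for this example, the index records where every line starts (which is the same information as where the previous line ends), and it relies on three-argument open and lexical filehandles, so it needs perl 5.6 or later.

```perl
use strict;
use warnings;

# Record the byte offset at which each line starts.
# build_index() and line_at() are hypothetical helper names.
sub build_index {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "can't open $path: $!";
    my @offset = (0);                 # line 1 starts at byte 0
    while (<$fh>) {
        push @offset, tell($fh);      # where the next line would start
    }
    pop @offset;                      # the last entry is just end-of-file
    close($fh) or die "can't close $path: $!";
    return \@offset;
}

# Fetch line $n (counting from 1) with a single seek.
sub line_at {
    my ($path, $index, $n) = @_;
    return undef if $n < 1 || $n > @$index;
    open(my $fh, "<", $path) or die "can't open $path: $!";
    seek($fh, $index->[$n - 1], 0) or die "can't seek in $path: $!";
    my $line = <$fh>;
    close($fh);
    return $line;
}
```

For a large file you would probably keep only every tenth or hundredth offset and read forward from the nearest one; the index must be rebuilt whenever the file changes.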

In the unique case of deleting lines at the end of a file, you
can use tell() and truncate(). The following code snippet deletes
the last line of a file without making a copy or reading the
whole file into memory:

    open (FH, "+< $file");
    while ( <FH> ) { $addr = tell(FH) unless eof(FH) }
    truncate(FH, $addr);

Error checking is left as an exercise for the reader.
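For readers who would rather not do the exercise, one possible version with the checks filled in looks like this. It is a sketch, not the canonical answer: drop_last_line() is a made-up name, and the lexical filehandle and three-argument open assume perl 5.6 or later.

```perl
use strict;
use warnings;

# drop_last_line() is a hypothetical name for this sketch.
sub drop_last_line {
    my ($file) = @_;
    open(my $fh, "+<", $file) or die "can't update $file: $!";
    my $addr = 0;                       # start of the last line seen so far
    while (<$fh>) {
        $addr = tell($fh) unless eof($fh);
    }
    truncate($fh, $addr) or die "can't truncate $file: $!";
    close($fh)           or die "can't close $file: $!";
}
```

Starting $addr at 0 means a one-line file ends up empty rather than passing an undefined length to truncate().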

=head2 How do I count the number of lines in a file?

One fairly efficient way is to count newlines in the file. The
following program uses a feature of tr///, as documented in L<perlop>.
If your text file doesn't end with a newline, then it's not really a
proper text file, so this may report one fewer line than you expect.

    $lines = 0;
    open(FILE, $filename) or die "Can't open `$filename': $!";
    while (sysread FILE, $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }
    close FILE;

This assumes no funny games with newline translations.

=head2 How do I make a temporary file name?

Use the C<new_tmpfile> class method from the IO::File module to get a
filehandle opened for reading and writing. Use it if you don't
need to know the file's name:

    use IO::File;
    $fh = IO::File->new_tmpfile()
        or die "Unable to make new temporary file: $!";

If you do need to know the file's name, you can use the C<tmpnam>
function from the POSIX module to get a filename that you then open
yourself:

    use Fcntl;
    use POSIX qw(tmpnam);

    # try new temporary filenames until we get one that didn't already
    # exist; the check should be unnecessary, but you can't be too careful
    do { $name = tmpnam() }
        until sysopen(FH, $name, O_RDWR|O_CREAT|O_EXCL);

    # install atexit-style handler so that when we exit or die,
    # we automatically delete this temporary file
    END { unlink($name) or die "Couldn't unlink $name : $!" }

    # now go on to use the file ...

If you're committed to creating a temporary file by hand, use the
process ID and/or the current time-value. If you need to have many
temporary files in one process, use a counter:

    BEGIN {
        use Fcntl;
        my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMP} || $ENV{TEMP};
        my $base_name = sprintf("%s/%d-%d-0000", $temp_dir, $$, time());
        sub temp_file {
            local *FH;
            my $count = 0;
            until (defined(fileno(FH)) || $count++ > 100) {
                $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
                sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT);
            }
            if (defined(fileno(FH))) {
                return (*FH, $base_name);
            } else {
                return ();
            }
        }
    }

=head2 How can I manipulate fixed-record-length files?

The most efficient way is using pack() and unpack(). This is faster than
using substr() when taking many, many strings. It is slower for just a few.

Here is a sample chunk of code to break up and put back together again
some fixed-format input lines, in this case from the output of a normal,
Berkeley-style ps:

    # sample input line:
    #   15158 p5  T      0:00 perl /home/tchrist/scripts/now-what
    $PS_T = 'A6 A4 A7 A5 A*';
    open(PS, "ps|");
    print scalar <PS>;
    while (<PS>) {
        ($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_);
        for $var (qw!pid tt stat time command!) {
            print "$var: <$$var>\n";
        }
        print 'line=', pack($PS_T, $pid, $tt, $stat, $time, $command),
            "\n";
    }

We've used C<$$var> in a way that is forbidden by C<use strict 'refs'>.
That is, we've promoted a string to a scalar variable reference using
symbolic references. This is ok in small programs, but doesn't scale
well. It also only works on global variables, not lexicals.

=head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles?

The fastest, simplest, and most direct way is to localize the typeglob
of the filehandle in question:

    local *TmpHandle;

Typeglobs are fast (especially compared with the alternatives) and
reasonably easy to use, but they also have one subtle drawback. If you
had, for example, a function named TmpHandle(), or a variable named
%TmpHandle, you just hid it from yourself.

    sub findme {
        local *HostFile;
        open(HostFile, "</etc/hosts") or die "no /etc/hosts: $!";
        local $_;               # <- VERY IMPORTANT
        while (<HostFile>) {
            print if /\b127\.(0\.0\.)?1\b/;
        }
        # *HostFile automatically closes/disappears here
    }

Here's how to use typeglobs in a loop to open and store a bunch of
filehandles. We'll use as values of the hash an ordered
pair to make it easy to sort the hash in insertion order.

    @names = qw(motd termcap passwd hosts);
    my $i = 0;
    foreach $filename (@names) {
        local *FH;
        open(FH, "/etc/$filename") || die "$filename: $!";
        $file{$filename} = [ $i++, *FH ];
    }

    # Using the filehandles stored in the hash
    foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) {
        my $fh = $file{$name}[1];
        my $line = <$fh>;
        print "$name $. $line";
    }

For passing filehandles to functions, the easiest way is to
preface them with a star, as in func(*STDIN).
See L<perlfaq7/"Passing Filehandles"> for details.

If you want to create many anonymous handles, you should check out the
Symbol, FileHandle, or IO::Handle (etc.) modules. Here's the equivalent
code with Symbol::gensym, which is reasonably light-weight:

    foreach $filename (@names) {
        use Symbol;
        my $fh = gensym();
        open($fh, "/etc/$filename") || die "open /etc/$filename: $!";
        $file{$filename} = [ $i++, $fh ];
    }

Here's using the semi-object-oriented FileHandle module, which certainly
isn't light-weight:

    use FileHandle;

    foreach $filename (@names) {
        my $fh = FileHandle->new("/etc/$filename") or die "$filename: $!";
        $file{$filename} = [ $i++, $fh ];
    }

Please understand that whether the filehandle happens to be a (probably
localized) typeglob or an anonymous handle from one of the modules
in no way affects the bizarre rules for managing indirect handles.
See the next question.

=head2 How can I use a filehandle indirectly?

An indirect filehandle is using something other than a symbol
in a place that a filehandle is expected. Here are ways
to get indirect filehandles:

    $fh =   SOME_FH;       # bareword is strict-subs hostile
    $fh =  "SOME_FH";      # strict-refs hostile; same package only
    $fh =  *SOME_FH;       # typeglob
    $fh = \*SOME_FH;       # ref to typeglob (bless-able)
    $fh =  *SOME_FH{IO};   # blessed IO::Handle from *SOME_FH typeglob

Or, you can use the C<new> method from the FileHandle or IO modules to
create an anonymous filehandle, store that in a scalar variable,
and use it as though it were a normal filehandle.

    use FileHandle;
    $fh = FileHandle->new();

    use IO::Handle;                     # 5.004 or higher
    $fh = IO::Handle->new();

Then use any of those as you would a normal filehandle. Anywhere that
Perl is expecting a filehandle, an indirect filehandle may be used
instead. An indirect filehandle is just a scalar variable that contains
a filehandle. Functions like C<print>, C<open>, C<seek>, or
the C<< <FH> >> diamond operator will accept either a real filehandle
or a scalar variable containing one:

    ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
    print $ofh "Type it: ";
    $got = <$ifh>;
    print $efh "What was that: $got";

If you're passing a filehandle to a function, you can write
the function in two ways:

    sub accept_fh {
        my $fh = shift;
        print $fh "Sending to indirect filehandle\n";
    }

Or it can localize a typeglob and use the filehandle directly:

    sub accept_fh {
        local *FH = shift;
        print FH "Sending to localized filehandle\n";
    }

Both styles work with either objects or typeglobs of real filehandles.
(They might also work with strings under some circumstances, but this
is risky.)

    accept_fh(*STDOUT);
    accept_fh($handle);

In the examples above, we assigned the filehandle to a scalar variable
before using it. That is because only simple scalar variables, not
expressions or subscripts of hashes or arrays, can be used with
built-ins like C<print>, C<printf>, or the diamond operator. Using
something other than a simple scalar variable as a filehandle is
illegal and won't even compile:

    @fd = (*STDIN, *STDOUT, *STDERR);
    print $fd[1] "Type it: ";                           # WRONG
    $got = <$fd[0]>;                                    # WRONG
    print $fd[2] "What was that: $got";                 # WRONG

With C<print> and C<printf>, you get around this by using a block and
an expression where you would place the filehandle:

    print  { $fd[1] } "funny stuff\n";
    printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559;
    # Pity the poor deadbeef.

That block is a proper block like any other, so you can put more
complicated code there. This sends the message out to one of two places:

    $ok = -x "/bin/cat";
    print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
    print { $fd[ 1+ ($ok || 0) ]  } "cat stat $ok\n";

This approach of treating C<print> and C<printf> like object method
calls doesn't work for the diamond operator. That's because it's a
real operator, not just a function with a comma-less argument. Assuming
you've been storing typeglobs in your structure as we did above, you
can use the built-in function named C<readline> to read a record just
as C<< <> >> does. Given the initialization shown above for @fd, this
would work, but only because readline() requires a typeglob. It doesn't
work with objects or strings, which might be a bug we haven't fixed yet.

    $got = readline($fd[0]);

Let it be noted that the flakiness of indirect filehandles is not
related to whether they're strings, typeglobs, objects, or anything else.
It's the syntax of the fundamental operators. Playing the object
game doesn't help you at all here.

=head2 How can I set up a footer format to be used with write()?

There's no builtin way to do this, but L<perlform> has a couple of
techniques to make it possible for the intrepid hacker.

=head2 How can I write() into a string?

See L<perlform/"Accessing Formatting Internals"> for an swrite() function.

=head2 How can I output my numbers with commas added?

This one will do it for you:

    sub commify {
        local $_ = shift;
        1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
        return $_;
    }

    $n = 23659019423.2331;
    print "GOT: ", commify($n), "\n";

    GOT: 23,659,019,423.2331

You can't just:

    s/^([-+]?\d+)(\d{3})/$1,$2/g;

because you have to put the comma in and then recalculate your
position.

Alternatively, this code commifies all numbers in a line regardless of
whether they have decimal portions, are preceded by + or -, or
whatever:

    # from Andrew Johnson <ajohnson@gpu.srv.ualberta.ca>
    sub commify {
        my $input = shift;
        $input = reverse $input;
        $input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;
        return scalar reverse $input;
    }

=head2 How can I translate tildes (~) in a filename?

Use the <> (glob()) operator, documented in L<perlfunc>. Older
versions of Perl require that you have a shell installed that groks
tildes. Recent perl versions have this feature built in. The
File::KGlob module (available from CPAN) gives more portable glob
functionality.

Within Perl, you may use this directly:

    $filename =~ s{
      ^ ~             # find a leading tilde
      (               # save this in $1
          [^/]        # a non-slash character
                *     # repeated 0 or more times (0 means me)
      )
    }{
      $1
          ? (getpwnam($1))[7]
          : ( $ENV{HOME} || $ENV{LOGDIR} )
    }ex;

=head2 How come when I open a file read-write it wipes it out?

Because you're using something like this, which truncates the file and
I<then> gives you read-write access:

    open(FH, "+> /path/name");          # WRONG (almost always)

Whoops. You should instead use this, which will fail if the file
doesn't exist.

    open(FH, "+< /path/name");          # open for update

Using ">" always clobbers or creates. Using "<" never does
either. The "+" doesn't change this.

Here are examples of many kinds of file opens. Those using sysopen()
all assume

    use Fcntl;

To open file for reading:

    open(FH, "< $path")                                 || die $!;
    sysopen(FH, $path, O_RDONLY)                        || die $!;

To open file for writing, create new file if needed or else truncate old file:

    open(FH, "> $path")                                 || die $!;
    sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT)        || die $!;
    sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT, 0666)  || die $!;

To open file for writing, create new file, file must not exist:

    sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT)         || die $!;
    sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT, 0666)   || die $!;

To open file for appending, create if necessary:

    open(FH, ">> $path")                                || die $!;
    sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT)       || die $!;
    sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT, 0666) || die $!;

To open file for appending, file must exist:

    sysopen(FH, $path, O_WRONLY|O_APPEND)               || die $!;

To open file for update, file must exist:

    open(FH, "+< $path")                                || die $!;
    sysopen(FH, $path, O_RDWR)                          || die $!;

To open file for update, create file if necessary:

    sysopen(FH, $path, O_RDWR|O_CREAT)                  || die $!;
    sysopen(FH, $path, O_RDWR|O_CREAT, 0666)            || die $!;

To open file for update, file must not exist:

    sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT)           || die $!;
    sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT, 0666)     || die $!;

To open a file without blocking, creating if necessary:

    sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT)
        or die "can't open /tmp/somefile: $!";

Be warned that neither creation nor deletion of files is guaranteed to
be an atomic operation over NFS. That is, two processes might both
successfully create or unlink the same file! Therefore O_EXCL
isn't as exclusive as you might wish.

See also the new L<perlopentut> if you have it (new for 5.6).

=head2 Why do I sometimes get an "Argument list too long" when I use <*>?

The C<< <> >> operator performs a globbing operation (see above).
In Perl versions earlier than v5.6.0, the internal glob() operator forks
csh(1) to do the actual glob expansion, but
csh can't handle more than 127 items and so gives the error message
C<Argument list too long>. People who installed tcsh as csh won't
have this problem, but their users may be surprised by it.

To get around this, either upgrade to Perl v5.6.0 or later, do the glob
yourself with readdir() and patterns, or use a module like File::KGlob,
one that doesn't use the shell to do globbing.
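Doing the glob yourself with readdir() and a pattern might look something like the sketch below, which imitates the shell pattern C<*.c>. The name c_files() is invented for this example, and the lexical directory handle assumes perl 5.6 or later.

```perl
use strict;
use warnings;

# List the *.c files in a directory without involving a shell.
# c_files() is a hypothetical name used for this sketch.
sub c_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    # Like the shell's *.c, skip names that start with a dot.
    my @files = sort grep { /\.c\z/ && !/^\./ } readdir($dh);
    closedir($dh);
    return @files;
}
```

Unlike glob(), readdir() returns bare names, so prepend the directory yourself if you need full paths.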

=head2 Is there a leak/bug in glob()?

Due to the current implementation on some operating systems, when you
use the glob() function or its angle-bracket alias in a scalar
context, you may cause a memory leak and/or unpredictable behavior. It's
best therefore to use glob() only in list context.

=head2 How can I open a file with a leading ">" or trailing blanks?

Normally perl ignores trailing blanks in filenames, and interprets
certain leading characters (or a trailing "|") to mean something
special. To avoid this, you might want to use a routine like the one below.
It turns incomplete pathnames into explicit relative ones, and tacks a
trailing null byte on the name to make perl leave it alone:

    sub safe_filename {
        local $_ = shift;
        s#^([^./])#./$1#;
        $_ .= "\0";
        return $_;
    }

    $badpath = "<<<something really wicked   ";
    $fn = safe_filename($badpath);
    open(FH, "> $fn") or die "couldn't open $badpath: $!";

This assumes that you are using POSIX (portable operating systems
interface) paths. If you are on a closed, non-portable, proprietary
system, you may have to adjust the C<"./"> above.

It would be a lot clearer to use sysopen(), though:

    use Fcntl;
    $badpath = "<<<something really wicked   ";
    sysopen (FH, $badpath, O_WRONLY | O_CREAT | O_TRUNC)
        or die "can't open $badpath: $!";

For more information, see also the new L<perlopentut> if you have it
(new for 5.6).

=head2 How can I reliably rename a file?

Well, usually you just use Perl's rename() function. That may not
work everywhere, though, particularly when renaming files across file systems.
Some sub-Unix systems have broken ports that corrupt the semantics of
rename()--for example, WinNT does this right, but Win95 and Win98
are broken. (The last two parts are not surprising, but the first is. :-)

If your operating system supports a proper mv(1) program or its moral
equivalent, this works:

    rename($old, $new) or system("mv", $old, $new);

It may be more compelling to use the File::Copy module instead. You
just copy the file to the new name (checking return values),
then delete the old one. This isn't really the same semantically as a
real rename(), though, which preserves metainformation like
permissions, timestamps, inode info, etc.

Newer versions of File::Copy export a move() function.
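A minimal sketch of the File::Copy route, assuming your copy of File::Copy is recent enough to export move(). The wrapper name rename_or_move() is invented here and is not part of the module.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# rename_or_move() is a hypothetical wrapper, not part of File::Copy.
# move() renames when it can, and otherwise copies the file and then
# removes the original.
sub rename_or_move {
    my ($old, $new) = @_;
    move($old, $new) or die "can't move $old to $new: $!";
    return 1;
}
```

The caveat above still applies: when move() has to fall back to copying, metainformation such as timestamps and the inode is not preserved the way a real rename() would preserve it.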

=head2 How can I lock a file?

Perl's builtin flock() function (see L<perlfunc> for details) will call
flock(2) if that exists, fcntl(2) if it doesn't (on perl version 5.004 and
later), and lockf(3) if neither of the two previous system calls exists.
On some systems, it may even use a different form of native locking.
Here are some gotchas with Perl's flock():

=over 4

=item 1

Produces a fatal error if none of the three system calls (or their
close equivalent) exists.

=item 2

lockf(3) does not provide shared locking, and requires that the
filehandle be open for writing (or appending, or read/writing).

=item 3

Some versions of flock() can't lock files over a network (e.g. on NFS file
systems), so you'd need to force the use of fcntl(2) when you build Perl.
But even this is dubious at best. See the flock entry of L<perlfunc>
and the F<INSTALL> file in the source distribution for information on
building Perl to do this.

Two potentially non-obvious but traditional flock semantics are that
it waits indefinitely until the lock is granted, and that its locks are
I<merely advisory>. Such discretionary locks are more flexible, but
offer fewer guarantees. This means that files locked with flock() may
be modified by programs that do not also use flock(). Cars that stop
for red lights get on well with each other, but not with cars that don't
stop for red lights. See the perlport manpage, your port's specific
documentation, or your system-specific local manpages for details. It's
best to assume traditional behavior if you're writing portable programs.
(If you're not, you should as always feel perfectly free to write
for your own system's idiosyncrasies (sometimes called "features").
Slavish adherence to portability concerns shouldn't get in the way of
your getting your job done.)

For more information on file locking, see also
L<perlopentut/"File Locking"> if you have it (new for 5.6).

=back
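If waiting indefinitely is not what you want, the LOCK_NB flag from Fcntl asks flock() to return at once instead of blocking. The following is only a sketch: try_lock() is an invented name, and the lock is still merely advisory, so it only helps against other programs that also call flock().

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# try_lock() is a hypothetical helper: it returns the locked handle,
# or undef if the lock could not be taken right away.
sub try_lock {
    my ($path) = @_;
    open(my $fh, ">>", $path) or die "can't open $path: $!";
    unless (flock($fh, LOCK_EX | LOCK_NB)) {
        close($fh);
        return undef;
    }
    return $fh;    # the lock is released when this handle is closed
}
```

The file is opened for appending so that the handle is writable (which the lockf(3) fallback requires) without clobbering existing contents.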

=head2 Why can't I just open(FH, ">file.lock")?

A common bit of code B<NOT TO USE> is this:

    sleep(3) while -e "file.lock";      # PLEASE DO NOT USE
    open(LCK, "> file.lock");           # THIS BROKEN CODE

This is a classic race condition: you take two steps to do something
which must be done in one. That's why computer hardware provides an
atomic test-and-set instruction. In theory, this "ought" to work:

    sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT)
        or die "can't open file.lock: $!";

except that lamentably, file creation (and deletion) is not atomic
over NFS, so this won't work (at least, not every time) over the net.
Various schemes involving link() have been suggested, but
these tend to involve busy-wait, which is also subdesirable.

=head2 I still don't get locking. I just want to increment the number in the file. How can I do this?

Didn't anyone ever tell you web-page hit counters were useless?
They don't count number of hits, they're a waste of time, and they serve
only to stroke the writer's vanity. It's better to pick a random number;
they're more realistic.

Anyway, this is what you can do if you can't help yourself.

    use Fcntl qw(:DEFAULT :flock);
    sysopen(FH, "numfile", O_RDWR|O_CREAT) or die "can't open numfile: $!";
    flock(FH, LOCK_EX)                     or die "can't flock numfile: $!";
    $num = <FH> || 0;
    seek(FH, 0, 0)                         or die "can't rewind numfile: $!";
    truncate(FH, 0)                        or die "can't truncate numfile: $!";
    (print FH $num+1, "\n")                or die "can't write numfile: $!";
    close FH                               or die "can't close numfile: $!";

Here's a much better web-page hit counter:

    $hits = int( (time() - 850_000_000) / rand(1_000) );

If the count doesn't impress your friends, then the code might. :-)

=head2 How do I randomly update a binary file?

If you're just trying to patch a binary, in many cases something as
simple as this works:

    perl -i -pe 's{window manager}{window mangler}g' /usr/bin/emacs

However, if you have fixed sized records, then you might do something more
like this:

    $RECSIZE = 220; # size of record, in bytes
    $recno   = 37;  # which record to update
    open(FH, "+<somewhere") || die "can't update somewhere: $!";
    seek(FH, $recno * $RECSIZE, 0);
    read(FH, $record, $RECSIZE) == $RECSIZE || die "can't read record $recno: $!";
    # munge the record
    seek(FH, -$RECSIZE, 1);
    print FH $record;
    close FH;

Locking and error checking are left as an exercise for the reader.
Don't forget them or you'll be quite sorry.
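One way the exercise might come out is sketched below. The helper name update_record() and its callback interface are assumptions made for this example, it uses flock() with its usual advisory semantics, and it needs perl 5.6 or later for the lexical filehandle.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# update_record() is a hypothetical helper. $munge is a code ref that
# receives the old record and must return a new one of the same length.
sub update_record {
    my ($path, $recsize, $recno, $munge) = @_;
    open(my $fh, "+<", $path)          or die "can't update $path: $!";
    binmode($fh);
    flock($fh, LOCK_EX)                or die "can't lock $path: $!";
    seek($fh, $recno * $recsize, 0)    or die "can't seek to record $recno: $!";
    my $got = read($fh, my $record, $recsize);
    defined($got) && $got == $recsize  or die "can't read record $recno";
    $record = $munge->($record);
    length($record) == $recsize        or die "record $recno changed size";
    seek($fh, -$recsize, 1)            or die "can't seek back: $!";
    print {$fh} $record                or die "can't write record $recno: $!";
    close($fh)                         or die "can't close $path: $!";
}
```

A call such as C<update_record("somewhere", 220, 37, sub { uc $_[0] })> would upper-case record 37 in place; closing the handle flushes the write and releases the lock.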

=head2 How do I get a file's timestamp in perl?

If you want to retrieve the time at which the file was last read,
written, or had its meta-data (owner, etc) changed, you use the B<-A>,
B<-M>, or B<-C> filetest operations as documented in L<perlfunc>. These
retrieve the age of the file (measured against the start-time of your
program) in days as a floating point number. To retrieve the "raw"
time in seconds since the epoch, you would call the stat function,
then use localtime(), gmtime(), or POSIX::strftime() to convert this
into human-readable form.

Here's an example:

    $write_secs = (stat($file))[9];
    printf "file %s updated at %s\n", $file,
        scalar localtime($write_secs);

If you prefer something more legible, use the File::stat module
(part of the standard distribution in version 5.004 and later):

    # error checking left as an exercise for reader.
    use File::stat;
    use Time::localtime;
    $date_string = ctime(stat($file)->mtime);
    print "file $file updated at $date_string\n";

The POSIX::strftime() approach has the benefit of being,
in theory, independent of the current locale. See L<perllocale>
for details.
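The POSIX::strftime() variant is not shown above, so here is a rough sketch of it. The function name mtime_string() and the all-numeric format string are choices made for this example only.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# mtime_string() is a hypothetical name; the format is only an example.
sub mtime_string {
    my ($file) = @_;
    my @st = stat($file) or die "can't stat $file: $!";
    # Element 9 of the stat list is the last-modified time in epoch seconds.
    return strftime("%Y-%m-%d %H:%M:%S", localtime($st[9]));
}
```

Pass gmtime($st[9]) instead of localtime($st[9]) if you want the result in UTC rather than local time.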

=head2 How do I set a file's timestamp in perl?

You use the utime() function documented in L<perlfunc/utime>.
By way of example, here's a little program that copies the
read and write times from its first argument to all the rest
of them.

    if (@ARGV < 2) {
        die "usage: cptimes timestamp_file other_files ...\n";
    }
    $timestamp = shift;
    ($atime, $mtime) = (stat($timestamp))[8,9];
    utime $atime, $mtime, @ARGV;

Error checking is, as usual, left as an exercise for the reader.

Note that utime() currently doesn't work correctly with Win95/NT
ports. A bug has been reported. Check it carefully before using
utime() on those platforms.
68dc0745 798
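For those who would rather not leave the exercise undone, here is one
possible sketch with the error checking filled in. The name C<copy_times>
is hypothetical; the only facts relied on are that stat() returns an empty
list on failure and that utime() returns the number of files it changed.

```perl
use strict;
use warnings;

# Hypothetical helper: copy access and modification times from $source
# onto each of @targets. Dies if $source can't be stat()ed; returns the
# number of files utime() reports as successfully changed.
sub copy_times {
    my ($source, @targets) = @_;
    my @st = stat($source) or die "can't stat $source: $!";
    my ($atime, $mtime) = @st[8, 9];
    my $changed = utime($atime, $mtime, @targets);
    warn "only $changed of ", scalar(@targets), " files updated\n"
        if $changed != @targets;
    return $changed;
}
```
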
799=head2 How do I print to more than one file at once?
800
801If you only have to do this once, you can do this:
802
803 for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
804
To connect one filehandle to several output filehandles, it's
easiest to use the tee(1) program if you have it, and let it take care
of the multiplexing:
808
809 open (FH, "| tee file1 file2 file3");
810
5a964f20 811Or even:
812
813 # make STDOUT go to three files, plus original STDOUT
814 open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n";
815 print "whatever\n" or die "Writing: $!\n";
816 close(STDOUT) or die "Closing: $!\n";
68dc0745 817
5a964f20 818Otherwise you'll have to write your own multiplexing print
a6dd486b 819function--or your own tee program--or use Tom Christiansen's,
820at http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz , which is
5a964f20 821written in Perl and offers much greater functionality
822than the stock version.
68dc0745 823
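If you do write your own multiplexing print function, it need not be
elaborate. The following is only a sketch: C<print_all> is a name we made
up, and it takes a reference to an array of already-open filehandles.

```perl
use strict;
use warnings;

# Hypothetical helper: print the same text to every handle in @$handles.
# Dies on the first handle that refuses the write; returns the number
# of handles written to.
sub print_all {
    my ($handles, @text) = @_;
    for my $fh (@$handles) {
        print {$fh} @text or die "print failed: $!";
    }
    return scalar @$handles;
}
```

You would call it as C<print_all([\*STDOUT, $log], "whatever\n")>, where
C<$log> stands for any handle you have opened yourself.
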
d92eb7b0 824=head2 How can I read in an entire file all at once?
825
826The customary Perl approach for processing all the lines in a file is to
827do so one line at a time:
828
829 open (INPUT, $file) || die "can't open $file: $!";
830 while (<INPUT>) {
831 chomp;
832 # do something with $_
833 }
834 close(INPUT) || die "can't close $file: $!";
835
836This is tremendously more efficient than reading the entire file into
837memory as an array of lines and then processing it one element at a time,
a6dd486b 838which is often--if not almost always--the wrong approach. Whenever
d92eb7b0 839you see someone do this:
840
841 @lines = <INPUT>;
842
you should think long and hard about why you need everything loaded
at once. It's just not a scalable solution. You might also find it
more fun to use the standard DB_File module's $DB_RECNO bindings,
which allow you to tie an array to a file so that accessing an element
of the array actually accesses the corresponding line in the file.
848
849On very rare occasion, you may have an algorithm that demands that
850the entire file be in memory at once as one scalar. The simplest solution
a6dd486b 851to that is
d92eb7b0 852
853 $var = `cat $file`;
854
Because the assignment is in scalar context, you get the whole thing.
In list context, you'd get a list of all the lines:
857
858 @lines = `cat $file`;
859
87275199 860This tiny but expedient solution is neat, clean, and portable to
861all systems on which decent tools have been installed. For those
862who prefer not to use the toolbox, you can of course read the file
863manually, although this makes for more complicated code.
d92eb7b0 864
865 {
866 local(*INPUT, $/);
867 open (INPUT, $file) || die "can't open $file: $!";
868 $var = <INPUT>;
869 }
870
871That temporarily undefs your record separator, and will automatically
872close the file at block exit. If the file is already open, just use this:
873
874 $var = do { local $/; <INPUT> };
875
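The same technique can be wrapped up in a subroutine. This is a sketch
only: C<slurp> is our own name for it, and it uses a lexical filehandle
and three-argument open(), which need a newer perl than the bareword
examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the entire contents of $file as one scalar.
sub slurp {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "can't open $file: $!";
    local $/;                       # undef record separator: read it all
    my $content = <$fh>;
    close($fh) or die "can't close $file: $!";
    return $content;
}
```
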
68dc0745 876=head2 How can I read in a file by paragraphs?
877
65acb1b1 878Use the C<$/> variable (see L<perlvar> for details). You can either
68dc0745 879set it to C<""> to eliminate empty paragraphs (C<"abc\n\n\n\ndef">,
880for instance, gets treated as two paragraphs and not three), or
881C<"\n\n"> to accept empty paragraphs.
882
65acb1b1 883Note that a blank line must have no blanks in it. Thus C<"fred\n
884\nstuff\n\n"> is one paragraph, but C<"fred\n\nstuff\n\n"> is two.
885
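Here is a small sketch of paragraph mode in action. The subroutine name
C<read_paragraphs> is hypothetical; it expects an already-open filehandle
and relies on chomp() removing all trailing newlines when C<$/> is C<"">.

```perl
use strict;
use warnings;

# Hypothetical helper: return the paragraphs readable from $fh,
# with their trailing newlines removed.
sub read_paragraphs {
    my ($fh) = @_;
    local $/ = "";          # paragraph mode: runs of blank lines separate records
    my @paragraphs = <$fh>;
    chomp @paragraphs;      # in paragraph mode this strips every trailing newline
    return @paragraphs;
}
```
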
68dc0745 886=head2 How can I read a single character from a file? From the keyboard?
887
888You can use the builtin C<getc()> function for most filehandles, but
889it won't (easily) work on a terminal device. For STDIN, either use
a6dd486b 890the Term::ReadKey module from CPAN or use the sample code in
68dc0745 891L<perlfunc/getc>.
892
65acb1b1 893If your system supports the portable operating system programming
894interface (POSIX), you can use the following code, which you'll note
895turns off echo processing as well.
68dc0745 896
897 #!/usr/bin/perl -w
898 use strict;
899 $| = 1;
900 for (1..4) {
901 my $got;
902 print "gimme: ";
903 $got = getone();
904 print "--> $got\n";
905 }
906 exit;
907
908 BEGIN {
909 use POSIX qw(:termios_h);
910
911 my ($term, $oterm, $echo, $noecho, $fd_stdin);
912
913 $fd_stdin = fileno(STDIN);
914
915 $term = POSIX::Termios->new();
916 $term->getattr($fd_stdin);
917 $oterm = $term->getlflag();
918
919 $echo = ECHO | ECHOK | ICANON;
920 $noecho = $oterm & ~$echo;
921
922 sub cbreak {
923 $term->setlflag($noecho);
924 $term->setcc(VTIME, 1);
925 $term->setattr($fd_stdin, TCSANOW);
926 }
927
928 sub cooked {
929 $term->setlflag($oterm);
930 $term->setcc(VTIME, 0);
931 $term->setattr($fd_stdin, TCSANOW);
932 }
933
934 sub getone {
935 my $key = '';
936 cbreak();
937 sysread(STDIN, $key, 1);
938 cooked();
939 return $key;
940 }
941
942 }
943
944 END { cooked() }
945
The Term::ReadKey module from CPAN may be easier to use. Recent versions
also include support for non-portable systems.
68dc0745 948
949 use Term::ReadKey;
950 open(TTY, "</dev/tty");
951 print "Gimme a char: ";
952 ReadMode "raw";
953 $key = ReadKey 0, *TTY;
954 ReadMode "normal";
955 printf "\nYou said %s, char number %03d\n",
956 $key, ord $key;
957
65acb1b1 958=head2 How can I tell whether there's a character waiting on a filehandle?
68dc0745 959
5a964f20 960The very first thing you should do is look into getting the Term::ReadKey
65acb1b1 961extension from CPAN. As we mentioned earlier, it now even has limited
962support for non-portable (read: not open systems, closed, proprietary,
963not POSIX, not Unix, etc) systems.
5a964f20 964
965You should also check out the Frequently Asked Questions list in
68dc0745 966comp.unix.* for things like this: the answer is essentially the same.
967It's very system dependent. Here's one solution that works on BSD
968systems:
969
970 sub key_ready {
971 my($rin, $nfd);
972 vec($rin, fileno(STDIN), 1) = 1;
973 return $nfd = select($rin,undef,undef,0);
974 }
975
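If hand-rolled bit vectors aren't to your taste, the IO::Select module
wraps the same select() call. The following is a sketch under that
assumption; C<handle_ready> is a name of our own invention.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: true if $fh has input waiting at the operating
# system level. A timeout of 0 means "poll and return at once".
sub handle_ready {
    my ($fh) = @_;
    my @ready = IO::Select->new($fh)->can_read(0);
    return scalar @ready;
}
```

Like the select() version, this looks only at the underlying descriptor,
so data that Perl has already read into its own buffer won't be noticed.
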
65acb1b1 976If you want to find out how many characters are waiting, there's
977also the FIONREAD ioctl call to be looked at. The I<h2ph> tool that
978comes with Perl tries to convert C include files to Perl code, which
979can be C<require>d. FIONREAD ends up defined as a function in the
980I<sys/ioctl.ph> file:
68dc0745 981
5a964f20 982 require 'sys/ioctl.ph';
68dc0745 983
5a964f20 984 $size = pack("L", 0);
985 ioctl(FH, FIONREAD(), $size) or die "Couldn't call ioctl: $!\n";
986 $size = unpack("L", $size);
68dc0745 987
5a964f20 988If I<h2ph> wasn't installed or doesn't work for you, you can
989I<grep> the include files by hand:
68dc0745 990
5a964f20 991 % grep FIONREAD /usr/include/*/*
992 /usr/include/asm/ioctls.h:#define FIONREAD 0x541B
68dc0745 993
5a964f20 994Or write a small C program using the editor of champions:
68dc0745 995
    % cat > fionread.c
    #include <stdio.h>
    #include <sys/ioctl.h>
    int main(void) {
        printf("%#08x\n", FIONREAD);
        return 0;
    }
    ^D
    % cc -o fionread fionread.c
    % ./fionread
    0x4004667f
1005
1006And then hard-code it, leaving porting as an exercise to your successor.
1007
1008 $FIONREAD = 0x4004667f; # XXX: opsys dependent
1009
1010 $size = pack("L", 0);
1011 ioctl(FH, $FIONREAD, $size) or die "Couldn't call ioctl: $!\n";
1012 $size = unpack("L", $size);
1013
a6dd486b 1014FIONREAD requires a filehandle connected to a stream, meaning that sockets,
5a964f20 1015pipes, and tty devices work, but I<not> files.
68dc0745 1016
1017=head2 How do I do a C<tail -f> in perl?
1018
1019First try
1020
1021 seek(GWFILE, 0, 1);
1022
1023The statement C<seek(GWFILE, 0, 1)> doesn't change the current position,
1024but it does clear the end-of-file condition on the handle, so that the
1025next <GWFILE> makes Perl try again to read something.
1026
1027If that doesn't work (it relies on features of your stdio implementation),
1028then you need something more like this:
1029
1030 for (;;) {
1031 for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {
1032 # search for some stuff and put it into files
1033 }
1034 # sleep for a while
1035 seek(GWFILE, $curpos, 0); # seek to where we had been
1036 }
1037
1038If this still doesn't work, look into the POSIX module. POSIX defines
1039the clearerr() method, which can remove the end of file condition on a
1040filehandle. The method: read until end of file, clearerr(), read some
1041more. Lather, rinse, repeat.
1042
65acb1b1 1043There's also a File::Tail module from CPAN.
1044
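The first approach, the seek() that clears end-of-file, might be packaged
like this. It is a sketch that assumes your I/O layer behaves as described
above; C<read_new_lines> is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return whatever lines have been appended to the
# file behind $fh since the last read.
sub read_new_lines {
    my ($fh) = @_;
    # Seeking zero bytes from the current position moves nothing,
    # but it resets the end-of-file condition on the handle.
    seek($fh, 0, 1) or die "can't seek: $!";
    my @lines = <$fh>;
    return @lines;
}
```

Call it in a loop with a sleep() between rounds. Note that it makes no
attempt to cope with a writer that has so far emitted only part of a line.
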
68dc0745 1045=head2 How do I dup() a filehandle in Perl?
1046
1047If you check L<perlfunc/open>, you'll see that several of the ways
1048to call open() should do the trick. For example:
1049
1050 open(LOG, ">>/tmp/logfile");
1051 open(STDERR, ">&LOG");
1052
1053Or even with a literal numeric descriptor:
1054
1055 $fd = $ENV{MHCONTEXTFD};
1056 open(MHCONTEXT, "<&=$fd"); # like fdopen(3S)
1057
Note that "<&STDIN" makes a copy, but "<&=STDIN" makes
an alias. That means if you close an aliased handle, all
aliases become inaccessible. This is not true with
a copied one.
1062
1063Error checking, as always, has been left as an exercise for the reader.
68dc0745 1064
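With the error checking put back in, a dup might look like the sketch
below. The name C<dup_for_write> is hypothetical, and the three-argument
C<< '>&' >> form with a lexical handle needs a newer perl than the
two-argument examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: return an independent write handle that shares
# the underlying file with $fh (a copy, as with ">&", not an alias).
sub dup_for_write {
    my ($fh) = @_;
    open(my $copy, '>&', $fh) or die "can't dup filehandle: $!";
    return $copy;
}
```

Because the result is a copy, closing it leaves the original handle usable.
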
1065=head2 How do I close a file descriptor by number?
1066
1067This should rarely be necessary, as the Perl close() function is to be
1068used for things that Perl opened itself, even if it was a dup of a
a6dd486b 1069numeric descriptor as with MHCONTEXT above. But if you really have
68dc0745 1070to, you may be able to do this:
1071
    require 'sys/syscall.ph';
    $rc = syscall(&SYS_close, $fd + 0);  # must force numeric
    die "can't sysclose $fd: $!" if $rc == -1;
1075
a6dd486b 1076Or, just use the fdopen(3S) feature of open():
d92eb7b0 1077
1078 {
1079 local *F;
1080 open F, "<&=$fd" or die "Cannot reopen fd=$fd: $!";
1081 close F;
1082 }
1083
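Where the POSIX module is available, its close() function takes a
descriptor number directly, which avoids both I<h2ph> and the reopen
trick. A sketch, with C<close_fd> as a hypothetical wrapper name:

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: close a raw file descriptor by number.
# POSIX::close() returns undef on failure.
sub close_fd {
    my ($fd) = @_;
    defined POSIX::close($fd) or die "can't close fd $fd: $!";
    return 1;
}
```

As with the other methods, don't use this on a descriptor that a Perl
filehandle still thinks it owns.
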
=head2 Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work?
68dc0745 1085
1086Whoops! You just put a tab and a formfeed into that filename!
1087Remember that within double quoted strings ("like\this"), the
1088backslash is an escape character. The full list of these is in
1089L<perlop/Quote and Quote-like Operators>. Unsurprisingly, you don't
1090have a file called "c:(tab)emp(formfeed)oo" or
65acb1b1 1091"c:(tab)emp(formfeed)oo.exe" on your legacy DOS filesystem.
68dc0745 1092
1093Either single-quote your strings, or (preferably) use forward slashes.
46fc3d4c 1094Since all DOS and Windows versions since something like MS-DOS 2.0 or so
68dc0745 1095have treated C</> and C<\> the same in a path, you might as well use the
a6dd486b 1096one that doesn't clash with Perl--or the POSIX shell, ANSI C and C++,
65acb1b1 1097awk, Tcl, Java, or Python, just to mention a few. POSIX paths
1098are more portable, too.
68dc0745 1099
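To see the difference for yourself, compare the three spellings. This is
purely illustrative, and the variable names are ours:

```perl
use strict;
use warnings;

my $mangled = "C:\temp\foo";   # double quotes: \t is a tab, \f a formfeed
my $literal = 'C:\temp\foo';   # single quotes: backslashes survive
my $forward = "C:/temp/foo";   # forward slashes: nothing to escape

printf "%d %d %d\n", length($mangled), length($literal), length($forward);
```

The first string comes out two characters shorter than the others,
because each two-character escape collapsed into a single control
character.
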
1100=head2 Why doesn't glob("*.*") get all the files?
1101
1102Because even on non-Unix ports, Perl's glob function follows standard
46fc3d4c 1103Unix globbing semantics. You'll need C<glob("*")> to get all (non-hidden)
65acb1b1 1104files. This makes glob() portable even to legacy systems. Your
1105port may include proprietary globbing functions as well. Check its
1106documentation for details.
68dc0745 1107
1108=head2 Why does Perl let me delete read-only files? Why does C<-i> clobber protected files? Isn't this a bug in Perl?
1109
1110This is elaborately and painstakingly described in the "Far More Than
7b8d334a 1111You Ever Wanted To Know" in
68dc0745 1112http://www.perl.com/CPAN/doc/FMTEYEWTK/file-dir-perms .
1113
1114The executive summary: learn how your filesystem works. The
1115permissions on a file say what can happen to the data in that file.
1116The permissions on a directory say what can happen to the list of
1117files in that directory. If you delete a file, you're removing its
1118name from the directory (so the operation depends on the permissions
1119of the directory, not of the file). If you try to write to the file,
1120the permissions of the file govern whether you're allowed to.
1121
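A short experiment makes the point. This sketch assumes a Unix-like
filesystem and a temporary directory you are allowed to write to; the
subroutine name is made up for the occasion.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical demonstration: returns true if a read-only file could be
# removed from a directory that we have write permission on.
sub can_delete_readonly_file {
    my $dir  = tempdir(CLEANUP => 1);
    my $file = "$dir/readonly.txt";
    open(my $fh, '>', $file) or die "can't create $file: $!";
    close($fh) or die "can't close $file: $!";
    chmod(0444, $file) or die "can't chmod $file: $!";
    # unlink() edits the directory, so the file's own mode is not consulted
    return unlink($file) ? 1 : 0;
}
```
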
1122=head2 How do I select a random line from a file?
1123
1124Here's an algorithm from the Camel Book:
1125
1126 srand;
1127 rand($.) < 1 && ($line = $_) while <>;
1128
1129This has a significant advantage in space over reading the whole
5a964f20 1130file in. A simple proof by induction is available upon
a6dd486b 1131request if you doubt the algorithm's correctness.
68dc0745 1132
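Spelled out as a subroutine, the same idea might look like the sketch
below. C<random_line> is a hypothetical name, and it keeps its own line
count rather than relying on C<$.>.

```perl
use strict;
use warnings;

# Hypothetical helper: return one line chosen uniformly at random from
# $fh, holding only one candidate line in memory at a time.
# Returns undef if the handle yields no lines.
sub random_line {
    my ($fh) = @_;
    my ($chosen, $count);
    while (my $candidate = <$fh>) {
        # The Nth line replaces the current choice with probability 1/N.
        $chosen = $candidate if rand(++$count) < 1;
    }
    return $chosen;
}
```

The induction runs like this: after N lines each has been kept with
probability 1/N, and line N+1 displaces the current choice with
probability 1/(N+1), leaving every earlier line at
(1/N) * (N/(N+1)) = 1/(N+1).
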
65acb1b1 1133=head2 Why do I get weird spaces when I print an array of lines?
1134
1135Saying
1136
1137 print "@lines\n";
1138
1139joins together the elements of C<@lines> with a space between them.
1140If C<@lines> were C<("little", "fluffy", "clouds")> then the above
a6dd486b 1141statement would print
65acb1b1 1142
1143 little fluffy clouds
1144
but if each element of C<@lines> was a line of text, ending in a newline
character C<("little\n", "fluffy\n", "clouds\n")>, then it would print:

    little
     fluffy
     clouds
1151
1152If your array contains lines, just print them:
1153
1154 print @lines;
1155
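The difference is easy to check by building both strings and comparing
them. The variable names here are ours, chosen for illustration:

```perl
use strict;
use warnings;

my @lines = ("little\n", "fluffy\n", "clouds\n");

my $interpolated = "@lines";        # elements joined with a space
my $joined       = join '', @lines; # what print @lines would output

print "same\n" if $interpolated eq $joined;
```

Nothing is printed, because the interpolated version carries a space
after each newline but the last.
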
68dc0745 1156=head1 AUTHOR AND COPYRIGHT
1157
65acb1b1 1158Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
5a964f20 1159All rights reserved.
1160
When included as an integrated part of the Standard Distribution
of Perl or of its documentation (printed or otherwise), this work is
covered under Perl's Artistic License. For separate distributions of
all or part of this FAQ outside of that, see L<perlfaq>.
1165
87275199 1166Irrespective of its distribution, all code examples here are in the public
c8db1d39 1167domain. You are permitted and encouraged to use this code and any
1168derivatives thereof in your own programs for fun or for profit as you
1169see fit. A simple comment in the code giving credit to the FAQ would
1170be courteous but is not required.