pod/perlfaq5.pod

   1 =head1 NAME
   2
   3 perlfaq5 - Files and Formats ($Revision: 1.22 $, $Date: 1997/04/24 22:44:02 $)
   4
   5 =head1 DESCRIPTION
   6
   7 This section deals with I/O and the "f" issues: filehandles, flushing,
   8 formats, and footers.
   9
  10 =head2 How do I flush/unbuffer an output filehandle?  Why must I do this?
  11
  12 The C standard I/O library (stdio) normally buffers characters sent to
  13 devices.  This is done for efficiency reasons, so that there isn't a
  14 system call for each byte.  Any time you use print() or write() in
  15 Perl, you go through this buffering.  syswrite() circumvents stdio and
  16 buffering.
  17
  18 In most stdio implementations, the type of output buffering and the size of
  19 the buffer vary according to the type of device.  Disk files are block
  20 buffered, often with a buffer size of more than 2k.  Pipes and sockets
  21 are often buffered with a buffer size between 1/2 and 2k.  Serial devices
  22 (e.g. modems, terminals) are normally line-buffered, and stdio sends
  23 the entire line when it gets the newline.
  24
  25 Perl does not support truly unbuffered output (except insofar as you can
  26 C<syswrite(OUT, $char, 1)>).  What it does instead support is "command
  27 buffering", in which a physical write is performed after every output
  28 command.  This isn't as hard on your system as unbuffering, but does
  29 get the output where you want it when you want it.
  30
  31 If you expect characters to get to your device when you print them there,
  32 you'll want to autoflush its handle.
  33 Use select() and the C<$|> variable to control autoflushing
  34 (see L<perlvar/$|> and L<perlfunc/select>):
  35
  36     $old_fh = select(OUTPUT_HANDLE);
  37     $| = 1;
  38     select($old_fh);
  39
  40 Or using the traditional idiom:
  41
  42     select((select(OUTPUT_HANDLE), $| = 1)[0]);
  43
  44 Or if you don't mind slowly loading several thousand lines of module code
  45 just because you're afraid of the C<$|> variable:
  46
  47     use FileHandle;
  48     open(DEV, "+</dev/tty");      # ceci n'est pas une pipe
  49     DEV->autoflush(1);
  50
  51 or the newer IO::* modules:
  52
  53     use IO::Handle;
  54     open(DEV, ">/dev/printer");   # but is this?
  55     DEV->autoflush(1);
  56
  57 or even this:
  58
  59     use IO::Socket;               # this one is kinda a pipe?
  60     $sock = IO::Socket::INET->new(PeerAddr => 'www.perl.com',
  61                                   PeerPort => 'http(80)',
  62                                   Proto    => 'tcp');
  63     die "$!" unless $sock;
  64
  65     $sock->autoflush();
  66     print $sock "GET / HTTP/1.0" . "\015\012" x 2;
  67     $document = join('', <$sock>);
  68     print "DOC IS: $document\n";
  69
  70 Note the bizarrely hardcoded carriage return and newline in their octal
  71 equivalents.  This is the ONLY way (currently) to assure a proper flush
  72 on all platforms, including Macintosh.  That's the way things work in
  73 network programming: you really should specify the exact bit pattern
  74 of the network line terminator.  In practice, C<"\n\n"> often works,
  75 but this is not portable.
  76
  77 See L<perlfaq9> for other examples of fetching URLs over the web.
  78
  79 =head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file?
  80
  81 Although humans have an easy time thinking of a text file as being a
  82 sequence of lines that operates much like a stack of playing cards --
  83 or punch cards -- computers usually see the text file as a sequence of
  84 bytes.  In general, there's no direct way for Perl to seek to a
  85 particular line of a file, insert text into a file, or remove text
  86 from a file.
  87
  88 (There are exceptions in special circumstances.  You can add or remove at
  89 the very end of the file.  Another is replacing a sequence of bytes with
  90 another sequence of the same length.  Another is using the C<$DB_RECNO>
  91 array bindings as documented in L<DB_File>.  Yet another is manipulating
  92 files with all lines the same length.)
  93
  94 The general solution is to create a temporary copy of the text file with
  95 the changes you want, then copy that over the original.  This assumes
  96 no locking.
  97
  98     $old = $file;
  99     $new = "$file.tmp.$$";
 100     $bak = "$file.bak";
 101
 102     open(OLD, "< $old")         or die "can't open $old: $!";
 103     open(NEW, "> $new")         or die "can't open $new: $!";
 104
 105     # Correct typos, preserving case
 106     while (<OLD>) {
 107         s/\b(p)earl\b/${1}erl/i;
 108         (print NEW $_)          or die "can't write to $new: $!";
 109     }
 110
 111     close(OLD)                  or die "can't close $old: $!";
 112     close(NEW)                  or die "can't close $new: $!";
 113
 114     rename($old, $bak)          or die "can't rename $old to $bak: $!";
 115     rename($new, $old)          or die "can't rename $new to $old: $!";
 116
 117 Perl can do this sort of thing for you automatically with the C<-i>
 118 command-line switch or the closely-related C<$^I> variable (see
 119 L<perlrun> for more details).  Note that
 120 C<-i> may require a suffix on some non-Unix systems; see the
 121 platform-specific documentation that came with your port.
 122
 123     # Renumber a series of tests from the command line
 124     perl -pi -e 's/(^\s+test\s+)\d+/ $1 . ++$count /e' t/op/taint.t
 125
 126     # from a script
 127     local($^I, @ARGV) = ('.bak', glob("*.c"));
 128     while (<>) {
 129         if ($. == 1) {
 130             print "This line should appear at the top of each file\n";
 131         }
 132         s/\b(p)earl\b/${1}erl/i;        # Correct typos, preserving case
 133         print;
 134         close ARGV if eof;              # Reset $.
 135     }
 136
 137 If you need to seek to an arbitrary line of a file that changes
 138 infrequently, you could build up an index of byte positions of where
 139 the line ends are in the file.  If the file is large, an index of
 140 every tenth or hundredth line end would allow you to seek and read
 141 fairly efficiently.  If the file is sorted, try the look.pl library
 142 (part of the standard perl distribution).
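
The index idea can be sketched as follows.  This is only an illustration:
the names build_index() and line_at() are made up for the example, and the
code assumes a perl new enough for lexical filehandles and three-argument
open().  It records where each line starts, which is the same thing as
where the previous line ends.

```perl
use strict;
use warnings;

# Record the byte offset at which every line of $path begins.
sub build_index {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "can't open $path: $!";
    my @offsets = (0);                 # line 1 starts at byte 0
    while (<$fh>) {
        push @offsets, tell($fh);      # start of the following line
    }
    close($fh);
    pop @offsets;                      # last entry is end-of-file, not a line
    return \@offsets;
}

# Fetch line number $n (counting from 1) using the index.
sub line_at {
    my ($path, $index, $n) = @_;
    return if $n < 1 || $n > @$index;
    open(my $fh, '<', $path) or die "can't open $path: $!";
    seek($fh, $index->[$n - 1], 0) or die "can't seek in $path: $!";
    my $line = <$fh>;
    close($fh);
    return $line;
}
```

For a very large file, keep only every tenth or hundredth offset, seek to
the nearest recorded one, and read forward from there.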
 143
 144 In the unique case of deleting lines at the end of a file, you
 145 can use tell() and truncate().  The following code snippet deletes
 146 the last line of a file without making a copy or reading the
 147 whole file into memory:
 148
 149         open (FH, "+< $file");
 150         while ( <FH> ) { $addr = tell(FH) unless eof(FH) }
 151         truncate(FH, $addr);
 152
 153 Error checking is left as an exercise for the reader.
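
For those who would rather skip the exercise, here is one way the same
tell()/truncate() approach might look with the checks filled in.  The
subroutine name drop_last_line() is hypothetical, and the lexical
filehandle and three-argument open() assume a reasonably recent perl.

```perl
use strict;
use warnings;

# Remove the last line of $file in place; returns the new file length.
sub drop_last_line {
    my ($file) = @_;
    open(my $fh, '+<', $file) or die "can't update $file: $!";
    my $addr = 0;                      # start of the last line seen so far
    while (<$fh>) {
        $addr = tell($fh) unless eof($fh);
    }
    truncate($fh, $addr) or die "can't truncate $file: $!";
    close($fh)           or die "can't close $file: $!";
    return $addr;
}
```

A file holding a single line ends up empty, since $addr never moves off
zero.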
 154
 155 =head2 How do I count the number of lines in a file?
 156
 157 One fairly efficient way is to count newlines in the file. The
 158 following program uses a feature of tr///, as documented in L<perlop>.
 159 If your text file doesn't end with a newline, then it's not really a
 160 proper text file, so this may report one fewer line than you expect.
 161
 162     $lines = 0;
 163     open(FILE, $filename) or die "Can't open `$filename': $!";
 164     while (sysread FILE, $buffer, 4096) {
 165         $lines += ($buffer =~ tr/\n//);
 166     }
 167     close FILE;
 168
 169 This assumes no funny games with newline translations.
 170
 171 =head2 How do I make a temporary file name?
 172
 173 Use the C<new_tmpfile> class method from the IO::File module to get a
 174 filehandle opened for reading and writing.  Use this if you don't
 175 need to know the file's name.
 176
 177     use IO::File;
 178     $fh = IO::File->new_tmpfile()
 179         or die "Unable to make new temporary file: $!";
 180
 181 Or you can use the C<tmpnam> function from the POSIX module to get a
 182 filename that you then open yourself.  Use this if you do need to know
 183 the file's name.
 184
 185     use Fcntl;
 186     use POSIX qw(tmpnam);
 187
 188     # try new temporary filenames until we get one that didn't already
 189     # exist;  the check should be unnecessary, but you can't be too careful
 190     do { $name = tmpnam() }
 191         until sysopen(FH, $name, O_RDWR|O_CREAT|O_EXCL);
 192
 193     # install atexit-style handler so that when we exit or die,
 194     # we automatically delete this temporary file
 195     END { unlink($name) or die "Couldn't unlink $name : $!" }
 196
 197     # now go on to use the file ...
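
If your perl ships the File::Temp module (an assumption on our part; it
is not one of the modules discussed here), it can choose the name and open
the file in a single step.  That matters on newer perls, where tmpnam() is
no longer provided by POSIX, as far as we know from perl 5.26 onward.  The
subroutine name make_scratch() and the "scratchXXXXXX" template are just
placeholders.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() picks an unused name and opens it in one step;
# UNLINK => 1 asks for the file to be removed when the program exits.
sub make_scratch {
    my ($fh, $name) = tempfile("scratchXXXXXX", TMPDIR => 1, UNLINK => 1);
    return ($fh, $name);
}

my ($fh, $name) = make_scratch();
print $fh "some scratch data\n";
close($fh) or die "can't close $name: $!";
print "temporary file was $name\n";
```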
 198
 199 If you're committed to doing this by hand, use the process ID and/or
 200 the current time-value.  If you need to have many temporary files in
 201 one process, use a counter:
 202
 203     BEGIN {
 204         use Fcntl;
 205         my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMP} || $ENV{TEMP};
 206         my $base_name = sprintf("%s/%d-%d-0000", $temp_dir, $$, time());
 207         sub temp_file {
 208             local *FH;
 209             my $count = 0;
 210             until (defined(fileno(FH)) || $count++ > 100) {
 211                 $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
 212                 sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT);
 213             }
 214             if (defined(fileno(FH))) {
 215                 return (*FH, $base_name);
 216             } else {
 217                 return ();
 218             }
 219         }
 220     }
 221
 222 =head2 How can I manipulate fixed-record-length files?
 223
 224 The most efficient way is using pack() and unpack().  This is faster than
 225 using substr() when you take many, many strings.  It is slower for just a few.
 226
 227 Here is a sample chunk of code to break up and put back together again
 228 some fixed-format input lines, in this case from the output of a normal,
 229 Berkeley-style ps:
 230
 231     # sample input line:
 232     #   15158 p5  T      0:00 perl /home/tchrist/scripts/now-what
 233     $PS_T = 'A6 A4 A7 A5 A*';
 234     open(PS, "ps|");
 235     print scalar <PS>;
 236     while (<PS>) {
 237         ($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_);
 238         for $var (qw!pid tt stat time command!) {
 239             print "$var: <$$var>\n";
 240         }
 241         print 'line=', pack($PS_T, $pid, $tt, $stat, $time, $command),
 242                 "\n";
 243     }
 244
 245 We've used C<$$var> in a way that is forbidden by C<use strict 'refs'>.
 246 That is, we've promoted a string to a scalar variable reference using
 247 symbolic references.  This is ok in small programs, but doesn't scale
 248 well.   It also only works on global variables, not lexicals.
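
One way around symbolic references is to unpack straight into a hash
slice, which works under C<use strict> and with lexical variables.  This
is a sketch only; the subroutine name parse_ps_line() and the @fields
array are invented for the example, and the template is the one used
above.

```perl
use strict;
use warnings;

my $PS_T   = 'A6 A4 A7 A5 A*';
my @fields = qw(pid tt stat time command);

# Split one fixed-format line into a hash keyed by field name.
sub parse_ps_line {
    my ($line) = @_;
    my %rec;
    @rec{@fields} = unpack($PS_T, $line);
    return \%rec;
}

my $rec = parse_ps_line('15158 p5  T      0:00 perl /home/tchrist/scripts/now-what');
for my $name (@fields) {
    print "$name: <$rec->{$name}>\n";
}
```

Because the field names live in data rather than in variable names, adding
a column means changing only the template and the @fields list.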
 249
 250 =head2 How can I make a filehandle local to a subroutine?  How do I pass filehandles between subroutines?  How do I make an array of filehandles?
 251
 252 The fastest, simplest, and most direct way is to localize the typeglob
 253 of the filehandle in question:
 254
 255     local *TmpHandle;
 256
 257 Typeglobs are fast (especially compared with the alternatives) and
 258 reasonably easy to use, but they also have one subtle drawback.  If you
 259 had, for example, a function named TmpHandle(), or a variable named
 260 %TmpHandle, you just hid it from yourself.
 261
 262     sub findme {
 263         local *HostFile;
 264         open(HostFile, "</etc/hosts") or die "no /etc/hosts: $!";
 265         local $_;               # <- VERY IMPORTANT
 266         while (<HostFile>) {
 267             print if /\b127\.(0\.0\.)?1\b/;
 268         }
 269         # *HostFile automatically closes/disappears here
 270     }
 271
 272 Here's how to use this in a loop to open and store a bunch of
 273 filehandles.  We'll use as values of the hash an ordered
 274 pair to make it easy to sort the hash in insertion order.
 275
 276     @names = qw(motd termcap passwd hosts);
 277     my $i = 0;
 278     foreach $filename (@names) {
 279         local *FH;
 280         open(FH, "/etc/$filename") || die "$filename: $!";
 281         $file{$filename} = [ $i++, *FH ];
 282     }
 283
 284     # Using the filehandles in the hash
 285     foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) {
 286         my $fh = $file{$name}[1];
 287         my $line = <$fh>;
 288         print "$name $. $line";
 289     }
 290
 291 If you want to create many anonymous handles, you should check out the
 292 Symbol, FileHandle, or IO::Handle (etc.) modules.  Here's the equivalent
 293 code with Symbol::gensym, which is reasonably light-weight:
 294
 295     foreach $filename (@names) {
 296         use Symbol;
 297         my $fh = gensym();
 298         open($fh, "/etc/$filename") || die "open /etc/$filename: $!";
 299         $file{$filename} = [ $i++, $fh ];
 300     }
 301
 302 Or here using the semi-object-oriented FileHandle, which certainly isn't
 303 light-weight:
 304
 305     use FileHandle;
 306
 307     foreach $filename (@names) {
 308         my $fh = FileHandle->new("/etc/$filename") or die "$filename: $!";
 309         $file{$filename} = [ $i++, $fh ];
 310     }
 311
 312 Please understand that whether the filehandle happens to be a (probably
 313 localized) typeglob or an anonymous handle from one of the modules,
 314 in no way affects the bizarre rules for managing indirect handles.
 315 See the next question.
 316
 317 =head2 How can I use a filehandle indirectly?
 318
 319 An indirect filehandle is using something other than a symbol
 320 in a place that a filehandle is expected.  Here are ways
 321 to get those:
 322
 323     $fh =   SOME_FH;       # bareword is strict-subs hostile
 324     $fh =  "SOME_FH";      # strict-refs hostile; same package only
 325     $fh =  *SOME_FH;       # typeglob
 326     $fh = \*SOME_FH;       # ref to typeglob (bless-able)
 327     $fh =  *SOME_FH{IO};   # blessed IO::Handle from *SOME_FH typeglob
 328
 329 Or, you can use the C<new> method from the FileHandle or IO modules to
 330 create an anonymous filehandle, store that in a scalar variable,
 331 and use it as though it were a normal filehandle.
 332
 333     use FileHandle;
 334     $fh = FileHandle->new();
 335
 336     use IO::Handle;                     # 5.004 or higher
 337     $fh = IO::Handle->new();
 338
 339 Then use any of those as you would a normal filehandle.  Anywhere that
 340 Perl is expecting a filehandle, an indirect filehandle may be used
 341 instead. An indirect filehandle is just a scalar variable that contains
 342 a filehandle.  Functions like C<print>, C<open>, C<seek>, or
 343 the C<E<lt>FHE<gt>> diamond operator will accept either a real filehandle
 344 or a scalar variable containing one:
 345
 346     ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
 347     print $ofh "Type it: ";
 348     $got = <$ifh>;
 349     print $efh "What was that: $got";
 350
 351 If you're passing a filehandle to a function, you can write
 352 the function in two ways:
 353
 354     sub accept_fh {
 355         my $fh = shift;
 356         print $fh "Sending to indirect filehandle\n";
 357     }
 358
 359 Or it can localize a typeglob and use the filehandle directly:
 360
 361     sub accept_fh {
 362         local *FH = shift;
 363         print  FH "Sending to localized filehandle\n";
 364     }
 365
 366 Both styles work with either objects or typeglobs of real filehandles.
 367 (They might also work with strings under some circumstances, but this
 368 is risky.)
 369
 370     accept_fh(*STDOUT);
 371     accept_fh($handle);
 372
 373 In the examples above, we assigned the filehandle to a scalar variable
 374 before using it.  That is because only simple scalar variables,
 375 not expressions or subscripts into hashes or arrays, can be used with
 376 built-ins like C<print>, C<printf>, or the diamond operator.  These are
 377 illegal and won't even compile:
 378
 379     @fd = (*STDIN, *STDOUT, *STDERR);
 380     print $fd[1] "Type it: ";                           # WRONG
 381     $got = <$fd[0]>;                                    # WRONG
 382     print $fd[2] "What was that: $got";                 # WRONG
 383
 384 With C<print> and C<printf>, you get around this by using a block and
 385 an expression where you would place the filehandle:
 386
 387     print  { $fd[1] } "funny stuff\n";
 388     printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559;
 389     # Pity the poor deadbeef.
 390
 391 That block is a proper block like any other, so you can put more
 392 complicated code there.  This sends the message out to one of two places:
 393
 394     $ok = -x "/bin/cat";
 395     print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
 396     print { $fd[ 1+ ($ok || 0) ]  } "cat stat $ok\n";
 397
 398 This approach of treating C<print> and C<printf> like object method
 399 calls doesn't work for the diamond operator.  That's because it's a
 400 real operator, not just a function with a comma-less argument.  Assuming
 401 you've been storing typeglobs in your structure as we did above, you
 402 can use the built-in function named C<readline> to read a record just
 403 as C<E<lt>E<gt>> does.  Given the initialization shown above for @fd, this
 404 would work, but only because readline() requires a typeglob.  It doesn't
 405 work with objects or strings, which might be a bug we haven't fixed yet.
 406
 407     $got = readline($fd[0]);
 408
 409 Let it be noted that the flakiness of indirect filehandles is not
 410 related to whether they're strings, typeglobs, objects, or anything else.
 411 It's the syntax of the fundamental operators.  Playing the object
 412 game doesn't help you at all here.
 413
 414 =head2 How can I set up a footer format to be used with write()?
 415
 416 There's no builtin way to do this, but L<perlform> has a couple of
 417 techniques to make it possible for the intrepid hacker.
 418
 419 =head2 How can I write() into a string?
 420
 421 See L<perlform> for an swrite() function.
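
The approach relies on formline() and the C<$^A> format accumulator, both
of which are builtins.  A minimal sketch along those lines, which may
differ in detail from the version in your copy of L<perlform>:

```perl
use strict;
use warnings;
use Carp;

# Format the arguments with a picture and return the result
# as a string instead of writing it to a filehandle.
sub swrite {
    croak "usage: swrite PICTURE ARGS" unless @_;
    my $format = shift;
    $^A = "";                   # clear the format accumulator
    formline($format, @_);      # append the formatted text to $^A
    return $^A;
}

my $string = swrite(<<'END', 1, 2, 3);
Check me out
@<<<  @|||  @>>>
END
print $string;
```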
 422
 423 =head2 How can I output my numbers with commas added?
 424
 425 This one will do it for you:
 426
 427     sub commify {
 428         local $_  = shift;
 429         1 while s/^(-?\d+)(\d{3})/$1,$2/;
 430         return $_;
 431     }
 432
 433     $n = 23659019423.2331;
 434     print "GOT: ", commify($n), "\n";
 435
 436     GOT: 23,659,019,423.2331
 437
 438 You can't just:
 439
 440     s/^(-?\d+)(\d{3})/$1,$2/g;
 441
 442 because you have to put the comma in and then recalculate your
 443 position.
 444
 445 Alternatively, this commifies all numbers in a line regardless of
 446 whether they have decimal portions, are preceded by + or -, or
 447 whatever:
 448
 449     # from Andrew Johnson <ajohnson@gpu.srv.ualberta.ca>
 450     sub commify {
 451         my $input = shift;
 452         $input = reverse $input;
 453         $input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;
 454         return reverse $input;
 455     }
 456
 457 =head2 How can I translate tildes (~) in a filename?
 458
 459 Use the E<lt>E<gt> (glob()) operator, documented in L<perlfunc>.  This
 460 requires that you have a shell installed that groks tildes, meaning
 461 csh or tcsh or (some versions of) ksh, and thus may have portability
 462 problems.  The Glob::KGlob module (available from CPAN) gives more
 463 portable glob functionality.
 464
 465 Within Perl, you may use this directly:
 466
 467         $filename =~ s{
 468           ^ ~             # find a leading tilde
 469           (               # save this in $1
 470               [^/]        # a non-slash character
 471                     *     # repeated 0 or more times (0 means me)
 472           )
 473         }{
 474           $1
 475               ? (getpwnam($1))[7]
 476               : ( $ENV{HOME} || $ENV{LOGDIR} )
 477         }ex;
 478
 479 =head2 How come when I open a file read-write it wipes it out?
 480
 481 Because you're using something like this, which truncates the file and
 482 I<then> gives you read-write access:
 483
 484     open(FH, "+> /path/name");          # WRONG (almost always)
 485
 486 Whoops.  You should instead use C<open(FH, "+E<lt> /path/name")>, which
 487 will fail if the file doesn't exist.  Using "E<gt>" always clobbers or
 488 creates.  Using "E<lt>" never does either.  The "+" doesn't change this.
 489
 490 Here are examples of many kinds of file opens.  Those using sysopen()
 491 all assume
 492
 493     use Fcntl;
 494
 495 To open file for reading:
 496
 497     open(FH, "< $path")                                 || die $!;
 498     sysopen(FH, $path, O_RDONLY)                        || die $!;
 499
 500 To open file for writing, create new file if needed or else truncate old file:
 501
 502     open(FH, "> $path") || die $!;
 503     sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT)        || die $!;
 504     sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT, 0666)  || die $!;
 505
 506 To open file for writing, create new file, file must not exist:
 507
 508     sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT)         || die $!;
 509     sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT, 0666)   || die $!;
 510
 511 To open file for appending, create if necessary:
 512
 513     open(FH, ">> $path") || die $!;
 514     sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT)       || die $!;
 515     sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT, 0666) || die $!;
 516
 517 To open file for appending, file must exist:
 518
 519     sysopen(FH, $path, O_WRONLY|O_APPEND)               || die $!;
 520
 521 To open file for update, file must exist:
 522
 523     open(FH, "+< $path")                                || die $!;
 524     sysopen(FH, $path, O_RDWR)                          || die $!;
 525
 526 To open file for update, create file if necessary:
 527
 528     sysopen(FH, $path, O_RDWR|O_CREAT)                  || die $!;
 529     sysopen(FH, $path, O_RDWR|O_CREAT, 0666)            || die $!;
 530
 531 To open file for update, file must not exist:
 532
 533     sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT)           || die $!;
 534     sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT, 0666)     || die $!;
 535
 536 To open a file without blocking, creating if necessary:
 537
 538     sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT)
 539             or die "can't open /tmp/somefile: $!";
 540
 541 Be warned that neither creation nor deletion of files is guaranteed to
 542 be an atomic operation over NFS.  That is, two processes might both
 543 successfully create or unlink the same file!  Therefore O_EXCL
 544 isn't so exclusive as you might wish.
 545
 546 =head2 Why do I sometimes get an "Argument list too long" when I use <*>?
 547
 548 The C<E<lt>E<gt>> operator performs a globbing operation (see above).
 549 By default glob() forks csh(1) to do the actual glob expansion, but
 550 csh can't handle more than 127 items and so gives the error message
 551 C<Argument list too long>.  People who installed tcsh as csh won't
 552 have this problem, but their users may be surprised by it.
 553
 554 To get around this, either do the glob yourself with C<DirHandle>s and
 555 patterns, or use a module like Glob::KGlob, one that doesn't use the
 556 shell to do globbing.
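
Doing the glob yourself might look like the sketch below.  It uses the
opendir() and readdir() builtins directly rather than the DirHandle
module, and the subroutine name match_files() is hypothetical.  Note that
the pattern is a Perl regular expression, not a shell wildcard.

```perl
use strict;
use warnings;

# Return the names in $dir that match $pattern, skipping dot files
# the way a shell's "*" would.  No shell is involved.
sub match_files {
    my ($dir, $pattern) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    my @names = sort grep { !/^\./ && /$pattern/ } readdir($dh);
    closedir($dh);
    return @names;
}

# Roughly what <*.c> would give for the current directory:
my @c_files = match_files('.', qr/\.c\z/);
print "$_\n" for @c_files;
```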
 557
 558 =head2 Is there a leak/bug in glob()?
 559
 560 Due to the current implementation on some operating systems, when you
 561 use the glob() function or its angle-bracket alias in a scalar
 562 context, you may cause a leak and/or unpredictable behavior.  It's
 563 best therefore to use glob() only in list context.
 564
 565 =head2 How can I open a file with a leading "E<gt>" or trailing blanks?
 566
 567 Normally perl ignores trailing blanks in filenames, and interprets
 568 certain leading characters (or a trailing "|") to mean something
 569 special.  To avoid this, you might want to use a routine like this.
 570 It makes incomplete pathnames into explicit relative ones, and tacks a
 571 trailing null byte on the name to make perl leave it alone:
 572
 573     sub safe_filename {
 574         local $_  = shift;
 575         return m#^/#
 576                 ? "$_\0"
 577                 : "./$_\0";
 578     }
 579
 580     $fn = safe_filename("<<<something really wicked   ");
 581     open(FH, "> $fn") or die "couldn't open $fn: $!";
 582
 583 You could also use the sysopen() function (see L<perlfunc/sysopen>).
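
Since sysopen() takes the filename as a separate argument and never looks
inside it for mode characters, no doctoring of the name is needed.  A
small sketch, with a made-up subroutine name:

```perl
use strict;
use warnings;
use Fcntl;

# Create (or truncate) a file whose name open() would misread,
# for instance one with a leading ">" or trailing blanks.
sub open_odd_name {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY|O_CREAT|O_TRUNC, 0666)
        or die "couldn't open $path: $!";
    return $fh;
}
```

You would call it as C<$fh = open_odd_name($name)> and then print to $fh
as usual.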
 584
 585 =head2 How can I reliably rename a file?
 586
 587 Well, usually you just use Perl's rename() function.  But that may
 588 not work everywhere, in particular, renaming files across file systems.
 589 If your operating system supports a mv(1) program or its moral equivalent,
 590 this works:
 591
 592     rename($old, $new) or system("mv", $old, $new);
 593
 594 It may be more compelling to use the File::Copy module instead.  You
 595 just copy the file to the new name (checking return values),
 596 then delete the old one.  This isn't really the same semantics as a
 597 real rename(), though, which preserves metainformation like
 598 permissions, timestamps, inode info, etc.
 599
 600 Newer versions of File::Copy export a move() function.
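
Assuming your File::Copy is new enough to have it, move() renames the file
when it can and otherwise copies it and removes the original.  The wrapper
name rename_anywhere() below is only for illustration.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Rename $old to $new, even across file systems.
sub rename_anywhere {
    my ($old, $new) = @_;
    move($old, $new) or die "can't move $old to $new: $!";
    return 1;
}
```

The caveat about metainformation still applies whenever move() has to fall
back on copying.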
 601
 602 =head2 How can I lock a file?
 603
 604 Perl's builtin flock() function (see L<perlfunc> for details) will call
 605 flock(2) if that exists, fcntl(2) if it doesn't (on perl version 5.004 and
 606 later), and lockf(3) if neither of the two previous system calls exists.
 607 On some systems, it may even use a different form of native locking.
 608 Here are some gotchas with Perl's flock():
 609
 610 =over 4
 611
 612 =item 1
 613
 614 Produces a fatal error if none of the three system calls (or their
 615 close equivalent) exists.
 616
 617 =item 2
 618
 619 lockf(3) does not provide shared locking, and requires that the
 620 filehandle be open for writing (or appending, or read/writing).
 621
 622 =item 3
 623
 624 Some versions of flock() can't lock files over a network (e.g. on NFS
 625 file systems), so you'd need to force the use of fcntl(2) when you
 626 build Perl.  See the flock entry of L<perlfunc>, and the F<INSTALL>
 627 file in the source distribution for information on building Perl to do
 628 this.
 629
 630 =back
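
Here is a sketch of a non-blocking lock attempt that keeps the gotchas
above in mind.  The file is opened for appending so that a lockf(3)-based
flock() is satisfied too.  The subroutine name try_lock() is hypothetical,
and the symbolic LOCK_* names assume an Fcntl that exports the C<:flock>
tag.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);       # imports LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN

# Try to take an exclusive lock without blocking.
# Returns the locked handle, or nothing if the lock can't be had.
sub try_lock {
    my ($path) = @_;
    open(my $fh, '>>', $path) or die "can't open $path: $!";
    return $fh if flock($fh, LOCK_EX | LOCK_NB);
    close($fh);
    return;
}
```

Keep the returned handle around for as long as you need the lock; closing
it releases the lock.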
 631
 632 =head2 Why can't I just open(FH, ">file.lock")?
 633
 634 A common bit of code B<NOT TO USE> is this:
 635
 636     sleep(3) while -e "file.lock";      # PLEASE DO NOT USE
 637     open(LCK, "> file.lock");           # THIS BROKEN CODE
 638
 639 This is a classic race condition: you take two steps to do something
 640 which must be done in one.  That's why computer hardware provides an
 641 atomic test-and-set instruction.   In theory, this "ought" to work:
 642
 643     sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT)
 644                 or die "can't open file.lock: $!";
 645
 646 except that lamentably, file creation (and deletion) is not atomic
 647 over NFS, so this won't work (at least, not every time) over the net.
 648 Various schemes involving link() have been suggested, but
 649 these tend to involve busy-wait, which is also subdesirable.
 650
 651 =head2 I still don't get locking.  I just want to increment the number in the file.  How can I do this?
 652
 653 Didn't anyone ever tell you web-page hit counters were useless?
 654 They don't count number of hits, they're a waste of time, and they serve
 655 only to stroke the writer's vanity.  Better to pick a random number.
 656 It's more realistic.
 657
 658 Anyway, this is what you can do if you can't help yourself.
 659
 660     use Fcntl;
 661     sysopen(FH, "numfile", O_RDWR|O_CREAT)       or die "can't open numfile: $!";
 662     flock(FH, 2)                                 or die "can't flock numfile: $!";
 663     $num = <FH> || 0;
 664     seek(FH, 0, 0)                               or die "can't rewind numfile: $!";
 665     truncate(FH, 0)                              or die "can't truncate numfile: $!";
 666     (print FH $num+1, "\n")                      or die "can't write numfile: $!";
 667     # DO NOT UNLOCK THIS UNTIL YOU CLOSE
 668     close FH                                     or die "can't close numfile: $!";
 669
 670 Here's a much better web-page hit counter:
 671
 672     $hits = int( (time() - 850_000_000) / rand(1_000) );
 673
 674 If the count doesn't impress your friends, then the code might.  :-)
 675
 676 =head2 How do I randomly update a binary file?
 677
 678 If you're just trying to patch a binary, in many cases something as
 679 simple as this works:
 680
 681     perl -i -pe 's{window manager}{window mangler}g' /usr/bin/emacs
 682
 683 However, if you have fixed sized records, then you might do something more
 684 like this:
 685
 686     $RECSIZE = 220; # size of record, in bytes
 687     $recno   = 37;  # which record to update
 688     open(FH, "+<somewhere") || die "can't update somewhere: $!";
 689     seek(FH, $recno * $RECSIZE, 0);
 690     read(FH, $record, $RECSIZE) == $RECSIZE || die "can't read record $recno: $!";
 691     # munge the record
 692     seek(FH, $recno * $RECSIZE, 0);
 693     print FH $record;
 694     close FH;
 695
 696 Locking and error checking are left as an exercise for the reader.
 697 Don't forget them, or you'll be quite sorry.
 698
 699 =head2 How do I get a file's timestamp in perl?
 700
 701 If you want to retrieve the time at which the file was last read,
 702 written, or had its meta-data (owner, etc) changed, you use the B<-A>,
 703 B<-M>, or B<-C> filetest operations as documented in L<perlfunc>.  These
 704 retrieve the age of the file (measured against the start-time of your
 705 program) in days as a floating point number.  To retrieve the "raw"
 706 time in seconds since the epoch, you would call the stat function,
 707 then use localtime(), gmtime(), or POSIX::strftime() to convert this
 708 into human-readable form.
 709
 710 Here's an example:
 711
 712     $write_secs = (stat($file))[9];
 713     print "file $file updated at ", scalar(localtime($write_secs)), "\n";
 714
 715 If you prefer something more legible, use the File::stat module
 716 (part of the standard distribution in version 5.004 and later):
 717
 718     use File::stat;
 719     use Time::localtime;
 720     $date_string = ctime(stat($file)->mtime);
 721     print "file $file updated at $date_string\n";
 722
 723 Error checking is left as an exercise for the reader.
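
If you want control over the layout of the date, POSIX::strftime() takes a
format string plus the list returned by localtime() or gmtime().  A small
sketch with the stat() check included; the subroutine name mtime_string()
and the particular format are just examples.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format a file's modification time; returns nothing if stat() fails.
sub mtime_string {
    my ($file) = @_;
    my @st = stat($file) or return;
    return strftime("%Y-%m-%d %H:%M:%S", localtime($st[9]));
}

my $when = mtime_string($0);
print "$0 updated at $when\n" if defined $when;
```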
 724
 725 =head2 How do I set a file's timestamp in perl?
 726
 727 You use the utime() function documented in L<perlfunc/utime>.
 728 By way of example, here's a little program that copies the
 729 read and write times from its first argument to all the rest
 730 of them.
 731
 732     if (@ARGV < 2) {
 733         die "usage: cptimes timestamp_file other_files ...\n";
 734     }
 735     $timestamp = shift;
 736     ($atime, $mtime) = (stat($timestamp))[8,9];
 737     utime $atime, $mtime, @ARGV;
 738
 739 Error checking is left as an exercise for the reader.
 740
 741 Note that utime() currently doesn't work correctly with Win95/NT
 742 ports.  A bug has been reported.  Check it carefully before using
 743 it on those platforms.
 744
 745 =head2 How do I print to more than one file at once?
 746
 747 If you only have to do this once, you can do this:
 748
 749     for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
 750
 751 To connect one filehandle to several output filehandles, it's
 752 easiest to use the tee(1) program if you have it, and let it take care
 753 of the multiplexing:
 754
 755     open (FH, "| tee file1 file2 file3");
 756
 757 Or even:
 758
 759     # make STDOUT go to three files, plus original STDOUT
 760     open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n";
 761     print "whatever\n"                       or die "Writing: $!\n";
 762     close(STDOUT)                            or die "Closing: $!\n";
 763
 764 Otherwise you'll have to write your own multiplexing print
 765 function -- or your own tee program -- or use Tom Christiansen's,
 766 at http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz, which is
 767 written in Perl and offers much greater functionality
 768 than the stock version.
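
A home-grown multiplexing print need not be elaborate.  Here is a minimal
sketch; the name tee_print() is made up, and the handles are passed as an
array reference so that the text to print can follow.

```perl
use strict;
use warnings;

# Print the same list to every handle; true only if all prints succeed.
sub tee_print {
    my ($handles, @text) = @_;
    my $ok = 1;
    for my $fh (@$handles) {
        print {$fh} @text or $ok = 0;
    }
    return $ok;
}

tee_print([\*STDOUT, \*STDERR], "whatever\n");
```

Unlike the tee(1) approach, this costs no extra process, but every caller
has to use tee_print() instead of plain print.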
 769
 770 =head2 How can I read in a file by paragraphs?
 771
Use the C<$/> variable (see L<perlvar> for details).  You can either
set it to C<""> to eliminate empty paragraphs (C<"abc\n\n\n\ndef">,
for instance, gets treated as two paragraphs and not three), or
C<"\n\n"> to accept empty paragraphs.
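
To see the difference between the two settings, here's a small
sketch that reads the same text both ways.  The C<read_records>
helper is hypothetical, and an in-memory filehandle stands in for a
real file.

    use strict;
    use warnings;

    # Split $text into records using the given input record separator.
    sub read_records {
        my ($text, $separator) = @_;
        open(my $fh, '<', \$text) or die "can't open in-memory file: $!\n";
        local $/ = $separator;      # restored when the sub returns
        my @records = <$fh>;
        close($fh);
        return @records;
    }

    my $text = "abc\n\n\n\ndef\n";
    my @paragraph_mode = read_records($text, "");       # 2 records
    my @two_newlines   = read_records($text, "\n\n");   # 3 records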
 776
 777 =head2 How can I read a single character from a file?  From the keyboard?
 778
 779 You can use the builtin C<getc()> function for most filehandles, but
 780 it won't (easily) work on a terminal device.  For STDIN, either use
 781 the Term::ReadKey module from CPAN, or use the sample code in
 782 L<perlfunc/getc>.
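
For an ordinary filehandle, getc() is all you need.  A minimal
sketch, using an in-memory filehandle in place of a real file:

    use strict;
    use warnings;

    my $data = "xyz";
    open(my $fh, '<', \$data) or die "can't open in-memory file: $!\n";

    my $first = getc($fh);          # the first character
    my @rest;
    while (defined(my $char = getc($fh))) {
        push @rest, $char;          # getc() returns undef at end of file
    }
    close($fh);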
 783
 784 If your system supports POSIX, you can use the following code, which
 785 you'll note turns off echo processing as well.
 786
 787     #!/usr/bin/perl -w
 788     use strict;
 789     $| = 1;
 790     for (1..4) {
 791         my $got;
 792         print "gimme: ";
 793         $got = getone();
 794         print "--> $got\n";
 795     }
 796     exit;
 797
 798     BEGIN {
 799         use POSIX qw(:termios_h);
 800
 801         my ($term, $oterm, $echo, $noecho, $fd_stdin);
 802
 803         $fd_stdin = fileno(STDIN);
 804
 805         $term     = POSIX::Termios->new();
 806         $term->getattr($fd_stdin);
 807         $oterm     = $term->getlflag();
 808
 809         $echo     = ECHO | ECHOK | ICANON;
 810         $noecho   = $oterm & ~$echo;
 811
 812         sub cbreak {
 813             $term->setlflag($noecho);
 814             $term->setcc(VTIME, 1);
 815             $term->setattr($fd_stdin, TCSANOW);
 816         }
 817
 818         sub cooked {
 819             $term->setlflag($oterm);
 820             $term->setcc(VTIME, 0);
 821             $term->setattr($fd_stdin, TCSANOW);
 822         }
 823
 824         sub getone {
 825             my $key = '';
 826             cbreak();
 827             sysread(STDIN, $key, 1);
 828             cooked();
 829             return $key;
 830         }
 831
 832     }
 833
 834     END { cooked() }
 835
 836 The Term::ReadKey module from CPAN may be easier to use:
 837
 838     use Term::ReadKey;
 839     open(TTY, "</dev/tty");
 840     print "Gimme a char: ";
 841     ReadMode "raw";
 842     $key = ReadKey 0, *TTY;
 843     ReadMode "normal";
 844     printf "\nYou said %s, char number %03d\n",
 845         $key, ord $key;
 846
 847 For DOS systems, Dan Carson <dbc@tc.fluke.COM> reports the following:
 848
 849 To put the PC in "raw" mode, use ioctl with some magic numbers gleaned
 850 from msdos.c (Perl source file) and Ralf Brown's interrupt list (comes
 851 across the net every so often):
 852
 853     $old_ioctl = ioctl(STDIN,0,0);     # Gets device info
 854     $old_ioctl &= 0xff;
 855     ioctl(STDIN,1,$old_ioctl | 32);    # Writes it back, setting bit 5
 856
 857 Then to read a single character:
 858
 859     sysread(STDIN,$c,1);               # Read a single character
 860
 861 And to put the PC back to "cooked" mode:
 862
 863     ioctl(STDIN,1,$old_ioctl);         # Sets it back to cooked mode.
 864
 865 So now you have $c.  If C<ord($c) == 0>, you have a two byte code, which
 866 means you hit a special key.  Read another byte with C<sysread(STDIN,$c,1)>,
 867 and that value tells you what combination it was according to this
 868 table:
 869
 870     # PC 2-byte keycodes = ^@ + the following:
 871
 872     # HEX     KEYS
 873     # ---     ----
 874     # 0F      SHF TAB
 875     # 10-19   ALT QWERTYUIOP
 876     # 1E-26   ALT ASDFGHJKL
 877     # 2C-32   ALT ZXCVBNM
 878     # 3B-44   F1-F10
 879     # 47-49   HOME,UP,PgUp
 880     # 4B      LEFT
 881     # 4D      RIGHT
 882     # 4F-53   END,DOWN,PgDn,Ins,Del
 883     # 54-5D   SHF F1-F10
 884     # 5E-67   CTR F1-F10
 885     # 68-71   ALT F1-F10
 886     # 73-77   CTR LEFT,RIGHT,END,PgDn,HOME
 887     # 78-83   ALT 1234567890-=
 888     # 84      CTR PgUp
 889
This is all trial and error I did a long time ago; I hope I'm reading the
file that worked.
 892
 893 =head2 How can I tell if there's a character waiting on a filehandle?
 894
 895 The very first thing you should do is look into getting the Term::ReadKey
 896 extension from CPAN.  It now even has limited support for closed, proprietary
 897 (read: not open systems, not POSIX, not Unix, etc) systems.
 898
 899 You should also check out the Frequently Asked Questions list in
 900 comp.unix.* for things like this: the answer is essentially the same.
 901 It's very system dependent.  Here's one solution that works on BSD
 902 systems:
 903
 904     sub key_ready {
 905         my($rin, $nfd);
 906         vec($rin, fileno(STDIN), 1) = 1;
 907         return $nfd = select($rin,undef,undef,0);
 908     }
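
The standard IO::Select module wraps the same select() call in a
friendlier interface.  Here's a sketch that uses a pipe rather than
STDIN so that it can run unattended; the C<handle_ready> name is
made up for this example.

    use strict;
    use warnings;
    use IO::Handle;
    use IO::Select;

    # True if a read on $fh would not block (data waiting, or end of file).
    sub handle_ready {
        my ($fh) = @_;
        my @ready = IO::Select->new($fh)->can_read(0);
        return scalar @ready;
    }

    pipe(my $reader, my $writer) or die "can't create pipe: $!\n";
    $writer->autoflush(1);

    my $before = handle_ready($reader);   # nothing written yet
    print {$writer} "x";
    my $after  = handle_ready($reader);   # one byte is now waiting

As with the raw select() version, this asks the operating system, so
it knows nothing about data already sitting in a stdio-style buffer.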
 909
 910 If you want to find out how many characters are waiting,
 911 there's also the FIONREAD ioctl call to be looked at.
 912
 913 The I<h2ph> tool that comes with Perl tries to convert C include
 914 files to Perl code, which can be C<require>d.  FIONREAD ends
 915 up defined as a function in the I<sys/ioctl.ph> file:
 916
 917     require 'sys/ioctl.ph';
 918
 919     $size = pack("L", 0);
 920     ioctl(FH, FIONREAD(), $size)    or die "Couldn't call ioctl: $!\n";
 921     $size = unpack("L", $size);
 922
 923 If I<h2ph> wasn't installed or doesn't work for you, you can
 924 I<grep> the include files by hand:
 925
 926     % grep FIONREAD /usr/include/*/*
 927     /usr/include/asm/ioctls.h:#define FIONREAD      0x541B
 928
 929 Or write a small C program using the editor of champions:
 930
    % cat > fionread.c
    #include <stdio.h>
    #include <sys/ioctl.h>
    int main(void) {
        printf("%#08x\n", FIONREAD);
        return 0;
    }
    ^D
    % cc -o fionread fionread.c
    % ./fionread
    0x4004667f
 940
 941 And then hard-code it, leaving porting as an exercise to your successor.
 942
 943     $FIONREAD = 0x4004667f;         # XXX: opsys dependent
 944
 945     $size = pack("L", 0);
 946     ioctl(FH, $FIONREAD, $size)     or die "Couldn't call ioctl: $!\n";
 947     $size = unpack("L", $size);
 948
 949 FIONREAD requires a filehandle connected to a stream, meaning sockets,
 950 pipes, and tty devices work, but I<not> files.
 951
 952 =head2 How do I do a C<tail -f> in perl?
 953
 954 First try
 955
 956     seek(GWFILE, 0, 1);
 957
 958 The statement C<seek(GWFILE, 0, 1)> doesn't change the current position,
 959 but it does clear the end-of-file condition on the handle, so that the
 960 next <GWFILE> makes Perl try again to read something.
 961
 962 If that doesn't work (it relies on features of your stdio implementation),
 963 then you need something more like this:
 964
 965         for (;;) {
 966           for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {
 967             # search for some stuff and put it into files
 968           }
 969           # sleep for a while
 970           seek(GWFILE, $curpos, 0);  # seek to where we had been
 971         }
 972
 973 If this still doesn't work, look into the POSIX module.  POSIX defines
 974 the clearerr() method, which can remove the end of file condition on a
 975 filehandle.  The method: read until end of file, clearerr(), read some
 976 more.  Lather, rinse, repeat.
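
Here's a self-contained sketch of the first technique.  It plays
both the writer and the reader on a temporary file, which is an
assumption made so the example can run on its own; a real program
would sleep between attempts instead of writing to the file itself.

    use strict;
    use warnings;
    use IO::Handle;
    use File::Temp qw(tempfile);

    my ($writer, $path) = tempfile(UNLINK => 1);
    $writer->autoflush(1);
    open(my $reader, '<', $path) or die "can't read $path: $!\n";

    print {$writer} "one\n";
    my @seen = <$reader>;       # reads "one\n", then hits end of file

    print {$writer} "two\n";    # the file grows behind the reader's back
    seek($reader, 0, 1);        # same position, end-of-file condition cleared
    push @seen, <$reader>;      # picks up "two\n"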
 977
 978 =head2 How do I dup() a filehandle in Perl?
 979
 980 If you check L<perlfunc/open>, you'll see that several of the ways
 981 to call open() should do the trick.  For example:
 982
 983     open(LOG, ">>/tmp/logfile");
 984     open(STDERR, ">&LOG");
 985
 986 Or even with a literal numeric descriptor:
 987
 988    $fd = $ENV{MHCONTEXTFD};
 989    open(MHCONTEXT, "<&=$fd");   # like fdopen(3S)
 990
Note that "E<lt>&STDIN" makes a copy, but "E<lt>&=STDIN" makes
an alias.  That means if you close an aliased handle, all
 993 aliases become inaccessible.  This is not true with
 994 a copied one.
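
You can watch the difference between a copy and an alias by
comparing descriptor numbers.  This sketch uses the three-argument
form of open() and a temporary file instead of a real log; both are
choices made for the example rather than anything required.

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    my ($log, $path) = tempfile(UNLINK => 1);

    # A copy gets a descriptor of its own, like dup(2).
    open(my $copy, '>&', $log)
        or die "can't dup log handle: $!\n";

    # An alias shares the original descriptor, like fdopen(3).
    open(my $alias, '>&=', fileno($log))
        or die "can't alias log handle: $!\n";

    my $copy_is_separate = fileno($copy)  != fileno($log);
    my $alias_is_shared  = fileno($alias) == fileno($log);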
 995
 996 Error checking, as always, has been left as an exercise for the reader.
 997
 998 =head2 How do I close a file descriptor by number?
 999
1000 This should rarely be necessary, as the Perl close() function is to be
1001 used for things that Perl opened itself, even if it was a dup of a
1002 numeric descriptor, as with MHCONTEXT above.  But if you really have
1003 to, you may be able to do this:
1004
1005     require 'sys/syscall.ph';
1006     $rc = syscall(&SYS_close, $fd + 0);  # must force numeric
    die "can't sysclose $fd: $!" if $rc == -1;
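
If the I<.ph> files aren't available, the POSIX module's close()
function takes a descriptor number directly.  A sketch, assuming
your system has a null device to open for the demonstration:

    use strict;
    use warnings;
    use Fcntl qw(O_RDONLY);
    use File::Spec;
    use POSIX ();

    # Open a raw descriptor, then close it by number.
    my $fd = POSIX::open(File::Spec->devnull, O_RDONLY);
    defined $fd or die "can't open null device: $!\n";

    defined(POSIX::close($fd))
        or die "can't close descriptor $fd: $!\n";

    # Closing the same number again fails and returns undef.
    my $second = POSIX::close($fd);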
1008
=head2 Why can't I use "C:\temp\foo" in DOS paths?  Why doesn't `C:\temp\foo.exe` work?
1010
1011 Whoops!  You just put a tab and a formfeed into that filename!
1012 Remember that within double quoted strings ("like\this"), the
1013 backslash is an escape character.  The full list of these is in
1014 L<perlop/Quote and Quote-like Operators>.  Unsurprisingly, you don't
1015 have a file called "c:(tab)emp(formfeed)oo" or
1016 "c:(tab)emp(formfeed)oo.exe" on your DOS filesystem.
1017
1018 Either single-quote your strings, or (preferably) use forward slashes.
Because every DOS and Windows version since about MS-DOS 2.0 has
treated C</> and C<\> the same in a path, you might as well use the
one that doesn't clash with Perl -- or the POSIX shell, ANSI C and C++,
awk, Tcl, Java, or Python, just to mention a few.
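
A quick way to convince yourself is to compare the strings Perl
actually builds.  The variable names here are just for illustration:

    use strict;
    use warnings;

    my $mangled = "C:\temp\foo";    # \t is a tab, \f is a formfeed
    my $single  = 'C:\temp\foo';    # backslashes kept literally
    my $escaped = "C:\\temp\\foo";  # doubled backslashes also work
    my $forward = "C:/temp/foo";    # nothing to escape at all

    print "single-quoted and escaped forms match\n"
        if $single eq $escaped;
    print "double-quoted form lost its backslashes\n"
        if $mangled !~ /\\/;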
1023
1024 =head2 Why doesn't glob("*.*") get all the files?
1025
1026 Because even on non-Unix ports, Perl's glob function follows standard
1027 Unix globbing semantics.  You'll need C<glob("*")> to get all (non-hidden)
1028 files.  This makes glob() portable.
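
Here's a sketch that shows the difference in a scratch directory.
It assumes the temporary directory's path contains no whitespace,
since glob() splits its argument on spaces.

    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    my $dir = tempdir(CLEANUP => 1);
    for my $name ("a.txt", "README", ".hidden") {
        open(my $fh, '>', "$dir/$name") or die "can't create $name: $!\n";
        close($fh);
    }

    my @with_dot = glob("$dir/*.*");   # only a.txt has a dot to match
    my @all      = glob("$dir/*");     # a.txt and README, not .hidden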
1029
1030 =head2 Why does Perl let me delete read-only files?  Why does C<-i> clobber protected files?  Isn't this a bug in Perl?
1031
This is elaborately and painstakingly described in the "Far More Than
You Ever Wanted To Know" document at
http://www.perl.com/CPAN/doc/FMTEYEWTK/file-dir-perms .
1035
1036 The executive summary: learn how your filesystem works.  The
1037 permissions on a file say what can happen to the data in that file.
1038 The permissions on a directory say what can happen to the list of
1039 files in that directory.  If you delete a file, you're removing its
1040 name from the directory (so the operation depends on the permissions
1041 of the directory, not of the file).  If you try to write to the file,
1042 the permissions of the file govern whether you're allowed to.
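
You can watch this happen in a scratch directory.  The sketch below
assumes a Unix-like filesystem; systems with other permission models
(DOS and Windows, for instance) may well refuse the unlink().

    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    my $dir  = tempdir(CLEANUP => 1);
    my $file = "$dir/protected";
    open(my $fh, '>', $file) or die "can't create $file: $!\n";
    close($fh);

    chmod 0444, $file or die "can't chmod $file: $!\n";   # read-only file

    # The directory is still writable, so removing the name succeeds
    # even though the file's own permissions forbid writing to it.
    my $removed = unlink $file;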
1043
1044 =head2 How do I select a random line from a file?
1045
1046 Here's an algorithm from the Camel Book:
1047
1048     srand;
1049     rand($.) < 1 && ($line = $_) while <>;
1050
1051 This has a significant advantage in space over reading the whole
1052 file in.  A simple proof by induction is available upon
1053 request if you doubt its correctness.
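
Here's the same algorithm wrapped in a function, with the gist of the
induction argument in the comments.  The C<random_line> name is an
invention for this sketch.

    use strict;
    use warnings;

    # Return one line chosen uniformly at random from the filehandle.
    # Line k replaces the current choice with probability 1/k; it then
    # survives each later line j with probability (j-1)/j, so after n
    # lines every line has been kept with probability 1/n.
    sub random_line {
        my ($fh) = @_;
        my $line;
        while (my $candidate = <$fh>) {
            $line = $candidate if rand($.) < 1;
        }
        return $line;
    }

Only one candidate line is held at a time, which is where the space
saving comes from.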
1054
1055 =head1 AUTHOR AND COPYRIGHT
1056
1057 Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
1058 All rights reserved.
1059
1060 When included as part of the Standard Version of Perl, or as part of
1061 its complete documentation whether printed or otherwise, this work
1062 may be distributed only under the terms of Perl's Artistic License.
Any distribution of this file or derivatives thereof I<outside>
of that package requires that special arrangements be made with the
copyright holder.
1066
1067 Irrespective of its distribution, all code examples in this file
1068 are hereby placed into the public domain.  You are permitted and
1069 encouraged to use this code in your own programs for fun
1070 or for profit as you see fit.  A simple comment in the code giving
1071 credit would be courteous but is not required.