X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlfaq5.pod;h=49a348a81c7a8e8890936426cbcd61ed200f0a6a;hb=0dfdcd8a63a82bd61087d84a6f130e03a4b20ed9;hp=777350867d89b8fec4469845d3991941f2e84ffc;hpb=818c4caa84c1eb56340765ecb8e5b3df206aeab1;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod
index 7773508..49a348a 100644
--- a/pod/perlfaq5.pod
+++ b/pod/perlfaq5.pod
@@ -1,6 +1,6 @@
 =head1 NAME
 
-perlfaq5 - Files and Formats ($Revision: 1.17 $, $Date: 2002/05/23 19:33:50 $)
+perlfaq5 - Files and Formats ($Revision: 1.30 $, $Date: 2003/11/23 08:07:46 $)
 
 =head1 DESCRIPTION
 
@@ -30,7 +30,7 @@ value, Perl will flush the
 handle's buffer after each print() or write(). Setting $| affects
 buffering only for the currently selected default file handle.
 You choose this handle with the one argument select() call (see
-L and L).
+L> and L).
 
 Use select() to choose the desired handle, then set its
 per-filehandle variables.
@@ -81,11 +81,36 @@ proper text file, so this may report one fewer line than you expect.
 
 This assumes no funny games with newline translations.
 
+=head2 How can I use Perl's C<-i> option from within a program?
+
+C<-i> sets the value of Perl's C<$^I> variable, which in turn affects
+the behavior of C<< <> >>; see L for more details. By
+modifying the appropriate variables directly, you can get the same
+behavior within a larger program. For example:
+
+    # ...
+    {
+        local($^I, @ARGV) = ('.orig', glob("*.c"));
+        while (<>) {
+            if ($. == 1) {
+                print "This line should appear at the top of each file\n";
+            }
+            s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case
+            print;
+            close ARGV if eof; # Reset $.
+        }
+    }
+    # $^I and @ARGV return to their old values here
+
+This block modifies all the C<.c> files in the current directory,
+leaving a backup of the original data from each file in a new
+C<.c.orig> file.
+
 =head2 How do I make a temporary file name?
 
 Use the File::Temp module, see L for more information.
 
-    use File::Temp qw/ tempfile tempdir /;
+    use File::Temp qw/ tempfile tempdir /;
 
     $dir = tempdir( CLEANUP => 1 );
     ($fh, $filename) = tempfile( DIR => $dir );
@@ -116,6 +141,7 @@ temporary files in one process, use a counter:
         my $count = 0;
         until (defined(fileno(FH)) || $count++ > 100) {
             $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
+            # O_EXCL is required for security reasons.
             sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT);
         }
         if (defined(fileno(FH))
@@ -128,8 +154,10 @@ temporary files in one process, use a counter:
 
 =head2 How can I manipulate fixed-record-length files?
 
-The most efficient way is using pack() and unpack(). This is faster than
-using substr() when taking many, many strings. It is slower for just a few.
+The most efficient way is using L and
+L. This is faster than using
+L when taking many, many strings. It is
+slower for just a few.
 
 Here is a sample chunk of code to break up and put back together again
 some fixed-format input lines, in this case from the output of a normal,
@@ -137,22 +165,23 @@ Berkeley-style ps:
 
     # sample input line:
     # 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what
-    $PS_T = 'A6 A4 A7 A5 A*';
-    open(PS, "ps|");
-    print scalar <PS>;
-    while (<PS>) {
-        ($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_);
-        for $var (qw!pid tt stat time command!) {
-            print "$var: <$$var>\n";
+    my $PS_T = 'A6 A4 A7 A5 A*';
+    open my $ps, '-|', 'ps';
+    print scalar <$ps>;
+    my @fields = qw( pid tt stat time command );
+    while (<$ps>) {
+        my %process;
+        @process{@fields} = unpack($PS_T, $_);
+        for my $field ( @fields ) {
+            print "$field: <$process{$field}>\n";
         }
-        print 'line=', pack($PS_T, $pid, $tt, $stat, $time, $command),
-            "\n";
+        print 'line=', pack($PS_T, @process{@fields} ), "\n";
     }
 
-We've used C<$$var> in a way that forbidden by C.
-That is, we've promoted a string to a scalar variable reference using
-symbolic references. This is okay in small programs, but doesn't scale
-well. It also only works on global variables, not lexicals.
+We've used a hash slice in order to easily handle the fields of each row.
+Storing the keys in an array means it's easy to operate on them as a
+group or loop over them with for. It also avoids polluting the program
+with global variables and using symbolic references.
 
 =head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles?
 
@@ -257,9 +286,9 @@ an expression where you would place the filehandle:
 That block is a proper block like any other, so you can put more
 complicated code there. This sends the message out to one of two places:
 
-    $ok = -x "/bin/cat";
+    $ok = -x "/bin/cat";
     print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
-    print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
+    print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
 
 This approach of treating C and C like object methods
 calls doesn't work for the diamond operator. That's because it's a
@@ -288,11 +317,19 @@ See L for an swrite() function.
 
 =head2 How can I output my numbers with commas added?
 
-This one from Benjamin Goldberg will do it for you:
+This subroutine will add commas to your number:
+
+    sub commify {
+        local $_ = shift;
+        1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
+        return $_;
+    }
+
+This regex from Benjamin Goldberg will add commas to numbers:
 
     s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
 
-or written verbosely:
+It is easier to see with comments:
 
     s/(
         ^[-+]? # beginning of number.
@@ -336,7 +373,7 @@ I gives you read-write access:
     open(FH, "+> /path/name"); # WRONG (almost always)
 
 Whoops. You should instead use this, which will fail if the file
-doesn't exist. 
+doesn't exist.
 
     open(FH, "+< /path/name"); # open for update
 
@@ -391,8 +428,8 @@ To open file for update, file must not exist:
 
 To open a file without blocking, creating if necessary:
 
-    sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT)
-        or die "can't open /tmp/somefile: $!":
+    sysopen(FH, "/foo/somefile", O_WRONLY|O_NDELAY|O_CREAT)
+        or die "can't open /foo/somefile: $!":
 
 Be warned that neither creation nor deletion of files is guaranteed to
 be an atomic operation over NFS. That is, two processes might both
@@ -401,7 +438,7 @@ isn't as exclusive as you might wish.
 
 See also the new L if you have it (new for 5.6).
 
-=head2 Why do I sometimes get an "Argument list too long" when I use <*>?
+=head2 Why do I sometimes get an "Argument list too long" when I use E<lt>*E<gt>?
 
 The C<< <> >> operator performs a globbing operation (see above).
 In Perl versions earlier than v5.6.0, the internal glob() operator forks
@@ -425,11 +462,11 @@ best therefore to use glob() only in list context.
 
 Normally perl ignores trailing blanks in filenames, and interprets
 certain leading characters (or a trailing "|") to mean something
-special. 
+special.
 
 The three argument form of open() lets you specify the mode
 separately from the filename. The open() function treats
-special mode characters and whitespace in the filename as 
+special mode characters and whitespace in the filename as
 literals
 
     open FILE, "<", " file "; # filename is " file "
@@ -444,8 +481,8 @@ It may be a lot clearer to use sysopen(), though:
 
 =head2 How can I reliably rename a file?
 
-If your operating system supports a proper mv(1) utility or its functional
-equivalent, this works:
+If your operating system supports a proper mv(1) utility or its
+functional equivalent, this works:
 
     rename($old, $new) or system("mv", $old, $new);
 
@@ -499,12 +536,12 @@ for your own system's idiosyncrasies (sometimes called "features").
 Slavish adherence to portability concerns shouldn't get in the way of
 your getting your job done.)
 
-For more information on file locking, see also 
+For more information on file locking, see also
 L if you have it (new for 5.6).
 
 =back
 
-=head2 Why can't I just open(FH, ">file.lock")?
+=head2 Why can't I just open(FH, "E<gt>file.lock")?
 
 A common bit of code B is this:
 
@@ -516,7 +553,7 @@ which must be done in one. That's why computer hardware provides an
 atomic test-and-set instruction. In theory, this "ought" to work:
 
     sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT)
-        or die "can't open file.lock: $!":
+        or die "can't open file.lock: $!";
 
 except that lamentably, file creation (and deletion) is not atomic
 over NFS, so this won't work (at least, not every time) over the net.
@@ -653,30 +690,22 @@ utime() on those platforms.
 
 =head2 How do I print to more than one file at once?
 
-If you only have to do this once, you can do this:
+To connect one filehandle to several output filehandles,
+you can use the IO::Tee or Tie::FileHandle::Multiplex modules.
 
-    for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
+If you only have to do this once, you can print individually
+to each filehandle.
 
-To connect up to one filehandle to several output filehandles, it's
-easiest to use the tee(1) program if you have it, and let it take care
-of the multiplexing:
+    for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
 
-    open (FH, "| tee file1 file2 file3");
+=head2 How can I read in an entire file all at once?
 
-Or even:
+You can use the File::Slurp module to do it in one step.
 
-    # make STDOUT go to three files, plus original STDOUT
-    open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n";
-    print "whatever\n" or die "Writing: $!\n";
-    close(STDOUT) or die "Closing: $!\n";
+    use File::Slurp;
 
-Otherwise you'll have to write your own multiplexing print
-function--or your own tee program--or use Tom Christiansen's,
-at http://www.cpan.org/authors/id/TOMC/scripts/tct.gz , which is
-written in Perl and offers much greater functionality
-than the stock version.
-
-=head2 How can I read in an entire file all at once?
+    $all_of_it = read_file($filename); # entire file in scalar
+    @all_lines = read_file($filename); # one line perl element
 
 The customary Perl approach for processing all the lines in a file is to
 do so one line at a time:
@@ -685,7 +714,7 @@ do so one line at a time:
     while (<INPUT>) {
         chomp;
         # do something with $_
-    } 
+    }
     close(INPUT) || die "can't close $file: $!";
 
 This is tremendously more efficient than reading the entire file into
@@ -710,7 +739,7 @@ You can read the entire filehandle contents into a scalar.
         $var = <INPUT>;
     }
 
-That temporarily undefs your record separator, and will automatically 
+That temporarily undefs your record separator, and will automatically
 close the file at block exit. If the file is already open, just use this:
 
     $var = do { local $/; <INPUT> };
@@ -729,7 +758,7 @@ set it to C<""> to eliminate empty paragraphs (C<"abc\n\n\n\ndef">,
 for instance, gets treated as two paragraphs and not three), or
 C<"\n\n"> to accept empty paragraphs.
 
-Note that a blank line must have no blanks in it. Thus 
+Note that a blank line must have no blanks in it. Thus
 S> is one paragraph, but C<"fred\n\nstuff\n\n"> is two.
 
 =head2 How can I read a single character from a file? From the keyboard?
@@ -896,7 +925,7 @@ There's also a File::Tail module from CPAN.
 If you check L, you'll see that
 several of the ways to call open() should do the trick. For example:
 
-    open(LOG, ">>/tmp/logfile");
+    open(LOG, ">>/foo/logfile");
     open(STDERR, ">&LOG");
 
 Or even with a literal numeric descriptor:
@@ -906,7 +935,7 @@ Or even with a literal numeric descriptor:
 
 Note that "<&STDIN" makes a copy, but "<&=STDIN" make
 an alias. That means if you close an aliased handle, all
-aliases become inaccessible. This is not true with 
+aliases become inaccessible. This is not true with
 a copied one.
 
 Error checking, as always, has been left as an exercise for the reader.
@@ -924,8 +953,8 @@ to, you may be able to do this:
 
 Or, just use the fdopen(3S) feature of open():
 
-    { 
-        local *F; 
+    {
+        local *F;
         open F, "<&=$fd" or die "Cannot reopen fd=$fd: $!";
         close F;
     }
@@ -958,7 +987,7 @@ documentation for details.
 
 This is elaborately and painstakingly described in the
 F article in the "Far More Than You Ever Wanted To
-Know" collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz .
+Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz .
 
 The executive summary: learn how your filesystem works. The
 permissions on a file say what can happen to the data in that file.
@@ -975,9 +1004,18 @@ Here's an algorithm from the Camel Book:
     srand;
     rand($.) < 1 && ($line = $_) while <>;
 
-This has a significant advantage in space over reading the whole
-file in. A simple proof by induction is available upon
-request if you doubt the algorithm's correctness.
+This has a significant advantage in space over reading the whole file
+in. You can find a proof of this method in I<The Art of Computer
+Programming>, Volume 2, Section 3.4.2, by Donald E. Knuth.
+
+You can use the File::Random module which provides a function
+for that algorithm:
+
+    use File::Random qw/random_line/;
+    my $line = random_line($filename);
+
+Another way is to use the Tie::File module, which treats the entire
+file as an array. Simply access a random array element.
 
 =head2 Why do I get weird spaces when I print an array of lines?