1 package IO::Uncompress::AnyInflate ;
3 # for RFC1950, RFC1951 or RFC1952
9 use IO::Compress::Base::Common qw(createSelfTiedObject);
11 use IO::Uncompress::Adapter::Inflate ();
14 use IO::Uncompress::Base ;
15 use IO::Uncompress::Gunzip ;
16 use IO::Uncompress::Inflate ;
17 use IO::Uncompress::RawInflate ;
18 use IO::Uncompress::Unzip ;
22 our ($VERSION, @ISA, @EXPORT_OK, %EXPORT_TAGS, $AnyInflateError);
24 $VERSION = '2.000_11';
25 $AnyInflateError = '';
27 @ISA = qw( Exporter IO::Uncompress::Base );
28 @EXPORT_OK = qw( $AnyInflateError anyinflate ) ;
29 %EXPORT_TAGS = %IO::Uncompress::Base::DEFLATE_CONSTANTS ;
30 push @{ $EXPORT_TAGS{all} }, @EXPORT_OK ;
31 Exporter::export_ok_tags('all');
# TODO - allow the user to restrict auto-detection to a subset of the
# three formats, or just assume they want to auto-detect any of them.
39 my $obj = createSelfTiedObject($class, \$AnyInflateError);
40 $obj->_create(undef, 0, @_);
45 my $obj = createSelfTiedObject(undef, \$AnyInflateError);
46 return $obj->_inf(@_) ;
59 # any always needs both crc32 and adler32
60 $got->value('CRC32' => 1);
61 $got->value('ADLER32' => 1);
72 my ($obj, $errstr, $errno) = IO::Uncompress::Adapter::Inflate::mkUncompObject();
74 return $self->saveErrorString(undef, $errstr, $errno)
77 *$self->{Uncomp} = $obj;
79 my $magic = $self->ckMagic( qw( RawInflate Inflate Gunzip Unzip ) );
82 *$self->{Info} = $self->readHeader($magic)
98 my $keep = ref $self ;
99 for my $class ( map { "IO::Uncompress::$_" } @names)
101 bless $self => $class;
102 my $magic = $self->ckMagic();
106 #bless $self => $class;
110 $self->pushBack(*$self->{HeaderPending}) ;
111 *$self->{HeaderPending} = '' ;
114 bless $self => $keep;
126 IO::Uncompress::AnyInflate - Uncompress zlib-based (zip, gzip) file/buffer
131 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
133 my $status = anyinflate $input => $output [,OPTS]
134 or die "anyinflate failed: $AnyInflateError\n";
136 my $z = new IO::Uncompress::AnyInflate $input [OPTS]
137 or die "anyinflate failed: $AnyInflateError\n";
139 $status = $z->read($buffer)
140 $status = $z->read($buffer, $length)
141 $status = $z->read($buffer, $length, $offset)
142 $line = $z->getline()
147 $status = $z->inflateSync()
150 $data = $z->getHeaderInfo()
152 $z->seek($position, $whence)
164 read($z, $buffer, $length);
165 read($z, $buffer, $length, $offset);
167 seek($z, $position, $whence)
178 B<WARNING -- This is a Beta release>.
182 =item * DO NOT use in production code.
184 =item * The documentation is incomplete in places.
186 =item * Parts of the interface defined here are tentative.
188 =item * Please report any problems you find.
195 This module provides a Perl interface that allows the reading of
196 files/buffers that have been compressed in a number of formats that use the
197 zlib compression library.
199 The formats supported are
207 =item gzip (RFC 1952)
213 The module will auto-detect which, if any, of the supported
214 compression formats is being used.
221 =head1 Functional Interface
223 A top-level function, C<anyinflate>, is provided to carry out
224 "one-shot" uncompression between buffers and/or files. For finer
225 control over the uncompression process, see the L</"OO Interface">
228 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
230 anyinflate $input => $output [,OPTS]
231 or die "anyinflate failed: $AnyInflateError\n";
The functional interface needs Perl 5.005 or better.
238 =head2 anyinflate $input => $output [, OPTS]
241 C<anyinflate> expects at least two parameters, C<$input> and C<$output>.
243 =head3 The C<$input> parameter
245 The parameter, C<$input>, is used to define the source of
248 It can take one of the following forms:
254 If the C<$input> parameter is a simple scalar, it is assumed to be a
255 filename. This file will be opened for reading and the input data
256 will be read from it.
260 If the C<$input> parameter is a filehandle, the input data will be
262 The string '-' can be used as an alias for standard input.
264 =item A scalar reference
266 If C<$input> is a scalar reference, the input data will be read
269 =item An array reference
271 If C<$input> is an array reference, each element in the array must be a
274 The input data will be read from each file in turn.
276 The complete array will be walked to ensure that it only
277 contains valid filenames before any data is uncompressed.
281 =item An Input FileGlob string
If C<$input> is a string that is delimited by the characters "<" and ">",
C<anyinflate> will assume that it is an I<input fileglob string>. The
input is the list of files that match the fileglob.
287 If the fileglob does not match any files ...
289 See L<File::GlobMapper|File::GlobMapper> for more details.
294 If the C<$input> parameter is any other type, C<undef> will be returned.
298 =head3 The C<$output> parameter
300 The parameter C<$output> is used to control the destination of the
301 uncompressed data. This parameter can take one of these forms.
307 If the C<$output> parameter is a simple scalar, it is assumed to be a
308 filename. This file will be opened for writing and the uncompressed
309 data will be written to it.
313 If the C<$output> parameter is a filehandle, the uncompressed data
314 will be written to it.
315 The string '-' can be used as an alias for standard output.
318 =item A scalar reference
320 If C<$output> is a scalar reference, the uncompressed data will be
321 stored in C<$$output>.
325 =item An Array Reference
327 If C<$output> is an array reference, the uncompressed data will be
328 pushed onto the array.
330 =item An Output FileGlob
If C<$output> is a string that is delimited by the characters "<" and ">",
C<anyinflate> will assume that it is an I<output fileglob string>. The
output is the list of files that match the fileglob.
When C<$output> is a fileglob string, C<$input> must also be a fileglob
string. Anything else is an error.
341 If the C<$output> parameter is any other type, C<undef> will be returned.
347 When C<$input> maps to multiple files/buffers and C<$output> is a single
348 file/buffer the uncompressed input files/buffers will all be stored
349 in C<$output> as a single uncompressed stream.
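As a rough illustration of this behaviour, the sketch below gzips two strings into temporary files and passes an array reference of those filenames to C<anyinflate>, with a single scalar reference as the output. The scratch directory, the file names and the sample strings are invented for the example, and C<IO::Compress::Gzip> is used only to create some input.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError);

# Hypothetical sample data: two small gzip files in a scratch directory.
my $dir   = tempdir(CLEANUP => 1);
my @texts = ("alpha\n", "beta\n");
my @files;
for my $text (@texts) {
    my $name = "$dir/part" . scalar(@files) . ".gz";
    gzip \$text => $name
        or die "gzip failed: $GzipError\n";
    push @files, $name;
}

# Several input files, one output buffer: the uncompressed contents
# are stored in $combined one after another, in array order.
my $combined;
anyinflate \@files => \$combined
    or die "anyinflate failed: $AnyInflateError\n";

print $combined;
```

The same call shape should apply when the single output is a filename or a filehandle instead of a buffer.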
353 =head2 Optional Parameters
355 Unless specified below, the optional parameters for C<anyinflate>,
356 C<OPTS>, are the same as those used with the OO interface defined in the
357 L</"Constructor Options"> section below.
361 =item AutoClose =E<gt> 0|1
363 This option applies to any input or output data streams to
364 C<anyinflate> that are filehandles.
366 If C<AutoClose> is specified, and the value is true, it will result in all
367 input and/or output filehandles being closed once C<anyinflate> has
370 This parameter defaults to 0.
374 =item BinModeOut =E<gt> 0|1
376 When writing to a file or filehandle, set C<binmode> before writing to the
385 =item -Append =E<gt> 0|1
389 =item -MultiStream =E<gt> 0|1
391 Creates a new stream after each file.
To read the contents of the file C<file1.txt.Compressed> and write the
uncompressed data to the file C<file1.txt>.
409 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
411 my $input = "file1.txt.Compressed";
412 my $output = "file1.txt";
413 anyinflate $input => $output
414 or die "anyinflate failed: $AnyInflateError\n";
417 To read from an existing Perl filehandle, C<$input>, and write the
418 uncompressed data to a buffer, C<$buffer>.
    use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
    use IO::File ;

    my $input = new IO::File "<file1.txt.Compressed"
        or die "Cannot open 'file1.txt.Compressed': $!\n" ;
    my $buffer ;
    anyinflate $input => \$buffer
        or die "anyinflate failed: $AnyInflateError\n";
To uncompress all files in the directory "/my/home" that match "*.txt.Compressed" and store the uncompressed data in the same directory
435 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
437 anyinflate '</my/home/*.txt.Compressed>' => '</my/home/#1.txt>'
438 or die "anyinflate failed: $AnyInflateError\n";
and if you want to uncompress each file one at a time, this will do the trick

    use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;

    for my $input ( glob "/my/home/*.txt.Compressed" )
    {
        my $output = $input;
        # Strip only a literal ".Compressed" suffix from the filename.
        $output =~ s/\.Compressed$// ;
        anyinflate $input => $output
            or die "Error uncompressing '$input': $AnyInflateError\n";
    }
458 The format of the constructor for IO::Uncompress::AnyInflate is shown below
461 my $z = new IO::Uncompress::AnyInflate $input [OPTS]
462 or die "IO::Uncompress::AnyInflate failed: $AnyInflateError\n";
464 Returns an C<IO::Uncompress::AnyInflate> object on success and undef on failure.
465 The variable C<$AnyInflateError> will contain an error message on failure.
If you are running Perl 5.005 or better, the object C<$z> returned from
IO::Uncompress::AnyInflate can be used exactly like an L<IO::File|IO::File> filehandle.
This means that all normal input file operations can be carried out with
C<$z>. For example, to read a line from a compressed file/buffer you can
use either of these forms

    $line = $z->getline();
    $line = <$z>;
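The following is a minimal sketch of reading lines through the object. It assumes an in-memory gzip buffer created with C<IO::Compress::Gzip> purely to have something to read; the sample text and variable names are illustrative.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

# Illustrative input: three lines compressed into an in-memory buffer.
my $text = "one\ntwo\nthree\n";
my $gz;
gzip \$text => \$gz
    or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::AnyInflate->new(\$gz)
    or die "IO::Uncompress::AnyInflate failed: $AnyInflateError\n";

# Method form for the first line, filehandle form for the rest.
my @lines = ($z->getline());
while (defined(my $line = <$z>)) {
    push @lines, $line;
}
$z->close();

print scalar(@lines), " lines read\n";
```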
476 The mandatory parameter C<$input> is used to determine the source of the
477 compressed data. This parameter can take one of three forms.
483 If the C<$input> parameter is a scalar, it is assumed to be a filename. This
484 file will be opened for reading and the compressed data will be read from it.
488 If the C<$input> parameter is a filehandle, the compressed data will be
490 The string '-' can be used as an alias for standard input.
493 =item A scalar reference
495 If C<$input> is a scalar reference, the compressed data will be read from
500 =head2 Constructor Options
503 The option names defined below are case insensitive and can be optionally
504 prefixed by a '-'. So all of the following are valid
511 OPTS is a combination of the following options:
515 =item -AutoClose =E<gt> 0|1
517 This option is only valid when the C<$input> parameter is a filehandle. If
518 specified, and the value is true, it will result in the file being closed once
519 either the C<close> method is called or the IO::Uncompress::AnyInflate object is
522 This parameter defaults to 0.
524 =item -MultiStream =E<gt> 0|1
528 Allows multiple concatenated compressed streams to be treated as a single
529 compressed stream. Decompression will stop once either the end of the
530 file/buffer is reached, an error is encountered (premature eof, corrupt
531 compressed data) or the end of a stream is not immediately followed by the
532 start of another stream.
534 This parameter defaults to 0.
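To make the effect of C<MultiStream> concrete, here is a small sketch that concatenates two gzip streams in one buffer and reads it back with the option off and on. The helper C<read_all> and the sample strings are hypothetical names introduced for the example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

# Two independent gzip streams, stored back to back in one buffer.
my ($first, $second) = ("first\n", "second\n");
my ($gz1, $gz2);
gzip \$first  => \$gz1 or die "gzip failed: $GzipError\n";
gzip \$second => \$gz2 or die "gzip failed: $GzipError\n";
my $joined = $gz1 . $gz2;

# Hypothetical helper: uncompress a buffer with the given options.
sub read_all {
    my ($data, %opts) = @_;
    my $z = IO::Uncompress::AnyInflate->new(\$data, %opts)
        or die "open failed: $AnyInflateError\n";
    my ($out, $buf) = ('');
    while ($z->read($buf) > 0) {
        $out .= $buf;
    }
    $z->close();
    return $out;
}

my $single = read_all($joined, MultiStream => 0);  # stops after stream 1
my $multi  = read_all($joined, MultiStream => 1);  # continues into stream 2

print "single: $single", "multi: $multi";
```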
538 =item -Prime =E<gt> $string
540 This option will uncompress the contents of C<$string> before processing the
543 This option can be useful when the compressed data is embedded in another
544 file/data structure and it is not possible to work out where the compressed
545 data begins without having to read the first few bytes. If this is the
546 case, the uncompression can be I<primed> with these bytes using this
549 =item -Transparent =E<gt> 0|1
551 If this option is set and the input file or buffer is not compressed data,
552 the module will allow reading of it anyway.
554 This option defaults to 1.
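A short sketch of the difference, assuming a plain-text buffer that matches none of the supported formats. The sample string is made up, and the expectation that the constructor fails when C<Transparent> is 0 reflects how current releases behave rather than anything stated above.

```perl
use strict;
use warnings;
use IO::Uncompress::AnyInflate qw($AnyInflateError);

# Input that is not compressed at all.
my $plain = "just plain text\n";

# With Transparent on, the data is passed through unchanged.
my $z = IO::Uncompress::AnyInflate->new(\$plain, Transparent => 1)
    or die "open failed: $AnyInflateError\n";
my ($copy, $buf) = ('');
while ($z->read($buf) > 0) {
    $copy .= $buf;
}
$z->close();

# With Transparent off, no supported format is detected, so the
# constructor is expected to return undef (assumption, see above).
my $rejected = IO::Uncompress::AnyInflate->new(\$plain, Transparent => 0);

print "copied ", length($copy), " bytes; strict open ",
      (defined $rejected ? "succeeded" : "failed"), "\n";
```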
556 =item -BlockSize =E<gt> $num
558 When reading the compressed input data, IO::Uncompress::AnyInflate will read it in
559 blocks of C<$num> bytes.
561 This option defaults to 4096.
563 =item -InputLength =E<gt> $size
565 When present this option will limit the number of compressed bytes read
566 from the input file/buffer to C<$size>. This option can be used in the
567 situation where there is useful data directly after the compressed data
568 stream and you know beforehand the exact length of the compressed data
571 This option is mostly used when reading from a filehandle, in which case
572 the file pointer will be left pointing to the first byte directly after the
573 compressed data stream.
577 This option defaults to off.
579 =item -Append =E<gt> 0|1
581 This option controls what the C<read> method does with uncompressed data.
583 If set to 1, all uncompressed data will be appended to the output parameter
584 of the C<read> method.
586 If set to 0, the contents of the output parameter of the C<read> method
587 will be overwritten by the uncompressed data.
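As an illustration, the sketch below opens a stream with C<Append> enabled and reads into a buffer that already holds some text. The sample data and the C<"seen: "> prefix are arbitrary choices for the example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = "abcdefghij" x 50;
my $gz;
gzip \$text => \$gz
    or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::AnyInflate->new(\$gz, Append => 1)
    or die "open failed: $AnyInflateError\n";

# Because Append is on, each read() adds to $out instead of
# replacing it, so the existing prefix survives.
my $out = "seen: ";
1 while $z->read($out) > 0;
$z->close();

print length($out), " bytes in buffer\n";
```

With C<Append> left at 0, the same loop would leave only the data from the last successful C<read> call in the buffer.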
591 =item -Strict =E<gt> 0|1
This option controls whether the extra checks defined below are used when
carrying out the decompression. When Strict is on, the extra tests are
carried out; when Strict is off, they are not.
599 The default for this option is off.
602 If the input is an RFC 1950 data stream, the following will be checked:
611 The ADLER32 checksum field must be present.
615 The value of the ADLER32 field read must match the adler32 value of the
616 uncompressed data actually contained in the file.
622 If the input is a gzip (RFC 1952) data stream, the following will be checked:
631 If the FHCRC bit is set in the gzip FLG header byte, the CRC16 bytes in the
632 header must match the crc16 value of the gzip header actually read.
636 If the gzip header contains a name field (FNAME) it consists solely of ISO
641 If the gzip header contains a comment field (FCOMMENT) it consists solely
642 of ISO 8859-1 characters plus line-feed.
646 If the gzip FEXTRA header field is present it must conform to the sub-field
647 structure as defined in RFC 1952.
651 The CRC32 and ISIZE trailer fields must be present.
655 The value of the CRC32 field read must match the crc32 value of the
656 uncompressed data actually contained in the gzip file.
660 The value of the ISIZE fields read must match the length of the
661 uncompressed data actually read from the file.
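The CRC32 trailer check can be sketched as follows. The example builds a gzip buffer, deliberately damages one byte of the stored CRC32 (per RFC 1952 the last eight bytes of a gzip member are CRC32 followed by ISIZE), and reads it with C<Strict> off and on. The helper C<read_all> is a hypothetical name, and the exact error status value is not relied upon, only its sign.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = "payload that will fail its checksum\n";
my $gz;
gzip \$text => \$gz
    or die "gzip failed: $GzipError\n";

# Corrupt the first byte of the CRC32 trailer field.
my $bad = $gz;
substr($bad, -8, 1) = chr(ord(substr($bad, -8, 1)) ^ 0xFF);

# Hypothetical helper: returns the final read() status and the data.
sub read_all {
    my (%opts) = @_;
    my $z = IO::Uncompress::AnyInflate->new(\$bad, %opts)
        or die "open failed: $AnyInflateError\n";
    my ($out, $buf, $status) = ('');
    while (($status = $z->read($buf)) > 0) {
        $out .= $buf;
    }
    return ($status, $out);
}

my ($lax_status, $lax_out) = read_all(Strict => 0);
my ($strict_status)        = read_all(Strict => 1);

print "Strict off: status $lax_status\n";
print "Strict on:  status $strict_status ($AnyInflateError)\n";
```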
670 =item -ParseExtra =E<gt> 0|1
672 If the gzip FEXTRA header field is present and this option is set, it will
673 force the module to check that it conforms to the sub-field structure as
If C<Strict> is on, this option is enabled automatically.
696 $status = $z->read($buffer)
Reads a block of compressed data (the size of the compressed block is
determined by the C<BlockSize> option in the constructor), uncompresses it and
writes any uncompressed data into C<$buffer>. If the C<Append> parameter is
set in the constructor, the uncompressed data will be appended to the
C<$buffer> parameter. Otherwise C<$buffer> will be overwritten.
704 Returns the number of uncompressed bytes written to C<$buffer>, zero if eof
705 or a negative number on error.
711 $status = $z->read($buffer, $length)
712 $status = $z->read($buffer, $length, $offset)
714 $status = read($z, $buffer, $length)
715 $status = read($z, $buffer, $length, $offset)
717 Attempt to read C<$length> bytes of uncompressed data into C<$buffer>.
The main difference between this form of the C<read> method and the
previous one is that this one will attempt to return I<exactly> C<$length>
bytes. It returns fewer bytes only if end-of-file or an IO error is
encountered.
724 Returns the number of uncompressed bytes written to C<$buffer>, zero if eof
725 or a negative number on error.
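A minimal sketch of the fixed-length form, assuming an eleven-byte string compressed into an in-memory buffer. The variable names are illustrative only.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = "hello world";
my $gz;
gzip \$text => \$gz
    or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::AnyInflate->new(\$gz)
    or die "open failed: $AnyInflateError\n";

my ($head, $tail, $none);
my $n1 = $z->read($head, 5);    # asks for exactly 5 bytes
my $n2 = $z->read($tail, 100);  # asks for more than is left
my $n3 = $z->read($none, 100);  # nothing left to read
$z->close();

print "n1=$n1 head='$head' n2=$n2 tail='$tail' n3=$n3\n";
```

The second call returns fewer than the requested 100 bytes because end-of-file is reached first.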
732 $line = $z->getline()
This method fully supports the use of the variable C<$/>
738 (or C<$INPUT_RECORD_SEPARATOR> or C<$RS> when C<English> is in use) to
739 determine what constitutes an end of line. Both paragraph mode and file
740 slurp mode are supported.
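The two special modes can be sketched like this, using a small made-up text with one blank line in it. C<$/> is localised in each block so the change does not leak into the rest of the program.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = "a\nb\n\nc\n";
my $gz;
gzip \$text => \$gz
    or die "gzip failed: $GzipError\n";

# File slurp mode: $/ undefined, getline returns everything at once.
my $all;
{
    local $/ = undef;
    my $z = IO::Uncompress::AnyInflate->new(\$gz)
        or die "open failed: $AnyInflateError\n";
    $all = $z->getline();
    $z->close();
}

# Paragraph mode: $/ set to the empty string, blank lines separate records.
my @paragraphs;
{
    local $/ = "";
    my $z = IO::Uncompress::AnyInflate->new(\$gz)
        or die "open failed: $AnyInflateError\n";
    while (defined(my $para = $z->getline())) {
        push @paragraphs, $para;
    }
    $z->close();
}

print scalar(@paragraphs), " paragraphs\n";
```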
749 Read a single character.
755 $char = $z->ungetc($string)
763 $status = $z->inflateSync()
772 $hdr = $z->getHeaderInfo();
773 @hdrs = $z->getHeaderInfo();
This method returns either a hash reference (in scalar context) or a list
of hash references (in list context) that contains information about each
of the header fields in the compressed data stream(s).
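A hedged sketch of inspecting the header of a gzip stream. The key name C<Name> is taken from the L<IO::Uncompress::Gunzip> documentation rather than from this section, so treat it as an assumption; the file name stored in the header is invented for the example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

# Create a gzip stream whose header carries an (illustrative) file name.
my $text = "some data\n";
my $gz;
gzip \$text => \$gz, Name => 'notes.txt'
    or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::AnyInflate->new(\$gz)
    or die "open failed: $AnyInflateError\n";

# Scalar context: a single hash reference for the current stream.
my $hdr = $z->getHeaderInfo();

# List context: one hash reference per stream read so far.
my @hdrs = $z->getHeaderInfo();
$z->close();

# 'Name' is assumed to hold the gzip FNAME field (see lead-in).
print "stored name: $hdr->{Name}\n";
```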
789 Returns the uncompressed file offset.
800 Returns true if the end of the compressed input stream has been reached.
806 $z->seek($position, $whence);
807 seek($z, $position, $whence);
812 Provides a sub-set of the C<seek> functionality, with the restriction
813 that it is only legal to seek forward in the input file/buffer.
814 It is a fatal error to attempt to seek backward.
The C<$whence> parameter takes one of the usual values, namely SEEK_SET,
SEEK_CUR or SEEK_END.
821 Returns 1 on success, 0 on failure.
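The forward-only restriction can be sketched as follows. The C<SEEK_*> constants are imported from the core C<Fcntl> module, which is an assumption of this example rather than something this section prescribes; the sample text is arbitrary.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = "hello world";
my $gz;
gzip \$text => \$gz
    or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::AnyInflate->new(\$gz)
    or die "open failed: $AnyInflateError\n";

# Skip forward over the first six uncompressed bytes.
my $ok   = $z->seek(6, SEEK_SET);
my $here = $z->tell();

my $rest;
$z->read($rest);

# Seeking backward is fatal, so trap it with eval.
my $moved_back = eval { $z->seek(0, SEEK_SET); 1 };
my $error      = $@;
$z->close();

print "offset $here, rest '$rest', backward seek ",
      ($moved_back ? "allowed" : "refused"), "\n";
```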
830 This is a noop provided for completeness.
Returns true if the object currently refers to an open file/buffer.
840 my $prev = $z->autoflush()
841 my $prev = $z->autoflush(EXPR)
843 If the C<$z> object is associated with a file or a filehandle, this method
844 returns the current autoflush setting for the underlying filehandle. If
845 C<EXPR> is present, and is non-zero, it will enable flushing after every
846 write/print operation.
848 If C<$z> is associated with a buffer, this method has no effect and always
851 B<Note> that the special variable C<$|> B<cannot> be used to set or
852 retrieve the autoflush setting.
854 =head2 input_line_number
856 $z->input_line_number()
857 $z->input_line_number(EXPR)
861 Returns the current uncompressed line number. If C<EXPR> is present it has
862 the effect of setting the line number. Note that setting the line number
863 does not change the current position within the file/buffer being read.
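A brief sketch of querying the line counter after a couple of C<getline> calls. The sample text is made up, and only the getter form is shown.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = "one\ntwo\nthree\n";
my $gz;
gzip \$text => \$gz
    or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::AnyInflate->new(\$gz)
    or die "open failed: $AnyInflateError\n";

# Consume two lines, then ask how many have been read so far.
$z->getline();
my $second = $z->getline();
my $lineno = $z->input_line_number();
$z->close();

print "line $lineno: $second";
```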
The contents of C<$/> are used to determine what constitutes a line
875 If the C<$z> object is associated with a file or a filehandle, this method
876 will return the underlying file descriptor.
If the C<$z> object is associated with a buffer, this method will
Closes the input file/buffer.
892 For most versions of Perl this method will be automatically invoked if
893 the IO::Uncompress::AnyInflate object is destroyed (either explicitly or by the
894 variable with the reference to the object going out of scope). The
895 exceptions are Perl versions 5.005 through 5.00504 and 5.8.0. In
896 these cases, the C<close> method will be called automatically, but
897 not until global destruction of all live objects when the program is
900 Therefore, if you want your scripts to be able to run on all versions
901 of Perl, you should call C<close> explicitly and not rely on automatic
904 Returns true on success, otherwise 0.
906 If the C<AutoClose> option has been enabled when the IO::Uncompress::AnyInflate
907 object was created, and the object is associated with a file, the
908 underlying file will also be closed.
No symbolic constants are required by IO::Uncompress::AnyInflate at present.
921 Imports C<anyinflate> and C<$AnyInflateError>.
924 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
935 L<Compress::Zlib>, L<IO::Compress::Gzip>, L<IO::Uncompress::Gunzip>, L<IO::Compress::Deflate>, L<IO::Uncompress::Inflate>, L<IO::Compress::RawDeflate>, L<IO::Uncompress::RawInflate>, L<IO::Compress::Bzip2>, L<IO::Uncompress::Bunzip2>, L<IO::Compress::Lzop>, L<IO::Uncompress::UnLzop>, L<IO::Uncompress::AnyUncompress>
937 L<Compress::Zlib::FAQ|Compress::Zlib::FAQ>
939 L<File::GlobMapper|File::GlobMapper>, L<Archive::Zip|Archive::Zip>,
940 L<Archive::Tar|Archive::Tar>,
944 For RFC 1950, 1951 and 1952 see
945 F<http://www.faqs.org/rfcs/rfc1950.html>,
946 F<http://www.faqs.org/rfcs/rfc1951.html> and
947 F<http://www.faqs.org/rfcs/rfc1952.html>
949 The I<zlib> compression library was written by Jean-loup Gailly
950 F<gzip@prep.ai.mit.edu> and Mark Adler F<madler@alumni.caltech.edu>.
952 The primary site for the I<zlib> compression library is
953 F<http://www.zlib.org>.
955 The primary site for gzip is F<http://www.gzip.org>.
962 This module was written by Paul Marquess, F<pmqs@cpan.org>.
966 =head1 MODIFICATION HISTORY
968 See the Changes file.
970 =head1 COPYRIGHT AND LICENSE
972 Copyright (c) 2005-2006 Paul Marquess. All rights reserved.
974 This program is free software; you can redistribute it and/or
975 modify it under the same terms as Perl itself.