1 package IO::Uncompress::AnyInflate ;
3 # for RFC1950, RFC1951 or RFC1952
9 use IO::Compress::Base::Common qw(createSelfTiedObject);
11 use IO::Uncompress::Adapter::Inflate ();
14 use IO::Uncompress::Base ;
15 use IO::Uncompress::Gunzip ;
16 use IO::Uncompress::Inflate ;
17 use IO::Uncompress::RawInflate ;
18 use IO::Uncompress::Unzip ;
22 our ($VERSION, @ISA, @EXPORT_OK, %EXPORT_TAGS, $AnyInflateError);
24 $VERSION = '2.000_12';
25 $AnyInflateError = '';
27 @ISA = qw( Exporter IO::Uncompress::Base );
28 @EXPORT_OK = qw( $AnyInflateError anyinflate ) ;
29 %EXPORT_TAGS = %IO::Uncompress::Base::DEFLATE_CONSTANTS ;
30 push @{ $EXPORT_TAGS{all} }, @EXPORT_OK ;
31 Exporter::export_ok_tags('all');
33 # TODO - allow the user to pick a set of the three formats to allow
34 # or just assume want to auto-detect any of the three formats.
39 my $obj = createSelfTiedObject($class, \$AnyInflateError);
40 $obj->_create(undef, 0, @_);
45 my $obj = createSelfTiedObject(undef, \$AnyInflateError);
46 return $obj->_inf(@_) ;
59 # any always needs both crc32 and adler32
60 $got->value('CRC32' => 1);
61 $got->value('ADLER32' => 1);
72 my ($obj, $errstr, $errno) = IO::Uncompress::Adapter::Inflate::mkUncompObject();
74 return $self->saveErrorString(undef, $errstr, $errno)
77 *$self->{Uncomp} = $obj;
79 my $magic = $self->ckMagic( qw( RawInflate Inflate Gunzip Unzip ) );
82 *$self->{Info} = $self->readHeader($magic)
98 my $keep = ref $self ;
99 for my $class ( map { "IO::Uncompress::$_" } @names)
101 bless $self => $class;
102 my $magic = $self->ckMagic();
106 #bless $self => $class;
110 $self->pushBack(*$self->{HeaderPending}) ;
111 *$self->{HeaderPending} = '' ;
114 bless $self => $keep;
126 IO::Uncompress::AnyInflate - Uncompress zlib-based (zip, gzip) file/buffer
131 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
133 my $status = anyinflate $input => $output [,OPTS]
134 or die "anyinflate failed: $AnyInflateError\n";
136 my $z = new IO::Uncompress::AnyInflate $input [OPTS]
137 or die "anyinflate failed: $AnyInflateError\n";
139 $status = $z->read($buffer)
140 $status = $z->read($buffer, $length)
141 $status = $z->read($buffer, $length, $offset)
142 $line = $z->getline()
147 $status = $z->inflateSync()
150 $data = $z->getHeaderInfo()
152 $z->seek($position, $whence)
164 read($z, $buffer, $length);
165 read($z, $buffer, $length, $offset);
167 seek($z, $position, $whence)
178 B<WARNING -- This is a Beta release>.
182 =item * DO NOT use in production code.
184 =item * The documentation is incomplete in places.
186 =item * Parts of the interface defined here are tentative.
188 =item * Please report any problems you find.
195 This module provides a Perl interface that allows the reading of
196 files/buffers that have been compressed in a number of formats that use the
197 zlib compression library.
199 The formats supported are
207 =item gzip (RFC 1952)
213 The module will auto-detect which, if any, of the supported
214 compression formats is being used.
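The auto-detection described above can be sketched as follows. This is a minimal illustration, assuming C<IO::Compress::Gzip> and C<IO::Compress::Deflate> are installed to produce the test data; the variable names are hypothetical. The same C<anyinflate> call is used for both inputs, and the format is never named.

```perl
use strict;
use warnings;
use IO::Compress::Gzip         qw(gzip $GzipError);
use IO::Compress::Deflate      qw(deflate $DeflateError);
use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError);

my $text = "hello, auto-detect\n";

# Build the same payload in two container formats:
# gzip (RFC 1952) and zlib (RFC 1950).
my ($gz, $zlib);
gzip    \$text => \$gz   or die "gzip failed: $GzipError\n";
deflate \$text => \$zlib or die "deflate failed: $DeflateError\n";

# anyinflate is called identically for both buffers.
my %result;
for my $pair ( [ 'gzip', \$gz ], [ 'zlib', \$zlib ] ) {
    my ($name, $in) = @$pair;
    my $out;
    anyinflate $in => \$out
        or die "anyinflate failed on $name: $AnyInflateError\n";
    $result{$name} = $out;
}

print "gzip input recovered\n" if $result{gzip} eq $text;
print "zlib input recovered\n" if $result{zlib} eq $text;
```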
221 =head1 Functional Interface
223 A top-level function, C<anyinflate>, is provided to carry out
224 "one-shot" uncompression between buffers and/or files. For finer
225 control over the uncompression process, see the L</"OO Interface">
228 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
230 anyinflate $input => $output [,OPTS]
231 or die "anyinflate failed: $AnyInflateError\n";
235 The functional interface needs Perl 5.005 or better.
238 =head2 anyinflate $input => $output [, OPTS]
241 C<anyinflate> expects at least two parameters, C<$input> and C<$output>.
243 =head3 The C<$input> parameter
245 The parameter, C<$input>, is used to define the source of
248 It can take one of the following forms:
254 If the C<$input> parameter is a simple scalar, it is assumed to be a
255 filename. This file will be opened for reading and the input data
256 will be read from it.
260 If the C<$input> parameter is a filehandle, the input data will be
262 The string '-' can be used as an alias for standard input.
264 =item A scalar reference
266 If C<$input> is a scalar reference, the input data will be read
269 =item An array reference
271 If C<$input> is an array reference, each element in the array must be a
274 The input data will be read from each file in turn.
276 The complete array will be walked to ensure that it only
277 contains valid filenames before any data is uncompressed.
281 =item An Input FileGlob string
283 If C<$input> is a string that is delimited by the characters "<" and ">"
284 C<anyinflate> will assume that it is an I<input fileglob string>. The
285 input is the list of files that match the fileglob.
287 If the fileglob does not match any files ...
289 See L<File::GlobMapper|File::GlobMapper> for more details.
294 If the C<$input> parameter is any other type, C<undef> will be returned.
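Three of the input forms listed above (a filename, a filehandle and a scalar reference) can be exercised side by side. The sketch below assumes C<IO::Compress::Gzip> and C<File::Temp> are available; the file name C<sample.gz> is hypothetical and lives in a temporary directory.

```perl
use strict;
use warnings;
use File::Temp                 qw(tempdir);
use IO::Compress::Gzip         qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError);

my $text = "input forms\n";

# Prepare one compressed buffer and a file holding the same bytes.
my $gz;
gzip \$text => \$gz or die "gzip failed: $GzipError\n";

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/sample.gz";    # hypothetical file name
open my $w, '>:raw', $file or die "Cannot write '$file': $!\n";
print {$w} $gz;
close $w or die "Cannot close '$file': $!\n";

# 1. A filename: the file is opened and read by anyinflate.
my $from_name;
anyinflate $file => \$from_name
    or die "anyinflate failed: $AnyInflateError\n";

# 2. A filehandle: the caller opens (and later closes) the handle.
open my $fh, '<:raw', $file or die "Cannot open '$file': $!\n";
my $from_fh;
anyinflate $fh => \$from_fh
    or die "anyinflate failed: $AnyInflateError\n";
close $fh;

# 3. A scalar reference: the compressed data is already in memory.
my $from_ref;
anyinflate \$gz => \$from_ref
    or die "anyinflate failed: $AnyInflateError\n";

print "all three forms agree\n"
    if $from_name eq $text && $from_fh eq $text && $from_ref eq $text;
```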
298 =head3 The C<$output> parameter
300 The parameter C<$output> is used to control the destination of the
301 uncompressed data. This parameter can take one of these forms.
307 If the C<$output> parameter is a simple scalar, it is assumed to be a
308 filename. This file will be opened for writing and the uncompressed
309 data will be written to it.
313 If the C<$output> parameter is a filehandle, the uncompressed data
314 will be written to it.
315 The string '-' can be used as an alias for standard output.
318 =item A scalar reference
320 If C<$output> is a scalar reference, the uncompressed data will be
321 stored in C<$$output>.
325 =item An Array Reference
327 If C<$output> is an array reference, the uncompressed data will be
328 pushed onto the array.
330 =item An Output FileGlob
332 If C<$output> is a string that is delimited by the characters "<" and ">"
333 C<anyinflate> will assume that it is an I<output fileglob string>. The
334 output is the list of files that match the fileglob.
336 When C<$output> is a fileglob string, C<$input> must also be a fileglob
337 string. Anything else is an error.
341 If the C<$output> parameter is any other type, C<undef> will be returned.
348 When C<$input> maps to multiple compressed files/buffers and C<$output> is
349 a single file/buffer, after uncompression C<$output> will contain a
350 concatenation of all the uncompressed data from each of the input
357 =head2 Optional Parameters
359 Unless specified below, the optional parameters for C<anyinflate>,
360 C<OPTS>, are the same as those used with the OO interface defined in the
361 L</"Constructor Options"> section below.
365 =item AutoClose =E<gt> 0|1
367 This option applies to any input or output data streams to
368 C<anyinflate> that are filehandles.
370 If C<AutoClose> is specified, and the value is true, it will result in all
371 input and/or output filehandles being closed once C<anyinflate> has
374 This parameter defaults to 0.
378 =item BinModeOut =E<gt> 0|1
380 When writing to a file or filehandle, set C<binmode> before writing to the
389 =item -Append =E<gt> 0|1
393 =item -MultiStream =E<gt> 0|1
395 Creates a new stream after each file.
408 To read the contents of the file C<file1.txt.Compressed> and write the
409 uncompressed data to the file C<file1.txt>.
413 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
415 my $input = "file1.txt.Compressed";
416 my $output = "file1.txt";
417 anyinflate $input => $output
418 or die "anyinflate failed: $AnyInflateError\n";
421 To read from an existing Perl filehandle, C<$input>, and write the
422 uncompressed data to a buffer, C<$buffer>.
426 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
429 my $input = new IO::File "<file1.txt.Compressed"
430 or die "Cannot open 'file1.txt.Compressed': $!\n" ;
432 anyinflate $input => \$buffer
433 or die "anyinflate failed: $AnyInflateError\n";
435 To uncompress all files in the directory "/my/home" that match "*.txt.Compressed" and store the uncompressed data in the same directory
439 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
441 anyinflate '</my/home/*.txt.Compressed>' => '</my/home/#1.txt>'
442 or die "anyinflate failed: $AnyInflateError\n";
444 and if you want to uncompress each file one at a time, this will do the trick
448 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
450 for my $input ( glob "/my/home/*.txt.Compressed" )
453 $output =~ s/\.Compressed$// ;
454 anyinflate $input => $output
455 or die "Error uncompressing '$input': $AnyInflateError\n";
462 The format of the constructor for IO::Uncompress::AnyInflate is shown below
465 my $z = new IO::Uncompress::AnyInflate $input [OPTS]
466 or die "IO::Uncompress::AnyInflate failed: $AnyInflateError\n";
468 Returns an C<IO::Uncompress::AnyInflate> object on success and undef on failure.
469 The variable C<$AnyInflateError> will contain an error message on failure.
471 If you are running Perl 5.005 or better the object, C<$z>, returned from
472 IO::Uncompress::AnyInflate can be used exactly like an L<IO::File|IO::File> filehandle.
473 This means that all normal input file operations can be carried out with
474 C<$z>. For example, to read a line from a compressed file/buffer you can
475 use either of these forms
477 $line = $z->getline();
480 The mandatory parameter C<$input> is used to determine the source of the
481 compressed data. This parameter can take one of three forms.
487 If the C<$input> parameter is a scalar, it is assumed to be a filename. This
488 file will be opened for reading and the compressed data will be read from it.
492 If the C<$input> parameter is a filehandle, the compressed data will be
494 The string '-' can be used as an alias for standard input.
497 =item A scalar reference
499 If C<$input> is a scalar reference, the compressed data will be read from
504 =head2 Constructor Options
507 The option names defined below are case insensitive and can be optionally
508 prefixed by a '-'. So all of the following are valid
515 OPTS is a combination of the following options:
519 =item -AutoClose =E<gt> 0|1
521 This option is only valid when the C<$input> parameter is a filehandle. If
522 specified, and the value is true, it will result in the file being closed once
523 either the C<close> method is called or the IO::Uncompress::AnyInflate object is
526 This parameter defaults to 0.
528 =item -MultiStream =E<gt> 0|1
532 Allows multiple concatenated compressed streams to be treated as a single
533 compressed stream. Decompression will stop once either the end of the
534 file/buffer is reached, an error is encountered (premature eof, corrupt
535 compressed data) or the end of a stream is not immediately followed by the
536 start of another stream.
538 This parameter defaults to 0.
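The effect of C<MultiStream> can be seen by concatenating two gzip streams and reading the result with the option off and then on. This is a sketch only, assuming C<IO::Compress::Gzip> is available to build the test data; C<slurp_stream> is a hypothetical helper written for this example, not part of the module.

```perl
use strict;
use warnings;
use IO::Compress::Gzip         qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my ($first, $second) = ("first\n", "second\n");

# Two independent gzip streams, joined back to back.
my ($gz1, $gz2);
gzip \$first  => \$gz1 or die "gzip failed: $GzipError\n";
gzip \$second => \$gz2 or die "gzip failed: $GzipError\n";
my $joined = $gz1 . $gz2;

# Hypothetical helper: read everything the object will give us.
sub slurp_stream {
    my ($buffer, %opts) = @_;
    my $z = IO::Uncompress::AnyInflate->new( \$buffer, %opts )
        or die "AnyInflate failed: $AnyInflateError\n";
    my $out = '';
    my ($chunk, $status);
    while ( ( $status = $z->read($chunk) ) > 0 ) {
        $out .= $chunk;
    }
    die "read error: $AnyInflateError\n" if $status < 0;
    $z->close();
    return $out;
}

# With the default (0), reading stops at the end of the first stream.
my $single = slurp_stream( $joined, MultiStream => 0 );

# With MultiStream on, both streams are read as one.
my $multi  = slurp_stream( $joined, MultiStream => 1 );

print "MultiStream => 0 gave ", length($single), " bytes\n";
print "MultiStream => 1 gave ", length($multi),  " bytes\n";
```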
542 =item -Prime =E<gt> $string
544 This option will uncompress the contents of C<$string> before processing the
547 This option can be useful when the compressed data is embedded in another
548 file/data structure and it is not possible to work out where the compressed
549 data begins without having to read the first few bytes. If this is the
550 case, the uncompression can be I<primed> with these bytes using this
553 =item -Transparent =E<gt> 0|1
555 If this option is set and the input file or buffer is not compressed data,
556 the module will allow reading of it anyway.
558 This option defaults to 1.
560 =item -BlockSize =E<gt> $num
562 When reading the compressed input data, IO::Uncompress::AnyInflate will read it in
563 blocks of C<$num> bytes.
565 This option defaults to 4096.
567 =item -InputLength =E<gt> $size
569 When present this option will limit the number of compressed bytes read
570 from the input file/buffer to C<$size>. This option can be used in the
571 situation where there is useful data directly after the compressed data
572 stream and you know beforehand the exact length of the compressed data
575 This option is mostly used when reading from a filehandle, in which case
576 the file pointer will be left pointing to the first byte directly after the
577 compressed data stream.
581 This option defaults to off.
583 =item -Append =E<gt> 0|1
585 This option controls what the C<read> method does with uncompressed data.
587 If set to 1, all uncompressed data will be appended to the output parameter
588 of the C<read> method.
590 If set to 0, the contents of the output parameter of the C<read> method
591 will be overwritten by the uncompressed data.
595 =item -Strict =E<gt> 0|1
599 This option controls whether the extra checks defined below are used when
600 carrying out the decompression. When Strict is on, the extra tests are
601 carried out; when Strict is off, they are not.
603 The default for this option is off.
606 If the input is an RFC 1950 data stream, the following will be checked:
615 The ADLER32 checksum field must be present.
619 The value of the ADLER32 field read must match the adler32 value of the
620 uncompressed data actually contained in the file.
626 If the input is a gzip (RFC 1952) data stream, the following will be checked:
635 If the FHCRC bit is set in the gzip FLG header byte, the CRC16 bytes in the
636 header must match the crc16 value of the gzip header actually read.
640 If the gzip header contains a name field (FNAME) it consists solely of ISO
645 If the gzip header contains a comment field (FCOMMENT) it consists solely
646 of ISO 8859-1 characters plus line-feed.
650 If the gzip FEXTRA header field is present it must conform to the sub-field
651 structure as defined in RFC 1952.
655 The CRC32 and ISIZE trailer fields must be present.
659 The value of the CRC32 field read must match the crc32 value of the
660 uncompressed data actually contained in the gzip file.
664 The value of the ISIZE field read must match the length of the
665 uncompressed data actually read from the file.
674 =item -ParseExtra =E<gt> 0|1
676 If the gzip FEXTRA header field is present and this option is set, it will
677 force the module to check that it conforms to the sub-field structure as
680 If C<Strict> is on, it will automatically enable this option.
700 $status = $z->read($buffer)
702 Reads a block of compressed data (the size of the compressed block is
703 determined by the C<BlockSize> option in the constructor), uncompresses it and
704 writes any uncompressed data into C<$buffer>. If the C<Append> parameter is
705 set in the constructor, the uncompressed data will be appended to the
706 C<$buffer> parameter. Otherwise C<$buffer> will be overwritten.
708 Returns the number of uncompressed bytes written to C<$buffer>, zero if eof
709 or a negative number on error.
715 $status = $z->read($buffer, $length)
716 $status = $z->read($buffer, $length, $offset)
718 $status = read($z, $buffer, $length)
719 $status = read($z, $buffer, $length, $offset)
721 Attempt to read C<$length> bytes of uncompressed data into C<$buffer>.
723 The main difference between this form of the C<read> method and the
724 previous one, is that this one will attempt to return I<exactly> C<$length>
725 bytes. It will return fewer only if end-of-file
726 or an IO error is encountered.
728 Returns the number of uncompressed bytes written to C<$buffer>, zero if eof
729 or a negative number on error.
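The two forms of C<read> can be compared in a short sketch. It assumes C<IO::Compress::Gzip> is available to create the test buffer; the variable names are illustrative. The first call asks for an exact byte count, and the loop then drains the remainder block by block.

```perl
use strict;
use warnings;
use IO::Compress::Gzip         qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = '0123456789' x 100;    # 1000 bytes of sample data

my $gz;
gzip \$text => \$gz or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::AnyInflate->new( \$gz )
    or die "AnyInflate failed: $AnyInflateError\n";

# Form with a length: attempts to return exactly 16 bytes.
my $head;
my $got = $z->read( $head, 16 );
die "read error: $AnyInflateError\n" if $got < 0;

# Form without a length: returns whatever one block uncompresses to.
# $chunk is overwritten on each call because Append is off by default.
my $rest = '';
my ($chunk, $status);
while ( ( $status = $z->read($chunk) ) > 0 ) {
    $rest .= $chunk;
}
die "read error: $AnyInflateError\n" if $status < 0;
$z->close();

print "first read returned $got bytes\n";
print "total bytes read: ", length($head) + length($rest), "\n";
```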
736 $line = $z->getline()
741 This method fully supports the use of the variable C<$/>
742 (or C<$INPUT_RECORD_SEPARATOR> or C<$RS> when C<English> is in use) to
743 determine what constitutes an end of line. Both paragraph mode and file
744 slurp mode are supported.
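A brief sketch of C<getline> in the default line mode and in file slurp mode follows. It assumes C<IO::Compress::Gzip> is available to prepare the compressed buffer; the sample text is arbitrary.

```perl
use strict;
use warnings;
use IO::Compress::Gzip         qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = "alpha\nbeta\ngamma\n";

my $gz;
gzip \$text => \$gz or die "gzip failed: $GzipError\n";

# Default mode: $/ is "\n", so getline returns one line per call.
my $z = IO::Uncompress::AnyInflate->new( \$gz )
    or die "AnyInflate failed: $AnyInflateError\n";
my @lines;
while ( defined( my $line = $z->getline() ) ) {
    push @lines, $line;
}
$z->close();

# Slurp mode: with $/ undefined, one call returns all remaining data.
my $z2 = IO::Uncompress::AnyInflate->new( \$gz )
    or die "AnyInflate failed: $AnyInflateError\n";
my $all = do { local $/; $z2->getline() };
$z2->close();

print "line mode returned ", scalar(@lines), " lines\n";
print "slurp mode returned ", length($all), " bytes\n";
```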
753 Read a single character.
759 $char = $z->ungetc($string)
767 $status = $z->inflateSync()
776 $hdr = $z->getHeaderInfo();
777 @hdrs = $z->getHeaderInfo();
779 This method returns either a hash reference (in scalar context) or a list
780 of hash references (in array context) that contains information about each
781 of the header fields in the compressed data stream(s).
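As an illustration, the sketch below writes a gzip stream with a name field and reads it back through C<getHeaderInfo>. The hash keys C<Type> and C<Name> are an assumption based on what C<IO::Uncompress::Gunzip> reports for gzip headers, and the C<Name> option of C<IO::Compress::Gzip> is used to set the field; the file name C<notes.txt> is hypothetical.

```perl
use strict;
use warnings;
use IO::Compress::Gzip         qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = "header demo\n";

# Store an original-name field (FNAME) in the gzip header.
my $gz;
gzip \$text => \$gz, Name => 'notes.txt'
    or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::AnyInflate->new( \$gz )
    or die "AnyInflate failed: $AnyInflateError\n";

# Scalar context: a single hash reference for the current stream.
my $hdr = $z->getHeaderInfo();

# Key names are assumed, see the note above; guard against absence.
my $name = defined $hdr->{Name} ? $hdr->{Name} : '(none)';
my $type = defined $hdr->{Type} ? $hdr->{Type} : '(unknown)';

print "stream type: $type\n";
print "stored name: $name\n";
$z->close();
```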
793 Returns the uncompressed file offset.
804 Returns true if the end of the compressed input stream has been reached.
810 $z->seek($position, $whence);
811 seek($z, $position, $whence);
816 Provides a sub-set of the C<seek> functionality, with the restriction
817 that it is only legal to seek forward in the input file/buffer.
818 It is a fatal error to attempt to seek backward.
822 The C<$whence> parameter takes one of the usual values, namely SEEK_SET,
823 SEEK_CUR or SEEK_END.
825 Returns 1 on success, 0 on failure.
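A forward seek can be sketched as below. The example assumes C<IO::Compress::Gzip> is available for the test data, takes C<SEEK_SET> from the core C<Fcntl> module, and uses the C<tell> method to confirm the uncompressed offset. Only forward movement is attempted, in line with the restriction above.

```perl
use strict;
use warnings;
use Fcntl                      qw(:seek);
use IO::Compress::Gzip         qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = join '', 'a' .. 'z';

my $gz;
gzip \$text => \$gz or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::AnyInflate->new( \$gz )
    or die "AnyInflate failed: $AnyInflateError\n";

# Skip the first 10 uncompressed bytes. The data is still
# uncompressed internally; it is simply discarded.
my $ok  = $z->seek( 10, SEEK_SET );
my $pos = $z->tell();

# Read the next five bytes from the new position.
my $buf;
my $got = $z->read( $buf, 5 );
die "read error: $AnyInflateError\n" if $got < 0;
$z->close();

print "seek ", ( $ok ? "succeeded" : "failed" ), ", offset is $pos\n";
print "next bytes: $buf\n";
```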
834 This is a no-op provided for completeness.
840 Returns true if the object currently refers to an opened file/buffer.
844 my $prev = $z->autoflush()
845 my $prev = $z->autoflush(EXPR)
847 If the C<$z> object is associated with a file or a filehandle, this method
848 returns the current autoflush setting for the underlying filehandle. If
849 C<EXPR> is present, and is non-zero, it will enable flushing after every
850 write/print operation.
852 If C<$z> is associated with a buffer, this method has no effect and always
855 B<Note> that the special variable C<$|> B<cannot> be used to set or
856 retrieve the autoflush setting.
858 =head2 input_line_number
860 $z->input_line_number()
861 $z->input_line_number(EXPR)
865 Returns the current uncompressed line number. If C<EXPR> is present it has
866 the effect of setting the line number. Note that setting the line number
867 does not change the current position within the file/buffer being read.
869 The contents of C<$/> are used to determine what constitutes a line
879 If the C<$z> object is associated with a file or a filehandle, this method
880 will return the underlying file descriptor.
882 If the C<$z> object is associated with a buffer, this method will
892 Closes the input file/buffer.
896 For most versions of Perl this method will be automatically invoked if
897 the IO::Uncompress::AnyInflate object is destroyed (either explicitly or by the
898 variable with the reference to the object going out of scope). The
899 exceptions are Perl versions 5.005 through 5.00504 and 5.8.0. In
900 these cases, the C<close> method will be called automatically, but
901 not until global destruction of all live objects when the program is
904 Therefore, if you want your scripts to be able to run on all versions
905 of Perl, you should call C<close> explicitly and not rely on automatic
908 Returns true on success, otherwise 0.
910 If the C<AutoClose> option has been enabled when the IO::Uncompress::AnyInflate
911 object was created, and the object is associated with a file, the
912 underlying file will also be closed.
919 No symbolic constants are required by IO::Uncompress::AnyInflate at present.
925 Imports C<anyinflate> and C<$AnyInflateError>.
928 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
939 L<Compress::Zlib>, L<IO::Compress::Gzip>, L<IO::Uncompress::Gunzip>, L<IO::Compress::Deflate>, L<IO::Uncompress::Inflate>, L<IO::Compress::RawDeflate>, L<IO::Uncompress::RawInflate>, L<IO::Compress::Bzip2>, L<IO::Uncompress::Bunzip2>, L<IO::Compress::Lzop>, L<IO::Uncompress::UnLzop>, L<IO::Uncompress::AnyUncompress>
941 L<Compress::Zlib::FAQ|Compress::Zlib::FAQ>
943 L<File::GlobMapper|File::GlobMapper>, L<Archive::Zip|Archive::Zip>,
944 L<Archive::Tar|Archive::Tar>,
948 For RFC 1950, 1951 and 1952 see
949 F<http://www.faqs.org/rfcs/rfc1950.html>,
950 F<http://www.faqs.org/rfcs/rfc1951.html> and
951 F<http://www.faqs.org/rfcs/rfc1952.html>
953 The I<zlib> compression library was written by Jean-loup Gailly
954 F<gzip@prep.ai.mit.edu> and Mark Adler F<madler@alumni.caltech.edu>.
956 The primary site for the I<zlib> compression library is
957 F<http://www.zlib.org>.
959 The primary site for gzip is F<http://www.gzip.org>.
966 This module was written by Paul Marquess, F<pmqs@cpan.org>.
970 =head1 MODIFICATION HISTORY
972 See the Changes file.
974 =head1 COPYRIGHT AND LICENSE
976 Copyright (c) 2005-2006 Paul Marquess. All rights reserved.
978 This program is free software; you can redistribute it and/or
979 modify it under the same terms as Perl itself.