1 package IO::Uncompress::AnyInflate ;
3 # for RFC1950, RFC1951 or RFC1952
7 use IO::Uncompress::Gunzip ;
11 our ($VERSION, @ISA, @EXPORT_OK, %EXPORT_TAGS, $AnyInflateError);
13 $VERSION = '2.000_05';
14 $AnyInflateError = '';
16 @ISA = qw(Exporter IO::BaseInflate);
17 @EXPORT_OK = qw( $AnyInflateError anyinflate ) ;
18 %EXPORT_TAGS = %IO::BaseInflate::EXPORT_TAGS ;
19 push @{ $EXPORT_TAGS{all} }, @EXPORT_OK ;
20 Exporter::export_ok_tags('all');
24 # TODO - allow the user to pick a set of the three formats to allow
25 # or just assume want to auto-detect any of the three formats.
30 return IO::BaseInflate::new($pkg, 'any', undef, \$AnyInflateError, 0, @_);
35 return IO::BaseInflate::_inf(__PACKAGE__, 'any', \$AnyInflateError, @_) ;
45 IO::Uncompress::AnyInflate - Perl interface to read RFC 1950, 1951 & 1952 files/buffers
49 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
51 my $status = anyinflate $input => $output [,OPTS]
52 or die "anyinflate failed: $AnyInflateError\n";
54 my $z = new IO::Uncompress::AnyInflate $input [OPTS]
55 or die "anyinflate failed: $AnyInflateError\n";
57 $status = $z->read($buffer)
58 $status = $z->read($buffer, $length)
59 $status = $z->read($buffer, $length, $offset)
63 $status = $z->inflateSync()
65 $data = $z->getHeaderInfo()
67 $z->seek($position, $whence)
79 read($z, $buffer, $length);
80 read($z, $buffer, $length, $offset);
82 seek($z, $position, $whence)
93 B<WARNING -- This is a Beta release>.
97 =item * DO NOT use in production code.
99 =item * The documentation is incomplete in places.
101 =item * Parts of the interface defined here are tentative.
103 =item * Please report any problems you find.
111 This module provides a Perl interface that allows the reading of files/buffers
112 that conform to RFCs 1950, 1951 and 1952.
114 The module will auto-detect which, if any, of the three supported compression
115 formats is being used.
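As a minimal sketch of the auto-detection, the snippet below compresses the same text as gzip (RFC 1952) and as zlib (RFC 1950), then hands each buffer to C<anyinflate> without saying which format it holds. The sample text and variable names are illustrative, and C<IO::Compress::Gzip> and C<IO::Compress::Deflate> are used only to create test data. Raw deflate (RFC 1951) is left out here because it has no identifying header.

```perl
use strict;
use warnings;
use IO::Compress::Gzip    qw(gzip $GzipError);
use IO::Compress::Deflate qw(deflate $DeflateError);
use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError);

my $text = "hello, hello, hello\n";

# Compress the same text as gzip (RFC 1952) and as zlib (RFC 1950).
gzip    \$text => \my $as_gzip or die "gzip failed: $GzipError\n";
deflate \$text => \my $as_zlib or die "deflate failed: $DeflateError\n";

# anyinflate is given no hint about which format each buffer holds.
my %result;
for my $case ( [ gzip => $as_gzip ], [ zlib => $as_zlib ] ) {
    my ($name, $compressed) = @$case;
    anyinflate \$compressed => \my $plain
        or die "anyinflate failed on $name: $AnyInflateError\n";
    $result{$name} = $plain;
}
```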
119 =head1 Functional Interface
121 A top-level function, C<anyinflate>, is provided to carry out "one-shot"
122 uncompression between buffers and/or files. For finer control over the uncompression process, see the L</"OO Interface"> section.
124 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
126 anyinflate $input => $output [,OPTS]
127 or die "anyinflate failed: $AnyInflateError\n";
129 anyinflate \%hash [,OPTS]
130 or die "anyinflate failed: $AnyInflateError\n";
132 The functional interface needs Perl5.005 or better.
135 =head2 anyinflate $input => $output [, OPTS]
137 If the first parameter is not a hash reference C<anyinflate> expects
138 at least two parameters, C<$input> and C<$output>.
140 =head3 The C<$input> parameter
142 The parameter, C<$input>, is used to define the source of
145 It can take one of the following forms:
151 If the C<$input> parameter is a simple scalar, it is assumed to be a
152 filename. This file will be opened for reading and the input data
153 will be read from it.
157 If the C<$input> parameter is a filehandle, the input data will be
159 The string '-' can be used as an alias for standard input.
161 =item A scalar reference
163 If C<$input> is a scalar reference, the input data will be read
166 =item An array reference
168 If C<$input> is an array reference, the input data will be read from each
169 element of the array in turn. The action taken by C<anyinflate> with
170 each element of the array will depend on the type of data stored
171 in it. You can mix and match any of the types defined in this list,
172 excluding other array or hash references.
173 The complete array will be walked to ensure that it only
174 contains valid data types before any data is uncompressed.
176 =item An Input FileGlob string
178 If C<$input> is a string that is delimited by the characters "<" and ">"
179 C<anyinflate> will assume that it is an I<input fileglob string>. The
180 input is the list of files that match the fileglob.
182 If the fileglob does not match any files ...
184 See L<File::GlobMapper|File::GlobMapper> for more details.
189 If the C<$input> parameter is any other type, C<undef> will be returned.
193 =head3 The C<$output> parameter
195 The parameter C<$output> is used to control the destination of the
196 uncompressed data. This parameter can take one of these forms.
202 If the C<$output> parameter is a simple scalar, it is assumed to be a filename.
203 This file will be opened for writing and the uncompressed data will be
208 If the C<$output> parameter is a filehandle, the uncompressed data will
210 The string '-' can be used as an alias for standard output.
213 =item A scalar reference
215 If C<$output> is a scalar reference, the uncompressed data will be stored
219 =item A Hash Reference
221 If C<$output> is a hash reference, the uncompressed data will be written
222 to C<$output{$input}> as a scalar reference.
224 When C<$output> is a hash reference, C<$input> must be either a filename or
225 list of filenames. Anything else is an error.
228 =item An Array Reference
230 If C<$output> is an array reference, the uncompressed data will be pushed
233 =item An Output FileGlob
235 If C<$output> is a string that is delimited by the characters "<" and ">"
236 C<anyinflate> will assume that it is an I<output fileglob string>. The
237 output is the list of files that match the fileglob.
239 When C<$output> is a fileglob string, C<$input> must also be a fileglob
240 string. Anything else is an error.
244 If the C<$output> parameter is any other type, C<undef> will be returned.
246 =head2 anyinflate \%hash [, OPTS]
248 If the first parameter is a hash reference, C<\%hash>, this will be used to
249 define both the source of compressed data and to control where the
250 uncompressed data is output. Each key/value pair in the hash defines a
251 mapping between an input filename, stored in the key, and an output
252 file/buffer, stored in the value. Although the input can only be a filename,
253 there is more flexibility to control the destination of the uncompressed
254 data. This is determined by the type of the value. Valid types are
260 If the value is C<undef> the uncompressed data will be written to the
261 value as a scalar reference.
265 If the value is a simple scalar, it is assumed to be a filename. This file will
266 be opened for writing and the uncompressed data will be written to it.
270 If the value is a filehandle, the uncompressed data will be
272 The string '-' can be used as an alias for standard output.
275 =item A scalar reference
277 If the value is a scalar reference, the uncompressed data will be stored
278 in the buffer that is referenced by the scalar.
281 =item A Hash Reference
283 If the value is a hash reference, the uncompressed data will be written
284 to C<$hash{$input}> as a scalar reference.
286 =item An Array Reference
288 If the value is an array reference, the uncompressed data will be pushed
293 Any other type is an error.
297 When C<$input> maps to multiple files/buffers and C<$output> is a single
298 file/buffer the uncompressed input files/buffers will all be stored in
299 C<$output> as a single uncompressed stream.
303 =head2 Optional Parameters
305 Unless specified below, the optional parameters for C<anyinflate>,
306 C<OPTS>, are the same as those used with the OO interface defined in the
307 L</"Constructor Options"> section below.
311 =item AutoClose =E<gt> 0|1
313 This option applies to any input or output data streams to C<anyinflate>
314 that are filehandles.
316 If C<AutoClose> is specified, and the value is true, it will result in all
317 input and/or output filehandles being closed once C<anyinflate> has
320 This parameter defaults to 0.
324 =item -Append =E<gt> 0|1
337 To read the contents of the file C<file1.txt.Compressed> and write the
338 uncompressed data to the file C<file1.txt>.
342 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
344 my $input = "file1.txt.Compressed";
345 my $output = "file1.txt";
346 anyinflate $input => $output
347 or die "anyinflate failed: $AnyInflateError\n";
350 To read from an existing Perl filehandle, C<$input>, and write the
351 uncompressed data to a buffer, C<$buffer>.
355 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
358 my $input = new IO::File "<file1.txt.Compressed"
359 or die "Cannot open 'file1.txt.Compressed': $!\n" ;
361 anyinflate $input => \$buffer
362 or die "anyinflate failed: $AnyInflateError\n";
364 To uncompress all files in the directory "/my/home" that match "*.txt.Compressed" and store the uncompressed data in the same directory
368 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
370 anyinflate '</my/home/*.txt.Compressed>' => '</my/home/#1.txt>'
371 or die "anyinflate failed: $AnyInflateError\n";
373 and if you want to uncompress each file one at a time, this will do the trick
377 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
379 for my $input ( glob "/my/home/*.txt.Compressed" )
382 $output =~ s/\.Compressed$// ;
383 anyinflate $input => $output
384 or die "Error uncompressing '$input': $AnyInflateError\n";
391 The format of the constructor for IO::Uncompress::AnyInflate is shown below
394 my $z = new IO::Uncompress::AnyInflate $input [OPTS]
395 or die "IO::Uncompress::AnyInflate failed: $AnyInflateError\n";
397 Returns an C<IO::Uncompress::AnyInflate> object on success and undef on failure.
398 The variable C<$AnyInflateError> will contain an error message on failure.
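A small sketch of the failure path, assuming a filename that does not exist (the temporary directory and file name are made up for illustration). It uses the C<< Class->new >> call form, which is equivalent to the indirect C<new Class> form shown above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

# A fresh temporary directory is empty, so this file cannot exist.
my $dir     = tempdir( CLEANUP => 1 );
my $missing = "$dir/no-such-file.gz";

my $z = IO::Uncompress::AnyInflate->new($missing);

if ( !defined $z ) {
    # The constructor returned undef; the reason is in $AnyInflateError.
    print "open failed: $AnyInflateError\n";
}
```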
400 If you are running Perl 5.005 or better the object, C<$z>, returned from
401 IO::Uncompress::AnyInflate can be used exactly like an L<IO::File|IO::File> filehandle.
402 This means that all normal input file operations can be carried out with C<$z>.
403 For example, to read a line from a compressed file/buffer you can use either
406 $line = $z->getline();
409 The mandatory parameter C<$input> is used to determine the source of the
410 compressed data. This parameter can take one of three forms.
416 If the C<$input> parameter is a scalar, it is assumed to be a filename. This
417 file will be opened for reading and the compressed data will be read from it.
421 If the C<$input> parameter is a filehandle, the compressed data will be
423 The string '-' can be used as an alias for standard input.
426 =item A scalar reference
428 If C<$input> is a scalar reference, the compressed data will be read from
433 =head2 Constructor Options
436 The option names defined below are case insensitive and can be optionally
437 prefixed by a '-'. So all of the following are valid
444 OPTS is a combination of the following options:
448 =item -AutoClose =E<gt> 0|1
450 This option is only valid when the C<$input> parameter is a filehandle. If
451 specified, and the value is true, it will result in the file being closed once
452 either the C<close> method is called or the IO::Uncompress::AnyInflate object is
455 This parameter defaults to 0.
457 =item -MultiStream =E<gt> 0|1
461 Allows multiple concatenated compressed streams to be treated as a single
462 compressed stream. Decompression will stop once either the end of the
463 file/buffer is reached, an error is encountered (premature eof, corrupt
464 compressed data) or the end of a stream is not immediately followed by the
465 start of another stream.
467 This parameter defaults to 0.
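To illustrate C<MultiStream>, the sketch below joins two independently created gzip streams and reads them back as one. The strings and variable names are illustrative only.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError);

# Build two independent gzip streams and join them back to back.
my ($first, $second) = ("first stream\n", "second stream\n");
gzip \$first  => \my $gz1 or die "gzip failed: $GzipError\n";
gzip \$second => \my $gz2 or die "gzip failed: $GzipError\n";
my $joined = $gz1 . $gz2;

# With MultiStream enabled, both streams are read as a single stream.
anyinflate \$joined => \my $all, MultiStream => 1
    or die "anyinflate failed: $AnyInflateError\n";
```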
471 =item -Prime =E<gt> $string
473 This option will uncompress the contents of C<$string> before processing the
476 This option can be useful when the compressed data is embedded in another
477 file/data structure and it is not possible to work out where the compressed
478 data begins without having to read the first few bytes. If this is the case,
479 the uncompression can be I<primed> with these bytes using this option.
481 =item -Transparent =E<gt> 0|1
483 If this option is set and the input file or buffer is not compressed data,
484 the module will allow reading of it anyway.
486 This option defaults to 1.
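A hedged sketch of the difference C<Transparent> makes when the input is not compressed at all. The sample text is arbitrary.

```perl
use strict;
use warnings;
use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError);

my $plain = "plain text, not compressed\n";

# Default (Transparent => 1): uncompressed input is passed through as is.
anyinflate \$plain => \my $copy
    or die "anyinflate failed: $AnyInflateError\n";

# Transparent => 0: the same input is rejected and a false value is returned.
my $ok       = anyinflate \$plain => \my $none, Transparent => 0;
my $rejected = !$ok;
```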
488 =item -BlockSize =E<gt> $num
490 When reading the compressed input data, IO::Uncompress::AnyInflate will read it in blocks
493 This option defaults to 4096.
495 =item -InputLength =E<gt> $size
497 When present this option will limit the number of compressed bytes read from
498 the input file/buffer to C<$size>. This option can be used in the situation
499 where there is useful data directly after the compressed data stream and you
500 know beforehand the exact length of the compressed data stream.
502 This option is mostly used when reading from a filehandle, in which case the
503 file pointer will be left pointing to the first byte directly after the
504 compressed data stream.
508 This option defaults to off.
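The following sketch shows one way C<InputLength> might be used with a filehandle: a temporary file (an assumption made for illustration) holds a gzip stream followed by unrelated bytes, and the reader is limited to the compressed part so the trailing bytes can be read directly afterwards.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text    = "compressed payload\n";
my $trailer = "TRAILING DATA";
gzip \$text => \my $gz or die "gzip failed: $GzipError\n";

# Write the compressed stream followed by unrelated bytes.
my ($out, $filename) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} $gz, $trailer;
close $out or die "close failed: $!\n";

open my $fh, '<', $filename or die "Cannot open '$filename': $!\n";
binmode $fh;

# Limit the reader to exactly the compressed bytes.
my $z = IO::Uncompress::AnyInflate->new( $fh, InputLength => length $gz )
    or die "IO::Uncompress::AnyInflate failed: $AnyInflateError\n";

my $data = '';
my $chunk;
$data .= $chunk while $z->read($chunk) > 0;

# The filehandle should now sit on the first byte after the stream.
read( $fh, my $rest, length $trailer );
```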
510 =item -Append =E<gt> 0|1
512 This option controls what the C<read> method does with uncompressed data.
514 If set to 1, all uncompressed data will be appended to the output parameter of
517 If set to 0, the contents of the output parameter of the C<read> method will be
518 overwritten by the uncompressed data.
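As a minimal sketch of C<Append>, the loop below calls C<read> repeatedly on the same buffer and relies on the option to accumulate the output. The generated test data is illustrative.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = join '', map { "line $_\n" } 1 .. 1000;
gzip \$text => \my $gz or die "gzip failed: $GzipError\n";

# Append => 1: each read() call adds to $buffer instead of replacing it.
my $z = IO::Uncompress::AnyInflate->new( \$gz, Append => 1 )
    or die "IO::Uncompress::AnyInflate failed: $AnyInflateError\n";

my $buffer = '';
1 while $z->read($buffer) > 0;
$z->close();
```

Without C<Append>, the same loop would leave only the last block in C<$buffer>, so each block would have to be copied elsewhere before the next call.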
522 =item -Strict =E<gt> 0|1
526 This option controls whether the extra checks defined below are used when
527 carrying out the decompression. When Strict is on, the extra tests are carried
528 out; when Strict is off, they are not.
530 The default for this option is off.
533 If the input is an RFC1950 data stream, the following will be checked:
542 The ADLER32 checksum field must be present.
546 The value of the ADLER32 field read must match the adler32 value of the
547 uncompressed data actually contained in the file.
553 If the input is a gzip (RFC1952) data stream, the following will be checked:
562 If the FHCRC bit is set in the gzip FLG header byte, the CRC16 bytes in the
563 header must match the crc16 value of the gzip header actually read.
567 If the gzip header contains a name field (FNAME) it consists solely of ISO
572 If the gzip header contains a comment field (FCOMMENT) it consists solely of
573 ISO 8859-1 characters plus line-feed.
577 If the gzip FEXTRA header field is present it must conform to the sub-field
578 structure as defined in RFC1952.
582 The CRC32 and ISIZE trailer fields must be present.
586 The value of the CRC32 field read must match the crc32 value of the
587 uncompressed data actually contained in the gzip file.
591 The value of the ISIZE fields read must match the length of the uncompressed
592 data actually read from the file.
601 =item -ParseExtra =E<gt> 0|1
603 If the gzip FEXTRA header field is present and this option is set, it will
604 force the module to check that it conforms to the sub-field structure as
607 If C<Strict> is on it will automatically enable this option.
625 $status = $z->read($buffer)
627 Reads a block of compressed data (the size of the compressed block is
628 determined by the C<BlockSize> option in the constructor), uncompresses it and
629 writes any uncompressed data into C<$buffer>. If the C<Append> parameter is set
630 in the constructor, the uncompressed data will be appended to the C<$buffer>
631 parameter. Otherwise C<$buffer> will be overwritten.
633 Returns the number of uncompressed bytes written to C<$buffer>, zero if eof or
634 a negative number on error.
640 $status = $z->read($buffer, $length)
641 $status = $z->read($buffer, $length, $offset)
643 $status = read($z, $buffer, $length)
644 $status = read($z, $buffer, $length, $offset)
646 Attempt to read C<$length> bytes of uncompressed data into C<$buffer>.
648 The main difference between this form of the C<read> method and the previous
649 one is that this one will attempt to return I<exactly> C<$length> bytes. The
650 only circumstances in which it will not do so are when end-of-file or an IO error
653 Returns the number of uncompressed bytes written to C<$buffer>, zero if eof or
654 a negative number on error.
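A short sketch of the fixed-length form of C<read>, assuming a 20-byte payload chosen for illustration. The first call asks for 10 bytes; the second asks for more than remain and so is cut short by end-of-file.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = "0123456789abcdefghij";
gzip \$text => \my $gz or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::AnyInflate->new( \$gz )
    or die "IO::Uncompress::AnyInflate failed: $AnyInflateError\n";

my ($head, $tail) = ('', '');

# Ask for exactly 10 uncompressed bytes.
my $got_first = $z->read( $head, 10 );

# Ask for more than is left; end-of-file limits what comes back.
my $got_rest = $z->read( $tail, 100 );

$z->close();
```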
661 $line = $z->getline()
666 This method fully supports the use of the variable C<$/>
667 (or C<$INPUT_RECORD_SEPARATOR> or C<$RS> when C<English> is in use) to
668 determine what constitutes an end of line. Both paragraph mode and file
669 slurp mode are supported.
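The sketch below reads a compressed buffer line by line and then again in file slurp mode by leaving C<$/> undefined. The sample text and the second object, C<$z2>, are illustrative.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = "alpha\nbeta\ngamma\n";
gzip \$text => \my $gz or die "gzip failed: $GzipError\n";

# Line mode: getline() returns one line per call, undef at end of file.
my $z = IO::Uncompress::AnyInflate->new( \$gz )
    or die "IO::Uncompress::AnyInflate failed: $AnyInflateError\n";
my @lines;
while ( defined( my $line = $z->getline() ) ) {
    chomp $line;
    push @lines, $line;
}
$z->close();

# Slurp mode: with $/ undefined, one getline() call returns everything.
my $whole = do {
    local $/;
    my $z2 = IO::Uncompress::AnyInflate->new( \$gz )
        or die "IO::Uncompress::AnyInflate failed: $AnyInflateError\n";
    $z2->getline();
};
```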
678 Read a single character.
684 $char = $z->ungetc($string)
691 $status = $z->inflateSync()
699 $hdr = $z->getHeaderInfo()
718 Returns the uncompressed file offset.
729 Returns true if the end of the compressed input stream has been reached.
735 $z->seek($position, $whence);
736 seek($z, $position, $whence);
741 Provides a subset of the C<seek> functionality, with the restriction
742 that it is only legal to seek forward in the input file/buffer.
743 It is a fatal error to attempt to seek backward.
747 The C<$whence> parameter takes one of the usual values, namely SEEK_SET,
748 SEEK_CUR or SEEK_END.
750 Returns 1 on success, 0 on failure.
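A minimal sketch of a forward seek, assuming an illustrative payload whose first 8 bytes are to be skipped. The C<SEEK_SET> constant is taken from the core C<Fcntl> module.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyInflate qw($AnyInflateError);

my $text = "skip me|keep me";
gzip \$text => \my $gz or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::AnyInflate->new( \$gz )
    or die "IO::Uncompress::AnyInflate failed: $AnyInflateError\n";

# Move forward past the first 8 uncompressed bytes.
$z->seek( 8, SEEK_SET ) or die "seek failed\n";
my $offset = $z->tell();

# Read whatever is left after the skipped prefix.
my $rest = '';
$z->read( $rest, 100 );
$z->close();
```

Because the data is compressed, skipping forward still means uncompressing the skipped bytes, which is why seeking backward is not offered.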
759 This is a noop provided for completeness.
766 If the C<$z> object is associated with a file, this method will return
767 the underlying filehandle.
769 If the C<$z> object is associated with a buffer, this method will
779 Closes the input file/buffer.
783 For most versions of Perl this method will be automatically invoked if
784 the IO::Uncompress::AnyInflate object is destroyed (either explicitly or by the
785 variable with the reference to the object going out of scope). The
786 exceptions are Perl versions 5.005 through 5.00504 and 5.8.0. In
787 these cases, the C<close> method will be called automatically, but
788 not until global destruction of all live objects when the program is
791 Therefore, if you want your scripts to be able to run on all versions
792 of Perl, you should call C<close> explicitly and not rely on automatic
795 Returns true on success, otherwise 0.
797 If the C<AutoClose> option has been enabled when the IO::Uncompress::AnyInflate
798 object was created, and the object is associated with a file, the
799 underlying file will also be closed.
806 No symbolic constants are required by IO::Uncompress::AnyInflate at present.
812 Imports C<anyinflate> and C<$AnyInflateError>.
815 use IO::Uncompress::AnyInflate qw(anyinflate $AnyInflateError) ;
826 L<Compress::Zlib>, L<IO::Compress::Gzip>, L<IO::Uncompress::Gunzip>, L<IO::Compress::Deflate>, L<IO::Uncompress::Inflate>, L<IO::Compress::RawDeflate>, L<IO::Uncompress::RawInflate>
828 L<Compress::Zlib::FAQ|Compress::Zlib::FAQ>
830 L<File::GlobMapper|File::GlobMapper>, L<Archive::Tar|Archive::Zip>,
833 For RFC 1950, 1951 and 1952 see
834 F<http://www.faqs.org/rfcs/rfc1950.html>,
835 F<http://www.faqs.org/rfcs/rfc1951.html> and
836 F<http://www.faqs.org/rfcs/rfc1952.html>
838 The primary site for the gzip program is F<http://www.gzip.org>.
842 The I<IO::Uncompress::AnyInflate> module was written by Paul Marquess,
843 F<pmqs@cpan.org>. The latest copy of the module can be
844 found on CPAN in F<modules/by-module/Compress/Compress-Zlib-x.x.tar.gz>.
846 The I<zlib> compression library was written by Jean-loup Gailly
847 F<gzip@prep.ai.mit.edu> and Mark Adler F<madler@alumni.caltech.edu>.
849 The primary site for the I<zlib> compression library is
850 F<http://www.zlib.org>.
852 =head1 MODIFICATION HISTORY
854 See the Changes file.
856 =head1 COPYRIGHT AND LICENSE
859 Copyright (c) 2005 Paul Marquess. All rights reserved.
860 This program is free software; you can redistribute it and/or
861 modify it under the same terms as Perl itself.