[p5sagit/p5-mst-13.2.git] / ext / IO_Compress_Zlib / pod / FAQ.pod


=head1 NAME

IO::Compress::Zlib::FAQ -- Frequently Asked Questions about IO::Compress::Zlib

=head1 DESCRIPTION

Common questions answered.

=head2 Compatibility with Unix compress/uncompress.

This module is not compatible with Unix C<compress>.

If you have the C<uncompress> program available, you can use this to read
compressed files

    open F, "uncompress -c $filename |";
    while (<F>)
    {
        ...

Alternatively, if you have the C<gunzip> program available, you can use
this to read compressed files

    open F, "gunzip -c $filename |";
    while (<F>)
    {
        ...

and this to write compress files, if you have the C<compress> program
available

    open F, "| compress -c $filename ";
    print F "data";
    ...
    close F ;

=head2 Accessing .tar.Z files

See previous FAQ item.

If the C<Archive::Tar> module is installed and either the C<uncompress> or
C<gunzip> programs are available, you can use one of these workarounds to
read C<.tar.Z> files.

Firstly with C<uncompress>

    use strict;
    use warnings;
    use Archive::Tar;

    open F, "uncompress -c $filename |";
    my $tar = Archive::Tar->new(*F);
    ...

and this with C<gunzip>

    use strict;
    use warnings;
    use Archive::Tar;

    open F, "gunzip -c $filename |";
    my $tar = Archive::Tar->new(*F);
    ...

Similarly, if the C<compress> program is available, you can use this to
write a C<.tar.Z> file

    use strict;
    use warnings;
    use Archive::Tar;
    use IO::File;

    my $fh = new IO::File "| compress -c >$filename";
    my $tar = Archive::Tar->new();
    ...
    $tar->write($fh);
    $fh->close ;

=head2 Accessing Zip Files

This module provides support for reading/writing zip files using the
C<IO::Compress::Zip> and C<IO::Uncompress::Unzip> modules.

The primary focus of the C<IO::Compress::Zip> and C<IO::Uncompress::Unzip>
modules is to provide an C<IO::File> compatible streaming read/write
interface to zip files/buffers. They are not fully flegged archivers. If
you are looking for an archiver check out the C<Archive::Zip> module. You
can find it on CPAN at 

    http://www.cpan.org/modules/by-module/Archive/Archive-Zip-*.tar.gz    

=head2 Compressed files and Net::FTP

The C<Net::FTP> module provides two low-level methods called C<stor> and
C<retr> that both return filehandles. These filehandles can used with the
C<IO::Compress/Uncompress> modules to compress or uncompress files read
from or written to an FTP Server on the fly, without having to create a
temporary file.

Firstly, here is code that uses C<retr> to uncompressed a file as it is
read from the FTP Server.

    use Net::FTP;
    use IO::Uncompress::Gunzip qw(:all);

    my $ftp = new Net::FTP ...

    my $retr_fh = $ftp->retr($compressed_filename);
    gunzip $retr_fh => $outFilename, AutoClose => 1
        or die "Cannot uncompress '$compressed_file': $GunzipError\n";

and this to compress a file as it is written to the FTP Server 

    use Net::FTP;
    use IO::Compress::Gzip qw(:all);

    my $stor_fh = $ftp->stor($filename);
    gzip "filename" => $stor_fh, AutoClose => 1
        or die "Cannot compress '$filename': $GzipError\n";

=head2 How do I recompress using a different compression?

This is easier that you might expect if you realise that all the
C<IO::Compress::*> objects are derived from C<IO::File> and that all the
C<IO::Uncompress::*> modules can read from an C<IO::File> filehandle.

So, for example, say you have a file compressed with gzip that you want to
recompress with bzip2. Here is all that is needed to carry out the
recompression.

    use IO::Uncompress::Gunzip ':all';
    use IO::Compress::Bzip2 ':all';

    my $gzipFile = "somefile.gz";
    my $bzipFile = "somefile.bz2";

    my $gunzip = new IO::Uncompress::Gunzip $gzipFile
        or die "Cannot gunzip $gzipFile: $GunzipError\n" ;

    bzip2 $gunzip => $bzipFile 
        or die "Cannot bzip2 to $bzipFile: $Bzip2Error\n" ;

Note, there is a limitation of this technique. Some compression file
formats store extra information along with the compressed data payload. For
example, gzip can optionally store the original filename and Zip stores a
lot of information about the original file. If the original compressed file
contains any of this extra information, it will not be transferred to the
new compressed file usign the technique above.

=head2 Apache::GZip Revisited

Below is a mod_perl Apache compression module, called C<Apache::GZip>,
taken from
F<http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression>

  package Apache::GZip;
  #File: Apache::GZip.pm
  
  use strict vars;
  use Apache::Constants ':common';
  use Compress::Zlib;
  use IO::File;
  use constant GZIP_MAGIC => 0x1f8b;
  use constant OS_MAGIC => 0x03;
  
  sub handler {
      my $r = shift;
      my ($fh,$gz);
      my $file = $r->filename;
      return DECLINED unless $fh=IO::File->new($file);
      $r->header_out('Content-Encoding'=>'gzip');
      $r->send_http_header;
      return OK if $r->header_only;
  
      tie *STDOUT,'Apache::GZip',$r;
      print($_) while <$fh>;
      untie *STDOUT;
      return OK;
  }
  
  sub TIEHANDLE {
      my($class,$r) = @_;
      # initialize a deflation stream
      my $d = deflateInit(-WindowBits=>-MAX_WBITS()) || return undef;
  
      # gzip header -- don't ask how I found out
      $r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));
  
      return bless { r   => $r,
                     crc =>  crc32(undef),
                     d   => $d,
                     l   =>  0 
                   },$class;
  }
  
  sub PRINT {
      my $self = shift;
      foreach (@_) {
        # deflate the data
        my $data = $self->{d}->deflate($_);
        $self->{r}->print($data);
        # keep track of its length and crc
        $self->{l} += length($_);
        $self->{crc} = crc32($_,$self->{crc});
      }
  }
  
  sub DESTROY {
     my $self = shift;
     
     # flush the output buffers
     my $data = $self->{d}->flush;
     $self->{r}->print($data);
     
     # print the CRC and the total length (uncompressed)
     $self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
  }
   
  1;

Here's the Apache configuration entry you'll need to make use of it.  Once
set it will result in everything in the /compressed directory will be
compressed automagically.

  <Location /compressed>
     SetHandler  perl-script
     PerlHandler Apache::GZip
  </Location>

Although at first sight there seems to be quite a lot going on in
C<Apache::GZip>, you could sum up what the code was doing as follows --
read the contents of the file in C<< $r->filename >>, compress it and write
the compressed data to standard output. That's all.

This code has to jump through a few hoops to achieve this because

=over

=item 1.

The gzip support in C<Compress::Zlib> version 1.x can only work with a real
filesystem filehandle. The filehandles used by Apache modules are not
associated with the filesystem.

=item 2.

That means all the gzip support has to be done by hand - in this case by
creating a tied filehandle to deal with creating the gzip header and
trailer.

=back

C<IO::Compress::Gzip> doesn't have that filehandle limitation (this was one
of the reasons for writing it in the first place). So if
C<IO::Compress::Gzip> is used instead of C<Compress::Zlib> the whole tied
filehandle code can be removed. Here is the rewritten code.

  package Apache::GZip;
  
  use strict vars;
  use Apache::Constants ':common';
  use IO::Compress::Gzip;
  use IO::File;
  
  sub handler {
      my $r = shift;
      my ($fh,$gz);
      my $file = $r->filename;
      return DECLINED unless $fh=IO::File->new($file);
      $r->header_out('Content-Encoding'=>'gzip');
      $r->send_http_header;
      return OK if $r->header_only;

      my $gz = new IO::Compress::Gzip '-', Minimal => 1
          or return DECLINED ;

      print $gz $_ while <$fh>;
  
      return OK;
  }
  
or even more succinctly, like this, using a one-shot gzip

  package Apache::GZip;
  
  use strict vars;
  use Apache::Constants ':common';
  use IO::Compress::Gzip qw(gzip);
  
  sub handler {
      my $r = shift;
      $r->header_out('Content-Encoding'=>'gzip');
      $r->send_http_header;
      return OK if $r->header_only;

      gzip $r->filename => '-', Minimal => 1
        or return DECLINED ;

      return OK;
  }
   
  1;

The use of one-shot C<gzip> above just reads from C<< $r->filename >> and
writes the compressed data to standard output.

Note the use of the C<Minimal> option in the code above. When using gzip
for Content-Encoding you should I<always> use this option. In the example
above it will prevent the filename being included in the gzip header and
make the size of the gzip data stream a slight bit smaller.

=head2 Using C<InputLength> to uncompress data embedded in a larger file/buffer.

A fairly common use-case is where compressed data is embedded in a larger
file/buffer and you want to read both.

As an example consider the structure of a zip file. This is a well-defined
file format that mixes both compressed and uncompressed sections of data in
a single file. 

For the purposes of this discussion you can think of a zip file as sequence
of compressed data streams, each of which is prefixed by an uncompressed
local header. The local header contains information about the compressed
data stream, including the name of the compressed file and, in particular,
the length of the compressed data stream. 

To illustrate how to use C<InputLength> here is a script that walks a zip
file and prints out how many lines are in each compressed file (if you
intend write code to walking through a zip file for real see
L<IO::Uncompress::Unzip/"Walking through a zip file"> )

    use strict;
    use warnings;

    use IO::File;
    use IO::Uncompress::RawInflate qw(:all);

    use constant ZIP_LOCAL_HDR_SIG  => 0x04034b50;
    use constant ZIP_LOCAL_HDR_LENGTH => 30;

    my $file = $ARGV[0] ;

    my $fh = new IO::File "<$file"
                or die "Cannot open '$file': $!\n";

    while (1)
    {
        my $sig;
        my $buffer;

        my $x ;
        ($x = $fh->read($buffer, ZIP_LOCAL_HDR_LENGTH)) == ZIP_LOCAL_HDR_LENGTH 
            or die "Truncated file: $!\n";

        my $signature = unpack ("V", substr($buffer, 0, 4));

        last unless $signature == ZIP_LOCAL_HDR_SIG;

        # Read Local Header
        my $gpFlag             = unpack ("v", substr($buffer, 6, 2));
        my $compressedMethod   = unpack ("v", substr($buffer, 8, 2));
        my $compressedLength   = unpack ("V", substr($buffer, 18, 4));
        my $uncompressedLength = unpack ("V", substr($buffer, 22, 4));
        my $filename_length    = unpack ("v", substr($buffer, 26, 2)); 
        my $extra_length       = unpack ("v", substr($buffer, 28, 2));

        my $filename ;
        $fh->read($filename, $filename_length) == $filename_length 
            or die "Truncated file\n";

        $fh->read($buffer, $extra_length) == $extra_length
            or die "Truncated file\n";

        if ($compressedMethod != 8 && $compressedMethod != 0)
        {
            warn "Skipping file '$filename' - not deflated $compressedMethod\n";
            $fh->read($buffer, $compressedLength) == $compressedLength
                or die "Truncated file\n";
            next;
        }

        if ($compressedMethod == 0 && $gpFlag & 8 == 8)
        {
            die "Streamed Stored not supported for '$filename'\n";
        }

        next if $compressedLength == 0;

        # Done reading the Local Header

        my $inf = new IO::Uncompress::RawInflate $fh,
                            Transparent => 1,
                            InputLength => $compressedLength
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The majority of the code above is concerned with reading the zip local
header data. The code that I want to focus on is at the bottom. 

    while (1) {
    
        # read local zip header data
        # get $filename
        # get $compressedLength

        my $inf = new IO::Uncompress::RawInflate $fh,
                            Transparent => 1,
                            InputLength => $compressedLength
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The call to C<IO::Uncompress::RawInflate> creates a new filehandle C<$inf>
that can be used to read from the parent filehandle C<$fh>, uncompressing
it as it goes. The use of the C<InputLength> option will guarantee that
I<at most> C<$compressedLength> bytes of compressed data will be read from
the C<$fh> filehandle (The only exception is for an error case like a
truncated file or a corrupt data stream).

This means that once RawInflate is finished C<$fh> will be left at the
byte directly after the compressed data stream. 

Now consider what the code looks like without C<InputLength> 

    while (1) {
    
        # read local zip header data
        # get $filename
        # get $compressedLength

        # read all the compressed data into $data
        read($fh, $data, $compressedLength);

        my $inf = new IO::Uncompress::RawInflate \$data,
                            Transparent => 1,
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The difference here is the addition of the temporary variable C<$data>.
This is used to store a copy of the compressed data while it is being
uncompressed.

If you know that C<$compressedLength> isn't that big then using temporary
storage won't be a problem. But if C<$compressedLength> is very large or
you are writing an application that other people will use, and so have no
idea how big C<$compressedLength> will be, it could be an issue.

Using C<InputLength> avoids the use of temporary storage and means the
application can cope with large compressed data streams.

One final point -- obviously C<InputLength> can only be used whenever you
know the length of the compressed data beforehand, like here with a zip
file. 

=head1 SEE ALSO

L<Compress::Zlib>, L<IO::Compress::Gzip>, L<IO::Uncompress::Gunzip>, L<IO::Compress::Deflate>, L<IO::Uncompress::Inflate>, L<IO::Compress::RawDeflate>, L<IO::Uncompress::RawInflate>, L<IO::Compress::Bzip2>, L<IO::Uncompress::Bunzip2>, L<IO::Compress::Lzop>, L<IO::Uncompress::UnLzop>, L<IO::Compress::Lzf>, L<IO::Uncompress::UnLzf>, L<IO::Uncompress::AnyInflate>, L<IO::Uncompress::AnyUncompress>

L<Compress::Zlib::FAQ|Compress::Zlib::FAQ>

L<File::GlobMapper|File::GlobMapper>, L<Archive::Zip|Archive::Zip>,
L<Archive::Tar|Archive::Tar>,
L<IO::Zlib|IO::Zlib>

=head1 AUTHOR

This module was written by Paul Marquess, F<pmqs@cpan.org>. 

=head1 MODIFICATION HISTORY

See the Changes file.

=head1 COPYRIGHT AND LICENSE

Copyright (c) 2005-2008 Paul Marquess. All rights reserved.

This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
Commit	Line	Data
d54256af	1
	2	=head1 NAME
	3
	4	IO::Compress::Zlib::FAQ -- Frequently Asked Questions about IO::Compress::Zlib
	5
	6	=head1 DESCRIPTION
	7
	8	Common questions answered.
	9
	10	=head2 Compatibility with Unix compress/uncompress.
	11
	12	This module is not compatible with Unix C<compress>.
	13
	14	If you have the C<uncompress> program available, you can use this to read
	15	compressed files
	16
	17	open F, "uncompress -c $filename \|";
	18	while (<F>)
	19	{
	20	...
	21
	22	Alternatively, if you have the C<gunzip> program available, you can use
	23	this to read compressed files
	24
	25	open F, "gunzip -c $filename \|";
	26	while (<F>)
	27	{
	28	...
	29
	30	and this to write compress files, if you have the C<compress> program
	31	available
	32
	33	open F, "\| compress -c $filename ";
	34	print F "data";
	35	...
	36	close F ;
	37
	38	=head2 Accessing .tar.Z files
	39
	40	See previous FAQ item.
	41
	42	If the C<Archive::Tar> module is installed and either the C<uncompress> or
	43	C<gunzip> programs are available, you can use one of these workarounds to
	44	read C<.tar.Z> files.
	45
	46	Firstly with C<uncompress>
	47
	48	use strict;
	49	use warnings;
	50	use Archive::Tar;
	51
	52	open F, "uncompress -c $filename \|";
	53	my $tar = Archive::Tar->new(*F);
	54	...
	55
	56	and this with C<gunzip>
	57
	58	use strict;
	59	use warnings;
	60	use Archive::Tar;
	61
	62	open F, "gunzip -c $filename \|";
	63	my $tar = Archive::Tar->new(*F);
	64	...
65
66	Similarly, if the C<compress> program is available, you can use this to
67	write a C<.tar.Z> file
68
69	use strict;
70	use warnings;
71	use Archive::Tar;
72	use IO::File;
73
74	my $fh = new IO::File "\| compress -c >$filename";
75	my $tar = Archive::Tar->new();
76	...
77	$tar->write($fh);
78	$fh->close ;
79
80	=head2 Accessing Zip Files
81
82	This module provides support for reading/writing zip files using the
83	C<IO::Compress::Zip> and C<IO::Uncompress::Unzip> modules.
84
85	The primary focus of the C<IO::Compress::Zip> and C<IO::Uncompress::Unzip>
86	modules is to provide an C<IO::File> compatible streaming read/write
87	interface to zip files/buffers. They are not fully flegged archivers. If
88	you are looking for an archiver check out the C<Archive::Zip> module. You
89	can find it on CPAN at
90
91	http://www.cpan.org/modules/by-module/Archive/Archive-Zip-*.tar.gz
92
93	=head2 Compressed files and Net::FTP
94
95	The C<Net::FTP> module provides two low-level methods called C<stor> and
96	C<retr> that both return filehandles. These filehandles can used with the
97	C<IO::Compress/Uncompress> modules to compress or uncompress files read
98	from or written to an FTP Server on the fly, without having to create a
99	temporary file.
100
101	Firstly, here is code that uses C<retr> to uncompressed a file as it is
102	read from the FTP Server.
103
104	use Net::FTP;
105	use IO::Uncompress::Gunzip qw(:all);
106
107	my $ftp = new Net::FTP ...
108
109	my $retr_fh = $ftp->retr($compressed_filename);
110	gunzip $retr_fh => $outFilename, AutoClose => 1
111	or die "Cannot uncompress '$compressed_file': $GunzipError\n";
112
113	and this to compress a file as it is written to the FTP Server
114
115	use Net::FTP;
116	use IO::Compress::Gzip qw(:all);
117
118	my $stor_fh = $ftp->stor($filename);
119	gzip "filename" => $stor_fh, AutoClose => 1
120	or die "Cannot compress '$filename': $GzipError\n";
121
122	=head2 How do I recompress using a different compression?
123
124	This is easier that you might expect if you realise that all the
125	C<IO::Compress::*> objects are derived from C<IO::File> and that all the
126	C<IO::Uncompress::*> modules can read from an C<IO::File> filehandle.
127
128	So, for example, say you have a file compressed with gzip that you want to
129	recompress with bzip2. Here is all that is needed to carry out the
130	recompression.
131
132	use IO::Uncompress::Gunzip ':all';
133	use IO::Compress::Bzip2 ':all';
134
135	my $gzipFile = "somefile.gz";
136	my $bzipFile = "somefile.bz2";
137
138	my $gunzip = new IO::Uncompress::Gunzip $gzipFile
139	or die "Cannot gunzip $gzipFile: $GunzipError\n" ;
140
141	bzip2 $gunzip => $bzipFile
142	or die "Cannot bzip2 to $bzipFile: $Bzip2Error\n" ;
143
144	Note, there is a limitation of this technique. Some compression file
145	formats store extra information along with the compressed data payload. For
146	example, gzip can optionally store the original filename and Zip stores a
147	lot of information about the original file. If the original compressed file
148	contains any of this extra information, it will not be transferred to the
149	new compressed file usign the technique above.
150
151	=head2 Apache::GZip Revisited
152
153	Below is a mod_perl Apache compression module, called C<Apache::GZip>,
154	taken from
155	F<http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression>
156
157	package Apache::GZip;
158	#File: Apache::GZip.pm
159
160	use strict vars;
161	use Apache::Constants ':common';
162	use Compress::Zlib;
163	use IO::File;
164	use constant GZIP_MAGIC => 0x1f8b;
165	use constant OS_MAGIC => 0x03;
166
167	sub handler {
168	my $r = shift;
169	my ($fh,$gz);
170	my $file = $r->filename;
171	return DECLINED unless $fh=IO::File->new($file);
172	$r->header_out('Content-Encoding'=>'gzip');
173	$r->send_http_header;
174	return OK if $r->header_only;
175
176	tie *STDOUT,'Apache::GZip',$r;
177	print($_) while <$fh>;
178	untie *STDOUT;
179	return OK;
180	}
181
182	sub TIEHANDLE {
183	my($class,$r) = @_;
184	# initialize a deflation stream
185	my $d = deflateInit(-WindowBits=>-MAX_WBITS()) \|\| return undef;
186
187	# gzip header -- don't ask how I found out
188	$r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));
189
190	return bless { r => $r,
191	crc => crc32(undef),
192	d => $d,
193	l => 0
194	},$class;
195	}
196
197	sub PRINT {
198	my $self = shift;
199	foreach (@_) {
200	# deflate the data
201	my $data = $self->{d}->deflate($_);
202	$self->{r}->print($data);
203	# keep track of its length and crc
204	$self->{l} += length($_);
205	$self->{crc} = crc32($_,$self->{crc});
206	}
207	}
208
209	sub DESTROY {
210	my $self = shift;
211
212	# flush the output buffers
213	my $data = $self->{d}->flush;
214	$self->{r}->print($data);
215
216	# print the CRC and the total length (uncompressed)
217	$self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
218	}
219
220	1;
221
222	Here's the Apache configuration entry you'll need to make use of it. Once
223	set it will result in everything in the /compressed directory will be
224	compressed automagically.
225
226	<Location /compressed>
227	SetHandler perl-script
228	PerlHandler Apache::GZip
229	</Location>
230
231	Although at first sight there seems to be quite a lot going on in
232	C<Apache::GZip>, you could sum up what the code was doing as follows --
233	read the contents of the file in C<< $r->filename >>, compress it and write
234	the compressed data to standard output. That's all.
235
236	This code has to jump through a few hoops to achieve this because
237
238	=over
239
240	=item 1.
241
242	The gzip support in C<Compress::Zlib> version 1.x can only work with a real
243	filesystem filehandle. The filehandles used by Apache modules are not
244	associated with the filesystem.
245
246	=item 2.
247
248	That means all the gzip support has to be done by hand - in this case by
249	creating a tied filehandle to deal with creating the gzip header and
250	trailer.
251
252	=back
253
254	C<IO::Compress::Gzip> doesn't have that filehandle limitation (this was one
255	of the reasons for writing it in the first place). So if
256	C<IO::Compress::Gzip> is used instead of C<Compress::Zlib> the whole tied
257	filehandle code can be removed. Here is the rewritten code.
258
259	package Apache::GZip;
260
261	use strict vars;
262	use Apache::Constants ':common';
263	use IO::Compress::Gzip;
264	use IO::File;
265
266	sub handler {
267	my $r = shift;
268	my ($fh,$gz);
269	my $file = $r->filename;
270	return DECLINED unless $fh=IO::File->new($file);
271	$r->header_out('Content-Encoding'=>'gzip');
272	$r->send_http_header;
273	return OK if $r->header_only;
274
275	my $gz = new IO::Compress::Gzip '-', Minimal => 1
276	or return DECLINED ;
277
278	print $gz $_ while <$fh>;
279
280	return OK;
281	}
282
283	or even more succinctly, like this, using a one-shot gzip
284
285	package Apache::GZip;
286
287	use strict vars;
288	use Apache::Constants ':common';
289	use IO::Compress::Gzip qw(gzip);
290
291	sub handler {
292	my $r = shift;
293	$r->header_out('Content-Encoding'=>'gzip');
294	$r->send_http_header;
295	return OK if $r->header_only;
296
297	gzip $r->filename => '-', Minimal => 1
298	or return DECLINED ;
299
300	return OK;
301	}
302
303	1;
304
305	The use of one-shot C<gzip> above just reads from C<< $r->filename >> and
306	writes the compressed data to standard output.
307
308	Note the use of the C<Minimal> option in the code above. When using gzip
309	for Content-Encoding you should I<always> use this option. In the example
310	above it will prevent the filename being included in the gzip header and
311	make the size of the gzip data stream a slight bit smaller.
312
313	=head2 Using C<InputLength> to uncompress data embedded in a larger file/buffer.
314
315	A fairly common use-case is where compressed data is embedded in a larger
316	file/buffer and you want to read both.
317
318	As an example consider the structure of a zip file. This is a well-defined
319	file format that mixes both compressed and uncompressed sections of data in
320	a single file.
321
322	For the purposes of this discussion you can think of a zip file as sequence
323	of compressed data streams, each of which is prefixed by an uncompressed
324	local header. The local header contains information about the compressed
325	data stream, including the name of the compressed file and, in particular,
326	the length of the compressed data stream.
327
328	To illustrate how to use C<InputLength> here is a script that walks a zip
329	file and prints out how many lines are in each compressed file (if you
330	intend write code to walking through a zip file for real see
331	L<IO::Uncompress::Unzip/"Walking through a zip file"> )
332
333	use strict;
334	use warnings;
335
336	use IO::File;
337	use IO::Uncompress::RawInflate qw(:all);
338
339	use constant ZIP_LOCAL_HDR_SIG => 0x04034b50;
340	use constant ZIP_LOCAL_HDR_LENGTH => 30;
341
342	my $file = $ARGV[0] ;
343
344	my $fh = new IO::File "<$file"
345	or die "Cannot open '$file': $!\n";
346
347	while (1)
348	{
349	my $sig;
350	my $buffer;
351
352	my $x ;
353	($x = $fh->read($buffer, ZIP_LOCAL_HDR_LENGTH)) == ZIP_LOCAL_HDR_LENGTH
354	or die "Truncated file: $!\n";
355
356	my $signature = unpack ("V", substr($buffer, 0, 4));
357
358	last unless $signature == ZIP_LOCAL_HDR_SIG;
359
360	# Read Local Header
361	my $gpFlag = unpack ("v", substr($buffer, 6, 2));
362	my $compressedMethod = unpack ("v", substr($buffer, 8, 2));
363	my $compressedLength = unpack ("V", substr($buffer, 18, 4));
364	my $uncompressedLength = unpack ("V", substr($buffer, 22, 4));
365	my $filename_length = unpack ("v", substr($buffer, 26, 2));
366	my $extra_length = unpack ("v", substr($buffer, 28, 2));
367
368	my $filename ;
369	$fh->read($filename, $filename_length) == $filename_length
370	or die "Truncated file\n";
371
372	$fh->read($buffer, $extra_length) == $extra_length
373	or die "Truncated file\n";
374
375	if ($compressedMethod != 8 && $compressedMethod != 0)
376	{
377	warn "Skipping file '$filename' - not deflated $compressedMethod\n";
378	$fh->read($buffer, $compressedLength) == $compressedLength
379	or die "Truncated file\n";
380	next;
381	}
382
383	if ($compressedMethod == 0 && $gpFlag & 8 == 8)
384	{
385	die "Streamed Stored not supported for '$filename'\n";
386	}
387
388	next if $compressedLength == 0;
389
390	# Done reading the Local Header
391
392	my $inf = new IO::Uncompress::RawInflate $fh,
393	Transparent => 1,
394	InputLength => $compressedLength
395	or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;
396
397	my $line_count = 0;
398
399	while (<$inf>)
400	{
401	++ $line_count;
402	}
403
404	print "$filename: $line_count\n";
405	}
406
407	The majority of the code above is concerned with reading the zip local
408	header data. The code that I want to focus on is at the bottom.
409
410	while (1) {
411
412	# read local zip header data
413	# get $filename
414	# get $compressedLength
415
416	my $inf = new IO::Uncompress::RawInflate $fh,
417	Transparent => 1,
418	InputLength => $compressedLength
419	or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;
420
421	my $line_count = 0;
422
423	while (<$inf>)
424	{
425	++ $line_count;
426	}
427
428	print "$filename: $line_count\n";
429	}
430
431	The call to C<IO::Uncompress::RawInflate> creates a new filehandle C<$inf>
432	that can be used to read from the parent filehandle C<$fh>, uncompressing
433	it as it goes. The use of the C<InputLength> option will guarantee that
434	I<at most> C<$compressedLength> bytes of compressed data will be read from
435	the C<$fh> filehandle (The only exception is for an error case like a
436	truncated file or a corrupt data stream).
437
438	This means that once RawInflate is finished C<$fh> will be left at the
439	byte directly after the compressed data stream.
440
441	Now consider what the code looks like without C<InputLength>
442
443	while (1) {
444
445	# read local zip header data
446	# get $filename
447	# get $compressedLength
448
449	# read all the compressed data into $data
450	read($fh, $data, $compressedLength);
451
452	my $inf = new IO::Uncompress::RawInflate \$data,
453	Transparent => 1,
454	or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;
455
456	my $line_count = 0;
457
458	while (<$inf>)
459	{
460	++ $line_count;
461	}
462
463	print "$filename: $line_count\n";
464	}
465
466	The difference here is the addition of the temporary variable C<$data>.
467	This is used to store a copy of the compressed data while it is being
468	uncompressed.
469
470	If you know that C<$compressedLength> isn't that big then using temporary
471	storage won't be a problem. But if C<$compressedLength> is very large or
472	you are writing an application that other people will use, and so have no
473	idea how big C<$compressedLength> will be, it could be an issue.
474
475	Using C<InputLength> avoids the use of temporary storage and means the
476	application can cope with large compressed data streams.
477
478	One final point -- obviously C<InputLength> can only be used whenever you
479	know the length of the compressed data beforehand, like here with a zip
480	file.
481
482	=head1 SEE ALSO
483
484	L<Compress::Zlib>, L<IO::Compress::Gzip>, L<IO::Uncompress::Gunzip>, L<IO::Compress::Deflate>, L<IO::Uncompress::Inflate>, L<IO::Compress::RawDeflate>, L<IO::Uncompress::RawInflate>, L<IO::Compress::Bzip2>, L<IO::Uncompress::Bunzip2>, L<IO::Compress::Lzop>, L<IO::Uncompress::UnLzop>, L<IO::Compress::Lzf>, L<IO::Uncompress::UnLzf>, L<IO::Uncompress::AnyInflate>, L<IO::Uncompress::AnyUncompress>
485
486	L<Compress::Zlib::FAQ\|Compress::Zlib::FAQ>
487
488	L<File::GlobMapper\|File::GlobMapper>, L<Archive::Zip\|Archive::Zip>,
489	L<Archive::Tar\|Archive::Tar>,
490	L<IO::Zlib\|IO::Zlib>
491
492	=head1 AUTHOR
493
494	This module was written by Paul Marquess, F<pmqs@cpan.org>.
495
496	=head1 MODIFICATION HISTORY
497
498	See the Changes file.
499
500	=head1 COPYRIGHT AND LICENSE
501
502	Copyright (c) 2005-2008 Paul Marquess. All rights reserved.
503
504	This program is free software; you can redistribute it and/or
505	modify it under the same terms as Perl itself.
506