1 ### the gnu tar specification:
2 ### http://www.gnu.org/software/tar/manual/tar.html
4 ### and the pax format spec, which tar derives from:
5 ### http://www.opengroup.org/onlinepubs/007904975/utilities/pax.html
13 use Carp qw(carp croak);
15 use File::Spec::Unix ();
18 use Archive::Tar::File;
19 use Archive::Tar::Constant;
24 use vars qw[$DEBUG $error $VERSION $WARN $FOLLOW_SYMLINK $CHOWN $CHMOD
25 $DO_NOT_USE_PREFIX $HAS_PERLIO $HAS_IO_STRING
26 $INSECURE_EXTRACT_MODE @ISA @EXPORT
30 @EXPORT = qw[ COMPRESS_GZIP COMPRESS_BZIP ];
37 $DO_NOT_USE_PREFIX = 0;
38 $INSECURE_EXTRACT_MODE = 0;
42 $HAS_PERLIO = $Config::Config{useperlio};
44 ### try and load IO::String anyway, so you can dynamically
45 ### switch between perlio and IO::String
46 $HAS_IO_STRING = eval {
55 Archive::Tar - module for manipulations of tar archives
60 my $tar = Archive::Tar->new;
62 $tar->read('origin.tgz');
65 $tar->add_files('file/foo.pl', 'docs/README');
66 $tar->add_data('file/baz.txt', 'This is the contents now');
68 $tar->rename('oldname', 'new/file/name');
70 $tar->write('files.tar'); # plain tar
71 $tar->write('files.tgz', COMPRESS_GZIP); # gzip compressed
72 $tar->write('files.tbz', COMPRESS_BZIP); # bzip2 compressed
76 Archive::Tar provides an object oriented mechanism for handling tar
77 files. It provides class methods for quick and easy files handling
78 while also allowing for the creation of tar file objects for custom
79 manipulation. If you have the IO::Zlib module installed,
80 Archive::Tar will also support compressed or gzipped tar files.
82 An object of class Archive::Tar represents a .tar(.gz) archive full
87 =head2 Archive::Tar->new( [$file, $compressed] )
89 Returns a new Tar object. If given any arguments, C<new()> calls the
90 C<read()> method automatically, passing on the arguments provided to
93 If C<new()> is invoked with arguments and the C<read()> method fails
94 for any reason, C<new()> returns undef.
103 ### install get/set accessors for this object.
104 for my $key ( keys %$tmpl ) {
106 *{__PACKAGE__."::$key"} = sub {
108 $self->{$key} = $_[0] if @_;
109 return $self->{$key};
115 $class = ref $class if ref $class;
117 ### copying $tmpl here since a shallow copy makes it use the
118 ### same aref, causing for files to remain in memory always.
119 my $obj = bless { _data => [ ], _file => 'Unknown' }, $class;
122 unless ( $obj->read( @_ ) ) {
123 $obj->_error(qq[No data could be read from file]);
131 =head2 $tar->read ( $filename|$handle, [$compressed, {opt => 'val'}] )
133 Read the given tar file into memory.
134 The first argument can either be the name of a file or a reference to
135 an already open filehandle (or an IO::Zlib object if it's compressed)
137 The C<read> will I<replace> any previous content in C<$tar>!
139 The second argument may be considered optional, but remains for
140 backwards compatibility. Archive::Tar now looks at the file
141 magic to determine what class should be used to open the file
142 and will transparently Do The Right Thing.
144 Archive::Tar will warn if you try to pass a bzip2 compressed file and the
145 IO::Zlib / IO::Uncompress::Bunzip2 modules are not available and simply return.
147 Note that you can currently B<not> pass a C<gzip> compressed
148 filehandle, which is not opened with C<IO::Zlib>, a C<bzip2> compressed
149 filehandle, which is not opened with C<IO::Uncompress::Bunzip2>, nor a string
150 containing the full archive information (either compressed or
151 uncompressed). These are worth while features, but not currently
152 implemented. See the C<TODO> section.
154 The third argument can be a hash reference with options. Note that
155 all options are case-sensitive.
161 Do not read more than C<limit> files. This is useful if you have
162 very big archives, and are only interested in the first few files.
166 Can be set to a regular expression. Only files with names that match
167 the expression will be read.
171 If set to true, immediately extract entries when reading them. This
172 gives you the same memory break as the C<extract_archive> function.
173 Note however that entries will not be read into memory, but written
174 straight to disk. This means no C<Archive::Tar::File> objects are
175 created for you to inspect.
179 All files are stored internally as C<Archive::Tar::File> objects.
180 Please consult the L<Archive::Tar::File> documentation for details.
182 Returns the number of files read in scalar context, and a list of
183 C<Archive::Tar::File> objects in list context.
190 my $gzip = shift || 0;
191 my $opts = shift || {};
193 unless( defined $file ) {
194 $self->_error( qq[No file to read from!] );
197 $self->_file( $file );
200 my $handle = $self->_get_handle($file, $gzip, READ_ONLY->( ZLIB ) )
203 my $data = $self->_read_tar( $handle, $opts ) or return;
205 $self->_data( $data );
207 return wantarray ? @$data : scalar @$data;
212 my $file = shift; return unless defined $file;
213 return $file if ref $file;
214 my $compress = shift || 0;
215 my $mode = shift || READ_ONLY->( ZLIB ); # default to read only
218 ### get a FH opened to the right class, so we can use it transparently
219 ### throughout the program
221 { ### reading magic only makes sense if we're opening a file for
222 ### reading. otherwise, just use what the user requested.
224 if( MODE_READ->($mode) ) {
225 open my $tmp, $file or do {
226 $self->_error( qq[Could not open '$file' for reading: $!] );
230 ### read the first 4 bites of the file to figure out which class to
231 ### use to open the file.
232 sysread( $tmp, $magic, 4 );
237 ### if you asked specifically for bzip compression, or if we're in
238 ### read mode and the magic numbers add up, use bzip
240 ($compress eq COMPRESS_BZIP) or
241 ( MODE_READ->($mode) and $magic =~ BZIP_MAGIC_NUM )
245 ### different reader/writer modules, different error vars... sigh
246 if( MODE_READ->($mode) ) {
247 $fh = IO::Uncompress::Bunzip2->new( $file ) or do {
248 $self->_error( qq[Could not read '$file': ] .
249 $IO::Uncompress::Bunzip2::Bunzip2Error
255 $fh = IO::Compress::Bzip2->new( $file ) or do {
256 $self->_error( qq[Could not write to '$file': ] .
257 $IO::Compress::Bzip2::Bzip2Error
264 ### if you asked for compression, if you wanted to read or the gzip
265 ### magic number is present (redundant with read)
267 $compress or MODE_READ->($mode) or $magic =~ GZIP_MAGIC_NUM
272 unless( $fh->open( $file, $mode ) ) {
273 $self->_error(qq[Could not create filehandle for '$file': $!]);
281 unless( $fh->open( $file, $mode ) ) {
282 $self->_error(qq[Could not create filehandle for '$file': $!]);
286 ### enable bin mode on tar archives
297 my $handle = shift or return;
298 my $opts = shift || {};
300 my $count = $opts->{limit} || 0;
301 my $filter = $opts->{filter};
302 my $extract = $opts->{extract} || 0;
304 ### set a cap on the amount of files to extract ###
306 $limit = 1 if $count > 0;
311 my $real_name; # to set the name of a file when
312 # we're encountering @longlink
316 while( $handle->read( $chunk, HEAD ) ) {
317 ### IO::Zlib doesn't support this yet
318 my $offset = eval { tell $handle } || 'unknown';
321 my $gzip = GZIP_MAGIC_NUM;
322 if( $chunk =~ /$gzip/ ) {
323 $self->_error( qq[Cannot read compressed format in tar-mode] );
327 ### size is < HEAD, which means a corrupted file, as the minimum
328 ### length is _at least_ HEAD
329 if (length $chunk != HEAD) {
330 $self->_error( qq[Cannot read enough bytes from the tarfile] );
335 ### if we can't read in all bytes... ###
336 last if length $chunk != HEAD;
338 ### Apparently this should really be two blocks of 512 zeroes,
339 ### but GNU tar sometimes gets it wrong. See comment in the
340 ### source code (tar.c) to GNU cpio.
341 next if $chunk eq TAR_END;
343 ### according to the posix spec, the last 12 bytes of the header are
344 ### null bytes, to pad it to a 512 byte block. That means if these
345 ### bytes are NOT null bytes, it's a corrrupt header. See:
346 ### www.koders.com/c/fidCE473AD3D9F835D690259D60AD5654591D91D5BA.aspx
348 { my $nulls = join '', "\0" x 12;
349 unless( $nulls eq substr( $chunk, 500, 12 ) ) {
350 $self->_error( qq[Invalid header block at offset $offset] );
355 ### pass the realname, so we can set it 'proper' right away
356 ### some of the heuristics are done on the name, so important
359 { my %extra_args = ();
360 $extra_args{'name'} = $$real_name if defined $real_name;
362 unless( $entry = Archive::Tar::File->new( chunk => $chunk,
365 $self->_error( qq[Couldn't read chunk at offset $offset] );
371 ### http://www.gnu.org/manual/tar/html_node/tar_139.html
372 next if $entry->is_label;
374 if( length $entry->type and ($entry->is_file || $entry->is_longlink) ) {
376 if ( $entry->is_file && !$entry->validate ) {
377 ### sometimes the chunk is rather fux0r3d and a whole 512
378 ### bytes ends up in the ->name area.
379 ### clean it up, if need be
380 my $name = $entry->name;
381 $name = substr($name, 0, 100) if length $name > 100;
384 $self->_error( $name . qq[: checksum error] );
388 my $block = BLOCK_SIZE->( $entry->size );
390 $data = $entry->get_content_by_ref;
392 ### just read everything into memory
393 ### can't do lazy loading since IO::Zlib doesn't support 'seek'
394 ### this is because Compress::Zlib doesn't support it =/
395 ### this reads in the whole data in one read() call.
396 if( $handle->read( $$data, $block ) < $block ) {
397 $self->_error( qq[Read error on tarfile (missing data) '].
398 $entry->full_path ."' at offset $offset" );
402 ### throw away trailing garbage ###
403 substr ($$data, $entry->size) = "" if defined $$data;
405 ### part II of the @LongLink munging -- need to do /after/
406 ### the checksum check.
407 if( $entry->is_longlink ) {
408 ### weird thing in tarfiles -- if the file is actually a
409 ### @LongLink, the data part seems to have a trailing ^@
410 ### (unprintable) char. to display, pipe output through less.
411 ### but that doesn't *always* happen.. so check if the last
412 ### character is a control character, and if so remove it
413 ### at any rate, we better remove that character here, or tests
414 ### like 'eq' and hashlook ups based on names will SO not work
415 ### remove it by calculating the proper size, and then
416 ### tossing out everything that's longer than that size.
418 ### count number of nulls
419 my $nulls = $$data =~ tr/\0/\0/;
421 ### cut data + size by that many bytes
422 $entry->size( $entry->size - $nulls );
423 substr ($$data, $entry->size) = "";
427 ### clean up of the entries.. posix tar /apparently/ has some
428 ### weird 'feature' that allows for filenames > 255 characters
429 ### they'll put a header in with as name '././@LongLink' and the
430 ### contents will be the name of the /next/ file in the archive
431 ### pretty crappy and kludgy if you ask me
433 ### set the name for the next entry if this is a @LongLink;
434 ### this is one ugly hack =/ but needed for direct extraction
435 if( $entry->is_longlink ) {
438 } elsif ( defined $real_name ) {
439 $entry->name( $$real_name );
444 ### skip this entry if we're filtering
445 if ($filter && $entry->name !~ $filter) {
448 ### skip this entry if it's a pax header. This is a special file added
449 ### by, among others, git-generated tarballs. It holds comments and is
450 ### not meant for extracting. See #38932: pax_global_header extracted
451 } elsif ( $entry->name eq PAX_HEADER ) {
455 $self->_extract_file( $entry ) if $extract
456 && !$entry->is_longlink
457 && !$entry->is_unknown
458 && !$entry->is_label;
460 ### Guard against tarfiles with garbage at the end
461 last LOOP if $entry->name eq '';
463 ### push only the name on the rv if we're extracting
464 ### -- for extract_archive
465 push @$tarfile, ($extract ? $entry->name : $entry);
468 $count-- unless $entry->is_longlink || $entry->is_dir;
469 last LOOP unless $count;
478 =head2 $tar->contains_file( $filename )
480 Check if the archive contains a certain file.
481 It will return true if the file is in the archive, false otherwise.
483 Note however, that this function does an exact match using C<eq>
484 on the full path. So it cannot compensate for case-insensitive file-
485 systems or compare 2 paths to see if they would point to the same
494 return unless defined $full;
496 ### don't warn if the entry isn't there.. that's what this function
497 ### is for after all.
499 return 1 if $self->_find_entry($full);
503 =head2 $tar->extract( [@filenames] )
505 Write files whose names are equivalent to any of the names in
506 C<@filenames> to disk, creating subdirectories as necessary. This
507 might not work too well under VMS.
508 Under MacPerl, the file's modification time will be converted to the
509 MacOS zero of time, and appropriate conversions will be done to the
510 path. However, the length of each element of the path is not
511 inspected to see whether it's longer than MacOS currently allows (32
514 If C<extract> is called without a list of file names, the entire
515 contents of the archive are extracted.
517 Returns a list of filenames extracted.
526 # use the speed optimization for all extracted files
527 local($self->{cwd}) = cwd() unless $self->{cwd};
529 ### you requested the extraction of only certian files
531 for my $file ( @args ) {
533 ### it's already an object?
534 if( UNIVERSAL::isa( $file, 'Archive::Tar::File' ) ) {
542 for my $entry ( @{$self->_data} ) {
543 next unless $file eq $entry->full_path;
545 ### we found the file you're looking for
551 return $self->_error(
552 qq[Could not find '$file' in archive] );
557 ### just grab all the file items
559 @files = $self->get_files;
562 ### nothing found? that's an error
563 unless( scalar @files ) {
564 $self->_error( qq[No files found for ] . $self->_file );
569 for my $entry ( @files ) {
570 unless( $self->_extract_file( $entry ) ) {
571 $self->_error(q[Could not extract ']. $entry->full_path .q['] );
579 =head2 $tar->extract_file( $file, [$extract_path] )
581 Write an entry, whose name is equivalent to the file name provided to
582 disk. Optionally takes a second parameter, which is the full native
583 path (including filename) the entry will be written to.
587 $tar->extract_file( 'name/in/archive', 'name/i/want/to/give/it' );
589 $tar->extract_file( $at_file_object, 'name/i/want/to/give/it' );
591 Returns true on success, false on failure.
597 my $file = shift; return unless defined $file;
600 my $entry = $self->_find_entry( $file )
601 or $self->_error( qq[Could not find an entry for '$file'] ), return;
603 return $self->_extract_file( $entry, $alt );
608 my $entry = shift or return;
611 ### you wanted an alternate extraction location ###
612 my $name = defined $alt ? $alt : $entry->full_path;
614 ### splitpath takes a bool at the end to indicate
615 ### that it's splitting a dir
616 my ($vol,$dirs,$file);
617 if ( defined $alt ) { # It's a local-OS path
618 ($vol,$dirs,$file) = File::Spec->splitpath( $alt,
621 ($vol,$dirs,$file) = File::Spec::Unix->splitpath( $name,
626 ### is $name an absolute path? ###
627 if( $vol || File::Spec->file_name_is_absolute( $dirs ) ) {
629 ### absolute names are not allowed to be in tarballs under
630 ### strict mode, so only allow it if a user tells us to do it
631 if( not defined $alt and not $INSECURE_EXTRACT_MODE ) {
633 q[Entry ']. $entry->full_path .q[' is an absolute path. ].
634 q[Not extracting absolute paths under SECURE EXTRACT MODE]
639 ### user asked us to, it's fine.
640 $dir = File::Spec->catpath( $vol, $dirs, "" );
642 ### it's a relative path ###
644 my $cwd = (ref $self and defined $self->{cwd})
648 my @dirs = defined $alt
649 ? File::Spec->splitdir( $dirs ) # It's a local-OS path
650 : File::Spec::Unix->splitdir( $dirs ); # it's UNIX-style, likely
651 # straight from the tarball
653 if( not defined $alt and
654 not $INSECURE_EXTRACT_MODE
657 ### paths that leave the current directory are not allowed under
658 ### strict mode, so only allow it if a user tells us to do this.
659 if( grep { $_ eq '..' } @dirs ) {
662 q[Entry ']. $entry->full_path .q[' is attempting to leave ].
663 q[the current working directory. Not extracting under ].
664 q[SECURE EXTRACT MODE]
669 ### the archive may be asking us to extract into a symlink. This
670 ### is not sane and a possible security issue, as outlined here:
671 ### https://rt.cpan.org/Ticket/Display.html?id=30380
672 ### https://bugzilla.redhat.com/show_bug.cgi?id=295021
673 ### https://issues.rpath.com/browse/RPL-1716
674 my $full_path = $cwd;
675 for my $d ( @dirs ) {
676 $full_path = File::Spec->catdir( $full_path, $d );
678 ### we've already checked this one, and it's safe. Move on.
679 next if ref $self and $self->{_link_cache}->{$full_path};
681 if( -l $full_path ) {
682 my $to = readlink $full_path;
683 my $diag = "symlinked directory ($full_path => $to)";
686 q[Entry ']. $entry->full_path .q[' is attempting to ].
687 qq[extract to a $diag. This is considered a security ].
688 q[vulnerability and not allowed under SECURE EXTRACT ].
694 ### XXX keep a cache if possible, so the stats become cheaper:
695 $self->{_link_cache}->{$full_path} = 1 if ref $self;
699 ### '.' is the directory delimiter on VMS, which has to be escaped
700 ### or changed to '_' on vms. vmsify is used, because older versions
701 ### of vmspath do not handle this properly.
702 ### Must not add a '/' to an empty directory though.
703 map { length() ? VMS::Filespec::vmsify($_.'/') : $_ } @dirs if ON_VMS;
705 my ($cwd_vol,$cwd_dir,$cwd_file)
706 = File::Spec->splitpath( $cwd );
707 my @cwd = File::Spec->splitdir( $cwd_dir );
708 push @cwd, $cwd_file if length $cwd_file;
710 ### We need to pass '' as the last elemant to catpath. Craig Berry
711 ### explains why (msgid <p0624083dc311ae541393@[172.16.52.1]>):
712 ### The root problem is that splitpath on UNIX always returns the
713 ### final path element as a file even if it is a directory, and of
714 ### course there is no way it can know the difference without checking
715 ### against the filesystem, which it is documented as not doing. When
716 ### you turn around and call catpath, on VMS you have to know which bits
717 ### are directory bits and which bits are file bits. In this case we
718 ### know the result should be a directory. I had thought you could omit
719 ### the file argument to catpath in such a case, but apparently on UNIX
721 $dir = File::Spec->catpath(
722 $cwd_vol, File::Spec->catdir( @cwd, @dirs ), ''
725 ### catdir() returns undef if the path is longer than 255 chars on
726 ### older VMS systems.
727 unless ( defined $dir ) {
728 $^W && $self->_error( qq[Could not compose a path for '$dirs'\n] );
734 if( -e $dir && !-d _ ) {
735 $^W && $self->_error( qq['$dir' exists, but it's not a directory!\n] );
740 eval { File::Path::mkpath( $dir, 0, 0777 ) };
742 my $fp = $entry->full_path;
743 $self->_error(qq[Could not create directory '$dir' for '$fp': $@]);
747 ### XXX chown here? that might not be the same as in the archive
748 ### as we're only chown'ing to the owner of the file we're extracting
749 ### not to the owner of the directory itself, which may or may not
750 ### be another entry in the archive
751 ### Answer: no, gnu tar doesn't do it either, it'd be the wrong
753 #if( $CHOWN && CAN_CHOWN ) {
754 # chown $entry->uid, $entry->gid, $dir or
755 # $self->_error( qq[Could not set uid/gid on '$dir'] );
759 ### we're done if we just needed to create a dir ###
760 return 1 if $entry->is_dir;
762 my $full = File::Spec->catfile( $dir, $file );
764 if( $entry->is_unknown ) {
765 $self->_error( qq[Unknown file type for file '$full'] );
769 if( length $entry->type && $entry->is_file ) {
770 my $fh = IO::File->new;
771 $fh->open( '>' . $full ) or (
772 $self->_error( qq[Could not open file '$full': $!] ),
778 syswrite $fh, $entry->data or (
779 $self->_error( qq[Could not write data to '$full'] ),
785 $self->_error( qq[Could not close file '$full'] ),
790 $self->_make_special_file( $entry, $full ) or return;
793 ### only update the timestamp if it's not a symlink; that will change the
794 ### timestamp of the original. This addresses bug #33669: Could not update
795 ### timestamp warning on symlinks
797 utime time, $entry->mtime - TIME_OFFSET, $full or
798 $self->_error( qq[Could not update timestamp] );
801 if( $CHOWN && CAN_CHOWN->() ) {
802 chown $entry->uid, $entry->gid, $full or
803 $self->_error( qq[Could not set uid/gid on '$full'] );
806 ### only chmod if we're allowed to, but never chmod symlinks, since they'll
807 ### change the perms on the file they're linking too...
808 if( $CHMOD and not -l $full ) {
809 chmod $entry->mode, $full or
810 $self->_error( qq[Could not chown '$full' to ] . $entry->mode );
816 sub _make_special_file {
818 my $entry = shift or return;
819 my $file = shift; return unless defined $file;
823 if( $entry->is_symlink ) {
826 symlink( $entry->linkname, $file ) or $fail++;
829 $self->_extract_special_file_as_plain_file( $entry, $file )
833 $err = qq[Making symbolic link '$file' to '] .
834 $entry->linkname .q[' failed] if $fail;
836 } elsif ( $entry->is_hardlink ) {
839 link( $entry->linkname, $file ) or $fail++;
842 $self->_extract_special_file_as_plain_file( $entry, $file )
846 $err = qq[Making hard link from '] . $entry->linkname .
847 qq[' to '$file' failed] if $fail;
849 } elsif ( $entry->is_fifo ) {
850 ON_UNIX && !system('mknod', $file, 'p') or
851 $err = qq[Making fifo ']. $entry->name .qq[' failed];
853 } elsif ( $entry->is_blockdev or $entry->is_chardev ) {
854 my $mode = $entry->is_blockdev ? 'b' : 'c';
856 ON_UNIX && !system('mknod', $file, $mode,
857 $entry->devmajor, $entry->devminor) or
858 $err = qq[Making block device ']. $entry->name .qq[' (maj=] .
859 $entry->devmajor . qq[ min=] . $entry->devminor .
862 } elsif ( $entry->is_socket ) {
863 ### the original doesn't do anything special for sockets.... ###
867 return $err ? $self->_error( $err ) : 1;
870 ### don't know how to make symlinks, let's just extract the file as
872 sub _extract_special_file_as_plain_file {
874 my $entry = shift or return;
875 my $file = shift; return unless defined $file;
879 my $orig = $self->_find_entry( $entry->linkname );
882 $err = qq[Could not find file '] . $entry->linkname .
887 ### clone the entry, make it appear as a normal file ###
888 my $clone = $entry->clone;
889 $clone->_downgrade_to_plainfile;
890 $self->_extract_file( $clone, $file ) or last TRY;
895 return $self->_error($err);
898 =head2 $tar->list_files( [\@properties] )
900 Returns a list of the names of all the files in the archive.
902 If C<list_files()> is passed an array reference as its first argument
903 it returns a list of hash references containing the requested
904 properties of each file. The following list of properties is
905 supported: name, size, mtime (last modified date), mode, uid, gid,
906 linkname, uname, gname, devmajor, devminor, prefix.
908 Passing an array reference containing only one element, 'name', is
909 special cased to return a list of names rather than a list of hash
910 references, making it equivalent to calling C<list_files> without
917 my $aref = shift || [ ];
919 unless( $self->_data ) {
920 $self->read() or return;
923 if( @$aref == 0 or ( @$aref == 1 and $aref->[0] eq 'name' ) ) {
924 return map { $_->full_path } @{$self->_data};
928 #for my $obj ( @{$self->_data} ) {
929 # push @rv, { map { $_ => $obj->$_() } @$aref };
933 ### this does the same as the above.. just needs a +{ }
934 ### to make sure perl doesn't confuse it for a block
935 return map { my $o=$_;
936 +{ map { $_ => $o->$_() } @$aref }
945 unless( defined $file ) {
946 $self->_error( qq[No file specified] );
950 ### it's an object already
951 return $file if UNIVERSAL::isa( $file, 'Archive::Tar::File' );
953 for my $entry ( @{$self->_data} ) {
954 my $path = $entry->full_path;
955 return $entry if $path eq $file;
958 $self->_error( qq[No such file in archive: '$file'] );
962 =head2 $tar->get_files( [@filenames] )
964 Returns the C<Archive::Tar::File> objects matching the filenames
965 provided. If no filename list was passed, all C<Archive::Tar::File>
966 objects in the current Tar object are returned.
968 Please refer to the C<Archive::Tar::File> documentation on how to
969 handle these objects.
976 return @{ $self->_data } unless @_;
979 for my $file ( @_ ) {
980 push @list, grep { defined } $self->_find_entry( $file );
986 =head2 $tar->get_content( $file )
988 Return the content of the named file.
994 my $entry = $self->_find_entry( shift ) or return;
999 =head2 $tar->replace_content( $file, $content )
1001 Make the string $content be the content for the file named $file.
1005 sub replace_content {
1007 my $entry = $self->_find_entry( shift ) or return;
1009 return $entry->replace_content( shift );
1012 =head2 $tar->rename( $file, $new_name )
1014 Rename the file of the in-memory archive to $new_name.
1016 Note that you must specify a Unix path for $new_name, since per tar
1017 standard, all files in the archive must be Unix paths.
1019 Returns true on success and false on failure.
1025 my $file = shift; return unless defined $file;
1026 my $new = shift; return unless defined $new;
1028 my $entry = $self->_find_entry( $file ) or return;
1030 return $entry->rename( $new );
1033 =head2 $tar->remove (@filenamelist)
1035 Removes any entries with names matching any of the given filenames
1036 from the in-memory archive. Returns a list of C<Archive::Tar::File>
1037 objects that remain.
1045 my %seen = map { $_->full_path => $_ } @{$self->_data};
1046 delete $seen{ $_ } for @list;
1048 $self->_data( [values %seen] );
1050 return values %seen;
1055 C<clear> clears the current in-memory archive. This effectively gives
1056 you a 'blank' object, ready to be filled again. Note that C<clear>
1057 only has effect on the object, not the underlying tarfile.
1062 my $self = shift or return;
1071 =head2 $tar->write ( [$file, $compressed, $prefix] )
1073 Write the in-memory archive to disk. The first argument can either
1074 be the name of a file or a reference to an already open filehandle (a
1077 The second argument is used to indicate compression. You can either
1078 compress using C<gzip> or C<bzip2>. If you pass a digit, it's assumed
1079 to be the C<gzip> compression level (between 1 and 9), but the use of
1080 constants is prefered:
1082 # write a gzip compressed file
1083 $tar->write( 'out.tgz', COMPRESS_GZIP );
1085 # write a bzip compressed file
1086 $tar->write( 'out.tbz', COMPRESS_BZIP );
1088 Note that when you pass in a filehandle, the compression argument
1089 is ignored, as all files are printed verbatim to your filehandle.
1090 If you wish to enable compression with filehandles, use an
1091 C<IO::Zlib> or C<IO::Compress::Bzip2> filehandle instead.
1093 The third argument is an optional prefix. All files will be tucked
1094 away in the directory you specify as prefix. So if you have files
1095 'a' and 'b' in your archive, and you specify 'foo' as prefix, they
1096 will be written to the archive as 'foo/a' and 'foo/b'.
1098 If no arguments are given, C<write> returns the entire formatted
1099 archive as a string, which could be useful if you'd like to stuff the
1100 archive into a socket or a pipe to gzip or something.
1107 my $file = shift; $file = '' unless defined $file;
1108 my $gzip = shift || 0;
1109 my $ext_prefix = shift; $ext_prefix = '' unless defined $ext_prefix;
1112 ### only need a handle if we have a file to print to ###
1113 my $handle = length($file)
1114 ? ( $self->_get_handle($file, $gzip, WRITE_ONLY->($gzip) )
1116 : $HAS_PERLIO ? do { open my $h, '>', \$dummy; $h }
1117 : $HAS_IO_STRING ? IO::String->new
1118 : __PACKAGE__->no_string_support();
1120 ### Addresses: #41798: Nonempty $\ when writing a TAR file produces a
1121 ### corrupt TAR file. Must clear out $\ to make sure no garbage is
1122 ### printed to the archive
1125 for my $entry ( @{$self->_data} ) {
1126 ### entries to be written to the tarfile ###
1129 ### only now will we change the object to reflect the current state
1130 ### of the name and prefix fields -- this needs to be limited to
1132 my $clone = $entry->clone;
1135 ### so, if you don't want use to use the prefix, we'll stuff
1136 ### everything in the name field instead
1137 if( $DO_NOT_USE_PREFIX ) {
1139 ### you might have an extended prefix, if so, set it in the clone
1140 ### XXX is ::Unix right?
1141 $clone->name( length $ext_prefix
1142 ? File::Spec::Unix->catdir( $ext_prefix,
1144 : $clone->full_path );
1145 $clone->prefix( '' );
1147 ### otherwise, we'll have to set it properly -- prefix part in the
1148 ### prefix and name part in the name field.
1151 ### split them here, not before!
1152 my ($prefix,$name) = $clone->_prefix_and_file( $clone->full_path );
1154 ### you might have an extended prefix, if so, set it in the clone
1155 ### XXX is ::Unix right?
1156 $prefix = File::Spec::Unix->catdir( $ext_prefix, $prefix )
1157 if length $ext_prefix;
1159 $clone->prefix( $prefix );
1160 $clone->name( $name );
1163 ### names are too long, and will get truncated if we don't add a
1164 ### '@LongLink' file...
1165 my $make_longlink = ( length($clone->name) > NAME_LENGTH or
1166 length($clone->prefix) > PREFIX_LENGTH
1169 ### perhaps we need to make a longlink file?
1170 if( $make_longlink ) {
1171 my $longlink = Archive::Tar::File->new(
1172 data => LONGLINK_NAME,
1174 { type => LONGLINK }
1177 unless( $longlink ) {
1178 $self->_error( qq[Could not create 'LongLink' entry for ] .
1179 qq[oversize file '] . $clone->full_path ."'" );
1183 push @write_me, $longlink;
1186 push @write_me, $clone;
1188 ### write the one, optionally 2 a::t::file objects to the handle
1189 for my $clone (@write_me) {
1191 ### if the file is a symlink, there are 2 options:
1192 ### either we leave the symlink intact, but then we don't write any
1193 ### data OR we follow the symlink, which means we actually make a
1194 ### copy. if we do the latter, we have to change the TYPE of the
1196 my $link_ok = $clone->is_symlink && $Archive::Tar::FOLLOW_SYMLINK;
1197 my $data_ok = !$clone->is_symlink && $clone->has_content;
1199 ### downgrade to a 'normal' file if it's a symlink we're going to
1200 ### treat as a regular file
1201 $clone->_downgrade_to_plainfile if $link_ok;
1203 ### get the header for this block
1204 my $header = $self->_format_tar_entry( $clone );
1206 $self->_error(q[Could not format header for: ] .
1207 $clone->full_path );
1211 unless( print $handle $header ) {
1212 $self->_error(q[Could not write header for: ] .
1217 if( $link_ok or $data_ok ) {
1218 unless( print $handle $clone->data ) {
1219 $self->_error(q[Could not write data for: ] .
1224 ### pad the end of the clone if required ###
1225 print $handle TAR_PAD->( $clone->size ) if $clone->size % BLOCK
1228 } ### done writing these entries
1231 ### write the end markers ###
1232 print $handle TAR_END x 2 or
1233 return $self->_error( qq[Could not write tar end markers] );
1235 ### did you want it written to a file, or returned as a string? ###
1236 my $rv = length($file) ? 1
1237 : $HAS_PERLIO ? $dummy
1238 : do { seek $handle, 0, 0; local $/; <$handle> };
1240 ### make sure to close the handle;
1246 sub _format_tar_entry {
1248 my $entry = shift or return;
1249 my $ext_prefix = shift; $ext_prefix = '' unless defined $ext_prefix;
1250 my $no_prefix = shift || 0;
1252 my $file = $entry->name;
1253 my $prefix = $entry->prefix; $prefix = '' unless defined $prefix;
1255 ### remove the prefix from the file name
1256 ### not sure if this is still neeeded --kane
1257 ### no it's not -- Archive::Tar::File->_new_from_file will take care of
1258 ### this for us. Even worse, this would break if we tried to add a file
1260 #if( length $prefix ) {
1261 # $file =~ s/^$match//;
1264 $prefix = File::Spec::Unix->catdir($ext_prefix, $prefix)
1265 if length $ext_prefix;
1267 ### not sure why this is... ###
1268 my $l = PREFIX_LENGTH; # is ambiguous otherwise...
1269 substr ($prefix, 0, -$l) = "" if length $prefix >= PREFIX_LENGTH;
1271 my $f1 = "%06o"; my $f2 = "%11o";
1273 ### this might be optimizable with a 'changed' flag in the file objects ###
1278 (map { sprintf( $f1, $entry->$_() ) } qw[mode uid gid]),
1279 (map { sprintf( $f2, $entry->$_() ) } qw[size mtime]),
1281 "", # checksum field - space padded a bit down
1283 (map { $entry->$_() } qw[type linkname magic]),
1285 $entry->version || TAR_VERSION,
1287 (map { $entry->$_() } qw[uname gname]),
1288 (map { sprintf( $f1, $entry->$_() ) } qw[devmajor devminor]),
1290 ($no_prefix ? '' : $prefix)
1293 ### add the checksum ###
1294 substr($tar,148,7) = sprintf("%6o\0", unpack("%16C*",$tar));
1299 =head2 $tar->add_files( @filenamelist )
1301 Takes a list of filenames and adds them to the in-memory archive.
1303 The path to the file is automatically converted to a Unix like
1304 equivalent for use in the archive, and, if on MacOS, the file's
1305 modification time is converted from the MacOS epoch to the Unix epoch.
1306 So tar archives created on MacOS with B<Archive::Tar> can be read
1307 both with I<tar> on Unix and applications like I<suntar> or
1308 I<Stuffit Expander> on MacOS.
1310 Be aware that the file's type/creator and resource fork will be lost,
1311 which is usually what you want in cross-platform archives.
1313 Instead of a filename, you can also pass it an existing C<Archive::Tar::File>
1314 object from, for example, another archive. The object will be clone, and
1315 effectively be a copy of the original, not an alias.
1317 Returns a list of C<Archive::Tar::File> objects that were just added.
1323 my @files = @_ or return;
1326 for my $file ( @files ) {
1328 ### you passed an Archive::Tar::File object
1329 ### clone it so we don't accidentally have a reference to
1330 ### an object from another archive
1331 if( UNIVERSAL::isa( $file,'Archive::Tar::File' ) ) {
1332 push @rv, $file->clone;
1336 unless( -e $file || -l $file ) {
1337 $self->_error( qq[No such file: '$file'] );
1341 my $obj = Archive::Tar::File->new( file => $file );
1343 $self->_error( qq[Unable to add file: '$file'] );
1350 push @{$self->{_data}}, @rv;
1355 =head2 $tar->add_data ( $filename, $data, [$opthashref] )
1357 Takes a filename, a scalar full of data and optionally a reference to
1358 a hash with specific options.
1360 Will add a file to the in-memory archive, with name C<$filename> and
1361 content C<$data>. Specific properties can be set using C<$opthashref>.
1362 The following list of properties is supported: name, size, mtime
1363 (last modified date), mode, uid, gid, linkname, uname, gname,
1364 devmajor, devminor, prefix, type. (On MacOS, the file's path and
1365 modification times are converted to Unix equivalents.)
1367 Valid values for the file type are the following constants defined in
1368 Archive::Tar::Constants:
1380 Hard and symbolic ("soft") links; linkname should specify target.
1386 Character and block devices. devmajor and devminor should specify the major
1387 and minor device numbers.
1403 Returns the C<Archive::Tar::File> object that was just added, or
1404 C<undef> on failure.
1410 my ($file, $data, $opt) = @_;
1412 my $obj = Archive::Tar::File->new( data => $file, $data, $opt );
1414 $self->_error( qq[Unable to add file: '$file'] );
1418 push @{$self->{_data}}, $obj;
1423 =head2 $tar->error( [$BOOL] )
1425 Returns the current errorstring (usually, the last error reported).
1426 If a true value was specified, it will give the C<Carp::longmess>
1427 equivalent of the error, in effect giving you a stacktrace.
1429 For backwards compatibility, this error is also available as
1430 C<$Archive::Tar::error> although it is much recommended you use the
1431 method call instead.
1441 my $msg = $error = shift;
1442 $longmess = Carp::longmess($error);
1444 ### set Archive::Tar::WARN to 0 to disable printing
1447 carp $DEBUG ? $longmess : $msg;
1455 return shift() ? $longmess : $error;
1459 =head2 $tar->setcwd( $cwd );
1461 C<Archive::Tar> needs to know the current directory, and it will run
1462 C<Cwd::cwd()> I<every> time it extracts a I<relative> entry from the
1463 tarfile and saves it in the file system. (As of version 1.30, however,
1464 C<Archive::Tar> will use the speed optimization described below
1465 automatically, so it's only relevant if you're using C<extract_file()>).
1467 Since C<Archive::Tar> doesn't change the current directory internally
1468 while it is extracting the items in a tarball, all calls to C<Cwd::cwd()>
1469 can be avoided if we can guarantee that the current directory doesn't
1470 get changed externally.
1472 To use this performance boost, set the current directory via
1475 $tar->setcwd( cwd() );
1477 once before calling a function like C<extract_file> and
1478 C<Archive::Tar> will use the current directory setting from then on
1479 and won't call C<Cwd::cwd()> internally.
1481 To switch back to the default behaviour, use
1483 $tar->setcwd( undef );
1485 and C<Archive::Tar> will call C<Cwd::cwd()> internally again.
1487 If you're using C<Archive::Tar>'s C<exract()> method, C<setcwd()> will
1496 $self->{cwd} = $cwd;
1499 =head1 Class Methods
1501 =head2 Archive::Tar->create_archive($file, $compressed, @filelist)
1503 Creates a tar file from the list of files provided. The first
1504 argument can either be the name of the tar file to create or a
1505 reference to an open file handle (e.g. a GLOB reference).
1507 The second argument is used to indicate compression. You can either
1508 compress using C<gzip> or C<bzip2>. If you pass a digit, it's assumed
1509 to be the C<gzip> compression level (between 1 and 9), but the use of
1510 constants is prefered:
1512 # write a gzip compressed file
1513 Archive::Tar->create_archive( 'out.tgz', COMPRESS_GZIP, @filelist );
1515 # write a bzip compressed file
1516 Archive::Tar->create_archive( 'out.tbz', COMPRESS_BZIP, @filelist );
1518 Note that when you pass in a filehandle, the compression argument
1519 is ignored, as all files are printed verbatim to your filehandle.
1520 If you wish to enable compression with filehandles, use an
1521 C<IO::Zlib> or C<IO::Compress::Bzip2> filehandle instead.
1523 The remaining arguments list the files to be included in the tar file.
1524 These files must all exist. Any files which don't exist or can't be
1525 read are silently ignored.
1527 If the archive creation fails for any reason, C<create_archive> will
1528 return false. Please use the C<error> method to find the cause of the
1531 Note that this method does not write C<on the fly> as it were; it
1532 still reads all the files into memory before writing out the archive.
1533 Consult the FAQ below if this is a problem.
1537 sub create_archive {
1540 my $file = shift; return unless defined $file;
1541 my $gzip = shift || 0;
1545 return $class->_error( qq[Cowardly refusing to create empty archive!] );
1548 my $tar = $class->new;
1549 $tar->add_files( @files );
1550 return $tar->write( $file, $gzip );
1553 =head2 Archive::Tar->iter( $filename, [ $compressed, {opt => $val} ] )
1555 Returns an iterator function that reads the tar file without loading
1556 it all in memory. Each time the function is called it will return the
1557 next file in the tarball. The files are returned as
1558 C<Archive::Tar::File> objects. The iterator function returns the
1559 empty list once it has exhausted the the files contained.
1561 The second argument can be a hash reference with options, which are
1562 identical to the arguments passed to C<read()>.
1566 my $next = Archive::Tar->iter( "example.tar.gz", 1, {filter => qr/\.pm$/} );
1568 while( my $f = $next->() ) {
1569 print $f->name, "\n";
1571 $f->extract or warn "Extraction failed";
1581 my $filename = shift or return;
1582 my $compressed = shift or 0;
1583 my $opts = shift || {};
1585 ### get a handle to read from.
1586 my $handle = $class->_get_handle(
1594 return shift(@data) if @data; # more than one file returned?
1595 return unless $handle; # handle exhausted?
1597 ### read data, should only return file
1598 @data = @{ $class->_read_tar($handle, { %$opts, limit => 1 }) };
1600 ### return one piece of data
1601 return shift(@data) if @data;
1603 ### data is exhausted, free the filehandle
1609 =head2 Archive::Tar->list_archive($file, $compressed, [\@properties])
1611 Returns a list of the names of all the files in the archive. The
1612 first argument can either be the name of the tar file to list or a
1613 reference to an open file handle (e.g. a GLOB reference).
1615 If C<list_archive()> is passed an array reference as its third
1616 argument it returns a list of hash references containing the requested
1617 properties of each file. The following list of properties is
1618 supported: full_path, name, size, mtime (last modified date), mode,
1619 uid, gid, linkname, uname, gname, devmajor, devminor, prefix.
1621 See C<Archive::Tar::File> for details about supported properties.
1623 Passing an array reference containing only one element, 'name', is
1624 special cased to return a list of names rather than a list of hash
1631 my $file = shift; return unless defined $file;
1632 my $gzip = shift || 0;
1634 my $tar = $class->new($file, $gzip);
1637 return $tar->list_files( @_ );
1640 =head2 Archive::Tar->extract_archive($file, $compressed)
1642 Extracts the contents of the tar file. The first argument can either
1643 be the name of the tar file to create or a reference to an open file
1644 handle (e.g. a GLOB reference). All relative paths in the tar file will
1645 be created underneath the current working directory.
1647 C<extract_archive> will return a list of files it extracted.
1648 If the archive extraction fails for any reason, C<extract_archive>
1649 will return false. Please use the C<error> method to find the cause
1654 sub extract_archive {
1656 my $file = shift; return unless defined $file;
1657 my $gzip = shift || 0;
1659 my $tar = $class->new( ) or return;
1661 return $tar->read( $file, $gzip, { extract => 1 } );
1664 =head2 $bool = Archive::Tar->has_io_string
1666 Returns true if we currently have C<IO::String> support loaded.
1668 Either C<IO::String> or C<perlio> support is needed to support writing
1669 stringified archives. Currently, C<perlio> is the preferred method, if
1672 See the C<GLOBAL VARIABLES> section to see how to change this preference.
1676 sub has_io_string { return $HAS_IO_STRING; }
1678 =head2 $bool = Archive::Tar->has_perlio
1680 Returns true if we currently have C<perlio> support loaded.
1682 This requires C<perl-5.8> or higher, compiled with C<perlio>
1684 Either C<IO::String> or C<perlio> support is needed to support writing
1685 stringified archives. Currently, C<perlio> is the preferred method, if
1688 See the C<GLOBAL VARIABLES> section to see how to change this preference.
1692 sub has_perlio { return $HAS_PERLIO; }
1694 =head2 $bool = Archive::Tar->has_zlib_support
1696 Returns true if C<Archive::Tar> can extract C<zlib> compressed archives
1700 sub has_zlib_support { return ZLIB }
1702 =head2 $bool = Archive::Tar->has_bzip2_support
1704 Returns true if C<Archive::Tar> can extract C<bzip2> compressed archives
1708 sub has_bzip2_support { return BZIP }
1710 =head2 Archive::Tar->can_handle_compressed_files
1712 A simple checking routine, which will return true if C<Archive::Tar>
1713 is able to uncompress compressed archives on the fly with C<IO::Zlib>
1714 and C<IO::Compress::Bzip2> or false if not both are installed.
1716 You can use this as a shortcut to determine whether C<Archive::Tar>
1717 will do what you think before passing compressed archives to its
1722 sub can_handle_compressed_files { return ZLIB && BZIP ? 1 : 0 }
1724 sub no_string_support {
1725 croak("You have to install IO::String to support writing archives to strings");
1732 =head1 GLOBAL VARIABLES
1734 =head2 $Archive::Tar::FOLLOW_SYMLINK
1736 Set this variable to C<1> to make C<Archive::Tar> effectively make a
1737 copy of the file when extracting. Default is C<0>, which
1738 means the symlink stays intact. Of course, you will have to pack the
1739 file linked to as well.
1741 This option is checked when you write out the tarfile using C<write>
1742 or C<create_archive>.
1744 This works just like C</bin/tar>'s C<-h> option.
1746 =head2 $Archive::Tar::CHOWN
1748 By default, C<Archive::Tar> will try to C<chown> your files if it is
1749 able to. In some cases, this may not be desired. In that case, set
1750 this variable to C<0> to disable C<chown>-ing, even if it were
1753 The default is C<1>.
1755 =head2 $Archive::Tar::CHMOD
1757 By default, C<Archive::Tar> will try to C<chmod> your files to
1758 whatever mode was specified for the particular file in the archive.
1759 In some cases, this may not be desired. In that case, set this
1760 variable to C<0> to disable C<chmod>-ing.
1762 The default is C<1>.
1764 =head2 $Archive::Tar::DO_NOT_USE_PREFIX
1766 By default, C<Archive::Tar> will try to put paths that are over
1767 100 characters in the C<prefix> field of your tar header, as
1768 defined per POSIX-standard. However, some (older) tar programs
1769 do not implement this spec. To retain compatibility with these older
1770 or non-POSIX compliant versions, you can set the C<$DO_NOT_USE_PREFIX>
1771 variable to a true value, and C<Archive::Tar> will use an alternate
1772 way of dealing with paths over 100 characters by using the
1773 C<GNU Extended Header> feature.
1775 Note that clients who do not support the C<GNU Extended Header>
1776 feature will not be able to read these archives. Such clients include
1777 tars on C<Solaris>, C<Irix> and C<AIX>.
1779 The default is C<0>.
1781 =head2 $Archive::Tar::DEBUG
1783 Set this variable to C<1> to always get the C<Carp::longmess> output
1784 of the warnings, instead of the regular C<carp>. This is the same
1785 message you would get by doing:
1791 =head2 $Archive::Tar::WARN
1793 Set this variable to C<0> if you do not want any warnings printed.
1794 Personally I recommend against doing this, but people asked for the
1795 option. Also, be advised that this is of course not threadsafe.
1799 =head2 $Archive::Tar::error
1801 Holds the last reported error. Kept for historical reasons, but its
1802 use is very much discouraged. Use the C<error()> method instead:
1804 warn $tar->error unless $tar->extract;
1806 =head2 $Archive::Tar::INSECURE_EXTRACT_MODE
1808 This variable indicates whether C<Archive::Tar> should allow
1809 files to be extracted outside their current working directory.
1811 Allowing this could have security implications, as a malicious
1812 tar archive could alter or replace any file the extracting user
1813 has permissions to. Therefor, the default is to not allow
1814 insecure extractions.
1816 If you trust the archive, or have other reasons to allow the
1817 archive to write files outside your current working directory,
1818 set this variable to C<true>.
1820 Note that this is a backwards incompatible change from version
1823 =head2 $Archive::Tar::HAS_PERLIO
1825 This variable holds a boolean indicating if we currently have
1826 C<perlio> support loaded. This will be enabled for any perl
1827 greater than C<5.8> compiled with C<perlio>.
1829 If you feel strongly about disabling it, set this variable to
1830 C<false>. Note that you will then need C<IO::String> installed
1831 to support writing stringified archives.
1833 Don't change this variable unless you B<really> know what you're
1836 =head2 $Archive::Tar::HAS_IO_STRING
1838 This variable holds a boolean indicating if we currently have
1839 C<IO::String> support loaded. This will be enabled for any perl
1840 that has a loadable C<IO::String> module.
1842 If you feel strongly about disabling it, set this variable to
1843 C<false>. Note that you will then need C<perlio> support from
1844 your perl to be able to write stringified archives.
1846 Don't change this variable unless you B<really> know what you're
1853 =item What's the minimum perl version required to run Archive::Tar?
1855 You will need perl version 5.005_03 or newer.
1857 =item Isn't Archive::Tar slow?
1859 Yes it is. It's pure perl, so it's a lot slower then your C</bin/tar>
1860 However, it's very portable. If speed is an issue, consider using
1861 C</bin/tar> instead.
1863 =item Isn't Archive::Tar heavier on memory than /bin/tar?
1865 Yes it is, see previous answer. Since C<Compress::Zlib> and therefore
1866 C<IO::Zlib> doesn't support C<seek> on their filehandles, there is little
1867 choice but to read the archive into memory.
1868 This is ok if you want to do in-memory manipulation of the archive.
1870 If you just want to extract, use the C<extract_archive> class method
1871 instead. It will optimize and write to disk immediately.
1873 Another option is to use the C<iter> class method to iterate over
1874 the files in the tarball without reading them all in memory at once.
1876 =item Can you lazy-load data instead?
1878 In some cases, yes. You can use the C<iter> class method to iterate
1879 over the files in the tarball without reading them all in memory at once.
1881 =item How much memory will an X kb tar file need?
1883 Probably more than X kb, since it will all be read into memory. If
1884 this is a problem, and you don't need to do in memory manipulation
1885 of the archive, consider using the C<iter> class method, or C</bin/tar>
1888 =item What do you do with unsupported filetypes in an archive?
1890 C<Unix> has a few filetypes that aren't supported on other platforms,
1891 like C<Win32>. If we encounter a C<hardlink> or C<symlink> we'll just
1892 try to make a copy of the original file, rather than throwing an error.
1894 This does require you to read the entire archive in to memory first,
1895 since otherwise we wouldn't know what data to fill the copy with.
1896 (This means that you cannot use the class methods, including C<iter>
1897 on archives that have incompatible filetypes and still expect things
1900 For other filetypes, like C<chardevs> and C<blockdevs> we'll warn that
1901 the extraction of this particular item didn't work.
1903 =item I'm using WinZip, or some other non-POSIX client, and files are not being extracted properly!
1905 By default, C<Archive::Tar> is in a completely POSIX-compatible
1906 mode, which uses the POSIX-specification of C<tar> to store files.
1907 For paths greather than 100 characters, this is done using the
1908 C<POSIX header prefix>. Non-POSIX-compatible clients may not support
1909 this part of the specification, and may only support the C<GNU Extended
1910 Header> functionality. To facilitate those clients, you can set the
1911 C<$Archive::Tar::DO_NOT_USE_PREFIX> variable to C<true>. See the
1912 C<GLOBAL VARIABLES> section for details on this variable.
1914 Note that GNU tar earlier than version 1.14 does not cope well with
1915 the C<POSIX header prefix>. If you use such a version, consider setting
1916 the C<$Archive::Tar::DO_NOT_USE_PREFIX> variable to C<true>.
1918 =item How do I extract only files that have property X from an archive?
1920 Sometimes, you might not wish to extract a complete archive, just
1921 the files that are relevant to you, based on some criteria.
1923 You can do this by filtering a list of C<Archive::Tar::File> objects
1924 based on your criteria. For example, to extract only files that have
1925 the string C<foo> in their title, you would use:
1928 grep { $_->full_path =~ /foo/ } $tar->get_files
1931 This way, you can filter on any attribute of the files in the archive.
1932 Consult the C<Archive::Tar::File> documentation on how to use these
1935 =item How do I access .tar.Z files?
1937 The C<Archive::Tar> module can optionally use C<Compress::Zlib> (via
1938 the C<IO::Zlib> module) to access tar files that have been compressed
1939 with C<gzip>. Unfortunately tar files compressed with the Unix C<compress>
1940 utility cannot be read by C<Compress::Zlib> and so cannot be directly
1941 accesses by C<Archive::Tar>.
1943 If the C<uncompress> or C<gunzip> programs are available, you can use
1944 one of these workarounds to read C<.tar.Z> files from C<Archive::Tar>
1946 Firstly with C<uncompress>
1950 open F, "uncompress -c $filename |";
1951 my $tar = Archive::Tar->new(*F);
1954 and this with C<gunzip>
1958 open F, "gunzip -c $filename |";
1959 my $tar = Archive::Tar->new(*F);
1962 Similarly, if the C<compress> program is available, you can use this to
1963 write a C<.tar.Z> file
1968 my $fh = new IO::File "| compress -c >$filename";
1969 my $tar = Archive::Tar->new();
1974 =item How do I handle Unicode strings?
1976 C<Archive::Tar> uses byte semantics for any files it reads from or writes
1977 to disk. This is not a problem if you only deal with files and never
1978 look at their content or work solely with byte strings. But if you use
1979 Unicode strings with character semantics, some additional steps need
1982 For example, if you add a Unicode string like
1985 $tar->add_data('file.txt', "Euro: \x{20AC}");
1987 then there will be a problem later when the tarfile gets written out
1988 to disk via C<$tar->write()>:
1990 Wide character in print at .../Archive/Tar.pm line 1014.
1992 The data was added as a Unicode string and when writing it out to disk,
1993 the C<:utf8> line discipline wasn't set by C<Archive::Tar>, so Perl
1994 tried to convert the string to ISO-8859 and failed. The written file
1995 now contains garbage.
1997 For this reason, Unicode strings need to be converted to UTF-8-encoded
1998 bytestrings before they are handed off to C<add_data()>:
2001 my $data = "Accented character: \x{20AC}";
2002 $data = encode('utf8', $data);
2004 $tar->add_data('file.txt', $data);
2006 A opposite problem occurs if you extract a UTF8-encoded file from a
2007 tarball. Using C<get_content()> on the C<Archive::Tar::File> object
2008 will return its content as a bytestring, not as a Unicode string.
2010 If you want it to be a Unicode string (because you want character
2011 semantics with operations like regular expression matching), you need
2012 to decode the UTF8-encoded content and have Perl convert it into
2016 my $data = $tar->get_content();
2018 # Make it a Unicode string
2019 $data = decode('utf8', $data);
2021 There is no easy way to provide this functionality in C<Archive::Tar>,
2022 because a tarball can contain many files, and each of which could be
2023 encoded in a different way.
2031 =item Check if passed in handles are open for read/write
2033 Currently I don't know of any portable pure perl way to do this.
2034 Suggestions welcome.
2036 =item Allow archives to be passed in as string
2038 Currently, we only allow opened filehandles or filenames, but
2039 not strings. The internals would need some reworking to facilitate
2040 stringified archives.
2042 =item Facilitate processing an opened filehandle of a compressed archive
2044 Currently, we only support this if the filehandle is an IO::Zlib object.
2045 Environments, like apache, will present you with an opened filehandle
2046 to an uploaded file, which might be a compressed archive.
2054 =item The GNU tar specification
2056 C<http://www.gnu.org/software/tar/manual/tar.html>
2058 =item The PAX format specication
2060 The specifcation which tar derives from; C< http://www.opengroup.org/onlinepubs/007904975/utilities/pax.html>
2062 =item A comparison of GNU and POSIX tar standards; C<http://www.delorie.com/gnu/docs/tar/tar_114.html>
2064 =item GNU tar intends to switch to POSIX compatibility
2066 GNU Tar authors have expressed their intention to become completely
2067 POSIX-compatible; C<http://www.gnu.org/software/tar/manual/html_node/Formats.html>
2069 =item A Comparison between various tar implementations
2071 Lists known issues and incompatibilities; C<http://gd.tuwien.ac.at/utils/archivers/star/README.otherbugs>
2077 This module by Jos Boumans E<lt>kane@cpan.orgE<gt>.
2079 Please reports bugs to E<lt>bug-archive-tar@rt.cpan.orgE<gt>.
2081 =head1 ACKNOWLEDGEMENTS
2083 Thanks to Sean Burke, Chris Nandor, Chip Salzenberg, Tim Heaney, Gisle Aas
2084 and especially Andrew Savige for their help and suggestions.
2088 This module is copyright (c) 2002 - 2008 Jos Boumans
2089 E<lt>kane@cpan.orgE<gt>. All rights reserved.
2091 This library is free software; you may redistribute and/or modify
2092 it under the same terms as Perl itself.