1 ### the gnu tar specification:
2 ### http://www.gnu.org/software/tar/manual/tar.html
4 ### and the pax format spec, which tar derives from:
5 ### http://www.opengroup.org/onlinepubs/007904975/utilities/pax.html
11 use vars qw[$DEBUG $error $VERSION $WARN $FOLLOW_SYMLINK $CHOWN $CHMOD
12 $DO_NOT_USE_PREFIX $HAS_PERLIO $HAS_IO_STRING
13 $INSECURE_EXTRACT_MODE
22 $DO_NOT_USE_PREFIX = 0;
23 $INSECURE_EXTRACT_MODE = 0;
27 $HAS_PERLIO = $Config::Config{useperlio};
29 ### try and load IO::String anyway, so you can dynamically
30 ### switch between perlio and IO::String
35 $HAS_IO_STRING = $@ ? 0 : 1;
41 use Carp qw(carp croak);
43 use File::Spec::Unix ();
46 use Archive::Tar::File;
47 use Archive::Tar::Constant;
51 Archive::Tar - module for manipulations of tar archives
56 my $tar = Archive::Tar->new;
58 $tar->read('origin.tgz',1);
61 $tar->add_files('file/foo.pl', 'docs/README');
62 $tar->add_data('file/baz.txt', 'This is the contents now');
64 $tar->rename('oldname', 'new/file/name');
66 $tar->write('files.tar');
70 Archive::Tar provides an object oriented mechanism for handling tar
71 files. It provides class methods for quick and easy files handling
72 while also allowing for the creation of tar file objects for custom
73 manipulation. If you have the IO::Zlib module installed,
74 Archive::Tar will also support compressed or gzipped tar files.
76 An object of class Archive::Tar represents a .tar(.gz) archive full
81 =head2 Archive::Tar->new( [$file, $compressed] )
83 Returns a new Tar object. If given any arguments, C<new()> calls the
84 C<read()> method automatically, passing on the arguments provided to
87 If C<new()> is invoked with arguments and the C<read()> method fails
88 for any reason, C<new()> returns undef.
97 ### install get/set accessors for this object.
98 for my $key ( keys %$tmpl ) {
100 *{__PACKAGE__."::$key"} = sub {
102 $self->{$key} = $_[0] if @_;
103 return $self->{$key};
109 $class = ref $class if ref $class;
111 ### copying $tmpl here since a shallow copy makes it use the
112 ### same aref, causing for files to remain in memory always.
113 my $obj = bless { _data => [ ], _file => 'Unknown' }, $class;
116 unless ( $obj->read( @_ ) ) {
117 $obj->_error(qq[No data could be read from file]);
125 =head2 $tar->read ( $filename|$handle, $compressed, {opt => 'val'} )
127 Read the given tar file into memory.
128 The first argument can either be the name of a file or a reference to
129 an already open filehandle (or an IO::Zlib object if it's compressed)
130 The second argument indicates whether the file referenced by the first
131 argument is compressed.
133 The C<read> will I<replace> any previous content in C<$tar>!
135 The second argument may be considered optional if IO::Zlib is
136 installed, since it will transparently Do The Right Thing.
137 Archive::Tar will warn if you try to pass a compressed file if
138 IO::Zlib is not available and simply return.
140 Note that you can currently B<not> pass a C<gzip> compressed
141 filehandle, which is not opened with C<IO::Zlib>, nor a string
142 containing the full archive information (either compressed or
143 uncompressed). These are worth while features, but not currently
144 implemented. See the C<TODO> section.
146 The third argument can be a hash reference with options. Note that
147 all options are case-sensitive.
153 Do not read more than C<limit> files. This is useful if you have
154 very big archives, and are only interested in the first few files.
158 If set to true, immediately extract entries when reading them. This
159 gives you the same memory break as the C<extract_archive> function.
160 Note however that entries will not be read into memory, but written
165 All files are stored internally as C<Archive::Tar::File> objects.
166 Please consult the L<Archive::Tar::File> documentation for details.
168 Returns the number of files read in scalar context, and a list of
169 C<Archive::Tar::File> objects in list context.
176 my $gzip = shift || 0;
177 my $opts = shift || {};
179 unless( defined $file ) {
180 $self->_error( qq[No file to read from!] );
183 $self->_file( $file );
186 my $handle = $self->_get_handle($file, $gzip, READ_ONLY->( ZLIB ) )
189 my $data = $self->_read_tar( $handle, $opts ) or return;
191 $self->_data( $data );
193 return wantarray ? @$data : scalar @$data;
198 my $file = shift; return unless defined $file;
199 return $file if ref $file;
201 my $gzip = shift || 0;
202 my $mode = shift || READ_ONLY->( ZLIB ); # default to read only
206 ### only default to ZLIB if we're not trying to /write/ to a handle ###
207 if( ZLIB and $gzip || MODE_READ->( $mode ) ) {
209 ### IO::Zlib will Do The Right Thing, even when passed
215 $self->_error(qq[Compression not available - Install IO::Zlib!]);
224 unless( $fh->open( $file, $mode ) ) {
225 $self->_error( qq[Could not create filehandle for '$file': $!!] );
236 my $handle = shift or return;
237 my $opts = shift || {};
239 my $count = $opts->{limit} || 0;
240 my $extract = $opts->{extract} || 0;
242 ### set a cap on the amount of files to extract ###
244 $limit = 1 if $count > 0;
249 my $real_name; # to set the name of a file when
250 # we're encountering @longlink
254 while( $handle->read( $chunk, HEAD ) ) {
255 ### IO::Zlib doesn't support this yet
256 my $offset = eval { tell $handle } || 'unknown';
259 my $gzip = GZIP_MAGIC_NUM;
260 if( $chunk =~ /$gzip/ ) {
261 $self->_error( qq[Cannot read compressed format in tar-mode] );
266 ### if we can't read in all bytes... ###
267 last if length $chunk != HEAD;
269 ### Apparently this should really be two blocks of 512 zeroes,
270 ### but GNU tar sometimes gets it wrong. See comment in the
271 ### source code (tar.c) to GNU cpio.
272 next if $chunk eq TAR_END;
274 ### according to the posix spec, the last 12 bytes of the header are
275 ### null bytes, to pad it to a 512 byte block. That means if these
276 ### bytes are NOT null bytes, it's a corrrupt header. See:
277 ### www.koders.com/c/fidCE473AD3D9F835D690259D60AD5654591D91D5BA.aspx
279 { my $nulls = join '', "\0" x 12;
280 unless( $nulls eq substr( $chunk, 500, 12 ) ) {
281 $self->_error( qq[Invalid header block at offset $offset] );
286 ### pass the realname, so we can set it 'proper' right away
287 ### some of the heuristics are done on the name, so important
290 { my %extra_args = ();
291 $extra_args{'name'} = $$real_name if defined $real_name;
293 unless( $entry = Archive::Tar::File->new( chunk => $chunk,
296 $self->_error( qq[Couldn't read chunk at offset $offset] );
302 ### http://www.gnu.org/manual/tar/html_node/tar_139.html
303 next if $entry->is_label;
305 if( length $entry->type and ($entry->is_file || $entry->is_longlink) ) {
307 if ( $entry->is_file && !$entry->validate ) {
308 ### sometimes the chunk is rather fux0r3d and a whole 512
309 ### bytes ends up in the ->name area.
310 ### clean it up, if need be
311 my $name = $entry->name;
312 $name = substr($name, 0, 100) if length $name > 100;
315 $self->_error( $name . qq[: checksum error] );
319 my $block = BLOCK_SIZE->( $entry->size );
321 $data = $entry->get_content_by_ref;
323 ### just read everything into memory
324 ### can't do lazy loading since IO::Zlib doesn't support 'seek'
325 ### this is because Compress::Zlib doesn't support it =/
326 ### this reads in the whole data in one read() call.
327 if( $handle->read( $$data, $block ) < $block ) {
328 $self->_error( qq[Read error on tarfile (missing data) '].
329 $entry->full_path ."' at offset $offset" );
333 ### throw away trailing garbage ###
334 substr ($$data, $entry->size) = "" if defined $$data;
336 ### part II of the @LongLink munging -- need to do /after/
337 ### the checksum check.
338 if( $entry->is_longlink ) {
339 ### weird thing in tarfiles -- if the file is actually a
340 ### @LongLink, the data part seems to have a trailing ^@
341 ### (unprintable) char. to display, pipe output through less.
342 ### but that doesn't *always* happen.. so check if the last
343 ### character is a control character, and if so remove it
344 ### at any rate, we better remove that character here, or tests
345 ### like 'eq' and hashlook ups based on names will SO not work
346 ### remove it by calculating the proper size, and then
347 ### tossing out everything that's longer than that size.
349 ### count number of nulls
350 my $nulls = $$data =~ tr/\0/\0/;
352 ### cut data + size by that many bytes
353 $entry->size( $entry->size - $nulls );
354 substr ($$data, $entry->size) = "";
358 ### clean up of the entries.. posix tar /apparently/ has some
359 ### weird 'feature' that allows for filenames > 255 characters
360 ### they'll put a header in with as name '././@LongLink' and the
361 ### contents will be the name of the /next/ file in the archive
362 ### pretty crappy and kludgy if you ask me
364 ### set the name for the next entry if this is a @LongLink;
365 ### this is one ugly hack =/ but needed for direct extraction
366 if( $entry->is_longlink ) {
369 } elsif ( defined $real_name ) {
370 $entry->name( $$real_name );
375 $self->_extract_file( $entry ) if $extract
376 && !$entry->is_longlink
377 && !$entry->is_unknown
378 && !$entry->is_label;
380 ### Guard against tarfiles with garbage at the end
381 last LOOP if $entry->name eq '';
383 ### push only the name on the rv if we're extracting
384 ### -- for extract_archive
385 push @$tarfile, ($extract ? $entry->name : $entry);
388 $count-- unless $entry->is_longlink || $entry->is_dir;
389 last LOOP unless $count;
398 =head2 $tar->contains_file( $filename )
400 Check if the archive contains a certain file.
401 It will return true if the file is in the archive, false otherwise.
403 Note however, that this function does an exact match using C<eq>
404 on the full path. So it cannot compensate for case-insensitive file-
405 systems or compare 2 paths to see if they would point to the same
414 return unless defined $full;
416 ### don't warn if the entry isn't there.. that's what this function
417 ### is for after all.
419 return 1 if $self->_find_entry($full);
423 =head2 $tar->extract( [@filenames] )
425 Write files whose names are equivalent to any of the names in
426 C<@filenames> to disk, creating subdirectories as necessary. This
427 might not work too well under VMS.
428 Under MacPerl, the file's modification time will be converted to the
429 MacOS zero of time, and appropriate conversions will be done to the
430 path. However, the length of each element of the path is not
431 inspected to see whether it's longer than MacOS currently allows (32
434 If C<extract> is called without a list of file names, the entire
435 contents of the archive are extracted.
437 Returns a list of filenames extracted.
446 # use the speed optimization for all extracted files
447 local($self->{cwd}) = cwd() unless $self->{cwd};
449 ### you requested the extraction of only certian files
451 for my $file ( @args ) {
453 ### it's already an object?
454 if( UNIVERSAL::isa( $file, 'Archive::Tar::File' ) ) {
462 for my $entry ( @{$self->_data} ) {
463 next unless $file eq $entry->full_path;
465 ### we found the file you're looking for
471 return $self->_error(
472 qq[Could not find '$file' in archive] );
477 ### just grab all the file items
479 @files = $self->get_files;
482 ### nothing found? that's an error
483 unless( scalar @files ) {
484 $self->_error( qq[No files found for ] . $self->_file );
489 for my $entry ( @files ) {
490 unless( $self->_extract_file( $entry ) ) {
491 $self->_error(q[Could not extract ']. $entry->full_path .q['] );
499 =head2 $tar->extract_file( $file, [$extract_path] )
501 Write an entry, whose name is equivalent to the file name provided to
502 disk. Optionally takes a second parameter, which is the full native
503 path (including filename) the entry will be written to.
507 $tar->extract_file( 'name/in/archive', 'name/i/want/to/give/it' );
509 $tar->extract_file( $at_file_object, 'name/i/want/to/give/it' );
511 Returns true on success, false on failure.
517 my $file = shift; return unless defined $file;
520 my $entry = $self->_find_entry( $file )
521 or $self->_error( qq[Could not find an entry for '$file'] ), return;
523 return $self->_extract_file( $entry, $alt );
528 my $entry = shift or return;
531 ### you wanted an alternate extraction location ###
532 my $name = defined $alt ? $alt : $entry->full_path;
534 ### splitpath takes a bool at the end to indicate
535 ### that it's splitting a dir
536 my ($vol,$dirs,$file);
537 if ( defined $alt ) { # It's a local-OS path
538 ($vol,$dirs,$file) = File::Spec->splitpath( $alt,
541 ($vol,$dirs,$file) = File::Spec::Unix->splitpath( $name,
546 ### is $name an absolute path? ###
547 if( File::Spec->file_name_is_absolute( $dirs ) ) {
549 ### absolute names are not allowed to be in tarballs under
550 ### strict mode, so only allow it if a user tells us to do it
551 if( not defined $alt and not $INSECURE_EXTRACT_MODE ) {
553 q[Entry ']. $entry->full_path .q[' is an absolute path. ].
554 q[Not extracting absolute paths under SECURE EXTRACT MODE]
559 ### user asked us to, it's fine.
562 ### it's a relative path ###
564 my $cwd = (defined $self->{cwd} ? $self->{cwd} : cwd());
566 my @dirs = defined $alt
567 ? File::Spec->splitdir( $dirs ) # It's a local-OS path
568 : File::Spec::Unix->splitdir( $dirs ); # it's UNIX-style, likely
569 # straight from the tarball
571 ### paths that leave the current directory are not allowed under
572 ### strict mode, so only allow it if a user tells us to do this.
573 if( not defined $alt and
574 not $INSECURE_EXTRACT_MODE and
575 grep { $_ eq '..' } @dirs
578 q[Entry ']. $entry->full_path .q[' is attempting to leave the ].
579 q[current working directory. Not extracting under SECURE ].
585 ### '.' is the directory delimiter, of which the first one has to
586 ### be escaped/changed.
587 map tr/\./_/, @dirs if ON_VMS;
589 my ($cwd_vol,$cwd_dir,$cwd_file)
590 = File::Spec->splitpath( $cwd );
591 my @cwd = File::Spec->splitdir( $cwd_dir );
592 push @cwd, $cwd_file if length $cwd_file;
594 ### We need to pass '' as the last elemant to catpath. Craig Berry
595 ### explains why (msgid <p0624083dc311ae541393@[172.16.52.1]>):
596 ### The root problem is that splitpath on UNIX always returns the
597 ### final path element as a file even if it is a directory, and of
598 ### course there is no way it can know the difference without checking
599 ### against the filesystem, which it is documented as not doing. When
600 ### you turn around and call catpath, on VMS you have to know which bits
601 ### are directory bits and which bits are file bits. In this case we
602 ### know the result should be a directory. I had thought you could omit
603 ### the file argument to catpath in such a case, but apparently on UNIX
605 $dir = File::Spec->catpath(
606 $cwd_vol, File::Spec->catdir( @cwd, @dirs ), ''
609 ### catdir() returns undef if the path is longer than 255 chars on VMS
610 unless ( defined $dir ) {
611 $^W && $self->_error( qq[Could not compose a path for '$dirs'\n] );
617 if( -e $dir && !-d _ ) {
618 $^W && $self->_error( qq['$dir' exists, but it's not a directory!\n] );
623 eval { File::Path::mkpath( $dir, 0, 0777 ) };
625 $self->_error( qq[Could not create directory '$dir': $@] );
629 ### XXX chown here? that might not be the same as in the archive
630 ### as we're only chown'ing to the owner of the file we're extracting
631 ### not to the owner of the directory itself, which may or may not
632 ### be another entry in the archive
633 ### Answer: no, gnu tar doesn't do it either, it'd be the wrong
635 #if( $CHOWN && CAN_CHOWN ) {
636 # chown $entry->uid, $entry->gid, $dir or
637 # $self->_error( qq[Could not set uid/gid on '$dir'] );
641 ### we're done if we just needed to create a dir ###
642 return 1 if $entry->is_dir;
644 my $full = File::Spec->catfile( $dir, $file );
646 if( $entry->is_unknown ) {
647 $self->_error( qq[Unknown file type for file '$full'] );
651 if( length $entry->type && $entry->is_file ) {
652 my $fh = IO::File->new;
653 $fh->open( '>' . $full ) or (
654 $self->_error( qq[Could not open file '$full': $!] ),
660 syswrite $fh, $entry->data or (
661 $self->_error( qq[Could not write data to '$full'] ),
667 $self->_error( qq[Could not close file '$full'] ),
672 $self->_make_special_file( $entry, $full ) or return;
675 utime time, $entry->mtime - TIME_OFFSET, $full or
676 $self->_error( qq[Could not update timestamp] );
678 if( $CHOWN && CAN_CHOWN ) {
679 chown $entry->uid, $entry->gid, $full or
680 $self->_error( qq[Could not set uid/gid on '$full'] );
683 ### only chmod if we're allowed to, but never chmod symlinks, since they'll
684 ### change the perms on the file they're linking too...
685 if( $CHMOD and not -l $full ) {
686 chmod $entry->mode, $full or
687 $self->_error( qq[Could not chown '$full' to ] . $entry->mode );
693 sub _make_special_file {
695 my $entry = shift or return;
696 my $file = shift; return unless defined $file;
700 if( $entry->is_symlink ) {
703 symlink( $entry->linkname, $file ) or $fail++;
706 $self->_extract_special_file_as_plain_file( $entry, $file )
710 $err = qq[Making symbolink link from '] . $entry->linkname .
711 qq[' to '$file' failed] if $fail;
713 } elsif ( $entry->is_hardlink ) {
716 link( $entry->linkname, $file ) or $fail++;
719 $self->_extract_special_file_as_plain_file( $entry, $file )
723 $err = qq[Making hard link from '] . $entry->linkname .
724 qq[' to '$file' failed] if $fail;
726 } elsif ( $entry->is_fifo ) {
727 ON_UNIX && !system('mknod', $file, 'p') or
728 $err = qq[Making fifo ']. $entry->name .qq[' failed];
730 } elsif ( $entry->is_blockdev or $entry->is_chardev ) {
731 my $mode = $entry->is_blockdev ? 'b' : 'c';
733 ON_UNIX && !system('mknod', $file, $mode,
734 $entry->devmajor, $entry->devminor) or
735 $err = qq[Making block device ']. $entry->name .qq[' (maj=] .
736 $entry->devmajor . qq[ min=] . $entry->devminor .
739 } elsif ( $entry->is_socket ) {
740 ### the original doesn't do anything special for sockets.... ###
744 return $err ? $self->_error( $err ) : 1;
747 ### don't know how to make symlinks, let's just extract the file as
749 sub _extract_special_file_as_plain_file {
751 my $entry = shift or return;
752 my $file = shift; return unless defined $file;
756 my $orig = $self->_find_entry( $entry->linkname );
759 $err = qq[Could not find file '] . $entry->linkname .
764 ### clone the entry, make it appear as a normal file ###
765 my $clone = $entry->clone;
766 $clone->_downgrade_to_plainfile;
767 $self->_extract_file( $clone, $file ) or last TRY;
772 return $self->_error($err);
775 =head2 $tar->list_files( [\@properties] )
777 Returns a list of the names of all the files in the archive.
779 If C<list_files()> is passed an array reference as its first argument
780 it returns a list of hash references containing the requested
781 properties of each file. The following list of properties is
782 supported: name, size, mtime (last modified date), mode, uid, gid,
783 linkname, uname, gname, devmajor, devminor, prefix.
785 Passing an array reference containing only one element, 'name', is
786 special cased to return a list of names rather than a list of hash
787 references, making it equivalent to calling C<list_files> without
794 my $aref = shift || [ ];
796 unless( $self->_data ) {
797 $self->read() or return;
800 if( @$aref == 0 or ( @$aref == 1 and $aref->[0] eq 'name' ) ) {
801 return map { $_->full_path } @{$self->_data};
805 #for my $obj ( @{$self->_data} ) {
806 # push @rv, { map { $_ => $obj->$_() } @$aref };
810 ### this does the same as the above.. just needs a +{ }
811 ### to make sure perl doesn't confuse it for a block
812 return map { my $o=$_;
813 +{ map { $_ => $o->$_() } @$aref }
822 unless( defined $file ) {
823 $self->_error( qq[No file specified] );
827 ### it's an object already
828 return $file if UNIVERSAL::isa( $file, 'Archive::Tar::File' );
830 for my $entry ( @{$self->_data} ) {
831 my $path = $entry->full_path;
832 return $entry if $path eq $file;
835 $self->_error( qq[No such file in archive: '$file'] );
839 =head2 $tar->get_files( [@filenames] )
841 Returns the C<Archive::Tar::File> objects matching the filenames
842 provided. If no filename list was passed, all C<Archive::Tar::File>
843 objects in the current Tar object are returned.
845 Please refer to the C<Archive::Tar::File> documentation on how to
846 handle these objects.
853 return @{ $self->_data } unless @_;
856 for my $file ( @_ ) {
857 push @list, grep { defined } $self->_find_entry( $file );
863 =head2 $tar->get_content( $file )
865 Return the content of the named file.
871 my $entry = $self->_find_entry( shift ) or return;
876 =head2 $tar->replace_content( $file, $content )
878 Make the string $content be the content for the file named $file.
882 sub replace_content {
884 my $entry = $self->_find_entry( shift ) or return;
886 return $entry->replace_content( shift );
889 =head2 $tar->rename( $file, $new_name )
891 Rename the file of the in-memory archive to $new_name.
893 Note that you must specify a Unix path for $new_name, since per tar
894 standard, all files in the archive must be Unix paths.
896 Returns true on success and false on failure.
902 my $file = shift; return unless defined $file;
903 my $new = shift; return unless defined $new;
905 my $entry = $self->_find_entry( $file ) or return;
907 return $entry->rename( $new );
910 =head2 $tar->remove (@filenamelist)
912 Removes any entries with names matching any of the given filenames
913 from the in-memory archive. Returns a list of C<Archive::Tar::File>
922 my %seen = map { $_->full_path => $_ } @{$self->_data};
923 delete $seen{ $_ } for @list;
925 $self->_data( [values %seen] );
932 C<clear> clears the current in-memory archive. This effectively gives
933 you a 'blank' object, ready to be filled again. Note that C<clear>
934 only has effect on the object, not the underlying tarfile.
939 my $self = shift or return;
948 =head2 $tar->write ( [$file, $compressed, $prefix] )
950 Write the in-memory archive to disk. The first argument can either
951 be the name of a file or a reference to an already open filehandle (a
952 GLOB reference). If the second argument is true, the module will use
953 IO::Zlib to write the file in a compressed format. If IO::Zlib is
954 not available, the C<write> method will fail and return.
956 Note that when you pass in a filehandle, the compression argument
957 is ignored, as all files are printed verbatim to your filehandle.
958 If you wish to enable compression with filehandles, use an
959 C<IO::Zlib> filehandle instead.
961 Specific levels of compression can be chosen by passing the values 2
962 through 9 as the second parameter.
964 The third argument is an optional prefix. All files will be tucked
965 away in the directory you specify as prefix. So if you have files
966 'a' and 'b' in your archive, and you specify 'foo' as prefix, they
967 will be written to the archive as 'foo/a' and 'foo/b'.
969 If no arguments are given, C<write> returns the entire formatted
970 archive as a string, which could be useful if you'd like to stuff the
971 archive into a socket or a pipe to gzip or something.
977 my $file = shift; $file = '' unless defined $file;
978 my $gzip = shift || 0;
979 my $ext_prefix = shift; $ext_prefix = '' unless defined $ext_prefix;
982 ### only need a handle if we have a file to print to ###
983 my $handle = length($file)
984 ? ( $self->_get_handle($file, $gzip, WRITE_ONLY->($gzip) )
986 : $HAS_PERLIO ? do { open my $h, '>', \$dummy; $h }
987 : $HAS_IO_STRING ? IO::String->new
988 : __PACKAGE__->no_string_support();
992 for my $entry ( @{$self->_data} ) {
993 ### entries to be written to the tarfile ###
996 ### only now will we change the object to reflect the current state
997 ### of the name and prefix fields -- this needs to be limited to
999 my $clone = $entry->clone;
1002 ### so, if you don't want use to use the prefix, we'll stuff
1003 ### everything in the name field instead
1004 if( $DO_NOT_USE_PREFIX ) {
1006 ### you might have an extended prefix, if so, set it in the clone
1007 ### XXX is ::Unix right?
1008 $clone->name( length $ext_prefix
1009 ? File::Spec::Unix->catdir( $ext_prefix,
1011 : $clone->full_path );
1012 $clone->prefix( '' );
1014 ### otherwise, we'll have to set it properly -- prefix part in the
1015 ### prefix and name part in the name field.
1018 ### split them here, not before!
1019 my ($prefix,$name) = $clone->_prefix_and_file( $clone->full_path );
1021 ### you might have an extended prefix, if so, set it in the clone
1022 ### XXX is ::Unix right?
1023 $prefix = File::Spec::Unix->catdir( $ext_prefix, $prefix )
1024 if length $ext_prefix;
1026 $clone->prefix( $prefix );
1027 $clone->name( $name );
1030 ### names are too long, and will get truncated if we don't add a
1031 ### '@LongLink' file...
1032 my $make_longlink = ( length($clone->name) > NAME_LENGTH or
1033 length($clone->prefix) > PREFIX_LENGTH
1036 ### perhaps we need to make a longlink file?
1037 if( $make_longlink ) {
1038 my $longlink = Archive::Tar::File->new(
1039 data => LONGLINK_NAME,
1041 { type => LONGLINK }
1044 unless( $longlink ) {
1045 $self->_error( qq[Could not create 'LongLink' entry for ] .
1046 qq[oversize file '] . $clone->full_path ."'" );
1050 push @write_me, $longlink;
1053 push @write_me, $clone;
1055 ### write the one, optionally 2 a::t::file objects to the handle
1056 for my $clone (@write_me) {
1058 ### if the file is a symlink, there are 2 options:
1059 ### either we leave the symlink intact, but then we don't write any
1060 ### data OR we follow the symlink, which means we actually make a
1061 ### copy. if we do the latter, we have to change the TYPE of the
1063 my $link_ok = $clone->is_symlink && $Archive::Tar::FOLLOW_SYMLINK;
1064 my $data_ok = !$clone->is_symlink && $clone->has_content;
1066 ### downgrade to a 'normal' file if it's a symlink we're going to
1067 ### treat as a regular file
1068 $clone->_downgrade_to_plainfile if $link_ok;
1070 ### get the header for this block
1071 my $header = $self->_format_tar_entry( $clone );
1073 $self->_error(q[Could not format header for: ] .
1074 $clone->full_path );
1078 unless( print $handle $header ) {
1079 $self->_error(q[Could not write header for: ] .
1084 if( $link_ok or $data_ok ) {
1085 unless( print $handle $clone->data ) {
1086 $self->_error(q[Could not write data for: ] .
1091 ### pad the end of the clone if required ###
1092 print $handle TAR_PAD->( $clone->size ) if $clone->size % BLOCK
1095 } ### done writing these entries
1098 ### write the end markers ###
1099 print $handle TAR_END x 2 or
1100 return $self->_error( qq[Could not write tar end markers] );
1102 ### did you want it written to a file, or returned as a string? ###
1103 my $rv = length($file) ? 1
1104 : $HAS_PERLIO ? $dummy
1105 : do { seek $handle, 0, 0; local $/; <$handle> };
1107 ### make sure to close the handle;
1113 sub _format_tar_entry {
1115 my $entry = shift or return;
1116 my $ext_prefix = shift; $ext_prefix = '' unless defined $ext_prefix;
1117 my $no_prefix = shift || 0;
1119 my $file = $entry->name;
1120 my $prefix = $entry->prefix; $prefix = '' unless defined $prefix;
1122 ### remove the prefix from the file name
1123 ### not sure if this is still neeeded --kane
1124 ### no it's not -- Archive::Tar::File->_new_from_file will take care of
1125 ### this for us. Even worse, this would break if we tried to add a file
1127 #if( length $prefix ) {
1128 # $file =~ s/^$match//;
1131 $prefix = File::Spec::Unix->catdir($ext_prefix, $prefix)
1132 if length $ext_prefix;
1134 ### not sure why this is... ###
1135 my $l = PREFIX_LENGTH; # is ambiguous otherwise...
1136 substr ($prefix, 0, -$l) = "" if length $prefix >= PREFIX_LENGTH;
1138 my $f1 = "%06o"; my $f2 = "%11o";
1140 ### this might be optimizable with a 'changed' flag in the file objects ###
1145 (map { sprintf( $f1, $entry->$_() ) } qw[mode uid gid]),
1146 (map { sprintf( $f2, $entry->$_() ) } qw[size mtime]),
1148 "", # checksum field - space padded a bit down
1150 (map { $entry->$_() } qw[type linkname magic]),
1152 $entry->version || TAR_VERSION,
1154 (map { $entry->$_() } qw[uname gname]),
1155 (map { sprintf( $f1, $entry->$_() ) } qw[devmajor devminor]),
1157 ($no_prefix ? '' : $prefix)
1160 ### add the checksum ###
1161 substr($tar,148,7) = sprintf("%6o\0", unpack("%16C*",$tar));
1166 =head2 $tar->add_files( @filenamelist )
1168 Takes a list of filenames and adds them to the in-memory archive.
1170 The path to the file is automatically converted to a Unix like
1171 equivalent for use in the archive, and, if on MacOS, the file's
1172 modification time is converted from the MacOS epoch to the Unix epoch.
1173 So tar archives created on MacOS with B<Archive::Tar> can be read
1174 both with I<tar> on Unix and applications like I<suntar> or
1175 I<Stuffit Expander> on MacOS.
1177 Be aware that the file's type/creator and resource fork will be lost,
1178 which is usually what you want in cross-platform archives.
1180 Returns a list of C<Archive::Tar::File> objects that were just added.
1186 my @files = @_ or return;
1189 for my $file ( @files ) {
1190 unless( -e $file || -l $file ) {
1191 $self->_error( qq[No such file: '$file'] );
1195 my $obj = Archive::Tar::File->new( file => $file );
1197 $self->_error( qq[Unable to add file: '$file'] );
1204 push @{$self->{_data}}, @rv;
1209 =head2 $tar->add_data ( $filename, $data, [$opthashref] )
1211 Takes a filename, a scalar full of data and optionally a reference to
1212 a hash with specific options.
1214 Will add a file to the in-memory archive, with name C<$filename> and
1215 content C<$data>. Specific properties can be set using C<$opthashref>.
1216 The following list of properties is supported: name, size, mtime
1217 (last modified date), mode, uid, gid, linkname, uname, gname,
1218 devmajor, devminor, prefix, type. (On MacOS, the file's path and
1219 modification times are converted to Unix equivalents.)
1221 Valid values for the file type are the following constants defined in
1222 Archive::Tar::Constants:
1234 Hard and symbolic ("soft") links; linkname should specify target.
1240 Character and block devices. devmajor and devminor should specify the major
1241 and minor device numbers.
1257 Returns the C<Archive::Tar::File> object that was just added, or
1258 C<undef> on failure.
1264 my ($file, $data, $opt) = @_;
1266 my $obj = Archive::Tar::File->new( data => $file, $data, $opt );
1268 $self->_error( qq[Unable to add file: '$file'] );
1272 push @{$self->{_data}}, $obj;
1277 =head2 $tar->error( [$BOOL] )
1279 Returns the current errorstring (usually, the last error reported).
1280 If a true value was specified, it will give the C<Carp::longmess>
1281 equivalent of the error, in effect giving you a stacktrace.
1283 For backwards compatibility, this error is also available as
1284 C<$Archive::Tar::error> although it is much recommended you use the
1285 method call instead.
1295 my $msg = $error = shift;
1296 $longmess = Carp::longmess($error);
1298 ### set Archive::Tar::WARN to 0 to disable printing
1301 carp $DEBUG ? $longmess : $msg;
1309 return shift() ? $longmess : $error;
1313 =head2 $tar->setcwd( $cwd );
1315 C<Archive::Tar> needs to know the current directory, and it will run
1316 C<Cwd::cwd()> I<every> time it extracts a I<relative> entry from the
1317 tarfile and saves it in the file system. (As of version 1.30, however,
1318 C<Archive::Tar> will use the speed optimization described below
1319 automatically, so it's only relevant if you're using C<extract_file()>).
1321 Since C<Archive::Tar> doesn't change the current directory internally
1322 while it is extracting the items in a tarball, all calls to C<Cwd::cwd()>
1323 can be avoided if we can guarantee that the current directory doesn't
1324 get changed externally.
1326 To use this performance boost, set the current directory via
1329 $tar->setcwd( cwd() );
1331 once before calling a function like C<extract_file> and
1332 C<Archive::Tar> will use the current directory setting from then on
1333 and won't call C<Cwd::cwd()> internally.
1335 To switch back to the default behaviour, use
1337 $tar->setcwd( undef );
1339 and C<Archive::Tar> will call C<Cwd::cwd()> internally again.
1341 If you're using C<Archive::Tar>'s C<exract()> method, C<setcwd()> will
1350 $self->{cwd} = $cwd;
1353 =head2 $bool = $tar->has_io_string
1355 Returns true if we currently have C<IO::String> support loaded.
1357 Either C<IO::String> or C<perlio> support is needed to support writing
1358 stringified archives. Currently, C<perlio> is the preferred method, if
1361 See the C<GLOBAL VARIABLES> section to see how to change this preference.
1365 sub has_io_string { return $HAS_IO_STRING; }
1367 =head2 $bool = $tar->has_perlio
1369 Returns true if we currently have C<perlio> support loaded.
1371 This requires C<perl-5.8> or higher, compiled with C<perlio>
1373 Either C<IO::String> or C<perlio> support is needed to support writing
1374 stringified archives. Currently, C<perlio> is the preferred method, if
1377 See the C<GLOBAL VARIABLES> section to see how to change this preference.
1381 sub has_perlio { return $HAS_PERLIO; }
1384 =head1 Class Methods
1386 =head2 Archive::Tar->create_archive($file, $compression, @filelist)
1388 Creates a tar file from the list of files provided. The first
1389 argument can either be the name of the tar file to create or a
1390 reference to an open file handle (e.g. a GLOB reference).
1392 The second argument specifies the level of compression to be used, if
1393 any. Compression of tar files requires the installation of the
1394 IO::Zlib module. Specific levels of compression may be
1395 requested by passing a value between 2 and 9 as the second argument.
1396 Any other value evaluating as true will result in the default
1397 compression level being used.
1399 Note that when you pass in a filehandle, the compression argument
1400 is ignored, as all files are printed verbatim to your filehandle.
1401 If you wish to enable compression with filehandles, use an
1402 C<IO::Zlib> filehandle instead.
1404 The remaining arguments list the files to be included in the tar file.
1405 These files must all exist. Any files which don't exist or can't be
1406 read are silently ignored.
1408 If the archive creation fails for any reason, C<create_archive> will
1409 return false. Please use the C<error> method to find the cause of the
1412 Note that this method does not write C<on the fly> as it were; it
1413 still reads all the files into memory before writing out the archive.
1414 Consult the FAQ below if this is a problem.
1418 sub create_archive {
1421 my $file = shift; return unless defined $file;
1422 my $gzip = shift || 0;
1426 return $class->_error( qq[Cowardly refusing to create empty archive!] );
1429 my $tar = $class->new;
1430 $tar->add_files( @files );
1431 return $tar->write( $file, $gzip );
1434 =head2 Archive::Tar->list_archive ($file, $compressed, [\@properties])
1436 Returns a list of the names of all the files in the archive. The
1437 first argument can either be the name of the tar file to list or a
1438 reference to an open file handle (e.g. a GLOB reference).
1440 If C<list_archive()> is passed an array reference as its third
1441 argument it returns a list of hash references containing the requested
1442 properties of each file. The following list of properties is
1443 supported: full_path, name, size, mtime (last modified date), mode,
1444 uid, gid, linkname, uname, gname, devmajor, devminor, prefix.
1446 See C<Archive::Tar::File> for details about supported properties.
1448 Passing an array reference containing only one element, 'name', is
1449 special cased to return a list of names rather than a list of hash
1456 my $file = shift; return unless defined $file;
1457 my $gzip = shift || 0;
1459 my $tar = $class->new($file, $gzip);
1462 return $tar->list_files( @_ );
1465 =head2 Archive::Tar->extract_archive ($file, $gzip)
1467 Extracts the contents of the tar file. The first argument can either
1468 be the name of the tar file to create or a reference to an open file
1469 handle (e.g. a GLOB reference). All relative paths in the tar file will
1470 be created underneath the current working directory.
1472 C<extract_archive> will return a list of files it extracted.
1473 If the archive extraction fails for any reason, C<extract_archive>
1474 will return false. Please use the C<error> method to find the cause
1479 sub extract_archive {
1481 my $file = shift; return unless defined $file;
1482 my $gzip = shift || 0;
1484 my $tar = $class->new( ) or return;
1486 return $tar->read( $file, $gzip, { extract => 1 } );
1489 =head2 Archive::Tar->can_handle_compressed_files
1491 A simple checking routine, which will return true if C<Archive::Tar>
1492 is able to uncompress compressed archives on the fly with C<IO::Zlib>,
1493 or false if C<IO::Zlib> is not installed.
1495 You can use this as a shortcut to determine whether C<Archive::Tar>
1496 will do what you think before passing compressed archives to its
1501 sub can_handle_compressed_files { return ZLIB ? 1 : 0 }
1503 sub no_string_support {
1504 croak("You have to install IO::String to support writing archives to strings");
1511 =head1 GLOBAL VARIABLES
1513 =head2 $Archive::Tar::FOLLOW_SYMLINK
1515 Set this variable to C<1> to make C<Archive::Tar> effectively make a
1516 copy of the file when extracting. Default is C<0>, which
1517 means the symlink stays intact. Of course, you will have to pack the
1518 file linked to as well.
1520 This option is checked when you write out the tarfile using C<write>
1521 or C<create_archive>.
1523 This works just like C</bin/tar>'s C<-h> option.
1525 =head2 $Archive::Tar::CHOWN
1527 By default, C<Archive::Tar> will try to C<chown> your files if it is
1528 able to. In some cases, this may not be desired. In that case, set
1529 this variable to C<0> to disable C<chown>-ing, even if it were
1532 The default is C<1>.
1534 =head2 $Archive::Tar::CHMOD
1536 By default, C<Archive::Tar> will try to C<chmod> your files to
1537 whatever mode was specified for the particular file in the archive.
1538 In some cases, this may not be desired. In that case, set this
1539 variable to C<0> to disable C<chmod>-ing.
1541 The default is C<1>.
1543 =head2 $Archive::Tar::DO_NOT_USE_PREFIX
1545 By default, C<Archive::Tar> will try to put paths that are over
1546 100 characters in the C<prefix> field of your tar header, as
1547 defined per POSIX-standard. However, some (older) tar programs
1548 do not implement this spec. To retain compatibility with these older
1549 or non-POSIX compliant versions, you can set the C<$DO_NOT_USE_PREFIX>
1550 variable to a true value, and C<Archive::Tar> will use an alternate
1551 way of dealing with paths over 100 characters by using the
1552 C<GNU Extended Header> feature.
1554 Note that clients who do not support the C<GNU Extended Header>
1555 feature will not be able to read these archives. Such clients include
1556 tars on C<Solaris>, C<Irix> and C<AIX>.
1558 The default is C<0>.
1560 =head2 $Archive::Tar::DEBUG
1562 Set this variable to C<1> to always get the C<Carp::longmess> output
1563 of the warnings, instead of the regular C<carp>. This is the same
1564 message you would get by doing:
1570 =head2 $Archive::Tar::WARN
1572 Set this variable to C<0> if you do not want any warnings printed.
1573 Personally I recommend against doing this, but people asked for the
1574 option. Also, be advised that this is of course not threadsafe.
1578 =head2 $Archive::Tar::error
1580 Holds the last reported error. Kept for historical reasons, but its
1581 use is very much discouraged. Use the C<error()> method instead:
1583 warn $tar->error unless $tar->extract;
1585 =head2 $Archive::Tar::INSECURE_EXTRACT_MODE
1587 This variable indicates whether C<Archive::Tar> should allow
1588 files to be extracted outside their current working directory.
1590 Allowing this could have security implications, as a malicious
1591 tar archive could alter or replace any file the extracting user
1592 has permissions to. Therefor, the default is to not allow
1593 insecure extractions.
1595 If you trust the archive, or have other reasons to allow the
1596 archive to write files outside your current working directory,
1597 set this variable to C<true>.
1599 Note that this is a backwards incompatible change from version
1602 =head2 $Archive::Tar::HAS_PERLIO
1604 This variable holds a boolean indicating if we currently have
1605 C<perlio> support loaded. This will be enabled for any perl
1606 greater than C<5.8> compiled with C<perlio>.
1608 If you feel strongly about disabling it, set this variable to
1609 C<false>. Note that you will then need C<IO::String> installed
1610 to support writing stringified archives.
1612 Don't change this variable unless you B<really> know what you're
1615 =head2 $Archive::Tar::HAS_IO_STRING
1617 This variable holds a boolean indicating if we currently have
1618 C<IO::String> support loaded. This will be enabled for any perl
1619 that has a loadable C<IO::String> module.
1621 If you feel strongly about disabling it, set this variable to
1622 C<false>. Note that you will then need C<perlio> support from
1623 your perl to be able to write stringified archives.
1625 Don't change this variable unless you B<really> know what you're
1632 =item What's the minimum perl version required to run Archive::Tar?
1634 You will need perl version 5.005_03 or newer.
1636 =item Isn't Archive::Tar slow?
1638 Yes it is. It's pure perl, so it's a lot slower then your C</bin/tar>
1639 However, it's very portable. If speed is an issue, consider using
1640 C</bin/tar> instead.
1642 =item Isn't Archive::Tar heavier on memory than /bin/tar?
1644 Yes it is, see previous answer. Since C<Compress::Zlib> and therefore
1645 C<IO::Zlib> doesn't support C<seek> on their filehandles, there is little
1646 choice but to read the archive into memory.
1647 This is ok if you want to do in-memory manipulation of the archive.
1648 If you just want to extract, use the C<extract_archive> class method
1649 instead. It will optimize and write to disk immediately.
1651 =item Can't you lazy-load data instead?
1653 No, not easily. See previous question.
1655 =item How much memory will an X kb tar file need?
1657 Probably more than X kb, since it will all be read into memory. If
1658 this is a problem, and you don't need to do in memory manipulation
1659 of the archive, consider using C</bin/tar> instead.
1661 =item What do you do with unsupported filetypes in an archive?
1663 C<Unix> has a few filetypes that aren't supported on other platforms,
1664 like C<Win32>. If we encounter a C<hardlink> or C<symlink> we'll just
1665 try to make a copy of the original file, rather than throwing an error.
1667 This does require you to read the entire archive in to memory first,
1668 since otherwise we wouldn't know what data to fill the copy with.
1669 (This means that you cannot use the class methods on archives that
1670 have incompatible filetypes and still expect things to work).
1672 For other filetypes, like C<chardevs> and C<blockdevs> we'll warn that
1673 the extraction of this particular item didn't work.
1675 =item I'm using WinZip, or some other non-POSIX client, and files are not being extracted properly!
1677 By default, C<Archive::Tar> is in a completely POSIX-compatible
1678 mode, which uses the POSIX-specification of C<tar> to store files.
1679 For paths greather than 100 characters, this is done using the
1680 C<POSIX header prefix>. Non-POSIX-compatible clients may not support
1681 this part of the specification, and may only support the C<GNU Extended
1682 Header> functionality. To facilitate those clients, you can set the
1683 C<$Archive::Tar::DO_NOT_USE_PREFIX> variable to C<true>. See the
1684 C<GLOBAL VARIABLES> section for details on this variable.
1686 Note that GNU tar earlier than version 1.14 does not cope well with
1687 the C<POSIX header prefix>. If you use such a version, consider setting
1688 the C<$Archive::Tar::DO_NOT_USE_PREFIX> variable to C<true>.
1690 =item How do I extract only files that have property X from an archive?
1692 Sometimes, you might not wish to extract a complete archive, just
1693 the files that are relevant to you, based on some criteria.
1695 You can do this by filtering a list of C<Archive::Tar::File> objects
1696 based on your criteria. For example, to extract only files that have
1697 the string C<foo> in their title, you would use:
1700 grep { $_->full_path =~ /foo/ } $tar->get_files
1703 This way, you can filter on any attribute of the files in the archive.
1704 Consult the C<Archive::Tar::File> documentation on how to use these
1707 =item How do I access .tar.Z files?
1709 The C<Archive::Tar> module can optionally use C<Compress::Zlib> (via
1710 the C<IO::Zlib> module) to access tar files that have been compressed
1711 with C<gzip>. Unfortunately tar files compressed with the Unix C<compress>
1712 utility cannot be read by C<Compress::Zlib> and so cannot be directly
1713 accesses by C<Archive::Tar>.
1715 If the C<uncompress> or C<gunzip> programs are available, you can use
1716 one of these workarounds to read C<.tar.Z> files from C<Archive::Tar>
1718 Firstly with C<uncompress>
1722 open F, "uncompress -c $filename |";
1723 my $tar = Archive::Tar->new(*F);
1726 and this with C<gunzip>
1730 open F, "gunzip -c $filename |";
1731 my $tar = Archive::Tar->new(*F);
1734 Similarly, if the C<compress> program is available, you can use this to
1735 write a C<.tar.Z> file
1740 my $fh = new IO::File "| compress -c >$filename";
1741 my $tar = Archive::Tar->new();
1746 =item How do I handle Unicode strings?
1748 C<Archive::Tar> uses byte semantics for any files it reads from or writes
1749 to disk. This is not a problem if you only deal with files and never
1750 look at their content or work solely with byte strings. But if you use
1751 Unicode strings with character semantics, some additional steps need
1754 For example, if you add a Unicode string like
1757 $tar->add_data('file.txt', "Euro: \x{20AC}");
1759 then there will be a problem later when the tarfile gets written out
1760 to disk via C<$tar->write()>:
1762 Wide character in print at .../Archive/Tar.pm line 1014.
1764 The data was added as a Unicode string and when writing it out to disk,
1765 the C<:utf8> line discipline wasn't set by C<Archive::Tar>, so Perl
1766 tried to convert the string to ISO-8859 and failed. The written file
1767 now contains garbage.
1769 For this reason, Unicode strings need to be converted to UTF-8-encoded
1770 bytestrings before they are handed off to C<add_data()>:
1773 my $data = "Accented character: \x{20AC}";
1774 $data = encode('utf8', $data);
1776 $tar->add_data('file.txt', $data);
1778 A opposite problem occurs if you extract a UTF8-encoded file from a
1779 tarball. Using C<get_content()> on the C<Archive::Tar::File> object
1780 will return its content as a bytestring, not as a Unicode string.
1782 If you want it to be a Unicode string (because you want character
1783 semantics with operations like regular expression matching), you need
1784 to decode the UTF8-encoded content and have Perl convert it into
1788 my $data = $tar->get_content();
1790 # Make it a Unicode string
1791 $data = decode('utf8', $data);
1793 There is no easy way to provide this functionality in C<Archive::Tar>,
1794 because a tarball can contain many files, and each of which could be
1795 encoded in a different way.
1803 =item Check if passed in handles are open for read/write
1805 Currently I don't know of any portable pure perl way to do this.
1806 Suggestions welcome.
1808 =item Allow archives to be passed in as string
1810 Currently, we only allow opened filehandles or filenames, but
1811 not strings. The internals would need some reworking to facilitate
1812 stringified archives.
1814 =item Facilitate processing an opened filehandle of a compressed archive
1816 Currently, we only support this if the filehandle is an IO::Zlib object.
1817 Environments, like apache, will present you with an opened filehandle
1818 to an uploaded file, which might be a compressed archive.
1826 =item The GNU tar specification
1828 C<http://www.gnu.org/software/tar/manual/tar.html>
1830 =item The PAX format specication
1832 The specifcation which tar derives from; C< http://www.opengroup.org/onlinepubs/007904975/utilities/pax.html>
1834 =item A comparison of GNU and POSIX tar standards; C<http://www.delorie.com/gnu/docs/tar/tar_114.html>
1836 =item GNU tar intends to switch to POSIX compatibility
1838 GNU Tar authors have expressed their intention to become completely
1839 POSIX-compatible; C<http://www.gnu.org/software/tar/manual/html_node/Formats.html>
1841 =item A Comparison between various tar implementations
1843 Lists known issues and incompatibilities; C<http://gd.tuwien.ac.at/utils/archivers/star/README.otherbugs>
1849 This module by Jos Boumans E<lt>kane@cpan.orgE<gt>.
1851 Please reports bugs to E<lt>bug-archive-tar@rt.cpan.orgE<gt>.
1853 =head1 ACKNOWLEDGEMENTS
1855 Thanks to Sean Burke, Chris Nandor, Chip Salzenberg, Tim Heaney and
1856 especially Andrew Savige for their help and suggestions.
1860 This module is copyright (c) 2002 - 2007 Jos Boumans
1861 E<lt>kane@cpan.orgE<gt>. All rights reserved.
1863 This library is free software; you may redistribute and/or modify
1864 it under the same terms as Perl itself.