10 find - traverse a file tree
12 finddepth - traverse a directory structure depth-first
17 find(\&wanted, '/foo', '/bar');
21 finddepth(\&wanted, '/foo', '/bar');
25 find({ wanted => \&process, follow => 1 }, '.');
29 The first argument to find() is either a hash reference describing the
30 operations to be performed for each file, or a code reference.
32 Here are the possible keys for the hash:
38 The value should be a code reference. This code reference is called
39 I<the wanted() function> below.
43 Reports the name of a directory only AFTER all its entries
44 have been reported. Entry point finddepth() is a shortcut for
45 specifying C<{ bydepth => 1 }> in the first argument of find().
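The difference in reporting order can be seen with a small sketch. The temporary tree and the C<rel()> helper below are made up for illustration only; they are not part of File::Find.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical throwaway tree: $top/sub/file.txt
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir: $!";
open my $fh, '>', "$top/sub/file.txt" or die "open: $!";
close $fh or die "close: $!";

# Illustrative helper: path relative to $top ('.' for $top itself).
sub rel {
    (my $r = $_[0]) =~ s/^\Q$top\E\/?//;
    return length $r ? $r : '.';
}

my (@pre, @post, @opt);

# Default order: a directory is reported before its entries.
find(sub { push @pre, rel($File::Find::name) }, $top);

# finddepth(): a directory is reported after its entries.
finddepth(sub { push @post, rel($File::Find::name) }, $top);

# Same effect as finddepth(), spelled with the hash form.
find({ wanted => sub { push @opt, rel($File::Find::name) },
       bydepth => 1 }, $top);

print "find:      @pre\n";
print "finddepth: @post\n";
print "bydepth:   @opt\n";
```

With a single subdirectory holding a single file, the order does not depend on what readdir() returns first, which keeps the sketch deterministic.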
49 The value should be a code reference. This code reference is used to
50 preprocess a directory; it is called after readdir() but before the loop that
51 calls the wanted() function. It is called with a list of strings and is
52 expected to return a list of strings. The code can be used to sort the
53 strings alphabetically, numerically, or to filter out directory entries based
58 The value should be a code reference. It is invoked just before leaving the
59 current directory. It is called in void context with no arguments. The name
60 of the current directory is in $File::Find::dir. This hook is handy for
61 summarizing a directory, such as calculating its disk usage.
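As a rough sketch of how C<preprocess> and C<postprocess> can work together, the following sorts each directory's entries and then records a byte total when the directory is left. The file names, sizes, and the C<%bytes> bookkeeping are assumptions chosen for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical flat directory with two files of known size.
my $top  = tempdir(CLEANUP => 1);
my %size = (b => 3, a => 5);
for my $file (keys %size) {
    open my $fh, '>', "$top/$file" or die "open $file: $!";
    print {$fh} 'x' x $size{$file};
    close $fh or die "close $file: $!";
}

my (@order, %bytes);
my $running = 0;

find({
    # Called with the readdir() list; returns the list to iterate over.
    preprocess  => sub { return sort @_ },

    # Called once per entry; only plain files are counted here.
    wanted      => sub {
        return unless -f $_;
        push @order, $_;
        $running += -s _;
    },

    # Called with no arguments just before leaving the directory.
    postprocess => sub {
        $bytes{$File::Find::dir} = $running;
    },
}, $top);

print "visited: @order\n";
print "bytes in $top: $bytes{$top}\n";
```

Because C<preprocess> sorts the names, the files are visited alphabetically regardless of the order in which the filesystem returns them.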
65 Causes symbolic links to be followed. Since directory trees with symbolic
66 links (followed) may contain files more than once and may even have
67 cycles, a hash has to be built up with an entry for each file.
68 This might be expensive both in space and time for a large
69 directory tree. See I<follow_fast> and I<follow_skip> below.
70 If either I<follow> or I<follow_fast> is in effect:
76 It is guaranteed that an I<lstat> has been called before the user's
77 I<wanted()> function is called. This enables fast file checks involving S< _>.
81 There is a variable C<$File::Find::fullname> which holds the absolute
82 pathname of the file with all symbolic links resolved.
88 This is similar to I<follow> except that it may report some files more
89 than once. It does detect cycles, however. Since only symbolic links
90 have to be hashed, this is much cheaper both in space and time. If
91 processing a file more than once (by the user's I<wanted()> function)
92 is worse than just taking time, the option I<follow> should be used.
96 C<follow_skip==1>, which is the default, causes all files which are
97 neither directories nor symbolic links to be ignored if they are about
98 to be processed a second time. If a directory or a symbolic link
99 is about to be processed a second time, File::Find dies.
100 C<follow_skip==0> causes File::Find to die if any file is about to be
101 processed a second time.
102 C<follow_skip==2> causes File::Find to ignore any duplicate files and
103 directories but to proceed normally otherwise.
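A minimal sketch of C<follow> combined with C<follow_skip =E<gt> 2>, assuming a platform where symlink() works. The tree (a directory C<real> and a symbolic link C<alias> pointing at it) is invented for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical tree: $top/real/data.txt plus $top/alias -> real
my $top = tempdir(CLEANUP => 1);
mkdir "$top/real" or die "mkdir: $!";
open my $fh, '>', "$top/real/data.txt" or die "open: $!";
close $fh or die "close: $!";
symlink 'real', "$top/alias" or die "symlink: $!";

my @resolved;
find({
    follow      => 1,   # follow symbolic links, remembering every file seen
    follow_skip => 2,   # silently ignore anything reached a second time
    wanted      => sub {
        # $File::Find::fullname is available because follow is in effect.
        push @resolved, $File::Find::fullname if -f $_;
    },
}, $top);

print "$_\n" for @resolved;
```

Whichever of C<real> and C<alias> readdir() happens to return first, the file should be reported once, and C<$File::Find::fullname> should name it under C<real>.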
108 Does not C<chdir()> to each directory as it recurses. The wanted()
109 function will need to be aware of this, of course. In this case,
110 C<$_> will be the same as C<$File::Find::name>.
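The effect of C<no_chdir> can be checked with a short sketch. The temporary tree and the counters are illustrative only.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical tree: $top/sub/file.txt
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir: $!";
open my $fh, '>', "$top/sub/file.txt" or die "open: $!";
close $fh or die "close: $!";

my $start   = getcwd();
my $visited = 0;
my $moved   = 0;
my @mismatch;

find({
    no_chdir => 1,
    wanted   => sub {
        $visited++;
        # With no_chdir, $_ carries the full name rather than the basename.
        push @mismatch, $_ if $_ ne $File::Find::name;
        # The working directory stays where the caller left it.
        $moved++ if getcwd() ne $start;
    },
}, $top);

print "visited $visited entries, ", scalar(@mismatch), " mismatches\n";
```

A wanted() function written for the default mode usually needs adjusting here, since tests such as C<$_ eq 'core'> compare against a full path instead of a basename.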
114 If find is used in taint-mode (-T command line switch or if EUID != UID
115 or if EGID != GID) then internally directory names have to be untainted
116 before they can be cd'ed to. Therefore they are checked against a regular
117 expression I<untaint_pattern>. Note that all names passed to the
118 user's I<wanted()> function are still tainted.
120 =item C<untaint_pattern>
122 See above. This should be set using the C<qr> quoting operator.
123 The default is set to C<qr|^([-+@\w./]+)$|>.
124 Note that the parentheses are vital.
126 =item C<untaint_skip>
128 If set, directories (subtrees) which fail the I<untaint_pattern>
129 are skipped. The default is to 'die' in such a case.
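To see why the capturing parentheses matter, here is a sketch that applies patterns to directory names outside of taint mode. The C<untainted_name()> helper, the sample names, and the two extra patterns are assumptions made for illustration; only the default pattern comes from the text above.

```perl
use strict;
use warnings;

# The documented default pattern.
my $default     = qr|^([-+@\w./]+)$|;
# Hypothetical variant that also accepts spaces.
my $with_spaces = qr|^([-+@\w./ ]+)$|;
# Hypothetical broken variant: same character class, no capture group.
my $no_capture  = qr|^[-+@\w./]+$|;

# Illustrative helper: the untainted name is taken from $1 after a match.
sub untainted_name {
    my ($dir, $pattern) = @_;
    my $clean;
    $clean = $1 if $dir =~ $pattern;
    return $clean;
}

for my $case (
    [ '/usr/local/lib', $default     ],
    [ './my dir',       $default     ],
    [ './my dir',       $with_spaces ],
    [ '/usr/local/lib', $no_capture  ],
) {
    my ($dir, $pattern) = @$case;
    my $clean = untainted_name($dir, $pattern);
    printf "%-16s %s\n", $dir, defined $clean ? "accepted as '$clean'" : 'rejected';
}
```

The last case matches but leaves C<$1> undefined, so a pattern without parentheses rejects every directory. Such a pattern would be passed as C<untaint_pattern> together with C<untaint =E<gt> 1>.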
133 The wanted() function performs whatever checks you want on each file.
134 C<$File::Find::dir> contains the current directory name, and C<$_> the
135 current filename within that directory. C<$File::Find::name> contains
136 the complete pathname to the file. You are chdir()'d to
137 C<$File::Find::dir> when the function is called, unless C<no_chdir>
138 was specified. When C<follow> or C<follow_fast> is in effect, there is
139 also a C<$File::Find::fullname>. The function may set
140 C<$File::Find::prune> to prune the tree unless C<bydepth> was
141 specified. Unless C<follow> or C<follow_fast> is specified, for
142 compatibility reasons (find.pl, find2perl) there are in addition the
143 following globals available: C<$File::Find::topdir>,
144 C<$File::Find::topdev>, C<$File::Find::topino>,
145 C<$File::Find::topmode> and C<$File::Find::topnlink>.
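A small sketch of a wanted() function that uses C<$File::Find::dir>, C<$_>, and C<$File::Find::prune> together. The directory names C<keep> and C<skip> are invented for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical tree: $top/keep/a.txt and $top/skip/b.txt
my $top = tempdir(CLEANUP => 1);
for my $spec ([keep => 'a.txt'], [skip => 'b.txt']) {
    my ($dir, $file) = @$spec;
    mkdir "$top/$dir" or die "mkdir $dir: $!";
    open my $fh, '>', "$top/$dir/$file" or die "open $file: $!";
    close $fh or die "close $file: $!";
}

my @seen;
find(sub {
    # $_ is the basename; we are chdir()'d to $File::Find::dir.
    if (-d $_ && $_ eq 'skip') {
        $File::Find::prune = 1;   # do not descend into this directory
        return;
    }
    (my $rel = $File::Find::name) =~ s/^\Q$top\E\/?//;
    push @seen, length $rel ? $rel : '.';
}, $top);

print join("\n", sort @seen), "\n";
```

The pruned directory is still handed to wanted() once; pruning only stops the descent. As noted above, this has no effect when C<bydepth> is specified, because the contents have already been visited by the time the directory is reported.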
147 This library is useful for the C<find2perl> tool, which when fed,
149 find2perl / -name .nfs\* -mtime +7 \
150 -exec rm -f {} \; -o -fstype nfs -prune
152 produces something like:
156 (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_)) &&
160 ($nlink || (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))) &&
162 ($File::Find::prune = 1);
165 Set the variable C<$File::Find::dont_use_nlink> if you're using AFS,
169 Here's another interesting wanted function. It will find all symlinks
173 -l && !-e && print "bogus link: $File::Find::name\n";
176 See also the script C<pfind> on CPAN for a nice application of this
181 Be aware that the option to follow symbolic links can be dangerous.
182 Depending on the structure of the directory tree (including symbolic
183 links to directories) you might traverse a given (physical) directory
184 more than once (only if C<follow_fast> is in effect).
185 Furthermore, deleting or changing files in a symbolically linked directory
186 might cause very unpleasant surprises, since you delete or change files
187 in an unknown directory.
192 our @ISA = qw(Exporter);
193 our @EXPORT = qw(find finddepth);
199 require File::Basename;
202 my ($wanted_callback, $avoid_nlink, $bydepth, $no_chdir, $follow,
203 $follow_skip, $full_check, $untaint, $untaint_skip, $untaint_pat,
204 $pre_process, $post_process);
209 return substr($cdir,0,rindex($cdir,'/')) if $fn eq '.';
211 $cdir = substr($cdir,0,rindex($cdir,'/')+1);
215 my $abs_name= $cdir . $fn;
217 if (substr($fn,0,3) eq '../') {
218 do 1 while ($abs_name=~ s|/(?>[^/]+)/\.\./|/|);
225 sub PathCombine($$) {
226 my ($Base,$Name) = @_;
229 if (substr($Name,0,1) eq '/') {
233 $AbsName= contract_name($Base,$Name);
236 # (simple) check for recursion
237 my $newlen= length($AbsName);
238 if ($newlen <= length($Base)) {
239 if (($newlen == length($Base) || substr($Base,$newlen,1) eq '/')
240 && $AbsName eq substr($Base,0,$newlen))
248 sub Follow_SymLink($) {
251 my ($NewName,$DEV, $INO);
252 ($DEV, $INO)= lstat $AbsName;
255 if ($SLnkSeen{$DEV, $INO}++) {
256 if ($follow_skip < 2) {
257 die "$AbsName is encountered a second time";
263 $NewName= PathCombine($AbsName, readlink($AbsName));
264 unless(defined $NewName) {
265 if ($follow_skip < 2) {
266 die "$AbsName is a recursive symbolic link";
275 ($DEV, $INO) = lstat($AbsName);
276 return undef unless defined $DEV; # dangling symbolic link
279 if ($full_check && $SLnkSeen{$DEV, $INO}++) {
280 if ($follow_skip < 1) {
281 die "$AbsName encountered a second time";
291 our($dir, $name, $fullname, $prune);
292 sub _find_dir_symlnk($$$);
297 die "invalid top directory" unless defined $_[0];
299 my $cwd = $wanted->{bydepth} ? Cwd::fastcwd() : Cwd::cwd();
300 my $cwd_untainted = $cwd;
301 $wanted_callback = $wanted->{wanted};
302 $bydepth = $wanted->{bydepth};
303 $pre_process = $wanted->{preprocess};
304 $post_process = $wanted->{postprocess};
305 $no_chdir = $wanted->{no_chdir};
306 $full_check = $wanted->{follow};
307 $follow = $full_check || $wanted->{follow_fast};
308 $follow_skip = $wanted->{follow_skip};
309 $untaint = $wanted->{untaint};
310 $untaint_pat = $wanted->{untaint_pattern};
311 $untaint_skip = $wanted->{untaint_skip};
313 # for compatibility reasons (find.pl, find2perl)
314 our ($topdir, $topdev, $topino, $topmode, $topnlink);
316 # a symbolic link to a directory doesn't increase the link count
317 $avoid_nlink = $follow || $File::Find::dont_use_nlink;
320 $cwd_untainted= $1 if $cwd_untainted =~ m|$untaint_pat|;
321 die "insecure cwd in find(depth)" unless defined($cwd_untainted);
324 my ($abs_dir, $Is_Dir);
327 foreach my $TOP (@_) {
329 $top_item =~ s|/\z|| unless $top_item eq '/';
332 ($topdev,$topino,$topmode,$topnlink) = stat $top_item;
335 if (substr($top_item,0,1) eq '/') {
336 $abs_dir = $top_item;
338 elsif ($top_item eq '.') {
341 else { # care about any ../
342 $abs_dir = contract_name("$cwd/",$top_item);
344 $abs_dir= Follow_SymLink($abs_dir);
345 unless (defined $abs_dir) {
346 warn "$top_item is a dangling symbolic link\n";
350 _find_dir_symlnk($wanted, $abs_dir, $top_item);
356 unless (defined $topnlink) {
357 warn "Can't stat $top_item: $!\n";
361 $top_item =~ s/\.dir\z// if $Is_VMS;
362 _find_dir($wanted, $top_item, $topnlink);
371 unless (($_,$dir) = File::Basename::fileparse($abs_dir)) {
372 ($dir,$_) = ('./', $top_item);
377 my $abs_dir_save = $abs_dir;
378 $abs_dir = $1 if $abs_dir =~ m|$untaint_pat|;
379 unless (defined $abs_dir) {
380 if ($untaint_skip == 0) {
381 die "directory $abs_dir_save is still tainted";
389 unless ($no_chdir or chdir $abs_dir) {
390 warn "Couldn't chdir $abs_dir: $!\n";
394 $name = $abs_dir . $_;
396 { &$wanted_callback }; # protect against wild "next"
400 $no_chdir or chdir $cwd_untainted;
406 # $p_dir : "parent directory"
407 # $nlink : what came back from the stat
409 # chdir (if not no_chdir) to dir
412 my ($wanted, $p_dir, $nlink) = @_;
413 my ($CdLvl,$Level) = (0,0);
416 my ($subcount,$sub_nlink);
418 my $dir_name= $p_dir;
419 my $dir_pref= ( $p_dir eq '/' ? '/' : "$p_dir/" );
420 my $dir_rel= '.'; # directory name relative to current directory
422 local ($dir, $name, $prune, *DIR);
424 unless ($no_chdir or $p_dir eq '.') {
427 $udir = $1 if $p_dir =~ m|$untaint_pat|;
428 unless (defined $udir) {
429 if ($untaint_skip == 0) {
430 die "directory $p_dir is still tainted";
437 unless (chdir $udir) {
438 warn "Can't cd to $udir: $!\n";
443 push @Stack,[$CdLvl,$p_dir,$dir_rel,-1] if $bydepth;
445 while (defined $SE) {
449 $_= ($no_chdir ? $dir_name : $dir_rel );
450 # prune may happen here
452 { &$wanted_callback }; # protect against wild "next"
456 # change to that directory
457 unless ($no_chdir or $dir_rel eq '.') {
460 $udir = $1 if $dir_rel =~ m|$untaint_pat|;
461 unless (defined $udir) {
462 if ($untaint_skip == 0) {
464 . ($p_dir ne '/' ? $p_dir : '')
465 . "/) $dir_rel is still tainted";
469 unless (chdir $udir) {
471 . ($p_dir ne '/' ? $p_dir : '')
480 # Get the list of files in the current directory.
481 unless (opendir DIR, ($no_chdir ? $dir_name : '.')) {
482 warn "Can't opendir($dir_name): $!\n";
485 @filenames = readdir DIR;
487 @filenames = &$pre_process(@filenames) if $pre_process;
488 push @Stack,[$CdLvl,$dir_name,"",-2] if $post_process;
490 if ($nlink == 2 && !$avoid_nlink) {
491 # This dir has no subdirectories.
492 for my $FN (@filenames) {
493 next if $FN =~ /^\.{1,2}\z/;
495 $name = $dir_pref . $FN;
496 $_ = ($no_chdir ? $name : $FN);
497 { &$wanted_callback }; # protect against wild "next"
502 # This dir has subdirectories.
503 $subcount = $nlink - 2;
505 for my $FN (@filenames) {
506 next if $FN =~ /^\.{1,2}\z/;
507 if ($subcount > 0 || $avoid_nlink) {
508 # Seen all the subdirs?
509 # check for directoriness.
510 # stat is faster for a file in the current directory
511 $sub_nlink = (lstat ($no_chdir ? $dir_pref . $FN : $FN))[3];
515 $FN =~ s/\.dir\z// if $Is_VMS;
516 push @Stack,[$CdLvl,$dir_name,$FN,$sub_nlink];
519 $name = $dir_pref . $FN;
520 $_= ($no_chdir ? $name : $FN);
521 { &$wanted_callback }; # protect against wild "next"
525 $name = $dir_pref . $FN;
526 $_= ($no_chdir ? $name : $FN);
527 { &$wanted_callback }; # protect against wild "next"
533 while ( defined ($SE = pop @Stack) ) {
534 ($Level, $p_dir, $dir_rel, $nlink) = @$SE;
535 if ($CdLvl > $Level && !$no_chdir) {
536 my $tmp = join('/',('..') x ($CdLvl-$Level));
537 die "Can't cd to $dir_name" . $tmp
541 $dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
542 $dir_pref = "$dir_name/";
543 if ( $nlink == -2 ) {
544 $name = $dir = $p_dir;
546 &$post_process; # End-of-directory processing
547 } elsif ( $nlink < 0 ) { # must be finddepth, report dirname now
549 if ( substr($name,-2) eq '/.' ) {
553 $_ = ($no_chdir ? $dir_name : $dir_rel );
554 if ( substr($_,-2) eq '/.' ) {
557 { &$wanted_callback }; # protect against wild "next"
559 push @Stack,[$CdLvl,$p_dir,$dir_rel,-1] if $bydepth;
569 # $dir_loc : absolute location of a dir
570 # $p_dir : "parent directory"
572 # chdir (if not no_chdir) to dir
574 sub _find_dir_symlnk($$$) {
575 my ($wanted, $dir_loc, $p_dir) = @_;
579 my $pdir_loc = $dir_loc;
581 my $dir_name = $p_dir;
582 my $dir_pref = ( $p_dir eq '/' ? '/' : "$p_dir/" );
583 my $loc_pref = ( $dir_loc eq '/' ? '/' : "$dir_loc/" );
584 my $dir_rel = '.'; # directory name relative to current directory
585 my $byd_flag; # flag for pending stack entry if $bydepth
587 local ($dir, $name, $fullname, $prune, *DIR);
589 unless ($no_chdir or $p_dir eq '.') {
592 $udir = $1 if $dir_loc =~ m|$untaint_pat|;
593 unless (defined $udir) {
594 if ($untaint_skip == 0) {
595 die "directory $dir_loc is still tainted";
602 unless (chdir $udir) {
603 warn "Can't cd to $udir: $!\n";
608 push @Stack,[$dir_loc,$pdir_loc,$p_dir,$dir_rel,-1] if $bydepth;
610 while (defined $SE) {
613 # change to parent directory
615 my $udir = $pdir_loc;
617 $udir = $1 if $pdir_loc =~ m|$untaint_pat|;
619 unless (chdir $udir) {
620 warn "Can't cd to $udir: $!\n";
626 $_= ($no_chdir ? $dir_name : $dir_rel );
628 # prune may happen here
630 lstat($_); # make sure file tests with '_' work
631 { &$wanted_callback }; # protect against wild "next"
635 # change to that directory
636 unless ($no_chdir or $dir_rel eq '.') {
639 $udir = $1 if $dir_loc =~ m|$untaint_pat|;
640 unless (defined $udir ) {
641 if ($untaint_skip == 0) {
642 die "directory $dir_loc is still tainted";
649 unless (chdir $udir) {
650 warn "Can't cd to $udir: $!\n";
657 # Get the list of files in the current directory.
658 unless (opendir DIR, ($no_chdir ? $dir_loc : '.')) {
659 warn "Can't opendir($dir_loc): $!\n";
662 @filenames = readdir DIR;
665 for my $FN (@filenames) {
666 next if $FN =~ /^\.{1,2}\z/;
668 # follow symbolic links / do an lstat
669 $new_loc = Follow_SymLink($loc_pref.$FN);
671 # ignore if invalid symlink
672 next unless defined $new_loc;
675 push @Stack,[$new_loc,$dir_loc,$dir_name,$FN,1];
678 $fullname = $new_loc;
679 $name = $dir_pref . $FN;
680 $_ = ($no_chdir ? $name : $FN);
681 { &$wanted_callback }; # protect against wild "next"
687 while (defined($SE = pop @Stack)) {
688 ($dir_loc, $pdir_loc, $p_dir, $dir_rel, $byd_flag) = @$SE;
689 $dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
690 $dir_pref = "$dir_name/";
691 $loc_pref = "$dir_loc/";
692 if ( $byd_flag < 0 ) { # must be finddepth, report dirname now
693 unless ($no_chdir or $dir_rel eq '.') {
694 my $udir = $pdir_loc;
696 $udir = $1 if $pdir_loc =~ m|$untaint_pat|; # untaint the parent dir we are about to chdir to
698 unless (chdir $udir) {
699 warn "Can't cd to $udir: $!\n";
703 $fullname = $dir_loc;
705 if ( substr($name,-2) eq '/.' ) {
709 $_ = ($no_chdir ? $dir_name : $dir_rel);
710 if ( substr($_,-2) eq '/.' ) {
714 lstat($_); # make sure file tests with '_' work
715 { &$wanted_callback }; # protect against wild "next"
717 push @Stack,[$dir_loc, $pdir_loc, $p_dir, $dir_rel,-1] if $bydepth;
727 if ( ref($wanted) eq 'HASH' ) {
728 if ( $wanted->{follow} || $wanted->{follow_fast}) {
729 $wanted->{follow_skip} = 1 unless defined $wanted->{follow_skip};
731 if ( $wanted->{untaint} ) {
732 $wanted->{untaint_pattern} = qr|^([-+@\w./]+)$|
733 unless defined $wanted->{untaint_pattern};
734 $wanted->{untaint_skip} = 0 unless defined $wanted->{untaint_skip};
739 return { wanted => $wanted };
745 _find_opt(wrap_wanted($wanted), @_);
746 %SLnkSeen= (); # free memory
750 my $wanted = wrap_wanted(shift);
751 $wanted->{bydepth} = 1;
752 _find_opt($wanted, @_);
753 %SLnkSeen= (); # free memory
756 # These are hard-coded for now, but may move to hint files.
759 $File::Find::dont_use_nlink = 1;
762 $File::Find::dont_use_nlink = 1
763 if $^O eq 'os2' || $^O eq 'dos' || $^O eq 'amigaos' || $^O eq 'MSWin32' ||
764 $^O eq 'cygwin' || $^O eq 'epoc';
766 # Set dont_use_nlink in your hint file if your system's stat doesn't
767 # report the number of links in a directory as an indication
768 # of the number of files.
769 # See, e.g. hints/machten.sh for MachTen 2.2.
770 unless ($File::Find::dont_use_nlink) {
772 $File::Find::dont_use_nlink = 1 if ($Config::Config{'dont_use_nlink'});