find - traverse a file tree
finddepth - traverse a directory structure depth-first
    find(\&wanted, '/foo', '/bar');
    finddepth(\&wanted, '/foo', '/bar');
    find({ wanted => \&process, follow => 1 }, '.');
The first argument to find() is either a hash reference describing the
operations to be performed for each file, or a code reference.
Here are the possible keys for the hash:
The value should be a code reference. This code reference is called
I<the wanted() function> below.
Reports the name of a directory only AFTER all its entries
have been reported. The entry point finddepth() is a shortcut for
specifying C<{ bydepth => 1 }> in the first argument of find().
The value should be a code reference. This code reference is used to
preprocess a directory; it is called after readdir() but before the loop that
calls the wanted() function. It is called with a list of strings and is
expected to return a list of strings. The code can be used to sort the
strings alphabetically, numerically, or to filter out directory entries based
The value should be a code reference. It is invoked just before leaving the
current directory. It is called in void context with no arguments. The name
of the current directory is in $File::Find::dir. This hook is handy for
summarizing a directory, such as calculating its disk usage.
Causes symbolic links to be followed. Since a directory tree whose symbolic
links are followed may contain files more than once and may even have
cycles, a hash has to be built up with an entry for each file.
This might be expensive both in space and time for a large
directory tree. See I<follow_fast> and I<follow_skip> below.
If either I<follow> or I<follow_fast> is in effect:
It is guaranteed that an I<lstat> has been called before the user's
I<wanted()> function is called. This enables fast file checks involving S< _>.
There is a variable C<$File::Find::fullname> which holds the absolute
pathname of the file with all symbolic links resolved.
This is similar to I<follow> except that it may report some files
more than once. It does detect cycles, however.
Since only symbolic links have to be hashed, this is
much cheaper both in space and time.
If processing a file more than once (by the user's I<wanted()> function)
is worse than just taking time, the option I<follow> should be used.
C<follow_skip==1>, which is the default, causes all files which are
neither directories nor symbolic links to be ignored if they are about
to be processed a second time. If a directory or a symbolic link
is about to be processed a second time, File::Find dies.
C<follow_skip==0> causes File::Find to die if any file is about to be
processed a second time.
C<follow_skip==2> causes File::Find to ignore any duplicate files and
directories but to proceed normally otherwise.
Does not C<chdir()> to each directory as it recurses. The wanted()
function will need to be aware of this, of course. In this case,
C<$_> will be the same as C<$File::Find::name>.
If find is used in taint mode (the -T command line switch, or if EUID != UID
or if EGID != GID) then internally directory names have to be untainted
before they can be chdir'ed to. Therefore they are checked against a regular
expression I<untaint_pattern>. Note that all names passed to the
user's I<wanted()> function are still tainted.
=item C<untaint_pattern>
See above. This should be set using the C<qr> quoting operator.
The default is set to C<qr|^([-+@\w./]+)$|>.
Note that the parentheses are vital.
=item C<untaint_skip>
If set, directories (subtrees) which fail the I<untaint_pattern>
are skipped. The default is to 'die' in such a case.
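
As an illustration of the hash form described above, here is a minimal sketch that combines C<wanted>, C<preprocess> and C<postprocess>. The temporary tree, the file names and the C<%summary> hash are hypothetical and exist only for this example; it is one possible way to use the hooks, not a prescribed one.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a small throwaway tree (hypothetical names, for illustration only).
my $root = tempdir(CLEANUP => 1);
make_path("$root/b", "$root/a");
for my $f ("$root/a/2.txt", "$root/a/1.txt", "$root/b/3.txt") {
    open my $fh, '>', $f or die "Can't create $f: $!";
    close $fh;
}

my (@visited, %files_in, %summary);

find({
    # Called for every entry, directories included.
    wanted => sub {
        push @visited, $File::Find::name;
        $files_in{$File::Find::dir}++ if -f $_;
    },
    # Receives the readdir() list; sorting it makes the order in which
    # the entries of one directory are reported predictable.
    preprocess => sub { sort @_ },
    # Called with no arguments just before leaving each directory.
    postprocess => sub {
        $summary{$File::Find::dir} = $files_in{$File::Find::dir} || 0;
    },
}, $root);

printf "%s: %d plain file(s)\n", $_, $summary{$_} for sort keys %summary;
```

Because C<preprocess> returns the list that find() will iterate over, the same hook could also drop entries (for example with C<grep>) instead of sorting them.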
The wanted() function does whatever verifications you want.
C<$File::Find::dir> contains the current directory name, and C<$_> the
current filename within that directory. C<$File::Find::name> contains
the complete pathname to the file. You are chdir()'d to C<$File::Find::dir> when
the function is called, unless C<no_chdir> was specified.
When C<follow> or C<follow_fast> is in effect there is also a
C<$File::Find::fullname>.
The function may set C<$File::Find::prune> to prune the tree
unless C<bydepth> was specified.
Unless C<follow> or C<follow_fast> is specified, for compatibility
reasons (find.pl, find2perl) there are in addition the following globals
available: C<$File::Find::topdir>, C<$File::Find::topdev>, C<$File::Find::topino>,
C<$File::Find::topmode> and C<$File::Find::topnlink>.
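
The variables described above can be seen together in a small sketch of a wanted() function that collects C<.pl> files and prunes one subtree. The directory name C<skipme>, the file names and the C<@found> array are assumptions made up for this example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Throwaway tree with one directory we do not want to descend into.
my $root = tempdir(CLEANUP => 1);
make_path("$root/src", "$root/skipme/deep");
for my $f ("$root/src/main.pl", "$root/skipme/deep/hidden.pl") {
    open my $fh, '>', $f or die "Can't create $f: $!";
    close $fh;
}

my @found;

my $wanted = sub {
    # $_ is the bare entry name; by default we are chdir()'d into
    # $File::Find::dir, so file tests on $_ work directly.
    if (-d $_ && $_ eq 'skipme') {
        $File::Find::prune = 1;    # do not descend into this directory
        return;
    }
    # $File::Find::name is the path including the starting directory.
    push @found, $File::Find::name if -f $_ && /\.pl\z/;
};

find($wanted, $root);
print "$_\n" for @found;
```

Pruning only has an effect here because find() reports a directory before its contents; with finddepth() the contents have already been visited by the time the directory itself is seen.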
This library is useful for the C<find2perl> tool, which when fed,
    find2perl / -name .nfs\* -mtime +7 \
        -exec rm -f {} \; -o -fstype nfs -prune
produces something like:
    (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_)) &&
    ($nlink || (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))) &&
    ($File::Find::prune = 1);
Set the variable C<$File::Find::dont_use_nlink> if you're using AFS,
Here's another interesting wanted function. It will find all symlinks
    -l && !-e && print "bogus link: $File::Find::name\n";
See also the script C<pfind> on CPAN for a nice application of this
Be aware that the option to follow symbolic links can be dangerous.
Depending on the structure of the directory tree (including symbolic
links to directories) you might traverse a given (physical) directory
more than once (only if C<follow_fast> is in effect).
Furthermore, deleting or changing files in a symbolically linked directory
might cause very unpleasant surprises, since you delete or change files
in an unknown directory.
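
One way to reduce the risk described above is to look at C<$File::Find::fullname> before touching anything. The following sketch, which assumes a platform that supports symlink() and uses made-up file names, records the resolved location of every file reported under C<follow_fast>.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway tree: one real file plus a symbolic link pointing at it.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/real" or die "Can't mkdir $root/real: $!";
open my $fh, '>', "$root/real/data.txt" or die "Can't create file: $!";
close $fh;
symlink("$root/real/data.txt", "$root/alias.txt")
    or die "symlink failed (not supported here?): $!";

my %resolved;    # reported name => location after following links

find({
    follow_fast => 1,    # may report the same physical file twice
    wanted      => sub {
        return unless -f $_;    # -f follows the link to its target
        $resolved{$File::Find::name} = $File::Find::fullname;
    },
}, $root);

for my $name (sort keys %resolved) {
    my $note = $name eq $resolved{$name} ? '' : " -> $resolved{$name}";
    print "$name$note\n";
}
```

A wanted() function that deletes or rewrites files could compare the two values in the same way and skip anything whose resolved location lies outside the tree it was asked to process.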
@EXPORT = qw(find finddepth);
require File::Basename;
my ($wanted_callback, $avoid_nlink, $bydepth, $no_chdir, $follow,
    $follow_skip, $full_check, $untaint, $untaint_skip, $untaint_pat,
    $pre_process, $post_process);
return substr($cdir,0,rindex($cdir,'/')) if $fn eq '.';
$cdir = substr($cdir,0,rindex($cdir,'/')+1);
my $abs_name= $cdir . $fn;
if (substr($fn,0,3) eq '../') {
do 1 while ($abs_name=~ s|/(?>[^/]+)/\.\./|/|);
sub PathCombine($$) {
my ($Base,$Name) = @_;
if (substr($Name,0,1) eq '/') {
$AbsName= contract_name($Base,$Name);
# (simple) check for recursion
my $newlen= length($AbsName);
if ($newlen <= length($Base)) {
if (($newlen == length($Base) || substr($Base,$newlen,1) eq '/')
&& $AbsName eq substr($Base,0,$newlen))
sub Follow_SymLink($) {
my ($NewName,$DEV, $INO);
($DEV, $INO)= lstat $AbsName;
if ($SLnkSeen{$DEV, $INO}++) {
if ($follow_skip < 2) {
die "$AbsName is encountered a second time";
$NewName= PathCombine($AbsName, readlink($AbsName));
unless(defined $NewName) {
if ($follow_skip < 2) {
die "$AbsName is a recursive symbolic link";
($DEV, $INO) = lstat($AbsName);
return undef unless defined $DEV; # dangling symbolic link
if ($full_check && $SLnkSeen{$DEV, $INO}++) {
if ($follow_skip < 1) {
die "$AbsName encountered a second time";
our($dir, $name, $fullname, $prune);
sub _find_dir_symlnk($$$);
die "invalid top directory" unless defined $_[0];
my $cwd = $wanted->{bydepth} ? Cwd::fastcwd() : Cwd::cwd();
my $cwd_untainted = $cwd;
$wanted_callback = $wanted->{wanted};
$bydepth = $wanted->{bydepth};
$pre_process = $wanted->{preprocess};
$post_process = $wanted->{postprocess};
$no_chdir = $wanted->{no_chdir};
$full_check = $wanted->{follow};
$follow = $full_check || $wanted->{follow_fast};
$follow_skip = $wanted->{follow_skip};
$untaint = $wanted->{untaint};
$untaint_pat = $wanted->{untaint_pattern};
$untaint_skip = $wanted->{untaint_skip};
# for compatibility reasons (find.pl, find2perl)
our ($topdir, $topdev, $topino, $topmode, $topnlink);
# a symbolic link to a directory doesn't increase the link count
$avoid_nlink = $follow || $File::Find::dont_use_nlink;
$cwd_untainted= $1 if $cwd_untainted =~ m|$untaint_pat|;
die "insecure cwd in find(depth)" unless defined($cwd_untainted);
my ($abs_dir, $Is_Dir);
foreach my $TOP (@_) {
$top_item =~ s|/\z|| unless $top_item eq '/';
($topdev,$topino,$topmode,$topnlink) = stat $top_item;
if (substr($top_item,0,1) eq '/') {
$abs_dir = $top_item;
elsif ($top_item eq '.') {
else { # care about any ../
$abs_dir = contract_name("$cwd/",$top_item);
$abs_dir= Follow_SymLink($abs_dir);
unless (defined $abs_dir) {
warn "$top_item is a dangling symbolic link\n";
_find_dir_symlnk($wanted, $abs_dir, $top_item);
unless (defined $topnlink) {
warn "Can't stat $top_item: $!\n";
$top_item =~ s/\.dir\z// if $Is_VMS;
_find_dir($wanted, $top_item, $topnlink);
unless (($_,$dir) = File::Basename::fileparse($abs_dir)) {
($dir,$_) = ('./', $top_item);
my $abs_dir_save = $abs_dir;
$abs_dir = $1 if $abs_dir =~ m|$untaint_pat|;
unless (defined $abs_dir) {
if ($untaint_skip == 0) {
die "directory $abs_dir_save is still tainted";
unless ($no_chdir or chdir $abs_dir) {
warn "Couldn't chdir $abs_dir: $!\n";
$name = $abs_dir . $_;
{ &$wanted_callback }; # protect against wild "next"
$no_chdir or chdir $cwd_untainted;
# $p_dir : "parent directory"
# $nlink : what came back from the stat
# chdir (if not no_chdir) to dir
my ($wanted, $p_dir, $nlink) = @_;
my ($CdLvl,$Level) = (0,0);
my ($subcount,$sub_nlink);
my $dir_name= $p_dir;
my $dir_pref= ( $p_dir eq '/' ? '/' : "$p_dir/" );
my $dir_rel= '.'; # directory name relative to current directory
local ($dir, $name, $prune, *DIR);
unless ($no_chdir or $p_dir eq '.') {
$udir = $1 if $p_dir =~ m|$untaint_pat|;
unless (defined $udir) {
if ($untaint_skip == 0) {
die "directory $p_dir is still tainted";
unless (chdir $udir) {
warn "Can't cd to $udir: $!\n";
push @Stack,[$CdLvl,$p_dir,$dir_rel,-1] if $bydepth;
while (defined $SE) {
$_= ($no_chdir ? $dir_name : $dir_rel );
# prune may happen here
{ &$wanted_callback }; # protect against wild "next"
# change to that directory
unless ($no_chdir or $dir_rel eq '.') {
$udir = $1 if $dir_rel =~ m|$untaint_pat|;
unless (defined $udir) {
if ($untaint_skip == 0) {
. ($p_dir ne '/' ? $p_dir : '')
. "/) $dir_rel is still tainted";
unless (chdir $udir) {
. ($p_dir ne '/' ? $p_dir : '')
# Get the list of files in the current directory.
unless (opendir DIR, ($no_chdir ? $dir_name : '.')) {
warn "Can't opendir($dir_name): $!\n";
@filenames = readdir DIR;
@filenames = &$pre_process(@filenames) if $pre_process;
push @Stack,[$CdLvl,$dir_name,"",-2] if $post_process;
if ($nlink == 2 && !$avoid_nlink) {
# This dir has no subdirectories.
for my $FN (@filenames) {
next if $FN =~ /^\.{1,2}\z/;
$name = $dir_pref . $FN;
$_ = ($no_chdir ? $name : $FN);
{ &$wanted_callback }; # protect against wild "next"
# This dir has subdirectories.
$subcount = $nlink - 2;
for my $FN (@filenames) {
next if $FN =~ /^\.{1,2}\z/;
if ($subcount > 0 || $avoid_nlink) {
# Seen all the subdirs?
# check for directoriness.
# stat is faster for a file in the current directory
$sub_nlink = (lstat ($no_chdir ? $dir_pref . $FN : $FN))[3];
$FN =~ s/\.dir\z// if $Is_VMS;
push @Stack,[$CdLvl,$dir_name,$FN,$sub_nlink];
$name = $dir_pref . $FN;
$_= ($no_chdir ? $name : $FN);
{ &$wanted_callback }; # protect against wild "next"
$name = $dir_pref . $FN;
$_= ($no_chdir ? $name : $FN);
{ &$wanted_callback }; # protect against wild "next"
while ( defined ($SE = pop @Stack) ) {
($Level, $p_dir, $dir_rel, $nlink) = @$SE;
if ($CdLvl > $Level && !$no_chdir) {
my $tmp = join('/',('..') x ($CdLvl-$Level));
die "Can't cd to $dir_name" . $tmp
$dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
$dir_pref = "$dir_name/";
if ( $nlink == -2 ) {
$name = $dir = $p_dir;
&$post_process; # End-of-directory processing
} elsif ( $nlink < 0 ) { # must be finddepth, report dirname now
if ( substr($name,-2) eq '/.' ) {
$_ = ($no_chdir ? $dir_name : $dir_rel );
if ( substr($_,-2) eq '/.' ) {
{ &$wanted_callback }; # protect against wild "next"
push @Stack,[$CdLvl,$p_dir,$dir_rel,-1] if $bydepth;
# $dir_loc : absolute location of a dir
# $p_dir : "parent directory"
# chdir (if not no_chdir) to dir
sub _find_dir_symlnk($$$) {
my ($wanted, $dir_loc, $p_dir) = @_;
my $pdir_loc = $dir_loc;
my $dir_name = $p_dir;
my $dir_pref = ( $p_dir eq '/' ? '/' : "$p_dir/" );
my $loc_pref = ( $dir_loc eq '/' ? '/' : "$dir_loc/" );
my $dir_rel = '.'; # directory name relative to current directory
my $byd_flag; # flag for pending stack entry if $bydepth
local ($dir, $name, $fullname, $prune, *DIR);
unless ($no_chdir or $p_dir eq '.') {
$udir = $1 if $dir_loc =~ m|$untaint_pat|;
unless (defined $udir) {
if ($untaint_skip == 0) {
die "directory $dir_loc is still tainted";
unless (chdir $udir) {
warn "Can't cd to $udir: $!\n";
push @Stack,[$dir_loc,$pdir_loc,$p_dir,$dir_rel,-1] if $bydepth;
while (defined $SE) {
# change to parent directory
my $udir = $pdir_loc;
$udir = $1 if $pdir_loc =~ m|$untaint_pat|;
unless (chdir $udir) {
warn "Can't cd to $udir: $!\n";
$_= ($no_chdir ? $dir_name : $dir_rel );
# prune may happen here
lstat($_); # make sure file tests with '_' work
{ &$wanted_callback }; # protect against wild "next"
# change to that directory
unless ($no_chdir or $dir_rel eq '.') {
$udir = $1 if $dir_loc =~ m|$untaint_pat|;
unless (defined $udir ) {
if ($untaint_skip == 0) {
die "directory $dir_loc is still tainted";
unless (chdir $udir) {
warn "Can't cd to $udir: $!\n";
# Get the list of files in the current directory.
unless (opendir DIR, ($no_chdir ? $dir_loc : '.')) {
warn "Can't opendir($dir_loc): $!\n";
@filenames = readdir DIR;
for my $FN (@filenames) {
next if $FN =~ /^\.{1,2}\z/;
# follow symbolic links / do an lstat
$new_loc = Follow_SymLink($loc_pref.$FN);
# ignore if invalid symlink
next unless defined $new_loc;
push @Stack,[$new_loc,$dir_loc,$dir_name,$FN,1];
$fullname = $new_loc;
$name = $dir_pref . $FN;
$_ = ($no_chdir ? $name : $FN);
{ &$wanted_callback }; # protect against wild "next"
while (defined($SE = pop @Stack)) {
($dir_loc, $pdir_loc, $p_dir, $dir_rel, $byd_flag) = @$SE;
$dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
$dir_pref = "$dir_name/";
$loc_pref = "$dir_loc/";
if ( $byd_flag < 0 ) { # must be finddepth, report dirname now
unless ($no_chdir or $dir_rel eq '.') {
my $udir = $pdir_loc;
# untaint the parent location, which is the directory we chdir to here
$udir = $1 if $pdir_loc =~ m|$untaint_pat|;
unless (chdir $udir) {
warn "Can't cd to $udir: $!\n";
$fullname = $dir_loc;
if ( substr($name,-2) eq '/.' ) {
$_ = ($no_chdir ? $dir_name : $dir_rel);
if ( substr($_,-2) eq '/.' ) {
lstat($_); # make sure file tests with '_' work
{ &$wanted_callback }; # protect against wild "next"
push @Stack,[$dir_loc, $pdir_loc, $p_dir, $dir_rel,-1] if $bydepth;
if ( ref($wanted) eq 'HASH' ) {
if ( $wanted->{follow} || $wanted->{follow_fast}) {
$wanted->{follow_skip} = 1 unless defined $wanted->{follow_skip};
if ( $wanted->{untaint} ) {
$wanted->{untaint_pattern} = qr|^([-+@\w./]+)$|
unless defined $wanted->{untaint_pattern};
$wanted->{untaint_skip} = 0 unless defined $wanted->{untaint_skip};
return { wanted => $wanted };
_find_opt(wrap_wanted($wanted), @_);
%SLnkSeen= (); # free memory
my $wanted = wrap_wanted(shift);
$wanted->{bydepth} = 1;
_find_opt($wanted, @_);
%SLnkSeen= (); # free memory
# These are hard-coded for now, but may move to hint files.
$File::Find::dont_use_nlink = 1;
$File::Find::dont_use_nlink = 1
if $^O eq 'os2' || $^O eq 'dos' || $^O eq 'amigaos' || $^O eq 'MSWin32' ||
# Set dont_use_nlink in your hint file if your system's stat doesn't
# report the number of links in a directory as an indication
# of the number of files.
# See, e.g. hints/machten.sh for MachTen 2.2.
unless ($File::Find::dont_use_nlink) {
$File::Find::dont_use_nlink = 1 if ($Config::Config{'dont_use_nlink'});