11 find - traverse a file tree
13 finddepth - traverse a directory structure depth-first
18 find(\&wanted, '/foo', '/bar');
22 finddepth(\&wanted, '/foo', '/bar');
26 find({ wanted => \&process, follow => 1 }, '.');
30 The first argument to find() is either a hash reference describing the
31 operations to be performed for each file, or a code reference.
33 Here are the possible keys for the hash:
39 The value should be a code reference. This code reference is called
40 I<the wanted() function> below.
44 Reports the name of a directory only AFTER all its entries
45 have been reported. Entry point finddepth() is a shortcut for
46 specifying C<{ bydepth => 1 }> in the first argument of find().
50 The value should be a code reference. This code reference is used to
51 preprocess a directory; it is called after readdir() but before the loop that
52 calls the wanted() function. It is called with a list of strings and is
53 expected to return a list of strings. The code can be used to sort the
54 strings alphabetically, numerically, or to filter out directory entries based
59 The value should be a code reference. It is invoked just before leaving the
60 current directory. It is called in void context with no arguments. The name
61 of the current directory is in $File::Find::dir. This hook is handy for
62 summarizing a directory, such as calculating its disk usage.
66 Causes symbolic links to be followed. Since directory trees with symbolic
67 links (followed) may contain files more than once and may even have
68 cycles, a hash has to be built up with an entry for each file.
69 This might be expensive both in space and time for a large
70 directory tree. See I<follow_fast> and I<follow_skip> below.
71 If either I<follow> or I<follow_fast> is in effect:
77 It is guaranteed that an I<lstat> has been called before the user's
78 I<wanted()> function is called. This enables fast file checks involving S< _>.
82 There is a variable C<$File::Find::fullname> which holds the absolute
83 pathname of the file with all symbolic links resolved.
89 This is similar to I<follow> except that it may report some files more
90 than once. It does detect cycles, however. Since only symbolic links
91 have to be hashed, this is much cheaper both in space and time. If
92 processing a file more than once (by the user's I<wanted()> function)
93 is worse than just taking time, the option I<follow> should be used.
97 C<follow_skip==1>, which is the default, causes all files which are
98 neither directories nor symbolic links to be ignored if they are about
99 to be processed a second time. If a directory or a symbolic link
100 is about to be processed a second time, File::Find dies.
101 C<follow_skip==0> causes File::Find to die if any file is about to be
102 processed a second time.
103 C<follow_skip==2> causes File::Find to ignore any duplicate files and
104 directories but to proceed normally otherwise.
109 Does not C<chdir()> to each directory as it recurses. The wanted()
110 function will need to be aware of this, of course. In this case,
111 C<$_> will be the same as C<$File::Find::name>.
115 If find is used in taint-mode (-T command line switch or if EUID != UID
116 or if EGID != GID) then internally directory names have to be untainted
117 before they can be cd'ed to. Therefore they are checked against a regular
118 expression I<untaint_pattern>. Note that all names passed to the
119 user's I<wanted()> function are still tainted.
121 =item C<untaint_pattern>
123 See above. This should be set using the C<qr> quoting operator.
124 The default is set to C<qr|^([-+@\w./]+)$|>.
125 Note that the parentheses are vital: the untainted name is taken from C<$1>.
127 =item C<untaint_skip>
129 If set, directories (subtrees) which fail the I<untaint_pattern>
130 are skipped. The default is to 'die' in such a case.
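As a minimal sketch of how C<preprocess> and C<postprocess> might be combined, the script below sorts each directory's entries and totals the size of the plain files in every directory. The temporary tree, the file names and the C<%bytes_in> hash are assumptions made for illustration only; they are not part of File::Find.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a small throwaway tree so the example is self-contained.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir: $!";
for my $file ("b.txt", "a.txt", "sub/c.txt") {
    open my $fh, '>', "$top/$file" or die "open $file: $!";
    print {$fh} "x" x 10;               # every file is 10 bytes long
    close $fh or die "close $file: $!";
}

my (@visited, %bytes_in);

find({
    # Sort each directory's entries so files are reported in a stable order.
    preprocess  => sub { sort @_ },
    # Record every plain file and add its size to its directory's total.
    wanted      => sub {
        return unless -f $_;
        push @visited, $File::Find::name;
        $bytes_in{$File::Find::dir} += -s _;
    },
    # Invoked just before find() leaves a directory.
    postprocess => sub {
        printf "%s: %d bytes\n",
            $File::Find::dir, $bytes_in{$File::Find::dir} // 0;
    },
}, $top);
```

The sketch uses only one subdirectory on purpose, so it does not depend on the order in which sibling directories are visited.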
134 The wanted() function does whatever verifications you want.
135 C<$File::Find::dir> contains the current directory name, and C<$_> the
136 current filename within that directory. C<$File::Find::name> contains
137 the complete pathname to the file. You are chdir()'d to
138 C<$File::Find::dir> when the function is called, unless C<no_chdir>
139 was specified. When C<follow> or C<follow_fast> is in effect, there is
140 also a C<$File::Find::fullname>. The function may set
141 C<$File::Find::prune> to prune the tree unless C<bydepth> was
142 specified. Unless C<follow> or C<follow_fast> is specified, for
143 compatibility reasons (find.pl, find2perl) there are in addition the
144 following globals available: C<$File::Find::topdir>,
145 C<$File::Find::topdev>, C<$File::Find::topino>,
146 C<$File::Find::topmode> and C<$File::Find::topnlink>.
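The following sketch shows one way a wanted() function could use C<$_>, C<$File::Find::name> and C<$File::Find::prune> together. The directory name C<skip_me> and the C<@found> array are hypothetical, chosen only to make the example runnable.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway tree; "keep" and "skip_me" are illustrative names.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/$_" or die "mkdir $_: $!" for qw(keep skip_me);
for my $file (qw(keep/a.txt skip_me/b.txt)) {
    open my $fh, '>', "$top/$file" or die "open $file: $!";
    close $fh or die "close $file: $!";
}

my @found;

sub wanted {
    # $_ is the entry's name inside $File::Find::dir, which is also the
    # current working directory because no_chdir is not set.
    if (-d $_ && $_ eq 'skip_me') {
        $File::Find::prune = 1;    # do not descend into this directory
        return;
    }
    push @found, $File::Find::name if -f $_;
}

find(\&wanted, $top);
print "$_\n" for @found;
```

Pruning only has an effect with find(); under finddepth() the contents of a directory have already been reported by the time the directory itself is seen.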
148 This library is useful for the C<find2perl> tool, which when fed,
150 find2perl / -name .nfs\* -mtime +7 \
151 -exec rm -f {} \; -o -fstype nfs -prune
153 produces something like:
157 (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_)) &&
161 ($nlink || (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))) &&
163 ($File::Find::prune = 1);
166 Set the variable C<$File::Find::dont_use_nlink> if you're using AFS,
170 Here's another interesting wanted function. It will find all symlinks
174 -l && !-e && print "bogus link: $File::Find::name\n";
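The test above can be tried out in a self-contained script such as the sketch below. The temporary directory, the link names and the C<@bogus> array are assumptions for illustration. The explicit C<$_> operands are equivalent to the bare C<-l> and C<-e> used above.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

my $top = tempdir(CLEANUP => 1);
open my $fh, '>', "$top/real.txt" or die "open: $!";
close $fh or die "close: $!";

# symlink() is not implemented on every platform, so guard the calls.
my $have_symlinks = eval { symlink("$top/real.txt", "$top/good"); 1 };
symlink("$top/no_such_file", "$top/broken") if $have_symlinks;

my @bogus;
find(sub {
    # -l examines the link itself, while -e follows it, so a link whose
    # target is missing passes -l but fails -e.
    push @bogus, $File::Find::name if -l $_ && !-e $_;
}, $top);

print "bogus link: $_\n" for @bogus;
```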
177 See also the script C<pfind> on CPAN for a nice application of this
182 Be aware that the option to follow symbolic links can be dangerous.
183 Depending on the structure of the directory tree (including symbolic
184 links to directories) you might traverse a given (physical) directory
185 more than once (only if C<follow_fast> is in effect).
186 Furthermore, deleting or changing files in a symbolically linked directory
187 might cause very unpleasant surprises, since you delete or change files
188 in an unknown directory.
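One cautious way to follow links is sketched below: C<follow> enables full duplicate detection, C<follow_skip =E<gt> 2> makes File::Find ignore a directory reached a second time instead of dying, and C<$File::Find::fullname> shows where each file really lives. The tree layout and the C<@reports> array are illustrative assumptions.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

my $top = tempdir(CLEANUP => 1);
mkdir "$top/data" or die "mkdir: $!";
open my $fh, '>', "$top/data/file.txt" or die "open: $!";
close $fh or die "close: $!";

# A second route to the same physical directory. symlink() is not
# implemented everywhere, so a failure here is tolerated.
eval { symlink("$top/data", "$top/alias"); 1 };

my @reports;    # [ name as reported, resolved name ] pairs
find({
    follow      => 1,    # full duplicate detection
    follow_skip => 2,    # ignore duplicates instead of dying
    wanted      => sub {
        return unless -f $_;
        push @reports, [ $File::Find::name, $File::Find::fullname ];
    },
}, $top);

printf "%s -> %s\n", @$_ for @reports;
```

The file is reported once, under whichever of the two routes is read first. Comparing C<$File::Find::name> with C<$File::Find::fullname> before deleting or changing anything is one way to notice that a file lies outside the tree you meant to touch.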
193 our @ISA = qw(Exporter);
194 our @EXPORT = qw(find finddepth);
200 require File::Basename;
203 my ($wanted_callback, $avoid_nlink, $bydepth, $no_chdir, $follow,
204 $follow_skip, $full_check, $untaint, $untaint_skip, $untaint_pat,
205 $pre_process, $post_process);
210 return substr($cdir,0,rindex($cdir,'/')) if $fn eq '.';
212 $cdir = substr($cdir,0,rindex($cdir,'/')+1);
216 my $abs_name= $cdir . $fn;
218 if (substr($fn,0,3) eq '../') {
219 do 1 while ($abs_name=~ s|/(?>[^/]+)/\.\./|/|);
226 sub PathCombine($$) {
227 my ($Base,$Name) = @_;
230 if (substr($Name,0,1) eq '/') {
234 $AbsName= contract_name($Base,$Name);
237 # (simple) check for recursion
238 my $newlen= length($AbsName);
239 if ($newlen <= length($Base)) {
240 if (($newlen == length($Base) || substr($Base,$newlen,1) eq '/')
241 && $AbsName eq substr($Base,0,$newlen))
249 sub Follow_SymLink($) {
252 my ($NewName,$DEV, $INO);
253 ($DEV, $INO)= lstat $AbsName;
256 if ($SLnkSeen{$DEV, $INO}++) {
257 if ($follow_skip < 2) {
258 die "$AbsName is encountered a second time";
264 $NewName= PathCombine($AbsName, readlink($AbsName));
265 unless(defined $NewName) {
266 if ($follow_skip < 2) {
267 die "$AbsName is a recursive symbolic link";
276 ($DEV, $INO) = lstat($AbsName);
277 return undef unless defined $DEV; # dangling symbolic link
280 if ($full_check && $SLnkSeen{$DEV, $INO}++) {
281 if ($follow_skip < 1) {
282 die "$AbsName encountered a second time";
292 our($dir, $name, $fullname, $prune);
293 sub _find_dir_symlnk($$$);
298 die "invalid top directory" unless defined $_[0];
300 my $cwd = $wanted->{bydepth} ? Cwd::fastcwd() : Cwd::cwd();
301 my $cwd_untainted = $cwd;
302 $wanted_callback = $wanted->{wanted};
303 $bydepth = $wanted->{bydepth};
304 $pre_process = $wanted->{preprocess};
305 $post_process = $wanted->{postprocess};
306 $no_chdir = $wanted->{no_chdir};
307 $full_check = $wanted->{follow};
308 $follow = $full_check || $wanted->{follow_fast};
309 $follow_skip = $wanted->{follow_skip};
310 $untaint = $wanted->{untaint};
311 $untaint_pat = $wanted->{untaint_pattern};
312 $untaint_skip = $wanted->{untaint_skip};
314 # for compatibility reasons (find.pl, find2perl)
315 our ($topdir, $topdev, $topino, $topmode, $topnlink);
317 # a symbolic link to a directory doesn't increase the link count
318 $avoid_nlink = $follow || $File::Find::dont_use_nlink;
321 $cwd_untainted= $1 if $cwd_untainted =~ m|$untaint_pat|;
322 die "insecure cwd in find(depth)" unless defined($cwd_untainted);
325 my ($abs_dir, $Is_Dir);
328 foreach my $TOP (@_) {
330 $top_item =~ s|/\z|| unless $top_item eq '/';
333 ($topdev,$topino,$topmode,$topnlink) = stat $top_item;
336 if (substr($top_item,0,1) eq '/') {
337 $abs_dir = $top_item;
339 elsif ($top_item eq '.') {
342 else { # care about any ../
343 $abs_dir = contract_name("$cwd/",$top_item);
345 $abs_dir= Follow_SymLink($abs_dir);
346 unless (defined $abs_dir) {
347 warn "$top_item is a dangling symbolic link\n";
351 _find_dir_symlnk($wanted, $abs_dir, $top_item);
357 unless (defined $topnlink) {
358 warn "Can't stat $top_item: $!\n";
362 $top_item =~ s/\.dir\z// if $Is_VMS;
363 _find_dir($wanted, $top_item, $topnlink);
372 unless (($_,$dir) = File::Basename::fileparse($abs_dir)) {
373 ($dir,$_) = ('./', $top_item);
378 my $abs_dir_save = $abs_dir;
379 $abs_dir = $1 if $abs_dir =~ m|$untaint_pat|;
380 unless (defined $abs_dir) {
381 if ($untaint_skip == 0) {
382 die "directory $abs_dir_save is still tainted";
390 unless ($no_chdir or chdir $abs_dir) {
391 warn "Couldn't chdir $abs_dir: $!\n";
395 $name = $abs_dir . $_;
397 { &$wanted_callback }; # protect against wild "next"
401 $no_chdir or chdir $cwd_untainted;
407 # $p_dir : "parent directory"
408 # $nlink : what came back from the stat
410 # chdir (if not no_chdir) to dir
413 my ($wanted, $p_dir, $nlink) = @_;
414 my ($CdLvl,$Level) = (0,0);
417 my ($subcount,$sub_nlink);
419 my $dir_name= $p_dir;
420 my $dir_pref= ( $p_dir eq '/' ? '/' : "$p_dir/" );
421 my $dir_rel= '.'; # directory name relative to current directory
423 local ($dir, $name, $prune, *DIR);
425 unless ($no_chdir or $p_dir eq '.') {
428 $udir = $1 if $p_dir =~ m|$untaint_pat|;
429 unless (defined $udir) {
430 if ($untaint_skip == 0) {
431 die "directory $p_dir is still tainted";
438 unless (chdir $udir) {
439 warn "Can't cd to $udir: $!\n";
444 push @Stack,[$CdLvl,$p_dir,$dir_rel,-1] if $bydepth;
446 while (defined $SE) {
450 $_= ($no_chdir ? $dir_name : $dir_rel );
451 # prune may happen here
453 { &$wanted_callback }; # protect against wild "next"
457 # change to that directory
458 unless ($no_chdir or $dir_rel eq '.') {
461 $udir = $1 if $dir_rel =~ m|$untaint_pat|;
462 unless (defined $udir) {
463 if ($untaint_skip == 0) {
465 . ($p_dir ne '/' ? $p_dir : '')
466 . "/) $dir_rel is still tainted";
470 unless (chdir $udir) {
472 . ($p_dir ne '/' ? $p_dir : '')
481 # Get the list of files in the current directory.
482 unless (opendir DIR, ($no_chdir ? $dir_name : '.')) {
483 warn "Can't opendir($dir_name): $!\n";
486 @filenames = readdir DIR;
488 @filenames = &$pre_process(@filenames) if $pre_process;
489 push @Stack,[$CdLvl,$dir_name,"",-2] if $post_process;
491 if ($nlink == 2 && !$avoid_nlink) {
492 # This dir has no subdirectories.
493 for my $FN (@filenames) {
494 next if $FN =~ /^\.{1,2}\z/;
496 $name = $dir_pref . $FN;
497 $_ = ($no_chdir ? $name : $FN);
498 { &$wanted_callback }; # protect against wild "next"
503 # This dir has subdirectories.
504 $subcount = $nlink - 2;
506 for my $FN (@filenames) {
507 next if $FN =~ /^\.{1,2}\z/;
508 if ($subcount > 0 || $avoid_nlink) {
509 # Seen all the subdirs?
510 # check for directoriness.
511 # stat is faster for a file in the current directory
512 $sub_nlink = (lstat ($no_chdir ? $dir_pref . $FN : $FN))[3];
516 $FN =~ s/\.dir\z// if $Is_VMS;
517 push @Stack,[$CdLvl,$dir_name,$FN,$sub_nlink];
520 $name = $dir_pref . $FN;
521 $_= ($no_chdir ? $name : $FN);
522 { &$wanted_callback }; # protect against wild "next"
526 $name = $dir_pref . $FN;
527 $_= ($no_chdir ? $name : $FN);
528 { &$wanted_callback }; # protect against wild "next"
534 while ( defined ($SE = pop @Stack) ) {
535 ($Level, $p_dir, $dir_rel, $nlink) = @$SE;
536 if ($CdLvl > $Level && !$no_chdir) {
537 my $tmp = join('/',('..') x ($CdLvl-$Level));
538 die "Can't cd to $dir_name" . $tmp
542 $dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
543 $dir_pref = "$dir_name/";
544 if ( $nlink == -2 ) {
545 $name = $dir = $p_dir;
547 &$post_process; # End-of-directory processing
548 } elsif ( $nlink < 0 ) { # must be finddepth, report dirname now
550 if ( substr($name,-2) eq '/.' ) {
554 $_ = ($no_chdir ? $dir_name : $dir_rel );
555 if ( substr($_,-2) eq '/.' ) {
558 { &$wanted_callback }; # protect against wild "next"
560 push @Stack,[$CdLvl,$p_dir,$dir_rel,-1] if $bydepth;
570 # $dir_loc : absolute location of a dir
571 # $p_dir : "parent directory"
573 # chdir (if not no_chdir) to dir
575 sub _find_dir_symlnk($$$) {
576 my ($wanted, $dir_loc, $p_dir) = @_;
580 my $pdir_loc = $dir_loc;
582 my $dir_name = $p_dir;
583 my $dir_pref = ( $p_dir eq '/' ? '/' : "$p_dir/" );
584 my $loc_pref = ( $dir_loc eq '/' ? '/' : "$dir_loc/" );
585 my $dir_rel = '.'; # directory name relative to current directory
586 my $byd_flag; # flag for pending stack entry if $bydepth
588 local ($dir, $name, $fullname, $prune, *DIR);
590 unless ($no_chdir or $p_dir eq '.') {
593 $udir = $1 if $dir_loc =~ m|$untaint_pat|;
594 unless (defined $udir) {
595 if ($untaint_skip == 0) {
596 die "directory $dir_loc is still tainted";
603 unless (chdir $udir) {
604 warn "Can't cd to $udir: $!\n";
609 push @Stack,[$dir_loc,$pdir_loc,$p_dir,$dir_rel,-1] if $bydepth;
611 while (defined $SE) {
614 # change to parent directory
616 my $udir = $pdir_loc;
618 $udir = $1 if $pdir_loc =~ m|$untaint_pat|;
620 unless (chdir $udir) {
621 warn "Can't cd to $udir: $!\n";
627 $_= ($no_chdir ? $dir_name : $dir_rel );
629 # prune may happen here
631 lstat($_); # make sure file tests with '_' work
632 { &$wanted_callback }; # protect against wild "next"
636 # change to that directory
637 unless ($no_chdir or $dir_rel eq '.') {
640 $udir = $1 if $dir_loc =~ m|$untaint_pat|;
641 unless (defined $udir ) {
642 if ($untaint_skip == 0) {
643 die "directory $dir_loc is still tainted";
650 unless (chdir $udir) {
651 warn "Can't cd to $udir: $!\n";
658 # Get the list of files in the current directory.
659 unless (opendir DIR, ($no_chdir ? $dir_loc : '.')) {
660 warn "Can't opendir($dir_loc): $!\n";
663 @filenames = readdir DIR;
666 for my $FN (@filenames) {
667 next if $FN =~ /^\.{1,2}\z/;
669 # follow symbolic links / do an lstat
670 $new_loc = Follow_SymLink($loc_pref.$FN);
672 # ignore if invalid symlink
673 next unless defined $new_loc;
676 push @Stack,[$new_loc,$dir_loc,$dir_name,$FN,1];
679 $fullname = $new_loc;
680 $name = $dir_pref . $FN;
681 $_ = ($no_chdir ? $name : $FN);
682 { &$wanted_callback }; # protect against wild "next"
688 while (defined($SE = pop @Stack)) {
689 ($dir_loc, $pdir_loc, $p_dir, $dir_rel, $byd_flag) = @$SE;
690 $dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
691 $dir_pref = "$dir_name/";
692 $loc_pref = "$dir_loc/";
693 if ( $byd_flag < 0 ) { # must be finddepth, report dirname now
694 unless ($no_chdir or $dir_rel eq '.') {
695 my $udir = $pdir_loc;
697 $udir = $1 if $pdir_loc =~ m|$untaint_pat|;
699 unless (chdir $udir) {
700 warn "Can't cd to $udir: $!\n";
704 $fullname = $dir_loc;
706 if ( substr($name,-2) eq '/.' ) {
710 $_ = ($no_chdir ? $dir_name : $dir_rel);
711 if ( substr($_,-2) eq '/.' ) {
715 lstat($_); # make sure file tests with '_' work
716 { &$wanted_callback }; # protect against wild "next"
718 push @Stack,[$dir_loc, $pdir_loc, $p_dir, $dir_rel,-1] if $bydepth;
728 if ( ref($wanted) eq 'HASH' ) {
729 if ( $wanted->{follow} || $wanted->{follow_fast}) {
730 $wanted->{follow_skip} = 1 unless defined $wanted->{follow_skip};
732 if ( $wanted->{untaint} ) {
733 $wanted->{untaint_pattern} = qr|^([-+@\w./]+)$|
734 unless defined $wanted->{untaint_pattern};
735 $wanted->{untaint_skip} = 0 unless defined $wanted->{untaint_skip};
740 return { wanted => $wanted };
746 _find_opt(wrap_wanted($wanted), @_);
747 %SLnkSeen= (); # free memory
751 my $wanted = wrap_wanted(shift);
752 $wanted->{bydepth} = 1;
753 _find_opt($wanted, @_);
754 %SLnkSeen= (); # free memory
757 # These are hard-coded for now, but may move to hint files.
760 $File::Find::dont_use_nlink = 1;
763 $File::Find::dont_use_nlink = 1
764 if $^O eq 'os2' || $^O eq 'dos' || $^O eq 'amigaos' || $^O eq 'MSWin32' ||
765 $^O eq 'cygwin' || $^O eq 'epoc';
767 # Set dont_use_nlink in your hint file if your system's stat doesn't
768 # report the number of links in a directory as an indication
769 # of the number of subdirectories.
770 # See, e.g. hints/machten.sh for MachTen 2.2.
771 unless ($File::Find::dont_use_nlink) {
773 $File::Find::dont_use_nlink = 1 if ($Config::Config{'dont_use_nlink'});