8 find - traverse a file tree
10 finddepth - traverse a directory structure depth-first
15 find(\&wanted, '/foo', '/bar');
19 finddepth(\&wanted, '/foo', '/bar');
23 find({ wanted => \&process, follow => 1 }, '.');
27 The first argument to find() is either a hash reference describing the
28 operations to be performed for each file, or a code reference.
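As a minimal sketch of the two calling forms described above, the following helpers run the same traversal once with a code reference and once with a hash reference. The helper names C<list_with_coderef> and C<list_with_hashref> are hypothetical and exist only for illustration; they are not part of File::Find.

```perl
use strict;
use warnings;
use File::Find;

# Collect every path under the given directories, passing a plain
# code reference as the first argument to find().
sub list_with_coderef {
    my @dirs = @_;
    my @seen;
    find(sub { push @seen, $File::Find::name }, @dirs);
    my @sorted = sort @seen;
    return @sorted;
}

# The same traversal, passing a hash reference whose 'wanted' key
# holds the callback.
sub list_with_hashref {
    my @dirs = @_;
    my @seen;
    find({ wanted => sub { push @seen, $File::Find::name } }, @dirs);
    my @sorted = sort @seen;
    return @sorted;
}
```

Both helpers should return the same list for the same directories; the hash form is only needed once additional keys such as C<follow> or C<no_chdir> are required.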
30 Here are the possible keys for the hash:
36 The value should be a code reference. This code reference is called
37 I<the wanted() function> below.
41 Reports the name of a directory only AFTER all its entries
42 have been reported. Entry point finddepth() is a shortcut for
43 specifying C<{ bydepth => 1 }> in the first argument of find().
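To illustrate the effect of C<bydepth>, here is a small sketch that records the order in which names are reported. The helper C<visit_order> is a hypothetical name introduced for this example only.

```perl
use strict;
use warnings;
use File::Find;

# Return the names under $top in the order find() reports them.
# With $bydepth true, a directory is reported after its entries.
sub visit_order {
    my ($top, $bydepth) = @_;
    my @order;
    find({
        wanted  => sub { push @order, $File::Find::name },
        bydepth => $bydepth,
    }, $top);
    return @order;
}
```

Without C<bydepth> the top directory is the first name reported; with it, the top directory is the last. Calling C<finddepth(\&wanted, $top)> should yield the same order as passing C<< bydepth => 1 >>.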
47 The value should be a code reference. This code reference is used to
48 preprocess a directory; it is called after readdir() but before the loop that
49 calls the wanted() function. It is called with a list of strings and is
50 expected to return a list of strings. The code can be used to sort the
51 strings alphabetically, numerically, or to filter out directory entries based
56 The value should be a code reference. It is invoked just before leaving the
57 current directory. It is called in void context with no arguments. The name
58 of the current directory is in $File::Find::dir. This hook is handy for
59 summarizing a directory, such as calculating its disk usage.
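The C<preprocess> and C<postprocess> hooks can be combined as in the following sketch, which sorts each directory's entries before they are visited and records each directory as it is left. The helper name C<sorted_walk> is an assumption made for this example.

```perl
use strict;
use warnings;
use File::Find;

# Walk $top, visiting the entries of each directory in sorted order.
# Returns two array references: the plain files seen (in visit order)
# and the directories for which the postprocess hook fired.
sub sorted_walk {
    my ($top) = @_;
    my (@files, @finished);
    find({
        # Record plain files only; $_ is relative to the current directory.
        wanted      => sub { push @files, $_ if -f $_ },
        # Receives the readdir() list and returns the list to process.
        preprocess  => sub { my @sorted = sort @_; return @sorted },
        # Called with no arguments just before the directory is left.
        postprocess => sub { push @finished, $File::Find::dir },
    }, $top);
    return (\@files, \@finished);
}
```

A C<preprocess> callback could equally filter the list, for example with C<grep>, to keep unwanted entries from ever reaching the wanted() function.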
63 Causes symbolic links to be followed. Since directory trees with symbolic
64 links (followed) may contain files more than once and may even have
65 cycles, a hash has to be built up with an entry for each file.
66 This might be expensive both in space and time for a large
67 directory tree. See I<follow_fast> and I<follow_skip> below.
68 If either I<follow> or I<follow_fast> is in effect:
74 It is guaranteed that an I<lstat> has been called before the user's
75 I<wanted()> function is called. This enables fast file checks involving S< _>.
79 There is a variable C<$File::Find::fullname> which holds the absolute
80 pathname of the file with all symbolic links resolved
86 This is similar to I<follow> except that it may report some files more
87 than once. It does detect cycles, however. Since only symbolic links
88 have to be hashed, this is much cheaper both in space and time. If
89 processing a file more than once (by the user's I<wanted()> function)
90 is worse than just taking time, the option I<follow> should be used.
94 C<follow_skip==1>, which is the default, causes all files which are
95 neither directories nor symbolic links to be ignored if they are about
96 to be processed a second time. If a directory or a symbolic link
97 is about to be processed a second time, File::Find dies.
98 C<follow_skip==0> causes File::Find to die if any file is about to be
99 processed a second time.
100 C<follow_skip==2> causes File::Find to ignore any duplicate files and
101 directories but to proceed normally otherwise.
106 Does not C<chdir()> to each directory as it recurses. The wanted()
107 function will need to be aware of this, of course. In this case,
108 C<$_> will be the same as C<$File::Find::name>.
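The difference that C<no_chdir> makes to C<$_> can be seen with a short sketch like the one below. The helper name C<names_seen> is hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Return a list of [ $_, $File::Find::name ] pairs, one per entry
# reported under $top, with or without no_chdir.
sub names_seen {
    my ($top, $no_chdir) = @_;
    my @pairs;
    find({
        wanted   => sub { push @pairs, [ $_, $File::Find::name ] },
        no_chdir => $no_chdir,
    }, $top);
    return @pairs;
}
```

With C<no_chdir> set, both elements of every pair should be identical; without it, C<$_> holds only the name relative to the directory currently being processed.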
112 If find is used in taint-mode (-T command line switch or if EUID != UID
113 or if EGID != GID) then internally directory names have to be untainted
114 before they can be cd'ed to. Therefore they are checked against a regular
115 expression I<untaint_pattern>. Note that all names passed to the
116 user's I<wanted()> function are still tainted.
118 =item C<untaint_pattern>
120 See above. This should be set using the C<qr> quoting operator.
121 The default is set to C<qr|^([-+@\w./]+)$|>.
122 Note that the parentheses are vital.
124 =item C<untaint_skip>
126 If set, directories (subtrees) which fail the I<untaint_pattern>
127 are skipped. The default is to 'die' in such a case.
131 The wanted() function does whatever verifications you want.
132 C<$File::Find::dir> contains the current directory name, and C<$_> the
133 current filename within that directory. C<$File::Find::name> contains
134 the complete pathname to the file. You are chdir()'d to
135 C<$File::Find::dir> when the function is called, unless C<no_chdir>
136 was specified. When C<follow> or C<follow_fast> is in effect, there is
137 also a C<$File::Find::fullname>. The function may set
138 C<$File::Find::prune> to prune the tree unless C<bydepth> was
139 specified. Unless C<follow> or C<follow_fast> is specified, for
140 compatibility reasons (find.pl, find2perl) there are in addition the
141 following globals available: C<$File::Find::topdir>,
142 C<$File::Find::topdev>, C<$File::Find::topino>,
143 C<$File::Find::topmode> and C<$File::Find::topnlink>.
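As a sketch of a wanted() function that uses these variables, the following helper collects plain files while pruning any directory with a given name. The helper name C<find_unpruned> and its parameters are assumptions made for this example. It relies on the default behaviour (no C<bydepth>), since pruning has no effect otherwise.

```perl
use strict;
use warnings;
use File::Find;

# Return the full names of all plain files under $top, skipping
# every directory whose name equals $skip.
sub find_unpruned {
    my ($top, $skip) = @_;
    my @found;
    find(sub {
        # $_ is the entry name within $File::Find::dir.
        if ($_ eq $skip && -d $_) {
            $File::Find::prune = 1;   # do not descend into this directory
            return;
        }
        push @found, $File::Find::name if -f $_;
    }, $top);
    my @sorted = sort @found;
    return @sorted;
}
```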
145 This library is useful for the C<find2perl> tool, which when fed,
147 find2perl / -name .nfs\* -mtime +7 \
148 -exec rm -f {} \; -o -fstype nfs -prune
150 produces something like:
154 (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_)) &&
158 ($nlink || (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))) &&
160 ($File::Find::prune = 1);
163 Set the variable C<$File::Find::dont_use_nlink> if you're using AFS,
167 Here's another interesting wanted function. It will find all symlinks
171 -l && !-e && print "bogus link: $File::Find::name\n";
174 See also the script C<pfind> on CPAN for a nice application of this
179 Be aware that the option to follow symbolic links can be dangerous.
180 Depending on the structure of the directory tree (including symbolic
181 links to directories) you might traverse a given (physical) directory
182 more than once (only if C<follow_fast> is in effect).
183 Furthermore, deleting or changing files in a symbolically linked directory
184 might cause very unpleasant surprises, since you delete or change files
185 in an unknown directory.
191 @EXPORT = qw(find finddepth);
197 require File::Basename;
200 my ($wanted_callback, $avoid_nlink, $bydepth, $no_chdir, $follow,
201 $follow_skip, $full_check, $untaint, $untaint_skip, $untaint_pat,
202 $pre_process, $post_process);
207 return substr($cdir,0,rindex($cdir,'/')) if $fn eq '.';
209 $cdir = substr($cdir,0,rindex($cdir,'/')+1);
213 my $abs_name= $cdir . $fn;
215 if (substr($fn,0,3) eq '../') {
216 do 1 while ($abs_name=~ s|/(?>[^/]+)/\.\./|/|);
223 sub PathCombine($$) {
224 my ($Base,$Name) = @_;
227 if (substr($Name,0,1) eq '/') {
231 $AbsName= contract_name($Base,$Name);
234 # (simple) check for recursion
235 my $newlen= length($AbsName);
236 if ($newlen <= length($Base)) {
237 if (($newlen == length($Base) || substr($Base,$newlen,1) eq '/')
238 && $AbsName eq substr($Base,0,$newlen))
246 sub Follow_SymLink($) {
249 my ($NewName,$DEV, $INO);
250 ($DEV, $INO)= lstat $AbsName;
253 if ($SLnkSeen{$DEV, $INO}++) {
254 if ($follow_skip < 2) {
255 die "$AbsName is encountered a second time";
261 $NewName= PathCombine($AbsName, readlink($AbsName));
262 unless(defined $NewName) {
263 if ($follow_skip < 2) {
264 die "$AbsName is a recursive symbolic link";
273 ($DEV, $INO) = lstat($AbsName);
274 return undef unless defined $DEV; # dangling symbolic link
277 if ($full_check && $SLnkSeen{$DEV, $INO}++) {
278 if ($follow_skip < 1) {
279 die "$AbsName encountered a second time";
289 our($dir, $name, $fullname, $prune);
290 sub _find_dir_symlnk($$$);
295 die "invalid top directory" unless defined $_[0];
297 my $cwd = $wanted->{bydepth} ? Cwd::fastcwd() : Cwd::cwd();
298 my $cwd_untainted = $cwd;
299 $wanted_callback = $wanted->{wanted};
300 $bydepth = $wanted->{bydepth};
301 $pre_process = $wanted->{preprocess};
302 $post_process = $wanted->{postprocess};
303 $no_chdir = $wanted->{no_chdir};
304 $full_check = $wanted->{follow};
305 $follow = $full_check || $wanted->{follow_fast};
306 $follow_skip = $wanted->{follow_skip};
307 $untaint = $wanted->{untaint};
308 $untaint_pat = $wanted->{untaint_pattern};
309 $untaint_skip = $wanted->{untaint_skip};
311 # for compatibility reasons (find.pl, find2perl)
312 our ($topdir, $topdev, $topino, $topmode, $topnlink);
314 # a symbolic link to a directory doesn't increase the link count
315 $avoid_nlink = $follow || $File::Find::dont_use_nlink;
318 $cwd_untainted= $1 if $cwd_untainted =~ m|$untaint_pat|;
319 die "insecure cwd in find(depth)" unless defined($cwd_untainted);
322 my ($abs_dir, $Is_Dir);
325 foreach my $TOP (@_) {
327 $top_item =~ s|/\z|| unless $top_item eq '/';
330 ($topdev,$topino,$topmode,$topnlink) = stat $top_item;
333 if (substr($top_item,0,1) eq '/') {
334 $abs_dir = $top_item;
336 elsif ($top_item eq '.') {
339 else { # care about any ../
340 $abs_dir = contract_name("$cwd/",$top_item);
342 $abs_dir= Follow_SymLink($abs_dir);
343 unless (defined $abs_dir) {
344 warn "$top_item is a dangling symbolic link\n";
348 _find_dir_symlnk($wanted, $abs_dir, $top_item);
354 unless (defined $topnlink) {
355 warn "Can't stat $top_item: $!\n";
359 $top_item =~ s/\.dir\z// if $Is_VMS;
360 _find_dir($wanted, $top_item, $topnlink);
369 unless (($_,$dir) = File::Basename::fileparse($abs_dir)) {
370 ($dir,$_) = ('./', $top_item);
375 my $abs_dir_save = $abs_dir;
376 $abs_dir = $1 if $abs_dir =~ m|$untaint_pat|;
377 unless (defined $abs_dir) {
378 if ($untaint_skip == 0) {
379 die "directory $abs_dir_save is still tainted";
387 unless ($no_chdir or chdir $abs_dir) {
388 warn "Couldn't chdir $abs_dir: $!\n";
392 $name = $abs_dir . $_;
394 { &$wanted_callback }; # protect against wild "next"
398 $no_chdir or chdir $cwd_untainted;
404 # $p_dir : "parent directory"
405 # $nlink : what came back from the stat
407 # chdir (if not no_chdir) to dir
410 my ($wanted, $p_dir, $nlink) = @_;
411 my ($CdLvl,$Level) = (0,0);
414 my ($subcount,$sub_nlink);
416 my $dir_name= $p_dir;
417 my $dir_pref= ( $p_dir eq '/' ? '/' : "$p_dir/" );
418 my $dir_rel= '.'; # directory name relative to current directory
420 local ($dir, $name, $prune, *DIR);
422 unless ($no_chdir or $p_dir eq '.') {
425 $udir = $1 if $p_dir =~ m|$untaint_pat|;
426 unless (defined $udir) {
427 if ($untaint_skip == 0) {
428 die "directory $p_dir is still tainted";
435 unless (chdir $udir) {
436 warn "Can't cd to $udir: $!\n";
441 push @Stack,[$CdLvl,$p_dir,$dir_rel,-1] if $bydepth;
443 while (defined $SE) {
447 $_= ($no_chdir ? $dir_name : $dir_rel );
448 # prune may happen here
450 { &$wanted_callback }; # protect against wild "next"
454 # change to that directory
455 unless ($no_chdir or $dir_rel eq '.') {
458 $udir = $1 if $dir_rel =~ m|$untaint_pat|;
459 unless (defined $udir) {
460 if ($untaint_skip == 0) {
462 . ($p_dir ne '/' ? $p_dir : '')
463 . "/) $dir_rel is still tainted";
467 unless (chdir $udir) {
469 . ($p_dir ne '/' ? $p_dir : '')
478 # Get the list of files in the current directory.
479 unless (opendir DIR, ($no_chdir ? $dir_name : '.')) {
480 warn "Can't opendir($dir_name): $!\n";
483 @filenames = readdir DIR;
485 @filenames = &$pre_process(@filenames) if $pre_process;
486 push @Stack,[$CdLvl,$dir_name,"",-2] if $post_process;
488 if ($nlink == 2 && !$avoid_nlink) {
489 # This dir has no subdirectories.
490 for my $FN (@filenames) {
491 next if $FN =~ /^\.{1,2}\z/;
493 $name = $dir_pref . $FN;
494 $_ = ($no_chdir ? $name : $FN);
495 { &$wanted_callback }; # protect against wild "next"
500 # This dir has subdirectories.
501 $subcount = $nlink - 2;
503 for my $FN (@filenames) {
504 next if $FN =~ /^\.{1,2}\z/;
505 if ($subcount > 0 || $avoid_nlink) {
506 # Seen all the subdirs?
507 # check for directoriness.
508 # stat is faster for a file in the current directory
509 $sub_nlink = (lstat ($no_chdir ? $dir_pref . $FN : $FN))[3];
513 $FN =~ s/\.dir\z// if $Is_VMS;
514 push @Stack,[$CdLvl,$dir_name,$FN,$sub_nlink];
517 $name = $dir_pref . $FN;
518 $_= ($no_chdir ? $name : $FN);
519 { &$wanted_callback }; # protect against wild "next"
523 $name = $dir_pref . $FN;
524 $_= ($no_chdir ? $name : $FN);
525 { &$wanted_callback }; # protect against wild "next"
531 while ( defined ($SE = pop @Stack) ) {
532 ($Level, $p_dir, $dir_rel, $nlink) = @$SE;
533 if ($CdLvl > $Level && !$no_chdir) {
534 my $tmp = join('/',('..') x ($CdLvl-$Level));
535 die "Can't cd to $dir_name" . $tmp
539 $dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
540 $dir_pref = "$dir_name/";
541 if ( $nlink == -2 ) {
542 $name = $dir = $p_dir;
544 &$post_process; # End-of-directory processing
545 } elsif ( $nlink < 0 ) { # must be finddepth, report dirname now
547 if ( substr($name,-2) eq '/.' ) {
551 $_ = ($no_chdir ? $dir_name : $dir_rel );
552 if ( substr($_,-2) eq '/.' ) {
555 { &$wanted_callback }; # protect against wild "next"
557 push @Stack,[$CdLvl,$p_dir,$dir_rel,-1] if $bydepth;
567 # $dir_loc : absolute location of a dir
568 # $p_dir : "parent directory"
570 # chdir (if not no_chdir) to dir
572 sub _find_dir_symlnk($$$) {
573 my ($wanted, $dir_loc, $p_dir) = @_;
577 my $pdir_loc = $dir_loc;
579 my $dir_name = $p_dir;
580 my $dir_pref = ( $p_dir eq '/' ? '/' : "$p_dir/" );
581 my $loc_pref = ( $dir_loc eq '/' ? '/' : "$dir_loc/" );
582 my $dir_rel = '.'; # directory name relative to current directory
583 my $byd_flag; # flag for pending stack entry if $bydepth
585 local ($dir, $name, $fullname, $prune, *DIR);
587 unless ($no_chdir or $p_dir eq '.') {
590 $udir = $1 if $dir_loc =~ m|$untaint_pat|;
591 unless (defined $udir) {
592 if ($untaint_skip == 0) {
593 die "directory $dir_loc is still tainted";
600 unless (chdir $udir) {
601 warn "Can't cd to $udir: $!\n";
606 push @Stack,[$dir_loc,$pdir_loc,$p_dir,$dir_rel,-1] if $bydepth;
608 while (defined $SE) {
611 # change to parent directory
613 my $udir = $pdir_loc;
615 $udir = $1 if $pdir_loc =~ m|$untaint_pat|;
617 unless (chdir $udir) {
618 warn "Can't cd to $udir: $!\n";
624 $_= ($no_chdir ? $dir_name : $dir_rel );
626 # prune may happen here
628 lstat($_); # make sure file tests with '_' work
629 { &$wanted_callback }; # protect against wild "next"
633 # change to that directory
634 unless ($no_chdir or $dir_rel eq '.') {
637 $udir = $1 if $dir_loc =~ m|$untaint_pat|;
638 unless (defined $udir ) {
639 if ($untaint_skip == 0) {
640 die "directory $dir_loc is still tainted";
647 unless (chdir $udir) {
648 warn "Can't cd to $udir: $!\n";
655 # Get the list of files in the current directory.
656 unless (opendir DIR, ($no_chdir ? $dir_loc : '.')) {
657 warn "Can't opendir($dir_loc): $!\n";
660 @filenames = readdir DIR;
663 for my $FN (@filenames) {
664 next if $FN =~ /^\.{1,2}\z/;
666 # follow symbolic links / do an lstat
667 $new_loc = Follow_SymLink($loc_pref.$FN);
669 # ignore if invalid symlink
670 next unless defined $new_loc;
673 push @Stack,[$new_loc,$dir_loc,$dir_name,$FN,1];
676 $fullname = $new_loc;
677 $name = $dir_pref . $FN;
678 $_ = ($no_chdir ? $name : $FN);
679 { &$wanted_callback }; # protect against wild "next"
685 while (defined($SE = pop @Stack)) {
686 ($dir_loc, $pdir_loc, $p_dir, $dir_rel, $byd_flag) = @$SE;
687 $dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
688 $dir_pref = "$dir_name/";
689 $loc_pref = "$dir_loc/";
690 if ( $byd_flag < 0 ) { # must be finddepth, report dirname now
691 unless ($no_chdir or $dir_rel eq '.') {
692 my $udir = $pdir_loc;
694 $udir = $1 if $pdir_loc =~ m|$untaint_pat|;
696 unless (chdir $udir) {
697 warn "Can't cd to $udir: $!\n";
701 $fullname = $dir_loc;
703 if ( substr($name,-2) eq '/.' ) {
707 $_ = ($no_chdir ? $dir_name : $dir_rel);
708 if ( substr($_,-2) eq '/.' ) {
712 lstat($_); # make sure file tests with '_' work
713 { &$wanted_callback }; # protect against wild "next"
715 push @Stack,[$dir_loc, $pdir_loc, $p_dir, $dir_rel,-1] if $bydepth;
725 if ( ref($wanted) eq 'HASH' ) {
726 if ( $wanted->{follow} || $wanted->{follow_fast}) {
727 $wanted->{follow_skip} = 1 unless defined $wanted->{follow_skip};
729 if ( $wanted->{untaint} ) {
730 $wanted->{untaint_pattern} = qr|^([-+@\w./]+)$|
731 unless defined $wanted->{untaint_pattern};
732 $wanted->{untaint_skip} = 0 unless defined $wanted->{untaint_skip};
737 return { wanted => $wanted };
743 _find_opt(wrap_wanted($wanted), @_);
744 %SLnkSeen= (); # free memory
748 my $wanted = wrap_wanted(shift);
749 $wanted->{bydepth} = 1;
750 _find_opt($wanted, @_);
751 %SLnkSeen= (); # free memory
754 # These are hard-coded for now, but may move to hint files.
757 $File::Find::dont_use_nlink = 1;
760 $File::Find::dont_use_nlink = 1
761 if $^O eq 'os2' || $^O eq 'dos' || $^O eq 'amigaos' || $^O eq 'MSWin32' ||
762 $^O eq 'cygwin' || $^O eq 'epoc';
764 # Set dont_use_nlink in your hint file if your system's stat doesn't
765 # report the number of links in a directory as an indication
766 # of the number of files.
767 # See, e.g. hints/machten.sh for MachTen 2.2.
768 unless ($File::Find::dont_use_nlink) {
770 $File::Find::dont_use_nlink = 1 if ($Config::Config{'dont_use_nlink'});