find - traverse a file tree

finddepth - traverse a directory structure depth-first

    find(\&wanted, '/foo', '/bar');

    finddepth(\&wanted, '/foo', '/bar');

    find({ wanted => \&process, follow => 1 }, '.');

The first argument to find() is either a code reference or a hash
reference describing the operations to be performed for each file.

Here are the possible keys for the hash:

=item C<wanted>

The value should be a code reference. This code reference is called
I<the wanted() function> below.
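
As a minimal sketch of the two calling forms, the following builds a throwaway tree and collects the plain files under it, once with a bare code reference and once with the hash form. The temporary directory, the file names and the C<@files> arrays are illustrative assumptions, not part of the module.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway tree (names are made up for this sketch):
#   $root/a.txt and $root/sub/b.txt
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $rel ('a.txt', 'sub/b.txt') {
    open my $fh, '>', "$root/$rel" or die "open $rel: $!";
    close $fh;
}

# Code-reference form: the sub itself is the wanted() function.
my @files;
find(sub { push @files, $File::Find::name if -f $_ }, $root);

# Hash-reference form with only the "wanted" key set.
my @files_again;
find({ wanted => sub { push @files_again, $File::Find::name if -f $_ } },
     $root);

@files       = sort @files;
@files_again = sort @files_again;
print "$_\n" for @files;
```

Both calls should visit the same entries; the hash form only becomes necessary once one of the other keys described here is needed.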

=item C<bydepth>

Reports the name of a directory only AFTER all its entries
have been reported. The entry point finddepth() is a shortcut for
specifying C<{ bydepth => 1 }> in the first argument of find().
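
The ordering difference can be seen in a small sketch such as the one below, which records the visit order for find() and finddepth() on the same tree. The tree layout and the C<@pre>, C<@post> and C<@by_hash> arrays are assumptions made for the illustration.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# One directory with one file, so the visit order is unambiguous.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
open my $fh, '>', "$root/sub/b.txt" or die "open: $!";
close $fh;

my (@pre, @post, @by_hash);

# Default order: a directory is reported before its entries.
find(sub { push @pre, $File::Find::name }, $root);

# finddepth(): a directory is reported after its entries.
finddepth(sub { push @post, $File::Find::name }, $root);

# The same request spelled with the hash key.
find({ bydepth => 1, wanted => sub { push @by_hash, $File::Find::name } },
     $root);

print "find:      @pre\n";
print "finddepth: @post\n";
```

A depth-first report is what a recursive delete needs, since a directory can only be removed once its contents are gone.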

=item C<follow>

Causes symbolic links to be followed. Since a directory tree with followed
symbolic links may contain the same file more than once and may even contain
cycles, a hash with an entry for each file has to be built up.
This can be expensive in both space and time for a large
directory tree. See I<follow_fast> and I<follow_skip> below.
If either I<follow> or I<follow_fast> is in effect:

=over 6

=item *

It is guaranteed that an I<lstat> has been called before the user's
I<wanted()> function is called. This enables fast file checks involving S< _>.

=item *

There is a variable C<$File::Find::fullname> which holds the absolute
pathname of the file with all symbolic links resolved.

=back
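
The sketch below illustrates C<follow> together with C<$File::Find::fullname>. It assumes a platform where symlink() works; the two temporary directories, the link name and the C<%fullname_for> hash are made up for the example. The link target is kept outside the searched tree so that no directory is reached twice.

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Find;
use File::Temp qw(tempdir);

# $target holds the real file; $root only holds a symlink to $target.
my $target = tempdir(CLEANUP => 1);
my $root   = tempdir(CLEANUP => 1);
open my $fh, '>', "$target/f.txt" or die "open: $!";
close $fh;
symlink $target, "$root/link" or die "symlink: $!";

my %fullname_for;
find({
    follow => 1,
    wanted => sub {
        # An lstat has already been done, so "_" reuses its result.
        return unless -f _;
        $fullname_for{$File::Find::name} = $File::Find::fullname;
    },
}, $root);

for my $name (sort keys %fullname_for) {
    print "$name -> $fullname_for{$name}\n";
}
```

Here C<$File::Find::name> is the path as seen through the link, while C<$File::Find::fullname> points at the file's location behind it.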

=item C<follow_fast>

This is similar to I<follow> except that it may report some files
more than once. It does detect cycles, however.
Since only symbolic links have to be hashed, this is
much cheaper in both space and time.
If processing a file more than once (by the user's I<wanted()> function)
is worse than just taking time, the option I<follow> should be used.

=item C<follow_skip>

C<follow_skip==1>, which is the default, causes all files which are
neither directories nor symbolic links to be ignored if they are about
to be processed a second time. If a directory or a symbolic link
is about to be processed a second time, File::Find dies.
C<follow_skip==0> causes File::Find to die if any file is about to be
processed a second time.
C<follow_skip==2> causes File::Find to ignore any duplicate files and
directories but to proceed normally otherwise.
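
A hedged sketch of C<follow_skip =E<gt> 2>: the tree below reaches the same directory twice, once directly and once through a symlink, and the traversal is expected to finish without dying and to descend into that directory only once. It assumes symlink() is available; the names C<data>, C<alias> and the C<$hits> counter are illustrative.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# $root/data/f.txt, plus $root/alias -> $root/data
my $root = tempdir(CLEANUP => 1);
mkdir "$root/data" or die "mkdir: $!";
open my $fh, '>', "$root/data/f.txt" or die "open: $!";
close $fh;
symlink "$root/data", "$root/alias" or die "symlink: $!";

my $hits = 0;
my $ok = eval {
    find({
        follow      => 1,
        follow_skip => 2,    # silently skip anything seen before
        wanted      => sub {
            $hits++ if $File::Find::name =~ m{/f\.txt\z};
        },
    }, $root);
    1;
};

print $ok ? "traversal finished, f.txt seen $hits time(s)\n"
          : "traversal died: $@";
```

With a smaller C<follow_skip> value the same tree may make File::Find die instead, which is why the call is wrapped in C<eval> here.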

=item C<no_chdir>

Does not C<chdir()> to each directory as it recurses. The wanted()
function will need to be aware of this, of course. In this case,
C<$_> will be the same as C<$File::Find::name>.
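
As an illustration, this sketch checks both effects of C<no_chdir>: the working directory stays where it was, and C<$_> carries the same path as C<$File::Find::name>. The temporary tree and the flag variables are assumptions of the example.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find;
use File::Temp qw(tempdir);

my $root = tempdir(CLEANUP => 1);
open my $fh, '>', "$root/a.txt" or die "open: $!";
close $fh;

my $start = getcwd();
my ($same_name, $same_cwd, $calls) = (1, 1, 0);

find({
    no_chdir => 1,
    wanted   => sub {
        $calls++;
        # With no_chdir, $_ is the full name rather than a bare file name.
        $same_name = 0 unless $_ eq $File::Find::name;
        # The process never leaves the directory it started in.
        $same_cwd = 0 unless getcwd() eq $start;
    },
}, $root);

print "calls=$calls same_name=$same_name same_cwd=$same_cwd\n";
```

A wanted() function written for the default mode, which tests a bare file name in C<$_>, would need adjusting before it is used this way.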

=item C<untaint>

If find is used in taint mode (the -T command line switch, or if EUID != UID
or EGID != GID), then directory names have to be untainted internally
before they can be chdir'ed to. They are therefore checked against the regular
expression I<untaint_pattern>. Note that all names passed to the
user's I<wanted()> function are still tainted.

=item C<untaint_pattern>

See above. This should be set using the C<qr> quoting operator.
The default is set to C<qr|^([-+@\w./]+)$|>.
Note that the parentheses are vital.
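
Why the parentheses matter can be sketched outside of find() itself: the untainted copy of a name is whatever the pattern captures in C<$1>, so a pattern without a capture group would leave nothing to use. The loop below is only a stand-in for that check; the sample directory names and the C<%untainted> hash are hypothetical.

```perl
use strict;
use warnings;

# The default pattern quoted above. The capture group supplies $1.
my $untaint_pattern = qr|^([-+@\w./]+)$|;

my %untainted;
for my $dir ('/usr/local/lib', './my dir', 'tmp;rm') {
    # Stand-in for the check: keep $1 on a match, undef otherwise.
    $untainted{$dir} = $dir =~ m|$untaint_pattern| ? $1 : undef;
}

for my $dir (sort keys %untainted) {
    my $result = defined $untainted{$dir} ? 'accepted' : 'rejected';
    print "'$dir': $result\n";
}
```

Names containing a space or a semicolon fall outside the default character class, so a tree with such directories needs either a wider pattern or C<untaint_skip>.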

=item C<untaint_skip>

If set, directories (subtrees) which fail the I<untaint_pattern>
are skipped. The default is to 'die' in such a case.

The wanted() function does whatever verifications you want.
C<$File::Find::dir> contains the current directory name, and C<$_> the
current filename within that directory. C<$File::Find::name> contains
the complete pathname to the file. You are chdir()'d to C<$File::Find::dir> when
the function is called, unless C<no_chdir> was specified.
When C<follow> or C<follow_fast> is in effect, there is also a
C<$File::Find::fullname>.
The function may set C<$File::Find::prune> to prune the tree
unless C<bydepth> was specified.
Unless C<follow> or C<follow_fast> is specified, the following globals
are also available for compatibility reasons (find.pl, find2perl):
C<$File::Find::topdir>, C<$File::Find::topdev>, C<$File::Find::topino>,
C<$File::Find::topmode> and C<$File::Find::topnlink>.
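
The relationship between C<$File::Find::dir>, C<$_> and C<$File::Find::name>, and the effect of C<$File::Find::prune>, can be sketched as follows. The directory names C<keep> and C<skip> and the variables C<@seen> and C<$consistent> are assumptions made for the illustration.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# $root/keep/file.txt and $root/skip/file.txt
my $root = tempdir(CLEANUP => 1);
for my $d (qw(keep skip)) {
    mkdir "$root/$d" or die "mkdir $d: $!";
    open my $fh, '>', "$root/$d/file.txt" or die "open: $!";
    close $fh;
}

my @seen;
my $consistent = 1;

find(sub {
    # Do not descend into a directory called "skip".
    if (-d $_ && $_ eq 'skip') {
        $File::Find::prune = 1;
        return;
    }
    # Below the top directory, name is dir, a slash, and $_.
    if ($_ ne '.') {
        $consistent = 0
            unless $File::Find::name eq "$File::Find::dir/$_";
    }
    push @seen, $File::Find::name;
}, $root);

print "$_\n" for sort @seen;
```

Pruning only makes sense in the default order; under C<bydepth> the contents of a directory have already been reported by the time the directory itself is seen.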

This library is useful for the C<find2perl> tool, which when fed,

    find2perl / -name .nfs\* -mtime +7 \
        -exec rm -f {} \; -o -fstype nfs -prune

produces something like:

    (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_)) &&
    ($nlink || (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))) &&
    ($File::Find::prune = 1);

Set the variable C<$File::Find::dont_use_nlink> if you're using AFS,

Here's another interesting wanted function. It will find all symlinks

    -l && !-e && print "bogus link: $File::Find::name\n";

See also the script C<pfind> on CPAN for a nice application of this

Be aware that the option to follow symbolic links can be dangerous.
Depending on the structure of the directory tree (including symbolic
links to directories), you might traverse a given (physical) directory
more than once (only if C<follow_fast> is in effect).
Furthermore, deleting or changing files in a symbolically linked directory
might cause very unpleasant surprises, since you delete or change files
in an unknown directory.

@EXPORT = qw(find finddepth);
require File::Basename;
my ($wanted_callback, $avoid_nlink, $bydepth, $no_chdir, $follow,
    $follow_skip, $full_check, $untaint, $untaint_skip, $untaint_pat);
return substr($cdir,0,rindex($cdir,'/')) if $fn eq '.';
$cdir = substr($cdir,0,rindex($cdir,'/')+1);
my $abs_name= $cdir . $fn;
if (substr($fn,0,3) eq '../') {
do 1 while ($abs_name=~ s|/(?>[^/]+)/\.\./|/|);
sub PathCombine($$) {
my ($Base,$Name) = @_;
if (substr($Name,0,1) eq '/') {
$AbsName= contract_name($Base,$Name);
# (simple) check for recursion
my $newlen= length($AbsName);
if ($newlen <= length($Base)) {
if (($newlen == length($Base) || substr($Base,$newlen,1) eq '/')
    && $AbsName eq substr($Base,0,$newlen))
sub Follow_SymLink($) {
my ($NewName,$DEV, $INO);
($DEV, $INO)= lstat $AbsName;
if ($SLnkSeen{$DEV, $INO}++) {
if ($follow_skip < 2) {
die "$AbsName is encountered a second time";
$NewName= PathCombine($AbsName, readlink($AbsName));
unless(defined $NewName) {
if ($follow_skip < 2) {
die "$AbsName is a recursive symbolic link";
($DEV, $INO) = lstat($AbsName);
return undef unless defined $DEV; # dangling symbolic link
if ($full_check && $SLnkSeen{$DEV, $INO}++) {
if ($follow_skip < 1) {
die "$AbsName encountered a second time";
our($dir, $name, $fullname, $prune);
sub _find_dir_symlnk($$$);
die "invalid top directory" unless defined $_[0];
my $cwd = $wanted->{bydepth} ? Cwd::fastcwd() : Cwd::cwd();
my $cwd_untainted = $cwd;
$wanted_callback = $wanted->{wanted};
$bydepth = $wanted->{bydepth};
$no_chdir = $wanted->{no_chdir};
$full_check = $wanted->{follow};
$follow = $full_check || $wanted->{follow_fast};
$follow_skip = $wanted->{follow_skip};
$untaint = $wanted->{untaint};
$untaint_pat = $wanted->{untaint_pattern};
$untaint_skip = $wanted->{untaint_skip};
# for compatibility reasons (find.pl, find2perl)
our ($topdir, $topdev, $topino, $topmode, $topnlink);
# a symbolic link to a directory doesn't increase the link count
$avoid_nlink = $follow || $File::Find::dont_use_nlink;
$cwd_untainted= $1 if $cwd_untainted =~ m|$untaint_pat|;
die "insecure cwd in find(depth)" unless defined($cwd_untainted);
my ($abs_dir, $Is_Dir);
foreach my $TOP (@_) {
$top_item =~ s|/\z|| unless $top_item eq '/';
($topdev,$topino,$topmode,$topnlink) = stat $top_item;
if (substr($top_item,0,1) eq '/') {
$abs_dir = $top_item;
elsif ($top_item eq '.') {
else { # care about any ../
$abs_dir = contract_name("$cwd/",$top_item);
$abs_dir= Follow_SymLink($abs_dir);
unless (defined $abs_dir) {
warn "$top_item is a dangling symbolic link\n";
_find_dir_symlnk($wanted, $abs_dir, $top_item);
unless (defined $topnlink) {
warn "Can't stat $top_item: $!\n";
$top_item =~ s/\.dir\z// if $Is_VMS;
_find_dir($wanted, $top_item, $topnlink);
unless (($_,$dir) = File::Basename::fileparse($abs_dir)) {
($dir,$_) = ('./', $top_item);
my $abs_dir_save = $abs_dir;
$abs_dir = $1 if $abs_dir =~ m|$untaint_pat|;
unless (defined $abs_dir) {
if ($untaint_skip == 0) {
die "directory $abs_dir_save is still tainted";
unless ($no_chdir or chdir $abs_dir) {
warn "Couldn't chdir $abs_dir: $!\n";
$name = $abs_dir . $_;
$no_chdir or chdir $cwd_untainted;
# $p_dir : "parent directory"
# $nlink : what came back from the stat
# chdir (if not no_chdir) to dir
my ($wanted, $p_dir, $nlink) = @_;
my ($CdLvl,$Level) = (0,0);
my ($subcount,$sub_nlink);
my $dir_name= $p_dir;
my $dir_pref= ( $p_dir eq '/' ? '/' : "$p_dir/" );
my $dir_rel= '.'; # directory name relative to current directory
local ($dir, $name, $prune, *DIR);
unless ($no_chdir or $p_dir eq '.') {
$udir = $1 if $p_dir =~ m|$untaint_pat|;
unless (defined $udir) {
if ($untaint_skip == 0) {
die "directory $p_dir is still tainted";
unless (chdir $udir) {
warn "Can't cd to $udir: $!\n";
push @Stack,[$CdLvl,$p_dir,$dir_rel,-1] if $bydepth;
while (defined $SE) {
$_= ($no_chdir ? $dir_name : $dir_rel );
# prune may happen here
# change to that directory
unless ($no_chdir or $dir_rel eq '.') {
$udir = $1 if $dir_rel =~ m|$untaint_pat|;
unless (defined $udir) {
if ($untaint_skip == 0) {
. ($p_dir ne '/' ? $p_dir : '')
. "/) $dir_rel is still tainted";
unless (chdir $udir) {
. ($p_dir ne '/' ? $p_dir : '')
# Get the list of files in the current directory.
unless (opendir DIR, ($no_chdir ? $dir_name : '.')) {
warn "Can't opendir($dir_name): $!\n";
@filenames = readdir DIR;
if ($nlink == 2 && !$avoid_nlink) {
# This dir has no subdirectories.
for my $FN (@filenames) {
next if $FN =~ /^\.{1,2}\z/;
$name = $dir_pref . $FN;
$_ = ($no_chdir ? $name : $FN);
# This dir has subdirectories.
$subcount = $nlink - 2;
for my $FN (@filenames) {
next if $FN =~ /^\.{1,2}\z/;
if ($subcount > 0 || $avoid_nlink) {
# Seen all the subdirs?
# check for directoriness.
# stat is faster for a file in the current directory
$sub_nlink = (lstat ($no_chdir ? $dir_pref . $FN : $FN))[3];
$FN =~ s/\.dir\z// if $Is_VMS;
push @Stack,[$CdLvl,$dir_name,$FN,$sub_nlink];
$name = $dir_pref . $FN;
$_= ($no_chdir ? $name : $FN);
$name = $dir_pref . $FN;
$_= ($no_chdir ? $name : $FN);
while ( defined ($SE = pop @Stack) ) {
($Level, $p_dir, $dir_rel, $nlink) = @$SE;
if ($CdLvl > $Level && !$no_chdir) {
my $tmp = join('/',('..') x ($CdLvl-$Level));
die "Can't cd to $dir_name" . $tmp
$dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
$dir_pref = "$dir_name/";
if ( $nlink < 0 ) { # must be finddepth, report dirname now
if ( substr($name,-2) eq '/.' ) {
$_ = ($no_chdir ? $dir_name : $dir_rel );
if ( substr($_,-2) eq '/.' ) {
push @Stack,[$CdLvl,$p_dir,$dir_rel,-1] if $bydepth;
# $dir_loc : absolute location of a dir
# $p_dir : "parent directory"
# chdir (if not no_chdir) to dir
sub _find_dir_symlnk($$$) {
my ($wanted, $dir_loc, $p_dir) = @_;
my $pdir_loc = $dir_loc;
my $dir_name = $p_dir;
my $dir_pref = ( $p_dir eq '/' ? '/' : "$p_dir/" );
my $loc_pref = ( $dir_loc eq '/' ? '/' : "$dir_loc/" );
my $dir_rel = '.'; # directory name relative to current directory
my $byd_flag; # flag for pending stack entry if $bydepth
local ($dir, $name, $fullname, $prune, *DIR);
unless ($no_chdir or $p_dir eq '.') {
$udir = $1 if $dir_loc =~ m|$untaint_pat|;
unless (defined $udir) {
if ($untaint_skip == 0) {
die "directory $dir_loc is still tainted";
unless (chdir $udir) {
warn "Can't cd to $udir: $!\n";
push @Stack,[$dir_loc,$pdir_loc,$p_dir,$dir_rel,-1] if $bydepth;
while (defined $SE) {
# change to parent directory
my $udir = $pdir_loc;
$udir = $1 if $pdir_loc =~ m|$untaint_pat|;
unless (chdir $udir) {
warn "Can't cd to $udir: $!\n";
$_= ($no_chdir ? $dir_name : $dir_rel );
# prune may happen here
lstat($_); # make sure file tests with '_' work
# change to that directory
unless ($no_chdir or $dir_rel eq '.') {
$udir = $1 if $dir_loc =~ m|$untaint_pat|;
unless (defined $udir ) {
if ($untaint_skip == 0) {
die "directory $dir_loc is still tainted";
unless (chdir $udir) {
warn "Can't cd to $udir: $!\n";
# Get the list of files in the current directory.
unless (opendir DIR, ($no_chdir ? $dir_loc : '.')) {
warn "Can't opendir($dir_loc): $!\n";
@filenames = readdir DIR;
for my $FN (@filenames) {
next if $FN =~ /^\.{1,2}\z/;
# follow symbolic links / do an lstat
$new_loc = Follow_SymLink($loc_pref.$FN);
# ignore if invalid symlink
next unless defined $new_loc;
push @Stack,[$new_loc,$dir_loc,$dir_name,$FN,1];
$fullname = $new_loc;
$name = $dir_pref . $FN;
$_ = ($no_chdir ? $name : $FN);
while (defined($SE = pop @Stack)) {
($dir_loc, $pdir_loc, $p_dir, $dir_rel, $byd_flag) = @$SE;
$dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
$dir_pref = "$dir_name/";
$loc_pref = "$dir_loc/";
if ( $byd_flag < 0 ) { # must be finddepth, report dirname now
unless ($no_chdir or $dir_rel eq '.') {
my $udir = $pdir_loc;
$udir = $1 if $dir_loc =~ m|$untaint_pat|;
unless (chdir $udir) {
warn "Can't cd to $udir: $!\n";
$fullname = $dir_loc;
if ( substr($name,-2) eq '/.' ) {
$_ = ($no_chdir ? $dir_name : $dir_rel);
if ( substr($_,-2) eq '/.' ) {
lstat($_); # make sure file tests with '_' work
push @Stack,[$dir_loc, $pdir_loc, $p_dir, $dir_rel,-1] if $bydepth;
if ( ref($wanted) eq 'HASH' ) {
if ( $wanted->{follow} || $wanted->{follow_fast}) {
$wanted->{follow_skip} = 1 unless defined $wanted->{follow_skip};
if ( $wanted->{untaint} ) {
$wanted->{untaint_pattern} = qr|^([-+@\w./]+)$|
    unless defined $wanted->{untaint_pattern};
$wanted->{untaint_skip} = 0 unless defined $wanted->{untaint_skip};
return { wanted => $wanted };
_find_opt(wrap_wanted($wanted), @_);
%SLnkSeen= (); # free memory
my $wanted = wrap_wanted(shift);
$wanted->{bydepth} = 1;
_find_opt($wanted, @_);
%SLnkSeen= (); # free memory
# These are hard-coded for now, but may move to hint files.
$File::Find::dont_use_nlink = 1;
$File::Find::dont_use_nlink = 1
    if $^O eq 'os2' || $^O eq 'dos' || $^O eq 'amigaos' || $^O eq 'MSWin32';
# Set dont_use_nlink in your hint file if your system's stat doesn't
# report the number of links in a directory as an indication
# of the number of files.
# See, e.g. hints/machten.sh for MachTen 2.2.
unless ($File::Find::dont_use_nlink) {
$File::Find::dont_use_nlink = 1 if ($Config::Config{'dont_use_nlink'});