find - traverse a file tree

finddepth - traverse a directory structure depth-first

    find(\&wanted, '/foo', '/bar');

    finddepth(\&wanted, '/foo', '/bar');

    find({ wanted => \&process, follow => 1 }, '.');

The first argument to find() is either a hash reference describing the
operations to be performed for each file, or a code reference.
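
The two call forms can be sketched as follows. This is a minimal illustration, not part of the module: the temporary directory layout and the C<@pre> and C<@post> arrays are assumptions made for the demo. It walks the same tree once with a plain code reference and once with a hash reference that sets C<bydepth>, so the difference in reporting order is visible.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical layout for the demo: $top/a/b
my $top = tempdir(CLEANUP => 1);
make_path("$top/a/b");

my (@pre, @post);

# Form 1: a code reference as the first argument.
find(sub { push @pre, $File::Find::name if -d $_ }, $top);

# Form 2: a hash reference; bydepth => 1 behaves like finddepth().
find({
    wanted  => sub { push @post, $File::Find::name if -d $_ },
    bydepth => 1,
}, $top);

print "find order:    @pre\n";
print "bydepth order: @post\n";
```

In the first list each directory appears before its entries; in the second it appears only after them.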

Here are the possible keys for the hash:

=over 3

=item C<wanted>

The value should be a code reference. This code reference is called
I<the wanted() function> below.

=item C<bydepth>

Reports the name of a directory only AFTER all its entries
have been reported. Entry point finddepth() is a shortcut for
specifying C<{ bydepth => 1 }> in the first argument of find().

=item C<follow>

Causes symbolic links to be followed. Since directory trees with symbolic
links (followed) may contain files more than once and may even have
cycles, a hash has to be built up with an entry for each file.
This might be expensive both in space and time for a large
directory tree. See I<follow_fast> and I<follow_skip> below.
If either I<follow> or I<follow_fast> is in effect:

=over 6

=item *

It is guaranteed that an I<lstat> has been called before the user's
I<wanted()> function is called. This enables fast file checks involving S< _>.

=item *

There is a variable C<$File::Find::fullname> which holds the absolute
pathname of the file with all symbolic links resolved.

=back

=item C<follow_fast>

This is similar to I<follow> except that it may report some files
more than once. It does detect cycles, however.
Since only symbolic links have to be hashed, this is
much cheaper both in space and time.
If processing a file more than once (by the user's I<wanted()> function)
is worse than just taking time, the option I<follow> should be used.

=item C<follow_skip>

C<follow_skip==1>, which is the default, causes all files which are
neither directories nor symbolic links to be ignored if they are about
to be processed a second time. If a directory or a symbolic link
is about to be processed a second time, File::Find dies.
C<follow_skip==0> causes File::Find to die if any file is about to be
processed a second time.
C<follow_skip==2> causes File::Find to ignore any duplicate files and
directories but to proceed normally otherwise.

=item C<no_chdir>

Does not C<chdir()> to each directory as it recurses. The wanted()
function will need to be aware of this, of course. In this case,
C<$_> will be the same as C<$File::Find::name>.

=item C<untaint>

If find is used in taint mode (-T command line switch, or if EUID != UID
or if EGID != GID), then directory names have to be untainted internally
before they can be cd'ed to. Therefore they are checked against a regular
expression I<untaint_pattern>. Note that all names passed to the
user's I<wanted()> function are still tainted.

=item C<untaint_pattern>

See above. This should be set using the C<qr> quoting operator.
The default is set to C<qr|^([-+@\w./]+)$|>.
Note that the parentheses are vital.

=item C<untaint_skip>

If set, directories (subtrees) which fail the I<untaint_pattern>
are skipped. The default is to 'die' in such a case.

=back
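
Why the parentheses in I<untaint_pattern> matter can be sketched outside of File::Find. Perl untaints data only through captured submatches, so the checked name has to come back via C<$1>. The helper C<untainted_name> and the sample paths below are hypothetical and exist only for this illustration; the pattern itself is the default quoted above.

```perl
use strict;
use warnings;

# Default pattern quoted from the documentation above.
my $untaint_pattern = qr|^([-+@\w./]+)$|;

# Hypothetical helper: returns the captured name, or undef when the
# name contains characters outside the allowed set.
sub untainted_name {
    my ($dir) = @_;
    return $dir =~ $untaint_pattern ? $1 : undef;
}

for my $dir ('/usr/local/lib', '/tmp/bad dir;x') {
    my $clean = untainted_name($dir);
    print defined $clean ? "accepted: $clean\n" : "rejected: $dir\n";
}
```

A custom pattern passed as C<untaint_pattern> would need the same single capture group around whatever part of the name is considered safe.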

The wanted() function does whatever verifications you want.
C<$File::Find::dir> contains the current directory name, and C<$_> the
current filename within that directory. C<$File::Find::name> contains
the complete pathname to the file. You are chdir()'d to C<$File::Find::dir> when
the function is called, unless C<no_chdir> was specified.
When I<follow> or I<follow_fast> is in effect, there is also a
C<$File::Find::fullname>.
The function may set C<$File::Find::prune> to prune the tree
unless C<bydepth> was specified.
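
A wanted() function that uses these variables to prune a subtree might look like the sketch below. The directory names C<keep> and C<skip> and the C<@seen> array are assumptions made for the demo, and C<bydepth> is deliberately left unset so that pruning can take effect.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical layout for the demo: $top/keep and $top/skip/deep
my $top = tempdir(CLEANUP => 1);
make_path("$top/keep", "$top/skip/deep");

my @seen;
find(sub {
    # $_ is the entry name relative to $File::Find::dir.
    if (-d $_ && $_ eq 'skip') {
        $File::Find::prune = 1;    # do not descend into skip/
        return;
    }
    push @seen, $File::Find::name;
}, $top);

print "$_\n" for sort @seen;
```

Neither C<skip> nor anything below it ends up in C<@seen>, because the function returns before recording the pruned directory.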

This library is useful for the C<find2perl> tool, which when fed,

    find2perl / -name .nfs\* -mtime +7 \
        -exec rm -f {} \; -o -fstype nfs -prune

produces something like:

    (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_)) &&
    ($nlink || (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))) &&
    ($File::Find::prune = 1);

Set the variable C<$File::Find::dont_use_nlink> if you're using AFS,

Here's another interesting wanted function. It will find all symlinks

    -l && !-e && print "bogus link: $File::Find::name\n";

See also the script C<pfind> on CPAN for a nice application of this

Be aware that the option to follow symbolic links can be dangerous.
Depending on the structure of the directory tree (including symbolic
links to directories) you might traverse a given (physical) directory
more than once (only if C<follow_fast> is in effect).
Furthermore, deleting or changing files in a symbolically linked directory
might cause very unpleasant surprises, since you delete or change files
in an unknown directory.
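
One way to see where a followed link really leads is to record C<$File::Find::fullname> for each entry before touching anything. The sketch below is illustrative only: the two temporary directories, the file names and the C<%resolved> hash are assumptions, and it requires a platform where C<symlink> is available.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical layout: a link inside $top pointing at a file outside it.
my $outside = tempdir(CLEANUP => 1);
my $top     = tempdir(CLEANUP => 1);
open my $fh, '>', "$outside/data.txt" or die "open: $!";
close $fh;
symlink("$outside/data.txt", "$top/link.txt") or die "symlink: $!";

my %resolved;
find({
    follow => 1,
    wanted => sub {
        # With follow in effect, $File::Find::fullname holds the resolved path.
        $resolved{$File::Find::name} = $File::Find::fullname;
    },
}, $top);

for my $name (sort keys %resolved) {
    my $full = $resolved{$name};
    print "$name -> ", (defined $full ? $full : '(unresolved)'), "\n";
}
```

A wanted() function that deletes or rewrites files could compare the resolved path against the tree it was asked to process and skip anything that lies outside it.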
@EXPORT = qw(find finddepth);
require File::Basename;
my ($wanted_callback, $avoid_nlink, $bydepth, $no_chdir, $follow,
    $follow_skip, $full_check, $untaint, $untaint_skip, $untaint_pat);
    return substr($cdir,0,rindex($cdir,'/')) if $fn eq '.';
    $cdir = substr($cdir,0,rindex($cdir,'/')+1);
    my $abs_name= $cdir . $fn;
    if (substr($fn,0,3) eq '../') {
        do 1 while ($abs_name=~ s|/(?>[^/]+)/\.\./|/|);
sub PathCombine($$) {
    my ($Base,$Name) = @_;
    if (substr($Name,0,1) eq '/') {
        $AbsName= contract_name($Base,$Name);
    # (simple) check for recursion
    my $newlen= length($AbsName);
    if ($newlen <= length($Base)) {
        if (($newlen == length($Base) || substr($Base,$newlen,1) eq '/')
            && $AbsName eq substr($Base,0,$newlen))
sub Follow_SymLink($) {
    my ($NewName,$DEV, $INO);
    ($DEV, $INO)= lstat $AbsName;
        if ($SLnkSeen{$DEV, $INO}++) {
            if ($follow_skip < 2) {
                die "$AbsName is encountered a second time";
        $NewName= PathCombine($AbsName, readlink($AbsName));
        unless(defined $NewName) {
            if ($follow_skip < 2) {
                die "$AbsName is a recursive symbolic link";
        ($DEV, $INO) = lstat($AbsName);
        return undef unless defined $DEV; # dangling symbolic link
    if ($full_check && $SLnkSeen{$DEV, $INO}++) {
        if ($follow_skip < 1) {
            die "$AbsName encountered a second time";
use vars qw/ $dir $name $fullname $prune /;
sub _find_dir_symlnk($$$);
    die "invalid top directory" unless defined $_[0];
    my $cwd = $wanted->{bydepth} ? Cwd::fastcwd() : Cwd::cwd();
    my $cwd_untainted = $cwd;
    $wanted_callback = $wanted->{wanted};
    $bydepth = $wanted->{bydepth};
    $no_chdir = $wanted->{no_chdir};
    $full_check = $wanted->{follow};
    $follow = $full_check || $wanted->{follow_fast};
    $follow_skip = $wanted->{follow_skip};
    $untaint = $wanted->{untaint};
    $untaint_pat = $wanted->{untaint_pattern};
    $untaint_skip = $wanted->{untaint_skip};
    # a symbolic link to a directory doesn't increase the link count
    $avoid_nlink = $follow || $File::Find::dont_use_nlink;
        $cwd_untainted= $1 if $cwd_untainted =~ m|$untaint_pat|;
        die "insecure cwd in find(depth)" unless defined($cwd_untainted);
    my ($abs_dir, $nlink, $Is_Dir);
    foreach my $TOP (@_) {
        $top_item =~ s|/$|| unless $top_item eq '/';
            if (substr($top_item,0,1) eq '/') {
                $abs_dir = $top_item;
            elsif ($top_item eq '.') {
            else { # care about any ../
                $abs_dir = contract_name("$cwd/",$top_item);
            $abs_dir= Follow_SymLink($abs_dir);
            unless (defined $abs_dir) {
                warn "$top_item is a dangling symbolic link\n";
                _find_dir_symlnk($wanted, $abs_dir, $top_item);
            $nlink = (lstat $top_item)[3];
            unless (defined $nlink) {
                warn "Can't stat $top_item: $!\n";
                $top_item =~ s/\.dir$// if $Is_VMS;
                _find_dir($wanted, $top_item, $nlink);
            unless (($_,$dir) = File::Basename::fileparse($abs_dir)) {
                ($dir,$_) = ('.', $top_item);
                my $abs_dir_save = $abs_dir;
                $abs_dir = $1 if $abs_dir =~ m|$untaint_pat|;
                unless (defined $abs_dir) {
                    if ($untaint_skip == 0) {
                        die "directory $abs_dir_save is still tainted";
            unless ($no_chdir or chdir $abs_dir) {
                warn "Couldn't chdir $abs_dir: $!\n";
        $no_chdir or chdir $cwd_untainted;
# $p_dir : "parent directory"
# $nlink : what came back from the stat
# chdir (if not no_chdir) to dir
    my ($wanted, $p_dir, $nlink) = @_;
    my ($CdLvl,$Level) = (0,0);
    my ($subcount,$sub_nlink);
    my $dir_name= $p_dir;
    my $dir_pref= ( $p_dir eq '/' ? '/' : "$p_dir/" );
    my $dir_rel= '.'; # directory name relative to current directory
    local ($dir, $name, $prune, *DIR);
    unless ($no_chdir or $p_dir eq '.') {
            $udir = $1 if $p_dir =~ m|$untaint_pat|;
            unless (defined $udir) {
                if ($untaint_skip == 0) {
                    die "directory $p_dir is still tainted";
        unless (chdir $udir) {
            warn "Can't cd to $udir: $!\n";
    while (defined $SE) {
            $_= ($no_chdir ? $dir_name : $dir_rel );
            # prune may happen here
        # change to that directory
        unless ($no_chdir or $dir_rel eq '.') {
                $udir = $1 if $dir_rel =~ m|$untaint_pat|;
                unless (defined $udir) {
                    if ($untaint_skip == 0) {
                            . ($p_dir ne '/' ? $p_dir : '')
                            . "/) $dir_rel is still tainted";
            unless (chdir $udir) {
                    . ($p_dir ne '/' ? $p_dir : '')
        # Get the list of files in the current directory.
        unless (opendir DIR, ($no_chdir ? $dir_name : '.')) {
            warn "Can't opendir($dir_name): $!\n";
        @filenames = readdir DIR;
        if ($nlink == 2 && !$avoid_nlink) {
            # This dir has no subdirectories.
            for my $FN (@filenames) {
                next if $FN =~ /^\.{1,2}$/;
                $name = $dir_pref . $FN;
                $_ = ($no_chdir ? $name : $FN);
            # This dir has subdirectories.
            $subcount = $nlink - 2;
            for my $FN (@filenames) {
                next if $FN =~ /^\.{1,2}$/;
                if ($subcount > 0 || $avoid_nlink) {
                    # Seen all the subdirs?
                    # check for directoriness.
                    # stat is faster for a file in the current directory
                    $sub_nlink = (lstat ($no_chdir ? $dir_pref . $FN : $FN))[3];
                        $FN =~ s/\.dir$// if $Is_VMS;
                        push @Stack,[$CdLvl,$dir_name,$FN,$sub_nlink];
                        $name = $dir_pref . $FN;
                        $_= ($no_chdir ? $name : $FN);
                    $name = $dir_pref . $FN;
                    $_= ($no_chdir ? $name : $FN);
            $_ = ($no_chdir ? $dir_name : $dir_rel );
        if ( defined ($SE = pop @Stack) ) {
            ($Level, $p_dir, $dir_rel, $nlink) = @$SE;
            if ($CdLvl > $Level && !$no_chdir) {
                die "Can't cd to $dir_name" . '../' x ($CdLvl-$Level)
                    unless chdir '../' x ($CdLvl-$Level);
            $dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
            $dir_pref = "$dir_name/";
# $dir_loc : absolute location of a dir
# $p_dir : "parent directory"
# chdir (if not no_chdir) to dir
sub _find_dir_symlnk($$$) {
    my ($wanted, $dir_loc, $p_dir) = @_;
    my $dir_name = $p_dir;
    my $dir_pref = ( $p_dir eq '/' ? '/' : "$p_dir/" );
    my $loc_pref = ( $dir_loc eq '/' ? '/' : "$dir_loc/" );
    my $dir_rel = '.'; # directory name relative to current directory
    local ($dir, $name, $fullname, $prune, *DIR);
    unless ($no_chdir or $p_dir eq '.') {
            $udir = $1 if $dir_loc =~ m|$untaint_pat|;
            unless (defined $udir) {
                if ($untaint_skip == 0) {
                    die "directory $dir_loc is still tainted";
        unless (chdir $udir) {
            warn "Can't cd to $udir: $!\n";
    while (defined $SE) {
            $_= ($no_chdir ? $dir_name : $dir_rel );
            # prune may happen here
        # change to that directory
        unless ($no_chdir or $dir_rel eq '.') {
                $udir = $1 if $dir_loc =~ m|$untaint_pat|;
                unless (defined $udir ) {
                    if ($untaint_skip == 0) {
                        die "directory $dir_loc is still tainted";
            unless (chdir $udir) {
                warn "Can't cd to $udir: $!\n";
        # Get the list of files in the current directory.
        unless (opendir DIR, ($no_chdir ? $dir_loc : '.')) {
            warn "Can't opendir($dir_loc): $!\n";
        @filenames = readdir DIR;
        for my $FN (@filenames) {
            next if $FN =~ /^\.{1,2}$/;
            # follow symbolic links / do an lstat
            $new_loc = Follow_SymLink($loc_pref.$FN);
            # ignore if invalid symlink
            next unless defined $new_loc;
                push @Stack,[$new_loc,$dir_name,$FN];
                $fullname = $new_loc;
                $name = $dir_pref . $FN;
                $_ = ($no_chdir ? $name : $FN);
            $fullname = $dir_loc;
            $_ = ($no_chdir ? $dir_name : $dir_rel);
        if (defined($SE = pop @Stack)) {
            ($dir_loc, $p_dir, $dir_rel) = @$SE;
            $dir_name = ($p_dir eq '/' ? "/$dir_rel" : "$p_dir/$dir_rel");
            $dir_pref = "$dir_name/";
            $loc_pref = "$dir_loc/";
    if ( ref($wanted) eq 'HASH' ) {
        if ( $wanted->{follow} || $wanted->{follow_fast}) {
            $wanted->{follow_skip} = 1 unless defined $wanted->{follow_skip};
        if ( $wanted->{untaint} ) {
            $wanted->{untaint_pattern} = qr|^([-+@\w./]+)$|
                unless defined $wanted->{untaint_pattern};
            $wanted->{untaint_skip} = 0 unless defined $wanted->{untaint_skip};
        return { wanted => $wanted };
    _find_opt(wrap_wanted($wanted), @_);
    %SLnkSeen= (); # free memory
    my $wanted = wrap_wanted(shift);
    $wanted->{bydepth} = 1;
    _find_opt($wanted, @_);
    %SLnkSeen= (); # free memory
# These are hard-coded for now, but may move to hint files.
    $File::Find::dont_use_nlink = 1;
$File::Find::dont_use_nlink = 1
    if $^O eq 'os2' || $^O eq 'dos' || $^O eq 'amigaos' || $^O eq 'MSWin32';
# Set dont_use_nlink in your hint file if your system's stat doesn't
# report the number of links in a directory as an indication
# of the number of files.
# See, e.g. hints/machten.sh for MachTen 2.2.
unless ($File::Find::dont_use_nlink) {
    $File::Find::dont_use_nlink = 1 if ($Config::Config{'dont_use_nlink'});