fts improvements, alternatives
Tim Kientzle
kientzle at freebsd.org
Thu Jan 13 16:48:52 PST 2005
[Moved to -current for further discussion.]
David Schultz wrote:
>Tim Kientzle wrote:
>>Pawel Jakub Dawidek wrote:
>>
>>> Introduce new field 'fts_bignum' ...
>>> This work is part of the BigDisk project:
>>> http://www.FreeBSD.org/projects/bigdisk/
>>
>>Any plans to deal with other fts limits ... ?
>
> Removing FTS_LOGICAL's path length limitation is nontrivial, but
> given your experience with bsdtar, I'd say you're an ideal person
> to do it. ;-)
As it happens, I started tinkering with these ideas a
while ago but haven't found time to finish it all.
Here's a snapshot of the current WIP:
http://people.freebsd.org/~kientzle/libarchive/src/tree.tgz
This includes a simple main() test function that
just does the rough equivalent of "find .".
> .. fts() effectively uses chdir("..") to
> climb up one level in the tree in chdir mode. If it chdir'd
> across a symlink earlier, this would put it in the wrong place.
> Perhaps you have a better solution, but my best idea is to keep
> the parent directory open when chdiring ...
The tree package does exactly this. It just keeps a
flag with each ancestor node indicating the type of traversal
that's needed.
> A more uniform approach would be to ... always keep the
> parent open when descending a level. Unfortunately, this limits
> the traversal depth to the number of available file descriptors.
The tree package does not do this for exactly this reason.
I have some ideas for handling the case where the number of symlinks
exceeds the available file descriptors, but that doesn't seem
particularly pressing at the moment. <grin>
> (On the other hand, I would argue that anyone with a directory
> tree a few thousand levels deep is asking for trouble.)
I thought you *wanted* to support big disks! ;-)
I started this work partly because I wanted to really stress
the long pathname support in bsdtar. I've archived
directory trees with megabyte pathnames (several thousand
directory levels crossing several hundred symlinks) in testing.
Of course, I can't yet *dearchive* such deep trees.
That seems to be a harder problem.
The tree package also keeps a *lot* less data in memory than fts.
It has no trouble with million-entry directories, for example.
In comparison, the current ls crashes on such large directories
in part because of the memory required for fts.
The tree package is quite a bit different in many respects:
* The traversal is in a very different order, for instance.
* It has a completely different API than fts.
It's fully opaque (so should be easier to change in the future,
unlike fts).
* It takes a very different approach to determine
when to visit a child. In particular, instead of
the client specifying a mode and optionally inhibiting the descent
through a "prune" request, the tree package has the client
specifically request descent. If you request descent for
*every* item, you'll end up with a logical traversal; if you
request descent for every dir, you'll end up with a physical
traversal. (The tree package is smart enough to ignore any
descent request that isn't for a dir or a link to a dir.)
Feedback, suggestions, etc. appreciated.
Tim
More information about the freebsd-current
mailing list