Impact of having a large number of open file descriptors
rwatson at FreeBSD.org
Tue Jun 3 10:38:12 UTC 2008
On Mon, 2 Jun 2008, Garance A Drosihn wrote:
> I remember a discussion of changes to MacOS10 in Leopard which made it
> easier to implement features such as Spotlight and TimeMachine. The
> description starts here, I think:
> the section on file-system events.
> The idea I thought was interesting was to save the metadata on a directory
> basis, instead of saving it on the file. So, if file /some/dir/fname was
> changed, then they'd record that *some* file under /some/dir has changed.
> So when your userland process comes along later on, it still has to scan all
> files in that directory to see which file(s) actually changed. But that's a
> lot less work than scanning all files in the filesystem, and it also means
> there is much less data that has to be kept track of.
> I have no idea how easy it would be to implement something similar on
> FreeBSD, but the strategy seemed like a pretty neat idea.
fsevents allows user processes to subscribe, effectively on a per-filesystem
basis, to namespace and file close operations. The implementation is split
into two parts: a kernel component, which captures events with possible
coalescing, and a user daemon, fseventsd, which listens on a special device
and then provides scope narrowing and persistence for subscriptions.
Applications talk to fseventsd, using Mach ports, I believe, and fseventsd is
responsible for tracking subscriptions, filtering events, and so on.
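
To make the split concrete, here is a rough sketch of the two roles described above: a capture side that coalesces events down to "something under this directory changed", and a daemon side that narrows the stream to a subscriber's subtree. This is not the fsevents code. The names DirEventQueue and narrow are hypothetical, and directory-level coalescing is an assumption taken from the quoted description rather than from the actual kernel implementation.

```python
import posixpath
from collections import OrderedDict


class DirEventQueue:
    """Hypothetical coalescing queue: one pending record per directory."""

    def __init__(self):
        # directory -> number of raw events folded into that record
        self._pending = OrderedDict()

    def record(self, path):
        # Coalesce: remember only that *something* under the parent changed.
        parent = posixpath.dirname(path)
        self._pending[parent] = self._pending.get(parent, 0) + 1

    def drain(self):
        # Hand over the coalesced directories and start a fresh batch.
        events, self._pending = list(self._pending), OrderedDict()
        return events


def narrow(events, subscription_root):
    """Scope narrowing: keep directories at or below the subscribed root."""
    root = subscription_root.rstrip("/") or "/"
    prefix = root if root == "/" else root + "/"
    return [d for d in events if d == root or d.startswith(prefix)]


queue = DirEventQueue()
for p in ["/some/dir/fname", "/some/dir/other", "/var/log/messages"]:
    queue.record(p)

changed = queue.drain()
print(narrow(changed, "/some"))
```

A subscriber to /some would then rescan only /some/dir to find which files actually changed, which is the trade-off described in the quoted text: less state to keep, at the cost of a directory scan later.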
I'm aware of several limitations that should be considered very carefully
before adopting this code:
(1) The user<->kernel interface is essentially a firehose, and available only
to privileged processes. fseventsd performs checks in user space to see
whether each consumer is allowed access to each event, which can lead to
confusing and potentially quite incorrect results.
(2) The kernel code requires a reliable conversion from vnode to path, which
we don't have, as events are with respect to paths, and especially
(3) The user daemon requires synchronous hooks into the file system umount
event because fseventsd stores its events journal in the file system root,
so must first close it before the file system can be unmounted. In Mac OS
X, this is satisfied by having the disk arbitration daemon, which performs
unmounts, first send a message to fseventsd and wait for it to finish up.
I've seen a number of occasions where the disk unmount process has become
non-trivially stalled due to fseventsd, so there's a potential robustness
issue.
(4) As I understand it, events frequently come down to "file system X
changed" in practice, which could be captured by a far simpler mechanism.
I've not done any measurements to confirm whether this is the case, but
it's not impossible to imagine on a busy system.
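
As an illustration of what a "far simpler mechanism" for point (4) might look like, here is a minimal sketch of a per-file-system change counter. It is purely hypothetical: FsChangeCounters and its methods are invented for this example and correspond to nothing in Mac OS X or FreeBSD.

```python
class FsChangeCounters:
    """Hypothetical per-file-system generation counters."""

    def __init__(self):
        self._gen = {}  # mount point -> generation number

    def note_change(self, mountpoint):
        # Any modification bumps the counter of the file system it hit.
        self._gen[mountpoint] = self._gen.get(mountpoint, 0) + 1

    def generation(self, mountpoint):
        # File systems that have never changed report generation 0.
        return self._gen.get(mountpoint, 0)

    def changed_since(self, mountpoint, last_seen):
        # A consumer compares against the generation it saw at its last scan.
        return self.generation(mountpoint) != last_seen


counters = FsChangeCounters()
seen = counters.generation("/usr")   # consumer remembers its starting point
counters.note_change("/usr")
counters.note_change("/usr")
print(counters.changed_since("/usr", seen))
print(counters.changed_since("/var", 0))
```

A consumer learns only that the file system changed and must rescan to find out what changed, which is all that a fully coalesced event would tell it anyway.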
I think there's also considerable overlap with other kernel event systems,
such as audit, and we might benefit from thinking seriously about enhancing
those event systems rather than introducing a new one. The design of fsevents
is pretty much entirely dictated by the needs of Spotlight and later Time
Machine. In particular, it's not clear to me that the persistency
requirements, which are a large part of the fsevents design, are important to
us... or are they?
Robert N M Watson
University of Cambridge