Re: native inotify implementation

From: Mark Johnston <markj_at_freebsd.org>
Date: Sat, 05 Jul 2025 16:30:18 UTC

On Sat, Jul 05, 2025 at 03:49:46AM +0300, Vadim Goncharov wrote:
> On Sat, 17 May 2025 11:18:34 -0400
> Mark Johnston <markj@freebsd.org> wrote:
> 
> > On Fri, May 16, 2025 at 11:02:33AM -0500, Jake Freeland wrote:
> > > On Mon May 12, 2025 at 3:58 PM CDT, Mark Johnston wrote:  
> > > > For the past while I've been hacking on a native implementation of
> > > > Linux's inotify.  Functionality-wise, this is similar to but not quite
> > > > equivalent to the EVFILT_VNODE kqueue filter.  While we already have a
> > > > userspace implementation of inotify built on top of kqueue, it shares
> > > > the limitations of EVFILT_VNODE, and my version can also be used in the
> > > > Linuxulator.  (Please let me know if you're interested in working on
> > > > that and testing it out.)
> [...]
> > > > This work was largely motivated by a race condition in EVFILT_VNODE: in
> > > > order to get events for a particular file, you first have to open it, by
> > > > which point you may have missed the event(s) you care about.  For
> > > > instance, if some upload service adds files to a directory, and you want
> > > > to know when a new file has finished uploading, you'd have to watch the
> > > > directory to get new file events, scan the directory to actually find
> > > > the new file(s), open them, and then wait for NOTE_CLOSE (which might
> > > > never arrive if the upload had already finished).  Aside from that, the
> > > > need to hold each monitored file open is also a problem for large
> > > > directory hierarchies as it's easy to exhaust file descriptor limits.
> > > >
> > > > My initial solution was a new kqueue filter, EVFILT_FSWATCH, which lets
> > > > one watch for all file events under a mountpoint.  The consumer would
> > > > allocate a ring buffer with space to store paths and event metadata,
> > > > register that with the kernel, and the kernel would write entries to the
> > > > buffer, using reverse lookups to find a path for each event vnode.  This
> > > > prototype worked, but got somewhat hairy and I decided it would be
> > > > better to simply implement an existing interface: inotify already exists
> > > > and is commonly used, and has a somewhat simpler model, as it merely
> > > > watches for events within a particular directory.  
> > > 
> > > I've found that more and more developers are blindly using Linux-specific
> > > interfaces these days, so +1 for natively supporting another one.
> > > 
> > > The more support we have for these, the easier porting/Linux emulation is.
> > > I think the benefits of this far outweigh the cost of maintaining the
> > > code.  
> > 
> > I think so too.  My perspective is that we should implement widely used
> > Linux interfaces as part of the larger goal of making existing software
> > usable on FreeBSD.  This is more important than the purity of the
> > kernel's interfaces or architecture, at least up to a certain point.
> > 
> > The whole purpose of an OS is to let users run the programs they want to
> > run, without getting in the way (too much).
> 
> Yes and no. While this is often useful in the short term, such an approach
> leaves FreeBSD without unique features, so it becomes yet another "Linux, just
> poorer", which raises the obvious question: "why choose it?" It's
> understandable that in some cases it is simple to implement a compatible API,
> but an alternative such as "a more general solution, with a compatibility shim
> layer through which their API is implemented" is better when possible.

Sure, but so far there is no clear description of a more general
solution, and the shortcomings of EVFILT_VNODE have been known for a
long time.

There's also nothing precluding this inotify implementation from being
extended or replaced, just so long as a compatible implementation can be
provided in libc.

> It's late for this particular topic since the commit has already landed, but
> in the future we should think about how to extend kqueue to do more.

As I mentioned in my original email, that's what I tried to do first.
It is immediately more complicated than inotify since kevent() doesn't
have a good way to return arbitrary data (particularly file names and
paths) to userspace.  It is possible if we make kevent() write to a user
pointer embedded in the knote, but it's not simple.  I note that XNU
also does not use kqueue for this purpose, and I'm skeptical that it's
the right substrate for a file monitoring interface.
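
For illustration only (a minimal sketch, not code from the committed
implementation): inotify sidesteps this because events are consumed with
read(2) as variable-length struct inotify_event records, each carrying the
file name inline, whereas struct kevent is a fixed-size record.  The helper
name first_created_name(), the /tmp scratch directory, and the file name
upload.bin are made up for the example; only the inotify calls themselves
are the real API.

```c
#define _DEFAULT_SOURCE		/* for mkdtemp() on glibc under strict -std modes */
#include <sys/inotify.h>
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch directory, watch it for IN_CREATE,
 * create one file in it, and copy the name reported by the kernel into `out`.
 * Returns 0 on success, -1 on failure.
 */
int
first_created_name(char *out, size_t outlen)
{
	char dir[] = "/tmp/inotify-demo.XXXXXX";	/* assumed scratch location */
	char path[PATH_MAX];
	/* Room for one event header plus the longest possible file name. */
	char buf[sizeof(struct inotify_event) + NAME_MAX + 1]
	    __attribute__((aligned(__alignof__(struct inotify_event))));
	const struct inotify_event *ev;
	int fd, ifd, ret = -1;
	ssize_t n;

	if (mkdtemp(dir) == NULL)
		return (-1);
	snprintf(path, sizeof(path), "%s/upload.bin", dir);

	ifd = inotify_init1(IN_CLOEXEC);
	if (ifd == -1 || inotify_add_watch(ifd, dir, IN_CREATE) == -1)
		goto out;

	/* The watch is on the directory; the new file is never opened by the watcher. */
	fd = open(path, O_CREAT | O_WRONLY, 0600);
	if (fd == -1)
		goto out;
	close(fd);

	/* Events arrive as variable-length records with the name appended. */
	n = read(ifd, buf, sizeof(buf));
	if (n < (ssize_t)sizeof(struct inotify_event))
		goto out;
	ev = (const struct inotify_event *)buf;
	assert(sizeof(struct inotify_event) + ev->len <= (size_t)n);

	if ((ev->mask & IN_CREATE) != 0 && ev->len > 0) {
		/* ev->name is NUL-terminated; ev->len counts the padding too. */
		snprintf(out, outlen, "%s", ev->name);
		ret = 0;
	}
out:
	unlink(path);
	rmdir(dir);
	if (ifd != -1)
		close(ifd);
	return (ret);
}
```

Calling first_created_name(name, sizeof(name)) from a test program should
leave the created file's name in the buffer without the watcher ever holding
a descriptor for that file.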

> [E.g. I'd want to have notifications for my protocol with multiple streams
> inside one socket (think like QUIC), but it does not fit nicely into current
> struct kevent or socket API (multiple socket buffers with separate reading)]