Re: Kqueues and fork

From: Vadim Goncharov <vadimnuclight_at_gmail.com>
Date: Fri, 22 Aug 2025 18:36:48 UTC

On Fri, 22 Aug 2025 10:29:32 -0400
Mark Johnston <markj@freebsd.org> wrote:

> > > But file descriptions (i.e., struct file *) are shared after fork, in
> > > general.  With the patch, the child receives a completely separate
> > > kqueue after fork.  I am not saying it is wrong, but AFAIU this is not
> > > how epoll works, so if the goal is to provide some epoll compatibility
> > > in userspace, there might be some problems.  
> > What could we share between kqueues after fork?
> > Kqueue events ids are file descriptors, not files.  

...and not always descriptors, even.

> Sure, but file descriptors are duplicated across fork, so the event IDs
> stay the same.  And the knote references the underlying file
> description, and those will be the same after fork.
> 
> Yes, if one of the processes closes an fd, then kevent() may still
> return events for the closed fd because the other process still holds it
> open, but maybe the solution is "don't do that."

Automatic unsubscription on close() is the Right Thing in kqueue(); it is its
strength, and application writers rely on it. So this must continue to work,
if only for compatibility reasons.

> > Linux man page for epoll states that epoll indexes registered events by
> > key:
> >         The key is the combination of the file descriptor number and the
> > open file description (also known as an "open file handle",  the  kernel's
> >         internal representation of an open file).
> > 
> > When I did a research before starting the implementation, and discussed
> > some of it with the app writers who initiated the work, I found this
> > https://idea.popcount.org/2017-03-20-epoll-is-fundamentally-broken-22/
> > (ignore the 'fundamentally broken', author claim is that everything is
> > broken anyway).
> >
> > I believe that my choice is the most natural one, given the existing
> > structure of kqueue.  The example I posted demonstrates the natural
> > use of the copy on fork: I register the pipe reads and the timer before
> > fork, and wait for events after, in the child.  
> 
> Again, I am not saying your choice is wrong, just observing that with
> your patch, fork creates a new, separate kqueue for each kqueue
> descriptor in the parent, but this is different from how fork behaves
> for all other descriptor types (including epoll).
> 
> I think your approach is simpler for the kernel side, it is just not so
> clear to me that it is what app writers want.  The behaviour in the blog
> post is weird certainly (how should knote_fdclose() behave if multiple
> processes can share a kqueue?) but it can be avoided by applications.
> If a per-process kqueue is useful to app writers, then ok.

Application writers don't want to bother with implementation details. The blog
post demonstrates how epoll() is broken with regard to close(), and kqueue()
does not have this problem, so application writers rely on this.
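
To make the close() problem concrete, here is a minimal sketch of the epoll
behaviour in question, assuming Linux and Python's select.epoll. The function
name is hypothetical and the code is my own illustration, not taken from the
blog post or from the patch. It registers a pipe's read end, duplicates it,
closes the registered descriptor, and then polls:

```python
import os
import select


def closed_fd_still_reported():
    """Return (fd, events_while_dup_open, events_after_last_close).

    Assumes Linux: epoll keys a registration on the fd number plus the
    open file description, so close() alone does not unsubscribe while
    another descriptor still refers to the same description.
    """
    r, w = os.pipe()
    ep = select.epoll()
    try:
        ep.register(r, select.EPOLLIN)
        dup_r = os.dup(r)   # second descriptor, same open file description
        os.close(r)         # the registered descriptor number is now closed
        os.write(w, b"x")   # make the read end readable

        # The registration survives: epoll reports the closed fd number.
        while_dup_open = [fd for fd, _ in ep.poll(1.0)]

        # Closing the last descriptor for the description finally
        # removes the registration from the epoll set.
        os.close(dup_r)
        after_last_close = [fd for fd, _ in ep.poll(0)]
        return r, while_dup_open, after_last_close
    finally:
        ep.close()
        os.close(w)


if __name__ == "__main__":
    print(closed_fd_still_reported())
```

The first poll returns an event for a descriptor number the process has
already closed; only the second poll, after the duplicate is closed too, is
empty. With kqueue() the close() itself drops the registration in that
process, which is the behaviour applications rely on.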

-- 
WBR, @nuclight