Re: Kqueues and fork

From: Mark Johnston <markj_at_freebsd.org>
Date: Thu, 21 Aug 2025 20:53:48 UTC
On Thu, Aug 21, 2025 at 10:15:44PM +0300, Konstantin Belousov wrote:
> On Thu, Aug 21, 2025 at 02:48:28PM -0400, Mark Johnston wrote:
> > On Wed, Aug 20, 2025 at 02:11:44PM +0300, Konstantin Belousov wrote:
> > > Right now, kqueue fds are marked as not D_PASSABLE, which means that
> > > the corresponding file descriptor is not copied into the child filedesc
> > > table on fork. The reasoning is that kqueues work on file descriptors,
> > > and not even files, so they are tied to the fdesc table.
> > > 
> > > As a curious coincidence, I had two private discussions last week,
> > > where in both cases people were interested in getting more useful
> > > behavior on fork from kqueues. [My understanding is that epoll does
> > > that, so there is a desire to make kqueue equal in the capability.]
> > > 
> > > I convinced myself that kqueues can indeed be copied on fork.
> > > More precisely, the proposed semantics are the following:
> > > - fdesc copy would allocate a new kqueue at the same fd as the existing
> > >   kqueue in the source fdesc
> > > - each registered event in the source kqueue is copied as the same event
> > >   (for the same filter, of course) into the new kqueue
> > > - if the event is active at the time of copy, its copy is activated
> > >   as well
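To make the three rules above concrete, here is a toy user-space model of the proposed copy. It is only a sketch under stated assumptions: the structures and functions (`toy_knote`, `toy_kqueue`, `toy_kqueue_fork_copy`, `toy_kqueue_pending`) are hypothetical and do not come from D52045, and a real knote carries far more state than an ident, a filter and an active bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy user-space model of the proposed copy-on-fork semantics.
 * All names are hypothetical; this is not the kernel code from D52045.
 */
#define TOY_MAX_KNOTES 16

struct toy_knote {
	int ident;	/* what the event watches: fd number, timer id, ... */
	int filter;	/* filter the event was registered with */
	bool active;	/* event has triggered and is not yet collected */
};

struct toy_kqueue {
	int fd;		/* slot of the kqueue in the fd table */
	size_t n;	/* number of registered knotes */
	struct toy_knote notes[TOY_MAX_KNOTES];
};

/*
 * Model of the fork-time copy: the child gets a new, separate kqueue at
 * the same fd, every knote is registered again with the same ident and
 * filter, and knotes that were active in the source stay active.
 */
void
toy_kqueue_fork_copy(const struct toy_kqueue *src, struct toy_kqueue *dst)
{
	assert(src->n <= TOY_MAX_KNOTES);
	dst->fd = src->fd;
	dst->n = src->n;
	for (size_t i = 0; i < src->n; i++) {
		dst->notes[i].ident = src->notes[i].ident;
		dst->notes[i].filter = src->notes[i].filter;
		dst->notes[i].active = src->notes[i].active;
	}
}

/* Count pending events, i.e. what a kevent() call could return. */
size_t
toy_kqueue_pending(const struct toy_kqueue *kq)
{
	size_t cnt = 0;

	for (size_t i = 0; i < kq->n; i++)
		if (kq->notes[i].active)
			cnt++;
	return (cnt);
}
```

Because the child receives its own copy in this model, clearing a pending event in the child leaves the parent's pending count unchanged.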
> > > 
> > > The prototype in https://reviews.freebsd.org/D52045 gives the naive
> > > implementation of the idea.  What I mean by 'naive' is explained in the
> > > review summary, where I point out the places requiring (much) more work.
> > > 
> > > The new copy behavior is requested by the KQUEUE_CPONFORK flag to
> > > kqueue1(2).  Existing code that does not specify the flag gets the old
> > > (drop) behavior on fork.
> > > 
> > > An example of the usage is provided by https://reviews.freebsd.org/P665.
> > > 
> > > Before I spend a lot of effort on finishing this, I want to discuss
> > > the proposal:
> > > 
> > > Is this what the app writers want?
> > 
> > Looking at your patch, it seems that the child will receive a completely
> > separate kqueue, i.e., the queue itself is process-private.  From my
> > reading of epoll docs, after fork the child will share the epoll state
> > with the parent in some sense.
> 
> I do not see how we could share anything, because we copy the filedesc.

But file descriptions (i.e., struct file *) are shared after fork, in
general.  With the patch, the child receives a completely separate
kqueue after fork.  I am not saying it is wrong, but AFAIU this is not
how epoll works, so if the goal is to provide some epoll compatibility
in userspace, there might be some problems.
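The distinction drawn here can be illustrated with another toy model, assuming a queue is reduced to a count of pending events. `toy_fork_shared` mimics an ordinary inherited file description, where parent and child reference one object (the sharing described above for epoll), while `toy_fork_copy` mimics the proposed copy-on-fork, where the child gets an independent queue. All names are hypothetical; this is neither kernel nor libc code.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model, hypothetical names: a "queue" is just a pending-event count. */
struct toy_queue {
	int pending;
};

/* A process's view: one descriptor slot pointing at some queue object. */
struct toy_proc {
	struct toy_queue *kq;
};

/* Collect (drain) all pending events through one process's descriptor. */
int
toy_collect(struct toy_proc *p)
{
	int n;

	assert(p->kq != NULL);
	n = p->kq->pending;
	p->kq->pending = 0;
	return (n);
}

/* Shared description: the child's descriptor refers to the same object. */
void
toy_fork_shared(const struct toy_proc *parent, struct toy_proc *child)
{
	child->kq = parent->kq;
}

/* Copy-on-fork: the child gets a separate queue; caller frees child->kq. */
int
toy_fork_copy(const struct toy_proc *parent, struct toy_proc *child)
{
	child->kq = malloc(sizeof(*child->kq));
	if (child->kq == NULL)
		return (-1);
	*child->kq = *parent->kq;
	return (0);
}
```

In the shared case, events collected by the child are gone for the parent; in the copied case each process keeps seeing its own pending events.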

> > I wonder whether it is really useful for the child process to inherit
> > non-fd knotes.  Maybe such knotes should be ignored.
> 
> IMO the inheritance of e.g. timer events is the right thing to do.
> I do not see why the child would not want the signal events, or in fact
> most of the non-isfd events.  They are all functionally meaningful
> after the fork.
>
> I understand that in specific circumstances the child might not want some
> kinds of events, but then it is up to the child code to EV_DELETE them, or
> to use the hypothetical EV_NOCPONFORK flag proposed by Kyle.
> 
> If there is a preference not to copy non-isfd events, I can add
> two flags to kqueue1() instead of one, e.g. KQUEUE_CPONFORKFD and
> KQUEUE_CPONFORKNONFD, and then
> #define KQUEUE_CPONFORK (KQUEUE_CPONFORKFD | KQUEUE_CPONFORKNONFD)