Re: Kqueues and fork

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Thu, 21 Aug 2025 17:28:39 UTC
On Thu, Aug 21, 2025 at 09:53:31AM -0500, Kyle Evans wrote:
> On 8/21/25 02:45, Konstantin Belousov wrote:
> > On Wed, Aug 20, 2025 at 10:56:44AM -0500, Kyle Evans wrote:
> > > On 8/20/25 06:11, Konstantin Belousov wrote:
> > > > Right now, kqueues fds are marked as not D_PASSABLE, which means that
> > > > the corresponding file descriptor is not copied into the child filedesc
> > > > table on fork. The reasoning is that kqueues work on file descriptors,
> > > > and not even files, so they are tied to the fdesc table.
> > > > 
> > > > > As a curious coincidence, I had two private discussions last week,
> > > > where in both cases people were interested in getting more useful
> > > > behavior on fork from kqueues. [My understanding is that epoll does
> > > > that, so there is a desire to make kqueue equal in the capability.]
> > > > 
> > > > > I convinced myself that kqueues can indeed be copied on fork.
> > > > Precisely, the proposed semantics is the following:
> > > > > - fdesc copy would allocate a new kqueue at the same fd as the existing
> > > > >     kqueue in the source fdesc
> > > > > - each registered event in the source kqueue is copied as the same event
> > > > >     (for the same filter, of course) into the new kqueue
> > > > - if the event is active at the time of copy, its copy is activated
> > > >     as well
> > > > 
> > > > The prototype in https://reviews.freebsd.org/D52045 gives the naive
> > > > implementation of the idea.  What I mean by 'naive' is explained in the
> > > > review summary, where I point out the places requiring (much) more work.
> > > > > 
> > > > > The new copy behavior is requested by the KQUEUE_CPONFORK flag to
> > > > kqueue1(2).  Existing code that does not specify the flag, gets the old
> > > > (drop) action on fork.
> > > > 
> > > > Example of the usage is provided by https://reviews.freebsd.org/P665.
> > > > 
> > > > > Before I spend a lot of effort on finishing this, I want to discuss
> > > > the proposal:
> > > > 
> > > > Is this what the app writers want?
> > > > > Are there reasons why the proposal would be architecturally
> > > > > unsound, or would have hard-to-overcome problems?
> > > > 
> > > 
> > > The immediate question that comes to mind is whether filters should have a
> > > chance to drop out on fork, behavior that might be defined by some flag,
> > > fflag or an intrinsic property of the filter.
> > > 
> > > I'm mainly thinking of app-controlled event dropping at the moment, admittedly,
> > > and specifically in things like timers and signals that can easily lose their
> > > meaning across a fork and I wouldn't always want to persist (especially
> > > if trying to be cognizant of the kern.kq_calloutmax limit).
> > Timers are esp. complicated. I plan to add the f_copy() method to knote
> > f_ops, with the immediate application to handle timers.
> > 
> > Timers knotes are unusual because they are detached, but also have an
> > additional allocation behind them, and that structure points back to
> > the owning process.  So they require a lot of special handling: avoid
> > > knlist_add(), do the alloc, point the private structure to the right
> > process etc, start the callout.
> > 
> > Process notes should be easier, there mostly the pids need fixing.
> > 
> 
> My more concrete concern here was more with the application not
> wanting them to persist in the child, mostly for convenience. Perhaps
> these timers in the parent are just for other maintenance tasks that
> the fork won't be doing. The natural complement to KQUEUE_CPONFORK
> would be an EV_DELETEONFORK or EV_NOFORK event flag to drop selective
> parts of the kqueue now that they can persist the kqueue itself.
>

I see. The problem with EV_NOFORK is that we have run out of bits for flags.
We might steal a bit from EV_SYSFLAGS, or make it valid only for
register(); then internal interfaces would not be able to request EV_NOFORK,
which is not needed IMO.

On the other hand, the app needs an explicit action to request kqueue copy
on fork, so it might as well explicitly EV_DELETE unneeded events.
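
To make the two options concrete, here is a userland toy model of the copy
semantics proposed earlier in the thread. It is only a sketch, not the code
from the review: every toy_* name is made up for illustration, and
TOY_EV_NOFORK stands in for the EV_NOFORK flag that exists only as a
suggestion in this discussion. It shows that a per-event flag checked at copy
time and an explicit delete in the child end with the same result.

```c
/* Userland toy model of kqueue copy-on-fork; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAXEV     16
#define TOY_EV_NOFORK 0x1u  /* hypothetical flag, does not exist in FreeBSD */

struct toy_knote {
	int       ident;   /* e.g. an fd or a timer id */
	int       filter;  /* stands in for EVFILT_* */
	unsigned  flags;
	bool      active;  /* event already triggered at copy time */
	void     *udata;   /* opaque to the model, copied bit-for-bit */
};

struct toy_kqueue {
	struct toy_knote ev[TOY_MAXEV];
	size_t           n;
};

/*
 * Model of the fork-time copy: every registered event is copied for the
 * same (ident, filter), keeping its active state, unless it carries the
 * hypothetical TOY_EV_NOFORK flag.  Returns the number of events copied.
 */
static size_t
toy_kq_fork_copy(const struct toy_kqueue *src, struct toy_kqueue *dst)
{
	dst->n = 0;
	for (size_t i = 0; i < src->n; i++) {
		if ((src->ev[i].flags & TOY_EV_NOFORK) != 0)
			continue;
		assert(dst->n < TOY_MAXEV);
		dst->ev[dst->n++] = src->ev[i];
	}
	return (dst->n);
}

/*
 * Model of the alternative: the child removes an unneeded event itself,
 * as it would with EV_DELETE.  Returns false if no such event exists.
 */
static bool
toy_kq_delete(struct toy_kqueue *kq, int ident, int filter)
{
	for (size_t i = 0; i < kq->n; i++) {
		if (kq->ev[i].ident == ident && kq->ev[i].filter == filter) {
			kq->n--;
			kq->ev[i] = kq->ev[kq->n];  /* fill the hole with the last entry */
			return (true);
		}
	}
	return (false);
}
```

In this model, a parent that tags its maintenance timer with TOY_EV_NOFORK
gets a child queue without the timer straight from toy_kq_fork_copy(), while
a parent that does not tag it gets the same child queue after one
toy_kq_delete() call in the child.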

> > > 
> > > As for the overall proposal, it seems reasonable to me given that it's opt-in
> > > at the application level.  All of the problems I can think of would need to be
> > > handled by the application and aren't really our problem if we just highlight
> > > the extra consideration needed if you're going to do this (that the event will
> > > persist in both parent and child, need their own coordination to handle race
> > > conditions for I/O events, etc).
> > > 
> > > I am a little uncertain of what this means for events with udata/ext[2]/ext[3]
> > > attached, but that would seem to be just another problem for the application.
> > 
> > What problems do you see with udata/ext?  With kernel-colored glasses on,
> > they are just opaque binaries, we should not care about them.  In fact,
> > I would expect them to be more problematic for execve(), if somebody
> > cares about pointer semantics.
> 
> I was reasoning through kqueue across an rfork(RFPROC|RFMEM) and didn't really end
> up with a complete thought here, sorry.  It had occurred to me that these are good
> candidates for less-obvious pointers to shared data structures to wander across, but
> application developers should know what they're doing if they're trying to combine the
> two for some reason.
> 

In the meantime, I updated the review with a proper implementation of timer
knote copying.