Re: Kqueues and fork

Reply: Konstantin Belousov : "Re: Kqueues and fork"
In reply to: Konstantin Belousov : "Kqueues and fork"
From: Kyle Evans <kevans_at_FreeBSD.org>
Date: Wed, 20 Aug 2025 15:56:44 UTC
On 8/20/25 06:11, Konstantin Belousov wrote:
> Right now, kqueue fds are marked as not D_PASSABLE, which means that
> the corresponding file descriptor is not copied into the child filedesc
> table on fork. The reasoning is that kqueues work on file descriptors,
> and not even files, so they are tied to the fdesc table.
> 
> As a curious coincidence, I had two private discussions last week,
> in both of which people were interested in getting more useful
> behavior on fork from kqueues. [My understanding is that epoll does
> that, so there is a desire to give kqueue the same capability.]
> 
> I convinced myself that kqueues can indeed be copied on fork.
> Precisely, the proposed semantics are the following:
> - fdesc copy would allocate a new kqueue at the same fd as the existing
>    kqueue in the source fdesc
> - each registered event in the source kqueue is copied as the same event
>    (for the same filter, of course) into the new kqueue
> - if the event is active at the time of copy, its copy is activated
>    as well
> 
> The prototype in https://reviews.freebsd.org/D52045 gives a naive
> implementation of the idea.  What I mean by 'naive' is explained in the
> review summary, where I point out the places requiring (much) more work.
> 
> The new copy behavior is requested by the KQUEUE_CPONFORK flag to
> kqueue1(2).  Existing code that does not specify the flag gets the old
> (drop) action on fork.
> 
> An example of its usage is provided by https://reviews.freebsd.org/P665.
> 
> Before I put a lot of effort into finishing this, I want to discuss
> the proposal:
> 
> Is this what app writers want?
> Are there any reasons why the proposal would be either architecturally
> unsound, or contain some hard-to-overcome problems?
> 
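The copy rules and the opt-in flag proposed above can be made concrete with a toy user-space model in C. It is a sketch under stated assumptions, not the code in D52045: every toy_* name is hypothetical, TOY_CPONFORK merely stands in for KQUEUE_CPONFORK (its real value is not assumed), and locking, knote lists and the places the review summary flags as needing more work are all ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: toy_* names are hypothetical and do not mirror the
 * kernel's struct kqueue, struct knote or struct filedesc. */
#define TOY_MAX_EVENTS 8
#define TOY_MAX_FDS    8
#define TOY_CPONFORK   0x1u	/* hypothetical stand-in for KQUEUE_CPONFORK */

struct toy_event {
	uintptr_t ident;	/* what is watched, e.g. an fd number */
	short     filter;	/* arbitrary filter id in this model */
	bool      active;	/* fired but not yet collected */
};

struct toy_kqueue {
	unsigned int     flags;	/* flags given when the kqueue was created */
	size_t           nevents;
	struct toy_event events[TOY_MAX_EVENTS];
};

struct toy_fdesc {
	struct toy_kqueue *kq[TOY_MAX_FDS];	/* NULL: no kqueue at this fd */
};

/* Model of the fdesc copy at fork(): a kqueue created with TOY_CPONFORK
 * gets a new kqueue at the same fd number in the child, holding a copy of
 * every registered event (active ones stay active). Any other kqueue is
 * dropped, as today. Child kqueues live in a caller-supplied pool. */
static void
toy_fdesc_fork(const struct toy_fdesc *parent, struct toy_fdesc *child,
    struct toy_kqueue pool[TOY_MAX_FDS])
{
	for (size_t fd = 0; fd < TOY_MAX_FDS; fd++) {
		const struct toy_kqueue *src = parent->kq[fd];

		child->kq[fd] = NULL;
		if (src == NULL || (src->flags & TOY_CPONFORK) == 0)
			continue;	/* empty slot, or the old drop behavior */
		assert(src->nevents <= TOY_MAX_EVENTS);
		pool[fd].flags = src->flags;
		pool[fd].nevents = src->nevents;
		for (size_t i = 0; i < src->nevents; i++)
			pool[fd].events[i] = src->events[i];	/* same filter, same state */
		child->kq[fd] = &pool[fd];
	}
}
```

In this model a copied kqueue is independent of the parent's, unlike an ordinary inherited fd, which refers to the same open file in both processes; changing an event's state in the child leaves the parent's copy untouched.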

The immediate question that comes to mind is whether filters should have a
chance to drop out on fork, with that behavior defined by some flag, an
fflag, or an intrinsic property of the filter.

I'm mainly thinking of app-controlled event dropping at the moment, admittedly,
and specifically of things like timers and signals, which can easily lose their
meaning across a fork and which I wouldn't always want to persist (especially
if trying to be cognizant of the kern.kq_calloutmax limit).
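The app-controlled dropping described above could look roughly like this in the copy loop. This is a toy C model: TOY_DROPONFORK is a hypothetical per-event bit named only for this sketch (standing in for "some flag, fflag"), and the toy_filter values are not the real EVFILT_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy filter ids; not the real EVFILT_* constants. */
enum toy_filter { TOY_READ, TOY_TIMER, TOY_SIGNAL };

/* Hypothetical per-event opt-out bit, set by the application when it
 * registers the event; named for this sketch only. */
#define TOY_DROPONFORK 0x1u

struct toy_event {
	int             ident;
	enum toy_filter filter;
	unsigned int    flags;
};

/* App-controlled policy: only events the application marked are dropped.
 * A filter-intrinsic policy would test ev->filter here instead. */
static bool
toy_drop_on_fork(const struct toy_event *ev)
{
	return (ev->flags & TOY_DROPONFORK) != 0;
}

/* Copy the events that survive the fork into dst (capacity >= n) and
 * return how many were kept. */
static size_t
toy_copy_events(const struct toy_event *src, size_t n, struct toy_event *dst)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (!toy_drop_on_fork(&src[i]))
			dst[kept++] = src[i];
	}
	assert(kept <= n);
	return kept;
}
```

Keeping the decision in a single predicate leaves room for the other option raised above: an intrinsic policy would check the filter (for example, dropping timer events) rather than the application's bit.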

As for the overall proposal, it seems reasonable to me given that it's opt-in
at the application level.  All of the problems I can think of would need to be
handled by the application and aren't really our problem, as long as we highlight
the extra considerations needed if you're going to do this (the event will
persist in both parent and child, the two processes need their own coordination
to handle race conditions for I/O events, etc.).
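Under the proposal, a descriptor shared across fork and watched by both copies of the kqueue can wake both processes for the same readiness, and only one of them will find the data. A minimal sketch of one common way to handle that race, assuming the descriptor is in O_NONBLOCK mode: the loser treats EAGAIN as "the other process took it" rather than as an error. read_after_event() and its result codes are illustrative names, not part of the proposal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

enum read_result {
	READ_GOT_DATA,	/* this process consumed the data */
	READ_LOST_RACE,	/* the other process got there first (EAGAIN) */
	READ_EOF,
	READ_ERROR,
};

/* Call after a read event fires on a descriptor shared with the other
 * process. Assumes fd is O_NONBLOCK; otherwise the loser of the race
 * would block in read(2) instead of getting EAGAIN. */
static enum read_result
read_after_event(int fd, void *buf, size_t len, ssize_t *nread)
{
	ssize_t n;

	assert(buf != NULL && nread != NULL);
	/* The helper relies on non-blocking mode; see the comment above. */
	assert((fcntl(fd, F_GETFL) & O_NONBLOCK) != 0);
	do {
		n = read(fd, buf, len);
	} while (n < 0 && errno == EINTR);	/* retry if interrupted by a signal */
	*nread = n;
	if (n > 0)
		return READ_GOT_DATA;
	if (n == 0)
		return READ_EOF;
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return READ_LOST_RACE;
	return READ_ERROR;
}
```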

I am a little uncertain about what this means for events with udata/ext[2]/ext[3]
attached, but that would seem to be just another problem for the application.

Thanks,

Kyle Evans