Re: Kqueues and fork
- Reply: Mark Johnston : "Re: Kqueues and fork"
- In reply to: Mark Johnston : "Re: Kqueues and fork"
Date: Fri, 22 Aug 2025 11:18:10 UTC
On Thu, Aug 21, 2025 at 04:53:48PM -0400, Mark Johnston wrote:
> On Thu, Aug 21, 2025 at 10:15:44PM +0300, Konstantin Belousov wrote:
> > On Thu, Aug 21, 2025 at 02:48:28PM -0400, Mark Johnston wrote:
> > > On Wed, Aug 20, 2025 at 02:11:44PM +0300, Konstantin Belousov wrote:
> > > > Right now, kqueue fds are not marked DFLAG_PASSABLE, which means that
> > > > the corresponding file descriptor is not copied into the child filedesc
> > > > table on fork. The reasoning is that kqueues work on file descriptors,
> > > > and not even files, so they are tied to the fdesc table.
> > > >
> > > > As a curious coincidence, I had two private discussions last week,
> > > > where in both cases people were interested in getting more useful
> > > > behavior on fork from kqueues. [My understanding is that epoll does
> > > > that, so there is a desire to make kqueue equal in the capability.]
> > > >
> > > > I convinced myself that kqueues can indeed be copied on fork.
> > > > Precisely, the proposed semantics are the following:
> > > > - fdesc copy would allocate a new kqueue at the same fd as the existing
> > > > kqueue in the source fdesc
> > > > - each event registered in the source kqueue is copied as the same event
> > > > (for the same filter, of course) into the new kqueue
> > > > - if the event is active at the time of copy, its copy is activated
> > > > as well
> > > >
> > > > The prototype in https://reviews.freebsd.org/D52045 gives the naive
> > > > implementation of the idea. What I mean by 'naive' is explained in the
> > > > review summary, where I point out the places requiring (much) more work.
> > > >
> > > > The new copy behavior is requested by the KQUEUE_CPONFORK flag to
> > > > kqueue1(2). Existing code that does not specify the flag gets the old
> > > > (drop) action on fork.
> > > >
> > > > An example of the usage is provided by https://reviews.freebsd.org/P665.
> > > >
> > > > Before I spend a lot of effort on finishing this, I want to discuss
> > > > the proposal:
> > > >
> > > > Is this what the app writers want?
> > >
> > > Looking at your patch, it seems that the child will receive a completely
> > > separate kqueue, i.e., the queue itself is process-private. From my
> > > reading of epoll docs, after fork the child will share the epoll state
> > > with the parent in some sense.
> >
> > I do not see how we could share anything because we copy filedesc.
>
> But file descriptions (i.e., struct file *) are shared after fork, in
> general. With the patch, the child receives a completely separate
> kqueue after fork. I am not saying it is wrong, but AFAIU this is not
> how epoll works, so if the goal is to provide some epoll compatibility
> in userspace, there might be some problems.

What could we share between kqueues after fork?
Kqueue event ids are file descriptors, not files.

The Linux man page for epoll states that epoll indexes registered events by
key:

    The key is the combination of the file descriptor number and the open
    file description (also known as an "open file handle", the kernel's
    internal representation of an open file).

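To make the two halves of that key concrete, here is a minimal POSIX sketch (not taken from the patch or from epoll itself). It shows that dup(2) yields a new descriptor number for the same open file description, while a second open(2) yields a new description. The helper names shares_description() and demo_fd_key() are hypothetical, and the shared-offset probe only works for seekable files.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if fd_a and fd_b refer to the same open
 * file description, 0 if not, -1 on error.  It relies on the file offset
 * being a property of the description, not of the descriptor number.
 */
static int shares_description(int fd_a, int fd_b)
{
    assert(fd_a >= 0 && fd_b >= 0);
    off_t a_before = lseek(fd_a, 0, SEEK_CUR);
    off_t b_before = lseek(fd_b, 0, SEEK_CUR);
    if (a_before == (off_t)-1 || b_before == (off_t)-1)
        return -1;
    /* Move the offset through fd_a and see whether fd_b observes it. */
    if (lseek(fd_a, b_before + 7, SEEK_SET) == (off_t)-1)
        return -1;
    off_t b_after = lseek(fd_b, 0, SEEK_CUR);
    lseek(fd_a, a_before, SEEK_SET);    /* restore the original offset */
    return b_after == b_before + 7;
}

/* Returns 1 if dup() shared the description and open() did not. */
static int demo_fd_key(void)
{
    char path[] = "/tmp/fdkeyXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    int dupfd = dup(fd);                /* new number, same description */
    int openfd = open(path, O_RDONLY);  /* new number, new description */
    int ok = -1;
    if (dupfd != -1 && openfd != -1)
        ok = shares_description(fd, dupfd) == 1 &&
             shares_description(fd, openfd) == 0;
    if (dupfd != -1)
        close(dupfd);
    if (openfd != -1)
        close(openfd);
    close(fd);
    unlink(path);
    return ok;
}
```

Calling demo_fd_key() from a test program should return 1. After fork the situation is the mirror image: the child has the same descriptor numbers, in a separate table, pointing at the same descriptions.
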
When I did some research before starting the implementation, and discussed
some of it with the app writers who initiated the work, I found this:
https://idea.popcount.org/2017-03-20-epoll-is-fundamentally-broken-22/
(ignore the 'fundamentally broken'; the author's claim is that everything is
broken anyway).

I believe that my choice is the most natural one, given the existing
structure of kqueue. The example I posted demonstrates the natural
use of the copy on fork: I register the pipe reads and the timer before
fork, and wait for events after, in the child.
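
The copy semantics proposed at the start of the thread (new kqueue at the same fd, each registered event copied for the same filter, active state preserved) can be modelled in user space roughly as follows. This is a toy sketch only: the toy_* types and the TOY_FILT_* values are hypothetical and do not correspond to the kernel structures or to the code in D52045.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_KNOTES 16

/* Hypothetical filter ids for the model; not the real EVFILT_* values. */
enum toy_filter { TOY_FILT_READ = 1, TOY_FILT_TIMER = 2 };

/* Toy stand-in for a registered event; not the kernel's struct knote. */
struct toy_knote {
    long ident;             /* fd number, timer id, ... */
    enum toy_filter filter; /* filter the event was registered for */
    bool active;            /* event has fired and awaits collection */
};

struct toy_kqueue {
    int fd;                 /* slot the kqueue occupies in the fd table */
    bool cponfork;          /* models a KQUEUE_CPONFORK request */
    size_t nknotes;
    struct toy_knote knotes[TOY_MAX_KNOTES];
};

/*
 * Models the fd table copy on fork.  Returns false when the kqueue is
 * dropped (the existing behavior), true when *child receives its own copy.
 */
static bool toy_kqueue_fork(const struct toy_kqueue *parent,
    struct toy_kqueue *child)
{
    assert(parent->nknotes <= TOY_MAX_KNOTES);
    if (!parent->cponfork)
        return false;               /* fd is absent in the child */
    child->fd = parent->fd;         /* same fd number as in the parent */
    /* Assumption of this model: the thread does not say whether the
     * copy keeps the flag. */
    child->cponfork = parent->cponfork;
    child->nknotes = parent->nknotes;
    for (size_t i = 0; i < parent->nknotes; i++)
        child->knotes[i] = parent->knotes[i]; /* ident, filter, active */
    return true;
}
```

The point of the model is that the child's queue is a separate object: changing or collecting an event in the child's copy leaves the parent's kqueue untouched, which is the process-private behavior discussed above.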
>
> > > I wonder if it is really useful for the child process to inherit non-fd
> > > knotes? Maybe such knotes should be ignored.
> >
> > IMO the inheritance of e.g. timer events is the right thing to do.
> > I do not see why the child would not want the signal events, or in fact
> > most of the non-isfd events. They are all functionally meaningful
> > after the fork.
> >
> > I understand that in specific circumstances the child might not want some
> > kinds of events, but it is up to the child code to EV_DELETE them, or to
> > use the hypothetical EV_NOCPONFORK flag proposed by Kyle.
> >
> > If there is some preference to not copy non-isfd events, I can add
> > two flags to kqueue1() instead of one. E.g. KQUEUE_CPONFORKFD and
> > KQUEUE_CPONFORKNONFD, and then
> > #define KQUEUE_CPONFORK (KQUEUE_CPONFORKFD | KQUEUE_CPONFORKNONFD)
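
A sketch of how such a two-flag split might be consumed when deciding whether to copy a given event. The bit values and the copy_event_on_fork() helper are hypothetical; the thread names the flags but assigns no values and shows no such function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define KQUEUE_CPONFORKFD    0x1u
#define KQUEUE_CPONFORKNONFD 0x2u
#define KQUEUE_CPONFORK      (KQUEUE_CPONFORKFD | KQUEUE_CPONFORKNONFD)

/*
 * Hypothetical predicate: should an event be copied into the child's
 * kqueue, given the flags the kqueue was created with and whether the
 * event's filter is fd-based (isfd)?
 */
static bool copy_event_on_fork(unsigned int kq_flags, bool isfd)
{
    /* Only the two copy-on-fork bits are modelled here. */
    assert((kq_flags & ~KQUEUE_CPONFORK) == 0);
    unsigned int wanted = isfd ? KQUEUE_CPONFORKFD : KQUEUE_CPONFORKNONFD;
    return (kq_flags & wanted) != 0;
}
```

With this layout, KQUEUE_CPONFORK keeps the single-flag meaning (copy everything), while KQUEUE_CPONFORKFD alone would copy fd-based events and skip timers, signals and other non-fd events.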