Re: Kqueues and fork
- Reply: Vadim Goncharov : "Re: Kqueues and fork"
- In reply to: Konstantin Belousov : "Re: Kqueues and fork"
Date: Fri, 22 Aug 2025 14:29:32 UTC

On Fri, Aug 22, 2025 at 02:18:10PM +0300, Konstantin Belousov wrote:
> On Thu, Aug 21, 2025 at 04:53:48PM -0400, Mark Johnston wrote:
> > On Thu, Aug 21, 2025 at 10:15:44PM +0300, Konstantin Belousov wrote:
> > > On Thu, Aug 21, 2025 at 02:48:28PM -0400, Mark Johnston wrote:
> > > > On Wed, Aug 20, 2025 at 02:11:44PM +0300, Konstantin Belousov wrote:
> > > > > Right now, kqueues fds are marked as not D_PASSABLE, which means that
> > > > > the corresponding file descriptor is not copied into the child filedesc
> > > > > table on fork. The reasoning is that kqueues work on file descriptors,
> > > > > and not even files, so they are tied to the fdesc table.
> > > > >
> > > > > As a curious coincidence, I have two private discussions last week,
> > > > > where in both cases people were interested in getting more useful
> > > > > behavior on fork from kqueues. [My understanding is that epoll does
> > > > > that, so there is a desire to make kqueue equal in the capability.]
> > > > >
> > > > > I convinced myself, that indeed kqueues can be copied on fork.
> > > > > Precisely, the proposed semantics is the following:
> > > > > - fdesc copy would allocate a new kqueue as the same fd as the existing
> > > > >   kqueue in the source fdesc
> > > > > - each registered event in the source kqueue is copied into same event
> > > > >   (for the same filter, of course) into the new kqueue
> > > > > - if the event is active at the time of copy, its copy is activated
> > > > >   as well
> > > > >
> > > > > The prototype in https://reviews.freebsd.org/D52045 gives the naive
> > > > > implementation of the idea. What I mean by 'naive' is explained in the
> > > > > review summary, where I point out the places requiring (much) more work.
> > > > >
> > > > > The new copy behavior is requested by the KQUEUE_CPONFORK flag to
> > > > > kqueue1(2). Existing code that does not specify the flag, gets the old
> > > > > (drop) action on fork.
> > > > >
> > > > > Example of the usage is provided by https://reviews.freebsd.org/P665.
> > > > >
> > > > > Before I spend a lot of efforts into finishing this, I want to discuss
> > > > > the proposal:
> > > > >
> > > > > Is this what the app writers want?
> > > >
> > > > Looking at your patch, it seems that the child will receive a completely
> > > > separate kqueue, i.e., the queue itself is process-private. From my
> > > > reading of epoll docs, after fork the child will share the epoll state
> > > > with the parent in some sense.
> > >
> > > I do not see how we could share anything because we copy filedesc.
> >
> > But file descriptions (i.e., struct file *) are shared after fork, in
> > general. With the patch, the child receives a completely separate
> > kqueue after fork. I am not saying it is wrong, but AFAIU this is not
> > how epoll works, so if the goal is to provide some epoll compatibility
> > in userspace, there might be some problems.
> What could we share between kqueues after fork?
> Kqueue events ids are file descriptors, not files.

Sure, but file descriptors are duplicated across fork, so the event IDs
stay the same. And the knote references the underlying file
description, and those will be the same after fork. Yes, if one of the
processes closes an fd, then kevent() may still return events for the
closed fd because the other process still holds it open, but maybe the
solution is "don't do that."

> Linux man page for epoll states that epoll indexes registered events by
> key:
> The key is the combination of the file descriptor number and the open
> file description (also known as an "open file handle", the kernel's
> internal representation of an open file).
>
> When I did a research before starting the implementation, and discussed
> some of it with the app writers who initiated the work, I found this
> https://idea.popcount.org/2017-03-20-epoll-is-fundamentally-broken-22/
> (ignore the 'fundamentally broken', author claim is that everything is
> broken anyway).
>
> I believe that my choice is the most natural one, given the existing
> structure of kqueue. The example I posted demonstrates the natural
> use of the copy on fork: I register the pipe reads and the timer before
> fork, and wait for events after, in the child.

Again, I am not saying your choice is wrong, just observing that with
your patch, fork creates a new, separate kqueue for each kqueue
descriptor in the parent, but this is different from how fork behaves
for all other descriptor types (including epoll). I think your approach
is simpler for the kernel side, it is just not so clear to me that it is
what app writers want.

The behaviour in the blog post is weird certainly (how should
knote_fdclose() behave if multiple processes can share a kqueue?) but it
can be avoided by applications. If a per-process kqueue is useful to
app writers, then ok.
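
To make the distinction above concrete, here is a toy model in Python of
the two fork behaviours being compared. It is only a sketch and not
kernel code: the names OpenFile, EventQueue, Process, fork and demo are
all invented for illustration, "readable" stands in for an active event,
and the fd numbers are arbitrary. With copy_queues=True the child gets a
new queue at the same fd holding copies of the registrations (the
semantics proposed for KQUEUE_CPONFORK); with copy_queues=False the queue
object itself is shared, which is my understanding of how epoll behaves.
Neither mode models the current kqueue behaviour of dropping the
descriptor on fork.

```python
from dataclasses import dataclass, field


@dataclass
class OpenFile:
    """Stand-in for an open file description (struct file)."""
    readable: bool = False


@dataclass
class EventQueue:
    """Registrations keyed by fd number; each remembers its open file."""
    knotes: dict = field(default_factory=dict)

    def register(self, fd, openfile):
        self.knotes[fd] = openfile

    def ready(self):
        """Return the registered fd numbers whose file is readable."""
        return sorted(fd for fd, f in self.knotes.items() if f.readable)


@dataclass
class Process:
    """A process reduced to its descriptor table: fd number -> object."""
    fds: dict = field(default_factory=dict)


def fork(parent, copy_queues):
    """Build the child's descriptor table under one of the two policies."""
    child = Process()
    for fd, obj in parent.fds.items():
        if copy_queues and isinstance(obj, EventQueue):
            # Proposed semantics: a new queue at the same fd, with every
            # registration copied. The copies still point at the same
            # open files, so an event active at fork is active in both.
            child.fds[fd] = EventQueue(dict(obj.knotes))
        else:
            # Ordinary descriptor duplication: both tables reference
            # the same underlying object.
            child.fds[fd] = obj
    return child


def demo(copy_queues):
    """Return (parent_ready, child_ready) after the child adds an event."""
    pipe_r = OpenFile()
    parent = Process({3: pipe_r, 4: EventQueue()})
    parent.fds[4].register(3, pipe_r)
    pipe_r.readable = True  # the event is active at the time of fork

    child = fork(parent, copy_queues)

    # After fork, the child opens another file as fd 5 and registers it.
    extra = OpenFile(readable=True)
    child.fds[5] = extra
    child.fds[4].register(5, extra)
    return parent.fds[4].ready(), child.fds[4].ready()


print(demo(copy_queues=True))   # ([3], [3, 5])
print(demo(copy_queues=False))  # ([3, 5], [3, 5])
```

In the copying mode the child's later registration stays private to the
child, while the event that was active at fork time is visible on both
sides. In the sharing mode the parent's queue reports fd 5, a number that
is not open in the parent at all, which is the kind of surprise that
applications would have to avoid if the queue were shared.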