Re: Kqueues and fork
Date: Thu, 21 Aug 2025 14:53:31 UTC
On 8/21/25 02:45, Konstantin Belousov wrote:
> On Wed, Aug 20, 2025 at 10:56:44AM -0500, Kyle Evans wrote:
>> On 8/20/25 06:11, Konstantin Belousov wrote:
>>> Right now, kqueues fds are marked as not D_PASSABLE, which means that
>>> the corresponding file descriptor is not copied into the child filedesc
>>> table on fork. The reasoning is that kqueues work on file descriptors,
>>> and not even files, so they are tied to the fdesc table.
>>>
>>> As a curious coincidence, I have two private discussions last week,
>>> where in both cases people were interested in getting more useful
>>> behavior on fork from kqueues. [My understanding is that epoll does
>>> that, so there is a desire to make kqueue equal in the capability.]
>>>
>>> I convinced myself, that indeed kqueues can be copied on fork.
>>> Precisely, the proposed semantics is the following:
>>> - fdesc copy would allocate a new kqueue as the same fd as the existing
>>>   kqueue in the source fdesc
>>> - each registered event in the source kqueue is copied into same event
>>>   (for the same filter, of course) into the new kqueue
>>> - if the event is active at the time of copy, its copy is activated
>>>   as well
>>>
>>> The prototype in https://reviews.freebsd.org/D52045 gives the naive
>>> implementation of the idea. What I mean by 'naive' is explained in the
>>> review summary, where I point out the places requiring (much) more work.
>>>
>>> The new copy behavior is requested by the KQUEUE_CPONFORK flag to
>>> kqueue1(2). Existing code that does not specify the flag, gets the old
>>> (drop) action on fork.
>>>
>>> Example of the usage is provided by https://reviews.freebsd.org/P665.
>>>
>>> Before I spend a lot of efforts into finishing this, I want to discuss
>>> the proposal:
>>>
>>> Is this what the app writers want?
>>> Is there some reasons for the proposal to be either architecturally
>>> unsound, or containing some hard to overcome problems?
>>>
>>
>> The immediate question that comes to mind is whether filters should have a
>> chance to drop out on fork, behavior that might be defined by some flag,
>> fflag or an intrinsic property of the filter.
>>
>> I'm mainly thinking of app-controlled event dropping at the moment, admittedly,
>> and specifically in things like timers and signals that can easily lose their
>> meaning across a fork and I wouldn't always want to persist (especially
>> if trying to be cognizant of the kern.kq_calloutmax limit).
> Timers are esp. complicated. I plan to add the f_copy() method to knote
> f_ops, with the immediate application to handle timers.
>
> Timers knotes are unusual because they are detached, but also have an
> additional allocation behind them, and that structure points back to
> the owning process. So they require a lot of special handling: avoid
> knlist_add(), do the alloc, point the private structure to the right
> process etc, start the callout.
>
> Process notes should be easier, there mostly the pids need fixing.
>

My more concrete concern here was with the application not wanting them
to persist in the child, mostly for convenience. Perhaps these timers in
the parent are just for other maintenance tasks that the fork won't be
doing. The natural complement to KQUEUE_CPONFORK would be an
EV_DELETEONFORK or EV_NOFORK event flag to drop selective parts of the
kqueue, now that the application can persist the kqueue itself.

>>
>> As for the overall proposal, it seems reasonable to me given that it's opt-in
>> at the application level. All of the problems I can think of would need to be
>> handled by the application and aren't really our problem if we just highlight
>> the extra consideration needed if you're going to do this (that the event will
>> persist in both parent and child, need their own coordination to handle race
>> conditions for I/O events, etc).
>>
>> I am a little uncertain of what this means for events with udata/ext[2]/ext[3]
>> attached, but that would seem to be just another problem for the application.
>
> What problems do you see with udata/ext? With kernel-colored glasses on,
> they are just opaque binaries, we should not care about them. In fact,
> I would expect them to be more problematic for execve(), if somebody
> cares about pointer semantic.

I was reasoning through kqueue across an rfork(RFPROC|RFMEM) and didn't
really end up with a complete thought here, sorry. It had occurred to me
that these are good candidates for less-obvious pointers to shared data
structures to wander across, but the application developer should know
what they're doing if they're trying to combine the two for some reason.

Thanks,

Kyle Evans
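
To make the semantics discussed above concrete, here is a small user-space
toy model in C. It is only a sketch under stated assumptions: it is not
kernel code and not the D52045 implementation. Every toy_* name is
hypothetical, and TOY_EV_NOFORK merely stands in for the EV_NOFORK /
EV_DELETEONFORK idea floated in this thread; no such flag exists in
<sys/event.h>. The model copies each registered event into the child's
queue, preserves the active state, treats udata as an opaque value, and
skips events the application marked as not wanted in the child.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-event flag modeled on the EV_NOFORK idea from the
 * thread; it is an assumption, not part of any real kqueue API. */
#define TOY_EV_NOFORK 0x1u

#define TOY_KQ_MAX 16

/* Toy stand-in for a registered event (a knote). */
struct toy_knote {
    uintptr_t ident;   /* fd, pid, timer id, ... */
    int       filter;  /* stand-in for an EVFILT_* value */
    unsigned  flags;   /* TOY_EV_* bits */
    int       active;  /* nonzero if the event is currently triggered */
    void     *udata;   /* opaque to the model; copied as-is */
};

/* Toy stand-in for a kqueue: a fixed-size array of registered events. */
struct toy_kqueue {
    size_t           n;
    struct toy_knote notes[TOY_KQ_MAX];
};

/* Register one event in the toy queue. */
void
toy_kq_add(struct toy_kqueue *kq, struct toy_knote kn)
{
    assert(kq->n < TOY_KQ_MAX);
    kq->notes[kq->n++] = kn;
}

/*
 * Model the proposed copy-on-fork: every registered event is copied into
 * the child's queue with its active state and udata preserved, except for
 * events flagged TOY_EV_NOFORK, which are dropped from the child only.
 * Returns the number of events in the child's queue.
 */
size_t
toy_kq_fork_copy(const struct toy_kqueue *src, struct toy_kqueue *dst)
{
    dst->n = 0;
    for (size_t i = 0; i < src->n; i++) {
        if (src->notes[i].flags & TOY_EV_NOFORK)
            continue;
        dst->notes[dst->n++] = src->notes[i];
    }
    return dst->n;
}
```

A caller would fill a parent toy_kqueue with toy_kq_add(), flag something
like a maintenance timer with TOY_EV_NOFORK, and then call
toy_kq_fork_copy() to see which events the child would end up with. The
parent's queue is left untouched. Real timer and process knotes would need
the extra per-filter handling Konstantin describes, which this model
deliberately ignores.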