Re: Kqueues and fork
- In reply to: Kyle Evans : "Re: Kqueues and fork"
Date: Thu, 21 Aug 2025 17:28:39 UTC
On Thu, Aug 21, 2025 at 09:53:31AM -0500, Kyle Evans wrote:
> On 8/21/25 02:45, Konstantin Belousov wrote:
> > On Wed, Aug 20, 2025 at 10:56:44AM -0500, Kyle Evans wrote:
> > > On 8/20/25 06:11, Konstantin Belousov wrote:
> > > > Right now, kqueues fds are marked as not D_PASSABLE, which means that
> > > > the corresponding file descriptor is not copied into the child filedesc
> > > > table on fork. The reasoning is that kqueues work on file descriptors,
> > > > and not even files, so they are tied to the fdesc table.
> > > >
> > > > As a curious coincidence, I have two private discussions last week,
> > > > where in both cases people were interested in getting more useful
> > > > behavior on fork from kqueues. [My understanding is that epoll does
> > > > that, so there is a desire to make kqueue equal in the capability.]
> > > >
> > > > I convinced myself, that indeed kqueues can be copied on fork.
> > > > Precisely, the proposed semantics is the following:
> > > > - fdesc copy would allocate a new kqueue as the same fd as the existing
> > > >   kqueue in the source fdesc
> > > > - each registered event in the source kqueue is copied into same event
> > > >   (for the same filter, of course) into the new kqueue
> > > > - if the event is active at the time of copy, its copy is activated
> > > >   as well
> > > >
> > > > The prototype in https://reviews.freebsd.org/D52045 gives the naive
> > > > implementation of the idea. What I mean by 'naive' is explained in the
> > > > review summary, where I point out the places requiring (much) more work.
> > > >
> > > > The new copy behavior is requested by the KQUEUE_CPONFORK flag to
> > > > kqueue1(2). Existing code that does not specify the flag, gets the old
> > > > (drop) action on fork.
> > > >
> > > > Example of the usage is provided by https://reviews.freebsd.org/P665.
> > > >
> > > > Before I spend a lot of efforts into finishing this, I want to discuss
> > > > the proposal:
> > > >
> > > > Is this what the app writers want?
> > > > Is there some reasons for the proposal to be either architecturally
> > > > unsound, or containing some hard to overcome problems?
> > > >
> > >
> > > The immediate question that comes to mind is whether filters should have a
> > > chance to drop out on fork, behavior that might be defined by some flag,
> > > fflag or an intrinsic property of the filter.
> > >
> > > I'm mainly thinking of app-controlled event dropping at the moment, admittedly,
> > > and specifically in things like timers and signals that can easily lose their
> > > meaning across a fork and I wouldn't always want to persist (especially
> > > if trying to be cognizant of the kern.kq_calloutmax limit).
> > Timers are esp. complicated. I plan to add the f_copy() method to knote
> > f_ops, with the immediate application to handle timers.
> >
> > Timers knotes are unusual because they are detached, but also have an
> > additional allocation behind them, and that structure points back to
> > the owning process. So they require a lot of special handling: avoid
> > knlist_add(), do the alloc, point the private structure to the right
> > process etc, start the callout.
> >
> > Process notes should be easier, there mostly the pids need fixing.
> >
>
> My more concrete concern here was more with the application not
> wanting them to persist in the child, mostly for convenience. Perhaps
> these timers in the parent are just for other maintenance tasks that
> the fork won't be doing, the natural complement to KQUEUE_CPONFORK
> would be an EV_DELETEONFORK or EV_NOFORK event flag to drop selective
> parts of the kqueue now that they can persist the kqueue itself.
>
I see. Problem with EV_NOFORK is that we run out of bits for flags.
We might steal a bit from EV_SYSFLAGS, or can make it only valid for
register() then internal interfaces would not be able to request
EV_NOFORK which is not needed IMO.
On the other hand, app needs explicit action to request kqueue copy on
fork, so it might as well explicitly EV_DELETE unneeded events.
> > >
> > > As for the overall proposal, it seems reasonable to me given that it's opt-in
> > > at the application level. All of the problems I can think of would need to be
> > > handled by the application and aren't really our problem if we just highlight
> > > the extra consideration needed if you're going to do this (that the event will
> > > persist in both parent and child, need their own coordination to handle race
> > > conditions for I/O events, etc).
> > >
> > > I am a little uncertain of what this means for events with udata/ext[2]/ext[3]
> > > attached, but that would seem to be just another problem for the application.
> >
> > What problems do you see with udata/ext? With kernel-colored glasses on,
> > they are just opaque binaries, we should not care about them. In fact,
> > I would expect them to be more problematic for execve(), if somebody
> > cares about pointer semantic.
>
> I was reasoning through kqueue across an rfork(RFPROC|RFMEM) and didn't really end
> up with a complete thought here, sorry. It had occurred to me that these are good
> candidates for less-obvious pointers to shared data structures to wander across, but
> application developer should know what they're doing if they're trying to combine the
> two for some reason.
>
Meantime I updated the review with the proper implementation of the timer
knotes copying.
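
For readers following along, the copy-on-fork semantics discussed in this
thread can be sketched as a small userspace toy model. This is not the
D52045 code and does not call the real kqueue API. The Knote and Kqueue
classes, the fork_fdtable() helper, and the numeric value of
KQUEUE_CPONFORK are all hypothetical and exist only for illustration.
The model assumes the three rules from the proposal: a flagged kqueue gets
a new kqueue at the same fd in the child, each registered event is copied
for the same filter, and an active event stays active in the copy. The
last line shows the child explicitly deleting an event it does not need,
as suggested above in place of an EV_NOFORK flag.

```python
from dataclasses import dataclass, field, replace

# Hypothetical value: only the flag's name comes from the proposal.
KQUEUE_CPONFORK = 0x1


@dataclass
class Knote:
    ident: int            # e.g. an fd number or a timer id
    filter: str           # e.g. "EVFILT_READ", "EVFILT_TIMER"
    active: bool = False
    udata: object = None  # opaque to the model, copied as-is


@dataclass
class Kqueue:
    flags: int = 0
    knotes: dict = field(default_factory=dict)  # (ident, filter) -> Knote

    def register(self, kn):
        self.knotes[(kn.ident, kn.filter)] = kn

    def delete(self, ident, filter):
        self.knotes.pop((ident, filter), None)


def fork_fdtable(fdtable):
    """Build the child's fd table under the modelled semantics."""
    child = {}
    for fd, obj in fdtable.items():
        if not isinstance(obj, Kqueue):
            child[fd] = obj                   # ordinary files are inherited
        elif obj.flags & KQUEUE_CPONFORK:
            new_kq = Kqueue(flags=obj.flags)  # new kqueue, same fd number
            for kn in obj.knotes.values():
                new_kq.register(replace(kn))  # copy keeps filter and active state
            child[fd] = new_kq
        # otherwise: old behaviour, the kqueue fd is dropped in the child
    return child


parent = {3: "socket", 4: Kqueue(flags=KQUEUE_CPONFORK), 5: Kqueue()}
parent[4].register(Knote(3, "EVFILT_READ", active=True))
parent[4].register(Knote(1, "EVFILT_TIMER"))

child = fork_fdtable(parent)
child[4].delete(1, "EVFILT_TIMER")  # child drops the timer; parent keeps it
```

In this model the kqueue at fd 5, created without the flag, is absent from
the child, which mirrors the existing drop-on-fork behaviour. The timer
handling is deliberately simplistic: the extra allocation and callout
restart that make real timer knotes hard to copy are not modelled.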