Re: init / supervisor in jail
- Reply: Andriy Gapon : "Re: init / supervisor in jail"
- In reply to: Konstantin Belousov : "Re: init / supervisor in jail"
Date: Tue, 11 Nov 2025 10:36:08 UTC
On Tue, 11 Nov 2025 at 08:10, Konstantin Belousov <kostikbel@gmail.com> wrote:

> On Mon, Nov 10, 2025 at 11:16:01AM -0800, James Gritton wrote:
> > On 2025-11-10 04:27, Andriy Gapon wrote:
> > > I played a little bit with OCI containers and podman.
> > > I had a hiccup with one specific container created for Docker / Linux.
> > > Its difference from other containers is that it uses multiple daemons
> > > and a supervisor process to take care of them. That particular
> > > supervisor is another variation of "advanced init", it's called s6.
> > > Apparently, it is relatively popular for container use (not sure about
> > > host systems). Probably other alternatives can be / are used for that
> > > purpose as well.
> > >
> > > I think that this is what a supervisor in a container needs:
> > > 1. its PID is 1;
> > > 2. orphaned processes get re-parented to it.
> > >
> > > I think that (1) is not a hard requirement, but it's an easy way to
> > > check if the process would be able to work as init.
> > > Also, some other processes might expect to find init at PID 1, but I am
> > > not sure about that.
> > >
> > > (2) is important for doing the supervising (at least, when
> > > procctl(PROC_REAP*) is not used).
> > >
> > > I think that on Linux they have separate PID namespace per container, so
> > > the first process to run naturally gets PID 1.
> > >
> > > I think that per-container PID namespace may be an overkill.
> > > Maybe there is a way to make PID 1 special without going that way.
> > >
> > > E.g., a jail could record the first process it runs.
> > > We can patch up getpid() to return 1 for that process.
> > > Also, we could patch up the process lookup to return the first process
> > > in the jail for PID 1.
> > >
> > > Re-parenting to the "jail init" sounds harder but should be possible as
> > > well (e.g., using PROC_REAP).
>
> This is why PROC_REAP was initially implemented: to allow something to
> manage zombies of all its descendants, for surrogate init processes.
> Later it appeared that at least timeout(1) benefits from it as well.
>
> A side note: machinery to reliably signal all specific descendants of
> the reaper is way too complicated.
>
> > >
> > > Not sure what to do if the "jail init" dies... should all processes in
> > > the jail get killed and the jail should die as well (unless persistent)?
> > >
> > > This proposal sounds like a kludge but it could be a shortcut to support
> > > more Linux containers and to allow similar FreeBSD jails / containers
> > > with alternative init-s / supervisors.
> >
> > Far from being a kludge, I think it's a feature we need, and one at the top
> > of my list. Forcing it to look like PID 1 from jailed perspective is
> > definitely doable (and something I'd done outside of the project a decade
> > ago). In addition to those two requirements, I would add one that answers
> > your last question:
> >
> > 3. signals to init and reboot(2) work as they would on the host side.
> >
> > A jailed reboot would kill all processes and restart rc, and possibly do
> > other kernel-side cleanups yet to be clearly defined. A jailed halt would
> > remove the jail. A jailed single-user mode could exist where instead of
> > init spawning a shell, it just sits around while the system has a chance to
> > jexec into it.
> >
> > init handles various signals by rebooting/halting/etc, and it should be able
> > to do that as it does now, by calling reboot(2), directing the kernel to do
> > what it needs to with the jail. If init goes away, it's probably like a
> > halt and removes the jail.
>
> I completely disagree with this design, I insist that init(8) should
> stay as full system init, and reboot(2) should be kept as the machine
> reboot.
>
> For jail-contained inits, it should be a separate/dedicated implementation
> of init. It would be aware of its usage model, in particular, it should
> proclaim itself the reaper, it should use reaper signalling facilities
> for killing processes when shutting the container down (not ever tweaking
> the reboot(2)). It must not have the ugly protection against signals
> delivery we have for real init.
>

Almost a side note, but we do have catatonit in the ports tree. It uses
PROC_REAP to clean up zombie processes and doesn't need to be PID 1. It
would be nice to have a more full-featured, BSD-licensed, jailable init,
though.
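
To make the "reaper without PID 1" point concrete, here is a rough sketch of the mechanism, not catatonit's actual code. The function names (become_reaper, spawn_orphan, reap_all) are made up for illustration, and the non-FreeBSD branch (Linux's PR_SET_CHILD_SUBREAPER) is only there as an assumption so the sketch compiles elsewhere. The process acquires reaper status, creates a grandchild whose parent exits, and then collects both processes with plain waitpid():

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/procctl.h>
#else
#include <sys/prctl.h>	/* assumption: Linux fallback, not part of the thread */
#endif

/* Hypothetical helper: make the calling process the reaper of its descendants. */
int
become_reaper(void)
{
#ifdef __FreeBSD__
	return (procctl(P_PID, getpid(), PROC_REAP_ACQUIRE, NULL));
#else
	return (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0));
#endif
}

/*
 * Hypothetical helper: fork a child that forks a grandchild and exits at
 * once.  The grandchild waits on a pipe until every write end is closed
 * (ours below, and the intermediate child's when it exits), then exits.
 */
int
spawn_orphan(void)
{
	int fds[2];
	pid_t child, grandchild;
	char c;

	if (pipe(fds) == -1)
		return (-1);
	child = fork();
	if (child == -1) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (child == 0) {
		grandchild = fork();
		if (grandchild == 0) {
			close(fds[1]);
			if (read(fds[0], &c, 1) < 0)
				_exit(1);
			_exit(0);
		}
		_exit(grandchild == -1 ? 1 : 0);
	}
	close(fds[0]);
	close(fds[1]);
	return (0);
}

/* Hypothetical helper: collect every child, adopted ones included. */
int
reap_all(void)
{
	int reaped = 0;

	for (;;) {
		if (waitpid(-1, NULL, 0) > 0) {
			reaped++;
			continue;
		}
		if (errno == EINTR)
			continue;
		break;		/* ECHILD: nothing left to wait for */
	}
	return (reaped);
}
```

Calling become_reaper(), then spawn_orphan(), then reap_all() from an ordinary process should collect two processes: the direct child and the grandchild that was re-parented to the reaper rather than to PID 1. A real jail init would also need the signalling side (PROC_REAP_KILL) for shutdown, which this sketch leaves out.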