Re: POSIX shared memory and dying jails

From: James Gritton <jamie_at_freebsd.org>
Date: Sat, 26 Jun 2021 10:11:08 -0700

On 2021-06-26 08:13, Mark Johnston wrote:
> On Fri, Jun 25, 2021 at 08:08:31PM -0700, James Gritton wrote:
>> On 2021-06-25 09:58, Michael Gmelin wrote:
>> > On Fri, 25 Jun 2021 09:19:05 -0700
>> > James Gritton <jamie_at_freebsd.org> wrote:
>> >
>> >> On 2021-06-25 07:41, Michael Gmelin wrote:
>> >> > It seems like non-anonymous POSIX shared memory is not freed
>> >> > automatically when a jail is removed and keeps it in a dying state,
>> >> > until the shared memory segment is deleted manually.
>> >> >
>> >> > See below for the most basic example:
>> >> >
>> >> >     [root_at_jailhost ~]# jail -c name=shmtest path=/ command=/bin/sh
>> >> >     # posixshmcontrol create /removeme
>> >> >     # exit
>> >> >     [root_at_jailhost ~]# jls -dv -j shmtest dying
>> >> >     true
>> >> >
>> >> > So at this point, the jail is stuck in a dying state.
>> >> >
>> >> > Checking POSIX shared memory segments shows the shared memory
>> >> > segment which is stopping the jail from crossing the Styx:
>> >> >
>> >> >     [root_at_jailhost ~]# posixshmcontrol list
>> >> >     MODE            OWNER   GROUP   SIZE    PATH
>> >> >     rw-------       root    wheel   0       /removeme
>> >> >
>> >> > After removing the shared memory segment manually...
>> >> >
>> >> >     [root_at_jailhost ~]# posixshmcontrol rm /removeme
>> >> >
>> >> > the jail passes away peacefully:
>> >> >
>> >> >     [root_at_jailhost ~]# jls -dv -j shmtest dying
>> >> >     jls: jail "shmtest" not found
>> >> >
>> >> > I wonder if it wouldn't make sense to always remove POSIX shared
>> >> > memory created by a jail automatically when it's removed.
> 
> Cyril ran into exactly this problem when adding racct support for POSIX
> shared memory.  In particular, we'd like to be able to limit the number
> and total size of POSIX shared memory objects belonging to a given jail.
> 
> Aside from the problem of the leaked credential, the current behaviour
> of not destroying objects created in a jail makes accounting more
> complicated.  One possibility is to somehow re-home any shm objects that
> exist when the jail is destroyed, and transfer the accounting as well.
> 
>> >>
>> >> That does seem reasonable, though it would take some bookkeeping to do
>> >> right.  There is currently no concrete idea of a jail's ownership of a
>> >> POSIX shm object, as it uses only uid and gid for access permissions,
>> >> same as files.  The tie to the jail is in the underlying vm_object,
>> >> which holds a cred that references the jail - that seems to be what's
>> >> keeping the jail from going away.
>> >
>> > Interesting - I was wondering how that worked, thanks. Would there be a
>> > way to cut that tie somehow (for use cases that deliberately want to
>> > leave the shared memory segment behind)?
>> 
>> It might be possible to change vm_object's cred to one that has the
>> same uid/gid but is outside of the jail.  The big argument against
>> that is that I don't know enough about the VM subsystem to go poking
>> about there lightly.
> 
> When we looked at this problem, it seemed the intent was for POSIX
> shared memory objects to behave like filesystem objects: jailed
> processes can create shm objects in the jail's filesystem namespace, and
> such objects are not removed when the jail goes away.  Moreover, jails
> sharing a filesystem root also share a POSIX shm namespace.
> 
> I think the semantic of tying shm objects to the lifetime of the
> creator's jail is more natural, even though it diverges from the
> treatment of filesystem objects.  It also avoids the problem of having
> to figure out whether it's ok to switch the object's credential.

I prefer that one too.  It's cleaner in execution, and fits with the
idea of jails being vm-lite - when the jail goes away, so do the
ephemeral things it owns.

>> From the user perspective, you can keep such objects with a little
>> planning ahead: always create them outside of the jail, though using
>> the jail's path in the name (which is how a non-jailed process would
>> refer to it anyway).  Then jailed processes can access the shared
>> memory, but won't own it.
> 
> If a process in the host holds a jailed object open, and the jail is
> destroyed (unlinking the object from the jail's namespace), would the
> process' reference still cause the jail to linger in the dying state?

Yes, it would remain dying until all references to the object are
gone.  But I'm fine with that situation.  Even if it's for an
arbitrarily long time, it's not bad behavior unless dying jails are
just stuck with no reasonable chance of going away.
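
To make the reference behavior above concrete, here is a toy model in
Python.  It is only a sketch of the idea, not how the kernel does it:
the Jail and ShmObject classes, the remove_jail() helper and its
unlink_owned flag are all hypothetical names made up for illustration.
The single assumption it encodes is the one discussed in this thread:
a named shm object pins its creator's jail until the object has
neither a name nor an open handle.

```python
class Jail:
    """Toy stand-in for a jail: fully gone only once removed AND unreferenced."""

    def __init__(self, name):
        self.name = name
        self.refs = 0          # references held through object credentials
        self.removed = False   # set when the jail is asked to go away

    @property
    def state(self):
        if not self.removed:
            return "alive"
        return "dying" if self.refs > 0 else "gone"


class ShmObject:
    """Toy named shm object whose credential pins its creator's jail."""

    def __init__(self, namespace, path, jail):
        self.namespace, self.path, self.jail = namespace, path, jail
        self.open_handles = 0
        self.linked = True
        namespace[path] = self
        jail.refs += 1         # the object's cred references the jail

    def open(self):
        self.open_handles += 1

    def close(self):
        self.open_handles -= 1
        self._maybe_free()

    def unlink(self):
        if self.linked:
            self.linked = False
            del self.namespace[self.path]
            self._maybe_free()

    def _maybe_free(self):
        # Freed once it has neither a name nor an open handle;
        # only then is the reference on the jail dropped.
        if not self.linked and self.open_handles == 0 and self.jail is not None:
            self.jail.refs -= 1
            self.jail = None


def remove_jail(jail, namespace, unlink_owned=False):
    """Remove a jail; optionally unlink the shm objects it created
    (hypothetical flag modelling the semantic preferred above)."""
    jail.removed = True
    if unlink_owned:
        for obj in [o for o in namespace.values() if o.jail is jail]:
            obj.unlink()


# Current behaviour: the leftover name keeps the jail dying.
ns = {}
j = Jail("shmtest")
ShmObject(ns, "/removeme", j)
remove_jail(j, ns)
print(j.state)             # dying
ns["/removeme"].unlink()
print(j.state)             # gone
```

With unlink_owned=True the name disappears at removal time, but an
object that some process still holds open keeps the jail in the dying
state until the last close(), which is the case I said I'm fine with.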

- Jamie
Received on Sat Jun 26 2021 - 17:11:08 UTC
