Re: POSIX shared memory and dying jails

From: James Gritton <jamie_at_freebsd.org>
Date: Sat, 26 Jun 2021 03:08:31 UTC

On 2021-06-25 09:58, Michael Gmelin wrote:
> On Fri, 25 Jun 2021 09:19:05 -0700
> James Gritton <jamie@freebsd.org> wrote:
> 
>> On 2021-06-25 07:41, Michael Gmelin wrote:
>> > It seems like non-anonymous POSIX shared memory is not freed
>> > automatically when a jail is removed and keeps it in a dying state,
>> > until the shared memory segment is deleted manually.
>> >
>> > See below for the most basic example:
>> >
>> >     [root@jailhost ~]# jail -c name=shmtest path=/ command=/bin/sh
>> >     # posixshmcontrol create /removeme
>> >     # exit
>> >     [root@jailhost ~]# jls -dv -j shmtest dying
>> >     true
>> >
>> > So at this point, the jail is stuck in a dying state.
>> >
>> > Checking POSIX shared memory segments shows the shared memory
>> > segment which is stopping the jail from crossing the Styx:
>> >
>> >     [root@jailhost ~]# posixshmcontrol list
>> >     MODE            OWNER   GROUP   SIZE    PATH
>> >     rw-------       root    wheel   0       /removeme
>> >
>> > After removing the shared memory segment manually...
>> >
>> >     [root@jailhost ~]# posixshmcontrol rm /removeme
>> >
>> > the jail passes away peacefully:
>> >
>> >     [root@jailhost ~]# jls -dv -j shmtest dying
>> >     jls: jail "shmtest" not found
>> >
>> > I wonder if it wouldn't make sense to always remove POSIX shared
>> > memory created by a jail automatically when it's removed.
>> 
>> That does seem reasonable, though it would take some bookkeeping to do
>> right.  There is currently no concrete idea of a jail's ownership of a
>> POSIX shm object, as it uses only uid and gid for access permissions,
>> same as files.  The tie to the jail is in the underlying vm_object,
>> which holds a cred that references the jail - that seems to be what's
>> keeping the jail from going away.
> 
> Interesting - I was wondering how that worked, thanks. Would there be a
> way to cut that tie somehow (for use cases that deliberately want to
> leave the shared memory segment behind)?

It might be possible to change vm_object's cred to one that has the
same uid/gid but is outside of the jail.  The big argument against
that is that I don't know enough about the VM subsystem to go poking
about there lightly.

From the user perspective, you can keep such objects with a little
planning ahead: always create them outside of the jail, though using
the jail's path in the name (which is how a non-jailed process would
refer to it anyway).  Then jailed processes can access the shared
memory, but won't own it.
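
As a rough sketch of that approach (not tested against a real jail
setup): the helper below builds the name a non-jailed process would use
by prepending the jail's path to the name the jailed processes see, then
creates the object from the host.  The function names and the
/jails/shmtest path are made up for illustration.  Only
"posixshmcontrol create" comes from the example earlier in the thread.

```shell
#!/bin/sh
# Sketch only: run on the jail host, not inside the jail.

# Print the name under which the host would refer to a jail's shm object.
#   $1 = jail root path (hypothetical example: /jails/shmtest)
#   $2 = object name as jailed processes use it (e.g. /removeme)
host_shm_name() {
    case "$1" in
        /) printf '%s\n' "$2" ;;        # jail rooted at /: no prefix to add
        *) printf '%s\n' "${1%/}$2" ;;  # otherwise prepend the jail path
    esac
}

# Create the object from the host, so that the creating process is not
# jailed and the object should not end up tied to the jail.
create_shm_for_jail() {
    posixshmcontrol create "$(host_shm_name "$1" "$2")"
}
```

Assuming a running jail named shmtest whose path is something other
than /, usage might look like:

    create_shm_for_jail "$(jls -j shmtest path)" /removeme

Jailed processes would then open it as /removeme, subject to the usual
uid/gid permission checks.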

- Jamie