Re: POSIX shared memory and dying jails

From: Mark Johnston <markj_at_freebsd.org>
Date: Sat, 26 Jun 2021 15:13:05 UTC
On Fri, Jun 25, 2021 at 08:08:31PM -0700, James Gritton wrote:
> On 2021-06-25 09:58, Michael Gmelin wrote:
> > On Fri, 25 Jun 2021 09:19:05 -0700
> > James Gritton <jamie@freebsd.org> wrote:
> > 
> >> On 2021-06-25 07:41, Michael Gmelin wrote:
> >> > It seems like non-anonymous POSIX shared memory is not freed
> >> > automatically when a jail is removed and keeps it in a dying state,
> >> > until the shared memory segment is deleted manually.
> >> >
> >> > See below for the most basic example:
> >> >
> >> >     [root@jailhost ~]# jail -c name=shmtest path=/ command=/bin/sh
> >> >     # posixshmcontrol create /removeme
> >> >     # exit
> >> >     [root@jailhost ~]# jls -dv -j shmtest dying
> >> >     true
> >> >
> >> > So at this point, the jail is stuck in a dying state.
> >> >
> >> > Checking POSIX shared memory segments shows the shared memory
> >> > segment which is stopping the jail from crossing the Styx:
> >> >
> >> >     [root@jailhost ~]# posixshmcontrol list
> >> >     MODE            OWNER   GROUP   SIZE    PATH
> >> >     rw-------       root    wheel   0       /removeme
> >> >
> >> > After removing the shared memory segment manually...
> >> >
> >> >     [root@jailhost ~]# posixshmcontrol rm /removeme
> >> >
> >> > the jail passes away peacefully:
> >> >
> >> >     [root@jailhost ~]# jls -dv -j shmtest dying
> >> >     jls: jail "shmtest" not found
> >> >
> >> > I wonder if it wouldn't make sense to always remove POSIX shared
> >> > memory created by a jail automatically when it's removed.

Cyril ran into exactly this problem when adding racct support for POSIX
shared memory.  In particular, we'd like to be able to limit the number
and total size of POSIX shared memory objects belonging to a given jail.

Aside from the problem of the leaked credential, the current behaviour
of not destroying objects created in a jail makes accounting more
complicated.  One possibility is to somehow re-home any shm objects that
exist when the jail is destroyed, and transfer the accounting as well.

> >> 
> >> That does seem reasonable, though it would take some bookkeeping to do
> >> right.  There is currently no concrete idea of a jail's ownership of a
> >> POSIX shm object, as it uses only uid and gid for access permissions,
> >> same as files.  The tie to the jail is in the underlying vm_object,
> >> which holds a cred that references the jail - that seems to be what's
> >> keeping the jail from going away.
> > 
> > Interesting - I was wondering how that worked, thanks. Would there be a
> > way to cut that tie somehow (for use cases that deliberately want to
> > leave the shared memory segment behind)?
> 
> It might be possible to change vm_object's cred to one that has the
> same uid/gid but is outside of the jail.  The big argument against
> that is that I don't know enough about the VM subsystem to go poking
> about there lightly.

When we looked at this problem, it seemed the intent was for POSIX
shared memory objects to behave like filesystem objects: jailed
processes can create shm objects in the jail's filesystem namespace, and
such objects are not removed when the jail goes away.  Moreover, jails
sharing a filesystem root also share a POSIX shm namespace.

I think the semantic of tying shm objects to the lifetime of the
creator's jail is more natural, even though it diverges from the
treatment of filesystem objects.  It also avoids the problem of having
to figure out whether it's ok to switch the object's credential.

> From the user perspective, you can keep such objects with a little
> planning ahead: always create them outside of the jail, though using
> the jail's path in the name (which is how a non-jailed process would
> refer to it anyway).  Then jailed processes can access the shared
> memory, but won't own it.

If a process in the host holds a jailed object open, and the jail is
destroyed (unlinking the object from the jail's namespace), would the
process' reference still cause the jail to linger in the dying state?
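
For reference, here is a minimal userland sketch of the situation that
question describes, assuming only generic POSIX semantics: unlinking a
named shm object removes the name, but a process that already has it
open keeps a working reference.  It uses Python's
multiprocessing.shared_memory module, which wraps shm_open() and
shm_unlink() on POSIX systems; the helper name
unlink_keeps_open_mapping is made up for illustration.  It says nothing
about what the kernel does with the object's credential, which is the
open part of the question.

```python
from multiprocessing import shared_memory


def unlink_keeps_open_mapping(size=4096):
    """Create a named POSIX shm object, unlink its name, keep using it.

    Returns (name_gone, value_read_after_unlink).
    """
    # create=True without a name picks a random name and creates the
    # object exclusively, like shm_open() with O_CREAT | O_EXCL.
    shm = shared_memory.SharedMemory(create=True, size=size)
    try:
        # Drop the name from the shm namespace (shm_unlink()).
        shm.unlink()

        # A fresh open by name should now fail, since the name is gone.
        try:
            other = shared_memory.SharedMemory(name=shm.name)
        except FileNotFoundError:
            name_gone = True
        else:
            other.close()
            name_gone = False

        # The mapping we already hold is still backed by the object.
        shm.buf[0] = 42
        value = shm.buf[0]
    finally:
        # Closing the last reference is what finally frees the object.
        shm.close()
    return name_gone, value


if __name__ == "__main__":
    print(unlink_keeps_open_mapping())
```

If the jail-destruction path only unlinked the name, a host process in
the position of this script would still hold the underlying object, so
whether the jail lingers would depend on what happens to the object's
credential at that point.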