How to implement jail-aware SysV IPC (with my nasty patch)

Mateusz Guzik mjguzik at gmail.com
Mon Jun 15 21:44:35 UTC 2015


On Tue, Jun 16, 2015 at 03:45:34AM +0900, kikuchan at uranus.dti.ne.jp wrote:
> On Mon, 15 Jun 2015 12:49:16 +0200, Mateusz Guzik <mjguzik at gmail.com> wrote:
> > Fundamentally the basic question is how does the implementation cope
> > with processes having sysvshm mappings obtained from 2 different jails
> > (provided they use different sysvshms).
> > 
> > Preferably the whole business would be /prevented/. Prevention mechanism
> > would have to deal with shared address spaces (rfork(2) + RFMEM),
> > threads and pre-existing mappings.
> > 
> > The patch posted here just puts permission checks in several places,
> > while leaving the namespace shared, which I find to be a user-visible
> > hack with no good justification. There is also no analysis how this
> > behaves when presented with aforementioned scenario. Even if it turns
> > out the resut is harmless with resulting code, this leaves us with a
> > very error-prone scheme.
> > 
> > There is no technical problem adding a pointer to struct prison and
> > dereferencing it instead of current global vars. Adding proper sysctls
> > dumping the content for given jail is trivial and so is providing
> > resource limits when creating a first-level jail with a separate
> > sysvshm. Something which cannot be as easily achieved with the patch in
> > question.
> 
> Could you try the latest patch, please?
> I justify user-visibility, make it hierarchical jail friendly, and use EINVAL instead of EACCES to conceal information leak.
> https://bz-attachments.freebsd.org/attachment.cgi?id=157661 (typo fixed)
> 
> 
> I realized my method is a bit better, when I'm trying to port/write the real namespace separation.
> Let me explain (again) why I choose this method for sysv ipc, and could you tell me how it should be, please?
> 
> struct shmmap_state {
> 	vm_offset_t va;
> 	int shmid;
> };
> 
> In sysv_shm.c, struct shmmap_state, exist per process as p->p_vmspace->vm_shm, is a lookup-table for va -> shm object lookup.
> The shmmap_state entry holds a reference (here, shmid) to shm object for further detach, and entries are simply copied on fork.
> 
> If you split namespace (includes shmid space) completely, shmid would be no longer a unique identifier for IPC object in kernel.
> To make it unique, adding a reference to prison into shmmap_state like this;
> 
> struct shmmap_state {
> 	vm_offset_t va;
> 	struct prison *prison;
> 	int shmid;
> };
> 
> would be bad idea, because after a process calls jail_attach(), the process holds a reference to another (creator) prison, or copy the IPC object completely on every jail_attach() occurs?

As I explained in the previous thread, with a separate namespace it is a
strict requirement to prevent sharing of sysvshm mappings. With the
requirement met, there is no issue. As you will see later in the mail,
even your approach would benefit greatly from having such a restriction.


> How do you deal with hierarchical jail?
> 

If proper resource limiting for hierarchical jails is implemented, the
new jail either inherits or gets a new namespace, depending on used
options.

With only simplistic support first level jails can inherit or get a new
namespace, the rest must inherit.

There is no issue here due to sharing prevention.

> My method didn't touch anything about the mapping stuff, thus it behaves exactly the same as current FreeBSD behave on this point.
> 

Sure it did. As you noticed yourself it makes sense to clean up sysvshms
on jail destruction, which you do in sysvshm_cleanup_for_prison_myhook.

Your code does:
               if ((shmseg->u.shm_perm.mode & SHMSEG_ALLOCATED) &&
                   shmseg->cred->cr_prison == pr) {
                       shm_remove(shmseg, i);
....

which differs from what is executed by kern_shmdt_locked.

Now let's consider a process which rforks and shared the address space
with it's child. The child enters a jail and grabs a sysvshm mapping,
then exits and we kill the jail.

In effect we got a process with an address space which used a mapping
created in a now-destroyed jail. Is this situation problematic? I don't
see any anlysis provided.

Maybe it is, maybe it so happens it is not. The mere posibility of this
scenario needlessly complicates maintenance, and such a scenario has
likely no practical purpose. As such, it is best /prevented/.

With it prevented there is nothing positive about your approach that I
could see.

> I'm not sure I could understand properly what the shared address space problem is, (Could someone help me to understand, perhaps in code?)
> and, I'm not sure whether the current FreeBSD has the shared address space problem for sysvshm combined with jails.
> If it has the problem, unfortunately my patch doesn't provide any solution for that,
> but if not, my patch doesn't have the problem either, because I didn't change code structure.
> 

As I mentioned, you sure did.

I don't know if there are any serious problems /as it is/ and I'm too
lazy to check. I surely expect any patch doing sysvshm for jails to be
provided with an anslysis of its behaviour in that regard though.

> The patch just fixes key_t collision for jails, nothing more.
> So, the patch is harmless for non-jail user, and I believe it's useful for jail user using allow.sysvipc=true.
> 
> 
> BTW, What do you think about the following design for jail-aware sysvipc?
> 
> > - IPC objects created on parent jail, are invisible to children.
> > - IPC objects created on neighbor jail, are also invisible each other.
> > - IPC objects craeted on child jail, are VISIBLE from parent.
> > - IPC key_t spaces are separated between jails. If you see the key_t named object from parent, it's shown as IPC_PRIVATE.
> 

How about the following: the jail decided whether it wants to share a
namespace with a particular child (and by extension grandchildren and so
on). Done.

There is nothing complicated to do here unless you want to try out named
namespace which you e.g. assign to different jails on the same level.

-- 
Mateusz Guzik <mjguzik gmail.com>


More information about the freebsd-virtualization mailing list