new feature: private IPC for every jail

Tue Apr 4 10:47:15 UTC 2006

On Tue, 4 Apr 2006, Peter Jeremy wrote:

> On Mon, 2006-Apr-03 16:34:59 +0100, Robert Watson wrote:
>> (2) The name space model for System V IPC is flat, so while it's desirable to
>>    allow the administrator in the host environment to monitor and control
>>    resource use in the jail (for example, delete allocated but unused
>>    segments), doing that requires developing an administrative model for
>>    it.
>
> The SysV SHM name space is made up of a 32-bit user-selected key which is 
> mapped into a 32-bit (system chosen) identifier, which (on FreeBSD) is made 
> up of a 16-bit pool identifier (in the range 0..shmmni-1) and a 16-bit 
> generation counter.
>
> At the expense of restricting shmmni, the generation counter and JAIL_MAX, 
> it would seem possible to embed prison.pr_id into the shmid and treat pr_id 
> as an (implicit) part of the key - insisting they must match for jailed 
> processes.  Since the name space remains the same, ipcs and ipcrm would not 
> be affected and a non-jailed ipcrm could delete jailed IPC by identifier.
>
> On the surface, this approach looks easier than having a distinct name space 
> associated with each prison (as per kern/48471) and has the advantage of 
> allowing non-jailed processes to manage jailed IPC. The disadvantage is 
> restricting the ranges of various counters - though I believe they are 
> overly generous by default.
>
> This doesn't really address the problem of SysV IPC and jails becoming more 
> intimately entwined.

Hmm.  This sounds like it might be workable.  To make sure I understand your 
proposal:

- We add a new prison ID field to the in-kernel description of each segment,
   semaphore, message queue, etc.  This is initialized to the prison ID of the
   process creating the object at the time of creation.

- shmget(), et al., will, in addition to matching the key when searching for an
   existing object, also attempt to match the prison ID of the object against
   that of the process.  For the sake of completeness, we will use prison ID 0
   for unjailed processes (or something along those lines).  This guarantees
   that two jails, or even the host and a jail, will never receive an ID
   already allocated to another jail, and in particular, not an ID for an
   object from another jail with the same key as might be used in the current
   jail.

- shmat(), et al., will perform an access control check to confirm that, if a
   process is jailed, its prison ID matches that of the object.

Is it necessary, as you suggest, to change the IPC ID name space at all?  I
assume applications do consistently use shmget() to look up IDs, and that they
can't/don't make assumptions about the long-term persistence of those mappings
across boot (which is effectively what a jail restart is).  Is the behavior of
IXSEQ_TO_IPCID() something that has documented or relied-upon properties, or
are we free to map a name (key) to an object (id) in any way we choose?

I guess another change is also needed:

- At jail termination, we GC all resources with the prison ID in question.

This prevents a future jail from turning up with the same prison ID and seeing
stale shared memory (etc.) segments.

Robert N M Watson