Re: POSIX shared memory, jails, and (lack of) limits

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Mon, 02 Aug 2021 19:03:27 UTC
On Mon, Aug 02, 2021 at 05:06:43PM +0200, Michael Gmelin wrote:
> 
> 
> > On 2. Aug 2021, at 15:56, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > 
> > On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
> >> Hi,
> >> 
> >> I've been playing a bit with POSIX shared memory and, unlike for SysV
> >> shared memory, I couldn't find any way to limit its use by jails.
> >> 
> >> First, I looked at racct/rctl, but there is no resource for POSIX shared
> >> memory and memoryuse/vmemoryuse don't seem to have an effect (which
> >> makes sense).
> >> 
> >> Then I checked if there are jail parameters that could help, but there
> >> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
> >> memory to limit access to the feature.
> >> 
> >> So, unless I'm missing something, it seems like all jails on a system
> >> have unlimited access to POSIX shared memory and therefore any single
> >> jail can use up the jailhost's virtual memory until the jailhost comes
> >> to a grinding halt.
> >> 
> >> I wrote a little test program that keeps allocating POSIX shared memory
> >> inside of a jail, and it can easily bring the host to its knees:
> >> 
> >>  login: Aug  2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
> >>  was killed: out of swap space
> >>  Aug  2 12:12:10 test init[11827]: getty repeating too quickly on port
> >>  /dev/ttyu0, sleeping 30 secs
> >>  Aug  2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
> >>  killed: out of swap space
> > 
> > POSIX shm is limited by swap accounting.  For non-jail consumers,
> > the limit is the per-uid RLIMIT_SWAP.  I do not know whether other
> > mechanisms make RLIMIT_SWAP per-jail per-uid.
> 
> Unfortunately, it seems that POSIX shared memory is not linked to the
> jail it was created in (we discussed this on this list in June and I
> created a few PRs about that), so per-jail rctl rules don't apply (and
> limiting uid 0 won't have the desired effect ^_^).
> 

In what sense is it 'not linked'?  The backing vm_object is created with
the current process's credentials, which are jailed if the creator belongs
to a jail.
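
To make the discussion concrete, here is a minimal userland sketch of the
sequence in question: create a POSIX shm object, size it, map it, and touch
the pages.  The helper names (limit_swap, shm_reserve) and the object name
used with them are hypothetical and exist only for illustration.  The
RLIMIT_SWAP part is compiled only where that limit is defined, and as far as
I recall from getrlimit(2) it is enforced only when bit 1 of vm.overcommit
is set, so treat this as a sketch under those assumptions and not as a
statement of how the accounting is implemented.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: lower the soft swap limit of the calling process.
 * RLIMIT_SWAP is not available everywhere; where it is missing this is a
 * no-op that reports success.
 */
static int
limit_swap(rlim_t bytes)
{
#ifdef RLIMIT_SWAP
	struct rlimit rl;

	if (getrlimit(RLIMIT_SWAP, &rl) != 0)
		return (-1);
	/* The soft limit may not exceed the hard limit. */
	if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
		bytes = rl.rlim_max;
	rl.rlim_cur = bytes;
	return (setrlimit(RLIMIT_SWAP, &rl));
#else
	(void)bytes;
	return (0);
#endif
}

/*
 * Hypothetical helper: create (or open) the shm object "name", grow it to
 * "size" bytes, map it shared, and write to every byte so the pages are
 * really backed.  Returns the mapping, or NULL on failure.  On success the
 * caller is responsible for munmap() and shm_unlink().
 */
static void *
shm_reserve(const char *name, size_t size)
{
	void *p;
	int fd;

	assert(name != NULL && size > 0);

	fd = shm_open(name, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return (NULL);

	/* Sizing the object is where I would expect a swap limit to bite. */
	if (ftruncate(fd, (off_t)size) != 0) {
		close(fd);
		shm_unlink(name);
		return (NULL);
	}

	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps the object referenced */
	if (p == MAP_FAILED) {
		shm_unlink(name);
		return (NULL);
	}

	memset(p, 0xa5, size);
	return (p);
}
```

Calling limit_swap() with a small value and then shm_reserve() with a
larger size, once inside a jail and once on the host, would be one way to
check which credentials the reservation is charged to.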