Re: POSIX shared memory, jails, and (lack of) limits

From: Michael Gmelin <freebsd_at_grem.de>
Date: Mon, 02 Aug 2021 20:38:54 UTC

> On 2. Aug 2021, at 21:40, Mark Johnston <markj@freebsd.org> wrote:
> 
> On Mon, Aug 02, 2021 at 10:03:27PM +0300, Konstantin Belousov wrote:
>>> On Mon, Aug 02, 2021 at 05:06:43PM +0200, Michael Gmelin wrote:
>>> 
>>> 
>>>> On 2. Aug 2021, at 15:56, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>>> 
>>>> On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
>>>>> Hi,
>>>>> 
>>>>> I've been playing a bit with POSIX shared memory and, unlike for SysV
>>>>> shared memory, I couldn't find any way to limit its use by jails.
>>>>> 
>>>>> First, I looked at racct/rctl, but there is no resource for POSIX shared
>>>>> memory and memoryuse/vmemoryuse don't seem to have an effect (which
>>>>> makes sense).
> 
> Cyril has written a few patches for racct, including one which includes
> POSIX shared memory objects in rctl's "nshm" and "shmsize" resources,
> which currently only apply to SysV shm objects:
> https://reviews.freebsd.org/D30775
> We plan to get them committed in the next couple of weeks.
> 
> "memoryuse" and "vmemoryuse" only count objects that are mapped into
> some process' address space, so they're not the right way to limit
> allocations of POSIX shm objects, see below.
> 
>>>>> 
>>>>> Then I checked if there are jail parameters that could help, but there
>>>>> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
>>>>> memory to limit access to the feature.
>>>>> 
>>>>> So, unless I'm missing something, it seems like all jails on a system
>>>>> have unlimited access to POSIX shared memory and therefore any single
>>>>> jail can use up the jailhost's virtual memory until the jailhost comes
>>>>> to a grinding halt.
>>>>> 
>>>>> I wrote a little test program that keeps allocating POSIX shared memory
>>>>> inside of a jail and it can easily bring the host down to its knees:
>>>>> 
>>>>> login: Aug  2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
>>>>> was killed: out of swap space
>>>>> Aug  2 12:12:10 test init[11827]: getty repeating too quickly on port
>>>>> /dev/ttyu0, sleeping 30 secs
>>>>> Aug  2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
>>>>> killed: out of swap space
>>>> 
>>>> Posix shm is limited by the swap accounting.  For non-jail consumers,
>>>> it is per-uid RLIMIT_SWAP.  I do not know if other mechanisms make
>>>> RLIMIT_SWAP per-jail per-uid.
> 
> racct/rctl provides the "swapuse" resource which should account for
> this.  It does not apply to largepage objects, though.

I tried to limit swapuse for a jail, and it doesn’t limit POSIX shared
memory created within the jail: I can still create shared memory
segments there until the machine runs out of virtual memory.

Should I share the test case to make sure I didn’t mess up?
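
For reference, a minimal sketch of the kind of test meant here. This is not the exact program from the original report: the helper names (shm_alloc_one, shm_fill, shm_cleanup), the object name prefix, and the sizes are all made up for illustration. It creates named POSIX shm objects, sizes them, and touches every page so the memory is really consumed:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create one named POSIX shm object of `size` bytes and dirty all of it.
 * The object is left in place (not unlinked) on success.
 * Returns 0 on success, -1 on failure.
 */
static int
shm_alloc_one(const char *name, size_t size)
{
	int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
	if (fd == -1)
		return (-1);
	if (ftruncate(fd, (off_t)size) == -1) {
		close(fd);
		shm_unlink(name);
		return (-1);
	}
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping stays valid after close */
	if (p == MAP_FAILED) {
		shm_unlink(name);
		return (-1);
	}
	memset(p, 0xa5, size);	/* touch every page */
	munmap(p, size);
	return (0);
}

/*
 * Create up to `count` objects named "<prefix>.<n>", stopping at the
 * first failure. Returns the number of objects created.
 */
int
shm_fill(const char *prefix, int count, size_t size)
{
	char name[64];
	int i;

	assert(prefix != NULL);
	for (i = 0; i < count; i++) {
		snprintf(name, sizeof(name), "%s.%d", prefix, i);
		if (shm_alloc_one(name, size) == -1)
			break;
	}
	return (i);
}

/* Unlink "<prefix>.0" .. "<prefix>.<count-1>"; returns how many existed. */
int
shm_cleanup(const char *prefix, int count)
{
	char name[64];
	int i, removed = 0;

	assert(prefix != NULL);
	for (i = 0; i < count; i++) {
		snprintf(name, sizeof(name), "%s.%d", prefix, i);
		if (shm_unlink(name) == 0)
			removed++;
	}
	return (removed);
}
```

Calling something like shm_fill("/jailtest", 4096, 1 << 20) from main() inside a jail that has a rule such as "rctl -a jail:testjail:swapuse:deny=512m" (jail name and amounts are hypothetical) should stop early if the limit is enforced. Note that a limit enforced at page-touch time could also show up as a signal rather than a short count.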

-m



> 
>>> Unfortunately it seems like POSIX shared memory is not linked to the
>>> jail it was created in (we discussed this on this list in June and I
>>> created a few PRs about that), so per-jail rctl rules don’t apply
>>> (and limiting uid 0 won’t have the desired effect ^_^).
>>> 
>> 
>> In what sense 'not linked'?  The backing vm_object is created with the
>> current process credentials, which are jailed if creator belongs to a jail.
> 
> I believe the problem that Michael is referring to is that named POSIX
> shm objects created within a jail do not disappear when the jail is
> destroyed, and the vm object cred reference is leaked.  But this is
> unrelated to swap space accounting.