Date: Mon, 2 Aug 2021 15:40:00 -0400
From: Mark Johnston
To: Konstantin Belousov
Cc: Michael Gmelin, jail@freebsd.org
Subject: Re: POSIX shared memory, jails, and (lack of) limits
List-Id: Discussion about FreeBSD jail(8)
List-Archive: https://lists.freebsd.org/archives/freebsd-jail

On Mon, Aug 02, 2021 at 10:03:27PM +0300, Konstantin Belousov wrote:
> On Mon, Aug 02, 2021 at 05:06:43PM +0200, Michael Gmelin wrote:
> >
> > > On 2. Aug 2021, at 15:56, Konstantin Belousov wrote:
> > >
> > > On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
> > >> Hi,
> > >>
> > >> I've been playing a bit with POSIX shared memory and, unlike for SysV
> > >> shared memory, I couldn't find any way to limit its use by jails.
> > >>
> > >> First, I looked at racct/rctl, but there is no resource for POSIX shared
> > >> memory and memoryuse/vmemoryuse don't seem to have an effect (which
> > >> makes sense).

Cyril has written a few patches for racct, including one which includes
POSIX shared memory objects in rctl's "nshm" and "shmsize" resources,
which currently only apply to SysV shm objects:
https://reviews.freebsd.org/D30775

We plan to get them committed in the next couple of weeks.

"memoryuse" and "vmemoryuse" only count objects that are mapped into
some process' address space, so they're not the right way to limit
allocations of POSIX shm objects, see below.

> > >>
> > >> Then I checked if there are jail parameters that could help, but there
> > >> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
> > >> memory to limit access to the feature.
> > >>
> > >> So, unless I'm missing something, it seems like all jails on a system
> > >> have unlimited access to POSIX shared memory and therefore any single
> > >> jail can use up the jailhost's virtual memory until the jailhost comes
> > >> to a grinding halt.
> > >>
> > >> I wrote a little test program that keeps allocating POSIX shared memory
> > >> inside of a jail and it can easily bring the host down to its knees:
> > >>
> > >> login: Aug 2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
> > >> was killed: out of swap space
> > >> Aug 2 12:12:10 test init[11827]: getty repeating too quickly on port
> > >> /dev/ttyu0, sleeping 30 secs
> > >> Aug 2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
> > >> killed: out of swap space
> > >
> > > Posix shm is limited by the swap accounting.  For non-jail consumers,
> > > it is per-uid RLIMIT_SWAP.  I do not know if other mechanisms make
> > > RLIMIT_SWAP per-jail per-uid.

racct/rctl provides the "swapuse" resource which should account for
this.  It does not apply to largepage objects, though.

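To make the point about "memoryuse" and "vmemoryuse" concrete, here is a
minimal sketch using only portable POSIX calls. It is not taken from
Michael's test program; the function names and the object name passed in
are hypothetical. The first function fills a named shm object and then
drops both the mapping and the descriptor; the second reopens the object
by name and checks that the data is still there. In between, no process
has the object mapped, yet its pages still have to be backed by memory or
swap.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a named POSIX shm object of len bytes, fill it with the byte
 * value fill, then unmap it and close the descriptor.
 * Returns 0 on success, -1 on error (hypothetical helper).
 */
int
shm_fill_and_release(const char *name, size_t len, unsigned char fill)
{
	void *p;
	int fd;

	assert(name != NULL && len > 0);
	fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
	if (fd == -1)
		return (-1);
	if (ftruncate(fd, (off_t)len) == -1)
		goto fail;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto fail;
	memset(p, fill, len);		/* touch every page */
	(void)munmap(p, len);		/* no mapping is left after this */
	(void)close(fd);		/* and no descriptor either */
	return (0);
fail:
	(void)close(fd);
	(void)shm_unlink(name);
	return (-1);
}

/*
 * Reopen the object by name and check that it still holds len bytes of
 * fill.  Returns 1 if intact, 0 if the contents differ, -1 on error.
 */
int
shm_still_populated(const char *name, size_t len, unsigned char fill)
{
	struct stat sb;
	const unsigned char *p;
	size_t i;
	int fd, intact;

	assert(name != NULL && len > 0);
	fd = shm_open(name, O_RDONLY, 0);
	if (fd == -1)
		return (-1);
	if (fstat(fd, &sb) == -1) {
		(void)close(fd);
		return (-1);
	}
	if ((size_t)sb.st_size != len) {
		(void)close(fd);
		return (0);
	}
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	(void)close(fd);		/* the mapping stays valid */
	if (p == MAP_FAILED)
		return (-1);
	intact = 1;
	for (i = 0; i < len; i++) {
		if (p[i] != fill) {
			intact = 0;
			break;
		}
	}
	(void)munmap((void *)p, len);
	return (intact);
}
```

Calling shm_fill_and_release("/demo", 8192, 0xAB) and later
shm_still_populated("/demo", 8192, 0xAB) from any process should report
the object as intact until someone calls shm_unlink("/demo"). A limit
that only counts mapped memory never sees the object in that window.
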
> > Unfortunately it seems like POSIX shared memory is not linked to the jail it was created in (we discussed this on this list in June and I created a few PRs about that), so per jail rctl rules don’t apply (and limiting uid 0 won’t have the desired effect ^_^).
> >
>
> In what sense 'not linked'?  The backing vm_object is created with the
> current process credentials, which are jailed if creator belongs to a jail.

I believe the problem that Michael is referring to is that named POSIX
shm objects created within a jail do not disappear when the jail is
destroyed, and the vm object cred reference is leaked.  But this is
unrelated to swap space accounting.
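
The lifetime issue can be sketched with plain POSIX calls, with the
caveat that this only shows the general rule (a named shm object lives
until it is unlinked, regardless of what happens to its creator) and
does not involve jails at all. The helper names and the object name are
hypothetical. A child process creates the object and exits immediately;
the parent can still find it by name afterwards.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that creates the named shm object with len bytes and
 * exits right away.  Returns 0 if the child succeeded, -1 otherwise.
 */
int
shm_create_in_child(const char *name, size_t len)
{
	pid_t pid;
	int fd, status;

	assert(name != NULL);
	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
		if (fd == -1)
			_exit(1);
		if (ftruncate(fd, (off_t)len) == -1) {
			(void)shm_unlink(name);
			_exit(1);
		}
		_exit(0);	/* creator goes away, the object does not */
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1);
}

/*
 * Return the size of the named shm object, or -1 if it cannot be
 * opened (for example because it no longer exists).
 */
long long
shm_size_by_name(const char *name)
{
	struct stat sb;
	int fd, rc;

	assert(name != NULL);
	fd = shm_open(name, O_RDONLY, 0);
	if (fd == -1)
		return (-1);
	rc = fstat(fd, &sb);
	(void)close(fd);
	return (rc == -1 ? -1 : (long long)sb.st_size);
}
```

After shm_create_in_child("/demo", 4096) returns 0, the creating process
is gone but shm_size_by_name("/demo") still reports 4096. Only an
explicit shm_unlink("/demo") removes the name. Whether a jail teardown
should perform that cleanup is the separate question raised above.
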