Date: Sat, 26 Jun 2021 11:13:05 -0400
From: Mark Johnston
To: James Gritton
Cc: jail@freebsd.org, Michael Gmelin, cyril@freebsdfoundation.org
Subject: Re: POSIX shared memory and dying jails
List-Id: Discussion about FreeBSD jail(8)
List-Archive: https://lists.freebsd.org/archives/freebsd-jail

On Fri, Jun 25, 2021 at 08:08:31PM -0700, James Gritton wrote:
> On 2021-06-25 09:58, Michael Gmelin wrote:
> > On Fri, 25 Jun 2021 09:19:05 -0700
> > James Gritton wrote:
> >
> >> On 2021-06-25 07:41, Michael Gmelin wrote:
> >> > It seems like non-anonymous POSIX shared memory is not freed
> >> > automatically when a jail is removed and keeps it in a dying state,
> >> > until the shared memory segment is deleted manually.
> >> >
> >> > See below for the most basic example:
> >> >
> >> > [root@jailhost ~]# jail -c path=/ command=/bin/sh
> >> > # posixshmcontrol create /removeme
> >> > # exit
> >> > [root@jailhost ~]# jls -dv -j shmtest dying
> >> > true
> >> >
> >> > So at this point, the jail is stuck in a dying state.
> >> >
> >> > Checking POSIX shared memory segments shows the shared memory
> >> > segment which is stopping the jail from crossing the Styx:
> >> >
> >> > [root@jailhost ~]# posixshmcontrol list
> >> > MODE       OWNER  GROUP  SIZE  PATH
> >> > rw-------  root   wheel     0  /removeme
> >> >
> >> > After removing the shared memory segment manually...
> >> >
> >> > [root@jailhost ~]# posixshmcontrol rm /removeme
> >> >
> >> > the jail passes away peacefully:
> >> >
> >> > [root@jailhost ~]# jls -dv -j shmtest dying
> >> > jls: jail "shmtest" not found
> >> >
> >> > I wonder if it wouldn't make sense to always remove POSIX shared
> >> > memory created by a jail automatically when it's removed.

Cyril ran into exactly this problem when adding racct support for POSIX
shared memory.  In particular, we'd like to be able to limit the number
and total size of POSIX shared memory objects belonging to a given jail.
Aside from the problem of the leaked credential, the current behaviour
of not destroying objects created in a jail makes accounting more
complicated.  One possibility is to somehow re-home any shm objects that
exist when the jail is destroyed, and transfer the accounting as well.

> >>
> >> That does seem reasonable, though it would take some bookkeeping to do
> >> right.  There is currently no concrete idea of a jail's ownership of a
> >> POSIX shm object, as it uses only uid and gid for access permissions,
> >> same as files.  The tie to the jail is in the underlying vm_object,
> >> which holds a cred that references the jail - that seems to be what's
> >> keeping the jail from going away.
> >
> > Interesting - I was wondering how that worked, thanks. Would there by a
> > way to cut that tie somehow (for use cases that deliberately want to
> > leave the shared memory segment behind)?
>
> It might be possible to change vm_object's cred to one that has the
> same uid/gid but is outside of the jail.  The big argument against
> that is that I don't know enough about the VM subsystem to go poking
> about there lightly.

When we looked at this problem, it seemed the intent was for POSIX
shared memory objects to behave like filesystem objects: jailed
processes can create shm objects in the jail's filesystem namespace, and
such objects are not removed when the jail goes away.  Moreover, jails
sharing a filesystem root also share a POSIX shm namespace.

I think the semantic of tying shm objects to the lifetime of the
creator's jail is more natural, even though it diverges from the
treatment of filesystem objects.  It also avoids the problem of having
to figure out whether it's ok to switch the object's credential.

> From the user perspective, you can keep such objects with a little
> planning ahead: always create them outside of the jail, though using
> the jail's path in the name (which is how a non-jailed process would
> refer to it anyway).  Then jailed processes can access the shared
> memory, but won't own it.

If a process in the host holds a jailed object open, and the jail is
destroyed (unlinking the object from the jail's namespace), would the
process' reference still cause the jail to linger in the dying state?
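
For anyone who wants to see the userland half of this without setting up
a jail, here is a minimal Python sketch using the standard
multiprocessing.shared_memory module.  It only illustrates that a named
POSIX shm object stays in the namespace after the handle that created it
is closed, and disappears once it is unlinked.  It does not model the
credential reference held by the vm_object, which is a kernel-side
detail.  The helper names and the segment name are made up for
illustration, and a non-zero size is used because Python maps the
segment on open.

```python
import os
from multiprocessing import shared_memory


def create_named_segment(name, size=4096):
    """Create a named POSIX shm object, then drop our own handle to it."""
    seg = shared_memory.SharedMemory(name=name, create=True, size=size)
    seg.close()  # closes this handle only; the named object remains


def segment_exists(name):
    """Return True if a shm object with this name can still be opened."""
    try:
        seg = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return False
    seg.close()
    return True


def remove_segment(name):
    """Unlink the named object, roughly what 'posixshmcontrol rm' does."""
    seg = shared_memory.SharedMemory(name=name)
    seg.close()
    seg.unlink()


if __name__ == "__main__":
    # Hypothetical name, made unique per run to avoid collisions.
    name = "removeme_%d" % os.getpid()
    create_named_segment(name)
    print("after create:", segment_exists(name))
    remove_segment(name)
    print("after unlink:", segment_exists(name))
```

The point of the sketch is just that nothing short of an explicit unlink
removes the name, which is the behaviour the jail case inherits.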