From: Doug Rabson <dfr@rabson.org>
Date: Mon, 4 Jul 2022 12:14:08 +0100
Subject: Re: Container Networking for jails
To: Gijs Peskens
Cc: freebsd-jail@freebsd.org, freebsd-net@freebsd.org, Samuel Karp
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

I think it's important that configuring the container network does not
rely on any utilities from inside the container. For one thing, there is
no guarantee that these utilities even exist inside the container and, as
you note, local versions may be incompatible.

On the subject of risk: with the current jail infrastructure, the only
user who can create and modify containers is root. Certain users may have
delegated authority, e.g. by using setuid on a daemon-less setup like
podman or by adjusting permissions on a Unix domain socket, but this is
clearly a huge risk and should be strongly discouraged (IMO). Rootless
containers using something similar to Linux user namespaces would be nice,
but it is probably a higher priority to get containers working well for
root first.

My concern about supporting an alternative 'tooling' image for network
utilities is that it adds complexity to the infrastructure for very little
gain. You could even make a weak argument that it adds a threat vector,
e.g. if the network utilities image is fetched from a compromised
repository (pretty far-fetched IMO, but possible).

On Sun, 3 Jul 2022 at 17:29, Gijs Peskens <gijs@peskens.net> wrote:

> I went with exactly the same design for the Docker port I started a while
> ago.
> The reason I went with that design is that there weren't any facilities
> to modify a jail's vnet network configuration from outside the jail, so
> it was necessary to enter the jail and run ifconfig et al.
> Linux jails will lack a compatible ifconfig.
> So having a parent FreeBSD-based vnet jail ensures that networking can be
> configured for Linux children.
>
> There is a risk in using the / filesystem: users who are allowed to set
> up and configure containers run standard system tools as root on the root
> filesystem, even if they do not have root permission themselves. If an
> exploit that allows modifying files were ever found in any of those
> tools, it could be used as a step in a privilege escalation.
>
> IMHO, that risk is acceptable in a first port, but it should be
> documented. Ideally, an option should be provided to use an alternative
> root if the user deems the risk unacceptable.
>
> On 30 June 2022 09:04:24 CEST, Doug Rabson <dfr@rabson.org> wrote:
>>
>> I wanted to get a quick sanity check on my current approach to container
>> networking with buildah and podman. These systems use CNI
>> (https://www.cni.dev) to set up the network. CNI uses a sequence of
>> 'plugins', executables that perform successive steps in the process. A
>> very common setup uses a 'bridge' plugin to add one half of an epair to
>> a bridge and put the other half into the container's vnet. IP addresses
>> are managed by an 'ipam' plugin, and an optional 'portmap' plugin can be
>> used to advertise container service ports on the host. All of these
>> plugins run on the host with root privileges.
>>
>> In Kubernetes and podman, it is possible for more than one container to
>> share a network namespace in a 'pod'. Each container in the pod can
>> communicate with its peers directly via localhost, and they all share a
>> single IP address.
>>
>> Mapping this over to jails, I am using one vnet jail to manage the
>> network namespace and child jails of this to isolate the containers. The
>> vnet jail uses '/' as its root path, and the only things which run
>> inside this jail are the CNI plugins. Using the host root means that a
>> plugin can safely call host utilities such as ifconfig and route without
>> having to trust the container's version of them. An important factor
>> here is that the CNI plugins will only be run strictly before the
>> container (to set up) or strictly after (to tear down). At no point will
>> CNI plugins be executed at the same time as container executables.
>>
>> The child jails use ip4/6=inherit to share the vnet, and each will use a
>> root path to the container's contents in the same way as a normal
>> non-hierarchical jail.
>>
>> Can anyone see any potential security problems here, particularly around
>> the use of nested jails? I believe that the only difference between this
>> setup and a regular non-nested jail is that the vnet outlives the
>> container briefly before it is torn down.
>>
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
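
The jail layout from the original 30 June message (one vnet jail rooted at
'/' per pod, with child jails using ip4/6=inherit) could be set up roughly
as sketched below. This is an illustration under assumptions, not the
tooling used in the thread: the pod and container names, the rootfs path
and the children.max value are hypothetical, and the helper functions are
invented for the example. It assumes the hierarchical parent.child jail
naming described in jail(8) and only builds and prints command lines
without running them.

```python
import shlex


def pod_jail_argv(pod: str, max_children: int) -> list[str]:
    """Build a jail(8) command line for the pod's vnet jail.

    The jail is rooted at '/', so CNI plugins executed in it use the
    host's ifconfig and route instead of anything from a container image.
    """
    return [
        "jail", "-c",
        f"name={pod}",
        "path=/",
        "vnet",                          # separate network stack for the pod
        f"children.max={max_children}",  # allow nested container jails
        "persist",                       # keep the jail with no processes in it
    ]


def container_jail_argv(pod: str, container: str, rootfs: str) -> list[str]:
    """Build a jail(8) command line for one container in the pod.

    The child is named hierarchically under the pod jail and inherits the
    parent's addresses, so every container in the pod shares one vnet.
    """
    return [
        "jail", "-c",
        f"name={pod}.{container}",
        f"path={rootfs}",
        "ip4=inherit",
        "ip6=inherit",
        "persist",
    ]


if __name__ == "__main__":
    # Hypothetical pod with a single container; paths are placeholders.
    print(shlex.join(pod_jail_argv("pod0", 4)))
    print(shlex.join(container_jail_argv("pod0", "web", "/containers/web/rootfs")))
```

In this arrangement the CNI plugins would be run inside the pod jail (for
example via jexec) before any child jail starts a process and again after
the children have exited, which matches the "strictly before or strictly
after" ordering described in the original message.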