[Bug 262179] Prevent jail escaping via shared nullfs; option to disable UNIX domain socket binding

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 24 Feb 2022 22:35:53 UTC

            Bug ID: 262179
           Summary: Prevent jail escaping via shared nullfs; option to
                    disable UNIX domain socket binding
           Product: Base System
           Version: 13.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: firk@cantconnect.ru
 Attachment #232085 mime type: text/plain

Created attachment 232085
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=232085&action=edit
PoC source

==== Background: possible security hole with jail & shared nullfs ====

When two independent jails have the same directory shared via nullfs, processes
in them can escape their chrooted filesystems.

How this is done:
1) jail A creates a listening UNIX domain socket in the shared directory.
2) jail B connects to that listening socket.
3) jail A does open("/", O_DIRECTORY) and sends this fd over the socket using
   sendmsg() with an SCM_RIGHTS control message.
4) jail B receives the fd and is now able to go up through ".." into the host
   filesystem.

Quick guide to reproducing it (sendfd.c is in the attachment):

> gcc -o sendfd sendfd.c
> mkdir /j /j/1 /j/2
> tar -c -f - /bin /lib /libexec | tar -x -f - -C /j/1
> tar -c -f - /bin /lib /libexec | tar -x -f - -C /j/2
> cp sendfd /j/1/bin
> cp sendfd /j/2/bin
> mkdir /j/1/shared /j/2/shared
> mount -t nullfs /j/2/shared /j/1/shared

first console:
> jail /j/2 x /bin/sendfd listen /shared/2.sock /bin

second console:
> jail /j/1 x /bin/sendfd sh /shared/2.sock
> pwd

and pwd will print "/j/2", from which you can explore ../../ up to the host
system root.

==== Proposed fix ====
A new mount flag, "nosockbind", means "do not allow UNIX domain bind()/bindat()
to paths on this filesystem". This flag is not transparent over nullfs, so it is
possible to mount a bindable nullfs over a non-bindable base partition.

Note that connecting to a UNIX domain socket on such a filesystem is still
allowed.
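The intended semantics can be modelled in userland. In the sketch below,
struct toy_mount, struct toy_vnode and the two check functions are
hypothetical stand-ins, not kernel code; the flag value is the one proposed
under mount.h, and returning EPERM from a refused bind is an assumption. The
check reads the mount of the directory that would hold the socket (like
nd.ni_dvp->v_mount), so a nullfs vnode is judged by the nullfs mount's own
flags rather than the base filesystem's, while connect() does not consult the
flag at all.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MNT_NOSOCKBIND 0x0000020000000000ULL  /* value proposed in this report */

/* Toy stand-ins for struct mount / struct vnode (hypothetical). */
struct toy_mount { uint64_t mnt_flag; };
struct toy_vnode { struct toy_mount *v_mount; };

/* bind(): refuse if the directory's own mount is marked nosockbind. */
static int toy_bind_check(const struct toy_vnode *dvp)
{
    const struct toy_mount *mp = dvp->v_mount;  /* like nd.ni_dvp->v_mount */

    if (mp != NULL && (mp->mnt_flag & MNT_NOSOCKBIND) != 0)
        return EPERM;                           /* errno choice is assumed */
    return 0;
}

/* connect(): the flag is not consulted, so existing sockets stay usable. */
static int toy_connect_check(const struct toy_vnode *vp)
{
    (void)vp;
    return 0;
}
```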

There are patches for the 10.4, 11.4, 12.3 and 13-stable branches. At least the
12.3 version appears to work fine, but the VFS subsystem is quite complicated
and I'm new to it, so there are many things (almost everywhere) that I'm unsure
about:

=== vfs_mount.c ===
1) I'm not sure whether I should add the option only to global_opts[] in
vfs_mount.c or also to the fs-specific lists (at least I've seen noexec and
nosuid duplicated from global_opts[] into ffs_opts[]).

2) I'm not sure about the old sys_mount() API; it has some manual handling for
ro/nosuid/noexec.

3) I think user mounts (which automatically get MNT_NOSUID) are not an issue
here; am I right?

=== uipc_usrreq.c ===
1) It seems that the mp returned by vn_start_write() is not always the mount
that directly holds the specified vnode (e.g. for nullfs), so I read
nd.ni_dvp->v_mount instead; is that correct? It seems that ni_dvp can't be NULL
here, and also that it is locked and so can't disappear.

2) Could ni_dvp->v_mount be NULL or disappear midway? I've added a NULL check,
but maybe it isn't needed.

=== mount.h ===
I'm using the unused flag 0x0000020000000000ULL for MNT_NOSOCKBIND. At some
point MNT_RECURSE=0x0000100000000000ULL was added in CURRENT, which is a higher
bit, but bits 0x00000E0000000000ULL still seem to be unused.
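The bit arithmetic can be double-checked in userland. This small sketch copies
the three values quoted above into macros (UNUSED_GAP_MASK and both helper
functions are hypothetical names) and confirms that the proposed flag is a
single bit, namely bit 41, that MNT_RECURSE is bit 44, and that the mask covers
bits 41 to 43. It cannot confirm that those bits are really free in mount.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define MNT_NOSOCKBIND  0x0000020000000000ULL  /* proposed in this report */
#define MNT_RECURSE     0x0000100000000000ULL  /* as added in CURRENT */
#define UNUSED_GAP_MASK 0x00000E0000000000ULL  /* bits believed to be unused */

/* True if v has exactly one bit set. */
static bool is_single_bit(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Index of the highest set bit in v (v must be nonzero). */
static int bit_index(uint64_t v)
{
    int i = 0;

    while ((v >>= 1) != 0)
        i++;
    return i;
}
```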

==== What is not done ====
Since nullfs is marked as jail-friendly, it seems that all of this is still
possible when a jail is created with the allow_mount flag, by mounting an
unrestricted nullfs anywhere. Possible fixes for this:
1) force all jailed nullfs mounts to inherit "nosockbind" from the underlying fs
2) disallow a jailed update of a mount from "nosockbind" to "nonosockbind"
3) make "nosockbind" transparent over nullfs (possibly optional via a sysctl)
4) workaround: do not make cross-jail shared nullfs mounts accessible to jails
   with allow_mount
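Fixes 1 and 2 could both be expressed as one flag-policy step applied when a
jailed process mounts or updates a nullfs. The sketch below is a userland model
only: toy_nullfs_flags() and its parameters are hypothetical and do not
correspond to an existing kernel function, and only the flag value comes from
this report.

```c
#include <stdbool.h>
#include <stdint.h>

#define MNT_NOSOCKBIND 0x0000020000000000ULL  /* value proposed in this report */

/*
 * Hypothetical policy helper: flags a new or updated nullfs mount ends up with.
 *   requested - flags asked for by the mount request
 *   lower     - flags of the underlying (lower) filesystem
 *   current   - flags of the existing mount on an update, else 0
 *   is_update - true when updating an existing mount
 *   jailed    - true when the caller is inside a jail
 */
static uint64_t toy_nullfs_flags(uint64_t requested, uint64_t lower,
                                 uint64_t current, bool is_update, bool jailed)
{
    uint64_t flags = requested;

    if (jailed) {
        /* Fix 1: a jailed nullfs mount inherits nosockbind from below. */
        flags |= lower & MNT_NOSOCKBIND;
        /* Fix 2: a jailed update may not clear an existing nosockbind. */
        if (is_update)
            flags |= current & MNT_NOSOCKBIND;
    }
    return flags;
}
```

Unjailed (host) mounts are left unchanged, so the host can still mount a
bindable nullfs over a non-bindable base partition as described above.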
