RFC: Jail Capsules
Date: Mon, 01 Sep 2025 03:27:02 UTC
Hi,
I've been toying around with this idea for a bit, and I wanted to solicit some opinions on the
design and whether this seems like something that FreeBSD would find useful outside of my
own tree.
Background: I've been thinking a lot lately about secure product design and threat models, and
wondering what kinds of things one could incorporate into their design as a defense in depth
kind of thing. This is, of course, not something I would pitch as a strong isolation mechanism,
but rather as a mechanism to protect against some less sophisticated threats.
The basic idea that I'm proposing is the ability to seal a jail to turn it into a 'capsule'. You
can either seal it at creation time, or while it's already running. If you create the jail as a
capsule, you must attach to it at the same time. Sealing it later is a compromise to give the
system some runway to configure the jail first, presumably before other user activity could
start and try to compromise the capsule before it's sealed.
Once sealed, the capsule has the following properties (that I've thought about, at least):
- The capsule may not be unsealed
- Processes outside of the capsule may not attach to it
- Unprivileged users in the parent cannot see or tamper with processes in the jail, regardless of
the security.bsd.see_* sysctls. persist and all of the allow.unprivileged_* jail knobs will be
forcibly unset, and any later attempt to set them will result in an error
- Privileged processes may see and signal the processes in a capsule if securelevel is <= 0, but they
cannot attach to, debug, or cpuset individual processes in a capsule at any securelevel
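
As a rough illustration of the rules above, here is a small Python model of the policy. It is not
kernel code and not a proposed interface: the names Op, Jail, and capsule_permits are made up for
this sketch, and denying see/signal to privileged processes when securelevel is > 0 is an
assumption drawn from one reading of the last bullet.

```python
from enum import Enum, auto


class Op(Enum):
    """Operations a process outside the capsule might attempt (hypothetical names)."""
    SEE = auto()
    SIGNAL = auto()
    ATTACH = auto()
    DEBUG = auto()
    CPUSET = auto()


class Jail:
    """Toy model of a jail that can be sealed into a capsule."""

    def __init__(self, **params):
        self.sealed = False
        self.params = dict(params)

    @staticmethod
    def _locked(name):
        # Parameters that a capsule forces off: persist and allow.unprivileged_*.
        return name == "persist" or name.startswith("allow.unprivileged_")

    def seal(self):
        # Sealing is one-way and forcibly unsets the locked parameters.
        self.sealed = True
        for name in self.params:
            if self._locked(name):
                self.params[name] = False

    def unseal(self):
        if self.sealed:
            raise PermissionError("a capsule may not be unsealed")

    def set_param(self, name, value):
        # Turning a locked parameter back on after sealing is an error.
        if self.sealed and value and self._locked(name):
            raise PermissionError(f"{name} cannot be set on a capsule")
        self.params[name] = value


def capsule_permits(op, privileged, securelevel):
    """May a process outside a sealed capsule perform op on a process inside it?"""
    # Attach, debug, and cpuset are denied from outside at any securelevel.
    if op in (Op.ATTACH, Op.DEBUG, Op.CPUSET):
        return False
    # Unprivileged users cannot see or signal, regardless of security.bsd.see_*.
    if not privileged:
        return False
    # Privileged see/signal only while securelevel <= 0 (assumed denied above that).
    return securelevel <= 0
```

With this model, a privileged process at securelevel 0 may signal a capsule process, while the
same request at securelevel 1, or any debug request, is refused.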
The premise of a capsule is that you (attempt to) seal off access points into the jail aside from a
well-defined (by the software in the capsule) security boundary. It is naturally not protected if
the kernel is compromised or in some other scenarios, but you eliminate a number of threats where an
attacker can manage to make syscalls but doesn't have the tools available to escalate further. Capsules
would simply be a building block to a larger secure design.
An obvious elephant in the room here is filesystem access. A capsule would force an attacker to get
a little more creative if they want to tamper with capsule processes, in particular if it's combined
with a heightened securelevel (or removal of other features like /dev/mem entirely), but it does not
stop an attacker from filesystem tampering to disrupt capsule activities. This leaves a huge
part of the capsule's protection up to application design, which arguably eliminates many benefits of the idea.
I don't really have a good answer for how one might solve that. The rest of the design is fairly
straightforward to implement, but I would rather suspect it might get hairy if you try to block off parts
of the filesystem (even from root, maybe contingent on securelevel) based on whether the path has been
used for a capsule or not.
Comments/questions/tomatoes welcome. The idea was somewhat inspired by enclaves and a design where one
can slice off some CPUs to dedicate to the capsule alone to try and mitigate some side-channel
possibilities from other user processes, but the initial capsule thought process doesn't go to the
extent of trying to carve out memory to dedicate to a capsule.
Thanks,
Kyle Evans