Re: curtain: WIP sandboxing mechanism with pledge()/unveil() support

From: Mathieu <sigsys_at_gmail.com>
Date: Fri, 01 Apr 2022 22:51:51 UTC
On 4/1/22 06:37, Poul-Henning Kamp wrote:
> --------
> David Chisnall writes:
>
>>> pledge()/unveil() are usually used for fairly well-disciplined
>>> applications that either don't run other programs or run very specific
>>> programs that are also well-disciplined and don't expect too much
>>> (unless you just drop the pledges on execve()).
>> The execve hole is the reason that I have little interest in pledge as
>> an enforcement mechanism.
> That (and the name) is why I have never seen it as an enforcement mechanism, but only as a special case of asserts:
>
> 	"I pledge that I'm not going to ... (until I tell you otherwise), fail me if I do".
>
> It is not obvious to me what role the "curtain" proposal is intended to play,
> or what role the originator of that proposal think pledge()/unveil() has ?


I'm not sure how I would define what pledge()/unveil() are on OpenBSD.  I
think that the OpenBSD developers don't like the words "containers" or
"sandboxes" for some reason (those are poorly defined terms anyway).  I
have more often seen them described as an "exploit-mitigation" mechanism.
I believe that
reducing the kernel attack surface was a big priority for them (while 
it's more of a secondary goal for me).  But as far as I can tell, the 
enforcement is solid.  How much of a "sandbox" it really is depends on 
the pledge promises and unveils that you use.  The "execve hole" is just 
there if you ask for it.
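
A rough model of the "execve hole" may help here.  The Python sketch
below assumes the rule documented in OpenBSD's pledge(2): execve() needs
the "exec" promise, and the new image starts with the execpromises if
they were set, otherwise with no pledge active.  The helper name and the
use of None for "unpledged" are made up for illustration, and the real
kernel kills a violating process rather than raising an error.

```python
from typing import FrozenSet, Optional

Promises = Optional[FrozenSet[str]]  # None models "no pledge active"


def promises_after_exec(promises: Promises, execpromises: Promises) -> Promises:
    """Model which promises the new image holds after execve().

    Hypothetical helper, not a real API: it only mirrors the rule that
    exec needs the "exec" promise and that execpromises (if set) become
    the promises of the new image.
    """
    if promises is not None and "exec" not in promises:
        # The real kernel kills the process here; an exception stands in.
        raise PermissionError("execve() without the 'exec' promise")
    return execpromises


# pledge("stdio exec", NULL): the child starts with no pledge (the "hole").
hole = promises_after_exec(frozenset({"stdio", "exec"}), None)

# pledge("stdio exec", "stdio"): the child stays restricted to "stdio".
closed = promises_after_exec(frozenset({"stdio", "exec"}), frozenset({"stdio"}))

print(hole, sorted(closed))
```

In this model the hole only opens when the caller leaves execpromises
unset, which is the sense in which it is there only if you ask for it.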

And it's a bit difficult to explain how curtain/pledge are different as 
mechanisms without enumerating all of the details that are different.  
But the differences added up enough that I thought it was better to give
it a different name and a different API rather than trying to mold pledge()
into something it wasn't designed to be.  And with a separate API I 
didn't have to worry about breaking pledge()/unveil() compatibility all 
the time.

The main problems I had with pledge()/unveil() for my goals were:

1) pledge() is capable of sandboxing unsuspecting programs but only to 
some extent.  There are a lot of little details that will make many 
programs fail.  With curtain(1) you can get much more of a real 
UNIX-like environment while (hopefully!) remaining isolated.  And what 
really helped with that was that on FreeBSD, kernel access checks were 
already modularized.  So it's much easier to expose more kernel 
functionality while maintaining isolation.

2) pledge() is nestable, but unveil() is not.  Once you "commit" your 
unveils, you can't ever use unveil() again (nor can any of the programs 
you execute).  And unveil is essential for sandboxing.  That's a problem 
if you, say, run a whole shell session sandboxed, and then try to 
sandbox something within it (or run something that tries to sandbox
itself automatically).  Or, say, if you wanted to sandbox firefox as a
whole on top of firefox using pledge()/unveil() internally (to get 
proper isolation in-between its own processes as well).  All of this 
works with my curtain module.

3) I wanted restrictions to be more configurable in userland (partly to 
help with testing applications).  I didn't want to have to modify the 
kernel every time I wanted to add permissions for a certain sysctl, 
ioctl, privilege, socket option, etc.  That's the main reason I used a 
completely different API.  Also, on OpenBSD, the paths allowed by 
certain promises (e.g. "/dev/stdout" for "stdio", "/etc/resolv.conf" for 
"dns", etc) are hardcoded in the kernel and their semantics are different
from those of user unveils.  I moved this to userland too, and there's a
unified method to unveil paths (which also helps fix problem 2).

4) unveil() doesn't reveal the directory "skeleton" above the paths that 
you unveil (you generally get ENOENT trying to stat() or list parent 
directories).  This makes a lot of programs unhappy (especially GUI 
ones).  With curtain(1) by default you get a filtered view of the FS 
instead (but the unveil(3) compatibility still behaves like on OpenBSD).
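
The difference in point 4 can be sketched as a small path-visibility
model.  This is only an illustration in Python, not how either kernel
implements it: the function name, the `skeleton` flag and the example
paths are assumptions, and the filtered view is reduced here to "the
directories leading to an unveiled path stay visible".

```python
from pathlib import PurePosixPath
from typing import Iterable


def is_visible(path: str, unveiled: Iterable[str], skeleton: bool) -> bool:
    """Return True if `path` can be seen given the unveiled paths.

    skeleton=False models the OpenBSD-like behaviour (parents of an
    unveiled path look like ENOENT); skeleton=True models a filtered
    view where those parent directories remain visible.
    """
    target = PurePosixPath(path)
    for entry in unveiled:
        root = PurePosixPath(entry)
        # Anything at or below an unveiled path is visible in both modes.
        if target == root or root in target.parents:
            return True
        # In skeleton mode, the directories leading to it are visible too.
        if skeleton and target in root.parents:
            return True
    return False


unveiled = ["/home/user/project"]
for p in ["/home/user/project/src", "/home/user", "/home/user/.ssh"]:
    print(p, is_visible(p, unveiled, skeleton=False),
          is_visible(p, unveiled, skeleton=True))
```

In the model, "/home/user" is hidden without the skeleton and visible
with it, while the sibling "/home/user/.ssh" stays hidden either way.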

To run unsuspecting programs, you'd use the curtain(1) utility.  And 
there's a configuration system to manage permissions.  This does not use 
pledge() at all anywhere, it uses a new curtain(3) API (and the 
underlying curtainctl(2) syscall).

I think the nearest analogy to curtain(1) is "firejail" on Linux, not 
pledge().


> What is the level of ambition and the use-cases here ?
>

To easily run EVERYTHING sandboxed.  Realistically it won't be able to.  
But it's the goal.  If a program doesn't work, then why not?  It should.

For desktop apps, the main problems are programs that ignore $TMPDIR,
dbus/dconf, and untrusted X11.  Also, many KDE apps need their own
separate XDG directories to work, for now.

But browsers work, audio players work, gimp works, qbittorrent works, 
libreoffice works if you give it access to /tmp.

Sandboxing whole shell sessions works.  You can build/run untrusted
programs and confine the whole thing to the current directory (but you 
need to be very careful with the way you run git in it for example).  
These aren't things that you do with pledge(), but it's what curtain(1) 
was designed to do and it tries to make it convenient.

There are other use cases, like something I'd call "last-ditch" 
sandboxing for server programs that need to run as root (say to 
authenticate users and switch credentials).  It's possible to run samba 
for example with read-only access to the needed system files, read-write 
access to its own state/logs directories and restrict its access to 
specific data directories you want to share.  I believe that this can 
maintain system integrity if samba is compromised.  This becomes a bit 
like "traditional MAC" with integrity labels, but it can be set up on
an application-by-application basis rather than for the whole system.  It
might not be worth it if the most important things on the server to begin
with are those data directories, but at the very least it could make
forensics easier if there's a breach.  There are examples of this in the 
sample config file.