Re: curtain: WIP sandboxing mechanism with pledge()/unveil() support
- In reply to: Poul-Henning Kamp: "Re: curtain: WIP sandboxing mechanism with pledge()/unveil() support"
Date: Fri, 01 Apr 2022 22:51:51 UTC
On 4/1/22 06:37, Poul-Henning Kamp wrote:
> --------
> David Chisnall writes:
>
>>> pledge()/unveil() are usually used for fairly well-disciplined
>>> applications that either don't run other programs or run very specific
>>> programs that are also well-disciplined and don't expect too much
>>> (unless you just drop the pledges on execve()).
>> The execve hole is the reason that I have little interest in pledge as
>> an enforcement mechanism.
> That (and the name) is why I have never seen it as an enforcement
> mechanism, but only as a special case of asserts:
>
> "I pledge that I'm not going to ... (until I tell you otherwise), fail me if I do".
>
> It is not obvious to me what role the "curtain" proposal is intended to play,
> or what role the originator of that proposal think pledge()/unveil() has ?

Not sure how I would define what pledge()/unveil() are on OpenBSD. I
think that they don't like the words "containers" or "sandboxes" for
some reason (those are poorly defined terms anyway). I have more often
seen it described as an "exploit-mitigation" mechanism. I believe that
reducing the kernel attack surface was a big priority for them (while
it's more of a secondary goal for me).

But as far as I can tell, the enforcement is solid. How much of a
"sandbox" it really is depends on the pledge promises and unveils that
you use. The "execve hole" is just there if you ask for it.

And it's a bit difficult to explain how curtain/pledge are different as
mechanisms without enumerating all of the details that are different.
But they added up enough that I thought it was better to give it a
different name and a different API rather than trying to mold pledge()
into something it wasn't designed to be. And with a separate API I
didn't have to worry about breaking pledge()/unveil() compatibility all
the time.

The main problems I had with pledge()/unveil() for my goals were:

1) pledge() is capable of sandboxing unsuspecting programs but only to some extent.
There are a lot of little details that will make many programs fail.
With curtain(1) you can get much more of a real UNIX-like environment
while (hopefully!) remaining isolated. And what really helped with that
was that on FreeBSD, kernel access checks were already modularized. So
it's much easier to expose more kernel functionality while maintaining
isolation.

2) pledge() is nestable, but unveil() is not. Once you "commit" your
unveils, you can't ever use unveil() again (nor can any of the programs
you execute). And unveil is essential for sandboxing. That's a problem
if you, say, run a whole shell session sandboxed, and then try to
sandbox something within it (or run something that tries to sandbox
itself automatically). Or say if you wanted to sandbox firefox as a
whole on top of firefox using pledge()/unveil() internally (to get
proper isolation in-between its own processes as well). All of this
works with my curtain module.

3) I wanted restrictions to be more configurable in userland (partly to
help with testing applications). I didn't want to have to modify the
kernel every time I wanted to add permissions for a certain sysctl,
ioctl, privilege, socket option, etc. That's the main reason I used a
completely different API. Also, on OpenBSD, the paths allowed by certain
promises (e.g. "/dev/stdout" for "stdio", "/etc/resolv.conf" for "dns",
etc) are hardcoded in the kernel and the semantics are different from
those of user unveils. I moved this to userland too and there's a
unified method to unveil paths (which also helps fix problem 2).

4) unveil() doesn't reveal the directory "skeleton" above the paths that
you unveil (you generally get ENOENT trying to stat() or list parent
directories). This makes a lot of programs unhappy (especially GUI
ones). With curtain(1) by default you get a filtered view of the FS
instead (but the unveil(3) compatibility still behaves like on OpenBSD).

To run unsuspecting programs, you'd use the curtain(1) utility.
And there's a configuration system to manage permissions. This does not
use pledge() at all anywhere, it uses a new curtain(3) API (and the
underlying curtainctl(2) syscall). I think the nearest analogy to
curtain(1) is "firejail" on Linux, not pledge().

> What is the level of ambition and the use-cases here ?
>

To easily run EVERYTHING sandboxed. Realistically it won't be able to.
But it's the goal. If a program doesn't work, then why not? It should.

For desktop apps, the main problems are programs that ignore $TMPDIR,
dbus/dconf and untrusted X11 problems. And many KDE apps need their own
separate XDG directories to work for now. But browsers work, audio
players work, gimp works, qbittorrent works, libreoffice works if you
give it access to /tmp.

Sandboxing whole shell sessions works. You can build/run untrusted
programs and confine the whole thing to the current directory (but you
need to be very careful with the way you run git in it for example).

These aren't things that you do with pledge(), but it's what curtain(1)
was designed to do and it tries to make it convenient.

There are other use cases, like something I'd call "last-ditch"
sandboxing for server programs that need to run as root (say to
authenticate users and switch credentials). It's possible to run samba
for example with read-only access to the needed system files, read-write
access to its own state/logs directories and restrict its access to
specific data directories you want to share. I believe that this can
maintain system integrity if samba is compromised. This becomes a bit
like "traditional MAC" with the integrity labels, but it can be set up
on an application-by-application basis rather than for the whole system.
It might not be worth it if the most important thing on the server to
begin with is those data directories, but at the very least it could
make forensics easier if there's a breach. There are examples of this in
the sample config file.
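
To illustrate point 4 above, here is a toy Python model of the two behaviours: with a strict view the ancestors of an unveiled path are hidden, while a filtered view keeps them visible as a bare "skeleton". This is only a sketch of the idea as described in this message, not how curtain or OpenBSD implement it. The function names classify() and visible_entries() are made up for the example.

```python
from pathlib import PurePosixPath


def classify(path, unveiled):
    """Classify an absolute path against a list of unveiled paths.

    Returns "unveiled" if the path is an unveiled path or lies below one,
    "skeleton" if it is only an ancestor of an unveiled path, and
    "hidden" otherwise. (Hypothetical helper, for illustration only.)
    """
    p = PurePosixPath(path)
    roots = [PurePosixPath(u) for u in unveiled]
    if any(p == r or r in p.parents for r in roots):
        return "unveiled"
    if any(p in r.parents for r in roots):
        return "skeleton"
    return "hidden"


def visible_entries(directory, names, unveiled, skeleton=True):
    """Filter a directory listing.

    With skeleton=True, entries that lead towards an unveiled path stay
    visible. With skeleton=False only unveiled entries are kept, which
    models the stricter behaviour described in point 4.
    """
    keep = {"unveiled", "skeleton"} if skeleton else {"unveiled"}
    base = PurePosixPath(directory)
    return [n for n in names if classify(base / n, unveiled) in keep]
```

For example, with only /home/user/project unveiled, visible_entries("/home", ["user", "other"], ["/home/user/project"]) keeps "user" and drops "other", whereas passing skeleton=False drops both.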