Re: Capsicum revocable (proxy) file descriptors

From: David Chisnall <theraven_at_FreeBSD.org>
Date: Tue, 07 Oct 2025 17:16:40 UTC
On 7 Oct 2025, at 16:25, Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com> wrote:
> 
> I was wondering what design choices other developers would have when
> designing a new file descriptor type for access revocation purposes in
> a capability system.
> 
> The standard practice to revoke capabilities is to create a new
> capability in a domain the user has control over and can revoke at any
> later time[1]. For Capsicum, we can't quite do that.
> 
> If a new file descriptor type were to be designed just to forward
> requests (which the creator could revoke later), what design concerns
> should be taken into consideration?

The main thing is how you handle this in the case of mutual distrust.  If someone gives you a file descriptor that they can revoke, how do you write defensive code that doesn’t have exciting bugs from atomicity violations and the like?

A file descriptor that is a proxy to another file descriptor, with an ioctl that deletes the proxied thing, seems moderately easy.  Ideally it would at least have another ioctl that reports whether the descriptor is revocable.

The biggest thing to be careful of is that dup on the proxy gives a file descriptor that is also revoked at the same time.
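
As a rough illustration of that dup requirement, here is a minimal userland model in C.  Everything in it (struct proxy, proxy_create, proxy_dup, proxy_revoke, proxy_write, proxy_close) is a hypothetical name invented for this sketch; it is not a FreeBSD or Capsicum interface and says nothing about how a kernel implementation would look.  The only point is that a dup'd handle shares the revocation state rather than copying it.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical userland model of a revocable proxy; not a kernel API. */
struct proxy_state {
    int target_fd;  /* the real descriptor being proxied */
    int revoked;    /* set once by the creator, seen by every handle */
    int refs;       /* number of handles sharing this state */
};

struct proxy {
    struct proxy_state *state;
};

struct proxy *proxy_create(int target_fd)
{
    struct proxy *p = malloc(sizeof(*p));
    struct proxy_state *s = malloc(sizeof(*s));

    if (p == NULL || s == NULL) {
        free(p);
        free(s);
        return NULL;
    }
    s->target_fd = target_fd;
    s->revoked = 0;
    s->refs = 1;
    p->state = s;
    return p;
}

/* A dup'd handle points at the same state, so one revoke covers both. */
struct proxy *proxy_dup(struct proxy *p)
{
    struct proxy *q = malloc(sizeof(*q));

    if (q == NULL)
        return NULL;
    q->state = p->state;
    q->state->refs++;
    return q;
}

void proxy_revoke(struct proxy *p)
{
    p->state->revoked = 1;
}

/* Forward the write unless the shared state has been revoked. */
ssize_t proxy_write(struct proxy *p, const void *buf, size_t len)
{
    if (p->state->revoked) {
        errno = EBADF;
        return -1;
    }
    return write(p->state->target_fd, buf, len);
}

void proxy_close(struct proxy *p)
{
    if (--p->state->refs == 0)
        free(p->state);
    free(p);
}
```

In this model, calling proxy_revoke on the original handle makes proxy_write fail with EBADF on every handle obtained through proxy_dup as well.  The model is single-threaded, so it does not address the atomicity concerns raised above.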

But even then, being able to do things like execve a setuid binary and then close its stdout *after* it has checked that stdout exists makes me nervous.  I’m not sure if it’s useful to an attacker, but it’s a change in existing behaviour and so would need some analysis.  I’d want some experienced exploit developers to weigh in with ‘I can’t think of a way that I’d use that primitive’.

There’s also the question of what security problem it would solve.  If I’ve given someone read access to a file via a descriptor, I have to assume that they’ve read all of it before I revoke it.  If I’ve given someone write access, I have to assume that they have arbitrarily modified it before I revoke it.  If I want to ensure that no further changes of theirs are committed, the following are roughly equivalent:

 - Revoke their fd.
 - Create a CoW clone of the file and rename the clone over the original.

The second one has a couple of benefits:

 - It’s built out of generally useful components (Apple added APIs to CoW clone files and file trees because it’s useful for backup tools, for example).
 - It doesn’t make their program behave unexpectedly (they can still write, it’s just that the writes don’t persist).
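
The clone-and-rename option can be sketched in portable C as follows.  This is only an approximation under stated assumptions: a plain read/write copy stands in for a real CoW clone (which is platform-specific), replace_with_copy and its tmp_path parameter are hypothetical names, tmp_path must be on the same filesystem as path, and the sketch does not preserve ownership or permissions.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Replace `path` with a fresh copy of itself.  The copy loop is a stand-in
 * for a CoW clone; a real implementation would use a platform clone call.
 * Returns 0 on success, -1 on failure.
 */
int replace_with_copy(const char *path, const char *tmp_path)
{
    char buf[4096];
    ssize_t n;
    int in, out;

    in = open(path, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        /* A short write is treated as an error to keep the sketch small. */
        if (write(out, buf, (size_t)n) != n) {
            n = -1;
            break;
        }
    }
    close(in);
    if (close(out) != 0)
        n = -1;
    /* rename() atomically points the name at the new inode. */
    if (n < 0 || rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

After the rename, a descriptor that was opened before the call still refers to the old, now unlinked, inode.  Writes through it keep succeeding but are not visible under the original name.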

I think it would be more useful to have a protocol where the untrusted party agrees to close a file descriptor and then a mechanism for validating that they have.  This has the advantage of working in both asymmetric and mutual distrust domains.

The mechanism for that is mandatory locking: you agree to close the file descriptor, I try to acquire an exclusive read-write lock.  If it succeeds, you don’t have an open fd to the resource; if it fails, you (or someone else) do.  Similarly, if I acquire a read lock, I know that you’ve closed the write file descriptor.  Currently, unless it was added recently while I wasn’t looking, FreeBSD has only advisory locks and so you can’t do this.  And that’s a shame.
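
The shape of that check can be sketched with flock(2), with one large caveat: flock locks are advisory, so this only works if the peer cooperates by holding a shared lock for as long as its descriptor is open.  That cooperation is exactly what the mandatory locking described above would make unnecessary.  The helper name peer_has_released is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Try to take an exclusive lock without blocking.
 * Returns 1 if the lock was acquired (no cooperating holder remains),
 * 0 if someone else still holds a lock, and -1 on any other error.
 * On success the exclusive lock stays held on `fd`.
 */
int peer_has_released(int fd)
{
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        return 1;
    return (errno == EWOULDBLOCK) ? 0 : -1;
}
```

A flock lock belongs to the open file description, so it is only released once every dup of the peer's descriptor has been closed.  A peer that never takes the shared lock is invisible to this check, which is why it is not a substitute for mandatory locking.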

Mandatory locking is a generally useful mechanism, rather than something special cased for a particular Capsicum use case.

Note: On CHERIoT RTOS, we lean into capability models far more than FreeBSD and we don’t have a mechanism for revoking arbitrary capabilities in the presence of a peer that wants to retain them.  We do have a notion of lexically-scoped delegation, where you can hand a peer a capability for the duration of a call.  If FreeBSD had something like io_uring (which has a per-ring namespace for file descriptors as well as the process-global one) then you could imagine doing something similar on top of some underlying IPC mechanism used for RPC.  A UNIX domain socket would send you one or more FDs that enter reserved slots in that ring’s namespace and remain available until you send the response, at which point they’re implicitly closed (and cannot be dup’d in the meantime).

David