In general, in UNIX, "unlink" is a namespace operation relative to a directory, and not an operation on a file, so I wouldn't expect to have a system call that searches a directory looking for a matching file, but rather always a call that specifies the specific segment to remove (as there may well be more than one of them).
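
To make the "namespace operation" point concrete, here is a minimal sketch using unlinkat(2) as specified by POSIX.1-2008. The helper names remove_name() and remaining_links() are hypothetical, purely for illustration. Removing one name only drops one directory entry; the object stays reachable through any other name or open descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove one name from the directory referred to
 * by dirfd. Returns 0 on success, -1 with errno set on failure.
 */
int
remove_name(int dirfd, const char *name)
{
	assert(name != NULL);
	/* flags == 0: remove a non-directory entry, as unlink(2) would. */
	return (unlinkat(dirfd, name, 0));
}

/*
 * Hypothetical helper: report how many names still refer to the object
 * behind fd, or -1 on error.
 */
long
remaining_links(int fd)
{
	struct stat sb;

	if (fstat(fd, &sb) == -1)
		return (-1);
	return ((long)sb.st_nlink);
}
```

With a file that has two hard links in the same directory, calling remove_name() on one of them should leave remaining_links() at 1, which is why the caller always has to say which name is meant.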

It seems to me like there are a few different use cases:

(1) Just want some temporary non-persistent file-like storage please. Here, swap-backed anonymous objects are probably generally preferable, although if they will be huge, perhaps a filesystem is a better place to back them.

(2) Want a temporary (non-persistent) hierarchical namespace full of file-like things. This need is not well served: not only must you create the namespace within an existing filesystem, but garbage collection of the results is also unreliable in the presence of crashes, etc.

(3) Want capability-based access to a persistent hierarchical namespace full of files. This is well served by the current *at(2) system calls along with filesystems, although there are API gaps (e.g., a lack of unlinkat(2) in FreeBSD).

Because of the complexity of (2), a Casper service is likely the way to go. We should fill the API gaps in (3) through new POSIX-like *at(2) calls. For (1), the real question is whether the current swap-backed APIs are insufficient, in which case a Casper service might be the way to go.
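
For use case (1), a rough sketch of what "temporary non-persistent file-like storage" can look like today. The helper name anon_object_create() is hypothetical. It assumes FreeBSD's shm_open(2) with SHM_ANON where that is defined, and falls back to Linux memfd_create(2) otherwise; the "scratch" label and the 0600 mode are arbitrary choices.

```c
#define _GNU_SOURCE		/* for memfd_create(2) on Linux */
#include <sys/types.h>
#include <sys/mman.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a descriptor for an unnamed,
 * non-persistent object of `size` bytes, or -1 on error.
 */
int
anon_object_create(off_t size)
{
	int fd;

	assert(size >= 0);
#if defined(SHM_ANON)
	/* FreeBSD: anonymous POSIX shared memory object. */
	fd = shm_open(SHM_ANON, O_RDWR, 0600);
#elif defined(__linux__)
	/* Linux: the name is only a debugging label, not a path. */
	fd = memfd_create("scratch", MFD_CLOEXEC);
#else
	errno = ENOSYS;
	fd = -1;
#endif
	if (fd == -1)
		return (-1);
	/* Both kinds of object start empty; give it the requested size. */
	if (ftruncate(fd, size) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```

The returned descriptor can be read, written and mapped like a file, and there is nothing to unlink afterwards: closing the last descriptor is the whole cleanup.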


> Linux has an unlinkat() system call (https://linux.die.net/man/2/unlinkat) but it doesn't seem to have a flag that lets you unlink the fd itself.
> Possibly pathname == NULL and AT_EMPTY_PATH could mean unlink the fd but I haven't tried whether that works.
> It also has an AT_REMOVEDIR flag to make it function as rmdirat().
> FWIW, this is part of why we introduced anonymous POSIX shared memory objects with Capsicum in FreeBSD -- we allow shm_open(2) to be passed a SHM_ANON special name, which causes the creation of a swap-backed, mappable file-like object that can have I/O, memory mapping, etc., performed on it, but never has any persistent state across reboots even in the event of a crash.
> With Capsicum you can then refine a file descriptor to the otherwise writable object to be read-only for the purposes of delegation. There is not, however, a mechanism to "freeze" the state of the object, causing other outstanding writable descriptors to become read-only -- certainly something could be added, but some care regarding VM semantics would be required -- in particular, so that faults could not be experienced as a result of a memory store performed before the "freeze" but issued to VFS only later.
> I certainly have no objection to an unlinkat(2) system call -- it's unfortunate that a full suite of the *at(2) APIs wasn't introduced in the first place. It would be worth checking whether anyone else (e.g., Solaris, Mac OS X, Linux) has already added an unlinkat(2) whose API semantics we can match. I think I take the view that for truly anonymous objects, shm_open(2) without a name (or the Linux equivalent) is the right thing -- and hence unlinkat(2) is for more conventional use cases where the final pathname element is known.
> On directories: There, I find myself falling back on a Casper-like service, since GC'ing a single anonymous memory object is straightforward, but GC'ing a directory hierarchy is a messier business.
> > I think it would make sense to have an unlinkfd() that unlinks the file from
> > everywhere, so it does not need a name to be specified. This might be
> > hard to implement.
> >
> > For temporary files, I really like Linux memfd_create(2), which opens an
> > anonymous file without a name. These semantics are really useful. (Linux
> > memfd also has additional options for sealing the file to make it
> > immutable, which are very useful for safely passing files between
> > processes.) Having a way to make unnamed temporary files solves a lot of
> > deletion issues, as the file never needs to be unlinked.
> >
> >> Today I would like to propose a new syscall called unlinkfd(2) which came up
> >> during a discussion with Ed Maste.
> >>
> >> Currently in UNIX we can’t remove files safely. If we try to do so, we
> >> always end up with a race condition. For example, when we open a file and
> >> check it with fstat, etc., and then want to unlink(2) it, the file we are
> >> about to unlink could be a different one from the one we just checked.
> >>
> >> Another reason for implementing unlinkfd(2) came to us when we were trying
> >> to sandbox some applications like uudecode/b64decode or bspatch. It occurred
> >> to us that we don’t have a good way of removing single files. Of course we can
> >> try to determine which directory we are in, and then open this directory and
> >> remove a single file.
> >>
> >> It looks even more bizarre if we think about a program which operates on
> >> multiple files. If we consider a situation with two totally different
> >> directories like `/tmp` and `/home/oshogbo`, we would end up pre-opening
> >> a root directory or keeping open as many directories as we are working in.
> >> All of that effort only to remove two files. This makes it totally impractical!
> >>
> >> I think that opening directories also presents a wider attack surface, because
> >> we are keeping a descriptor to a whole directory only to remove one file.
> >> Unfortunately, this means that an attacker can remove all files in that directory.
> >>
> >> I proposed this as well on the last Capsicum call. There was a suggestion that
> >> instead of adding a single syscall, maybe we should have a Casper service that
> >> will allow us to remove files. Another idea was that we should perhaps redesign
> >> programs to create some subdirs, work on the subdirs, and then remove all files
> >> in those subdirs. I don’t feel that creating a Casper service is a good idea,
> >> because we still have exactly the same race condition. In my opinion, creating
> >> subdirs is also a problem for us.
> >>
> >> First, we would need to redesign some of our tools, and I think we should
> >> simplify the process of capsicumization instead of making it harder.
> >>
> >> Second, we can create a temporary subdirectory, but what will remove it?
> >> We are back to having a fd to the directory in which we just created the subdir.
> >> Another way would be to have a Casper service which would remove the directory,
> >> but with the risk of a race condition.
> >>
> >> In conclusion, I think we need a syscall like unlinkfd(2), which turns out to
> >> be easy to implement. The only downside of this implementation is that we
> >> need to provide not only a fd but also a file path. This is because neither
> >> inodes nor vnodes contain filenames. We compare the vnodes of the fd and the
> >> given path, and if they are exactly the same we remove the file. In the syscall
> >> we are using a fd, so there is no ambient authority, because we are proving
> >> that we already have access to that file. Thanks to that, the syscall can be
> >> safely used with Capsicum. I have already discussed this with some people and
> >> they said `Hey I already had that idea a while ago…` so let’s do something
> >> with that idea! If you are interested in the patch, you can find it here:
> >> https://reviews.freebsd.org/D14567
> >>
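
For reference, the comparison described in the original proposal (same object behind the descriptor and the path) can only be approximated from userspace. A sketch, with a hypothetical helper name unlink_if_same() and an arbitrary choice of ESTALE for the mismatch case; this is not the code from the patch under review:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: unlink `name` in `dirfd` only if it currently
 * refers to the same object as `fd`. Returns 0 on success, -1 on error;
 * errno is ESTALE (an arbitrary choice) when the name refers to
 * something else.
 */
int
unlink_if_same(int dirfd, const char *name, int fd)
{
	struct stat by_fd, by_name;

	assert(name != NULL);
	if (fstat(fd, &by_fd) == -1)
		return (-1);
	/* Do not follow a symlink in the final path component. */
	if (fstatat(dirfd, name, &by_name, AT_SYMLINK_NOFOLLOW) == -1)
		return (-1);
	if (by_fd.st_dev != by_name.st_dev ||
	    by_fd.st_ino != by_name.st_ino) {
		errno = ESTALE;
		return (-1);
	}
	/* Race window: the name can still be rebound before this call. */
	return (unlinkat(dirfd, name, 0));
}
```

The window between fstatat() and unlinkat() is exactly the race the proposal complains about, which is the argument for doing the comparison and the removal together in the kernel.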
