svn commit: r341689 - in head: lib/libc/sys sys/compat/freebsd32 sys/kern sys/sys

John Baldwin jhb at FreeBSD.org
Fri Dec 7 23:31:11 UTC 2018


On 12/7/18 3:13 PM, Konstantin Belousov wrote:
> On Fri, Dec 07, 2018 at 10:04:51AM -0800, John Baldwin wrote:
>> On 12/7/18 9:47 AM, Konstantin Belousov wrote:
>>> On Fri, Dec 07, 2018 at 09:21:34AM -0800, John Baldwin wrote:
>>>> On 12/7/18 7:17 AM, Konstantin Belousov wrote:
>>>>> Author: kib
>>>>> Date: Fri Dec  7 15:17:29 2018
>>>>> New Revision: 341689
>>>>> URL: https://svnweb.freebsd.org/changeset/base/341689
>>>>>
>>>>> Log:
>>>>>   Add new file handle system calls.
>>>>>   
>>>>>   Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2).  The
>>>>>   syscalls are provided for a NFS userspace server (nfs-ganesha).
>>>>>   
>>>>>   Submitted by:	Jack Halford <jack at gandi.net>
>>>>>   Sponsored by:	Gandi.net
>>>>>   Tested by:	pho
>>>>>   Feedback from:	brooks, markj
>>>>>   MFC after:	1 week
>>>>>   Differential revision:	https://reviews.freebsd.org/D18359
>>>>
>>>> Can this be used to implement 'flink' (create a link to an open file
>>>> descriptor)?  Hmm, it appears so.  It is limited to PRIV_VFS_GETFH at least.
>>>> The getfh(2) manpage notes this explicitly, but the new manpages don't
>>>> appear to.  Even with the PRIV check, I'm still somewhat nervous about what
>>>> flink means for processes running as root that are using Capsicum.  Maybe
>>>> it's ok, but I didn't see any discussion of this in the review.
>>>
>>> If the process can execute getfh(2) and fhlink(2), then its privileges
>>> are not much different from the privileges of the in-kernel NFS server.
>>> During the review, I verified that PRIV_VFS_GETFH is checked, and considered
>>> suggesting finer-grained individual privs for other operations, but decided
>>> that this is not too useful.
>>>
>>> That said, how can you translate from a file descriptor to a file
>>> handle? E.g. to know (and not guess) the file handle on UFS,
>>> the process must possess PRIV_VFS_GENERATION. If you have
>>> PRIV_VFS_GETFH/PRIV_VFS_GENERATION privs, then you can implement
>>> flink(2) for UFS, but that might not even be undesirable.
>>
>> My understanding of the normal reason against flink is that you can make
>> unlinked files readable by other processes (though something like
>> /proc/<pid>/fd/<n> already permits this), and in particular if a more
>> privileged process passes an fd to a less privileged process.  The
>> requirement for root mostly mitigates this when root vs not-root is your
>> only privilege.  However, a capsicum vs non-capsicum process is a more
>> recent privilege that is orthogonal to root vs non-root.  It might be that
>> allowing a capsicumized root to create links to files that were intentionally
>> unlinked by a non-capsicumized root would be the same problem.
>>
>> In practice on the majority of FreeBSD systems, root has all of the PRIV_foo
>> things.  You have to write custom MAC modules to actually limit root.  Thus
>> it would seem that we should now be happy to add flink() so long as it
>> requires root.
> 
> Do you think that flink(2) itself is useful?

I'm not sure.  The motivating use case in some of the stuff I found online
today was to permit one to construct a file with an initial set of contents
such that other processes couldn't open the file until it was fully ready,
so you created an unlinked file (Linux has an O_TMPFILE flag for this, I think)
and only later hooked it up in the filesystem.  (On Linux you can apparently
use linkat with either /dev/fd/<n> or the /proc equivalent as the source to do
this.)  I'm not sure it is otherwise useful.  This particular use case also
seems like a kludge to work around the advisory file locking in POSIX, and you
could also accomplish this by just doing a rename() from a temporary name to
the final name instead.
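
For illustration, the rename() alternative could be sketched roughly as
follows.  This is only a minimal sketch, not anything from the commit:
publish_file() is a hypothetical helper name, and placing the temporary file
next to the final path (so that rename() stays within one filesystem) is an
assumption of the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to a temporary file created next to
 * final_path, then rename() it into place so that other processes never
 * see a partially written file under the final name.
 * Returns 0 on success, -1 on error.
 */
int
publish_file(const char *final_path, const void *data, size_t len)
{
	char tmp_path[1024];
	const char *p = data;
	size_t left = len;
	int fd, n;

	assert(final_path != NULL && data != NULL);

	/* Temporary name in the same directory as the final name. */
	n = snprintf(tmp_path, sizeof(tmp_path), "%s.XXXXXX", final_path);
	if (n < 0 || (size_t)n >= sizeof(tmp_path))
		return (-1);

	fd = mkstemp(tmp_path);
	if (fd < 0)
		return (-1);

	/* Write the full contents, handling short writes. */
	while (left > 0) {
		ssize_t w = write(fd, p, left);
		if (w < 0)
			goto fail;
		p += w;
		left -= (size_t)w;
	}

	/* Flush the data to stable storage before the name becomes visible. */
	if (fsync(fd) != 0)
		goto fail;
	if (close(fd) != 0) {
		unlink(tmp_path);
		return (-1);
	}

	/* Replace (or create) the final name in one step. */
	if (rename(tmp_path, final_path) != 0) {
		unlink(tmp_path);
		return (-1);
	}
	return (0);

fail:
	close(fd);
	unlink(tmp_path);
	return (-1);
}
```

Unlike the unlinked-file approach, the temporary name is briefly visible in
the directory, but readers opening the final name only ever see the complete
contents.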

A use case I had in the past was a helper application that wanted to avoid
races with trying to execute binaries over NFS, so it would copy the binary
to local disk and fork a child to call exec.  Once the child exec'd, it would
unlink the binary so that copies didn't leak on the local filesystem.  However,
it would also watch the child process, and if the child crashed instead
of exiting cleanly, it would make a new copy of the binary (by reading from
the original and writing it out to a new file) so that there was a matching
binary for the core that could be used with a debugger.  In theory flink()
would have been more efficient than making a copy of the file.  OTOH, that
helper was also running as an unprivileged user and so would have needed
flink() for a non-root user, so if the previous security concerns are still
valid I think you probably don't want non-root using flink().
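
As a rough sketch of the "watch the child" part of that helper (not the
original code): child_crashed() below is a hypothetical name, and it treats
"killed by a signal" as a crash.  Copying the binary to local disk, unlinking
it after exec, and re-copying it after a crash are all left out.

```c
#include <sys/types.h>
#include <sys/wait.h>

#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] in a child process and report how it
 * ended.  Returns 1 if the child was killed by a signal (treated as a
 * crash here), 0 if it exited normally, and -1 on error.
 */
int
child_crashed(char *const argv[])
{
	pid_t pid;
	int status;

	assert(argv != NULL && argv[0] != NULL);

	pid = fork();
	if (pid < 0)
		return (-1);
	if (pid == 0) {
		execv(argv[0], argv);
		_exit(127);	/* Only reached if exec failed. */
	}

	/* Wait for the child, retrying if interrupted by a signal. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return (-1);
	}

	return (WIFSIGNALED(status) ? 1 : 0);
}
```

A caller would make the fresh copy of the binary only when this returns 1,
which is the step where flink() could in theory have replaced the copy.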

-- 
John Baldwin

                                                                            

