current fd allocation idiom

Fri Jul 18 13:06:37 UTC 2014

On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote:
> The kernel has to deal with concurrent attempts to open, close, dup2 and
> use the same file descriptor.
> 
> I start with stating a rather esoteric bug affecting this area, then I
> follow with a short overview of what is happening in general and a
> proposal how to change to get rid of the bug and get some additional
> enchancements. Interestingly enough turns out Linux is doing pretty much
> the same thing.
> 
> ============================
> THE BUG:
> /*
>  * Initialize the file pointer with the specified properties.
>  *
>  * The ops are set with release semantics to be certain that the flags, type,
>  * and data are visible when ops is.  This is to prevent ops methods from being
>  * called with bad data.
>  */
> void
> finit(struct file *fp, u_int flag, short type, void *data, struct fileops *ops)
> {
>         fp->f_data = data;
>         fp->f_flag = flag;
>         fp->f_type = type;
>         atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t)ops);
> }
> 
> This itself is fine, but it assumes all code obtaining fp from fdtable places
> a read memory barrier after reading f_ops and before reading anything else.
> As you could guess no code does that and I don't believe placing rmb's
> in several new places is the way to go.
I think your analysis is correct for all cases except kern_openat().

For kern_openat(), we install a file into fdtable only after the struct
file is fully initialized, see kern_openat(). The file is allocated
with falloc_noinstall(), then file is initialized, then finstall() does
FILEDESC_LOCK/UNLOCK, which ensures the full barrier. Other file
accessors do fget_unlocked(), which does acquire (of fp->f_count, but
this does not matter).

The only critical constraint is that other accessors must not see
f_ops != badfileops while struct file is not fully visible.  IMO,
the full barrier in finstall() and acquire in fget*() guarantee
that we see badfileops until writes to other members are visible.

For falloc(), indeed the write to f_ops could become visible too early
(but not on x86).

> 
> ============================
> GENERAL OVERVIEW OF CURRENT STATE:
> 
> fps are obtained and installed as follows:
> 
> struct file *fp;
> int fd;
> 
> if (error = falloc(&fp, &fd))
> 	return (error);
> if (error = something(....)) {
> 	fdclose(fp, fd);
> 	fdrop(fp);
> 	return (error);
> }
> 
> finit(fp, ....);
> fdrop(fp);
> return (0);
> 
> After falloc fp is installed in fdtable, it has refcount 2 and ops set to
> 'badfileops'.
> 
> if something() failed:
> fdclose() checks if it has anything to do. if so, it cleares fd and fdrops fp
> fdrop() clears the second reference, everything is cleared up
> 
> if something() succeeded:
> finit() finishes initialization of fp
> fdrop() cleares the second reference. fp now has expected refcount of 1.
> 
> Now a little complication:
> parallel close() execution:
> fd is recognizes as used. it is cleared and fdrop(fp) is called.
> 
> if something() succeeded after close:
> fdrop() kills fp
> 
> if something() failed after close:
> fdclose() concludes nothing to do
> fdrop() kill fp
> 
> Same story with dup2.
> 
> What readers need to do:
> - rmb() after reading fp_ops
> - check fp_ops for badfileops
How can readers see badfileops ?

> 
> ============================
> PROPOSAL:
> 
> struct file *fp;
> int fd;
> 
> if (error = falloc(&fp, &fd))
Use falloc_noinstall() there.

> 	return (error);
> if (error = something(....)) {
> 	fdclose(fp, fd);
> 	return (error);
> }
> 
> finit(fp, ....);
> factivate(fd, fp);
This function is spelled finstall().

It seems that all what is needed is conversion of places using
falloc() to falloc_noinstall()/finstall().

> return (0);
> 
> After falloc fd is only marked as used, fp is NOT installed.
> fp is returned with refcount of 1 and is invisible to anyone but
> curthread.
> 
> if something() failed:
> fdclose() marks fd as unused and kills fp
> 
> if something() succeeded:
> finit() finishes initialization of fp
> factivate() sets fp to non-null with a barrier
> 
> Now a little complication:
> parallel close() execution:
> since fp is null, fd is recognized as unused. EBADF is returned.
> 
> The only problem is with dup2 and I believe it is actually a step
> forward.
> 
> Let's assume fd was marked as used by falloc, but fp was not installed yet.
> dup2(n, fd) will see that fd is used. With current code there is no
> problem since there is fp to fdrop and it can proceed. With the proposal
> however, there is nothing to fdrop. Linux returns EBADF in this case
> which deals with the problem and does not seem to provide any drawbacks
> for behaving processes.
> 
> So, differences to current approach:
> 1. fewer barriers and atomic operations
> 2. no need to check for f_ops type
> 3. new case when dup2 can return an error
> 
> Note that 3 should not be a problem since Linux is doing this already.
> 
> Also note current approach is not implemented correctly at the moment as
> it misses rmbs, although I'm unsure how much this matters in practice.
> 
> Thoughts?
> -- 
> Mateusz Guzik <mjguzik gmail.com>
> _______________________________________________
> freebsd-arch at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe at freebsd.org"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20140718/dd099ae9/attachment.sig>