svn commit: r250220 - head/sys/kern

Bruce Evans brde at optusnet.com.au
Sun May 12 04:09:03 UTC 2013


On Fri, 10 May 2013, John Baldwin wrote:

> On Thursday, May 09, 2013 11:05:46 pm Bruce Evans wrote:
>> On Mon, 6 May 2013, John Baldwin wrote:
>> ...
>>> static int
>>> vn_ioctl(fp, com, data, active_cred, td)
>>> 	struct file *fp;
>>> 	u_long com;
>>> 	void *data;
>>> 	struct ucred *active_cred;
>>> 	struct thread *td;
>>> {
>>> 	struct vnode *vp = fp->f_vnode;
>>> 	struct vattr vattr;
>>>
>>> 	switch (vp->v_type) {
>>> 	case VREG:
>>> 	case VDIR:
>>> 		switch (com) {
>>> 		case FIONREAD:
>>> 			vn_lock(vp, LK_SHARED | LK_RETRY);
>>> 			error = VOP_GETATTR(vp, &vattr, active_cred);
>>> 			VOP_UNLOCK(vp, 0);
>>> 			if (!error)
>>> 				*(int *)data = vattr.va_size - fp->f_offset;
>>> 			return (error);
>>> 		case FIONBIO:
>>> 		case FIOASYNC:
>>> 			return (0);	/* XXX */
>>> 		default:
>>> 			return (VOP_IOCTL(vp, com, data, fp->f_flag,
>>> 			    active_cred, td));
>>> 		}
>>> 	default:
>>> 		return (ENOTTY);
>>> 	}
>>> }
> ...
>>> (The 'XXX' comment could perhaps be expanded to something along the lines of
>>> 'Allow fcntl() to toggle FNONBLOCK and FASYNC.')
>>
>> Is that what it is about?  IIRC, upper layers do some partial handling
>> and then call lower layers to possibly do some more handling.  But here
>> and in some other places there is nothing more to be done.  No comment
>> is needed, but maybe the XXX's were reminders to clean up the layering.
>> FreeBSD has cleaned up the layering a bit, by using differnt fops for
>> different file types.
>
> The problem is this code in fcntl() which I ran into when working on a
> tutorial for writing character devices:
>
> 	case F_SETFL:
> 		error = fget_unlocked(fdp, fd, CAP_FCNTL, F_SETFL, &fp, NULL);
> 		if (error != 0)
> 			break;
> 		do {
> 			tmp = flg = fp->f_flag;
> 			tmp &= ~FCNTLFLAGS;
> 			tmp |= FFLAGS(arg & ~O_ACCMODE) & FCNTLFLAGS;
> 		} while(atomic_cmpset_int(&fp->f_flag, flg, tmp) == 0);
> 		tmp = fp->f_flag & FNONBLOCK;
> 		error = fo_ioctl(fp, FIONBIO, &tmp, td->td_ucred, td);
> 		if (error != 0) {
> 			fdrop(fp, td);
> 			break;
> 		}
> 		tmp = fp->f_flag & FASYNC;
> 		error = fo_ioctl(fp, FIOASYNC, &tmp, td->td_ucred, td);
> 		if (error == 0) {
> 			fdrop(fp, td);
> 			break;
> 		}
> 		atomic_clear_int(&fp->f_flag, FNONBLOCK);
> 		tmp = 0;
> 		(void)fo_ioctl(fp, FIONBIO, &tmp, td->td_ucred, td);
> 		fdrop(fp, td);
> 		break;
>
> Hmm, this seems to have the bug that if you had FNONBLOCK set and
> tried to set FASYNC via fcntl() but FIOASYNC isn't supported,
> FNONBLOCK is cleared.  It seems we should only clear FNONBLOCK
> if it wasn't set in 'flg'.  I think this would fix that:

sys_socket.c mishandles this differently by voiding the result of
all fo_ioctls().  The above only assumes that since the first
fo_ioctl() for FIONBIO succeeded, the second one will too.

I now remember more about this bad code:
- it is still very broken for devices.  ioctls are inherently per-device,
    but file flags aren't even per-file -- they are per-file-descriptor
    -- so the device state becomes inconsistent with the file descriptor
    state if more than 1 fd is open on a device and any non-null change
    of the file flags is made using one of the fd's (applications might
    be able to keep the file flags fairly consistent by doing fcntl for
    all the fd's (including ones shared across processes), but the file
    flags would be at least transiently inconsistent.  Only the FNONBLOCK
    (O_NONBLOCK) and FASYNC (O_ASYNC) flags are passed down.  O_NONBLOCK
    works right for some devices because it is also passed to open() and
    read()/write() (so it will work right for ioctl() if the device
    driver ignores it then).  There are related not so bad cases for
    F_GETFL, F_GETOWN and F_SETOWN.  F_GETFL returns per-fd flags.  There
    are no per-fd ownerships, so F_GETOWN has to use an ioctl to fetch
    a non-per-fd ownership (the one set by F_SETOWN) which is at least
    consistently not per-fd.

Most drivers/file types are actually non-broken for FIONBIO and do nothing
except return 0 for it (I only grepped for FIONBIO; not for O_NONBLOCK
or FNONBLOCK).  This return is sometimes XXX'ed, but it really
shouldn't be.  Returning 0 and not doing anything means that the driver
supports O_NONBLOCK correctly (by checking it per-fd in the flag passed
to open() and read/write()).  It is the drivers that do something which
deserve an XXX.  The only broken ioctl routines for FIONBIO are now:
- subr_bus.c: devioctl()
- sys_socket.c: soo_ioctl()
- pcm/dsp.c: dsp_ioctl()
- audit/audit_pipe.c: audit_pipe_ioctl()
- cbus/fdc.c: fdioctl().  This is different from fdc/fdc.c.  The latter
    just returns 0 and claims that this is for backwards compatibility
    with old utilities.  Apparently the pc98 version hasn't been moved
    to userland as much as the pc98 version.  But the comment is wrong,
    since old utilities are just broken since they depend on kernel
    features that were removed.
IIRC, not much more than soo_ioctl() had this bug in 4.4BSD, except
F_SETOWN as  more complicated and broken (but not supported for as
many devices/file types).  truckman@ did a lot of work to fix F_SETOWN
and assoicated signal handling.

> Index: kern_descrip.c
> ===================================================================
> --- kern_descrip.c	(revision 250451)
> +++ kern_descrip.c	(working copy)
> @@ -539,7 +539,7 @@ kern_fcntl(struct thread *td, int fd, int cmd, int
> 		}
> 		tmp = fp->f_flag & FASYNC;
> 		error = fo_ioctl(fp, FIOASYNC, &tmp, td->td_ucred, td);
> -		if (error == 0) {
> +		if (error == 0 || (flg & FNONBLOCK)) {
> 			fdrop(fp, td);
> 			break;
> 		}
>
> This would mean you would no longer have to support the FIOASYNC
> ioctl just to allow FIONBLOCK to be set.

Looks good, but...

> Note, btw, that kern_ioctl() doesn't enforce quite the same
> requirement as F_SETFL in that FIONBIO doesn't check for FASYNC and vice
> versa.  ioctl() also doesn't revert the change if the backing fo_ioctl
> method fails.

... it's bogus to back out of just the FIONBIO setting.  Other flags
changes (to O_APPEND and O_DIRECT) are not backed out of.

Apparently the idea in fcntl was to back out of all changes.  This
may have even worked before O_APPEND was supported.  O_DIRECT didn't
exist until later.  ioctl() is too complicated to have ever attempted
this.  This causes problems for applications programming: I don't know
what the spec for fcntl() or ioctl() says, but for tcsetattr() (which
is implemented using ioctl() in FreeBSD), POSIX says that the syscall
shall (?) succeed if at least one attribute was "set" (changed?).  So
when you ask tcsetattr() to change lots of attributes, the only correct
way to determine if it succeeded is to call tcgetattr() and check that
all the changes that you care about actually occurred).

> I actually think that fcntl should probably ignore ENOTTY errors
> if the flag is not set.  Otherwise using fcntl to toggle some other
> flag can succeed but still return an error.

I think ENOTTY errors for FIONBIO are rare, since too many places copy
the code that returns 0 for FIONBIO even when O_NONBLOCK is not supported.
E.g., the non-pc98 fdc driver doesn't support O_NONBLOCK, but it has a
compatibiity excuse for claiming support.

I think fo_ioctl() shouldn't be called for null changes (especially
from the unset case to the unset case).  Calling it should fail quite
often and thus give ENOTTY even for callers that have no interest in
FNONBLOCK or FASYNC but want to toggle some other changes.  However,
always calling it minimises the inconsistencies for drivers(...) that
act on it -- this syncs the driver state with the fd state from the
last fcntl() on the set of fd's referencing the device, irrespective
of whether the current fcntl() is toggling the state.

Bruce


More information about the svn-src-all mailing list