svn commit: r253045 - head/sys/fs/ext2fs

Tue Jul 9 05:36:25 UTC 2013

On Mon, 8 Jul 2013, Pedro F. Giffuni wrote:

> Log:
>  Avoid a panic and return EINVAL instead.
>
>  Merge from UFS r232692:
>  syscall() fuzzing can trigger this panic.

This breaks the assertion without fixing the bug.

I don't know what is in the inscrutable reference r232692, but UFS doesn't
exist, and ffs still has the assertions.  Even ext2fs_write() still has
the assertions.

> Modified: head/sys/fs/ext2fs/ext2_vnops.c
> ==============================================================================
> --- head/sys/fs/ext2fs/ext2_vnops.c	Mon Jul  8 19:40:50 2013	(r253044)
> +++ head/sys/fs/ext2fs/ext2_vnops.c	Mon Jul  8 20:21:36 2013	(r253045)
> @@ -1598,11 +1598,11 @@ ext2_read(struct vop_read_args *ap)
> 	} else if (vp->v_type != VREG && vp->v_type != VDIR)
> 		panic("%s: type %d", "ext2_read", vp->v_type);
> #endif
> +	if (uio->uio_resid < 0 || uio->uio_offset < 0)
> +		return (EINVAL);
> 	orig_resid = uio->uio_resid;
> -	KASSERT(orig_resid >= 0, ("ext2_read: uio->uio_resid < 0"));
> 	if (orig_resid == 0)
> 		return (0);

Syscall fuzzing doesn't trigger this panic.  One of the many type errors
in ext2fs used to be trigger this panic.  Now this type error just gives
EINVAL for valid syscalls.

The type error is that orig_resid stil has type int, but uio->uio_resid
has type ssize_t.  SSIZE_MAX is is perfectly broken to POSIX spec, so
on 64-bit arches ssize_t is int64_t and SSIZE_MAX is 2**63-1.  So on
64-bit arches, anyone can easily trigger the panic using invalid syscall
args where uio->uio_resid is larger than INT_MAX and this is larger
than the buffer; anyone with a large amount of virtual memory and a
large data segment size limit can not quite so easily trigger the panic
using valid syscall args where uio->uio_resid is larger than INT_MAX
and this is not larger than the buffer.  Assignment to orig_resid then
overflows.  The behaviour is implementation-defined and is normally
to truncate the value (the FreeBSD implementation is missing
documentation to say this).  Sometimes the result is negative so the
error is detected.  Othertimes a truncated positive or zero value is
used, and the final result either a too-short i/o (if more than the
truncated count could be done) or actually correct (if the i/o is a
read() and EOF is reached before the non-truncated part of the count
is needed).

> -	KASSERT(uio->uio_offset >= 0, ("ext2_read: uio->uio_offset < 0"));
> 	fs = ip->i_e2fs;
> 	if (uio->uio_offset < ip->i_size &&
> 	    uio->uio_offset >= fs->e2fs_maxfilesize)

This part of the change is just wrong.  uio->uio_offset is not copied to
a local variable of either the correct type or a wrong type, and there are
no problems with checking it directly.  Also, it has been 64 bits on all
arches in FreeBSD since FreeBSD-2, so doing a wrong check of it here is
presumably not necessary for avoiding worse problems later.

This commit doesn't change the KASSERT()s in ext2_write().  Now the
truncation bug doesn't affect the first KASSERT(), since uio->uio_resid
is not corrupted before checking it.

I thought that kib swept the tree to fix all the resid types to ssize_t
when he expanded the limit for read() and write(), etc.  Grep shows
that orig_resid has the correct type in all file systems in /sys/fs
except ext2fs.  It is spelled the same in ext2fs, msdosfs and nfsclient.
udf doesn't need it since its only use in the other file systems is to
handle setting of atime at the end, and udf is read-only.  But udf has
the following bugs handling uio->uio_resid: in udf_read():
- uio->uio_resid < 0 is not checked on entry, either to return EINVAL
   or panic.  Of course, this can't happen and the check doesn't belong
   in every file system.  Perhaps it could be moved to VOP_READ() and
   VOP_WRITE(), and omitted there more often than most KASSERT()s.
- uio->uio_resid == 0 is checked on entry.  Most or all file systems do
   this, but I think this can't happen either.
- uio->uio_offset < 0 is checked on entry, but not under a KASSERT() as
   on most file systems.  Putting it under KASSERT() saves space when
   invariants are not configured.
- uio->uio_resid is assigned to a variable n of the correct type.  The
   code using this seems to be OK.
- there is the following very bad code:
 		n = min((u_int)(udfmp->bsize - on),
 			uio->uio_resid);
   Here n has the correct type, but min() corrupts its uio->uio_resid arg
   to u_int.  The corruption gives much the same implementation-defined
   bugs as truncation by assignment to "int uio_resid", except more than
   1 of the misbehaviours may occur in a single syscall as uio->resid
   mod 2**32 crosses the 0 boundary.

   This and related code has many style bugs:
   - bogus cast to u_int.  min() converts both its args to u_int.  The
     conversion is normally (too) silent and the cast is even more silent.
     So the cast can at most break warnings about dubious types.
   - dubious and wrong types include:
        ssize_t n
        long size, on
     Except for the previous unrelated use of n, these variables are all for
     the block size and small offsets within a block.  They should have type
     int or u_int.  'on' is short for "old n".  It should have the same type
     as n.  I think the type of n was changed to ssize_t so that its other
     use was correct.  But it shouldn't have been used for unrelated variables.
   - n is bogusly cast to int in the call to uiomove() later.  I think this
     dates from 1980's code that this was cloned from (n had type long for
     bogus reasons, but fitted on an int, and uiomove() wasn't prototyped
     so this cast was required to support K&R).

Not quite similarly in cd9660_read().  It was just like udf_read() except
its n is not aboved for an unrelated purpose and it still has type long.
'long' is accidentally the same as ssize_t on all supported arches, but
isn't used to hold uio->uio_resid, and using it for n is bogus too, as
above above.  cd9660_read() used to use min() in the same place as
udf_read().  But someone fixed the corruption of uio->uio_resid in
min() by changing min() to MIN().  Using MIN() is a style bug.  It was
intentionally left out of 4.4BSD in the kernel, but contribed code kept
using this deprecated KPI and the KPI was broken to match and then non-
contribed code started using it too :-(.  MIN() has the disadvantages
of being unsafe and ugly.  The min() family has the disadantage of not
being type-generic; type errors using it are common.

Not similarly in devfs.  devfs_read() spells orig_resid worse as plain
resid, but doesn't have the type error.

Not similarly in fdescfs.  It uses uio_resid in fdesc_readdir().  It
has a lower limit of UIO_MAX instead of 1.  Similarly in fuse.

Not similarly in nandfs.  It doesn't check for uio_resid < 0.  It
replaces min() by omin().  This gives another type that is harmless
in practice.  omin() is for off_t's, but is used with an ssize_t
resid and an off_t filesize, after previously corrupting the filesize
variable from a uint64_t to an off_t.  64 bits signed should be enough
for anyone.  The logic seems to be broken if uio_resid < 0 actually
occurs.

Not similarly in pseudofs.  pfs_read() checks correctly for uio_resid < 0,
but handles this by returning EINVAL instead of KASSERT().  It has a
variable resid of type ssize_t, but doesn't do anything with this except
assign uio_resid to it so as to get an overflow if the types are broken
and the value doesn't fit.  This made more  sense when resid had type int.
It still has an INT_MAX limit on buflen and a bogus overflow check for
buflen (check for overflow after overflow has already caused undefined
behaviour).  The undefined behaviour can still easily occur on 32-bit
systems, since exoanding resid and buflen to ssize_t doesn't really
change them on 32-bit systems.  pfs_readdir() still assigns uio_resid
to "int resid".  Then it has a lower limit of more than 0 on resid,
as in fdesc_readdir() but now the check is broken since uio_resid was
corrupted by assignment to resid.

Not similarly in tmpfs.  tmpfs manages to combine most of the bugs.
tmpfs_read() starts with "int resid".  Then it has null sanity checking
for uio_resid (it does check uio_offset and handles it using EINVAL
instead of KASSERT().) Then it corrupts uio_resid by assigning it to
resid in the main loop.  The loop continuation criterion is that
corrupted resid is > 0.  Of course it uses the MIN() style bug.  It
doesn't use the old 'n' and 'on' variable names.  tmpfs_write() also
uses "int resid", but first does some checks using the uncorrupted
uio_resid.  The one for EFBIG depends on undefined behaviour to work.
Where ffs carefully casts the offset + resid expression to use uoff_t,
tmpfs lets it overflow and then converts the result to u_int64_t (sic),
provided off_t is no larger than 64 bits.  Normally the overflow is
benign and off_t is 64 bits, so this works.  Many cases where uio_resid
< 0 are detected accidentally here, since they cause offset + resid
to be negative and converting this this to u_int64_t often overflows
to a value > maxfilesize.

Bruce