svn commit: r349391 - head/sys/kern

Tue Jul 16 21:00:06 UTC 2019

On Tue, Jul 16, 2019 at 3:48 AM Bruce Evans <brde at optusnet.com.au> wrote:
>
> On Mon, 15 Jul 2019, John Baldwin wrote:
>
> > On 7/14/19 12:08 PM, Conrad Meyer wrote:
> >>
> >> This change restores the possible overflow beyond IO_SEQMAX that the
> >> removed conditional prevented.
> >>
> >> On Tue, Jun 25, 2019 at 12:44 PM Alan Somers <asomers at freebsd.org> wrote:
>
> I thought the same for a while on Tue, Jan 25, then didn't reply since I
> saw a non-problem while writing the reply.  Actually, all versions are
> broken.
>
> >>> Author: asomers
> >>> Date: Tue Jun 25 19:44:22 2019
> >>> New Revision: 349391
> >>> URL: https://svnweb.freebsd.org/changeset/base/349391
> >>>
> >>> --- head/sys/kern/vfs_vnops.c   Tue Jun 25 19:36:01 2019        (r349390)
> >>> +++ head/sys/kern/vfs_vnops.c   Tue Jun 25 19:44:22 2019        (r349391)
> >>> @@ -499,10 +499,8 @@ sequential_heuristic(struct uio *uio, struct file *fp)
> >>>                  * closely related to the best I/O size for real disks than
> >>>                  * to any block size used by software.
> >>>                  */
> >>> -               fp->f_seqcount += MIN(IO_SEQMAX,
> >>> +               fp->f_seqcount += lmin(IO_SEQMAX,
> >>>                     howmany(uio->uio_resid, 16384));
> >>> -               if (fp->f_seqcount > IO_SEQMAX)
> >>> -                       fp->f_seqcount = IO_SEQMAX;
> >>>                 return (fp->f_seqcount << IO_SEQSHIFT);
> >>>         }
>
> Overflow occurs in all versions in howmany() when uio_resid is near
> its limit of SSIZE_MAX.  Then adding (65536 - 1) to uio_resid overflows.
> The behaviour is undefined, but usually the addition gives a large
> negative value and howmany gives a not so large negative value after
> dividing by 65536.
>
> In the previous version, MIN() preserves signedness and the type of
> howmany(), which is always long (since ssize_t happens to be long on all
> supported arches).  The large negative value is less than IO_SEQMAX sinc
> that has a correct type (signed int) and is non-negative.  So MIN() preserves
> the large negative value.  This is blindly added to f_seqcount, giving
> another overflow on 64-bit arches and a garbage negative value on 32-bit
> arches if f_seqcount was not garbage.  The removed conditional only fixes
> up large positive values, so has no effect in this case.  Finally, left
> shifting the garbage gives further undefined behaviour.
>
> The second overflow on 64-bit arches is because the not so large negative
> value is still much larger than can be represented in f_seqcount.
> Values near -SSIZE_MAX have 63 value bits, so dividing by 16384 leaves
> 47 value bits, but f_seqcount has only 31 value bits.
>
> In this version, lmin() happens to give the same result as MIN() on all
> supported machines, because ssize_t happens to have the same representation
> on all supported machines.  So overflows occur as before for large uio_resid.
>
> So this change makes no differences for the 2 overflows for uio_resid near
> SSIZE_MAX.  But removing the conditional restores an overflow that actually
> was fixed in a previous recent commit.
>
> >> Perhaps instead this should be:
> >>
> >> fp->f_seqcount = lmin(IO_SEQMAX,
> >>   fp->f_seqcount + howmany(...));
>
> This works to restore the check in the removed conditional, but is a bit
> subtle and leaves the 2 old overflow bugs unfixed.
>
> > While I agree with your assessment of the removed conditional, I think the
> > suggestion above can have a funky overflow where f_seqcount + howmany(...) ends
> > up less than IO_SEQMAX and this expression would then be used instead of IO_SEQMAX?
>
> This only happens when howmany() itself overflows.  Otherwise, dividing by
> 16384 reduces the value enough to add f_seqcount to howmany(...) without
> overflow.
>
> > The first lmin seems designed to minimize what we add in the hope that since the
> > existing value is bounded to '0...IO_SEQMAX' the result of the += can only be in
> > the range '0...2*IO_SEQMAX'.
>
> One addition doesn't overflow, and the shift could be clamped later, but
> overflow still occurs after approx. INT_MAX / IO_SEQMAX additions, so it
> is better to clamp earlier.
>
> > I'm not sure which variants are most readable:
> >
> > A)
> >    fp->f_seqcount += lmin(IO_SEQMAX,
> >        howmany(resid, 16384));
> >    if (fp->f_seqcount > IO_SEQMAX)
> >            fp->f_seqcount = IO_SEQMAX;
>
> This change is because I said that using the unsafe macro MIN() is a style
> bug.  I also said that using the lmin() family is too hard, since it is
> not type generic.  I didn't notice the overflow problem in howmany().
>
> This has another style bug -- using lmin() and then open-coding the clamp
> f->f_seqcount = imax(f->f_seqcount, IO_SEQMAX).  This is sort of backwards --
> we use open code for the simple case because it is simple with either
> spelling, but we use lmin() for the complicated case to avoid spelling out
> the howmany() expression twice.  This obscures the 2 overflow bugs in the
> howmany() expression and many delicate type promotions and type equivalences
> that are needed for lmin() to work.
>
> > B)
> >    fp->f_seqcount += lmin(IO_SEQMAX,
> >        howmany(resid, 16384));
> >    fp->f_seqcount = lmin(fp->f_seqcount, IO_SEQMAX);
>
> Consistent style, but another has another type error.  f_seqcount has type
> int.  Of course, lmin() handles ints too, but to see that it works you
> still have to check that f_seqcount is representable as long and once you
> do that it is easier to see that imin() is courrect.
>
> > C)
> >    if (fp->f_seqcount + howmany(resid, 16384) < fp->f_seqcount)
> >            fp->f_seqcount = IO_SEQMAX;
> >    else
> >            fp->f_seqcount = lmin(IO_SEQMAX,
> >                fp->f_seqcount + howmany(resid, 16384));
> >
> > I'm probably not a fan of C).
>
> On supported arches, it recovers from overflow in howmany(), but only if
> the compiler doesn't optimize away the recovery.
>
> The first part can be done better:
>
>         if (uio->uio_resid >= IO_SEQMAX * 16384)
>                 fp->f_seqcount = IO_SEQMAX;
>
> Then for the second part, any version works since all values are small and
> non-negative, but I prefer to restore the the version that I rewrote 15-20
> years ago and committed 11 years ago (r175106):
>
>                 fp->f_seqcount += howmany(uio->uio_resid, 16384);
>                 if (fp->f_seqcount > IO_SEQMAX)
>                         fp->f_seqcount = IO_SEQMAX;
>
> My changes to this were to use 16384 instead of BKVASIZE and howmany()
> instead of its expansion.  Changing to howmany() was a mistake since
> it hides the first overflow.  When I rewrote it, uio_resid had type
> int so it was small enough after division by 16384, but not small enough
> to add (16384 - 1) to.  I didn't change the open-coded clamp on f_seqcount
> in this because it was good enough.
>
> Bruce

Bruce,
    Is this how you want it?
Index: sys/kern/vfs_vnops.c
===================================================================

--- sys/kern/vfs_vnops.c    (revision 349977)
+++ sys/kern/vfs_vnops.c    (working copy)
@@ -499,8 +499,13 @@
          * closely related to the best I/O size for real disks than
          * to any block size used by software.
          */
-        fp->f_seqcount += lmin(IO_SEQMAX,
-            howmany(uio->uio_resid, 16384));
+        if (uio->uio_resid >= IO_SEQMAX * 16384)
+            fp->f_seqcount = IO_SEQMAX;
+        else {
+            fp->f_seqcount += howmany(uio->uio_resid, 16384);
+            if (fp->f_seqcount > IO_SEQMAX)
+                fp->f_seqcount = IO_SEQMAX;
+        }
         return (fp->f_seqcount << IO_SEQSHIFT);
     }

-Alan