svn commit: r340450 - head/sys/sys

Mon Nov 19 16:05:43 UTC 2018

On Tue, 20 Nov 2018, Bruce Evans wrote:

> On Tue, 20 Nov 2018, Bruce Evans wrote:
>
>> On Mon, 19 Nov 2018, Warner Losh wrote:
>> 
>>> On Mon, Nov 19, 2018 at 12:31 AM Andriy Gapon <avg at freebsd.org> wrote:
> ...
> I found my test program.
>
>>> But I think I understand the problem now.
>>> 
>>> mstosbt(1000) is overflowing with my change, but not without because we're
>>> adding 2^32 to a number that's ~900 away from overflowing changing a very
>> 
>> This value is just invalid.  Negative values are obviously invalid. 
>> Positive
>> values of more than 1 second are invalid.  In the old version, values of
>> precisely 1 second don't overflow since the scale factor is rounded down;
>> the result is just 1 unit lower than 1 second.  Overflow (actually 
>> truncation)
>> occurs at 1 unit above 1 second.  E.g., sbttoms(mstosbt(1000)) was 999 and
>> sbttoms(mstosbt(1001)) was 0.  Now in my fixed version, 
>> sbttoms(mstosbt(1000))
>> is truncated to 0 and sbttoms(mstosbt(1001)) is truncated to 1.
>> 
>> The test program also showed that in the old version, all valid args except
>> 0 are reduced by precisely 1 by conversion to sbt and back.
>> 
>>> large sleep of 1 second to a tiny sleep of 0 (which is 1ms at the default
>>> Hz). I think we get this in the mlx code because there's a msleep(1000) in
>>> at least one place. msleep(1001) would have failed before my change. Now I
>>> think msleep(999) works, but msleep(1000) fails. Since the code is waiting
>>> a second for hardware to initialize, a 1ms instead is likely to catch the
>>> hardware before it's finished. I think this will cause problems no matter
>>> cold or not, since it's all pause_sbt() under the covers and a delay of 0
>>> is still 0 either way.
>> 
>> Bug in the mlx code.  The seconds part must be converted separately, as in
>> tstosbt().
>
> mlx doesn't seem to use sbt directly, or even msleep(), so the bug does
> seem to be central.
>
> mlx actually uses mtx_sleep() with timo = hz().   mtx_sleep() is actually
> an obfuscated macro wrapping _sleep().  The conversion to sbt is done by
> the macro and is sloppy.  It multiplies timo by tick_sbt.
>
> tick_sbt is SBT_1S / hz, so the sbt is SBT_1S / hz * hz which is SBT_1S
> reduced a little.  This is not affected by the new code, and it still isn't
> converted back to 1 second in ms, us or ns.  Even if it is converted back
> and then forth to sbt using the new code, it remains less than 1 second as
> an sbt so shouldn't cause any new problems.

Here are all uses of these functions in kern outside of sys/time.h:

XX arm/arm/mpcore_timer.c:	sc->et.et_min_period = nstosbt(20);

OK since 20 ns is less than 1 second.

XX cddl/compat/opensolaris/sys/kcondvar.h:	return (cv_timedwait_sbt(cvp, mp, nstosbt(tim), nstosbt(res), 0));
XX cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:	pause_sbt("dmu_tx_delay", nstosbt(wakeup),
XX cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:	    nstosbt(zfs_delay_resolution_ns), C_ABSOLUTE);
XX cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:	sbintime_t sleep = nstosbt((zilog->zl_last_lwb_latency * pct) / 100);
XX compat/linuxkpi/common/include/linux/delay.h:	pause_sbt("lnxsleep", mstosbt(ms), 0, C_HARDCLOCK);
XX compat/linuxkpi/common/src/linux_hrtimer.c:		    nstosbt(hrtimer->expires), nstosbt(hrtimer->precision), 0);
XX compat/linuxkpi/common/src/linux_hrtimer.c:	callout_reset_sbt(&hrtimer->callout, nstosbt(ktime_to_ns(time)),
XX compat/linuxkpi/common/src/linux_hrtimer.c:	    nstosbt(nsec), hrtimer_call_handler, hrtimer, 0);
XX compat/linuxkpi/common/src/linux_hrtimer.c:	callout_reset_sbt(&hrtimer->callout, nstosbt(ktime_to_ns(interval)),
XX compat/linuxkpi/common/src/linux_hrtimer.c:	    nstosbt(hrtimer->precision), hrtimer_call_handler, hrtimer, 0);
XX compat/linuxkpi/common/src/linux_schedule.c:	ret = -pause_sbt("lnxsleep", mstosbt(ms), 0, C_HARDCLOCK | C_CATCH);

All of the above might are broken unless their timeout arg is restricted to
less than 1 second.

Also, at least the sleep and pause calls in the above probably have a
style bug in knowing about sbt's at all.  More important functions
like msleep() hide the use of sbt's and use fuzzier scale factors like
tick_sbt.

XX dev/iicbus/nxprtc.c:			pause_sbt("nxpotp", mstosbt(100), mstosbt(10), 0);
XX dev/iicbus/nxprtc.c:		pause_sbt("nxpbat", mstosbt(100), 0, 0);

OK since 100 ms is less than 1 second.

XX kern/kern_sysctl.c:	sb = ustosbt(tt);
XX kern/kern_sysctl.c:	sb = mstosbt(tt);

Recently added bugs.  The args is supplied by applications so can be far above
1 second.

Also, these functions have a bogus API that takes uint64_t args.  sbt's
can't represent that high, and the API doesn't support failure, so callers
have the burden of passing valid values.  It is easiest to only support args
up to 1 second and require the caller to handle seconds.

Lots of itimer and select() code handle the corresponding problem for
timevals and timespecs almost correctly.  Timevals and timespecs already
have the seconds part separate and itimer and select() syscalls do validity
checking on the fractional seconds, so there is no problem converting the
fractional sections to an sbt.  Howver, the seconds part has type time_t
and this can be int64_t, which sbt's cannot represent.  Also, POSIX doesn't
permit failure for seconds values that are representable by time_t.  The
almost correct handling is to split up large seconds values in some cases
and break POSIX conformance by silently reducing the seconds value in
other cases.  The reduction is usually to INT32_MAX / 2.  This allows
adding 2 limited values but it is not obvious that this is enough since
rounding up twice or just adding 2 more would give overflow.

XX kern/subr_rtc.c:			sbt = nstosbt(waitns);

This is correct, since waitns is the nanoseconds part of a timespec.

Bruce