svn commit: r340450 - head/sys/sys

Tue Nov 20 15:35:47 UTC 2018

On Mon, 19 Nov 2018, Warner Losh wrote:

> On Mon, Nov 19, 2018 at 9:05 AM Bruce Evans <brde at optusnet.com.au> wrote:
>
>> On Tue, 20 Nov 2018, Bruce Evans wrote:
>>
>>> On Tue, 20 Nov 2018, Bruce Evans wrote:
>>>
>>>> On Mon, 19 Nov 2018, Warner Losh wrote:
>>> ...
>>> I found my test program.
>>>
>>>>> But I think I understand the problem now.
>>>>>
>>>>> mstosbt(1000) is overflowing with my change, but not without because
>> we're
>>>>> adding 2^32 to a number that's ~900 away from overflowing changing a
>> very
>>>>
>>>> This value is just invalid.  Negative values are obviously invalid.
>>>> Positive
>>>> values of more than 1 second are invalid.  In the old version, values of
>>>> precisely 1 second don't overflow since the scale factor is rounded
>> down;
>>>> the result is just 1 unit lower than 1 second.  Overflow (actually
>>>> truncation)
>>>> occurs at 1 unit above 1 second.  E.g., sbttoms(mstosbt(1000)) was 999
>> and
>>>> sbttoms(mstosbt(1001)) was 0.  Now in my fixed version,
>>>> sbttoms(mstosbt(1000))
>>>> is truncated to 0 and sbttoms(mstosbt(1001)) is truncated to 1.

This remains very broken.  I just noticed that the inverse functions
don't support sbt's much larger than 1 second either.  E.g., sbttons
multiplies by 1 billion, so it overflows for sbt's larger than 4.29
seconds.

It is better to not support times larger than 1 second in these
functions than to pessimize them all.  You only pessimized some of
them.  The inverse functions still overflow at small values; the direct
functions still overflow at large values.

>> Here are all uses of these functions in kern outside of sys/time.h:
>>
>> XX arm/arm/mpcore_timer.c:      sc->et.et_min_period = nstosbt(20);
>>
>> OK since 20 ns is less than 1 second.
>>
>> XX cddl/compat/opensolaris/sys/kcondvar.h:      return
>> (cv_timedwait_sbt(cvp, mp, nstosbt(tim), nstosbt(res), 0));
>> XX cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:
>> pause_sbt("dmu_tx_delay", nstosbt(wakeup),
>> XX cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:
>>  nstosbt(zfs_delay_resolution_ns), C_ABSOLUTE);
>> XX cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:    sbintime_t sleep =
>> nstosbt((zilog->zl_last_lwb_latency * pct) / 100);
>> XX compat/linuxkpi/common/include/linux/delay.h:
>> pause_sbt("lnxsleep", mstosbt(ms), 0, C_HARDCLOCK);
>> XX compat/linuxkpi/common/src/linux_hrtimer.c:
>> nstosbt(hrtimer->expires), nstosbt(hrtimer->precision), 0);
>> XX compat/linuxkpi/common/src/linux_hrtimer.c:
>> callout_reset_sbt(&hrtimer->callout, nstosbt(ktime_to_ns(time)),
>> XX compat/linuxkpi/common/src/linux_hrtimer.c:      nstosbt(nsec),
>> hrtimer_call_handler, hrtimer, 0);
>> XX compat/linuxkpi/common/src/linux_hrtimer.c:
>> callout_reset_sbt(&hrtimer->callout, nstosbt(ktime_to_ns(interval)),
>> XX compat/linuxkpi/common/src/linux_hrtimer.c:
>> nstosbt(hrtimer->precision), hrtimer_call_handler, hrtimer, 0);
>> XX compat/linuxkpi/common/src/linux_schedule.c: ret =
>> -pause_sbt("lnxsleep", mstosbt(ms), 0, C_HARDCLOCK | C_CATCH);
>>
>> All of the above might are broken unless their timeout arg is restricted to
>> less than 1 second.
>>
>> Also, at least the sleep and pause calls in the above probably have a
>> style bug in knowing about sbt's at all.  More important functions
>> like msleep() hide the use of sbt's and use fuzzier scale factors like
>> tick_sbt.
>
> Yes. It's for these users I'm fixing the >= 1s cases.

These functions are broken in FreeBSD-11 and FreeBSD-12 too.  They were
last correct in FreeBSD-10, where there weren't so many linuxkpi functions,
but where the zfs code is similar except it is missing the overflow bugs.

E.g., where the above does nstosbt(wakeup), FreeBSD-10 does wakeup * SBT_1NS.
SBT_1NS is 4, but the infinite-precision scale factor is 4.294967296.  The
sloppy scale factor is gives large absolute errors for large args, but at
least it doesn't overflow for merely large args; it does overflow for huge
args.

Maybe zfs's wakeup arg is always less that 1 second, so it wasn't broken
by blind conversion to use nstosbt(), but that is unclear.

Conversions in the other direction are very rare, at least using the
conversion functions instead of divisions by SBT_*.  sbttous() is used once
in dev/ow and once in kern_sysctl.c; sbttoms() is used once in kern_sysctl.c;
sbttons() is never used.  Thus the overflow in sbttons() doesn't matter, and
overflow in kern_sysctl.c's misuse of sbttous() only occurs at 4294+ seconds.

>
>> XX dev/iicbus/nxprtc.c:                 pause_sbt("nxpotp", mstosbt(100),
>> mstosbt(10), 0);
>> XX dev/iicbus/nxprtc.c:         pause_sbt("nxpbat", mstosbt(100), 0, 0);
>>
>> OK since 100 ms is less than 1 second.
>>
>> XX kern/kern_sysctl.c:  sb = ustosbt(tt);
>> XX kern/kern_sysctl.c:  sb = mstosbt(tt);
>>
>> Recently added bugs.  The args is supplied by applications so can be far
>> above
>> 1 second.
>>
>> Also, these functions have a bogus API that takes uint64_t args.  sbt's
>> can't represent that high, and the API doesn't support failure, so callers
>> have the burden of passing valid values.  It is easiest to only support
>> args
>> up to 1 second and require the caller to handle seconds.
>
> It's just as easy to cope.

No, it is not.  In your next committed versions, all functions still
overflow, just at larger values.

>> Lots of itimer and select() code handle the corresponding problem for
>> timevals and timespecs almost correctly.  Timevals and timespecs already
>> have the seconds part separate and itimer and select() syscalls do validity
>> checking on the fractional seconds, so there is no problem converting the
>> fractional sections to an sbt.  Howver, the seconds part has type time_t
>> and this can be int64_t, which sbt's cannot represent.  Also, POSIX doesn't
>> permit failure for seconds values that are representable by time_t.  The
>> almost correct handling is to split up large seconds values in some cases
>> and break POSIX conformance by silently reducing the seconds value in
>> other cases.  The reduction is usually to INT32_MAX / 2.  This allows
>> adding 2 limited values but it is not obvious that this is enough since
>> rounding up twice or just adding 2 more would give overflow.
>
> Right, that's beyond the scope of what I'm fixing.

Better fix the overflows in FreeBSD-11 and FreeBSD-12 (or get their author
to fix them) before adding complications.

> https://reviews.freebsd.org/D18051
>
> Contains the changes. They address all the actual problems you've raised,
> but I'm going to disagree that 'ull' is bogus. For FreeBSD, it's fine and
> it's a lot more readable than the alternatives.

I can't quote it in mail from there, but saw a committed version which I
don't like, though it passes my tests.

I said that long long is an abomination.  'ull' doesn't even have any
effect in this case.  It was last needed for gcc -std=before-c99 on
i386, since C before C99 doesn't have the abomination and gcc warns
about constants that are so large that there type is [unsigned] long
long.  There are no such constants on 64-bit arches, but there are some
on 32-bit arches.   clang is too broken to warn about this.  Now
kern.pre.mk enforces std=c99 although this is wrong (gnu99 is needed,
as in userland), so the warning never occurs even for gcc.

If you don't use explicitly typed constants, then constants automatically
have the correct type.  This is long long or unsigned long long for large
constants on i386 -- there is no way to avoid the abomination then.  But
on amd64, the correct type for large constants is long or unsigned long.

You might need ull constants to get unsigned arithmetic instead of signed
arithmetic, where the natural type of the large constants is only long
or long long.  Your comments allude to this.  But the conversions are
too simple or bogusly typed to need this here -- almost all variable
types are uint64_t although things overflow at about UINT32_MAX, so the
natural arithmetic is uint64_t except on exotic unsupported/unsupportable
arches where uint64_t has lower rank than int (then uint64_t promotes to
only int, so there may be sign extension bugs, but using explicit unsigned
long long constants gives some further promotions to unsigned long long).

Using 'ull' is also a lexical style bug.  Old code consistently uses LL.
This was needed for the obfuscation given by using large decimal constants
10-15 years ago before -std was c99.  This gives signed constants, and old
code also consistently bogusly casts these constants to uint64_t to ensure
unsigned arithmetic (except on the exotic arches).

Bugs in the committed version:

XX Log:
XX   Ensure that all values of ns, us and ms work for {n,u,m}stosbt

All values don't work.

E.g., since nstosbt multiplies a 64-bit unsigned value by approx. 4 and
the result is a signed 64-bit value, values larger than about 1/8 of the
maximum in ns overflow to a garbage negative sbt value.

XX   Also, use a shift one fewer left to avoid integer overflow causing
XX   incorrect results, and adjust the equasion accordingly. Document this.

I'm not sure how this helps.  When I tried it with my most accurate version,
it reduced the accuracy of sbt values from +-1 in sbt units to +-2, for
the obvious reason than the sloppy shift loses 1 bit.

XX   A test program to test all cases will be committed shortly. The test
XX   exaustively tries every value (thanks to bde for the test).

Testing all 2**64 cases given by pretending to support times above 1 second
makes exhaustive testing impossible.

XX Modified: head/sys/sys/time.h
XX ==============================================================================
XX --- head/sys/sys/time.h	Tue Nov 20 01:59:57 2018	(r340663)
XX +++ head/sys/sys/time.h	Tue Nov 20 07:11:23 2018	(r340664)
XX @@ -162,9 +162,24 @@ sbttobt(sbintime_t _sbt)
XX   * large roundoff errors which sbttons() and nstosbt() avoid.  Millisecond and
XX   * microsecond functions are also provided for completeness.
XX   *
XX - * These functions return the smallest sbt larger or equal to the number of
XX - * seconds requested so that sbttoX(Xtosbt(y)) == y. The 1 << 32 - 1 term added
XX - * transforms the >> 32 from floor() to ceil().
XX + * These functions return the smallest sbt larger or equal to the
XX + * number of seconds requested so that sbttoX(Xtosbt(y)) == y.  Unlike

No, these functions return an sbt that is much larger than the perfectly
rounded one.  See my old mail for the details, and for a variant that
does perfect rounding.

XX + * The naive computation we'd do is this
XX + *	((unit * 2^64 / SIFACTOR) + 2^32-1) >> 32
XX + * However, that overflows. Instead, we compute
XX + *	((unit * 2^63 / SIFACTOR) + 2^31-1) >> 32
XX + * and use pre-computed constants that are the ceil of the 2^63 / SIFACTOR
XX + * term to ensure we are using exactly the right constant. We use the lesser
XX + * evil of ull rather than a uint64_t cast to ensure we have well defined
XX + * right shift semantics. With these changes, we get all the ns, us and ms
XX + * conversions back and forth right.
XX + * Note: This file is used for both kernel and userland includes, so we can't
XX + * rely on KASSERT being defined, nor can we pollute the namespace by including
XX + * assert.h.

Many style bugs:
- this comment has fancy formatting, but it is not protected by '/*-'.
- this comment misformats sentence breaks as 1 spaces.  Old comments in
   this file were carefully formatted to use 2 spaces.

There is no need to calculate these constants manually.  ISTR spending many
mails instructing ian@ about their previous correct caclulation.

The KASSERTs are never defined in the kernel either.  See below.

XX   */
XX  static __inline int64_t
XX  sbttons(sbintime_t _sbt)
XX @@ -176,8 +191,18 @@ sbttons(sbintime_t _sbt)
XX  static __inline sbintime_t
XX  nstosbt(int64_t _ns)
XX  {
XX +	sbintime_t sb = 0;

Style bug: initialization in declaration.

XX 
XX -	return ((_ns * (((uint64_t)1 << 63) / 500000000) + (1ull << 32) - 1) >> 32);
XX +#ifdef KASSERT
XX +	KASSERT(_ns >= 0, ("Negative values illegal for nstosbt: %jd", _ns));
XX +#endif

KASSERT is never defined here in correct kernel code, due to standard
namespace pollution.  Correct kernel code includes <sys/param.h> first,
and in the kernel <sys/param.h> includes this file.  KASSERT is not defined
then.

Correct kernel code includes <sys/systm.h> second.  It takes chumminess
with the misimplementation to get KASSERT defined here.  <sys/systm.h>
has grown much more massive and disgusting pollution than <sys/param.h>,
but not enough for it to be self-sufficient.  First you have to provide
a replacement for <sys/param.h> that includes almost everything in it
except <sys/time.h>.  Include that and then <sys/systm.h>.  <sys/systm.h>
defines KASSERT early.  It includes <sys/param.h> in is grosser pollution
section later.  This includes <sys/time.h>, and KASSERT is defined before
it is used in this case.

XX +	if (_ns >= SBT_1S) {
XX +		sb = (_ns / 1000000000) * SBT_1S;
XX +		_ns = _ns % 1000000000;
XX +	}

Not too pessimal since this case is rarely reached.

XX +	/* 9223372037 = ceil(2^63 / 1000000000) */
XX +	sb += ((_ns * 9223372037ull) + 0x7fffffff) >> 31;

This is harder to understand than my versions.

First I fix the obfuscations and other style bugs in it:
- contrary to what I said above, everything isn't uint64_t, so there is a
   sign extension bug in practice which the explicit unsigned long long
   constant fixes in an ugly way.  _ns is only int64_t, and the natural
   constant is not unsigned, so the multiplication by the natural constant
   overflows to a negative value at about 0.5 seconds and then the shift
   is signed and messes up the negative value some more
- however, there is no need to reduce the powers of 2 by 1.

Cleaned up version with manually calculated multiplicative constant:

 	/* 18446744973 = ceil(2**64 / 10**9) */
 	sb += (_ns * (uint64_t)18446744973 + 0xffffffff) >> 32;

(Also remove comments about reducing the powers by 1.)

Cleaned up version with the compiler calculating the multiplicative constant:

 	sb += (_ns * (((uint64_t)1 << 63) / 500000000 + 1) + 0xffffffff) >> 32;

The additive constant is harder to understand.  Rounding up the multiplicative
constant gives an increment that is higher than the infinite-precision value
before the addition and the shift, but the shift often loses the extras.
The additive constant keeps enough extras.

The resulting sbt is often too high, but the important case when _ns = 0 is
not converted to a nonzero sbt like some of my versions do.

Since rounding up to the ceiling increases the multiplier by ~0.3 in units
of 2**32, when _ns = 10**9-1, the increase from the scaling is ~300e8 and
the increase from the additive constant is ~2**32 for a total increase of
not much more than 1.1*2**32 in units of 2**32 or 1.1 in sbt units.  This
means that the sbt value is often 1 higher than the correctly rounded
ceiling.  The versions that round down the multiplier reduce it by ~0.7
in units of 2**32 and need an additive correction of about 4 sbt units to
compensate.  Most or all of my versions added this even when _ns = 0 when
it is too much.

It is not obvious that converting sbt values that are 1.1 units too high
back to nanoseconds restores the original value (by rounding down).
Increasing the multiplier by 1 more gives sbt values that are only 0.2
units higher than 1.1 when _ns = 10**9-1, but breaks the conversion back
in 85% of cases (not just for large _ns).

XX +	return (sb);
XX  }
XX 
XX  static __inline int64_t
XX ...

Similarly.

Bruce