svn commit: r354076 - head/sys/dev/ow

Bruce Evans brde at optusnet.com.au
Sat Oct 26 11:51:33 UTC 2019


On Fri, 25 Oct 2019, Andriy Gapon wrote:

> On 25/10/2019 18:46, Ian Lepore wrote:
>> On Fri, 2019-10-25 at 15:38 +0000, Andriy Gapon wrote:
>>> Author: avg
>>> Date: Fri Oct 25 15:38:09 2019
>>> New Revision: 354076
>>> URL: https://svnweb.freebsd.org/changeset/base/354076
>>>
>>> Log:
>>>   owc_gpiobus_read_data: compare times in sbintime_t units
>>>
>>>   Previously the code used sbttous() before microseconds comparison in
>>>   one place, sbttons() and nanoseconds in another, division by SBT_1US
>>>   and microseconds in yet another.
>>>
>>>   Now the code consistently uses multiplication by SBT_1US to convert
>>>   microseconds to sbintime_t before comparing them with periods between
>>>   calls to sbinuptime().  This is fast, this is precise enough (below
>>>   0.03%) and the periods defined by the protocol cannot overflow.
>>>
>>>   Reviewed by:	imp (D22108)
>>>   MFC after:	2 weeks
>>>
>>> Modified:
>>>   head/sys/dev/ow/owc_gpiobus.c
>>>
>>> Modified: head/sys/dev/ow/owc_gpiobus.c
>>> ==============================================================================
>>> --- head/sys/dev/ow/owc_gpiobus.c	Fri Oct 25 15:02:50 2019	(r354075)
>>> +++ head/sys/dev/ow/owc_gpiobus.c	Fri Oct 25 15:38:09 2019	(r354076)
>>> @@ -296,10 +296,10 @@ owc_gpiobus_read_data(device_t dev, struct ow_timing *
>>>  	do {
>>>  		now = sbinuptime();
>>>  		GETPIN(sc, &sample);
>>> -	} while (sbttous(now - then) < t->t_rdv + 2 && sample == 0);
>>> +	} while (now - then < (t->t_rdv + 2) * SBT_1US && sample == 0);
>>>  	critical_exit();
>>>
>>> -	if (sbttons(now - then) < t->t_rdv * 1000)
>>> +	if (now - then < t->t_rdv * SBT_1US)
>>>  		*bit = 1;
>>>  	else
>>>  		*bit = 0;
>>> @@ -307,7 +307,7 @@ owc_gpiobus_read_data(device_t dev, struct ow_timing *
>>>  	/* Wait out the rest of t_slot */
>>>  	do {
>>>  		now = sbinuptime();
>>> -	} while ((now - then) / SBT_1US < t->t_slot);
>>> +	} while (now - then < t->t_slot * SBT_1US);
>>>
>>>  	RELBUS(sc);
>>>
>>
>> Unit conversions with sbt times should be done using the macros that
>> carefully avoid roundoff errors.  I don't understand why you've changed
>> the code that correctly did use those macros to inline math.
>
> I think that the commit message explains it:
> This is fast, this is precise enough (below 0.03%) and the periods defined by
> the protocol cannot overflow.
>
> Do you disagree?
> Could you please explain in which of the three lines changed the new code is
> worse and why?

The old code is worse.  This is partly because the inline functions are
over-engineered and poorly implemented and don't even work.

The first thing improved in the change is:

>>>  	do {
>>>  		now = sbinuptime();
>>>  		GETPIN(sc, &sample);
>>> -	} while (sbttous(now - then) < t->t_rdv + 2 && sample == 0);
>>> +	} while (now - then < (t->t_rdv + 2) * SBT_1US && sample == 0);

sbintime_t is signed so that negative differences work.  Scaling negative
differences by SBT_1*S works right.  Scaling negative differences with the
inline functions attempts to panic, but even the panic is broken (unreachable).

Here 'now' is an uptime and 'then' is presumably a previous uptime, so
now < then "can't happen".  Similar code with non-monotonic or untrusted
times might produce a negative difference.
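
For example (a userland sketch; sbintime_t and the SBT_* constants are
respelled locally, and the non-monotonic timestamps are hypothetical):

    #include <stdint.h>
    #include <stdio.h>

    typedef int64_t sbintime_t;
    #define SBT_1S	((sbintime_t)1 << 32)
    #define SBT_1US	(SBT_1S / 1000000)

    int
    main(void)
    {
    	/* An untrusted clock might put 'now' 3 usec before 'then'. */
    	sbintime_t then = 5 * SBT_1S;
    	sbintime_t now = then - 3 * SBT_1US;

    	/* Signed sbt arithmetic handles the negative difference. */
    	printf("%d\n", now - then < 10 * SBT_1US);	/* prints 1 */
    	return (0);
    }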

For efficiency, t_rdv + 2 could be kept as an sbintime.  Alternatively,
scaling it with ustosbt() at runtime gives the cleaner code:

 	} while (now - then < ustosbt(t->t_rdv + 2) && sample == 0);

Sign extension bugs are further away in this expression.  The arg
(t->t_rdv + 2) is more obviously >= 0 since it is a limit, and all of
the inline functions return a signed type (sbintime_t or int64_t) so
that signed differences work.
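
For the efficiency option, a hypothetical cache of the converted periods
might look like this (the struct and field names are mine, not the
driver's):

    #include <stdint.h>

    typedef int64_t sbintime_t;

    /*
     * Hypothetical: convert the protocol periods to sbt once at attach
     * time instead of on every poll in the loops above.
     */
    struct ow_timing_sbt {
    	sbintime_t t_rdv_limit;		/* ustosbt(t->t_rdv + 2) */
    	sbintime_t t_rdv;		/* ustosbt(t->t_rdv) */
    	sbintime_t t_slot;		/* ustosbt(t->t_slot) */
    };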

The next thing improved is:

>>> -	if (sbttons(now - then) < t->t_rdv * 1000)
>>> +	if (now - then < t->t_rdv * SBT_1US)

This preserves full accuracy and signs for 'now - then' but loses a little
more accuracy than needed for conversion of t_rdv.  But accuracy of t_rdv
is apparently unimportant.  If the last bit in it were important, then it
should be kept as an sbintime and differences like 'now - then' should never
be scaled since even perfect rounding would lose bits.
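
The accuracy lost to the truncated SBT_1US factor is easy to quantify
(userland sketch; the arithmetic is mine):

    #include <stdio.h>

    int
    main(void)
    {
    	/* SBT_1US = SBT_1S / 1000000 truncates 4294.967296 to 4294. */
    	double exact = 4294967296.0 / 1000000.0;
    	double used = 4294.0;

    	/* Prints ~0.0225%, i.e., "below 0.03%" as in the commit log. */
    	printf("%.4f%%\n", 100.0 * (exact - used) / exact);
    	return (0);
    }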

The final thing improved is:

>>> -	} while ((now - then) / SBT_1US < t->t_slot);
>>> +	} while (now - then < t->t_slot * SBT_1US);

This already used the unscaled 'now - then', but with a style bug (excessive
parentheses).

imp still hasn't replied to my mails that fix some of the bugs in the
overengineered committed version.  My version is even more overengineered.
I maintain the following version but only use the non-comment parts of
sbttons().  First look at the bugs in the committed sbttons():

XX /*
XX  * Large comment.  The function is not very overengineered except in the
XX  * comment.  Half the details are wrong.
XX  */
XX static __inline int64_t
XX sbttons(sbintime_t _sbt)
XX {
XX 	uint64_t ns;
XX 
XX #ifdef KASSERT
XX 	KASSERT(_sbt >= 0, ("Negative values illegal for sbttons: %jx", _sbt));
XX #endif

Multiplication by SBT_1NS used to be the proper conversion KPI.  It loses
some accuracy, but works for negative _sbt (since SBT_1NS is signed).
Here the conversion is more complicated.  It still uses mostly signed types
(0xffffffffu gets promoted), but doesn't work if _sbt is negative.

The above attempts to detect and panic for this, but has no effect in most
cases, since due to standard namespace pollution in <sys/param.h>, <sys/time.h>
is always included before <sys/systm.h>, so KASSERT is only defined here for
kernel files that have style bugs in their #include order or selection.
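
A sketch of the include order that makes the #ifdef vanish (the comments
are mine):

    /*
     * <sys/param.h> pulls in <sys/time.h>, so the inline functions there
     * are preprocessed before <sys/systm.h> defines KASSERT, and their
     * #ifdef KASSERT blocks are compiled away.
     */
    #include <sys/param.h>	/* pollution: includes <sys/time.h> */
    #include <sys/systm.h>	/* defines KASSERT, but too late */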

XX 	ns = _sbt;

Bad style (using ns as scratch for an sbintime_t value).

XX 	if (ns >= SBT_1S)
XX 		ns = (ns >> 32) * 1000000000;
XX 	else
XX 		ns = 0;

This avoids overflow for large ns.  For ns < 0, it doesn't reduce or adjust
ns correctly, so later code adds a garbage ns (unscaled ns = full _sbt) to
the scaled fractional part of _sbt.

XX 
XX 	return (ns + (1000000000 * (_sbt & 0xffffffffu) >> 32));
XX }

The large buggy comment is attached only to this function, but only applies
to the functions converting in the other direction.  In this direction, the
complications and overengineering are small.  The functions just avoid
overflow for large ns.
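
The misbehaviour for negative args is easy to demonstrate in userland
(the function is copied from the committed version, with KASSERT compiled
away as described above):

    #include <stdint.h>
    #include <stdio.h>

    typedef int64_t sbintime_t;
    #define SBT_1S	((sbintime_t)1 << 32)

    static int64_t
    sbttons(sbintime_t _sbt)
    {
    	uint64_t ns;

    	ns = _sbt;		/* negative _sbt becomes huge unsigned */
    	if (ns >= SBT_1S)
    		ns = (ns >> 32) * 1000000000;
    	else
    		ns = 0;
    	return (ns + (1000000000 * (_sbt & 0xffffffffu) >> 32));
    }

    int
    main(void)
    {
    	/* -1 second should give -10**9 ns; garbage comes out instead. */
    	printf("%jd (expected %jd)\n",
    	    (intmax_t)sbttons(-SBT_1S), (intmax_t)-1000000000);
    	return (0);
    }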

XX --- /home/bde/sys13/sys/time.h	2019-09-19 17:44:48.507873000 +0000
XX +++ time.h	2018-11-16 07:19:07.542131000 +0000
XX @@ -30,5 +30,5 @@
XX   *
XX   *	@(#)time.h	8.5 (Berkeley) 5/4/95
XX - * $FreeBSD: head/sys/sys/time.h 346176 2019-04-13 04:46:35Z imp $
XX + * $FreeBSD: head/sys/sys/time.h 340450 2018-11-15 16:02:13Z imp $
XX   */
XX 
XX @@ -98,9 +98,9 @@
XX  	uint64_t _p1, _p2;
XX 
XX -	_p1 = (_bt->frac & 0xffffffffull) * _x;
XX +	_p1 = (_bt->frac & 0xffffffff) * _x;
XX  	_p2 = (_bt->frac >> 32) * _x + (_p1 >> 32);
XX  	_bt->sec *= _x;
XX  	_bt->sec += (_p2 >> 32);
XX -	_bt->frac = (_p2 << 32) | (_p1 & 0xffffffffull);
XX +	_bt->frac = (_p2 << 32) | (_p1 & 0xffffffff);
XX  }
XX

Vestiges of my old fixes for similar problems converting between bintimes
and timevals/timespecs.  Although bintimes have much more precision, they
can't represent timevals or timespecs exactly, so conversions to bintimes
and back tend to lose a bit.  E.g., 1 second survives, but 999999 usec
becomes 999998 usec.
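
The lost bit is easy to reproduce with exact scaling in both directions
(userland sketch using the gcc/clang __uint128_t extension; both
conversions truncate, like the old tv<->bt conversions):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
    	uint64_t us = 999999;
    	/* usec -> bintime fraction: floor(us * 2**64 / 10**6) */
    	uint64_t frac = (uint64_t)(((__uint128_t)us << 64) / 1000000);
    	/* bintime fraction -> usec: floor(frac * 10**6 / 2**64) */
    	uint64_t back = (uint64_t)(((__uint128_t)frac * 1000000) >> 64);

    	/* Prints "999999 -> 999998". */
    	printf("%ju -> %ju\n", (uintmax_t)us, (uintmax_t)back);
    	return (0);
    }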

The use of ull is just a style bug here.  We just want to extract low
bits from _bt->frac.  The type of the sub-expression remains as uint64_t.

XX @@ -163,38 +163,60 @@
XX   * microsecond functions are also provided for completeness.
XX   *
XX - * These functions return the smallest sbt larger or equal to the
XX - * number of seconds requested so that sbttoX(Xtosbt(y)) == y.  Unlike

These functions are not as accurate as claimed.

XX - * top of second computations below, which require that we tick at the
XX - * top of second, these need to be rounded up so we do whatever for at
XX - * least as long as requested.
XX - *
XX - * The naive computation we'd do is this
XX - *	((unit * 2^64 / SIFACTOR) + 2^32-1) >> 32
XX - * However, that overflows. Instead, we compute
XX - *	((unit * 2^63 / SIFACTOR) + 2^31-1) >> 32
XX - * and use pre-computed constants that are the ceil of the 2^63 / SIFACTOR
XX - * term to ensure we are using exactly the right constant. We use the lesser

This method is routine and shouldn't be documented here.

However, there are some complications and shortcuts in the details that should
be documented or avoided.

XX - * evil of ull rather than a uint64_t cast to ensure we have well defined
XX - * right shift semantics. With these changes, we get all the ns, us and ms
XX - * conversions back and forth right.

Using ull instead of the correct cast is just a gross style bug.  Using
any unsigned type asks for sign extension bugs but may be needed for
technical reasons (we start with int64_t types and must handle negative
values before combining with uint64_t constants).

XX - * Note: This file is used for both kernel and userland includes, so we can't
XX - * rely on KASSERT being defined, nor can we pollute the namespace by including
XX - * assert.h.

Remove nonsense about KASSERT.

XX + * Violate the comment about "Background information":

phk didn't like my rounding fixes for bintimes and responded by adding
the "Background information" comment.  This demands always perfectly rounding
down.  The sbintime conversion functions violate this by always rounding up.
Multiplication by SBT_1*S rounds down a bit too much, but always perfectly
rounding down would do much the same after a few iterations.

XX + *
XX + * All these functions do sloppy scaling and/or rounding, but it is arranged
XX + * that the errors compensate for each other so that sbttoX(Xtosbt(x)) == x
XX + * for all valid args.
XX + *
XX + * Sloppy scaling in the decimal->sbt functions tends to give a result that
XX + * is significantly far below the floor of the correctly rounded result (call
XX + * this the "true floor").  Similarly sloppy scaling in the sbt->decimal
XX + * functions reduces further below the true floor.  The result was that
XX + * sbttoX(Xtosbt(x)) was 1 too small for all cases except x = 0.
XX + *
XX + * Doing correct even scaling and rounding up to the true ceiling in the
XX + * decimal->sbt functions alone is useless for fixing this, since the
XX + * sbt->decimal functions reduce even true ceilings to below true floors.
XX + *
XX + * Instead of correcting all the functions, return a fake ceiling in the
XX + * decimal->sbt functions.  This ceiling is approx. 1 decimal unit above the
XX + * sloppily scaled value.  It is thus above the true ceiling but not 1
XX + * decimal unit or more above.  It is unobvious that the sbt->decimal
XX + * functions don't reduce even this fake ceiling to below the true floor.
XX + * Exhaustive testing shows that they don't.  The fake ceiling must be
XX + * significantly more than 1 sbt unit above the true ceiling for the
XX + * reduction to be not too large.  Fortunately, there are no numerical
XX + * accidents that give a too-low fake ceiling.
XX + *
XX + * The previous version claimed to calculate true ceilings.  These never
XX + * work.  But it actually calculated fake ceilings of 2**32-1 sbt units
XX + * above the sloppily scaled value.  This value is nonsense, but worked
XX + * accidentally in some cases.  The decimal unit of 1 nsec is
XX + * 2**64/10**9 ~= 18e9 sbt units.  2**32-1 is thus about 4.29 times too
XX + * small to work non-accidentally.  In practice, it worked accidentally in
XX + * about 92% of cases.  For conversions of microseconds, the value used was
XX + * about 4.29e3 times too small, but worked accidentally in all except 32
XX + * cases.  For conversions of milliseconds, the value used was about 4.29e6
XX + * times too small, but worked accidentally in all cases.
XX + *
XX + * Note that the range of valid args is significantly limited, since the
XX + * efficient sloppy scaling and rounding methods used here only work up to 1
XX + * second.  E.g., _ms must be between 0 and 999 inclusive, since
XX + * mstosbt(1000) ~= 0 due to overflow of the multiplication to almost
XX + * exactly 0.  This allows exhaustive testing that the sloppy methods work.
XX + * It is unclear if similar sloppy scaling would work above 1 second, but
XX + * non-sloppy scaling is very easy for whole seconds (see tstosbt()).
XX + *
XX + * Uncommitted code for ts/tv<->bt was more careful, but was disapproved and
XX + * the comment about "Background information" was added to inhibit commits
XX + * in this area.  It does non-sloppy scaling of fractional parts, then rounds
XX + * to nearest.  I don't remember doing exhaustive testing of it, and now don't
XX + * see why rounding to nearest is enough.
XX   */
XX  static __inline int64_t
XX  sbttons(sbintime_t _sbt)
XX  {
XX -	uint64_t ns;
XX -
XX -#ifdef KASSERT
XX -	KASSERT(_sbt >= 0, ("Negative values illegal for sbttons: %jx", _sbt));
XX -#endif
XX -	ns = _sbt;
XX -	if (ns >= SBT_1S)
XX -		ns = (ns >> 32) * 1000000000;
XX -	else
XX -		ns = 0;
XX 
XX -	return (ns + (1000000000 * (_sbt & 0xffffffffu) >> 32));
XX +	return ((1000000000 * _sbt) >> 32);
XX  }

In this direction, I just removed the buggy overflow handling (but kept
the accurate scaling).  Overflow occurs after only ~2 seconds for nanoseconds.
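
The threshold follows directly (userland sketch):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
    	/* 1000000000 * _sbt overflows int64_t above this many sbt units. */
    	int64_t limit = INT64_MAX / 1000000000;

    	/* Prints "9223372036 sbt units = 2.147 s". */
    	printf("%jd sbt units = %.3f s\n", (intmax_t)limit,
    	    (double)limit / 4294967296.0);
    	return (0);
    }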

XX 
XX @@ -202,16 +224,78 @@
XX  nstosbt(int64_t _ns)
XX  {
XX -	sbintime_t sb = 0;
XX 
XX -#ifdef KASSERT
XX -	KASSERT(_ns >= 0, ("Negative values illegal for nstosbt: %jd", _ns));
XX +	/*
XX +	 * The adjustment is essentially to add SBT_1NS to the result of
XX +	 * sloppy scaling and rounding down.  That happens to work too, but
XX +	 * use the slightly less sloppy method of combining shifts and
XX +	 * subtracting 1.  The result in most cases is much higher than the
XX +	 * true ceiling, but sloppy rounding for the reverse conversion
XX +	 * discards the extras.
XX +	 *
XX +	 * E.g., 2**64/10**9 has a fractional part of ~0.71.  Multiplying
XX +	 * this by the maximum valid _ns of 10**9-1 gives ~0.71e9, so
XX +	 * rounding down after scaling gives a raw sbt that is ~0.71e9 below
XX +	 * the true floor (only ~0.17 below the true floor in seconds).
XX +	 * Replacing _ns by (_ns + 1) gives a raw sbt that is ~0.29e9 above
XX +	 * the true ceiling (only ~0.068 above the true ceiling in seconds).
XX +	 * Subtracting 1 from this makes no significant difference.   Sloppy
XX +	 * scaling and rounding in the reverse conversion recovers the
XX +	 * original value: the sloppy scaling reduces the value from above the
XX +	 * true ceiling to below the true ceiling (but not below the true
XX +	 * floor); then rounding down recovers the true floor.
XX +	 *
XX +	 * Similarly for _all_ valid args and for the us and ms conversions.
XX +	 * It is fairly obvious that the result of conversion back to ns is
XX +	 * never 1 too low (since we add 1), but not obvious that the result
XX +	 * is never 1 too high (since adding 1 risks this).
XX +	 *
XX +	 * Conversion from sbt to decimal and back has no chance of always
XX +	 * restoring the original values since the decimal formats have less
XX +	 * precision.  Adding 1 is too sloppy to do anything good for this
XX +	 * direction.  The uncommitted ts/tv<->bt conversions are supposed to
XX +	 * work as well as possible in both directions.  Rounding to nearest
XX +	 * works better and non-sloppy scaling is obviously needed to work in
XX +	 * both directions.  However, non-sloppy scaling doesn't work with
XX +	 * mixed or sloppy rounding.  In the above example with _ns = 10**9-1,
XX +	 * it gives a raw sbt that is just 1 below the true floor.  Adding 1
XX +	 * to this gives the true ceiling in the sbt, but unless the reverse
XX +	 * conversion is changed similarly, it reduces from the true ceiling
XX +	 * to below the true floor, so the result of converting back has much
XX +	 * the same off-by-1 errors as simpler method (always 1 too low except
XX +	 * the same off-by-1 errors as the simpler method (always 1 too low except
XX +	 *
XX +	 * Further examples:
XX +	 * - when _ns = 0, sbt is SBT_1NS, but converting this back to ns
XX +	 *   gives 0.  If we start with this sbt, our conversion functions
XX +	 *   are still unusable for printing it in nsec.
XX +	 * - when _ns = 10**9-1, we are close to overflow.  Adding 1 to it
XX +	 *   doesn't quite reach the overflow threshold.
XX +	 * - in previous versions, when _ns = 10**9, overflow was not quite
XX +	 *   reached, and the result of a double conversion had the usual
XX +	 *   off-by-1 error.
XX +	 * - when _ns = 10**9, overflow now occurs and the result of a double
XX +	 *   conversion is 0
XX +	 * - when _ns = 10**9+1, overflow occurs in previous versions and the
XX +	 *   result of a double conversion is 0 (overflow gives an off-by-10**9
XX +	 *   error and then the usual off-by-1 error occurs).
XX +	 */
XX +#if 1
XX +	/*
XX +	 * This version does non-sloppy scaling which after rounding down is
XX +	 * close to or equal 1 below the true ceiling for all nonzero args.
XX +	 * Then adding 1 gives an adequate true or fake ceiling.  Some of the
XX +	 * above comments are wrong about the true ceiling not working.
XX +	 * Unfortunately, this is about 30% slower (a whole cycle extra in
XX +	 * the test).  The unnecessary subtraction of 1 in the main version
XX +	 * costs about 1/2 of a cycle in the test.
XX +	 */
XX +	uint64_t factor, frac;
XX +
XX +	factor = ((uint64_t)1 << 63) / 500000000;
XX +	frac = ((0 - factor * 1000000000) << 32) / 1000000000;
XX +	return (((_ns * factor + ((_ns * frac) >> 32)) >> 32) + 1);

This is the fully overengineered version.  It is too slow to use.

XX +#else
XX +	return (((_ns + 1) * (((uint64_t)1 << 63) / 500000000) - 1) >> 32);
XX  #endif
XX -	if (_ns >= SBT_1S) {
XX -		sb = (_ns / 1000000000) * SBT_1S;
XX -		_ns = _ns % 1000000000;
XX -	}
XX -	/* 9223372037 = ceil(2^63 / 1000000000) */
XX -	sb += ((_ns * 9223372037ull) + 0x7fffffff) >> 31;

Rounding to the ceiling gives an integer scale factor which is a bit too
high; after multiplying by _ns the error in nanoseconds can be more than
1 and so can the error in sbintimes.  Using (_ns + 1) instead of _ns is
supposed to fix this.

The version that I actually use is:

 	sb += (_ns << 32) / 1000000000 + (_ns == 0 ? 0 : 1);

This does simpler scaling to first round down, then violates the
"Background information" comment by always rounding up except for _ns == 0,
so that it is clear that the rounding is as intended (don't do extra
work to avoid rounding up if _ns is exactly representable).
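
As a self-contained userland sketch of that version (the whole-seconds
split is filled in by me from the old code above, and the round trip uses
the simplified sbttons()):

    #include <stdint.h>
    #include <stdio.h>

    typedef int64_t sbintime_t;
    #define SBT_1S	((sbintime_t)1 << 32)

    /*
     * Split off whole seconds, scale the remainder rounding down, then
     * round up by 1 sbt unit for any nonzero remainder.
     */
    static sbintime_t
    nstosbt_sketch(int64_t _ns)
    {
    	sbintime_t sb = 0;

    	if (_ns >= 1000000000) {
    		sb = (_ns / 1000000000) * SBT_1S;
    		_ns = _ns % 1000000000;
    	}
    	sb += (_ns << 32) / 1000000000 + (_ns == 0 ? 0 : 1);
    	return (sb);
    }

    int
    main(void)
    {
    	/* Round trip through (1000000000 * sb) >> 32; prints 999999999. */
    	sbintime_t sb = nstosbt_sketch(999999999);

    	printf("%jd\n", (intmax_t)((1000000000 * sb) >> 32));
    	return (0);
    }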

XX -	return (sb);
XX  }
XX 
XX @@ -226,16 +310,6 @@
XX  ustosbt(int64_t _us)
XX  {
XX -	sbintime_t sb = 0;
XX 
XX -#ifdef KASSERT
XX -	KASSERT(_us >= 0, ("Negative values illegal for ustosbt: %jd", _us));
XX -#endif
XX -	if (_us >= SBT_1S) {
XX -		sb = (_us / 1000000) * SBT_1S;
XX -		_us = _us % 1000000;
XX -	}
XX -	/* 9223372036855 = ceil(2^63 / 1000000) */
XX -	sb += ((_us * 9223372036855ull) + 0x7fffffff) >> 31;
XX -	return (sb);
XX +	return (((_us + 1) * (((uint64_t)1 << 63) / 500000) - 1) >> 32);
XX  }
XX 
XX @@ -250,16 +324,6 @@
XX  mstosbt(int64_t _ms)
XX  {
XX -	sbintime_t sb = 0;
XX 
XX -#ifdef KASSERT
XX -	KASSERT(_ms >= 0, ("Negative values illegal for mstosbt: %jd", _ms));
XX -#endif
XX -	if (_ms >= SBT_1S) {
XX -		sb = (_ms / 1000) * SBT_1S;
XX -		_ms = _ms % 1000;
XX -	}
XX -	/* 9223372036854776 = ceil(2^63 / 1000) */
XX -	sb += ((_ms * 9223372036854776ull) + 0x7fffffff) >> 31;
XX -	return (sb);
XX +	return (((_ms + 1) * (((uint64_t)1 << 63) / 500) - 1) >> 32);
XX  }
XX

Bruce

