svn commit: r343985 - head/sys/kern

Bruce Evans brde at optusnet.com.au
Mon Feb 11 12:08:26 UTC 2019


On Mon, 11 Feb 2019, Bruce Evans wrote:

> On Sun, 10 Feb 2019, Conrad Meyer wrote:
>
>> Log:
>>  Prevent overflow for usertime/systime in caclru1
>> ...
>> +static uint64_t
>> +mul64_by_fraction(uint64_t a, uint64_t b, uint64_t c)
>> +{
>> +	/*
>> +	 * Compute floor(a * (b / c)) without overflowing, (b / c) <= 1.0.
>> +	 */
>> +	return ((a / c) * b + (a % c) * (b / c) + (a % c) * (b % c) / c);
>> +}
>> +
>
> This is the slowest correct fix in the PR followup.  kib predicted
> that I wouldn't like it.  It does 2 64-bit divmods (after optimization)
> and many multiplications per call.  Times 2 calls.  clang will probably
> inline this, giving only 3 64-bit divmods instead of 4.
>
> Better version:
>
> + 	/*
> + 	 * Subdivide tu.  Combine avoiding overflow with reduction to 32-bit
> + 	 * operands in the multiplications in the usual case of tu <=
> + 	 * UINT32_MAX usec = 4294 seconds.
> + 	 */
> + 	if (__predict_true(tu <= UINT32_MAX && tt <= UINT32_MAX)) {
> + 		uu = ((uint64_t)(uint32_t)tu * (uint32_t)ut) / tt;
> + 		su = ((uint64_t)(uint32_t)tu * (uint32_t)st) / tt;
> + 	} else {
> + 		q = tu / tt;
> + 		r = tu % tt;
> + 		uu = q * ut + (r * ut) / tt;
> + 		su = q * st + (r * st) / tt;
> + 	}

I didn't keep the part of your reply to my reply above where you said
that you tested over a number of years.

I also tested over fake years in a running kernel using a fake cpu_tick
freqency of 100000 times smaller.

Now I see that all versions still overflow after 388 days with stathz =
128 when tt exceeds UINT32_MAX.  Then r can be 1 less than tt, and
ut and st can be the same as tt, so one operation is like sqaring tt
which overflows when tt is 1 more than UINT32_MAX.

There is a PR or 2 about this 388-day limit.  I knew about it before I
knew about the 105-hour limit fixed in this commit.(

Uptimes of a few years are not preposterous, and cpu_idle threads usually
have system time of nearly the uptime, so the overflow occurs on most
systems after they have been up for 388 days.

I'm trying a fix with tt and r divided by 16 in some places.  This should
work for 17 years with stathz = 128.  The thread runtime is more than 388
days when this occurs, so no one would care about differences of even a few
days but losing precision in the seconds and microseconds parts is not so
good.

Bruce


More information about the svn-src-head mailing list