svn commit: r259609 - head/sys/kern

Thu Dec 19 17:01:13 UTC 2013

Am 19.12.2013 11:49, schrieb Bruce Evans:
> On Thu, 19 Dec 2013, Stefan Esser wrote:
> 
>> Log:
>>  Fix overflow for timeout values of more than 68 years, which is the
>> maximum
>>  covered by sbintime (LONG_MAX seconds).
> 
> Not LONG_MAX seconds, but INT32_MAX seconds.  LONG_MAX seconds is about
> 2**32
> times larger on 64-bit arches.

Hi Bruce,

yes, you are of course correct ... The limit is 2^31-2^-32 seconds,
to be exact ;-)

This is represented by LONG_MAX in sbintime, which made me use that
wrong constant in the commit message.

> sbintimes and their complexity give many more possibilities for overflow.
> I thought that the new overflow bugs and some old ones were already  fixed.
> 
>>  Some programs use timeout values in excess of 1000 years. The conversion
>>  to sbintime caused wrap-around on overflow, which resulted in short or
>>  negative timeout values. This caused long delays on sockets opened by
>>  affected programs (e.g. OpenSSH).
>>
>>  Kernels compiled without -fno-strict-overflow were not affected,
>> apparently
>>  because the compiler tested the sign of the timeout value before
>> performing
>>  the multiplication that lead to overflow.
> 
> I think it just gave a garbage value that was accidentally less harmful.

The result of the multiplication was performed modulo 2^64 due to
the limited range of the operation. The result was then interpreted
as a signed 64 bit 2s-complement number.

The factor 2^32/1000 is 0x418937, which multiplied with a round
decimal number over 1000*2^32 will result in "random" bit sequence
in the resulting sbintime.

Half the values will have been positive, and many will have
corresponded to substantial timeout values, but some will have
been very low, resulting in unexpectedly low timeouts.

>>  When the -fno-strict-overflow option was added to CFLAGS, this
>> optimization
>>  was disabled and the test was performed on the result of the
>> multiplication.
>>  Negative products were caught and resulted in EINVAL being returned, but
>>  wrap-around to positive values just shortened the timeout value to the
>>  residue of the result that could be represented by sbintime.
> 
> This shows one reason why -fno-strict-overflow shouldn't be used.  It just
> wastes time to give different undefined behaviour with undocumented
> details.

Well, I thought so when I found the cause of the breakage (long
delays opening TCP connections) that were the result of the
compilation with -fno-strict-overflow.

But I reconsidered, because a real bug in the code has been identied,
this way. The bug existed, without being detected (because too short
but non-zero timeout values were caught at the application layer, which
just re-issued the call with the remaining time as timeout parameter).

If applications didn't have to worry about short timeouts for other
reasons, then this bug would have led to observable problems, even
with -fno-strict-overflow.

>>  The fix is to cap the timeout values at the maximum that can be
>> represented
>>  by sbintime, which is 2^31 - 1 seconds or more than 68 years.
> 
> 2^31 - 1 is a correct spelling of INT32_MAX, unlike LONG_MAX.

Yes, sbintime is defined in units of 2^-32 seconds, which makes the
highest representable time exactly (2^31 - 2^-32) seconds. This
value is represented by LLONG_MAX units of 2^32 seconds ...

>>  After this change, the kernel can be compiled with -fno-strict-overflow
>>  with no ill effects.
> 
>> Modified: head/sys/kern/kern_event.c
>> ==============================================================================
>>
>> --- head/sys/kern/kern_event.c    Thu Dec 19 07:33:07 2013    (r259608)
>> +++ head/sys/kern/kern_event.c    Thu Dec 19 09:01:46 2013    (r259609)
>> @@ -526,7 +526,8 @@ knote_fork(struct knlist *list, int pid)
>> static __inline sbintime_t
>> timer2sbintime(intptr_t data)
>> {
>> -
>> +    if (data > LLONG_MAX / SBT_1MS)
>> +        return LLONG_MAX;
>>     return (SBT_1MS * data);
>> }
> 
> This has the following style bugs:
> - it removes the empty line after the (null) declarations
> - it is missing parentheses around the return value
> - it uses the long long abomination

Ughh, I'll fix the style bugs ...

It had taken me several hours over the last week to find this bug,
and I committed the fix that worked on my system without making it
compliant with FreeBSD style. Sorry for that ...

> This has the following type errors:
> - using the long long abomination is not just a style bug.  sbintime_t has
>   type int64_t, not long long.  The overflow check will break if long long
>   becomes longer than int64_t

The definition of sbintime is int64_t on all architectures. If any
architecture used a wider integer type for sbintime, the fix would
just limit the maximum timeout to 68 years (instead of many magnitudes
longer than the universe will last), which is still way beyond any
sensible timeout value. And functions will have to check for too short
timeouts to be robust, anyway.

The functions in kern_timeout.c heavily depend on sbintime having at
least 64 bits and perform calculations under the assumption, that no
value above INT64_MAX can be passed as sbintime. So, even if there
was a wider sbintime, the highest value supported was still represented
by INT64_MAX.

> - returning LLONG_MAX is usually just a style bug.  When long long is
> longer
>   than int64_t, the return value will overflow, but the
> implementation-defined
>   behaviour for this is usually to convert it to the correct value.

Yes, it was a silly mistake to use LLONG_MAX, even though it is
identical to INT64_MAX on all architectures currently supported by
FreeBSD.

> INT64_MAX is a dangerous default maximum timeout.  You can't even add the
> minimum sbintime_t of 2**-32 seconds to this without overflowing.

The function timer2sbintime is used in only two places in the kernel,
one being filt_timerattach(), which caused the kernel breakage, the
other with a constant timeout of 0. No further additions are ever
performed on the value, so there is no risk of overflow. The function
is declared "static inline", and will therefore not be called from
outside of this source file.

> Old timeout code used the following method to reduce the problem of
> overflowing timeouts:
>   For alarm() and setitimer(), limit the timeout to 100 million seconds
>   (3.17 years) and return EINVAL if it is too large.  This works with
>   32-bit time_t until about 2035.  POSIX doesn't allow this.  Clamping
>   the time to 100 million seconds isn't allowed either, since the
>   residual time can be seen from applications.  Note that 64-bit time_t
>   has little effect on this problem.  Uses wishing to overflow the kernel
>   simply ask for a timeout in a timeval of INT64_MAX seconds plus 999999
>   seconds so that adding just 1 microseconds to it overflow.  With
>   seconds in sbintime_t instead of in time_t, overflow occurs much
>   sooner.

The brokenness of the kernel compiled with -fno-strict-overflow was
due to EINVAL being returned to callers of filt_timerattach(), which
apparently did not check that return code (or at least endlessly
retried with the same invalid timeout parameter).

> This is quite broken now:
> - the limit of 100000000 used to be enforced in itimerfix(), but this was
>   removed in connection with implementing POSIX realtime timers without
>   even breaking the documentaion to match or touching PR(s) about the
>   POSIX non-conformance of the limit.
> - setitimer() mostly uses sbintimes now, so the old limit is irrelevant
>   and enforcing it in itimerfix() would break the new sbintime code and
>   presumably the not so new realtime timers code.  The sbintime code
>   uses a modified limit that is also not allowed by POSIX.  This limit is
>   of course undocumented.  It is INT32_MAX / 2, so that there is plenty
>   of room for expansion.  This is what the above should use, modulo the
>   conformance bug.  EINVAL is returned when this limit is exceeded.

Are you sure about the limit of INT32_MAX / 2 (i.e. 2^30 -1) seconds?

My reading is that sbintime = INT64_MAX represents (2^31-2^-32) seconds,
and that this value is correctly treated in the timeout code.

It is possible to pass values of more than 2^31 seconds to setitimer()
on architectures where time_t is defined as int64_t. There are no tests
for overflow in tvtosbt() in /sys/sys/time.h.

> - select() and poll() were more careful and seem to be correct now.
>   The timeout in poll() is limited to INT_MAX milliseconds, so it fits
>   in sbintime_t.  Except this assumes that int is 32 bits.  select()
>   limits the timeout to INT32_MAX seconds so that it is representable
>   as an sbintime_t and then does a further overflow check involving
>   INT64_MAX.  When overflow would occur, it sets asbt to -1, which I
>   think means an infinite timeout.  I don't like this.  -1 is a tiny
>   amount less than 0, not near plus infinity.
> - nanosleep() used to be less careful than select() and poll(), but seems
>   to have been fixed.  For some reason, it uses a mixture of the above
>   methods.  It limits sleep times to INT32_MAX / 2 seconds but repeats
>   the sleep as necessary so as not to return an error.  The logic for
>   this is subtle and I haven't checked it.

I just looked at nanosleep() and have to admit, that I do not fully
understand its semantics. A tv_nsec value < 0 or > 10^9 results in
EINVAL, while a negative tv_sec is silently treated as 0.

If a timeout of 2^30 seconds or more is requested, the sleep time is
limited to 2^30 - 1 seconds, but the difference is returned in the
timespec value pointed to by the optional return value pointer (if
present). That value may thus be non-zero, even if the function had
returned 0 (indicating that the timeout has occured).

At least for the case of tv_sec in the range of 2^30 to 2^31-1, the
caller will assume that the full time has elapsed, while in fact only
2^30 seconds have passed and the call should be repeated with the
remaining sleep time returned in rmtp ...

> davide@ has patches related to fixing this, but none seem to have been
> committed yet, except some old ones that gave some of the above overflow
> checking.

Thanks for the detailed information and sorry about the style violation.
I'll fix it, if consensus is reached about the correct style.

I'd replace the two occurances of LLONG_MAX with INT64_MAX and add the
missing empty line:

static __inline sbintime_t
timer2sbintime(intptr_t data)
{

        if (data > INT64_MAX / SBT_1MS)
                return INT64_MAX;
        return (SBT_1MS * data);
}

If you can show evidence that a limit of INT64_MAX/2 is more appropriate
(2^30 seconds or 34 years), the limit could be of course be reduced to
that value.

I could not find any code that would not tolerate INT64_MAX, though ...

Regards, STefan