should_yield problem [Was: svn commit: r251322 - head/sys/kern]

Wed Jul 3 04:40:20 UTC 2013

On Tue, 2 Jul 2013, Andriy Gapon wrote:

> on 02/07/2013 20:52 Attilio Rao said the following:
>> I was just pointing out that the real bug is not in the subtraction
>> itself but on the hogticks comparison.
>> This is because hogticks is not an absolute measurement but it
>> represents really a ticks difference.
>> In my opinion we should define hogticks as an unsigned int and add a
>> small comment saying that it is a differential.
>> This will fix the issue.

This is sort of backwards.  hogticks is a small positive integer.  Making
it unsigned just asks for sign extension bugs when it is compared with
signed values.  However, the tick counters are circular counters.  Making
them signed asks for undefined behaviour when they overflow.

> I think that my original suggestion is equivalently well if not more obvious.
> This is a common knowledge:
> http://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution
>
> distance = (signed)( i1 - i2 )

With 2's complement and benign overflow, this is no different from
'distance = i1 - i2'.  But signed counters are not wanted here.  A
negative distance, or equivalently with 32-bit unsigned ints, an unsigned
difference of >= 0x80000000 means overflow or an initialization error.
Your bug is an initialization error.

BTW, -ftrapv is broken in gcc-4.2.1.  It seems to have worked in
gcc-3.3, but gcc-3.3 generates pessimal code for it, with a call to
an extern function even for adding ints.  gcc-4.2.1 still generates
an declaration of the extern function, but never calls it.  The
declaration is unnecessary even when it is used.

These bugs seem to be fixed in clang, at least on amd64.  clang
generates an explicit inline check for overflow and executes an
unsupported instruction on overflow.  The code for the check is good
with -O but bad (large and slow) with -O0.

These bugs are still large at the hardware level.  Overflow checking
is handled correctly on x86 only for floating point.  In floating
point, you can set or clear the mode bit for trapping on overflow, so
that programs can be tested without recompiling them and without adding
slow runtime checks (the hardware has to pipleline the checks well so
that they have low cost).  Hardware support for trapping on integer
overflow keeps getting worse since programmers never check for overflow.
i386 has an "into" instruction that should have been used instead of
"jo foo" to check for overflow.  It should have been easier for hardware
to optimize than the branch.  But AFAIK, "into" has never been optimized
and has always been slower than the branch, so programmers never used
it, so AMD removed it in 64-bit mode.

Speed tests for incrementing a volatile local variable on corei7 in
32-bit mode:
- -fno-trapv:     6.2 cycles  (clang bug: warn about -fno-trapv being unused)
- add into:      12.3 cycles  (error handling is SIGBUS)
- -ftrapv:        6.7 cycles  (error handling is SIGILL)
- mod jo to into: 7.3 cycles  (error handling is SIGBUS).

Bruce