Fwd: 5-STABLE kernel build with icc broken

Bruce Evans bde at zeta.org.au
Thu Mar 31 03:40:56 PST 2005


On Thu, 31 Mar 2005, Peter Jeremy wrote:

> On Thu, 2005-Mar-31 17:17:58 +1000, Bruce Evans wrote:
>>  I still
>> think fully lazy switching (c2) is the best general method.
>
> I think it depends on the FP workload.  It's a definite win if there
> is exactly one FP thread - in this case the FPU state never needs to
> be saved (and you could even optimise away the DNA trap by clearing
> the TS and EM bits if the switched-to curthread is fputhread).

I think stopping the trap would be the usual method (not sure what
Linux did), but to collect statistics for determining affinity you
would want to take the trap anyway.
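
For illustration, the lazy scheme and the trap-skipping optimisation can
be modelled in user space roughly as follows.  This is only a sketch, not
kernel code: struct cpu, cpu_switch(), fp_use() and the skip_trap flag are
hypothetical names, and the "ts" field merely stands in for the CR0.TS bit.
Only curthread and fputhread are taken from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of lazy FPU switching (not kernel code). */
struct thread { int id; };

struct cpu {
    const struct thread *curthread;  /* thread currently running */
    const struct thread *fputhread;  /* thread whose state is live in the FPU */
    bool ts;                         /* models CR0.TS: next FP insn traps */
    unsigned traps, saves, loads;    /* statistics */
};

/* Context switch: no FPU state is touched, the trap is only armed or not. */
static void cpu_switch(struct cpu *c, const struct thread *next, bool skip_trap)
{
    assert(next != NULL);
    c->curthread = next;
    /* If the FPU already holds next's state, the trap may be skipped. */
    c->ts = !(skip_trap && c->fputhread == next);
}

/* The current thread executes an FP instruction. */
static void fp_use(struct cpu *c)
{
    assert(c->curthread != NULL);
    if (!c->ts)
        return;                      /* FPU usable, no trap taken */
    c->traps++;                      /* models the DNA trap */
    if (c->fputhread != c->curthread) {
        if (c->fputhread != NULL)
            c->saves++;              /* save the previous owner's state */
        c->loads++;                  /* load (or initialise) our state */
        c->fputhread = c->curthread;
    }
    c->ts = false;
}
```

Calling cpu_switch() with skip_trap false models taking the trap on every
first FP use after a switch, which is what collecting affinity statistics
would need.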

> The worst case is two (or more) FP-intensive threads - in this case,
> lazy switching is of no benefit.  The DNA trap overheads mean that
> the performance is worse than just saving/restoring the FP state
> during a context switch.
>
> My guess is that the current generation workstation is closer to the
> second case - current generation graphical bloatware uses a lot of
> FP for rendering, not to mention that the idle task has a reasonable
> chance of being an FP-intensive distributed computing task (setiathome
> or similar).  It's probably time to do some more measuring (I'm not
> offering just now, I have lots of other things on my TODO list).

Bloatware might be so hoggish that it rarely makes context switches :-).
Context switches for interrupts increase the problem though, as would
using FP more in the kernel.
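
The trade-off between the two cases can be put as a toy cost model.  This
is a sketch under stated assumptions: the cost fields are arbitrary units
and not measurements, and p_reload (the fraction of context switches after
which the incoming thread uses the FPU while another thread's state is
live) is a hypothetical parameter that would have to be measured.

```c
/* Hypothetical per-operation costs in arbitrary units (not measured). */
struct fp_costs {
    double save;     /* cost of saving the FP state */
    double restore;  /* cost of restoring the FP state */
    double trap;     /* overhead of taking the DNA trap */
};

/* Eager switching: save and restore on every context switch. */
static double eager_cost(const struct fp_costs *k, unsigned long switches)
{
    return switches * (k->save + k->restore);
}

/* Lazy switching: pay trap + save + restore only when a reload is needed. */
static double lazy_cost(const struct fp_costs *k, unsigned long switches,
                        double p_reload)
{
    return switches * p_reload * (k->trap + k->save + k->restore);
}

/* Value of p_reload above which lazy switching costs more than eager. */
static double break_even(const struct fp_costs *k)
{
    return (k->save + k->restore) / (k->trap + k->save + k->restore);
}
```

With p_reload near 0 (a single FP thread) the lazy cost vanishes; with
p_reload near 1 (two FP-intensive threads alternating) lazy switching
costs the eager amount plus one trap per switch.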

>> BTW, David and I recently found a bug in the context switching in the
>> fxsr case, at least on Athlon-XPs and AMD64s.
>
> I gather this is not noticeable unless the application is doing its
> own FPU save/restore.  Is there a solution or work-around?

It's most noticeable for debugging, and if you worry about leaking
thread context.  Fortunately, the last-instruction pointers won't
have real user data in them unless the application encodes it there
intentionally.  I can't see any efficient solution or workaround.
The kernel should do a full save/restore for processes being debugged.
For applications, the bug seems to be larger.  Even if they know about
the AMD behaviour and do a full save/restore because they need it, it
won't work because the kernel doesn't preserve the state across
context switches.  Applications like vmware might care more than most.

I forgot to mention that we couldn't find anything in the Intel manuals
about this behaviour, so it might be completely AMD-specific.  Also,
the instruction pointers are fundamentally broken for 64-bit CPUs,
since although they are 64 bits, they have the segment selector encoded
in their top 32 bits, so they are not really different from the 32:32
selector:pointer format for the non-fxsr case.  Their format is specified
by SSE2 so 64-bit extensions would have to be elsewhere, but amd64 doesn't
seem to extend them.
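
As a sketch of the format described above, the 64-bit instruction pointer
field can be split like this.  It assumes the field is read as one
little-endian 64-bit value, with a 32-bit offset in the low half and a
16-bit selector in the low bits of the top half; the remaining 16 bits
are treated as reserved.  struct fpu_ip and decode_fpu_ip() are
hypothetical names.

```c
#include <stdint.h>

/* Hypothetical decoded view of the fxsr-format FPU instruction pointer. */
struct fpu_ip {
    uint32_t offset;    /* low 32 bits: offset of the last FP instruction */
    uint16_t selector;  /* bits 32..47: code segment selector */
};

/* Split the 64-bit field into its offset and selector parts. */
static struct fpu_ip decode_fpu_ip(uint64_t field)
{
    struct fpu_ip ip;

    ip.offset = (uint32_t)(field & 0xffffffffu);
    ip.selector = (uint16_t)((field >> 32) & 0xffffu);
    return ip;
}
```

Decoded this way the field carries no more than a 32-bit offset, which is
why it cannot hold a full 64-bit instruction address.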

Bruce


More information about the freebsd-hackers mailing list