kern/98460 : [kernel] [patch] fpu_clean_state() cannot be disabled for not AMD processors, those are not vulnerable to FreeBSD-SA-06:14.fpu

Tue Jun 6 19:26:44 PDT 2006

The following reply was made to PR kern/98460; it has been noted by GNATS.

From: Bruce Evans <bde at zeta.org.au>
To: Rostislav Krasny <rosti.bsd at gmail.com>
Cc: freebsd-gnats-submit at freebsd.org
Subject: Re: kern/98460 : [kernel] [patch] fpu_clean_state() cannot be disabled
 for not AMD processors, those are not vulnerable to FreeBSD-SA-06:14.fpu
Date: Wed, 7 Jun 2006 12:09:10 +1000 (EST)

 On Tue, 6 Jun 2006, Rostislav Krasny wrote:

 > On Mon, 5 Jun 2006 08:25:06 +1000 (EST)
 > Bruce Evans <bde at zeta.org.au> wrote:
 >
 >> On Sun, 4 Jun 2006, Rostislav Krasny wrote:
 >>
 >>> On Sun, 4 Jun 2006, Bruce Evans wrote:
 >>>> The configuration should be dynamic and automatic, so that it doesn't
 >>>> take changes to zillions of configuration files to implement and
 >>>> document an option that almost no one will know to set.  I think there
 >>>> is a simple feature test for the AMD misfeature.
 >>>
 >>> David Xu had proposed something like that. But from Colin Percival's
 >>> reply I understood that it is hard to be done effectively. See their
 >>> discussion by the first URL in this PR.
 >>
 >> I don't see how it can be hard.  Perhaps it is too CPU-dependent for
 >> tests based on cpuid to be easy or future-proof, but a runtime test
 >> in the probe would be easy.  Here is a userland version.  It gives the
 >> ...
 >
 > And then you want to call the fpu_clean_state() function conditionally,
 > like in following example?
 >
 > if (cpu_fxsr & CPU_FXSR_NEEDCLEAN)
 >        fpu_clean_state();

 Not quite like that.  In my version there is no function call -- the code
 is executed in the one place where it is needed, so there is no function
 call overhead or possible branch prediction overhead for the function call.
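
 As a rough illustration of that shape (not the actual patch), the sketch
 below tests a flag inline at the single restore site instead of calling
 a separate function.  The names fxsr_needs_clean, probe_set_needs_clean()
 and restore_fpu_state() are hypothetical, and the FPU instructions are
 replaced by counters so that the sketch runs in userland.

```c
#include <stdbool.h>

/* Hypothetical flag, set once by a runtime probe (not a real kernel symbol). */
bool fxsr_needs_clean;

/* Userland stand-ins so the sketch runs without touching the FPU. */
unsigned int clean_count;
unsigned int restore_count;

/* Hypothetical hook: record what the runtime probe found. */
void
probe_set_needs_clean(bool leaks)
{
	fxsr_needs_clean = leaks;
}

/* The single place where saved FPU state is reloaded. */
void
restore_fpu_state(void)
{
	if (fxsr_needs_clean) {
		/* Kernel code would run the cleaning sequence here, inline. */
		clean_count++;
	}
	/* Kernel code would execute fxrstor here. */
	restore_count++;
}
```

 On a CPU where the probe reports no leak, the only cost left at the
 restore site is the single test of the flag.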

 > But this looks same to what David Xu had proposed. Read what Colin
 > Percival had replied about that proposition:
 >
 > http://lists.freebsd.org/pipermail/freebsd-current/2006-May/062683.html

 >> The problem with doing something like this is that the branch will
 >> almost never be in the processor's branch prediction tables, so you
 >> will get a branch mis-prediction on the unaffected processors --
 >> which is likely to be more expensive than simply running the state
 >> cleaning code.

 It can't possibly be _more_ expensive, since the state-cleaning code
 has 2 or 3 branches in it instead of only 1.  It has 1 or 2 branches
 for the function call and return.  Whether function calls and returns
 use normal branch prediction is machine-dependent.  Whatever they use,
 it takes some CPU resources.  The state-cleaning code has a branch in
 it.  This branch is slightly harder to predict than a cpu_fxsr one.
 My second version of a fix avoided this branch by doing the fnclex()
 unconditionally (the first version did the load unconditionally and
 panicked in corner cases).  The code with the branch runs much faster
 than an unconditional fnclex() in a simple benchmark with the code in
 a loop, but I wonder if it is still faster after branch misprediction.
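
 To make the two variants concrete, here is a small model of what each
 does to the x87 status word.  It is only a sketch: the status word is a
 plain integer, the function names are made up for illustration, and it
 says nothing about timing.  The one thing it shows is that both variants
 leave the ES bit (0x80) clear, which is what the following load needs.

```c
#include <stdint.h>

#define X87_ES		0x0080u	/* error summary bit in the status word */
#define X87_FNCLEX_MASK	0x80ffu	/* bits fnclex clears: exception flags, SF, ES, B */

/* Model of fnclex acting on a status word value (no real FPU access). */
uint16_t
model_fnclex(uint16_t status)
{
	return (uint16_t)(status & ~X87_FNCLEX_MASK);
}

/* Variant with the branch: clear only when ES is set. */
uint16_t
clean_conditional(uint16_t status)
{
	if (status & X87_ES)
		status = model_fnclex(status);
	return status;
}

/* Variant without the branch: always clear. */
uint16_t
clean_unconditional(uint16_t status)
{
	return model_fnclex(status);
}
```

 The variants differ only when ES is clear but other exception flags are
 set: the conditional one leaves those flags alone and the unconditional
 one clears them.  Neither case matters for avoiding a fault in the load.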

 > Eliminating the fpu_clean_state() by "options CPU_FXSAVE_NO_LEAK" could
 > be used as a custom optimization. No one is obliged to use it, as well
 > as many other CPU_* optimization options.

 There are too many options and not enough automatic tuning.  This
 particular optimization is especially not worth doing, since it is
 in the 10-100 cycle range (similar to what could be gained from avoiding
 a single branch misprediction or cache miss), but I care about it since
 it is to compensate for a pessimization.

 Bruce