kernel usage of fxsave/fxrstor

Thu May 20 17:41:25 UTC 2010

I'm wondering why we equate cpu_fxsr and hw_instruction_sse in our
kernel.  Several families of Intel and AMD processors have
fxsave/fxrstor but not sse, and various documents from both companies
suggest that fxsave/fxrstor is faster than fsave/fnsave/frstor, even
when only saving the fpu/mmx state, and ought to be used for context
switches and for calls to and returns from interrupt and exception
handlers.  See, e.g., Sections 8.1.11, 10.5, and 11.6.5 of the
Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 1:

http://www.intel.com/Assets/PDF/manual/253665.pdf

As far as I can tell from a cursory check, Linux distinguishes
cpu_has_fxsr from cpu_has_xmm/xmm2, and uses fxsave/fxrstor on all
processors that have the feature, regardless of whether they have
sse.  Shouldn't we do the same?  Was this overlooked in the initial
sse commits?  Or are the Intel assertions that the newer instructions
are faster incorrect?  Or were the extra handling needed for the
different semantics of the newer instructions, and/or concerns over
FreeBSD-SA-06:14.fpu.asc/CVE-2006-1056, responsible for their
suppression on pre-sse processors, even though safe methods of using
them were suggested:

http://security.freebsd.org/advisories/FreeBSD-SA-06:14-amd.txt ?

(Note that I'm not asking about setting the CR4.OSFXSR bit when sse
isn't needed or present, just using the newer fxsave/fxrstor when they
are present.)
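
To make the distinction concrete, here is a rough sketch of what I
have in mind.  It assumes the feature word is EDX from CPUID leaf 1,
where bit 24 reports FXSR and bit 25 reports SSE.  All of the names
below (fpu_policy, choose_fpu_policy, and so on) are made up for
illustration and are not identifiers from our tree or from Linux.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX feature bits. */
#define CPUID_EDX_FXSR (UINT32_C(1) << 24)  /* fxsave/fxrstor present */
#define CPUID_EDX_SSE  (UINT32_C(1) << 25)  /* sse present */

/* Hypothetical names, for illustration only. */
enum fpu_save_method { FPU_SAVE_FNSAVE, FPU_SAVE_FXSAVE };

struct fpu_policy {
    enum fpu_save_method method; /* instruction pair used to save/restore */
    int set_osfxsr;              /* nonzero if CR4.OSFXSR should be set */
};

/*
 * Choose the save method from the FXSR bit alone, and tie CR4.OSFXSR
 * to SSE being present as well, instead of treating the two features
 * as one.
 */
static struct fpu_policy choose_fpu_policy(uint32_t edx)
{
    struct fpu_policy p;

    p.method = (edx & CPUID_EDX_FXSR) ? FPU_SAVE_FXSAVE : FPU_SAVE_FNSAVE;
    p.set_osfxsr = (edx & CPUID_EDX_FXSR) && (edx & CPUID_EDX_SSE);
    return p;
}
```

With only the FXSR bit set, this picks fxsave/fxrstor and leaves
CR4.OSFXSR alone, which is the case I'm asking about.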

Regards,
                 b.