8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?

Markus Gebert markus.gebert at hostpoint.ch
Mon Jul 12 13:57:32 UTC 2010

On 12.07.2010, at 14:51, John Baldwin wrote:

>> Well, the situation has changed. Machine died over the weekend running our 
>> test load with above kernel configuration. It seems that not having ehci in 
>> the kernel at boot just makes the MCE much more unlikely to occur, but it 
>> occurs. With ehci, I can panic the machine within a minute, without ehci it 
>> seems to take at least hours. Still, I don't get why not having the ehci 
>> driver in the kernel should have any effect, especially because nothing is 
>> attached to it.
> Ok, so maybe the SMI# interrupts do play a role somehow, at least as far as 
> altering the timing.

Hm, if I've understood your other email correctly, disabling USB legacy support should get rid of SMIs just as well as loading the ehci driver does. What I tested was a kernel with ehci (panic within a minute) versus a kernel without ehci (panic within hours), in both cases with USB legacy support disabled in the BIOS. So, again, if I understand this correctly, the "SMI rate" should have been the same in both cases: USB legacy support was turned off entirely, so loading or not loading ehci should not affect it. If that is the case, why would the timing differ between these two test cases?

Since SMM is outside the OS's control, I guess there's no good way to track SMIs?
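
One indirect approach, sketched here only as an illustration and not something tested on the X4100M2, is to read a clock back-to-back in a tight loop and look for unusually long gaps between consecutive reads, since time spent in SMM is invisible to the OS but still shows up as missing time. The function names and the threshold are hypothetical. Run from userland, the gaps can also come from scheduler preemption and ordinary interrupts, so the count is at best an upper bound on SMI activity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock and return it in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Fill samples[] with back-to-back clock reads. */
static void sample_clock(uint64_t *samples, size_t n)
{
    for (size_t i = 0; i < n; i++)
        samples[i] = now_ns();
}

/*
 * Count the gaps between consecutive samples that exceed threshold_ns
 * and store the largest gap seen in *max_gap_ns.  A gap only says that
 * the loop was not running; it does not prove that an SMI caused it.
 */
static size_t count_gaps(const uint64_t *samples, size_t n,
                         uint64_t threshold_ns, uint64_t *max_gap_ns)
{
    size_t hits = 0;
    uint64_t max_gap = 0;

    assert(samples != NULL && max_gap_ns != NULL);
    for (size_t i = 1; i < n; i++) {
        uint64_t gap = samples[i] - samples[i - 1];

        if (gap > max_gap)
            max_gap = gap;
        if (gap > threshold_ns)
            hits++;
    }
    *max_gap_ns = max_gap;
    return hits;
}
```

Calling sample_clock() on a large buffer and then count_gaps() with a threshold well above the normal cost of one clock read would give a rough count of stalls per run. Comparing that count between the ehci and non-ehci kernels would at least show whether the two cases differ in how often the CPU goes missing.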


