8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?

John Baldwin jhb at freebsd.org
Mon Jul 12 15:08:46 UTC 2010


On Monday, July 12, 2010 9:57:29 am Markus Gebert wrote:
> 
> On 12.07.2010, at 14:51, John Baldwin wrote:
> 
> >> Well, the situation has changed. Machine died over the weekend running our 
> >> test load with above kernel configuration. It seems that not having ehci in 
> >> the kernel at boot just makes the MCE much more unlikely to occur, but it 
> >> occurs. With ehci, I can panic the machine within a minute, without ehci it 
> >> seems to take at least hours. Still, I don't get why not having the ehci 
> >> driver in the kernel should have any effect, especially because nothing is 
> >> attached to it.
> > 
> > Ok, so maybe the SMI# interrupts do play a role somehow, at least as far as 
> > altering the timing.
> 
> Hm, if I've understood your other email correctly, disabling usb legacy
> support should get rid of SMIs just as well as loading the ehci driver.
> What I tested was kernel with ehci (panic within a minute) versus kernel
> without ehci (panic within hours), but both cases with usb legacy support
> disabled in BIOS. So, again, if I understand this correctly, the "SMI
> rate" should have been the same in both cases, because usb legacy support
> was turned off entirely, and therefore loading or not loading ehci should
> not impact the SMI rate. If this should be the case, why would there be an
> altering of timings between these two test cases?

Oh, I didn't know that USB legacy support was disabled in both cases.  That
should disable all the SMIs in both cases as you say.

Are you using Cx states other than C1 for the CPUs at all?

-- 
John Baldwin


More information about the freebsd-stable mailing list