kern/59719 Re: 4.9 Stable Crashes on SuperMicro with SMP

Don Bowman don at sandvine.com
Sat Nov 29 08:34:07 PST 2003


From: Uwe Doering [mailto:gemini at geminix.org]
> Jonathan Gilpin wrote:
> > I've run memtest (memtest86.com), kindly provided by Don, and it passed
> > all the tests. I've installed a kernel module to test for memory errors
> > and found that again no memory errors are found... So this means it's
> > either a problem with the CPUs or a genuine bug in the kernel. (bugger!)
> 
> No, that's unfortunately not what it means.  If a memory test fails you
> can draw the conclusion that you have bad memory, but this doesn't work
> the other way round.  If a memory test passes there is still a
> possibility that a memory chip is the culprit since memory test software
> cannot find all errors.
> 
> Also, there is the chip set on the mainboard that coordinates bus access
> etc. for the two CPUs.  Mainboard and chip set developers are known to
> make errors, too.  In this case you would have to swap the entire
> mainboard, possibly with one from a different manufacturer.  I can tell
> you from my own experience that it is really hard to find reliable PC
> hardware these days, in light of ever shorter and faster product release
> cycles.

I have several hundred of the motherboards the poster is using,
and they work reliably in MP operation with 4.X.
The memtest86 that I sent him understands the ECC registers
on the e7501 MCH, so it should find all correctable and
uncorrectable errors.

--don


More information about the freebsd-stable mailing list