kern/59719 Re: 4.9 Stable Crashes on SuperMicro with SMP

Don Bowman don at sandvine.com
Sat Nov 29 08:40:19 PST 2003


The following reply was made to PR kern/59719; it has been noted by GNATS.

From: Don Bowman <don at sandvine.com>
To: 'Uwe Doering' <gemini at geminix.org>,
	freebsd-gnats-submit at FreeBSD.org
Cc: freebsd-bugs at freebsd.org, freebsd-stable at freebsd.org
Subject: RE: kern/59719 Re: 4.9 Stable Crashes on SuperMicro with SMP
Date: Sat, 29 Nov 2003 11:33:58 -0500

 From: Uwe Doering [mailto:gemini at geminix.org]
 > Jonathan Gilpin wrote:
 > > I've run memtest (memtest86.com) kindly provided by Don and it passed all
 > > the tests. I've installed a kernel module to test for memory
 > > errors and found that again no memory errors are found... So this means it's
 > > either a problem with the CPUs or a genuine bug in the kernel. (bugger!)
 > 
 > No, that's unfortunately not what it means.  If a memory test fails you
 > can draw the conclusion that you have bad memory, but this doesn't work
 > the other way round.  If a memory test passes there is still a
 > possibility that a memory chip is the culprit since memory test software
 > cannot find all errors.
 > 
 > Also, there is the chip set on the mainboard that coordinates bus access
 > etc. for the two CPUs.  Mainboard and chip set developers are known to
 > make errors, too.  In this case you would have to swap the entire
 > mainboard, possibly with one from a different manufacturer.  I can tell
 > you from my own experience that it is really hard to find reliable PC
 > hardware these days, in light of ever shorter and faster product release
 > cycles.
 
 I have several hundred of the motherboards the poster is using,
 and they work reliably in MP operation with 4.X.
 The memtest86 that I sent him understands the ECC registers
 on the E7501 MCH, so it should find all correctable and uncorrectable
 errors.
 
 --don

