FreeBSD Machines dying, we've tried so much....

Chad Leigh -- Shire.Net LLC chad at
Wed Jun 22 16:12:09 GMT 2005

On Jun 22, 2005, at 9:59 AM, Matt Juszczak wrote:

>> The vast majority of panics are hardware-related.  It is rare nowadays
>> for a usermode program to make the system panic.  In particular you said
>> the problem happens more under load.  That really points even more to a
>> hardware problem - bad CPU cache RAM, bad RAM, SCSI termination, that
>> sort of thing.
>> Ted
> This is kind of going to be a blanket post to all the recent  
> suggestions to me.  I appreciate suggestions :)   Ted, sorry, my  
> other posts had dmesg and hardware specs, etc. I just couldn't  
> remember the subject line of that thread. I'll be more descriptive  
> here.
> We have two different servers crashing.  Both are SMP, but on  
> different hardware.  We have five FreeBSD servers in total, and
> only two are affected.  That is why I do not believe this is a  
> hardware problem.
> In any case, the machines are in a cold room where the temperature  
> is constantly maintained.  20 other servers in there are perfectly  
> stable, with no probs.
> This particular machine that crashed last night while running  
> portsdb -uU is a Super Micro machine, with hyperthreading disabled
> in the BIOS, dual 3.06 GHz CPUs, with 4 GB of memory.  We ran a memory
> test on orion (the machine that crashed last night) a week or so
> ago, and it found 70,000 ECC errors.  Those were fixed and that  
> machine has been stable until last night.  I've now disabled SMP  
> support, we'll see if that keeps it stable or not. Portsdb -uU ran  
> without problems after I disabled SMP.
> As far as uranus, the other box (we keep a planet scheme for a  
> certain set of servers), we ran memtest86 and found no errors at  
> all.  That box crashed about two days ago but has been stable  
> since.  It has not lasted more than a week without doing a kernel  
> trap and freezing.
> It seems that both these servers have this problem.  Out of the  
> five FreeBSD servers we have, these two are the ones with the  
> highest load.  Maybe a higher load on the other three servers would  
> cause the same problem.  I would agree with you that this is a
> hardware problem, but seeing it on more than one server, with two
> different architectures and our highest load, makes me reconsider.
> If this is truly a bug in FreeBSD 5.4-RELEASE, maybe this is  
> something that has been fixed in -stable?  I will compile a debug  
> kernel today and try to provide a trace of the problem.  I'll do it
> on whichever server crashes next.

What do they have in common?  Disk controller?  Network controller?


Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at
