FreeBSD Machines dying, we've tried so much....
fenix at ramb.com.ua
Wed Jun 22 16:11:06 GMT 2005
Hello, Matt.
>>The vast majority of panics are hardware-related. It is rare nowadays
>>for a usermode program to make the system panic. In particular you said
>>the problem happens more under load. That really points even more to a
>>hardware problem: bad CPU cache RAM, bad RAM, SCSI termination, that
>>sort of thing.
>>
>>Ted
>>
>>
> This is kind of going to be a blanket post to all the recent suggestions
> to me. I appreciate suggestions :) Ted, sorry, my other posts had
> dmesg and hardware specs, etc. I just couldn't remember the subject line
> of that thread. I'll be more descriptive here.
> We have two different servers crashing. Both are SMP, but on different
> hardware. We have five FreeBSD servers in total, and only two are
> affected. That is why I do not believe this is a hardware problem.
> In any case, the machines are in a cold room where the temperature is
> constantly maintained. Twenty other servers in there are perfectly
> stable, with no problems.
> The machine that crashed last night while running portsdb -uU is orion,
> a Super Micro box with hyperthreading disabled in the BIOS, dual
> 3.06 GHz CPUs, and 4 GB of memory. We ran a memory test on orion a
> week or so ago, and it found 70,000 ECC errors. Those were fixed, and
> the machine had been stable until last night. I've now disabled SMP
> support; we'll see whether that keeps it stable. portsdb -uU ran
> without problems after I disabled SMP.
> As for uranus, the other box (we keep a planet naming scheme for a
> certain set of servers), we ran memtest86 and found no errors at all.
> That box crashed about two days ago but has been stable since. It has
> never lasted more than a week without hitting a kernel trap and freezing.
> It seems that both these servers have this problem. Out of the five
> FreeBSD servers we have, these two are the ones with the highest load.
> Maybe a higher load on the other three servers would cause the same
> problem. I agree with you that this looks like a hardware problem, but
> seeing it on more than one server, with two different architectures and
> our highest load, makes me reconsider.
> If this is truly a bug in FreeBSD 5.4-RELEASE, maybe this is something
> that has been fixed in -stable? I will compile a debug kernel today and
> try to provide a trace of the problem. I'll do it on whichever server
> crashes next.
I had the same situation with two different heavily loaded servers (both
SMP, with 8 GB of RAM and HT enabled) running 5.4-RELEASE. After disabling
HT and updating the OS to 5.4-STABLE via cvsup, both have been working fine
without any problems; the last reboot was 28 days ago.
--
Best regards,
Sergey S. Ropchan