memory errors
brad miele
bmiele at ipnstock.com
Fri Apr 28 20:02:36 UTC 2006
Hi,
We have been dealing with phantom reboots on one of our DL380s for a few
months. After finally getting hpasmd running, I started seeing the
following in /var/log/messages just prior to the reboots:
Apr 26 11:38:59 bwayipn02 hpasmd[669]: WARNING: hpasmd: Corrected Memory Error threshold exceeded (System Memory, Memory Module 2)
Apr 26 11:39:00 bwayipn02 kernel: pid 669 (hpasmd), uid 0: exited on signal 11 (core dumped)
In reviewing the management log in iLO, the errors did correspond with the
phantom reboots. HP sent new RAM and we replaced it. The server was up
under minimal load for two days, and then today the same thing happened,
with the same error.
HP is sending a new board and more new RAM, and we are going to try that.
My question is: is there anything at the OS or software level that could
cause this behavior, or is it most likely bad hardware? I am concerned
because I noted an instance of the same error in the iLO logs of our other
DL380, although on module 1, and I thought that the odds of both having
bad RAM/boards might be slim.
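For comparing the two machines, here is a minimal sketch (Python) of
tallying these warnings per host and memory module from syslog lines. The
regular expression is an assumption based only on the single hpasmd
warning quoted above, and the function name count_memory_warnings is made
up for illustration; other hpasmd versions may word the message
differently.

```python
import re
from collections import Counter

# Assumed message format, derived only from the one warning quoted above.
# Each syslog entry is expected on a single (unwrapped) line.
WARNING_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+hpasmd\[\d+\]:\s+WARNING:.*?"
    r"Corrected Memory\s+Error threshold exceeded\s+"
    r"\(System Memory, Memory Module (?P<module>\d+)\)"
)


def count_memory_warnings(lines):
    """Count corrected-memory-error warnings, keyed by (host, module)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            key = (match.group("host"), int(match.group("module")))
            counts[key] += 1
    return counts


# The two entries from the log excerpt above, each joined onto one line.
SAMPLE_LINES = [
    "Apr 26 11:38:59 bwayipn02 hpasmd[669]: WARNING: hpasmd: Corrected Memory "
    "Error threshold exceeded (System Memory, Memory Module 2)",
    "Apr 26 11:39:00 bwayipn02 kernel: pid 669 (hpasmd), uid 0: exited on "
    "signal 11 (core dumped)",
]

for (host, module), n in sorted(count_memory_warnings(SAMPLE_LINES).items()):
    print(f"{host} module {module}: {n} warning(s)")
```

To run it against a real log, pass an open file object instead of
SAMPLE_LINES, for example open("/var/log/messages"). A count that stays on
one module after a RAM swap would point more toward the slot or board than
toward the DIMM itself, though that is only a rule of thumb.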
Thanks,
Brad
---------------------
Brad Miele
bmiele at ipnstock.com