memory errors

brad miele bmiele at ipnstock.com
Fri Apr 28 20:02:36 UTC 2006


Hi,

We have been dealing with phantom reboots on one of our DL380s for a few 
months. After finally getting hpasmd running, I started seeing the 
following in /var/log/messages just prior to the reboots:

Apr 26 11:38:59 bwayipn02 hpasmd[669]: WARNING: hpasmd: Corrected Memory Error threshold exceeded (System Memory, Memory Module 2)
Apr 26 11:39:00 bwayipn02 kernel: pid 669 (hpasmd), uid 0: exited on signal 11 (core dumped)
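
To see how often this warning shows up and on which module, something like 
the following could tally it. This is only a rough sketch: the regular 
expression is assumed from the hpasmd message quoted above (as one unwrapped 
line per entry), and count_threshold_warnings is a hypothetical helper name, 
not anything shipped with hpasmd.

```python
import re
from collections import Counter

# Assumed message format, based only on the hpasmd warning quoted above.
# \s+ tolerates variations in spacing between the words.
PATTERN = re.compile(
    r"hpasmd\[\d+\]:\s+WARNING:\s+hpasmd:\s+Corrected\s+Memory\s+Error\s+"
    r"threshold\s+exceeded\s+\(System\s+Memory,\s+Memory\s+Module\s+(\d+)\)"
)


def count_threshold_warnings(lines):
    """Return a Counter mapping memory module number to warning count."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

It could be fed an open log file, for example 
count_threshold_warnings(open("/var/log/messages", errors="replace")), and 
the per-module counts compared against the reboot times.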

In reviewing the management log in iLO, the errors did correspond with the 
phantom reboots. HP sent new RAM and we replaced it. The server was up 
under minimal load for two days, and then today the same thing happened, 
with the same error.

HP is sending a new board and more new RAM, and we are going to try that.

My question is: is there anything at the OS or software level that could 
cause this behavior, or is it most likely bad hardware? I am concerned 
because I noted an instance of the same error in the iLO logs of our other 
DL380, although on module 1, and I thought that the odds of both having 
bad RAM/boards might be slim.

Thanks,

Brad
---------------------
Brad Miele
bmiele at ipnstock.com

