Memory error logged in /var/log/messages

Patrick M. Hausen hausen at punkt.de
Mon Nov 19 13:10:05 UTC 2018


Hi all,

one of our production servers, running 11.2p3, is logging this every couple of minutes:

Nov 19 11:48:06 ph002 kernel: MCA: CPU 0 COR (5) OVER MS channel 3 memory error
Nov 19 11:48:06 ph002 kernel: MCA: Address 0x1f709a48c0
Nov 19 11:48:06 ph002 kernel: MCA: Misc 0x90010000040188c
Nov 19 11:48:06 ph002 kernel: MCA: Bank 12, Status 0xcc00010c000800c3
Nov 19 11:48:06 ph002 kernel: MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
Nov 19 11:48:06 ph002 kernel: MCA: Vendor "GenuineIntel", ID 0x406f1, APIC ID 0

The address and core vary, but it is always bank 12.
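
To back up the "always bank 12" observation with numbers, the MCA lines can be
tallied per CPU, bank and address. The following is only a minimal sketch: it
assumes the line layout shown in the excerpt above, and the function name
tally_mca and the regular expressions are made up for illustration, not taken
from any existing tool.

```python
import re
from collections import Counter

# Patterns for three of the MCA line shapes in the excerpt above.
# Assumption: every record in the log uses exactly this layout.
CPU_RE = re.compile(r"MCA: CPU (\d+) (?:COR|UNCOR)")
BANK_RE = re.compile(r"MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")
ADDR_RE = re.compile(r"MCA: Address (0x[0-9a-fA-F]+)")


def tally_mca(lines):
    """Count MCA records per CPU, per bank and per physical address."""
    cpus, banks, addrs = Counter(), Counter(), Counter()
    for line in lines:
        m = CPU_RE.search(line)
        if m:
            cpus[int(m.group(1))] += 1
            continue
        m = BANK_RE.search(line)
        if m:
            banks[int(m.group(1))] += 1
            continue
        m = ADDR_RE.search(line)
        if m:
            addrs[int(m.group(1), 16)] += 1
    return cpus, banks, addrs


# The record quoted above, used as sample input.
SAMPLE = """\
Nov 19 11:48:06 ph002 kernel: MCA: CPU 0 COR (5) OVER MS channel 3 memory error
Nov 19 11:48:06 ph002 kernel: MCA: Address 0x1f709a48c0
Nov 19 11:48:06 ph002 kernel: MCA: Bank 12, Status 0xcc00010c000800c3
""".splitlines()

cpus, banks, addrs = tally_mca(SAMPLE)
print("per CPU: ", dict(cpus))
print("per bank:", dict(banks))
print("per addr:", {hex(a): n for a, n in addrs.items()})
```

On the real system one would pass the lines of /var/log/messages instead of
SAMPLE, for example tally_mca(open("/var/log/messages")).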

Applications seem to be unaffected; we use ECC memory, of course.

Is the OS able to work around these errors and is it just notifying us, or is
in-memory data already getting corrupted?
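
For reference, the Status value from the log can be split into its flag bits.
The sketch below follows the IA32_MCi_STATUS layout as documented in the Intel
SDM (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 to 57, MCA error code in
bits 15 to 0). It is not the kernel's own decoder, and decode_status is a
hypothetical helper name. As far as the SDM describes it, UC clear means the
hardware reported the error as corrected, and PCC clear means the processor
context was not flagged as corrupted.

```python
# Flag bits of IA32_MCi_STATUS, per the Intel SDM machine-check chapter.
FLAGS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
         "MISCV": 59, "ADDRV": 58, "PCC": 57}

# MMM field of the compound memory controller error code
# 0000 0000 1MMM CCCC (generic, read, write, address/command, scrubbing).
MMM = {0: "GEN", 1: "RD", 2: "WR", 3: "AC", 4: "MS"}


def decode_status(status):
    """Return the flag bits and, if present, the memory error fields."""
    out = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    code = status & 0xFFFF
    out["mca_error_code"] = code
    # Bit 12 is ignored here when matching the compound code.
    if (code & 0xEF80) == 0x0080:
        out["memory_op"] = MMM.get((code >> 4) & 0x7, "reserved")
        chan = code & 0xF
        # 0xF means "channel not specified".
        out["channel"] = None if chan == 0xF else chan
    return out


# Status value taken from the log excerpt in this mail.
for key, value in decode_status(0xCC00010C000800C3).items():
    print(f"{key:15} {value}")
```

For this Status value the sketch reports UC and PCC as clear, OVER as set, and
the error code as a memory scrubbing ("MS") error on channel 3, which matches
the "COR ... OVER MS channel 3" text the kernel printed.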

I’m at a bit of a loss identifying which DIMM might be the cause, so I contacted Supermicro
support. They answered:

> We can't really answer this, we do not know how various OS's map the memory slots.
> Our advice is always to look at IPMI, but if that doesn't log any issues then we're not sure you're looking at a hardware issue.
> 
> But assuming the OS looks at the ranks of a module as a bank and you use dual rank memory then it should logically point at DIMMC2.

They are right about the IPMI (I told them so when opening the case): there’s nothing
at all in the event log.

Can they be correct that it might not even be a hardware issue?
If not, how can I be sure which DIMM is to blame? Spare parts are ready, but I’d like to
keep the maintenance break short and outside regular business hours.

I’ll attach a dmesg.boot. The hardware is an X10DRW-NT mainboard in a SYS-1028R-WTNRT server platform.

Thanks for any hints,
Patrick
-- 
punkt.de GmbH			Internet - Dienstleistungen - Beratung
Kaiserallee 13a			Tel.: 0721 9109-0 Fax: -100
76133 Karlsruhe			info at punkt.de	http://punkt.de
AG Mannheim 108285		Gf: Juergen Egeling
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dmesg-boot.txt
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20181119/cc204869/attachment.txt>

