kernel MCA messages

John Baldwin jhb at freebsd.org
Mon Aug 23 12:40:32 UTC 2010


On Monday, August 23, 2010 2:44:38 am Andriy Gapon wrote:
> on 23/08/2010 05:05 Dan Langille said the following:
> > On 8/22/2010 9:18 PM, Dan Langille wrote:
> >> What does this mean?
> >>
> >> kernel: MCA: Bank 4, Status 0x940c4001fe080813
> >> kernel: MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
> >> kernel: MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
> >> kernel: MCA: CPU 0 COR BUSLG Source RD Memory
> >> kernel: MCA: Address 0x7ff6b0
> >>
> >> FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43
> > 
> > And another one:
> > 
> > kernel: MCA: Bank 4, Status 0x9459c0014a080813
> > kernel: MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
> > kernel: MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
> > kernel: MCA: CPU 0 COR BUSLG Source RD Memory
> > kernel: MCA: Address 0x7ff670
> 
> I believe that you are getting correctable RAM ECC errors, but I am not entirely sure.
> There is an mcelog utility that decodes such messages into human-friendly descriptions.
> The utility is available on Linux-based systems.
> John Baldwin has a port of it to FreeBSD, but it seems to be WIP and is private
> so far.  Wait and watch John posting decoded text in this thread :-)

It is not private; it is in //depot/projects/mcelog/... in p4.  It is not a
complete port yet, though (it doesn't support the daemon and client modes, for
example).

Details for these errors:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge 
ADDR 7ff6b0 
  Northbridge RAM Chipkill ECC error
  Chipkill ECC syndrome = fe18
       bit32 = err cpu0
       bit46 = corrected ecc error
  bus error 'local node origin, request didn't time out
             generic read mem transaction
             memory access, level generic'
STATUS 940c4001fe080813 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 5
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge 
ADDR 7ff670 
  Northbridge RAM Chipkill ECC error
  Chipkill ECC syndrome = 4ab3
       bit32 = err cpu0
       bit46 = corrected ecc error
  bus error 'local node origin, request didn't time out
             generic read mem transaction
             memory access, level generic'
STATUS 9459c0014a080813 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 5
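
For anyone who wants to check the "bus error" part by hand, here is a minimal
sketch of how the low 16 bits of the status word (0x0813 in both reports) break
down.  It assumes the standard MCA compound error code layout for bus and
interconnect errors, 0000 1PPT RRRR IILL; the function name and the label
strings are made up for illustration and are not taken from mcelog or the
kernel.

```python
# Sketch only: decode the bus/interconnect compound error code found in
# bits 15:0 of an MCi_STATUS value. Layout assumed: 0000 1PPT RRRR IILL.
# All names and label strings here are illustrative, not from mcelog.

PARTICIPATION = {0: "source", 1: "responder", 2: "observer", 3: "generic"}
REQUEST = {
    0: "generic error", 1: "generic read", 2: "generic write",
    3: "data read", 4: "data write", 5: "instruction fetch",
    6: "prefetch", 7: "evict", 8: "snoop",
}
TRANSACTION = {0: "memory", 1: "reserved", 2: "I/O", 3: "other"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_bus_error(status):
    """Return the bus error fields, or None if this is not a bus error."""
    code = status & 0xFFFF
    # The top five bits of the code must be 0000 1 for a bus error.
    if code & 0xF800 != 0x0800:
        return None
    return {
        "participation": PARTICIPATION[(code >> 9) & 0x3],   # PP
        "timed_out": bool(code & 0x100),                      # T
        "request": REQUEST.get((code >> 4) & 0xF, "unknown"), # RRRR
        "transaction": TRANSACTION[(code >> 2) & 0x3],        # II
        "level": LEVEL[code & 0x3],                           # LL
    }


if __name__ == "__main__":
    for status in (0x940C4001FE080813, 0x9459C0014A080813):
        print(hex(status), decode_bus_error(status))
```

Both status words decode to source participation, no timeout, generic read,
memory transaction, generic level, which lines up with the kernel's
"BUSLG Source RD Memory" and with the bus error text above.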

As Andriy guessed, I believe both of these are corrected ECC errors.  You
can likely ignore them as a low rate of corrected ECC errors is not
unexpected.
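
A rough sketch of how the "corrected" conclusion can be read straight from the
raw status words, for anyone without mcelog at hand.  The architectural bits
(63 valid, 62 overflow, 61 uncorrected, 60 enabled, 58 address valid, 57
processor context corrupt) are common to x86 MCA.  The ECC bits (46 corrected,
45 uncorrected) and the syndrome split across bits 31:24 and 54:47 are an
assumption about the K8 northbridge bank layout; they do reproduce the
syndromes in the decoded output above.  The function and key names are
hypothetical.

```python
# Sketch only: pull the flags of interest out of a 64-bit MCi_STATUS value.
# Bits 46/45 and the syndrome layout are assumed to follow the AMD K8
# northbridge (bank 4) format; other CPUs and banks differ.

def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_status(status):
    # Chipkill syndrome: high byte from bits 31:24, low byte from bits 54:47.
    syndrome = (((status >> 24) & 0xFF) << 8) | ((status >> 47) & 0xFF)
    return {
        "valid": bit(status, 63),
        "overflow": bit(status, 62),
        "uncorrected": bit(status, 61),
        "enabled": bit(status, 60),
        "addr_valid": bit(status, 58),
        "context_corrupt": bit(status, 57),
        "corrected_ecc": bit(status, 46),
        "uncorrected_ecc": bit(status, 45),
        "syndrome": syndrome,
    }


if __name__ == "__main__":
    for status in (0x940C4001FE080813, 0x9459C0014A080813):
        d = decode_status(status)
        print(hex(status), "syndrome", format(d["syndrome"], "04x"),
              "corrected" if d["corrected_ecc"] and not d["uncorrected"]
              else "check further")
```

For the two status words in this thread it gives syndromes fe18 and 4ab3, with
the corrected-ECC bit set and the uncorrected bit clear in both.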

-- 
John Baldwin


More information about the freebsd-stable mailing list