8.1-STABLE amd64 machine check

John Baldwin jhb at FreeBSD.org
Wed Aug 11 17:38:47 UTC 2010


Dan Langille wrote:
> I am encountering a situation similar to one reported by Andrew Heybey 
> at http://docs.freebsd.org/cgi/mid.cgi?6E83197B-9DD5-4C7E-846D-AD176C25464D
> 
> This morning I found this in my /var/log/messages:
> 
> Aug 11 01:59:48 kraken kernel: MCA: Bank 4, Status 0x94614c62001c011b
> Aug 11 01:59:48 kraken kernel: MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
> Aug 11 01:59:48 kraken kernel: MCA: Vendor "AuthenticAMD", ID 0x100f42, APIC ID 0
> Aug 11 01:59:48 kraken kernel: MCA: CPU 0 COR GCACHE LG RD error
> Aug 11 01:59:48 kraken kernel: MCA: Address 0x5d0fe8c
> 
> 
> from /var/run/dmesg.boot
> 
> Copyright (c) 1992-2010 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.1-STABLE #0: Sun Jul 25 19:18:56 EDT 2010
>     dan at kraken.example.org:/usr/obj/usr/src/sys/KRAKEN amd64
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: AMD Phenom(tm) II X4 945 Processor (3010.17-MHz K8-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x100f42  Family = 10  Model = 4  Stepping = 2
>   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x802009<SSE3,MON,CX16,POPCNT>
>   AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
>   AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
>   TSC: P-state invariant
> real memory  = 4294967296 (4096 MB)
> avail memory = 4100710400 (3910 MB)
> ACPI APIC Table: <111909 APIC1708>
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> FreeBSD/SMP: 1 package(s) x 4 core(s)
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP): APIC ID:  1
>  cpu2 (AP): APIC ID:  2
>  cpu3 (AP): APIC ID:  3
> 
> 
> Andrew: You posted about this on July 14.  Anything new since then?
> 
> John: Is it time for me to get a new CPU?

Hmm, this is what mcelog says:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge
ADDR 5d0fe8c
   Northbridge NB Array Error
        bit33 = err cpu1
        bit42 = L3 subcache in error bit 0
        bit43 = L3 subcache in error bit 1
        bit46 = corrected ecc error
   memory/cache error 'generic read mem transaction, generic transaction, level generic'
STATUS 94614c62001c011b MCGSTATUS 0
MCGCAP 106 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 16 Model 4

It was a corrected ECC error.  If you get more than one, then perhaps the
CPU is busted; if you only get one, it was probably an isolated bit flip
and may not be worth worrying about.
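
For anyone who wants to check the decode by hand, here is a rough sketch of
how the status word breaks down. It assumes the architectural MCi_STATUS
layout (flag bits 63..57, MCA error code in bits 15..0) and the
memory-hierarchy error code form 0000 0001 RRRR TTLL. The function and table
names are made up for illustration, and the vendor-specific bits that mcelog
prints (33, 42, 43, 46) are deliberately left out.

```python
# Architectural flag bits at the top of an MCi_STATUS value.
FLAG_BITS = [
    (63, "VAL"), (62, "OVER"), (61, "UC"), (60, "EN"),
    (59, "MISCV"), (58, "ADDRV"), (57, "PCC"),
]

# Fields of a memory-hierarchy error code: 0000 0001 RRRR TTLL.
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TYPE = {0: "I", 1: "D", 2: "G"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_mci_status(status):
    """Decode the architectural parts of a 64-bit MCi_STATUS value."""
    flags = [name for bit, name in FLAG_BITS if (status >> bit) & 1]
    code = status & 0xFFFF  # MCA error code, bits 15..0
    summary = None
    if (code & 0xFF00) == 0x0100:  # memory-hierarchy error form
        rrrr = (code >> 4) & 0xF
        tt = (code >> 2) & 0x3
        ll = code & 0x3
        summary = "%sCACHE %s %s" % (
            TYPE.get(tt, "?"), LEVEL[ll], REQUEST.get(rrrr, "?"))
    return {
        "flags": flags,
        # Valid and not flagged uncorrected means the hardware fixed it.
        "corrected": "VAL" in flags and "UC" not in flags,
        "error_code": code,
        "summary": summary,
    }


if __name__ == "__main__":
    print(decode_mci_status(0x94614C62001C011B))
```

Run against the status from the log above, this reports VAL, EN and ADDRV
set with UC clear, and the low 16 bits (0x011b) decode to "GCACHE LG RD",
which lines up with the kernel's "COR GCACHE LG RD error" line.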

-- 
John Baldwin

