MCA messages after upgrade to 8.2-BETA1
John Baldwin
jhb at freebsd.org
Wed Dec 22 14:59:13 UTC 2010
On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
> Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e400000000833
> Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
> Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
> Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
> Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
You are getting corrected ECC errors in your RAM. You see them once an hour
because we poll the machine check registers once an hour. If this happens
constantly, you might have a DIMM that is dying.
% ~/mcelog --ascii < foo.txt
mcelog: Cannot open /dev/mem for DMI decoding: Permission denied
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 0 data cache
ADDR 236493c0
Data cache ECC error (syndrome 1c)
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
data read mem transaction
memory access, level generic'
STATUS d40e400000000833 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 1 instruction cache
ADDR 2a1c9440
Instruction cache ECC error
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
instruction fetch mem transaction
memory access, level generic'
STATUS d400400000000853 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 2 bus unit
L2 cache ECC error
Bus or cache array error
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
prefetch mem transaction
memory access, level generic'
STATUS d000400000000863 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge
MISC e00d0fff00000000 ADDR 2cac9678
Northbridge RAM ECC error
ECC syndrome = 1c
bit33 = err cpu1
bit46 = corrected ecc error
bit59 = misc error valid
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS dc0e400200000813 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 0 data cache
ADDR 23649640
Data cache ECC error (syndrome 1c)
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
data read mem transaction
memory access, level generic'
STATUS d40e400000000833 MCGSTATUS 0
MCGCAP 105 APICID 1 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 1 instruction cache
ADDR 2a1c9440
Instruction cache ECC error
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
instruction fetch mem transaction
memory access, level generic'
STATUS d400400000000853 MCGSTATUS 0
MCGCAP 105 APICID 1 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 2 bus unit
L2 cache ECC error
Bus or cache array error
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
prefetch mem transaction
memory access, level generic'
STATUS d000400000000863 MCGSTATUS 0
MCGCAP 105 APICID 1 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
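The STATUS values above can also be picked apart by hand. What follows is
a minimal sketch in Python, not what the kernel or mcelog actually runs.
The function name decode_status and the dictionary keys are made up for
illustration. Bits 57 through 63 and the low 16-bit bus error code
(0000 1PPT RRRR IILL) follow the usual x86 machine check layout. Treating
bit 46 as "corrected ECC" and bits 47-54 as the ECC syndrome is an
assumption taken from the mcelog output for this AMD CPU.

```python
# Sketch: decode selected fields of an x86 MCi_STATUS value.
# Bit 46 and the syndrome field (bits 47-54) are assumptions based on the
# mcelog output for this AMD CPU; other vendors may use these bits differently.

PARTICIPATION = ["SRC", "RES", "OBS", "GEN"]    # PP field
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}  # RRRR field
TARGET = ["MEM", "RESERVED", "IO", "GEN"]       # II field
LEVEL = ["L0", "L1", "L2", "LG"]                # LL field


def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_status(status):
    """Return a dict describing the flags in a 64-bit MCi_STATUS value."""
    info = {
        "valid": bit(status, 63),
        "overflow": bit(status, 62),        # "OVER": more than one error was seen
        "uncorrected": bit(status, 61),     # clear means "COR"
        "enabled": bit(status, 60),
        "misc_valid": bit(status, 59),
        "addr_valid": bit(status, 58),
        "context_corrupt": bit(status, 57),
        "corrected_ecc": bit(status, 46),   # assumed AMD-specific
        "syndrome": (status >> 47) & 0xFF,  # assumed AMD-specific
    }
    code = status & 0xFFFF
    if code >> 11 == 1:                     # bus error: 0000 1PPT RRRR IILL
        info["bus"] = {
            "participation": PARTICIPATION[(code >> 9) & 3],
            "timeout": bit(code, 8),
            "request": REQUEST.get((code >> 4) & 0xF, "UNKNOWN"),
            "target": TARGET[(code >> 2) & 3],
            "level": LEVEL[code & 3],
        }
    return info


if __name__ == "__main__":
    # STATUS values copied from the output above.
    for s in (0xd40e400000000833, 0xd400400000000853,
              0xd000400000000863, 0xdc0e400200000813):
        print(hex(s), decode_status(s))
```

For 0xd40e400000000833 this gives overflow set, uncorrected clear,
syndrome 0x1c, and a bus error with SRC / DRD / MEM / LG, which lines up
with the kernel's "COR OVER BUSLG Source DRD Memory".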
--
John Baldwin