Machine check errors

Alban Hertroys dalroi at solfertje.student.utwente.nl
Sun Jan 23 10:50:38 UTC 2011


Ever since installing 7.4-PRERELEASE, I've been seeing MCA machine check errors on my home server. They usually occur during my Sunday-night level-1 dump via ssh to a disk connected to a different machine, although that's probably not relevant.

Today I finally managed to catch it on the terminal, here's a hand-transcribed copy:

MCA: Bank 0, Status 0xb622000000000135
MCA: Global Cap 0x0000000000000104, Status 0x0000000000000004
MCA: Vendor "AuthenticAMD", ID 0x662, APIC ID 1
MCA: CPU 0 UNCOR PCC DCACHE L1 DRD error
MCA: Address 0x162933f0


Fatal trap 20: Machine check trap while in user mode
cpuid = 0; apic id = 01
instruction pointer	= 0x33:0x8086bd0
stack pointer		= 0x3b:0xbfbfd390
frame pointer		= 0x3b:0xbfbfd3e8
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 3, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, IOPL = 0
current process		= 18119 (postgres)
trap number		= 20
panic: machine check trap
cpuid = 0
GEOM_MIRROR: Device home: provider mirror/home destroyed.

Dmesg is also attached.
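For readers unfamiliar with the MCA lines above, the following is a minimal sketch of how the raw register values decode. It assumes only the architectural bit layout of the x86 machine-check registers (MCi_STATUS, MCG_CAP, MCG_STATUS) as documented for both Intel and AMD processors. Model-specific fields are deliberately ignored. The function and table names are hypothetical and are not taken from FreeBSD's code.

```python
# Sketch: decode the architectural bits of x86 machine-check registers.
# Assumption: model-specific fields (e.g. MCi_STATUS bits 56:32 and 31:16)
# are NOT decoded; only architecturally defined bits are shown.

STATUS_FLAGS = [
    (63, "VAL"),    # register holds a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # error was not corrected
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MCi_MISC register is valid
    (58, "ADDRV"),  # MCi_ADDR register is valid
    (57, "PCC"),    # processor context may be corrupt
]

# Memory-hierarchy compound error code layout: 0000 0001 RRRR TTLL
TT = {0b00: "I", 0b01: "D", 0b10: "G"}                  # transaction type
LL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}   # cache level
RRRR = {                                                # request type
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD",
    0b0100: "DWR", 0b0101: "IRD", 0b0110: "PREFETCH",
    0b0111: "EVICT", 0b1000: "SNOOP",
}


def decode_status(status: int) -> dict:
    """Decode an MCi_STATUS value into flag names and the MCA error code."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    code = status & 0xFFFF  # MCA error code lives in bits 15:0
    result = {"flags": flags, "mca_code": code}
    # Memory-hierarchy errors have bits 15:8 equal to 0000 0001.
    if (code >> 8) == 0x01:
        result["hierarchy"] = "%sCACHE %s %s" % (
            TT.get((code >> 2) & 0b11, "?"),
            LL[code & 0b11],
            RRRR.get((code >> 4) & 0b1111, "?"),
        )
    return result


def decode_mcg_cap(cap: int) -> dict:
    """Bits 7:0 = number of banks, bit 8 = MCG_CTL register present."""
    return {"banks": cap & 0xFF, "ctl_p": bool((cap >> 8) & 1)}


def decode_mcg_status(st: int) -> dict:
    """Bit 0 = restart IP valid, bit 1 = error IP valid, bit 2 = MC in progress."""
    return {
        "RIPV": bool(st & 1),
        "EIPV": bool((st >> 1) & 1),
        "MCIP": bool((st >> 2) & 1),
    }


if __name__ == "__main__":
    # Values hand-transcribed from the console output above.
    print(decode_status(0xB622000000000135))  # hierarchy: 'DCACHE L1 DRD'
    print(decode_mcg_cap(0x104))
    print(decode_mcg_status(0x4))
```

Under these assumptions, Bank 0's status decodes to VAL, UC, EN, ADDRV and PCC with a data-cache L1 data-read error code, matching FreeBSD's "UNCOR PCC DCACHE L1 DRD" summary. ADDRV being set is consistent with the "MCA: Address" line being printed. The global status has MCIP set but RIPV clear, meaning the interrupted instruction pointer cannot be reliably restarted, which together with PCC marks the error as not safely recoverable.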


-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg_20110123
Type: application/octet-stream
Size: 6114 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20110123/94ccf810/dmesg_20110123.obj
-------------- next part --------------


From searching the archives, I found claims that L1 cache errors would cause far more trouble than I'm seeing. However, the user in that case was using an Intel-based ThinkPad laptop, while I'm seeing them on an AthlonXP-based server (Tyan Tiger board, 3Ware RAID controller, the works).

Now there is something unusual about my server that could be related to these MCA errors: it's a dual-CPU motherboard that would normally host two AthlonMPs, but it is instead hosting a single AthlonXP. So one of the CPU sockets is empty.

So, what's my situation? Do I need to go looking for a replacement CPU, or is something wrong with the machine check itself?

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





More information about the freebsd-stable mailing list