MCA: CPU 0 UNCOR PCC DTLB L1 error
John Hay
jhay at meraka.org.za
Mon May 16 17:18:39 UTC 2011
On Mon, May 16, 2011 at 06:23:19PM +0200, John Hay wrote:
> On Wed, May 11, 2011 at 05:26:50PM -0500, Alan Cox wrote:
> > On Tue, May 10, 2011 at 7:52 AM, John Hay <jhay at meraka.org.za> wrote:
> >
> > > Hi,
> > >
> > > I have seen this panic a few times on a Gigabyte E350N-USB3 running
> > > 8-STABLE. I have only seen it while in X, but then the machine is
> > > always in X. At first I just got these hangs, so I bought a
> > > PCI-express RS232 card and could see these at last. For some reason
> > > it does not go past this, so I have not been able to get a dump yet.
> > >
> > > Does anybody have an idea why this happens or how to debug it further? I searched
> > > the archives and found something similar about a year ago, but it looks
> > > like it was solved with a fix that got committed.
> > >
> > > http://www.freebsd.org/cgi/query-pr.cgi?pr=140338
> > >
> > > I have now disabled mca in loader.conf with 'hw.mca.enabled="0"' and I have
> > > not seen that panic again. I do occasionally see a panic in devfs_open(),
> > > but I guess that should be handled in another thread.
> > >
> > > The kernel is basically a GENERIC kernel with puc uncommented and the
> > > following in loader.conf
> > >
> > > vm.kmem_size="12G"
> > > hw.mca.enabled="0"
> > > zfs_load="YES"
> > > ahci_load="YES"
> > > xhci_load="YES"
> > > amdtemp_load="YES"
> > > ng_ubt_load="YES"
> > > uplcom_load="YES"
> > >
> > > Here is the panic message and after that dmesg.
> > >
> > > John
> > > --
> > > John Hay -- jhay at meraka.csir.co.za / jhay at FreeBSD.org
> > >
> > > ####################################################
> > > MCA: Bank 0, Status 0xb600000000010015
> > > MCA: Global Cap 0x0000000000000106, Status 0x0000000000000004
> > > MCA: Vendor "AuthenticAMD", ID 0x500f10, APIC ID 0
> > > MCA: CPU 0 UNCOR PCC DTLB L1 error
> > > MCA: Address 0x8016c4000
> > >
> > >
> > > Fatal trap 28: machine check trap while in user mode
> > > cpuid = 0; apic id = 00
> > > instruction pointer = 0x43:0x80156af85
> > > stack pointer = 0x3b:0x7fffffffcb18
> > > frame pointer = 0x3b:0x80fe87800
> > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > > = DPL 3, pres 1, long 1, def32 0, gran 1
> > > processor eflags = interrupt enabled, IOPL = 0
> > > current process = 2484 (initial thread)
> > > trap number = 28
> > > panic: machine check trap
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > #0 0xffffffff80608d5e at kdb_backtrace+0x5e
> > > #1 0xffffffff805d6707 at panic+0x187
> > > #2 0xffffffff808bf4c0 at trap_fatal+0x290
> > > #3 0xffffffff808bfaa9 at trap+0x109
> > > #4 0xffffffff808a7d94 at calltrap+0x8
> > > ####################################################
> > >
> > >
> > Please try the following patch:
> >
> > Index: x86/x86/mca.c
> > ===================================================================
> > --- x86/x86/mca.c (revision 219060)
> > +++ x86/x86/mca.c (working copy)
> > @@ -665,7 +665,8 @@ mca_setup(uint64_t mcg_cap)
> > * for Erratum 383.
> > */
> > if (cpu_vendor_id == CPU_VENDOR_AMD &&
> > - CPUID_TO_FAMILY(cpu_id) == 0x10 && amd10h_L1TP)
> > + (CPUID_TO_FAMILY(cpu_id) == 0x10 ||
> > + CPUID_TO_FAMILY(cpu_id) == 0x14) && amd10h_L1TP)
> > workaround_erratum383 = 1;
> >
> > mtx_init(&mca_lock, "mca", NULL, MTX_SPIN);
> > Index: i386/i386/pmap.c
> > ===================================================================
> > --- i386/i386/pmap.c (revision 219060)
> > +++ i386/i386/pmap.c (working copy)
> > @@ -758,7 +758,8 @@ pmap_init(void)
> > * machine monitor.
> > */
> > if (vm_guest == VM_GUEST_VM && cpu_vendor_id == CPU_VENDOR_AMD &&
> > - CPUID_TO_FAMILY(cpu_id) == 0x10)
> > + (CPUID_TO_FAMILY(cpu_id) == 0x10 ||
> > + CPUID_TO_FAMILY(cpu_id) == 0x14))
> > workaround_erratum383 = 1;
> >
> > /*
> > Index: amd64/amd64/pmap.c
> > ===================================================================
> > --- amd64/amd64/pmap.c (revision 219060)
> > +++ amd64/amd64/pmap.c (working copy)
> > @@ -727,7 +727,8 @@ pmap_init(void)
> > * machine monitor.
> > */
> > if (vm_guest == VM_GUEST_VM && cpu_vendor_id == CPU_VENDOR_AMD &&
> > - CPUID_TO_FAMILY(cpu_id) == 0x10)
> > + (CPUID_TO_FAMILY(cpu_id) == 0x10 ||
> > + CPUID_TO_FAMILY(cpu_id) == 0x14))
> > workaround_erratum383 = 1;
> >
> > /*
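
For reference, the reason 0x14 is the family being added: the ID in the
panic output is 0x500f10, and the family is the base family field plus the
extended family field. A minimal sketch of that arithmetic, assuming the
masks below match the ones CPUID_TO_FAMILY uses in specialreg.h (the
sketch_ names are made up for illustration, they are not kernel identifiers):

```c
#include <stdint.h>

/* CPUID leaf 1 EAX field masks (assumed to mirror specialreg.h). */
#define SKETCH_CPUID_FAMILY     0x00000f00u	/* bits 8..11  */
#define SKETCH_CPUID_EXT_FAMILY 0x0ff00000u	/* bits 20..27 */

/* Family as the kernel compares it: base family + extended family. */
static unsigned
sketch_cpuid_to_family(uint32_t id)
{
	return (((id & SKETCH_CPUID_FAMILY) >> 8) +
	    ((id & SKETCH_CPUID_EXT_FAMILY) >> 20));
}
```

With id = 0x500f10 the base family is 0xf and the extended family is 0x5,
so the result is 0x14 and the unpatched "== 0x10" test does not match.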
>
> I have applied the patch, but got another one today. I still do not get
> a prompt or dump. :-( It just gets stuck right after #4. If there is anything
> more that I can try, just ask.
>
> #####################################################################
> MCA: Bank 0, Status 0xb600000000010015
> MCA: Global Cap 0x0000000000000106, Status 0x0000000000000004
> MCA: Vendor "AuthenticAMD", ID 0x500f10, APIC ID 0
> MCA: CPU 0 UNCOR PCC DTLB L1 error
> MCA: Address 0x808ace000
>
>
> Fatal trap 28: machine check trap while in user mode
> cpuid = 1; apic id = 01
> instruction pointer = 0x43:0x80af206d5
> stack pointer = 0x3b:0x7fffffffb8e8
> frame pointer = 0x3b:0x809b92450
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 3, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, IOPL = 0
> current process = 22228 (initial thread)
> trap number = 28
> panic: machine check trap
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff80608f6e at kdb_backtrace+0x5e
> #1 0xffffffff805d6917 at panic+0x187
> #2 0xffffffff808bf7c0 at trap_fatal+0x290
> #3 0xffffffff808bfda9 at trap+0x109
> #4 0xffffffff808a8084 at calltrap+0x8
> #####################################################################
>
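
In case it helps anyone reading the status word by hand, here is a rough
sketch of how 0xb600000000010015 maps onto "UNCOR PCC DTLB L1". It assumes
the architectural MCi_STATUS layout (flags in bits 57..63, MCA error code in
bits 0..15) and the TLB error code pattern 0000 0000 0001 TTLL. The sk_
names are hypothetical and this is not the code in mca.c:

```c
#include <stdbool.h>
#include <stdint.h>

/* MCi_STATUS flag bits (architectural layout, assumed here). */
#define SK_MC_VAL	(1ULL << 63)	/* register contents valid */
#define SK_MC_OVER	(1ULL << 62)	/* an earlier error was overwritten */
#define SK_MC_UC	(1ULL << 61)	/* uncorrected error */
#define SK_MC_EN	(1ULL << 60)	/* reporting was enabled */
#define SK_MC_MISCV	(1ULL << 59)	/* MISC register valid */
#define SK_MC_ADDRV	(1ULL << 58)	/* ADDR register valid */
#define SK_MC_PCC	(1ULL << 57)	/* processor context corrupt */

struct sk_mc_status {
	bool val, over, uc, en, miscv, addrv, pcc;
	uint16_t code;	/* MCA error code, bits 0..15 */
	bool is_tlb;	/* code matches 0000 0000 0001 TTLL */
	unsigned tt;	/* 0 = instruction, 1 = data, 2 = generic */
	unsigned ll;	/* 0 = L0, 1 = L1, 2 = L2, 3 = generic */
};

static struct sk_mc_status
sk_decode(uint64_t status)
{
	struct sk_mc_status s;

	s.val = (status & SK_MC_VAL) != 0;
	s.over = (status & SK_MC_OVER) != 0;
	s.uc = (status & SK_MC_UC) != 0;
	s.en = (status & SK_MC_EN) != 0;
	s.miscv = (status & SK_MC_MISCV) != 0;
	s.addrv = (status & SK_MC_ADDRV) != 0;
	s.pcc = (status & SK_MC_PCC) != 0;
	s.code = (uint16_t)(status & 0xffff);
	s.is_tlb = (s.code & 0xfff0) == 0x0010;
	s.tt = (s.code >> 2) & 3;	/* only meaningful if is_tlb */
	s.ll = s.code & 3;		/* only meaningful if is_tlb */
	return (s);
}
```

For the value above, the top byte 0xb6 sets VAL, UC, EN, ADDRV and PCC
(hence UNCOR, PCC and the Address line), and the error code 0x0015 is a TLB
error with TT = data and LL = L1.
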
Some extra info. The machine is my new "always on" machine at home. Most
of the panics have happened while I was not there. My wife just mentioned
that it often happens while she is busy typing a reply in Thunderbird. (I
do not use that machine for my email.) So I tried it: I clicked reply on
one of her emails and within a few lines it crashed.
John
--
John Hay -- jhay at meraka.csir.co.za / jhay at FreeBSD.org