Page fault in _mca_init during startup

Alan Somers asomers at freebsd.org
Thu Feb 4 23:05:54 UTC 2021


On Thu, Feb 4, 2021 at 3:58 PM Konstantin Belousov <kostikbel at gmail.com>
wrote:

> On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> > On Thu, Feb 4, 2021 at 1:31 PM Alan Somers <asomers at freebsd.org> wrote:
> > >
> > > After upgrading a machine to FreeBSD 12.2, it hit the following panic on
> > > its first reboot.  I suspect that a few other servers have hit this too,
> > > but since it happens before swap is mounted there are no core dumps, and
> > > they usually reboot immediately.  The code in question hasn't changed since
> > > 2018.  The panic happened in cmci_monitor at line 930.  Does anybody have
> > > any suggestions for how I could debug further?  I can't readily reproduce
> > > it, and I can't dump core, but I'd like to investigate it any way I can.
> > > The server in question has dual Xeon Gold 6142 CPUs.
> > >
> >
> > I can't actually help :( but I can add a +1  with similar hardware or
> > equivalent specs. It's not frequent, but it's often enough to be
> > annoying.
> > -M
> >
> > > if (!(ctl & MC_CTL2_CMCI_EN))
> > >         /* This bank does not support CMCI. */
> > >         return;
> > >
> > > cc = &cmc_state[PCPU_GET(cpuid)][i];    // <- panic here
> > >
> > > /* Determine maximum threshold. */
> > >
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 26; apic id = 34
> > > fault virtual address = 0xd0
> > > fault code = supervisor read data, page not present
> > > instruction pointer = 0x20:0xffffffff8125a009
> > > stack pointer        = 0x28:0xfffffe0000b65f20
> > > frame pointer        = 0x28:0xfffffe0000b65f50
> > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags = resume, IOPL = 0
> > > current process = 11 (idle: cpu26)
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 26
> > > time = 1
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0000b65be0
> > > vpanic() at vpanic+0x17b/frame 0xfffffe0000b65c30
> > > panic() at panic+0x43/frame 0xfffffe0000b65c90
> > > trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000b65cf0
> > > trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0000b65d40
> > > trap() at trap+0x286/frame 0xfffffe0000b65e50
> > > calltrap() at calltrap+0x8/frame 0xfffffe0000b65e50
> > > --- trap 0xc, rip = 0xffffffff8125a009, rsp = 0xfffffe0000b65f20, rbp = 0xfffffe0000b65f50 ---
> > > _mca_init() at _mca_init+0x5d9/frame 0xfffffe0000b65f50
> > > init_secondary_tail() at init_secondary_tail+0xfd/frame 0xfffffe0000b65f80
> > > init_secondary() at init_secondary+0x2d1/frame 0xfffffe0000b65ff0
> > > KDB: enter: panic
> > > [ thread pid 11 tid 100029 ]
> > > Stopped at      kdb_enter+0x37: movq    $0,0x12bc1f6(%rip)
>
> Try this.
>
> I think that there are no other dependencies in the startup order, but
> I cannot know that for sure.
>
> commit 19584e3d3e9606d591fa30999b370ed758960e8c
> Author: Konstantin Belousov <kib at FreeBSD.org>
> Date:   Fri Feb 5 00:56:09 2021 +0200
>
>     x86: init mca before APs are started
>
> diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> index 03100e77d455..e2bf2673cf69 100644
> --- a/sys/x86/x86/mca.c
> +++ b/sys/x86/x86/mca.c
> @@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
>
>         mca_init();
>  }
> -SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
> +SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
>
>  /* Called when a machine check exception fires. */
>  void
>

I can test this patch on development servers, but so far I've only seen the
crash on production servers.  Do you have any suggestions for how to force
the crash, or how to test this patch besides simply making sure that my dev
servers can boot?
-Alan

