Page fault in _mca_init during startup

Konstantin Belousov kostikbel at gmail.com
Fri Feb 5 02:40:12 UTC 2021


On Thu, Feb 04, 2021 at 07:01:30PM -0700, Alan Somers wrote:
> On Thu, Feb 4, 2021 at 5:59 PM Konstantin Belousov <kostikbel at gmail.com>
> wrote:
> > Do you have INVARIANTS enabled?  If not, I am curious if enabling them
> > would convert that rare page fault into rare "CPU %d has more MC banks"
> > assert.
> >
> > Also might be the output of the
> > # for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m 0x179
> > /dev/cpuctl$x; done
> > command will show the issue (0x179 is the MCG_CAP MSR).
> > You need to load cpuctl(4) if it is not loaded yet.
> >
> 
> I don't have INVARIANTS enabled, and I can't enable it on the production
> servers.  However, I can turn those three KASSERTs into VERIFYs and see
> what happens.  Here is what your command shows on the server that panicked:
> $ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m 0x179
> /dev/cpuctl$x; done | uniq -c
>   16 MSR 0x179: 0x00000000 0x0f000c14
>   16 MSR 0x179: 0x00000000 0x0f000814

That probably explains it, but it would be more telling if you left the
output unfiltered (without uniq -c), so that we can see which CPUs have the
MCG_CMCI_P bit (bit 10) set.
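
For reference, a minimal userland sketch of how the two values above decode.
The mask names mirror the MCG_CAP_* definitions in FreeBSD's specialreg.h;
the helper functions are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* MCG_CAP field masks (bit layout per the Intel SDM). */
#define MCG_CAP_COUNT   0x000000ffu  /* bits 7:0, number of MC banks */
#define MCG_CAP_CMCI_P  0x00000400u  /* bit 10, CMCI is supported */

/* cpucontrol -m prints the high 32 bits first, then the low 32 bits. */
static uint64_t
msr_value(uint32_t hi, uint32_t lo)
{
	return (((uint64_t)hi << 32) | lo);
}

/* Number of machine check banks this CPU reports. */
static unsigned
mcg_bank_count(uint64_t mcg_cap)
{
	return ((unsigned)(mcg_cap & MCG_CAP_COUNT));
}

/* True if this CPU advertises CMCI in MCG_CAP. */
static bool
mcg_has_cmci(uint64_t mcg_cap)
{
	return ((mcg_cap & MCG_CAP_CMCI_P) != 0);
}
```

With the values from the output above, mcg_has_cmci() is true for 0x0f000c14
and false for 0x0f000814, while both report 0x14 (20) banks. The two groups
of 16 CPUs differ only in bit 10.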

I suspect that your machine has two sockets, and that the processor in one
socket has CPUs reporting MCG_CMCI_P, while the processor in the other does
not. Your SMP is not quite symmetric; perhaps the processors came from
different bins?

If the BSP is selected from the reporting socket, everything boots well. If
the other socket wins the BSP selection race, CMCI is not initialized, but
when the per-CPU mca_init() sees the CMCI_P bit, it calls cmci_setup() without
the cmc state having been allocated, because the BSP did not need it.

If I am right, then unconditionally allocating the memory is probably the
only choice there.
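
To make the suspected sequence concrete, here is a simplified userland model
of it. This is not the kernel code: cmc_state_model, model_setup() and
model_cpu_init_faults() are hypothetical names, and the model only tracks
whether the state was allocated before a CMCI-capable CPU tries to use it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MCG_CAP_CMCI_P 0x00000400u /* bit 10 of MCG_CAP */

/* Hypothetical stand-in for the kernel's CMC state allocation. */
static int *cmc_state_model;

/*
 * Model of the one-time setup done on the BSP.  Without 'unconditional',
 * the state is allocated only if the BSP itself reports CMCI_P.
 */
static void
model_setup(uint64_t bsp_mcg_cap, bool unconditional)
{
	free(cmc_state_model);
	cmc_state_model = NULL;
	if (unconditional || (bsp_mcg_cap & MCG_CAP_CMCI_P) != 0)
		cmc_state_model = calloc(1, sizeof(*cmc_state_model));
}

/*
 * Model of the per-CPU init.  Returns true if this CPU reports CMCI_P
 * but the state was never allocated, i.e. the case that would fault.
 */
static bool
model_cpu_init_faults(uint64_t mcg_cap)
{
	return ((mcg_cap & MCG_CAP_CMCI_P) != 0 && cmc_state_model == NULL);
}
```

In this model, calling model_setup(0x0f000814, false) and then
model_cpu_init_faults(0x0f000c14) reproduces the bad case: the BSP lacks
CMCI_P, a later CPU has it, and the state is missing. Passing true for
'unconditional', or using 0x0f000c14 as the BSP value, makes the same call
return false.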

commit 2e2c925ac3b626edc6492a57a80f6b87895801c2
Author: Konstantin Belousov <kib at FreeBSD.org>
Date:   Fri Feb 5 04:32:05 2021 +0200

    x86 mca: unconditionally allocate memory for cmc state

diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d455..dff3f7631f5c 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1047,7 +1047,7 @@ mca_setup(uint64_t mcg_cap)
 	    "force_scan", CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, 0,
 	    sysctl_mca_scan, "I", "Force an immediate scan for machine checks");
 #ifdef DEV_APIC
-	if (cmci_supported(mcg_cap))
+	if (cpu_vendor_id == CPU_VENDOR_INTEL)
 		cmci_setup();
 	else if (amd_thresholding_supported())
 		amd_thresholding_setup();

