BINIT and BERR signals in MCA

From: Lee MATTHEWS <Lee.MATTHEWS.external_at_stormshield.eu>
Date: Tue, 11 Apr 2023 11:45:54 UTC
Hello,

One of our clients is experiencing problems using one of our products. It runs FreeBSD 11.3 on an Intel Atom Apollo Lake E3930 two core SoC processor.

Occasionally, under very light load, the kernel will panic. I've managed to get a couple of vmcores and I notice via the backtrace that the MCA interrupt is called.

I've managed to recover two vmcores and I notice in both of them that the Inter-Processor Interrupts are not being transferred from one CPU to the other. I've also noticed that the structure mca_internal contains information concerning the state of the MCA status register (value : 0x9000000020000003) for bank 0.

From Intel's software architecture document, the MCA Error Code is 0x0003 "The BINIT# from another processor caused this processor to enter machine check." and the Model Specific Error Code is 0x2000 "1 if BERR is driven."

The Intel document is not clear; could anyone please explain what the BINIT and BERR signals mean? They appear to be related to a bus, but I'm not sure which one. A bus external to the Atom SoC or one of the internal buses within the Atom SoC?

Do you have any ideas of what could generate this type of error? Is it likely a hardware or a software issue?

Thanks in advance.

Best wishes,
Lee Matthews