Re: BINIT and BERR signals in MCA

From: Eugene Grosbein <eugen_at_grosbein.net>
Date: Tue, 11 Apr 2023 11:59:08 UTC
11.04.2023 18:45, Lee MATTHEWS wrote:

> Hello,
> 
> One of our clients is experiencing problems with one of our products. It runs FreeBSD 11.3 on an Intel Atom E3930 (Apollo Lake) dual-core SoC.
> 
> Occasionally, under very light load, the kernel panics. The backtraces show that the MCA interrupt handler was called.
> 
> I've managed to recover two vmcores, and in both of them the Inter-Processor Interrupts (IPIs) are not being delivered from one CPU to the other. The mca_internal structure also records the state of the MCA status register for bank 0 (value: 0x9000000020000003).
> 
> According to Intel's software architecture documentation, the MCA error code is 0x0003 ("The BINIT# from another processor caused this processor to enter machine check.") and the model-specific error code is 0x2000 ("1 if BERR is driven.").
> 
> The Intel document is not clear on this; could anyone please explain what the BINIT and BERR signals mean? They appear to be related to a bus, but I'm not sure which one. Is it a bus external to the Atom SoC, or one of the SoC's internal buses?
> 
> Do you have any idea what could cause this type of error? Is it more likely a hardware or a software issue?
> 
> Thanks in advance.
> 
> Best wishes,
> Lee Matthews

I believe this is a hardware issue, probably overheating. Have you checked the thermal sensor readings?
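For reference, the bank 0 status value quoted above can be split into its fields with a few shifts and masks. Below is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS layout from Intel's manual (flag bits 63 down to 57, MCA error code in bits 15:0, model-specific error code in bits 31:16). The helper name decode_mci_status is hypothetical; it is not part of FreeBSD or any Intel tool.

```python
# Hypothetical helper: decode an IA32_MCi_STATUS value into its architectural fields.
# Bit positions assume Intel's documented MCi_STATUS layout; the model-specific
# field (bits 31:16) is returned raw because its meaning depends on the CPU family.

FLAG_BITS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # error overflow
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting enabled
    "MISCV": 59,  # MCi_MISC register valid
    "ADDRV": 58,  # MCi_ADDR register valid
    "PCC": 57,    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Return the flag bits and both error-code fields of an MCi_STATUS value."""
    fields = {name: (status >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["mca_error_code"] = status & 0xFFFF               # bits 15:0
    fields["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    return fields


if __name__ == "__main__":
    decoded = decode_mci_status(0x9000000020000003)
    for name, value in decoded.items():
        if name.endswith("code"):
            print(f"{name}: 0x{value:04x}")
        else:
            print(f"{name}: {value}")
```

For 0x9000000020000003 this yields VAL=1 and EN=1 with OVER, UC, MISCV, ADDRV and PCC all clear, an MCA error code of 0x0003 and a model-specific code of 0x2000, which matches the values quoted in the original message.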