Re: ARM64 system error

From: Andrew Turner <andrew_at_fubar.geek.nz>
Date: Wed, 03 Aug 2022 16:28:21 UTC

> On 31 Jul 2022, at 17:55, John F Carr <jfc@mit.edu> wrote:
> 
> My OverDrive 1000 (Cortex A57) running CURRENT just crashed with the unhelpful message "panic: Unhandled System Error".  Is there any way to get better information?  The ESR value bf000000 translates to "system error with implementation-defined code 0" so that's not much use.  The instruction associated with the interrupt can't fault ("subs w22, w22, #0x1") so it must be an asynchronous error.  On other systems I've seen bits you can test or registers you can read to get details.

By my reading of the Cortex-A57 documentation [1], I think the ESR value shows that the exception can be attributed to the current core, is containable to a given code sequence, and is a decode error.
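
For reference, the architectural part of that decoding can be sketched as below. This is a minimal Python sketch, not FreeBSD code: decode_esr and the dictionary keys are hypothetical names, and the field positions assume the standard Armv8-A ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with ISS bit 24 acting as the IDS flag for an SError). The remaining ISS bits are left raw because their meaning is implementation defined and has to be looked up in [1].

```python
# Split an AArch64 ESR_ELx value into its architectural fields.
# Assumes the Armv8-A layout; the low ISS bits are not interpreted here
# because they are implementation defined when IDS is set.

EC_SERROR = 0x2F  # exception class for an SError interrupt


def decode_esr(esr):
    """Return the EC, IL and ISS fields of an ESR value (hypothetical helper)."""
    ec = (esr >> 26) & 0x3F   # exception class, bits 31:26
    il = (esr >> 25) & 0x1    # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF     # instruction specific syndrome, bits 24:0
    fields = {"ec": ec, "il": il, "iss": iss}
    if ec == EC_SERROR:
        # For an SError, ISS bit 24 (IDS) says whether the rest of the
        # syndrome uses an implementation defined format.
        fields["ids"] = (iss >> 24) & 0x1
        fields["syndrome"] = iss & 0xFFFFFF
    return fields


fields = decode_esr(0xBF000000)
print({name: hex(value) for name, value in fields.items()})
```

For the value in the panic, this gives EC 0x2f (SError), IDS set, and the remaining syndrome bits all zero, which matches "system error with implementation-defined code 0" in the original report.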

It’s likely caused by msk_phy_readreg accessing the PHY and the PHY not responding quickly enough.

Does an older kernel boot? If so, can you try bisecting to find which commit caused the panic?

Andrew

[1] Bottom of https://developer.arm.com/documentation/ddi0488/h/system-control/aarch64-register-descriptions/exception-syndrome-register--el1-and-el3?lang=en

> 
>  x0:                0
>  x1: ffff0000b55bd000 (crypto_dev + b3f34ec0)
>  x2:             2880
>  x3:               20
>  x4:               d3
>  x5:                0
>  x6:              100
>  x7: ffff00011063daa0
>  x8: ffff00000077218c (generic_bs_r_2 + 0)
>  x9:             2880
> x10: ffff0000001ff9f4 (msk_phy_readreg + 84)
> x11:         a0000045
> x12:         56000000
> x13:         5e4a6f28
> x14: ffff000000c4d038 (vnet_entry_ipport_stoprandom + 0)
> x15: ffffa000016b3000
> x16:         40ef9400
> x17:                a
> x18: ffff0000b550e560 (crypto_dev + b3e86420)
> x19: ffff0000b57dc000 (crypto_dev + b4153ec0)
> x20: ffffa000029dc800
> x21:             2880
> x22:              3c4
> x23:             796d
> x24: ffffa000017f4100
> x25: ffff000000ad3da0 (miibus_readreg_desc + 0)
> x26: ffff000000bb6000 (vop_deallocate_desc + 28)
> x27: ffff000000e36980 (cc_cpu + 80)
> x28: ffff000000b1b828 (lock_class_mtx_sleep + 0)
> x29: ffff0000b550e670 (crypto_dev + b3e86530)
>  sp: ffff0000b550e560
>  lr: ffff0000001ff9f0 (msk_phy_readreg + 80)
> elr: ffff00000077806c (handle_el1h_irq + 8)
> spsr:         a00002c5
> far:                0
> esr:         bf000000
> panic: Unhandled System Error
> cpuid = 2
> time = 1659270153
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x13c
> panic() at panic+0x44
> do_serror() at do_serror+0x40
> handle_serror() at handle_serror+0x38
> --- system error, esr 0xbf000000
> handle_el1h_irq() at handle_el1h_irq+0x8
> --- interrupt
> msk_phy_readreg() at msk_phy_readreg+0x84
> e1000phy_status() at e1000phy_status+0x114
> e1000phy_service() at e1000phy_service+0x420
> mii_tick() at mii_tick+0x50
> msk_tick() at msk_tick+0x44
> softclock_call_cc() at softclock_call_cc+0x128
> softclock_thread() at softclock_thread+0xc4
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
> KDB: enter: panic
> [ thread pid 2 tid 100026 ]
> Stopped at      kdb_enter+0x44: undefined       f907c27f
>