Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))

From: Mark Millard <marklmi_at_yahoo.com>
Date: Mon, 12 Sep 2022 22:08:03 UTC
On 2022-Sep-12, at 05:10, Dmitry Salychev <dsl@FreeBSD.org> wrote:

> <dpaa2_panics.txt>
> Hi,
> 
> It seems that the recent 14-CURRENT/aarch64 (866e021) with DPAA2 drivers
> panics under a network throughput stress test in random places

Three of your examples report the signal handler being called at
the exact same instruction:

#6  0xffff0000004ced5c in witness_lock

The parameters vary, as do the callers:

#7  0xffff00000043a3a8 in __mtx_lock_flags
(twice)
vs.
#7  0xffff00000047d4ec in callout_lock
(once)

Showing one more level, where all are distinct:

#8  0xffff0000007d60a8 in dpaa2_swp_enq_mult (swp=swp@entry=0xffffa0000056ca00, ed=ed@entry=0xffff0000bcda2c70, fd=fd@entry=0xffff0000bcda2df8, flags=flags@entry=0xffff0000bcda2c6c, frames_n=frames_n@entry=1) at /usr/src/sys/dev/dpaa2/dpaa2_swp.c:795
vs.
#8  0xffff000000508f54 in soreceive_generic (so=0xffff00011d2c2200, psa=0x0, uio=<optimized out>, mp0=<optimized out>, controlp=0x0, flagsp=<optimized out>) at /usr/src/sys/kern/uipc_socket.c:2240
vs.
#8  callout_reset_sbt_on (c=0xffff0001121792c0, sbt=<optimized out>, prec=<optimized out>, ftn=0xffff00000047d4ec <callout_reset_sbt_on+204>, arg=0xffff000112179000, cpu=0, flags=256) at /usr/src/sys/kern/kern_timeout.c:962
(no address shown)

Perhaps looking at what the code at 0xffff0000004ced5c
(and just before it) is doing, and with what kinds of data,
would be more useful than starting with the less frequent
example signal handler invocations. That instruction is
common to all 3 call-chains above. If dumps for them are
still around, more than just the code could be looked into.
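
As a rough illustration of the comparison above, here is a minimal
Python sketch that groups backtraces by the frame found at a given
depth. It is only a sketch under assumptions: the names
frames_by_depth, group_by_frame and SAMPLE are hypothetical, the frame
lines are abbreviated from the ones quoted above, and the pairing of
each #7 caller with a #8 frame is assumed for illustration.

```python
import re
from collections import defaultdict

# Matches kgdb frame lines such as
#   "#6  0xffff0000004ced5c in witness_lock"
# and also frames printed without an address, such as
#   "#8  callout_reset_sbt_on (c=...)".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:(0x[0-9a-fA-F]+)\s+in\s+)?(\w+)")


def frames_by_depth(backtrace_text):
    """Return {frame_number: (address_or_None, function)} for one backtrace."""
    frames = {}
    for line in backtrace_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames[int(m.group(1))] = (m.group(2), m.group(3))
    return frames


def group_by_frame(backtraces, depth):
    """Map each distinct frame seen at `depth` to the indices of the traces showing it."""
    groups = defaultdict(list)
    for idx, bt in enumerate(backtraces):
        frame = frames_by_depth(bt).get(depth)
        if frame is not None:
            groups[frame].append(idx)
    return dict(groups)


# Abbreviated frames #6..#8; the #7/#8 pairing is an assumption.
SAMPLE = [
    """
    #6  0xffff0000004ced5c in witness_lock
    #7  0xffff00000043a3a8 in __mtx_lock_flags
    #8  0xffff0000007d60a8 in dpaa2_swp_enq_mult
    """,
    """
    #6  0xffff0000004ced5c in witness_lock
    #7  0xffff00000043a3a8 in __mtx_lock_flags
    #8  0xffff000000508f54 in soreceive_generic
    """,
    """
    #6  0xffff0000004ced5c in witness_lock
    #7  0xffff00000047d4ec in callout_lock
    #8  callout_reset_sbt_on
    """,
]

if __name__ == "__main__":
    for depth in (6, 7, 8):
        print(depth, group_by_frame(SAMPLE, depth))
```

Run over these three traces, depth 6 gives a single group while depth
8 gives three, which is the pattern described above.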


> with
> unknown kernel exception 0 esr_el1 2000000 on Ten64 board (based on
> NXP's LS1088A, Cortex-A53), but the same code doesn't panic on HoneyComb
> (NXP LX2160A, Cortex-A72) even after ~10h long tests.
> 
> I've gathered some stack backtraces from ddb and kgdb (attached).
> The panic itself can easily be reproduced several minutes after the
> start of the test. I've tried changing the PCPU_PTR macro to use get_pcpu
> again (as discussed in the thread earlier), but it didn't help.
> 
> If you want to get your hands dirty, DPAA2 stuff I'm using is at
> https://github.com/mcusim/freebsd-src/tree/lx2160acex7-exp (branch is
> lx2160acex7-exp!)
> 
> Any ideas or places to check would be really helpful.
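
For what it is worth, the reported esr_el1 value can be split into its
fields by hand. A minimal Python sketch, assuming the panic message
prints the register in hexadecimal without a 0x prefix; decode_esr and
EC_NAMES are hypothetical names, and the table lists only a few of the
architecturally defined exception classes:

```python
# ESR_EL1 layout used here: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
EC_NAMES = {
    0x00: "Unknown reason",
    0x15: "SVC instruction execution in AArch64 state",
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort without a change in Exception level",
}


def decode_esr(esr):
    """Split a raw ESR_EL1 value into its EC, IL and ISS fields."""
    return {
        "ec": (esr >> 26) & 0x3F,
        "il": (esr >> 25) & 0x1,
        "iss": esr & 0x1FFFFFF,
    }


if __name__ == "__main__":
    # Assumption: "esr_el1 2000000" in the panic message is hexadecimal.
    fields = decode_esr(int("2000000", 16))
    name = EC_NAMES.get(fields["ec"], "not in this short table")
    print(fields, name)
```

Under that assumption, 2000000 has only bit 25 (IL) set, so EC is 0
("Unknown reason") and ISS is 0, which agrees with the "Unknown kernel
exception 0" wording in the panic.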



===
Mark Millard
marklmi at yahoo.com