Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
Date: Mon, 07 Mar 2022 21:42:54 UTC
On Mon, Mar 07, 2022 at 09:54:26PM +0100, Ronald Klop wrote:
>
> From: Mark Johnston <markj@freebsd.org>
> Date: Monday, 7 March 2022 16:13
> To: Ronald Klop <ronald-lists@klop.ws>
> CC: bob prohaska <fbsd@www.zefox.net>, Mark Millard <marklmi@yahoo.com>, freebsd-arm@freebsd.org, freebsd-current <freebsd-current@freebsd.org>
> > I haven't been able to reproduce any crashes running poudriere in an
> > arm64 AWS instance, though. Could you please try the patch below and
> > confirm whether it fixes your panics? I verified that the apparent
> > problem described above is gone with the patch.
> >
> > diff --git a/sys/kern/kern_rmlock.c b/sys/kern/kern_rmlock.c
> > index 0cdcfb8fec62..e51c25136ae0 100644
> > --- a/sys/kern/kern_rmlock.c
> > +++ b/sys/kern/kern_rmlock.c
> > @@ -437,6 +437,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
> > {
> > struct thread *td = curthread;
> > struct pcpu *pc;
> > + int cpuid;
> >
> > if (SCHEDULER_STOPPED())
> > return (1);
> > @@ -452,6 +453,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
> > atomic_interrupt_fence();
> >
> > pc = get_pcpu();
> > + cpuid = pc->pc_cpuid;
> > rm_tracker_add(pc, tracker);
> > sched_pin();
> >
> > @@ -463,7 +465,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
> > * conditional jump.
> > */
> > if (__predict_true(0 == (td->td_owepreempt |
> > - CPU_ISSET(pc->pc_cpuid, &rm->rm_writecpus))))
> > + CPU_ISSET(cpuid, &rm->rm_writecpus))))
> > return (1);
> >
> > /* We do not have a read token and need to acquire one. */
> >
> >
> >
>
> Hi,
>
> With this patch it panicked again:
> x0: ffffa00005a31500
> x1: ffffa00005a0e000
> x2: 2
> x3: ffffa00076c4e9a0
> x4: 0
> x5: e672743c8f9e5
> x6: dc89f70500ab1
> x7: 14
> x8: ffffa00005a31518
> x9: 1
> x10: ffffa00005a0e000
> x11: 0
> x12: 0
> x13: a
> x14: 1013e6b85a8ecbe4
> x15: 1dce740d11a5
> x16: ffff3ea86e2434bf
> x17: fffffffffffffff2
> x18: ffff0000fe661800 (g_ctx + fcf9fa54)
> x19: ffffa00076c4e9a0
> x20: ffff0000fec39000 (g_ctx + fd577254)
> x21: 2
> x22: ffff0000419b6090 (g_ctx + 402f42e4)
> x23: ffff000000c0b137 (lockstat_enabled + 0)
> x24: 100
> x25: ffff000000c0b000 (version + a0)
> x26: ffff000000c0b000 (version + a0)
> x27: ffff000000c0b000 (version + a0)
> x28: 0
> x29: ffff0000fe661800 (g_ctx + fcf9fa54)
> sp: ffff0000fe661800
> lr: ffff00000154ea50 (zio_dva_throttle + 154)
> elr: ffff00000154ea80 (zio_dva_throttle + 184)
> spsr: 60000045
> far: 2b753286b0b8
> panic: Unknown kernel exception 0 esr_el1 2000000
> cpuid = 1
> time = 1646685857
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x174
> panic() at panic+0x44
> do_el1h_sync() at do_el1h_sync+0x184
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0x2000000
> zio_dva_throttle() at zio_dva_throttle+0x184
> zio_execute() at zio_execute+0x58
> KDB: enter: panic
> [ thread pid 0 tid 100129 ]
> Stopped at kdb_enter+0x44: undefined f901c11f
> db>
As far as I can see, ZFS doesn't use rm locks, so this is a little
odd. I reverted the original rmlock commit in main, so it may be worth
verifying that the problem really is gone with that revert before
digging deeper. In other words, I suspect this may be a different bug.
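
For reference, the esr_el1 value in the panic message can be decoded by
hand. The following is only a minimal sketch, assuming the standard
ESR_ELx field layout from the Arm architecture (EC in bits 31:26, IL in
bit 25, ISS in bits 24:0). The helper names are hypothetical and are not
functions from the FreeBSD kernel.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not kernel APIs) that split an ESR_ELx value
 * into its fields, assuming the architectural layout:
 *   EC  = bits [31:26]  exception class
 *   IL  = bit  [25]     instruction length (1 = 32-bit instruction)
 *   ISS = bits [24:0]   instruction specific syndrome
 */
static inline unsigned
esr_ec(uint64_t esr)
{
	return ((unsigned)((esr >> 26) & 0x3f));
}

static inline unsigned
esr_il(uint64_t esr)
{
	return ((unsigned)((esr >> 25) & 0x1));
}

static inline uint32_t
esr_iss(uint64_t esr)
{
	return ((uint32_t)(esr & 0x1ffffff));
}
```

Applied to the reported value 0x2000000, these give EC = 0, IL = 1 and
ISS = 0. An exception class of 0 is the architecture's "unknown reason"
class, which is consistent with the "Unknown kernel exception 0" text
in the panic, so the syndrome itself says little about the cause.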