Re: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /..../sys/cam/nvme/nvme_da.c:469

From: Warner Losh <imp_at_bsdimp.com>
Date: Wed, 22 May 2024 18:53:04 UTC
First order:

Looks like we're trying to schedule a trim, but that fails due to a malloc failure. Since it's a malloc issue, we wind up trying to automatically reschedule this I/O, which recurses into the driver with a non-recursive lock already held, and boop.
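In lock terms this is the classic recursion on a non-recursive sleep mutex. Roughly (a minimal sketch, not the actual driver code; the real lock is the per-periph CAM device lock):

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx m;

static void
sketch(void)
{
	mtx_init(&m, "CAM device lock", NULL, MTX_DEF);	/* no MTX_RECURSE */
	mtx_lock(&m);	/* ndastrategy() takes the periph lock */
	/*
	 * ... ndastart() -> biofinish() -> GEOM completion ->
	 * g_disk_start() -> ndastrategy() again on the same thread ...
	 */
	mtx_lock(&m);	/* second acquire: "recursed on non-recursive mutex" panic */
}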

Can you reproduce this?

If so, can you test this patch?

diff --git a/sys/cam/nvme/nvme_da.c b/sys/cam/nvme/nvme_da.c
index 3f6cf8702870..357e612200e9 100644
--- a/sys/cam/nvme/nvme_da.c
+++ b/sys/cam/nvme/nvme_da.c
@@ -1077,7 +1077,9 @@ ndastart(struct cam_periph *periph, union ccb *start_ccb)
 
 			trim = malloc(sizeof(*trim), M_NVMEDA, M_ZERO | M_NOWAIT);
 			if (trim == NULL) {
+				cam_periph_unlock(periph);
 				biofinish(bp, NULL, ENOMEM);
+				cam_periph_lock(periph);
 				xpt_release_ccb(start_ccb);
 				ndaschedule(periph);
 				return;

(the mailer may mangle it, so I've also attached it in case people want to
comment on this).

The logic here is that we have special handling in the ENOMEM case that will recursively call the start routine, which calls the scheduler, which expects to be able to take the periph lock. But it can't, because it's already held. It also invokes a pacing protocol that slows things down even more.
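To make the intent of the patch concrete, here's how the ENOMEM path reads with it applied (sketch only, comments mine, trimmed to the relevant lines):

	trim = malloc(sizeof(*trim), M_NVMEDA, M_ZERO | M_NOWAIT);
	if (trim == NULL) {
		/*
		 * biofinish() can complete the bio synchronously:
		 * g_disk_done() -> g_io_deliver() -> g_io_request() ->
		 * g_disk_start() -> ndastrategy(), and ndastrategy()
		 * takes the periph lock itself.  Drop the lock across
		 * the call so the re-entry doesn't recurse on it.
		 */
		cam_periph_unlock(periph);
		biofinish(bp, NULL, ENOMEM);
		cam_periph_lock(periph);
		xpt_release_ccb(start_ccb);
		ndaschedule(periph);
		return;
	}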

What I'm now not sure about is whether we just need to release start_ccb or whether we also need to call ndaschedule here. It seems like we might not want to (but it's a safe nop if not). I've cc'd mav@ to see if he has opinions on what's going on.

Warner


On Wed, May 22, 2024 at 11:22 AM Alexander Leidinger <
Alexander@leidinger.net> wrote:

> Hi,
>
> I got the panic in $Subject. Does anyone have an idea?
>
> Complete crashlog available at https://wiki.leidinger.net/core.txt.6
> (1.2 MB)
>
> Short version:
> ---snip---
> [11417] KDB: stack backtrace:
> [11417] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfffffe043133f830
> [11417] vpanic() at vpanic+0x13f/frame 0xfffffe043133f960
> [11417] panic() at panic+0x43/frame 0xfffffe043133f9c0
> [11417] __mtx_lock_sleep() at __mtx_lock_sleep+0x491/frame
> 0xfffffe043133fa50
> [11417] __mtx_lock_flags() at __mtx_lock_flags+0x9c/frame
> 0xfffffe043133fa70
> [11417] ndastrategy() at ndastrategy+0x3c/frame 0xfffffe043133faa0
> [11417] g_disk_start() at g_disk_start+0x569/frame 0xfffffe043133fb00
> [11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fb30
> [11417] g_io_deliver() at g_io_deliver+0x1cc/frame 0xfffffe043133fb80
> [11417] g_disk_done() at g_disk_done+0xee/frame 0xfffffe043133fbc0
> [11417] ndastart() at ndastart+0x4a3/frame 0xfffffe043133fc20
> [11417] xpt_run_allocq() at xpt_run_allocq+0xa5/frame 0xfffffe043133fc70
> [11417] ndastrategy() at ndastrategy+0x6d/frame 0xfffffe043133fca0
> [11417] g_disk_start() at g_disk_start+0x569/frame 0xfffffe043133fd00
> [11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fd30
> [11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fd60
> [11417] g_io_request() at g_io_request+0x2b6/frame 0xfffffe043133fd90
> [11417] vdev_geom_io_start() at vdev_geom_io_start+0x257/frame
> 0xfffffe043133fdc0
> [11417] zio_vdev_io_start() at zio_vdev_io_start+0x321/frame
> 0xfffffe043133fe10
> [11417] zio_execute() at zio_execute+0x78/frame 0xfffffe043133fe40
> [11417] taskqueue_run_locked() at taskqueue_run_locked+0x1c7/frame
> 0xfffffe043133fec0
> [11417] taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame
> 0xfffffe043133fef0
> ---snip---
>
> This is with a world from 2024-05-17-084543.
>
> Bye,
> Alexander.
>
> --
> http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
>