Re: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /..../sys/cam/nvme/nvme_da.c:469

From: Alexander Leidinger <Alexander_at_Leidinger.net>
Date: Wed, 22 May 2024 20:45:33 UTC
On 2024-05-22 20:53, Warner Losh wrote:

> First order:
> 
> Looks like we're trying to schedule a trim, but that fails due to a 
> malloc issue. So then, since it's a
> malloc issue, we wind up trying to automatically reschedule this I/O, 
> which recurses into the driver
> with a bad lock held and boop.
> 
> Can you reproduce this?

So far I have seen it once. At least I have only one crashdump. I had one
more reboot/crash, but no dump. I also have a watchdog running on this
system, so I am not sure what caused the (unusual) reboot. A poudriere
build was running both times. I have not run poudriere since the
crashdump.

> If so, can you test this patch?

I will give it a try tomorrow anyway, and I will try to stress the system
again with poudriere.

The NVMe device is both a cache and a log device for a zpool, so there is
no really deterministic way to trigger access to it.

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF