Re: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /..../sys/cam/nvme/nvme_da.c:469
Date: Sat, 25 May 2024 08:34:32 UTC
On 2024-05-22 22:45, Alexander Leidinger wrote:
> On 2024-05-22 20:53, Warner Losh wrote:
>
>> First order:
>>
>> Looks like we're trying to schedule a trim, but that fails due to a
>> malloc issue. So then, since it's a malloc issue, we wind up trying to
>> automatically reschedule this I/O, which recurses into the driver
>> with a bad lock held and boop.
>>
>> Can you reproduce this?
>
> So far I have had it only once. At least I have only one crash dump. I
> had one more reboot/crash, but no dump. I also have a watchdog running
> on this system, so I'm not sure what caused the (unusual) reboot. I had
> a poudriere build running both times. I haven't run poudriere since the
> crash dump.
>
>> If so, can you test this patch?
>
> I'll give it a try tomorrow anyway, and I will try to stress the system
> again with poudriere.
>
> The nvme is a cache and also a log device for a zpool, so there is no
> deterministic way to trigger access to it.

I've run a lot of poudriere builds together with other load (about 30
jails with mysql, postgresql, redis, webmail, postfix, imap, java stuff,
...) on this system since Thursday. So far no panic in the nvme part.

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netchild@FreeBSD.org : PGP 0x8F31830F9F2772BF