Re: ZFS deadlock in 14
- In reply to: Alexander Motin : "Re: ZFS deadlock in 14"
Date: Thu, 17 Aug 2023 19:37:09 UTC
On 17.08.2023 14:57, Alexander Motin wrote:
> On 15.08.2023 12:28, Dag-Erling Smørgrav wrote:
>> Mateusz Guzik <mjguzik@gmail.com> writes:
>>> Going through the list may or may not reveal other threads doing
>>> something in the area and it very well may be they are deadlocked,
>>> which then results in other processes hanging on them.
>>>
>>> Just like in your case the process reported as hung is a random victim
>>> and whatever the real culprit is deeper.
>>
>> We already know the real culprit, see upthread.
>
> Dag, I looked through the thread once more, and, while thank you for
> tracing it, but you never went beyond txg_wait_synced() in `zfs revert`
> thread. If you are saying that thread is holding the lock, then the
> question is why transaction commit is stuck. I need to see stacks for
> ZFS sync threads, or better all kernel stacks, just in case. Without
> that information I can only speculate.
>
> Trying to run your test (so far without reproduction) I see it producing
> a substantial amount of ZIL writes. The range of commits you reduced
> the scope to so far includes my ZIL locking refactoring, where I know
> for sure are some deadlocks. I am already waiting for 3 weeks now for
> reviews and tests for PR that should fix it:
> https://github.com/openzfs/zfs/pull/15122 . It would be good if you
> could test it, though it seems to depend on few more earlier patches not
> merged to FreeBSD yet.

Ah, it appears that the pool I tested first still had sync=always set
from earlier tests. That explains the high amount of ZIL traffic I saw,
so it may be irrelevant. But I still wonder what the sync threads are
doing in your case.

-- 
Alexander Motin