vfs deadlock during panic?
jholland at fastsoft.com
Thu Feb 25 22:10:28 UTC 2010
> Are you sure one of the filesystems on the disk isn't corrupt?
> There's been reports of this problem in the past, but AFAIR it
> doesn't manifest itself in this manner.
Ah, thanks. Your comment spurred me to search for 'VOP_LOCK1_APV lock
order reversal', instead of 'freebsd hang on panic', and I see now that
this has been reported several times. I read a bunch of the threads,
but it looks like there's no solution yet. But you're right that nobody
else seems to be complaining about a rare hang-on-panic problem, either.
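For anyone not familiar with the term, a lock order reversal is reported when two locks are taken in the opposite order to one seen earlier. Below is a toy user-space sketch of that idea only. It is not how WITNESS is implemented, and every name in it (lor_reset, lor_acquire, lor_release, MAX_LOCKS) is made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCKS 8             /* hypothetical limit for this toy */

/* order[a][b] != 0 means lock a was once held while lock b was acquired. */
static int order[MAX_LOCKS][MAX_LOCKS];
static int held[MAX_LOCKS];     /* ids of currently held locks */
static int nheld;

/* Forget all recorded orderings and held locks. */
void lor_reset(void)
{
    memset(order, 0, sizeof(order));
    nheld = 0;
}

/*
 * Record that lock `id` is being acquired.
 * Returns 1 if this reverses a previously recorded order, else 0.
 */
int lor_acquire(int id)
{
    int reversed = 0;

    assert(id >= 0 && id < MAX_LOCKS);
    assert(nheld < MAX_LOCKS);
    for (int i = 0; i < nheld; i++) {
        if (order[id][held[i]])
            reversed = 1;           /* earlier: id first, then held[i] */
        order[held[i]][id] = 1;     /* now: held[i] first, then id */
    }
    held[nheld++] = id;
    return reversed;
}

/* Record that lock `id` has been released. */
void lor_release(int id)
{
    for (int i = 0; i < nheld; i++) {
        if (held[i] == id) {
            memmove(&held[i], &held[i + 1],
                (nheld - i - 1) * sizeof(held[0]));
            nheld--;
            return;
        }
    }
}
```

Taking lock 0 then lock 1, releasing both, and then taking lock 1 followed by lock 0 makes the last lor_acquire() call return 1. That is the situation the LOR messages warn about, whether or not a deadlock actually follows.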
Anyway, I didn't see any threads that mentioned file system
corruption, with the possible exception of
tml, which said that running fsck was what triggered the LORs.
So I'm assuming this was a mis-remembered detail, unless you've got a
better reference, and I'll take a rain check on re-installing everything
on a new disk, for now. But thanks for the comment, I do appreciate it,
and it helped me realize what I should follow up on.
I guess my next step is to try to fix the vfs locking. For the file
access case, I think I'll see what happens if I use an sx lock instead
of a mtx for BO_MTX to guard the block, so it won't care so much what
the underlying file system does during vnode operations. I assume that
won't work, but maybe it's a start towards understanding what I do need.
The mounting one looks trickier, because the vn_lock looks rather
confusing, and I'm really not sure what to do about the Giant
dependencies it seems to have. But I guess maybe I'll see if there's a
way to defer some of these operations to a worker thread or something.
Not sure if I'll actually have the time to go that deep on this issue,
and I'm unfortunately not certain it'll solve the hanging panic problem.
I guess I can see why nobody has fixed it yet.
Oh well. Thanks again for the suggestion. Maybe in light of the
alternative it would be worth at least trying that separate disk idea
after all. I have already seen something very similar on at least 2
different machines with different disks, but they came from the same
dump/restore image, so maybe if it's because of fs corruption, there's a
shared reason behind it.