Deadlock between nfsd and snapshots.

Bruce Evans bde at zeta.org.au
Wed Aug 23 19:13:02 UTC 2006


On Wed, 23 Aug 2006, Tor Egge wrote:

>>> Protecting the existing i_flag and the timestamps with the vnode interlock
>>> when the current thread only has a shared vnode lock should be sufficient
>>> to protect against the races, removing the need for #3, #4 and #4 below.
>
>> I agree that this should be sufficient.  Don't know if it is.  Actually,
>> I thought that the vnode lock was more exclusive.  How can a shared lock
>> work even for a reader if a writer is changing inode contents?
>
> Holding a shared lock prevents others from holding an exclusive lock, thus
> the possible modifications to the inode are limited (i_flags and timestamps).

Seems to be not much of a problem (since vn_stat() acquires an exclusive lock.

>>> There are some constraints with regards to setting IN_MODIFIED on an inode.
>>> If neither IN_CHANGE nor IN_UPDATE is set then it might be unsafe to set
>>> IN_MODIFIED since the file system might be suspended or in the process of
>>> being suspended with the vnode sync loop in ffs_sync() having iterated past
>>> the vnode.
>
>> The case can't happen.  IN_CHANGE is always set if IN_MODIFIED will
>> (or should be) set later.  There are some buggy cases where the combined
>> setting of { IN_CHANGE, IN_MODIFIED } is incorrect, but these don't cause
>> any problems here.  We just have to avoid setting (or having to set)
>> IN_MODIFED when setting it is not safe.  Hopefully this is only in suspend
>> mode.
>
> If VOP_READ() has recently been called then IN_ACCESS might be set without
> IN_CHANGE, IN_UPDATE or IN_MODIFIED set.  That's when ufs_itimes() needs to be
> careful.  The VOP_READ() call might have been performed after the vnode was
> handled in the vnode sync loop.

Certainly.  You didn't mention IN_ACCESS originally, and I thought about it
but didn't mention it either.  For IN_ACCESS, we can consider the change
to not actually have occured (and thus force a setting of IN_MODIFED) until
we set the atime, or even later with IN_LAZYMOD, or never if we are really
lazy.  We can set IN_ACCESS in ffs_read() safely because it's not in the
dinode (and vn_rdwr() acquires an exclusive lock).  We can decide not to
convert it to IN_MODIFIED later if this is too hard.  Not setting the
atime properly is harmless compared with not setting other timestamps.
I should only have said that "IN_CHANGE is always set if IN_MODIFIED will
be set later due to a _disk_ inode change that has _already_ occurred"
(IN_ACCESS and IN_UPDATE both give in-core-only inode changes that will
affect the disk inode later).

>>> If the file system is suspended then IN_MODIFIED cannot be set. If
>>> IN_MODIFIED, IN_CHANGE or IN_UPDATE is set and the file system is suspended
>>> then something is wrong.
>
>> I think ufs_itimes() needs to use a non-blocking vn_start_write() and do
>> nothing (except perhaps assert that the above harmful IN_* flags are not
>> set) if it (ufs_itimes()) would set IN_MODIFIED.
>
> If a nonblocking vn_start_write() call fails then you don't know if the file
> system is suspended or in the processes of being suspended.  If the file system
> is in the process of being suspended then the vnode sync loop might still be
> running.

Yes.  I missed the different suspend flags.  Now I'll ask for a vfs level
function to avoid accessing the is-suspended flag directly.

>> - problem for atime updates caused by reads.  These become writes while in
>>    suspend mode.  We want reads to work in suspend mode, so we cannout
>>    disallow reads, and we cannot disallow the implicit writes without
>>    breaking atime semantics.  This problem can be partly avoided by
>>    ignoring IN_ACCESS in ufs_itimes() while in suspend mode, or by
>>    converting it to IN_LAZYMOD.  If the inode gets reclaimed then we
>>    lose the atime update; otherwise the atime gets updated some time
>>    after we leave suspend mode, and either way the update doesn't go
>>    to snapshots, as is necessary for having a coherent snapshot.
>
> vnodes are not reclaimed while the file system is suspended or in the
> processess of being suspended.

Does this mean that big tree walks soon run out of vnodes and cause
everything to block until the file system is unsuspended?

>> - problem for IN_LAZYMOD in ufs_reclaim().  Currently not reached.  A quick
>>    fix would be to lose whatever updates are being done lazily.
>
> What problem ?

It blindly converts IN_LAZYMOD to IN_MODIFIED and calls ufs_update(), but
it has no problem with suspend if it is not called while suspended.  It
isn't missing vnode locking like I first thought -- the locking
annotation was just wrong in the old version of vnode_if.src that I was
looking at.  The locking used to be "L L L" but was documented as "U U U".
Now it is "E E E" and it probably needs to be exclusive to set IN_MODIFIED.

Bruce


More information about the freebsd-fs mailing list