Deadlock between nfsd and snapshots.

Wed Aug 23 15:40:01 UTC 2006

On Wed, 23 Aug 2006, Kostik Belousov wrote:

> On Tue, Aug 22, 2006 at 09:46:38PM +0000, Tor Egge wrote:
>>> 2. All places that currently set IN_ACCESS, instead would increment
>>> i_accessed using the atomic ops. ufs_itimes shall update i_access
>>> under some mutex if i_accessed is greater than zero.
>>
>> Protecting the existing i_flag and the timestamps with the vnode
>> interlock when the current thread only has a shared vnode lock should
>> be sufficient to protect against the races, removing the need for #3,
>> #4 and #4 below.

You asked about this in a later reply (the one with the patch).

This seems wrong to me.  I think MNT_ILOCK() (like you used) is sufficient,
but you should just use a nonblocking vn_start_write() to avoid knowing
about the internals of vn_start_write().  If the shared (or whatever)
vnode lock is insufficient, then there are much larger, much older bugs.
Inodes are accessed a lot with just the the vnode lock, and the vnode
interlock here won't affect races with most other accesses.

>> What's left is avoiding setting IN_MODIFIED when it's unsafe, to
>> protect against the deadlock.
>
> So, I will do the following:
>
> 1. Protect both setting and reading inode times and i_flag with vnode
> interlock. This shall be done through all the sys/ufs/*/* code.

The patch doesn't change so much.  Most places shouldn't need changing.

> 2. Modify ufs_itimes:
>> If neither IN_CHANGE nor IN_UPDATE is set then it might be unsafe to
>> set IN_MODIFIED since the file system might be suspended or in the
>> process of being suspended with the vnode sync loop in ffs_sync()
>> having iterated past the vnode.
> In other words, if IN_CHANGE or IN_UPDATE are already set, I can
> safely convert IN_ACCESS into IN_MOD.

Not quite.  IN_ACCESS can also be handled if IN_MODIFIED is already
set, even if neither IN_CHANGE nor IN_UPDATE is set.  All this is when
the file system is suspended; otherwise we can do anything to the
inode.  In all cases, we depend on the inode not changing underneath
us.  I think ordinary vnode locking gives that, just like it did before
suspension existed.

> Otherwise, I shall implemented the algorithm below. Suspending/suspended
> checks need to take MNT_ILOCK.
>> ...
>> When ufs_itimes() cannot set IN_MODIFIED then it has to either risk
>> losing the access time update or use some mechanism to defer it (e.g.
>> set IN_LAZYMOD or a new flag and let process_deferred_inactive() set
>> IN_MODIFIED after the file system has been resumed).
>>
> BTW, shall the test for MNT_RDONLY in the ufs_itimes moved earlier ?

Probably not.  It is supposed to be earlier, but is misplaced to work
around bugs in the MNT_RDONLY case (this case should never set a flag
or a timestamp, but in fact does set flags to work around bugs elsewhere).

> 3. Add the process_deferred_lazymod procedure, called from ffs_snapshot
> before proc_deferred_inactive, that shall convert IN_LAZYMOD | IN_ACCESS
> into IN_MODIFIED. To be safe, the proc_def_lazymod needs vn_start_write braces.

I think you should just ignore IN_ACCESS when it cannot be converted to
a timestamp, or use IN_LAZYMOD.

ufs_inactive() needs to do something with IN_LAZYMOD other than blindly
turn it into IN_MODIFIED (but I believe the problem case is unreachable
in -current -- see other mail).

ffs_update() clears IN_MODIFIED.  I now think this and other clearings
in ffs_update() are safe.  vnode locking should make it safe to change
the inode, and it doesn't matter if the file system is suspended (or
being suspended) provided IN_MODIFIED is not set when it shouldn't be.

Bruce