Deadlock between nfsd and snapshots.
Bruce Evans
bde at zeta.org.au
Tue Aug 22 09:15:45 UTC 2006
On Mon, 21 Aug 2006, Tor Egge wrote:
> I wrote:
>
>> The deadlock indicates that one or more of IN_CHANGE, IN_MODIFIED or
>> IN_UPDATE was set on the inode, indicating a write operation
>> (e.g. VOP_WRITE(), VOP_RENAME(), VOP_CREATE(), VOP_REMOVE(), VOP_LINK(),
>> VOP_SYMLINK(), VOP_SETATTR(), VOP_MKDIR(), VOP_RMDIR(), VOP_MKNOD()) that was
>> not protected by vn_start_write() or vn_start_secondary_write().
>
> The most common "write" operation was probably VOP_GETATTR().
Reading the attributes really is a write operation if it causes marks for
update to be turned into updates. ufs_inactive() is also a write operation
if it causes this. I think marking for update shouldn't require much locking.
> ufs_itimes(), called from ufs_getattr(), might set the IN_MODIFIED inode flag
> if IN_ACCESS is set on the inode even if neither IN_CHANGE nor IN_UPDATE is
> set, transitioning the inode flags into a state where ufs_inactive() calls the
> blocking variant of vn_start_secondary_write().
In the implementation of marking access times for update on exec, we try hard
to avoid calling vn_start_write(), etc. Early implementations did call
vn_start_write(), etc., and had some bugs from this. The current
implementation is mainly for ffs and is rather fragile: VOP_SETATTR()
depends on callers calling vfs_start_write, but vfs_mark_atime() calls
VOP_SETATTR() without calling vfs_start_write(). The correctness of this
depends on the VA_MARK_ATIME case of VOP_SETATTR() not writing any more
than VOP_READ() would or should.
I think ufs_itimes() shouldn't call vn_start_write() any more than
ufs_getattr() should. Callers should be aware that GETATTR may write
and thus it seems to be necessary for them to call vn_start_write()
unconditionally. ufs_itimes() is a utility function that is called
from places other than ufs_getattr(). The other places are ufs_*close()
and ufs_setattr(). These don't cause any additional problems.
VOP_CLOSE() is called from several places which already seem to have
sufficient locking. VOP_GETATTR() is called from many places that don't
call vn_start_write(), starting with vn_stat().
The updates to timestamps in in ufs_itimes() and ufs_getattr() are still
soft (not even delayed writes, but writes to the vnode that will usually
become delayed writes later). Perhaps vn_start_write() can be avoided
for them, but since they are logically writes this might be hard to
implement correctly. ufs_inactive() has to do a (possibly delayed) physical
write to force any updates to disk, so it needs strong locking.
BTW, ufs_itimes() has some possibly related kludges involving changing
from r/w mounts to r/o mounts. Some of the marks for update aren't
handled quite right and are still present after the change. Then they
want to be turned into updates on a r/o file system. This is impossible.
The problem is handled bogusly, essentially by clearing them without doing
the update, in a way that triggers seome of my local debugging code.
> calling ufs_itimes() with only a shared vnode lock might cause unsafe accesses
> to the inode flags. Setting of IN_ACCESS at the end of ffs_read() and
> ffs_extread() might also be unsafe. If DIRECTIO is enabled then O_DIRECT reads
> might not even attempt to set the IN_ACCESS flag.
I think setting it should be safe -- see above.
Not setting the IN_ACCESS flag in the early return for the O_DIRECT case
is a different bug. read() is supposed to set IN_ACCESS on sucessful
completion and returning early for the O_DIRECT case would defeat this.
Bruce
More information about the freebsd-fs
mailing list