Deadlock between nfsd and snapshots.
Tor Egge
Tor.Egge at cvsup.no.freebsd.org
Fri Aug 18 20:20:09 UTC 2006
> First, big thanks to Peter for helping debug the problem!
>
> This deadlock happens between processes 764 (nfsd) and 62981 (mksnap_ffs).
> In fact, the deadlock is not specific to nfsd. It happens when ufs_inactive()
> interposes with ffs_snapshot.
[snip]
> On the other hand, ufs_inactive calls vn_start_secondary_write(vp, XXX,
> V_WAIT). ufs_inactive is running with the vnode locked. If this happens at the
> right time, the system will deadlock.
>
> nfsd is the most vulnerable to the problem because it is often the only
> (and last) user of a vnode, so vput() from nfsd has a high chance of
> resulting in vinactive().
>
> Below is the patch that sets VI_OWEINACT for the inode if the last call to
> vn_start_sec_write(..., V_NOWAIT) fails. The return from that point is safe
> because mp == NULL means that no previous code that changes the inode was
> executed.
> Please, review and test.
The deadlock indicates that one or more of IN_CHANGE, IN_MODIFIED or IN_UPDATE
was set on the inode, indicating a write operation (e.g. VOP_WRITE(),
VOP_RENAME(), VOP_CREATE(), VOP_REMOVE(), VOP_LINK(), VOP_SYMLINK(),
VOP_SETATTR(), VOP_MKDIR(), VOP_RMDIR(), VOP_MKNOD()) that was not protected by
vn_start_write() or vn_start_secondary_write().
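
To make the flag test above concrete, here is a minimal user-space sketch of
a check of this kind. It is not the kernel code: struct toy_inode, the TOY_*
names and the bit values are assumptions chosen for illustration, standing in
for the inode and its IN_CHANGE, IN_MODIFIED and IN_UPDATE flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits. The values are assumptions for this sketch and
 * are not copied from the UFS headers. */
#define TOY_IN_CHANGE   0x02u
#define TOY_IN_UPDATE   0x04u
#define TOY_IN_MODIFIED 0x08u

/* Hypothetical stand-in for an in-core inode; only the flag word matters. */
struct toy_inode {
    unsigned int i_flag;
};

/* Returns true if any of the three "inode was written to" flags is set,
 * i.e. the inode still carries the mark of a write operation. */
static bool
toy_inode_needs_write(const struct toy_inode *ip)
{
    assert(ip != NULL);
    return (ip->i_flag &
        (TOY_IN_CHANGE | TOY_IN_MODIFIED | TOY_IN_UPDATE)) != 0;
}
```

If such a check is true for an inode after the file system has been suspended,
some earlier write reached that inode without going through the suspension
protocol.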
The suspension of the file system should have cleared those flags on all
related inodes. Write operations protected by vn_start_write() should have
blocked without holding any vnode lock until the file system was resumed, while
write operations protected by vn_start_secondary_write() should have triggered
a retry of the vnode sync loop in ffs_sync().
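
The two behaviours described above can be modelled in user space roughly as
follows. This is a toy sketch under stated assumptions, not the kernel
implementation: struct toy_mount, its fields and the toy_* functions are
hypothetical, blocking is modelled as a "would block" return value, and the
arrivals array simply scripts how many secondary writes land during each
sync pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the mount point state used during suspension. */
struct toy_mount {
    bool suspending;         /* a suspension is in progress */
    int  secondary_accwrites; /* running count of secondary writes seen */
};

enum toy_result { TOY_OK, TOY_WOULD_BLOCK };

/* Primary write: refused while suspending. In this model the caller is
 * expected to hold no vnode lock and to retry after the resume. */
static enum toy_result
toy_start_write(const struct toy_mount *mp)
{
    assert(mp != NULL);
    return mp->suspending ? TOY_WOULD_BLOCK : TOY_OK;
}

/* Secondary write: allowed to proceed, but counted so that the sync loop
 * can notice it and run again. */
static void
toy_start_secondary_write(struct toy_mount *mp)
{
    assert(mp != NULL);
    mp->secondary_accwrites++;
}

/* Sync loop: repeats passes until one pass completes with no secondary
 * write arriving during it. arrivals[i] is the scripted number of secondary
 * writes during pass i (0 once the script runs out). Returns the number of
 * passes performed. */
static int
toy_sync(struct toy_mount *mp, const int *arrivals, int n)
{
    int passes = 0;

    for (;;) {
        int before = mp->secondary_accwrites;
        int k = (passes < n) ? arrivals[passes] : 0;

        for (int i = 0; i < k; i++)
            toy_start_secondary_write(mp);
        passes++;
        if (mp->secondary_accwrites == before)
            return passes;  /* quiet pass: nothing left to flush */
    }
}
```

In this model a write that bypasses both entry points is invisible to the
loop: it neither waits for the resume nor forces another pass.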
Such unprotected write operations might render the snapshot inconsistent. Your
patch addresses the deadlock symptom but not the cause.
- Tor Egge