Deadlock between nfsd and snapshots.

Wed Aug 23 18:32:30 UTC 2006

On Wed, 23 Aug 2006, Tor Egge wrote:

>> I have at least one questions:
>>
>>>> Protecting the existing i_flag and the timestamps with the vnode
>>>> interlock when the current thread only has a shared vnode lock should
>>>> be sufficient to protect against the races, removing the need for #3,
>>>> #4 and #4 below.
>> Could you, please, explain this point ? I did not wrap all accesses to
>> i_flag and timestamps with vnode interlock, only ufs_itimes, ufs_lazyaccess
>> and ufs_getattr for now.
>
> As long as i_flag and the timstamps are never accessed without holding a shared
> or exclusive vnode lock, the vnode interlock can be used to serialize access
> for those holding a shared vnode lock.  Access is already serialized for those
> holding an exclusive vnode lock, since no other thread can hold a shared or
> exclusive lock for the same vnode at the same time.

I found some cases where i_flag, timestamps and probably other parts of the
vnode seem to be accessed without any proper locks (maybe a refcount) :-(.
See another (private) reply and below.

I don't understand how the vnode interlock can help much for interoperation
with code which uses vnode locks.  Aren't these locks almost independent of 
each other, so if the vnode lock is already held then acquiring the vnode
interlock wouldn't block?  So to protect iflag with the interlock you
would have to add the interlock to mounds of code that just uses the vnode
lock now.

vn_stat() uses an exclusive vnode lock although it is almost read-only.
Thus non-exlusive locks in ufs_itimes() don't occur for the most common
problem case of calls from ufs_getattr() for at least stat(2).

> An alternate locking protocol is to always use the vnode interlock to serialize
> access to i_flag and the timestamps.  That increases the cost of accessing the
> fields in code that uses an exclusive vnode lock, but can temporarily lower the
> cost in other parts of the code (e.g. when scanning all the vnodes belonging to
> a mount point looking for dirty vnodes) since the vnode lock would no longer be
> needed to access i_flag and the timestamps.

I understand this locking :-).  ffs_sync() actually uses only uses the
vnode interlock to access i_flag.  I think this is intentionally quick and
not quite right -- there is a comment a few lines before it saying that
we depend on a mntvnode lock to keep things stable enough for a quick test.
The scope of the comment is unclear.  I think the quick test is only good
enough for sync(2).

> Problems with your suggested patch:
>
> ufs_lazyaccess() changes i_flags with only the vnode interlock held.  The vnode
> interlock is not sufficient by itself to access i_flags without switching to
> the alternate locking protocol.

That's what I thought.  In fact, i_flags is accessed a lot without the vnode
lock held, sometimes even without the vnode interlock.  One case is
ufs_close().  It is called without the vnode lock since VNOP_CLOSE() doesn't
lock (I think).  It calls ufs_itimes() without acquiring the vnode lock.
It calls ufs_itimes with the vnode interlock, but only accidentally since
it has to acquire the interlock for accessing v_usecount.  This bug seems
to be very old.  In FreeBSD-1, ufs_close() does the equivalent of ufs_itimes()
if the vnode is NOT locked.  This makes some sense:
- if the vnode is locked, then write accesses to it (just memory accesses)
   would race with whatever has it locked, and apparently doing nothing was
   considered good enough,
- if the vnode is not locked, then write access to it without locking it were
   safe because the kernel was nor preemptible and the accesses don't block
   or trap.  Things are now more complicated.

Bruce