nfsd hung on ufs vnode lock
jhb at freebsd.org
Mon Jan 31 20:43:49 UTC 2011
On Friday, January 28, 2011 8:10:41 pm John Hickey wrote:
> There was a previous thread about this, but it doesn't look like there was any resolution:
> I run a fileserver for an Emulab (www.emulab.net) system. As such, the exports table is constantly modified as experiments are swapped in and
> out. We also get a lot of researchers using NFS for strange things. In this case, the exclusive lock was for a cache directory shared by about
> 36 machines running Ubuntu 8.04 and mounting with NFSv2. Eventually, all our nfsd processes get stuck since the exclusive lock for the directory
> is never released. I could use any and all pointers on getting this fixed.
> What I am running:
> jjh at users: ~$ uname -a
> FreeBSD users.isi.deterlab.net 7.3-RELEASE-p2 FreeBSD 7.3-RELEASE-p2 #9: Tue Sep 14 16:24:57 PDT 2010
> root at users.isi.deterlab.net:/usr/obj/usr/src/sys/USERS7 i386
> Here are the sleepchains for my system (note that 0xd1f72678 appears twice):
> 0xce089cf0: tag syncer, type VNON
> usecount 1, writecount 0, refcount 2 mountedhere 0
> flags ()
> lock type syncer: EXCL (count 1) by thread 0xcdb4b000 (pid 46)
> 0xd1f72678: tag ufs, type VDIR
> usecount 2, writecount 0, refcount 67 mountedhere 0
> flags ()
> v_object 0xd1e90e80 ref 0 pages 1
> lock type ufs: EXCL (count 1) by thread 0xce1146c0 (pid 866) with 62 pending
> ino 143173560, on dev mfid0s1f
From the stack trace, this vnode is the directory vnode that is the parent
of the new file being created.
> (kgdb) bt
> #0 sched_switch (td=0xce1146c0, newtd=Variable "newtd" is not available.
> ) at /usr/src/sys/kern/sched_ule.c:1936
> #1 0xc080a4a6 in mi_switch (flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/kern_synch.c:444
> #2 0xc0837aab in sleepq_switch (wchan=Variable "wchan" is not available.
> ) at /usr/src/sys/kern/subr_sleepqueue.c:497
> #3 0xc08380f6 in sleepq_wait (wchan=0xd4176394) at /usr/src/sys/kern/subr_sleepqueue.c:580
> #4 0xc080a92a in _sleep (ident=0xd4176394, lock=0xc0ceb498, priority=80, wmesg=0xc0bb656e "ufs", timo=0) at /usr/src/sys/kern/kern_synch.c:230
> #5 0xc07ea9fa in acquire (lkpp=0xcd7375a0, extflags=Variable "extflags" is not available.
> ) at /usr/src/sys/kern/kern_lock.c:151
> #6 0xc07eb2ec in _lockmgr (lkp=0xd4176394, flags=8194, interlkp=0xd41763c4, td=0xce1146c0, file=0xc0bc20c8 "/usr/src/sys/kern/vfs_subr.c",
> at /usr/src/sys/kern/kern_lock.c:384
> #7 0xc0a24765 in ffs_lock (ap=0xcd737608) at /usr/src/sys/ufs/ffs/ffs_vnops.c:377
> #8 0xc0b26876 in VOP_LOCK1_APV (vop=0xc0ca4740, a=0xcd737608) at vnode_if.c:1618
> #9 0xc0896d76 in _vn_lock (vp=0xd417633c, flags=8194, td=0xce1146c0, file=0xc0bc20c8 "/usr/src/sys/kern/vfs_subr.c", line=2062) at
Note that this vnode (vp) doesn't show up in your list above. You can try
using my gdb scripts at www.freebsd.org/~jhb/gdb (you want gdb6* and do
'source gdb6'). You can then do 'vprint vp' at this frame and should see
lock details about who holds this lock. However, I would not expect the
vnode lock for a new i-node to be already held.
There's a chance, though, that you are tripping over the bug fixed by these two commits:
Date: Fri Jul 16 20:23:24 2010
New Revision: 210173
When the MNTK_EXTENDED_SHARED mount option was added, some filesystems were
changed to defer the setting of VN_LOCK_ASHARE() (which clears LK_NOSHARE
in the vnode lock's flags) until after they had determined if the vnode was
a FIFO. This occurs after the vnode has been inserted into a VFS hash or
some similar table, so it is possible for another thread to find this vnode
via vget() on an i-node number and block on the vnode lock. If the lockmgr
interlock (vnode interlock for vnode locks) is not held when clearing the
LK_NOSHARE flag, then the lk_flags field can be clobbered. As a result
the thread blocked on the vnode lock may never get woken up. Fix this by
holding the vnode interlock while modifying the lock flags in this case.
The softupdates code also toggles LK_NOSHARE in one function to close a
race with snapshots. Fix this code to grab the interlock while fiddling
with lk_flags.
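To make the clobbering described in that commit message concrete, here is a minimal userspace sketch (not kernel code) that replays the bad interleaving step by step in a single thread. Everything in it is hypothetical and chosen for illustration: the SK_* bit values, struct sketch_lock, and replay_lost_update(). It also assumes the same flags word carries a "someone is sleeping on this lock" bit, which is what makes the lost update turn into a missed wakeup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values for illustration; not the kernel's definitions. */
#define SK_NOSHARE 0x01u /* stands in for LK_NOSHARE */
#define SK_WAITERS 0x02u /* assumed "a thread is sleeping on this lock" bit */

/* Hypothetical stand-in for a lock with a single flags word. */
struct sketch_lock {
	unsigned flags;
};

/*
 * Replay, in one thread, an unsynchronized read-modify-write that clears
 * SK_NOSHARE while a second thread records itself as a waiter in between.
 * Returns the final value of the flags word.
 */
static unsigned
replay_lost_update(struct sketch_lock *lk)
{
	unsigned stale;

	assert(lk != NULL);
	stale = lk->flags;               /* thread A: load the flags */
	lk->flags |= SK_WAITERS;         /* thread B: mark itself as a waiter */
	lk->flags = stale & ~SK_NOSHARE; /* thread A: store from the stale copy */
	return (lk->flags);
}
```

Starting from a flags word that holds only SK_NOSHARE, the replay ends with SK_WAITERS cleared as well. In this sketch that is the analogue of the blocked thread never being woken: whoever releases the lock no longer sees that anyone is waiting.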
Date: Fri Aug 20 20:58:57 2010
New Revision: 211533
Revert 210173 as it did not properly fix the bug. It assumed that the
VI_LOCK() for a given vnode was used as the internal interlock for that
vnode's v_lock lockmgr lock. This is not the case. Instead, add dedicated
routines to toggle the LK_NOSHARE and LK_CANRECURSE flags. These routines
lock the lockmgr lock's internal interlock to synchronize the updates to
the flags member with other threads attempting to acquire the lock. The
VN_LOCK_A*() macros now invoke these routines, and the softupdates code
uses these routines to temporarily enable recursion on buffer locks.
Reviewed by: kib
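The shape of the second fix can be sketched in userspace as follows. This is only an illustration of the idea (small dedicated routines that take the lock's own internal interlock before touching the flags word), not the committed code. The names struct sketch_lock, sketch_update_flags(), sketch_allow_share(), sketch_allow_recurse(), sketch_disallow_recurse(), sketch_mark_waiter() and the SK_* values are all hypothetical, and a pthread mutex stands in for the lockmgr interlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical bit values for illustration; not the kernel's definitions. */
#define SK_NOSHARE    0x01u /* stands in for LK_NOSHARE */
#define SK_CANRECURSE 0x02u /* stands in for LK_CANRECURSE */
#define SK_WAITERS    0x04u /* assumed "a thread is sleeping" bit */

/* Hypothetical lock: the flags word is guarded by the lock's own interlock. */
struct sketch_lock {
	pthread_mutex_t interlock;
	unsigned flags;
};

static void
sketch_lock_init(struct sketch_lock *lk, unsigned flags)
{
	assert(lk != NULL);
	pthread_mutex_init(&lk->interlock, NULL);
	lk->flags = flags;
}

/* Single choke point: every flags update happens under the interlock. */
static void
sketch_update_flags(struct sketch_lock *lk, unsigned set, unsigned clear)
{
	assert(lk != NULL);
	pthread_mutex_lock(&lk->interlock);
	lk->flags = (lk->flags | set) & ~clear;
	pthread_mutex_unlock(&lk->interlock);
}

/* Dedicated toggles, analogous in spirit to what the commit describes. */
static void
sketch_allow_share(struct sketch_lock *lk)
{
	sketch_update_flags(lk, 0u, SK_NOSHARE);
}

static void
sketch_allow_recurse(struct sketch_lock *lk)
{
	sketch_update_flags(lk, SK_CANRECURSE, 0u);
}

static void
sketch_disallow_recurse(struct sketch_lock *lk)
{
	sketch_update_flags(lk, 0u, SK_CANRECURSE);
}

/* What a thread about to sleep would do, under the same interlock. */
static void
sketch_mark_waiter(struct sketch_lock *lk)
{
	sketch_update_flags(lk, SK_WAITERS, 0u);
}
```

Because the waiter and the flag toggles serialize on the same mutex, a thread that calls sketch_mark_waiter() can no longer have its bit overwritten by a concurrent sketch_allow_share(). The point made in r211533 is that the mutex has to be the one the acquire path actually uses; taking some other lock (as r210173 did with the vnode interlock) does not synchronize anything.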