Deadlock in the NFS client

John Baldwin jhb at freebsd.org
Wed Mar 13 19:20:35 UTC 2013


I ran into a machine that had a deadlock among certain files on a given NFS 
mount today.  I'm not sure how best to resolve it, though it seems like 
perhaps there is a bug with how the pool of nfsiod threads is managed.  
Anyway, more details on the actual hang below.  This was on 8.x with the
old NFS client, but I don't see anything in HEAD that would fix this.

First note that the system was idle so it had dropped down to only one
nfsiod thread.

The nfsiod thread is hung on a vnode lock:

(kgdb) proc 36927
[Switching to thread 150 (Thread 100679)]#0  sched_switch (
    td=0xffffff0320de88c0, newtd=0xffffff0003521460, flags=Variable "flags" is 
not available.
)
    at /usr/src/sys/kern/sched_ule.c:1898
1898                    cpuid = PCPU_GET(cpuid);
(kgdb) where
#0  sched_switch (td=0xffffff0320de88c0, newtd=0xffffff0003521460, 
flags=Variable "flags" is not available.
)
    at /usr/src/sys/kern/sched_ule.c:1898
#1  0xffffffff80407953 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:449
#2  0xffffffff8043e342 in sleepq_wait (wchan=0xffffff0358bbb7f8, pri=96)
    at /usr/src/sys/kern/subr_sleepqueue.c:629
#3  0xffffffff803e5755 in __lockmgr_args (lk=0xffffff0358bbb7f8, flags=524544, 
    ilk=0xffffff0358bbb820, wmesg=Variable "wmesg" is not available.
) at /usr/src/sys/kern/kern_lock.c:220
#4  0xffffffff80489219 in vop_stdlock (ap=Variable "ap" is not available.
) at lockmgr.h:94
#5  0xffffffff80697322 in VOP_LOCK1_APV (vop=0xffffffff80892b00, 
    a=0xffffff847ac10600) at vnode_if.c:1988
#6  0xffffffff804a8bb7 in _vn_lock (vp=0xffffff0358bbb760, flags=524288, 
    file=0xffffffff806fa421 "/usr/src/sys/kern/vfs_subr.c", line=2138)
    at vnode_if.h:859
#7  0xffffffff8049b680 in vget (vp=0xffffff0358bbb760, flags=524544, 
    td=0xffffff0320de88c0) at /usr/src/sys/kern/vfs_subr.c:2138
#8  0xffffffff8048d4aa in vfs_hash_get (mp=0xffffff004a3a0000, hash=227722108, 
    flags=Variable "flags" is not available.
) at /usr/src/sys/kern/vfs_hash.c:81
#9  0xffffffff805631f6 in nfs_nget (mntp=0xffffff004a3a0000, 
    fhp=0xffffff03771eed56, fhsize=32, npp=0xffffff847ac10a40, flags=524288)
    at /usr/src/sys/nfsclient/nfs_node.c:120
#10 0xffffffff80570229 in nfs_readdirplusrpc (vp=0xffffff0179902760, 
    uiop=0xffffff847ac10ad0, cred=0xffffff005587c300)
    at /usr/src/sys/nfsclient/nfs_vnops.c:2636
---Type <return> to continue, or q <return> to quit---
#11 0xffffffff8055f144 in nfs_doio (vp=0xffffff0179902760, 
    bp=0xffffff83e05c5860, cr=0xffffff005587c300, td=Variable "td" is not 
available.
)
    at /usr/src/sys/nfsclient/nfs_bio.c:1600
#12 0xffffffff8056770a in nfssvc_iod (instance=Variable "instance" is not 
available.
)
    at /usr/src/sys/nfsclient/nfs_nfsiod.c:303
#13 0xffffffff803d0c2f in fork_exit (callout=0xffffffff805674b0 <nfssvc_iod>, 
    arg=0xffffffff809266e0, frame=0xffffff847ac10c40)
    at /usr/src/sys/kern/kern_fork.c:861

Thread stuck in getblk for that vnode (holds shared lock on this vnode):

(kgdb) proc 36902
[Switching to thread 149 (Thread 101543)]#0  sched_switch (
    td=0xffffff0378d8a8c0, newtd=0xffffff0003521460, flags=Variable "flags" is 
not available.
)
    at /usr/src/sys/kern/sched_ule.c:1898
1898                    cpuid = PCPU_GET(cpuid);
(kgdb) where
#0  sched_switch (td=0xffffff0378d8a8c0, newtd=0xffffff0003521460, 
flags=Variable "flags" is not available.
)
    at /usr/src/sys/kern/sched_ule.c:1898
#1  0xffffffff80407953 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:449
#2  0xffffffff8043e342 in sleepq_wait (wchan=0xffffff83e11bc1c0, pri=96)
    at /usr/src/sys/kern/subr_sleepqueue.c:629
#3  0xffffffff803e5755 in __lockmgr_args (lk=0xffffff83e11bc1c0, flags=530688, 
    ilk=0xffffff0358bbb878, wmesg=Variable "wmesg" is not available.
) at /usr/src/sys/kern/kern_lock.c:220
#4  0xffffffff80483aeb in getblk (vp=0xffffff0358bbb760, blkno=3, size=32768, 
    slpflag=0, slptimeo=0, flags=0) at lockmgr.h:94
#5  0xffffffff8055e963 in nfs_getcacheblk (vp=0xffffff0358bbb760, bn=3, 
    size=32768, td=0xffffff0378d8a8c0) at 
/usr/src/sys/nfsclient/nfs_bio.c:1259
#6  0xffffffff805627d9 in nfs_bioread (vp=0xffffff0358bbb760, 
    uio=0xffffff847bcf0ad0, ioflag=Variable "ioflag" is not available.
) at /usr/src/sys/nfsclient/nfs_bio.c:530
#7  0xffffffff806956a4 in VOP_READ_APV (vop=0xffffffff808a29a0, 
    a=0xffffff847bcf09c0) at vnode_if.c:887
#8  0xffffffff804a9e27 in vn_read (fp=0xffffff03b6a506e0, 
    uio=0xffffff847bcf0ad0, active_cred=Variable "active_cred" is not 
available.
) at vnode_if.h:384
#9  0xffffffff80444fb1 in dofileread (td=0xffffff0378d8a8c0, fd=3, 
    fp=0xffffff03b6a506e0, auio=0xffffff847bcf0ad0, offset=Variable "offset" 
is not available.
) at file.h:242

The buffer is locked by LK_KERNPROC:

(kgdb) bprint bp
0xffffff83e11bc128: BIO_READ flags (ASYNC|VMIO)
error = 0, bufsize = 32768, bcount = 32768, b_resid = 0
bufobj = 0xffffff0358bbb878, data = 0xffffff8435a05000, blkno = d, dep = 0x0
 lock type bufwait: EXCL by LK_KERNPROC with exclusive waiters pending

And this buffer is queued as the first pending buffer on the mount waiting
for service by nfsiod:

(kgdb) set $nmp = (struct nfsmount *)vp->v_mount->mnt_data
(kgdb) p $nmp->nm_bufq.tqh_first
$24 = (struct buf *) 0xffffff83e11bc128
(kgdb) p bp
$25 = (struct buf *) 0xffffff83e11bc128

So, the first process is waiting for a block from an NFS directory.  That 
block is in queue to be completed as an async I/O by the nfsiod thread pool.  
However, the lone nfsiod thread in the pool is waiting to exclusively lock the 
original NFS directory to update its attributes, so it cannot service the 
async I/O request.

-- 
John Baldwin


More information about the freebsd-fs mailing list