NFS deadlock (unkillable nfsd and no mounts work)
josh.carroll at gmail.com
Fri Nov 5 06:01:30 UTC 2010
I'm having a problem with nfsd hanging and not serving mount points,
during which time it cannot be killed. This problem started
happening sometime after November 2nd, since kernel from 11/2 sources
does not exhibit this problem.
The kernel I'm currently running was built from SVN sources I grabbed this evening
(around 5pm PDT on November 4th), but I was having the same problem
yesterday around 9pm PDT after a csup yesterday (I switched to SVN
today to rule out a stale /usr/src from an out of sync cvsup mirror).
Here are the svn details:
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Node Kind: directory
Last Changed Author: jhb
Last Changed Rev: 214791
Last Changed Date: 2010-11-04 10:25:31 -0700 (Thu, 04 Nov 2010)
FreeBSD 8.1-STABLE FreeBSD 8.1-STABLE #0 r214807: Thu Nov 4 17:13:05
PDT 2010 root at pflog.net:/usr/obj/usr/src/sys/PFLOG amd64
I have a Popcorn Hour, and as soon as I try to connect to my NFS mount
with it, it hangs on the Popcorn Hour, then eventually pops up a
message that says "Request cannot be processed". Likewise, if I try to
mount it from my MacBook, it hangs for quite a while and then just says
the operation timed out, or something like that.
During this hang, there is nothing in /var/log indicating a problem
nor any other indications something is wrong, except that none of my
NFS mounts work and the nfsd process will not die.
When I try to reboot the server, I wind up having to fsck all my
drives (except the ZFS one), since nfsd will not die. Even kill -9
doesn't kill it (it's showing as in the D state):
root 444 0.0 0.0 5812 1384 ?? D 9:30PM 0:00.00 nfsd: server (nfsd)
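The "D" in the STAT column above means the process is in an uninterruptible wait, which is why signals (including kill -9) have no effect. As a minimal sketch for spotting such processes, the helper below filters `ps aux` output for any STAT value starting with "D". The function name `list_dstate` is hypothetical, and it assumes the column layout shown above (STAT in the 8th field, command starting at the 11th, no spaces in the STARTED column).

```shell
# Hypothetical helper: read `ps aux` output on stdin and print the PID and
# command of every process whose STAT column (8th field) starts with "D".
# Assumes the ps aux layout shown above; the header line is skipped
# naturally because its 8th field is the literal word "STAT".
list_dstate() {
    awk '$8 ~ /^D/ {
        printf "%s", $2
        for (i = 11; i <= NF; i++) printf " %s", $i
        printf "\n"
    }'
}

# Usage: ps aux | list_dstate
```

Fed the ps line above, this would report PID 444 with its command string, which gives a quick way to confirm whether anything besides nfsd is stuck in the same state.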
And if I try to /etc/rc.d/nfsd stop, it just says:
Waiting for PIDS: 444
And hangs there indefinitely. I tried to run a ktrace on both the
"nfsd: server" and "nfsd: master" processes (ktrace -i -d -f
nfsd_server.ktrace and ktrace -i -d -f nfsd_master.ktrace), but when I
try to connect to the NFS mount, ktrace doesn't capture anything and
the "nfsd: server" process goes to the "D" state and then I can't kill
it. If I try to kill the nfsd process BEFORE I attempt to mount
anything, it properly stops with /etc/rc.d/nfsd stop or with a kill
-TERM. Once I've tried to connect once, however, it can't be killed.
Hoping it was perhaps related to ZFS, I commented out the one ZFS
mount point in /etc/exports, but it still causes this deadlock in the
nfsd process. I even went as far as to comment everything in
/etc/exports and create a new export on a different disk, which did
not help; I get the same nfsd hang.
Another strange thing: if I run truss on the "nfsd: server" process
(the child) before I try to mount anything, it causes the process to
exit immediately along with truss. If I look at what truss captured
for it, I see:
= 0 (0x0)
411: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
411: process exit, rval = 0
My kernel built from sources on 11/2 works fine, so it's something
that has changed sometime after November 2nd. At least, my kernel from
November 2nd runs fine and does not have this nfsd lockup problem.
My kernel is just GENERIC with a few additions:
If any other information is needed, please let me know. What are the
next things I should be doing to diagnose the problem? It seems
specific to nfsd, but I'm not sure how to prove it's that and not
something related or complementary to nfsd. For what it's worth,
rpcbind and mountd both stop fine; it's just the nfsd process that is
hanging.
Thanks in advance. Any advice on troubleshooting or root-causing
the issue would be appreciated.