Still getting NFS client locking up
Soren Schmidt
sos at spider.deepcore.dk
Mon Nov 10 09:44:01 PST 2003
It seems Robert Watson wrote:
> How fast are your systems, speaking of which? I live in the world of
> 300-500 mhz machines at work, and 300-800 mhz boxes at home. If you're
> using multi-ghz boxes, that could well be the distinguishing factor
> between our configurations...
Server is a 533MHz VIA C3, clients everything from 300MHz PII to 2.6GHz P4.
> Ok, here's the strategy I was planning to take once I could reproduce it:
>
> (1) Attempt to further narrow down responsibility to client/server. In
> particular, see if an apparent hang on one client affects the other
> clients.
For me it's just the server end that fails, I've not seen the client hang.
> (2) Investigate Soren's report that killing and restarting nfsd on the
> server would clear the hang.
Yups, that works, in fact I have that in my crontab now every minute
to keep NFS from hosing my setup here.
NOTE: I also still need to ifconfig down/up my interfaces on some
boxes or the netstack will freeze (again done every minute in crontab).
However, when NFS locks up it seems totally unrelated, i.e. all other
network traffic works...
> (3) Look at stack traces of involved processes on both the client and
> server: in particular, look at traces for any client blocked in NFS,
> any nfsiod processes on the client, and the nfsd processes on the
> server. Also look at the wait channels on clients and servers for
> these processes. Particularly interested in whether nfsd processes
> are blocked trying to grab locks.
Ok, will do..
> (4) Look at netstat information for NFS sockets, in particular, if the
> buffers are full, or not being drained. In particular, on the server,
> is the input queue not being drained by nfsd worker threads?
Netstat doesn't seem to give any hints or even useful info here,
any special cmds you want the output from?
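
One possible way to check item (4) is sketched below: filter `netstat -an` output down to sockets on the NFS port and flag any with a non-empty receive queue. This is only a rough illustration, assuming BSD-style `netstat -an` columns (Proto, Recv-Q, Send-Q, Local Address, ...) and the standard NFS port 2049. The function names and the sample text are made up for the example and are not output from the machines in this thread.

```python
NFS_PORT = "2049"  # standard NFS port; adjust if the server uses another

# Hypothetical sample in BSD `netstat -an` layout, for illustration only.
SAMPLE_NETSTAT = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  192.168.1.10.2049      192.168.1.20.1021      ESTABLISHED
tcp4       0      0  192.168.1.10.22        192.168.1.20.51234     ESTABLISHED
udp4   41600      0  *.2049                 *.*
"""


def nfs_socket_queues(netstat_output, port=NFS_PORT):
    """Return (proto, local_address, recv_q, send_q) for sockets on `port`."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banners, headers, and anything that is not a tcp/udp socket line.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        try:
            recv_q, send_q = int(fields[1]), int(fields[2])
        except ValueError:
            continue
        local = fields[3]
        # BSD netstat separates address and port with a dot, e.g. "*.2049".
        if local.rsplit(".", 1)[-1] != port:
            continue
        rows.append((fields[0], local, recv_q, send_q))
    return rows


def backed_up(rows):
    """Keep only sockets whose receive queue is not empty."""
    return [row for row in rows if row[2] > 0]


if __name__ == "__main__":
    for proto, local, recv_q, send_q in backed_up(nfs_socket_queues(SAMPLE_NETSTAT)):
        print(f"{proto} {local}: Recv-Q={recv_q} Send-Q={send_q}")
```

A single non-zero Recv-Q is not conclusive on its own. A value that stays high across several runs while the mount is hung would suggest the nfsd workers are not draining the socket.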
> (5) Try backing out src/sys/nfsserver/nfs_serv.c:1.137, which removed
> another deadlock problem, but did change locking behavior in the NFS
> server.
No change, already tried.
> (6) Look at packet traces between the client and server with ethereal,
> which has pretty good NFS decoding. Is the client retransmitting an
> RPC to the server and the server just isn't responding, or is the
> client failing to transmit? At the point of the hang, what sorts of
> RPCs are outstanding to the server? In the past, we've seen "apparent
> hangs" when some or another more obscure unusual error case on the NFS
> server fails to respond to an RPC, which causes the client to "wait
> forever".
I can try that easily, I'll get a trace to you later tonight...
> Things to look for: normally, idle nfsd and nfsiod processes have a WCHAN
> of "-" (ps -lax), which indicates they're blocked waiting for some event
> to kick them off. If you see nfsd processes "hung" in another state, it's
> a good sign we've identified a server problem. In the nfs client
> processes, "nfsrcvlk" typically indicates a process has sent out an RPC
> and is now waiting on a response.
I see the idle '-' wchan here when things go bad IIRC...
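
The WCHAN check described in the quoted text can be sketched as a small filter over `ps -lax` output: list the nfsd/nfsiod processes whose wait channel is anything other than "-". This is a minimal sketch under stated assumptions: that the wait-channel column header is `WCHAN` or `MWCHAN` depending on the release, and that the command is the last column. The function name and the sample listing are hypothetical and are not taken from the systems discussed here.

```python
# Hypothetical sample in `ps -lax` layout, for illustration only.
SAMPLE_PS = """\
UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
0 312 1 0 4 0 1228 780 - Is ?? 0:00.01 nfsd: master (nfsd)
0 314 312 0 4 0 1180 620 - I ?? 0:02.10 nfsd: server (nfsd)
0 315 312 0 -4 0 1180 620 ufs D ?? 0:01.75 nfsd: server (nfsd)
0 45 0 0 4 0 0 12 - IL ?? 0:00.00 [nfsiod 0]
1001 900 880 0 8 0 2000 1500 nfsrcvlk D p1 0:00.03 ls /mnt
"""


def find_busy_nfs(ps_output, names=("nfsd", "nfsiod")):
    """Return (pid, command, wchan) for NFS daemons whose wait channel is not '-'."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    # Locate columns by header name rather than by fixed position.
    wchan_idx = next(i for i, h in enumerate(header) if h in ("WCHAN", "MWCHAN"))
    pid_idx = header.index("PID")
    cmd_idx = len(header) - 1  # assumed: COMMAND is last and may contain spaces

    hits = []
    for line in lines[1:]:
        fields = line.split(None, cmd_idx)
        if len(fields) <= cmd_idx:
            continue
        command = fields[cmd_idx]
        # "nfsd: server (nfsd)" -> "nfsd"; "[nfsiod 0]" -> "nfsiod"
        prog = command.lstrip("[").split()[0].rstrip(":]")
        wchan = fields[wchan_idx]
        if prog in names and wchan != "-":
            hits.append((int(fields[pid_idx]), command, wchan))
    return hits


if __name__ == "__main__":
    for pid, command, wchan in find_busy_nfs(SAMPLE_PS):
        print(f"pid {pid} ({command}) waiting on {wchan}")
```

A daemon can legitimately show a wait channel for a moment while it services a request, so the interesting case is the same process sitting on the same channel across repeated runs during a hang.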
-Søren
More information about the freebsd-current
mailing list