nfs server /home not responding

Bill Moran wmoran at potentialtech.com
Tue Aug 24 19:12:01 UTC 2010


In response to Lucas Wang <lwang at us.toyota-itc.com>:
> 
> We use NFS to store /home directory for users in our lab.
> However, we occasionally get blocked from logging in because 
> the automount daemon on an NFS client machine hangs. When
> that happens, we get this error message in the system logs of
> the NFS client machine, called "bucks":
> Aug 24 10:53:40 bucks kernel: nfs server pid670 at bucks:/home: not responding
> 
> pid670 is the amd process.
> 
> Our NFS server(raptors) has the following configuration:
> FreeBSD raptors.cs.ucla.edu 7.3-PRERELEASE FreeBSD 7.3-PRERELEASE #0: Tue Feb  9 12:59:50 PST 2010     root at raptors.cs.ucla.edu:/usr/obj/usr/src/sys/RAPTORS  amd64
> 
> And the client machine is configured as:
> FreeBSD bucks.cs.ucla.edu 7.3-PRERELEASE FreeBSD 7.3-PRERELEASE #0: Tue Feb  9 20:47:50 UTC 2010     root at bucks.cs.ucla.edu:/usr/obj/usr/src/sys/BUCKS  amd64
> 
> Another thing I want to add is that several other NFS client machines
> also hang from time to time. But they don't usually hang at the same time.
> Even though rebooting fixes the problem for a while, we don't want it to keep hurting us.
> 
> So any insights or suggestions will be greatly appreciated. Thanks a lot.

Do you have dumbtimer in the options for the nfs mount?

My research into this indicated that the NFS client keeps track of average
response times from the server.  If the server starts to respond significantly
slower than expected, the code assumes that the server is down: the mount
freezes and that message appears in the logs.  Usually, after a short wait
(a few minutes), the connection resumes and you see a "server is alive
again" message.  See man mount_nfs for more info.  Also, try switching to
TCP mounts.

If you have a network that occasionally gets hit with traffic spikes that
cause data packets to take abnormally long to travel, or an NFS server that
occasionally gets usage spikes that cause it to respond slowly, this will
happen.
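
To make the mechanism above concrete, here is a rough sketch of how an
adaptive timeout can turn a single slow reply into a "not responding"
event while a fixed timeout rides it out.  This is not the FreeBSD NFS
client code: the class, the smoothing constants, and the timings are all
assumptions chosen for illustration, loosely modelled on the usual
smoothed-average-plus-deviation estimator.

```python
# Illustrative sketch only; this is NOT the FreeBSD NFS client code.
# All names and constants here are hypothetical.

FIXED_TIMEOUT = 1.0  # seconds; stands in for a fixed timeout (the dumbtimer idea)


class AdaptiveTimer:
    """Estimate a retransmit timeout from observed response times."""

    def __init__(self, initial_rtt=0.1, gain=0.125, dev_gain=0.25, k=4):
        self.srtt = initial_rtt        # smoothed response time
        self.rttvar = initial_rtt / 2  # smoothed deviation from srtt
        self.gain = gain
        self.dev_gain = dev_gain
        self.k = k

    def observe(self, rtt):
        """Fold one measured response time into the running estimates."""
        err = rtt - self.srtt
        self.srtt += self.gain * err
        self.rttvar += self.dev_gain * (abs(err) - self.rttvar)

    def timeout(self):
        """Current timeout: smoothed average plus k deviations."""
        return self.srtt + self.k * self.rttvar


def would_time_out(response_time, limit):
    """True if a reply taking response_time seconds exceeds the limit."""
    return response_time > limit


timer = AdaptiveTimer()
for _ in range(50):
    timer.observe(0.01)   # server normally answers in about 10 ms

spike = 0.5               # one slow reply during a load or traffic spike
print("adaptive timeout: %.3f s" % timer.timeout())
print("adaptive timer gives up:", would_time_out(spike, timer.timeout()))
print("fixed timer gives up:   ", would_time_out(spike, FIXED_TIMEOUT))
```

The point of the sketch is only that a long run of fast replies shrinks
the estimated timeout, so an occasional slow reply looks like a dead
server.  A fixed timeout gives up that sensitivity in exchange for
tolerating spikes.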

In addition to dumbtimer you can also look at better segmenting your
network, or increasing the capacity of the NFS server to prevent the
problem.

If the NFS hangs occur and the mount never recovers (even after several
minutes), then you probably have a different problem.  Possibly a firewall
is losing its state table, and thus the connection is going bad?

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

