possible NFS lockups

Rick Macklem rmacklem at uoguelph.ca
Sun Aug 1 01:26:18 UTC 2010

> From: "krad" <kraduk at googlemail.com>
> To: freebsd-hackers at freebsd.org, "FreeBSD Questions" <freebsd-questions at freebsd.org>
> Sent: Tuesday, July 27, 2010 11:29:20 AM
> Subject: possible NFS lockups
> I have a production mail system with an nfs backend. Every now and again we
> see the nfs die on a particular head end. However it doesn't die across all
> the nodes. This suggests to me there isnt an issue with the filer itself and
> the stats from the filer concur with that.
> The symptoms are lines like this appearing in dmesg
> nfs server not responding
> nfs server is alive again
> trussing df it seems to hang on getfsstat, this is presumably when it tries
> the nfs mounts
> eg
> __sysctl(0xbfbfe224,0x2,0xbfbfe22c,0xbfbfe230,0x0,0x0) = 0 (0x0)
> mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1746583552 (0x681ac000)
> mmap(0x682ac000,344064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1747632128 (0x682ac000)
> munmap(0x681ac000,344064) = 0 (0x0)
> getfsstat(0x68201000,0x1270,0x2,0xbfbfe960,0xbfbfe95c,0x1) = 9 (0x9)
> I have played with mount options a fair bit but they dont make much
> difference. This is what they are set to at present
> /mail/0 nfs rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0 0
> When this locking is occuring I find that if I do a show mount or mount
> again under another mount point I can access it fine.
> One thing I have just noticed is that lockd and statd always seem to have
> died when this happens. Restarting does not help

lockd and statd implement separate protocols (NLM and NSM) that do
locking. The protocols were poorly designed and fundamentally
broken imho. (That refers to the protocols and not the implementation.)

I am not familiar with the lockd and statd implementations, but if you
don't need file locking to work for the same file when it is accessed
concurrently from multiple clients (heads), you can use the "nolockd"
mount option to avoid using them. (I have no idea whether the mail
system you are using will work without lockd. It should be ok to use
"nolockd" if file locking is only done on a given file in one client
node.)
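
As a rough illustration of where "nolockd" would go, here is a small
Python sketch that appends an option to the options field of an fstab
line. The helper name add_mount_option and the device "filer:/vol/mail"
are made up for the example; only the mount point and the existing
options come from the original post.

```python
def add_mount_option(fstab_line: str, option: str) -> str:
    """Return fstab_line with `option` appended to its options (4th) field.

    Hypothetical helper for illustration; it does not edit /etc/fstab.
    """
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("expected device, mount point, fstype and options")
    options = fields[3].split(",")
    if option not in options:  # leave the line alone if already present
        options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)


# "filer:/vol/mail" is an assumed device name, not taken from the post.
line = ("filer:/vol/mail /mail/0 nfs "
        "rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0 0")
print(add_mount_option(line, "nolockd"))
```

The file system has to be unmounted and mounted again for a changed
option to take effect.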

I suspect that some interaction between your server and the
lockd/statd client causes them to crash and then the client is
stuck trying to talk to them, but I don't really know. Looking
at where all the processes and threads are sleeping via "ps axlH"
may tell you what is stuck and where.
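
If the "ps axlH" output is long, something like the following Python
sketch can group the threads by what they are sleeping on. It assumes
the header line contains PID and MWCHAN columns and that the command is
the last column, which is how FreeBSD's long format looks to me; the
function name group_by_wchan is made up for the example.

```python
from collections import defaultdict


def group_by_wchan(ps_output: str) -> dict:
    """Group lines of `ps axlH` output by the MWCHAN (wait channel) column.

    Returns {wait_channel: [(pid, command), ...]}. Column positions are
    looked up by header name instead of being hard-coded.
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return {}
    header = lines[0].split()
    if "MWCHAN" not in header or "PID" not in header:
        raise ValueError("no PID/MWCHAN columns; is this `ps axlH` output?")
    wchan_col = header.index("MWCHAN")
    pid_col = header.index("PID")
    ncols = len(header)

    groups = defaultdict(list)
    for line in lines[1:]:
        # Split into at most ncols fields so that a command containing
        # spaces stays together in the last field.
        fields = line.split(None, ncols - 1)
        if len(fields) < ncols:
            continue  # skip short or malformed lines
        groups[fields[wchan_col]].append((int(fields[pid_col]), fields[-1]))
    return dict(groups)
```

It could be fed with
group_by_wchan(subprocess.check_output(["ps", "axlH"], text=True)).
A wait channel shared by many threads that never goes away is a good
place to start looking.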

As others noted, intermittent "server not responding...server ok"
messages just indicate slow response from the server and don't
mean much. However, if a given process is hung and doesn't
recover, knowing what it is sleeping on can help w.r.t. diagnosis.

