Hanging/stalling mountd on heavily loaded NFS server
Marc Goroff
marc.goroff at quorum.net
Wed Jul 27 23:05:33 UTC 2016
We have a large and busy production NFS server running 10.2 that is
serving approximately 200 ZFS file systems to production VMs. The system
has been very stable up until last night when we attempted to mount new
ZFS filesystems on NFS clients. The mountd process hung and client mount
requests timed out. The NFS server continued to serve traffic to
existing clients during this time. mountd was hung in the nfsv4lck
wait state:
[root at zfs-west1 ~]# ps -axgl|grep mount
0 38043 1 0 20 0 63672 17644 nfsv4lck Ds - 0:00.30
/usr/sbin/mountd -r -S /etc/exports /etc/zfs/exports
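For anyone chasing the same symptom: the kernel stack of the stuck
process can be captured with procstat, which shows exactly which code
path mountd is sleeping in (a diagnostic sketch; the pid 38043 is
taken from the ps output above):

```shell
# Dump the kernel stack of the hung mountd to see where it is
# blocked inside the nfsv4 lock path.
procstat -kk 38043

# The wait channel can also be watched over time with ps alone:
ps -o pid,state,wchan,command -p 38043
```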
It remains in this state for an indeterminate amount of time: I once
saw it continue after several minutes, but most of the time it stays
stuck for 15+ minutes. During this time it does not respond to
kill -9, though it will eventually exit after many minutes.
Restarting mountd will allow the existing NFS clients to continue (they
hang when mountd exits), but any attempt to perform additional NFS
mounts will push mountd back into the bad state.
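The restart itself is just a kill and a relaunch with the same flags
as the running instance (a sketch, with the flags copied from the ps
output above; -r permits mount requests for regular files, -S
suspends the nfsd threads while the export list is being reloaded):

```shell
# Kill the hung mountd (may take minutes to take effect, per above)
kill -9 38043

# Relaunch with the original flags and export files
/usr/sbin/mountd -r -S /etc/exports /etc/zfs/exports
```

On a stock FreeBSD install `service mountd restart` should accomplish
the same thing, provided the mountd_flags in rc.conf match.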
This problem seems to be related to the number of NFS mounts off the
server. If we unmount some of the clients, we can successfully perform
the NFS mounts of the new ZFS filesystems. However, when we attempt to
mount all of the production NFS mounts, mountd will hang as above.
All clients are using NFSv3 only. dmesg and /var/log/messages show no
errors, and the server seems to be operating normally aside from
mountd. The NFS server is configured with 256 nfsd threads, 128GB of
RAM, 280TB of disk split into two zpools, and 12 CPU cores. Below is
the output of 'sysctl -a | grep nfsd' during one of these mountd
events:
vfs.nfsd.fha.fhe_stats: hash 13: {
vfs.nfsd.fha.max_reqs_per_nfsd: 0
vfs.nfsd.fha.max_nfsds_per_fh: 8
vfs.nfsd.fha.bin_shift: 22
vfs.nfsd.fha.enable: 1
vfs.nfsd.request_space_throttle_count: 80875
vfs.nfsd.request_space_throttled: 0
vfs.nfsd.request_space_low: 31457280
vfs.nfsd.request_space_high: 47185920
vfs.nfsd.request_space_used_highest: 47841972
vfs.nfsd.request_space_used: 11074576
vfs.nfsd.groups: 2
vfs.nfsd.threads: 256
vfs.nfsd.maxthreads: 256
vfs.nfsd.minthreads: 256
vfs.nfsd.cachetcp: 1
vfs.nfsd.tcpcachetimeo: 43200
vfs.nfsd.udphighwater: 500
vfs.nfsd.tcphighwater: 0
vfs.nfsd.enable_stringtouid: 0
vfs.nfsd.debuglevel: 0
vfs.nfsd.enable_locallocks: 0
vfs.nfsd.issue_delegations: 0
vfs.nfsd.commit_miss: 0
vfs.nfsd.commit_blks: 0
vfs.nfsd.mirrormnt: 1
vfs.nfsd.async: 0
vfs.nfsd.server_max_nfsvers: 3
vfs.nfsd.server_min_nfsvers: 2
vfs.nfsd.nfs_privport: 0
vfs.nfsd.v4statelimit: 500000
vfs.nfsd.sessionhashsize: 20
vfs.nfsd.fhhashsize: 20
vfs.nfsd.clienthashsize: 20
vfs.nfsd.statehashsize: 10
vfs.nfsd.enable_nogroupcheck: 1
vfs.nfsd.enable_nobodycheck: 1
vfs.nfsd.disable_checkutf8: 0
Any suggestions on how to resolve this issue? Since this is a
production server, my options for intrusive debugging are very
limited.
Thanks.
Marc