ZFS - NFS server for VMware ESXi issues

Marek Salwerowicz marek.salwerowicz at misal.pl
Fri Oct 21 09:20:03 UTC 2016


Hi list,

I run the following server:

- Supermicro 6047R-E1R36L
- 32 GB RAM
- 1x INTEL CPU E5-2640 v2 @ 2.00GHz
- FreeBSD 10.1

Drive for OS:
- HW RAID1: 2x KINGSTON SV300S37A120G

zpool:
- 18x WD RED 4TB @ raidz2
- log: mirrored Intel 730 SSD

atime is disabled on the ZFS datasets.
No NFS tuning.
MTU 9000.
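
For reference, those settings can be double-checked like this ('tank1' is the pool shown below; 'lagg0' is an assumed name for the aggregated interface):

# zfs get -r atime tank1
# ifconfig lagg0 | grep mtu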

The box works as an NFS filer for 3 VMware ESXi servers (5.0, 5.1, 5.5) 
and as an iSCSI target for one VM that requires a large amount of space, 
all over a 1 Gbit network. The interfaces on the server are aggregated 
as 4x 1 Gbit (lagg).
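
The aggregation itself is configured in /etc/rc.conf roughly along these lines (the igb0-igb3 interface names, the LACP protocol and the address are placeholders rather than the exact values from this box):

ifconfig_igb0="up mtu 9000"
ifconfig_igb1="up mtu 9000"
ifconfig_igb2="up mtu 9000"
ifconfig_igb3="up mtu 9000"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 laggport igb2 laggport igb3 10.0.0.10/24 mtu 9000"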

Current usage:
# zpool list
NAME    SIZE  ALLOC   FREE   FRAG  EXPANDSZ    CAP  DEDUP  HEALTH ALTROOT
tank1    65T  27.3T  37.7T      -         -    41%  1.00x  ONLINE  -

The box had been working fine for about two years. However, about two 
weeks ago we experienced an NFS service outage: the ESXi servers lost 
their NFS connection to the filer (the shares were greyed out).

The 'top' command on the filer showed the "nfsd: server" process hung, 
having consumed hundreds of minutes of CPU time (I didn't capture the 
output). '# service nfsd restart' didn't help; the only solution was to 
cold-reboot the machine.
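
In hindsight, the kind of state worth capturing before rebooting would have been something like this (the pid being whatever the "nfsd: server" process has at the time, e.g. 1949 in the output further down):

# procstat -kk <nfsd pid>     (kernel stacks of all nfsd threads)
# nfsstat -e -s               (server-side NFS statistics)
# top -SHb -d 1               (one batch snapshot, threads included)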

After the reboot I upgraded the system (it was running 9.2-RELEASE 
before) to 10.1-RELEASE.

Today, after two weeks of running, we experienced the same situation. 
The nfsd service was in the following state:

   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME WCPU COMMAND
   984 root        128  20    0 12344K  4020K vq->vq  8 346:27 0.00% nfsd

The nfsd service didn't respond to 'service nfsd restart', but this time 
the machine was able to reboot with the '# reboot' command.
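
The 'vq->vq' wait channel in that output looks like the nfsd thread was sleeping inside ZFS's vdev queue code, i.e. waiting on disk I/O, so it seems worth checking whether a single drive is failing or responding slowly, for example with:

# zpool status -x
# zpool iostat -v tank1 5
# gstat

(in gstat, one da device with ms/r or ms/w far above its neighbours would be suspicious).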



The nfsd service works in the threaded model, with 128 threads (the default).
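
The thread count can be pinned explicitly in /etc/rc.conf instead of relying on the default, for example (64 here is just an arbitrary illustration, not a recommendation):

nfs_server_enable="YES"
nfs_server_flags="-u -t -n 64"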

Current top output:
last pid:  2535;  load averages:  0.13,  0.14, 0.15 up 0+04:29:06  11:00:24
36 processes:  1 running, 35 sleeping
CPU:  0.0% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.9% idle
Mem: 5724K Active, 48M Inact, 25G Wired, 173M Buf, 5828M Free
ARC: 23G Total, 7737M MFU, 16G MRU, 16M Anon, 186M Header, 89M Other
Swap: 32G Total, 32G Free
  Displaying threads as a count
   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME WCPU COMMAND
  1949 root        129  24    0 12344K  4132K rpcsvc 10   7:15 1.27% nfsd
  1025 root          1  20    0 14472K  1936K select 15   0:01 0.00% powerd
  1021 root          1  20    0 26096K 18016K select  2   0:01 0.00% ntpd
  1147 marek         1  20    0 86488K  7592K select  2   0:01 0.00% sshd
  1083 root          1  20    0 24120K  5980K select  0   0:00 0.00% sendmail
  1052 root          1  20    0 30720K  5356K nanslp  6   0:00 0.00% smartd
  1260 root          1  20    0 23576K  3576K pause   4   0:00 0.00% csh
  1948 root          1  28    0 24632K  5832K select 10   0:00 0.00% nfsd
  1144 root          1  20    0 86488K  7544K select  9   0:00 0.00% sshd
  1148 marek         1  21    0 24364K  4324K pause   1   0:00 0.00% zsh
   852 root          1  20    0 16584K  2192K select  3   0:00 0.00% rpcbind
  1926 root          1  52    0 26792K  5956K select  1   0:00 0.00% mountd
   848 root          1  20    0 14504K  2144K select 13   0:00 0.00% syslogd
  1258 root          1  25    0 50364K  3468K select  6   0:00 0.00% sudo
  2432 marek         1  20    0 86488K  7620K select 15   0:00 0.00% sshd
  1090 root          1  49    0 16596K  2344K nanslp  0   0:00 0.00% cron
  2535 root          1  20    0 21920K  4252K CPU15  15   0:00 0.00% top
  1259 root          1  28    0 47708K  2808K wait   12   0:00 0.00% su
  2433 marek         1  20    0 24364K  4272K ttyin   1   0:00 0.00% zsh
  2429 root          1  20    0 86488K  7616K select 15   0:00 0.00% sshd
   751 root          1  20    0 13164K  4548K select 13   0:00 0.00% devd
  1086 smmsp         1  20    0 24120K  5648K pause  11   0:00 0.00% sendmail
  1080 root          1  20    0 61220K  6996K select  8   0:00 0.00% sshd
  1140 root          1  52    0 14492K  2068K ttyin   9   0:00 0.00% getty
  1138 root          1  52    0 14492K  2068K ttyin  15   0:00 0.00% getty
  1141 root          1  52    0 14492K  2068K ttyin  10   0:00 0.00% getty
  1143 root          1  52    0 14492K  2068K ttyin   0   0:00 0.00% getty
  1136 root          1  52    0 14492K  2068K ttyin   6   0:00 0.00% getty
  1139 root          1  52    0 14492K  2068K ttyin   2   0:00 0.00% getty
  1142 root          1  52    0 14492K  2068K ttyin   4   0:00 0.00% getty
  1137 root          1  52    0 14492K  2068K ttyin  13   0:00 0.00% getty
   953 root          1  20    0 27556K  3468K select  5   0:00 0.00% ctld
   152 root          1  52    0 12336K  1800K pause   9   0:00 0.00% adjkerntz
   734 root          1  52    0 16708K  2044K select  1   0:00 0.00% moused
   692 root          1  52    0 16708K  2040K select  8   0:00 0.00% moused
   713 root          1  52    0 16708K  2044K select  8   0:00 0.00% moused




And the same moment in top's I/O mode (top -m io):
last pid:  2535;  load averages:  0.09,  0.12, 0.15 up 0+04:30:05  11:01:23
36 processes:  1 running, 35 sleeping
CPU:  0.1% user,  0.0% nice,  1.7% system,  0.1% interrupt, 98.2% idle
Mem: 5208K Active, 49M Inact, 25G Wired, 173M Buf, 5821M Free
ARC: 23G Total, 7736M MFU, 16G MRU, 9744K Anon, 186M Header, 89M Other
Swap: 32G Total, 32G Free

   PID USERNAME     VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
  1949 root          360     31      0    131      0    131 100.00% nfsd
  1025 root            8      0      0      0      0      0   0.00% powerd
  1021 root            2      0      0      0      0      0   0.00% ntpd
  1147 marek           2      0      0      0      0      0   0.00% sshd
  1083 root            0      0      0      0      0      0   0.00% sendmail
  1052 root            0      0      0      0      0      0   0.00% smartd
  1260 root            0      0      0      0      0      0   0.00% csh
  1948 root            0      0      0      0      0      0   0.00% nfsd
  1144 root            0      0      0      0      0      0   0.00% sshd



My questions:
1. Since we are reaching ~30 TB of allocated space, could this simply be 
a lack of memory (the rule of thumb of 1 GB of RAM per 1 TB of ZFS 
storage)? A quick way to check the ARC figures is sketched after this list.
2. Does the NFS server need tuning in a standard 1 Gbit network 
environment? We use lagg aggregation and accept that a single ESXi 
server gets at most 1 Gbit of throughput. Are 128 threads too many?
3. Could the SMART tests have a side effect on I/O performance that 
results in the NFS hangs? I run short tests quite intensively (4 times 
per day) and a long test once per week (at the weekend); the schedule is 
roughly the smartd.conf example after this list.
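
For question 1, the ARC limits and current size can be read straight from sysctl (zfs-stats is the sysutils/zfs-stats port, if it happens to be installed):

# sysctl vfs.zfs.arc_max vfs.zfs.arc_min
# sysctl kstat.zfs.misc.arcstats.size
# zfs-stats -A

For question 3, the schedule described above corresponds to a smartd.conf line roughly like the following (the device name and the exact hours are placeholders, not the real ones; short tests four times a day, long test on Saturday night):

/dev/da0 -a -s (S/../.././(01|07|13|19)|L/../../6/03)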




Cheers

Marek




