ZFS - NFS server for VMware ESXi issues
Marek Salwerowicz
marek.salwerowicz at misal.pl
Fri Oct 21 09:20:03 UTC 2016
Hi list,
I run the following server:
- Supermicro 6047R-E1R36L
- 32 GB RAM
- 1x INTEL CPU E5-2640 v2 @ 2.00GHz
- FreeBSD 10.1
Drive for OS:
- HW RAID1: 2x KINGSTON SV300S37A120G
zpool:
- 18x WD RED 4TB @ raidz2
- log: mirrored Intel 730 SSD
atime disabled on zfs datasets.
No NFS tuning
MTU 9000
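(The atime setting above was applied roughly like this; I'm quoting from memory, only the pool name tank1 is real:)

# zfs set atime=off tank1
# zfs get -r atime tank1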
The box works as an NFS filer for three VMware ESXi servers (5.0, 5.1, 5.5)
and as an iSCSI target for one VM that needs a lot of space, all over a 1 Gbit network.
The interfaces on the server are aggregated 4x 1 Gbit; a rough rc.conf sketch follows.
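The aggregation and MTU live in /etc/rc.conf, roughly like this (the igb interface names and the lacp protocol are placeholders, and I've left the address line out):

cloned_interfaces="lagg0"
ifconfig_igb0="up"
ifconfig_igb1="up"
ifconfig_igb2="up"
ifconfig_igb3="up"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 laggport igb2 laggport igb3 mtu 9000 up"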
Current usage:
# zpool list
NAME SIZE ALLOC FREE FRAG EXPANDSZ CAP DEDUP HEALTH ALTROOT
tank1 65T 27.3T 37.7T - - 41% 1.00x ONLINE -
The box had been working fine for about two years. However, about two
weeks ago we experienced an NFS service outage.
ESXi servers lost NFS connection to the filer (shares were grayed out).
The 'top' command on the filer showed the "nfsd: server" process hung,
having consumed hundreds of minutes of CPU time (I didn't capture the output).
'# service nfsd restart' didn't help; the only solution was cold-rebooting
the machine.
After the reboot I upgraded the system (it had been running 9.2-RELEASE)
to 10.1-RELEASE.
Today, after two weeks of uptime, we experienced the same situation.
The nfsd service was in the following state:
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
984 root 128 20 0 12344K 4020K vq->vq 8 346:27 0.00% nfsd
The nfsd service didn't respond to 'service nfsd restart', but this time
the machine was able to reboot with the '# reboot' command.
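If it hangs again, I'll try to capture where the nfsd threads are stuck before rebooting, roughly like this (984 was the hung nfsd's PID above):

# procstat -t 984      (per-thread state and wait channel)
# procstat -kk 984     (kernel stack of every nfsd thread)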
The nfsd service runs in the threaded model, with 128 threads (the default); the rc.conf knob for this is sketched below.
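If lowering the thread count is advisable (question 2 below), I assume it is done via /etc/rc.conf roughly like this; 64 is only an example value:

nfs_server_enable="YES"
nfs_server_flags="-u -t -n 64"    (UDP + TCP, 64 threads instead of the current 128)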
Current top output:
last pid: 2535; load averages: 0.13, 0.14, 0.15 up 0+04:29:06 11:00:24
36 processes: 1 running, 35 sleeping
CPU: 0.0% user, 0.0% nice, 0.1% system, 0.0% interrupt, 99.9% idle
Mem: 5724K Active, 48M Inact, 25G Wired, 173M Buf, 5828M Free
ARC: 23G Total, 7737M MFU, 16G MRU, 16M Anon, 186M Header, 89M Other
Swap: 32G Total, 32G Free
Displaying threads as a count
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
1949 root 129 24 0 12344K 4132K rpcsvc 10 7:15 1.27% nfsd
1025 root 1 20 0 14472K 1936K select 15 0:01 0.00% powerd
1021 root 1 20 0 26096K 18016K select 2 0:01 0.00% ntpd
1147 marek 1 20 0 86488K 7592K select 2 0:01 0.00% sshd
1083 root 1 20 0 24120K 5980K select 0 0:00 0.00% sendmail
1052 root 1 20 0 30720K 5356K nanslp 6 0:00 0.00% smartd
1260 root 1 20 0 23576K 3576K pause 4 0:00 0.00% csh
1948 root 1 28 0 24632K 5832K select 10 0:00 0.00% nfsd
1144 root 1 20 0 86488K 7544K select 9 0:00 0.00% sshd
1148 marek 1 21 0 24364K 4324K pause 1 0:00 0.00% zsh
852 root 1 20 0 16584K 2192K select 3 0:00 0.00% rpcbind
1926 root 1 52 0 26792K 5956K select 1 0:00 0.00% mountd
848 root 1 20 0 14504K 2144K select 13 0:00 0.00% syslogd
1258 root 1 25 0 50364K 3468K select 6 0:00 0.00% sudo
2432 marek 1 20 0 86488K 7620K select 15 0:00 0.00% sshd
1090 root 1 49 0 16596K 2344K nanslp 0 0:00 0.00% cron
2535 root 1 20 0 21920K 4252K CPU15 15 0:00 0.00% top
1259 root 1 28 0 47708K 2808K wait 12 0:00 0.00% su
2433 marek 1 20 0 24364K 4272K ttyin 1 0:00 0.00% zsh
2429 root 1 20 0 86488K 7616K select 15 0:00 0.00% sshd
751 root 1 20 0 13164K 4548K select 13 0:00 0.00% devd
1086 smmsp 1 20 0 24120K 5648K pause 11 0:00 0.00% sendmail
1080 root 1 20 0 61220K 6996K select 8 0:00 0.00% sshd
1140 root 1 52 0 14492K 2068K ttyin 9 0:00 0.00% getty
1138 root 1 52 0 14492K 2068K ttyin 15 0:00 0.00% getty
1141 root 1 52 0 14492K 2068K ttyin 10 0:00 0.00% getty
1143 root 1 52 0 14492K 2068K ttyin 0 0:00 0.00% getty
1136 root 1 52 0 14492K 2068K ttyin 6 0:00 0.00% getty
1139 root 1 52 0 14492K 2068K ttyin 2 0:00 0.00% getty
1142 root 1 52 0 14492K 2068K ttyin 4 0:00 0.00% getty
1137 root 1 52 0 14492K 2068K ttyin 13 0:00 0.00% getty
953 root 1 20 0 27556K 3468K select 5 0:00 0.00% ctld
152 root 1 52 0 12336K 1800K pause 9 0:00 0.00% adjkerntz
734 root 1 52 0 16708K 2044K select 1 0:00 0.00% moused
692 root 1 52 0 16708K 2040K select 8 0:00 0.00% moused
713 root 1 52 0 16708K 2044K select 8 0:00 0.00% moused
And top in I/O mode:
last pid: 2535; load averages: 0.09, 0.12, 0.15 up 0+04:30:05 11:01:23
36 processes: 1 running, 35 sleeping
CPU: 0.1% user, 0.0% nice, 1.7% system, 0.1% interrupt, 98.2% idle
Mem: 5208K Active, 49M Inact, 25G Wired, 173M Buf, 5821M Free
ARC: 23G Total, 7736M MFU, 16G MRU, 9744K Anon, 186M Header, 89M Other
Swap: 32G Total, 32G Free
PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND
1949 root 360 31 0 131 0 131 100.00% nfsd
1025 root 8 0 0 0 0 0 0.00% powerd
1021 root 2 0 0 0 0 0 0.00% ntpd
1147 marek 2 0 0 0 0 0 0.00% sshd
1083 root 0 0 0 0 0 0 0.00% sendmail
1052 root 0 0 0 0 0 0 0.00% smartd
1260 root 0 0 0 0 0 0 0.00% csh
1948 root 0 0 0 0 0 0 0.00% nfsd
1144 root 0 0 0 0 0 0 0.00% sshd
My questions:
1. Since we are approaching ~30 TB of allocated space, could it be a lack of
memory (the rule of thumb of 1 GB of RAM per 1 TB of ZFS storage)? A loader.conf sketch follows the questions.
2. Does the NFS server need tuning in a standard 1 Gbit network environment?
We use lagg aggregation and accept that a single ESXi server gets at most
1 Gbit of throughput. Is 128 threads too many (see the rc.conf sketch above)?
3. Could the SMART tests have a side effect on I/O performance that results in
NFS hangs? I run short tests quite intensively (4 times per day) and a long
test once per week, on the weekend (see the smartd.conf sketch below).
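For question 1: the ARC already sits at 23 GB of the 32 GB of RAM, and by the 1 GB/TB rule the ~27 TB allocated would want roughly that much. If capping the ARC (rather than adding RAM) is the way to go, I assume it's a /boot/loader.conf tunable along these lines, with 24G being only an example value:

vfs.zfs.arc_max="24G"

For question 3, the current schedule corresponds roughly to a smartd.conf line like this (the /dev/da0 device name is just an example):

/dev/da0 -a -s (S/../.././02|S/../.././08|S/../.././14|S/../.././20|L/../../6/03)

i.e. short tests at 02:00, 08:00, 14:00 and 20:00 every day, and a long test on Saturday at 03:00.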
Cheers
Marek