Terrible NFS performance under 9.2-RELEASE?
Rick Macklem
rmacklem at uoguelph.ca
Tue Feb 4 00:03:26 UTC 2014
J David wrote:
> On Sat, Feb 1, 2014 at 10:53 PM, Rick Macklem <rmacklem at uoguelph.ca>
> wrote:
> > Btw, if you do want to test with O_DIRECT ("-I"), you should enable
> > direct io in the client.
> > sysctl vfs.nfs.nfs_directio_enable=1
> >
> > I just noticed that it is disabled by default. This means that your
> > "-I" was essentially being ignored by the FreeBSD client.
>
> Ouch. Yes, that appears to be correct.
>
> > It also explains why Linux isn't doing a read before write, since
> > that wouldn't happen for direct I/O. You should test Linux without
> > "-I"
> > and see if it still doesn't do the read before write, including a
> > "-r 2k"
> > to avoid the "just happens to be a page size" case.
>
> With O_DIRECT, the Linux client reads only during the read tests.
> Without O_DIRECT, the Linux client does *no reads at all*, not even
> for the read tests. It caches the whole file and returns
> commensurately irrelevant/silly performance numbers (e.g. 7.2GiB/sec
> random reads).
>
It looks like the "-U" option can be used to get iozone to unmount/remount
the file system and avoid hits on the buffer cache.
(Alternatively, a single run after a manual unmount/remount might help.)
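A sketch of an iozone invocation along these lines (the mount point and
test-file paths are placeholders; "-U" names the file system iozone will
unmount/remount between tests, and the test file must live on it):

```shell
# Remount /mnt/nfs between iozone tests so reads cannot be served from
# the client's buffer cache; -a runs the automatic test matrix, -r sets
# a 2k record size, -f names a test file on the remounted file system.
iozone -a -r 2k -U /mnt/nfs -f /mnt/nfs/iozone.tmp
```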
> Setting the sysctl on the FreeBSD client does stop it from doing the
> excess reads. Ironically this actually makes O_DIRECT improve
> performance for all the workloads punished by that behavior.
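For reference, the client-side setting being discussed can be applied as
follows (the /etc/sysctl.conf line for persistence across reboots is the
usual FreeBSD convention, not something spelled out earlier in this thread):

```shell
# Honor O_DIRECT on NFS opens by bypassing the client buffer cache
# (disabled by default on 9.2-RELEASE):
sysctl vfs.nfs.nfs_directio_enable=1

# Make the setting persistent across reboots:
echo 'vfs.nfs.nfs_directio_enable=1' >> /etc/sysctl.conf
```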
>
> It also creates a fairly consistent pattern indicating performance
> being bottlenecked by the FreeBSD NFS server.
>
> Here is a sample of test results, showing both throughput and IOPS
> achieved:
>
> https://imageshack.com/i/4jiljhp
>
> In this chart, the 64k test is run 4 times, once with FreeBSD as both
> client and server (64k), once with Linux as the client and FreeBSD as
> the server (L/F 64k), once with FreeBSD as the client and Linux as
> the
> server (F/L 64k, hands down the best NFS combo), and once with Linux
> as the client and server (L/L 64k). For reference, the native
> performance of the md0 filesystem is also included.
>
> The TLDR version of this chart is that the FreeBSD NFS server is the
> primary bottleneck; it is not being held back by the network or the
> underlying disk. Ideally, it would be nice to see the 64k column for
> FreeBSD client / FreeBSD server as high as or higher than the column
> for FreeBSD client / Linux Server. (Also, the bottleneck of the F/L
> 64k test appears to be CPU on the FreeBSD client.)
>
> The more detailed findings are:
>
> 1) The Linux client runs at about half the IOPS of the FreeBSD client
> regardless of server type. The gut-level suspicion is that it must
> be
> doing twice as many NFS operations per write. (Possibly commit?)
>
> 2) The FreeBSD NFS server seems capped at around 6300 IOPS. This is
> neither a limit of the network (at least 28k IOPS) nor the filesystem
> (about 40k IOPS).
>
> 3) When O_DIRECT is not used (not shown), the excess read operations
> pull from the same 6300 IOPS bucket, and that's what kills small
> writes.
>
> 4) It's possible that the sharp drop off visible at the 64k/64k test
> is a result of doubling the number of packets traversing the
> TSO-capable network.
>
> Here's a representative top from the server while the test is
> running,
> showing all the nfsd kernel threads being utilized, and spare RAM and
> CPU:
>
> last pid: 14996;  load averages: 0.58, 0.17, 0.10  up 5+01:42:13  04:02:24
> 255 processes: 4 running, 223 sleeping, 28 waiting
> CPU: 0.0% user, 0.0% nice, 55.9% system, 9.2% interrupt, 34.9% idle
> Mem: 2063M Active, 109M Inact, 1247M Wired, 1111M Buf, 4492M Free
> ARC: 930K Total, 41K MFU, 702K MRU, 16K Anon, 27K Header, 143K Other
> Swap: 8192M Total, 8192M Free
>
>   PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>    11 root     155 ki31     0K    32K RUN    1 120.8H 65.87% idle{idle: cpu1}
>    11 root     155 ki31     0K    32K RUN    0 120.6H 48.63% idle{idle: cpu0}
>    12 root     -92    -     0K   448K WAIT   0  13:41 14.55% intr{irq268: virtio_p}
>  1001 root      -8    -     0K    16K mdwait 0  19:37 12.99% md0
>    13 root      -8    -     0K    48K -      0  10:53  6.05% geom{g_down}
>    12 root     -92    -     0K   448K WAIT   1   3:13  4.64% intr{irq269: virtio_p}
>   859 root      -4    0  9912K  1824K ufs    1   2:00  3.08% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc 0   2:11  2.83% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc 0   2:04  2.64% nfsd{nfsd: service}
>   859 root      -4    0  9912K  1824K ufs    1   2:00  2.29% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc 1   6:08  2.20% nfsd{nfsd: service}
>   859 root      -4    0  9912K  1824K ufs    1   5:40  2.20% nfsd{nfsd: master}
>   859 root      -4    0  9912K  1824K ufs    1   2:00  1.95% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc 0   2:50  1.90% nfsd{nfsd: service}
>   859 root      -4    0  9912K  1824K ufs    1   2:47  1.66% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K RUN    0   2:13  1.66% nfsd{nfsd: service}
>    13 root      -8    -     0K    48K -      0   1:55  1.46% geom{g_up}
>   859 root      -8    0  9912K  1824K rpcsvc 0   2:39  1.42% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc 0   5:18  1.32% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc 0   1:55  1.12% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc 0   2:00  0.98% nfsd{nfsd: service}
>   859 root      -4    0  9912K  1824K ufs    1   2:01  0.73% nfsd{nfsd: service}
>   859 root      -4    0  9912K  1824K ufs    1   5:56  0.49% nfsd{nfsd: service}
>
> All of this tends to exonerate the client. So what would be the next
> step to track down the cause of poor performance on the server-side?
>
Also, as an alternative to setting the two sysctls for the TCP DRC (duplicate
request cache), you can simply disable it by setting the sysctl:
vfs.nfsd.cachetcp=0
Many would argue that doing a DRC for TCP isn't necessary (it wasn't
done in most NFS servers, including the old NFS server in FreeBSD).
I have no idea if the Linux server does a DRC for TCP.
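One way to see whether the DRC is actually in play while a test runs is to
watch the server-side request-cache counters in the extended NFS statistics
(field names vary somewhat between releases):

```shell
# Disable the duplicate request cache for TCP mounts entirely:
sysctl vfs.nfsd.cachetcp=0

# Show the server's extended NFS statistics, including the
# request-cache counters, while the benchmark is running:
nfsstat -e -s
```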
And make sure you've increased your nfsd thread count. You can put
a line like this in your /etc/rc.conf to do that:
nfs_server_flags="-u -t -n 64"
(sets it to 64, which should be plenty for a single client.)
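Rather than editing /etc/rc.conf by hand, the same change can be made and
applied on a running system roughly like this (sysrc ships with recent
FreeBSD releases; restarting nfsd will briefly interrupt service to clients):

```shell
# Persist the flags in /etc/rc.conf, then restart the server threads:
sysrc nfs_server_flags="-u -t -n 64"
service nfsd restart
```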
rick
> Thanks!
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe at freebsd.org"
>