Terrible NFS performance under 9.2-RELEASE?

Rick Macklem rmacklem at uoguelph.ca
Mon Feb 3 23:21:29 UTC 2014


J David wrote:
> On Sat, Feb 1, 2014 at 10:53 PM, Rick Macklem <rmacklem at uoguelph.ca>
> wrote:
> > Btw, if you do want to test with O_DIRECT ("-I"), you should enable
> > direct io in the client.
> > sysctl vfs.nfs.nfs_directio_enable=1
> >
> > I just noticed that it is disabled by default. This means that your
> > "-I" was essentially being ignored by the FreeBSD client.
> 
> Ouch.  Yes, that appears to be correct.
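
For reference, the client-side knob can be set at runtime and made
persistent across reboots; a minimal sketch:

    # on the FreeBSD client: honor O_DIRECT opens over NFS
    sysctl vfs.nfs.nfs_directio_enable=1
    # persist the setting
    echo 'vfs.nfs.nfs_directio_enable=1' >> /etc/sysctl.conf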
> 
> > It also explains why Linux isn't doing a read before write, since
> > that wouldn't happen for direct I/O. You should test Linux without
> > "-I" and see if it still doesn't do the read before write, including
> > a "-r 2k" to avoid the "just happens to be a page size" case.
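
Assuming the benchmark here is iozone (which the "-I"/"-r 2k" flags
suggest), a buffered run with a 2k record size would look something
like this; the file name and size are made up:

    # hypothetical iozone run: no -I (no O_DIRECT), 2k records,
    # write/read/random tests against a file on the NFS mount
    iozone -e -r 2k -s 512m -i 0 -i 1 -i 2 -f /mnt/nfs/testfile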
> 
> With O_DIRECT, the Linux client reads only during the read tests.
> Without O_DIRECT, the Linux client does *no reads at all*, not even
> for the read tests.  It caches the whole file and returns
> commensurately irrelevant/silly performance numbers (e.g. 7.2GiB/sec
> random reads).
> 
> Setting the sysctl on the FreeBSD client does stop it from doing the
> excess reads.  Ironically, this makes O_DIRECT improve performance
> for all the workloads punished by that behavior.
> 
> It also produces a fairly consistent pattern indicating that
> performance is bottlenecked by the FreeBSD NFS server.
> 
> Here is a sample of test results, showing both throughput and IOPS
> achieved:
> 
> https://imageshack.com/i/4jiljhp
> 
> In this chart, the 64k test is run 4 times: once with FreeBSD as both
> client and server (64k), once with Linux as the client and FreeBSD as
> the server (L/F 64k), once with FreeBSD as the client and Linux as the
> server (F/L 64k, hands down the best NFS combo), and once with Linux
> as the client and server (L/L 64k).  For reference, the native
> performance of the md0 filesystem is also included.
> 
> The TLDR version of this chart is that the FreeBSD NFS server is the
> primary bottleneck; it is not being held back by the network or the
> underlying disk.  Ideally, it would be nice to see the 64k column for
> FreeBSD client / FreeBSD server as high as or higher than the column
> for FreeBSD client / Linux Server.  (Also, the bottleneck of the F/L
> 64k test appears to be CPU on the FreeBSD client.)
> 
> The more detailed findings are:
> 
> 1) The Linux client runs at about half the IOPS of the FreeBSD client
> regardless of server type.  The gut-level suspicion is that it must
> be doing twice as many NFS operations per write.  (Possibly commit?
> A way to check this with nfsstat is sketched after this list.)
> 
> 2) The FreeBSD NFS server seems capped at around 6300 IOPS.  This is
> neither a limit of the network (at least 28k IOPS) nor the filesystem
> (about 40k IOPS).
> 
> 3) When O_DIRECT is not used (not shown), the excess read operations
> pull from the same 6300 IOPS bucket, and that's what kills small
> writes.
> 
> 4) It's possible that the sharp drop-off visible in the 64k/64k test
> is a result of doubling the number of packets traversing the
> TSO-capable network.  (A quick TSO check is sketched below as well.)
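
One way to check (1) is to zero the client RPC counters around a run
and compare operation counts; (4) can be eyeballed from the interface
flags and per-second packet rates. A rough sketch (the vtnet0 name is
a guess based on the virtio interrupts in the top output below):

    # on the client: zero NFS counters, run the test, then inspect
    nfsstat -z
    # ... run the benchmark ...
    nfsstat -e -c     # compare Write vs. Commit counts
    # on the server: confirm TSO is enabled and watch packet rates
    ifconfig vtnet0 | grep TSO
    netstat -I vtnet0 -w 1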
> 
> Here's a representative top from the server while the test is
> running, showing all the nfsd kernel threads being utilized, plus
> spare RAM and CPU:
> 
> last pid: 14996;  load averages:  0.58,  0.17,  0.10   up 5+01:42:13  04:02:24
> 255 processes: 4 running, 223 sleeping, 28 waiting
> CPU:  0.0% user,  0.0% nice, 55.9% system,  9.2% interrupt, 34.9% idle
> Mem: 2063M Active, 109M Inact, 1247M Wired, 1111M Buf, 4492M Free
> ARC: 930K Total, 41K MFU, 702K MRU, 16K Anon, 27K Header, 143K Other
> Swap: 8192M Total, 8192M Free
> 
>   PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>    11 root       155 ki31     0K    32K RUN     1 120.8H 65.87% idle{idle: cpu1}
>    11 root       155 ki31     0K    32K RUN     0 120.6H 48.63% idle{idle: cpu0}
>    12 root       -92    -     0K   448K WAIT    0  13:41 14.55% intr{irq268: virtio_p}
>  1001 root        -8    -     0K    16K mdwait  0  19:37 12.99% md0
>    13 root        -8    -     0K    48K -       0  10:53  6.05% geom{g_down}
>    12 root       -92    -     0K   448K WAIT    1   3:13  4.64% intr{irq269: virtio_p}
>   859 root        -4    0  9912K  1824K ufs     1   2:00  3.08% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   2:11  2.83% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   2:04  2.64% nfsd{nfsd: service}
>   859 root        -4    0  9912K  1824K ufs     1   2:00  2.29% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  1   6:08  2.20% nfsd{nfsd: service}
>   859 root        -4    0  9912K  1824K ufs     1   5:40  2.20% nfsd{nfsd: master}
>   859 root        -4    0  9912K  1824K ufs     1   2:00  1.95% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   2:50  1.90% nfsd{nfsd: service}
>   859 root        -4    0  9912K  1824K ufs     1   2:47  1.66% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K RUN     0   2:13  1.66% nfsd{nfsd: service}
>    13 root        -8    -     0K    48K -       0   1:55  1.46% geom{g_up}
>   859 root        -8    0  9912K  1824K rpcsvc  0   2:39  1.42% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   5:18  1.32% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   1:55  1.12% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   2:00  0.98% nfsd{nfsd: service}
>   859 root        -4    0  9912K  1824K ufs     1   2:01  0.73% nfsd{nfsd: service}
>   859 root        -4    0  9912K  1824K ufs     1   5:56  0.49% nfsd{nfsd: service}
> 
> 
> All of this tends to exonerate the client.  So what would be the next
> step in tracking down the cause of poor performance on the server
> side?
> 
Ok, I'm going to assume that you either applied my patch to the server
(so it is using 4K clusters and isn't exceeding the 33-mbuf limit) or
updated your virtio network driver with the fixes Bryan recently
committed to head, so that it handles the 34-mbuf read replies better.

Did you also set these sysctls on the server?
vfs.nfsd.tcphighwater=100000
vfs.nfsd.tcpcache_timeout=600 (or whatever it is called; just do a
  "sysctl -a" and look for it in the vfs.nfsd section.)

If you have done all of the above and the server still appears slow,
the next step is to test with a recent kernel that has mav@'s changes
in it.  He has done a bunch of performance work on the NFS server
lately, using the SPEC SFS benchmark, I believe.

rick

(I believe he has now MFC'd these to stable/9, if using a recent
 head kernel isn't practical.)
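
A sketch of pulling and building stable/9, assuming svn is installed
(devel/subversion), the stock svn.freebsd.org mirror, and a GENERIC
kernel config:

    # fetch stable/9 sources and build/install a test kernel
    svn checkout svn://svn.freebsd.org/base/stable/9 /usr/src
    cd /usr/src
    make buildkernel KERNCONF=GENERIC
    make installkernel KERNCONF=GENERIC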
> Thanks!