Terrible NFS performance under 9.2-RELEASE?

J David j.david.lists at gmail.com
Mon Feb 3 04:21:34 UTC 2014


On Sat, Feb 1, 2014 at 10:53 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> Btw, if you do want to test with O_DIRECT ("-I"), you should enable
> direct io in the client.
> sysctl vfs.nfs.nfs_directio_enable=1
>
> I just noticed that it is disabled by default. This means that your
> "-I" was essentially being ignored by the FreeBSD client.

Ouch.  Yes, that appears to be correct.
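
(For anyone trying to reproduce this: the setting was applied and
verified on the client as below; the sysctl.conf line is only there
to make it stick across reboots.)

  # enable direct I/O in the FreeBSD NFS client
  sysctl vfs.nfs.nfs_directio_enable=1
  # verify it took effect
  sysctl vfs.nfs.nfs_directio_enable
  # persist across reboots
  echo 'vfs.nfs.nfs_directio_enable=1' >> /etc/sysctl.conf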

> It also explains why Linux isn't doing a read before write, since
> that wouldn't happen for direct I/O. You should test Linux without "-I"
> and see if it still doesn't do the read before write, including a "-r 2k"
> to avoid the "just happens to be a page size" case.

With O_DIRECT, the Linux client reads only during the read tests.
Without O_DIRECT, the Linux client does *no reads at all*, not even
for the read tests.  It caches the whole file and returns
commensurately irrelevant/silly performance numbers (e.g. 7.2GiB/sec
random reads).
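
(To get meaningful non-O_DIRECT read numbers out of the Linux client
at all, it seems necessary to drop its page cache between the write
and read phases, roughly like this; even then it may simply re-cache
as it reads.)

  # on the Linux client: flush dirty pages, then drop the page cache
  sync
  echo 3 > /proc/sys/vm/drop_caches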

Setting the sysctl on the FreeBSD client does stop it from doing the
excess reads.  Ironically, this actually makes O_DIRECT *improve*
performance for all the workloads previously punished by that behavior.

It also reveals a fairly consistent pattern indicating that performance
is bottlenecked by the FreeBSD NFS server.

Here is a sample of test results, showing both throughput and IOPS achieved:

https://imageshack.com/i/4jiljhp

In this chart, the 64k test is run four times: once with FreeBSD as
both client and server (64k), once with Linux as the client and
FreeBSD as the server (L/F 64k), once with FreeBSD as the client and
Linux as the server (F/L 64k, hands down the best NFS combo), and
once with Linux as both client and server (L/L 64k).  For reference,
the native performance of the md0 filesystem is also included.

The TLDR version of this chart is that the FreeBSD NFS server is the
primary bottleneck; it is not being held back by the network or the
underlying disk.  Ideally, it would be nice to see the 64k column for
FreeBSD client / FreeBSD server as high as or higher than the column
for FreeBSD client / Linux server.  (Also, the bottleneck in the F/L
64k test appears to be CPU on the FreeBSD client.)

The more detailed findings are:

1) The Linux client runs at about half the IOPS of the FreeBSD client
regardless of server type.  The gut-level suspicion is that it must be
doing twice as many NFS operations per write (possibly an extra
commit?).  A way to check this with nfsstat is sketched after this list.

2) The FreeBSD NFS server seems capped at around 6300 IOPS.  This is
neither a limit of the network (at least 28k IOPS) nor the filesystem
(about 40k IOPS).  (See the thread-count experiment below.)

3) When O_DIRECT is not used (not shown), the excess read operations
pull from the same 6300 IOPS bucket, and that's what kills small
writes.

4) It's possible that the sharp drop-off visible at the 64k/64k test
is a result of doubling the number of packets traversing the
TSO-capable network; a quick TSO on/off test is sketched below.
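
One way to check the suspicion in point 1 is to zero the server-side
NFS statistics, run the same test from each client, and compare the
per-operation counts (this assumes the benchmark runs long enough to
dominate the counters):

  # on the FreeBSD server: display and zero the NFS statistics
  nfsstat -z
  # ... run the benchmark from one client ...
  # dump server-side operation counts; compare Write vs. Commit
  nfsstat -s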
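
For point 2, one cheap experiment (just a guess, not a known fix) is
to raise the nfsd thread count on the server and see whether the
~6300 IOPS ceiling moves:

  # /etc/rc.conf on the server: serve UDP and TCP with 64 threads
  nfs_server_flags="-u -t -n 64"
  # apply it
  service nfsd restart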
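
And point 4 should be easy to rule in or out by disabling TSO on the
server's interface and re-running the 64k/64k test (vtnet0 is a guess
at the interface name, based on the virtio interrupt lines in the top
output below):

  # on the server: disable TCP segmentation offload, then retest
  ifconfig vtnet0 -tso
  # re-enable afterward
  ifconfig vtnet0 tso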

Here's representative top(1) output from the server while the test is
running, showing all the nfsd kernel threads being utilized while RAM
and CPU remain to spare:

last pid: 14996;  load averages:  0.58,  0.17,  0.10    up 5+01:42:13  04:02:24
255 processes: 4 running, 223 sleeping, 28 waiting
CPU:  0.0% user,  0.0% nice, 55.9% system,  9.2% interrupt, 34.9% idle
Mem: 2063M Active, 109M Inact, 1247M Wired, 1111M Buf, 4492M Free
ARC: 930K Total, 41K MFU, 702K MRU, 16K Anon, 27K Header, 143K Other
Swap: 8192M Total, 8192M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root       155 ki31     0K    32K RUN     1 120.8H 65.87% idle{idle: cpu1}
   11 root       155 ki31     0K    32K RUN     0 120.6H 48.63% idle{idle: cpu0}
   12 root       -92    -     0K   448K WAIT    0  13:41 14.55% intr{irq268: virtio_p}
 1001 root        -8    -     0K    16K mdwait  0  19:37 12.99% md0
   13 root        -8    -     0K    48K -       0  10:53  6.05% geom{g_down}
   12 root       -92    -     0K   448K WAIT    1   3:13  4.64% intr{irq269: virtio_p}
  859 root        -4    0  9912K  1824K ufs     1   2:00  3.08% nfsd{nfsd: service}
  859 root        -8    0  9912K  1824K rpcsvc  0   2:11  2.83% nfsd{nfsd: service}
  859 root        -8    0  9912K  1824K rpcsvc  0   2:04  2.64% nfsd{nfsd: service}
  859 root        -4    0  9912K  1824K ufs     1   2:00  2.29% nfsd{nfsd: service}
  859 root        -8    0  9912K  1824K rpcsvc  1   6:08  2.20% nfsd{nfsd: service}
  859 root        -4    0  9912K  1824K ufs     1   5:40  2.20% nfsd{nfsd: master}
  859 root        -4    0  9912K  1824K ufs     1   2:00  1.95% nfsd{nfsd: service}
  859 root        -8    0  9912K  1824K rpcsvc  0   2:50  1.90% nfsd{nfsd: service}
  859 root        -4    0  9912K  1824K ufs     1   2:47  1.66% nfsd{nfsd: service}
  859 root        -8    0  9912K  1824K RUN     0   2:13  1.66% nfsd{nfsd: service}
   13 root        -8    -     0K    48K -       0   1:55  1.46% geom{g_up}
  859 root        -8    0  9912K  1824K rpcsvc  0   2:39  1.42% nfsd{nfsd: service}
  859 root        -8    0  9912K  1824K rpcsvc  0   5:18  1.32% nfsd{nfsd: service}
  859 root        -8    0  9912K  1824K rpcsvc  0   1:55  1.12% nfsd{nfsd: service}
  859 root        -8    0  9912K  1824K rpcsvc  0   2:00  0.98% nfsd{nfsd: service}
  859 root        -4    0  9912K  1824K ufs     1   2:01  0.73% nfsd{nfsd: service}
  859 root        -4    0  9912K  1824K ufs     1   5:56  0.49% nfsd{nfsd: service}


All of this tends to exonerate the client.  So what would be the next
step to track down the cause of the poor performance on the server side?

Thanks!
