Terrible NFS performance under 9.2-RELEASE?

Rick Macklem rmacklem at uoguelph.ca
Tue Feb 4 00:03:26 UTC 2014


J David wrote:
> On Sat, Feb 1, 2014 at 10:53 PM, Rick Macklem <rmacklem at uoguelph.ca>
> wrote:
> > Btw, if you do want to test with O_DIRECT ("-I"), you should enable
> > direct io in the client.
> > sysctl vfs.nfs.nfs_directio_enable=1
> >
> > I just noticed that it is disabled by default. This means that your
> > "-I" was essentially being ignored by the FreeBSD client.
> 
> Ouch.  Yes, that appears to be correct.
> 
> > It also explains why Linux isn't doing a read before write, since
> > that wouldn't happen for direct I/O. You should test Linux without
> > "-I" and see if it still doesn't do the read before write, including
> > a "-r 2k" to avoid the "just happens to be a page size" case.
> 
> With O_DIRECT, the Linux client reads only during the read tests.
> Without O_DIRECT, the Linux client does *no reads at all*, not even
> for the read tests.  It caches the whole file and returns
> commensurately irrelevant/silly performance numbers (e.g. 7.2GiB/sec
> random reads).
> 
It looks like the "-U" option can be used to get iozone to unmount/remount
the file system between tests and avoid hits on the client's buffer cache.
(Alternatively, a single run after doing a manual unmount/mount might help.)
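For example (the mount point, file name and sizes below are just
placeholders; adjust them to whatever you've been running), something like:
  iozone -i 0 -i 1 -i 2 -r 2k -s 512m -f /mnt/nfstest/iozone.tmp -U /mnt/nfstest
should work. As far as I know, "-U" wants the mount point (and probably an
/etc/fstab entry for it, so that a plain mount of the path works), since
iozone does the umount/mount itself between tests, and the test file has
to live under that mount point.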

> Setting the sysctl on the FreeBSD client does stop it from doing the
> excess reads.  Ironically this actually makes O_DIRECT improve
> performance for all the workloads punished by that behavior.
> 
> It also creates a fairly consistent pattern indicating performance
> being bottlenecked by the FreeBSD NFS server.
> 
> Here is a sample of test results, showing both throughput and IOPS
> achieved:
> 
> https://imageshack.com/i/4jiljhp
> 
> In this chart, the 64k test is run 4 times, once with FreeBSD as both
> client and server (64k), once with Linux as the client and FreeBSD as
> the server (L/F 64k), once with FreeBSD as the client and Linux as the
> server (F/L 64k, hands down the best NFS combo), and once with Linux
> as the client and server (L/L 64k).  For reference, the native
> performance of the md0 filesystem is also included.
> 
> The TLDR version of this chart is that the FreeBSD NFS server is the
> primary bottleneck; it is not being held back by the network or the
> underlying disk.  Ideally, it would be nice to see the 64k column for
> FreeBSD client / FreeBSD server as high as or higher than the column
> for FreeBSD client / Linux Server.  (Also, the bottleneck of the F/L
> 64k test appears to be CPU on the FreeBSD client.)
> 
> The more detailed findings are:
> 
> 1) The Linux client runs at about half the IOPS of the FreeBSD client
> regardless of server type.  The gut-level suspicion is that it must be
> doing twice as many NFS operations per write. (Possibly commit?)
> 
> 2) The FreeBSD NFS server seems capped at around 6300 IOPS.  This is
> neither a limit of the network (at least 28k IOPS) nor the filesystem
> (about 40k IOPS).
> 
> 3) When O_DIRECT is not used (not shown), the excess read operations
> pull from the same 6300 IOPS bucket, and that's what kills small
> writes.
> 
> 4) It's possible that the sharp drop off visible at the 64k/64k test
> is a result of doubling the number of packets traversing the
> TSO-capable network.
> 
> Here's a representative top from the server while the test is running,
> showing all the nfsd kernel threads being utilized, and spare RAM and
> CPU:
> 
> last pid: 14996;  load averages:  0.58,  0.17,  0.10   up 5+01:42:13  04:02:24
> 255 processes: 4 running, 223 sleeping, 28 waiting
> CPU:  0.0% user,  0.0% nice, 55.9% system,  9.2% interrupt, 34.9% idle
> Mem: 2063M Active, 109M Inact, 1247M Wired, 1111M Buf, 4492M Free
> ARC: 930K Total, 41K MFU, 702K MRU, 16K Anon, 27K Header, 143K Other
> Swap: 8192M Total, 8192M Free
> 
>   PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>    11 root       155 ki31     0K    32K RUN     1 120.8H 65.87% idle{idle: cpu1}
>    11 root       155 ki31     0K    32K RUN     0 120.6H 48.63% idle{idle: cpu0}
>    12 root       -92    -     0K   448K WAIT    0  13:41 14.55% intr{irq268: virtio_p}
>  1001 root        -8    -     0K    16K mdwait  0  19:37 12.99% md0
>    13 root        -8    -     0K    48K -       0  10:53  6.05% geom{g_down}
>    12 root       -92    -     0K   448K WAIT    1   3:13  4.64% intr{irq269: virtio_p}
>   859 root        -4    0  9912K  1824K ufs     1   2:00  3.08% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   2:11  2.83% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   2:04  2.64% nfsd{nfsd: service}
>   859 root        -4    0  9912K  1824K ufs     1   2:00  2.29% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  1   6:08  2.20% nfsd{nfsd: service}
>   859 root        -4    0  9912K  1824K ufs     1   5:40  2.20% nfsd{nfsd: master}
>   859 root        -4    0  9912K  1824K ufs     1   2:00  1.95% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   2:50  1.90% nfsd{nfsd: service}
>   859 root        -4    0  9912K  1824K ufs     1   2:47  1.66% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K RUN     0   2:13  1.66% nfsd{nfsd: service}
>    13 root        -8    -     0K    48K -       0   1:55  1.46% geom{g_up}
>   859 root        -8    0  9912K  1824K rpcsvc  0   2:39  1.42% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   5:18  1.32% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   1:55  1.12% nfsd{nfsd: service}
>   859 root        -8    0  9912K  1824K rpcsvc  0   2:00  0.98% nfsd{nfsd: service}
>   859 root        -4    0  9912K  1824K ufs     1   2:01  0.73% nfsd{nfsd: service}
>   859 root        -4    0  9912K  1824K ufs     1   5:56  0.49% nfsd{nfsd: service}
> 
> 
> All of this tends to exonerate the client.  So what would be the next
> step to track down the cause of poor performance on the server-side?
> 
Also, as an alternative to setting the 2 sysctls that tune the TCP DRC
(duplicate request cache), you can simply disable the DRC for TCP by
setting the sysctl:
vfs.nfsd.cachetcp=0
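(If you want that to persist across reboots, a line like
vfs.nfsd.cachetcp=0
in /etc/sysctl.conf should do it, since the sysctls get set early in
boot, before nfsd starts.)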

Many would argue that doing a DRC for TCP isn't necessary (it wasn't
done in most NFS servers, including the old NFS server in FreeBSD).
I have no idea if the Linux server does a DRC for TCP.

And make sure you've increased your nfsd thread count. You can put
a line like this in your /etc/rc.conf to do that:
nfs_server_flags="-u -t -n 64"
(sets it to 64, which should be plenty for a single client.)
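After changing that, the server side needs to be restarted for the new
thread count to take effect; something like:
service nfsd restart
(or just a reboot) should do it.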

rick

> Thanks!

