Linux NFS client and FreeBSD server strangeness

Thu Apr 5 04:39:06 UTC 2018

On Wed, 4 Apr 2018, Kaya Saman wrote:

> If I recall correctly the "sync" option is default. Though it might be
> different depending on the Linux distro in use?
>
> I use this: vers=3,defaults,auto,tcp,rsize=8192,wsize=8192
>
> though I could get rid of the tcp as that's also a default. The rsize
> and wsize options are for running Jumbo Frames, i.e. an MTU larger than
> 1500; in my case 9000 for 1Gbps links.

These rsize and wsize options are pessimizations.  They override the
default sizes, which are usually much larger for tcp.
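To check whether a mount option string overrides the defaults this way, you can split the string and look for explicit rsize/wsize values. The helper below is a minimal sketch with a hypothetical name. It only parses text, so it cannot tell you which sizes the kernel actually chose.

```python
def explicit_sizes(opts):
    """Return any rsize/wsize values set explicitly in a comma-separated
    mount option string (hypothetical helper, not part of any tool)."""
    found = {}
    for opt in opts.split(","):
        key, sep, value = opt.strip().partition("=")
        if sep and key in ("rsize", "wsize"):
            found[key] = int(value)
    return found


opts = "vers=3,defaults,auto,tcp,rsize=8192,wsize=8192"
print(explicit_sizes(opts))  # {'rsize': 8192, 'wsize': 8192}
```

An empty result means the mount is left to the (undocumented) defaults.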

The defaults are not documented in the man page, and the current
settings are almost equally impossible to see (e.g., mount -v doesn't
show them).  The defaults are not quite impossible to see in the source
code of course, but the source code for them is especially convoluted.
It seems to give the following results:
- for udp, the initial defaults are NFS_RSIZE and NFS_WSIZE.  These are
   8K.  This is almost simple.  The values are low because at least some
   versions have bugs with larger values.  32K never worked well for me.
   It hangs in some versions and is slower in others.  16K works for me.
- for tcp, the initial defaults are maxbcachebuf.  This is a read-only
   tunable.  It defaults to MAXBCACHEBUF.  This is an unsupported option.
   It defaults to MAXBSIZE.  MAXBSIZE is honestly non-optional -- it is
   always 64K unless the source code is edited.  Unsupported for an option
   means that it is ifdefed in a header file but is not in conf/options*,
   so it must be added to CFLAGS in some way before every include of the
   file (or just the ones that use MAXBCACHEBUF, if you know what they are).
   MAXBCACHEBUF has a maximum of MAXPHYS.  MAXPHYS is a supported option
   with a default of 128K.  Although it is supported, it is much harder
   to change, since it can reasonably be used in applications where the
   support is null.
- for remount, the defaults are from the previous mount.  Their values
   are almost impossible to see.  Large values from a previous tcp mount
   tend to break remounting with udp.
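You can model the chain of defaults described above as a small function. This is a sketch under stated assumptions. The constants are the values quoted above, and the function and argument names are hypothetical. The model also ignores any limits the server may impose and any other adjustments the real kernel code makes.

```python
# Values as described above; in the kernel these are compile-time constants.
K = 1024
NFS_RSIZE = NFS_WSIZE = 8 * K   # udp defaults
MAXBSIZE = 64 * K               # default for MAXBCACHEBUF
MAXPHYS = 128 * K               # upper limit for maxbcachebuf


def initial_sizes(transport, maxbcachebuf=None, previous=None):
    """Sketch of how the initial (rsize, wsize) are chosen.  Hypothetical
    names; `maxbcachebuf` stands for the tunable, `previous` for the sizes
    of an earlier mount when remounting."""
    if previous is not None:
        # Remount: the previous mount's values are reused as the defaults.
        return previous
    if transport == "udp":
        return (NFS_RSIZE, NFS_WSIZE)
    if transport == "tcp":
        # maxbcachebuf defaults to MAXBCACHEBUF (= MAXBSIZE), capped at MAXPHYS.
        size = MAXBSIZE if maxbcachebuf is None else min(maxbcachebuf, MAXPHYS)
        return (size, size)
    raise ValueError(f"unknown transport: {transport!r}")


print(initial_sizes("udp"))                           # (8192, 8192)
print(initial_sizes("tcp"))                           # (65536, 65536)
print(initial_sizes("udp", previous=(65536, 65536)))  # (65536, 65536)
```

The last call shows a udp remount inheriting 64K sizes from an earlier tcp mount. That is the case described above as tending to break.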

I always use udp since tcp is just slower (higher overhead and latency).
I usually force the sizes to 8K since I don't want them to depend on
undocumented defaults or to change if these defaults change.  Changes
invalidate benchmarks.  Sometimes I force them to 16K to see if this is
an optimization; if it is, I keep it and update the benchmark results.
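Pinning the sizes amounts to always spelling them out in the mount options. Below is a minimal sketch with a hypothetical helper name. It assumes the option names of FreeBSD's mount_nfs(8) (nfsv3, udp, tcp, rsize=, wsize=). A Linux client spells the version option vers=3, so check your client's man page.

```python
def forced_nfs_opts(size=8 * 1024, transport="udp"):
    """Build a mount option string that pins rsize/wsize explicitly, so the
    result does not depend on undocumented defaults (hypothetical helper)."""
    if transport not in ("udp", "tcp"):
        raise ValueError(f"unknown transport: {transport!r}")
    return f"nfsv3,{transport},rsize={size},wsize={size}"


print(forced_nfs_opts())           # nfsv3,udp,rsize=8192,wsize=8192
print(forced_nfs_opts(16 * 1024))  # nfsv3,udp,rsize=16384,wsize=16384
```

Recording this string alongside each benchmark result shows at once when a comparison mixes runs with different sizes.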

I don't like large block sizes and mostly use 16K for all file systems,
but 32K is now better for throughput.  64K is just slower in all of my
tests, because it is too large for small metadata.
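Some back-of-envelope arithmetic shows the direction of this trade-off. The sketch below assumes every I/O moves whole blocks and ignores fragments, caching and read-ahead. It does not explain why 32K came out ahead in the tests above; it only shows what each doubling costs and saves. The function names are hypothetical.

```python
K = 1024


def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def block_tradeoff(file_size, item_size, bsize):
    """Return (transfers needed to move file_size bytes,
    bytes moved but not wanted when reading one item_size object),
    assuming whole-block I/O."""
    transfers = ceil_div(file_size, bsize)
    wasted = ceil_div(item_size, bsize) * bsize - item_size
    return transfers, wasted


for bsize in (8 * K, 16 * K, 32 * K, 64 * K):
    t, w = block_tradeoff(file_size=1024 * K, item_size=2 * K, bsize=bsize)
    print(f"{bsize // K:2d}K blocks: {t:3d} transfers per 1M file, "
          f"{w // K:2d}K extra per 2K metadata item")
```

Each doubling halves the transfer count for a 1M file (from 128 down to 16). Meanwhile the bytes moved for a 2K item grow from 8K to 64K.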

nfs (v3) used to (5-30 years ago) have bursty behaviour even with a
FreeBSD client and server.  IIRC, this was due to too little write combining
and too many daemons on the server.  (You will have to mount the client
async to give the server a chance to combine small writes.  Large block
sizes give large writes which may arrive out of order and so need
recombining, and in the sync case this means waiting.  Too many daemons
give more reordering.)  The server couldn't keep up with the client, and
the client stopped to let the server catch up.  After fixing this, the
problem moved to the client not being able to keep up with the disks (for
copying uncached files).  This gave less bursty behaviour, e.g., throughput
consistently 10-20% below the disk bandwidth, with network bandwidth to
spare.
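The recombining step mentioned above amounts to sorting pending writes by offset and merging the ones that touch. Below is a minimal sketch, assuming writes are plain (offset, length) pairs held until a batch is complete. The function name is hypothetical, and this is not the FreeBSD NFS server's actual algorithm. A real server also has to carry the data, honour commit semantics and decide how long to wait.

```python
def coalesce_writes(writes):
    """Merge (offset, length) write requests that may arrive out of order
    into the fewest contiguous extents (illustrative only)."""
    extents = []  # list of [start, end) pairs, kept sorted by start
    for off, length in sorted(writes):
        end = off + length
        if extents and off <= extents[-1][1]:
            # Touches or overlaps the previous extent: extend it.
            extents[-1][1] = max(extents[-1][1], end)
        else:
            extents.append([off, end])
    return [(start, end - start) for start, end in extents]


# Three 8K writes arriving out of order, plus one unrelated write.
arrived = [(8192, 8192), (0, 8192), (16384, 8192), (40960, 4096)]
print(coalesce_writes(arrived))  # [(0, 24576), (40960, 4096)]
```

With an async client mount the server can hold writes long enough to do this. In the sync case each write must be handled before the client continues.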

Bruce