Linux NFS client and FreeBSD server strangeness

Rick Macklem rmacklem at uoguelph.ca
Thu Apr 5 00:38:44 UTC 2018


Mike Tancsa wrote:
>Not sure where the tweaking needs to happen, but I am getting strange
>behaviour between a Linux nfs client and FreeBSD RELENG_11 NFS server.
>
>The FreeBSD server starts with
>
>
>nfs_client_enable="YES"
>nfs_server_enable="YES"
>
>
>rpcbind_enable="YES"
>rpc_lockd_enable="YES"
>rpc_statd_enable="YES"
Although it probably isn't related to what you are seeing, I avoid the NSM and NLM, since
they are fundamentally flawed protocols. You only need them for NFSv3 clients where
the clients must see each other's byte range locks.
If byte range locks only need to be visible to processes within a single client, you can get
rid of these and use the "nolockd" mount option, called "nolock" on Linux, I think?
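
For example (just a sketch; the export path and mount point here are made up), an NFSv3
mount that keeps byte range locks local to the Linux client would look something like:

# mount -t nfs -o nfsvers=3,nolock freebsd-server:/export /mnt/nfs

The FreeBSD client equivalent is the "nolockd" option to mount_nfs.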

>nfs_server_flags="-u -t -n 16"
16 nfsd threads is very low. The default (if you don't specify "-n") is 8 per core, which
is still very low. Extra ones cause very little overhead (a kernel stack for each one), so
I usually use "-n 256" if the server is going to be under any amount of load.
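
In /etc/rc.conf on the server that would be something like (keeping the -u/-t flags from
above):

nfs_server_flags="-u -t -n 256"

followed by "service nfsd restart" to pick it up.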

Another thing you can try is:
# sysctl vfs.nfsd.cachetcp=0
which disables use of the DRC (duplicate request cache) for TCP mounts. (Many NFS servers
never use the DRC for TCP mounts. I designed this one to try and make NFS over TCP more
fault tolerant, but it does result in quite a bit of overhead for write loads.)
If disabling it fixes the problem but you still want to use the DRC, it can be tuned with
something like:
vfs.nfsd.tcpcachetimeo=300 (five minutes instead of hours)
vfs.nfsd.tcphighwater=100  (limit of 100 cached entries)
--> The smaller you make these, the lower the overhead, but the less effective the DRC
      becomes at making NFS over TCP reliable when TCP reconnects occur.
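
To make whichever settings you end up with persist across reboots, they can go in
/etc/sysctl.conf (this assumes the NFS server code is in the kernel, as in GENERIC, so the
vfs.nfsd tree exists when sysctl.conf is processed), for example:

vfs.nfsd.cachetcp=0

or, to keep the DRC but bound its cost:

vfs.nfsd.tcpcachetimeo=300
vfs.nfsd.tcphighwater=100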

There are several tunables for NFSv4 (but none of these affect NFSv3):
vfs.nfsd.sessionhashsize=1000
vfs.nfsd.fhhashsize=1000
vfs.nfsd.clienthashsize=1000
vfs.nfsd.statehashsize=100
(A fairly large system dedicated to serving NFS might make the above "1000"s "10000"s.)
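
I believe these hash sizes are boot-time tunables (the tables are sized when the NFS
server initializes), so scaling them up would be done in /boot/loader.conf, something like:

vfs.nfsd.sessionhashsize=10000
vfs.nfsd.fhhashsize=10000
vfs.nfsd.clienthashsize=10000
vfs.nfsd.statehashsize=1000

(Check "sysctl -d vfs.nfsd" on your version to confirm they exist and whether they can be
changed at runtime; the values above just scale the examples, they aren't recommendations.)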

>and on the Linux client I have been trying various options to no avail.
>The mount works, but on a straight up write to the FreeBSD server,
>everything is very bursty.  I noticed this (I think) a few months ago
>where Linux dumps across an nfs mount seemed to take a lot longer and
>were getting very bursty.
>
>It seems if there are a mixture of reads and writes, everything is
>pretty fast. But if a client is just writing to the server, something,
>somewhere is blocking.  Doing something simple like
>ls -l /nfsmount
>from the client "wakes" up the server/client so that write stream can
>keep going. Otherwise, it will do a big blast of writes and then several
>seconds of pausing on the dump.
This sounds like a network device driver issue to me. The main difference between
a FreeBSD client and a Linux client that I am aware of is that the Linux client likes
to do page size (4K) writes, so it generates lots of them.

One example might be interrupt moderation. It's a wonderful thing for some TCP loads,
but it can be a terrible thing for NFS loads. Basically, anything that adds delay to interrupt
delivery/processing increases latency, and latency kills NFS performance, from what I've
seen.
Someone else suggested disabling TSO, which is often broken in the net device drivers.
If you have a different type of net interface that uses a different driver, you might try
that and see if it has the same problem.
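
As a concrete example (the interface name here is just a placeholder; use whatever the
server actually has), TSO can be turned off temporarily with:

# ifconfig igb0 -tso

and made persistent by adding "-tso" to the matching ifconfig_igb0 line in /etc/rc.conf.
If that helps, the real fix is in the driver, but it at least narrows things down.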

I might look at your packet trace someday, but I haven't yet.

Good luck with it, rick
[stuff snipped]

