Why is NFSv4 so slow?

Rick C. Petty rick-freebsd2009 at kiwi-computer.com
Mon Aug 30 17:23:01 UTC 2010

On Sun, Aug 29, 2010 at 11:44:06AM -0400, Rick Macklem wrote:
> > Hi. I'm still having problems with NFSv4 being very laggy on one
> > client.
> > When the NFSv4 server is at 50% idle CPU and the disks are < 1% busy,
> > I am
> > getting horrible throughput on an idle client. Using dd(1) with 1 MB
> > block
> > size, when I try to read a > 100 MB file from the client, I'm getting
> > around 300-500 KiB/s. On another client, I see upwards of 20 MiB/s
> > with
> > the same test (on a different file). On the broken client:
> Since other client(s) are working well, that seems to suggest that it
> is a network related problem and not a bug in the NFS code.

Well, I wouldn't say "well".  Every client I've set up has had this issue,
and somehow, through tweaking various settings and restarting nfs a bunch
of times, I've been able to make it tolerable for most clients.  Only one
client is behaving well, and that happens to be the only machine I haven't
rebooted since I enabled NFSv4.  The other clients see 2-3 MiB/s on my
dd(1) test.

I should point out that caching is a factor.  The second time I run dd on
the same input file, I get upwards of 20-35 MiB/s on the "bad" client.  But
I can "invalidate" the cache by unmounting and remounting the file system,
so it looks like client-side caching.
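For reference, the test looks roughly like this (the file path, server
name, and mount point are placeholders, not my actual ones):

```shell
# Hypothetical paths/server; 1 MB block size as in the tests above.
FILE=/mnt/nfs/bigfile        # any >100 MB file on the NFSv4 mount

# First pass: cold client cache.
dd if=$FILE of=/dev/null bs=1m

# Drop the client-side cache by remounting, then repeat.
umount /mnt/nfs
mount -t nfs -o nfsv4 server:/export /mnt/nfs
dd if=$FILE of=/dev/null bs=1m
```

The second dd reports the cached numbers; the remount forces the first
(slow) case again.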

I'm not sure how you can say it's network-related and not NFS.  Things
worked just fine with NFSv3 (in fact, an NFSv3 client of the same NFSv4
server doesn't have this problem).  Using rsync over ssh I get around 15-20
MiB/s throughput, and dd piped through ssh gets almost 40 MiB/s (neither
one is using compression)!

> First off, the obvious question: How does this client differ from the
> one that performs much better?

Different hardware (CPU, board, memory).  I'm also hoping it was some
sysctl tweak I did, but I can't seem to determine what it was.

> Do they both use the same "re" network interface for the NFS traffic?
> (If the answer is "no", I'd be suspicious that the "re" hardware or
> device driver is the culprit.)

That's the same thing you and others said about the *other* NFSv4 clients
I set up.  How is v4 that much different than v3 in terms of network
traffic?  The other clients are all using re0 and exactly the same
ifconfig "options" and flags, including the client that's behaving fine.

> Things that I might try in an effort to isolate the problem:
> - switch the NFS traffic to use the nfe0 net interface.

I'll consider it.  I'm not convinced it's a NIC problem yet.

> - put a net interface identical to the one on the client that
>   works well in the machine and use that for the NFS traffic.

It's already close enough.  Bad client:

re0 at pci0:1:7:0: class=0x020000 card=0x816910ec chip=0x816910ec rev=0x10 hdr=0x00
    vendor     = 'Realtek Semiconductor'
    device     = 'Single Gigabit LOM Ethernet Controller (RTL8110)'
    class      = network
    subclass   = ethernet

Good client:

re0 at pci0:1:0:0: class=0x020000 card=0x84321043 chip=0x816810ec rev=0x06 hdr=0x00
    vendor     = 'Realtek Semiconductor'
    device     = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111/8111c)'
    class      = network
    subclass   = ethernet

Mediocre client:

re0 at pci0:1:0:0: class=0x020000 card=0x84321043 chip=0x816810ec rev=0x06 hdr=0x00
    vendor     = 'Realtek Semiconductor'
    device     = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111/8111c)'
    class      = network
    subclass   = ethernet

The mediocre and good clients have exactly identical hardware.  I'll often
see the "slow client" behavior on the mediocre client, and rarely on the
"good" client, although in previous emails to you it was the "good" client
that behaved the worst of all.

Other differences:
good client = 8.1 GENERIC r210227M amd64 12GB RAM Athlon II X2 255
med. client = 8.1 GENERIC r209555M i386 4GB RAM Athlon II X2 255
bad client = 8.1 GENERIC r211534M i386 2GB RAM Athlon 64 X2 5200+

> - turn off TXCSUM and RXCSUM on re0

Tried that; it didn't help, although it seemed to slow things down a little.
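For the record, the toggles were along these lines (re0 as in the pciconf
listings below):

```shell
# Disable transmit/receive checksum offload on re0
# (revert later with: ifconfig re0 txcsum rxcsum).
ifconfig re0 -txcsum -rxcsum

# Verify TXCSUM/RXCSUM are gone from the options line.
ifconfig re0 | grep options
```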

> - reduce the read/write data size, using rsize=N,wsize=N on the
>   mount. (It will default to MAXBSIZE and some net interfaces don't
>   handle large bursts of received data well. If you drop it to
>   rsize=8192,wsize=8192 and things improve, then increase N until it
>   screws up.)

8k didn't improve things at all.
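For reference, the remount I used for that test was of this shape (server
name and mount point are placeholders):

```shell
# 8 KB transfer size; the idea is to bump rsize/wsize back up in
# steps until the slowdown reappears.
mount -t nfs -o nfsv4,rsize=8192,wsize=8192 server:/export /mnt/nfs

# Confirm the options actually took effect.
mount | grep nfs
```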

> - check the port configuration on the switch end, to make sure it
>   is also 1000bps-full duplex.

It is, and has been.

> - move the client to a different net port on the switch or even a
>   different switch (and change the cable, while you're at it).

I've tried that too.  The switches are fine and so are the cables.
Like I said, NFSv3 on the same mount point works just fine (dd does
around 30-35 MiB/s).

> - Look at "netstat -s" and see if there are a lot of retransmits
>   going on in TCP.

2 of 40k TCP packets retransmitted, 7k of 40k duplicate acks received.
I don't see anything else in "netstat -s" with numbers larger than 10.
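For anyone who wants to repeat this, I pulled those counters out with
something like (BSD netstat -s wording assumed):

```shell
# Show only the TCP loss/reordering counters from the full statistics dump.
netstat -s -p tcp | egrep -i 'retransmit|duplicate ack|out-of-order'
```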

> If none of the above seems to help, you could look at a packet trace
> and see what is going on. Look for TCP reconnects (SYN, SYN-ACK...)
> or places where there is a large time delay/retransmit of a TCP
> segment.

Is that something easily scriptable with tcpdump?  I'd rather not look
for such things manually.
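Here's roughly what I'd try, if that's the idea; a rough sketch only
(interface name assumed from above, and NFS here runs over TCP port 2049):

```shell
# Capture NFS traffic during a slow dd run (stop with ^C).
tcpdump -i re0 -s 0 -w /tmp/nfs.pcap port 2049

# Count SYNs: SYNs mid-session mean the TCP connection was torn down
# and reconnected.
tcpdump -nr /tmp/nfs.pcap 'tcp[tcpflags] & tcp-syn != 0' | wc -l

# Flag gaps > 0.5 s between packets -- a sign of a stalled or
# retransmitting stream ( -tt prints raw epoch timestamps in $1 ).
tcpdump -ttnr /tmp/nfs.pcap port 2049 | \
    awk '{ if (prev && $1 - prev > 0.5) print "gap:", $1 - prev, "s at", $1; prev = $1 }'
```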

> Hopefully others who are more familiar with the networking side
> can suggest other things to try, rick

I'm still not convinced it's a network issue.  Here are some numbers from
my dd(1) test on the same file on the file server, listed in the order I
ran the tests:

client	mount	first attempt	second attempt
------	-----	-------------	--------------
bad	NFSv3	 32954968 B/s	 643911087 B/s
bad	NFSv4	   439672 B/s	   6694992 B/s
med.	NFSv4	   333837 B/s	    617387 B/s
med.	NFSv3	 95043062 B/s	1617717600 B/s
good	NFSv4	 64276844 B/s	2488692465 B/s
good	NFSv3	 98051629 B/s	2697787313 B/s
bad	NFSv4	   580284 B/s	  13554608 B/s

It seems pretty obvious to me that v3 outperforms v4, and that there are
some caching effects on the client.  But even with the cache, the
performance from v4 is pretty pitiful, except for one of my clients.  I'm
not sure what I tweaked.  I'll include a diff of the relevant "sysctl -a"
outputs from those two machines:

--- bad client
+++ good client
-kern.version: FreeBSD 8.1-STABLE #5 r211534M: Sat Aug 28 15:53:10 CDT 2010
+kern.version: FreeBSD 8.1-PRERELEASE #1 r210227M: Sun Jul 18 23:24:16 CDT 2010
-kern.ipc.maxsockbuf: 1048576
+kern.ipc.maxsockbuf: 524288
-kern.ipc.max_datalen: 124
+kern.ipc.max_datalen: 92
-kern.ipc.pipekva: 114688
-kern.ipc.maxpipekva: 16777216
+kern.ipc.pipekva: 589824
+kern.ipc.maxpipekva: 207671296
-kern.ipc.numopensockets: 59
+kern.ipc.numopensockets: 202
-kern.ipc.nsfbufspeak: 5
-kern.ipc.nsfbufs: 6656
+kern.ipc.nsfbufspeak: 0
+kern.ipc.nsfbufs: 0
-kern.openfiles: 144
+kern.openfiles: 632
-kern.maxssiz: 67108864
+kern.maxssiz: 536870912
-kern.maxdsiz: 536870912
+kern.maxdsiz: 34359738368
-kern.maxbcache: 209715200
+kern.maxbcache: 0
-kern.nbuf: 7224
+kern.nbuf: 79194
-vfs.ufs.dirhash_lowmemcount: 351
+vfs.ufs.dirhash_lowmemcount: 1725
-vfs.ufs.dirhash_mem: 1123139
-vfs.freevnodes: 24960
+vfs.freevnodes: 25000
 vfs.wantfreevnodes: 25000
-vfs.numvnodes: 36966
+vfs.numvnodes: 88150
-net.inet.icmp.bmcastecho: 0
+net.inet.icmp.bmcastecho: 1
-net.inet.tcp.sendspace: 65536
-net.inet.tcp.recvspace: 131072
+net.inet.tcp.sendspace: 32768
+net.inet.tcp.recvspace: 65536
-net.inet.tcp.hostcache.count: 3
+net.inet.tcp.hostcache.count: 8
-net.inet.tcp.recvbuf_max: 16777216
+net.inet.tcp.recvbuf_max: 262144
-net.inet.tcp.sendbuf_max: 16777216
+net.inet.tcp.sendbuf_max: 262144
-net.inet.tcp.reass.overflows: 0
+net.inet.tcp.reass.overflows: 1993
-net.inet.tcp.pcbcount: 16
+net.inet.tcp.pcbcount: 34
-machdep.tsc_freq: 2712350646
+machdep.tsc_freq: 3110426281
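If the tweak is hiding in there, the TCP buffer limits jump out at me: the
good client is running with the stock, smaller values. Matching them on the
bad client would be a cheap experiment (values copied from the diff above;
this is only a guess):

```shell
# Try the good client's (smaller, stock) TCP buffer limits on the bad client.
sysctl net.inet.tcp.sendspace=32768
sysctl net.inet.tcp.recvspace=65536
sysctl net.inet.tcp.sendbuf_max=262144
sysctl net.inet.tcp.recvbuf_max=262144
sysctl kern.ipc.maxsockbuf=524288
# Remount the NFSv4 file system and rerun the dd test after each change.
```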

-- Rick

More information about the freebsd-stable mailing list