Terrible NFS performance under 9.2-RELEASE?
J David
j.david.lists at gmail.com
Mon Jan 20 04:11:10 UTC 2014
On Sun, Jan 19, 2014 at 9:32 AM, Alfred Perlstein <alfred at freebsd.org> wrote:
> I hit nearly the same problem and raising the mbufs worked for me.
>
> I'd suggest raising that and retrying.
That doesn't seem to be an issue here; mbufs are well below max on
both client and server and all the "delayed"/"denied" lines are 0/0/0.
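(For reference, that was checked with roughly the following on each box;
the grep pattern is mine, and the full netstat -m output from both sides
appears further down in this message:)

  $ netstat -m | egrep 'clusters in use|denied|delayed'
  $ sysctl kern.ipc.nmbclusters   # the limit that "raising the mbufs" usually means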
On Sun, Jan 19, 2014 at 12:58 PM, Adam McDougall <mcdouga9 at egr.msu.edu> wrote:
> Also try rsize=32768,wsize=32768 in your mount options, made a huge
> difference for me.
This does make a difference, but inconsistently.
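(Concretely, a 32k mount done along these lines; the export path and
mount point here are placeholders, not the actual ones:)

  $ mount -t nfs -o nfsv3,tcp,rsize=32768,wsize=32768 172.20.20.162:/export /mnt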
To test this further, I created a Debian guest on the same host as
these two FreeBSD guests and re-ran the tests with it acting first as
client and then as server, at both 32k and 64k sizes.
Findings:
                        write  rewrite    read   reread  random  random
                                                           read   write
S:FBSD,C:FBSD,Z:64k     67246     2923  103295  1272407  172475     196
S:FBSD,C:FBSD,Z:32k     11951    99896  223787  1051948  223276   13686
S:FBSD,C:DEB,Z:64k      11414    14445   31554    30156   30368   13799
S:FBSD,C:DEB,Z:32k      11215    14442   31439    31026   29608   13769
S:DEB,C:FBSD,Z:64k      36844   173312  313919  1169426  188432   14273
S:DEB,C:FBSD,Z:32k      66928   120660  257830  1048309  225807   18103
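(Numbers are iozone throughput figures, KB/s by default. A run of
roughly this shape produces those six columns; the record size, file
size, and target path below are illustrative guesses, not the exact
invocation:)

  $ iozone -i 0 -i 1 -i 2 -r 64k -s 512m -f /mnt/iozone.tmp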
So the rsize/wsize setting makes a difference between the two FreeBSD
nodes, but with a Debian node as either client or server it no longer
seems to matter much. And /proc/mounts on the Debian box confirms that
it negotiates and honors the 64k size as a client.
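(That check is just the following on the Debian guest, looking for the
negotiated sizes in the option list:)

  $ grep nfs /proc/mounts   # shows rsize=65536,wsize=65536 for the 64k mount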
On Sun, Jan 19, 2014 at 6:36 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> Yes, it shouldn't make a big difference but it sometimes does. When it
> does, I believe that indicates there is a problem with your network
> fabric.
Given that this is an entirely virtual environment, if your belief is
correct, where would supporting evidence be found?
As far as I can tell, there are no interface errors reported on the
host (checking both the taps and the bridge) or on any of the guests,
nothing of concern in sysctl dev.vtnet, etc. Also, the improvement from
using Debian on either side, even at the 64k size, seems counterintuitive.
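(On the guest side those checks amount to roughly the following; the
host-side tap and bridge counters were checked with the host's own
tools:)

  $ netstat -i        # per-interface error counters
  $ sysctl dev.vtnet  # vtnet driver settings and statistics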
To help vindicate the network stack, I ran iperf -d between the two
FreeBSD nodes while iozone was running:
Server:
$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[ 4] local 172.20.20.162 port 5001 connected with 172.20.20.169 port 37449
------------------------------------------------------------
Client connecting to 172.20.20.169, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[ 6] local 172.20.20.162 port 28634 connected with 172.20.20.169 port 5001
Waiting for server threads to complete. Interrupt again to force quit.
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 15.8 GBytes 13.6 Gbits/sec
[ 4] 0.0-10.0 sec 15.6 GBytes 13.4 Gbits/sec
Client:
$ iperf -c 172.20.20.162 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.20.20.162, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[ 5] local 172.20.20.169 port 32533 connected with 172.20.20.162 port 5001
[ 4] local 172.20.20.169 port 5001 connected with 172.20.20.162 port 36617
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 15.6 GBytes 13.4 Gbits/sec
[ 4] 0.0-10.0 sec 15.5 GBytes 13.3 Gbits/sec
Mbuf usage is pretty low.
Server:
$ netstat -m
545/4075/4620 mbufs in use (current/cache/total)
535/1819/2354/131072 mbuf clusters in use (current/cache/total/max)
535/1641 mbuf+clusters out of packet secondary zone in use (current/cache)
0/2034/2034/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
1206K/12792K/13999K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
Client:
$ netstat -m
1841/3544/5385 mbufs in use (current/cache/total)
1172/1198/2370/32768 mbuf clusters in use (current/cache/total/max)
512/896 mbuf+clusters out of packet secondary zone in use (current/cache)
0/2314/2314/16384 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/8192 9k jumbo clusters in use (current/cache/total/max)
0/0/0/4096 16k jumbo clusters in use (current/cache/total/max)
2804K/12538K/15342K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
Here's 60 seconds of netstat -ss for IP and TCP from the server, with
the 64k mount, while running iozone:
ip:
4776 total packets received
4758 packets for this host
18 packets for unknown/unsupported protocol
2238 packets sent from this host
tcp:
2244 packets sent
1427 data packets (238332 bytes)
5 data packets (820 bytes) retransmitted
812 ack-only packets (587 delayed)
2235 packets received
1428 acks (for 238368 bytes)
2007 packets (91952792 bytes) received in-sequence
225 out-of-order packets (325800 bytes)
1428 segments updated rtt (of 1426 attempts)
5 retransmit timeouts
587 correct data packet header predictions
225 SACK options (SACK blocks) sent
And with 32k mount:
ip:
24172 total packets received
24167 packets for this host
5 packets for unknown/unsupported protocol
26130 packets sent from this host
tcp:
26130 packets sent
23506 data packets (5362120 bytes)
2624 ack-only packets (454 delayed)
21671 packets received
18143 acks (for 5362192 bytes)
20278 packets (756617316 bytes) received in-sequence
96 out-of-order packets (145964 bytes)
18143 segments updated rtt (of 17469 attempts)
1093 correct ACK header predictions
3449 correct data packet header predictions
111 SACK options (SACK blocks) sent
So the 32k mount sends about 6x the packet volume. (This is on
iozone's linear write test.)
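(In case it's useful, one way to take a 60-second delta like the above;
this assumes netstat's -z reset flag and is not necessarily how the
numbers above were captured:)

  $ netstat -s -z > /dev/null   # display and reset the statistics counters
  $ sleep 60
  $ netstat -ss -p ip ; netstat -ss -p tcp   # non-zero counters only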
One thing I've noticed is that when the 64k connection bogs down, it
seems to "poison" things for a while. For example, iperf will start
doing this afterward:
From the client to the server:
$ iperf -c 172.20.20.162
------------------------------------------------------------
Client connecting to 172.20.20.162, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[ 3] local 172.20.20.169 port 14337 connected with 172.20.20.162 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 4.88 MBytes 4.05 Mbits/sec
Ouch! That's quite a drop from 13 Gbit/sec. Weirdly, iperf to the
Debian node is not affected:
From the client to the Debian node:
$ iperf -c 172.20.20.166
------------------------------------------------------------
Client connecting to 172.20.20.166, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[ 3] local 172.20.20.169 port 24376 connected with 172.20.20.166 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 20.4 GBytes 17.5 Gbits/sec
From the Debian node to the server:
$ iperf -c 172.20.20.162
------------------------------------------------------------
Client connecting to 172.20.20.162, TCP port 5001
TCP window size: 23.5 KByte (default)
------------------------------------------------------------
[ 3] local 172.20.20.166 port 43166 connected with 172.20.20.162 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 12.9 GBytes 11.1 Gbits/sec
But if I let it run longer, it will apparently figure things out and
creep back up to normal speed, and stay there until NFS strikes again.
It's like the kernel is caching some sort of hint that connectivity to
that other host sucks, and that hint has to either expire or be slowly
overcome.
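(If that theory is right, the natural suspect would be the TCP host
cache, which keeps per-destination metrics such as RTT and ssthresh
across connections. A way to look at and flush it, assuming the stock
hostcache sysctls on 9.x; I haven't verified this on these guests:)

  $ sysctl net.inet.tcp.hostcache.list | grep 172.20.20.162   # cached metrics for the server
  $ sysctl net.inet.tcp.hostcache.purge=1   # as root: purge entries at the next prune pass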
Client:
$ iperf -c 172.20.20.162 -t 60
------------------------------------------------------------
Client connecting to 172.20.20.162, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[ 3] local 172.20.20.169 port 59367 connected with 172.20.20.162 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 56.2 GBytes 8.04 Gbits/sec
Server:
$ netstat -I vtnet1 -ihw 1
input (vtnet1) output
packets errs idrops bytes packets errs bytes colls
7 0 0 420 0 0 0 0
7 0 0 420 0 0 0 0
8 0 0 480 0 0 0 0
8 0 0 480 0 0 0 0
7 0 0 420 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
11 0 0 12k 3 0 206 0
<--- starts here
17 0 0 227k 10 0 660 0
17 0 0 408k 10 0 660 0
17 0 0 417k 10 0 660 0
17 0 0 425k 10 0 660 0
17 0 0 438k 10 0 660 0
17 0 0 444k 10 0 660 0
16 0 0 453k 10 0 660 0
input (vtnet1) output
packets errs idrops bytes packets errs bytes colls
16 0 0 463k 10 0 660 0
16 0 0 469k 10 0 660 0
16 0 0 482k 10 0 660 0
16 0 0 487k 10 0 660 0
16 0 0 496k 10 0 660 0
16 0 0 504k 10 0 660 0
18 0 0 510k 10 0 660 0
16 0 0 521k 10 0 660 0
17 0 0 524k 10 0 660 0
17 0 0 538k 10 0 660 0
17 0 0 540k 10 0 660 0
17 0 0 552k 10 0 660 0
17 0 0 554k 10 0 660 0
17 0 0 567k 10 0 660 0
16 0 0 568k 10 0 660 0
16 0 0 581k 10 0 660 0
16 0 0 582k 10 0 660 0
16 0 0 595k 10 0 660 0
16 0 0 595k 10 0 660 0
16 0 0 609k 10 0 660 0
16 0 0 609k 10 0 660 0
input (vtnet1) output
packets errs idrops bytes packets errs bytes colls
16 0 0 620k 10 0 660 0
16 0 0 623k 10 0 660 0
17 0 0 632k 10 0 660 0
17 0 0 637k 10 0 660 0
8.7k 0 0 389M 4.4k 0 288k 0
42k 0 0 2.1G 21k 0 1.4M 0
41k 0 0 2.1G 20k 0 1.4M 0
38k 0 0 1.9G 19k 0 1.2M 0
40k 0 0 2.0G 20k 0 1.3M 0
40k 0 0 2.0G 20k 0 1.3M 0
40k 0 0 2G 20k 0 1.3M 0
39k 0 0 2G 20k 0 1.3M 0
43k 0 0 2.2G 22k 0 1.4M 0
42k 0 0 2.2G 21k 0 1.4M 0
39k 0 0 2G 19k 0 1.3M 0
38k 0 0 1.9G 19k 0 1.2M 0
42k 0 0 2.1G 21k 0 1.4M 0
44k 0 0 2.2G 22k 0 1.4M 0
41k 0 0 2.1G 20k 0 1.3M 0
41k 0 0 2.1G 21k 0 1.4M 0
40k 0 0 2.0G 20k 0 1.3M 0
input (vtnet1) output
packets errs idrops bytes packets errs bytes colls
43k 0 0 2.2G 22k 0 1.4M 0
41k 0 0 2.1G 20k 0 1.3M 0
40k 0 0 2.0G 20k 0 1.3M 0
42k 0 0 2.2G 21k 0 1.4M 0
39k 0 0 2G 19k 0 1.3M 0
42k 0 0 2.1G 21k 0 1.4M 0
40k 0 0 2.0G 20k 0 1.3M 0
42k 0 0 2.1G 21k 0 1.4M 0
38k 0 0 2G 19k 0 1.3M 0
39k 0 0 2G 20k 0 1.3M 0
45k 0 0 2.3G 23k 0 1.5M 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
It almost looks like something is limiting it to 10 packets per
second. So confusing! TCP super slow start?
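(One way to tell RTO backoff apart from slow start would be to watch
the retransmit-timeout counter tick up during a stall; a rough sketch,
not something I captured above:)

  $ while true; do netstat -s -p tcp | grep 'retransmit timeouts'; sleep 1; done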
Thanks!
(Sorry Rick, forgot to reply all so you got an extra! :( )
Also, here's the netstat from the client side showing the 10 packets
per second limit and eventual recovery:
$ netstat -I net1 -ihw 1
input (net1) output
packets errs idrops bytes packets errs bytes colls
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
15 0 0 962 11 0 114k 0
17 0 0 1.1k 10 0 368k 0
17 0 0 1.1k 10 0 411k 0
17 0 0 1.1k 10 0 425k 0
17 0 0 1.1k 10 0 432k 0
17 0 0 1.1k 10 0 439k 0
17 0 0 1.1k 10 0 452k 0
16 0 0 1k 10 0 457k 0
16 0 0 1k 10 0 467k 0
16 0 0 1k 10 0 477k 0
16 0 0 1k 10 0 481k 0
16 0 0 1k 10 0 495k 0
16 0 0 1k 10 0 498k 0
16 0 0 1k 10 0 510k 0
16 0 0 1k 10 0 515k 0
16 0 0 1k 10 0 524k 0
17 0 0 1.1k 10 0 532k 0
input (net1) output
packets errs idrops bytes packets errs bytes colls
17 0 0 1.1k 10 0 538k 0
17 0 0 1.1k 10 0 548k 0
17 0 0 1.1k 10 0 552k 0
17 0 0 1.1k 10 0 562k 0
17 0 0 1.1k 10 0 566k 0
16 0 0 1k 10 0 576k 0
16 0 0 1k 10 0 580k 0
16 0 0 1k 10 0 590k 0
17 0 0 1.1k 10 0 594k 0
16 0 0 1k 10 0 603k 0
16 0 0 1k 10 0 609k 0
16 0 0 1k 10 0 614k 0
16 0 0 1k 10 0 623k 0
16 0 0 1k 10 0 626k 0
17 0 0 1.1k 10 0 637k 0
18 0 0 1.1k 10 0 637k 0
17k 0 0 1.1M 34k 0 1.7G 0
21k 0 0 1.4M 42k 0 2.1G 0
20k 0 0 1.3M 39k 0 2G 0
19k 0 0 1.2M 38k 0 1.9G 0
20k 0 0 1.3M 41k 0 2.0G 0
input (net1) output
packets errs idrops bytes packets errs bytes colls
20k 0 0 1.3M 40k 0 2.0G 0
19k 0 0 1.2M 38k 0 1.9G 0
22k 0 0 1.5M 45k 0 2.3G 0
20k 0 0 1.3M 40k 0 2.1G 0
20k 0 0 1.3M 40k 0 2.1G 0
18k 0 0 1.2M 36k 0 1.9G 0
21k 0 0 1.4M 41k 0 2.1G 0
22k 0 0 1.4M 44k 0 2.2G 0
21k 0 0 1.4M 43k 0 2.2G 0
20k 0 0 1.3M 41k 0 2.1G 0
20k 0 0 1.3M 40k 0 2.0G 0
21k 0 0 1.4M 43k 0 2.2G 0
21k 0 0 1.4M 43k 0 2.2G 0
20k 0 0 1.3M 40k 0 2.0G 0
21k 0 0 1.4M 43k 0 2.2G 0
19k 0 0 1.2M 38k 0 1.9G 0
21k 0 0 1.4M 42k 0 2.1G 0
20k 0 0 1.3M 40k 0 2.0G 0
21k 0 0 1.4M 42k 0 2.1G 0
20k 0 0 1.3M 40k 0 2.0G 0
20k 0 0 1.3M 40k 0 2.0G 0
input (net1) output
packets errs idrops bytes packets errs bytes colls
24k 0 0 1.6M 48k 0 2.5G 0
6.3k 0 0 417k 12k 0 647M 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0
6 0 0 360 0 0 0 0