Terrible ix performance

Lawrence Stewart lstewart at freebsd.org
Thu Jul 4 02:01:35 UTC 2013


On 07/04/13 10:18, Kevin Oberman wrote:
> On Wed, Jul 3, 2013 at 4:21 PM, Steven Hartland <killing at multiplay.co.uk> wrote:
> 
>>
>> ----- Original Message ----- From: "Outback Dingo" <outbackdingo at gmail.com>
>> To: "Lawrence Stewart" <lstewart at freebsd.org>
>> Cc: <net at freebsd.org>
>> Sent: Thursday, July 04, 2013 12:06 AM
>> Subject: Re: Terrible ix performance
>>
>>
>>
>>> On Wed, Jul 3, 2013 at 9:39 AM, Lawrence Stewart <lstewart at freebsd.org> wrote:
>>>
>>>  On 07/03/13 22:58, Outback Dingo wrote:
>>>>> On Wed, Jul 3, 2013 at 4:50 AM, Lawrence Stewart <lstewart at freebsd.org> wrote:
>>>>>
>>>>>     On 07/03/13 14:28, Outback Dingo wrote:
>>>>>     > I've got a high-end storage server here, iperf shows decent network io
>>>>>     >
>>>>>     > iperf -i 10 -t 20 -c 10.0.96.1 -w 2.5M -l 2.5M
>>>>>     > ------------------------------------------------------------
>>>>>     > Client connecting to 10.0.96.1, TCP port 5001
>>>>>     > TCP window size: 2.50 MByte (WARNING: requested 2.50 MByte)
>>>>>     > ------------------------------------------------------------
>>>>>     > [  3] local 10.0.96.2 port 34753 connected with 10.0.96.1 port 5001
>>>>>     > [ ID] Interval       Transfer     Bandwidth
>>>>>     > [  3]  0.0-10.0 sec  9.78 GBytes  8.40 Gbits/sec
>>>>>     > [  3] 10.0-20.0 sec  8.95 GBytes  7.69 Gbits/sec
>>>>>     > [  3]  0.0-20.0 sec  18.7 GBytes  8.05 Gbits/sec
>>>>>
>>>>>     Given that iperf exercises the ixgbe driver (ix), network path and TCP,
>>>>>     I would suggest that your subject is rather misleading ;)
>>>>>
>>>>>     > the card has a 3 meter twinax cable from Cisco connected to it,
>>>>>     > going through a Fujitsu switch. We have tweaked various networking
>>>>>     > and kernel sysctls, however from an sftp and nfs session I can't get
>>>>>     > better than 100MB/s from a zpool with 8 mirrored vdevs. We also have
>>>>>     > an identical box that will get 1.4Gb/s with a 1 meter Cisco twinax
>>>>>     > cable, which writes at 2.4Gb/s compared to reads of only 1.4Gb/s...
>>>>>
>>>>>     I take it the RTT between both hosts is very low, i.e. sub-1ms?
>>>>
>>>> An answer to the above question would be useful.
>>>>
>>>>>     > does anyone have an idea of what the bottleneck could be? This is a
>>>>>     > shared storage array with dual LSI controllers connected to 32
>>>>>     > drives via an enclosure; local dd and other tests show the zpool
>>>>>     > performs quite well. However, as soon as we introduce any type of
>>>>>     > protocol (sftp, samba, nfs) performance plummets. I'm quite puzzled
>>>>>     > and have run out of ideas, so now curiosity has me... it's loading
>>>>>     > the ix driver and working, but not up to speed.
>>>>>
>>>>>     ssh (and sftp by extension) aren't often tuned for high speed
>>>>>     operation. Are you running with the HPN patch applied or a new
>>>>>     enough FreeBSD that has the patch included? Samba and NFS are both
>>>>>     likely to need tuning for multi-Gbps operation.
>>>>>
>>>>>
>>>>> Running 9-STABLE as of 3 days ago, what are you referring to so I can
>>>>> validate I don't need to apply it?
>>>>
>>>> Ok so your SSH should have the HPN patch.
>>>>
>>>>> as for tuning for NFS/Samba: Samba is configured with AIO and sendfile,
>>>>> and there's so much information on tuning these things that it's a bit
>>>>> hard to decipher what's right and what's not.
>>>>
>>>> Before looking at tuning, I'd suggest testing with a protocol that
>>>> involves the disk but isn't as heavyweight as SSH/NFS/CIFS. FTP is the
>>>> obvious choice. Set up an inetd-based FTP instance, serve a file large
>>>> enough that it will take ~60s to transfer to the client and report back
>>>> what data rates you get from 5 back-to-back transfer trials.
>>>>
>>>>
>>> on the 1GB interface I get 100MB/s, on the 10GB interface I get 250MB/s
>>> via NFS.
>>> On the 1GB interface I get 112MB/s, on the 10GB interface I get:
>>>
>>> ftp> put TEST3
>>> 53829697536 bytes sent in 01:56 (439.28 MiB/s)
>>> ftp> get TEST3
>>> 53829697536 bytes received in 01:21 (632.18 MiB/s)
>>> ftp> get TEST3
>>> 53829697536 bytes received in 01:37 (525.37 MiB/s)
>>> ftp> put TEST3
>>> 43474223104 bytes sent in 01:50 (376.35 MiB/s)
>>> ftp> put TEST3
>>> local: TEST3 remote: TEST3
>>> 229 Entering Extended Passive Mode (|||10613|)
>>> 226 Transfer complete
>>> 43474223104 bytes sent in 01:41 (410.09 MiB/s)
>>> ftp>
>>>
>>> so still about 50% performance on 10GB
>>>
>>
>> Out of interest, have you tried limiting the number of queues?
>>
>> If not, give it a try and see if it helps; add the following to
>> /boot/loader.conf:
>> hw.ixgbe.num_queues=1
>>
>> If nothing else, it will give you another data point.

As noted in my first post to this thread, if iperf is able to push a
single flow at 8Gbps, then the NIC is unlikely to be the source of the
problem and trying to tune it is a waste of time (at least at this stage).

iperf tests memory-to-network-to-memory transfer speed without any disk
involvement, so the fact that it gets 8Gbps while ftp gets around 4Gbps
implies one of two things: either the iperf TCP tuning is better (only
likely to be relevant if the RTT is very large - Outback Dingo, you still
haven't provided us with the RTT), or the disk subsystem at one or both
ends is slowing things down.
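
To help rule the pool in or out on the sending side, it may also be worth
re-running the kind of local dd read test mentioned earlier in the thread
against a large file, e.g. something along these lines (the path below is
only illustrative, and note that ZFS caching can inflate a repeat read of
the same file):

  dd if=/pool/path/to/TEST3 of=/dev/null bs=1m   # illustrative path

If that comes in well above the 400-630MiB/s range you see via ftp, raw
pool reads are unlikely to be the limit.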

Outback Dingo: can you please run another iperf test without the -w
switch, on both client and server, to check whether send/receive window
autotuning is working at both ends? If all is well, you should see the
same ~8Gbps result.
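
(For reference, that is essentially the command line you posted earlier
minus the -w option, run against a plain iperf server, e.g.:

  server: iperf -s
  client: iperf -i 10 -t 20 -c 10.0.96.1 -l 2.5M

so that both ends fall back to the stack's socket buffer autotuning
rather than a fixed 2.5MB window.)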

> You might also try SIFTR to analyze the behavior and perhaps even figure
> out what the limiting factor might be.
> 
> kldload siftr
> See "Run-time Configuration" in the siftr(4) man page for details.
> 
> I'm a little surprised Lawrence didn't already suggest this as he is one of
> the authors. (The "Bugs" section is rather long and he might know that it
> won't be useful in this case, but it has greatly helped me look at
> performance issues.)

siftr is useful if you suspect a TCP/netstack tuning issue. Given that
iperf gets good results and the OP's tuning settings (4MB
sendbuf_max/recvbuf_max) should be adequate to achieve good performance
if the RTT is low, I suspect the disk subsystem and/or VM is more likely
to be the issue, i.e. siftr data is probably irrelevant.

Outback Dingo: Can you confirm you have appropriate tuning on both sides
of the connection? You didn't specify whether the loader.conf/sysctl.conf
parameters you provided in the reply to Jack are applied on only one side
of the connection or on both.
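
A quick way to confirm would be to paste the relevant values from both
hosts, something along these lines (sysctl names as on a stock 9-STABLE
TCP stack):

  sysctl kern.ipc.maxsockbuf
  sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max
  sysctl net.inet.tcp.sendbuf_auto net.inet.tcp.recvbuf_auto

along with whatever you have set in /boot/loader.conf and
/etc/sysctl.conf on each box.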

Cheers,
Lawrence

