Re: Poor performance with stable/13 and Mellanox ConnectX-6 (mlx5)

From: Mike Jakubik <mike.jakubik_at_swiftsmsgateway.com>
Date: Mon, 13 Jun 2022 19:17:09 UTC
Hi,



No, I do not see any retransmissions in Linux (see the forum URL for screenshots), so I do not think this is a hardware issue. I don't think these cards have flow control on them. I also do not see any errors, drops, or collisions in netstat -i. It's as if the network stack doesn't know what to do initially; it sometimes evens out after a few seconds, see below. In Linux I get 14.6Gb/s instantly and it stays that way, with zero retries.



[root@db-02 ~]# iperf3 -i 1 -t 30 -c db-01
Connecting to host db-01, port 5201
[  5] local 192.168.10.31 port 42022 connected to 192.168.10.30 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   623 MBytes  5.23 Gbits/sec  171    640 KBytes
[  5]   1.00-2.00   sec   613 MBytes  5.14 Gbits/sec  135    543 KBytes
[  5]   2.00-3.00   sec   662 MBytes  5.55 Gbits/sec  107    471 KBytes
[  5]   3.00-4.00   sec   718 MBytes  6.02 Gbits/sec   32    350 KBytes
[  5]   4.00-5.00   sec   709 MBytes  5.95 Gbits/sec   28    685 KBytes
[  5]   5.00-6.00   sec   713 MBytes  5.98 Gbits/sec   39    603 KBytes
[  5]   6.00-7.00   sec   704 MBytes  5.91 Gbits/sec   95    540 KBytes
[  5]   7.00-8.00   sec   716 MBytes  6.01 Gbits/sec   49    466 KBytes
[  5]   8.00-9.00   sec   722 MBytes  6.06 Gbits/sec  132    752 KBytes
[  5]   9.00-10.00  sec   720 MBytes  6.04 Gbits/sec   19    649 KBytes
[  5]  10.00-11.00  sec   720 MBytes  6.04 Gbits/sec  267    474 KBytes
[  5]  11.00-12.00  sec   675 MBytes  5.65 Gbits/sec  138   1.16 MBytes
[  5]  12.00-13.00  sec  1.04 GBytes  8.96 Gbits/sec  118   1.22 MBytes
[  5]  13.00-14.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.29 MBytes
[  5]  14.00-15.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.31 MBytes
[  5]  15.00-16.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.34 MBytes
[  5]  16.00-17.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.34 MBytes
[  5]  17.00-18.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.36 MBytes
[  5]  18.00-19.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.36 MBytes
[  5]  19.00-20.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.37 MBytes
[  5]  20.00-21.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.39 MBytes
[  5]  21.00-22.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.40 MBytes
[  5]  22.00-23.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.41 MBytes
[  5]  23.00-24.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.41 MBytes
[  5]  24.00-25.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.42 MBytes
[  5]  25.00-26.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.44 MBytes
[  5]  26.00-27.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.44 MBytes
[  5]  27.00-28.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.44 MBytes
[  5]  28.00-29.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.45 MBytes
[  5]  29.00-30.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.46 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  31.1 GBytes  8.91 Gbits/sec  1330             sender
[  5]   0.00-30.00  sec  31.1 GBytes  8.91 Gbits/sec                  receiver
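
For anyone who wants to compare runs without eyeballing the columns, here is a minimal Python sketch that pulls the per-second bitrate and Retr values out of pasted iperf3 text. The `summarize` helper and its regular expression are my own illustration, not part of iperf3, and they assume the sender-side layout shown above with rates reported in Gbits/sec.

```python
import re

# Matches iperf3 sender interval rows such as:
# [  5]   0.00-1.00   sec   623 MBytes  5.23 Gbits/sec  171    640 KBytes
# Summary rows have no Cwnd column, so they are skipped on purpose.
ROW = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-(\d+\.\d+)\s+sec\s+"
    r"[\d.]+\s+[KMG]Bytes\s+([\d.]+)\s+Gbits/sec\s+(\d+)\s+[\d.]+\s+[KMG]Bytes"
)

def summarize(text):
    """Return (mean Gbit/s, total retransmits, intervals with retransmits)."""
    rates, retrs = [], []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            rates.append(float(m.group(3)))
            retrs.append(int(m.group(4)))
    if not rates:
        raise ValueError("no interval rows found")
    return sum(rates) / len(rates), sum(retrs), sum(1 for r in retrs if r > 0)

# Four rows copied from the run above, around the point where it settles.
sample = """\
[  5]  10.00-11.00  sec   720 MBytes  6.04 Gbits/sec  267    474 KBytes
[  5]  11.00-12.00  sec   675 MBytes  5.65 Gbits/sec  138   1.16 MBytes
[  5]  12.00-13.00  sec  1.04 GBytes  8.96 Gbits/sec  118   1.22 MBytes
[  5]  13.00-14.00  sec  1.29 GBytes  11.1 Gbits/sec    0   1.29 MBytes
"""
mean_rate, total_retr, lossy_intervals = summarize(sample)
print(f"{mean_rate:.2f} Gbit/s mean, {total_retr} retransmits "
      f"in {lossy_intervals} of 4 intervals")
```

Fed the whole run, the retransmit total should agree with the 1330 in the sender summary line, since every retransmit falls within the first 13 seconds.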





Thanks.

Mike Jakubik
https://www.swiftsmsgateway.com/

Disclaimer: This e-mail and any attachments are intended only for the use of the addressee(s) and may contain information that is privileged or confidential. If you are not the intended recipient, or responsible for delivering the information to the intended recipient, you are hereby notified that any dissemination, distribution, printing or copying of this e-mail and any attachments is strictly prohibited. If this e-mail and any attachments were received in error, please notify the sender by reply e-mail and delete the original message.

---- On Mon, 13 Jun 2022 14:41:05 -0400 Santiago Martinez <sm@codenetworks.net> wrote ----

Hi there, there are a lot of retransmissions there... do you see the same with Linux?

Are you seeing any drops or error counters increasing on the switch side?

Have you checked the sysctls for the card? I have never used Mellanox, but I'm pretty sure people here can help you.

You can also try disabling flow control.

Hope it helps.

Santi



On 6/13/22 20:25, Mike Jakubik wrote:




Hello,



I have two new servers with a Mellanox ConnectX-6 card linked at 25Gb/s, however, I am unable to get much more than 6Gb/s when testing with iperf3.

The servers are Lenovo SR665 (2 x AMD EPYC 7443 24-Core Processor, 256 GB RAM, Mellanox ConnectX-6 Lx 10/25GbE SFP28 2-port OCP Ethernet Adapter).

They are connected to a Dell N3224PX-ON switch. Both servers are idle and not in use, with a fresh install of stable/13-ebea872f8, nothing running on them except ssh, sendmail, etc.

When I test with iperf3 I am unable to get a higher average than about 6Gb/s. I have tried just about every knob listed in https://calomel.org/freebsd_network_tuning.html with little impact on the performance. The network cards have HW LRO enabled as per the driver documentation (though this only seems to lower IRQ usage with no impact on actual throughput).

The same exact servers tested on Linux (Fedora 34) produced nearly 3x the performance (see attached screenshots). I was able to get a steady 14.6Gb/s rate with nearly 0 retries shown in iperf. The performance on FreeBSD seems to average around 6Gb/s, but it is very sporadic during the iperf run.

I have run out of ideas, any suggestions are welcome. Considering Netflix uses very similar HW and they push 400 Gb/s tells me there is something really wrong here or Netflix isn't sharing all their secret sauce.





# ifconfig mce0
mce0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=ffed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,TXRTLMT,HWRXTSTMP,NOMAP,TXTLS4,TXTLS6,VXLAN_HWCSUM,VXLAN_HWTSO,TXTLS_RTLMT>
        ether b8:ce:f6:81:df:6a
        inet 192.168.10.31 netmask 0xffffff00 broadcast 192.168.10.255
        media: Ethernet 25GBase-CR <full-duplex,rxpause,txpause>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
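
Since flow control came up in the replies: the media line above lists rxpause and txpause, which suggests pause frames are negotiated on this link. The sketch below is one way to check that from saved ifconfig text. The `media_options` helper is hypothetical (written for this note, not an existing tool), and it assumes the options appear in angle brackets after "media:", as in the output above.

```python
import re

def media_options(ifconfig_text):
    """Return the set of options listed in <...> on the 'media:' line."""
    # Collapse whitespace so a wrapped "<...>" still follows "media:".
    flat = " ".join(ifconfig_text.split())
    # Walk forward from "media:" to the first "<", but stop at "status:"
    # so a media line without options does not pick up a later field.
    m = re.search(r"media:(?:(?!status:)[^<])*<([^>]*)>", flat)
    return set(m.group(1).split(",")) if m else set()

# Media lines as pasted in this message (wrapped by the mail client).
sample = """\
        media: Ethernet 25GBase-CR
              <full-duplex,rxpause,txpause>
        status: active
"""
opts = media_options(sample)
print("pause options on link:", sorted(opts & {"rxpause", "txpause"}))
```

An empty result would only mean the options were not listed in this form; it is not proof that flow control is off.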





[root@db-02 ~]# iperf3 -i 1 -t 30 -c db-01
Connecting to host db-01, port 5201
[  5] local 192.168.10.31 port 64695 connected to 192.168.10.30 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   930 MBytes  7.80 Gbits/sec   62    789 KBytes
[  5]   1.00-2.00   sec   942 MBytes  7.90 Gbits/sec  164    824 KBytes
[  5]   2.00-3.00   sec  1.00 GBytes  8.61 Gbits/sec  402    879 KBytes
[  5]   3.00-4.00   sec   761 MBytes  6.39 Gbits/sec   61    588 KBytes
[  5]   4.00-5.00   sec   724 MBytes  6.07 Gbits/sec  220    497 KBytes
[  5]   5.00-6.00   sec   723 MBytes  6.07 Gbits/sec   54    364 KBytes
[  5]   6.00-7.00   sec   716 MBytes  6.01 Gbits/sec  187    682 KBytes
[  5]   7.00-8.00   sec   728 MBytes  6.11 Gbits/sec   86    568 KBytes
[  5]   8.00-9.00   sec   761 MBytes  6.39 Gbits/sec   37    418 KBytes
[  5]   9.00-10.00  sec   733 MBytes  6.15 Gbits/sec    8    617 KBytes
[  5]  10.00-11.00  sec   734 MBytes  6.16 Gbits/sec  238    474 KBytes
[  5]  11.00-12.00  sec   736 MBytes  6.17 Gbits/sec  164    757 KBytes
[  5]  12.00-13.00  sec   610 MBytes  5.12 Gbits/sec  118    579 KBytes
[  5]  13.00-14.00  sec  1.02 GBytes  8.75 Gbits/sec  447    449 KBytes
[  5]  14.00-15.00  sec   728 MBytes  6.11 Gbits/sec  132    719 KBytes
[  5]  15.00-16.00  sec   724 MBytes  6.07 Gbits/sec  185    649 KBytes
[  5]  16.00-17.00  sec   597 MBytes  5.01 Gbits/sec  142    570 KBytes
[  5]  17.00-18.00  sec   733 MBytes  6.15 Gbits/sec  102    484 KBytes
[  5]  18.00-19.00  sec   726 MBytes  6.09 Gbits/sec   15    569 KBytes
[  5]  19.00-20.00  sec   733 MBytes  6.15 Gbits/sec  181    527 KBytes
[  5]  20.00-21.00  sec   729 MBytes  6.12 Gbits/sec  118    430 KBytes
[  5]  21.00-22.00  sec   733 MBytes  6.15 Gbits/sec  116    641 KBytes
[  5]  22.00-23.00  sec   728 MBytes  6.10 Gbits/sec  182    598 KBytes
[  5]  23.00-24.00  sec   743 MBytes  6.24 Gbits/sec  209    614 KBytes
[  5]  24.00-25.00  sec   746 MBytes  6.26 Gbits/sec   72    758 KBytes
[  5]  25.00-26.00  sec   742 MBytes  6.23 Gbits/sec  199    675 KBytes
[  5]  26.00-27.00  sec   799 MBytes  6.70 Gbits/sec  183    542 KBytes
[  5]  27.00-28.00  sec   908 MBytes  7.61 Gbits/sec    7   1.19 MBytes
[  5]  28.00-29.00  sec  1.37 GBytes  11.7 Gbits/sec  606   1013 KBytes
[  5]  29.00-30.00  sec  1.31 GBytes  11.3 Gbits/sec   74   1.02 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  23.7 GBytes  6.79 Gbits/sec  4771             sender
[  5]   0.00-30.00  sec  23.7 GBytes  6.79 Gbits/sec                  receiver





I have even tried switching to the RACK TCP stack, only to get slightly better results; however, with RACK the number of retries is nearly 0.



[root@db-02 ~]# sysctl net.inet.tcp.functions_default=rack
net.inet.tcp.functions_default: rack -> rack
[root@db-02 ~]# iperf3 -i 1 -t 30 -c db-01
Connecting to host db-01, port 5201
[  5] local 192.168.10.31 port 51894 connected to 192.168.10.30 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   761 MBytes  6.38 Gbits/sec    0    737 KBytes
[  5]   1.00-2.00   sec   859 MBytes  7.21 Gbits/sec    0    761 KBytes
[  5]   2.00-3.00   sec   880 MBytes  7.38 Gbits/sec    0    785 KBytes
[  5]   3.00-4.00   sec   734 MBytes  6.16 Gbits/sec    0    804 KBytes
[  5]   4.00-5.00   sec   777 MBytes  6.52 Gbits/sec    0    824 KBytes
[  5]   5.00-6.00   sec   719 MBytes  6.03 Gbits/sec    0    841 KBytes
[  5]   6.00-7.00   sec   865 MBytes  7.26 Gbits/sec    0    862 KBytes
[  5]   7.00-8.00   sec   880 MBytes  7.38 Gbits/sec    0    882 KBytes
[  5]   8.00-9.00   sec   906 MBytes  7.60 Gbits/sec    0    904 KBytes
[  5]   9.00-10.00  sec   749 MBytes  6.29 Gbits/sec    0    921 KBytes
[  5]  10.00-11.00  sec   798 MBytes  6.69 Gbits/sec    0    938 KBytes
[  5]  11.00-12.00  sec   746 MBytes  6.26 Gbits/sec  209    772 KBytes
[  5]  12.00-13.00  sec   768 MBytes  6.44 Gbits/sec   35    644 KBytes
[  5]  13.00-14.00  sec   948 MBytes  7.95 Gbits/sec    0    673 KBytes
[  5]  14.00-15.00  sec  1.23 GBytes  10.5 Gbits/sec    0    711 KBytes
[  5]  15.00-16.00  sec  1.32 GBytes  11.4 Gbits/sec    0    748 KBytes
[  5]  16.00-17.00  sec  1.31 GBytes  11.2 Gbits/sec    0    785 KBytes
[  5]  17.00-18.00  sec  1.29 GBytes  11.1 Gbits/sec    0    819 KBytes
[  5]  18.00-19.00  sec  1.30 GBytes  11.2 Gbits/sec    0    852 KBytes
[  5]  19.00-20.00  sec  1.34 GBytes  11.5 Gbits/sec    0    883 KBytes
[  5]  20.00-21.00  sec  1.29 GBytes  11.1 Gbits/sec    0    914 KBytes
[  5]  21.00-22.00  sec  1.36 GBytes  11.7 Gbits/sec    0    944 KBytes
[  5]  22.00-23.00  sec  1.33 GBytes  11.4 Gbits/sec    0    974 KBytes
[  5]  23.00-24.00  sec  1.31 GBytes  11.2 Gbits/sec    0   1003 KBytes
[  5]  24.00-25.00  sec  1.30 GBytes  11.2 Gbits/sec    0   1.00 MBytes
[  5]  25.00-26.00  sec  1.34 GBytes  11.5 Gbits/sec    0   1.03 MBytes
[  5]  26.00-27.00  sec  1.32 GBytes  11.3 Gbits/sec    0   1.06 MBytes
[  5]  27.00-28.00  sec   957 MBytes  8.03 Gbits/sec    0   1.07 MBytes
[  5]  28.00-29.00  sec   837 MBytes  7.02 Gbits/sec    0   1.09 MBytes
[  5]  29.00-30.00  sec   729 MBytes  6.11 Gbits/sec    0   1.10 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  30.6 GBytes  8.77 Gbits/sec  244             sender
[  5]   0.00-30.00  sec  30.6 GBytes  8.77 Gbits/sec                  receiver
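
To put numbers on "slightly better", here is a small Python sketch of the arithmetic, using only the sender summary lines of the two FreeBSD runs quoted above and the 25Gb/s link speed. The dictionary keys and the `describe` helper are illustrative names of my own.

```python
# Totals copied from the two sender summary lines in this message.
RUNS = {
    "default": {"gbytes": 23.7, "gbits_per_s": 6.79, "retr": 4771},
    "rack": {"gbytes": 30.6, "gbits_per_s": 8.77, "retr": 244},
}
LINK_GBITS_PER_S = 25.0  # link speed stated for the ConnectX-6 ports

def describe(run):
    """Return (share of link rate in percent, retransmits per GByte sent)."""
    return (100.0 * run["gbits_per_s"] / LINK_GBITS_PER_S,
            run["retr"] / run["gbytes"])

# Relative throughput change when switching from the default stack to RACK.
gain = 100.0 * (RUNS["rack"]["gbits_per_s"] / RUNS["default"]["gbits_per_s"] - 1.0)

for name, run in RUNS.items():
    util, retr_per_gb = describe(run)
    print(f"{name}: {util:.1f}% of link rate, {retr_per_gb:.1f} retransmits per GByte")
print(f"RACK throughput gain over default: {gain:.1f}%")
```

By this measure RACK is roughly 29% faster on average with far fewer retransmits per GByte, yet both runs stay well below the link rate.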







More data can be found @ https://forums.freebsd.org/threads/poor-performance-with-stable-13-and-mellanox-connectx-6-mlx5.85460/






