Re: Poor performance with stable/13 and Mellanox ConnectX-6 (mlx5)

From: Mike Jakubik <mike.jakubik_at_swiftsmsgateway.com>
Date: Tue, 14 Jun 2022 14:21:51 UTC
Disabling rx/tx pause seems to produce higher peaks.
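For reference, pause frames were toggled when setting the media, along the lines suggested below. A minimal sketch, assuming the mlx5en(4) interface is named mce0 and 25GBase-CR media (both are assumptions; check `ifconfig -m mce0` for the exact names on your hardware):

```shell
# Disable RX/TX pause frames while (re)setting the media type.
# "mce0" and "25GBase-CR" are assumptions; substitute your own values
# from `ifconfig -m mce0`.
ifconfig mce0 media 25GBase-CR -mediaopt rxpause,txpause
```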



[root@db-02 ~]# iperf3 -i 1 -t 30 -c db-01
Connecting to host db-01, port 5201
[  5] local 192.168.10.31 port 10146 connected to 192.168.10.30 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.89 GBytes  16.2 Gbits/sec    0   1.10 MBytes
[  5]   1.00-2.00   sec  1.86 GBytes  15.9 Gbits/sec    0   1.10 MBytes
[  5]   2.00-3.00   sec  2.05 GBytes  17.6 Gbits/sec    0   1.11 MBytes
[  5]   3.00-4.00   sec   859 MBytes  7.20 Gbits/sec   21    938 KBytes
[  5]   4.00-5.00   sec   652 MBytes  5.47 Gbits/sec    0   1.01 MBytes
[  5]   5.00-6.00   sec   659 MBytes  5.53 Gbits/sec    0   1.03 MBytes
[  5]   6.00-7.00   sec   666 MBytes  5.59 Gbits/sec    0   1.05 MBytes
[  5]   7.00-8.00   sec   657 MBytes  5.51 Gbits/sec   98    989 KBytes
[  5]   8.00-9.00   sec   665 MBytes  5.58 Gbits/sec  139    712 KBytes
[  5]   9.00-10.00  sec   647 MBytes  5.43 Gbits/sec    0   1.02 MBytes
[  5]  10.00-11.00  sec   650 MBytes  5.45 Gbits/sec    4    606 KBytes
[  5]  11.00-12.00  sec  1.53 GBytes  13.1 Gbits/sec  358   1.07 MBytes
[  5]  12.00-13.00  sec  2.10 GBytes  18.1 Gbits/sec  162    837 KBytes
[  5]  13.00-14.00  sec  2.09 GBytes  18.0 Gbits/sec  332    838 KBytes
[  5]  14.00-15.00  sec  2.43 GBytes  20.9 Gbits/sec  639    747 KBytes
[  5]  15.00-16.00  sec  2.38 GBytes  20.4 Gbits/sec  612   1.02 MBytes
[  5]  16.00-17.00  sec  2.25 GBytes  19.3 Gbits/sec  535   1.24 MBytes
[  5]  17.00-18.00  sec  2.52 GBytes  21.6 Gbits/sec  818    423 KBytes
[  5]  18.00-19.00  sec  2.29 GBytes  19.7 Gbits/sec  218    444 KBytes
[  5]  19.00-20.00  sec  2.29 GBytes  19.7 Gbits/sec  114    859 KBytes
[  5]  20.00-21.00  sec  1.65 GBytes  14.1 Gbits/sec  100    541 KBytes
[  5]  21.00-22.00  sec  1.01 GBytes  8.67 Gbits/sec    0    639 KBytes
[  5]  22.00-23.00  sec   625 MBytes  5.24 Gbits/sec    0    648 KBytes
[  5]  23.00-24.00  sec   630 MBytes  5.28 Gbits/sec    0    648 KBytes
[  5]  24.00-25.00  sec  1.56 GBytes  13.4 Gbits/sec    0    702 KBytes
[  5]  25.00-26.00  sec  1.78 GBytes  15.3 Gbits/sec  118    406 KBytes
[  5]  26.00-27.00  sec  1.37 GBytes  11.8 Gbits/sec  105    890 KBytes
[  5]  27.00-28.00  sec  1.82 GBytes  15.6 Gbits/sec  104    963 KBytes
[  5]  28.00-29.00  sec  1.68 GBytes  14.4 Gbits/sec    0   1.20 MBytes
[  5]  29.00-30.00  sec  1.67 GBytes  14.4 Gbits/sec    0   1.38 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  44.8 GBytes  12.8 Gbits/sec  4477             sender
[  5]   0.00-30.01  sec  44.8 GBytes  12.8 Gbits/sec                  receiver



After a few runs:



[root@db-02 ~]# iperf3 -i 1 -t 30 -c db-01
Connecting to host db-01, port 5201
[  5] local 192.168.10.31 port 52152 connected to 192.168.10.30 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.91 GBytes  16.4 Gbits/sec   67    606 KBytes
[  5]   1.00-2.00   sec  1.78 GBytes  15.3 Gbits/sec    0   1.07 MBytes
[  5]   2.00-3.00   sec  1.60 GBytes  13.7 Gbits/sec    0   1.54 MBytes
[  5]   3.00-4.00   sec  1.61 GBytes  13.8 Gbits/sec    0   1.61 MBytes
[  5]   4.00-5.00   sec  1.66 GBytes  14.3 Gbits/sec    0   1.61 MBytes
[  5]   5.00-6.00   sec  1.67 GBytes  14.3 Gbits/sec    0   1.61 MBytes
[  5]   6.00-7.00   sec  1.65 GBytes  14.1 Gbits/sec    0   1.61 MBytes
[  5]   7.00-8.00   sec  1.70 GBytes  14.6 Gbits/sec    0   1.61 MBytes
[  5]   8.00-9.00   sec  1.72 GBytes  14.8 Gbits/sec    0   1.61 MBytes
[  5]   9.00-10.00  sec  1.85 GBytes  15.9 Gbits/sec    0   1.61 MBytes
[  5]  10.00-11.00  sec  1.81 GBytes  15.5 Gbits/sec    0   1.61 MBytes
[  5]  11.00-12.00  sec  1.67 GBytes  14.3 Gbits/sec    0   1.61 MBytes
[  5]  12.00-13.00  sec  1.66 GBytes  14.3 Gbits/sec    0   1.61 MBytes
[  5]  13.00-14.00  sec  1.83 GBytes  15.7 Gbits/sec    0   1.61 MBytes
[  5]  14.00-15.00  sec  1.18 GBytes  10.1 Gbits/sec    0    794 KBytes
[  5]  15.00-16.00  sec  1.67 GBytes  14.4 Gbits/sec    0   1.60 MBytes
[  5]  16.00-17.00  sec  1.73 GBytes  14.8 Gbits/sec    0   1.60 MBytes
[  5]  17.00-18.00  sec  1.73 GBytes  14.9 Gbits/sec    0   1.60 MBytes
[  5]  18.00-19.00  sec  1.83 GBytes  15.7 Gbits/sec    0   1.61 MBytes
[  5]  19.00-20.00  sec  1.77 GBytes  15.2 Gbits/sec    0   1.61 MBytes
[  5]  20.00-21.00  sec  1.80 GBytes  15.5 Gbits/sec    0   1.61 MBytes
[  5]  21.00-22.00  sec  2.03 GBytes  17.4 Gbits/sec    0   1.61 MBytes
[  5]  22.00-23.00  sec  1.88 GBytes  16.1 Gbits/sec    0   1.61 MBytes
[  5]  23.00-24.00  sec  1.80 GBytes  15.5 Gbits/sec    0   1.61 MBytes
[  5]  24.00-25.01  sec  1.59 GBytes  13.6 Gbits/sec    0   1.61 MBytes
[  5]  25.01-26.00  sec  1.33 GBytes  11.4 Gbits/sec    0   1.61 MBytes
[  5]  26.00-27.00  sec  1.71 GBytes  14.7 Gbits/sec    0   1.61 MBytes
[  5]  27.00-28.00  sec  1.71 GBytes  14.7 Gbits/sec   97   1.01 MBytes
[  5]  28.00-29.00  sec   719 MBytes  6.03 Gbits/sec    0   1.01 MBytes
[  5]  29.00-30.00  sec   727 MBytes  6.10 Gbits/sec    0   1.01 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  49.3 GBytes  14.1 Gbits/sec  164             sender
[  5]   0.00-30.00  sec  49.3 GBytes  14.1 Gbits/sec                  receiver


The CPU usage is rather low. The NIC is in the OCP slot, so I'm sure that's designed accordingly, and the NIC is bound to NUMA domain 0.
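The NUMA binding mentioned above can also be enforced per-process with cpuset(1); a hedged sketch, where the CPU range 0-31 for domain 0 is an assumption for this machine (check `cpuset -g` and your hardware topology):

```shell
# Pin iperf3 to CPUs in the NIC's NUMA domain so it never runs on a
# remote domain. The 0-31 range is an assumption; adjust to the cores
# that belong to domain 0 on your system.
cpuset -l 0-31 iperf3 -i 1 -t 30 -c db-01
```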



CPU:  0.0% user,  0.0% nice,  0.5% system,  0.7% interrupt, 98.8% idle

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 2195 root          1  52    0    17M  6884K select  83   0:14  27.99% iperf3



# vmstat -i -w1 | grep mlx5
irq671: mlx5_core0                 49969      47008



(The interrupt rate drops to about 14k/s with HW LRO enabled.)
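For the record, a hedged sketch of how HW LRO can be toggled for that comparison. The oid below is the per-device mlx5en(4) knob as I understand it; the unit number 0 and the exact oid name are assumptions, and `sysctl -a | grep -i lro` will locate the real one if it differs on your driver revision:

```shell
# Enable hardware LRO on the first mlx5en device.
# Oid name assumed; verify with: sysctl -a | grep -i lro
sysctl dev.mce.0.conf.hw_lro=1
```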



The dump is rather large; I don't think I can attach it on the mailing list, but if you wish to see it I can upload it somewhere.



Thank you.


---- On Mon, 13 Jun 2022 16:42:59 -0400 Hans Petter Selasky <hps@selasky.org> wrote ----
Some ideas: 
 
Try to disable "rxpause,txpause" when setting the media. 
 
Keep HW LRO off for now, it doesn't work for large number of connections. 
 
What is the CPU usage during test? Is iperf3 running on a CPU-core which 
has direct access to the NIC's numa domain? 
 
Is the NIC installed in the "correct" PCI high-performance slot? 
 
There are some sysctl knobs which may tell where the problem is, if it's 
PCI backpressure or something else. 
 
sysctl -a | grep diag_pci_enable 
sysctl -a | grep diag_general_enable 
 
Set these two to 1, then run some traffic and dump all mce sysctls: 
 
sysctl -a | grep mce > dump.txt 
 
--HPS 
 

Mike Jakubik

https://www.swiftsmsgateway.com/