From: Hans Petter Selasky <hps@selasky.org>
Date: Mon, 13 Jun 2022 22:42:59 +0200
To: mike.jakubik@swiftsmsgateway.com, freebsd-net
Subject: Re: Poor performance with stable/13 and Mellanox ConnectX-6 (mlx5)
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

On 6/13/22 20:25, Mike Jakubik wrote:
> Hello,
>
> I have two new servers with a Mellanox ConnectX-6 card linked at 25Gb/s; however, I am unable to get much more than 6Gb/s when testing with iperf3.
>
> The servers are Lenovo SR665 (2 x AMD EPYC 7443 24-Core Processor, 256 GB RAM, Mellanox ConnectX-6 Lx 10/25GbE SFP28 2-port OCP Ethernet Adapter).
>
> They are connected to a Dell N3224PX-ON switch. Both servers are idle and not in use, with a fresh install of stable/13-ebea872f8; nothing is running on them except ssh, sendmail, etc.
>
> When I test with iperf3 I am unable to get a higher average than about 6Gb/s. I have tried just about every knob listed in https://calomel.org/freebsd_network_tuning.html with little impact on the performance. The network cards have HW LRO enabled as per the driver documentation (though this only seems to lower IRQ usage, with no impact on actual throughput).
> The same exact servers tested on Linux (Fedora 34) produced nearly 3x the performance (see attached screenshots); I was able to get a steady 14.6Gb/s rate with nearly zero retries shown in iperf. The performance on FreeBSD averages around 6Gb/s, but it is very sporadic during the iperf run.
>
> I have run out of ideas; any suggestions are welcome. Considering that Netflix uses very similar HW and pushes 400Gb/s, either there is something really wrong here or Netflix isn't sharing all their secret sauce.

Some ideas:

Try to disable "rxpause,txpause" when setting the media.

Keep HW LRO off for now; it doesn't work well for a large number of connections.

What is the CPU usage during the test? Is iperf3 running on a CPU core which has direct access to the NIC's NUMA domain? Is the NIC installed in the "correct" high-performance PCI slot?

There are some sysctl knobs which may tell where the problem is, whether it's PCI backpressure or something else:

sysctl -a | grep diag_pci_enable
sysctl -a | grep diag_general_enable

Set these two to 1, then run some traffic and dump all mce sysctls:

sysctl -a | grep mce > dump.txt

--HPS
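[Editor's note: the suggestions above can be sketched as FreeBSD shell commands. This is a sketch only; the interface name mce0, the media type 25GBase-SR, the CPU range, and the exact diagnostic OID paths are assumptions that must be checked against the actual system.]

```shell
#!/bin/sh
# Sketch of HPS's suggestions. mlx5en(4) interfaces are typically named
# mce<N>; adjust IF, media type, and CPU list to the actual hardware.
IF=mce0

# 1. Disable flow control (rxpause/txpause) when setting the media.
#    The media type 25GBase-SR is a guess; check "ifconfig $IF media" first.
ifconfig "$IF" media 25GBase-SR -mediaopt rxpause,txpause

# 2. Keep hardware LRO off for now.
ifconfig "$IF" -lro

# 3. Find the NIC's NUMA domain, then run iperf3 on CPUs in that domain.
#    The CPU list 0-23 is only an example for one EPYC socket.
sysctl dev.mce.0.%domain
cpuset -l 0-23 iperf3 -c <server-ip>

# 4. Locate and enable the mlx5 diagnostic knobs (exact OID paths vary),
#    generate traffic, then dump all mce sysctls for analysis.
sysctl -a | grep diag_pci_enable
sysctl -a | grep diag_general_enable
# ...set both OIDs found above to 1, e.g.:
# sysctl dev.mce.0.conf.diag_pci_enable=1   # path is an assumption
sysctl -a | grep mce > dump.txt
```

The `<server-ip>` placeholder is the iperf3 server address; pinning the client with cpuset(1) avoids cross-domain memory traffic when the NIC and the benchmark run on different sockets.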