From: Franco Fichtner
To: Kevin Bowling
Cc: FreeBSD Net
Subject: Re: igb(4) and VLAN issue?
Date: Tue, 3 Aug 2021 17:27:51 +0200
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

Hi Kevin,

[RESENT TO MAILING LIST AS SUBSCRIBER]

> On 2.
> Aug 2021, at 7:51 PM, Kevin Bowling wrote:
>
> I caught wind that an igb(4) commit I've done to main and that has
> been in stable/12 for a few months seems to be causing a regression on
> opnsense.  The commit in question is
> https://cgit.freebsd.org/src/commit/?id=eea55de7b10808b86277d7fdbed2d05d3c6db1b2
>
> The report is at:
> https://forum.opnsense.org/index.php?topic=23867.0

Looks like I spoke too soon earlier.  This is a weird one for sure.  :)

So first of all this causes an ifconfig hang for VLAN/LAGG combo creation,
but later reports were coming in about ahci errors and cam timeouts.

Some reported that the instabilities start when using netmap, but later
others confirmed the same for high load scenarios without netmap in use.

This does not appear to happen when MSI-X is disabled, e.g.:

# sysctl -a | grep dev.igb | grep msix
dev.igb.5.iflib.disable_msix: 1
dev.igb.4.iflib.disable_msix: 1
dev.igb.3.iflib.disable_msix: 1
dev.igb.2.iflib.disable_msix: 1
dev.igb.1.iflib.disable_msix: 1
dev.igb.0.iflib.disable_msix: 1

What's also being linked to this is some form of softraid misbehaving
and a general tendency for this to show up on cheaper hardware with
particular igb chipsets.

> I haven't heard of this issue elsewhere and cannot replicate it on my
> I210s running main.  I've gone over the code changes line by line
> several times and verified all the logic and register writes and it
> all looks correct to my understanding.  The only hypothesis I have at
> the moment is it may be some subtle timing issue since VLAN changes
> unnecessarily restart the interface on e1000 until I push in a work in
> progress to stop doing that.

I also have no way of reproducing this locally, but the community is
probably willing to try any kernel change that would address the
problem without having to back out the commit in question.

> I'd like to see the output of all the processes or at least the
> process configuring the VLANs to see where it is stuck.
> Franco, do you have the ability to 'control+t' there or otherwise set
> up a break into a debugger?  Stacktraces would be a great start but a
> core and a kernel may be necessary if it isn't obvious.

Let me see if I can deliver on this easily.


Cheers,
Franco
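
P.S. For anyone who wants to try the MSI-X workaround mentioned above, here is a minimal sketch. It assumes the `disable_msix` knob is picked up from /boot/loader.conf at boot, as iflib(4) describes for its tunables. The helper name `msix_tunables` is made up for illustration, and the input below is canned rather than taken from an affected box.

```shell
#!/bin/sh
# Hypothetical helper: read `sysctl dev.igb` style output on stdin and print
# one loader.conf line per igb unit that sets iflib.disable_msix to 1.
# Lines for any other sysctl are ignored.
msix_tunables() {
    sed -n 's/^\(dev\.igb\.[0-9][0-9]*\.iflib\.disable_msix\):.*$/\1="1"/p'
}

# Example with canned input; on a live system pipe in `sysctl dev.igb` instead.
printf '%s\n' \
    'dev.igb.1.iflib.disable_msix: 0' \
    'dev.igb.0.iflib.disable_msix: 0' \
    'dev.igb.0.iflib.override_nrxqs: 0' | msix_tunables
```

The printed lines would then be appended to /boot/loader.conf by hand and the machine rebooted, since the setting is expected to take effect when the driver attaches. Checking with the sysctl command shown earlier should confirm whether it stuck.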