List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net
From: Kevin Bowling
Date: Thu, 23 Sep 2021 15:46:37 -0700
Subject: Re: igb(4) and VLAN issue?
To: Franco Fichtner
Cc: FreeBSD Net

Franco,

I think I found it: https://reviews.freebsd.org/D32087

Regards,
Kevin

On Tue, Aug 3, 2021 at 8:50 AM Kevin Bowling wrote:
>
> On Tue, Aug 3, 2021 at 8:27 AM Franco
> Fichtner wrote:
> >
> > Hi Kevin,
> >
> > [RESENT TO MAILING LIST AS SUBSCRIBER]
> >
> > > On 2. Aug 2021, at 7:51 PM, Kevin Bowling wrote:
> > >
> > > I caught wind that an igb(4) commit I've done to main and that has
> > > been in stable/12 for a few months seems to be causing a regression on
> > > opnsense. The commit in question is
> > > https://cgit.freebsd.org/src/commit/?id=eea55de7b10808b86277d7fdbed2d05d3c6db1b2
> > >
> > > The report is at:
> > > https://forum.opnsense.org/index.php?topic=23867.0
> >
> > Looks like I spoke too soon earlier. This is a weird one for sure. :)
> >
> > So first of all this causes an ifconfig hang for VLAN/LAGG combo creation,
> > but later reports were coming in about ahci errors and cam timeouts.
> > Some reported the instabilities start with using netmap, but later others
> > confirmed the same for high load scenarios without netmap in use.
> >
> > This does not appear to happen when MSIX is disabled, e.g.:
> >
> > # sysctl -a | grep dev.igb | grep msix
> > dev.igb.5.iflib.disable_msix: 1
> > dev.igb.4.iflib.disable_msix: 1
> > dev.igb.3.iflib.disable_msix: 1
> > dev.igb.2.iflib.disable_msix: 1
> > dev.igb.1.iflib.disable_msix: 1
> > dev.igb.0.iflib.disable_msix: 1
> >
> > What's also being linked to this is some form of softraid misbehaving
> > and the general tendency for cheaper hardware with particular igb
> > chipsets.
>
> Hmm, there is so much that /could/ be going on it's not easy to
> pinpoint anything yet. If nothing jumps out after getting more data
> it may be worth mitigating in your build that way and retrying once
> you have updated to FreeBSD 13.
>
> > > I haven't heard of this issue elsewhere and cannot replicate it on my
> > > I210s running main. I've gone over the code changes line by line
> > > several times and verified all the logic and register writes and it
> > > all looks correct to my understanding.
> > > The only hypothesis I have at
> > > the moment is it may be some subtle timing issue since VLAN changes
> > > unnecessarily restart the interface on e1000 until I push in a work in
> > > progress to stop doing that.
> >
> > I also have no way of reproducing this locally, but the community is
> > probably willing to give any kernel change a try that would address
> > the problem without having to back out the commit in question.
>
> I need some more info before making any changes. A full dmesg of the
> older working version and a (partial?) dmesg of the broken would be
> another useful data point to start out with, let's see if there is
> something going on during MSI-X vector allocation etc.
>
> > > I'd like to see the output of all the processes or at least the
> > > process configuring the VLANs to see where it is stuck. Franco, do
> > > you have the ability to 'control+t' there or otherwise set up a break
> > > into a debugger? Stacktraces would be a great start but a core and a
> > > kernel may be necessary if it isn't obvious.
> >
> > Let me see if I can deliver on this easily.
> >
> >
> > Cheers,
> > Franco
> >