From: Kevin Bowling
Date: Tue, 3 Aug 2021 08:50:48 -0700
Subject: Re: igb(4) and VLAN issue?
To: Franco Fichtner
Cc: FreeBSD Net
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

On Tue, Aug 3, 2021 at 8:27 AM Franco Fichtner wrote:
>
> Hi Kevin,
>
> [RESENT TO MAILING LIST AS SUBSCRIBER]
>
> > On 2. Aug 2021, at 7:51 PM, Kevin Bowling wrote:
> >
> > I caught wind that an igb(4) commit I've done to main and that has
> > been in stable/12 for a few months seems to be causing a regression on
> > opnsense.  The commit in question is
> > https://cgit.freebsd.org/src/commit/?id=eea55de7b10808b86277d7fdbed2d05d3c6db1b2
> >
> > The report is at:
> > https://forum.opnsense.org/index.php?topic=23867.0
>
> Looks like I spoke too soon earlier.  This is a weird one for sure.  :)
>
> So first of all this causes an ifconfig hang for VLAN/LAGG combo creation,
> but later reports were coming in about ahci errors and cam timeouts.
> Some reported the instabilities start with using netmap, but later others
> confirmed the same for high load scenarios without netmap in use.
>
> This does not appear to happen when MSIX is disabled, e.g.:
>
> # sysctl -a | grep dev.igb | grep msix
> dev.igb.5.iflib.disable_msix: 1
> dev.igb.4.iflib.disable_msix: 1
> dev.igb.3.iflib.disable_msix: 1
> dev.igb.2.iflib.disable_msix: 1
> dev.igb.1.iflib.disable_msix: 1
> dev.igb.0.iflib.disable_msix: 1
>
> What's also being linked to this is some form of softraid misbehaving
> and the general tendency for cheaper hardware with particular igb
> chipsets.

Hmm, there is so much that /could/ be going on it's not easy to
pinpoint anything yet.  If nothing jumps out after getting more data
it may be worth mitigating in your build that way and retrying once
you have updated to FreeBSD 13.

> > I haven't heard of this issue elsewhere and cannot replicate it on my
> > I210s running main.  I've gone over the code changes line by line
> > several times and verified all the logic and register writes and it
> > all looks correct to my understanding.  The only hypothesis I have at
> > the moment is it may be some subtle timing issue since VLAN changes
> > unnecessarily restart the interface on e1000 until I push in a work in
> > progress to stop doing that.
>
> I also have no way of reproducing this locally, but the community is
> probably willing to give any kernel change a try that would address
> the problem without having to back out the commit in question.

I need some more info before making any changes.  A full dmesg of the
older working version and a (partial?) dmesg of the broken would be
another useful data point to start out with, let's see if there is
something going on during MSI-X vector allocation etc.

> > I'd like to see the output of all the processes or at least the
> > process configuring the VLANs to see where it is stuck.  Franco, do
> > you have the ability to 'control+t' there or otherwise set up a break
> > into a debugger?  Stacktraces would be a great start but a core and a
> > kernel may be necessary if it isn't obvious.
>
> Let me see if I can deliver on this easily.
>
>
> Cheers,
> Franco
>
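
A minimal sketch of the MSI-X mitigation discussed above, assuming the
dev.igb.N.iflib.disable_msix knobs shown in the quoted sysctl output can
also be set as boot-time tunables in /boot/loader.conf (I believe iflib
exposes them that way, but please verify on your release).  The helper
name msix_tunables is made up for illustration; it only rewrites
sysctl-style lines into name=1 assignments and changes nothing on the
system.

```shell
#!/bin/sh
# Hypothetical helper: read sysctl-style lines on stdin and print a
# loader.conf-style "name=1" line for every igb disable_msix knob found.
# Assumption: these knobs are accepted as loader tunables on your release.
msix_tunables() {
    sed -n 's/^\(dev\.igb\.[0-9][0-9]*\.iflib\.disable_msix\): .*$/\1=1/p'
}

# Example use on a FreeBSD host (review the output before adding it to
# /boot/loader.conf, then reboot for it to take effect):
#   sysctl -a | msix_tunables
```

The function is a pure text filter, so it can be tried against a saved
copy of the sysctl output before anything is applied to a live box.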