From: Kristof Provost
To: Michael Gmelin
Cc: "Bjoern A. Zeeb", Johan Hendriks, "Patrick M. Hausen", freeBSD-net
Subject: Re: epair and vnet jail loose connection.
Date: Tue, 15 Mar 2022 10:30:41 -0600
Message-ID: <2131DA64-EB0F-4908-9B6C-50175311D941@FreeBSD.org>
In-Reply-To: <20220315010230.6083dd72.grembo@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

On 14 Mar 2022, at 18:02, Michael Gmelin wrote:
> On Mon, 14 Mar 2022 09:09:49 -0600
> Kristof Provost wrote:
>
>> On 14 Mar 2022, at 7:44, Michael Gmelin wrote:
>>> On Sun, 13 Mar 2022 17:53:44 +0000
>>> "Bjoern A. Zeeb" wrote:
>>>
>>>> On 13 Mar 2022, at 17:45, Michael Gmelin wrote:
>>>>
>>>>>> On 13. Mar 2022, at 18:16, Bjoern A. Zeeb wrote:
>>>>>>
>>>>>> On 13 Mar 2022, at 16:33, Michael Gmelin wrote:
>>>>>>> It's important to point out that this only happens with
>>>>>>> kern.ncpu>1. With kern.ncpu==1 nothing gets stuck.
>>>>>>>
>>>>>>> This perfectly fits into the picture, since, as pointed out by
>>>>>>> Johan, the first commit that is affected[0] is about multicore
>>>>>>> support.
>>>>>>
>>>>>> Ignore my ignorance, what is the default of net.isr.maxthreads
>>>>>> and net.isr.bindthreads (in stable/13) these days?
>>>>>>
>>>>>
>>>>> My tests were on CURRENT and I’m afk, but according to cgit[0][1],
>>>>> max is 1 and bind is 0.
>>>>>
>>>>> Would it make sense to repeat the test with max=-1?
>>>>
>>>> I’d say yes, I’d also bind, but that’s just me.
>>>>
>>>> I would almost assume Kristof running with -1 by default (but he
>>>> can chime in on that).
>>>
>>> I tried various configuration permutations, all with ncpu=2:
>>>
>>> - 14.0-CURRENT #0 main-n253697-f1d450ddee6
>>> - 13.1-BETA1 #0 releng/13.1-n249974-ad329796bdb
>>> - net.isr.maxthreads: -1 (which results in 2 threads), 1, 2
>>> - net.isr.bindthreads: -1, 0, 1, 2
>>> - net.isr.dispatch: direct, deferred
>>>
>>> All resulting in the same behavior (hang after a few seconds). They
>>> all work ok when running on a single core instance (threads=1 in
>>> this case).
>>>
>>> I also ran the same test on 13.0-RELEASE-p7 for
>>> comparison (unsurprisingly, it's ok).
>>>
>>> I placed the script to reproduce the issue on freefall for your
>>> convenience, so running it is as simple as:
>>>
>>>     fetch https://people.freebsd.org/~grembo/hang_epair.sh
>>>     # inspect content
>>>     sh hang_epair.sh
>>>
>>> or, if you feel lucky
>>>
>>>     fetch -o - https://people.freebsd.org/~grembo/hang_epair.sh | sh
>>>
>> With that script I can also reproduce the problem.
>>
>> I’ve experimented with this hack:
>>
>> diff --git a/sys/net/if_epair.c b/sys/net/if_epair.c
>> index c39434b31b9f..1e6bb07ccc4e 100644
>> --- a/sys/net/if_epair.c
>> +++ b/sys/net/if_epair.c
>> @@ -415,7 +415,10 @@ epair_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
>>
>>  	case SIOCSIFMEDIA:
>>  	case SIOCGIFMEDIA:
>> +		printf("KP: %s() SIOCGIFMEDIA\n", __func__);
>>  		sc = ifp->if_softc;
>> +		taskqueue_enqueue(epair_tasks.tq[0], &sc->queues[0].tx_task);
>> +
>>  		error = ifmedia_ioctl(ifp, ifr, &sc->media, cmd);
>>  		break;
>>
>> That kicks the receive code whenever I `ifconfig epair0a`, and I see
>> a little more traffic every time I do so.
>> That suggests pretty strongly that there’s an issue with how we
>> dispatch work to the handler thread. So presumably there’s a race
>> between epair_menq() and epair_tx_start_deferred().
>>
>> epair_menq() tries to only enqueue the receive work if there’s
>> nothing in the buf_ring, on the grounds that if there is the previous
>> packet scheduled the work. Clearly there’s an issue there.
>>
>> I’ll try to dig into that in the next few days.
>>
>
> Hi Kristof,
>
> This sounds plausible. I spent a few hours getting familiar with the
> epair code and came up with a patch that seems to fix the issue at hand
> (both with and without RSS).
> I'm not certain that it is a good
> solution, especially in terms of performance, but I wanted to share it
> with you anyway, maybe it helps:
> https://people.freebsd.org/~grembo/epair.patch
>
That seems to be working, and at first glance doesn’t look like it’d
hurt performance too badly.
Can you write up a commit message and post it on phabricator?

Kristof
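
For readers following the thread: the race suspected above between epair_menq() and the handler task has the shape of a check-then-act "lost wakeup". The sketch below is only a toy model of that pattern, not the code in if_epair.c. The struct, the function names and the counter standing in for the buf_ring are all hypothetical, and the interleaving is replayed by hand in a single thread so that it is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not if_epair.c): a counter stands in for the ring. */
struct model {
	int	queued;		/* packets waiting in the ring */
	bool	task_pending;	/* handler task scheduled and not yet run */
};

/* Producer step 1: a wakeup is considered necessary only if the ring is empty. */
static bool
producer_check(const struct model *m)
{
	return (m->queued == 0);
}

/* Producer step 2: enqueue, and schedule the task only if step 1 said so. */
static void
producer_enqueue(struct model *m, bool need_wakeup)
{
	m->queued++;
	if (need_wakeup)
		m->task_pending = true;
}

/* Consumer: the handler task drains everything and goes idle. */
static void
consumer_run(struct model *m)
{
	m->queued = 0;
	m->task_pending = false;
}

/* Returns true if a packet ends up queued with no task scheduled. */
static bool
simulate_lost_wakeup(void)
{
	struct model m = { 0, false };
	bool need;

	/* Packet A: the ring is empty, so the task gets scheduled. */
	need = producer_check(&m);
	producer_enqueue(&m, need);
	assert(m.queued == 1 && m.task_pending);

	/* Packet B: the check sees A in the ring, so no wakeup is planned... */
	need = producer_check(&m);
	/* ...but the handler drains A before B is actually enqueued. */
	consumer_run(&m);
	producer_enqueue(&m, need);

	/* B is now stranded: later producers see a non-empty ring and skip
	 * the wakeup too, until something else kicks the task. */
	return (m.queued > 0 && !m.task_pending);
}
```

In this model simulate_lost_wakeup() returns true for the interleaving shown. A stranded packet of this kind would be consistent with traffic resuming briefly each time the task is kicked by hand, as with the SIOCGIFMEDIA hack quoted above, though whether this is the exact interleaving in the real driver is an assumption.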