From: Kevin Bowling <kevin.bowling@kev009.com>
Date: Mon, 24 May 2021 23:34:16 -0700
Subject: Re: Vector Packet Processing (VPP) portability on FreeBSD
To: Vincenzo Maffione
Cc: Francois ten Krooden, Jacques Fourie, Marko Zec, freebsd-net@freebsd.org
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

The one other thing I want to mention: what this means in effect is that
every queue ends up limited by EITR on ixgbe (around 30k interrupts/s with
the default settings), whether it's a TX or RX workload. This works OK if
you have sufficient CPU, but seems awkward. For a TX workload we should
need an order of magnitude fewer interrupts to do 10G. There was some work
to adapt AIM to this new combined handler, but it is not properly tuned,
and I'm not sure it should consider TX at all.
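
As a concrete (and hedged) illustration of the knob in question: the ~30k/s
figure is just the ixgbe driver's default interrupt-rate ceiling,
hw.ix.max_interrupt_rate = 31250, which is what gets programmed into EITR.
A sketch of an experiment, where the exact value is an assumption to
benchmark rather than a recommendation:

  # /boot/loader.conf
  hw.ix.max_interrupt_rate="62500"   # per-queue EITR ceiling; default 31250

  # at runtime, observe the actual per-queue interrupt rates
  vmstat -i | grep ix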

Regards,
Kevin

On Mon, May 24, 2021 at 11:16 PM Kevin Bowling <kevin.bowling@kev009.com> wrote:
I don't fully understand the issue, but in iflib_fast_intr_rxtx
https://cgit.freebsd.org/src/tree/sys/net/iflib.c#n1581 it seems like
we end up re-enabling interrupts as a matter of course instead of only
handling spurious cases or some low-water threshold (which seems like it
would be tricky to do here). The idea is that we want to pump interrupts
by disabling them in the msix_que handler, and then waiting to re-enable
them only when we have more work to do in the ift_task grouptask.

It was a lot easier to reason about this with separate TX and RX
interrupts. Doing the combined TXRX is definitely a win in terms of
reducing MSI-X vector usage (which is important in a lot of FreeBSD
use cases), but it's tricky to understand.

My time has been sucked away due to work, so I haven't been looking at
this problem to the depth I want to. I'd be interested in discussing
it further with anyone who is interested in it.

Regards,
Kevin

On Tue, May 18, 2021 at 2:11 PM Vincenzo Maffione <vmaffione@freebsd.org> wrote:
>
>
>
> On Tue, May 18, 2021 at 09:32 Kevin Bowling <kevin.bowling@kev009.com> wrote:
>>
>>
>>
>> On Mon, May 17, 2021 at 10:20 AM Marko Zec <zec@fer.hr> wrote:
>>>
>>> On Mon, 17 May 2021 09:53:25 +0000
>>> Francois ten Krooden <ftk@Nanoteq.com> wrote:
>>>
>>> > On 2021/05/16 09:22, Vincenzo Maffione wrote:
>>> >
>>> > >
>>> > > Hi,
>>> > >   Yes, you are not using emulated netmap mode.
>>> > >
>>> > >   In the test setup depicted here
>>> > > https://github.com/ftk-ntq/vpp/wiki/VPP-throughput-using-netmap-interfaces#test-setup
>>> > > I think you should really try to replace VPP with the netmap
>>> > > "bridge" application (tools/tools/netmap/bridge.c), and see what
>>> > > numbers you get.
>>> > >
>>> > > You would run the application this way
>>> > > # bridge -i ix0 -i ix1
>>> > > and this will forward any traffic between ix0 and ix1 (in both
>>> > > directions).
>>> > >
>>> > > These numbers would give you a better idea of where to look next
>>> > > (e.g. VPP code improvements or system tuning such as NIC
>>> > > interrupts, CPU binding, etc.).
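>>> > >
>>> > > (A hedged aside in case the tool isn't built yet: the netmap test
>>> > > programs live in the src tree and build with their stock Makefile.
>>> > > The exact paths below are an assumption:
>>> > >
>>> > >   cd /usr/src/tools/tools/netmap
>>> > >   make
>>> > >   ./bridge -i netmap:ix0 -i netmap:ix1
>>> > >
>>> > > The netmap: prefix is the explicit port-name syntax; the bare
>>> > > ix0/ix1 spelling above works as well.)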
>>> >
>>> > Thank you for the suggestion.
>>> > I did run a test with the bridge this morning, and updated the
>>> > results as well.
>>> > +-------------+------------------+
>>> > | Packet Size | Throughput (pps) |
>>> > +-------------+------------------+
>>> > |   64 bytes  |    7.197 Mpps    |
>>> > |  128 bytes  |    7.638 Mpps    |
>>> > |  512 bytes  |    2.358 Mpps    |
>>> > | 1280 bytes  |  964.915 kpps    |
>>> > | 1518 bytes  |  815.239 kpps    |
>>> > +-------------+------------------+
>>>
>>> I assume you're on 13.0, where netmap throughput is lower compared to
>>> 11.x due to the migration of most drivers to iflib (apparently increased
>>> overhead) and different driver defaults. On 11.x I could move 10G line
>>> rate from one ix to another at low CPU freqs, whereas on 13.x the CPU
>>> must be set to max speed, and still can't do 14.88 Mpps.
>>
>>
>> I believe this issue is in the combined TXRX interrupt filter. It is
>> causing a bunch of unnecessary TX re-arms.
>
>
> Could you please elaborate on that?
>
> TX completion is indeed the one thing that changed considerably with the
> porting to iflib, and this could be a major contributor to the performance
> drop.
> My understanding is that TX interrupts are not really used anymore on
> multi-gigabit NICs such as ix or ixl. Instead, "softirqs" are used, meaning
> that a timer is used to perform TX completion. I don't know what the
> motivations were for this design decision.
> I had to decrease the timer period to 90us to ensure timely completion
> (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248652). However,
> the timer period is currently not adaptive.
>
>
>>
>>
>>>
>>> #1 thing which changed: the default number of packets per ring dropped
>>> from 2048 (11.x) to 1024 (13.x). Try changing this in /boot/loader.conf:
>>>
>>> dev.ixl.0.iflib.override_nrxds=2048
>>> dev.ixl.0.iflib.override_ntxds=2048
>>> dev.ixl.1.iflib.override_nrxds=2048
>>> dev.ixl.1.iflib.override_ntxds=2048
>>> etc.
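>>>
>>> (Hedged note: these are boot-time tunables, so a reboot is needed.
>>> Afterwards the same names should read back via sysctl to confirm the
>>> override applied; the sysctl spelling is assumed from the tunables:
>>>
>>>   sysctl dev.ixl.0.iflib.override_nrxds dev.ixl.0.iflib.override_ntxds)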
>>>
>>> For me this increases the throughput of
>>> bridge -i netmap:ixl0 -i netmap:ixl1
>>> from 9.3 Mpps to 11.4 Mpps
>>>
>>> #2: default interrupt moderation delays seem to be too long. Combined
>>> with increasing the ring sizes, reducing dev.ixl.0.rx_itr from 62
>>> (default) to 40 increases the throughput further from 11.4 to 14.5 Mpps
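>>>
>>> (rx_itr should also be adjustable at runtime, so a hedged one-liner for
>>> experimenting with the value above, no reboot required:
>>>
>>>   sysctl dev.ixl.0.rx_itr=40
>>>
>>> and likewise dev.ixl.1.rx_itr for the second port.)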
>>>
>>> Hope this helps,
>>>
>>> Marko
>>>
>>>
>>> > Apart from the 64-byte and 128-byte packets, the other sizes were
>>> > matching the maximum rates possible on 10 Gbps. This was when the
>>> > bridge application was running on a single core, and the CPU core was
>>> > maxing out at 100%.
>>> >
>>> > I think there might be a bit of system tuning needed, but I suspect
>>> > most of the improvement would be needed in VPP.
>>> >
>>> > Regards
>>> > Francois