Re: Vector Packet Processing (VPP) portability on FreeBSD

From: Kevin Bowling <kevin.bowling@kev009.com>
Date: Tue, 25 May 2021 06:16:48 UTC
I don't fully understand the issue, but in iflib_fast_intr_rxtx
https://cgit.freebsd.org/src/tree/sys/net/iflib.c#n1581 it seems like
we end up re-enabling interrupts as a matter of course instead of only
for spurious interrupts or below some low-water threshold (which seems
like it would be tricky to do here).  The idea is that we want to pump
interrupts by disabling them in the msix_que handler, and then waiting
to re-enable them until the ift_task grouptask runs out of work.
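
Roughly the pattern I have in mind (an illustrative sketch, not the
actual iflib code; the my_* names and helpers are made up):

    struct my_queue {
            struct grouptask task;
            /* device-specific state ... */
    };

    static int
    my_fast_intr(void *arg)
    {
            struct my_queue *q = arg;

            /* Mask this queue's vector and kick the grouptask; the
             * vector stays masked for as long as the task polls. */
            my_disable_queue_intr(q);
            GROUPTASK_ENQUEUE(&q->task);
            return (FILTER_HANDLED);
    }

    static void
    my_task_fn(void *arg)
    {
            struct my_queue *q = arg;

            /* Keep polling while there is RX or TX work pending. */
            while (my_rxtx_work_pending(q))
                    my_process_rxtx(q);
            /* Only now, with the rings drained, unmask the vector. */
            my_enable_queue_intr(q);
    }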

It was a lot easier to reason about this with separate TX and RX
interrupts.  The combined TXRX filter is definitely a win in terms of
reducing MSI-X vector usage (which is important in a lot of FreeBSD
use cases), but it's tricky to understand.

My time has been sucked away by work, so I haven't been able to look
at this problem in the depth I'd like.  I'd be interested in
discussing it further with anyone who is interested in it.

Regards,
Kevin

On Tue, May 18, 2021 at 2:11 PM Vincenzo Maffione <vmaffione@freebsd.org> wrote:
>
>
>
> On Tue, May 18, 2021 at 09:32 Kevin Bowling <kevin.bowling@kev009.com> wrote:
>>
>>
>>
>> On Mon, May 17, 2021 at 10:20 AM Marko Zec <zec@fer.hr> wrote:
>>>
>>> On Mon, 17 May 2021 09:53:25 +0000
>>> Francois ten Krooden <ftk@Nanoteq.com> wrote:
>>>
>>> > On 2021/05/16 09:22, Vincenzo Maffione wrote:
>>> >
>>> > >
>>> > > Hi,
>>> > >   Yes, you are not using emulated netmap mode.
>>> > >
>>> > >   In the test setup depicted here
>>> > > https://github.com/ftk-ntq/vpp/wiki/VPP-throughput-using-netmap-interfaces#test-setup
>>> > > I think you should really try to replace VPP with the netmap
>>> > > "bridge" application (tools/tools/netmap/bridge.c), and see what
>>> > > numbers you get.
>>> > >
>>> > > You would run the application this way
>>> > > # bridge -i ix0 -i ix1
>>> > > and this will forward any traffic between ix0 and ix1 (in both
>>> > > directions).
>>> > >
>>> > > These numbers would give you a better idea of where to look next
>>> > > (e.g. VPP code improvements or system tuning such as NIC
>>> > > interrupts, CPU binding, etc.).
>>> >
>>> > Thank you for the suggestion.
>>> > I did run a test with the bridge this morning, and updated the
>>> > results as well.
>>> >
>>> > +-------------+------------------+
>>> > | Packet Size | Throughput (pps) |
>>> > +-------------+------------------+
>>> > |   64 bytes  |    7.197 Mpps    |
>>> > |  128 bytes  |    7.638 Mpps    |
>>> > |  512 bytes  |    2.358 Mpps    |
>>> > | 1280 bytes  |  964.915 kpps    |
>>> > | 1518 bytes  |  815.239 kpps    |
>>> > +-------------+------------------+
>>>
>>> I assume you're on 13.0, where netmap throughput is lower compared to
>>> 11.x due to the migration of most drivers to iflib (apparently
>>> increased overhead) and different driver defaults.  On 11.x I could
>>> move 10G line rate from one ix to another at low CPU freqs, whereas on
>>> 13.x the CPU must be set to max speed, and it still can't do 14.88 Mpps.
>>
>>
>> I believe this issue is in the combined TXRX interrupt filter.  It is causing a bunch of unnecessary TX re-arms.
>
>
> Could you please elaborate on that?
>
> TX completion is indeed the one thing that changed considerably with the port to iflib, and it could be a major contributor to the performance drop.
> My understanding is that TX interrupts are not really used anymore on multi-gigabit NICs such as ix or ixl. Instead, "softirqs" are used, meaning that a timer periodically performs TX completion. I don't know what the motivations were for this design decision.
> I had to decrease the timer period to 90us to ensure timely completion (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248652). However, the timer period is currently not adaptive.
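>
> Conceptually the mechanism looks something like this (an illustrative
> sketch, not the actual iflib code; the my_* names and types are made up):
>
>     struct my_txq {
>             struct callout timer;
>             /* ring state ... */
>     };
>
>     /* A per-queue callout stands in for the TX completion interrupt:
>      * it fires on a fixed period and reclaims finished descriptors. */
>     static void
>     my_tx_timer(void *arg)
>     {
>             struct my_txq *txq = arg;
>
>             my_reclaim_tx_descs(txq);       /* process completions */
>
>             /* Re-arm with a fixed ~90us period, regardless of load;
>              * this is the "not adaptive" part. */
>             callout_reset_sbt(&txq->timer, SBT_1US * 90, 0,
>                 my_tx_timer, txq, 0);
>     }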
>
>
>>
>>
>>>
>>> #1 thing which changed: the default number of descriptors per ring
>>> dropped from 2048 (11.x) to 1024 (13.x).  Try changing this in
>>> /boot/loader.conf:
>>>
>>> dev.ixl.0.iflib.override_nrxds=2048
>>> dev.ixl.0.iflib.override_ntxds=2048
>>> dev.ixl.1.iflib.override_nrxds=2048
>>> dev.ixl.1.iflib.override_ntxds=2048
>>> etc.
>>>
>>> For me this increases the throughput of
>>> bridge -i netmap:ixl0 -i netmap:ixl1
>>> from 9.3 Mpps to 11.4 Mpps
>>>
>>> #2: default interrupt moderation delays seem to be too long.  Combined
>>> with increasing the ring sizes, reducing dev.ixl.0.rx_itr from 62
>>> (default) to 40 increases the throughput further from 11.4 to 14.5 Mpps
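>>>
>>> e.g. set it at runtime (assuming the rx_itr sysctl is writable on
>>> your ixl devices):
>>>
>>> # sysctl dev.ixl.0.rx_itr=40
>>> # sysctl dev.ixl.1.rx_itr=40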
>>>
>>> Hope this helps,
>>>
>>> Marko
>>>
>>>
>>> > Apart from the 64-byte and 128-byte packets, the other sizes were
>>> > matching the maximum rates possible on 10Gbps. This was when the
>>> > bridge application was running on a single core, and that CPU core
>>> > was maxing out at 100%.
>>> >
>>> > I think there might be a bit of system tuning needed, but I suspect
>>> > most of the improvement would be needed in VPP.
>>> >
>>> > Regards
>>> > Francois