Re: Chelsio NIC with RSS - Traffic distribution to different Queues

From: <Josef.Zahner1_at_swisscom.com>
Date: Mon, 03 Jul 2023 12:44:09 UTC
Hi Navdeep

Sorry if it is a dumb question, but your link contains only a diff for your git commit. Where do I find the patch?

I’ve executed the requested commands. The netstat command doesn’t show any drops during the flapping. You can see the packet count go down somewhere in the middle; that’s when CARP gets interrupted, the traffic switches over to the other server, and shortly afterwards it comes back.

root@fw-94:~ # sysctl hw.model hw.ncpu hw.physmem
hw.model: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
hw.ncpu: 32
hw.physmem: 137259728896


root@fw-94:~ # netstat -dw1 -I cxl0
            input           cxl0           output
   packets  errs idrops      bytes    packets  errs      bytes colls drops
    671238     0     0 1021616323          4     0       2363     0     0
    747413     0     2 1137554493          5     0       2442     0     0
    607635     0    11  924813835          4     0       2372     0     0
    393916     0    10  599533517          4     0       2369     0     0
    810438     0    23 1233479762          0     0          0     0     0
   1029533     0    17 1566935966          8     0       4776     0     0
         0     0     0          0          0     0          0     0     0
    610613     0     2  929344893          4     0       2386     0     0
   1164903     0    25 1772971854          9     0       4847     0     0
    267084     0     5  406495986          1     0         98     0     0
    145353     0     0  221213912          4     0       2374     0     0
         5     0     0        817          5     0       2434     0     0
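
For anyone who wants to compare several runs, here is a minimal sketch of how the per-second output above could be summarized. The function names and the idle threshold of 100 packets are my own assumptions and not part of netstat or the driver. It only assumes the nine-column layout shown above.

```python
# Sketch: summarize `netstat -dw1 -I <ifnet>` output (nine numeric columns per row).
# parse_netstat/summarize and idle_threshold are illustrative names, not a real tool.

COLUMNS = ("in_packets", "in_errs", "idrops", "in_bytes",
           "out_packets", "out_errs", "out_bytes", "colls", "drops")

def parse_netstat(text):
    """Return one dict per data row; header rows are skipped."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly nine all-numeric fields, header rows do not.
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        samples.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return samples

def summarize(samples, idle_threshold=100):
    """Total both drop counters and list the rows with almost no input."""
    return {
        "total_idrops": sum(s["idrops"] for s in samples),
        "total_drops": sum(s["drops"] for s in samples),
        "idle_seconds": [i for i, s in enumerate(samples)
                         if s["in_packets"] < idle_threshold],
    }

# The rows posted above, copied verbatim.
SAMPLE = """\
            input           cxl0           output
   packets  errs idrops      bytes    packets  errs      bytes colls drops
    671238     0     0 1021616323          4     0       2363     0     0
    747413     0     2 1137554493          5     0       2442     0     0
    607635     0    11  924813835          4     0       2372     0     0
    393916     0    10  599533517          4     0       2369     0     0
    810438     0    23 1233479762          0     0          0     0     0
   1029533     0    17 1566935966          8     0       4776     0     0
         0     0     0          0          0     0          0     0     0
    610613     0     2  929344893          4     0       2386     0     0
   1164903     0    25 1772971854          9     0       4847     0     0
    267084     0     5  406495986          1     0         98     0     0
    145353     0     0  221213912          4     0       2374     0     0
         5     0     0        817          5     0       2434     0     0
"""

if __name__ == "__main__":
    print(summarize(parse_netstat(SAMPLE)))
```

Totalling idrops separately from the output-side drops column makes it easier to see which of the two counters moves during a failover.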

As soon as I see your code, I will try to find out how to get this patch integrated into my OPNsense firewall running FreeBSD 13.1.

Cheers Josef


From: Navdeep Parhar <np@freebsd.org>
Date: Sunday, 2 July 2023 at 03:21
To: Zahner Josef, GSA-REP-LI <Josef.Zahner1@swisscom.com>
Cc: freebsd-net@freebsd.org <freebsd-net@freebsd.org>
Subject: Re: Chelsio NIC with RSS - Traffic distribution to different Queues
Hello,

Please try this patch: https://people.freebsd.org/~np/norssq.diff

It adds these sysctls to the driver.
1) hw.cxgbe.rsrv_norssq.  This is what you originally asked for.
2) hw.cxgbe.rx_budget.  This can be used to force the driver's RX to
yield periodically.

What kind of system (cores, memory, etc.) is this?  Control packets
are either getting dropped or the threads/timers responsible for
sending or processing these packets are starved of CPU.  It would be
useful to monitor interface activity with "netstat -d -I <ifnet>"
during the test.

# sysctl hw.model hw.ncpu hw.physmem
# netstat -dw1 -I cxl0

Try the settings listed below.  nrxq=X might help in case the driver
RX threads are hogging all the cores because all rx queues are heavily
loaded.  Set nrxq to something less than the number of cores in the
system.  rx_budget can be changed any time (try 64, 128, 256) and
might improve the responsiveness of the rest of the system during
load.

(in loader.conf)
hw.cxgbe.nrxq=2                 (3 if you've patched the kernel and set norssq)
hw.cxgbe.rsrv_noflowq=1
hw.cxgbe.pause_settings=0
hw.cxgbe.cong_drop=1            (2 would be better but needs a recent driver)
hw.cxgbe.rsrv_norssq=1          (needs patch)
hw.cxgbe.rx_budget=128          (needs patch)
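
The two variants of the settings above (stock kernel versus patched kernel) can be sketched as a small generator. This is only an illustration: the function name and its parameters are hypothetical, the values are the ones listed above, and the check simply encodes the advice to keep nrxq below the number of cores.

```python
# Sketch: emit the loader.conf lines listed above for a stock or patched kernel.
# cxgbe_loader_conf and its parameters are hypothetical helpers, not a FreeBSD tool.

def cxgbe_loader_conf(ncpu, patched=False, rx_budget=128):
    """Return loader.conf text; 'patched' means the norssq diff is applied."""
    nrxq = 3 if patched else 2  # 3 only if the kernel is patched and norssq is set
    if nrxq >= ncpu:
        raise ValueError("nrxq should be less than the number of cores")
    lines = [
        f"hw.cxgbe.nrxq={nrxq}",
        "hw.cxgbe.rsrv_noflowq=1",
        "hw.cxgbe.pause_settings=0",
        "hw.cxgbe.cong_drop=1",
    ]
    if patched:
        # These two tunables exist only with the patch applied.
        lines.append("hw.cxgbe.rsrv_norssq=1")
        lines.append(f"hw.cxgbe.rx_budget={rx_budget}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(cxgbe_loader_conf(ncpu=32, patched=True), end="")
```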

Let us know how it goes.

Regards,
Navdeep

On Thu, Jun 29, 2023 at 5:53 AM <Josef.Zahner1@swisscom.com> wrote:
>
> Can you tell me which netstat command you have in mind? I tried “netstat -Q”; it shows a few drops, but not enough to explain the CARP drops. What I can tell you is that CARP on the affected server only sends packets out, for as long as the box is the master and the CPU0 load is below 100%. It doesn’t receive any CARP traffic at all, just normal network traffic. What I see is that those CARP packets are no longer sent once CPU0 reaches 100% load. When that happens the server switches to standby and the traffic moves away from the machine. Because of this behavior we would like an option that lets us keep control plane traffic (LACP, CARP, …?) in RSS RX queue 0 and nothing else. The question is what would count as control plane traffic. Hopefully CARP/VRRP as well, …
>
> We tried hw.cxgbe.cong_drop=1, but it doesn’t help in our case.
>
> Can you explain a bit what your patch will do? Am I right that you will post the link later on here?
>
> Cheers Josef