Re: igc problems with heavy traffic (update)

From: John Fieber <jrf_at_ursamaris.org>
Date: Mon, 26 Sep 2022 16:49:39 UTC
> On Sep 26, 2022, at 5:57 AM, mike tancsa <mike@sentex.net> wrote:
> 
> On 9/24/2022 8:30 PM, John Fieber wrote:
>>>> On Sep 14, 2022, at 8:03 AM, mike tancsa <mike@sentex.net> wrote:
>>> 
>>> OK, an update hence the top post. I got a new pair of boxes which use a different Jasper Lake chipset and have i226-V vs the i225 of the previous box.
>>> 
>>> dev.igc.0.%parent: pci2
>>> dev.igc.0.%pnpinfo: vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000 class=0x020000
>>> dev.igc.0.%location: slot=0 function=0 dbsf=pci0:2:0:0 handle=\_SB_.PC00.RP05.PXSX
>>> dev.igc.0.%driver: igc
>>> dev.igc.0.%desc: Intel(R) Ethernet Controller I226-V
>>> dev.igc.%parent:
>>> 
>>> With a default RELENG_13, out of the box with no tweaks, I am NOT able to cause the transmitting NIC to bounce with heavy traffic. I used the same test script (a constant stream of iperf3 alternating in direction) maxing out the NIC's bandwidth and all seems fine running the test for some 18hrs. Maybe there is something different about the i225 version of this NIC that needs some different driver defaults?
>>> 
>>>     ---Mike
>>> 
>> I also see this behavior with 13.1-RELEASE-p2 on:
>> 
>> These, however, offer unflappable performance:
>> 
>> - FreeBSD-14.0-CURRENT-amd64-20220923
>> - vyos-1.4 (for reference, what I mostly use on this hardware, via bhyve)
>> 
> Interesting, so just to confirm: on the same i225 hardware, igc under 14.x does not see link drops under heavy load? I wonder what the difference is, since the driver does not seem to be different?

Correct. The NIC chips are SLNMH, B3 stepping. All the other OS versions started dropping the link within a minute. 14-CURRENT ran for an hour at about 2.3 Gbit/s without a single hiccup. All the tests were done with a fresh OS install in bhyve (with PCI passthrough for the NICs) followed by “pkg install iperf3” and no other tweaks. The switch port was configured with flow control and EEE off in all cases.

I ran the tests with the test OS as the iperf client, with -R and -P4 arguments. -R was always the quickest failure path. The hour long run on 14-CURRENT was with --bidi, after -R alone failed to fail after five minutes.
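
For anyone wanting to try the same thing, a rough sketch of what such client invocations might look like follows. It is not the exact command set used here: the server address (192.0.2.1, a documentation-range placeholder), the -t durations, the use of -P 4 together with --bidi, and the helper name iperf_cmd are all assumptions made for illustration. The helper only prints the command line, so nothing is run until you paste or eval it.

```shell
#!/bin/sh
# Hypothetical iperf3 server address (documentation range); replace with your own.
SERVER="${SERVER:-192.0.2.1}"

# Print an iperf3 client command line for the given mode and duration (seconds).
#   reverse: server sends to client (-R), the quickest failure path reported above
#   bidi:    traffic in both directions at once (--bidi, needs iperf3 3.7 or later)
# Four parallel streams (-P 4) are assumed for both modes.
iperf_cmd() {
    mode="$1"
    secs="$2"
    case "$mode" in
        reverse) dir="-R" ;;
        bidi)    dir="--bidi" ;;
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "iperf3 -c $SERVER $dir -P 4 -t $secs"
}

# Example (prints the commands; run them by hand or with eval):
#   iperf_cmd reverse 300     # five minutes, reverse only
#   iperf_cmd bidi 3600       # one hour, bidirectional
```

On the server side a plain "iperf3 -s" is enough; the direction and stream count are chosen by the client.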

The hardware in question is the spare of my three identical 4x2.5G AliExpress Celeron J4125 boxes, and as such is open for more and/or longer experiments. (My spare time to fiddle is a bigger constraint.)

-john