Re: Intermittent failure of routing/gateway with ix(4) (x86_64)

From: Karl Denninger <karl_at_denninger.net>
Date: Sun, 24 Aug 2025 02:12:32 UTC

On 8/23/2025 9:31 PM, R Tyler Croy wrote:
> (replies inline)
>
>
> On Saturday, August 23rd, 2025 at 6:24 PM, Karl Denninger<karl@denninger.net> wrote:
>
>> What physical hardware is on that driver?
>>
>> I have a box here with two ix interfaces in it that is my edge router and beat the SNOT out of it without problems. This is what the boot messages are for them in my machine here:
> These are both identical 10GigE NICs, from dmesg
>
> ix0: <Intel(R) X540-AT2> mem 0xe0400000-0xe05fffff,0xe0600000-0xe0603fff irq 18 at device 0.0 on pci1
> ix0: Using 2048 TX descriptors and 2048 RX descriptors
> ix0: Using 2 RX queues 2 TX queues
> ix0: Using MSI-X interrupts with 3 vectors
> ix0: allocated for 2 queues
> ix0: allocated for 2 rx queues
> ix0: Ethernet address: a0:36:9f:38:44:a8
> ix0: PCI Express Bus: Speed 5.0GT/s Width x8
> ix0: fw 4.2.0 nvm 4.03.0 eTrack 0x8000037c
> ix0: netmap queues/slots: TX 2/2048, RX 2/2048
>
>
> System: FreeBSD 14.2-RELEASE-p4
>
> When I was chasing that arpresolve warning I saw in the console, I did see some discussion like this (https://be-virtual.net/pfsense-arpresolve-cant-allocate-llinfo-for-x-x-x-x-on-emx/) about funky routers on the other end of the link causing trouble. Since this is a newer fiber rollout from my local ISP (Sonic) I wouldn't be surprised if there was something funky happening there. Since the LAN-side routing goes haywire, I'm thinking that's a red herring.

Probably.  Do you have TSO and LRO off on at least the external
interface?  You might try turning those off on BOTH interfaces.  When it
goes dead, is it IPv4, IPv6 or both?  I'm on 14.3 but as you can see the
kernel is a couple of months old.
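
For reference, a minimal sketch of turning both off at runtime.  The
interface names ix0 and ix1 are assumptions (only ix0 shows up in the
dmesg quoted above), so substitute your own.  By default it only prints
the ifconfig commands; set APPLY=1 and run it as root to actually execute
them.  Settings made this way do not survive a reboot.

```shell
#!/bin/sh
# Sketch: disable TSO and LRO on each listed interface.
# IFACES defaults to "ix0 ix1" -- an assumption, adjust to your system.
IFACES="${IFACES:-ix0 ix1}"

offload_off() {
    for ifc in $IFACES; do
        # -tso and -lro are the ifconfig(8) flags that clear those capabilities
        cmd="ifconfig $ifc -tso -lro"
        if [ "${APPLY:-0}" = "1" ]; then
            $cmd            # actually apply; requires root
        else
            echo "$cmd"     # dry run: just show what would be executed
        fi
    done
}

offload_off
```

Afterwards "ifconfig ix0" should no longer list TSO4, TSO6 or LRO in its
options= line.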

I had some funky-chicken stuff going on here with IPv6 for a while with
my ISP (gigabit fiber) when I switched to them from my former cable
service.  The inside interface on my end is on 10Gig SFP+ (Twinax to the
switch) and the external is a copper SFP transceiver to the back of
the ONT.  I got it sorted: they were being rather persnickety (and odd)
in that their end "married" the DUID presented to the MAC, and my setup
runs entirely out of RAM (nanobsd, thus power safe), so by default the
DUID is not invariant across boots.  It only impacted IPv6: it would not
get a delegation after the first time it booted unless they reset their
end, but IPv4 was unaffected.  Apparently they have completely different
infrastructure on their end between the stacks when it comes to
delegation.  I was able to fix that by making sure the DUID was in fact
invariant, but it took a good long while to figure out what was going on,
plus they are also a bit bizarre in how they send down router
advertisements to my end.  It's been stable now for ~2 months; the last
reset on that box was some 60 days ago, with no network disruptions
since, and I run a fair bit of video streaming out that connection to
users on the Internet at large (the actual source is on the inside
network link through a Mellanox card into the switch at 10G).

My level of success with Realtek 2.5Gb NICs in this service has not been
good at all, but the ix driver has not caused any trouble.  I saw plenty
of issues with ARP mappings for the ISP side of the link just flat-out
*disappearing* with the Realtek interfaces from time to time.  Downing
and upping the interface would clear it, but obviously that's very bad
news.  iperf3 into those interfaces would also show ridiculous retry
counts, so jitter for stream watchers was off the charts.  I know the
issue wasn't on the ISP side because an older PC Engines box, while not
able to saturate the link, had zero trouble (it has Intel chipset
gigabit interfaces in it), and neither does what I'm using now on the ix
interfaces.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/