X710 stalled TX Queue and loss of networking

From: Alex Shalima <alex_at_hotelwifi.com>
Date: Mon, 26 Feb 2024 18:25:26 UTC
Hello,

DATA
We are running FreeBSD 13.2-RELEASE-p9 #25 on several Dell R650 servers (example Service Tag: 8FKQRY3). Each host runs bhyve to host other FreeBSD virtual machines.

All of these servers have X710-DA4 fiber network cards (4 SFP+ ports).
dev.ixl.0.%desc: Intel(R) Ethernet Controller X710 for 10GbE SFP+ - 2.3.3-k
dev.ixl.0.fw_version: fw 9.840.76614 api 1.15 nvm 9.40 etid 8000e9b5 oem 22.5632.7

Some servers have an additional X710-DA2 (same card but with 2 ports) for extra fiber ports.


ISSUE
Periodically, networking stops working on individual interfaces. Packet captures show that the network card is still receiving traffic, but no traffic is being sent out. On further investigation we found that the TX queue of the affected ixl interface goes into the STALLED state.

[user@server ~]$ sysctl dev.ixl | grep ring_state
dev.ixl.5.iflib.txq0.ring_state: pidx_head: 0751 pidx_tail: 0751 cidx: 0751 state: IDLE
dev.ixl.4.iflib.txq0.ring_state: pidx_head: 1254 pidx_tail: 1254 cidx: 1254 state: IDLE
dev.ixl.3.iflib.txq0.ring_state: pidx_head: 1193 pidx_tail: 1193 cidx: 1195 state: STALLED
dev.ixl.2.iflib.txq0.ring_state: pidx_head: 0000 pidx_tail: 0000 cidx: 0000 state: IDLE
dev.ixl.1.iflib.txq0.ring_state: pidx_head: 1393 pidx_tail: 1393 cidx: 1395 state: STALLED
dev.ixl.0.iflib.txq0.ring_state: pidx_head: 0181 pidx_tail: 0181 cidx: 0183 state: STALLED
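
For anyone who wants to watch for this condition automatically, here is a minimal Python sketch that parses ring_state lines in the format shown above and reports the queues marked STALLED. The function name stalled_queues and the embedded SAMPLE text are illustrative assumptions, not part of any FreeBSD tool; the regular expression is based only on the output we see on our hosts and may need adjusting for other iflib versions.

```python
import re

# Pattern for one line of `sysctl dev.ixl | grep ring_state` output.
# Assumption: the field layout matches what our 13.2 hosts print.
RING_STATE = re.compile(
    r"^dev\.ixl\.(?P<unit>\d+)\.iflib\.txq(?P<queue>\d+)\.ring_state:\s+"
    r"pidx_head:\s*(?P<pidx_head>\d+)\s+"
    r"pidx_tail:\s*(?P<pidx_tail>\d+)\s+"
    r"cidx:\s*(?P<cidx>\d+)\s+"
    r"state:\s*(?P<state>\w+)"
)


def stalled_queues(sysctl_output):
    """Return (unit, queue, pidx_tail, cidx) for each queue reported STALLED."""
    stalled = []
    for line in sysctl_output.splitlines():
        match = RING_STATE.match(line.strip())
        if match is None:
            continue  # skip lines that are not ring_state entries
        if match.group("state") == "STALLED":
            stalled.append((
                int(match.group("unit")),
                int(match.group("queue")),
                int(match.group("pidx_tail")),
                int(match.group("cidx")),
            ))
    return stalled


# Two lines copied from the output above: one IDLE, one STALLED.
SAMPLE = """\
dev.ixl.4.iflib.txq0.ring_state: pidx_head: 1254 pidx_tail: 1254 cidx: 1254 state: IDLE
dev.ixl.3.iflib.txq0.ring_state: pidx_head: 1193 pidx_tail: 1193 cidx: 1195 state: STALLED
"""

if __name__ == "__main__":
    for unit, queue, pidx_tail, cidx in stalled_queues(SAMPLE):
        print(f"ixl{unit} txq{queue} stalled: pidx_tail={pidx_tail} cidx={cidx}")
```

On a live host the same function could be fed the text captured from running sysctl dev.ixl, for example from a cron job that logs or alerts when the returned list is not empty.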


RESOLUTIONS TRIED

  *   Factory resetting the system (not a permanent fix, issue comes back)
  *   Recreating networking interfaces, including VLANs (not a permanent fix, issue comes back)
  *   Updating the driver with Dell iDRAC to the latest official version


QUESTION
Is there anything else we can try to get this permanently resolved?


Best Regards,
Alex