TCP sends 9KB segments via netgraph tunnel despite MTU/MSS: TSO-related?
Date: Wed, 14 May 2025 19:45:27 UTC
Hello,

I've been investigating a network issue that took quite some time to trace. I still cannot reproduce it in a test environment, but it occurs consistently on one FreeBSD server with a more complex network configuration.

Summary of the issue: under certain conditions, the system tries to send TCP segments of roughly 9 KB (9106 bytes) through a netgraph-based tunnel with MTU 1472, even though the negotiated MSS is 1400. This happens when the initial route points to the default uplink and PF then redirects the packet into the netgraph tunnel with `route-to`. If traffic is routed through ng0 directly (without PF), the issue does not occur. The problem also disappears if TSO is disabled on the uplink NIC.

System: FreeBSD 13.5-RELEASE releng/13.5-n259162-882b9f3f2218 GENERIC amd64

Interfaces:
- Primary LAN interface (where disabling TSO fixes the problem): igb0, MTU 1500
  options=4e520bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
- Internet uplink: onp, VLAN over igb0, MTU 1500
  options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
- Netgraph tunnel: ng0, MTU 1472, inet 10.10.0.1 → 10.10.0.2

PF rules used for re-routing:

  nat log(all) on onp inet from 10.10.0.1 to any tag NG -> (ng0) round-robin
  pass out quick on onp route-to (ng0 10.10.0.2) inet all flags S/SA keep state tagged NG

Packet trace (via pflog during a ~10 KB POST request to YouTube):

  15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 597:9703, length 9106
  15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472)

This shows the kernel trying to send a 9106-byte segment over a link that cannot carry it. The MSS had already been negotiated at 1400, so this is unexpected. The ICMP response is generated locally (its source is 127.0.0.1). The result is segment loss, out-of-order retransmissions, and poor TLS performance.
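The size mismatch can be made concrete with a small Python model. It is a sketch under stated assumptions, not pf's or the TCP stack's actual code: `Iface`, `max_mss()`, `tso_segments()` and `egress_decision()` are hypothetical helpers, headers are assumed to be plain 20-byte IPv4 and TCP headers without options, DF is assumed set (as for TCP with path MTU discovery), and the uplink is assumed to advertise TSO (onp inheriting it from igb0 via VLAN_HWTSO). The model's rule is that an oversized packet only goes out if the final egress interface can segment it itself; otherwise the sender gets ICMP "need to frag".

```python
import math
from dataclasses import dataclass

# Assumption: plain IPv4 and TCP headers without options
# (TCP timestamps would add another 12 bytes per segment).
IP_HDR = 20
TCP_HDR = 20


@dataclass
class Iface:
    """Hypothetical stand-in for an interface: name, MTU, TSO support."""
    name: str
    mtu: int
    tso: bool


def max_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv4 packet on this MTU."""
    return mtu - IP_HDR - TCP_HDR


def tso_segments(payload: int, mss: int) -> int:
    """How many wire segments a TSO engine would cut the payload into."""
    return math.ceil(payload / mss)


def egress_decision(payload: int, mss: int, egress: Iface) -> str:
    """Simplified MTU check on the final (post-route-to) egress interface.

    Assumes DF is set: an oversized packet is passed on only if the
    egress interface can segment it itself (TSO); otherwise the stack
    answers with ICMP "need to frag".
    """
    ip_len = payload + IP_HDR + TCP_HDR
    if ip_len <= egress.mtu:
        return "send"
    if egress.tso:
        return f"send via TSO as {tso_segments(payload, mss)} segments"
    return f"icmp need-frag (mtu {egress.mtu})"


# Assumed setup: TCP looked up the uplink route (TSO-capable), built a
# 9106-byte chunk, and pf route-to then moved it to ng0 (no TSO).
uplink = Iface("onp", 1500, tso=True)
ng0 = Iface("ng0", 1472, tso=False)

print(max_mss(ng0.mtu))                     # 1432, so MSS 1400 fits
print(egress_decision(9106, 1400, uplink))  # send via TSO as 7 segments
print(egress_decision(9106, 1400, ng0))     # icmp need-frag (mtu 1472)
print(egress_decision(1400, 1400, ng0))     # send
```

Under these assumptions, the uplink NIC's TSO engine would cut the 9106-byte chunk into 7 segments of at most 1400 bytes, but ng0 has no TSO and a maximum MSS of 1432, so the same chunk can only be rejected with the need-frag error seen in the trace.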
I also reproduced this behavior with OpenVPN, so the issue is not netgraph-specific.

Questions:
- Is this expected behavior, caused by TSO interacting poorly with PF `route-to`?
- Should TSO respect the effective MTU of the interface chosen by the post-PF routing decision?
- Or is this a bug in the TCP offload path?

Thanks in advance for any insights.