TCP sends 9KB segments via netgraph tunnel despite MTU/MSS - TSO-related?

From: Ivan <email_at_nigge.ru>
Date: Wed, 14 May 2025 19:45:27 UTC
Hello,

I've been investigating a network issue that took quite some time to trace. I still cannot reproduce it in a test environment, but it consistently occurs on a specific FreeBSD server with a more complex network configuration.

Summary of the issue:  
Under certain conditions, the system attempts to send TCP segments larger than 9 KB through a netgraph-based tunnel with MTU 1472, even though the MSS was negotiated at 1400.

This happens when the initial route is via the default uplink, but PF then re-routes the packet via the netgraph tunnel using `route-to`. If the traffic is routed through ng0 directly (without PF), the issue does not occur. The problem also disappears if TSO is disabled on the physical NIC underneath the uplink (igb0).

System:
  FreeBSD 13.5-RELEASE
  releng/13.5-n259162-882b9f3f2218 GENERIC amd64

Interfaces:

- Primary LAN interface (where disabling TSO fixes the problem):
    igb0, MTU 1500  
    options=4e520bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,
                    VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,
                    RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>

- Internet uplink:
    onp, VLAN over igb0, MTU 1500  
    options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>

- Netgraph tunnel:
    ng0, MTU 1472  
    inet 10.10.0.1 → 10.10.0.2

PF rules used for re-routing:
    nat log(all) on onp inet from 10.10.0.1 to any tag NG -> (ng0) round-robin
    pass out quick on onp route-to (ng0 10.10.0.2) inet all flags S/SA keep state tagged NG

Packet trace (captured via pflog during a ~10 KB POST request to YouTube):

    15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 597:9703, length 9106
    15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472)

This shows the kernel trying to send a single 9106-byte segment over a link with an MTU of 1472, which cannot carry it. The MSS was already negotiated at 1400, so this is unexpected. The ICMP "need to frag" response is generated locally (note the 127.0.0.1 source address). The result is segment loss, out-of-order retransmissions, and poor TLS performance.
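
For reference, the sizes involved can be checked with simple arithmetic. The following is only a rough sketch: the MTU, MSS, and trace values are taken from this message, while the 40-byte header overhead (IPv4 plus TCP without options) is an assumption, so real packets carrying TCP options would be slightly larger.

```python
import math

# Values taken from the interface listing and the pflog trace in this message.
TUNNEL_MTU = 1472        # ng0 MTU
NEGOTIATED_MSS = 1400    # MSS negotiated for the connection
SEQ_START, SEQ_END = 597, 9703
SEGMENT_LEN = 9106       # payload length reported by the trace

# Assumption: 20-byte IPv4 header + 20-byte TCP header, no IP or TCP options.
HEADER_OVERHEAD = 40

# The sequence range in the trace should match the reported payload length.
assert SEQ_END - SEQ_START == SEGMENT_LEN

# Size of the oversized packet versus a packet carrying exactly one MSS.
packet_size = SEGMENT_LEN + HEADER_OVERHEAD
mss_packet_size = NEGOTIATED_MSS + HEADER_OVERHEAD

# How many MSS-sized segments the same payload would need if it were split.
segments_needed = math.ceil(SEGMENT_LEN / NEGOTIATED_MSS)
last_segment = SEGMENT_LEN - (segments_needed - 1) * NEGOTIATED_MSS

print(f"oversized packet: {packet_size} bytes, fits MTU: {packet_size <= TUNNEL_MTU}")
print(f"MSS-sized packet: {mss_packet_size} bytes, fits MTU: {mss_packet_size <= TUNNEL_MTU}")
print(f"segments needed at MSS {NEGOTIATED_MSS}: {segments_needed} (last one {last_segment} bytes)")
```

Under these assumptions, a packet carrying one MSS fits within the tunnel MTU, while the 9106-byte payload would have to be split into 7 segments before it could pass through ng0.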

I also reproduced this behavior with OpenVPN, so the issue is not netgraph-specific.

Questions:
- Is this expected behavior due to TSO interacting poorly with PF route-to?
- Should TSO respect the effective MTU based on the post-PF routing decision?
- Or is this a bug in the TCP offload path?

Thanks in advance for any insights.