Re: Problem with net.inet.tcp.path_mtu_discovery=1

From: Michael Tuexen <michael.tuexen_at_lurchi.franken.de>
Date: Wed, 04 Jun 2025 19:06:39 UTC
> On 4. Jun 2025, at 19:29, Christos Chatzaras <chris@cretaforce.gr> wrote:
> 
> 
> 
>> On 4 Jun 2025, at 19:36, Dave Cottlehuber <dch@skunkwerks.at> wrote:
>> 
>> On Wed, 4 Jun 2025, at 16:36, Christos Chatzaras wrote:
>>> Hello,
>>> 
>>> I manage some servers hosting websites.
>> 
>> What does tcpdump/wireshark show for traffic, particularly icmp? Wireshark is very helpful in explaining some issues.
>> 
>> What is the actual MTU on the working net vs the failing one?
>> 
>> Is there a local MTU where the failing websites start working again?
>> 
>> see ping(8) and use -v -D -s …. together to find a working MTU and cross check with tcpdump to find where things seem to break.
>> 
>> On a recent cloud environment I needed to add ‘set reassemble yes no-df’ to my pf.conf to address MTU issues between VNET jails and the internet.
>> 
>> Happy hunting
>> Dave
>> 
> 
> First, I reverted the server settings to their defaults:
> sysctl net.inet.tcp.path_mtu_discovery=1
> sysctl net.inet.tcp.pmtud_blackhole_detection=0
> 
> Next, I set the MTU on my local computer to 1460 and everything worked as expected:
> tcpdump: listening on en0, link-type EN10MB (Ethernet), snapshot length 524288 bytes
> 20:15:05.651375 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 64)
>     192.168.2.18.65322 > 94.130.217.87.443: Flags [S], cksum 0x293e (correct), seq 3503095669, win 65535, options [mss 1420,nop,wscale 6,nop,nop,TS val 639376397 ecr 0,sackOK,eol], length 0
> 20:15:05.705913 IP (tos 0x0, ttl 54, id 0, offset 0, flags [DF], proto TCP (6), length 60)
>     94.130.217.87.443 > 192.168.2.18.65322: Flags [S.], cksum 0x9c22 (correct), seq 3647364942, ack 3503095670, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 1782053626 ecr 639376397], length 0
> 
> However, when I set my local computer’s MTU back to 1500 (the default), the issue reappeared:
> tcpdump: listening on en0, link-type EN10MB (Ethernet), snapshot length 524288 bytes
> 20:17:45.662993 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 64)
>     192.168.2.18.65333 > 94.130.217.87.443: Flags [S], cksum 0x4a07 (correct), seq 3674289142, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 681359835 ecr 0,sackOK,eol], length 0
> 20:17:45.726988 IP (tos 0x0, ttl 54, id 0, offset 0, flags [DF], proto TCP (6), length 60)
>     94.130.217.87.443 > 192.168.2.18.65333: Flags [S.], cksum 0x9b1d (correct), seq 1443843488, ack 3674289143, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2890559459 ecr 681359835], length 0
> 
> So, with local computer MTU 1460, everything works, but with MTU 1500, the problem persists.
The difference is that you announce a smaller MSS in the SYN segment you
send (1420 instead of 1460). This means that the peer can only send you
smaller TCP segments.

So there seems to be a problem when the peer sends TCP segments that are
too large. That means the peer, not the local node, must do PMTUD or TCP
blackhole detection.
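
As a rough sketch of the arithmetic behind this, assuming plain IPv4 and
TCP headers of 20 bytes each and a peer MTU of 1500 (an assumption that
matches the mss 1460 in the peer's SYN-ACK), the following Python snippet
computes the MSS announced for a given local MTU and the largest IP packet
the peer may then send. The function names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options


def mss_for_mtu(mtu):
    """MSS a host announces for a given interface MTU (IPv4, no options)."""
    return mtu - IPV4_HEADER - TCP_HEADER


def max_packet_from_peer(local_mss, peer_mtu):
    """Largest IP packet the peer may send towards us.

    The peer is limited both by the MSS we announced and by its own MTU.
    TCP options such as timestamps reduce the payload, not this bound.
    """
    effective_mss = min(local_mss, mss_for_mtu(peer_mtu))
    return effective_mss + IPV4_HEADER + TCP_HEADER


PEER_MTU = 1500  # assumption, consistent with the peer announcing mss 1460

for local_mtu in (1460, 1500):
    mss = mss_for_mtu(local_mtu)
    largest = max_packet_from_peer(mss, PEER_MTU)
    print(f"local MTU {local_mtu}: announced MSS {mss}, "
          f"peer may send packets up to {largest} bytes")
```

With a local MTU of 1460 the peer is held to 1460-byte packets, with 1500
it may send full 1500-byte packets. If some hop on the path towards the
local node cannot carry those, only the second case breaks, which would
fit the observed behaviour.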

Best regards
Michael