Re: Regression with pf or IPv6 on FreeBSD 14 with IPsec gif(4) tunnel

From: Xin Li <delphij_at_delphij.net>
Date: Fri, 15 Sep 2023 05:38:57 UTC
On 2023-09-14 3:28 AM, Kristof Provost wrote:
> On 14 Sep 2023, at 4:54, Xin Li wrote:
>> Hi!
>>
>> I recently upgraded my home router and found what appears to be a regression related to pf or IPv6.
>>
>> When attempting to connect to an IPv6 TCP service, the process would enter a seemingly unkillable state (the stack varied but always began with write, so it seems tailscale was trying to send some packet to the server), until racoon was killed and restarted (at which point the connection would be dropped).
>>
>> tcpdump on the gif(4) tunnel captured a lot of seemingly duplicated packets like this:
>>
>> 03:40:50.088262 IP6 LOCAL.16275 > REMOTE.443: Flags [.], seq 1619:2947, ack 4225, win 129, options [nop,nop,TS val 2817088580 ecr 3077807235], length 1328
>> 03:40:50.088332 IP6 LOCAL.16275 > REMOTE.443: Flags [.], seq 1619:2947, ack 4225, win 129, options [nop,nop,TS val 2817088581 ecr 3077807235], length 1328
>> [identical except timestamp]
>> 03:40:50.089107 IP6 LOCAL.16275 > REMOTE.443: Flags [.], seq 1619:2947, ack 4225, win 129, options [nop,nop,TS val 2817088581 ecr 3077807235], length 1328
>>
>> Am I the only person who is seeing this?  (Admittedly my setup is somewhat unique; my home ISP doesn't provide IPv6 service, so I have a gif(4) tunnel to my datacenter, which connects to Hurricane Electric's IPv6 tunnel service and basically routes my IPv6 traffic to that tunnel.  Earlier I discovered that some IPv6 connectivity issues were related to the MTU being too big (1480; reduced to 1400 now), but the unkillable IPv6 applications issue was new and only happened on 14.x.)
>>
> 
> That doesn’t immediately ring any bells, no.
> 
> Are you using route-to anywhere? There’s been a change (829a69db855b48ff7e8242b95e193a0783c489d9) that has some potential to affect uncommon setups, but right now I’m just guessing.

No.  Actually my IPv6-related rules were quite simple:

pass in quick inet6 proto tcp from <myv6> to any flags S/SA keep state
block in quick inet6 proto tcp from ! <myv6> to <myv6> flags S/SA
block out quick inet6 proto tcp from ! <myv6> to <myv6> flags S/SA

(where <myv6> is the /48 prefix from the Hurricane Electric tunnel).
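
The <myv6> table contents can be double-checked at runtime with 
something like:

# list the addresses currently loaded into the <myv6> table
pfctl -t myv6 -T show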

The rest of the rules were mostly NAT rules that translate my internal 
IPv4 addresses to the WAN interface IP, plus a rule to block IPs passed 
from sshguard:

block in quick on $ext_if proto tcp from <sshguard> to any port 22 label "ssh bruteforce"


> I’d recommend tcpdump-ing the wan link at the same time as the gif tunnel so you can work out if the packets are being dropped locally or remotely. Or you can try adding ‘log’ statements to the pf rules and using pflog to figure out if/why packets are being dropped.

The gif tunnel's traffic is significantly larger than the WAN 
interface's (I captured ~2.5GB on gif, and only 69kb on WAN related to 
the tunnel, using 'host <tunnelIP>' as the filter), so the packets are 
_probably_ being dropped locally.
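
For reference, the captures were roughly along these lines (interface 
and file names here are placeholders):

# full capture on the gif tunnel
tcpdump -ni gif0 -s 0 -w gif.pcap &
# on the WAN interface, restrict the capture to the tunnel endpoint
tcpdump -ni em0 -s 0 -w wan.pcap host TUNNEL_IP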

The changes between stable/13 and stable/14 appeared to be quite 
harmless (mostly marius@'s e82d7b2952afaf9625a3d7b05d03c43c1d3e7d9c, 
and I can't see how that could have caused this).

As a shot in the dark, I tried again with IPsec (racoon) disabled, and 
the issue was gone.  My IPsec configuration is fairly common:

===
flush;
spdflush;

spdadd WAN_IP4 REMOTE_IP4 any -P out ipsec esp/transport//require;
spdadd REMOTE_IP4 WAN_IP4 any -P in ipsec esp/transport//require;
===
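
The policies are loaded with setkey(8), roughly as below (the path is 
just illustrative); racoon then negotiates the transport-mode SAs:

# load the SPD entries above
setkey -f /etc/ipsec.conf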

I'm still comparing the code and reading the history of changes between 
stable/13 and stable/14 to see if there is anything obvious, but more 
insights from others would be appreciated :)

Cheers,