[patch] Path MTU Discovery when routing over IPSec connections

Tom Judge tom at tomjudge.com
Wed Nov 15 10:06:06 UTC 2006


I have been looking into some problems with PMTU Discovery when routing 
packets over IPSec (gif) tunnels. I have submitted the details to the 
open PR kern/91412, but have had no response as to whether my patch is 
the correct solution to the problem.

The problem occurs when sys/netinet/ip_input.c constructs the ICMP 
Destination Unreachable (fragmentation needed) message with an MTU hint. 
This is triggered when a packet that is to be routed over the IPSec link 
is larger than the MTU of the link and has the Don't Fragment bit set. 
There is a block of code specific to IPSec (gif) MTU discovery which 
attempts to calculate the MTU of the link by working out the size of the 
IPSec header and subtracting it, along with the gif IP header, from the 
MTU of the transmitting interface.  However, the code fails to retrieve 
a fully populated security policy, so the block that calculates the MTU 
never runs. The code then breaks out of the case statement and transmits 
the ICMP packet with the MTU hint set to 0.  If the break is removed from 
the IPSec code, the MTU calculation is carried out successfully by the 
non-IPSec code.  This raises the question of whether the IPSec code is 
needed at all, since the normal code works fine.


Network Layout:

Box 1 --------- Router 2 --(IPSec tunnel)-- Router 3 --(lan) --- Box 2
	|(lan)
	|------ Router 1


Box 1: FreeBSD 5.4
Router [123]: FreeBSD 6.1
Box 2: Linux 2.6


Tests to reproduce the error:

PING test from box 1 to box 2 with the Don't Fragment bit set and a 
packet larger than the path MTU:

box1# ping -s 1280 -D box2
PING box2 (10.0.0.79): 1280 data bytes
36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 051c b454   0 0000  40  01 c9fc 172.17.1.48  10.0.0.79

36 bytes from router2 (172.17.3.6): frag needed and DF set (MTU 0)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 1c05 b454   0 0000  3f  01 cafc 172.17.1.48  10.0.0.79

36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 051c b45f   0 0000  40  01 c9f1 172.17.1.48  10.0.0.79

36 bytes from router2 (172.17.3.6): frag needed and DF set (MTU 0)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 1c05 b45f   0 0000  3f  01 caf1 172.17.1.48  10.0.0.79

^C
--- box2 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

PING test from box 1 to box 2 with the Don't Fragment bit set and a 
packet smaller than the path MTU:

box1# ping -s 1200 -D box2
PING box2 (10.0.0.79): 1200 data bytes
36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 04cc b472   0 0000  40  01 ca2e 172.17.1.48  10.0.0.79

1208 bytes from 10.0.0.79: icmp_seq=0 ttl=61 time=111.017 ms
36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 04cc b479   0 0000  40  01 ca27 172.17.1.48  10.0.0.79

1208 bytes from 10.0.0.79: icmp_seq=1 ttl=61 time=110.419 ms
^C
--- box2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 110.419/110.718/111.017/0.299 ms
box1#


Relevant interface configuration on box1 (from ifconfig):

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet 172.17.1.48 netmask 0xffff0000 broadcast 172.17.255.255
        ether 00:0f:1f:fa:d1:b5
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active


Relevant interface configuration on router2 (from ifconfig):

em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet 172.17.3.6 netmask 0xffff0000 broadcast 172.17.255.255
        ether 00:c0:9f:12:13:1b
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
        tunnel inet 63.174.xxx.xxx --> 82.195.xxx.xxx
        inet 192.168.174.10 --> 192.168.174.9 netmask 0xfffffffc



Patch:

Index: sys/netinet/ip_input.c
===================================================================
--- sys/netinet/ip_input.c      (revision 24)
+++ sys/netinet/ip_input.c      (working copy)
@@ -1990,8 +1990,8 @@
  #else /* FAST_IPSEC */
                                 KEY_FREESP(&sp);
  #endif
-                               ipstat.ips_cantfrag++;
-                               break;
+//                             ipstat.ips_cantfrag++;
+//                             break;
                         }
                 }
  #endif /*IPSEC || FAST_IPSEC*/



Tests after the patch has been applied:

PING test from box 1 to box 2 with the Don't Fragment bit set and a 
packet larger than the path MTU:

box1# ping -s 1280 -D box2
PING box2 (10.0.0.79): 1280 data bytes
36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 051c b454   0 0000  40  01 c9fc 172.17.1.48  10.0.0.79

36 bytes from router2 (172.17.3.6): frag needed and DF set (MTU 1280)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 1c05 b454   0 0000  3f  01 cafc 172.17.1.48  10.0.0.79

36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 051c b45f   0 0000  40  01 c9f1 172.17.1.48  10.0.0.79

36 bytes from router2 (172.17.3.6): frag needed and DF set (MTU 1280)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 1c05 b45f   0 0000  3f  01 caf1 172.17.1.48  10.0.0.79


Any comments or suggestions on this would be greatly appreciated.

Tom J




More information about the freebsd-hackers mailing list