kern/118026: [PATCH]
Nikolay Govoruha
bardano at gmail.com
Tue Nov 13 14:00:03 PST 2007
>Number: 118026
>Category: kern
>Synopsis: [PATCH]
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Nov 13 22:00:03 UTC 2007
>Closed-Date:
>Last-Modified:
>Originator: Nikolay Govoruha
>Release: FreeBSD 6.2 Release
>Organization:
VITAL
>Environment:
FreeBSD plant.vital.dp.ua 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Tue Nov 13 00:26:02 UTC 2007 root at plant.vital.dp.ua:/usr/src/sys/i386/compile/VITAL i386
>Description:
This is a bug in Path MTU Discovery (RFC 1191). It appears when the IPSEC option is enabled in the kernel configuration file. A host sends an IP packet of size 1500 with the DF (Don't Fragment) bit set to another host. The gateway (FreeBSD 6.2-RELEASE) has a route for this packet with mtu=1408, and net.inet.tcp.path_mtu_discovery is 1. The gateway cannot forward the packet to the next gateway, so it replies to the sender with an ICMP packet of type ICMP_UNREACH (0x03) and code ICMP_UNREACH_NEEDFRAG (0x04). However, the gateway does not set the MTU field in that packet: the field is 0x0000. tcpdump output:
//*****************************************************************************
pvs# tcpdump -i rl1 -vv -x icmp
tcpdump: listening on rl1, link-type EN10MB (Ethernet), capture size 96 bytes
09:42:39.379247 IP (tos 0x0, ttl 63, id 23385, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag, length 36
IP (tos 0x0, ttl 126, id 60516, offset 0, flags [DF], proto: TCP (6), length: 1492, bad cksum 34c1 (->2ff3)!) 10.0.10.81.1641 > 80.93.118.30.5421: [|tcp]
0x0000: 4500 0038 5b59 4000 3f01 05a0 505d 761e
0x0010: 0a00 0a51 0304 9eab 0000 0000 4500 05d4
0x0020: ec64 4000 7e06 34c1 0a00 0a51 505d 761e
0x0030: 0669 152d 97bf a62c
09:42:39.379644 IP (tos 0x0, ttl 63, id 23386, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag, length 36
IP (tos 0x0, ttl 126, id 60517, offset 0, flags [DF], proto: TCP (6), length: 1492, bad cksum 34c0 (->2ff2)!) 10.0.10.81.1641 > 80.93.118.30.5421: [|tcp]
0x0000: 4500 0038 5b5a 4000 3f01 059f 505d 761e
0x0010: 0a00 0a51 0304 98ff 0000 0000 4500 05d4
0x0020: ec65 4000 7e06 34c0 0a00 0a51 505d 761e
0x0030: 0669 152d 97bf abd8
//*****************************************************************************
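To make it easier to see where the MTU field sits in the dump above, here is a minimal user-space sketch (not kernel code) that reads the RFC 1191 "next hop mtu" out of a captured packet. The function name needfrag_next_hop_mtu and the array captured are hypothetical and exist only for this illustration; the bytes are the first 28 bytes of the first packet in the capture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ICMP_UNREACH		3
#define ICMP_UNREACH_NEEDFRAG	4

/*
 * Hypothetical helper: return the next-hop MTU carried in an IPv4 packet
 * that holds an ICMP "fragmentation needed" message, or -1 if the packet
 * is not such a message.  `pkt` points at the first byte of the IPv4 header.
 */
static int
needfrag_next_hop_mtu(const uint8_t *pkt, size_t len)
{
	size_t ihl;
	const uint8_t *icmp;

	assert(pkt != NULL);
	if (len < 20 || (pkt[0] >> 4) != 4)
		return (-1);
	ihl = (size_t)(pkt[0] & 0x0f) * 4;	/* IPv4 header length in bytes */
	if (ihl < 20 || len < ihl + 8 || pkt[9] != 1)	/* protocol 1 = ICMP */
		return (-1);
	icmp = pkt + ihl;
	if (icmp[0] != ICMP_UNREACH || icmp[1] != ICMP_UNREACH_NEEDFRAG)
		return (-1);
	/* ICMP bytes 4-5 are unused; bytes 6-7 hold the MTU, network order. */
	return ((icmp[6] << 8) | icmp[7]);
}

/* IPv4 header plus ICMP header of the first captured packet. */
static const uint8_t captured[] = {
	0x45, 0x00, 0x00, 0x38, 0x5b, 0x59, 0x40, 0x00,
	0x3f, 0x01, 0x05, 0xa0, 0x50, 0x5d, 0x76, 0x1e,
	0x0a, 0x00, 0x0a, 0x51, 0x03, 0x04, 0x9e, 0xab,
	0x00, 0x00, 0x00, 0x00,
};
```

Calling needfrag_next_hop_mtu(captured, sizeof(captured)) returns 0 for this packet, which is the symptom described above.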
>How-To-Repeat:
Transfer a file over an FTP connection through the gateway and check the "next hop mtu" field (RFC 1191) in the tcpdump output.
>Fix:
I made the following patch to sys/netinet/ip_input.c and rebuilt the kernel.
Original code, starting at line 1948:
//*****************************************************************************
	case EMSGSIZE:
		type = ICMP_UNREACH;
		code = ICMP_UNREACH_NEEDFRAG;
#if defined(IPSEC) || defined(FAST_IPSEC)
		/*
		 * If the packet is routed over IPsec tunnel, tell the
		 * originator the tunnel MTU.
		 *	tunnel MTU = if MTU - sizeof(IP) - ESP/AH hdrsiz
		 * XXX quickhack!!!
		 */
		{
			struct secpolicy *sp = NULL;
			int ipsecerror;
			int ipsechdr;
			struct route *ro;
#ifdef IPSEC
			sp = ipsec4_getpolicybyaddr(mcopy,
			    IPSEC_DIR_OUTBOUND,
			    IP_FORWARDING,
			    &ipsecerror);
#else /* FAST_IPSEC */
			sp = ipsec_getpolicybyaddr(mcopy,
			    IPSEC_DIR_OUTBOUND,
			    IP_FORWARDING,
			    &ipsecerror);
#endif
			if (sp != NULL) {
				/* count IPsec header size */
				ipsechdr = ipsec4_hdrsiz(mcopy,
				    IPSEC_DIR_OUTBOUND,
				    NULL);
				/*
				 * find the correct route for outer IPv4
				 * header, compute tunnel MTU.
				 */
				if (sp->req != NULL
				    && sp->req->sav != NULL
				    && sp->req->sav->sah != NULL) {
					ro = &sp->req->sav->sah->sa_route;
					if (ro->ro_rt && ro->ro_rt->rt_ifp) {
						mtu =
						    ro->ro_rt->rt_rmx.rmx_mtu ?
						    ro->ro_rt->rt_rmx.rmx_mtu :
						    ro->ro_rt->rt_ifp->if_mtu;
						mtu -= ipsechdr;
					}
				}
#ifdef IPSEC
				key_freesp(sp);
#else /* FAST_IPSEC */
				KEY_FREESP(&sp);
#endif
				ipstat.ips_cantfrag++;
				break;
			}
		}
#endif /*IPSEC || FAST_IPSEC*/
		/*
		 * If the MTU wasn't set before use the interface mtu or
		 * fall back to the next smaller mtu step compared to the
		 * current packet size.
		 */
		if (mtu == 0) {
			if (ia != NULL)
				mtu = ia->ia_ifp->if_mtu;
			else
				mtu = ip_next_mtu(ip->ip_len, 0);
		}
		ipstat.ips_cantfrag++;
		break;
//*****************************************************************************
I used printf() to debug the problem. My kernel was built with IPSEC defined. In my case sp = ipsec4_getpolicybyaddr(......) returned a non-NULL value, but sp->req was NULL. The "if (sp != NULL){}" block is therefore entered, but mtu is not calculated and stays zero, and the "break;" at the end of that block skips the fallback code below it. So mtu is still zero after the "switch (error)" statement, and zero is written to the "mtu" field of the ICMP packet.
Is this a bug?
I resolved the problem in the following way:
//*****************************************************************************
#ifdef IPSEC
				key_freesp(sp);
#else /* FAST_IPSEC */
				KEY_FREESP(&sp);
#endif
				//ipstat.ips_cantfrag++;
				//break;
			}
		}
#endif /*IPSEC || FAST_IPSEC*/
		/*
		 * If the MTU wasn't set before use the interface mtu or
		 * fall back to the next smaller mtu step compared to the
		 * current packet size.
		 */
		if (mtu == 0) {
			if (ia != NULL)
				mtu = ia->ia_ifp->if_mtu;
			else
				mtu = ip_next_mtu(ip->ip_len, 0);
		}
		ipstat.ips_cantfrag++;
		break;
//*****************************************************************************
That is, I commented out the "break" statement and the statement before it. Now, if mtu is still zero, the code below is executed: the same code that always runs when neither IPSEC nor FAST_IPSEC is defined. The tcpdump result:
//*****************************************************************************
pvs# tcpdump -i rl1 -vv -x icmp
tcpdump: listening on rl1, link-type EN10MB (Ethernet), capture size 96 bytes
12:13:48.471242 IP (tos 0x0, ttl 63, id 20521, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag (mtu 1408), length 36
IP (tos 0x0, ttl 126, id 50667, offset 0, flags [DF], proto: TCP (6), length: 1500, bad cksum 5b32 (->5664)!) 10.0.10.81.1769 > 80.93.118.30.5421: tcp 1476 [bad hdr length 4 - too short, < 20]
0x0000: 4500 0038 5029 4000 3f01 10d0 505d 761e
0x0010: 0a00 0a51 0304 7408 0000 0580 4500 05dc
0x0020: c5eb 4000 7e06 5b32 0a00 0a51 505d 761e
0x0030: 06e9 152d 5af0 079f
12:13:48.471583 IP (tos 0x0, ttl 63, id 20522, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag (mtu 1408), length 36
IP (tos 0x0, ttl 126, id 50668, offset 0, flags [DF], proto: TCP (6), length: 1500, bad cksum 5b31 (->5663)!) 10.0.10.81.1769 > 80.93.118.30.5421: [|tcp]
0x0000: 4500 0038 502a 4000 3f01 10cf 505d 761e
0x0010: 0a00 0a51 0304 6e54 0000 0580 4500 05dc
0x0020: c5ec 4000 7e06 5b31 0a00 0a51 505d 761e
0x0030: 06e9 152d 5af0 0d53
//*****************************************************************************
You can see that the "next hop mtu" field now has the correct value: 0x0580 = 1408 decimal.
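The difference between the original and the patched control flow can be modelled in user space. This is only a sketch under stated assumptions, not the kernel code: struct model_req, struct model_policy and model_needfrag_mtu are hypothetical stand-ins, the ip_next_mtu() branch is left out, and the early_break flag selects whether the "break" inside the "if (sp != NULL)" block is kept (original) or removed (patched).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct model_req {
	int tunnel_mtu;		/* MTU of the route for the outer header */
};

struct model_policy {
	struct model_req *req;	/* NULL in the case reported here */
};

/*
 * Model of the EMSGSIZE case.  `sp` is the policy lookup result (may be
 * NULL), `ipsechdr` the IPsec header overhead, `if_mtu` the interface MTU
 * used by the fallback.  A non-zero `early_break` models the original
 * code, which leaves the switch before reaching the fallback.
 */
static int
model_needfrag_mtu(const struct model_policy *sp, int ipsechdr, int if_mtu,
    int early_break)
{
	int mtu = 0;

	if (sp != NULL) {
		if (sp->req != NULL)
			mtu = sp->req->tunnel_mtu - ipsechdr;
		if (early_break)
			return (mtu);	/* original: fallback is skipped */
	}
	if (mtu == 0)
		mtu = if_mtu;		/* fallback shared with non-IPsec kernels */
	return (mtu);
}
```

With a policy whose req is NULL, model_needfrag_mtu(&sp, 0, 1408, 1) returns 0 and model_needfrag_mtu(&sp, 0, 1408, 0) returns 1408. The value 1408 is used here only because it matches the capture. When the tunnel MTU is computed successfully, both variants return the same result, so the model suggests the change only affects the case where the policy lookup succeeds but no MTU can be derived from it.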
Please tell me, is this patch correct?
mailto:bardano at gmail.com
P.S. The "bad cksum 5b31 (->5663)!" message refers to a packet that has already passed through natd; maybe my natd configuration is incorrect.
>Release-Note:
>Audit-Trail:
>Unformatted: