kern/118026: [PATCH]

Nikolay Govoruha bardano at gmail.com
Tue Nov 13 14:00:03 PST 2007


>Number:         118026
>Category:       kern
>Synopsis:       [PATCH]
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 13 22:00:03 UTC 2007
>Closed-Date:
>Last-Modified:
>Originator:     Nikolay Govoruha
>Release:        FreeBSD 6.2 Release
>Organization:
VITAL
>Environment:
FreeBSD plant.vital.dp.ua 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Tue Nov 13 00:26:02 UTC 2007     root at plant.vital.dp.ua:/usr/src/sys/i386/compile/VITAL  i386
>Description:
This is a bug in Path MTU Discovery (RFC 1191). It shows up when the IPSEC option is enabled in the kernel configuration file. A host tries to send an IP packet with size=1500 and the DF (Don't Fragment) bit set to another host. The gateway, running FreeBSD 6.2-RELEASE, has a route for this packet with mtu=1408, and net.inet.tcp.path_mtu_discovery is set to 1. The gateway cannot forward the packet to the next gateway in this case, so it answers the sender with an ICMP packet of type ICMP_UNREACH (0x03) and code ICMP_UNREACH_NEEDFRAG (0x04). However, the gateway does not set the MTU field in that packet: the field is 0x0000. tcpdump output:

//*****************************************************************************

pvs# tcpdump -i rl1 -vv -x icmp
tcpdump: listening on rl1, link-type EN10MB (Ethernet), capture size 96 bytes
09:42:39.379247 IP (tos 0x0, ttl  63, id 23385, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag, length 36
        IP (tos 0x0, ttl 126, id 60516, offset 0, flags [DF], proto: TCP (6), length: 1492, bad cksum 34c1 (->2ff3)!) 10.0.10.81.1641 > 80.93.118.30.5421: [|tcp]
        0x0000:  4500 0038 5b59 4000 3f01 05a0 505d 761e
        0x0010:  0a00 0a51 0304 9eab 0000 0000 4500 05d4
        0x0020:  ec64 4000 7e06 34c1 0a00 0a51 505d 761e
        0x0030:  0669 152d 97bf a62c
09:42:39.379644 IP (tos 0x0, ttl  63, id 23386, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag, length 36
        IP (tos 0x0, ttl 126, id 60517, offset 0, flags [DF], proto: TCP (6), length: 1492, bad cksum 34c0 (->2ff2)!) 10.0.10.81.1641 > 80.93.118.30.5421: [|tcp]
        0x0000:  4500 0038 5b5a 4000 3f01 059f 505d 761e
        0x0010:  0a00 0a51 0304 98ff 0000 0000 4500 05d4
        0x0020:  ec65 4000 7e06 34c0 0a00 0a51 505d 761e
        0x0030:  0669 152d 97bf abd8

//*****************************************************************************

>How-To-Repeat:
Transfer a file over an FTP connection and watch the "next hop mtu" field (RFC 1191) in the tcpdump output.
>Fix:
I made the following patch to sys/netinet/ip_input.c and rebuilt the kernel.

Original - Line 1948:

//*****************************************************************************

	case EMSGSIZE:
		type = ICMP_UNREACH;
		code = ICMP_UNREACH_NEEDFRAG;
#if defined(IPSEC) || defined(FAST_IPSEC)
		/*
		 * If the packet is routed over IPsec tunnel, tell the
		 * originator the tunnel MTU.
		 *	tunnel MTU = if MTU - sizeof(IP) - ESP/AH hdrsiz
		 * XXX quickhack!!!
		 */
		{
			struct secpolicy *sp = NULL;
			int ipsecerror;
			int ipsechdr;
			struct route *ro;

#ifdef IPSEC
			sp = ipsec4_getpolicybyaddr(mcopy,
						    IPSEC_DIR_OUTBOUND,
						    IP_FORWARDING,
						    &ipsecerror);
#else /* FAST_IPSEC */
			sp = ipsec_getpolicybyaddr(mcopy,
						   IPSEC_DIR_OUTBOUND,
						   IP_FORWARDING,
						   &ipsecerror);
#endif
			if (sp != NULL) {
				/* count IPsec header size */
				ipsechdr = ipsec4_hdrsiz(mcopy,
							 IPSEC_DIR_OUTBOUND,
							 NULL);

				/*
				 * find the correct route for outer IPv4
				 * header, compute tunnel MTU.
				 */
				if (sp->req != NULL
				 && sp->req->sav != NULL
				 && sp->req->sav->sah != NULL) {
					ro = &sp->req->sav->sah->sa_route;
					if (ro->ro_rt && ro->ro_rt->rt_ifp) {
						mtu =
						    ro->ro_rt->rt_rmx.rmx_mtu ?
						    ro->ro_rt->rt_rmx.rmx_mtu :
						    ro->ro_rt->rt_ifp->if_mtu;
						mtu -= ipsechdr;
					}
				}

#ifdef IPSEC
				key_freesp(sp);
#else /* FAST_IPSEC */
				KEY_FREESP(&sp);
#endif
				ipstat.ips_cantfrag++;
				break;
			}
		}
#endif /*IPSEC || FAST_IPSEC*/
		/*
		 * If the MTU wasn't set before use the interface mtu or
		 * fall back to the next smaller mtu step compared to the
		 * current packet size.
		 */
		if (mtu == 0) {
			if (ia != NULL)
				mtu = ia->ia_ifp->if_mtu;
			else
				mtu = ip_next_mtu(ip->ip_len, 0);
		}
		ipstat.ips_cantfrag++;
		break;

//*****************************************************************************

I used printf() to debug the problem. My kernel is built with IPSEC defined. In my case sp = ipsec4_getpolicybyaddr(......) returned a non-NULL value, but sp->req was NULL. The body of the "if (sp != NULL){}" statement is therefore executed, but mtu is not calculated and stays equal to zero, and the "break;" at the end of that body leaves the case. So mtu is still zero after the "switch (error)" statement, and zero is written into the "mtu" field of the ICMP packet.

Is it a bug?

I resolved the problem in the following way:

//*****************************************************************************

#ifdef IPSEC
				key_freesp(sp);
#else /* FAST_IPSEC */
				KEY_FREESP(&sp);
#endif
				//ipstat.ips_cantfrag++;
				//break;
			}
		}
#endif /*IPSEC || FAST_IPSEC*/
		/*
		 * If the MTU wasn't set before use the interface mtu or
		 * fall back to the next smaller mtu step compared to the
		 * current packet size.
		 */
		if (mtu == 0) {
			if (ia != NULL)
				mtu = ia->ia_ifp->if_mtu;
			else
				mtu = ip_next_mtu(ip->ip_len, 0);
		}
		ipstat.ips_cantfrag++;
		break;

//*****************************************************************************
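
The difference between the two versions can be modelled outside the kernel. The following is only a sketch, not kernel code: struct model_policy, struct model_request, needfrag_mtu_original() and needfrag_mtu_patched() are hypothetical names, the policy chain is reduced to a single route MTU, and the fallback (interface MTU or ip_next_mtu()) is passed in as one number. It assumes mtu is zero on entry to the case, which is what the capture above suggests.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for the kernel's security policy
 * chain. Only what is needed to model the control flow is kept.
 */
struct model_request {
	int route_mtu;			/* MTU of the outer route, 0 if unknown */
};

struct model_policy {
	struct model_request *req;	/* NULL in the case reported here */
};

/* Original flow: the case is left as soon as a policy was found. */
int
needfrag_mtu_original(const struct model_policy *sp, int ipsechdr,
    int fallback_mtu)
{
	int mtu = 0;

	if (sp != NULL) {
		if (sp->req != NULL && sp->req->route_mtu != 0)
			mtu = sp->req->route_mtu - ipsechdr;
		return (mtu);		/* models the early "break" */
	}
	if (mtu == 0)
		mtu = fallback_mtu;
	return (mtu);
}

/* Patched flow: no early exit, so the fallback is always reachable. */
int
needfrag_mtu_patched(const struct model_policy *sp, int ipsechdr,
    int fallback_mtu)
{
	int mtu = 0;

	if (sp != NULL && sp->req != NULL && sp->req->route_mtu != 0)
		mtu = sp->req->route_mtu - ipsechdr;
	if (mtu == 0)
		mtu = fallback_mtu;
	return (mtu);
}
```

With a policy whose req is NULL and a fallback of 1408, the original variant returns 0 and the patched variant returns 1408. When the policy does carry a route MTU, both variants return the same value.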

I commented out the "break" statement and the statement before it. Now, if mtu is still equal to zero, the code that follows is executed. This is the same code that is always executed when neither IPSEC nor FAST_IPSEC is defined. The tcpdump result:

//*****************************************************************************

pvs# tcpdump -i rl1 -vv -x icmp
tcpdump: listening on rl1, link-type EN10MB (Ethernet), capture size 96 bytes
12:13:48.471242 IP (tos 0x0, ttl  63, id 20521, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag (mtu 1408), length 36
        IP (tos 0x0, ttl 126, id 50667, offset 0, flags [DF], proto: TCP (6), length: 1500, bad cksum 5b32 (->5664)!) 10.0.10.81.1769 > 80.93.118.30.5421:  tcp 1476 [bad hdr length 4 - too short, < 20]
        0x0000:  4500 0038 5029 4000 3f01 10d0 505d 761e
        0x0010:  0a00 0a51 0304 7408 0000 0580 4500 05dc
        0x0020:  c5eb 4000 7e06 5b32 0a00 0a51 505d 761e
        0x0030:  06e9 152d 5af0 079f
12:13:48.471583 IP (tos 0x0, ttl  63, id 20522, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag (mtu 1408), length 36
        IP (tos 0x0, ttl 126, id 50668, offset 0, flags [DF], proto: TCP (6), length: 1500, bad cksum 5b31 (->5663)!) 10.0.10.81.1769 > 80.93.118.30.5421: [|tcp]
        0x0000:  4500 0038 502a 4000 3f01 10cf 505d 761e
        0x0010:  0a00 0a51 0304 6e54 0000 0580 4500 05dc
        0x0020:  c5ec 4000 7e06 5b31 0a00 0a51 505d 761e
        0x0030:  06e9 152d 5af0 0d53

//*****************************************************************************

You can see that the "next hop mtu" field now has the correct value: 0x0580 = 1408 decimal.
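
For anyone who wants to check the field without reading the hex by eye, here is a small userland sketch. The function name icmp_needfrag_mtu() and the two arrays are my own choice for illustration. The arrays hold the first 28 bytes (IP header plus ICMP header) of the first packet from each capture in this report. Per RFC 1191 the next-hop MTU is in bytes 6 and 7 of the ICMP header.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the next-hop MTU of an ICMP "fragmentation needed" message,
 * or -1 if the buffer is not such a message. pkt points at the IPv4
 * header of the ICMP packet.
 */
int
icmp_needfrag_mtu(const uint8_t *pkt, size_t len)
{
	const uint8_t *icmp;
	size_t ihl;

	if (len < 20 || (pkt[0] >> 4) != 4)
		return (-1);
	ihl = (size_t)(pkt[0] & 0x0f) * 4;	/* IP header length in bytes */
	if (ihl < 20 || len < ihl + 8)
		return (-1);
	if (pkt[9] != 1)			/* protocol is not ICMP */
		return (-1);
	icmp = pkt + ihl;
	if (icmp[0] != 3 || icmp[1] != 4)	/* ICMP_UNREACH / NEEDFRAG */
		return (-1);
	return ((icmp[6] << 8) | icmp[7]);	/* network byte order */
}

/* First packet of the capture taken before the patch. */
const uint8_t capture_before[28] = {
	0x45, 0x00, 0x00, 0x38, 0x5b, 0x59, 0x40, 0x00,
	0x3f, 0x01, 0x05, 0xa0, 0x50, 0x5d, 0x76, 0x1e,
	0x0a, 0x00, 0x0a, 0x51, 0x03, 0x04, 0x9e, 0xab,
	0x00, 0x00, 0x00, 0x00
};

/* First packet of the capture taken after the patch. */
const uint8_t capture_after[28] = {
	0x45, 0x00, 0x00, 0x38, 0x50, 0x29, 0x40, 0x00,
	0x3f, 0x01, 0x10, 0xd0, 0x50, 0x5d, 0x76, 0x1e,
	0x0a, 0x00, 0x0a, 0x51, 0x03, 0x04, 0x74, 0x08,
	0x00, 0x00, 0x05, 0x80
};
```

Calling icmp_needfrag_mtu() on capture_before gives 0, and on capture_after it gives 1408.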

Please tell me, is this patch correct?

mailto:bardano at gmail.com

P.S. The "bad cksum 5b31 (->5663)!)" message refers to a packet that has passed through natd; maybe my natd configuration is incorrect.
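
The value tcpdump prints after "->" is the checksum it expects for the quoted IP header. It can be reproduced with the standard Internet checksum; the sketch below does that for the quoted header of the first packet in the second capture. The names ipv4_header_checksum() and quoted_header are mine, chosen for illustration. This only confirms the expected value. It does not show why the stored value differs.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement Internet checksum over an IPv4 header. The stored
 * checksum field (bytes 10 and 11) is treated as zero. hlen is the
 * header length in bytes and is expected to be even.
 */
uint16_t
ipv4_header_checksum(const uint8_t *hdr, size_t hlen)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < hlen; i += 2) {
		if (i == 10)
			continue;	/* skip the stored checksum */
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	}
	while (sum >> 16)		/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)(~sum & 0xffff));
}

/* Quoted IP header from the 12:13:48.471242 packet (stored cksum 5b32). */
const uint8_t quoted_header[20] = {
	0x45, 0x00, 0x05, 0xdc, 0xc5, 0xeb, 0x40, 0x00,
	0x7e, 0x06, 0x5b, 0x32, 0x0a, 0x00, 0x0a, 0x51,
	0x50, 0x5d, 0x76, 0x1e
};
```

For quoted_header the function returns 0x5664, the value tcpdump shows as expected for that packet.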

>Release-Note:
>Audit-Trail:
>Unformatted:

