kern/90155: use sysctl(8) to control hardware checksumming.

Paul Kern pak at cns.utoronto.ca
Fri Dec 9 11:40:09 PST 2005


>Number:         90155
>Category:       kern
>Synopsis:       use sysctl(8) to control hardware checksumming.
>Confidential:   no
>Severity:       non-critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Fri Dec 09 19:40:02 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     pak
>Release:        FreeBSD 5.4-RELEASE i386
>Organization:
CNS, University of Toronto
>Environment:
System: FreeBSD leftrb.utcs 5.4-RELEASE FreeBSD 5.4-RELEASE #5: Tue Oct 11 14:04:00 EDT 2005 pak at leftrb.utcs:/usr/src/sys/i386/compile/FILBERT i386


>Description:

	When bridging Ethernet with a NIC that supports hardware
	checksumming, the NIC corrupts the checksums of packets
	originated by the bridge system itself whenever those packets
	have to "cross the bridge" to reach their destination.  The
	checksums of those packets are mistakenly recalculated in the
	NIC hardware.  One fix is to make hardware checksum calculation
	optional.
	See also "kern/57100" and "kern/63982".

>How-To-Repeat:

	Use two systems: a bridging system and a detecting system.

	Configure an Ethernet bridge machine with at least one NIC
	capable of calculating checksums in hardware (e.g. em(4)).

	Configure an ordinary system positioned in the network so as to
	be on the opposite side of the bridging system's primary NIC.

			----------	# <-- bridged network segment
			| bridge |	#
	{  the	}	|        |	#	---------
	{outside}======[1]      [2]=====+	| plain |
	{ world	}	----------	#	|       |
					+=======+       |
					#	---------

	Using the crude diagram above, configure NIC #1 on {bridge} to
	be its primary interface (i.e. the NIC that it uses for its main
	communication needs).  Configure NIC #2 with an RFC 1918 IP
	address (e.g. 10.9.8.7).

	Start tcpdump on {plain}.

	From {plain}, try to connect to {bridge}'s primary IP address
	(e.g. using ssh).  These connection attempts should fail.

	Tcpdump will report that packets returning to {plain} from
	{bridge} have bad checksums.  In effect, the return packets
	have to "cross the bridge" and are subjected to an extra
	checksum step (once on leaving {bridge} via NIC #1 and once
	again on being bridged from NIC #1 to NIC #2).

	Throughout this, {plain} is still able to communicate with
	the outside world through {bridge}.
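
	The "bad checksum" condition seen on {plain} can be sketched
	in user space as below.  This is only an illustration of the
	standard Internet checksum (RFC 1071) applied to an IPv4 header,
	not tcpdump's or the kernel's actual code; the function name
	in_cksum_sketch is hypothetical.  A header whose checksum field
	is intact sums to zero, and any later change to the header or
	to the checksum field makes the result non-zero.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: RFC 1071 Internet checksum over a byte buffer,
 * reading 16-bit words in network (big-endian) order.  Run over an
 * IPv4 header that already carries a valid checksum, it returns 0.
 */
uint16_t
in_cksum_sketch(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	/* Add up the buffer as a sequence of 16-bit big-endian words. */
	while (len > 1) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
	}
	/* An odd trailing byte is padded with a zero low byte. */
	if (len == 1)
		sum += (uint32_t)buf[0] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	/* The checksum is the one's complement of the folded sum. */
	return (uint16_t)~sum;
}
```

	A receiver treats the header as good when in_cksum_sketch()
	over the full header, checksum field included, returns 0.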

>Fix:

	Make it an option to calculate checksums in NIC hardware.

	This patch adds a sysctl variable "net.inet.ip.disable_hwassist"
	with a default value of '0' (i.e. hardware checksums are allowed
	when possible).  Setting the variable to a non-zero value
	disables the use of hardware checksums on all NICs in the system.

	(It would be better if hardware checksums could be controlled
	 for each individual NIC instead of using a system-wide flag,
	 but that would need a much bigger patch.)
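
	The effect of the HWASSIST() macro used in the patch below can
	be sketched in user space as follows.  This is a minimal
	illustration, not kernel code: struct ifnet_sketch and the two
	helper functions are hypothetical, and the CSUM_* values are
	stand-ins chosen for the example rather than taken from the
	kernel headers.  The macro and the two mask expressions mirror
	the ones in the ip_var.h and ip_output.c hunks.

```c
#include <stdint.h>

/* Stand-in flag bits; the values are assumptions for this sketch. */
#define CSUM_IP		0x0001
#define CSUM_TCP	0x0002
#define CSUM_UDP	0x0004

/* Hypothetical cut-down interface structure. */
struct ifnet_sketch {
	uint32_t if_hwassist;	/* checksum work the NIC can offload */
};

/* Mirrors the sysctl-backed variable added by the patch. */
int disable_hwassist = 0;

/* Same shape as the macro the patch adds to ip_var.h. */
#define HWASSIST(ifp)	(disable_hwassist ? 0 : (ifp)->if_hwassist)

/* Checksum work that must be done in software before transmit. */
uint32_t
sw_csum_flags(const struct ifnet_sketch *ifp, uint32_t csum_flags)
{
	return (csum_flags & ~HWASSIST(ifp));
}

/* Checksum work that is left for the NIC to do. */
uint32_t
hw_csum_flags(const struct ifnet_sketch *ifp, uint32_t csum_flags)
{
	return (csum_flags & HWASSIST(ifp));
}
```

	With disable_hwassist set to 0, every flag the NIC advertises
	is left to the hardware.  With a non-zero value, HWASSIST()
	evaluates to 0, so all requested checksum work falls to the
	software path and nothing is handed to the NIC.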

	===================================================================
	RCS file: /usr/src/sys/netinet/RCS/ip_var.h,v
	retrieving revision 1.1
	diff -u -r1.1 /usr/src/sys/netinet/ip_var.h
	--- /usr/src/sys/netinet/ip_var.h	2005/07/29 15:08:08	1.1
	+++ /usr/src/sys/netinet/ip_var.h	2005/07/29 17:18:11
	@@ -209,6 +209,9 @@
		return htons(ip_id++);
	 }
	 
	+extern int	disable_hwassist;
	+#define HWASSIST(ifp)	(disable_hwassist ? 0 : (ifp)->if_hwassist)
	+
	 #endif /* _KERNEL */
	 
	 #endif /* !_NETINET_IP_VAR_H_ */
	===================================================================
	RCS file: /usr/src/sys/netinet/RCS/ip_fastfwd.c,v
	retrieving revision 1.1
	diff -u -r1.1 /usr/src/sys/netinet/ip_fastfwd.c
	--- /usr/src/sys/netinet/ip_fastfwd.c	2005/07/29 14:25:43	1.1
	+++ /usr/src/sys/netinet/ip_fastfwd.c	2005/07/29 15:07:58
	@@ -540,7 +540,7 @@
			mtu = ifp->if_mtu;
	 
		if (ip->ip_len <= mtu ||
	-	    (ifp->if_hwassist & CSUM_FRAGMENT && (ip->ip_off & IP_DF) == 0)) {
	+	    ((HWASSIST(ifp)) & CSUM_FRAGMENT && (ip->ip_off & IP_DF) == 0)) {
			/*
			 * Restore packet header fields to original values
			 */
	@@ -569,8 +569,8 @@
				 * ip_fragment expects ip_len and ip_off in host byte
				 * order but returns all packets in network byte order
				 */
	-			if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
	-					(~ifp->if_hwassist & CSUM_DELAY_IP))) {
	+			if (ip_fragment(ip, &m, mtu, (HWASSIST(ifp)),
	+					(~(HWASSIST(ifp)) & CSUM_DELAY_IP))) {
					goto drop;
				}
				KASSERT(m != NULL, ("null mbuf and no error"));
	===================================================================
	RCS file: /usr/src/sys/netinet/RCS/ip_output.c,v
	retrieving revision 1.1
	diff -u -r1.1 /usr/src/sys/netinet/ip_output.c
	--- /usr/src/sys/netinet/ip_output.c	2005/07/28 20:02:12	1.1
	+++ /usr/src/sys/netinet/ip_output.c	2005/07/29 17:17:45
	@@ -92,6 +92,10 @@
		&mbuf_frag_size, 0, "Fragment outgoing mbufs to this size");
	 #endif
	 
	+int disable_hwassist = 0;
	+SYSCTL_INT(_net_inet_ip, OID_AUTO, disable_hwassist, CTLFLAG_RW,
	+	&disable_hwassist, 0, "Disable network interface hardware checksum capability");
	+
	 static struct mbuf *ip_insertoptions(struct mbuf *, struct mbuf *, int *);
	 static struct ifnet *ip_multicast_if(struct in_addr *, int *);
	 static void	ip_mloopback
	@@ -733,18 +737,18 @@
		}
	 
		m->m_pkthdr.csum_flags |= CSUM_IP;
	-	sw_csum = m->m_pkthdr.csum_flags & ~ifp->if_hwassist;
	+	sw_csum = m->m_pkthdr.csum_flags & ~(HWASSIST(ifp));
		if (sw_csum & CSUM_DELAY_DATA) {
			in_delayed_cksum(m);
			sw_csum &= ~CSUM_DELAY_DATA;
		}
	-	m->m_pkthdr.csum_flags &= ifp->if_hwassist;
	+	m->m_pkthdr.csum_flags &= (HWASSIST(ifp));
	 
		/*
		 * If small enough for interface, or the interface will take
		 * care of the fragmentation for us, can just send directly.
		 */
	-	if (ip->ip_len <= ifp->if_mtu || (ifp->if_hwassist & CSUM_FRAGMENT &&
	+	if (ip->ip_len <= ifp->if_mtu || ((HWASSIST(ifp)) & CSUM_FRAGMENT &&
		    ((ip->ip_off & IP_DF) == 0))) {
			ip->ip_len = htons(ip->ip_len);
			ip->ip_off = htons(ip->ip_off);
	@@ -793,7 +797,7 @@
		 * Too large for interface; fragment if possible. If successful,
		 * on return, m will point to a list of packets to be sent.
		 */
	-	error = ip_fragment(ip, &m, ifp->if_mtu, ifp->if_hwassist, sw_csum);
	+	error = ip_fragment(ip, &m, ifp->if_mtu, (HWASSIST(ifp)), sw_csum);
		if (error)
			goto bad;
		for (; m; m = m0) {


>Release-Note:
>Audit-Trail:
>Unformatted:

