svn commit: r214866 - stable/8/sys/netinet

Lawrence Stewart lstewart at FreeBSD.org
Sat Nov 6 10:31:52 UTC 2010


Author: lstewart
Date: Sat Nov  6 10:31:52 2010
New Revision: 214866
URL: http://svn.freebsd.org/changeset/base/214866

Log:
  MFC r213913:
  
  Retire the system-wide, per-reassembly queue segment limit. The mechanism is far
  too coarse-grained to be useful and the default value significantly degrades TCP
  performance on moderate to high bandwidth-delay product paths with non-zero loss
  (e.g. 5+ Mbps connections across the public Internet often suffer).
  
  Replace the outgoing mechanism with an individual per-queue limit based on the
  number of MSS segments that fit into the socket's receive buffer.  This should
  strike a good balance between performance and the potential for resource
  exhaustion when FreeBSD is acting as a TCP receiver. With socket buffer
  autotuning (which is enabled by default), the reassembly queue tracks the socket
  buffer and benefits too.
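
  As a rough illustration of the per-queue limit described above, the sketch
  below reproduces the arithmetic of the new check outside the kernel. The
  function name reass_queue_limit() and the zero-MSS guard are assumptions made
  for this example and are not part of tcp_reass.c; only the expression
  (sb_hiwat / t_maxseg) + 1 comes from the change itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in tcp_reass.c): number of segments a single
 * reassembly queue may hold, mirroring the expression used by the commit:
 *     (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1
 * rcv_hiwat is the receive buffer high-water mark in bytes and
 * maxseg is the connection's maximum segment size in bytes.
 */
static uint32_t
reass_queue_limit(uint32_t rcv_hiwat, uint32_t maxseg)
{
	/* Assumption for this sketch only: guard against a zero MSS. */
	if (maxseg == 0)
		return (1);

	/* Integer division: whole MSS-sized segments that fit, plus one. */
	return (rcv_hiwat / maxseg) + 1;
}
```

  Under these assumptions, a 65536 byte receive buffer with a 1460 byte MSS
  gives a limit of 45 segments, close to the old fixed default of 48, while a
  receive buffer that autotuning has grown to 2097152 bytes gives 1437, so the
  limit scales with the buffer rather than staying fixed.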
  
  As the XXX comment suggests, my testing uncovered some unexpected behaviour
  which requires further investigation. By using so->so_rcv.sb_hiwat instead of
  sbspace(&so->so_rcv), we allow more segments to be held across both the socket
  receive buffer and reassembly queue than we probably should. The tradeoff is
  better performance in at least one common scenario, versus a devious sender's
  ability to consume more resources on a FreeBSD receiver.
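
  The tradeoff can be made concrete with a simplified model. In the sketch
  below, struct sb_model, limit_from_hiwat() and limit_from_space() are
  hypothetical names for illustration, and free space is approximated as
  sb_hiwat minus the bytes already buffered. The kernel's sbspace() also
  accounts for mbuf storage limits, which this sketch ignores, and the sketch
  does not attempt to explain the unexpected drops noted in the XXX comment.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the socket receive buffer. */
struct sb_model {
	uint32_t sb_hiwat;	/* high-water mark: maximum bytes buffered */
	uint32_t sb_cc;		/* bytes currently waiting for the application */
};

/* Limit as committed: based on the total size of the receive buffer. */
static uint32_t
limit_from_hiwat(const struct sb_model *sb, uint32_t maxseg)
{
	return (sb->sb_hiwat / maxseg) + 1;
}

/*
 * Stricter alternative: based only on the space still free in the buffer.
 * Assumption: free space is approximated as sb_hiwat - sb_cc, floored at 0.
 */
static uint32_t
limit_from_space(const struct sb_model *sb, uint32_t maxseg)
{
	uint32_t space;

	space = (sb->sb_cc < sb->sb_hiwat) ? sb->sb_hiwat - sb->sb_cc : 0;
	return (space / maxseg) + 1;
}
```

  In this model, with sb_hiwat = 65536, sb_cc = 60000 and a 1460 byte MSS, the
  committed check allows 45 queued segments while the space-based check would
  allow only 4. The gap between the two is the extra memory a sender could
  cause the receiver to hold when the application is slow to drain the socket.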
  
  Sponsored by:	FreeBSD Foundation
  Reviewed by:	andre, gnn, rpaulo

Modified:
  stable/8/sys/netinet/tcp_reass.c
Directory Properties:
  stable/8/sys/   (props changed)
  stable/8/sys/amd64/include/xen/   (props changed)
  stable/8/sys/cddl/contrib/opensolaris/   (props changed)
  stable/8/sys/contrib/dev/acpica/   (props changed)
  stable/8/sys/contrib/pf/   (props changed)
  stable/8/sys/dev/xen/xenpci/   (props changed)

Modified: stable/8/sys/netinet/tcp_reass.c
==============================================================================
--- stable/8/sys/netinet/tcp_reass.c	Sat Nov  6 10:26:49 2010	(r214865)
+++ stable/8/sys/netinet/tcp_reass.c	Sat Nov  6 10:31:52 2010	(r214866)
@@ -92,12 +92,6 @@ SYSCTL_VNET_PROC(_net_inet_tcp_reass, OI
     &VNET_NAME(tcp_reass_qsize), 0, &tcp_reass_sysctl_qsize, "I",
     "Global number of TCP Segments currently in Reassembly Queue");
 
-static VNET_DEFINE(int, tcp_reass_maxqlen) = 48;
-#define	V_tcp_reass_maxqlen		VNET(tcp_reass_maxqlen)
-SYSCTL_VNET_INT(_net_inet_tcp_reass, OID_AUTO, maxqlen, CTLFLAG_RW,
-    &VNET_NAME(tcp_reass_maxqlen), 0,
-    "Maximum number of TCP Segments per individual Reassembly Queue");
-
 static VNET_DEFINE(int, tcp_reass_overflows) = 0;
 #define	V_tcp_reass_overflows		VNET(tcp_reass_overflows)
 SYSCTL_VNET_INT(_net_inet_tcp_reass, OID_AUTO, overflows, CTLFLAG_RD,
@@ -197,13 +191,23 @@ tcp_reass(struct tcpcb *tp, struct tcphd
 		goto present;
 
 	/*
-	 * Limit the number of segments in the reassembly queue to prevent
-	 * holding on to too many segments (and thus running out of mbufs).
-	 * Make sure to let the missing segment through which caused this
-	 * queue.
+	 * Limit the number of segments that can be queued to reduce the
+	 * potential for mbuf exhaustion. For best performance, we want to be
+	 * able to queue a full window's worth of segments. The size of the
+	 * socket receive buffer determines our advertised window and grows
+	 * automatically when socket buffer autotuning is enabled. Use it as the
+	 * basis for our queue limit.
+	 * Always let the missing segment through which caused this queue.
+	 * NB: Access to the socket buffer is left intentionally unlocked as we
+	 * can tolerate stale information here.
+	 *
+	 * XXXLAS: Using sbspace(so->so_rcv) instead of so->so_rcv.sb_hiwat
+	 * should work but causes packets to be dropped when they shouldn't.
+	 * Investigate why and re-evaluate the below limit after the behaviour
+	 * is understood.
 	 */
 	if (th->th_seq != tp->rcv_nxt &&
-	    tp->t_segqlen >= V_tcp_reass_maxqlen) {
+	    tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1) {
 		V_tcp_reass_overflows++;
 		TCPSTAT_INC(tcps_rcvmemdrop);
 		m_freem(m);

