FreeBSD Window updates

Andre Oppermann andre at freebsd.org
Sun Nov 30 15:11:21 PST 2008


David Malone wrote:
> I've got an example tcpdump extract of this at the end of the mail
> - here 6 ACKs are sent, 5 of which are pure window updates and
> several are 2us apart!
> 
> I think the easy option is to delete the code that generates explicit
> window updates if the window moves by 2*MSS. We then should be doing
> something similar to Linux. The other easy alternative would be to
> add a sysctl that lets us generate a window update every N*MSS and
> by default set it to something big, like 10 or 100. That should
> effectively eliminate the updates during bulk data transfer, but
> may still generate some window updates after a loss.
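
The N*MSS alternative quoted above might look roughly like the sketch
below.  The tunable winupd_mss_threshold and the helper name are
hypothetical stand-ins for the proposed sysctl, not existing FreeBSD
knobs, and adv/maxseg are passed as plain arguments instead of being
read from the tcpcb.

```c
#include <stdbool.h>

/*
 * Hypothetical tunable standing in for the proposed sysctl.
 * It is not an existing FreeBSD knob; 10 is the "something big"
 * default suggested in the quoted mail.
 */
static long winupd_mss_threshold = 10;

/*
 * Return true when the window could be opened by at least N*MSS
 * beyond what the peer already knows about.
 */
static bool
want_window_update_nmss(long adv, long maxseg)
{
	return (adv >= winupd_mss_threshold * maxseg);
}
```

With the default of 10, an advance of 2*MSS no longer triggers a pure
window update; only an advance of 10*MSS or more does.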

The main problem with the pure window update test in tcp_output() is
that it completely ignores delayed ACKs.  The second is its strict
adherence to the 4.4BSD rule of sending an update for every window
increase of >= 2*MSS.  The third issue, sending a slew of window
updates after having received a FIN (telling us the other end won't
ever send more data), I already fixed some time ago.
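
For reference, the existing test boils down to something like the
following.  This is only a sketch: the helper name is made up for
illustration, and maxseg and sb_hiwat stand in for tp->t_maxseg and
so->so_rcv.sb_hiwat.  Note that TF_DELACK never enters into it.

```c
#include <stdbool.h>

/*
 * Sketch of the pre-patch decision.  old_want_window_update() is a
 * hypothetical helper; the real code lives inline in tcp_output().
 */
static bool
old_want_window_update(long adv, long maxseg, long sb_hiwat)
{
	/* Window can grow by two full segments: send an update. */
	if (adv >= 2 * maxseg)
		return (true);
	/* Window can grow by half of the receive buffer: send an update. */
	if (2 * adv >= sb_hiwat)
		return (true);
	return (false);
}
```

With a 1448 byte MSS and a 64k buffer this fires as soon as adv
reaches 2896 bytes, whether or not a delayed ACK is already pending.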

In my new-tcp work I came across the window update logic some time
ago and cross-checked it against the relevant RFCs and other
implementations.  Attached is a backport of the new logic; it compiles
but is otherwise untested.
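
To make the decision order in the patch easier to follow, here is a
standalone sketch of it.  The struct and function names are
hypothetical; the fields mirror the tcpcb and socket buffer values
the patch reads.  The outer checks (recwin > 0, TF_NEEDSYN, FIN
already received) are left out for brevity.

```c
#include <stdbool.h>

/* Simplified stand-ins for the fields used by the patch. */
struct winupd_state {
	long recwin;	/* free space in the receive buffer */
	long adv;	/* how far the advertised window could grow */
	long maxseg;	/* tp->t_maxseg */
	long sb_hiwat;	/* so->so_rcv.sb_hiwat */
	int rcv_scale;	/* tp->rcv_scale */
	bool delack;	/* TF_DELACK is set */
};

static bool
want_window_update(const struct winupd_state *s)
{
	/* A pending delayed ACK will carry the new window shortly. */
	if (s->delack)
		return (false);
	/* Less than one scaled unit cannot be expressed on the wire. */
	if (s->adv < (1L << s->rcv_scale))
		return (false);
	/* Buffer nearly full: open up as soon as 2*MSS is free. */
	if (s->recwin <= s->sb_hiwat / 4 && s->adv >= 2 * s->maxseg)
		return (true);
	/* Otherwise wait for 1/8 of the buffer and at least one MSS. */
	if (s->adv >= s->sb_hiwat / 8 && s->adv >= s->maxseg)
		return (true);
	/* Or for half of the maximum possible window. */
	if (2 * s->adv >= s->sb_hiwat)
		return (true);
	return (false);
}
```

With a 64k buffer and a 1448 byte MSS, an advance of 2*MSS only
triggers an update when a quarter or less of the buffer is free;
with plenty of space left, the update waits until adv reaches 8192
bytes.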

-- 
Andre

> Normal ACKing for driving congestion control shouldn't be impacted
> by either of these suggested changes.
> 
> 	David.
> 
> 1227622713.276609 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40798 <nop,nop,timestamp 5365425 5103763> (DF)
> 1227622713.276611 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40830 <nop,nop,timestamp 5365425 5103763> (DF)
> 1227622713.276613 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40862 <nop,nop,timestamp 5365425 5103763> (DF)
> 1227622713.276615 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40894 <nop,nop,timestamp 5365425 5103763> (DF)
> 1227622713.276852 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40926 <nop,nop,timestamp 5365425 5103763> (DF)
> 1227622713.276855 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144097745 win 40958 <nop,nop,timestamp 5365425 5103763> (DF)
> 1227622713.296585 172.16.2.2.5002 > 172.16.1.51.39077: . ack 144100641 win 40956 <nop,nop,timestamp 5365445 5103766> (DF)

-------------- next part --------------
Index: tcp_output.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_output.c,v
retrieving revision 1.158
diff -u -p -r1.158 tcp_output.c
--- tcp_output.c	27 Nov 2008 13:19:42 -0000	1.158
+++ tcp_output.c	30 Nov 2008 22:58:46 -0000
@@ -539,14 +539,32 @@ after_sack_rexmit:
 	}
 
 	/*
-	 * Compare available window to amount of window
-	 * known to peer (as advertised window less
-	 * next expected input).  If the difference is at least two
-	 * max size segments, or at least 50% of the maximum possible
-	 * window, then want to send a window update to peer.
+	 * Compare available window to amount of window known to peer
+	 * (as advertised window less next expected input) and decide
+	 * if we have to send a pure window update segment.
+	 *
+	 * When a delayed ACK is scheduled, do nothing.  It will update
+	 * the window anyway in a few milliseconds.
+	 *
+	 * If the receive socket buffer has less than 1/4 of space
+	 * available and if the difference is at least two max size
+	 * segments, send an immediate window update to peer.
+	 *
+	 * Otherwise if the difference is 1/8 (or more) of the receive
+	 * socket buffer, or at least 1/2 of the maximum possible window,
+	 * then we send a window update too.
+	 *
 	 * Skip this if the connection is in T/TCP half-open state.
 	 * Don't send pure window updates when the peer has closed
 	 * the connection and won't ever send more data.
+	 *
+	 * See RFC793, Section 3.7, page 43, Window Management Suggestions
+	 * See RFC1122: Section 4.2.3.3, When to Send a Window Update
+	 *
+	 * Note: We are less aggressive about sending window updates than
+	 * recommended in RFC1122.  This is fine with today's large socket
+	 * buffers and will not stall the peer.  In addition we piggyback
+	 * window updates on regular ACKs and sends.
 	 */
 	if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
 	    !TCPS_HAVERCVDFIN(tp->t_state)) {
@@ -554,14 +572,24 @@ after_sack_rexmit:
 		 * "adv" is the amount we can increase the window,
 		 * taking into account that we are limited by
 		 * TCP_MAXWIN << tp->rcv_scale.
+		 *
+		 * NB: adv must be equal to or larger than the smallest
+		 * unscaled window increment.
 		 */
 		long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
 			(tp->rcv_adv - tp->rcv_nxt);
 
-		if (adv >= (long) (2 * tp->t_maxseg))
-			goto send;
-		if (2 * adv >= (long) so->so_rcv.sb_hiwat)
-			goto send;
+		if (!(tp->t_flags & TF_DELACK) &&
+		    adv >= (long)0x1 << tp->rcv_scale) {
+			if (recwin <= (long)(so->so_rcv.sb_hiwat / 4) &&
+			    adv >= (long)(2 * tp->t_maxseg))
+				goto send;
+			if (adv >= (long)(so->so_rcv.sb_hiwat / 8) &&
+			    adv >= (long)tp->t_maxseg)
+				goto send;
+			if (2 * adv >= (long)so->so_rcv.sb_hiwat)
+				goto send;
+		}
 	}
 
 	/*


More information about the freebsd-net mailing list