FreeBSD Window updates

Andre Oppermann andre at freebsd.org
Mon Dec 1 13:11:36 PST 2008


Venkat Venkatsubra wrote:
> Hi Andre,
> 
> When a delayed ACK is pending, the window update is not sent.
> Does this mean that when an odd number of packets is received and later
> read, a window update won't go out until either the next segment arrives
> or the 200 ms delayed ACK timer fires? Can this reduced window block the
> sender from sending the next segment that we are waiting for to open up
> the window?

Yes.  The very idea of delayed ACK is to reduce the network utilization
by ACKing only every other segment.  Window updates should not override
this as they currently do.  Nagle comes into play as well, where we wait
for the application to write something within the delayed ACK timeout to
piggyback the answer together with the ACK (and window update).

To answer your question: I do think we are fine with waiting for the
delayed ACK.  If an application starts to seriously lag behind like
in your example the feedback mechanism should work and cause the sender
to slow down too.  The feedback loop in TCP is not only the network but
also the sending and receiving application.  In a normal bulk transfer
where the receiving application services the receive buffer in regular
intervals we update the window with every ACK.

I'm open to other ideas if they fix the problem David is seeing without
having more serious shortcomings.

> What's the purpose of the 2*MSS check, by the way?

This is part of the Silly Window Syndrome prevention.  A good description is here:
http://www.tcpipguide.com/free/t_TCPSillyWindowSyndromeandChangesTotheSlidingWindow.htm
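
As a rough illustration, the receiver-side rule from RFC 1122, Section
4.2.3.3 can be sketched as below.  The function and parameter names are
made up for this example and are not taken from the FreeBSD sources.  The
RFC form compares the window gain against one segment (or half the buffer,
whichever is smaller), while the 4.4BSD-derived check discussed here uses
2*MSS.

```c
#include <stdbool.h>

/*
 * Receiver-side SWS avoidance as worded in RFC 1122, 4.2.3.3:
 * keep the right window edge fixed until
 *   RCV.BUFF - RCV.USER - RCV.WND >= min(Fr * RCV.BUFF, Eff.snd.MSS)
 * with the recommended Fr = 1/2.
 * All names here are hypothetical, chosen to mirror the RFC symbols.
 */
bool
sws_may_advance_window(long rcv_buff, long rcv_user, long rcv_wnd,
    long eff_snd_mss)
{
	long gain = rcv_buff - rcv_user - rcv_wnd;	/* newly freed space */
	long half = rcv_buff / 2;			/* Fr * RCV.BUFF */
	long threshold = half < eff_snd_mss ? half : eff_snd_mss;

	return (gain >= threshold);
}
```

With a 64 KB buffer and a 1460 byte MSS, freeing only 536 bytes does not
move the window edge, while freeing 1536 bytes does.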

PS: Attached is an updated version of the patch.  The flag TF_DELACK
can't be used to test for the presence of a delayed ACK.  The presence
of the delack timer has to be tested.

-- 
Andre

> Venkat  
> 
> 
> 
> 
> ________________________________
> From: Andre Oppermann <andre at freebsd.org>
> To: David Malone <dwmalone at maths.tcd.ie>
> Cc: Rui Paulo <rpaulo at fnop.net>; freebsd-net at freebsd.org; Venkat Venkatsubra <venkatvenkatsubra at yahoo.com>; Kevin Oberman <oberman at es.net>
> Sent: Sunday, November 30, 2008 5:18:22 PM
> Subject: Re: FreeBSD Window updates
> 
> Andre Oppermann wrote:
>> David Malone wrote:
>>> I've got an example tcpdump extract of this at the end of the mail
>>> - here 6 ACKs are sent, 5 of which are pure window updates and
>>> several are 2us apart!
>>>
>>> I think the easy option is to delete the code that generates explicit
>>> window updates if the window moves by 2*MSS. We then should be doing
>>> something similar to Linux. The other easy alternative would be to
>>> add a sysctl that lets us generate a window update every N*MSS and
>>> by default set it to something big, like 10 or 100. That should
>>> effectively eliminate the updates during bulk data transfer, but
>>> may still generate some window updates after a loss.
>> The main problem of the pure window update test in tcp_output() is
>> its complete ignorance of delayed ACKs.  Second is the strict 4.4BSD
>> adherence to sending an update for every window increase of >= 2*MSS.
>> The third issue of sending a slew of window updates after having
>> received a FIN (telling us the other end won't ever send more data)
>> I have already fixed some moons ago.
>>
>> In my new-tcp work I've come across the window update logic some time
>> ago and backchecked with relevant RFCs and other implementations.
>> Attached is a compiling but otherwise untested backport of the new logic.
> 
> Slightly improved version attached.
> 

-------------- next part --------------
Index: tcp_output.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_output.c,v
retrieving revision 1.158
diff -u -p -r1.158 tcp_output.c
--- tcp_output.c	27 Nov 2008 13:19:42 -0000	1.158
+++ tcp_output.c	1 Dec 2008 21:06:28 -0000
@@ -539,29 +539,59 @@ after_sack_rexmit:
 	}
 
 	/*
-	 * Compare available window to amount of window
-	 * known to peer (as advertised window less
-	 * next expected input).  If the difference is at least two
-	 * max size segments, or at least 50% of the maximum possible
-	 * window, then want to send a window update to peer.
+	 * Compare available window to amount of window known to peer
+	 * (as advertised window less next expected input) and decide
+	 * if we have to send a pure window update segment.
+	 *
+	 * When a delayed ACK is scheduled, do nothing.  The window will
+	 * be updated anyway within a few milliseconds when:
+	 *  - the next segment arrives and we have to ACK immediately;
+	 *  - the application sends some data back to the peer;
+	 *  - the delayed ACK timer expires.
+	 *
+	 * If the receive socket buffer has less than 1/4 of space
+	 * available and if the difference is at least two max size
+	 * segments, send an immediate window update to peer.
+	 *
+	 * Otherwise if the difference is 1/8 (or more) of the receive
+	 * socket buffer, or at least 1/2 of the maximum possible window,
+	 * then we send a window update too.
+	 *
 	 * Skip this if the connection is in T/TCP half-open state.
 	 * Don't send pure window updates when the peer has closed
 	 * the connection and won't ever send more data.
+	 *
+	 * See RFC793, Section 3.7, page 43, Window Management Suggestions
+	 * See RFC1122: Section 4.2.3.3, When to Send a Window Update
+	 *
+	 * Note: We are less aggressive with sending window updates than
+	 * recommended in RFC1122.  This is fine with today's large socket
+	 * buffers and will not stall the peer.  In addition we piggyback
+	 * window updates on regular ACKs and sends.
 	 */
-	if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
-	    !TCPS_HAVERCVDFIN(tp->t_state)) {
+	if (recwin > 0 && !tcp_timer_active(tp, TT_DELACK) &&
+	    !(tp->t_flags & TF_NEEDSYN) && !TCPS_HAVERCVDFIN(tp->t_state)) {
 		/*
 		 * "adv" is the amount we can increase the window,
 		 * taking into account that we are limited by
 		 * TCP_MAXWIN << tp->rcv_scale.
+		 *
+		 * NB: adv must be equal to or larger than the smallest
+		 * unscaled window increment.
 		 */
 		long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
 			(tp->rcv_adv - tp->rcv_nxt);
 
-		if (adv >= (long) (2 * tp->t_maxseg))
-			goto send;
-		if (2 * adv >= (long) so->so_rcv.sb_hiwat)
-			goto send;
+		if (adv >= (long)0x1 << tp->rcv_scale) {
+			if (recwin <= (long)(so->so_rcv.sb_hiwat / 4) &&
+			    adv >= (long)(2 * tp->t_maxseg))
+				goto send;
+			if (adv >= (long)(so->so_rcv.sb_hiwat / 8) &&
+			    adv >= (long)tp->t_maxseg)
+				goto send;
+			if (2 * adv >= (long)so->so_rcv.sb_hiwat)
+				goto send;
+		}
 	}
 
 	/*
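
For anyone who wants to play with the thresholds outside the kernel, the
decision made by the patched block can be sketched as a standalone
function.  This is only an approximation under stated assumptions: the
struct, its field names, and SKETCH_TCP_MAXWIN are hypothetical stand-ins
for the tcpcb and socket buffer fields the patch reads, and the sketch does
not model anything else tcp_output() does.

```c
#include <stdbool.h>

#define SKETCH_TCP_MAXWIN 65535L	/* assumed largest unscaled window */

/* Hypothetical snapshot of the values the patched test looks at. */
struct winupd_input {
	long recwin;		/* receive window we could offer now */
	long advertised;	/* rcv_adv - rcv_nxt, window known to peer */
	long maxseg;		/* t_maxseg */
	long hiwat;		/* so_rcv.sb_hiwat */
	int rcv_scale;		/* window scale shift */
	bool delack_armed;	/* delayed ACK timer is running */
	bool need_syn;		/* TF_NEEDSYN is set */
	bool rcvd_fin;		/* peer has already sent its FIN */
};

/* Returns true when a pure window update segment would be sent. */
bool
want_window_update(const struct winupd_input *in)
{
	long maxwin = SKETCH_TCP_MAXWIN << in->rcv_scale;
	long adv;

	/* A pending delayed ACK carries the window update by itself. */
	if (in->recwin <= 0 || in->delack_armed || in->need_syn ||
	    in->rcvd_fin)
		return (false);

	adv = (in->recwin < maxwin ? in->recwin : maxwin) - in->advertised;

	/* Smaller than one scaled window unit: cannot be expressed. */
	if (adv < (1L << in->rcv_scale))
		return (false);
	/* Buffer nearly full: fall back to the classic 2*MSS rule. */
	if (in->recwin <= in->hiwat / 4 && adv >= 2 * in->maxseg)
		return (true);
	/* Otherwise require 1/8 of the buffer and at least one MSS. */
	if (adv >= in->hiwat / 8 && adv >= in->maxseg)
		return (true);
	/* Or at least half of the maximum possible window. */
	if (2 * adv >= in->hiwat)
		return (true);
	return (false);
}
```

With a 64 KB buffer and a 1460 byte MSS, a window gain of 5535 bytes on a
mostly empty buffer stays quiet in this sketch, whereas the old
"adv >= 2 * t_maxseg" test would have sent an update.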

