The tale of a TCP bug

John Baldwin jhb at freebsd.org
Mon Mar 28 14:21:04 UTC 2011


On Saturday, March 26, 2011 10:02:12 am Stefan `Sec` Zehl wrote:
> Hi again,
> 
> On Fri, Mar 25, 2011 at 16:40 -0400, John Baldwin wrote:
> > Reading some more.  I'm trying to understand the breakage in your case.
> > 
> > You are saying that FreeBSD is the sender, who has data to send, yet is not 
> > sending any window probes because it never starts the persist timer when the 
> > initial window is zero?  Is that correct?
> 
> Yes. The receiver never sends a window update on its own, but when
> probed will "admit" to a bigger window.
> 
> > And the problem is that the code that uses 'adv' to determine if it
> > sound send a window update to the remote end is falsely succeeding due
> > to the overflow causing tcp_output() to 'goto send' but that it then
> > fails to send any data because it thinks the remote window is full?
> 
> Yes, as far as I remember (I did that part of debugging 2 Months ago,
> when I submitted the PR %-) that's what happens.
> 
> > So one thing I don't quite follow is how you are having rcv_nxt >
> > rcv_adv.  I saw this when the other side would send a window probe,
> > and then the receiving side would take the -1 remaining window and
> > explode it into the maximum window size when it ACKd.
> 
> No, it's not rcv_nxt > rcv_adv. It's
> 
> (rcv_adv - rcv_nxt) > min(recwin, (long)TCP_MAXWIN << tp->rcv_scale)
> 
> My sample case has (rcv_adv - rcv_nxt) = 65536, but 
> (TCP_MAXWIN << tp->rcv_scale) = 65535 (as there is no window scaling in
> effect)

Ahhhh.

> > Are you seeing the other end of the connection send a window probe, but 
> > FreeBSD is not setting the persist timer so that it will send its own window 
> > probes?
> 
> No, the dump looks like this:
> 
> | 10.42.0.25.44852 > 10.42.0.2.1516: Flags [S], 
> |    seq 3339144437, win 65535, options [...], length 0
> 
> FreeBSD sending the first SYN.
> [rcv_adv=0, rcv_nxt=0]
> 
> | 10.42.0.2.1516 > 10.42.0.25.44852: Flags [S.], 
> |    seq 42, ack 3339144438, win 0, length 0
> 
> The other end SYN|ACKing with a window size of 0.
> 
> | 10.42.0.25.44852 > 10.42.0.2.1516: Flags [.], 
> |    seq 1, ack 1, win 65535, length 0
> 
> FreeBSD ACKing, and (correctly) sending no data.
> [rcv_adv=67779, rcv_nxt=43], thus resulting in adv=-1/0xffffffff

Ahh, and this is the real bug.  And this goes back to the calculation of
'rcv_wnd' in tcp_input().

How about this:

Index: tcp_input.c
===================================================================
--- tcp_input.c	(revision 220098)
+++ tcp_input.c	(working copy)
@@ -1694,6 +1694,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th,
 	win = sbspace(&so->so_rcv);
 	if (win < 0)
 		win = 0;
+	if (win > TCP_MAXWIN << tp->rcv_scale)
+		win = TCP_MAXWIN << tp->rcv_scale;
 	tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
 
 	/* Reset receive buffer auto scaling when not in bulk receive mode. */

This is basically the same as your patch except that it ensures that
'rcv_wnd' is accurate for any other uses.

It looks like the syncache code is already correct as it uses a similar test
to initialize 'sc_wnd':

	/*
	 * Initial receive window: clip sbspace to [0 .. TCP_MAXWIN].
	 * win was derived from socket earlier in the function.
	 */
	win = imax(win, 0);
	win = imin(win, TCP_MAXWIN);
	sc->sc_wnd = win;

-- 
John Baldwin


More information about the freebsd-net mailing list