The tale of a TCP bug

Stefan `Sec` Zehl sec at 42.org
Sat Mar 26 14:02:14 UTC 2011


Hi again,

On Fri, Mar 25, 2011 at 16:40 -0400, John Baldwin wrote:
> Reading some more.  I'm trying to understand the breakage in your case.
> 
> You are saying that FreeBSD is the sender, who has data to send, yet is not 
> sending any window probes because it never starts the persist timer when the 
> initial window is zero?  Is that correct?

Yes. The receiver never sends a window update on its own, but when
probed will "admit" to a bigger window.

> And the problem is that the code that uses 'adv' to determine if it
> should send a window update to the remote end is falsely succeeding due
> to the overflow causing tcp_output() to 'goto send' but that it then
> fails to send any data because it thinks the remote window is full?

Yes, as far as I remember (I did that part of the debugging two months ago,
when I submitted the PR %-) that's what happens.

> So one thing I don't quite follow is how you are having rcv_nxt >
> rcv_adv.  I saw this when the other side would send a window probe,
> and then the receiving side would take the -1 remaining window and
> explode it into the maximum window size when it ACKd.

No, it's not rcv_nxt > rcv_adv. It's

(rcv_adv - rcv_nxt) > min(recwin, (long)TCP_MAXWIN << tp->rcv_scale)

In my sample case (rcv_adv - rcv_nxt) = 65536, but
(TCP_MAXWIN << tp->rcv_scale) = 65535, as there is no window scaling in
effect.
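
To make the arithmetic concrete, here is a minimal sketch of the
subtraction, not the actual tcp_output() code. It assumes the kernel's
min() takes and returns unsigned int, and it uses fixed-width types to
stand in for a 64-bit long (amd64) and a 32-bit long (i386). The helper
names min_u32, adv_long64 and adv_long32 are made up for illustration.

```c
#include <stdint.h>

#define TCP_MAXWIN 65535u

/* Stand-in for the kernel's min(); assumed to work on unsigned int. */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* The unsigned 32-bit difference that gets assigned to 'adv'. */
static uint32_t adv_raw(uint32_t recwin, uint32_t rcv_adv,
                        uint32_t rcv_nxt, unsigned rcv_scale)
{
    return min_u32(recwin, TCP_MAXWIN << rcv_scale) - (rcv_adv - rcv_nxt);
}

/* 'long adv' where long is 64 bits: the value is zero-extended. */
static int64_t adv_long64(uint32_t recwin, uint32_t rcv_adv,
                          uint32_t rcv_nxt, unsigned rcv_scale)
{
    return (int64_t)adv_raw(recwin, rcv_adv, rcv_nxt, rcv_scale);
}

/*
 * 'long adv' where long is 32 bits: the same bit pattern is read as
 * signed. The conversion is implementation-defined in C; common
 * compilers wrap around.
 */
static int32_t adv_long32(uint32_t recwin, uint32_t rcv_adv,
                          uint32_t rcv_nxt, unsigned rcv_scale)
{
    return (int32_t)adv_raw(recwin, rcv_adv, rcv_nxt, rcv_scale);
}
```

With recwin = 65535, rcv_scale = 0 and (rcv_adv - rcv_nxt) = 65536,
adv_long64() yields 4294967295 (0xffffffff) while adv_long32() yields -1.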

> Are you seeing the other end of the connection send a window probe, but 
> FreeBSD is not setting the persist timer so that it will send its own window 
> probes?

No, the dump looks like this:

| 10.42.0.25.44852 > 10.42.0.2.1516: Flags [S], 
|    seq 3339144437, win 65535, options [...], length 0

FreeBSD sending the first SYN.
[rcv_adv=0, rcv_nxt=0]

| 10.42.0.2.1516 > 10.42.0.25.44852: Flags [S.], 
|    seq 42, ack 3339144438, win 0, length 0

The other end SYN|ACKing with a window size of 0.

| 10.42.0.25.44852 > 10.42.0.2.1516: Flags [.], 
|    seq 1, ack 1, win 65535, length 0

FreeBSD ACKing, and (correctly) sending no data.
[rcv_adv=65579, rcv_nxt=43], thus resulting in adv=-1/0xffffffff

At this point amd64 hangs 'forever' as the opposite side doesn't send
any packets on its own.
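
As a rough model of why the two platforms diverge here, the sketch below
mimics a window-update test of the form "adv is at least two segments".
It is a simplification written for this mail, not the tcp_output()
source; the function name wants_window_update and the segment size 1460
are assumptions and do not come from the dump above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the window-update check: report true when the
 * window we could newly advertise is at least two full segments.
 */
static bool wants_window_update(int64_t adv, uint32_t maxseg)
{
    return adv >= 2 * (int64_t)maxseg;
}
```

Fed with adv = 0xffffffff (the amd64 value) and maxseg = 1460, the check
succeeds, so the code believes a window update is due and never gets to
the persist timer. Fed with adv = -1 (the i386 value), it fails, and the
persist timer is started as intended.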

On i386 the persist timer is started, and we get:

| 10.42.0.25.44852 > 10.42.0.2.1516: Flags [.],
|    seq 1:2, ack 1, win 65535, length 1

A window probe [a few seconds later]

| 10.42.0.2.1516 > 10.42.0.25.44852: Flags [.],
|    seq 1, ack 2, win 70, length 0

At this point the remote side admits to having the window open,
and the connection works fine from then on.

CU,
    Sec
-- 
I know that you believe that you understand what you think I said.
But I am not sure you realize that what you heard is not what I meant.


More information about the freebsd-net mailing list