svn commit: r221346 - head/sys/netinet
John Baldwin
jhb at FreeBSD.org
Sat May 14 14:37:53 UTC 2011
On 5/14/11 6:37 AM, Mikolaj Golub wrote:
> Hi,
>
> On Mon, 2 May 2011 21:05:52 +0000 (UTC) John Baldwin wrote:
>
> JB> Author: jhb
> JB> Date: Mon May 2 21:05:52 2011
> JB> New Revision: 221346
> JB> URL: http://svn.freebsd.org/changeset/base/221346
>
> JB> Log:
> JB> Handle a rare edge case with nearly full TCP receive buffers. If a TCP
> JB> buffer fills up causing the remote sender to enter into persist mode, but
> JB> there is still room available in the receive buffer when a window probe
> JB> arrives (either due to window scaling, or due to the local application
> JB> very slowly draining data from the receive buffer), then the single byte
> JB> of data in the window probe is accepted. However, this can cause rcv_nxt
> JB> to be greater than rcv_adv. This condition will only last until the next
> JB> ACK packet is pushed out via tcp_output(), and since the previous ACK
> JB> advertised a zero window, the ACK should be pushed out while the TCP
> JB> pcb is write-locked.
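The sequence of events in the paragraph above can be modeled with a few lines of C. This is a minimal userland sketch, not kernel code: `struct rcv_state`, `advertise()`, `accept_data()` and `overrun()` are hypothetical names, and only the two fields `rcv_nxt` and `rcv_adv` come from the log. A zero-window ACK leaves rcv_adv equal to rcv_nxt, and accepting the one-byte probe then moves rcv_nxt one past it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Hypothetical, simplified subset of the receive-side state of a TCP pcb. */
struct rcv_state {
    tcp_seq rcv_nxt; /* next sequence number expected from the peer */
    tcp_seq rcv_adv; /* right edge of the last advertised window */
};

/* Send an ACK advertising `win` bytes: the right edge becomes rcv_nxt + win. */
static void advertise(struct rcv_state *s, uint32_t win)
{
    s->rcv_adv = s->rcv_nxt + win;
}

/* Accept `len` bytes of in-order data, e.g. the one-byte window probe. */
static void accept_data(struct rcv_state *s, uint32_t len)
{
    s->rcv_nxt += len;
}

/* Signed sequence-space distance of rcv_nxt past rcv_adv;
 * a positive result means rcv_nxt has overrun the advertised edge. */
static int32_t overrun(const struct rcv_state *s)
{
    return (int32_t)(s->rcv_nxt - s->rcv_adv);
}
```

For example, starting from rcv_nxt = rcv_adv = 1000, advertising a zero window and then accepting the probe byte leaves rcv_nxt at 1001 and rcv_adv at 1000, so overrun() reports 1.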
> JB>
> JB> During the window while rcv_nxt is greater than rcv_adv, a few places
> JB> would compute the remaining receive window via rcv_adv - rcv_nxt.
> JB> However, this value was then (uint32_t)-1. On a 64 bit machine this
> JB> could expand to a positive 2^32 - 1 when cast to a long. In particular,
> JB> when calculating the receive window in tcp_output(), the result would be
> JB> that the receive window was computed as 2^32 - 1 resulting in advertising
> JB> a far larger window to the remote peer than actually existed.
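The wraparound described in the paragraph above can be reproduced in userland. The function below is a hypothetical, simplified stand-in for a receive-window computation that refuses to advertise less than the window still outstanding; it is not the actual tcp_output() code, and `space` is an assumed parameter for the free space in the receive buffer. The large result assumes an LP64 system where long is 64 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Hypothetical sketch: start from the free buffer space, but never
 * advertise less than what is left of the previously advertised window. */
static long recv_window_unchecked(long space, tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
    long recwin = space;

    /* rcv_adv - rcv_nxt is evaluated in 32-bit unsigned arithmetic.
     * When rcv_nxt is one past rcv_adv it wraps to 0xFFFFFFFF, and on
     * an LP64 system the cast to long keeps it as +4294967295. */
    if (recwin < (long)(rcv_adv - rcv_nxt))
        recwin = (long)(rcv_adv - rcv_nxt);
    return recwin;
}
```

On such a system, recv_window_unchecked(0, 1000, 1001) returns 4294967295 even though the buffer is full, while a normal case such as recv_window_unchecked(500, 2000, 1000) returns 1000.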
> JB>
> JB> Fix various places that compute the remaining receive window to either
> JB> assert that it is not negative (i.e. rcv_nxt <= rcv_adv), or treat the
> JB> window as full if rcv_nxt is greater than rcv_adv.
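One way to express the "treat the window as full" approach is sketched below, assuming sequence numbers are compared modulo 2^32 in the style of the SEQ_* macros from FreeBSD's netinet/tcp_seq.h. The macros are redefined locally with int32_t casts, and `recv_window_remaining()` is a hypothetical helper, not the committed code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Modular sequence-number comparison, redefined locally for this sketch. */
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/* Remaining advertised receive window. If rcv_nxt has moved past rcv_adv
 * (e.g. after accepting a window probe), report a full window (0) instead
 * of letting the unsigned subtraction wrap to 0xFFFFFFFF. */
static long recv_window_remaining(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
    if (SEQ_GT(rcv_nxt, rcv_adv))
        return 0;
    return (long)(rcv_adv - rcv_nxt);
}
```

With this check, recv_window_remaining(1000, 1001) returns 0 rather than a huge value, and recv_window_remaining(10, 0xFFFFFFF0) still returns 26 because the comparison is done in sequence space across the 2^32 wrap.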
> JB>
> JB> Reviewed by: bz
> JB> MFC after: 1 month
>
> JB> Modified:
> JB> head/sys/netinet/tcp_input.c
> JB> head/sys/netinet/tcp_output.c
> JB> head/sys/netinet/tcp_timewait.c
>
> JB> Modified: head/sys/netinet/tcp_input.c
> JB> ==============================================================================
> JB> --- head/sys/netinet/tcp_input.c Mon May 2 21:04:37 2011 (r221345)
> JB> +++ head/sys/netinet/tcp_input.c Mon May 2 21:05:52 2011 (r221346)
> JB> @@ -1831,6 +1831,9 @@ tcp_do_segment(struct mbuf *m, struct tc
> JB> win = sbspace(&so->so_rcv);
> JB> if (win < 0)
> JB> win = 0;
> JB> + KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt),
> JB> + ("tcp_input negative window: tp %p rcv_nxt %u rcv_adv %u", tp,
> JB> + tp->rcv_adv, tp->rcv_nxt));
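As quoted, this KASSERT passes tp->rcv_adv before tp->rcv_nxt while its format string labels them "rcv_nxt %u rcv_adv %u", so the panic message prints each value under the other's label. The userland analogue below uses a hypothetical helper name, `check_rcv_window()`, and a locally defined SEQ_GEQ, and passes the arguments in label order. It also shows why the check uses SEQ_GEQ rather than a plain >=: with rcv_nxt = 0xFFFFFFF0 and rcv_adv = 5 (a window straddling the 2^32 wrap) the invariant holds, even though 5 >= 0xFFFFFFF0 is false.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t tcp_seq;

/* Modular "a >= b" for 32-bit sequence numbers, redefined for this sketch. */
#define SEQ_GEQ(a, b) ((int32_t)((a) - (b)) >= 0)

/* Returns 1 if rcv_nxt <= rcv_adv holds in sequence space. Otherwise
 * writes a diagnostic into buf, with each value printed next to its own
 * label, and returns 0. */
static int check_rcv_window(tcp_seq rcv_nxt, tcp_seq rcv_adv,
                            char *buf, size_t buflen)
{
    if (SEQ_GEQ(rcv_adv, rcv_nxt))
        return 1;
    snprintf(buf, buflen, "negative window: rcv_nxt %u rcv_adv %u",
             (unsigned)rcv_nxt, (unsigned)rcv_adv);
    return 0;
}
```

For the window-probe case, check_rcv_window(1001, 1000, buf, sizeof buf) fails and formats "negative window: rcv_nxt 1001 rcv_adv 1000".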
>
> I am hitting this assertion when running tests with HAST (with both the
> primary and secondary HAST instances on the same host).
>
> HAST synchronizes data in MAXPHYS (131072-byte) blocks. The sender splits
> each block into smaller chunks of MAX_SEND_SIZE (32768 bytes), while the
> receiver receives each whole block by calling recv() with the MSG_WAITALL flag.
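The I/O pattern described in the report can be sketched in C. This only illustrates the chunked-send / MSG_WAITALL-receive pattern and is not HAST's code: send_block(), recv_block() and demo_transfer() are hypothetical names, and a UNIX-domain socketpair(2) stands in for the TCP connection, so it does not exercise TCP windowing or reproduce the assertion. The sizes are the MAXPHYS and MAX_SEND_SIZE values quoted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLOCK_SIZE    131072 /* MAXPHYS value quoted in the report */
#define MAX_SEND_SIZE 32768  /* sender's chunk size quoted in the report */

/* Sender: write one block as send() calls of at most MAX_SEND_SIZE bytes.
 * Returns 0 on success, -1 on error. */
static int send_block(int fd, const unsigned char *buf, size_t len)
{
    size_t off = 0;

    while (off < len) {
        size_t chunk = len - off;
        if (chunk > MAX_SEND_SIZE)
            chunk = MAX_SEND_SIZE;
        ssize_t n = send(fd, buf + off, chunk, 0);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    return 0;
}

/* Receiver: request the whole block in one recv() call. MSG_WAITALL asks
 * the kernel to block until len bytes arrive (or EOF, error, or signal). */
static ssize_t recv_block(int fd, unsigned char *buf, size_t len)
{
    return recv(fd, buf, len, MSG_WAITALL);
}

/* Send one block from a child process to the parent.
 * Returns 0 if the block arrives complete and unmodified, -1 otherwise. */
static int demo_transfer(void)
{
    static unsigned char out[BLOCK_SIZE], in[BLOCK_SIZE];
    int sv[2], status;
    pid_t pid;
    ssize_t n;

    for (size_t i = 0; i < BLOCK_SIZE; i++)
        out[i] = (unsigned char)(i * 31u);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {             /* child: sender */
        close(sv[0]);
        _exit(send_block(sv[1], out, BLOCK_SIZE) == 0 ? 0 : 1);
    }

    close(sv[1]);               /* parent: receiver */
    n = recv_block(sv[0], in, BLOCK_SIZE);
    close(sv[0]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (n == BLOCK_SIZE && memcmp(in, out, BLOCK_SIZE) == 0) ? 0 : -1;
}
```

The sender issues four 32768-byte send() calls per block, while the receiver's single MSG_WAITALL recv() does not return until all 131072 bytes have been queued.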
Can you capture a tcpdump (probably easiest to do from the other host)?
--
John Baldwin