svn commit: r221346 - head/sys/netinet

John Baldwin jhb at FreeBSD.org
Sat May 14 14:37:53 UTC 2011


On 5/14/11 6:37 AM, Mikolaj Golub wrote:
> Hi,
>
> On Mon, 2 May 2011 21:05:52 +0000 (UTC) John Baldwin wrote:
>
>   JB>  Author: jhb
>   JB>  Date: Mon May  2 21:05:52 2011
>   JB>  New Revision: 221346
>   JB>  URL: http://svn.freebsd.org/changeset/base/221346
>
>   JB>  Log:
>   JB>    Handle a rare edge case with nearly full TCP receive buffers.  If a TCP
>   JB>    buffer fills up causing the remote sender to enter into persist mode, but
>   JB>    there is still room available in the receive buffer when a window probe
>   JB>    arrives (either due to window scaling, or due to the local application
>   JB>    very slowly draining data from the receive buffer), then the single byte
>   JB>    of data in the window probe is accepted.  However, this can cause rcv_nxt
>   JB>    to be greater than rcv_adv.  This condition will only last until the next
>   JB>    ACK packet is pushed out via tcp_output(), and since the previous ACK
>   JB>    advertised a zero window, the ACK should be pushed out while the TCP
>   JB>    pcb is write-locked.
>   JB>
>   JB>    During the window while rcv_nxt is greater than rcv_adv, a few places
>   JB>    would compute the remaining receive window via rcv_adv - rcv_nxt.
>   JB>    However, this value was then (uint32_t)-1.  On a 64 bit machine this
>   JB>    could expand to a positive 2^32 - 1 when cast to a long.  In particular,
>   JB>    when calculating the receive window in tcp_output(), the result would be
>   JB>    that the receive window was computed as 2^32 - 1 resulting in advertising
>   JB>    a far larger window to the remote peer than actually existed.
>   JB>
>   JB>    Fix various places that compute the remaining receive window to either
>   JB>    assert that it is not negative (i.e. rcv_nxt <= rcv_adv), or treat the
>   JB>    window as full if rcv_nxt is greater than rcv_adv.
>   JB>
>   JB>    Reviewed by:        bz
>   JB>    MFC after:        1 month
>
>   JB>  Modified:
>   JB>    head/sys/netinet/tcp_input.c
>   JB>    head/sys/netinet/tcp_output.c
>   JB>    head/sys/netinet/tcp_timewait.c
>
>   JB>  Modified: head/sys/netinet/tcp_input.c
>   JB>  ==============================================================================
>   JB>  --- head/sys/netinet/tcp_input.c        Mon May  2 21:04:37 2011        (r221345)
>   JB>  +++ head/sys/netinet/tcp_input.c        Mon May  2 21:05:52 2011        (r221346)
>   JB>  @@ -1831,6 +1831,9 @@ tcp_do_segment(struct mbuf *m, struct tc
>   JB>           win = sbspace(&so->so_rcv);
>   JB>           if (win < 0)
>   JB>                   win = 0;
>   JB>  +        KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt),
>   JB>  +            ("tcp_input negative window: tp %p rcv_nxt %u rcv_adv %u", tp,
>   JB>  +            tp->rcv_adv, tp->rcv_nxt));
>
> I am getting this when running tests with HAST (both primary and secondary HAST
> instances on the same host).
>
> HAST synchronizes data in MAXPHYS (131072 bytes) blocks. The sender splits
> them into smaller chunks of MAX_SEND_SIZE (32768 bytes), while the receiver
> receives the whole block by calling recv() with the MSG_WAITALL option.
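
The I/O pattern described above can be sketched in userland roughly as
follows. This is only an illustration, not HAST's actual code: the helper
names send_block(), recv_block() and transfer_one_block() are hypothetical,
and it uses an AF_UNIX socket pair rather than TCP, so it shows the
chunked-send / MSG_WAITALL-receive shape but would not exercise the TCP
window code.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE	131072	/* MAXPHYS value quoted above */
#define MAX_SEND_SIZE	32768	/* chunk size quoted above */

/* Send one block in chunks of at most MAX_SEND_SIZE bytes (hypothetical helper). */
static int
send_block(int fd, const unsigned char *buf, size_t len)
{
	while (len > 0) {
		size_t chunk = len < MAX_SEND_SIZE ? len : MAX_SEND_SIZE;
		ssize_t n = send(fd, buf, chunk, 0);

		if (n < 0)
			return (-1);
		buf += n;
		len -= (size_t)n;
	}
	return (0);
}

/* Receive a whole block with a single recv(MSG_WAITALL) call (hypothetical helper). */
static ssize_t
recv_block(int fd, unsigned char *buf, size_t len)
{
	return (recv(fd, buf, len, MSG_WAITALL));
}

/*
 * Fork a sender (child) and run the receiver in the parent.
 * Returns the number of bytes received, or -1 on setup failure.
 */
static ssize_t
transfer_one_block(void)
{
	int sv[2];
	pid_t pid;
	unsigned char *buf;
	ssize_t got;

	/* With the sizes quoted above, one block is exactly four chunks. */
	assert(BLOCK_SIZE % MAX_SEND_SIZE == 0);

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return (-1);
	buf = malloc(BLOCK_SIZE);
	pid = (buf != NULL) ? fork() : -1;
	if (pid < 0) {
		free(buf);
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Child: the sender. */
		close(sv[0]);
		memset(buf, 0xAB, BLOCK_SIZE);
		_exit(send_block(sv[1], buf, BLOCK_SIZE) == 0 ? 0 : 1);
	}
	/* Parent: the receiver. */
	close(sv[1]);
	got = recv_block(sv[0], buf, BLOCK_SIZE);
	waitpid(pid, NULL, 0);
	close(sv[0]);
	free(buf);
	return (got);
}
```

Calling transfer_one_block() should return BLOCK_SIZE when the receiver gets
the whole block in one recv() call.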

Can you capture a tcpdump (probably easiest to do from the other host)?
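
For reference, the arithmetic described in the commit log above can be
reproduced in userland with a minimal sketch. The function names here are
hypothetical, int64_t stands in for a 64-bit long, and SEQ_GEQ is written in
the style of the kernel macro rather than copied from it. The guarded variant
shows the "treat the window as full" approach; it is not the actual code from
the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Sequence-space comparison in the style of the kernel's SEQ_GEQ(). */
#define SEQ_GEQ(a, b)	((int32_t)((a) - (b)) >= 0)

/*
 * Unguarded form: the 32-bit difference wraps around before it is
 * widened, so a difference of "-1" becomes 2^32 - 1.
 */
static int64_t
window_unguarded(uint32_t rcv_adv, uint32_t rcv_nxt)
{
	return ((int64_t)(uint32_t)(rcv_adv - rcv_nxt));
}

/*
 * Guarded form: if rcv_nxt has moved past rcv_adv, report that no
 * advertised window remains instead of a huge positive value.
 */
static int64_t
window_guarded(uint32_t rcv_adv, uint32_t rcv_nxt)
{
	if (SEQ_GEQ(rcv_adv, rcv_nxt))
		return ((int64_t)(uint32_t)(rcv_adv - rcv_nxt));
	return (0);
}

/* Self-check for the case where one window-probe byte was accepted. */
static void
window_selfcheck(void)
{
	uint32_t rcv_adv = 1000, rcv_nxt = 1001;

	assert(window_unguarded(rcv_adv, rcv_nxt) == 4294967295LL);
	assert(window_guarded(rcv_adv, rcv_nxt) == 0);
}
```

With rcv_nxt one byte ahead of rcv_adv, the unguarded helper yields 2^32 - 1
while the guarded one yields 0.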

-- 
John Baldwin


More information about the svn-src-head mailing list