kern/108670: TCP connection ETIMEDOUT

dave baukus david.baukus at us.fujitsu.com
Thu Feb 1 23:40:20 UTC 2007


>Number:         108670
>Category:       kern
>Synopsis:       TCP connection ETIMEDOUT
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Feb 01 23:40:18 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     dave baukus
>Release:        FreeBSD6.1
>Organization:
FNC
>Environment:
FreeBSD krakatoa 6.1-RELEASE FreeBSD 6.1-RELEASE #23: Wed Aug  9 13:33:37 CDT 2006     dbaukus at krakatoa:/home/dbaukus/kern/i386/compile/KRAKATOA-FNC  i386

>Description:
There is a bug in tcp_output(), present in at least FreeBSD 6.1,
that causes a perfectly healthy TCP connection to be dropped by its
retransmit timer; the application receives ETIMEDOUT.

Consider a TCP connection that never transmits data (the receive end
of the ttcp utility is an example). While the connection is established,
snd_max == snd_una == snd_nxt == (iss + 1), so the retransmit
timer should never be started. If it is started anyway, it is never
stopped by tcp_input()/tcp_output(): snd_max == snd_una == snd_nxt
always holds, so no incoming ACK ever acknowledges new data. Once
started, the timer keeps counting up until tp->t_rxtshift reaches 12,
and the connection, which never transmitted anything, is falsely killed.

The bug is that tcp_output() relies blindly on the return value of
ip_output(). If ip_output() returns ENOBUFS, the retransmit timer is
activated, even when the segment that failed was a pure ACK:

From the end of tcp_output():
out:
	SOCKBUF_UNLOCK_ASSERT(&so->so_snd);	/* Check gotos. */
	if (error == ENOBUFS) {
		if (!callout_active(tp->tt_rexmt) &&
		    !callout_active(tp->tt_persist))
			callout_reset(tp->tt_rexmt, tp->t_rxtcur,
			    tcp_timer_rexmt, tp);
		tp->snd_cwnd = tp->t_maxseg;
		return (0);
	}

My simple-minded fix would be not to start the retransmit timer here:
if tcp_output() had wanted to time this transmission, it would already
have started the timer earlier in the function.

This ETIMEDOUT problem is easily recreated on any old machine
using a single slow Ethernet device and the ttcp test utility.
First, start a couple of ttcp receivers. Second, flood the same
interface with enough ttcp transmitters to back up the driver's
transmit ring and the interface queue. Eventually, one of the ttcp
receivers will get ENOBUFS from ip_output(), and the retransmit
timer will be wrongly activated for a pure ACK segment.

I was able to reproduce it with the following on FreeBSD 6.1:

box1 (192.168.222.222):
ttcp -s -l 16384 -p 9444 -v -b 128000 -r
ttcp -s -l 16384 -p 9445 -v -b 128000 -r
ttcp -s -n 6553600 -l 4096 -p 9446 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9447 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9448 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9449 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9450 -v -b 128000 -t 192.168.222.13

box2 (192.168.222.13):
ttcp -s -n 6553600 -l 8192 -p 9444 -v -b 128000 -t  192.168.222.222
ttcp -s -n 9999999 -l 128  -p 9445 -v -b 128000  -t  192.168.222.222
ttcp -s -l 16384 -p 9446 -v -b 128000 -r
ttcp -s -l 16384 -p 9447 -v -b 128000 -r
ttcp -s -l 16384 -p 9448 -v -b 128000 -r
ttcp -s -l 16384 -p 9449 -v -b 128000 -r
ttcp -s -l 16384 -p 9450 -v -b 128000 -r 
>How-To-Repeat:

Use the same ttcp commands listed at the end of the Description above.
>Fix:
Do not start the retransmit timer based on error codes from ip_output()?
>Release-Note:
>Audit-Trail:
>Unformatted: