In TCP recovery state, problem of computing the pipe(amount of data in flight).

Yongmin Cho yongmincho82 at
Mon Feb 29 08:40:43 UTC 2016

Thanks very much for the quick reply.

I'm sorry, I didn't make patch file.
First, I wanted to discuss my opinion is right.

Let me give you example, please check this.

<- ACK (cumack = 1,  sack[3-4], sack[6-7], sack[9-10])
* here segment/byte 2, 5, and 8 are missing.
<- ACK (cumack = 1, sack[12-13], sack[15-16], sack[18-19])
* here segment/bytes 11, 14 and 17 are also reported missing.
<- ACK (cumack = 1, sack[18-19], sack[21-24], sack[27, 28])
* here segment/bytes 20 and 25, 26 are missing. (triggered fast
<- ACK.....(cumack = 1, sack[21-24], sack[27, 28], sack[32, 33])
<- ACK.....(cumack = 1, sack[34-35], sack[37, 40], sack[42, 44])
<- ACK.....(cumack = 1, sack[34-35], sack[37, 40], sack[42, 45])
<- ACKs.....(many duplication acks, and new sacked blocks)

In the fast recovery phase, the pipe is caculated like below, If the
net.inet.tcp.rfc6675_pipe is turned on.
pipe = snd_max - snd_una + sack_bytes_rexmit(1 MSS size) -
sacked_bytes(10 = 34-35, 37-40, 42-45 tcp_sack.c:390)
One segment is sended(sack_bytes_rexmit), when triggered fast
Because the snd_cwnd was set 1 mss size. (tcp_input.c:2609)
In the fast recovery phase, The sender can send data,
If this condition is right(awnd < tp->snd_ssthresh tcp_input.c:2568).
When in the network, It still has many in flight packets,
snd_max and snd_una will not be changed, and sack_bytes_rexmit is one MSS 
size, and sacked_bytes is caculated by last ACK that has three SACK
blocks(34-35, 37-40, 42-45).
So, sometimes(In my test environment) the awnd(pipe) value can't go
down less than the snd_ssthresh, while receiving each ACKs
in fast recovery phase.
You know, If the awnd value can't go down less than the snd_ssthresh,
The sender can't send data that is included snd_holes.

So, I think, the sacked_bytes should be caculated by all of sacked
blocks that is greater than snd_una, like below.
pipe = snd_max - snd_una + sack_bytes_rexmit - sacked_bytes(3-4, 6-7,
9-10, 12-13, 15-16, 18-19, 21-24, 27-28, ...).

But on current implementation, the sacked_bytes is caculated by three(or four)
sacked blocks that is in last ACK, like below.
pipe = snd_max - snd_una + sack_bytes_rexmit - sacked_bytes(34-35,
37-40, 42-45 -> sacked blocks of last ACK).

My opinion may not be right. Just I want to check implementation of
computing pipe.

Thank you.

On Fri, Feb 26, 2016 at 07:16:04PM -0800, hiren panchasara wrote:
> On 02/25/16 at 08:26P, Yongmin Cho wrote:
> > Hi, all.
> > 
> > I have a question about net.inet.tcp.rfc6675_pipe in sysctl.
> > The bytes in flight was changed to be like below in r290122.
> > pipe = snd_max - snd_una - sackhint.sacked_bytes +
> > sackhint.sack_bytes_rexmit.
> > I think, The implementation of sackhint.sack_bytes_rexmit is right.
> > But, I don't think, sackhint.sacked_bytes is right way.
> > The sackhint.sacked_bytes is computed by array of sack_blocks in
> > tcp_sack_doack function.
> > You know, tcp header can have four sacked blocks.
> > (If tcp uses timestmap option, tcp header can have three sacked
> > blocks.)
> > Even if The receiver has sacked blocks greater than three or four,
> > The receiver can send ack with three or four last sack blocks.
> > So if the receiver has many sacked blocks, the sender only knows three
> > sacked_bytes.
> > the snd_holes tail queue in struct tcpcb has all of sack holes which
> > is greater than snd_una.
> > So, i think, sack_bytes_rexmit is correct.
> > Because sack_bytes_rexmit is computed by snd_holes tail queue in
> > struct tcpcb.
> > but sackhint.sacked_bytes is too small.
> > Because sackhint.sacked_bytes is just computed by ack with three or
> > four last sacked blocks.
> > So, the return value of tcp_compute_pipe() function is too big, while
> > recovery phase.
> > In recovery state, the sender can send data,
> > if the return value of tcp_compute_pipe() should be less than
> > snd_ssthresh.
> > Sometimes it takes a long time to send data, if the sender knows many
> > sack holes.
> > Furthermore, Sometimes the sender can't send data, Because the return
> > value of tcp_compute_pipe() function.
> > And retransmission timeout is triggered.
> Your analysis is correct and we did think about this. Please look at
> 's summary section. Main reason for
> going with this approach was that it was at least on the conservative
> side i.e. would send less data (and not more) and would not bloat the
> network.
> BTW, have you run into this problem of this causing slower recovery?
> > 
> > IMO, sackhint.sack_bytes should be computed using snd_holes tail
> > queue.
> > Because snd_holes has all of sack holes which is greater than snd_una,
> > sackhint.sack_bytes can be computed using snd_holes.
> I thought snd_holes also gets populated by the info in SACKs and if for
> some reason other end has more than 3 or 4 holes and can't send it,
> snd_holes would also have incorrect info. I'd have to look at the code
> again to see if its possible to do this more correctly with snd_holes.
> Though, I do see the point of this approach would provide better
> protection against transient problems where other end cannot send SACK
> holes info for a couple times and resumes again. Again, I'd have to go
> look at the code closely.
> It'd be even better if you have a patch for this. If not, no worries.
> :-)
> Cheers,
> Hiren

More information about the freebsd-transport mailing list