In TCP recovery state, problem of computing the pipe (amount of data in flight).

Sat Feb 27 03:16:10 UTC 2016

On 02/25/16 at 08:26P, Yongmin Cho wrote:
> Hi, all.
> 
> I have a question about net.inet.tcp.rfc6675_pipe in sysctl.
> The bytes-in-flight computation was changed in r290122 to:
> pipe = snd_max - snd_una - sackhint.sacked_bytes +
> sackhint.sack_bytes_rexmit.
> I think the implementation of sackhint.sack_bytes_rexmit is right,
> but I don't think sackhint.sacked_bytes is computed the right way.
> sackhint.sacked_bytes is computed from the array of sack_blocks in the
> tcp_sack_doack function.
> As you know, a TCP header can carry at most four SACK blocks
> (three if the timestamp option is in use).
> Even if the receiver holds more SACKed blocks than that, each ACK can
> carry only the three or four most recent ones.
> So if the receiver has many SACKed blocks, the sender only counts
> three or four of them in sacked_bytes.
> The snd_holes tail queue in struct tcpcb holds all of the SACK holes
> that are above snd_una.
> So I think sack_bytes_rexmit is correct, because it is computed from
> the snd_holes tail queue in struct tcpcb.
> But sackhint.sacked_bytes is too small, because it is computed only
> from the three or four SACK blocks carried in the latest ACK.
> So the return value of tcp_compute_pipe() is too big during the
> recovery phase.
> In recovery state, the sender can send data only if the return value
> of tcp_compute_pipe() is less than snd_ssthresh.
> Sometimes it takes a long time to send data if the sender knows about
> many SACK holes.
> Furthermore, sometimes the sender can't send data at all because of
> the return value of tcp_compute_pipe(), and a retransmission timeout
> is triggered.

Your analysis is correct and we did think about this. Please look at
the summary section of https://reviews.freebsd.org/D3971 . The main
reason for going with this approach was that it at least errs on the
conservative side, i.e. it would send less data (not more) and would
not bloat the network.
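
To make the arithmetic concrete, here is a minimal sketch of the pipe
formula quoted above. It is not the kernel code: struct pipe_state,
compute_pipe() and may_send() are hypothetical names, and the
comparison against ssthresh simply follows the description in the
quoted message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified sender state; NOT the kernel's struct tcpcb. */
struct pipe_state {
	uint32_t snd_una;           /* oldest unacknowledged sequence number */
	uint32_t snd_max;           /* highest sequence number sent */
	uint32_t sacked_bytes;      /* bytes the sender believes are SACKed */
	uint32_t sack_bytes_rexmit; /* bytes retransmitted from SACK holes */
};

/* pipe = snd_max - snd_una - sacked_bytes + sack_bytes_rexmit */
static uint32_t
compute_pipe(const struct pipe_state *s)
{
	/* Unsigned subtraction stays correct across sequence wraparound. */
	uint32_t outstanding = s->snd_max - s->snd_una;

	/* SACKed data can never exceed what is outstanding. */
	assert(s->sacked_bytes <= outstanding);
	return (outstanding - s->sacked_bytes + s->sack_bytes_rexmit);
}

/* Assumed send gate, per the quoted message: pipe must be below ssthresh. */
static int
may_send(const struct pipe_state *s, uint32_t ssthresh)
{
	return (compute_pipe(s) < ssthresh);
}
```

With 20000 bytes outstanding, nothing retransmitted, and six SACKed
1000-byte blocks at the receiver, counting only three of them gives
pipe = 17000, while counting all six gives 14000. Against an ssthresh
of 15000 the undercounted case blocks sending and the full count does
not, which is the conservative behaviour described above.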

BTW, have you actually run into slower recovery because of this?
> 
> IMO, sackhint.sacked_bytes should be computed using the snd_holes
> tail queue.
> Because snd_holes has all of the SACK holes that are above snd_una,
> sackhint.sacked_bytes can be computed from snd_holes.

I thought snd_holes also gets populated from the info in SACKs, so if
for some reason the other end has more than 3 or 4 holes and can't
report them all, snd_holes would also have incomplete info. I'd have to
look at the code again to see if it's possible to do this more
correctly with snd_holes. Though I do see that this approach would
provide better protection against transient problems where the other
end cannot send SACK hole info a couple of times and then resumes.
Again, I'd have to go look at the code closely.
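
For discussion, a rough sketch of what deriving the SACKed byte count
from the hole list could look like. This is only an illustration under
assumptions: struct hole and sacked_bytes_from_holes() are hypothetical,
a plain array stands in for the snd_holes tail queue, the holes are
assumed non-overlapping and inside [snd_una, snd_fack), and snd_fack is
taken to be the highest SACKed sequence number. Whether the hole list
is itself complete is exactly the open question above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hole record: bytes [start, end) not yet SACKed. */
struct hole {
	uint32_t start;
	uint32_t end;
};

/*
 * Everything between snd_una and snd_fack that is not inside a hole
 * has been SACKed, so subtract the total hole length from that range.
 */
static uint32_t
sacked_bytes_from_holes(const struct hole *holes, size_t nholes,
    uint32_t snd_una, uint32_t snd_fack)
{
	uint32_t hole_bytes = 0;

	for (size_t i = 0; i < nholes; i++)
		hole_bytes += holes[i].end - holes[i].start;
	return ((snd_fack - snd_una) - hole_bytes);
}
```

For example, with snd_una = 1000, snd_fack = 7000 and holes at
[1000, 2000), [3000, 4000) and [5000, 6000), the range covers 6000
bytes, the holes cover 3000, and the result is 3000 SACKed bytes no
matter how many SACK blocks the last ACK carried.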

It'd be even better if you have a patch for this. If not, no worries.
:-)

Cheers,
Hiren