SACK + RTO interaction

Wed Jun 17 00:05:05 UTC 2020

Hello,

Admittedly, a rather late reply, but as you see my ToDo-list is actually effective
and things don't fall off the cliff ;-)

Thanks for bringing this up, Richard!

I tried to reproduce the scenario you described and as far as I can see, we
(darwin) keep on re-arming the timer.

For two reasons:

1. If tcp_output won't transmit new data, it makes sure that the REXMT timer
   is armed:
   https://github.com/apple/darwin-xnu/blob/master/bsd/netinet/tcp_output.c#L1395

2. In tcp_input, we also make sure that the timer is armed:
   https://github.com/apple/darwin-xnu/blob/master/bsd/netinet/tcp_input.c#L4632

The packetdrill sequence I used to attempt a reproduction is:

+0		> . 1:1001(1000) ack 1
+0		> . 1001:2001(1000) ack 1
+0		> . 2001:3001(1000) ack 1
+0		> . 3001:4001(1000) ack 1
+0		> . 4001:5001(1000) ack 1
+0		> . 5001:6001(1000) ack 1
+0		> . 6001:7001(1000) ack 1
+0		> . 7001:8001(1000) ack 1
+0		> . 8001:9001(1000) ack 1
+0		> P. 9001:10001(1000) ack 1

+0.01		< . 1:1(0) ack 1 win 4242 <sack 3001:10001, nop, nop>
+0		> . 1:1001(1000) ack 1
+0		> . 1001:2001(1000) ack 1
+0		> . 2001:3001(1000) ack 1

+0		< . 1:1(0) ack 1001 win 4242 <sack 7001:10001, nop, nop>
+0.23		> . 1001:2001(1000) ack 1

+0		< . 1:1(0) ack 2001 win 4242 <sack 7001:10001, nop, nop>
+0		> . 2001:3001(1000) ack 1

The first ack 1001 will enter tcp_sack_partialack and set REXMT to 0, but
the later call to tcp_output will reset it to a valid value.
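
To make that ordering concrete, here is a toy model of it. This is only a
sketch and not the xnu code: every name with a toy_ prefix is made up for
illustration, the struct is a drastically reduced stand-in for the real tcpcb,
and the timeout value used with it is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal connection state (NOT the kernel's struct tcpcb). */
struct toy_tcpcb {
    uint32_t snd_una;     /* oldest unacknowledged sequence number */
    uint32_t snd_max;     /* one past the highest sequence number sent */
    uint32_t rexmt_ticks; /* REXMT timer value; 0 means "not armed" */
    uint32_t rxtcur;      /* current retransmit timeout to arm with */
};

/* True if the toy REXMT timer is currently armed. */
bool toy_rexmt_armed(const struct toy_tcpcb *tp)
{
    assert(tp != NULL);
    return tp->rexmt_ticks != 0;
}

/* Models the partial-ACK path: advance snd_una and stop the REXMT timer. */
void toy_sack_partialack(struct toy_tcpcb *tp, uint32_t ack)
{
    assert(tp != NULL);
    tp->snd_una = ack;
    tp->rexmt_ticks = 0;
}

/*
 * Models the guard on the output path: if data is still outstanding and
 * the REXMT timer is not armed, arm it with the current timeout.
 */
void toy_output(struct toy_tcpcb *tp)
{
    assert(tp != NULL);
    if (tp->snd_max != tp->snd_una && !toy_rexmt_armed(tp))
        tp->rexmt_ticks = tp->rxtcur;
}
```

Calling toy_sack_partialack() for ack 1001 leaves the timer stopped, and the
following toy_output() arms it again because 1001..10001 is still outstanding.
The window in which REXMT is 0 only matters if nothing on the output or input
path runs afterwards, which is the case I could not reproduce.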

Or did you have a different sequence of events in mind?

Thanks,
Christoph

On 01/13/20 - 23:39, Scheffenegger, Richard wrote:
> Hi guys,
> 
> I believe Cheng has uncovered another long-lurking bug, this time in the interaction between RTO and SACK.
> 
> Since its inception, tcp_sack_partialack has apparently stopped the RTO timer, which doesn't seem right.
> 
> What he observed is that if you run into a lost retransmission twice during the same SACK loss recovery episode (which is currently only recoverable by RTO), the initial loss is recovered by the RTO (unless a partial ACK disabled the timer before it fired), and the second twice-lost segment is at the mercy of any other TCP timer that hopefully is still active (keepalive, persist, ...).
> 
> https://reviews.freebsd.org/source/src/browse/head/sys/netinet/tcp_sack.c#782
> 
> I strongly suspect that this should never have cancelled the RTO, but should instead reset it anew after a partial ACK. At least that would be more logical: to pull forward the timeout if you are making some forward progress, not to stop the timeout completely if one (of possibly many) retransmissions went through. If SACK loss recovery doesn't complete within an RTO timeout (which is many more RTTs than the single RTT a SACK loss recovery should take), it would be prudent to give up and fall back to RTO, no?
> 
> This effect may also explain some of the other sporadic, very lengthy SACK recoveries we couldn't really pin down so far...
> 
> The patch should be easy enough:
> tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
> instead of
> tcp_timer_activate(tp, TT_REXMT, 0);
> 
> Any comments?
> 
> BTW, found the same in Darwin.
> 
> 
> 
> Richard Scheffenegger
> Consulting Solution Architect
> NAS & Networking
> 
> NetApp
> +43 1 3676 811 3157 Direct Phone
> +43 664 8866 1857 Mobile Phone
> Richard.Scheffenegger at netapp.com<mailto:Richard.Scheffenegger at netapp.com>
> 
> 