Re: low TCP speed, wrong rtt measurement

From: Scheffenegger, Richard <rscheff_at_freebsd.org>
Date: Sun, 09 Apr 2023 09:40:29 UTC
[resending to the lists]

Hi,

Adding fbsd-transport too.

For stable-12, I believe all relevant (algorithm) improvements went in.

However, 12.2 is missing D26807 and D26808 - Cubic improvements around
retransmission timeouts (but these are not material).

12.1, on the other hand, has none of the improvements done in 2020 to
the Cubic module - D18954, D18982, D19118, D23353, D23655, D25065,
D25133, D25744, D24657, D25746, D25976, D26060, D26807, D26808.

These should fix numerous issues in Cubic which, left unfixed, would
very likely make it perform poorly, particularly on longer-duration
sessions.

However, Cubic is heavily reliant on a valid measurement of RTT and the
epoch since the last congestion response (measured in units of RTT). An
issue in getting RTT measured properly would derail Cubic for sure. Most
likely Cubic would inflate cwnd much faster, then run into significant
packet loss, very likely loss of retransmissions, followed by
retransmission timeouts and shrinking of ssthresh to small values.
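
To illustrate why a bad RTT hurts, here is a rough sketch of the window
functions as written in RFC 8312, in floating point and in units of
segments. This is not the cc_cubic code (which uses fixed-point
arithmetic and differs in detail); the constants are the RFC defaults
and the numbers in the demo are made up. The point is only that the
TCP-friendly estimate grows per elapsed RTT, so an RTT that is measured
far too small inflates the window.

```python
# Sketch of the RFC 8312 CUBIC window functions (floating point, segments).
# Assumption: C = 0.4 and BETA = 0.7 are the RFC defaults; the kernel
# module is implemented differently.

C = 0.4       # CUBIC scaling constant
BETA = 0.7    # multiplicative decrease factor


def cubic_k(w_max):
    """Time in seconds for the cubic curve to climb back to w_max."""
    return (w_max * (1.0 - BETA) / C) ** (1.0 / 3.0)


def w_cubic(t, w_max):
    """Cubic window t seconds after the last congestion event."""
    return C * (t - cubic_k(w_max)) ** 3 + w_max


def w_est(t, w_max, rtt):
    """TCP-friendly estimate; grows by a fixed amount per elapsed RTT."""
    return w_max * BETA + 3.0 * (1.0 - BETA) / (1.0 + BETA) * (t / rtt)


def cubic_cwnd(t, w_max, rtt):
    """CUBIC uses whichever of the two windows is larger."""
    return max(w_cubic(t, w_max), w_est(t, w_max, rtt))


if __name__ == "__main__":
    w_max, t = 1000.0, 5.0        # hypothetical values
    for rtt in (0.300, 0.001):    # plausible path RTT vs. a bogus 1 ms RTT
        cwnd = cubic_cwnd(t, w_max, rtt)
        print(f"rtt={rtt * 1000:.0f} ms  cwnd={cwnd:.0f} segments")
```

With the bogus RTT the TCP-friendly term dominates and the window ends
up well above w_max only a few seconds after the congestion event.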


I haven't looked into cc_vegas or the ertt module though.

One more initial question: Are you using timestamps on that long, thin
pipe - or is net.inet.tcp.rfc1323 disabled? (More recent versions allow
window scaling and timestamps to be enabled or disabled independently
of each other (D36863), but I don't think this is in any 12 release.)

Finally, you could use SIFTR to track the evolution of the minrtt
value over the course of the session.

Ultimately, though, I suspect a tcpdump including the TCP header
(-s 80) together with the SIFTR internal state evolution would be
optimal for understanding when and why the RTT values go off the rails.
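
As a rough idea of the kind of post-processing I mean, the sketch below
reports each point where the running minimum RTT drops to a new low
that is below some plausibility floor. It assumes the samples have
already been extracted as (timestamp, rtt_ms) pairs; parsing the actual
SIFTR log is left out on purpose, since the field layout depends on the
version. The function name, the floor_ms parameter and the sample
values are all made up for illustration.

```python
# Sketch: find the moments where the running minimum RTT goes implausibly low.
# Assumption: samples are (timestamp, rtt_ms) pairs already pulled out of
# whatever log is available; floor_ms is a hypothetical plausibility floor.

def minrtt_drops(samples, floor_ms):
    """Yield (timestamp, rtt_ms) whenever the running minimum falls to a
    new low that is also below floor_ms."""
    running_min = float("inf")
    for ts, rtt_ms in samples:
        if rtt_ms < running_min:
            running_min = rtt_ms
            if rtt_ms < floor_ms:
                yield ts, rtt_ms


if __name__ == "__main__":
    # Made-up samples for a path with roughly 300 ms RTT.
    samples = [(0.0, 310), (1.0, 305), (2.0, 12), (3.0, 300), (4.0, 4)]
    for ts, rtt_ms in minrtt_drops(samples, floor_ms=250):
        print(f"t={ts}: minrtt dropped to {rtt_ms} ms")
```

The timestamps it reports are the places to line up against the
tcpdump.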


At first glance, the ertt module may be prone to miscalculations when
retransmissions are in play - no special precautions appear to be
present to distinguish between the originally sent packet and any
retransmission, nor any filtering of ACKs which come in as duplicates.
Thus there could be a scenario where an ACK for a spurious
retransmission, e.g. due to reordering, leads to a wrong baseline RTT
measurement, one that is physically impossible on such a long-distance
connection...
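
To make the suspected failure mode concrete, here is a toy model, not
the ertt code. It measures RTT against the most recent transmission of
a segment and optionally applies Karn's rule (take no sample from a
segment that was retransmitted). The class and method names are
hypothetical, as are the timings.

```python
# Toy model of RTT sampling with and without Karn's rule.
# Assumption: this is NOT how the ertt module is written; it only shows
# how an ACK matched against a retransmission yields a bogus sample.

class RttSampler:
    def __init__(self):
        # seq -> (time of most recent transmission, was it retransmitted?)
        self.outstanding = {}

    def on_send(self, seq, now):
        retransmitted = seq in self.outstanding
        self.outstanding[seq] = (now, retransmitted)

    def on_ack(self, seq, now, karn=True):
        """Return an RTT sample in seconds, or None if none is taken."""
        entry = self.outstanding.pop(seq, None)
        if entry is None:
            return None              # duplicate ACK, nothing outstanding
        sent, retransmitted = entry
        if karn and retransmitted:
            return None              # ambiguous: which copy was ACKed?
        return now - sent


if __name__ == "__main__":
    for karn in (False, True):
        s = RttSampler()
        s.on_send(1, now=0.000)      # original transmission
        s.on_send(1, now=0.295)      # spurious retransmission
        sample = s.on_ack(1, now=0.300, karn=karn)  # ACK for the original
        print(f"karn={karn}: sample={sample}")
```

Without the rule, the ACK for the original packet is matched against
the retransmission and produces a sample of a few milliseconds on a
path with a 300 ms RTT.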

But again, I haven't looked into the ertt module so far at all.

How do the base stack RTT-related values look on these misbehaving
sessions?
tcpcb-> t_rttmin, t_srtt, t_rttvar, t_rxtcur, t_rtttime, t_rtseq,
tcpcb-> t_rttlow, t_rttupdated
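
For reference when reading those values: t_srtt, t_rttvar and t_rxtcur
roughly correspond to the smoothed RTT, the RTT variance and the
current retransmission timeout of the standard estimator. Below is a
floating-point sketch of that estimator as described in RFC 6298. The
kernel keeps these as scaled integers rather than seconds, and the
RFC's minimum RTO clamp is left out here, so treat this purely as an
illustration of how a single bogus sample shifts the values. The class
name and the sample values are made up.

```python
# Sketch of the RFC 6298 smoothed RTT / RTO estimator, in seconds.
# Assumptions: floating point instead of the kernel's scaled integers,
# and no minimum RTO clamp.

class RttEstimator:
    ALPHA = 1.0 / 8.0   # gain for the smoothed RTT
    BETA = 1.0 / 4.0    # gain for the RTT variance
    K = 4               # variance multiplier in the RTO

    def __init__(self, granularity=0.001):
        self.srtt = None
        self.rttvar = None
        self.granularity = granularity   # clock granularity G

    def update(self, r):
        """Feed one RTT sample r and return the resulting RTO."""
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2.0
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = ((1.0 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1.0 - self.ALPHA) * self.srtt + self.ALPHA * r
        return self.rto()

    def rto(self):
        return self.srtt + max(self.granularity, self.K * self.rttvar)


if __name__ == "__main__":
    est = RttEstimator()
    for sample in (0.300, 0.300, 0.005, 0.300):   # one bogus 5 ms sample
        rto = est.update(sample)
        print(f"sample={sample:.3f} srtt={est.srtt:.4f} "
              f"rttvar={est.rttvar:.4f} rto={rto:.4f}")
```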

Best regards,
   Richard