Re: FreeBSD CUBIC - Extremely low performance for short RTT setups experiencing congestion

From: Scheffenegger, Richard <rscheff_at_freebsd.org>
Date: Fri, 28 Apr 2023 08:57:04 UTC
Hi Bhaskar,

Personally, I'm not a big fan of having initial-RTT-dependent toggling 
of the behavior of a TCP session.

Two other approaches come to mind:

a) The separate TCP RACK stack already uses TCP_HPTS (high precision 
timer system), which yields a granularity of 1 usec. However, I believe 
this does not actually improve the granularity of the RTT measurements 
(and, by extension, does not yield more precision in the Cubic 
calculation). One approach may be, when TCP_HPTS is compiled in, to 
also increase the RTT measurement granularity to usec.
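
For illustration, a tick-granularity clock cannot even represent such 
a short RTT; a trivial standalone sketch (HZ = 1000 mirrors the 
default kernel tick rate):

#include <stdio.h>

#define HZ 1000 /* 1 ms kernel tick, the FreeBSD default */

int
main(void)
{
	long rtt_us = 50;			/* back-to-back RTT of 0.05 ms */
	long rtt_ticks = rtt_us * HZ / 1000000;	/* rounds down to 0 */

	/* Any RTT below one tick is indistinguishable from zero. */
	printf("RTT: %ld us -> %ld ticks\n", rtt_us, rtt_ticks);
	return (0);
}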

b) During congestion avoidance (CA), Cubic already has a linear cwnd 
growth function, which is modelled to be equal to NewReno CA cwnd growth 
when the cube function is very close to its inflection point and thus 
nearly flat. It should be relatively straightforward to also use the 
actual NewReno CA growth function (1 / cwnd) in that conditional branch.
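
For reference, the per-ACK NewReno CA increase (RFC 5681) that could 
be used in that branch looks roughly like this (a sketch, not the 
actual cc_newreno.c code; cwnd and smss in bytes):

#include <stdint.h>

/*
 * Standard NewReno congestion-avoidance increase, applied once per
 * ACK: cwnd grows by roughly smss*smss/cwnd bytes, i.e. about one
 * segment per RTT.
 */
static uint32_t
newreno_ca_growth(uint32_t cwnd, uint32_t smss)
{
	uint32_t incr = (uint32_t)(((uint64_t)smss * smss) / cwnd);

	/* Grow by at least one byte so short-RTT flows never stall. */
	return (cwnd + (incr > 0 ? incr : 1));
}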


But the optimal fix is to heed the 8312bis document.
https://datatracker.ietf.org/doc/draft-ietf-tcpm-rfc8312bis/

The cubic algorithm in FreeBSD predates RFC 8312, and 8312bis is in the 
final stages of getting published.

While the original RFC indeed specified a time-based calculation in the 
concave region, the -bis document uses the actually acked segments (it 
is written with a segment-oriented stack in mind; using acked bytes when 
cwnd is maintained in bytes is equally valid).
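
In other words, the Reno-friendly estimate in the -bis document grows 
per acked segment rather than per tick; a minimal sketch (W_est and 
cwnd in segments, beta_cubic = 0.7 as in the document):

#define BETA_CUBIC 0.7 /* beta_cubic per the -bis document */

/*
 * Per-ACK update of the Reno-friendly estimate in 8312bis: growth is
 * driven by acked data, not by elapsed time, so tick granularity
 * never enters the calculation.
 */
static double
w_est_update(double w_est, double cwnd, double segments_acked)
{
	double alpha = 3.0 * (1.0 - BETA_CUBIC) / (1.0 + BETA_CUBIC);

	return (w_est + alpha * segments_acked / cwnd);
}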

Updating the FreeBSD Cubic module using 8312bis would be the best path 
forward.

Best regards,
   Richard

----------

Hi team,

I am a networking datapath engineer at VMware, working primarily on the 
TCP/IP stack of VMware ESX. We have been experimenting with FreeBSD's 
CUBIC Congestion Control Algorithm (CCA) for a while. In most of our 
performance tests, CUBIC performs fine. However, we saw that CUBIC 
results in a performance degradation of around 100-150% in setups where 
the following conditions are met:
1. sender and receiver are back-to-back connected (short RTT)
2. the back-to-back connection/link experiences congestion
We saw that the congestion window (cwnd) does not increase fast enough 
with CUBIC after a congestion event. On the same setup, NewReno performs 
very well and grows the cwnd very quickly after a congestion event.

https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-cubic-06 - CUBIC 
Internet-Draft (I-D)
The I-D mentions that for short-RTT (or, more generally, low-BDP) 
networks, standard TCP CCAs perform better than CUBIC. Hence, in such 
cases, the TCP-friendly window size (equation 4) will always come out 
greater than the cubic window size (equation 1), so we will focus our 
discussion only on the TCP-friendly window size calculated during the 
congestion avoidance phase.
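
For reference, equation 4 of the I-D computes the TCP-friendly 
estimate from the time t elapsed since the last congestion event; a 
minimal sketch (floating point, window in segments):

#define BETA_CUBIC 0.7 /* beta_cubic from the I-D */

/*
 * Equation 4 of the I-D:
 *   W_est(t) = W_max*beta + [3*(1-beta)/(1+beta)] * (t/RTT)
 * t and rtt in seconds; w_max and the result in segments.
 */
static double
w_est(double w_max, double t, double rtt)
{
	double alpha = 3.0 * (1.0 - BETA_CUBIC) / (1.0 + BETA_CUBIC);

	return (w_max * BETA_CUBIC + alpha * (t / rtt));
}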

Theoretically, CUBIC should perform at least as well as standard TCP 
for short RTT setups. However, in reality, this is not what we observe. 
We find that the granularity of the system's ticks value prevents CUBIC 
from growing the cwnd faster. I will explain this issue with an 
example.

-------

Consider a short RTT setup where the RTT is 0.05ms. Let us assume that 
the system's tick frequency is 1000 Hz, i.e. the ticks value will 
increase by 1 once every 1ms (FreeBSD typically runs with a 1ms tick 
timer).

Now let us consider a period of 1s. During this 1s period, the sender 
will receive around 1s/0.05ms = 1000ms/0.05ms = 20000 ACKs. 
Consequently, we will be calling the cc_ack_received() callback for each 
of these ACKs. For NewReno, the cwnd will be increased for each of those 
ACKs, until the TCP flow becomes limited by the receiver's window. 
In CUBIC, even though the cubic_ack_received() callback is invoked for 
each of those 20000 ACKs, the cwnd will not be increased for each of 
them. This is because of the "ticks" value used to calculate the time 
elapsed since the last congestion event. In FreeBSD, the ticks value 
stays the same for an entire 1ms period. In 1ms, we will receive 
1ms/0.05ms = 20 ACKs. For all of these 20 ACKs, the ticks value will be 
the same, the time elapsed since the last congestion event will be the 
same, and, finally, the TCP-friendly window estimate will be the same. 
Hence, cwnd will not increase for the entire 1ms duration. If a system 
has some other timer period (e.g. 10ms), the cwnd will stay the same 
for that entire period.

Of these 20000 ACKs, NewReno will try to increase the cwnd for almost 
every ACK, whereas CUBIC will increase it only on about 1000 of them, 
with those increases spread over the whole 1s period. I hope the issue 
is now clear.
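
The effect is easy to reproduce in isolation; the standalone sketch 
below quantizes the elapsed time to 1ms ticks, as the kernel does, 
and counts how many distinct W_est values (equation 4) the 20000 ACKs 
actually see. It prints roughly 1000:

#include <stdio.h>

#define BETA_CUBIC 0.7
#define HZ 1000		/* 1 ms tick */
#define RTT_US 50	/* 0.05 ms RTT */

int
main(void)
{
	double alpha = 3.0 * (1.0 - BETA_CUBIC) / (1.0 + BETA_CUBIC);
	double w_max = 100.0;	/* segments; arbitrary for the illustration */
	double rtt = RTT_US / 1e6;
	double prev = -1.0;
	int acks = 1000000 / RTT_US;	/* ~20000 ACKs in 1 s */
	int distinct = 0;

	for (int i = 1; i <= acks; i++) {
		long t_us = (long)i * RTT_US;		/* true elapsed time */
		long t_ticks = t_us / (1000000 / HZ);	/* quantized to 1 ms */
		double t = (double)t_ticks / HZ;	/* seconds */
		double w = w_max * BETA_CUBIC + alpha * (t / rtt);

		if (w != prev) {
			distinct++;
			prev = w;
		}
	}
	/* Prints ~1000 distinct values: one cwnd update per tick, not per ACK. */
	printf("%d ACKs, %d distinct W_est values\n", acks, distinct);
	return (0);
}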

----
We wanted to discuss the solution to this issue. Currently, we are 
considering falling back to NewReno's way of doing congestion avoidance 
when dealing with short RTT connections. This means that if the mean RTT 
value maintained in CUBIC's private data is less than or equal to 1 
tick, we will use NewReno's congestion avoidance to compute the 
TCP-friendly window estimate. In other cases (non-short RTT), we will 
use equation 4 of the I-D to get the TCP-friendly estimate.
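
A sketch of what we have in mind follows (names are illustrative, not 
the actual fields and functions in cc_cubic.c):

#include <stdint.h>

struct cubic_state {
	uint32_t mean_rtt_ticks;	/* hypothetical: smoothed RTT in ticks */
	uint32_t w_max;			/* cwnd (bytes) at the last congestion event */
};

/*
 * Illustrative only: fall back to NewReno's per-ACK growth when the
 * mean RTT fits within a single tick; otherwise use the time-based
 * TCP-friendly estimate from equation 4 of the I-D.
 */
static uint32_t
tcp_friendly_cwnd(struct cubic_state *cs, uint32_t cwnd, uint32_t smss,
    uint32_t ticks_since_cong, uint32_t hz)
{
	if (cs->mean_rtt_ticks <= 1) {
		/* Short RTT: NewReno CA growth (RFC 5681), once per ACK. */
		uint32_t incr = (uint32_t)(((uint64_t)smss * smss) / cwnd);

		return (cwnd + (incr > 0 ? incr : 1));
	}

	/* Non-short RTT: equation 4, driven by elapsed ticks. */
	double beta = 0.7;
	double alpha = 3.0 * (1.0 - beta) / (1.0 + beta);
	double t = (double)ticks_since_cong / hz;
	double rtt = (double)cs->mean_rtt_ticks / hz;
	double w_est = cs->w_max * beta + alpha * (t / rtt) * smss;

	return (w_est > cwnd ? (uint32_t)w_est : cwnd);
}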

This will resolve the issue we are seeing. However, our only concern 
with this approach is that it makes CUBIC RTT-dependent for short RTT 
networks. As per the I-D, I see the following:
    Another notable feature of CUBIC is that its window increase rate is
    mostly independent of RTT, and follows a (cubic) function of the
    elapsed time from the beginning of congestion avoidance.
So, we are not sure whether this solution is logically right. Also, we 
are not sure of any other implications this change might have on CUBIC.

----
Adding Richard to the thread directly, as I have been following his work 
on CUBIC for some time.

Thanks,
Bhaskar Pardeshi (bpardeshi@vmware.com)
VMware, Inc.