FreeBSD CUBIC - Extremely low performance for short RTT setups experiencing congestion

From: biggy pardeshi <biggy.pardeshi_at_gmail.com>
Date: Thu, 27 Apr 2023 09:00:33 UTC
Hi team,

I am a networking datapath engineer at VMware, working primarily on the
TCP/IP stack of VMware ESX. We have been working with FreeBSD's CUBIC
Congestion Control Algorithm (CCA) for a while. In most of our
performance tests, CUBIC performs fine. However, we saw that CUBIC
results in a performance degradation of around 100-150% in setups where
the following conditions are met:

   1. sender and receiver are back-to-back connected (short RTT)
   2. back-to-back connection/link experiences congestion

We saw that the congestion window (cwnd) does not increase fast enough
with CUBIC after a congestion event. On the same setup, NewReno performs
very well and grows the cwnd again quickly after a congestion event.

https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-cubic-06 - CUBIC
Internet-Draft (I-D)
The I-D notes that in short-RTT (or, more generally, low-BDP) networks,
standard TCP CCAs perform better than CUBIC. In such cases, the
TCP-friendly window size (equation 4) will always come out greater than
the cubic window size (equation 1), so we will focus our discussion only
on the TCP-friendly window size calculated during the congestion
avoidance phase.
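
For reference, here is a minimal floating-point sketch of the two
estimates, written the way RFC 8312 expresses them (the draft may scale
the terms slightly differently; beta = 0.7 and C = 0.4 are the usual
defaults, and the units here are segments and seconds, not the
fixed-point arithmetic the FreeBSD module actually uses):

#include <math.h>

#define BETA_CUBIC 0.7  /* multiplicative decrease factor */
#define C_CUBIC    0.4  /* CUBIC scaling constant */

/* Equation 1: cubic window, t seconds after the last congestion event,
 * with W_max (the window just before the reduction) in segments. */
double
w_cubic(double t, double w_max)
{
	double k = cbrt(w_max * (1.0 - BETA_CUBIC) / C_CUBIC);

	return (C_CUBIC * pow(t - k, 3.0) + w_max);
}

/* Equation 4: TCP-friendly (AIMD-equivalent) window estimate. */
double
w_est(double t, double rtt, double w_max)
{
	double alpha = 3.0 * (1.0 - BETA_CUBIC) / (1.0 + BETA_CUBIC);

	return (w_max * BETA_CUBIC + alpha * (t / rtt));
}

The important point for this discussion is that w_est() depends only on
the elapsed time t since the last congestion event, measured via ticks.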

Theoretically, CUBIC should perform at least as well as standard TCP on
short-RTT setups. However, in reality this is not what we observe. We
find that the granularity of the system's ticks value prevents CUBIC
from growing the cwnd quickly. I will explain this issue with an example.

-------

Consider a short-RTT setup where the RTT is 0.05 ms. Let us assume that
the system's tick frequency is 1000 Hz, i.e. the ticks value increases
by 1 once every 1 ms (FreeBSD's default kern.hz is 1000, i.e. a 1 ms tick).

Now let us consider a period of 1 s. During this 1 s period, the sender will
receive around 1 s / 0.05 ms = 1000 ms / 0.05 ms = 20000 ACKs. Consequently,
we will be calling the cc_ack_received() callback for each of these ACKs. For
NewReno, the cwnd will be increased for each of those ACKs, until the TCP
flow becomes limited by the receiver's window. However, in CUBIC, even
though the cubic_ack_received() callback is invoked for each of those 20000
ACKs, the cwnd will not be increased for each of them. This is because of
the "ticks" value used to calculate the time elapsed since the last
congestion event. In FreeBSD, the ticks value stays the same for an entire
1 ms period. In 1 ms, we will receive 1 ms / 0.05 ms = 20 ACKs. For all of
these 20 ACKs, the ticks value will be the same, the time elapsed since the
last congestion event will be the same, and therefore the TCP-friendly
window estimate will be the same. Hence, cwnd will not increase for the
entire 1 ms duration. If a system has some other tick period (e.g. 10 ms),
the cwnd will stay the same for that entire period.
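
To make the tick-granularity effect concrete, here is a small standalone
sketch using the numbers from this example (hz = 1000 so one tick is 1 ms,
RTT = 0.05 ms; the tick counts and W_max are made-up values purely for
illustration). Every ACK that arrives within the same tick sees the same
elapsed time and therefore the same equation-4 estimate:

#include <stdio.h>

#define BETA_CUBIC 0.7
#define HZ 1000          /* assumed kern.hz: one tick = 1 ms  */
#define RTT_MS 0.05      /* back-to-back RTT from the example */

int
main(void)
{
	int t_last_cong = 5000;  /* tick count at the last congestion event (made up) */
	int ticks = 5042;        /* current tick count; constant for a whole 1 ms     */
	double w_max = 100.0;    /* window (segments) before the congestion event     */
	double alpha = 3.0 * (1.0 - BETA_CUBIC) / (1.0 + BETA_CUBIC);

	/* 1 ms / 0.05 ms = 20 ACKs arrive while "ticks" does not move. */
	for (int ack = 0; ack < 20; ack++) {
		double t_ms = (ticks - t_last_cong) * (1000.0 / HZ);
		double w_est = w_max * BETA_CUBIC + alpha * (t_ms / RTT_MS);

		/* Same ticks -> same elapsed time -> same estimate, so cwnd
		 * cannot grow until the next tick. */
		printf("ack %2d: elapsed %.2f ms, w_est %.1f segments\n",
		    ack, t_ms, w_est);
	}
	return (0);
}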

Of these 20000 ACKs, NewReno will try to increase the cwnd for almost
every ACK, whereas CUBIC will increase it only about 1000 times (at most
once per tick), spread over the whole 1 s period. I hope the issue is now clear.

----
We wanted to discuss a solution to this issue. Currently, we are thinking
of falling back to NewReno's way of doing congestion avoidance when we are
dealing with short-RTT connections. This means that if the mean RTT value
(in ticks) maintained in CUBIC's private data is less than or equal to 1,
we will use NewReno's congestion-avoidance logic to obtain the
TCP-friendly window estimate. In other cases (non-short RTT), we will use
equation 4 of the I-D to get the TCP-friendly estimate.
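
In code terms, the change we have in mind looks roughly like the sketch
below. This is a self-contained illustration of the proposed fallback,
not the actual cc(4) module code; the one-tick threshold, the parameter
names, and the NewReno-style per-ACK increment are our assumptions:

#define BETA_CUBIC 0.7
#define TICK_MS    1.0  /* assumed: kern.hz = 1000 */

/*
 * Next cwnd (in segments) during congestion avoidance.  mean_rtt_ticks
 * and ticks_since_cong mirror the values CUBIC keeps in its private
 * data; the names here are illustrative.
 */
double
cubic_next_cwnd(double cwnd, double w_max, unsigned int mean_rtt_ticks,
    unsigned int ticks_since_cong, double rtt_ms)
{
	double alpha = 3.0 * (1.0 - BETA_CUBIC) / (1.0 + BETA_CUBIC);
	double w_tf;

	if (mean_rtt_ticks <= 1) {
		/*
		 * Proposed short-RTT fallback: the RTT is at or below the
		 * tick granularity, so equation 4 barely moves within a
		 * tick.  Grow the window the NewReno way instead, i.e.
		 * roughly one segment per cwnd's worth of ACKed segments.
		 */
		return (cwnd + 1.0 / cwnd);
	}

	/* Otherwise, keep using equation 4 for the TCP-friendly estimate. */
	w_tf = w_max * BETA_CUBIC + alpha * (ticks_since_cong * TICK_MS / rtt_ms);
	return (w_tf > cwnd ? w_tf : cwnd);
}

With this, a connection whose mean RTT is below one tick grows its cwnd
on every ACK, which is what NewReno already achieves on the same setup.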

This would resolve the issue we are seeing. However, the only concern we
have with this approach is that it makes CUBIC RTT-dependent for
short-RTT networks. The I-D says the following:

   Another notable feature of CUBIC is that its window increase rate is
   mostly independent of RTT, and follows a (cubic) function of the
   elapsed time from the beginning of congestion avoidance.

So, we are not sure whether this solution is logically sound. We are also
not sure of any other implications this change might have on CUBIC's behaviour.

----
Adding Richard to the thread directly, as I have been following his work on
CUBIC for some time.

Thanks,
Bhaskar Pardeshi (bpardeshi@vmware.com)
VMware, Inc.