Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)

From: <tuexen@freebsd.org>
Date: Thu, 18 Jul 2024 16:35:46 UTC

> On 18. Jul 2024, at 16:03, Alan Somers <asomers@FreeBSD.org> wrote:
> 
> On Wed, Jul 17, 2024 at 2:27 PM <tuexen@freebsd.org> wrote:
>> 
>>> On 17. Jul 2024, at 22:00, Alan Somers <asomers@freebsd.org> wrote:
>>> 
>>> On Sat, Jul 13, 2024 at 1:50 AM <tuexen@freebsd.org> wrote:
>>>> 
>>>>> On 13. Jul 2024, at 01:43, Alan Somers <asomers@FreeBSD.org> wrote:
>>>>> 
>>>>> I've been experimenting with RACK and BBR.  In my environment, they
>>>>> can dramatically improve single-stream TCP performance, which is
>>>>> awesome.  But pf interferes.  I have to disable pf in order for them
>>>>> to work at all.
>>>>> 
>>>>> Is this a known limitation?  If not, I will experiment some more to
>>>>> determine exactly what aspect of my pf configuration is responsible.
>>>>> If so, can anybody suggest what changes would have to happen to make
>>>>> the two compatible?
>>>> A problem with the same symptoms was already reported and fixed in
>>>> https://reviews.freebsd.org/D43769
>>>> 
>>>> Which version are you using?
>>>> 
>>>> Best regards
>>>> Michael
>>>>> 
>>>>> -Alan
>>> 
>>> TL;DR: tcp_rack is good, cc_chd is better, and tcp_bbr is best
>>> 
>>> I want to follow up with the list to post my conclusions.  Firstly,
>>> tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
>>> incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
>>> confirm that tcp_bbr works for me if I either disable LRO, disable PF,
>>> or switch to a 14.1 server.
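>>> 
>>> (If anyone hits the same combination, the immediate workarounds are
>>> the ones I describe above: "ifconfig <nic> -lro" to disable LRO on
>>> the interface, or "pfctl -d" to disable pf; substitute your actual
>>> NIC name, of course.)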
>>> 
>>> Here's the real problem: on multiple production servers, downloading
>>> large files (or ZFS send/recv streams) was slow.  After ruling out
>>> many possible causes, Wireshark revealed that the connection was
>>> suffering about 0.05% packet loss.  I don't know the source of that
>>> packet loss, but I don't believe it to be congestion-related.  Along
>>> with a 54ms RTT, that's a fatal combination for the throughput of
>>> loss-based congestion control algorithms.  According to the Mathis
>>> Formula [1], I could only expect 1.1 MBps over such a connection.
>>> That's actually lower than what I saw.  With default settings
>>> (cc_cubic), I averaged 5.6 MBps.  Mathis's assumptions are probably
>>> outdated, but that's still pretty close for such a simple formula
>>> that's 27 years old.
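>>> 
>>> For reference, one common simplified form of the Mathis formula is
>>> BW <= MSS / (RTT * sqrt(p)).  A quick sanity check in Python (the
>>> 1448-byte MSS is my assumption, i.e. a 1500-byte MTU minus IP/TCP
>>> headers and timestamps; the 1.1 MBps figure above presumably used a
>>> slightly different MSS or constant):
>>> 
>>>   from math import sqrt
>>> 
>>>   mss = 1448    # bytes; assumed, not measured
>>>   rtt = 0.054   # seconds (the 54 ms RTT above)
>>>   p   = 0.0005  # packet loss rate (the 0.05% above)
>>> 
>>>   # Upper bound on throughput for a loss-based algorithm.
>>>   bw = mss / (rtt * sqrt(p))     # bytes per second
>>>   print(f"{bw / 1e6:.1f} MBps")  # ~1.2 MBps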
>>> 
>>> So I benchmarked all available congestion control algorithms for
>>> single download streams.  The results are summarized in the table
>>> below.
>>> 
>>> Algo    Packet Loss Rate    Average Throughput
>>> vegas   0.05%               2.0 MBps
>>> newreno 0.05%               3.2 MBps
>>> cubic   0.05%               5.6 MBps
>>> hd      0.05%               8.6 MBps
>>> cdg     0.05%               13.5 MBps
>>> rack    0.04%               14 MBps
>>> htcp    0.05%               15 MBps
>>> dctcp   0.05%               15 MBps
>>> chd     0.05%               17.3 MBps
>>> bbr     0.05%               29.2 MBps
>>> cubic   10%                 159 kBps
>>> chd     10%                 208 kBps
>>> bbr     10%                 5.7 MBps
>>> 
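>>> For anyone reproducing this: the cc(4) algorithms (newreno, cubic,
>>> chd, etc.) can also be selected per-connection with the
>>> TCP_CONGESTION socket option once their module is loaded, while
>>> rack and bbr are whole TCP stacks chosen through the
>>> net.inet.tcp.functions_default sysctl instead.  A minimal Python
>>> sketch of the per-socket case (the endpoint is hypothetical, and
>>> the fallback value 64 is TCP_CONGESTION from FreeBSD's
>>> netinet/tcp.h):
>>> 
>>>   import socket
>>> 
>>>   # Python only exposes socket.TCP_CONGESTION on some platforms,
>>>   # so fall back to FreeBSD's constant from netinet/tcp.h.
>>>   TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 64)
>>> 
>>>   s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>>   # Select a loaded cc(4) module for this one connection.
>>>   s.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, b"chd")
>>>   s.connect(("server.example.com", 8080))  # hypothetical endpoint
>>> 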
>>> RACK seemed to achieve about the same maximum bandwidth as BBR, though
>>> it took a lot longer to get there.  Also, with RACK, Wireshark
>>> reported about 10x as many retransmissions as dropped packets, which
>>> is suspicious.
>>> 
>>> At one point, something went haywire and packet loss briefly spiked to
>>> the neighborhood of 10%.  I took advantage of the chaos to repeat my
>>> measurements.  As the table shows, all algorithms sucked under those
>>> conditions, but BBR sucked impressively less than the others.
>>> 
>>> Disclaimer: there was significant run-to-run variation; the presented
>>> results are averages.  And I did not attempt to measure packet loss
>>> exactly for most runs; 0.05% is merely an average of a few selected
>>> runs.  These measurements were taken on a production server running a
>>> real workload, which introduces noise.  Soon I hope to have the
>>> opportunity to repeat the experiment on an idle server in the same
>>> environment.
>>> 
>>> In conclusion, while we'd like to use BBR, we really can't until we
>>> upgrade to 14.1, which hopefully will be soon.  So in the meantime
>>> we've switched all relevant servers from cubic to chd, and we'll
>>> reevaluate BBR after the upgrade.
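>>> 
>>> For the record, the switch itself was just "kldload cc_chd" followed
>>> by "sysctl net.inet.tcp.cc.algorithm=chd"; the BBR equivalent after
>>> the upgrade should be "kldload tcp_bbr" plus
>>> "sysctl net.inet.tcp.functions_default=bbr" (from memory, so verify
>>> against mod_cc(4) on your version).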
>> Hi Alan,
>> 
>> just to be clear: the version of BBR currently implemented is
>> BBR version 1, which is known to be unfair in certain scenarios.
>> Google is still working on BBR to address this problem and improve
>> it in other aspects. But there is no RFC yet, and the updates haven't
>> been implemented in FreeBSD.
> 
> I've also heard that RACK suffers from fairness problems.  Do you know
> how RACK and BBR compare for fairness?
RACK should be fair; BBR (version 1) is known not to be fair...

Best regards
Michael