From nobody Fri Jul 19 06:05:19 2024
Date: Fri, 19 Jul 2024 08:05:19 +0200
From: tuexen@freebsd.org
To: Junho Choi
Cc: Alan Somers, FreeBSD Net
Subject: Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)
Message-Id: <9B43971A-1E12-44C4-930B-51BB1F3ABDC1@freebsd.org>
References: <400A46A2-E75F-4BE3-BFFF-340CF4557322@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net
> On 19. Jul 2024, at 05:07, Junho Choi wrote:
>
> RACK is a loss detection algorithm and BBR is a congestion control algorithm, so they are at different layers.
> e.g. Linux can configure them independently.
>
> However, in FreeBSD it looks like both use the same configuration sysctl (net.inet.tcp.functions_default=tcp_rack|tcp_bbr),
> so it is not possible to set both.
FreeBSD supports multiple TCP stacks. One stack is named RACK; it does RACK loss detection, but also much more, such as pacing and advanced LRO. You can configure CC modules such as newreno or cubic to be used with it. Another TCP stack is called BBR; it does BBRv1 congestion control, among other things.
>
> Is there any plan to improve it? Or does tcp_bbr include tcp_rack's loss probe behavior?
I am not aware of any plan to improve the BBR stack right now.

Best regards
Michael
>
> A little confused.
>
> Best,
>
>
> On Fri, Jul 19, 2024 at 4:23 AM wrote:
>> On 18. Jul 2024, at 20:37, Alan Somers wrote:
>>
>> Coexist how?
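To make the split Michael describes concrete, here is a configuration sketch. This is an editorial illustration, not from the thread: the registered stack names and sysctl values below reflect stock FreeBSD as documented in tcp(4) and mod_cc(4), but may differ between releases, so check `net.inet.tcp.functions_available` on your own system before relying on them.

```
# Load the alternate TCP stacks; they are kernel modules,
# separate from the congestion control (cc_*) modules.
kldload tcp_rack tcp_bbr

# List the TCP stacks the kernel now offers.
sysctl net.inet.tcp.functions_available

# Choose the default *stack* for new connections.
# (Note: the registered stack name may be "rack" rather than
# the module name "tcp_rack" -- check functions_available.)
sysctl net.inet.tcp.functions_default=rack

# Independently choose the default *congestion control* module.
sysctl net.inet.tcp.cc.algorithm=cubic
```

Individual sockets can also override the stack with the TCP_FUNCTION_BLK socket option and the CC module with TCP_CONGESTION, which is how one connection can run RACK while another runs BBR on the same host.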
>> Do you mean that one socket can use one and a different
>> socket uses the other? That makes sense.
> Correct.
>
> Best regards
> Michael
>>
>> On Thu, Jul 18, 2024 at 10:34 AM wrote:
>>>
>>>> On 18. Jul 2024, at 15:00, Junho Choi wrote:
>>>>
>>>> Alan - this is a great result to see. Thanks for experimenting.
>>>>
>>>> Just curious why bbr and rack don't co-exist? Those are two separate things.
>>>> Is it a current bug or by design?
>>> Technically RACK and BBR can coexist. The problem was with pf and/or LRO.
>>>
>>> But this is all fixed now in 14.1 and head.
>>>
>>> Best regards
>>> Michael
>>>>
>>>> BR,
>>>>
>>>> On Thu, Jul 18, 2024 at 5:27 AM wrote:
>>>>> On 17. Jul 2024, at 22:00, Alan Somers wrote:
>>>>>
>>>>> On Sat, Jul 13, 2024 at 1:50 AM wrote:
>>>>>>
>>>>>>> On 13. Jul 2024, at 01:43, Alan Somers wrote:
>>>>>>>
>>>>>>> I've been experimenting with RACK and BBR. In my environment, they
>>>>>>> can dramatically improve single-stream TCP performance, which is
>>>>>>> awesome. But pf interferes. I have to disable pf in order for them
>>>>>>> to work at all.
>>>>>>>
>>>>>>> Is this a known limitation? If not, I will experiment some more to
>>>>>>> determine exactly what aspect of my pf configuration is responsible.
>>>>>>> If so, can anybody suggest what changes would have to happen to make
>>>>>>> the two compatible?
>>>>>> A problem with the same symptoms was already reported and fixed in
>>>>>> https://reviews.freebsd.org/D43769
>>>>>>
>>>>>> Which version are you using?
>>>>>>
>>>>>> Best regards
>>>>>> Michael
>>>>>>>
>>>>>>> -Alan
>>>>>
>>>>> TL;DR: tcp_rack is good, cc_chd is better, and tcp_bbr is best.
>>>>>
>>>>> I want to follow up with the list to post my conclusions. Firstly,
>>>>> tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
>>>>> incompatibility between (tcp_bbr || tcp_rack) && lro && pf.
>>>>> I can
>>>>> confirm that tcp_bbr works for me if I either disable LRO, disable PF,
>>>>> or switch to a 14.1 server.
>>>>>
>>>>> Here's the real problem: on multiple production servers, downloading
>>>>> large files (or ZFS send/recv streams) was slow. After ruling out
>>>>> many possible causes, wireshark revealed that the connection was
>>>>> suffering about 0.05% packet loss. I don't know the source of that
>>>>> packet loss, but I don't believe it to be congestion-related. Along
>>>>> with a 54 ms RTT, that's a fatal combination for the throughput of
>>>>> loss-based congestion control algorithms. According to the Mathis
>>>>> Formula [1], I could only expect 1.1 MBps over such a connection.
>>>>> That's actually worse than what I saw. With default settings
>>>>> (cc_cubic), I averaged 5.6 MBps. Probably Mathis's assumptions are
>>>>> outdated, but that's still pretty close for such a simple formula
>>>>> that's 27 years old.
>>>>>
>>>>> So I benchmarked all available congestion control algorithms for
>>>>> single download streams. The results are summarized in the table
>>>>> below.
>>>>>
>>>>> Algo     Packet Loss Rate  Average Throughput
>>>>> vegas    0.05%             2.0 MBps
>>>>> newreno  0.05%             3.2 MBps
>>>>> cubic    0.05%             5.6 MBps
>>>>> hd       0.05%             8.6 MBps
>>>>> cdg      0.05%             13.5 MBps
>>>>> rack     0.04%             14 MBps
>>>>> htcp     0.05%             15 MBps
>>>>> dctcp    0.05%             15 MBps
>>>>> chd      0.05%             17.3 MBps
>>>>> bbr      0.05%             29.2 MBps
>>>>> cubic    10%               159 kBps
>>>>> chd      10%               208 kBps
>>>>> bbr      10%               5.7 MBps
>>>>>
>>>>> RACK seemed to achieve about the same maximum bandwidth as BBR, though
>>>>> it took a lot longer to get there. Also, with RACK, wireshark
>>>>> reported about 10x as many retransmissions as dropped packets, which
>>>>> is suspicious.
>>>>>
>>>>> At one point, something went haywire and packet loss briefly spiked to
>>>>> the neighborhood of 10%. I took advantage of the chaos to repeat my
>>>>> measurements.
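The Mathis-formula estimate quoted above is easy to reproduce. A minimal sketch: the 54 ms RTT and the 0.05% / 10% loss rates come from the thread, while the 1448-byte MSS is an assumption (a typical value for a 1500-byte MTU with TCP timestamps), not a figure Alan gave.

```python
# Mathis et al. (1997) bound on loss-limited TCP throughput:
#     BW <= MSS / (RTT * sqrt(p))
# where p is the packet loss probability.
import math

def mathis_throughput(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Upper bound on steady-state TCP throughput, in bytes per second."""
    return mss_bytes / (rtt_s * math.sqrt(loss_rate))

MSS = 1448    # bytes -- assumed, not from the thread
RTT = 0.054   # seconds, from the thread

for p in (0.0005, 0.10):          # ~0.05% and ~10% loss, from the thread
    bw = mathis_throughput(MSS, RTT, p)
    print(f"loss {p:.2%}: {bw / 1e6:.2f} MB/s")
```

With these assumptions the bound comes out near 1.2 MB/s at 0.05% loss, in the neighborhood of the 1.1 MBps Alan quotes, and only tens of kB/s at 10% loss. That BBR measured 29.2 MBps and 5.7 MBps under the same conditions illustrates the thread's point: BBR is not loss-limited, so the Mathis bound simply does not apply to it.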
>>>>> As the table shows, all algorithms sucked under those
>>>>> conditions, but BBR sucked impressively less than the others.
>>>>>
>>>>> Disclaimer: there was significant run-to-run variation; the presented
>>>>> results are averages. And I did not attempt to measure packet loss
>>>>> exactly for most runs; 0.05% is merely an average of a few selected
>>>>> runs. These measurements were taken on a production server running a
>>>>> real workload, which introduces noise. Soon I hope to have the
>>>>> opportunity to repeat the experiment on an idle server in the same
>>>>> environment.
>>>>>
>>>>> In conclusion, while we'd like to use BBR, we really can't until we
>>>>> upgrade to 14.1, which hopefully will be soon. So in the meantime
>>>>> we've switched all relevant servers from cubic to chd, and we'll
>>>>> reevaluate BBR after the upgrade.
>>>> Hi Alan,
>>>>
>>>> just to be clear: the version of BBR currently implemented is
>>>> BBR version 1, which is known to be unfair in certain scenarios.
>>>> Google is still working on BBR to address this problem and improve
>>>> it in other aspects. But there is no RFC yet, and the updates haven't
>>>> been implemented yet in FreeBSD.
>>>>
>>>> Best regards
>>>> Michael
>>>>>
>>>>> [1]: https://www.slac.stanford.edu/comp/net/wan-mon/thru-vs-loss.html
>>>>>
>>>>> -Alan
>>>>
>>>> --
>>>> Junho Choi | https://saturnsoft.net
>
> --
> Junho Choi | https://saturnsoft.net