From: Junho Choi <junho.choi@gmail.com>
Date: Fri, 19 Jul 2024 12:07:15 +0900
Subject: Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)
To: tuexen@freebsd.org
Cc: Alan Somers <asomers@freebsd.org>, FreeBSD Net
In-Reply-To: <400A46A2-E75F-4BE3-BFFF-340CF4557322@freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-net

RACK is a loss detection algorithm and BBR is a congestion control
algorithm, so they sit at different layers. For example, Linux can
configure them independently.

However, in FreeBSD it looks like both are selected through the same
sysctl (net.inet.tcp.functions_default=tcp_rack|tcp_bbr), so it is not
possible to set both.

Is there any plan to improve this? Or does tcp_bbr include tcp_rack's
loss probe behavior?

I'm a little confused.

Best,

On Fri, Jul 19, 2024 at 4:23 AM <tuexen@freebsd.org> wrote:

> > On 18. Jul 2024, at 20:37, Alan Somers <asomers@freebsd.org> wrote:
> >
> > Coexist how?  Do you mean that one socket can use one and a different
> > socket uses the other?  That makes sense.
> Correct.
>
> Best regards
> Michael
> >
> > On Thu, Jul 18, 2024 at 10:34 AM <tuexen@freebsd.org> wrote:
> >>
> >>> On 18. Jul 2024, at 15:00, Junho Choi <junho.choi@gmail.com> wrote:
> >>>
> >>> Alan - this is a great result to see. Thanks for experimenting.
> >>>
> >>> Just curious why bbr and rack don't co-exist? Those are two separate things.
> >>> Is it a current bug or by design?
> >> Technically RACK and BBR can coexist. The problem was with pf and/or LRO.
> >>
> >> But this is all fixed now in 14.1 and head.
> >>
> >> Best regards
> >> Michael
> >>>
> >>> BR,
> >>>
> >>> On Thu, Jul 18, 2024 at 5:27 AM <tuexen@freebsd.org> wrote:
> >>>> On 17. Jul 2024, at 22:00, Alan Somers <asomers@freebsd.org> wrote:
> >>>>
> >>>> On Sat, Jul 13, 2024 at 1:50 AM <tuexen@freebsd.org> wrote:
> >>>>>
> >>>>>> On 13. Jul 2024, at 01:43, Alan Somers <asomers@FreeBSD.org> wrote:
> >>>>>>
> >>>>>> I've been experimenting with RACK and BBR.  In my environment, they
> >>>>>> can dramatically improve single-stream TCP performance, which is
> >>>>>> awesome.  But pf interferes.  I have to disable pf in order for them
> >>>>>> to work at all.
> >>>>>>
> >>>>>> Is this a known limitation?  If not, I will experiment some more to
> >>>>>> determine exactly what aspect of my pf configuration is responsible.
> >>>>>> If so, can anybody suggest what changes would have to happen to make
> >>>>>> the two compatible?
> >>>>> A problem with same symptoms was already reported and fixed in
> >>>>> https://reviews.freebsd.org/D43769
> >>>>>
> >>>>> Which version are you using?
> >>>>>
> >>>>> Best regards
> >>>>> Michael
> >>>>>>
> >>>>>> -Alan
> >>>>
> >>>> TLDR; tcp_rack is good, cc_chd is better, and tcp_bbr is best
> >>>>
> >>>> I want to follow up with the list to post my conclusions.  Firstly
> >>>> tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
> >>>> incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
> >>>> confirm that tcp_bbr works for me if I either disable LRO, disable PF,
> >>>> or switch to a 14.1 server.
> >>>>
> >>>> Here's the real problem: on multiple production servers, downloading
> >>>> large files (or ZFS send/recv streams) was slow.  After ruling out
> >>>> many possible causes, wireshark revealed that the connection was
> >>>> suffering about 0.05% packet loss.  I don't know the source of that
> >>>> packet loss, but I don't believe it to be congestion-related.  Along
> >>>> with a 54ms RTT, that's a fatal combination for the throughput of
> >>>> loss-based congestion control algorithms.  According to the Mathis
> >>>> Formula [1], I could only expect 1.1 MBps over such a connection.
> >>>> That's actually worse than what I saw.  With default settings
> >>>> (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
> >>>> outdated, but that's still pretty close for such a simple formula
> >>>> that's 27 years old.
> >>>>
> >>>> So I benchmarked all available congestion control algorithms for
> >>>> single download streams.  The results are summarized in the table
> >>>> below.
> >>>>
> >>>> Algo     Packet Loss Rate    Average Throughput
> >>>> vegas    0.05%               2.0 MBps
> >>>> newreno  0.05%               3.2 MBps
> >>>> cubic    0.05%               5.6 MBps
> >>>> hd       0.05%               8.6 MBps
> >>>> cdg      0.05%               13.5 MBps
> >>>> rack     0.04%               14 MBps
> >>>> htcp     0.05%               15 MBps
> >>>> dctcp    0.05%               15 MBps
> >>>> chd      0.05%               17.3 MBps
> >>>> bbr      0.05%               29.2 MBps
> >>>> cubic    10%                 159 kBps
> >>>> chd      10%                 208 kBps
> >>>> bbr      10%                 5.7 MBps
> >>>>
> >>>> RACK seemed to achieve about the same maximum bandwidth as BBR, though
> >>>> it took a lot longer to get there.  Also, with RACK, wireshark
> >>>> reported about 10x as many retransmissions as dropped packets, which
> >>>> is suspicious.
> >>>>
> >>>> At one point, something went haywire and packet loss briefly spiked to
> >>>> the neighborhood of 10%.  I took advantage of the chaos to repeat my
> >>>> measurements.  As the table shows, all algorithms sucked under those
> >>>> conditions, but BBR sucked impressively less than the others.
> >>>>
> >>>> Disclaimer: there was significant run-to-run variation; the presented
> >>>> results are averages.  And I did not attempt to measure packet loss
> >>>> exactly for most runs; 0.05% is merely an average of a few selected
> >>>> runs.  These measurements were taken on a production server running a
> >>>> real workload, which introduces noise.  Soon I hope to have the
> >>>> opportunity to repeat the experiment on an idle server in the same
> >>>> environment.
> >>>>
> >>>> In conclusion, while we'd like to use BBR, we really can't until we
> >>>> upgrade to 14.1, which hopefully will be soon.  So in the meantime
> >>>> we've switched all relevant servers from cubic to chd, and we'll
> >>>> reevaluate BBR after the upgrade.
> >>> Hi Alan,
> >>>
> >>> just to be clear: the version of BBR currently implemented is
> >>> BBR version 1, which is known to be unfair in certain scenarios.
> >>> Google is still working on BBR to address this problem and improve
> >>> it in other aspects. But there is no RFC yet and the updates haven't
> >>> been implemented yet in FreeBSD.
> >>>
> >>> Best regards
> >>> Michael
> >>>>
> >>>> [1]: https://www.slac.stanford.edu/comp/net/wan-mon/thru-vs-loss.html
> >>>>
> >>>> -Alan
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Junho Choi <junho dot choi at gmail.com> | https://saturnsoft.net
> >>
> >
>

-- 
Junho Choi <junho dot choi at gmail.com> | https://saturnsoft.net
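
For reference, the Mathis estimate quoted in the thread (about 1.1 MBps at
0.05% packet loss and a 54 ms RTT) can be roughly reproduced with a short
calculation. This is only a sketch: the thread does not state the MSS or the
constant used, so the 1460-byte MSS and C = 1 below are assumptions, and the
function name is made up for illustration.

```python
import math


def mathis_throughput(mss_bytes: float, rtt_s: float, loss: float, c: float = 1.0) -> float:
    """Mathis et al. upper bound on TCP throughput, in bytes per second.

    rate <= C * MSS / (RTT * sqrt(p))
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("mss_bytes and rtt_s must be positive")
    if not 0.0 < loss <= 1.0:
        raise ValueError("loss must be in (0, 1]")
    return c * mss_bytes / (rtt_s * math.sqrt(loss))


MSS = 1460      # bytes; assumed, not stated in the thread
RTT = 0.054     # seconds; the 54 ms RTT reported in the thread
LOSS = 0.0005   # the 0.05% packet loss reported in the thread

predicted = mathis_throughput(MSS, RTT, LOSS)
measured_cubic = 5.6e6  # bytes/s; the cc_cubic average from the table

print(f"Mathis bound:   {predicted / 1e6:.2f} MB/s")
print(f"cubic measured: {measured_cubic / 1e6:.2f} MB/s "
      f"({measured_cubic / predicted:.1f}x the bound)")
```

Under these assumptions the bound comes out near 1.2 MB/s, in the same
ballpark as the 1.1 MBps figure quoted in the thread; a different MSS or
constant shifts it slightly.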