From: Junho Choi <junho.choi@gmail.com>
Date: Thu, 18 Jul 2024 22:00:38 +0900
Subject: Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)
To: tuexen@freebsd.org
Cc: Alan Somers, FreeBSD Net
List-Archive: https://lists.freebsd.org/archives/freebsd-net
Alan - this is a great result to see. Thanks for experimenting.

Just curious why bbr and rack don't co-exist? Those are two separate things.
Is it a current bug or by design?

BR,

On Thu, Jul 18, 2024 at 5:27 AM <tuexen@freebsd.org> wrote:
> > On 17. Jul 2024, at 22:00, Alan Somers <asomers@freebsd.org> wrote:
> >
> > On Sat, Jul 13, 2024 at 1:50 AM <tuexen@freebsd.org> wrote:
> >>
> >>> On 13. Jul 2024, at 01:43, Alan Somers <asomers@FreeBSD.org> wrote:
> >>>
> >>> I've been experimenting with RACK and BBR.  In my environment, they
> >>> can dramatically improve single-stream TCP performance, which is
> >>> awesome.  But pf interferes.  I have to disable pf in order for them
> >>> to work at all.
> >>>
> >>> Is this a known limitation?  If not, I will experiment some more to
> >>> determine exactly what aspect of my pf configuration is responsible.
> >>> If so, can anybody suggest what changes would have to happen to make
> >>> the two compatible?
> >> A problem with the same symptoms was already reported and fixed in
> >> https://reviews.freebsd.org/D43769
> >>
> >> Which version are you using?
> >>
> >> Best regards
> >> Michael
> >>>
> >>> -Alan
> >
> > TLDR; tcp_rack is good, cc_chd is better, and tcp_bbr is best
> >
> > I want to follow up with the list to post my conclusions.  Firstly
> > tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
> > incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
> > confirm that tcp_bbr works for me if I either disable LRO, disable PF,
> > or switch to a 14.1 server.
> >
> > Here's the real problem: on multiple production servers, downloading
> > large files (or ZFS send/recv streams) was slow.  After ruling out
> > many possible causes, wireshark revealed that the connection was
> > suffering about 0.05% packet loss.  I don't know the source of that
> > packet loss, but I don't believe it to be congestion-related.  Along
> > with a 54ms RTT, that's a fatal combination for the throughput of
> > loss-based congestion control algorithms.  According to the Mathis
> > Formula [1], I could only expect 1.1 MBps over such a connection.
> > That's actually worse than what I saw.  With default settings
> > (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
> > outdated, but that's still pretty close for such a simple formula
> > that's 27 years old.
> >
> > So I benchmarked all available congestion control algorithms for
> > single download streams.  The results are summarized in the table
> > below.
> >
> > Algo     Packet Loss Rate   Average Throughput
> > vegas    0.05%              2.0 MBps
> > newreno  0.05%              3.2 MBps
> > cubic    0.05%              5.6 MBps
> > hd       0.05%              8.6 MBps
> > cdg      0.05%              13.5 MBps
> > rack     0.04%              14 MBps
> > htcp     0.05%              15 MBps
> > dctcp    0.05%              15 MBps
> > chd      0.05%              17.3 MBps
> > bbr      0.05%              29.2 MBps
> > cubic    10%                159 kBps
> > chd      10%                208 kBps
> > bbr      10%                5.7 MBps
> >
> > RACK seemed to achieve about the same maximum bandwidth as BBR, though
> > it took a lot longer to get there.  Also, with RACK, wireshark
> > reported about 10x as many retransmissions as dropped packets, which
> > is suspicious.
> >
> > At one point, something went haywire and packet loss briefly spiked to
> > the neighborhood of 10%.  I took advantage of the chaos to repeat my
> > measurements.  As the table shows, all algorithms sucked under those
> > conditions, but BBR sucked impressively less than the others.
> >
> > Disclaimer: there was significant run-to-run variation; the presented
> > results are averages.  And I did not attempt to measure packet loss
> > exactly for most runs; 0.05% is merely an average of a few selected
> > runs.  These measurements were taken on a production server running a
> > real workload, which introduces noise.  Soon I hope to have the
> > opportunity to repeat the experiment on an idle server in the same
> > environment.
> >
> > In conclusion, while we'd like to use BBR, we really can't until we
> > upgrade to 14.1, which hopefully will be soon.  So in the meantime
> > we've switched all relevant servers from cubic to chd, and we'll
> > reevaluate BBR after the upgrade.
> Hi Alan,
>
> just to be clear: the version of BBR currently implemented is
> BBR version 1, which is known to be unfair in certain scenarios.
> Google is still working on BBR to address this problem and improve
> it in other aspects.  But there is no RFC yet and the updates haven't
> been implemented yet in FreeBSD.
>
> Best regards
> Michael
> >
> > [1]: https://www.slac.stanford.edu/comp/net/wan-mon/thru-vs-loss.html
> >
> > -Alan
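
For anyone who wants to check the Mathis estimate quoted above, the
formula is BW <= (MSS / RTT) * (C / sqrt(p)).  Assuming a typical
1460-byte MSS (the thread doesn't state it), Alan's 54 ms RTT and
0.05% loss give roughly

    BW ~= (1460 / 0.054) * (0.87 / sqrt(0.0005))  bytes/s
       ~= 27,000 * 38.9                           bytes/s
       ~= 1.05 MBps

using the delayed-ACK constant C = sqrt(3/4) ~= 0.87; the classic
C = sqrt(3/2) ~= 1.22 raises the estimate to about 1.5 MBps.  Either
way it lands in the neighborhood of the 1.1 MBps figure quoted above.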

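For reference, a minimal sketch of the cubic-to-chd switch Alan
describes (the module and sysctl names are standard FreeBSD interfaces;
whether the modules are already loaded on a given host is an assumption):

    # loss/delay-based algorithms are loadable cc(4) modules;
    # make CAIA-Hamilton-Delay the system default:
    kldload cc_chd
    sysctl net.inet.tcp.cc.algorithm=chd

    # RACK and BBR are alternate TCP stacks, not cc modules, and need
    # a kernel with TCPHPTS support:
    kldload tcp_bbr
    sysctl net.inet.tcp.functions_default=bbr

To persist across reboots, the equivalent settings would go in
/boot/loader.conf and /etc/sysctl.conf.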

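Likewise, a hedged sketch of the two workarounds mentioned for the
14.0-era (tcp_bbr || tcp_rack) && lro && pf interaction (the interface
name ix0 is only an example):

    # disable LRO on the interface carrying the affected traffic:
    ifconfig ix0 -lro

    # or disable pf entirely:
    pfctl -d

Either should sidestep the problem on an affected 14.0 host; the real
fix is the one referenced earlier, https://reviews.freebsd.org/D43769.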


--
Junho Choi <junho dot choi at gmail.com> | https://saturnsoft.net