From: "Rodney W. Grimes"
Message-Id: <202304090058.3390wrE1020757@gndrsh.dnsmgr.net>
Subject: Re: low TCP speed, wrong rtt measurement
To: Richard Perini
Cc: freebsd-hackers@FreeBSD.org, rscheff@FreeBSD.org
Date: Sat, 8 Apr 2023 17:58:53 -0700 (PDT)
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

> On Tue, Apr 04, 2023 at 02:46:34PM -0000, Peter 'PMc' Much wrote:
> > ** maybe this should rather go to the -net list, but then
> > ** there are only bug messages
> >
> > Hi,
> > I'm trying to transfer backup data via WAN; the link bandwidth is
> > only ~2 Mbit, but the transfer can run for days and just saturate
> > the spare bandwidth.
> >
> > The problem is, it doesn't saturate the bandwidth.
> >
> > I found that the backup application opens the socket in this way:
> >     if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {
> >
> > Apparently that doesn't work well. So I patched the application to do
> > it this way:
> >     - if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {
> >     + if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, IPPROTO_TCP)) < 0) {
> >
> > The result, observed with tcpdump, was now noticeably different, but
> > rather worse than better.
> >
> > I tried various cc algorithms; all behaved very badly with the
> > exception of cc_vegas. Vegas, after tuning the alpha and beta, gave
> > satisfying results with less than 1% tradeoff.
> >
> > But only for a time. After transferring for a couple of hours the
> > throughput went bad again:
> >
> > # netstat -aC
> > Proto Recv-Q Send-Q Local Address    Foreign Address   (state)      CC     cwin   ssthresh   MSS  ECN
> > tcp6       0  57351 edge-jo.26996    pole-n.22         ESTABLISHED  vegas  22203      10392  1311  off
> > tcp4       0 106305 edge-e.62275     pole-n.bacula-sd  ESTABLISHED  vegas  11943       5276  1331  off
> >
> > The first connection is freshly created. The second one has been
> > running for a day already, and it is obviously hosed - it does not
> > recover.
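Regarding the socket() patch quoted above: as far as I can tell the third
argument should not change anything here, since protocol 0 on a SOCK_STREAM
AF_INET/AF_INET6 socket already selects TCP, so both variants create the
same kind of socket. If the goal is to experiment with congestion control
per connection rather than system wide, the modular CC framework (mod_cc(4),
tcp(4)) also exposes a per-socket knob via the TCP_CONGESTION socket option.
A rough, untested sketch of how an application could use it (the set_cc()
helper is only illustrative, and the named module, e.g. cc_cubic or
cc_vegas, must already be loaded):

    /*
     * Untested sketch: pin one TCP socket to a particular congestion
     * control module via TCP_CONGESTION (see tcp(4) and mod_cc(4)).
     */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <stdio.h>
    #include <string.h>

    static int
    set_cc(int fd, const char *name)
    {
        char buf[TCP_CA_NAME_MAX];
        socklen_t len = sizeof(buf);

        strlcpy(buf, name, sizeof(buf));
        if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf,
            sizeof(buf)) == -1) {
            perror("setsockopt(TCP_CONGESTION)");
            return (-1);
        }
        /* Read the name back to confirm what the connection is using. */
        if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len) == 0) {
            buf[sizeof(buf) - 1] = '\0';
            printf("congestion control now: %s\n", buf);
        }
        return (0);
    }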
> >
> > # sysctl net.inet.tcp.cc.vegas
> > net.inet.tcp.cc.vegas.beta: 14
> > net.inet.tcp.cc.vegas.alpha: 8
> >
> > 8 (alpha) x 1331 (mss) = 10648
> >
> > The cwin is adjusted to just one increment above alpha x MSS and does
> > not rise further. (Increasing alpha further does solve the issue for
> > this connection - but that is not how things are supposed to work.)
> >
> > Now I tried to look at the data that vegas uses for its decisions,
> > and found this:
> >
> > # dtrace -n 'fbt:kernel:vegas_ack_received:entry { printf("%s %u %d %d %d %d", execname,\
> >   (*((struct tcpcb **)(arg0+24)))->snd_cwnd,\
> >   ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->minrtt,\
> >   ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->marked_snd_cwnd,\
> >   ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->bytes_tx_in_marked_rtt,\
> >   ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->markedpkt_rtt);\
> >   }'
> > CPU     ID                   FUNCTION:NAME
> >   6  17478  vegas_ack_received:entry  ng_queue 11943  1 11943 10552 131
> >  17  17478  vegas_ack_received:entry  ng_queue 22203 56 22203 20784 261
> >  17  17478  vegas_ack_received:entry  ng_queue 22203 56 22203 20784 261
> >   3  17478  vegas_ack_received:entry  ng_queue 11943  1 11943 10552 131
> >   5  17478  vegas_ack_received:entry  ng_queue 22203 56 22203 20784 261
> >  17  17478  vegas_ack_received:entry  ng_queue 11943  1 11943 10552 131
> >  11  17478  vegas_ack_received:entry  ng_queue 11943  1 11943 10552 106
> >  15  17478  vegas_ack_received:entry  ng_queue 22203 56 22203 20784 261
> >  13  17478  vegas_ack_received:entry  ng_queue 22203 56 22203 20784 261
> >  16  17478  vegas_ack_received:entry  ng_queue 11943  1 11943 10552 106
> >   3  17478  vegas_ack_received:entry  ng_queue 22203 56 22203 20784 261
> >
> > One can see that the "minrtt" value for the freshly created connection
> > is 56, which is very plausible.
> > But the old, hosed connection shows minrtt = 1, which explains the
> > observed cwin.
> >
> > The minrtt gets calculated in sys/netinet/khelp/h_ertt.c:
> >     e_t->rtt = tcp_ts_getticks() - txsi->tx_ts + 1;
> > There is a "+1" in there, so the measured difference was apparently zero.
> >
> > But source and destination are at least 1000 km apart. So either we
> > have had one of the rare occasions of hyperspace tunnelling, or
> > something is going wrong in the ertt measurement code.
> >
> > For now this is a one-time observation, but it might also explain why
> > the other cc algorithms behaved badly. Those algorithms are widely in
> > use and should work - the ertt measurement, however, is the same for
> > all of them.
>
> I can confirm I am seeing similar problems transferring files to our
> various production sites around Australia, over various types and sizes
> of links and bandwidths. I can saturate the nearby links, but link
> utilisation decreases with distance.
>
> I've tried various transfer protocols: ftp, scp, rcp, http; the results
> are similar for all. The ping time for the closest WAN link is 2.3 ms,
> the furthest is 60 ms. On the furthest link we get around 15%
> utilisation. A transfer between 2 Windows hosts on the same furthest
> link yields ~80% utilisation.

Windows should be using CUBIC (cc_cubic on FreeBSD); you say above that
you had tried all the congestion control algorithms, and only cc_vegas
after tuning gave good results.
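To put some numbers on Peter's minrtt observation: in its textbook form
(not necessarily the exact arithmetic in cc_vegas/h_ertt), Vegas estimates
the number of segments queued in the path as roughly
cwnd * (rtt - basertt) / rtt and only grows cwnd while that estimate stays
below alpha. With the base RTT stuck at 1 tick, nearly every segment in
flight is counted as queue backlog, so cwnd gets clamped at about
(alpha + 1) segments - which matches the 11943-byte cwin stuck just above
alpha * MSS = 10648. A rough illustration with the values from the dtrace
output above:

    /*
     * Rough illustration only: textbook Vegas backlog estimate, not the
     * kernel's cc_vegas code.  Numbers are from the hosed tcp4
     * connection in the dtrace output above.
     */
    #include <stdio.h>

    int
    main(void)
    {
        double cwnd   = 11943;  /* bytes, from netstat -aC     */
        double mss    = 1331;   /* bytes                       */
        double rtt    = 106;    /* ticks, markedpkt_rtt        */
        double minrtt = 1;      /* ticks, the bogus baseline   */
        double alpha  = 8;      /* net.inet.tcp.cc.vegas.alpha */

        /*
         * Vegas: expected rate = cwnd / minrtt, actual rate = cwnd / rtt;
         * backlog = (expected - actual) * minrtt, converted to segments.
         * This simplifies to cwnd * (rtt - minrtt) / rtt.
         */
        double backlog = cwnd / mss * (rtt - minrtt) / rtt;

        printf("estimated backlog: %.1f segments (alpha = %.0f)\n",
            backlog, alpha);
        /*
         * Prints ~8.9 segments, already above alpha, so the window is
         * never grown past about (alpha + 1) * mss ~= 11979 bytes.  With
         * the true base RTT of ~56 ticks the same cwnd would look like
         * only ~4 segments of backlog and Vegas would keep increasing it.
         */
        return (0);
    }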
>
> FreeBSD versions involved are 12.1 and 12.2.

I wonder if cc_cubic is broken in 12.X; it should give results similar to
Windows if things are working correctly. I am adding Richard Scheffenegger,
as he is the most recent expert on the congestion control code in FreeBSD.

> --
> Richard Perini
> Ramico Australia Pty Ltd   Sydney, Australia   rpp@ci.com.au   +61 2 9552 5500
> -----------------------------------------------------------------------------
> "The difference between theory and practice is that in theory there is no
>  difference, but in practice there is"

--
Rod Grimes                                                 rgrimes@freebsd.org