From: "Peter 'PMc' Much" <pmc@citylink.dinoex.sub.org>
To: freebsd-hackers@freebsd.org
Subject: low TCP speed, wrong rtt measurement
Date: Tue, 4 Apr 2023 14:46:34 -0000 (UTC)
User-Agent: slrn/1.0.3 (FreeBSD)
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

** maybe this
should rather go to the -net list, but then
** there are only bug messages

Hi,

I'm trying to transfer backup data via WAN; the link bandwidth is only
~2 Mbit, but the transfer can well run for days and just saturate the
spare bandwidth.

The problem is, it doesn't saturate the bandwidth.

I found that the backup application opens the socket in this way:

    if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {

Apparently that doesn't work well, so I patched the application to pass
the protocol explicitly:

    - if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {
    + if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, IPPROTO_TCP)) < 0) {

The result, observed with tcpdump, was now noticeably different, but
rather worse than better.

I tried various cc algorithms; all behaved very badly, with the
exception of cc_vegas. Vegas, after tuning alpha and beta, gave
satisfying results with less than 1% tradeoff. But only for a time:
after transferring for a couple of hours the throughput went bad again:

    # netstat -aC
    Proto Recv-Q Send-Q Local Address   Foreign Address   (state)     CC     cwin  ssthresh  MSS  ECN
    tcp6       0  57351 edge-jo.26996   pole-n.22         ESTABLISHED vegas  22203    10392  1311 off
    tcp4       0 106305 edge-e.62275    pole-n.bacula-sd  ESTABLISHED vegas  11943     5276  1331 off

The first connection is freshly created. The second one has been running
for a day already, and it is obviously hosed - it doesn't recover.

    # sysctl net.inet.tcp.cc.vegas
    net.inet.tcp.cc.vegas.beta: 14
    net.inet.tcp.cc.vegas.alpha: 8

    8 (alpha) x 1331 (mss) = 10648

The cwin is adjusted to precisely one step above the alpha threshold,
and doesn't rise further. (Increasing alpha further does solve the issue
for this connection - but that is not how things are supposed to work.)
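As an aside, instead of patching the socket() call, the congestion-control
module can also be selected per socket. A minimal sketch, assuming the
TCP_CONGESTION socket option (available on FreeBSD with modular CC, and on
Linux) and that the cc_vegas module is loaded; the function name here is
illustrative:

    /* Try to select a per-socket CC module; returns 0 if the socket
     * could be created, even when the module is unavailable. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int try_cc(const char *name)
    {
        int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (fd < 0)
            return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                       name, strlen(name) + 1) == 0)
            printf("congestion control set to %s\n", name);
        else
            printf("%s not available on this host\n", name);
        close(fd);
        return 0;
    }

    int main(void)
    {
        return try_cc("vegas") == 0 ? 0 : 1;
    }

That way only the backup application's sockets are affected, rather than
switching the system-wide default via net.inet.tcp.cc.algorithm.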
Now I tried to look into the data that vegas uses for its decisions,
and found this:

    # dtrace -n 'fbt:kernel:vegas_ack_received:entry { printf("%s %u %d %d %d %d", execname,\
      (*((struct tcpcb **)(arg0+24)))->snd_cwnd,\
      ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->minrtt,\
      ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->marked_snd_cwnd,\
      ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->bytes_tx_in_marked_rtt,\
      ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->markedpkt_rtt);\
      }'
    CPU     ID                    FUNCTION:NAME
      6  17478  vegas_ack_received:entry ng_queue 11943  1 11943 10552 131
     17  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
     17  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
      3  17478  vegas_ack_received:entry ng_queue 11943  1 11943 10552 131
      5  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
     17  17478  vegas_ack_received:entry ng_queue 11943  1 11943 10552 131
     11  17478  vegas_ack_received:entry ng_queue 11943  1 11943 10552 106
     15  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
     13  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
     16  17478  vegas_ack_received:entry ng_queue 11943  1 11943 10552 106
      3  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261

One can see that the "minrtt" value for the freshly created connection
is 56, which is very plausible. But the old and hosed connection shows
minrtt = 1, which explains the observed cwin.

The minrtt gets calculated in sys/netinet/khelp/h_ertt.c:

    e_t->rtt = tcp_ts_getticks() - txsi->tx_ts + 1;

There is a "+1", so the measured difference was apparently zero. But
source and destination are at least 1000 km apart. So either we have had
one of the rare occasions of hyperspace tunnelling, or something is
going wrong in the ertt measurement code.

For now this is a one-time observation, but it might also explain why
the other cc algorithms behaved badly.
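Plugging the dtrace'd fields into the Vegas rate comparison shows why the
connection is stuck. This is a simplified integer model of the ndiff
computation in vegas_ack_received() (sys/netinet/cc/cc_vegas.c), not the
kernel source itself:

    #include <stdio.h>

    /* Compare the expected rate (cwnd/minrtt) with the actual rate
     * (bytes_tx/markedpkt_rtt), scaled back to segments.  Vegas holds
     * cwnd while alpha <= ndiff <= beta. */
    static long vegas_ndiff(long cwnd, long minrtt, long bytes_tx,
                            long pkt_rtt, long mss)
    {
        long expected = cwnd / minrtt;
        long actual = bytes_tx / pkt_rtt;
        return (expected - actual) * minrtt / mss;
    }

    int main(void)
    {
        long mss = 1331;        /* from the netstat output above */

        /* fresh connection: minrtt = 56 ticks */
        long fresh = vegas_ndiff(22203, 56, 20784, 261, mss);
        /* hosed connection: bogus minrtt = 1 tick */
        long hosed = vegas_ndiff(11943, 1, 10552, 131, mss);

        printf("fresh ndiff=%ld, hosed ndiff=%ld\n", fresh, hosed);
        /* prints: fresh ndiff=13, hosed ndiff=8 -- both inside
         * [alpha=8, beta=14], so Vegas holds cwnd; the bogus minrtt
         * of 1 pins the hosed connection near alpha * mss bytes. */
        return 0;
    }

With minrtt = 1 the "expected" rate is grossly inflated, so ndiff reaches
alpha at a tiny cwnd and the window can never grow past it.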
These algorithms are widely in use and should work - the ertt
measurement, however, is the same for all of them.
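To illustrate how a minrtt of 1 can arise at all: the h_ertt.c formula
quoted above measures the RTT in ticks, so any ACK whose measured tick
delta comes out as zero yields rtt = 1, and minrtt then latches onto that
value for the rest of the connection. A minimal model of that formula (not
the kernel code):

    #include <stdint.h>
    #include <stdio.h>

    /* Model of the h_ertt.c measurement:
     *     rtt = tcp_ts_getticks() - tx_ts + 1
     * where tx_ts is the tick count recorded at transmission time. */
    static uint32_t ertt_rtt(uint32_t now_ticks, uint32_t tx_ts)
    {
        return now_ticks - tx_ts + 1;
    }

    int main(void)
    {
        /* normal case: the ACK arrives 55 ticks after transmission */
        printf("%u\n", ertt_rtt(1055, 1000));   /* prints 56 */
        /* broken case: the measured delta is zero, so the rtt
         * collapses to the floor of 1 tick */
        printf("%u\n", ertt_rtt(1000, 1000));   /* prints 1 */
        return 0;
    }

A delta of zero over a 1000 km path suggests the transmit timestamp was
taken from (or matched against) the wrong segment, which is exactly the
kind of bug the measurement code would need to be checked for.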