Re: low TCP speed, wrong rtt measurement
- In reply to: Peter 'PMc' Much: "low TCP speed, wrong rtt measurement"
Date: Sat, 08 Apr 2023 22:46:39 UTC
On Tue, Apr 04, 2023 at 02:46:34PM -0000, Peter 'PMc' Much wrote:
> ** maybe this should rather go to the -net list, but then
> ** there are only bug messages
>
> Hi,
> I'm trying to transfer backup data via WAN; the link bandwidth is
> only ~2 Mbit, but this can well run for days and just saturate the spare
> bandwidth.
>
> The problem is, it doesn't saturate the bandwidth.
>
> I found that the backup application opens the socket in this way:
> if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {
>
> Apparently that doesn't work well. So I patched the application to do
> it this way:
> - if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {
> + if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, IPPROTO_TCP)) < 0) {
>
> The result, observed with tcpdump, was now noticeably different, but
> rather worse than better.
>
> I tried various cc algorithms; all behaved very badly, with the exception
> of cc_vegas. Vegas, after tuning the alpha and beta, gave satisfying
> results with less than 1% tradeoff.
>
> But only for a time. After transferring for a couple of hours the
> throughput went bad again:
>
> # netstat -aC
> Proto Recv-Q Send-Q Local Address  Foreign Address   (state)     CC    cwin  ssthresh MSS  ECN
> tcp6       0  57351 edge-jo.26996  pole-n.22         ESTABLISHED vegas 22203    10392 1311 off
> tcp4       0 106305 edge-e.62275   pole-n.bacula-sd  ESTABLISHED vegas 11943     5276 1331 off
>
> The first connection is freshly created. The second one has been running
> for a day already, and it is obviously hosed - it doesn't recover.
>
> # sysctl net.inet.tcp.cc.vegas
> net.inet.tcp.cc.vegas.beta: 14
> net.inet.tcp.cc.vegas.alpha: 8
>
> 8 (alpha) x 1331 (mss) = 10648
>
> The cwin is adjusted to precisely one tick above the alpha, and
> doesn't rise further. (Increasing the alpha further does solve the
> issue for this connection - but that is not how things are supposed to
> work.)
>
> Now I tried to look into the data that vegas would use for its
> decisions, and found this:
>
> # dtrace -n 'fbt:kernel:vegas_ack_received:entry { printf("%s %u %d %d %d %d", execname,\
> (*((struct tcpcb **)(arg0+24)))->snd_cwnd,\
> ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->minrtt,\
> ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->marked_snd_cwnd,\
> ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->bytes_tx_in_marked_rtt,\
> ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->markedpkt_rtt);\
> }'
> CPU ID FUNCTION:NAME
> 6 17478 vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
> 17 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
> 17 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
> 3 17478 vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
> 5 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
> 17 17478 vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
> 11 17478 vegas_ack_received:entry ng_queue 11943 1 11943 10552 106
> 15 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
> 13 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
> 16 17478 vegas_ack_received:entry ng_queue 11943 1 11943 10552 106
> 3 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
>
> One can see that the "minrtt" value for the freshly created connection
> is 56 (which is very plausible).
> But the old and hosed connection shows minrtt = 1, which explains the
> observed cwin.
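
For illustration, the trace values quoted above can be plugged into the
textbook Vegas backlog estimate. This is only a sketch: the function names
vegas_diff and vegas_action are made up here, and it is an assumption that
cc_vegas computes the estimate with exactly this integer arithmetic. The
alpha/beta values are the sysctl settings quoted earlier.

```python
def vegas_diff(marked_cwnd, bytes_tx, minrtt, marked_rtt, mss):
    """Textbook Vegas backlog estimate, in segments (integer arithmetic).

    Assumption: this mirrors what the kernel does; it is not copied from it.
    """
    expected = marked_cwnd // minrtt   # bytes per tick if nothing queued
    actual = bytes_tx // marked_rtt    # bytes per tick actually achieved
    return (expected - actual) * minrtt // mss


def vegas_action(diff, alpha=8, beta=14):
    """Classic Vegas rule: grow below alpha, shrink above beta, else hold."""
    if diff < alpha:
        return "grow"
    if diff > beta:
        return "shrink"
    return "hold"


# Values taken from the dtrace output and netstat listing quoted above.
fresh = vegas_diff(22203, 20784, minrtt=56, marked_rtt=261, mss=1311)
hosed = vegas_diff(11943, 10552, minrtt=1, marked_rtt=131, mss=1331)

# Hypothetical: the hosed connection with a plausible minrtt of 56 ticks.
fixed = vegas_diff(11943, 10552, minrtt=56, marked_rtt=131, mss=1331)

for name, diff in (("fresh", fresh), ("hosed", hosed), ("fixed", fixed)):
    print(name, diff, vegas_action(diff))
```

Under these assumptions the hosed connection lands exactly on alpha and
holds its window, while the same connection with a minrtt of 56 (a value
borrowed from the fresh connection, not measured) would still be allowed
to grow.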
>
> The minrtt gets calculated in sys/netinet/khelp/h_ertt.c:
> e_t->rtt = tcp_ts_getticks() - txsi->tx_ts + 1;
> There is a "+1", so this was apparently zero.
>
> But source and destination are at least 1000 km apart. So either we
> have had one of the rare occasions of hyperspace tunnelling, or
> something is going wrong in the ertt measurement code.
>
> For now this is a one-time observation, but it might also explain why
> the other cc algorithms behaved badly. These algorithms are widely in
> use and should work - the ertt measurement however is the same for all of
> them.

I can confirm that I am seeing similar problems when transferring files to our
various production sites around Australia, over links of various types, sizes
and bandwidths. I can saturate the nearby links, but link utilisation
decreases with distance.

I've tried various transfer protocols (ftp, scp, rcp, http); the results are
similar for all. The ping time is 2.3 ms on the closest WAN link and 60 ms on
the furthest. On the furthest link we get around 15% utilisation, whereas a
transfer between two Windows hosts on that link yields ~80% utilisation.

The FreeBSD versions involved are 12.1 and 12.2.
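
Utilisation that falls as the round-trip time rises is what one would expect
if throughput were limited by the window rather than by the link, since a
sender can have at most one window in flight per round trip. A rough sketch
of that ceiling for the two ping times above follows. The 65535-byte window
is purely a hypothetical figure for illustration, not a value measured on
any of these hosts, and window_limited_mbit is a made-up helper name.

```python
def window_limited_mbit(window_bytes, rtt_ms):
    """Upper bound on throughput (Mbit/s) with one window in flight per RTT."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e6


# Ping times from the text; the window size is an assumed example value.
ASSUMED_WINDOW = 65535

for rtt_ms in (2.3, 60.0):
    ceiling = window_limited_mbit(ASSUMED_WINDOW, rtt_ms)
    print(f"rtt {rtt_ms:5.1f} ms -> at most {ceiling:7.1f} Mbit/s")
```

With the same window, the ceiling on the 60 ms path is about 26 times lower
than on the 2.3 ms path, so anything that keeps the congestion window small
hurts the distant sites far more than the nearby ones.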
--
Richard Perini
Ramico Australia Pty Ltd Sydney, Australia rpp@ci.com.au +61 2 9552 5500
-----------------------------------------------------------------------------
"The difference between theory and practice is that in theory there is no
difference, but in practice there is"