low TCP speed, wrong rtt measurement
Date: Tue, 04 Apr 2023 14:46:34 UTC
** maybe this should rather go to the -net list, but there
** are only bug messages there
Hi,
I'm trying to transfer backup data over a WAN link. The link bandwidth is
only ~2 Mbit/s, but that is fine: the transfer can run for days and should
simply saturate the spare bandwidth.
The problem is, it doesn't saturate the bandwidth.
I found that the backup application opens the socket in this way:
if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {
Apparently that doesn't work well. So I patched the application to do
it this way:
- if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {
+ if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, IPPROTO_TCP)) < 0) {
The result, as observed with tcpdump, was noticeably different, but
worse rather than better.
I tried various cc algorithms; all of them behaved very badly except
cc_vegas. After tuning alpha and beta, Vegas gave satisfying results
with less than 1% tradeoff.
But only for a while. After a couple of hours of transfer, the
throughput went bad again:
# netstat -aC
Proto Recv-Q Send-Q Local Address Foreign Address (state) CC cwin ssthresh MSS ECN
tcp6 0 57351 edge-jo.26996 pole-n.22 ESTABLISHED vegas 22203 10392 1311 off
tcp4 0 106305 edge-e.62275 pole-n.bacula-sd ESTABLISHED vegas 11943 5276 1331 off
The first connection is freshly created. The second one has been
running for a day already, and it is obviously hosed: it doesn't recover.
# sysctl net.inet.tcp.cc.vegas
net.inet.tcp.cc.vegas.beta: 14
net.inet.tcp.cc.vegas.alpha: 8
8 (alpha) x 1331 (mss) = 10648
The cwin has been raised exactly one step (one MSS) past alpha x MSS
and doesn't rise further: 11943 - 1331 = 10612 is still below 10648.
(Increasing alpha further does solve the issue for this connection,
but that is not how things are supposed to work.)
Next I looked into the data that Vegas uses for its decisions, and
found this:
# dtrace -n '
fbt:kernel:vegas_ack_received:entry
{
	/* arg0 is the struct cc_var *; on this kernel its struct tcpcb *
	   is assumed to sit at offset 24 */
	this->tp = *((struct tcpcb **)(arg0 + 24));
	/* the ertt data is expected in OSD slot 0 of the tcpcb */
	this->e = (struct ertt *)this->tp->osd->osd_slots[0];
	printf("%s %u %d %d %d %d", execname, this->tp->snd_cwnd,
	    this->e->minrtt, this->e->marked_snd_cwnd,
	    this->e->bytes_tx_in_marked_rtt, this->e->markedpkt_rtt);
}'
CPU ID FUNCTION:NAME
6 17478 vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
17 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
17 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
3 17478 vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
5 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
17 17478 vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
11 17478 vegas_ack_received:entry ng_queue 11943 1 11943 10552 106
15 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
13 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
16 17478 vegas_ack_received:entry ng_queue 11943 1 11943 10552 106
3 17478 vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
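The printed fields are execname, snd_cwnd, minrtt, marked_snd_cwnd,
bytes_tx_in_marked_rtt and markedpkt_rtt. The "minrtt" value for the
freshly created connection (cwnd 22203) is 56, which is very plausible.
But the old, hosed connection (cwnd 11943) shows minrtt = 1, which
explains the observed cwin.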
The minrtt gets calculated in sys/netinet/khelp/h_ertt.c:
e_t->rtt = tcp_ts_getticks() - txsi->tx_ts + 1;
There is a "+1", so the raw difference was apparently zero.
But source and destination are at least 1000 km apart. So either we
have had one of the rare occasions of hyperspace tunnelling, or
something is going wrong in the ertt measurement code.
For now this is a one-time observation, but it might also explain why
the other cc algorithms behaved badly. These algorithms are widely used
and should work; the ertt measurement, however, is the same for all of
them.