Re: Cwnd grows slowly during slow-start due to LRO of the receiver side.

From: Randall Stewart <rrs_at_netflix.com>
Date: Thu, 04 May 2023 18:47:00 UTC

Rodney/Chen,

This is a real issue on the Internet, and it's not just LRO/TSO that causes
it. Cable modem technology will batch up acks and keep only the most recent
one, thus aggregating some number of acks (I have seen up to 10 acks eaten
this way, each of those covering 2 segments).

Other middleboxes do similar things as well. Then there is the channel access
technology that at least gives you all the acks; the only issue is that it
stores them up and releases them all at once, so forget about getting nice
ack-clocking out of the stack.

The only way to deal with it is generally to raise abc_l_var to a much larger
value. That way, as you get an aggregated ack, your cwnd will open. The
downside is that this lets you be more bursty. Pacing can help here, but only
the bbr and rack stacks pace in FreeBSD.

R

> On May 2, 2023, at 9:55 AM, Rodney W. Grimes <freebsd-rwg@gndrsh.dnsmgr.net> wrote:
> 
> Second attempt, first one failed due to not being a member
> of the list :-(.
> 
>> Adding freebsd-transport@freebsd.org to get that specific groups
>> eyes on this issue.
>> 
>> Rod
>> 
>>> As per newreno_ack_received() in sys/netinet/cc/cc_newreno.c, the
>>> FreeBSD TCP sender strictly follows RFC 5681 with the RFC 3465
>>> extension.  That is, during slow-start, when receiving an ACK of
>>> 'bytes_acked' bytes:
>>> 
>>>    cwnd += min(bytes_acked, abc_l_var * SMSS);  // abc_l_var = 2 dflt
>>> 
>>> As discussed in Section 3.2 of RFC 3465, L=2*SMSS bytes exactly balances
>>> the negative impact of the delayed ACK algorithm.  RFC 5681 also
>>> requires that a receiver SHOULD generate an ACK for at least every
>>> second full-sized segment, so bytes_acked per ACK is at most 2 * SMSS.
>>> If both sender and receiver follow it, cwnd should grow exponentially
>>> during slow-start:
>>> 
>>>    cwnd *= 2    (per RTT)
>>> 
>>> However, LRO and TSO are widely used today, so a receiver may generate
>>> far fewer ACKs than it used to.  As I observed, both FreeBSD and
>>> Linux generate at most one ACK per segment assembled by LRO/GRO.
>>> The worst case is one ACK per 45 MSS, as 45 * 1448 = 65160 < 65535.
>>> 
>>> Sending 1MB over a link of 100ms delay from FreeBSD 13.2:
>>> 
>>> 0.000 IP sender > sink: Flags [S], seq 205083268, win 65535, options
>>> [mss 1460,nop,wscale 10,sackOK,TS val 495212525 ecr 0], length 0
>>> 0.100 IP sink > sender: Flags [S.], seq 708257395, ack 205083269, win
>>> 65160, options [mss 1460,sackOK,TS val 563185696 ecr
>>> 495212525,nop,wscale 7], length 0
>>> 0.100 IP sender > sink: Flags [.], ack 1, win 65, options [nop,nop,TS
>>> val 495212626 ecr 563185696], length 0
>>> // TSopt omitted below for brevity.
>>> 
>>> // cwnd = 10 * MSS, sent 10 * MSS
>>> 0.101 IP sender > sink: Flags [.], seq 1:14481, ack 1, win 65, length 14480
>>> 
>>> // got one ACK for 10 * MSS, cwnd += 2 * MSS, sent 12 * MSS
>>> 0.201 IP sink > sender: Flags [.], ack 14481, win 427, length 0
>>> 0.201 IP sender > sink: Flags [.], seq 14481:31857, ack 1, win 65, length 17376
>>> 
>>> // got ACK of 12*MSS above, cwnd += 2 * MSS, sent 14 * MSS
>>> 0.301 IP sink > sender: Flags [.], ack 31857, win 411, length 0
>>> 0.301 IP sender > sink: Flags [.], seq 31857:52129, ack 1, win 65, length 20272
>>> 
>>> // got ACK of 14*MSS above, cwnd += 2 * MSS, sent 16 * MSS
>>> 0.402 IP sink > sender: Flags [.], ack 52129, win 395, length 0
>>> 0.402 IP sender > sink: Flags [P.], seq 52129:73629, ack 1, win 65,
>>> length 21500
>>> 0.402 IP sender > sink: Flags [.], seq 73629:75077, ack 1, win 65, length 1448
>>> 
>>> As a consequence, instead of growing exponentially, cwnd grows
>>> more-or-less quadratically during slow-start, unless abc_l_var is
>>> set to a sufficiently large value.
>>> 
>>> NewReno took more than 20 seconds to ramp up throughput to 100 Mbps
>>> over an emulated 100 ms delay link, while Linux took ~2 seconds.
>>> I can provide the pcap file if anyone is interested.
>>> 
>>> Switching to CUBIC won't help, because it uses the logic in NewReno
>>> ack_received() for slow start.
>>> 
>>> Is this a well-known issue, and is abc_l_var the only cure for it?
>>> https://calomel.org/freebsd_network_tuning.html
>>> 
>>> Thank you!
>>> 
>>> Best,
>>> Shuo Chen
>>> 
>>> 
>> 
>> -- 
>> Rod Grimes                                                 rgrimes@freebsd.org
>> 
>> 
> 
> -- 
> Rod Grimes                                                 rgrimes@freebsd.org
> 

------
Randall Stewart
rrs@netflix.com