LRO causing stretch ACK violations interacts badly with delayed ACKing
Andre Oppermann
andre at freebsd.org
Mon Oct 21 20:11:50 UTC 2013
On 21.10.2013 21:57, Andre Oppermann wrote:
> On 21.10.2013 18:39, Colin Percival wrote:
>> On 10/21/13 08:15, Andre Oppermann wrote:
>>> On 21.10.2013 03:42, Colin Percival wrote:
>>>> I can't find any changes in netfront.c or tcp_lro.c to explain why 9.1 and
>>>> 9.2 are behaving differently -- anyone have any ideas?
>>>
>>> The last time I looked our soft-LRO had a few remaining issues. One of
>>> them was that in certain situations reordering may happen with segments
>>> that can't be aggregated into a LRO state. The other was that the driver
>>> is responsible to manage the flushing of LRO states that haven't seen
>>> updates in some time.
>>
>> It looks like the netfront driver flushes LRO every time it finishes reading
>> packets -- if anything, it's too aggressive:
>> /*
>> * Flush any outstanding LRO work
>> */
>> while (!SLIST_EMPTY(&lro->lro_active)) {
>> queued = SLIST_FIRST(&lro->lro_active);
>> SLIST_REMOVE_HEAD(&lro->lro_active, next);
>> tcp_lro_flush(lro, queued);
>> }
>>
>> So unless that code is broken somehow (it looks reasonable to me) I don't think
>> it's a problem of data getting stuck in soft-LRO.
>>
>> Looking at the TCP stack on the other hand confuses me -- I see code which seems
>> to be saying that we can delay-ACK any time that we're receiving data and don't
>> have a delayed ACK already pending, without any regard for the fact that we
>> might be receiving 2+ MSS at once... am I missing something here?
>
> This is an excellent observation! Our tcp doesn't know about LRO
> and I prepared the mbuf header to carry information about the number
> of merged LRO segments. That's not done yet again. However a small
> heuristic in tcp_input looking for segment > mss should be sufficient
> for now. Let me have a look at patching it into a suitable place.
Please check out the patch below. Haven't tested it myself yet though.
--
Andre
Index: netinet/tcp_input.c
===================================================================
--- netinet/tcp_input.c (revision 256780)
+++ netinet/tcp_input.c (working copy)
@@ -508,10 +508,13 @@
* the ack that opens up a 0-sized window and
* - delayed acks are enabled or
* - this is a half-synchronized T/TCP connection.
+ * - the segment size is not larger than the MSS and LRO wasn't used
+ * for this segment.
*/
-#define DELAY_ACK(tp) \
+#define DELAY_ACK(tp, tlen) \
((!tcp_timer_active(tp, TT_DELACK) && \
(tp->t_flags & TF_RXWIN0SENT) == 0) && \
+ (tlen <= tp->t_maxopd) && \
(V_tcp_delack_enabled || (tp->t_flags & TF_NEEDSYN)))
/*
@@ -1863,7 +1866,7 @@
}
/* NB: sorwakeup_locked() does an implicit unlock. */
sorwakeup_locked(so);
- if (DELAY_ACK(tp)) {
+ if (DELAY_ACK(tp, tlen)) {
tp->t_flags |= TF_DELACK;
} else {
tp->t_flags |= TF_ACKNOW;
@@ -1954,7 +1957,7 @@
* If there's data, delay ACK; if there's also a FIN
* ACKNOW will be turned on later.
*/
- if (DELAY_ACK(tp) && tlen != 0)
+ if (DELAY_ACK(tp, tlen) && tlen != 0)
tcp_timer_activate(tp, TT_DELACK,
tcp_delacktime);
else
@@ -2926,7 +2929,7 @@
if (th->th_seq == tp->rcv_nxt &&
LIST_EMPTY(&tp->t_segq) &&
TCPS_HAVEESTABLISHED(tp->t_state)) {
- if (DELAY_ACK(tp))
+ if (DELAY_ACK(tp, tlen))
tp->t_flags |= TF_DELACK;
else
tp->t_flags |= TF_ACKNOW;
More information about the freebsd-net
mailing list