LRO causing stretch ACK violations interacts badly with delayed ACKing

Andre Oppermann andre at freebsd.org
Mon Oct 21 20:11:50 UTC 2013


On 21.10.2013 21:57, Andre Oppermann wrote:
> On 21.10.2013 18:39, Colin Percival wrote:
>> On 10/21/13 08:15, Andre Oppermann wrote:
>>> On 21.10.2013 03:42, Colin Percival wrote:
>>>> I can't find any changes in netfront.c or tcp_lro.c to explain why 9.1 and
>>>> 9.2 are behaving differently -- anyone have any ideas?
>>>
>>> The last time I looked our soft-LRO had a few remaining issues.  One of
>>> them was that in certain situations reordering may happen with segments
>>> that can't be aggregated into a LRO state.  The other was that the driver
>>> is responsible to manage the flushing of LRO states that haven't seen
>>> updates in some time.
>>
>> It looks like the netfront driver flushes LRO every time it finishes reading
>> packets -- if anything, it's too aggressive:
>>         /*
>>          * Flush any outstanding LRO work
>>          */
>>         while (!SLIST_EMPTY(&lro->lro_active)) {
>>             queued = SLIST_FIRST(&lro->lro_active);
>>             SLIST_REMOVE_HEAD(&lro->lro_active, next);
>>             tcp_lro_flush(lro, queued);
>>         }
>>
>> So unless that code is broken somehow (it looks reasonable to me) I don't think
>> it's a problem of data getting stuck in soft-LRO.
>>
>> Looking at the TCP stack on the other hand confuses me -- I see code which seems
>> to be saying that we can delay-ACK any time that we're receiving data and don't
>> have a delayed ACK already pending, without any regard for the fact that we
>> might be receiving 2+ MSS at once... am I missing something here?
>
> This is an excellent observation!  Our tcp doesn't know about LRO
> and I prepared the mbuf header to carry information about the number
> of merged LRO segments.  That's not done yet again.  However a small
> heuristic in tcp_input looking for segment > mss should be sufficient
> for now.  Let me have a look at patching it into a suitable place.

Please check out the patch below.  Haven't tested it myself yet though.

-- 
Andre

Index: netinet/tcp_input.c
===================================================================
--- netinet/tcp_input.c (revision 256780)
+++ netinet/tcp_input.c (working copy)
@@ -508,10 +508,13 @@
   *       the ack that opens up a 0-sized window and
   *             - delayed acks are enabled or
   *             - this is a half-synchronized T/TCP connection.
+ *     - the segment size is not larger than the MSS and LRO wasn't used
+ *       for this segment.
   */
-#define DELAY_ACK(tp)                                                  \
+#define DELAY_ACK(tp, tlen)                                            \
         ((!tcp_timer_active(tp, TT_DELACK) &&                           \
             (tp->t_flags & TF_RXWIN0SENT) == 0) &&                      \
+           (tlen <= tp->t_maxopd) &&                                   \
             (V_tcp_delack_enabled || (tp->t_flags & TF_NEEDSYN)))

  /*
@@ -1863,7 +1866,7 @@
                         }
                         /* NB: sorwakeup_locked() does an implicit unlock. */
                         sorwakeup_locked(so);
-                       if (DELAY_ACK(tp)) {
+                       if (DELAY_ACK(tp, tlen)) {
                                 tp->t_flags |= TF_DELACK;
                         } else {
                                 tp->t_flags |= TF_ACKNOW;
@@ -1954,7 +1957,7 @@
                          * If there's data, delay ACK; if there's also a FIN
                          * ACKNOW will be turned on later.
                          */
-                       if (DELAY_ACK(tp) && tlen != 0)
+                       if (DELAY_ACK(tp, tlen) && tlen != 0)
                                 tcp_timer_activate(tp, TT_DELACK,
                                     tcp_delacktime);
                         else
@@ -2926,7 +2929,7 @@
                 if (th->th_seq == tp->rcv_nxt &&
                     LIST_EMPTY(&tp->t_segq) &&
                     TCPS_HAVEESTABLISHED(tp->t_state)) {
-                       if (DELAY_ACK(tp))
+                       if (DELAY_ACK(tp, tlen))
                                 tp->t_flags |= TF_DELACK;
                         else
                                 tp->t_flags |= TF_ACKNOW;


More information about the freebsd-net mailing list