LRO causing stretch ACK violations interacts badly with delayed ACKing

Mon Oct 21 19:57:40 UTC 2013

On 21.10.2013 18:39, Colin Percival wrote:
> On 10/21/13 08:15, Andre Oppermann wrote:
>> On 21.10.2013 03:42, Colin Percival wrote:
>>> I can't find any changes in netfront.c or tcp_lro.c to explain why 9.1 and
>>> 9.2 are behaving differently -- anyone have any ideas?
>>
>> The last time I looked our soft-LRO had a few remaining issues.  One of
>> them was that in certain situations reordering may happen with segments
>> that can't be aggregated into a LRO state.  The other was that the driver
>> is responsible to manage the flushing of LRO states that haven't seen
>> updates in some time.
>
> It looks like the netfront driver flushes LRO every time it finishes reading
> packets -- if anything, it's too aggressive:
> 		/*
> 		 * Flush any outstanding LRO work
> 		 */
> 		while (!SLIST_EMPTY(&lro->lro_active)) {
> 			queued = SLIST_FIRST(&lro->lro_active);
> 			SLIST_REMOVE_HEAD(&lro->lro_active, next);
> 			tcp_lro_flush(lro, queued);
> 		}
>
> So unless that code is broken somehow (it looks reasonable to me) I don't think
> it's a problem of data getting stuck in soft-LRO.
>
> Looking at the TCP stack on the other hand confuses me -- I see code which seems
> to be saying that we can delay-ACK any time that we're receiving data and don't
> have a delayed ACK already pending, without any regard for the fact that we
> might be receiving 2+ MSS at once... am I missing something here?

This is an excellent observation!  Our TCP stack doesn't know about LRO.
I prepared the mbuf header to carry information about the number of
merged LRO segments, but that work is not finished yet.  However, a
small heuristic in tcp_input that looks for segment length > MSS should
be sufficient for now.  Let me have a look at patching it into a
suitable place.
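
The heuristic could look roughly like the standalone sketch below.  It is
not the actual tcp_input code: struct conn_state, enum ack_action and
ack_decision() are made-up names for illustration, and the real stack
keeps this state in the tcpcb and has more conditions.  The only idea
shown is that a segment longer than the MSS must have been merged from
several wire segments, so it should be ACKed immediately rather than
starting a delayed ACK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-connection state (not the real tcpcb). */
struct conn_state {
	uint32_t mss;            /* maximum segment size of the connection */
	bool     delack_pending; /* a delayed ACK is already scheduled */
};

enum ack_action { ACK_NOW, ACK_DELAY };

/*
 * Decide how to ACK an in-order data segment of seg_len bytes.
 * Assumption: seg_len > mss can only happen when something below TCP
 * (e.g. LRO) merged several wire segments into one.
 */
static enum ack_action
ack_decision(struct conn_state *cs, uint32_t seg_len)
{
	assert(cs->mss > 0);

	/*
	 * ACK at once if this is the second segment since the last ACK
	 * (a delayed ACK is pending), or if the segment is larger than
	 * one MSS and therefore already covers more than one segment.
	 */
	if (cs->delack_pending || seg_len > cs->mss) {
		cs->delack_pending = false;
		return (ACK_NOW);
	}

	/* First sub-MSS or exactly-MSS segment: delay the ACK. */
	cs->delack_pending = true;
	return (ACK_DELAY);
}
```

With an MSS of 1460, a 1000 byte segment would be delayed, while a 2920
byte LRO-merged segment would be ACKed immediately even when no delayed
ACK is pending.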

-- 
Andre