tcp failing to recover from a packet loss under 8.2-RELEASE?

Thu Aug 11 01:13:35 UTC 2011

On 08/05/11 00:19, Steven Hartland wrote:
> ----- Original Message ----- From: "Lawrence Stewart"
> <lstewart at freebsd.org>
[snip]
>>> So I suppose the question is should maxsegments be larger by
>>> default due to the recent changes e.g.
>>> - V_tcp_reass_maxseg = nmbclusters / 16;
>>> + V_tcp_reass_maxseg = nmbclusters / 8;
>>>
>>> or is the correct fix something more involved?
>>
>> I'm not sure if bumping the value is appropriate - we have always
>> expected users to tune their network stack to perform well when used
>> in "unusual" scenarios - a large BDP fibre path still being in the
>> "unusual" category.
>
> TBH I wouldn't classify a latency of 7ms @ 100Mbps as unusual in the
> slightest in this day and age.

Are the TCP sessions experiencing the problem terminating on either side
of that link, i.e. is the RTT of the connection 7ms? Or does the fibre
link form only one part of the path the connections are traversing?

Based on your symptoms, I believe the latter is the case (the BDP of a
7ms 100Mbps fibre link is a lot smaller than your pre-tweaked reass max
queue limit and therefore shouldn't have caused stalls), in which case
it's not the characteristics of your fibre link that matter in their own
right, but the characteristics of the complete path from sender to receiver.
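
To put rough numbers on the BDP argument above, here is a minimal sketch of the arithmetic in C. The function names are made up for illustration, and the MSS of 1448 bytes and the nmbclusters value of 25600 used in the note below are assumptions, not figures from this thread; only the 7ms, 100Mbps and "nmbclusters / 16" inputs come from the discussion.

```c
#include <stdint.h>

/* Bandwidth-delay product in bytes: (bits/s * RTT in seconds) / 8. */
static uint64_t
bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_usec)
{
	return (bits_per_sec * rtt_usec / 1000000 / 8);
}

/* Number of MSS-sized segments needed to cover the BDP, rounded up. */
static uint64_t
bdp_segments(uint64_t bdp, uint64_t mss)
{
	return ((bdp + mss - 1) / mss);
}

/*
 * Reassembly queue limit derived from nmbclusters, mirroring the
 * "nmbclusters / 16" expression quoted earlier in the thread.
 * The nmbclusters value passed in is whatever the system is tuned to.
 */
static uint64_t
reass_maxseg(uint64_t nmbclusters, uint64_t divisor)
{
	return (nmbclusters / divisor);
}
```

With these helpers, bdp_bytes(100000000, 7000) gives 87500 bytes, which is 61 segments at an assumed MSS of 1448. An assumed nmbclusters of 25600 would give a limit of 1600 segments, well above that, which is why a path with a larger BDP than the fibre link alone looks like the more plausible cause.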

>> The real fix which is somewhere down on my todo list is to make all
>> these memory constraints elastic and respond to VM pressure, thus
>> negating the need for a hard limit at all. This would solve many if
>> not most of the TCP tuning problems we currently have in one fell
>> swoop and would greatly reduce the need for tuning in many situations
>> that currently are in the "needs manual tuning" basket.
>
> This would indeed be a great improvement.
>
>> Andre and Steven, I'm a bit too sleepy to properly review your
>> combined proposed changes right now and will follow up in the next few
>> days instead.
>
> No problem, we've increased nmbclusters on all our machines and they're
> now performing fine in the problem scenario, so no rush. Look forward to
> your feedback when you've had some sleep :)

Steven, as far as my reading of the code informs me, your additional
sanity checking is unnecessary - the segment is only added to the
reassembly list at the calls to LIST_INSERT_*, and the
uma_zfree() in the "if (p != NULL)" block shouldn't ever be called if
the incoming segment is equal to rcv_nxt.

However, I would like to see some additional sanity checking added to 
Andre's base patch in the form of some KASSERTs. There are a number of 
hidden assumptions in the current code and I think explicitly noting 
them with KASSERTs would be useful. I'm also paranoid about leaking a
stack-allocated tseg_qent across calls to tcp_reass(), as that would be a
horrendous bug to diagnose.
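
To illustrate the kind of invariant I mean, here is a simplified userland sketch, not the actual patch. Everything in it is hypothetical: struct seg_ent and struct reass_queue stand in for tseg_qent and the per-connection list, malloc()/free() stand in for the UMA zone calls, assert() stands in for KASSERT, and the alloc_fails flag only exists to simulate hitting the zone limit. It assumes the fallback rule is "use a stack entry only when allocation fails and the segment starts at rcv_nxt".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the kernel's struct tseg_qent. */
struct seg_ent {
	struct seg_ent *next;
	unsigned int seq;
	int len;
};

/* Hypothetical stand-in for the per-connection reassembly list. */
struct reass_queue {
	struct seg_ent *head;
	int qlen;
};

/* Returns nonzero if 'e' is linked into the queue. */
static int
queue_contains(const struct reass_queue *q, const struct seg_ent *e)
{
	const struct seg_ent *p;

	for (p = q->head; p != NULL; p = p->next)
		if (p == e)
			return (1);
	return (0);
}

/*
 * Returns the number of bytes delivered immediately, 0 if the segment
 * was queued, or -1 if it was dropped.
 */
static int
reass_sketch(struct reass_queue *q, unsigned int rcv_nxt, unsigned int seq,
    int len, int alloc_fails)
{
	struct seg_ent stack_ent;	/* backup entry, must never be queued */
	struct seg_ent *te;
	int delivered = 0;

	te = alloc_fails ? NULL : malloc(sizeof(*te));
	if (te == NULL) {
		/* Out of entries: only the missing segment may proceed. */
		if (seq != rcv_nxt)
			return (-1);
		te = &stack_ent;
	}
	te->seq = seq;
	te->len = len;
	te->next = NULL;

	if (seq == rcv_nxt) {
		/* In-order data is delivered directly and never linked. */
		delivered = len;
		if (te != &stack_ent)
			free(te);
	} else {
		/* Hidden assumption made explicit: only heap entries queue. */
		assert(te != &stack_ent);
		te->next = q->head;
		q->head = te;
		q->qlen++;
	}

	/* The stack entry must not outlive this call via the queue. */
	assert(!queue_contains(q, &stack_ent));
	return (delivered);
}
```

The two assert() calls are the point of the sketch: one documents the assumption at the insertion site, the other checks on the way out that nothing on the list points back into this stack frame.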

Here's my tweaked version of Andre's patch:
http://people.freebsd.org/~lstewart/patches/misctcp/tcp_reass.c-logdebug%2bmissingsegment-20110811-lstewart.diff

It has only been compile tested at this point.

BTW, when a patch is eventually committed, the logging changes should be
committed separately from the KASSERT/backup stack-allocated tseg_qent change.

Cheers,
Lawrence