tcp failing to recover from a packet loss under 8.2-RELEASE?
Lawrence Stewart
lstewart at freebsd.org
Thu Aug 11 01:13:35 UTC 2011
On 08/05/11 00:19, Steven Hartland wrote:
> ----- Original Message ----- From: "Lawrence Stewart"
> <lstewart at freebsd.org>
[snip]
>>> So I suppose the question is should maxsegments be larger by
>>> default due to the recent changes e.g.
>>> - V_tcp_reass_maxseg = nmbclusters / 16;
>>> + V_tcp_reass_maxseg = nmbclusters / 8;
>>>
>>> or is the correct fix something more involved?
>>
>> I'm not sure if bumping the value is appropriate - we have always
>> expected users to tune their network stack to perform well when used
>> in "unusual" scenarios - a large BDP fibre path still being in the
>> "unusual" category.
>
> TBH I wouldn't classify a latency of 7ms @ 100Mbps as unusual in the
> slightest in this day and age.
Are the TCP sessions experiencing the problem terminating on either side
of that link, i.e. is the RTT of the connection 7ms? Or does the fibre
link form only one part of the path the connections are traversing?
Based on your symptoms, I believe the latter is the case. The BDP of a
7ms 100Mbps fibre link is a lot smaller than the reassembly queue limit
you had before your tweak, so that link alone shouldn't have caused
stalls. If so, what matters is not the characteristics of your fibre link
in their own right, but the characteristics of the complete path from
sender to receiver.
>> The real fix which is somewhere down on my todo list is to make all
>> these memory constraints elastic and respond to VM pressure, thus
>> negating the need for a hard limit at all. This would solve many if
>> not most of the TCP tuning problems we currently have in one fell
>> swoop and would greatly reduce the need for tuning in many situations
>> that currently are in the "needs manual tuning" basket.
>
> This would indeed be a great improvement.
>
>> Andre and Steven, I'm a bit too sleepy to properly review your
>> combined proposed changes right now and will follow up in the next few
>> days instead.
>
> No problem, we've increased nmbclusters on all our machines and they're
> now performing fine in the problem scenario, so no rush. I look forward
> to your feedback when you've had some sleep :)
Steven, as far as I can tell from reading the code, your additional
sanity checking is unnecessary - the segment only gets added to the
reassembly list where the calls to LIST_INSERT_* are, and the
uma_zfree() in the "if (p != NULL)" block shouldn't ever be called if
the incoming segment's sequence number is equal to rcv_nxt.
However, I would like to see some additional sanity checking added to
Andre's base patch in the form of some KASSERTs. There are a number of
hidden assumptions in the current code and I think explicitly noting
them with KASSERTs would be useful. I'm also paranoid about leaking a
stack allocated tseg_qent across calls to tcp_reass() as that would be a
horrendous bug to diagnose.
Here's my tweaked version of Andre's patch:
http://people.freebsd.org/~lstewart/patches/misctcp/tcp_reass.c-logdebug%2bmissingsegment-20110811-lstewart.diff
It has only been compile tested at this point.
BTW, when a patch is eventually committed, the logging changes should be
committed separately from the KASSERT/backup stack allocated tseg_qent
change.
Cheers,
Lawrence