Question on TCP reassembly counter

Lawrence Stewart lstewart at freebsd.org
Tue Oct 26 02:44:18 UTC 2010


On 10/25/10 17:40, Sriram Gorti wrote:
> Hi,
> 
> On Sat, Oct 23, 2010 at 5:29 AM, Lawrence Stewart <lstewart at freebsd.org> wrote:
>> On 10/22/10 18:10, Sriram Gorti wrote:
>>> Hi,
>>>
>>> On Mon, Oct 18, 2010 at 3:08 AM, Lawrence Stewart <lstewart at freebsd.org> wrote:
>>>
>>> Thanks for the fix. Tried it on XLR/XLS and the earlier tests pass
>>> now. net.inet.tcp.reass.overflows was always zero after the tests (and
>>> in the samples I took while the tests were running).
>>
>> Great, thanks for testing.
>>
>>> One observation though: net.inet.tcp.reass.cursegments was non-zero
>>> (it was just 1) after 30 rounds, where each round is (as earlier)
>>> 15-concurrent instances of netperf for 20s. This was on the netserver
>>> side. And, it was zero before the netperf runs. On the other hand,
>>> Andre told me (in a separate mail) that this counter is not relevant
>>> anymore - so, should I just ignore it ?
>>
>> It's relevant, just not guaranteed to be 100% accurate at any given
>> point in time. The value is calculated based on synchronised access to
>> UMA zone stats and unsynchronised access to UMA per-cpu zone stats. The
>> latter is safe, but causes the overall result to potentially be
>> inaccurate due to use of stale data. The accuracy vs overhead tradeoff
>> was deemed worthwhile for informational counters like this one.
>>
>> That being said, I would not expect the value to remain persistently at
>> 1 after all TCP activity has finished on the machine. It won't affect
>> performance, but I'm curious to know if the calculation method has a
>> flaw. I'll try to reproduce locally, but can you please confirm if the
>> value stays at 1 even after many minutes of no TCP activity?
>>
> 
> This behavior does not repeat easily, but it finally did. Even after
> leaving the system alone (apart from background NFS messages) for a
> few minutes, the value persisted. After a little more investigation,
> I found that one of the spawned netservers had not terminated; when it
> was explicitly terminated, the sysctl of interest dropped back to
> zero. Does that mean the TCP reassembly portion is doing okay?

Yes, the fact that it drops to zero after killing the process indicates
everything is as I expected wrt the accounting. Thanks for confirming.

> But, it opens up the question of why the netserver has not terminated.
> I will dig further into it but if you have any quick suggestions, they
> are most welcome.

Not sure I can answer that, but I'm very interested to know why a
segment appears to become stuck in the reassembly queue. It seems
unlikely that an actual segment is stuck, as the connection would
eventually time out if it were waiting for a retransmit that never came.
It seems more probable that the process being wedged simply causes the
net.inet.tcp.reass.cursegments sysctl to continue reporting a stale
value, perhaps due to the per-cpu calculation continually using the same
stale data whilst the process is running.
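
If you want to cross-check the sysctl against the raw zone statistics,
vmstat -z prints each UMA zone's used/free counts. I'm quoting the zone
name from memory, so please check it against your vmstat -z output, but
I believe the reassembly segment zone shows up as "tcpreass":

sysctl net.inet.tcp.reass.cursegments
vmstat -z | head -n 1
vmstat -z | grep -i tcpreass

Comparing the zone's USED count with the sysctl should help tell whether
a segment is genuinely still allocated or the sysctl is just reporting
stale per-cpu data.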

Perhaps you could try to reproduce whilst using siftr(4) to capture data
at the same time. That will let you see for sure what the state of the
reass queue is, amongst other things, and rule out a segment really being
stuck in the queue. You could then also use procstat -kk on the wedged
process and/or truss/ktrace to see what it's doing.
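
For example, with the pid of the wedged netserver substituted for <pid>:

procstat -kk <pid>
truss -p <pid>

procstat -kk dumps the kernel stack of each thread in the process, which
should make it fairly obvious whether it's blocked in the kernel, and
truss -p attaches to the running process and traces the syscalls it makes.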

Quick-start guide for siftr (man page has full details):

kldload siftr
sysctl net.inet.siftr.logfile=/somewhere/with/decent/space/siftr.log
sysctl net.inet.siftr.enabled=1
<try reproduce>
sysctl net.inet.siftr.enabled=0

If it takes a few attempts to reproduce, delete the siftr log file in
between each reproduction attempt so we end up with a log that only
covers an actual example.

I think it's the last column that logs the size of the reass queue.
Filter the log for the connection of interest (e.g. grep "<ip>,<port>"
siftr.log | tail) and check what the reass queue size was when the last
packet for that connection was sent or received.
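
For example, if the netserver end of the connection were 10.0.0.2 port
12865 (made-up values; substitute the real address/port and the log path
you configured above):

grep '10.0.0.2,12865' /somewhere/with/decent/space/siftr.log | tail -n 5
grep '10.0.0.2,12865' /somewhere/with/decent/space/siftr.log | tail -n 5 | \
    awk -F, '{ print $NF }'

The awk part just pulls out the last comma-separated field from each of
those lines, which I believe is the reass queue length.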

Let me know how you go.

Cheers,
Lawrence

