mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

Ben RUBSON ben.rubson at gmail.com
Wed Aug 16 09:02:13 UTC 2017


> On 15 Aug 2017, at 23:33, Julien Charbon <jch at freebsd.org> wrote:
> 
> On 8/11/17 11:32 AM, Ben RUBSON wrote:
>>> On 08 Aug 2017, at 13:33, Julien Charbon <jch at freebsd.org> wrote:
>>> 
>>> On 8/8/17 10:31 AM, Hans Petter Selasky wrote:
>>>> 
>>>> Suggested fix attached.
>>> 
>>> I agree with your conclusion.  Just for the record, more precisely this
>>> regression seems to have been introduced with:
>>> (...)
>>> So good catch, and your patch looks good.  I am just going to verify
>>> the other in_pcbrele_wlocked() calls in the TCP stack.
>> 
>> Julien, do you plan to make this fix reach 11.0-p12 ?
> 
> I am checking if your issue is another flavor of the issue fixed by:
> 
> https://svnweb.freebsd.org/base?view=revision&revision=307551
> https://reviews.freebsd.org/D8211
> 
> This fix is not in 11.0 but is in 11.1.  So far I have not found how an
> inp in INP_TIMEWAIT state could have been INP_FREED without already
> having its tw set to NULL, except through the issue fixed by r307551.
> 
> Thus could you try to apply this patch:
> 
> https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch
> 
> and see if you can still reproduce this issue?

Thank you for your answer Julien.
Unfortunately, I'm not sure at all how to reproduce the issue.
I have other servers that are identical to this one, with the same workload
and a similar uptime of several months, but they have not triggered the bug yet.

If other network stack experts (I'm not one) agree with your analysis,
we could then certainly go further with D8211 / r307551.

One thing that might help:
# netstat -an | grep TIME_WAIT$ | wc -l
468

Note that while this bug is active, sendmail has great difficulty sending outgoing mail.
As soon as I run the netstat command above, I receive a batch of queued mails (more than 20 this time),
as if netstat somehow helped...

The number of TIME_WAIT connections, however, does not decrease; it increases.
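
To back up that observation with numbers, the count could be sampled at intervals. This is only a minimal sketch, assuming a POSIX shell; the function names count_time_wait and sample_time_wait are hypothetical helpers, not existing tools:

```sh
# Hypothetical helpers; the names are illustrative, not part of any tool.

# Count lines ending in TIME_WAIT from `netstat -an` style input on stdin.
count_time_wait() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status.
    grep -c 'TIME_WAIT$' || true
}

# Print "<epoch seconds> <count>" SAMPLES times, INTERVAL seconds apart.
sample_time_wait() {
    samples=$1
    interval=$2
    i=0
    while [ "$i" -lt "$samples" ]; do
        echo "$(date +%s) $(netstat -an | count_time_wait)"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_time_wait 10 60` would take ten samples one minute apart. Since running netstat seemed to unblock mail delivery here, the sampling itself may influence the symptom.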

> In the spirit of the r307551 fix, and based on Hans' patch, I will also
> propose adding a kernel log message describing the issue instead of
> entering an infinite loop when INVARIANTS is not set.

Which should then never be triggered :)
Good idea I think !

Thank you again !

Ben


