Kernel panics in tcp_twclose

Palle Girgensohn girgen at pingpong.net
Mon Sep 21 08:55:46 UTC 2015


> On 21 Sep 2015, at 10:28, Julien Charbon <jch at freebsd.org> wrote:
> 
> 
> Hi Palle,
> 
> On 18/09/15 22:42, Palle Girgensohn wrote:
>>> On 18 Sep 2015, at 18:06, Konstantin Belousov
>>> <kostikbel at gmail.com> wrote:
>>> 
>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote: 
>>>> Hi Palle,
>>>> 
>>>>> On 18/09/15 11:12, Palle Girgensohn wrote:
>>>>> We see daily panics on our production systems (web server,
>>>>> Apache running MPM event, openjdk8; kernel with VIMAGE; jails
>>>>> using netgraph interfaces [not epair]).
>>>>> 
>>>>> The problem started after the summer. Normal port upgrades
>>>>> seem to be the only difference. The problem occurs with the
>>>>> 10.2-p2 kernel as well as 10.1-p4 and 10.1-p15.
>>>>> 
>>>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175
>>>>> 
>>>>> Any ideas?
>>>> 
>>>> Thanks for your detailed report.  I am not aware of any
>>>> tcp_twclose() related issues (without VIMAGE) since FreeBSD 10.0
>>>> (which does not mean there are none).  A few interesting facts (at
>>>> least for me):
>>>> 
>>>> - Your crash happens when unlocking an inp exclusive lock with
>>>> INP_WUNLOCK()
>>>> 
>>>> - Something is already wrong before calling turnstile_broadcast()
>>>> as it is called with ts = NULL:
>>> In a kernel without witness this is a 99%-sure indication of an
>>> attempt to unlock a lock that is not owned.
>>> 
>>>> 
>>>> turnstile_broadcast (ts=0x0, queue=1) at /usr/src/sys/kern/subr_turnstile.c:838
>>>> __rw_wunlock_hard () at /usr/src/sys/kern/kern_rwlock.c:988
>>>> tcp_twclose () at /usr/src/sys/netinet/tcp_timewait.c:540
>>>> tcp_tw_2msl_scan () at /usr/src/sys/netinet/tcp_timewait.c:748
>>>> tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:198
>>>> 
>>>> I won't go too far here as I am not expert enough in VIMAGE, but
>>>> one question anyway:
>>>> 
>>>> - Can you correlate this kernel panic to a particular event?
>>>> Like for example a VIMAGE/VNET jail destruction.
>>>> 
>>>> I will test that on my side on a 10.2 machine.
>> 
>> I just got a response from adrian@ where he seems to remember that it
>> has all been fixed in head.
>> 
>> I would really prefer not to run a head kernel in production unless I
>> have to, so the question is if it is possible to pin down the
>> specific fixes for this problem? Any suggestions?
>> 
>> Thanks for all the help so far!
> 
> On my side, all issues we have found in the TCP stack are currently
> fixed in both 10.2 and HEAD.  The remaining differences are only performance
> improvements that are solely in HEAD.  adrian@ might have more details
> on fixes he has in mind.

Hi, 10.2 gives us the same sort of crash as 10.1.

We are now testing releng/10.1 with these two patches merged:

https://svnweb.freebsd.org/changeset/base/287261

https://svnweb.freebsd.org/changeset/base/287780


We have yet to see a crash, so it is looking vaguely promising, but we have to wait and see.

Palle

PS. I failed to mention that, besides VIMAGE + jails, the jail host is an NFS client as well. The NFS shares are mounted from the jail host, not from the jails (since that is not possible anyway).
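
For readers unfamiliar with the remark quoted above, that a NULL turnstile on the unlock path usually means an attempt to unlock a lock that is not owned, here is a minimal userspace analogy. It is only a sketch using Python's threading module, not the kernel rwlock code, and the helper name release_unowned is hypothetical. It assumes nothing about the actual bug. It only shows that a lock without ownership checking lets the wrong thread unlock silently, so the failure surfaces later and elsewhere, while an ownership-checked lock reports the bad unlock where it happens.

```python
import threading


def release_unowned(lock):
    """Release `lock` from a thread that never acquired it.

    Returns the RuntimeError raised by release(), or None if the
    release went through silently. (Hypothetical helper, for
    illustration only.)
    """
    result = {}

    def worker():
        try:
            lock.release()
            result["error"] = None
        except RuntimeError as exc:
            result["error"] = exc

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result["error"]


# threading.Lock does not track which thread holds it.
plain = threading.Lock()
plain.acquire()                       # main thread takes the lock
plain_error = release_unowned(plain)  # another thread unlocks it: no error

# The damage shows up later, when the real holder tries to unlock.
try:
    plain.release()
    owner_error = None
except RuntimeError as exc:
    owner_error = exc

# threading.RLock tracks its owner and rejects the bad unlock at once.
checked = threading.RLock()
checked.acquire()
checked_error = release_unowned(checked)
checked.release()                     # the real owner can still unlock

print("unchecked lock, wrong thread unlocks:", plain_error)
print("unchecked lock, owner unlocks later: ", repr(owner_error))
print("checked lock, wrong thread unlocks:  ", repr(checked_error))
```

The unchecked case is the loose analogue of a kernel built without the debugging checks: the invalid unlock itself goes unnoticed, and the problem only becomes visible at a later point that is not where the mistake was made.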





