Kernel panics in tcp_twclose

Palle Girgensohn girgen at pingpong.net
Fri Sep 18 20:42:32 UTC 2015





> On 18 Sep 2015, at 18:06, Konstantin Belousov <kostikbel at gmail.com> wrote:
> 
>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote:
>> Hi Palle,
>> 
>>> On 18/09/15 11:12, Palle Girgensohn wrote:
>>> We see daily panics on our production systems (web server: Apache
>>> running the event MPM, OpenJDK 8; kernel with VIMAGE; jails using
>>> netgraph interfaces [not epair]).
>>> 
>>> The problem started after the summer. Normal port upgrades seem to
>>> be the only difference. The problem occurs with the 10.2-p2 kernel as
>>> well as with 10.1-p4 and 10.1-p15.
>>> 
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175
>>> 
>>> Any ideas?
>> 
>> Thanks for your detailed report.  I am not aware of any tcp_twclose()
>> related issues (without VIMAGE) since FreeBSD 10.0 (which does not mean
>> there are none).  A few interesting facts (at least for me):
>> 
>> - Your crash happens when unlocking an inp exclusive lock with INP_WUNLOCK()
>> 
>> - Something is already wrong before calling turnstile_broadcast() as it
>> is called with ts = NULL:
> In a kernel without WITNESS, this is a 99%-sure indication of an attempt to
> unlock a lock that is not owned.
> 
>> 
>> turnstile_broadcast (ts=0x0, queue=1) at
>> /usr/src/sys/kern/subr_turnstile.c:838
>> __rw_wunlock_hard () at /usr/src/sys/kern/kern_rwlock.c:988
>> tcp_twclose () at /usr/src/sys/netinet/tcp_timewait.c:540
>> tcp_tw_2msl_scan () at /usr/src/sys/netinet/tcp_timewait.c:748
>> tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:198
>> 
>> I won't go too far here as I am not expert enough in VIMAGE, but one
>> question anyway:
>> 
>> - Can you correlate this kernel panic with a particular event?  For
>> example, the destruction of a VIMAGE/VNET jail.
>> 
>> I will test that on my side on a 10.2 machine.
>> 
>> --
>> Julien
>> 
> 
> 
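
The point made in the quoted replies above (a NULL turnstile at unlock time, in a kernel without ownership checks, most likely means the lock was released by a thread that did not own it) can be illustrated with a toy model. This is only a sketch under stated assumptions: it is plain userspace Python, not FreeBSD code, and every name in it (ToyRWLock, ToyPanic, check_owner, the "waiters" list standing in for a turnstile) is hypothetical. It assumes, as a simplification, that the unlock slow path is entered whenever the fast path fails and that it takes for granted that a waiter queue exists.

```python
# Toy model only. Names and structure are illustrative assumptions,
# not the FreeBSD rwlock or turnstile API.


class ToyPanic(Exception):
    """Stands in for a kernel panic in this model."""


class ToyRWLock:
    def __init__(self, check_owner=False):
        self.owner = None      # id of the write owner, None when unlocked
        self.waiters = None    # "turnstile": list of blocked ids, None if nobody waits
        self.check_owner = check_owner  # models a kernel built with ownership checks

    def wlock(self, tid):
        """Take the lock if it is free; otherwise queue the caller as a waiter."""
        if self.owner is None:
            self.owner = tid
            return True
        if self.waiters is None:
            self.waiters = []
        self.waiters.append(tid)
        return False

    def wunlock(self, tid):
        """Release the lock and return the list of ids that were woken up."""
        if self.check_owner and self.owner != tid:
            raise ToyPanic("unlock of a lock not owned by %r" % (tid,))
        # Fast path: the caller is the owner and nobody is waiting.
        if self.owner == tid and self.waiters is None:
            self.owner = None
            return []
        # Slow path: entered whenever the fast path fails, on the assumption
        # that the only possible reason is that waiters exist.
        return self._broadcast(self.waiters)

    def _broadcast(self, ts):
        if ts is None:
            # The model's equivalent of being called with ts = NULL.
            raise ToyPanic("broadcast on NULL turnstile")
        woken = list(ts)
        self.waiters = None
        self.owner = None
        return woken


if __name__ == "__main__":
    lk = ToyRWLock()          # no ownership checks
    lk.wlock("A")             # thread A owns the lock, nobody waits
    try:
        lk.wunlock("B")       # thread B releases a lock it does not own
    except ToyPanic as exc:
        print("panic:", exc)
```

In this model, the unowned unlock by "B" misses the fast path, falls into the slow path, and fails there because no waiter queue exists; running the script prints "panic: broadcast on NULL turnstile". With check_owner=True the same mistake is reported at the unlock call itself, which is the kind of earlier diagnosis a kernel with lock checking enabled would be expected to give.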


Hi,

I just got a response from adrian@ in which he seems to recall that it has all been fixed in head.

I would really prefer not to run a head kernel in production unless I have to, so the question is whether the specific fixes for this problem can be pinned down. Any suggestions?

Thanks for all the help so far!

Palle


More information about the freebsd-net mailing list