Kernel panics in tcp_twclose

Palle Girgensohn girgen at FreeBSD.org
Mon Sep 21 13:53:43 UTC 2015


> On 21 Sep 2015, at 10:21, Julien Charbon <jch at FreeBSD.org> wrote:
> 
> 
> Hi Konstantin, Hi Palle,
> 
> On 18/09/15 18:06, Konstantin Belousov wrote:
>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote:
>>> Hi Palle,
>>> 
>>> On 18/09/15 11:12, Palle Girgensohn wrote:
>>>> We see daily panics on our production systems (web server, Apache
>>>> running MPM event, openjdk8; kernel with VIMAGE; jails using netgraph
>>>> interfaces [not epair]).
>>>> 
>>>> The problem started after the summer. Normal port upgrades seem to
>>>> be the only difference. The problem occurs with 10.2-p2 kernel as
>>>> well as 10.1-p4 and 10.1-p15.
>>>> 
>>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175
>>>> 
>>>> Any ideas?
>>> 
>>> Thanks for your detailed report.  I am not aware of any tcp_twclose()
>>> related issues (without VIMAGE) since FreeBSD 10.0 (does not mean there
>>> are none).  A few interesting facts (at least for me):
>>> 
>>> - Your crash happens when unlocking an inp exclusive lock with INP_WUNLOCK()
>>> 
>>> - Something is already wrong before calling turnstile_broadcast() as it
>>> is called with ts = NULL:
>> In a kernel without WITNESS, this is a 99%-sure indication of an attempt to
>> unlock a lock that is not owned.
> 
> Thanks, this is useful.  So far I did not find any path where
> tcp_twclose() can call INP_WUNLOCK without having the exclusive lock
> held, which makes this issue interesting.
> 
>>> I won't go too far here as I am not expert enough in VIMAGE, but one
>>> question anyway:
>>> 
>>> - Can you correlate this kernel panic to a particular event?  Like for
>>> example a VIMAGE/VNET jail destruction.
>>> 
>>> I will test that on my side on a 10.2 machine.
> 
> I did not find any issues while testing 10.2 + VIMAGE on my side. So,
> Palle, here is what I would suggest:
> 
> - First, test with stable/10 to see if by chance this issue has already
> been fixed in the stable branch.
> 
> - Second, if the issue is still present in stable/10, compile a 10.2
> kernel with these options:
> 
> options        DDB
> options        DEADLKRES
> options        INVARIANTS
> options        INVARIANT_SUPPORT
> options        WITNESS
> options        WITNESS_SKIPSPIN
> 
> To see where the original fault is coming from.

Hi,

We just had two crashes within 15 minutes using 10.2 with these two added:

https://svnweb.freebsd.org/changeset/base/287261

https://svnweb.freebsd.org/changeset/base/287780 

We don't always get a core dump, but the second time, we did.

The stack trace is very similar, but not identical:

(kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80949a82 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff80949e65 in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758
#3  0xffffffff80949cf3 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:687
#4  0xffffffff80d5d0bb in trap_fatal (frame=<value optimized out>,
    eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
#5  0xffffffff80d5d3bd in trap_pfault (frame=0xfffffe1760bc1840,
    usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:674
#6  0xffffffff80d5ca5a in trap (frame=0xfffffe1760bc1840)
    at /usr/src/sys/amd64/amd64/trap.c:440
#7  0xffffffff80d42dd2 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff8099861c in turnstile_broadcast (ts=0x0, queue=1)
    at /usr/src/sys/kern/subr_turnstile.c:838
#9  0xffffffff80948100 in __rw_wunlock_hard (c=0xfffff811c43487a0, tid=1,
    file=0x1 <Address 0x1 out of bounds>, line=1)
    at /usr/src/sys/kern/kern_rwlock.c:988
#10 0xffffffff80b067c4 in tcp_twclose (tw=<value optimized out>,
    reuse=<value optimized out>) at /usr/src/sys/netinet/tcp_timewait.c:540
#11 0xffffffff80b06e0b in tcp_tw_2msl_scan (reuse=0)
    at /usr/src/sys/netinet/tcp_timewait.c:748
#12 0xffffffff80b04b0e in tcp_slowtimo ()
    at /usr/src/sys/netinet/tcp_timer.c:198
#13 0xffffffff809b7a04 in pfslowtimo (arg=0x0)
    at /usr/src/sys/kern/uipc_domain.c:508
#14 0xffffffff8095f91b in softclock_call_cc (c=0xffffffff81620bf0,
    cc=0xffffffff8169dc00, direct=0) at /usr/src/sys/kern/kern_timeout.c:685
#15 0xffffffff8095fd44 in softclock (arg=0xffffffff8169dc00)
    at /usr/src/sys/kern/kern_timeout.c:814
#16 0xffffffff8091592b in intr_event_execute_handlers (
    p=<value optimized out>, ie=0xfffff801102e0d00)
    at /usr/src/sys/kern/kern_intr.c:1264
#17 0xffffffff80915d76 in ithread_loop (arg=0xfffff801102adee0)
    at /usr/src/sys/kern/kern_intr.c:1277
#18 0xffffffff8091347a in fork_exit (
    callout=0xffffffff80915ce0 <ithread_loop>, arg=0xfffff801102adee0,
    frame=0xfffffe1760bc1c00) at /usr/src/sys/kern/kern_fork.c:1018
#19 0xffffffff80d4330e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#20 0x0000000000000000 in ?? ()



I'll try stable/10 now. Would you suggest a "clean" stable/10, or could 287261 and 287780 help?

I'll add the suggested debugging options right away.
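
For reference, the suggested options could go into a custom kernel config along these lines. This is only a sketch: the config name DEBUG102 is made up here, and it assumes an amd64 kernel built on top of GENERIC with VIMAGE added, as on our systems.

```
# /usr/src/sys/amd64/conf/DEBUG102   (hypothetical file name)
include GENERIC                  # start from the stock amd64 config
ident   DEBUG102                 # hypothetical ident, reported by uname -i

options        VIMAGE            # already in use on these systems

# Debugging options suggested by Julien
options        DDB               # in-kernel debugger
options        DEADLKRES         # deadlock resolver thread
options        INVARIANTS        # run-time consistency checks
options        INVARIANT_SUPPORT # support code required by INVARIANTS
options        WITNESS           # lock order and ownership verification
options        WITNESS_SKIPSPIN  # skip spin mutexes to reduce overhead

# Build and install, run from /usr/src:
#   make buildkernel KERNCONF=DEBUG102
#   make installkernel KERNCONF=DEBUG102
```

With WITNESS and INVARIANTS enabled, an unlock of a lock that is not held should trip an assertion at the offending call rather than fault later in turnstile_broadcast(), at the cost of noticeably slower kernel performance.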

Palle



