Kernel panics in tcp_twclose

Palle Girgensohn girgen at FreeBSD.org
Mon Sep 28 08:00:12 UTC 2015


> 25 sep 2015 kl. 16:19 skrev Palle Girgensohn <girgen at FreeBSD.org>:
> 
>> 
>> 25 sep 2015 kl. 16:14 skrev Palle Girgensohn <girgen at FreeBSD.org>:
>> 
>>> 
>>> 24 sep 2015 kl. 11:39 skrev Palle Girgensohn <girgen at FreeBSD.org>:
>>> 
>>> 
>>>> 24 sep 2015 kl. 09:57 skrev Julien Charbon <jch at FreeBSD.org>:
>>>> 
>>>> 
>>>> Hi -net,
>>>> 
>>>> On 24/09/15 09:03, Julien Charbon wrote:
>>>>> On 24/09/15 08:55, Palle Girgensohn wrote:
>>>>>>> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn
>>>>>>> <girgen at pingpong.net>:
>>>>>>>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn
>>>>>>>> <girgen at pingpong.net>:
>>>>>>>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon <jch at freebsd.org>: 
>>>>>>>>> On 23/09/15 20:26, Palle Girgensohn wrote:
>>>>>>>> Kernels and userland are updated to 10.2-p3 with the patch
>>>>>>>> removing the suspicious KASSERT.
>>>>>>>> dtrace running continuously, redirecting to a log file.
>>>>>> Just had a crash. Unfortunately, the kernel was stuck at the db>
>>>>>> prompt, and the remote keyboard was unresponsive (HP ILO, not
>>>>>> impressed). So I had to reset the power and never got a core dump...
>>>>>> 
>>>>>> panic: tcp_tw_2msl_stop: inp should not be released here
>>>>>> cpuid = 0
>>>>>> KDB: stack backtrace:
>>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe175acd16a0
>>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe175acd1750
>>>>>> vpanic() at vpanic+0x126/frame 0xfffffe175acd1790
>>>>>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800
>>>>>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850
>>>>>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame 0xfffffe175acd1890
>>>>>> tcp_slowtimo() at tcp_slowtimo+0x68/frame 0xfffffe175acd18c0
>>>>>> pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe175acd18f0
>>>>>> softclock_call_cc() at softclock_call_cc+0x193/frame 0xfffffe175acd19d0
>>>>>> softclock() at softclock+0x47/frame 0xfffffe175acd19f0
>>>>>> intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame 0xfffffe175acd1a30
>>>>>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70
>>>>>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0
>>>>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0
>>>>>> --- trap 0, rip = 0, rsp = 0xfffffe175acd1b70, rbp = 0 ---
>>>>>> KDB: enter: panic
>>>>>> [ thread pid 12 tid 100043 ]
>>>>>> Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
>>>>>> db>
>>>>> 
>>>>> Thanks a lot for this backtrace.  This is what I expected: when
>>>>> tcp_close() is called in the INP_TIMEWAIT case, in_pcbfree() can be
>>>>> called one extra time, which leads to:
>>>>> 
>>>>> tcp_tw_2msl_stop: inp should not be released here
>>>>> 
>>>>> Let me try to come up with a tentative fix for this case.
>>>> 
>>>> See my attached tentative patch for this case.  It is only a first
>>>> tentative patch, as I am still waiting for -net feedback on what the
>>>> rule should be here.
>>>> 
>>>> By the way:
>>>> 
>>>> - I see nothing specific to VIMAGE here
>>>> 
>>>> - Is anyone aware of tcp_close() (or tcp_drop()) calls modified or introduced
>>>> recently in 10.2 that could explain why this issue appears only now?
>>>> 
>>>> --
>>>> Julien
>>>> <tcp-close-fix-v1.patch>
>>> 
>>> 
>>> Running a machine with the patch now (it just crashed and rebooted with the new kernel).
>>> 
>>> Hoping it will have a "soothing" effect... ;-)
>>> 
>>> 
>>> dtrace is running as before. No output yet, though.
>>> 
>>> 
>> 
>> Hello -net & Julien!
>> 
>> First off, loud cheers and a big *thank you* to Julien for helping us get our systems to stop crashing. This really means a lot to us! Thank you!
>> 
>> We have been running for more than 24 hours with no crash, so I'm getting more and more confident that the change actually makes the system stable.
>> 
>> Dtrace still shows nothing.
>> 
>> Palle
> 
> 
> Secondly, is this error related? This is *not* VIMAGE, *not* jail. It is a binary GENERIC install from freebsd-update, 10.1-RELEASE-p19. It just crashed today and we did not get a core dump, but I found this core.txt from a crash in August that I was not aware of (I was on holiday then... :)
> 
> Since it is a binary install, I have no kernel.debug.
> 
> ...
> 
> panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing
> 
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff80963000 at kdb_backtrace+0x60
> #1 0xffffffff80928125 at panic+0x155
> #2 0xffffffff8099c180 at sbdroprecord_locked+0
> #3 0xffffffff80ac8c9c at tcp_output+0xdbc
> #4 0xffffffff80ac6a95 at tcp_do_segment+0x3045
> #5 0xffffffff80ac2e04 at tcp_input+0xd04
> #6 0xffffffff80a54fc7 at ip_input+0x97
> #7 0xffffffff809f4f73 at swi_net+0x143
> #8 0xffffffff808faf4b at intr_event_execute_handlers+0xab
> #9 0xffffffff808fb396 at ithread_loop+0x96
> #10 0xffffffff808f8b6a at fork_exit+0x9a
> #11 0xffffffff80d0b67e at fork_trampoline+0xe
> Uptime: 21d0h54m53s
> Dumping 2005 out of 32709 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> Reading symbols from /boot/kernel/accf_data.ko.symbols...done.
> Loaded symbols for /boot/kernel/accf_data.ko.symbols
> Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
> Loaded symbols for /boot/kernel/accf_http.ko.symbols
> Reading symbols from /boot/kernel/oce.ko.symbols...done.
> Loaded symbols for /boot/kernel/oce.ko.symbols
> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/nullfs.ko.symbols
> Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/linprocfs.ko.symbols
> Reading symbols from /boot/kernel/linux.ko.symbols...done.
> Loaded symbols for /boot/kernel/linux.ko.symbols
> Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/zfs.ko.symbols
> Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> Loaded symbols for /boot/kernel/opensolaris.ko.symbols
> #0  doadump (textdump=<value optimized out>) at pcpu.h:219
> 219	pcpu.h: No such file or directory.
> 	in pcpu.h
> (kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:219
> #1  0xffffffff80927da2 in kern_reboot (howto=260)
>    at /usr/src/sys/kern/kern_shutdown.c:452
> #2  0xffffffff80928164 in panic (fmt=<value optimized out>)
>    at /usr/src/sys/kern/kern_shutdown.c:759
> #3  0xffffffff8099c180 in sbsndptr (sb=<value optimized out>, 
>    off=<value optimized out>, len=<value optimized out>, 
>    moff=<value optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:1011
> #4  0xffffffff80ac8c9c in tcp_output (tp=0xfffff80312ef5800)
>    at /usr/src/sys/netinet/tcp_output.c:870
> #5  0xffffffff80ac6a95 in tcp_do_segment (m=<value optimized out>, 
>    th=<value optimized out>, so=<value optimized out>, 
>    tp=<value optimized out>, drop_hdrlen=<value optimized out>, tlen=0, 
>    iptos=<value optimized out>, ti_locked=Cannot access memory at address 0x1
> )
>    at /usr/src/sys/netinet/tcp_input.c:3018
> #6  0xffffffff80ac2e04 in tcp_input (m=<value optimized out>, 
>    off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1377
> #7  0xffffffff80a54fc7 in ip_input (m=0xfffff800b4516600)
>    at /usr/src/sys/netinet/ip_input.c:734
> #8  0xffffffff809f4f73 in swi_net (arg=0xffffffff81988880)
>    at /usr/src/sys/net/netisr.c:765
> #9  0xffffffff808faf4b in intr_event_execute_handlers (
>    p=<value optimized out>, ie=0xfffff800093ac600)
>    at /usr/src/sys/kern/kern_intr.c:1263
> #10 0xffffffff808fb396 in ithread_loop (arg=0xfffff80009388e40)
>    at /usr/src/sys/kern/kern_intr.c:1276
> #11 0xffffffff808f8b6a in fork_exit (
>    callout=0xffffffff808fb300 <ithread_loop>, arg=0xfffff80009388e40, 
>    frame=0xfffffe083c3e3ac0) at /usr/src/sys/kern/kern_fork.c:996
> #12 0xffffffff80d0b67e in fork_trampoline ()
>    at /usr/src/sys/amd64/amd64/exception.S:606
> #13 0x0000000000000000 in ?? ()
> Current language:  auto; currently minimal
> (kgdb) 
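The panic string above is raised in sbsndptr() (frame #3, uipc_sockbuf.c:1011), which tcp_output() calls to find the mbuf in the socket's send buffer that holds the data it is about to transmit. As a rough illustration of what "clashing" means, here is a simplified userspace model in C: it walks a chain of buffer segments to the one holding a given byte offset and flags the case where the offset runs past the end of the chain, i.e. the socket buffer's bookkeeping and its actual mbuf chain disagree. The names toy_mbuf and toy_find_offset are hypothetical, and this is an assumption about the general shape of the check, not the kernel's sbsndptr() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: one segment of a socket send buffer. */
struct toy_mbuf {
    size_t len;            /* bytes of data held in this segment */
    struct toy_mbuf *next; /* next segment, NULL at the end of the chain */
};

/*
 * Finds the segment that holds byte offset `off` of the chain starting at
 * `head`, storing the offset within that segment in *moff. If `off` lies
 * past the last byte in the chain, the caller's view of how much data is
 * buffered disagrees with the chain itself: *clash is set to 1 and NULL is
 * returned (the toy equivalent of the "clashing" panic).
 */
struct toy_mbuf *toy_find_offset(struct toy_mbuf *head, size_t off,
                                 size_t *moff, int *clash)
{
    struct toy_mbuf *m = head;

    assert(head != NULL); /* an empty send buffer is a caller bug here */
    *clash = 0;

    /* Skip whole segments until `off` falls inside the current one. */
    while (m != NULL && off >= m->len) {
        off -= m->len;
        m = m->next;
    }
    if (m == NULL) {
        *clash = 1; /* ran off the end of the chain */
        return NULL;
    }
    *moff = off;
    return m;
}
```

For a chain of 100-, 50- and 25-byte segments, offset 120 resolves to the second segment at offset 20, while offset 175 (one past the last byte) is reported as a clash.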


Hi Julien and -net,

A sunny Monday, no crashes since the patch was applied. Great! Big thanks again!

We still have nothing in the dtrace log, though.

And I wonder: could the crash above possibly be the result of hitting that same bug?

Palle
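To make Julien's explanation of the first panic concrete: the time-wait expiry path (tcp_twclose() called from tcp_tw_2msl_scan(), as in the first backtrace) expects to free the pcb itself, so if tcp_close() has already run in_pcbfree() on a pcb in the INP_TIMEWAIT state, the KASSERT whose message names tcp_tw_2msl_stop() fires. Below is a toy userspace model of that ownership rule in C. Every name in it (toy_inpcb, toy_pcbfree, toy_twclose, toy_close_unguarded, toy_close_guarded, toy_scenario) is hypothetical, the "guarded" close is only one illustrative way to respect the rule, and it is not the contents of tcp-close-fix-v1.patch, which is not reproduced in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct inpcb. */
struct toy_inpcb {
    bool timewait;   /* models the INP_TIMEWAIT flag */
    int  free_count; /* how many times toy_pcbfree() has run on this pcb */
};

/* Models in_pcbfree(): it must run exactly once per pcb. */
void toy_pcbfree(struct toy_inpcb *inp)
{
    assert(inp->free_count == 0 && "pcb freed twice");
    inp->free_count++;
}

/*
 * Models the time-wait expiry path: it expects to perform the final free
 * itself and checks that nobody released the pcb earlier. Returns -1 where
 * the kernel's KASSERT would panic, 0 on the normal path.
 */
int toy_twclose(struct toy_inpcb *inp)
{
    if (inp->free_count != 0) {
        fprintf(stderr, "toy_tw_2msl_stop: inp should not be released here\n");
        return -1;
    }
    toy_pcbfree(inp);
    return 0;
}

/* Close path that frees unconditionally, even for a time-wait pcb. */
void toy_close_unguarded(struct toy_inpcb *inp)
{
    toy_pcbfree(inp);
}

/* Close path that leaves time-wait pcbs for the time-wait code to free. */
void toy_close_guarded(struct toy_inpcb *inp)
{
    if (!inp->timewait)
        toy_pcbfree(inp);
}

/* One scenario: a time-wait pcb is closed, then its 2MSL timer expires. */
int toy_scenario(bool guarded, int *frees)
{
    struct toy_inpcb inp = { .timewait = true, .free_count = 0 };

    if (guarded)
        toy_close_guarded(&inp);
    else
        toy_close_unguarded(&inp);

    int rc = toy_twclose(&inp);
    *frees = inp.free_count;
    return rc;
}
```

In the unguarded scenario toy_scenario() returns -1, the point where the real kernel's KASSERT panics; in the guarded one the time-wait path performs the single free and returns 0.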




More information about the freebsd-net mailing list