Kernel panics in tcp_twclose

Palle Girgensohn girgen at FreeBSD.org
Thu Sep 24 06:55:33 UTC 2015


> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn <girgen at pingpong.net>:
> 
> 
>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn <girgen at pingpong.net>:
>> 
>> Hi,
>> 
>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon <jch at freebsd.org>:
>>> 
>>> 
>>> Hi,
>>> 
>>> On 23/09/15 20:26, Palle Girgensohn wrote:
>>>>> 23 sep. 2015 kl. 20:01 skrev Julien Charbon <jch at freebsd.org>:
>>>>> On 23/09/15 16:36, Palle Girgensohn wrote:
>>>>>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon <jch at freebsd.org>: On
>>>>>>> 22/09/15 22:58, Palle Girgensohn wrote:
>>>>>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon <jch at freebsd.org>: 
>>>>>>>>> On 22/09/15 18:49, Palle Girgensohn wrote:
>>>>>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn 
>>>>>>>>>>> <girgen at FreeBSD.org>:
>>>>>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn 
>>>>>>>>>>>> <girgen at FreeBSD.org>:
>>>>>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon 
>>>>>>>>>>>>> <jch at FreeBSD.org>: On 18/09/15 18:06, Konstantin
>>>>>>>>>>>>> Belousov wrote:
>>>>>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien
>>>>>>>>>>>>>> Charbon wrote:
>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>> - Second, if issue is still in stable/10, compile 10.2 
>>>>>>>>>>>>> kernel with these options:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> options        DDB options        DEADLKRES options 
>>>>>>>>>>>>> INVARIANTS options        INVARIANT_SUPPORT options
>>>>>>>>>>>>> WITNESS options        WITNESS_SKIPSPIN
>>>>>>>>>>>>> 
>>>>>>>>>>>>> To see where the original fault is coming from.
>>>>>>>>>>>> [...]
>>>>>>>>>>>> 
>>>>>>>>>>>> I'll try stable/10 now. Would you suggest a "clean" 
>>>>>>>>>>>> stable/10, or could 287621 and 287780 help?
>>>>>>>>>>>> 
>>>>>>>>>>>> I'll add the debugging suggested options right away.
>>>>>>>>>>>> 
>>>>>>>>>>>> Palle
>>>>>>>>>>> 
>>>>>>>>>>> I have a new core dump from ^/stable/10 with:
>>>>>>>>>>> 
>>>>>>>>>>> options        DDB options        DEADLKRES options
>>>>>>>>>>> INVARIANTS options        INVARIANT_SUPPORT options WITNESS
>>>>>>>>>>> options WITNESS_SKIPSPIN
>>>>>>>>>> 
>>>>>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] 
>>>>>>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free 
>>>>>>>>>> software, covered by the GNU General Public License, and you
>>>>>>>>>> are welcome to change it and/or distribute copies of it under
>>>>>>>>>> certain conditions. Type "show copying" to see the
>>>>>>>>>> conditions. There is absolutely no warranty for GDB.  Type
>>>>>>>>>> "show warranty" for details. This GDB was configured as
>>>>>>>>>> "amd64-marcel-freebsd"...
>>>>>>>>>> 
>>>>>>>>>> Unread portion of the kernel message buffer: panic:
>>>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL cpuid =
>>>>>>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at 
>>>>>>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 
>>>>>>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame
>>>>>>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame
>>>>>>>>>> 0xfffffe183d9e98d0 kassert_panic() at
>>>>>>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940
>>>>>>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame
>>>>>>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame
>>>>>>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame
>>>>>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame
>>>>>>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame
>>>>>>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame
>>>>>>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at 
>>>>>>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall()
>>>>>>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall
>>>>>>>>>> (6, FreeBSD ELF64, sys_close), rip = 0x801c8d94a, rsp = 
>>>>>>>>>> 0x7ffff91c8668, rbp = 0x7ffff91c8680 --- KDB: enter: panic 
>>>>>>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263 
>>>>>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>>>>>>>>> 
>>>>>>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done. 
>>>>>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading
>>>>>>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded
>>>>>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from
>>>>>>>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols
>>>>>>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from
>>>>>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for
>>>>>>>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from 
>>>>>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for 
>>>>>>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from 
>>>>>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for
>>>>>>>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from 
>>>>>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for 
>>>>>>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from 
>>>>>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for
>>>>>>>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from 
>>>>>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for
>>>>>>>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from 
>>>>>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for 
>>>>>>>>>> /boot/kernel/ums.ko.symbols Reading symbols from 
>>>>>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for
>>>>>>>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from 
>>>>>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for 
>>>>>>>>>> /boot/kernel/fdescfs.ko.symbols #0  doadump (textdump=1) at 
>>>>>>>>>> pcpu.h:219 219        __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) bt
>>>>>>>>>> #0 doadump (textdump=1) at pcpu.h:219 #1  0xffffffff8094b337
>>>>>>>>>> in kern_reboot (howto=260) at
>>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845
>>>>>>>>>> in vpanic (fmt=<value optimized out>, ap=<value optimized
>>>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3
>>>>>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=<value optimized
>>>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4
>>>>>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=<value optimized
>>>>>>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5
>>>>>>>>>> 0xffffffff809cd291 in sofree (so=0xfffff801dd302000) at 
>>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6  0xffffffff809cdb00 in
>>>>>>>>>> soclose (so=<value optimized out>) at 
>>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7  0xffffffff808fe659 in
>>>>>>>>>> _fdrop (fp=0xfffff802a593db40, td=0x0) at file.h:343 #8 
>>>>>>>>>> 0xffffffff80901092 in closef (fp=0xfffff802a593db40, 
>>>>>>>>>> td=0xfffff80eebc894a0) at
>>>>>>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d
>>>>>>>>>> in closefp (fdp=0xfffff80b20cce000, fd=<value optimized out>,
>>>>>>>>>> fp=0xfffff802a593db40, td=0xfffff80eebc894a0,
>>>>>>>>>> holdleaders=<value optimized out>) at 
>>>>>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a
>>>>>>>>>> in amd64_syscall (td=0xfffff80eebc894a0, traced=0) at 
>>>>>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall ()
>>>>>>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12
>>>>>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this
>>>>>>>>>> frame (corrupt stack?) Current language:  auto; currently
>>>>>>>>>> minimal
>>>>>>>>> 
>>>>>>>>> Thanks for the information.  As I suspected the initial error
>>>>>>>>> was elsewhere than tcp_twclose(), never got this assertion
>>>>>>>>> before:
>>>>>>>>> 
>>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL
>>>>>>>>> 
>>>>>>>>> from here:
>>>>>>>>> 
>>>>>>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) {
>>>>>>>>> struct tcpcb *tp;
>>>>>>>>> 
>>>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp);
>>>>>>>>> 
>>>>>>>>> KASSERT(so->so_pcb == inp, ("tcp_detach: so_pcb != inp")); 
>>>>>>>>> KASSERT(inp->inp_socket == so, ("tcp_detach: inp_socket !=
>>>>>>>>> so"));
>>>>>>>>> 
>>>>>>>>> tp = intotcpcb(inp);
>>>>>>>>> 
>>>>>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags & 
>>>>>>>>> INP_DROPPED) { KASSERT(tp == NULL, ("tcp_detach: INP_TIMEWAIT
>>>>>>>>> && " "INP_DROPPED && tp != NULL"));
>>>>>>>>> 
>>>>>>>>> Let me check if I could find a path that could lead to this 
>>>>>>>>> unexpected case.  Unexpected because:  INP_DROPPED and 
>>>>>>>>> inp->inp_ppcb is set to NULL are set at same time here:
>>>>>>>>> 
>>>>>>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket
>>>>>>>>> *so; struct inpcb *inp;
>>>>>>>>> 
>>>>>>>>> inp = tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT), 
>>>>>>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) == tw, 
>>>>>>>>> ("tcp_twclose: inp_ppcb != tw")); 
>>>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo);      /* in_pcbfree() */ 
>>>>>>>>> INP_WLOCK_ASSERT(inp);
>>>>>>>>> 
>>>>>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb = NULL; 
>>>>>>>>> in_pcbdrop(inp); ...
>>>>>>>>> 
>>>>>>>>> Interesting [...]
>>>>>> 
>>>>>> Just a quick update. Julien is pursuing this off list with a core
>>>>>> dump and we are now waiting for a new core dump with the first
>>>>>> KASSERT removed. this is on a stable/10 kernel.
>>>>> 
>>>>> By the way Palle could you also run below Dtrace script to see where
>>>>> this tcp_close() in INP_TIMEWAIT comes from:
>>>>> 
>>>>> $ cat tcp-close-tw.d
>>>>> fbt::tcp_close:entry
>>>>> /args[0]->t_inpcb->inp_flags & 0x01000000/
>>>>> {
>>>>> @s1[stack()] = count()
>>>>> }
>>>>> 
>>>>> tick-1sec {
>>>>> printa(@s1);
>>>>> }
>>>>> $ sudo dtrace -s tcp-close-tw.d
>>>>> 
>>>>> And share any backtraces reported in this dtrace script output.
>>>>> 
>>>>> George, could you check if this dtrace script makes sense for you, and
>>>>> if you have any improvements to add, I am quite a rookie in Dtrace scripts.
>>>> 
>>>> Shall I let the dtrace script run continuously until the machine crashes? Or just run it once?
>>> 
>>> Continuously until the machine crashes.  You can report any backstrace
>>> outputs like:
>>> 
>>>            kernel`tcp_usr_close+0x86
>>>            kernel`soclose+0xe4
>>>            kernel`_fdrop+0x29
>>>            kernel`closef+0x237
>>>            kernel`closefp+0x95
>>>            kernel`amd64_syscall+0x357
>>>            kernel`0xffffffff80c83c4b
>>>              1
>>> 
>>> before the machine crashes.   But I expect the problematic case
>>> detection with Dtrace to be quickly followed by the crash.  Will see.
>>> 
>>> --
>>> Julien
>>> 
>> 
>> Kernels and userland are updated to 10.2-p3 with the patch removing the suspicous KASSERT. 
>> 
>> dtrace running continously redirecting to a log file.
>> 
>> now we're just waiting... :)
>> 
>> Palle
> 
> Is the dtrace correct?
> 
> $ sort -u dtrace.out 
> 
>  0  59779                       :tick-1sec 
> CPU     ID                    FUNCTION:NAME
> $ wc -l dtrace.out 
>   56233 dtrace.out
> 
> 
> All it does is write
> 
> 0  59779                       :tick-1sec 
> 
> once a second.
> 
> Just checking... :)
> 
> Palle


Just had a crash. Unfortunately, the kernel was stuck at the db> prompt, and the remote keyboard was unresponsive (HP ILO, not impressed). So I had to reset the power and never got a core dump... :(

panic: tcp_tw_2msl_stop: inp should not be released here                        
cpuid = 0                                                                       
KDB: stack backtrace:                                                           
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe175acd16a0  
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe175acd1750                  
vpanic() at vpanic+0x126/frame 0xfffffe175acd1790                               
kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800                 
tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850                     
tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame 0xfffffe175acd1890           
tcp_slowtimo() at tcp_slowtimo+0x68/frame 0xfffffe175acd18c0                    
pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe175acd18f0                        
softclock_call_cc() at softclock_call_cc+0x193/frame 0xfffffe175acd19d0         
softclock() at softclock+0x47/frame 0xfffffe175acd19f0                          
intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame 0xfffffe
175acd1a30                                                                      
ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70                    
fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0                          
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0               
--- trap 0, rip = 0, rsp = 0xfffffe175acd1b70, rbp = 0 ---                      
KDB: enter: panic                                                               
[ thread pid 12 tid 100043 ]                                                    
Stopped at      kdb_enter+0x3e: movq    $0,kdb_why                              
db>                                                                             



Is there a way to configure the kernel to get all the gory debug stuff without it dropping to the debug prompt on panic? I'd rather see it just dump core and restart automatically.

Palle






More information about the freebsd-net mailing list