Kernel panics in tcp_twclose

Thu Sep 24 09:39:14 UTC 2015

> On 24 Sep 2015, at 09:57, Julien Charbon <jch at FreeBSD.org> wrote:
> 
> 
> Hi -net,
> 
> On 24/09/15 09:03, Julien Charbon wrote:
>> On 24/09/15 08:55, Palle Girgensohn wrote:
>>>> On 24 Sep 2015, at 07:51, Palle Girgensohn
>>>> <girgen at pingpong.net> wrote:
>>>>> On 24 Sep 2015, at 00:05, Palle Girgensohn
>>>>> <girgen at pingpong.net> wrote:
>>>>>> On 23 Sep 2015, at 20:32, Julien Charbon <jch at freebsd.org> wrote:
>>>>>> On 23/09/15 20:26, Palle Girgensohn wrote:
>>>>> Kernels and userland are updated to 10.2-p3 with the patch
>>>>> removing the suspicious KASSERT.
>>>>> dtrace is running continuously, redirecting to a log file.
>>> Just had a crash. Unfortunately, the kernel was stuck at the db>
>>> prompt, and the remote keyboard was unresponsive (HP ILO, not
>>> impressed). So I had to reset the power and never got a core dump...
>>> 
>>> panic: tcp_tw_2msl_stop: inp should not be released here
>>> cpuid = 0
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe175acd16a0
>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe175acd1750
>>> vpanic() at vpanic+0x126/frame 0xfffffe175acd1790
>>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800
>>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850
>>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame 0xfffffe175acd1890
>>> tcp_slowtimo() at tcp_slowtimo+0x68/frame 0xfffffe175acd18c0
>>> pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe175acd18f0
>>> softclock_call_cc() at softclock_call_cc+0x193/frame 0xfffffe175acd19d0
>>> softclock() at softclock+0x47/frame 0xfffffe175acd19f0
>>> intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame 0xfffffe175acd1a30
>>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70
>>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0
>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0
>>> --- trap 0, rip = 0, rsp = 0xfffffe175acd1b70, rbp = 0 ---
>>> KDB: enter: panic
>>> [ thread pid 12 tid 100043 ]
>>> Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
>>> db>
>> 
>> Thanks a lot for this backtrace.  This is what I expected: when
>> tcp_close() is called in the INP_TIMEWAIT case, in_pcbfree() can be
>> called one extra time, which leads to:
>> 
>> tcp_tw_2msl_stop: inp should not be released here
>> 
>> Let me try to come up with a tentative fix for this case.
> 
> See attached my tentative patch for this case.  It is only a first
> tentative patch, as I am still waiting for feedback from -net on what
> the rule should be here.
> 
> By the way:
> 
> - I see nothing specific to VIMAGE here
> 
> - Is anyone aware of tcp_close() (or tcp_drop()) calls modified/introduced
> recently in 10.2 that could explain why this issue appears only now?
> 
> --
> Julien
> <tcp-close-fix-v1.patch>

Running a machine with the patch now (it just crashed and rebooted with the new kernel).

Hoping it will have a "soothing" effect... ;-)

dtrace is running as before. No output yet, though.

Palle
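
For readers following the thread: the failure mode Julien describes, where in_pcbfree() is reached one extra time for a connection in the INP_TIMEWAIT case, can be illustrated with a small userland model. This is only a sketch under loud assumptions. It is not FreeBSD kernel code and it is not the contents of tcp-close-fix-v1.patch, which is not shown in this message. The names model_inpcb, model_pcbfree, model_close, model_twclose and run_scenario are all hypothetical, and the "guarded" variant shows just one possible rule (skip the release if another path already did it), not necessarily the one -net will settle on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's inpcb: only what the model needs. */
struct model_inpcb {
    bool released;       /* set once the pcb has been released */
    int  extra_releases; /* release attempts made after the first one */
};

/* Models in_pcbfree(): the first call releases the pcb; any later call is
 * counted instead of panicking, so the scenario can be inspected. */
static void model_pcbfree(struct model_inpcb *inp)
{
    if (inp->released) {
        inp->extra_releases++;
        return;
    }
    inp->released = true;
}

/* Models a tcp_close()-style path reached while the pcb is in TIME_WAIT
 * (assumption: this path releases the pcb itself). */
static void model_close(struct model_inpcb *inp)
{
    model_pcbfree(inp);
}

/* Models the time-wait teardown path, which also releases the pcb.
 * With `guarded` set, it skips the release when it was already done. */
static void model_twclose(struct model_inpcb *inp, bool guarded)
{
    if (guarded && inp->released)
        return;
    model_pcbfree(inp);
}

/* Runs the close path followed by the time-wait teardown and returns the
 * number of extra releases seen (0 means released exactly once). */
int run_scenario(bool guarded)
{
    struct model_inpcb inp = { .released = false, .extra_releases = 0 };

    model_close(&inp);
    model_twclose(&inp, guarded);
    assert(inp.released); /* both variants must end with the pcb released */
    return inp.extra_releases;
}
```

Calling run_scenario(false) models the unguarded sequence and reports one extra release, which corresponds to the condition the "inp should not be released here" assertion is complaining about. Calling run_scenario(true) reports none. The real kernel paths also involve locking and reference counting that this model deliberately leaves out.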