Kernel panic in smp rendezvous

Wed May 10 22:34:04 UTC 2017

Hi,
I'm getting the following kernel panics on recent 11-STABLE (r317883):

spin lock 0xffffffff81df43d0 (smp rendezvous) held by 0xfffff8019c7a7000 (tid 100845) too long
timeout stopping cpus
panic: spin lock held too long
cpuid = 12
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe3e64caf2a0
vpanic() at vpanic+0x186/frame 0xfffffe3e64caf320
panic() at panic+0x43/frame 0xfffffe3e64caf380
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x311/frame 0xfffffe3e64caf3f0
smp_targeted_tlb_shootdown() at smp_targeted_tlb_shootdown+0x101/frame 0xfffffe3e64caf470
smp_masked_invlpg_range() at smp_masked_invlpg_range+0x50/frame 0xfffffe3e64caf4a0
pmap_invalidate_range() at pmap_invalidate_range+0x196/frame 0xfffffe3e64caf4e0
vm_thread_stack_dispose() at vm_thread_stack_dispose+0x2f/frame 0xfffffe3e64caf530
thread_free() at thread_free+0x39/frame 0xfffffe3e64caf550
thread_reap() at thread_reap+0x10e/frame 0xfffffe3e64caf570
proc_reap() at proc_reap+0x6bd/frame 0xfffffe3e64caf5b0
proc_to_reap() at proc_to_reap+0x48c/frame 0xfffffe3e64caf600
kern_wait6() at kern_wait6+0x49e/frame 0xfffffe3e64caf6b0
sys_wait4() at sys_wait4+0x78/frame 0xfffffe3e64caf8a0
amd64_syscall() at amd64_syscall+0x6c4/frame 0xfffffe3e64cafa30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe3e64cafa30

The panics occur sporadically and seemingly randomly. They occur on
multiple machines with the same configuration, and on older revisions of
11-STABLE as well as r317883 (I cannot easily bisect this down to a
specific commit given that it can take days or sometimes weeks for the
panic to occur). Note that my kernel is compiled with "options IPSEC", but
that does not seem to be relevant.

My understanding of the overall problem is that an IPI is sent to all cores
by smp_rendezvous_cpus, but that some of the cores do not respond to it (or
at least do not respond to it correctly and in time). More specifically, I
can find 61 threads that seem to be blocked in cpustop_handler on an atomic
operation to indicate that they have stopped in response to the IPI. This
operation is CPU_SET_ATOMIC(cpu, &stopped_cpus); CPU_SET_ATOMIC boils down
to a call to "atomic_set_long" (not sure where that is itself defined for
amd64). Either there is a line numbering problem, or this suggests that
perhaps there is a deadlock at this step (unless there's some sort of
livelock and CPU_SET_ATOMIC is the place where the threads spend most of
their time).

Looking at the thread backtraces as a whole, I can see the one that
triggered the panic, the 61 that are in cpustop_handler, and a lot of
sleeping threads. I guess each core should have an active thread, and this
is an architecture with 64 cores, so that leaves 64 - (1 + 61) = 2 cores
that are unaccounted for. Interestingly, for 2 different panics for which I
have a vmcore file these numbers are the same. For these two panics, the
IDs of the cores that are neither in cpustop_handler nor calling the smp
rendez vous are not the same (#18 and #19 in one instance, #44 and #45 in
the other instance) but in both instances the IDs are consecutive; perhaps
that's relevant.

I've uploaded a full set of backtraces at
https://gist.github.com/cinquin/d63cdf9de01b0b8033c47128868f2d38 and a
dmesg at https://gist.github.com/cinquin/17c0cf6ac7832fd1b683488d08d3e69b
(in short, the machine is a 4 processor Opteron 6274 system). I'm pasting below
an example backtrace of the threads in stopcpu_handler.

Any suggestions as to what the problem might be and how to solve it? I've
saved the core and can provide further information.

Thanks
Olivier Cinquin

A total of 61 threads are along the lines of

#0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1275
#1  0xffffffff8105a6f5 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1231
#2  0xffffffff80ef740a in trap (frame=0xfffffe3e687f7f30) at /usr/src/sys/amd64/amd64/trap.c:185
#3  0xffffffff80edbd03 in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:510
...
(kgdb) frame 0
#0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1275
1275		CPU_SET_ATOMIC(cpu, &stopped_cpus);
(kgdb) list
1270		cpu = PCPU_GET(cpuid);
1271	
1272		savectx(&stoppcbs[cpu]);
1273	
1274		/* Indicate that we are stopped */
1275		CPU_SET_ATOMIC(cpu, &stopped_cpus);
1276	
1277		/* Wait for restart */
1278		while (!CPU_ISSET(cpu, &started_cpus))
1279		    ia32_pause();
(kgdb) p stopped_cpus
$1 = {__bits = 0xffffffff81df4368}
(kgdb) p started_cpus
$2 = {__bits = 0xffffffff81df4418}