[Bug 261198] bhyve host panics with: spin lock 0xffffffff81eac800 (callout) helpanic: spin lock held too long
Date: Sat, 05 Feb 2022 17:30:14 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261198
Mark Johnston <markj@FreeBSD.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|New |Open
CC| |markj@FreeBSD.org
--- Comment #5 from Mark Johnston <markj@FreeBSD.org> ---
I hit what appears to be the same thing. The system was running a bunch of
CPU-intensive FreeBSD guests and a buildworld on the host, somewhat
oversubscribed.
One of the per-CPU callout spinlocks was held for too long:
spin lock 0xffffffff81ead8c0 (callout) held by 0xfffffe00806e21e0 (tid 100203)
too long
panic: spin lock held too long
cpuid = 17
time = 1644078265
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01c03184e0
vpanic() at vpanic+0x17f/frame 0xfffffe01c0318530
panic() at panic+0x43/frame 0xfffffe01c0318590
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x74/frame
0xfffffe01c03185b0
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd5/frame 0xfffffe01c0318620
__mtx_lock_spin_flags() at __mtx_lock_spin_flags+0xd5/frame 0xfffffe01c0318660
_callout_stop_safe() at _callout_stop_safe+0x145/frame 0xfffffe01c03186d0
sleepq_remove_thread() at sleepq_remove_thread+0x1ad/frame 0xfffffe01c03186f0
sleepq_resume_thread() at sleepq_resume_thread+0xaa/frame 0xfffffe01c0318730
wakeup_one() at wakeup_one+0x1f/frame 0xfffffe01c0318750
vcpu_notify_event() at vcpu_notify_event+0x4e/frame 0xfffffe01c0318780
If I look at the mutex state, though, the lock is unheld:
(kgdb) frame 5
#5 0xffffffff80be46d4 in _mtx_lock_indefinite_check
(m=m@entry=0xffffffff81ead8c0 <cc_cpu+8000>,
ldap=ldap@entry=0xfffffe01c03185e0)
at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:1256
1256 panic("spin lock held too long");
(kgdb) p td
$1 = (struct thread *) 0xfffffe00806e21e0
(kgdb) p m
$2 = (struct mtx *) 0xffffffff81ead8c0 <cc_cpu+8000>
(kgdb) p *m
$3 = {lock_object = {lo_name = 0xffffffff8130481b "callout", lo_flags = 196608,
lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}
(kgdb) p td->td_tid
$4 = 100203
(kgdb) tid 100203
(kgdb) bt
#0 cpustop_handler () at /usr/home/markj/src/freebsd/sys/x86/x86/mp_x86.c:1490
#1 0xffffffff810ab739 in ipi_nmi_handler () at
/usr/home/markj/src/freebsd/sys/x86/x86/mp_x86.c:1447
#2 0xffffffff810e4dd6 in trap (frame=0xfffffe006baf5f30) at
/usr/home/markj/src/freebsd/sys/amd64/amd64/trap.c:240
#3 <signal handler called>
#4 rdtsc () at /usr/home/markj/src/freebsd/sys/amd64/include/cpufunc.h:355
#5 delay_tsc (n=1) at /usr/home/markj/src/freebsd/sys/x86/x86/delay.c:68
#6 cpu_lock_delay () at /usr/home/markj/src/freebsd/sys/x86/x86/delay.c:140
#7 0xffffffff80be46d9 in _mtx_lock_indefinite_check (m=0x22000ee4,
m@entry=0xfffffe006baf70c0, ldap=0x9927a00000000,
ldap@entry=0xffffffff82d07e70)
at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:1243
#8 0xffffffff80be490a in thread_lock_flags_ (td=0xfffffe00806e21e0, opts=0,
file=<optimized out>, line=<optimized out>)
at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:929
#9 0xffffffff80c27226 in softclock_thread (arg=arg@entry=0xffffffff81ead8c0
<cc_cpu+8000>) at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:840
#10 0xffffffff80bbfc40 in fork_exit (callout=0xffffffff80c271e0
<softclock_thread>, arg=0xffffffff81ead8c0 <cc_cpu+8000>,
frame=0xffffffff82d07f40)
at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:1102
#11 <signal handler called>
#12 0x35f3ad8ceff1b8b6 in ?? ()
So this softclock thread is spinning for its own thread lock, which is:
(kgdb) p/x *m@entry
$5 = {lock_object = {lo_name = 0xfffffe006baf7d40, lo_flags = 0x30000, lo_data
= 0x0, lo_witness = 0x0}, mtx_lock = 0xfffffe0237c51c80}
and the owning thread is the thread that panicked:
(kgdb) p ((struct thread *)$5.mtx_lock)->td_tid
$6 = 161889
Hmm. I somewhat doubt this is bhyve specific, bhyve just exercises the callout
subsystem heavily. This is probably a result of commit
74cf7cae4d2238ae6d1c949b2bbd077e1ab33634 somehow.
--
You are receiving this mail because:
You are the assignee for the bug.