Re: A panic a day

From: Mark Johnston <markj_at_freebsd.org>
Date: Thu, 22 Sep 2022 19:00:53 UTC

On Thu, Sep 22, 2022 at 11:31:40AM -0700, Steve Kargl wrote:
> All,
> 
> I updated my kernel/world/all ports on Sept 19 2022.
> Since then, I have had daily panics and hard lock-up
> (no panic, keyboard, mouse, network, ...).  The one
> panic I did witness sent text scrolling off the screen.
> There is no dump, or at least, I haven't figured out
> a way to get a dump.
> 
> Using ports/graphics/tesseract and then hand editing
> the OCR result, the last visible portion is
> 
> 
> panic() at panic+0x43/frame 0xfffffe00daf65550
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0xc6/frame 0xfffffe00daf655e0
> sched_add() at sched_add+0x98/frame 0xfffffe00daf656a0
> setrunnable() at setrunnable+0x73/frame 0xfffffe00daf656d0
> wakeup_any() at wakeup_any+0x1f/frame 0xfffffe00daf656f0
> taskqueue_enqueue_locked() at taskqueue_enqueue_locked+0x13e/frame 0xfffffe00daf65720
> taskqueue_enqueue_timeout_sbt() at taskqueue_enqueue_timeout_sbt+0xe5/frame 0xfffffe00daf65770
> resettodr() at resettodr+0x7a/frame 0xfffffe00daf657b0
> kern_reboot() at kern_reboot+0x2ae/frame 0xfffffe00daf657f0
> vpanic() at vpanic+0x1be/frame 0xfffffe00daf65840
> panic() at panic+0x43/frame 0xfffffe00daf658a0
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0xc6/frame 0xfffffe00daf65ab0
> sched_add() at sched_add+0x98/frame 0xfffffe00daf65990
> setrunnable() at setrunnable+0x73/frame 0xfffffe00daf659c0
> wakeup_any() at wakeup_any+0x1f/frame 0xfffffe00daf659e0
> taskqueue_enqueue_locked() at taskqueue_enqueue_locked+0x13e/frame 0xfffffe00daf65a11
> drm_crtc_helper_set_config() at drm_crtc_helper_set_config+0x971/frame 0xfffffe00daf65abl
> radeon_crtc_set_config() at radeon_crtc_set_config+0x22/frame 0xfffffe00daf65ad0
> __drm_mode_set_config_internal() at __drm_mode_set_config_internal+0xdd/frame 0xfffffe00daf65b10
> drm_client_modeset_commit_locked() at drm_client_modeset_commit_locked+0x160/frame 0xfffffe00daf65b50
> drm_client_modeset_commit() at drm_client_modeset_commit+0x21/frame 0xfffffe00daf65b70
> drm_fb_helper_restore_fbdev_mode_unlocked() at drm_fb_helper_restore_fbdev_mode_unlocked+0x81/frame 
> vt_kms_postswitch() at vt_kms_postswitch+0x166/frame 0xfffffe00daf65bd0
> vt_window_switch() at vt_window_switch+0x119/frame 0xfffffe00daf65c1d
> vtterm_cngrab() at vtterm_cngrab+0x4f/frame 0xfffffe00daf65c30
> cngrab() at cngrab+0x26/frame 0xfffffe00daf65ca0
> vpanic() at vpanic+0xf0/frame 0xfffffe00daf65ca0
> panic() at panic+0x43/frame 0xfffffe00daf65d00
> __mtx_assert() at __mtx_assert+0x9d/frame 0xfffffe00daf65d10
> ast_sched_locked() at ast_sched_locked+0x29/frame 0xfffffe00daf65d30
> sched_add() at sched_add+0x4c5/frame 0xfffffe00daf65df0
> sched_switch() at sched_switch+0x9f/frame 0xfffffe00daf65e20
> mi_switch() at mi_switch+0x14b/frame 0xfffffe00daf65e40
> sched_bind() at sched_bind+0x73/frame 0xfffffe00daf65e60
> pcpu_cache_drain_safe() at pcpu_cache_drain_safe+0x25a/frame 0xfffffe00daf65e90
> uma_reclaim_domain() at uma_reclaim_domain+0x279/frame Buf ffffe00dafohech
> uma_reclaim_worker() at uma_reclaim_worker+0x35/frame 0xfffffe00daf65ef0
> fork_exit() at fork_exit+0x80/frame 0xfffffe00daf65f30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00daf65f30
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---

It looks like you use the 4BSD scheduler?  I think there's a bug in
kick_other_cpu() in that it doesn't make sure that the remote CPU's
curthread lock is held when modifying thread state.  Because 4BSD has a
global scheduler lock, this is often true in practice, but doesn't have
to be.
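
To make the invariant concrete, here is a small user-space model, not kernel
code.  Every name ending in _model is a hypothetical stand-in invented for
illustration; the assert() plays the role of the lock-ownership assertion
that shows up in the backtrace under ast_sched_locked().  The sketch only
assumes that the caller holds the global scheduler lock and that the remote
thread's td_lock may point at some other lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel mutex: tracks only "do we hold it". */
struct lock_model {
	bool held_by_caller;
};

/* Hypothetical stand-in for struct thread: the lock protecting it, plus a flag. */
struct thread_model {
	struct lock_model *td_lock;
	bool ast_pending;
};

/* Models the global 4BSD scheduler lock, held by the caller of sched_add(). */
static struct lock_model sched_lock_model = { true };
/* Models any other lock a thread may temporarily point at; we do not hold it. */
static struct lock_model other_lock_model = { false };

/* Models ast_sched_locked(): modifying thread state requires its lock. */
static void
ast_sched_locked_model(struct thread_model *td)
{
	assert(td->td_lock->held_by_caller);
	td->ast_pending = true;
}

/* Models the guarded tail of kick_other_cpu(): skip threads we cannot touch. */
static bool
kick_model(struct thread_model *remote_curthread)
{
	if (remote_curthread->td_lock != &sched_lock_model)
		return (false);
	ast_sched_locked_model(remote_curthread);
	return (true);
}
```

In this model, a thread whose td_lock is &sched_lock_model gets ast_pending
set, while a thread pointing at &other_lock_model is left alone instead of
tripping the assertion.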

I think this untested patch will address the panics.  The bug has been
there for a long time, but some recent restructuring added an assertion
that catches it.

diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
index 9d48aa746f6d..484864b66c1c 100644
--- a/sys/kern/sched_4bsd.c
+++ b/sys/kern/sched_4bsd.c
@@ -1282,9 +1282,10 @@ kick_other_cpu(int pri, int cpuid)
 	}
 #endif /* defined(IPI_PREEMPTION) && defined(PREEMPTION) */
 
-	ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
-	ipi_cpu(cpuid, IPI_AST);
-	return;
+	if (pcpu->pc_curthread->td_lock == &sched_lock) {
+		ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
+		ipi_cpu(cpuid, IPI_AST);
+	}
 }
 #endif /* SMP */
 
@@ -1397,7 +1398,7 @@ sched_add(struct thread *td, int flags)
 
 	cpuid = PCPU_GET(cpuid);
 	if (single_cpu && cpu != cpuid) {
-	        kick_other_cpu(td->td_priority, cpu);
+		kick_other_cpu(td->td_priority, cpu);
 	} else {
 		if (!single_cpu) {
 			tidlemsk = idle_cpus_mask;