Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

John Baldwin jhb at freebsd.org
Mon May 17 16:02:00 UTC 2010


On Friday 14 May 2010 7:59:40 am Terry Kennedy wrote:
> > > The crash was a "page fault while in kernel mode" with the current process
> > > being the interrupt service routine for the bce0 GigE. Things progressed
> > > reasonably until partway through the dump, when the system locked up with a
> > > "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's the
> > > same PID as reported in the main crash.
> >
> > Hmm.  You could try changing the code to not do a nested panic in that
> > case.  You would update subr_turnstile.c to just return if panicstr is
> > not NULL rather than calling panic.  However, there is still a good
> > chance you will end up deadlocking in that case.  I have another patch I
> > can send you next week that prevents blocking on mutexes during a panic
> > which may also help.
> 
>   Ok, I'll be glad to try that.

--- //depot/vendor/freebsd/src/sys/kern/kern_mutex.c	2010/01/23 15:55:14
+++ //depot/projects/smpng/sys/kern/kern_mutex.c	2010/03/10 22:33:24
@@ -348,6 +348,15 @@
 		return;
 	}
 
+	/*
+	 * If we have already panic'd and this is the thread that called
+	 * panic(), then don't block on any mutexes but silently succeed.
+	 * Otherwise, the kernel will deadlock since the scheduler isn't
+	 * going to run the thread that holds the lock we need.
+	 */
+	if (panicstr != NULL && curthread->td_flags & TDF_INPANIC)
+		return;
+
 	lock_profile_obtain_lock_failed(&m->lock_object,
 		    &contested, &waittime);
 	if (LOCK_LOG_TEST(&m->lock_object, opts))
@@ -664,6 +673,15 @@
 	}
 
 	/*
+	 * If we failed to unlock this lock and we are a thread that has
+	 * called panic(), it may be due to the bypass in _mtx_lock_sleep()
+	 * above.  In that case, just return and leave the lock alone to
+	 * avoid changing the state.
+	 */
+	if (panicstr != NULL && curthread->td_flags & TDF_INPANIC)
+		return;
+
+	/*
 	 * We have to lock the chain before the turnstile so this turnstile
 	 * can be removed from the hash list if it is empty.
 	 */
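
To illustrate what the two hunks above are meant to do, here is a rough userland sketch of the same idea. It is not kernel code: every sim_* name is hypothetical and only stands in for panicstr, curthread->td_flags, TDF_INPANIC and the mutex owner field, and "would block" is modelled as a return value rather than an actual sleep. The assumption is that a sleeping thread owns the lock and will never run again once the system has panicked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; none of these exist in the kernel. */
#define SIM_TDF_INPANIC 0x1

struct sim_thread { int td_flags; };
struct sim_mtx { const struct sim_thread *owner; };

static const char *sim_panicstr = NULL;  /* plays the role of panicstr */

enum sim_lock_result { SIM_ACQUIRED, SIM_WOULD_BLOCK, SIM_BYPASSED };
enum sim_unlock_result { SIM_RELEASED, SIM_LEFT_ALONE, SIM_NOT_OWNER };

/* Same test as the patch: the system has panicked and this thread called panic(). */
static int
sim_in_panic(const struct sim_thread *td)
{
	return (sim_panicstr != NULL && (td->td_flags & SIM_TDF_INPANIC) != 0);
}

/* Contested-lock path: the panicking thread "succeeds" without taking the lock. */
static enum sim_lock_result
sim_mtx_lock(struct sim_mtx *m, const struct sim_thread *td)
{
	if (m->owner == NULL) {
		m->owner = td;
		return (SIM_ACQUIRED);
	}
	if (sim_in_panic(td))
		return (SIM_BYPASSED);	/* lock state is left untouched */
	return (SIM_WOULD_BLOCK);	/* the real code would block here */
}

/* Unlock path: a thread that bypassed the lock must not change its state. */
static enum sim_unlock_result
sim_mtx_unlock(struct sim_mtx *m, const struct sim_thread *td)
{
	if (m->owner == td) {
		m->owner = NULL;
		return (SIM_RELEASED);
	}
	if (sim_in_panic(td))
		return (SIM_LEFT_ALONE);
	return (SIM_NOT_OWNER);
}

/* Walk through the crash-dump scenario from this thread. */
static void
sim_demo(void)
{
	struct sim_thread sleeper = { 0 };
	struct sim_thread dumper = { SIM_TDF_INPANIC };
	struct sim_mtx m = { NULL };

	assert(sim_mtx_lock(&m, &sleeper) == SIM_ACQUIRED);
	sim_panicstr = "page fault while in kernel mode";
	assert(sim_mtx_lock(&m, &dumper) == SIM_BYPASSED);
	assert(sim_mtx_unlock(&m, &dumper) == SIM_LEFT_ALONE);
	assert(m.owner == &sleeper);	/* the sleeper still owns the lock */
	sim_panicstr = NULL;
}
```

Calling sim_demo() from a main() runs the scenario. Note the trade-off the sketch makes visible: after the bypass the panicking thread runs without mutual exclusion, which is presumably acceptable only because nothing else is going to be scheduled at that point.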

> > > 3) Is there any way to rig the system to obtain more info if this happens
> > > again? Right now I'm using an embedded remote console server, but I could
> > > switch the system to a serial port if enabling the kernel debugger might help.
> > > But I think that the sleeping thread bit would happen even at the debugger
> > > prompt, wouldn't it?
> >
> > Include DDB and enable the 'debug.trace_on_panic' sysctl knob perhaps.
> 
>   Hmmm. Do you think it will get very far before the sleeping thread business
> locks it up?

It should be able to print the backtrace when it panics at least.
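
For reference, a setup along these lines might look like the fragment below. Treat it as a sketch and check it against the NOTES file and ddb(4) for your release. The assumptions are that you build a custom kernel config, and that you want the trace printed automatically rather than waiting at a debugger prompt.

```
# Kernel configuration file (for example, a copy of GENERIC)
options 	KDB		# kernel debugger framework
options 	DDB		# interactive DDB backend
options 	KDB_TRACE	# print a stack trace on panic by default

# /etc/sysctl.conf, or set at runtime with sysctl(8)
debug.trace_on_panic=1
```

With KDB_TRACE compiled in, the sysctl should already default to on, so the sysctl.conf line mainly matters if the kernel was built without that option.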

-- 
John Baldwin


More information about the freebsd-stable mailing list