Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write
John Baldwin
jhb at freebsd.org
Mon May 17 16:02:00 UTC 2010
On Friday 14 May 2010 7:59:40 am Terry Kennedy wrote:
> > > The crash was a "page fault while in kernel mode" with the current process
> > > being the interrupt service routine for the bce0 GigE. Things progressed
> > > reasonably until partway through the dump, when the system locked up with a
> > > "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's the
> > > same PID as reported in the main crash.
> >
> > Hmm. You could try changing the code to not do a nested panic in that
> > case. You would update subr_turnstile.c to just return if panicstr is
> > not NULL rather than calling panic. However, there is still a good
> > chance you will end up deadlocking in that case. I have another patch I
> > can send you next week that prevents blocking on mutexes during a panic
> > which may also help.
>
> Ok, I'll be glad to try that.
--- //depot/vendor/freebsd/src/sys/kern/kern_mutex.c 2010/01/23 15:55:14
+++ //depot/projects/smpng/sys/kern/kern_mutex.c 2010/03/10 22:33:24
@@ -348,6 +348,15 @@
 		return;
 	}
 
+	/*
+	 * If we have already panic'd and this is the thread that called
+	 * panic(), then don't block on any mutexes but silently succeed.
+	 * Otherwise, the kernel will deadlock since the scheduler isn't
+	 * going to run the thread that holds the lock we need.
+	 */
+	if (panicstr != NULL && curthread->td_flags & TDF_INPANIC)
+		return;
+
 	lock_profile_obtain_lock_failed(&m->lock_object,
 	    &contested, &waittime);
 	if (LOCK_LOG_TEST(&m->lock_object, opts))
@@ -664,6 +673,15 @@
 	}
 
 	/*
+	 * If we failed to unlock this lock and we are a thread that has
+	 * called panic(), it may be due to the bypass in _mtx_lock_sleep()
+	 * above.  In that case, just return and leave the lock alone to
+	 * avoid changing the state.
+	 */
+	if (panicstr != NULL && curthread->td_flags & TDF_INPANIC)
+		return;
+
+	/*
 	 * We have to lock the chain before the turnstile so this turnstile
 	 * can be removed from the hash list if it is empty.
 	 */
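
To make the intent of the two hunks easier to follow, here is a rough
userspace model of the bypass. It is only a sketch and not kernel code:
every model_* name and MODEL_* constant is a hypothetical stand-in
(model_panicstr for panicstr, MODEL_TDF_INPANIC for TDF_INPANIC, a plain
owner pointer for the real mutex word), and "blocking" is reduced to a
return code instead of a turnstile sleep.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only.  Every name below is a hypothetical stand-in for
 * kernel state; none of it is the real kernel implementation.
 */
#define MODEL_TDF_INPANIC	0x1

struct model_thread {
	int td_flags;
};

struct model_mutex {
	const struct model_thread *owner;	/* NULL when unowned */
};

enum model_lock_result {
	MODEL_LOCK_ACQUIRED,	/* lock was free and is now ours */
	MODEL_LOCK_WOULD_BLOCK,	/* a real kernel would block here */
	MODEL_LOCK_BYPASSED	/* panic path: report success, lock untouched */
};

enum model_unlock_result {
	MODEL_UNLOCK_RELEASED,	/* we owned the lock and released it */
	MODEL_UNLOCK_IGNORED,	/* panic path: not ours, left alone */
	MODEL_UNLOCK_NOT_OWNER	/* caller error outside of a panic */
};

/* Stand-in for the kernel's panicstr: non-NULL once a panic has started. */
const char *model_panicstr = NULL;

/* True only for a thread that itself called the (modelled) panic. */
bool
model_in_panic(const struct model_thread *td)
{
	return (model_panicstr != NULL &&
	    (td->td_flags & MODEL_TDF_INPANIC) != 0);
}

/* Record the first panic message and mark the calling thread. */
void
model_panic(struct model_thread *td, const char *msg)
{
	if (model_panicstr == NULL)
		model_panicstr = msg;
	td->td_flags |= MODEL_TDF_INPANIC;
}

/* Models the first hunk: do not block on a held lock during a panic. */
enum model_lock_result
model_lock(struct model_mutex *m, const struct model_thread *td)
{
	if (m->owner == NULL) {
		m->owner = td;
		return (MODEL_LOCK_ACQUIRED);
	}
	if (model_in_panic(td))
		return (MODEL_LOCK_BYPASSED);
	return (MODEL_LOCK_WOULD_BLOCK);
}

/* Models the second hunk: do not disturb a lock we only pretended to take. */
enum model_unlock_result
model_unlock(struct model_mutex *m, const struct model_thread *td)
{
	if (m->owner == td) {
		m->owner = NULL;
		return (MODEL_UNLOCK_RELEASED);
	}
	if (model_in_panic(td))
		return (MODEL_UNLOCK_IGNORED);
	return (MODEL_UNLOCK_NOT_OWNER);
}
```

In this model, a thread that calls model_panic() and then calls
model_lock() on a mutex held by another thread gets MODEL_LOCK_BYPASSED
and the owner field is unchanged; the matching model_unlock() returns
MODEL_UNLOCK_IGNORED and again leaves the owner alone. The unlock-side
check matters because the panicking thread never really acquired the
lock, so a normal release would corrupt state belonging to the real
owner.
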
> > > 3) Is there any way to rig the system to obtain more info if this happens
> > > again? Right now I'm using an embedded remote console server, but I could
> > > switch the system to a serial port if enabling the kernel debugger might help.
> > > But I think that the sleeping thread bit would happen even at the debugger
> > > prompt, wouldn't it?
> >
> > Include DDB and enable the 'trace_on_panic' sysctl knob perhaps.
>
> Hmmm. Do you think it will get very far before the sleeping thread business
> locks it up?
It should be able to print the backtrace when it panics at least.
--
John Baldwin