Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

John Baldwin jhb at FreeBSD.org
Fri May 14 11:53:25 UTC 2010


Terry Kennedy wrote:
>   I'm reposting this over here at the suggestion of the Forums moderator.
> The original post is at http://forums.freebsd.org/showthread.php?t=14163
> 
> Got an interesting crash just now (well, as interesting as a crash on a 
> soon-to-be production system can be).
> 
> This is 8-STABLE/amd64, last cvsup'd early in the morning of May 9th.
> 
> The system didn't complete the crash dump, so it needed a manual reset to get 
> it going again.
> 
> The crash was a "page fault while in kernel mode" with the current process 
> being the interrupt service routine for the bce0 GigE. Things progressed 
> reasonably until partway through the dump, when the system locked up with a 
> "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's the 
> same PID as reported in the main crash.

Hmm.  You could try changing the code to not do a nested panic in that 
case.  You would update subr_turnstile.c to just return if panicstr is 
not NULL rather than calling panic().  However, there is still a good 
chance you will end up deadlocking in that case.  I have another patch I 
can send you next week that prevents blocking on mutexes during a panic, 
which may also help.
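
A rough userland sketch of the "return if panicstr is not NULL" idea is 
below.  It is not the actual subr_turnstile.c code and is not a tested 
kernel patch.  Only the name panicstr and the message text come from this 
thread; struct thread_sketch, panic_sketch(), check_lock_owner() and 
panic_calls are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Non-NULL once a panic is in progress (mirrors the kernel global). */
static const char *panicstr = NULL;

/* Hypothetical counter so the number of panics can be observed. */
static int panic_calls = 0;

/* Hypothetical, simplified stand-in for a kernel thread. */
struct thread_sketch {
	int tid;
	int pid;
	int sleeping;	/* non-zero if the lock owner is asleep */
};

/* Hypothetical stand-in for panic(): record the message, count the call. */
static void
panic_sketch(const char *msg)
{
	if (panicstr == NULL)
		panicstr = msg;
	panic_calls++;
}

/*
 * Check the owner of a non-sleepable lock.
 * Returns 0 if the owner is running, 1 if a sleeping owner was found.
 */
static int
check_lock_owner(const struct thread_sketch *owner)
{
	assert(owner != NULL);

	if (!owner->sleeping)
		return (0);

	/* Already panicking (e.g. writing the dump): just return. */
	if (panicstr != NULL)
		return (1);

	printf("Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
	    owner->tid, owner->pid);
	panic_sketch("sleeping thread");
	return (1);
}
```

With this shape, a sleeping owner found while the dump is being written 
no longer triggers a second panic.  As noted above, the caller may still 
deadlock waiting for a lock that will never be released.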

> 3) Is there any way to rig the system to obtain more info if this happens 
> again? Right now I'm using an embedded remote console server, but I could 
> switch the system to a serial port if enabling the kernel debugger might help. 
> But I think that the sleeping thread bit would happen even at the debugger 
> prompt, wouldn't it? 

Perhaps include DDB in the kernel and enable the 'debug.trace_on_panic' 
sysctl knob.

> I just booted the new kernel and tried this again, and got another crash. The 
> message is identical to the first, except that the instruction pointer changed 
> by 0x10 (presumably due to code differences between the old and new kernels) 
> and it got 6MB further writing the crash dump.
> 
> Since it seems I can reproduce this at will, I'll be glad to either perform 
> additional information-gathering or give a developer access to the box for 
> testing purposes.
> 
> Is it possible to correlate the source line in the kernel with the instruction 
> pointer in the panic? 

If you are booted into the same kernel with the same modules loaded, you 
can probably run 'kgdb' as root and do 'l *<instruction pointer>'.

-- 
John Baldwin



More information about the freebsd-stable mailing list