Crash dump problem - sleeping thread owns a non-sleepable lock
during crash dump write
John Baldwin
jhb at freebsd.org
Mon May 17 16:01:59 UTC 2010
On Friday 14 May 2010 11:42:44 am Matthew Fleming wrote:
> > As an aside, this is a quad-core in one package CPU (an X3363). On both
> > this box and a similar one with an X5470, console messages continue to
> > print out after "the system has been halted - press any key to reboot" -
> > in particular, the shutdown makes a bunch of the "behind the scenes"
> > management stuff like the virtual keyboard and monitor appear. Plugging or
> > unplugging USB devices will go through the whole deal of detecting and
> > making their service available.
>
> Oops, you're right that other CPUs are running.
>
> The stop_cpus() call is only made if kdb is entered. doadump() is called
> out of boot() which comes later. At Isilon we've been running with a patch
> that does stop_cpus() pretty close to the front of panic(9).
>
> As a design decision it seems reasonable to call stop_cpus() early in
> panic(9) simply because most causes for panic mean something unexpected, and
> the sooner the other CPUs aren't running the more likely it is that they don't
> do more damage, leaving the system in a more useful state for dump or {g,d}db
> analysis. This should be done before dump or entering kdb.
>
> I'm cc'ing -current@ since I would like a small discussion of moving the
> stop_cpus() to earlier in panic. If this change is agreeable I can roll up a
> patch and test it on CURRENT. I'm not sure yet how much of the other
> panic-related changes we have made at Isilon would be required.
Right now what happens on x86 is that cpu_reset() actually ends up stopping
the other CPUs. It's good that cpu_reset() does this so that 'reset' from DDB
works. That said, it would probably be a good thing to stop CPUs earlier
during a panic, and even during a normal shutdown. One issue with using
stop_cpus() during shutdown is that it is too severe a stop. That is,
stop_cpus() doesn't release the threads currently running. This could be a
problem during a normal shutdown if a non-boot CPU is running an interrupt
thread needed during shutdown, etc. I think what we really want is a way to
take CPUs offline (which Attilio is working on) and use that during a normal
shutdown. A quick fix might be a way to force CPUs offline where you have a
'shutdown' or 'offline' mask of sorts and teach the scheduler to only return
the idlethread in that case and then send an IPI_PREEMPT to all the CPUs.
That will break any threads pinned or bound to non-boot CPUs, though.
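
A rough userspace sketch of that quick fix, for illustration only. Every
name here (offline_mask, model_choose(), model_offline_all_but(), and so on)
is hypothetical and none of it is actual FreeBSD scheduler code. The
IPI_PREEMPT step is modeled as simply re-running the scheduler choice on each
non-boot CPU, and the run queue is a trivial FIFO.

```c
/* Userspace model only: none of these names are FreeBSD kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NCPU 4
#define MODEL_RUNQ 8

struct model_thread {
	const char *name;
	int pinned_cpu;			/* -1 if not pinned to a CPU */
};

struct model_cpu {
	struct model_thread idle;	/* per-CPU idle thread */
	struct model_thread *runq[MODEL_RUNQ];	/* trivial FIFO run queue */
	size_t runq_len;
};

static uint32_t offline_mask;		/* bit n set: CPU n is going offline */
static struct model_cpu cpus[MODEL_NCPU];

/* Reset the model: no CPU offline, every run queue empty. */
void
model_init(void)
{
	offline_mask = 0;
	for (int cpu = 0; cpu < MODEL_NCPU; cpu++) {
		cpus[cpu].idle.name = "idle";
		cpus[cpu].idle.pinned_cpu = cpu;
		cpus[cpu].runq_len = 0;
	}
}

/* Queue a runnable thread on a given CPU. */
void
model_enqueue(int cpu, struct model_thread *td)
{
	assert(cpu >= 0 && cpu < MODEL_NCPU);
	assert(cpus[cpu].runq_len < MODEL_RUNQ);
	cpus[cpu].runq[cpus[cpu].runq_len++] = td;
}

/* Scheduler hook: a CPU in the offline mask only ever gets its idle thread. */
struct model_thread *
model_choose(int cpu)
{
	struct model_cpu *c = &cpus[cpu];
	struct model_thread *td;

	if ((offline_mask & (1u << cpu)) != 0 || c->runq_len == 0)
		return (&c->idle);
	td = c->runq[0];
	for (size_t i = 1; i < c->runq_len; i++)
		c->runq[i - 1] = c->runq[i];
	c->runq_len--;
	return (td);
}

/*
 * Mark every CPU except boot_cpu offline, then make each of them pick a
 * new thread.  The second loop stands in for sending IPI_PREEMPT.
 */
void
model_offline_all_but(int boot_cpu, struct model_thread *current[MODEL_NCPU])
{
	for (int cpu = 0; cpu < MODEL_NCPU; cpu++)
		if (cpu != boot_cpu)
			offline_mask |= 1u << cpu;
	for (int cpu = 0; cpu < MODEL_NCPU; cpu++)
		if (cpu != boot_cpu)
			current[cpu] = model_choose(cpu);
}

/* Count threads left on the run queues of offline CPUs. */
size_t
model_stranded(void)
{
	size_t n = 0;

	for (int cpu = 0; cpu < MODEL_NCPU; cpu++)
		if ((offline_mask & (1u << cpu)) != 0)
			n += cpus[cpu].runq_len;
	return (n);
}
```

In this model a thread queued on CPU 2 before the mask is set never runs
again, which is the pinned/bound thread problem: model_stranded() counts
exactly those threads.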
--
John Baldwin