new panic in cpu_reset() with WITNESS
attilio at freebsd.org
Thu May 17 14:07:40 UTC 2012
2012/5/17, Andriy Gapon <avg at freebsd.org>:
> on 25/01/2012 23:52 Andriy Gapon said the following:
>> on 24/01/2012 14:32 Gleb Smirnoff said the following:
>>> Yes, now:
>>> lock order reversal:
>>> 1st 0xffffffff80937140 smp rendezvous (smp rendezvous) @
>>> 2nd 0xfffffe0001f5d838 uart_hwmtx (uart_hwmtx) @
>>> panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @
>> OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no
>> It's still intriguing to me why the LOR *doesn't* happen [*] with
>> stop_scheduler_on_panic=0. But I don't see a productive way to pursue
>> investigation further.
> Salve Glebius!
> After your recent nudging I took yet another look at this issue and it seems
> I have some findings.
> For those who might get interested here is a convenience reference to the
> thread on gmane: http://thread.gmane.org/gmane.os.freebsd.current/139307
> A short summary.
> Prerequisites: an SMP x86 system, its kernel is configured with WITNESS &&
> !WITNESS_SKIPSPIN (this is important) and a system uses serial console via
> Then, if stop_scheduler_on_panic is set to zero the system can be rebooted
> a problem. On the other hand, if stop_scheduler_on_panic is enabled, then
> system first runs into a LOR when calling printf() in cpu_reset() and then
> it runs
> into a panic when printf is recursively called from witness(9) to report the
> The panic happens because of the recursion on cnputs_mtx, which should
> that cnputs() output is not intermingled and which is not flagged to
> There are two things about this report that greatly confused and puzzled
> 1. stop_scheduler_on_panic variable is used _only_ in panic(9). So how
> could it
> be possible that changing its value affects behavior of the system when
> is not called?!
> 2. The LOR in question happens between "smp rendezvous" (smp_ipi_mtx) and
> "uart_hwmtx" (sc_hwmtx_s in uart core) spin locks. The order of these locks
> actually predefined in witness order_lists such that uart_hwmtx must come
> smp_ipi_mtx. But in the reboot path we first take smp_ipi_mtx in
> shutdown_reset(), then we call cpu_reset, then it calls printf and from
> there we
> get to uart_hwmtx for serial console output. So the order between these
> is always violated in the x86 SMP reboot path.
> How come witness(9) doesn't _always_ detect this LOR?
> How come it didn't detect this LOR before any "stop scheduler" commits?!
> [Spoiler alert :-)]
> Turns out that the two puzzles above are closely related.
> Let's first list all the things that change when stop_scheduler_on_panic is
> enabled and a panic happens:
> - other CPUs are stopped (forced to spin)
> - interrupts on current CPU are disabled
> - by virtue of the above the current thread should be the only thread
> (unless it executes a voluntary switch)
> - all locks are "busted", they are completely ignored / bypassed
> - by virtue of the above no lock invariants and witness checks are
> performed, so
> no lock order checking, no recursion checking, etc
> So, what I observe is this: when stop_scheduler_on_panic is disabled, the
> LOR is
> actually detected as it should be. witness(9) works properly here. Once
> the LOR
> is detected witness(9) wants to report it using printf(9). That's where we
> into the cnputs_mtx recursion panic. It's all exactly as with
> stop_scheduler_on_panic enabled. Then panic(9) wants to report the panic
> printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags()
> locks recursion again (this is independent of witness(9)) and calls
> again. Then panic(9) wants to report the panic using printf(9)...
> I assume that when the stack is exhausted we run into a double fault and
> dblfault_handler wants to print something again... Likely we eventually run
> a triple fault which resets the machine. Although, I recall some old
> reports of
> machines hanging during reboot in a place suspiciously close to where the
> described ordeal happens...
> But if the machine doesn't hang we get a full appearance of the reset
> happening (modulo some last messages missing).
> With stop_scheduler_on_panic enabled and all the locks being ignored we, of
> course, do not run into any secondary lock recursions and resulting panics.
> the system is able to at least panic "gracefully" (still we should have
> reported the LOR gracefully, no panic is required).
> Some obvious conclusions:
> - the "smp rendezvous" and "uart_hwmtx" LOR is real and it is the true cause
> the problem; it should be fixed one way or other - either by correcting
> order_lists or by avoiding the LOR in shutdown_reset/cpu_reset;
> - because witness(9) uses printf(9) to report problems, it is very fragile
> to use
> witness with any locks that can be acquired under printf(9)
> - stop_scheduler_on_panic just uncovers the true bug
Thanks for working on this.
My thoughts on the matter:
- We should not printf() in cpu_reset() which is supposed to be a very
low level primitives, usable by any context. Acquiring console
spinlocks there is not really a wise choice.
- I'd like to see witness being based on an alternative approach, but
we may find something that doesn't wrap up (thus a bufring ala KTR is
not appropriate) and can work also in SMP context (thus a lockfree
algorithm is not appropriate). I don't have any good solution for this
on the top of my idea, but of course, relying on a path acquiring
locks for a lock protector is not a great idea. The only other thing
we may do is marking locks involved in the printf() path with
NO_WITNESS, but that of course won't solve the deadlock problem
- On a related note, I once had a patch that was having BSP recovering
from deadlock in the reset code and panic/print informations, under
DIAGNOSTIC. I may find it again and post here.
Definitively, I think that we need to make cpu_reset() locks agnostic
and if you agree I can make a patch for that.
Peace can only be achieved by understanding - A. Einstein
More information about the freebsd-current