svn commit: r306346 - head/sys/kern

Thu Oct 6 03:31:27 UTC 2016

On Wed, 5 Oct 2016, Slawa Olhovchenkov wrote:

> On Wed, Oct 05, 2016 at 11:19:10AM +1100, Bruce Evans wrote:
>
>> On Tue, 4 Oct 2016, Gleb Smirnoff wrote:
>>
>>> On Mon, Sep 26, 2016 at 03:30:30PM +0000, Eric van Gyzen wrote:
>>> E> ...
>>> E> Modified: head/sys/kern/kern_mutex.c
>>> E> ==============================================================================
>>> E> --- head/sys/kern/kern_mutex.c	Mon Sep 26 15:03:31 2016	(r306345)
>>> E> +++ head/sys/kern/kern_mutex.c	Mon Sep 26 15:30:30 2016	(r306346)
>>> E> @@ -924,7 +924,7 @@ __mtx_assert(const volatile uintptr_t *c
>>> E>  {
>>> E>  	const struct mtx *m;
>>> E>
>>> E> -	if (panicstr != NULL || dumping)
>>> E> +	if (panicstr != NULL || dumping || SCHEDULER_STOPPED())
>>> E>  		return;
>>>
>>> I wonder if this whole disjunction can be reduced to just SCHEDULER_STOPPED()?
>>> A non-NULL panicstr and dumping both imply that the scheduler is stopped.
>>
>> 'dumping' doesn't imply SCHEDULER_STOPPED().
>>
>> Checking 'dumping' here seems to be just an old bug.  It just breaks
>> __mtx_assert(), while all other mutex operations work normally for dumping
>> without panicking.
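
To make the quoted point concrete, here is a minimal user-space sketch
(not kernel code).  The struct and function names are made up for
illustration, and SCHEDULER_STOPPED() is modeled as a plain boolean
flag, which is an assumption and not its real definition.  It only
compares the condition from r306346 with the proposed reduction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel state named in the diff. */
struct kstate {
	const char *panicstr;	/* non-NULL once panic() has been entered */
	bool dumping;		/* a dump is in progress */
	bool sched_stopped;	/* models SCHEDULER_STOPPED() (assumption) */
};

/* Condition after r306346: skip the assertion if any of the three holds. */
static bool
skip_r306346(const struct kstate *s)
{
	return (s->panicstr != NULL || s->dumping || s->sched_stopped);
}

/* Proposed reduction: rely on the scheduler-stopped state alone. */
static bool
skip_reduced(const struct kstate *s)
{
	return (s->sched_stopped);
}

/* Dump in progress, no panic, scheduler still running: the two differ. */
static void
dump_without_panic_example(void)
{
	struct kstate s = { NULL, true, false };

	assert(skip_r306346(&s));
	assert(!skip_reduced(&s));
}
```

In this model, a dump without a panic is the only case where the two
checks disagree: the reduced check would leave the assertion active
there, like the other mutex operations.
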
>
> [...]
>
> Is this related to 11.0 halting (not rebooting) after ~^B and `panic`?

There might be related problems, but I don't see any here.

> What I see on serial console:
> =====
> db> panic
> panic: from debugger

I wouldn't trust panic from the debugger, but it is safer than dump
from the debugger (both are ddb commands, but this is another bug).

> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0xffffffff8031fadb = db_trace_self_wrapper+0x2b/frame 0xfffffe1f9e198120
> vpanic() at 0xffffffff804a0302 = vpanic+0x182/frame 0xfffffe1f9e1981a0
> panic() at 0xffffffff804a0383 = panic+0x43/frame 0xfffffe1f9e198200
> db_panic() at 0xffffffff8031d987 = db_panic+0x17/frame 0xfffffe1f9e198210
> db_command() at 0xffffffff8031d019 = db_command+0x299/frame 0xfffffe1f9e1982e0
> db_command_loop() at 0xffffffff8031cd74 = db_command_loop+0x64/frame 0xfffffe1f9e1982f0
> db_trap() at 0xffffffff8031fc1b = db_trap+0xdb/frame 0xfffffe1f9e198380
> kdb_trap() at 0xffffffff804dd8c3 = kdb_trap+0x193/frame 0xfffffe1f9e198410
> trap() at 0xffffffff806e3065 = trap+0x255/frame 0xfffffe1f9e198620
> calltrap() at 0xffffffff806cafd1 = calltrap+0x8/frame 0xfffffe1f9e198620
> --- trap 0x3, rip = 0xffffffff804dd11e, rsp = 0xfffffe1f9e1986f0, rbp = 0xfffffe1f9e198710 ---
> kdb_alt_break_internal() at 0xffffffff804dd11e = kdb_alt_break_internal+0x18e/frame 0xfffffe1f9e198710
> kdb_alt_break() at 0xffffffff804dcf8b = kdb_alt_break+0xb/frame 0xfffffe1f9e198720
> uart_intr_rxready() at 0xffffffff803e38a8 = uart_intr_rxready+0x98/frame 0xfffffe1f9e198750
> uart_intr() at 0xffffffff803e4621 = uart_intr+0x121/frame 0xfffffe1f9e198790
> intr_event_handle() at 0xffffffff8046c74b = intr_event_handle+0x9b/frame 0xfffffe1f9e1987e0
> intr_execute_handlers() at 0xffffffff8076d2d8 = intr_execute_handlers+0x48/frame 0xfffffe1f9e198810
> lapic_handle_intr() at 0xffffffff8077163f = lapic_handle_intr+0x3f/frame 0xfffffe1f9e198830
> Xapic_isr1() at 0xffffffff806cb6b7 = Xapic_isr1+0xb7/frame 0xfffffe1f9e198830
> --- interrupt, rip = 0xffffffff8032fedf, rsp = 0xfffffe1f9e198900, rbp = 0xfffffe1f9e198940 ---
> acpi_cpu_idle() at 0xffffffff8032fedf = acpi_cpu_idle+0x2af/frame 0xfffffe1f9e198940
> cpu_idle_acpi() at 0xffffffff8076ad1f = cpu_idle_acpi+0x3f/frame 0xfffffe1f9e198960
> cpu_idle() at 0xffffffff8076adc5 = cpu_idle+0x95/frame 0xfffffe1f9e198980
> sched_idletd() at 0xffffffff804cbbe5 = sched_idletd+0x495/frame 0xfffffe1f9e198a70
> fork_exit() at 0xffffffff8046a211 = fork_exit+0x71/frame 0xfffffe1f9e198ab0
> fork_trampoline() at 0xffffffff806cb50e = fork_trampoline+0xe/frame 0xfffffe1f9e198ab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---

This looks like a normal kdb entry followed by a not-so-normal panic from
ddb, but it shows no problems.

> Uptime: 1d4h53m19s
> Dumping 12148 out of 131020 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> Dump complete
> mps2: Sending StopUnit: path (xpt0:mps2:0:14:ffffffff):  handle 12
> mps2: Incrementing SSU count
> mps2: Sending StopUnit: path (xpt0:mps2:0:18:ffffffff):  handle 9
> mps2: Incrementing SSU count
> =====
>
> This is a normal reboot (by /sbin/reboot):

Is the above just a hung dump from reboot, before going near ddb?  That
case should work, but perhaps it needs to be more careful about waiting
for the other CPUs.  Just stopping them is no good since it gives an
even more fragile environment, as panicking or entering ddb does.

>
> ===
> Sending StopUnit: path (xpt0:mps2:0:14:ffffffff):  handle 13
> mps2: Incrementing SSU count
> mps2: Sending StopUnit: path (xpt0:mps2:0:18:ffffffff):  handle 9
> mps2: Incrementing SSU count
> mps2: Decrementing SSU count.
> mps2: Completing stop unit for (xpt0:mps2:0:18:ffffffff):
> mps2: Decrementing SSU count.
> mps2: Completing stop unit for (xpt0:mps2:0:14:ffffffff):
> ===
>
> ====
> mps2: lagg0: link state changed to DOWN
> Sending StopUnit: path (xpt0:mps2:0:14:ffffffff):  handle 12
> mps2: Incrementing SSU count
> mps2: Sending StopUnit: path (xpt0:mps2:0:18:ffffffff):  handle 9
> mps2: Incrementing SSU count
> mps2: Decrementing SSU count.
> mps2: Completing stop unit for (xpt0:mps2:0:18:ffffffff):
> mps2: Decrementing SSU count.
> mps2: Completing stop unit for (xpt0:mps2:0:14:ffffffff):
> ====

Bruce