mps(4) blocks panic-reboot

Kenneth D. Merry ken at FreeBSD.ORG
Fri Jun 2 15:37:15 UTC 2017


On Fri, Jun 02, 2017 at 14:30:44 +0200, Harry Schmalzbauer wrote:
> Regarding Harry Schmalzbauer's message of 01.06.2017 21:03 (localtime):
> > Regarding Stephen Mcconnell's message of 01.06.2017 19:36 (localtime):
> >> Can you try the attached patch and let me know how it goes? I didn't test
> >> it, but since you know how, it might be easier this way. This was diff'd
> >> from the latest mps files in stable/11, which I recently updated (today).
> > Your diff is doing very well on r319447:
> >
> >
> …
> > mps0: Sending StopUnit: path (xpt0:mps0:0:6:ffffffff):  handle 13
> > mps0: Completing stop unit for (xpt0:mps0:0:6:ffffffff):
> >
> > And there followed an immediate reset :-)
> 
> There's one new problem: shutting down now leads to what is probably the
> last possible panic:
> 
> kernel trap 12 with interrupts disabled
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x20
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff805f43ec
> stack pointer           = 0x28:0xfffffe03bc9c3730
> frame pointer           = 0x28:0xfffffe03bc9c3750
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 1 (init)
> trap number             = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff805df4f7 at kdb_backtrace+0x67
> #1 0xffffffff8059df96 at vpanic+0x186
> #2 0xffffffff8059de03 at panic+0x43
> #3 0xffffffff808a1892 at trap_fatal+0x322
> #4 0xffffffff808a18e9 at trap_pfault+0x49
> #5 0xffffffff808a1126 at trap+0x286
> #6 0xffffffff80887401 at calltrap+0x8
> #7 0xffffffff805800f2 at __mtx_unlock_sleep+0x72
> #8 0xffffffff8029a7dc at xpt_polled_action+0x31c
> #9 0xffffffff80416c2b at mpssas_ir_shutdown+0x51b
> #10 0xffffffff8059db9a at kern_reboot+0x49a
> #11 0xffffffff8059d6f8 at sys_reboot+0x458
> #12 0xffffffff808a23f4 at amd64_syscall+0x6c4
> #13 0xffffffff808876eb at Xfast_syscall+0xfb
> 
> (kgdb) list *0xffffffff805f43ec                   
> 0xffffffff805f43ec is in turnstile_broadcast (/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_turnstile.c:837).
> 832
> 833             /*
> 834              * Transfer the blocked list to the pending list.
> 835              */
> 836             mtx_lock_spin(&td_contested_lock);
> 837             TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], td_lockq);
> 838             mtx_unlock_spin(&td_contested_lock);
> 839
> 840             /*
> 841              * Give a turnstile to each thread.  The last thread gets
> 
> I haven't looked at the code at all and only very briefly looked at the
> diff, just out of curiosity, like pigs staring at clockworks ;-)
> 
> But I hope this report helps, at least.

Thanks for testing it!

My guess is that the problem is that xpt_polled_action() releases the
device mutex, but mpssas_SSU_to_SATA_devices() isn't acquiring the mutex
before calling it, so the unlock operates on a mutex the caller never held.

You could try putting the following around the call to xpt_polled_action():

	mtx_lock(xpt_path_mtx(ccb->ccb_h.path));
	xpt_polled_action(ccb);
	mtx_unlock(xpt_path_mtx(ccb->ccb_h.path));
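
To illustrate the locking expectation outside the kernel, here is a minimal
userland sketch using a POSIX error-checking mutex. It is only a model, not
the CAM code: dev_mtx, dev_mtx_init(), polled_action_model(),
call_without_lock() and call_with_lock() are hypothetical names, and the
model assumes (as guessed above) that the polled routine drops the device
mutex on entry and takes it again before returning.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the per-device mutex; not the CAM API. */
static pthread_mutex_t dev_mtx;

/* Create dev_mtx as an error-checking mutex so a bad unlock is reported. */
int
dev_mtx_init(void)
{
	pthread_mutexattr_t attr;
	int error;

	error = pthread_mutexattr_init(&attr);
	if (error != 0)
		return (error);
	error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (error == 0)
		error = pthread_mutex_init(&dev_mtx, &attr);
	pthread_mutexattr_destroy(&attr);
	return (error);
}

/*
 * Model of a routine that expects the caller to hold dev_mtx: it drops
 * the lock while polling and reacquires it before returning.
 */
static int
polled_action_model(void)
{
	int error;

	error = pthread_mutex_unlock(&dev_mtx);
	if (error != 0)
		return (error);	/* EPERM: the caller did not hold the lock */
	/* ... the hardware would be polled here, lock dropped ... */
	return (pthread_mutex_lock(&dev_mtx));
}

/* The failing pattern: call the routine without taking the lock first. */
int
call_without_lock(void)
{
	return (polled_action_model());
}

/* The suggested pattern: lock, call, unlock. */
int
call_with_lock(void)
{
	int error, unlock_error;

	error = pthread_mutex_lock(&dev_mtx);
	if (error != 0)
		return (error);
	error = polled_action_model();
	unlock_error = pthread_mutex_unlock(&dev_mtx);
	return (error != 0 ? error : unlock_error);
}
```

After dev_mtx_init(), call_without_lock() returns EPERM while
call_with_lock() returns 0. In the kernel there is no such error return for
a default mutex; unlocking a mutex that was never acquired ends up in code
like the turnstile path in the backtrace above.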

See if that fixes things.  One other thing to put in there -- after the
if (target->stop_at_shutdown) { } statement, but still inside the for
loop, add these two lines:

	xpt_free_path(ccb->ccb_h.path);
	xpt_free_ccb(ccb);
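
The placement matters because the path and CCB would otherwise leak once
per loop iteration. A rough userland model of that shape follows. It
assumes the patch allocates a CCB and path for every target before the
stop_at_shutdown check, which I have not verified here; struct model_ccb,
model_ccb_alloc(), model_free_path(), model_free_ccb() and
shutdown_loop_model() are hypothetical stand-ins, not CAM functions.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CCB that owns a separately allocated path. */
struct model_ccb {
	char *path;
};

/* Number of allocations not yet freed; 0 means nothing leaked. */
static int live_allocs;

static struct model_ccb *
model_ccb_alloc(void)
{
	struct model_ccb *ccb;

	ccb = malloc(sizeof(*ccb));
	if (ccb == NULL)
		return (NULL);
	ccb->path = malloc(16);
	if (ccb->path == NULL) {
		free(ccb);
		return (NULL);
	}
	live_allocs += 2;	/* one for the CCB, one for its path */
	return (ccb);
}

static void
model_free_path(struct model_ccb *ccb)
{
	free(ccb->path);
	ccb->path = NULL;
	live_allocs--;
}

static void
model_free_ccb(struct model_ccb *ccb)
{
	free(ccb);
	live_allocs--;
}

/* Walk the targets; return the number of allocations still outstanding. */
int
shutdown_loop_model(const bool *stop_at_shutdown, int ntargets)
{
	for (int i = 0; i < ntargets; i++) {
		struct model_ccb *ccb = model_ccb_alloc();

		if (ccb == NULL)
			break;
		if (stop_at_shutdown[i]) {
			/* the stop-unit request would be issued here */
		}
		/* Free after the if, but still inside the loop. */
		model_free_path(ccb);
		model_free_ccb(ccb);
	}
	return (live_allocs);
}
```

With the two frees inside the loop the function returns 0 for any mix of
targets; with them removed, the count grows by two per target.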

Ken
-- 
Kenneth Merry
ken at FreeBSD.ORG


More information about the freebsd-scsi mailing list