US-III crashes on current

Wed Apr 22 19:33:05 UTC 2009

On Tue, Apr 21, 2009 at 11:11:44PM +0200, Florian Smeets wrote:
> On 21.04.09 23:03, Marius Strobl wrote:
> >On Tue, Apr 21, 2009 at 09:15:32PM +0200, Florian Smeets wrote:
> >>On 21.04.09 20:58, Marius Strobl wrote:
> >>>On Tue, Apr 21, 2009 at 01:45:27AM +0200, Florian Smeets wrote:
> >>>>
> >>>>Yes, I can still reproduce this on every shutdown. Tried with r191337.
> >>>>Trace is still the same.
> >>>>
> >>>
> >>>Could you please run gdb(1) on the corresponding kernel.debug
> >>>and report the output of the following commands?
> >>>l *(0xc034c96c)
> >>>l *(callout_lock+0x40)
> >>>Change as needed if the addresses differ from the above
> >>>backtrace. Hrm, the one you reported to scsi@ actually
> >>>is a bit different:
> >>>>-- fast data access mmu miss tar=0x1454156000 %o7=0xc040e7a4 --
> >>>>_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x5c
> >>>>callout_lock() at callout_lock+0x50
> >>>
> >>>In that case please additionally get the output of
> >>>l *(_mtx_lock_spin_flags+0x5c)
> >>>
> >>
> >>OK, to get this straight, this is the trace I'm talking about.
> >>
> >>Uptime: 19h19m49s
> >>panic: trap: fast data access mmu miss
> >>cpuid = 0
> >>KDB: enter: panic
> >>[thread pid 97473 tid 100179 ]
> >>Stopped at      kdb_enter+0x80: ta              %xcc, 1
> >>db>  where
> >>Tracing pid 97473 tid 100179 td 0xfffff80006dfc370
> >>panic() at panic+0x20c
> >>trap() at trap+0x4d0
> >>-- fast data access mmu miss tar=0x20007e000 %o7=0xc03f70a4 --
> >>callout_lock() at callout_lock+0x20
> >>untimeout() at untimeout+0xc
> >>isp_done() at isp_done+0x140
> >>isp_intr() at isp_intr+0x3eb8
> >>isp_poll() at isp_poll+0x38
> >>xpt_polled_action() at xpt_polled_action+0xc8
> >>dashutdown() at dashutdown+0x16c
> >>boot() at boot+0x850
> >>reboot() at reboot+0x64
> >>syscall() at syscall+0x2b4
> >>-- syscall (55, FreeBSD ELF64, reboot) %o7=0x1013e4 --
> >>userland() at 0x40564948
> >>user trace: trap %o7=0x1013e4
> >>pc 0x40564948, sp 0x7fdffffe201
> >>pc 0x100df0, sp 0x7fdffffe2c1
> >>pc 0x40206954, sp 0x7fdffffe381
> >>done
> >>
> >>(gdb) l *(0xc03f70a4)
> >>0xc03f70a4 is in spinlock_exit (/usr/src/sys/sparc64/sparc64/machdep.c:232).
> >>227	spinlock_exit(void)
> >>228	{
> >>229		struct thread *td;
> >>230	
> >>231		td = curthread;
> >>232		critical_exit();
> >>233		td->td_md.md_spinlock_count--;
> >>234		if (td->td_md.md_spinlock_count == 0)
> >>235			wrpr(pil, td->td_md.md_saved_pil, 0);
> >>236	}
> >
> >Hrm, this suggests that curthread or the per-CPU data went
> >missing at that point, which leaves me clueless at the
> >moment. Do you see this problem since installing FreeBSD
> >on that machine or has it developed later? If the latter,
> >can you pinpoint when it started? What kind of access for
> >debugging could you provide?
> >
> 
> Honestly, I don't know for sure. I don't know if it already existed with
> the first USIII patch you sent me. But I am 100% certain that I was
> already seeing this when we were debugging the STICK thing, which was
> only a few days after I installed the machine (with your initial patch).
> 
> I could provide access to a FreeBSD box from which you could telnet to
> the RSC card of the machine.
> 

Ok, please arrange it.

Marius