"sched_lock held too long" panic + trace

Fri Feb 24 08:10:48 PST 2006

No, you're right. The easiest fix is to statically allocate the pcpu pages.
 -Kip

On 2/24/06, John Baldwin <jhb at freebsd.org> wrote:
> On Thursday 23 February 2006 06:07 pm, Kris Kennaway wrote:
> > On Thu, Feb 23, 2006 at 03:47:16PM -0500, Kris Kennaway wrote:
> > > One of my e4500s has started panicking regularly under load because
> > > sched_lock was held for > 5 seconds.  Since on sparc64 it always
> > > deadlocks after this panic instead of entering DDB, I wasn't able to
> > > track down the cause.  Instead, I changed the panic to first
> > > DELAY(1000000*PCPU_GET(cpuid)) (so that different CPUs don't overlap
> > > the printfs) and then kdb_backtrace().
> > >
> > > Doing so I obtained the following trace (still a bit corrupted, but
> > > hopefully more useful).
> > >
> > > KDB: stack backtrace:
> > > hardclock_cpu() at hardclock_cpu+0x6c
> > > tick_hardclock() at tick_hardclock+0xc4
> > > -- interrupt level=0xe pil=0 %o7=0xc0190a98 --
> > > _mtx_lock_spin() at _mtx_lock_spin+0xf4
> > > tlb_page_demap() at tlb_page_demap+0xa0
> > > pmap_zero_page_idle() at pmap_zero_page_idle+0xdc
> > > vm_page_zero_idle() at vm_page_zero_idle+0x108
> > > vm_pagezero() at vm_pagezero+0x4c
> > > fork_exit() at fork_exit+0x94
> > > fork_trampoline() at fork_trampoline+0x8
> >
> > Witness seems to have caught this:
> >
> > panic: blockable sleep lock (sleep mutex) system map @ vm/vm_map.c:2995
> > db> wh
> > Tracing pid 1267 tid 100248 td 0xfffff800612a0540
> > panic() at panic+0x164
> > witness_checkorder() at witness_checkorder+0xc8
> > _mtx_lock_flags() at _mtx_lock_flags+0x80
> > _vm_map_lock_read() at _vm_map_lock_read+0x3c
> > vm_map_lookup() at vm_map_lookup+0x1c
> > vm_fault() at vm_fault+0x68
> > trap_pfault() at trap_pfault+0x1a8
> > trap() at trap+0x2b0
> > -- fast data access mmu miss tar=0xe819c000 %o7=0xc031d204 --
> > cpu_ipi_selected() at cpu_ipi_selected+0x2c
> > tlb_page_demap() at tlb_page_demap+0x74
> > pmap_copy_page() at pmap_copy_page+0x39c
> > vm_fault() at vm_fault+0xe5c
> > trap_pfault() at trap_pfault+0x134
> > trap() at trap+0xa0
> > -- data access protection tar=0x4065c524 sfar=0x4065d314 sfsr=0x800005 %o7=0x40350c94 --
>
> This is just a bug.  You shouldn't get a pagefault in cpu_ipi_selected().
>
> --
> John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve" = http://www.FreeBSD.org
> _______________________________________________
> freebsd-sparc64 at freebsd.org mailing list