'make -j16 universe' gives SIReset
Marius Strobl
marius at alchemy.franken.de
Wed Jun 15 23:12:32 UTC 2011
On Wed, Jun 15, 2011 at 07:49:59AM +1000, Peter Jeremy wrote:
> On 2011-Jun-14 09:51:44 +1000, Peter Jeremy <peter at server.vk2pj.dyndns.org> wrote:
> >I'm building r223035 with DDB & KDB and will see how that goes.
>
> I had another try with WITNESS & INVARIANTS and got a different panic:
> panic: blockable sleep lock (sleep mutex) system map @ /usr/src/sys/vm/vm_map.c:3651
> cpuid = 13
> KDB: stack backtrace:
> panic() at panic+0x1c8
> witness_checkorder() at witness_checkorder+0x108
> _mtx_lock_flags() at _mtx_lock_flags+0x110
> _vm_map_lock_read() at _vm_map_lock_read+0x1c
> vm_map_lookup() at vm_map_lookup+0x4c
> vm_fault_hold() at vm_fault_hold+0x94
> vm_fault() at vm_fault+0x14
> trap_pfault() at trap_pfault+0x338
> trap() at trap+0x3a8
> -- fast data access mmu miss tar=0x2000 %o7=0xc055e038 --
> intr_vector_stray() at intr_vector_stray+0x8
> sched_switch() at sched_switch+0x290
> mi_switch() at mi_switch+0x2a8
> sleepq_switch() at sleepq_switch+0x1cc
> sleepq_catch_signals() at sleepq_catch_signals+0x130
> sleepq_timedwait_sig() at sleepq_timedwait_sig+0x8
> _cv_timedwait_sig() at _cv_timedwait_sig+0x344
> seltdwait() at seltdwait+0x74
> kern_select() at kern_select+0x618
> select() at select+0x44
> syscallenter() at syscallenter+0x270
> syscall() at syscall+0x74
> -- syscall (93, FreeBSD ELF64, select) %o7=0x1099dc --
> userland() at 0x14bde8
> user trace: trap %o7=0x1099dc
> pc 0x14bde8, sp 0x7fdffffc8d1
> pc 0x26c800, sp 0x26c800
> done
> KDB: enter: panic
>
> Unfortunately, still no DDB - just a hang
This backtrace shows two things that just shouldn't happen hardware-wise:
a) The CPU issues a stray interrupt vector. This would explain the SIRs
you were seeing without the patch, which tries to make these non-fatal.
b) The CPU faults on an address which is covered by a locked TLB slot.
The funny thing is that the CPU then actually still manages to panic; if
something like b) occurs I'd expect it to be in a totally unusable state.
I'm not sure what to do about these. It still looks like broken hardware
or a silicon bug to me, but at least the public errata don't mention
anything like that, and the OpenSolaris source doesn't seem to work
around anything like this in an obvious way either. The only thing I
can think of is to try whether just ignoring the stray interrupt vectors
with the below patch avoids any further issues. You'll need to revert
sparc64_intr_vector_stray.diff for that or at least the exception.S
part.
Marius
Index: exception.S
===================================================================
--- exception.S (revision 223042)
+++ exception.S (working copy)
@@ -578,7 +578,7 @@
 	andcc	%g1, IRSR_BUSY, %g0
 	bnz,a,pt %xcc, intr_vector
 	nop
-	sir
+	retry
 	.align	32
 	.endm