'make -j16 universe' gives SIReset
Peter Jeremy
peterjeremy at acm.org
Tue Oct 18 04:26:58 UTC 2011
On 2011-Oct-13 20:42:25 +0200, Marius Strobl <marius at alchemy.franken.de> wrote:
>On Thu, Oct 13, 2011 at 02:56:48PM +1100, Peter Jeremy wrote:
>> Unfortunately, I can't get a crashdump because dumpon(8) doesn't like
>> my Solaris swap partitions:
>> GEOM_PART: Partition 'da0b' not suitable for kernel dumps (wrong type?)
>> GEOM_PART: Partition 'da6b' not suitable for kernel dumps (wrong type?)
>> No suitable dump device was found.
>>
>> I did write a patch for that but took it out during some earlier
>> testing to get back to stock code. It looks like I didn't PR it
>> either so I will do that when I get some time.
I've resurrected that patch (and will send-pr it later).
>Hrm, this backtrace seems impossible as vmtotal() explicitly locks
>the object before calling vm_object_clear_flag(). A crash dump of
>this panic really would be interesting.
I've reproduced the same panic and got a crashdump (2 hours for
the dump and another hour for the savecore):
VNASSERT failed
panic: mutex vm object not owned at /usr/src/sys/vm/vm_object.c:281
cpuid = 7
#10 0x00000000c04ffbf4 in panic (fmt=0xc0a906d0 "mutex %s not owned at %s:%d") at /usr/src/sys/kern/kern_shutdown.c:599
#11 0x00000000c04eb1b8 in _mtx_assert (m=0xfffff8b29d750ca8, what=0x4, file=0xc0ac6c00 "/usr/src/sys/vm/vm_object.c", line=0x119) at /usr/src/sys/kern/kern_mutex.c:706
#12 0x00000000c07f4b0c in vm_object_clear_flag (object=0xfffff8b29d750ca8, bits=0x4) at /usr/src/sys/vm/vm_object.c:281
#13 0x00000000c07f1dac in vmtotal (oidp=0xc0ba9be8, arg1=0x0, arg2=0x30, req=0xef8a54e0) at /usr/src/sys/vm/vm_meter.c:121
#14 0x00000000c050c13c in sysctl_root (oidp=Variable "oidp" is not available.) at /usr/src/sys/kern/kern_sysctl.c:1509
#15 0x00000000c050c434 in userland_sysctl (td=0x0, name=0xef8a5628, namelen=0x2, old=0x0, oldlenp=Variable "oldlenp" is not available.) at /usr/src/sys/kern/kern_sysctl.c:1619
#16 0x00000000c050c858 in sys___sysctl (td=0xfffff8a2e3ef48c0, uap=0xef8a5768) at /usr/src/sys/kern/kern_sysctl.c:1545
#17 0x00000000c086ba00 in syscall (tf=Variable "tf" is not available.) at subr_syscall.c:131
#18 0x00000000c0098e60 in tl0_intr ()
(kgdb) p *object
$1 = {
mtx = {
lock_object = {
lo_name = 0xc0a9a308 "vm object",
lo_flags = 0x1430000,
lo_data = 0x0,
lo_witness = 0xfff85180
},
mtx_lock = 0xfffff8a0112d75e0
},
...
}
(kgdb) p *object->mtx->lock_object->lo_witness
$3 = {
w_name = "standard object", '\0' <repeats 48 times>,
w_index = 0xa3,
w_class = 0xc0b82e88,
w_list = {
stqe_next = 0xfff85100
},
w_typelist = {
stqe_next = 0xfff85100
},
w_hash_next = 0x0,
w_file = 0xc0ac6388 "/usr/src/sys/vm/vm_meter.c",
w_line = 0x71,
w_refcount = 0x53718,
w_num_ancestors = 0xe,
w_num_descendants = 0xe,
w_ddb_level = 0x0,
w_displayed = 0x1,
w_reversed = 0x0
}
(kgdb) p vm_object_list_mtx
$4 = {
lock_object = {
lo_name = 0xc0ac6e30 "vm object_list",
lo_flags = 0x1030000,
lo_data = 0x0,
lo_witness = 0xfff81d80
},
mtx_lock = 0xfffff8a2e3ef48c2
}
(kgdb) p *vm_object_list_mtx.lock_object.lo_witness
$6 = {
w_name = "vm object_list", '\0' <repeats 49 times>,
w_index = 0x3b,
w_class = 0xc0b82e88,
w_list = {
stqe_next = 0xfff81d00
},
w_typelist = {
stqe_next = 0xfff81d00
},
w_hash_next = 0x0,
w_file = 0xc0ac6388 "/usr/src/sys/vm/vm_meter.c",
w_line = 0x6f,
w_refcount = 0x1,
w_num_ancestors = 0xf,
w_num_descendants = 0x0,
w_ddb_level = 0x0,
w_displayed = 0x1,
w_reversed = 0x0
}
The witness information looks correct, but I notice that vm_object_list_mtx
is owned by a different thread from the one that owns the vm_object whose
lock assertion triggered the panic.
The panic says it occurred on CPU 7:
(kgdb) p cpuid_to_pcpu[7]->pc_curthread
$21 = (struct thread *) 0xfffff8a2e3ef48c0
which matches the owner of vm_object_list_mtx (its mtx_lock value,
0xfffff8a2e3ef48c2, with the low-order flag bits masked off).
My initial thought was a locking glitch, but looking through cpuid_to_pcpu[],
the owner recorded in the vm_object's lock doesn't match the current thread
on any CPU:
(kgdb) p cpuid_to_pcpu[0]->pc_curthread
$14 = (struct thread *) 0xfffff8a2e3008000
(kgdb) p cpuid_to_pcpu[1]->pc_curthread
$15 = (struct thread *) 0xfffff8a2aae7c8c0
(kgdb) p cpuid_to_pcpu[2]->pc_curthread
$16 = (struct thread *) 0xfffff8a0112acd20
(kgdb) p cpuid_to_pcpu[3]->pc_curthread
$17 = (struct thread *) 0xfffff8a0112ac8c0
(kgdb) p cpuid_to_pcpu[4]->pc_curthread
$18 = (struct thread *) 0xfffff8a2aae7da40
(kgdb) p cpuid_to_pcpu[5]->pc_curthread
$19 = (struct thread *) 0xfffff8a2aa2a6460
(kgdb) p cpuid_to_pcpu[6]->pc_curthread
$20 = (struct thread *) 0xfffff8a2e3148d20
(kgdb) p cpuid_to_pcpu[7]->pc_curthread
$21 = (struct thread *) 0xfffff8a2e3ef48c0
(kgdb) p cpuid_to_pcpu[8]->pc_curthread
$22 = (struct thread *) 0xfffff8d32cfa0460
(kgdb) p cpuid_to_pcpu[9]->pc_curthread
$23 = (struct thread *) 0xfffff8a0112b3a40
(kgdb) p cpuid_to_pcpu[10]->pc_curthread
$24 = (struct thread *) 0xfffff8a2a8f77180
(kgdb) p cpuid_to_pcpu[11]->pc_curthread
$25 = (struct thread *) 0xfffff8a2e3ef1a40
(kgdb) p cpuid_to_pcpu[12]->pc_curthread
$26 = (struct thread *) 0xfffff8a2e319e8c0
(kgdb) p cpuid_to_pcpu[13]->pc_curthread
$27 = (struct thread *) 0xfffff8a2e3c30d20
(kgdb) p cpuid_to_pcpu[14]->pc_curthread
$28 = (struct thread *) 0xfffff8a0112b2460
(kgdb) p cpuid_to_pcpu[15]->pc_curthread
$29 = (struct thread *) 0xfffff8c1f78cb180
Some rummaging around shows that the object's lock is held by a yarrow thread:
(kgdb) p ((struct thread *) 0xfffff8a0112d75e0)->td_proc.p_comm
$35 = "yarrow", '\0' <repeats 13 times>
At this stage, I'm not sure where to go next.
--
Peter Jeremy
More information about the freebsd-sparc64 mailing list