'make -j16 universe' gives SIReset

Peter Jeremy peterjeremy at acm.org
Tue Oct 18 04:26:58 UTC 2011


On 2011-Oct-13 20:42:25 +0200, Marius Strobl <marius at alchemy.franken.de> wrote:
>On Thu, Oct 13, 2011 at 02:56:48PM +1100, Peter Jeremy wrote:
>> Unfortunately, I can't get a crashdump because dumpon(8) doesn't like
>> my Solaris swap partitions:
>> GEOM_PART: Partition 'da0b' not suitable for kernel dumps (wrong type?)
>> GEOM_PART: Partition 'da6b' not suitable for kernel dumps (wrong type?)
>> No suitable dump device was found.
>> 
>> I did write a patch for that but took it out during some earlier
>> testing to get back to stock code.  It looks like I didn't PR it
>> either so I will do that when I get some time.

I've resurrected that patch (and will send-pr it later).

>Hrm, this backtrace seems impossible as vmtotal() explicitly locks
>the object before calling vm_object_clear_flag(). A crash dump of
>this panic really would be interesting.

I've reproduced the same panic and got a crashdump (2 hours for
the dump and another hour for the savecore):
VNASSERT failed
panic: mutex vm object not owned at /usr/src/sys/vm/vm_object.c:281
cpuid = 7

#10 0x00000000c04ffbf4 in panic (fmt=0xc0a906d0 "mutex %s not owned at %s:%d") at /usr/src/sys/kern/kern_shutdown.c:599
#11 0x00000000c04eb1b8 in _mtx_assert (m=0xfffff8b29d750ca8, what=0x4, file=0xc0ac6c00 "/usr/src/sys/vm/vm_object.c", line=0x119) at /usr/src/sys/kern/kern_mutex.c:706
#12 0x00000000c07f4b0c in vm_object_clear_flag (object=0xfffff8b29d750ca8, bits=0x4) at /usr/src/sys/vm/vm_object.c:281
#13 0x00000000c07f1dac in vmtotal (oidp=0xc0ba9be8, arg1=0x0, arg2=0x30, req=0xef8a54e0) at /usr/src/sys/vm/vm_meter.c:121
#14 0x00000000c050c13c in sysctl_root (oidp=Variable "oidp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1509
#15 0x00000000c050c434 in userland_sysctl (td=0x0, name=0xef8a5628, namelen=0x2, old=0x0, oldlenp=Variable "oldlenp" is not available.) at /usr/src/sys/kern/kern_sysctl.c:1619
#16 0x00000000c050c858 in sys___sysctl (td=0xfffff8a2e3ef48c0, uap=0xef8a5768) at /usr/src/sys/kern/kern_sysctl.c:1545
#17 0x00000000c086ba00 in syscall (tf=Variable "tf" is not available.) at subr_syscall.c:131
#18 0x00000000c0098e60 in tl0_intr ()

(kgdb) p *object
$1 = {
  mtx = {
    lock_object = {
      lo_name = 0xc0a9a308 "vm object", 
      lo_flags = 0x1430000, 
      lo_data = 0x0, 
      lo_witness = 0xfff85180
    }, 
    mtx_lock = 0xfffff8a0112d75e0
  }, 
...
}
(kgdb) p *object->mtx->lock_object->lo_witness
$3 = {
  w_name = "standard object", '\0' <repeats 48 times>, 
  w_index = 0xa3, 
  w_class = 0xc0b82e88, 
  w_list = {
    stqe_next = 0xfff85100
  }, 
  w_typelist = {
    stqe_next = 0xfff85100
  }, 
  w_hash_next = 0x0, 
  w_file = 0xc0ac6388 "/usr/src/sys/vm/vm_meter.c", 
  w_line = 0x71, 
  w_refcount = 0x53718, 
  w_num_ancestors = 0xe, 
  w_num_descendants = 0xe, 
  w_ddb_level = 0x0, 
  w_displayed = 0x1, 
  w_reversed = 0x0
}
(kgdb) p vm_object_list_mtx
$4 = {
  lock_object = {
    lo_name = 0xc0ac6e30 "vm object_list", 
    lo_flags = 0x1030000, 
    lo_data = 0x0, 
    lo_witness = 0xfff81d80
  }, 
  mtx_lock = 0xfffff8a2e3ef48c2
}
(kgdb) p *vm_object_list_mtx.lock_object.lo_witness 
$6 = {
  w_name = "vm object_list", '\0' <repeats 49 times>, 
  w_index = 0x3b, 
  w_class = 0xc0b82e88, 
  w_list = {
    stqe_next = 0xfff81d00
  }, 
  w_typelist = {
    stqe_next = 0xfff81d00
  }, 
  w_hash_next = 0x0, 
  w_file = 0xc0ac6388 "/usr/src/sys/vm/vm_meter.c", 
  w_line = 0x6f, 
  w_refcount = 0x1, 
  w_num_ancestors = 0xf, 
  w_num_descendants = 0x0, 
  w_ddb_level = 0x0, 
  w_displayed = 0x1, 
  w_reversed = 0x0
}

The witness information looks correct but I notice that vm_object_list_mtx
is owned by a different thread to the vm_object that triggers the panic.

The panic says it occurred on CPU 7:
(kgdb) p cpuid_to_pcpu[7]->pc_curthread
$21 = (struct thread *) 0xfffff8a2e3ef48c0
which matches the vm_object_list_mtx.

My inital thought was a locking glitch but, looking through
cpuid_to_pcpu[], the vm_object's lock doesn't match any running thread:

(kgdb) p cpuid_to_pcpu[0]->pc_curthread
$14 = (struct thread *) 0xfffff8a2e3008000
(kgdb) p cpuid_to_pcpu[1]->pc_curthread
$15 = (struct thread *) 0xfffff8a2aae7c8c0
(kgdb) p cpuid_to_pcpu[2]->pc_curthread
$16 = (struct thread *) 0xfffff8a0112acd20
(kgdb) p cpuid_to_pcpu[3]->pc_curthread
$17 = (struct thread *) 0xfffff8a0112ac8c0
(kgdb) p cpuid_to_pcpu[4]->pc_curthread
$18 = (struct thread *) 0xfffff8a2aae7da40
(kgdb) p cpuid_to_pcpu[5]->pc_curthread
$19 = (struct thread *) 0xfffff8a2aa2a6460
(kgdb) p cpuid_to_pcpu[6]->pc_curthread
$20 = (struct thread *) 0xfffff8a2e3148d20
(kgdb) p cpuid_to_pcpu[7]->pc_curthread
$21 = (struct thread *) 0xfffff8a2e3ef48c0
(kgdb) p cpuid_to_pcpu[8]->pc_curthread
$22 = (struct thread *) 0xfffff8d32cfa0460
(kgdb) p cpuid_to_pcpu[9]->pc_curthread
$23 = (struct thread *) 0xfffff8a0112b3a40
(kgdb) p cpuid_to_pcpu[10]->pc_curthread
$24 = (struct thread *) 0xfffff8a2a8f77180
(kgdb) p cpuid_to_pcpu[11]->pc_curthread
$25 = (struct thread *) 0xfffff8a2e3ef1a40
(kgdb) p cpuid_to_pcpu[12]->pc_curthread
$26 = (struct thread *) 0xfffff8a2e319e8c0
(kgdb) p cpuid_to_pcpu[13]->pc_curthread
$27 = (struct thread *) 0xfffff8a2e3c30d20
(kgdb) p cpuid_to_pcpu[14]->pc_curthread
$28 = (struct thread *) 0xfffff8a0112b2460
(kgdb) p cpuid_to_pcpu[15]->pc_curthread
$29 = (struct thread *) 0xfffff8c1f78cb180

Some rummaging around says that the object is locked by yarrow:
(kgdb) p ((struct thread *) 0xfffff8a0112d75e0)->td_proc.p_comm
$35 = "yarrow", '\0' <repeats 13 times>

At this stage, I'm not sure where to go next.

-- 
Peter Jeremy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-sparc64/attachments/20111018/32cf1bc3/attachment.pgp


More information about the freebsd-sparc64 mailing list