DTrace lockup on a dual processor VMWare

Andrew Brampton brampton+freebsd-hackers at gmail.com
Fri Aug 21 17:10:12 UTC 2009


Hi,
I am running a amd64 FreeBSD 7.2 inside a VMWare, and DTrace has been
working great. However, I have just changed my VMWare to use two
processors instead of one, and things have started to break. I can
load the dtraceall module but when I run hotkernel the machine hangs.
I have DDB compiled into the kernel, but I was unable to break into
it, instead I used a remote GDB (on the host machine) to give me the
follow backtraces:

CPU 0
#0  rdtsc (n=<value optimized out>) at ./machine/cpufunc.h:379
#1  DELAY (n=<value optimized out>) at /usr/7.2/sys/amd64/isa/clock.c:290
#2  0xffffffff80512d2a in _mtx_lock_spin (m=0xffffff00018b4128,
tid=18446742974215591648, opts=<value optimized out>,
    file=<value optimized out>, line=<value optimized out>) at
/usr/7.2/sys/kern/kern_mutex.c:480
#3  0xffffffff80513304 in _mtx_lock_spin_flags (m=0xffffff00018b4128, opts=0,
    file=0xffffffff80f31600
"/usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c",
line=574)
    at /usr/7.2/sys/kern/kern_mutex.c:229
#4  0xffffffff80f30ca0 in cyclic_fire (frame=0xfffffffe89ce7ae0)
    at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c:574
#5  cyclic_clock (frame=0xfffffffe89ce7ae0) at
/usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/amd64/cyclic_machdep.c:99
#6  0xffffffff807cf7bb in lapic_handle_timer
(frame=0xfffffffe89ce7ae0) at
/usr/7.2/sys/amd64/amd64/local_apic.c:695
#7  0xffffffff807c9db7 in Xtimerint () at
/usr/7.2/sys/amd64/amd64/apic_vector.S:103

CPU 1
#0  smp_rendezvous_cpus (map=2, setup_func=0,
action_func=0xffffffff80f301a0 <cyclic_reprogram_xcall>,
    teardown_func=0xffffffff80558b00 <smp_no_rendevous_barrier>,
arg=0xfffffffe8a8e94e0) at /usr/7.2/sys/kern/subr_smp.c:388
#1  0xffffffff80f3055e in xcall (arg=<value optimized out>, c=<value
optimized out>, func=0, param=0x46)
    at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/amd64/cyclic_machdep.c:131
#2  0xffffffff80f301ed in cyclic_reprogram (cpu=<value optimized out>,
exp=<value optimized out>)
    at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c:540
#3  0xffffffff80f31194 in cyclic_add_here (cpu=0xffffff00018b4100,
hdlr=0xfffffffe8a8e95a0, when=0xfffffffe8a8e9590,
    flags=0) at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c:765
#4  0xffffffff80f314ea in cyclic_omni_start (omni=<value optimized out>)
    at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c:956
#5  cyclic_add_omni (omni=<value optimized out>) at
/usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c:1226
#6  0xffffffff80f21665 in profile_enable (arg=<value optimized out>,
id=2149531648, parg=0x0)
    at /usr/7.2/sys/modules/dtrace/profile/../../../cddl/dev/profile/profile.c:450
#7  0xffffffff80f3c6cf in dtrace_ecb_enable (probe=0x1, arg=<value
optimized out>)
    at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:9656
#8  dtrace_ecb_create_enable (probe=0x1, arg=<value optimized out>)
    at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:10471
#9  0xffffffff80f40142 in dtrace_match (pkp=0xfffffffe8a8e9800,
priv=31, uid=27929688, zoneid=0,
    matched=0xffffffff80f3c130 <dtrace_ecb_create_enable>,
arg=0xffffff000db1f840)
    at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:7212
#10 0xffffffff80f40346 in dtrace_probe_enable
(desc=0xffffff0001a9d218, enab=0xffffff000db1f840)
    at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:7992
#11 0xffffffff80f40434 in dtrace_enabling_match
(enab=0xffffff000db1f840, nmatched=0xffffff000db83528)
    at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:11366
#12 0xffffffff80f4c19a in dtrace_ioctl (dev=<value optimized out>,
cmd=<value optimized out>, addr=0xffffff000db83520 "",
    flags=<value optimized out>, td=0xffffff0001ab7370)
    at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/dev/dtrace/dtrace_ioctl.c:380
#13 0xffffffff804b14e6 in devfs_ioctl_f (fp=0xffffff00014db780,
com=3222304774, data=0xffffff000db83520,
    cred=<value optimized out>, td=0xffffff0001ab7370) at
/usr/7.2/sys/fs/devfs/devfs_vnops.c:602
#14 0xffffffff80562063 in fo_ioctl (td=0xffffff0001ab7370, fd=7,
com=3222304774, data=0xffffff000db83520 "")
    at /usr/7.2/sys/sys/file.h:269
#15 kern_ioctl (td=0xffffff0001ab7370, fd=7, com=3222304774,
data=0xffffff000db83520 "")
    at /usr/7.2/sys/kern/sys_generic.c:627
#16 0xffffffff805622c9 in ioctl (td=0xffffff0001ab7370,
uap=0xfffffffe8a8e9bf0) at /usr/7.2/sys/kern/sys_generic.c:571
#17 0xffffffff807e54fc in syscall (frame=0xfffffffe8a8e9c80) at
/usr/7.2/sys/amd64/amd64/trap.c:900
#18 0xffffffff807c971b in Xfast_syscall () at
/usr/7.2/sys/amd64/amd64/exception.S:330

It seems that CPU 0 is doing nothing, permanently stuck on line 379 of
cpufunc.h, whereas CPU 1 is spinning inside the loop on line 388 of
subr_smp.c.

On thing I notice that when I run hotkernel on a single processor
VMWare it prints out:
> hotkernel
Sampling... Hit Ctrl+C to end
dtrace: buffer size lowered to 1m
dtrace: aggregation size lowered to 1m

When I run it on the dual-processor VMWare it only prints
> hotkernel
Sampling... Hit Ctrl+C to end

So I suspect it is locking up before it gets to the code that prints
"buffer size lowered".

Additionally, I have built my own kernel, using all the standard
options and sources, with just the following extra options:
options         KDTRACE_FRAME           # Ensure frames are compiled in
options         KDTRACE_HOOKS           # Kernel DTrace hooks
options         DDB_CTF

options         DEVICE_POLLING
options         KDB
options         KDB_UNATTENDED
options         DDB
options         GDB
options         BREAK_TO_DEBUGGER
options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS
options         WITNESS_SKIPSPIN
options         DEBUG_MEMGUARD
options         HWPMC_HOOKS
device          hwpmc

Can anyone suggest anything to try and debug/fix this problem. I'm
happy to hack the kernel sources if need be.

thanks
Andrew


More information about the freebsd-hackers mailing list