race conditions for destroying and opening a dev

Matthew Jacob mj at feral.com
Thu Sep 16 19:00:37 UTC 2010


Has anyone seen this scenario before? I am seeing it in RELENG_7, but 
the code in question exists through to head.

Thread 1:

(kgdb) where
#0  sched_switch (td=0xffffff003a04ea80, newtd=0xffffff00210b4000, 
flags=Variable "flags" is not available.
) at ../../../kern/sched_ule.c:1944
#1  0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at 
../../../kern/kern_synch.c:450
#2  0xffffffff80402399 in sleepq_switch (wchan=0xffffff8413b50b60) at 
../../../kern/subr_sleepqueue.c:497
#3  0xffffffff80402e8c in sleepq_timedwait (wchan=0xffffff8413b50b60) at 
../../../kern/subr_sleepqueue.c:615
#4  0xffffffff803b682d in _sleep (ident=0xffffff8413b50b60, 
lock=0xffffffff80b0ee00, priority=76, wmesg=0xffffffff806583bb "devdrn", 
timo=100) at ../../../kern/kern_synch.c:228
#5  0xffffffff8037640c in destroy_devl (dev=0xffffff003aaf0000) at 
../../../kern/kern_conf.c:874
#6  0xffffffff80376759 in destroy_dev (dev=0xffffff003aaf0000) at 
../../../kern/kern_conf.c:916
#7  0xffffffff8034c939 in g_dev_orphan (cp=0xffffff003a544800) at 
../../../geom/geom_dev.c:438
#8  0xffffffff803506a0 in g_run_events () at ../../../geom/geom_event.c:164
#9  0xffffffff80351f1c in g_event_procbody () at 
../../../geom/geom_kern.c:141
#10 0xffffffff8038a73a in fork_exit (callout=0xffffffff80351eb0 
<g_event_procbody at ../../../geom/geom_kern.c:132>, arg=0x0, 
frame=0xffffff8413b50c80) at ../../../kern/kern_fork.c:829
#11 0xffffffff805a747e in fork_trampoline () at 
../../../amd64/amd64/exception.S:564
#12 0x0000000000000000 in ?? ()

This thread is waiting for the thread count to drain, i.e., for the last 
close of the device ("da16" in this case) to occur.

Thread 2:

(kgdb) where
#0  sched_switch (td=0xffffff009bb4ca80, newtd=0xffffff003af43380, 
flags=Variable "flags" is not available.
) at ../../../kern/sched_ule.c:1944
#1  0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at 
../../../kern/kern_synch.c:450
#2  0xffffffff80402399 in sleepq_switch (wchan=0xffffffff80b0e040) at 
../../../kern/subr_sleepqueue.c:497
#3  0xffffffff80402f84 in sleepq_wait (wchan=0xffffffff80b0e040) at 
../../../kern/subr_sleepqueue.c:580
#4  0xffffffff803b5385 in _sx_xlock_hard (sx=0xffffffff80b0e040, 
tid=18446742976810240640, opts=Variable "opts" is not available.
) at ../../../kern/kern_sx.c:562
#5  0xffffffff803b5731 in _sx_xlock (sx=0xffffffff80b0e040, opts=0, 
file=0xffffffff80652d27 "../../../geom/geom_dev.c", line=196) at sx.h:154
#6  0xffffffff8034d1bc in g_dev_open (dev=0xffffff003aaf0000, flags=1, 
fmt=Variable "fmt" is not available.
) at ../../../geom/geom_dev.c:196
#7  0xffffffff80333741 in devfs_open (ap=0xffffff841dea88b0) at 
../../../fs/devfs/devfs_vnops.c:902
#8  0xffffffff80601daf in VOP_OPEN_APV (vop=0xffffffff8089fb80, 
a=0xffffff841dea88b0) at vnode_if.c:371
#9  0xffffffff80467246 in vn_open_cred (ndp=0xffffff841dea8a00, 
flagp=0xffffff841dea894c, cmode=Variable "cmode" is not available.
) at vnode_if.h:199
#10 0xffffffff80463770 in kern_open (td=0xffffff009bb4ca80, 
path=0x5114a0 <Address 0x5114a0 out of bounds>, pathseg=Variable 
"pathseg" is not available.
) at ../../../kern/vfs_syscalls.c:1054
#11 0xffffffff805c599e in syscall (frame=0xffffff841dea8c80) at 
../../../amd64/amd64/trap.c:911
#12 0xffffffff805a723b in Xfast_syscall () at 
../../../amd64/amd64/exception.S:349
#13 0x00000008009a219c in ?? ()

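This thread was opening the device and had bumped the refcount, but then 
wedged on the GEOM topology lock, which thread 1 appears to hold while it 
runs the orphan event. So each thread is waiting on something the other 
one holds.

The refcount field is protected by devmtx.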

Anyone seen this?

I'm half inclined either to set CDP_SCHED_DTR when destroy_dev is called, 
or to make dev_refthread check CDP_ACTIVE. I lean more toward the latter.
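
To make the second option concrete, here is a minimal userland sketch of 
the idea, not a kernel patch. Everything prefixed model_ or MODEL_ is a 
hypothetical stand-in that only echoes the kernel names, and it assumes 
the destroy path clears the active flag under devmtx before it starts 
waiting for the count to drain.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userland model only: names echo the kernel's, but none of this is kernel code. */
#define MODEL_CDP_ACTIVE 0x01u

struct model_cdev {
    pthread_mutex_t devmtx;   /* stands in for the global devmtx */
    unsigned int flags;       /* stands in for cdp_flags */
    int threadcount;          /* stands in for si_threadcount */
};

/* Proposed check: take a thread reference only while the device is active. */
static bool
model_dev_refthread(struct model_cdev *dev)
{
    bool ok = false;

    pthread_mutex_lock(&dev->devmtx);
    if ((dev->flags & MODEL_CDP_ACTIVE) != 0) {
        dev->threadcount++;
        ok = true;
    }
    pthread_mutex_unlock(&dev->devmtx);
    return ok;
}

/* Drop a thread reference taken by model_dev_refthread(). */
static void
model_dev_relthread(struct model_cdev *dev)
{
    pthread_mutex_lock(&dev->devmtx);
    assert(dev->threadcount > 0);
    dev->threadcount--;
    pthread_mutex_unlock(&dev->devmtx);
}

/* Assumed first step of destruction: mark the device inactive under devmtx. */
static void
model_destroy_begin(struct model_cdev *dev)
{
    pthread_mutex_lock(&dev->devmtx);
    dev->flags &= ~MODEL_CDP_ACTIVE;
    pthread_mutex_unlock(&dev->devmtx);
}

/*
 * Replays one interleaving: an open gets in before destruction starts,
 * a second open arrives afterwards. Returns the thread count the
 * destroyer would still have to wait for.
 */
static int
model_replay(void)
{
    struct model_cdev dev = { PTHREAD_MUTEX_INITIALIZER, MODEL_CDP_ACTIVE, 0 };
    bool early = model_dev_refthread(&dev);   /* open before destroy */

    model_destroy_begin(&dev);

    bool late = model_dev_refthread(&dev);    /* open racing with destroy */

    assert(early && !late);                   /* the late opener is refused */
    model_dev_relthread(&dev);                /* the early opener finishes */
    return dev.threadcount;
}
```

The model only covers an opener that arrives after destruction has begun. 
An opener that took its reference earlier and is then blocked on the 
topology lock is outside what this check alone can fix.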

Any thoughts on this?




