Deadlock between GEOM and devfs device destroy and process exit.

Alexander Motin mav at
Fri Jan 29 23:27:25 UTC 2010


While experimenting with SATA hot-plug, I've found a quite repeatable
deadlock. The problem shows up when several SATA devices, opened via
devfs, disappear at exactly the same time. In my case, that happens when
unplugging a SATA Port Multiplier with several disks behind it. All I have
to do is run several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and
unplug the multiplier. That causes the expected I/O errors and device
destruction, but with high probability several dd processes get stuck in
the kernel.

I've identified the following pieces of the problem:
- CAM receives the disconnect event and starts device destruction, but
since the device is still open, it can't complete it immediately.
- dd receives an I/O error and exits.
- The exit1() call closes all descriptors, including the adaX device. That
triggers the final device destruction by sending an event to geom_dev:

adaclose(4571fa00,4,40c16576,76,0,...) at 0x4049c521
g_disk_access(457e2200,ffffffff,0,0,0,...) at 0x4080b9a4
g_access(45643d80,ffffffff,0,0,2000,...) at 0x40810ccb
g_dev_close(45766500,1,2000,4569fd80,4569fd80,...) at 0x4080a425
devfs_close(7b604aa8,80000,457f8000,80000,7b604acc,...) at 0x407f2762
VOP_CLOSE_APV(40d03180,7b604aa8,40c2e681,128,0,...) at 0x40b6da55
vn_close(457f8000,1,45624300,4569fd80,451271e0,...) at 0x40912750
vn_closefile(4566da48,4569fd80,4566da48,0,7b604b58,...) at 0x40912854
devfs_close_f(4566da48,4569fd80,3,0,4566da48,...) at 0x407f235b
at 0x40836da3
closef(4566da48,4569fd80,721,71e,4569fe24,...) at 0x40838ad0
fdfree(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x408394da
exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844423
sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd
syscall(7b604d38) at 0x40b565c0

- The GEOM event thread tries to destroy the /dev/adaX device (which
should already be free at this point), but for some reason it freezes in
"devdrn", waiting for the device to be freed:

    0     2     0   0  -8  0      0      8 devdrn DL    ??    0:02.89

- As the GEOM event is still not handled, exit1() waits for it in
g_waitidle():

kdb_backtrace(40c16bc4,0,40c16ab1,56,4540e640,...) at 0x408a2909
g_waitidle(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x4080cd1f
exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844431
sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd
syscall(7b604d38) at 0x40b565c0

- The system is at a standstill and GEOM is frozen. There is no way out
of this except pressing reset.

    0  1065  1055   0  44  0   5344   3040 g_wait DE     0    0:00.43 dd if=/dev/ada1 of=/dev/null bs=1m
    0  1066  1055   0  44  0   5344   3040 GEOM t DE     0    0:00.07 dd if=/dev/ada2 of=/dev/null bs=1m

So, does anybody have a good idea why destroy_dev() can't complete?

Alexander Motin

More information about the freebsd-geom mailing list