Sleeping thread held mutex in vm_pageout_oom()

Mon Jan 19 22:56:27 UTC 2015

I recently had a system where a DIMM failed and the OOM killer was
constantly kicking in due to a memory-hungry daemon being constantly
restarted.  This ended up triggering a race condition in the OOM
killer leading to this panic:

Sleeping thread (tid 100075, pid 8) owns a non-sleepable lock
sched_switch() at 0xffffffff8048386d = sched_switch+0x16d
mi_switch() at 0xffffffff80469dd6 = mi_switch+0x186
sleepq_wait() at 0xffffffff80499204 = sleepq_wait+0x44
__lockmgr_args() at 0xffffffff8044b88b = __lockmgr_args+0x67b
vop_stdlock() at 0xffffffff804d3689 = vop_stdlock+0x39
---Type <return> to continue, or q <return> to quit---
VOP_LOCK1_APV() at 0xffffffff8069da42 = VOP_LOCK1_APV+0x52
_vn_lock() at 0xffffffff804ed627 = _vn_lock+0x47
vm_object_deallocate() at 0xffffffff8061eef3 = vm_object_deallocate+0x203
vm_map_entry_deallocate() at 0xffffffff80616d2c = vm_map_entry_deallocate+0x4c
vm_map_process_deferred() at 0xffffffff80616d62 = vm_map_process_deferred+0x32
vm_map_remove() at 0xffffffff806183ff = vm_map_remove+0x6f
vmspace_free() at 0xffffffff80619206 = vmspace_free+0x56
vm_pageout_oom() at 0xffffffff806230d1 = vm_pageout_oom+0x181
vm_pageout() at 0xffffffff8062410b = vm_pageout+0x90b
fork_exit() at 0xffffffff8043a382 = fork_exit+0x112
fork_trampoline() at 0xffffffff8063385e = fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff80c3be1d00, rbp = 0 ---
panic: sleeping thread
cpuid = 5
curthread = grep/grep (82989/100544)
cpu_ticks = 1848294656444
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff801e52ba = db_trace_self_wrapper+0x2a
panic() at 0xffffffff80461608 = panic+0x228
propagate_priority() at 0xffffffff8049cbde = propagate_priority+0x15e
turnstile_wait() at 0xffffffff8049d278 = turnstile_wait+0x1b8
_mtx_lock_sleep() at 0xffffffff80451af1 = _mtx_lock_sleep+0xf1
---Type <return> to continue, or q <return> to quit---
_mtx_lock_flags() at 0xffffffff80451c75 = _mtx_lock_flags+0x75
exit1() at 0xffffffff804367de = exit1+0x36e
sys_exit() at 0xffffffff8043731e = sys_exit+0xe
syscallenter() at 0xffffffff8049b324 = syscallenter+0x104
syscall() at 0xffffffff80649bfc = syscall+0x4c
Xfast_syscall() at 0xffffffff806335f2 = Xfast_syscall+0xe2
--- syscall (1, FreeBSD ELF64, sys_exit), rip = 0x300a2df9c, rsp =
0x7ffffffd40c8, rbp = 0x7ffffffd40e0 ---
Uptime: 7m19s

The root cause is that vm_pageout_oom() acquires a reference on a
process's vmspace while holding its PROC_LOCK(), then the process
exited.  This left vm_pageout_oom() holding the only reference on the
vmspace, so when it dropped the reference it called into
vm_map_remove() and wound up sleeping while still holding the
PROC_LOCK().  This was under FreeBSD 8 but the code in head does not
seem to have changed here.

I'm not quite familiar with the lock mechanisms here so I'm not sure
how to fix it.  Does vm_pageout_oom() need to _PHOLD() the process
while holding the PROC_LOCK(), then drop the lock, then acquire the
vmspace reference?  It appears that's how other places that call
vmspace_acquire_ref() work.