[Bug 229986] Terminating virtualbox or ctld + zfs scrub + gstat leads to deadlock

Mon Jul 23 19:22:26 UTC 2018

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229986

--- Comment #3 from Andriy Gapon <avg at FreeBSD.org> ---
There are some interesting threads in the procstat output, but I do not see
anything like a deadlock.

    0 100255 kernel              zio_write_issue_2   mi_switch+0xe6
sleepq_wait+0x2c _cv_wait+0x16e zio_wait+0x8b dmu_buf_hold_array_by_dnode+0x232
dmu_read_impl+0xa6 dmu_read+0x45 space_map_iterate+0x103 space_map_load+0x91
metaslab_load+0x3f metaslab_alloc_dva+0x91b metaslab_alloc+0x98
zio_dva_allocate+0x82 zio_execute+0xac taskqueue_run_locked+0x154
taskqueue_thread_loop+0x98 fork_exit+0x83 fork_trampoline+0xe

    9 100461 zfskern             txg_thread_enter    mi_switch+0xe6
sleepq_wait+0x2c _cv_wait+0x16e zio_wait+0x8b dsl_pool_sync+0x3a4
spa_sync+0x9c5 txg_sync_thread+0x280 fork_exit+0x83 fork_trampoline+0xe

   12 100018 geom                g_event             mi_switch+0xe6
sleepq_wait+0x2c _cv_wait+0x16e txg_wait_synced+0xa5 zil_close+0xbf
zvol_last_close+0x15 zvol_geom_access+0x17a g_access+0x1fd g_label_taste+0x415
g_new_provider_event+0xca g_run_events+0x13e fork_exit+0x83 fork_trampoline+0xe

49368 101268 VBoxHeadless        -                   mi_switch+0xe6
sleepq_timedwait+0x2f _sleep+0x21e g_waitidle+0xa1 userret+0x50
amd64_syscall+0x23a fast_syscall_common+0x101

 8922 101185 gstat               -                   mi_switch+0xe6
sleepq_timedwait+0x2f _sleep+0x21e g_waitfor_event+0xc3
sysctl_kern_geom_confxml+0x39 sysctl_root_handler_locked+0x8b sysctl_root+0x1c3
userland_sysctl+0x136 sys___sysctl+0x5f amd64_syscall+0xa1f
fast_syscall_common+0x101

Thread 100018 is the most interesting.
We can see that there is a GEOM new provider event, which naturally leads to
GEOM tasting.  We see that g_label_taste has finished its tasting and is
closing the zvol, so zvol_last_close is called.  That leads to zil_close and
txg_wait_synced.  I am not sure why that last call is made, as I would not
expect any writes to the volume after it has been closed by VBox.
Anyway, the thread is waiting for the sync thread to sync out the current TXG.

The sync thread, 100461, is waiting for an I/O request to complete.

gstat and VBoxHeadless are waiting for the GEOM event to be processed (by
thread 100018).

So, it looks like the scrub I/O introduces a significant delay into GEOM
event handling via the ZFS sync thread.
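
The wait-for relationships described above can be written down as a small
graph and checked for cycles.  This is only an illustrative sketch: the edges
are transcribed from the analysis of the stack traces, the node labels and the
helper names (wait_chain, find_cycles) are made up for the example, and
"outstanding I/O" stands for the request the sync thread is blocked on; it is
not a thread.

```python
# Wait-for edges transcribed from the analysis above (labels are illustrative).
# Each key is blocked until the thing it maps to makes progress.
WAITS_FOR = {
    "gstat (101185)": "geom g_event (100018)",
    "VBoxHeadless (101268)": "geom g_event (100018)",
    "geom g_event (100018)": "zfskern sync thread (100461)",
    "zfskern sync thread (100461)": "outstanding I/O",
}


def wait_chain(start, waits_for):
    """Follow wait-for edges from start.

    Returns (chain, has_cycle).  has_cycle is True only if the chain
    comes back to a node it has already visited, i.e. a real deadlock.
    """
    chain = [start]
    seen = {start}
    node = start
    while node in waits_for:
        node = waits_for[node]
        chain.append(node)
        if node in seen:
            return chain, True
        seen.add(node)
    return chain, False


def find_cycles(waits_for):
    """Return every node whose wait chain loops back on itself."""
    return [n for n in waits_for if wait_chain(n, waits_for)[1]]


if __name__ == "__main__":
    for waiter in ("gstat (101185)", "VBoxHeadless (101268)"):
        chain, has_cycle = wait_chain(waiter, WAITS_FOR)
        print(" -> ".join(chain), "(cycle)" if has_cycle else "(no cycle)")
    print("threads in a cycle:", find_cycles(WAITS_FOR))
```

With these edges every chain ends at the outstanding I/O and no cycle is found,
which matches the conclusion that this is a long stall rather than a deadlock.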
