ZFS stalled after some mirror disks were lost

Andriy Gapon avg at FreeBSD.org
Mon Oct 2 20:56:23 UTC 2017


On 02/10/2017 22:13, Ben RUBSON wrote:
>> On 02 Oct 2017, at 20:45, Andriy Gapon <avg at FreeBSD.org> wrote:
>>
>> On 02/10/2017 21:17, Ben RUBSON wrote:
>>> Unfortunately the command stalls / does not return :/
>>
>> Try to take procstat -kk -a.
> 
> Thank you Andriy for your answer.
> 
> Here is the procstat output:
> https://benrubson.github.io/zfs/procstat01.log


First, it seems that there are some iscsi threads stuck on a lock like:
    0 100291 kernel           iscsimt          mi_switch+0xd2 sleepq_wait+0x3a
_sx_xlock_hard+0x592 iscsi_maintenance_thread+0x316 fork_exit+0x85
fork_trampoline+0xe

or like

 8580 102077 iscsictl         -                mi_switch+0xd2 sleepq_wait+0x3a
_sx_slock_hard+0x325 iscsi_ioctl+0x7ea devfs_ioctl_f+0x13f kern_ioctl+0x2d4
sys_ioctl+0x171 amd64_syscall+0x4ce Xfast_syscall+0xfb
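
If it helps to read those two stacks, the suspected pattern is roughly the
following (a sketch using the sx(9) API; the single softc-wide lock and its
name are my assumption about the initiator's internals, I have not re-checked
the source):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/sx.h>

    static struct sx sc_lock;	/* sx_init(&sc_lock, "iscsi"); at attach */

    static void
    maintenance_path(void)
    {
            /* _sx_xlock_hard in the first trace: sleeps until every
             * holder, shared or exclusive, has released the lock. */
            sx_xlock(&sc_lock);
            /* session teardown; if it never completes, the lock is
             * never released */
            sx_xunlock(&sc_lock);
    }

    static void
    ioctl_path(void)
    {
            /* _sx_slock_hard in the second trace: new shared lockers
             * sleep behind the exclusive waiter above. */
            sx_slock(&sc_lock);
            sx_sunlock(&sc_lock);
    }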

Also, there is a thread in cam_sim_free():
    0 100986 kernel           iscsimt          mi_switch+0xd2 sleepq_wait+0x3a
_sleep+0x2a1 cam_sim_free+0x48 iscsi_session_cleanup+0x1bd
iscsi_maintenance_thread+0x388 fork_exit+0x85 fork_trampoline+0xe
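
As far as I remember, cam_sim_free() just drops the caller's reference and
then sleeps until all other references are gone, roughly (a paraphrase from
memory, not the verbatim CAM code):

    void
    cam_sim_free(struct cam_sim *sim, int free_devq)
    {
            sim->refcount--;
            if (sim->refcount > 0)
                    /* sleep until the last reference is dropped */
                    msleep(sim, sim->mtx, PRIBIO, "simfree", 0);
            if (free_devq)
                    cam_simq_free(sim->devq);
            free(sim, M_CAMSIM);
    }

So a reference that is never dropped would keep that thread asleep forever.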

So, it looks like there could be a problem in the iscsi teardown path.

Maybe that caused a domino effect in the ZFS code.  I see a lot of threads
waiting either for spa_namespace_lock or for a spa config lock (a highly
specialized ZFS lock).  But it is hard to untangle their interdependencies.
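
For reference, the pattern behind the spa_config_enter frames below is roughly
this (paraphrasing the stock ZFS code from memory; details may differ between
versions):

    /* Every vdev I/O enters the SCL_ZIO component of the config lock
     * as a reader in zio_vdev_io_start(): */
    spa_config_enter(spa, SCL_ZIO, zio, RW_READER);
    /* ... the I/O is issued ... */
    spa_config_exit(spa, SCL_ZIO, zio);

    /* A writer, e.g. vdev reconfiguration after a disk disappears,
     * takes the lock components exclusively, and new readers sleep
     * in _cv_wait until it is done: */
    spa_config_enter(spa, SCL_ALL, spa, RW_WRITER);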

Some of the ZFS I/O threads are also affected, for example:
    0 101538 kernel           zio_write_issue_ mi_switch+0xd2 sleepq_wait+0x3a
_cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
taskqueue_run_locked+0x14a taskqueue_thread_loop+0xe8 fork_exit+0x85
fork_trampoline+0xe
 8716 101319 sshd             -                mi_switch+0xd2 sleepq_wait+0x3a
_cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
zio_nowait+0x49 arc_read+0x8e4 dbuf_read+0x6c2 dmu_buf_hold_array_by_dnode+0x1d3
dmu_read_uio_dnode+0x41 dmu_read_uio_dbuf+0x3b zfs_freebsd_read+0x5fc
VOP_READ_APV+0x89 vn_read+0x157 vn_io_fault1+0x1c2 vn_io_fault+0x197
dofileread+0x98
71181 101141 encfs            -                mi_switch+0xd2 sleepq_wait+0x3a
_cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
zio_nowait+0x49 arc_read+0x8e4 dbuf_read+0x6c2 dmu_buf_hold+0x3d
zap_lockdir+0x43 zap_cursor_retrieve+0x171 zfs_freebsd_readdir+0x3f3
VOP_READDIR_APV+0x8f kern_getdirentries+0x21b sys_getdirentries+0x28
amd64_syscall+0x4ce Xfast_syscall+0xfb
71181 101190 encfs            -                mi_switch+0xd2 sleepq_wait+0x3a
_cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
zio_nowait+0x49 arc_read+0x8e4 dbuf_prefetch_indirect_done+0xcc arc_read+0x425
dbuf_prefetch+0x4f7 dmu_zfetch+0x418 dmu_buf_hold_array_by_dnode+0x34d
dmu_read_uio_dnode+0x41 dmu_read_uio_dbuf+0x3b zfs_freebsd_read+0x5fc
VOP_READ_APV+0x89 vn_read+0x157

Note that the first of these threads is executing a write zio (it is a
zio_write_issue taskqueue thread), so write issuance is blocked as well as
the reads.

-- 
Andriy Gapon

