ZFS stalled after some mirror disks were lost

Ben RUBSON ben.rubson at gmail.com
Tue Oct 3 11:43:14 UTC 2017


> On 03 Oct 2017, at 08:19, Ben RUBSON <ben.rubson at gmail.com> wrote:
> 
> Hi,
> 
> Copying the scsi list, as this could be related.
> 
>> On 02 Oct 2017, at 20:12, Ben RUBSON <ben.rubson at gmail.com> wrote:
>> 
>> Hi,
>> 
>> On a FreeBSD 11 server, the following online/healthy zpool :
>> 
>> home
>> mirror-0
>>   label/local1
>>   label/local2
>>   label/iscsi1
>>   label/iscsi2
>> mirror-1
>>   label/local3
>>   label/local4
>>   label/iscsi3
>>   label/iscsi4
>> cache
>> label/local5
>> label/local6
>> 
>> A sustained read throughput of 180 MB/s was running, 45 MB/s on each
>> iSCSI disk according to "zpool iostat", and nothing on the local disks
>> (strange, but I have noticed that IOs always prefer the iSCSI disks
>> over the local disks).
>> No write IOs.
>> 
>> Let's disconnect all the iSCSI disks:
>> iscsictl -Ra
>> 
>> Expected behavior:
>> IO activity flawlessly continues on the local disks.
>> 
>> What happened:
>> All IOs stalled; the server only answers IOs made to its zroot pool.
>> All commands related to the iSCSI disks (iscsictl) or to ZFS (zfs/zpool)
>> do not return.
>> 
>> Questions:
>> Why this behavior?
>> How can I find out what is happening? (/var/log/messages says almost nothing)
>> 
>> I have already disconnected the iSCSI disks several times in the past
>> without any issue, but there were almost no IOs running then.
>> 
>> Thank you for your help !
>> 
>> Ben
> 
>> On 02 Oct 2017, at 22:55, Andriy Gapon <avg at FreeBSD.org> wrote:
>> 
>>> On 02/10/2017 22:13, Ben RUBSON wrote:
>>> 
>>>> On 02 Oct 2017, at 20:45, Andriy Gapon <avg at FreeBSD.org> wrote:
>>>> 
>>>>> On 02/10/2017 21:17, Ben RUBSON wrote:
>>>>> 
>>>>> Unfortunately the zpool command stalls / does not return :/
>>>> 
>>>> Try to take procstat -kk -a.
>>> 
>>> Here is the procstat output:
>>> https://benrubson.github.io/zfs/procstat01.log
>> 
>> First, it seems that there are some iscsi threads stuck on a lock like:
>>   0 100291 kernel           iscsimt          mi_switch+0xd2 sleepq_wait+0x3a
>> _sx_xlock_hard+0x592 iscsi_maintenance_thread+0x316 fork_exit+0x85
>> fork_trampoline+0xe
>> 
>> or like
>> 
>> 8580 102077 iscsictl         -                mi_switch+0xd2 sleepq_wait+0x3a
>> _sx_slock_hard+0x325 iscsi_ioctl+0x7ea devfs_ioctl_f+0x13f kern_ioctl+0x2d4
>> sys_ioctl+0x171 amd64_syscall+0x4ce Xfast_syscall+0xfb
>> 
>> Also, there is a thread in cam_sim_free():
>>   0 100986 kernel           iscsimt          mi_switch+0xd2 sleepq_wait+0x3a
>> _sleep+0x2a1 cam_sim_free+0x48 iscsi_session_cleanup+0x1bd
>> iscsi_maintenance_thread+0x388 fork_exit+0x85 fork_trampoline+0xe
>> 
>> So, it looks like there could be a problem in the iscsi teardown path.
>> 
>> Maybe that caused a domino effect in ZFS code.  I see a lot of threads waiting
>> either for spa_namespace_lock or a spa config lock (a highly specialized ZFS
>> lock).  But it is hard to untangle their inter-dependencies.
>> 
>> Some of ZFS I/O threads are also affected, for example:
>>   0 101538 kernel           zio_write_issue_ mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> taskqueue_run_locked+0x14a taskqueue_thread_loop+0xe8 fork_exit+0x85
>> fork_trampoline+0xe
>> 8716 101319 sshd             -                mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> zio_nowait+0x49 arc_read+0x8e4 dbuf_read+0x6c2 dmu_buf_hold_array_by_dnode+0x1d3
>> dmu_read_uio_dnode+0x41 dmu_read_uio_dbuf+0x3b zfs_freebsd_read+0x5fc
>> VOP_READ_APV+0x89 vn_read+0x157 vn_io_fault1+0x1c2 vn_io_fault+0x197
>> dofileread+0x98
>> 71181 101141 encfs            -                mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> zio_nowait+0x49 arc_read+0x8e4 dbuf_read+0x6c2 dmu_buf_hold+0x3d
>> zap_lockdir+0x43 zap_cursor_retrieve+0x171 zfs_freebsd_readdir+0x3f3
>> VOP_READDIR_APV+0x8f kern_getdirentries+0x21b sys_getdirentries+0x28
>> amd64_syscall+0x4ce Xfast_syscall+0xfb
>> 71181 101190 encfs            -                mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> zio_nowait+0x49 arc_read+0x8e4 dbuf_prefetch_indirect_done+0xcc arc_read+0x425
>> dbuf_prefetch+0x4f7 dmu_zfetch+0x418 dmu_buf_hold_array_by_dnode+0x34d
>> dmu_read_uio_dnode+0x41 dmu_read_uio_dbuf+0x3b zfs_freebsd_read+0x5fc
>> VOP_READ_APV+0x89 vn_read+0x157
>> 
>> Note that the first of these threads executes a write zio.
>> 
>> It would be nice to determine an owner of spa_namespace_lock.
>> If you have debug symbols then it can be easily done in kgdb on the live system:
>> (kgdb) p spa_namespace_lock
> 
> So, as I said a few minutes ago, I lost access to the server and had to recycle it.
> Thankfully I managed to reproduce the issue, re-playing exactly the same steps.
> 
> A curious line in /var/log/messages:
> kernel: g_access(918): provider da18 has error
> (da18 is the remaining iSCSI target device which did not disconnect properly)
> 
> procstat -kk -a:
> https://benrubson.github.io/zfs/procstat02.log
> 
> (kgdb) p spa_namespace_lock
> $1 = -2110867066

This time with debug symbols.

procstat -kk -a:
https://benrubson.github.io/zfs/procstat03.log

(kgdb) p spa_namespace_lock
$1 = {
  lock_object = {
    lo_name = 0xffffffff822eb986 "spa_namespace_lock", 
    lo_flags = 40960000, 
    lo_data = 0, 
    lo_witness = 0x0
  }, 
  sx_lock = 18446735285324580100
}
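The sx_lock word above can be decoded further. Assuming the stock FreeBSD 11.x sys/sx.h layout (the low four bits carry flag bits; the remaining bits are the owning thread pointer when the lock is held exclusively), a small offline sketch like this extracts the owner:

```python
# Decode a FreeBSD sx-lock word, as printed by "(kgdb) p spa_namespace_lock".
# Flag values below are taken from sys/sx.h (FreeBSD 11.x); verify them
# against the exact source tree in use.

SX_LOCK_SHARED            = 0x01
SX_LOCK_SHARED_WAITERS    = 0x02
SX_LOCK_EXCLUSIVE_WAITERS = 0x04
SX_LOCK_RECURSED          = 0x08
SX_LOCK_FLAGMASK = (SX_LOCK_SHARED | SX_LOCK_SHARED_WAITERS |
                    SX_LOCK_EXCLUSIVE_WAITERS | SX_LOCK_RECURSED)

def decode_sx(sx_lock: int):
    """Return (owner thread pointer or None, flag bits) for an sx_lock word."""
    flags = sx_lock & SX_LOCK_FLAGMASK
    if flags & SX_LOCK_SHARED:
        # Shared mode: the word holds a share count, not a single owner.
        return None, flags
    # Exclusive mode: masking off the flag bits leaves the struct thread *.
    return sx_lock & ~SX_LOCK_FLAGMASK, flags

# The value captured in the kgdb session above:
owner, flags = decode_sx(18446735285324580100)
print(hex(owner), hex(flags))  # → 0xfffff801cb6f8500 0x4
```

Flag 0x4 is SX_LOCK_EXCLUSIVE_WAITERS, i.e. the lock is held exclusively and other threads are queued on it. The decoded pointer is a struct thread *, so (with debug symbols) something like `p ((struct thread *)0xfffff801cb6f8500)->td_tid` in kgdb should give the tid of the owner, which can then be matched against the procstat output.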

Easily reproducible.
No issue, however, if there is no IO load.
As soon as there is IO load, I can reproduce the issue.

Ben



More information about the freebsd-fs mailing list