ZFS stalled after some mirror disks were lost

Mon Oct 2 19:47:12 UTC 2017

On 02/10/2017 20:10, Ben RUBSON wrote:
>> On 02 Oct 2017, at 20:41, Steven Hartland <killing at multiplay.co.uk> wrote:
>>
>> I'm guessing that the devices haven't disconnected cleanly so are just stalling all requests to them and hence the pool.
> I even tried to ifconfig down the network interface serving the iscsi targets, it did not help.
>
>> I'm not that familiar with iscsi; does it still show under camcontrol or geom?
> # geom disk list
> (...)
> Geom name: da13
> Providers:
> 1. Name: da13
>     Mediasize: 3999688294912 (3.6T)
>     Sectorsize: 512
>     Mode: r1w1e2
>     wither: (null)
>
> Geom name: da15
> Providers:
> 1. Name: da15
>     Mediasize: 3999688294912 (3.6T)
>     Sectorsize: 512
>     Mode: r1w1e2
>     wither: (null)
>
> Geom name: da16
> Providers:
> 1. Name: da16
>     Mediasize: 3999688294912 (3.6T)
>     Sectorsize: 512
>     Mode: r1w1e2
>     wither: (null)
>
> Geom name: da19
> Providers:
> 1. Name: da19
>     Mediasize: 3999688294912 (3.6T)
>     Sectorsize: 512
>     Mode: r1w1e2
>     wither: (null)
>
> # camcontrol devlist
> // does not show the above disks
So these daXX devices represent your iscsi devices?

If so, it looks like your problem is at the iscsi layer: it has not
disconnected properly, so as far as ZFS is concerned it is still waiting
for them.
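
A minimal sketch of that check, for illustration only: it takes the text
printed by `geom disk list` and by `camcontrol devlist` and reports da
devices that GEOM still lists but CAM no longer does. The function names
are hypothetical, the "da" prefix filter is an assumption (it skips
non-CAM disks), and the parsing assumes the usual output layout, with
peripheral names in parentheses at the end of each devlist line.

```python
import re


def geom_disk_names(geom_output):
    """Collect names from the 'Geom name:' lines of `geom disk list` text."""
    pattern = r"^Geom name:\s*(\S+)"
    return {m.group(1) for m in re.finditer(pattern, geom_output, re.MULTILINE)}


def cam_device_names(devlist_output):
    """Collect peripheral names, e.g. (pass0,da0), from `camcontrol devlist` text."""
    names = set()
    for line in devlist_output.splitlines():
        m = re.search(r"\(([^()]*)\)\s*$", line)
        if m:
            names.update(n.strip() for n in m.group(1).split(",") if n.strip())
    return names


def lingering_disks(geom_output, devlist_output, prefix="da"):
    """Return disks GEOM still lists but CAM does not, limited to `prefix`."""
    missing = geom_disk_names(geom_output) - cam_device_names(devlist_output)
    return sorted(name for name in missing if name.startswith(prefix))
```

Given the output quoted above (da13, da15, da16 and da19 present in geom,
absent from camcontrol), all four would be reported as lingering.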
>
>> Does iscsid have any options on how to treat failed devices?
> iSCSI has some tunables for how to treat failing devices, and I set them:
> kern.iscsi.ping_timeout=5
> kern.iscsi.iscsid_timeout=5
> kern.iscsi.login_timeout=85
> kern.iscsi.fail_on_disconnection=1
>
> However, as I disconnected the targets from the server hosting the zpool,
> they should not have been needed.
     Regards
     Steve