I/O to pool appears to be hung, panic!

Ben RUBSON ben.rubson at gmail.com
Mon Jul 3 09:07:11 UTC 2017


> On 29 Jun 2017, at 15:36, Ben RUBSON <ben.rubson at gmail.com> wrote:
> 
>> On 29 Jun 2017, at 14:43, Fabian Keil <freebsd-listen at fabiankeil.de> wrote:
> 
> Thank you for your feedback Fabian.
> 
>> Ben RUBSON <ben.rubson at gmail.com> wrote:
>> 
>>> One of my servers had a kernel panic last night, giving the following message:
>>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
>> [...] 
>>> Here are some numbers regarding this disk, taken from the server hosting the pool:
>>> (unfortunately not from the iSCSI target server)
>>> https://s23.postimg.org/zd8jy9xaj/busydisk.png
>>> 
>>> We can clearly see that the disk suddenly became 100% busy, while the CPU was almost idle.

We can also clearly see that 5 minutes later (02:09) the disk seemed to come back, but became 100% busy again,
and that about 16 minutes later (the default vfs.zfs.deadman_synctime_ms) the panic occurred.
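
(For reference, assuming the stock default for vfs.zfs.deadman_synctime_ms, 1000000 ms, i.e. roughly 16.7 minutes,
the timing matches the graph; the current value can be checked with:)
# sysctl vfs.zfs.deadman_synctime_ms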

>>> No error message at all on both servers.
>> [...]
>>> The only log I have is the following stack trace, taken from the server console:
>>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
>>> cpuid = 0
>>> KDB: stack backtrace:
>>> #0 0xffffffff80b240f7 at kdb_backtrace+0x67
>>> #1 0xffffffff80ad9462 at vpanic+0x182
>>> #2 0xffffffff80ad92d3 at panic+0x43
>>> #3 0xffffffff82238fa7 at vdev_deadman+0x127
>>> #4 0xffffffff82238ec0 at vdev_deadman+0x40
>>> #5 0xffffffff82238ec0 at vdev_deadman+0x40
>>> #6 0xffffffff8222d0a6 at spa_deadman+0x86
>>> #7 0xffffffff80af32da at softclock_call_cc+0x18a
>>> #8 0xffffffff80af3854 at softclock+0x94
>>> #9 0xffffffff80a9348f at intr_event_execute_handlers+0x20f
>>> #10 0xffffffff80a936f6 at ithread_loop+0xc6
>>> #11 0xffffffff80a900d5 at fork_exit+0x85
>>> #12 0xffffffff80f846fe at fork_trampoline+0xe
>>> Uptime: 92d2h47m6s
>>> 
>>> I would have been pleased to make a dump available.
>>> However, despite my (correct?) configuration, the server did not dump:
>>> (nevertheless, "sysctl debug.kdb.panic=1" does make it dump)
>>> # grep ^dump /boot/loader.conf /etc/rc.conf
>>> /boot/loader.conf:dumpdev="/dev/mirror/swap"
>>> /etc/rc.conf:dumpdev="AUTO"
>> 
>> You may want to look at the NOTES section in gmirror(8).
> 
> Yes, I should already be OK (prefer algorithm set).
> 
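(For completeness, in case it helps someone else: as I understand the gmirror(8) NOTES Fabian pointed to,
kernel dumps to a gmirror provider require the prefer balance algorithm, which I had already set with
something like the following, "swap" being my mirror's name:)
# gmirror configure -b prefer swap
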
>>> I use default kernel, with a rebuilt zfs module :
>>> # uname -v
>>> FreeBSD 11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017     root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC 
>>> 
>>> I use the following iSCSI configuration, which disconnects the disks "as soon as" they are unavailable:
>>> kern.iscsi.ping_timeout=5
>>> kern.iscsi.fail_on_disconnection=1
>>> kern.iscsi.iscsid_timeout=5
>>> 
>>> I therefore think the disk was at least correctly reachable during these 20 busy minutes.
>>> 
>>> So, any idea why I could have faced this issue?
>> 
>> Is it possible that the system was under memory pressure?
> 
> No, I don't think it was:
> https://s1.postimg.org/uvsebpyyn/busydisk2.png
> More than 2GB of available memory.
> Swap not used (624kB).
> ARC behaviour seems correct (anon increases because ZFS can't actually write, I think).
> Regarding the pool itself, it was receiving data at 6MB/s, sending around 30kB blocks to the disks.
> When the disk went busy, throughput fell to a few kB/s, with 128kB blocks.
> 
>> geli's use of malloc() is known to cause deadlocks under memory pressure:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759
>> 
>> Given that gmirror uses malloc() as well it probably has the same issue.
> 
> I don't use geli, so I should not be facing this issue.
> 
>>> I would have thought ZFS would have taken the busy device offline, instead of raising a panic.
>>> Perhaps it is already possible to make ZFS behave like this?
>> 
>> There's a tunable for this: vfs.zfs.deadman_enabled.
>> If the panic is just a symptom of the deadlock it's unlikely
>> to help though.
> 
> I think this tunable should have prevented the server from panicking:
> # sysctl -d vfs.zfs.deadman_enabled
> vfs.zfs.deadman_enabled: Kernel panic on stalled ZFS I/O
> # sysctl vfs.zfs.deadman_enabled
> vfs.zfs.deadman_enabled: 1
> 
> But I'm not sure how it would have behaved then...
> (busy disk miraculously back to normal, memory pressure due to anon increasing...)

I therefore think it would be nice, once vfs.zfs.deadman_synctime_ms has expired,
to be able to take the busy device offline instead of raising a panic.
Currently, disabling the deadman avoids the panic, but lets the device keep slowing down the pool.
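
(As a rough workaround sketch, assuming one notices the hang in time and that the sysctl is writable at runtime,
the panic could be avoided and the stalled device dropped manually with something like:)
# sysctl vfs.zfs.deadman_enabled=0
# zpool offline home label/G23iscsi
(But that requires manual intervention, hence the suggestion above.)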

I still have not found the root cause of this issue, and I'm not sure I will;
it's quite difficult with only a stack trace and some performance graphs :/

Thank you again,

Ben


