I/O to pool appears to be hung, panic !

Edward Napierala trasz at freebsd.org
Mon Jul 3 11:10:44 UTC 2017


2017-07-03 10:07 GMT+01:00 Ben RUBSON <ben.rubson at gmail.com>:

> > On 29 Jun 2017, at 15:36, Ben RUBSON <ben.rubson at gmail.com> wrote:
> >
> >> On 29 Jun 2017, at 14:43, Fabian Keil <freebsd-listen at fabiankeil.de>
> wrote:
> >
> > Thank you for your feedback Fabian.
> >
> >> Ben RUBSON <ben.rubson at gmail.com> wrote:
> >>
> >>> One of my servers had a kernel panic last night, giving the following
> >>> message:
> >>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at
> >>> '/dev/label/G23iscsi'.
> >> [...]
> >>> Here are some numbers regarding this disk, taken from the server
> >>> hosting the pool:
> >>> (unfortunately not from the iSCSI target server)
> >>> https://s23.postimg.org/zd8jy9xaj/busydisk.png
> >>>
> >>> We clearly see that suddenly the disk became 100% busy, while the CPU
> >>> was almost idle.
>
> We also clearly see that 5 minutes later (02:09) the disk seemed to come
> back but went 100% busy again, and that 16 minutes later (the default
> vfs.zfs.deadman_synctime_ms) the panic occurred.
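>
> For reference, the timeout itself can be read with sysctl; unless I am
> mistaken the default is 1000000 ms, which matches the ~16.7 minutes seen
> on the graph:
>
> # sysctl vfs.zfs.deadman_synctime_ms
> vfs.zfs.deadman_synctime_ms: 1000000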
>
> >>> No error message at all on both servers.
> >> [...]
> >>> The only log I have is the following stack trace, taken from the server
> >>> console:
> >>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at
> >>> '/dev/label/G23iscsi'.
> >>> cpuid = 0
> >>> KDB: stack backtrace:
> >>> #0 0xffffffff80b240f7 at kdb_backtrace+0x67
> >>> #1 0xffffffff80ad9462 at vpanic+0x182
> >>> #2 0xffffffff80ad92d3 at panic+0x43
> >>> #3 0xffffffff82238fa7 at vdev_deadman+0x127
> >>> #4 0xffffffff82238ec0 at vdev_deadman+0x40
> >>> #5 0xffffffff82238ec0 at vdev_deadman+0x40
> >>> #6 0xffffffff8222d0a6 at spa_deadman+0x86
> >>> #7 0xffffffff80af32da at softclock_call_cc+0x18a
> >>> #8 0xffffffff80af3854 at softclock+0x94
> >>> #9 0xffffffff80a9348f at intr_event_execute_handlers+0x20f
> >>> #10 0xffffffff80a936f6 at ithread_loop+0xc6
> >>> #11 0xffffffff80a900d5 at fork_exit+0x85
> >>> #12 0xffffffff80f846fe at fork_trampoline+0xe
> >>> Uptime: 92d2h47m6s
> >>>
> >>> I would have been pleased to make a dump available.
> >>> However, despite my (correct?) configuration, the server did not dump:
> >>> (nevertheless, "sysctl debug.kdb.panic=1" does make it dump)
> >>> # grep ^dump /boot/loader.conf /etc/rc.conf
> >>> /boot/loader.conf:dumpdev="/dev/mirror/swap"
> >>> /etc/rc.conf:dumpdev="AUTO"
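> >>>
> >>> (For what it's worth, the dump device the kernel actually selected can
> >>> be double-checked at runtime; it should report mirror/swap here:)
> >>> # sysctl kern.shutdown.dumpdevname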
> >>
> >> You may want to look at the NOTES section in gmirror(8).
> >
> > Yes, I should already be OK (prefer algorithm set).
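> >
> > (That is, something along these lines, from memory, for the "swap" mirror
> > the dump device lives on:)
> > # gmirror configure -b prefer swap
> > # gmirror list swap | grep Balance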
> >
> >>> I use the default kernel, with a rebuilt zfs module:
> >>> # uname -v
> >>> FreeBSD 11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017
> >>>  root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
> >>>
> >>> I use the following iSCSI configuration, which disconnects the disks
> >>> "as soon as" they are unavailable:
> >>> kern.iscsi.ping_timeout=5
> >>> kern.iscsi.fail_on_disconnection=1
> >>> kern.iscsi.iscsid_timeout=5
> >>>
> >>> I therefore think the disk was at least correctly reachable during
> >>> these 20 busy minutes.
> >>>
> >>> So, any idea why I could have faced this issue ?
> >>
> >> Is it possible that the system was under memory pressure?
> >
> > No I don't think it was :
> > https://s1.postimg.org/uvsebpyyn/busydisk2.png
> > More than 2GB of available memory.
> > Swap not used (624kB).
> > ARC behaviour seems correct (anon increases because ZFS cannot actually
> > write, I think).
> > Regarding the pool itself, it was receiving data at 6MB/s, sending blocks
> > of around 30kB to the disks.
> > When the disk went busy, throughput fell to a few kB/s, with 128kB blocks.
> >
> >> geli's use of malloc() is known to cause deadlocks under memory
> >> pressure:
> >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759
> >>
> >> Given that gmirror uses malloc() as well it probably has the same issue.
> >
> > I don't use geli so I should not face this issue.
> >
> >>> I would have thought ZFS would have taken the busy device offline
> >>> instead of raising a panic.
> >>> Perhaps it is already possible to make ZFS behave like this?
> >>
> >> There's a tunable for this: vfs.zfs.deadman_enabled.
> >> If the panic is just a symptom of the deadlock it's unlikely
> >> to help though.
> >
> > I think this tunable should have prevented the server from raising a
> > panic:
> > # sysctl -d vfs.zfs.deadman_enabled
> > vfs.zfs.deadman_enabled: Kernel panic on stalled ZFS I/O
> > # sysctl vfs.zfs.deadman_enabled
> > vfs.zfs.deadman_enabled: 1
> >
> > But I am not sure how it would have behaved then...
> > (busy disk miraculously back to normal status, memory pressure due to
> > anon increasing...)
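> >
> > (For the record, if I read things correctly this is a loader tunable, so
> > disabling it would mean putting the following in /boot/loader.conf:)
> > vfs.zfs.deadman_enabled=0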
>
> I therefore think it would be nice, once vfs.zfs.deadman_synctime_ms has
> expired, to be able to take the busy device offline instead of raising a
> panic. Currently, disabling the deadman will avoid the panic but will leave
> the device slowing down the pool.
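>
> In the meantime, I suppose the slow vdev could be dropped by hand when this
> happens again, at the cost of running degraded, with something like:
> # zpool offline home label/G23iscsi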
>
> I still have not found the root cause of this issue, and I am not sure I
> will; it is quite difficult with only a stack trace and some performance
> graphs :/
>

What exactly is the disk doing when that happens?  What does "gstat" say?
If the iSCSI target is also FreeBSD, what does "ctlstat" say?
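
For instance, something along these lines on the initiator and on the target
respectively ("-p" restricts gstat to physical providers; adjust as needed):

# gstat -p
# ctlstat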

If everything else fails, you might want to do a packet trace using
"tcpdump -w" and take a look at it using Wireshark; it decodes iSCSI
quite nicely.
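
For example, something like the following (the interface name here is only a
placeholder; 3260 is the standard iSCSI port):

# tcpdump -i em0 -s 0 -w iscsi.pcap port 3260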

