Re: CURRENT: ZFS freezes system beyond reboot

From: Alan Somers <asomers_at_freebsd.org>
Date: Sun, 12 Dec 2021 16:45:06 UTC
On Sun, Dec 12, 2021 at 2:22 AM FreeBSD User <freebsd@walstatt-de.de> wrote:
>
> Running CURRENT (FreeBSD 14.0-CURRENT #52 main-n251260-156fbc64857: Thu
> Dec  2 14:45:55 CET 2021 amd64), all of a sudden the ZFS RAIDZ pool
> suffered an error:
>
> Solaris: WARNING: Pool 'POOL00' has encountered an uncorrectable I/O
> failure and has been suspended.
>
> The system no longer responds on that pool; transactions to and
> from that pool are frozen, and the system is 99.9% idle.
> The most "not so funny" part is: the box doesn't even react to a
> "shutdown -r now" or a brute-force "reboot". I can still log in via ssh,
> but any action touching the ZFS pool freezes the console/terminal.
>
> ZFS very often renders the system unresponsive indefinitely. How can this
> be mitigated? The system in question is at a remote site, and the problem
> does not seem to be limited to CURRENT; we have seen similar problems on
> 13-STABLE as well.
>
> What can I do to "unfreeze" ZFS? The main OS is, luckily, on a
> UFS/FFS filesystem and so is not affected by that problem.
>
> By the way, here are some more details, as far as I can gather them:
>
> zpool clear POOL00
> cannot clear errors for POOL00: I/O error
>
> Whatever took out the ZFS pool (I cannot see any hardware errors; the
> pool backs several services, especially a poudriere build system, and is
> under heavy load all the time; the box has 16 GB RAM), it also renders
> the rest of the system unusable in a way that even a "reboot" cannot fix.
>
> Kind regards,
> oh

You need to look at what's causing those errors.  What kind of disks
are you using, with what HBA?  It's not surprising that any access to
ZFS hangs; that's what it's designed to do when a pool is suspended.
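The blocking behaviour described above is controlled by the pool's failmode property (documented in zpoolprops(7)). With the default, wait, all I/O to the pool blocks until device connectivity is recovered and the errors are cleared with zpool clear. With continue, new write requests get EIO while reads from the remaining healthy devices are still allowed, although writes not yet committed to disk stay blocked. With panic, the kernel prints a message and generates a crash dump. The commands below are a minimal sketch against POOL00 from the report above. Whether continue suits this remote poudriere box is an assumption to weigh, and no failmode setting fixes the underlying I/O errors; the status and device listings are where the disk/HBA question starts.

```shell
# Show the current failure-mode policy (the default is "wait")
zpool get failmode POOL00

# Return EIO to new writes instead of blocking them if the pool is suspended
zpool set failmode=continue POOL00

# See which vdevs are reporting errors, with per-file detail where available
zpool status -v POOL00

# List the disks attached via CAM, to identify the drives behind the pool
camcontrol devlist
```

Note that changing failmode only affects future failures; it does not by itself resume an already-suspended pool.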