Re: zfs mirrored pool dead after a disk death and reset

From: Steven Hartland <killing_at_multiplay.co.uk>
Date: Fri, 25 Feb 2022 13:30:32 UTC
Have you tried physically removing the dead disk? In the past I've seen a
bad disk send bad data to the controller, causing knock-on issues.

Also, the output doesn't show multiple devices, only nvd0. I hope you
didn't use nv raid to create the mirror, as that would mean there's no ZFS
protection.

On Fri, 25 Feb 2022 at 11:07, Eugene M. Zheganin <eugene@zhegan.in> wrote:

> Hello.
>
> Recently a disk died in one of my servers running 12.2
> (12.2-RELEASE-p2). When it died, I got a bunch of dmesg errors saying
> that a bunch of I/O commands were stuck, and the OS became partially
> livelocked (I could still log in, but could barely do anything). So,
> considering this is a mirrored pool, and "I have done it many times
> before, nothing could be safer!", I sent a reset to the server via IPMI.
>
> It was quite discouraging to find this after a successful boot-up
> from the intact zroot (yes, I've already tried zpool import -F after an
> export, so the pool was initially imported, showing the same
> devastating state):
>
>
> [root@db0:~]# zpool import
> pool: data
> id: 15967028801499953224
> state: FAULTED
> status: One or more devices contains corrupted data.
> action: The pool cannot be imported due to damaged devices or data.
> The pool may be active on another system, but can be imported using
> the '-f' flag.
> see: http://illumos.org/msg/ZFS-8000-5E
> config:
>   data                   FAULTED  corrupted data
>     9566965891719887395  FAULTED  corrupted data
>     nvd0                 ONLINE
>
>
> # zpool import -F data
> cannot import 'data': one or more devices is currently unavailable
>
>
> Well, yeah, I do have a replica, and I didn't lose one bit of data, but
> it's still a tragedy to lose a pool after one silly reset (and I have
> done it literally a hundred times before on various servers and FreeBSD
> versions).
>
> So, a couple of questions:
>
> - Is it worth trying FreeBSD 13 to recover? (Just to learn whether the
> pool can still be recovered.)
>
> - Is it because this is more dangerous with NVMe drives, or would it
> also happen on SSD/rotational drives?
>
> - Would zpool checkpoint have saved me in this case?
>
>
> Thanks.
>
> Eugene.