Re: Repairing a bad ZFS free list

From: Rich <rincebrain_at_gmail.com>
Date: Sun, 06 Feb 2022 18:23:40 UTC
https://github.com/openzfs/zfs/issues/11480 seems germane.

I'm not 100% certain from reading the fix, but it looks like applying that
patch should stop the panics.
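
If you want to confirm it really is the livelist that's damaged before doing
anything else: I believe zdb grew a livelist validation mode (-y) alongside
the livelist feature, so assuming your stable/13 zdb has that option,
something like

  zdb -y <poolname>

should check the livelists pending deletion read-only, without modifying the
pool.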

- Rich

On Sun, Feb 6, 2022 at 1:10 PM John F Carr <jfc@mit.edu> wrote:

> I have a corrupt root ZFS pool on my ARM server (Ampere eMAG) running
> a recent version of stable/13.  Is there any way to repair my system
> short of wiping the disk and reinstalling?
>
> All filesystems mount and there are no errors reported by zpool, but
> there is bad metadata, apparently a block having been allocated twice.
> Running "zfs destroy" tends to cause crashes like
>
> panic: VERIFY3(l->blk_birth == r->blk_birth) failed (9269896 == 9269889)
>
> The assertion is in dsl_deadlist.c:livelist_compare().  There are two
> livelist_entry_t objects containing blkptr_t objects with the same
> DVA_GET_VDEV and DVA_GET_OFFSET but distinct blk_birth.  Apparently
> this is a bad thing.
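>
> Roughly, the relevant part of the comparator is (my paraphrase from
> reading dsl_deadlist.c, not the exact upstream source):
>
>     static int
>     livelist_compare(const void *larg, const void *rarg)
>     {
>         const blkptr_t *l = &((livelist_entry_t *)larg)->le_bp;
>         const blkptr_t *r = &((livelist_entry_t *)rarg)->le_bp;
>
>         /* Entries are ordered by the vdev and offset of DVA 0. */
>         uint64_t l_vdev = DVA_GET_VDEV(&l->blk_dva[0]);
>         uint64_t r_vdev = DVA_GET_VDEV(&r->blk_dva[0]);
>         if (l_vdev != r_vdev)
>             return (l_vdev < r_vdev ? -1 : +1);
>
>         uint64_t l_off = DVA_GET_OFFSET(&l->blk_dva[0]);
>         uint64_t r_off = DVA_GET_OFFSET(&r->blk_dva[0]);
>         if (l_off == r_off) {
>             /* Same vdev and offset: the birth times must match.
>              * This is the check behind the panic quoted above. */
>             ASSERT3U(l->blk_birth, ==, r->blk_birth);
>         }
>         return (l_off < r_off ? -1 : (l_off > r_off ? +1 : 0));
>     }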
>
> spa_livelist_delete_cb appears in the stack trace.  I think the kernel is
> telling me the same block has been allocated twice and it doesn't want to
> free it twice.
>
> This problem persists across reboot.  Since I want to use poudriere,
> "stop running zfs destroy" is not a good workaround.
>
> Is it safe to disable the assertion, or will that spread the
> corruption even further?
>
> In the old days I would use clri or fsdb to make the problematic part
> of a UFS filesystem go away.  How do I repair ZFS?
>
> This crash has been reported as bug
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261538
>