Re: Repairing a bad ZFS free list

From: Rich <rincebrain_at_gmail.com>
Date: Sun, 06 Feb 2022 18:29:00 UTC
Hm, on further reading, that should already be in stable/13, sorry.

That said, I might suggest reporting it there or as a new bug mentioning
that one.

On Sun, Feb 6, 2022 at 1:23 PM Rich <rincebrain@gmail.com> wrote:

> https://github.com/openzfs/zfs/issues/11480 seems germane.
>
> I'm not 100% certain from reading the fix, but applying the
> patch should stop the panic.
>
> - Rich
>
> On Sun, Feb 6, 2022 at 1:10 PM John F Carr <jfc@mit.edu> wrote:
>
>> I have a corrupt root ZFS pool on my ARM server (Ampere eMAG) running
>> a recent version of stable/13.  Is there any way to repair my system
>> short of wiping the disk and reinstalling?
>>
>> All filesystems mount and there are no errors reported by zpool, but
>> there is bad metadata, apparently a block having been allocated twice.
>> Running "zfs destroy" tends to cause crashes like
>>
>> panic: VERIFY3(l->blk_birth == r->blk_birth) failed (9269896 == 9269889)
>>
>> The assertion is in dsl_deadlist.c:livelist_compare().  There are two
>> livelist_entry_t objects containing blkptr_t objects with the same
>> DVA_GET_VDEV and DVA_GET_OFFSET but distinct blk_birth.  Apparently
>> this is a bad thing.
>>
>> spa_livelist_delete_cb appears in the stack trace.  I think the kernel is
>> telling
>> me the same block has been allocated twice and it doesn't want to free it
>> twice.
>>
>> This problem persists across reboot.  Since I want to use poudriere,
>> "stop running zfs destroy" is not a good workaround.
>>
>> Is it safe to disable the assertion, or will that spread the
>> corruption even further?
>>
>> In the old days I would use clri or fsdb to make the problematic part
>> of a UFS filesystem go away.  How do I repair ZFS?
>>
>> This crash has been reported as bug
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261538