Re: ZFS txg rollback: expected timeframe?

From: Alexander Leidinger <Alexander_at_Leidinger.net>
Date: Sat, 04 Nov 2023 17:35:32 UTC
On 2023-10-31 15:27, Alexander Leidinger wrote:

> On Tue, Oct 31, 2023 at 1:15 PM John F Carr <jfc@mit.edu> wrote:
> 
>>> On Oct 31, 2023, at 06:16, Alexander Leidinger 
>>> <alexleidingerde@gmail.com> wrote:
>>> 
>>> Issue: an overheating CPU may have corrupted a zpool (4 * 4TB in a 
>>> raidz2 setup) in such a way that a normal import of the pool panics 
>>> the machine with "VERIFY3(l->blk_birth == r->blk_birth) failed 
>>> (101867360 == 101867222)".
>>> 
>> 
>> I disabled that assertion because it gives a false alarm with some 
>> combination of deduplication, cloning, and snapshotting on one of my 
>> systems.
>> 
>> See
>> 
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261538
>> https://github.com/openzfs/zfs/issues/11480
> 
> I don't have deduplication on this pool. There are clones and 
> snapshots, and there could be recent ones if poudriere creates any. Is 
> it still a false alarm in this case? If so, are you saying that a 
> kernel with this patch applied should let me import the pool without a 
> rollback?
> 
> The GitHub issue is from 2022, so I doubt that it is the same issue we 
> are seeing. I rather suspect a problem in the copy_file_range(2) 
> related code for ZFS, which was re-enabled 20 days ago (maybe it is 
> valid to remove this assert, or maybe the block cloning part needs some 
> tweaking). CC Martin for the block cloning part.

So in the end I was at least able to import the pool read-only with the 
patch that disables this VERIFY3 panic. After a final incremental backup 
I re-created the pool (with vfs.zfs.bclone_enabled=0) and restored all 
datasets. Next come some checks (with the VERIFY3 assertion enabled 
again), followed by a full backup.
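
For anyone in a similar situation, the first steps above could be 
sketched roughly as follows. This is only a dry-run sketch in Python 
that prints commands and executes nothing. The pool name "tank", the 
backup target "backup/tank" and the snapshot names are hypothetical, and 
using zfs send/receive for the incremental backup is an assumption, not 
necessarily what was done here. It also assumes the needed snapshots 
already exist, since none can be created on a read-only pool, and that 
the kernel carries the patch disabling the VERIFY3 check.

```python
import shlex


def recovery_plan(pool, backup_target, prev_snap, last_snap):
    """Return the commands, in order, as shell strings (nothing is run)."""
    # Import read-only so nothing on the damaged pool gets modified.
    import_ro = ["zpool", "import", "-o", "readonly=on", pool]
    # Assumed backup method: incremental replication stream between two
    # already existing (hypothetical) snapshots.
    send = ["zfs", "send", "-R", "-I",
            f"{pool}@{prev_snap}", f"{pool}@{last_snap}"]
    recv = ["zfs", "receive", "-u", "-d", backup_target]
    # Turn block cloning off before the pool is re-created and restored.
    no_bclone = ["sysctl", "vfs.zfs.bclone_enabled=0"]
    return [
        shlex.join(import_ro),
        shlex.join(send) + " | " + shlex.join(recv),
        shlex.join(no_bclone),
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in recovery_plan("tank", "backup/tank", "prev", "last"):
        print(cmd)
```

Re-creating the pool and restoring the datasets are left out on 
purpose, as they depend on the local disk layout.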

The question of what caused it remains. John reported above that there 
are some issues when dedup is enabled. Dedup was not in use on this 
pool, but block cloning is, as far as I understood its description in 
the OpenZFS ticket, a kind of dedup internal to ZFS, and it only became 
active here once the default of bclone_enabled was changed to 1. I 
therefore have some reservations about enabling it again.

Martin, maybe it would be a good idea to disable block cloning again 
until someone with the corresponding OpenZFS knowledge has investigated 
this...

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF