ZFS txg rollback: expected timeframe?

From: Alexander Leidinger <alexleidingerde_at_gmail.com>
Date: Tue, 31 Oct 2023 10:16:01 UTC
Hi,

yes, the answer to $Subject is hard. I know.

Issue: an overheating CPU may have corrupted a zpool (4 * 4TB in a raidz2
setup) in a way that a normal import of the pool panics the machine with
"VERIFY3(l->blk_birth == r->blk_birth) failed (101867360 == 101867222)".

There are backups, but a zpool import with "-N -F -T xxx" should work too
and remove the need to restore from a full backup (via USB) plus
incrementals (from S3/tarsnap).
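For reference, this is roughly the procedure I mean, as a sketch; the
device node, pool name, and txg below are placeholders, not the real
values from my system:

```shell
# Inspect the on-disk labels of an exported/unimported pool to find
# candidate older txgs to roll back to (-e = pool is not imported,
# -ul = dump uberblocks from the labels). /dev/ada0p3 is a placeholder
# for one of the pool's disks.
zdb -e -ul /dev/ada0p3

# Attempt the rewind import read-only first, so nothing is rewritten
# until the chosen txg proves to be consistent:
#   -o readonly=on  don't write to the pool
#   -N              import without mounting datasets
#   -F              rewind to an earlier consistent state
#   -T <txg>        rewind to this specific txg (undocumented)
# "tank" is a placeholder for the pool name.
zpool import -o readonly=on -N -f -F -T <txg> tank
```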

At the time of the crash a poudriere run with maybe 15 ports was active
(including qt6-<web-something>); ccache is in use for this. The rest (in
terms of amount of data) is just small stuff.

What runtime should I expect on 5k rpm spinning rust (WD Red)? So far all
the disks have been at 100% busy (gstat) for about 26 h.

On a related note:
Is there a reason why "-T" is not documented?
After a while I get "Panic String: deadlres_td_sleep_q: possible deadlock
detected for 0xfffffe023ae831e0 (l2arc_feed_thread), blocked for 1802817
ticks" during such an import, so I had to set
debug.deadlkres.blktime_threshold: 1215752191
debug.deadlkres.slptime_threshold: 1215752191
Setting vfs.zfs.deadman.enabled=0 didn't help (it's still set).
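In case it helps anyone reproducing this, these are the settings as
sysctl commands (the threshold values are simply the ones I ended up
using above, not recommended defaults):

```shell
# Raise the deadlock-resolver thresholds (in ticks) so the very long
# rollback import is not declared a deadlock by deadlkres.
sysctl debug.deadlkres.blktime_threshold=1215752191
sysctl debug.deadlkres.slptime_threshold=1215752191

# The ZFS deadman can also panic on long-stalled I/O; disabling it
# alone did not avoid the panic here, but it remains disabled.
sysctl vfs.zfs.deadman.enabled=0
```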
Is there something more wrong with my pool than expected, or is it some
kind of bug that such an import triggers this panic?
The SSDs with L2ARC and ZIL don't show up at all in gstat, and I wouldn't
expect them to on an import that rolls back to a previous txg, so I was
surprised to see such a panic.

Bye,
Alexander.