ZFS txg rollback: expected timeframe?
Date: Tue, 31 Oct 2023 10:16:01 UTC
Hi,

yes, the answer to $Subject is hard. I know.

Issue: an overheating CPU may have corrupted a zpool (4 * 4TB in a raidz2 setup) in such a way that a normal import of the pool panics the machine with
"VERIFY3(l->blk_birth == r->blk_birth) failed (101867360 == 101867222)".

There are backups, but a zpool import with "-N -F -T xxx" should work too and would remove the need to restore from the full backup (via USB) plus the incrementals (from S3/tarsnap).

During the crash a poudriere run of maybe 15 ports was active (including qt6-<web-something>), and ccache is in use for this. Everything else on the pool is small in terms of data volume.

What runtime should I expect on 5k rpm spinning rust (WD Red)? So far all the disks have been at 100% busy (gstat) for about 26h.

On a related note: is there a reason why "-T" is not documented?

After a while I get
"Panic String: deadlres_td_sleep_q: possible deadlock detected for 0xfffffe023ae831e0 (l2arc_feed_thread), blocked for 1802817 ticks"
during such an import, and I had to set

  debug.deadlkres.blktime_threshold: 1215752191
  debug.deadlkres.slptime_threshold: 1215752191

Setting vfs.zfs.deadman.enabled=0 didn't help (it is still set).

Is there something more wrong with my pool than expected, or is it a bug that such an import triggers this panic? The SSDs with L2ARC and ZIL don't show up at all in gstat, and I don't expect them to show up on an import with rollback to a previous txg, so I was surprised to see such a panic.

Bye,
Alexander.
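
P.S. A rough sanity check on the tick count in the panic string, as a sketch only. It assumes kern.hz is 1000, which is an assumption and not a value taken from this machine ("sysctl kern.hz" shows the real one). Under that assumption the l2arc_feed_thread was blocked for roughly half an hour before the deadlock resolver fired:

```shell
#!/bin/sh
# Convert the tick count from the panic string into wall-clock time.
# ASSUMPTION: hz=1000 is a placeholder; check the real value with `sysctl kern.hz`.
ticks=1802817          # "blocked for 1802817 ticks" from the panic string
hz=1000                # assumed ticks per second

secs=$((ticks / hz))   # integer division, whole seconds
mins=$((secs / 60))    # whole minutes

echo "${secs} s (~${mins} min)"
```

With a different kern.hz the result scales accordingly, so this only gives an order of magnitude.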