Re: reviving ZFS in broken sm_start+sm_size state

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Tue, 02 Sep 2025 18:32:00 UTC

Hi Dmitry,

This is space map corruption, which could have happened some time before
the reboot.  You should be able to import the pool read-only to evacuate
the data, since a read-only import does not load space maps.  Unfortunately,
without a reproduction of the actual corruption we might not be
able to understand how it happened.  It could be either software or
hardware, so unless you have ECC RAM, you may wish to test the memory.  You
may also try `zdb -emmmm ...` to dump the metaslabs of the pool and
look for more corruption and any patterns in it, hoping that gives more ideas.
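
As a rough sketch of the two steps above: the pool name, alternate root and
output path below are placeholders, not values from this thread, and the
`run` helper is only there so that the script prints the commands by default
instead of executing them.  The `-f` and `-R` options are my assumption about
what a root pool imported from rescue media would need; adjust as required.

```shell
#!/bin/sh
# Sketch only: POOL, ALTROOT and OUT are placeholders, not values from this thread.
POOL="${POOL:-zroot}"
ALTROOT="${ALTROOT:-/mnt}"
OUT="${OUT:-/tmp/metaslabs.txt}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; set to 0 to execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Read-only import: space maps are not loaded, so the failing check in
# space_map_iterate() should not be reached.
# -R keeps mounts under an alternate root instead of over the running system,
# -f is only needed if the pool was not cleanly exported.
import_readonly() {
    run zpool import -f -o readonly=on -R "$ALTROOT" "$POOL"
}

# Dump metaslab and space map detail to a file for offline inspection.
# The -emmmm flags are the ones suggested above; the redirect is an addition.
dump_metaslabs() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ zdb -emmmm $POOL > $OUT"
    else
        zdb -emmmm "$POOL" > "$OUT" 2>&1
    fi
}

case "${1:-all}" in
    import) import_readonly ;;
    zdb)    dump_metaslabs ;;
    all)    import_readonly; dump_metaslabs ;;
esac
```

Once the pool is imported read-only, the data can be copied off with ordinary
file tools or with `zfs send` of snapshots that already exist; new snapshots
cannot be created on a read-only pool.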

On 01.09.2025 11:22, Dmitry Morozovsky wrote:
> Dear colleagues,
> 
> after some (AFAIR clean) reboot of current with ZFS-on-root I had (OCRed from
> mobile photo but hopefully good enough) unbootable system with the following
> panic:
> 
> --- 8< ---
> panic: VERIFY3U(entry_offset, <, sm->sm_start + sm->sm_size) failed (1847270282567680 < 92341796864)
> 
> cpuid = 2
> time = 1756738203
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0149c856d0
> vpanic at vpanic+0x136/frame 0xfffffe8149c85800
> spl_panic at spl_panic+0x3a/frame 0xfffffe0149c85860
> space_map_iterate() at space_map_iterate+0x3b1/frame 0xfffffe0149c85920
> space_map_load_length() at space_map_load_length+0x5f/frame 0xfffffe8149c85970
> metaslab_load() at metaslab_load+0x529/frame 0xfffffe8149c85a40
> metaslab_activate() at metaslab_activate+0x46/frame 0xfffffe8149c85a88
> metaslab_alloc_dva_range() at metaslab_alloc_dva_range+0x7f9/frame 0xfffffe0149c85bb0
> metaslab_alloc_range() at metaslab_alloc_range+0x2c2/frame 0xfffffe8149c85c70
> metaslab_alloc() at metaslab_allo
> zio_dva_allocate() at 0xfffffe0149c85cc0
> zio_execute() at zio iraframe 0xfffffe0149c85e10/frame 0xfffffe0149c85e40
> taskqueue_run_locked() at taskqueue_run_locked+0x1c2/frame 0xfffffe0149c85ec0
> taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe0149c85ef0
> fork_exit() at fork_exit+0x82/frame 0xfffffe0149c85f30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0149c85f30
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 0 tid 101011]
> Stopped at
>   --- 8< ---
> 
> attempts to boot from last snapshot and/or trying to boot from PRERELEASE and
> zpool import lead to exactly the same results, even with different '-F'
> options:
> 
> pool *seems* to be importable but actually isn't due to mad entry_offset as I
> can see from source
> 
> any hints how could I resolve this?  the pool content itself is not **very**
> important, but avoiding recreation would be nice

-- 
Alexander Motin