Re: reviving ZFS in broken sm_start+sm_size state

In reply to: Alexander Motin : "Re: reviving ZFS in broken sm_start+sm_size state"
From: Dmitry Morozovsky <woozle_at_woozle.net>
Date: Tue, 02 Sep 2025 19:01:39 UTC
Alexander, nice to hear from ya!

On Tue, 2 Sep 2025, Alexander Motin wrote:

> Hi Dmitry,
> 
> This is a space map corruption, that could happen even some time before the
> reboot.  You should be able to import the pool read-only to evacuate the data,

ah, that fairly straightforward idea somehow slipped my mind!

and yes, I confirm 

zpool import -o readonly=on -R /mnt

did not panic, and `find -s /mnt` produced a reasonable result

> since read-only import does not load space maps. Unfortunately without having
> any reproduction of the actual corruption we might not be able to understand
> how it happened.  It might be either software or hardware, so unless you have
> ECC RAM, you may wish to test it.  You may also try to use `zdb -emmmm ...` to
> dump the metaslabs on the pool and look for more corruptions and their
> patterns, hoping it gives some more ideas.

well, as I said, there's not much data to evacuate, and good enough backups are
in place, so I'd rather try to do something to locate and hopefully help fix
the underlying bugs

output from which commands would be useful?
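
in the meantime, a quick sanity check of the numbers from the panic quoted
below, as a minimal Python sketch. The helper name is made up for illustration
and this is not the OpenZFS code, it only mirrors the bound that the VERIFY3U
compares against; the values are copied from the OCRed photo, so they may
themselves be slightly off:

```python
# Illustrative only: mirrors the upper-bound comparison from the panic
# message, not the actual OpenZFS implementation.

GIB = 1 << 30


def offset_in_space_map(entry_offset: int, sm_end: int) -> bool:
    """Return True if entry_offset lies below sm_end (sm_start + sm_size)."""
    return entry_offset < sm_end


# Values copied from the panic message (OCRed, treat as approximate).
entry_offset = 1847270282567680
sm_end = 92341796864  # sm->sm_start + sm->sm_size

ok = offset_in_space_map(entry_offset, sm_end)
sm_end_gib = sm_end / GIB
excess_factor = entry_offset / sm_end

print(f"sm_end       = {sm_end:#x} ({sm_end_gib:g} GiB)")
print(f"entry_offset = {entry_offset:#x}")
print(f"in range: {ok}, offset/bound ratio: {excess_factor:.0f}")
```

so the bound is exactly 86 GiB, while the reported entry_offset is several
orders of magnitude beyond it, which does not look like a small off-by-one
style overrun.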

thanks again!

> 
> On 01.09.2025 11:22, Dmitry Morozovsky wrote:
> > Dear colleagues,
> > 
> > after some (AFAIR clean) reboot of current with ZFS-on-root I had (OCRed from
> > mobile photo but hopefully good enough) unbootable system with the following
> > panic:
> > 
> > --- 8< ---
> > panic: VERIFY3U(entry_offset, <, sm->sm_start + sm->sm_size) failed
> > (1847270282567680 < 92341796864)
> > 
> > cpuid = 2
> > time = 1756738203
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> > 0xfffffe0149c856d0
> > vpanic() at vpanic+0x136/frame 0xfffffe0149c85800
> > spl_panic() at spl_panic+0x3a/frame 0xfffffe0149c85860
> > space_map_iterate() at space_map_iterate+0x3b1/frame 0xfffffe0149c85920
> > space_map_load_length() at space_map_load_length+0x5f/frame
> > 0xfffffe0149c85970
> > metaslab_load() at metaslab_load+0x529/frame 0xfffffe0149c85a40
> > metaslab_activate() at metaslab_activate+0x46/frame 0xfffffe0149c85a88
> > metaslab_alloc_dva_range() at metaslab_alloc_dva_range+0x7f9/frame
> > 0xfffffe0149c85bb0
> > metaslab_alloc_range() at metaslab_alloc_range+0x2c2/frame
> > 0xfffffe0149c85c70
> > metaslab_alloc() at metaslab_allo
> > zio_dva_allocate() at 0xfffffe0149c85cc0
> > zio_execute() at zio iraframe 0xfffffe0149c85e10/frame 0xfffffe0149c85e40
> > taskqueue_run_locked() at taskqueue_run_locked+0x1c2/frame 0xfffffe0149c85ec0
> > taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame
> > 0xfffffe0149c85ef0
> > fork_exit() at fork_exit+0x82/frame 0xfffffe0149c85f30
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0149c85f30
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> > [ thread pid 0 tid 101011]
> > Stopped at
> >   --- 8< ---
> > 
> > attempts to boot from last snapshot and/or trying to boot from PRERELEASE and
> > zpool import lead to exactly the same results, even with different '-F'
> > options:
> > 
> > pool *seems* to be importable but actually isn't due to mad entry_offset as I
> > can see from source
> > 
> > any hints on how I could resolve this?  the pool content itself is not **very**
> > important, but avoiding recreation would be nice
> 
> 

-- 
Sincerely,
D.Marck                                                          [MCK-RIPE]
[ FreeBSD committer:                                    marck@FreeBSD.org ]
---------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- woozle@woozle.net ***
---------------------------------------------------------------------------