Re: [List] Re: zfs corruption at zroot/usr/home:<0x0>

From: Tomek CEDRO <tomek_at_cedro.info>
Date: Wed, 12 Nov 2025 19:20:41 UTC
On Wed, Nov 12, 2025 at 3:19 PM Frank Leonhardt <freebsd-doc@fjl.co.uk> wrote:
>
> On 12/11/2025 09:25, Sad Clouds wrote:
> > On Wed, 12 Nov 2025 03:58:34 +0100
> > Tomek CEDRO <tomek@cedro.info> wrote:
> >
> >> Hello world :-)
> >>
> >> On 14.3-RELEASE-p5 amd64 I have encountered a kernel panic (will
> >> report on bugzilla in a moment). After that I found some sites did not
> >> load in a web browser, so my first guess was to try zpool status -v
> >> and I got this:
> >>
> >> errors: Permanent errors have been detected in the following files:
> >>          zroot/usr/home:<0x0>
> >>
> >> Any guess what the <0x0> means and how to fix the situation?
> >> It should be a file name, right?
> >>
> >> zpool scrub and resilver did not help :-(
> >>
> >> Will rolling back a snapshot fix the problem?
> >>
> >> Any hints appreciated :-)
> >> Tomek
> >>
> >> --
> >> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
> >>
> > Hi, I'm not a ZFS expert, but I wonder if this error relates to one of
> > the ZFS internal objects rather than to the file data blocks themselves,
> > in which case ZFS may not be able to repair it correctly?
> >
> > I'm currently evaluating ZFS on FreeBSD for some of my storage needs
> > and your report is a bit concerning. Are you able to share the details
> > on the I/O workloads and the storage geometry you use? Do you have more
> > info on the kernel panic message or backtraces?
> >
> > If you put it all in the bug report, can you please share the bug ID?
> >
> > Thanks.
>
> I suspect <0x0> refers to the object number within a dataset, with zero
> being metadata. A permanent error is bad news.
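
Thanks - I will try to see what that object actually is with zdb. If I
read zdb(8) correctly, something along these lines (read-only, run
against the dataset named in the error) should confirm whether it is
metadata:

    zdb -dddd zroot/usr/home 0    # dump object 0 of the dataset in detail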
>
> zpool scrub doesn't fix errors as such - rather, it tries to read
> everything, and if it finds an error it'll repair it if it can. If you
> hit an error outside of a scrub it gets repaired on the fly anyway,
> when possible. The point of a scrub is to ensure all your data is
> readable even if it hasn't been read in a while.
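
(For the record, that is essentially what I ran:

    zpool scrub zroot        # runs in the background
    zpool status -v zroot    # progress plus the list of "permanent errors"

and the error stayed.)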
>
> This is most likely down to a compound hardware failure - with flaky
> drives it's still possible to lose both copies and not know about it
> (hence doing scrubs).
>
> My advice would be to back up what's remaining to tape ASAP before
> anything else.
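
Agreed - I will push a copy off the pool first. Assuming the dataset
still lets me snapshot and send it, roughly this (the snapshot name and
the "backup" destination are only placeholders):

    zfs snapshot zroot/usr/home@rescue                        # snapshot name is an example
    zfs send zroot/usr/home@rescue | zfs receive backup/home  # or redirect the stream to a file/tape

If send fails because of the damaged metadata, plan B is rsync/tar of
whatever files are still readable.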
>
> You can sometimes roll back to an earlier version of a dataset - take a
> look and see if a snapshot is readable (e.g. mount it or look for it in
> the .zfs directory). One good way is to use "zfs clone
> zroot/usr/home@snapshotname zroot/usr/home_fingerscrossed"
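
That is my next step; along those lines, and assuming the default
/usr/home mountpoint (the snapshot name is a placeholder):

    zfs list -t snapshot -r zroot/usr/home       # see what snapshots exist
    ls /usr/home/.zfs/snapshot/snapshotname/     # spot-check that it is readable
    zfs clone zroot/usr/home@snapshotname zroot/usr/home_fingerscrossed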
>
> ZFS is NOT great at detecting failed drives. I'm currently investigating
> this for my blog (it was the subject of a question I posted here around
> January, to which the answer was "hmm"). zfsd, however, does monitor
> drive health using devctl and might pick up impending drive failures
> before you get to this stage. I'm going through the source code now to
> convince myself it works (it's written in C++ and appears to be
> influenced by Design Patterns, so it's not exactly clear!)
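
Good to know - zfsd ships in the base system, so trying it should just
be a matter of enabling it:

    sysrc zfsd_enable="YES"    # persists the setting in /etc/rc.conf
    service zfsd start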
>
> If the metadata for a dataset is unrecoverable you'll need to destroy
> and recreate the dataset from a backup. HOWEVER, I'd be investigating
> the health of the drives. dd them to /dev/null and see what you get. You
> can actually do this while ZFS is using them. Also check the console log
> for CAM messages - if it's got to that stage you really need to think
> about data recovery.
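
Will do. Something along these lines, I assume (device names are
examples - substitute whatever "zpool status" lists):

    dd if=/dev/ada0 of=/dev/null bs=1m      # full read pass; stops at the first hard read error
    dmesg | grep -Ei 'cam|error|timeout'    # look for CAM retries/timeouts from the kernel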
>
> Regards, Frank.

Sorry, my previous response should have gone here :-P

Hmm, this is a brand-new NVMe drive, not really likely to fail. I have
the same problem on a striped (no-redundancy) pool: initially I saw an
actual bad file name with three problems (a VM image), but it has now
turned into ztuff/vm:<0x482>. Charlie Foxtrot :-(
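
I will still check the NVMe health counters, something like this if I
remember the tools right (device name is an example):

    nvmecontrol logpage -p 2 nvme0    # NVMe SMART/health log page: media errors, spare, temperature
    smartctl -a /dev/nvme0            # alternative, needs sysutils/smartmontools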

I also have a 4x4TB HDD raidz2 array and that one was not affected.

I have some snapshots and will try rolling back to see if that helps.
Before that I will back up the data, then try to recover whatever is
possible after the snapshot rollback.
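
Roughly this, with ztuff/vm as the worst case (the snapshot name and
the "backup" pool are placeholders):

    zfs send ztuff/vm@lastgood | zfs receive backup/vm-copy   # copy off first
    zfs rollback -r ztuff/vm@lastgood                         # -r discards anything newer than the snapshot
    zpool clear ztuff && zpool scrub ztuff                    # re-verify; status should clean up after a good scrub

Same idea for zroot/usr/home.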

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info