Re: [List] Re: zfs corruption at zroot/usr/home:<0x0>

From: Frank Leonhardt <freebsd-doc_at_fjl.co.uk>
Date: Wed, 12 Nov 2025 14:18:54 UTC
On 12/11/2025 09:25, Sad Clouds wrote:
> On Wed, 12 Nov 2025 03:58:34 +0100
> Tomek CEDRO <tomek@cedro.info> wrote:
>
>> Hello world :-)
>>
>> On 14.3-RELEASE-p5 amd64 I have encountered a kernel panic (will
>> report on bugzilla in a moment). After that I found some sites did not
>> load in a web browser, so my first guess was to try zpool status -v
>> and I got this:
>>
>> errors: Permanent errors have been detected in the following files:
>>          zroot/usr/home:<0x0>
>>
>> Any guess what the <0x0> means and how to fix the situation?
>> It should be a file name, right?
>>
>> zpool scrub and resilver did not help :-(
>>
>> Will rolling back a snapshot fix the problem?
>>
>> Any hints appreciated :-)
>> Tomek
>>
>> --
>> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>>
> Hi, I'm not a ZFS expert, but I wonder if this error is related to some
> of the ZFS internal objects, rather than the file data blocks being
> corrupted. In which case, ZFS may not be able to correctly repair it?
>
> I'm currently evaluating ZFS on FreeBSD for some of my storage needs
> and your report is a bit concerning. Are you able to share the details
> on the I/O workloads and the storage geometry you use? Do you have more
> info on the kernel panic message or backtraces?
>
> If you put it all in the bug, then can you please share the bug ID?
>
> Thanks.

I suspect the <0x0> refers to an object number within the dataset, and 
object zero is the dataset's own metadata rather than any particular 
file. A permanent error there is bad news.
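
If you want to see what that object actually is, zdb can dump it 
directly. A rough sketch only (zdb is read-only, but on a live pool the 
output can be slightly stale and the format varies between releases):

    zdb -dddd zroot/usr/home 0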

zpool scrub doesn't fix errors as such - or rather, not only during a 
scrub. It reads everything in the pool, and whenever ZFS finds a bad 
block it repairs it from a good copy if it has one, scrub or no scrub; 
an error hit during normal reads gets the same treatment. The point of 
a scrub is to make sure all your data is still readable even if it 
hasn't been read in a while.
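
For reference, the cycle here is roughly (pool name taken from your 
output):

    zpool scrub zroot
    zpool status -v zroot   # after the scrub completes, re-check the error list
    zpool clear zroot       # reset the per-device error counters

I believe the "permanent errors" list only empties once the damaged 
blocks have been repaired or freed and a subsequent scrub or two has 
completed, so don't be surprised if it lingers.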

This is most likely down to a compound hardware failure - with flaky 
drives it's still possible to lose both copies of a block and not know 
about it until you try to read it (hence the scrubs).

My advice would be to back up what's remaining to tape ASAP before 
anything else.
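
Something along these lines, though the destination host and pool here 
are just placeholders - send it wherever you have space:

    zfs snapshot -r zroot@rescue
    zfs send -R zroot@rescue | ssh otherbox zfs receive -du backuppool/rescue

If the damaged dataset makes the send abort, fall back to tar or rsync 
of whatever still mounts - unrecoverable blocks show up as plain I/O 
errors on read, so at least you'll know what you lost.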

You can sometimes roll back to an earlier version of a dataset - take a 
look and see if a snapshot is readable (e.g. mount it, or look for it 
under the .zfs/snapshot directory). One good way is "zfs clone 
zroot/usr/home@snapshotname zroot/usr/home_fingerscrossed".
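
For example (the snapshot name is made up - use whatever "zfs list" 
shows):

    zfs list -t snapshot -r zroot/usr/home
    zfs clone zroot/usr/home@snapshotname zroot/usr/home_fingerscrossed
    zfs get mounted,mountpoint zroot/usr/home_fingerscrossed

If the clone mounts and reads cleanly you can "zfs promote" it and bin 
the damaged original. If the corruption is in metadata the snapshot 
shares with the live dataset, the clone may well show the same error.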

ZFS is NOT great at detecting failed drives. I'm currently investigating 
this for my blog (it was the subject of a question I posted here back in 
about January, to which the answer was "hmm"). zfsd, however, does 
monitor drive health using devctl events and might pick up an impending 
drive failure before you get to this stage. I'm going through the source 
code now to convince myself it works (it's written in C++ and appears to 
be influenced by Design Patterns, so it's not exactly clear!)
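
On FreeBSD it isn't running unless you've enabled it:

    sysrc zfsd_enable="YES"
    service zfsd start

Whether it can actually do anything useful (fault a member, pull in a 
hot spare) depends on the pool having redundancy to play with, but I 
believe it logs what it sees to syslog either way.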

If the metadata for a dataset is unrecoverable you'll need to destroy 
and recreate the dataset from a backup. HOWEVER, I'd be investigating 
the health of the drives. dd them to /dev/null and see what you get. You 
can actually do this while ZFS is using them. Also check the console log 
for CAM messages - if it's got to that stage you really need to think 
about data recovery.
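
Something like this per pool member (ada0 is just an example - use 
whatever "zpool status" lists):

    dd if=/dev/ada0 of=/dev/null bs=1m
    dmesg | grep ada0
    grep ada0 /var/log/messages

If dd stops with an I/O error, or the log is full of CAM retries and 
timeouts for that disk, stop relying on it and get the data off first.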

Regards, Frank.