Re: [List] Re: zfs corruption at zroot/usr/home:<0x0>
Date: Wed, 12 Nov 2025 19:20:41 UTC
On Wed, Nov 12, 2025 at 3:19 PM Frank Leonhardt <freebsd-doc@fjl.co.uk> wrote:
>
> On 12/11/2025 09:25, Sad Clouds wrote:
> > On Wed, 12 Nov 2025 03:58:34 +0100
> > Tomek CEDRO <tomek@cedro.info> wrote:
> >
> >> Hello world :-)
> >>
> >> On 14.3-RELEASE-p5 amd64 I have encountered a kernel panic (will
> >> report on bugzilla in a moment). After that I found some sites did not
> >> load in a web browser, so my first guess was to try zpool status -v
> >> and I got this:
> >>
> >> errors: Permanent errors have been detected in the following files:
> >> zroot/usr/home:<0x0>
> >>
> >> Any guess what the <0x0> means and how to fix the situation?
> >> It should be a file name, right?
> >>
> >> zpool scrub and resilver did not help :-(
> >>
> >> Will rolling back a snapshot fix the problem?
> >>
> >> Any hints appreciated :-)
> >> Tomek
> >>
> >> --
> >> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
> >>
> > Hi, I'm not a ZFS expert, but I wonder if this error is related to some
> > of the ZFS internal objects, rather than the file data blocks being
> > corrupted. In which case, ZFS may not be able to correctly repair it?
> >
> > I'm currently evaluating ZFS on FreeBSD for some of my storage needs
> > and your report is a bit concerning. Are you able to share the details
> > on the I/O workloads and the storage geometry you use? Do you have more
> > info on the kernel panic message or backtraces?
> >
> > If you put it all in the bug, then can you please share the bug ID?
> >
> > Thanks.
>
> I suspect <0x0> refers to the object number within a dataset, with zero
> being metadata. A permanent error is bad news.
>
> zpool scrub doesn't fix any errors - well, not exactly. It tries to read
> everything, and if it finds an error it'll fix it. If you encounter an
> error outside of a scrub it'll fix it anyway, if it can. The point of a
> scrub is to ensure all your data is readable even if it hasn't been read
> in a while.
>
> This is most likely down to a compound hardware failure - with flaky
> drives it's still possible to lose both copies and not know about it
> (hence doing scrubs).
>
> My advice would be to back up what's remaining to tape ASAP before
> anything else.
>
> You can sometimes roll back to an earlier version of a dataset - take a
> look and see if it's readable (i.e. mount it or look for it in the .zfs
> directory). One good way is to use
> "zfs clone zroot/usr/home@snapshotname zroot/usr/home_fingerscrossed".
>
> ZFS is NOT great at detecting failed drives. I'm currently investigating
> this for my blog (and it was the subject of a question I posted here in
> about Jan, to which the answer was "hmm"). zfsd, however, does monitor
> drive health using devctl and might pick up impending drive failures
> before you get to this stage. I'm going through the source code now to
> convince myself it works (it's written in C++ and appears to be
> influenced by Design Patterns, so it's not exactly clear!)
>
> If the metadata for a dataset is unrecoverable you'll need to destroy
> and recreate the dataset from a backup. HOWEVER, I'd be investigating
> the health of the drives. dd them to /dev/null and see what you get. You
> can actually do this while ZFS is using them. Also check the console log
> for CAM messages - if it's got to that stage you really need to think
> about data recovery.
>
> Regards, Frank.

Sorry, my previous response should have gone here :-P

Hmm, this is a brand-new NVMe drive, not really likely to fail.
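Still, before trusting it I will run the raw read check Frank suggests,
roughly like this (nda0/nvme0 are just example device names here, I'd use
whatever zpool status actually lists):

    # sequential raw read of the whole disk; the pool can stay imported
    dd if=/dev/nda0 of=/dev/null bs=1m conv=noerror status=progress

    # any read problems should also show up as CAM/nvme noise in the logs
    dmesg | grep -Ei 'cam|nvme|error'

    # NVMe health/SMART counters (log page 2); smartctl from
    # sysutils/smartmontools gives much the same information
    nvmecontrol logpage -p 2 nvme0

If that reports even one unreadable block I will treat the drive as suspect
regardless of its age.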
I have the same problem on a striped (RAID0-style, no redundancy) array:
initially I saw the bad file name with 3 problems (a VM image), and it has
now turned into ztuff/vm:<0x482>. Charlie Foxtrot :-(

I also have a 4x4TB HDD raidz2 array, and that one was not affected.

I have some snapshots and will try to revert to see if that helps. Before
that I will back up the data, and then try to recover what is possible
after the snapshot rollback (rough command sequence in the PS below).

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
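PS: the backup + rollback sequence I have in mind is roughly the one below.
Only a sketch - the snapshot names, the backup target and the clone name are
placeholders, not my real ones:

    # preserve the current (broken) state before touching anything
    zfs snapshot ztuff/vm@pre-recovery

    # copy what is still readable over to the unaffected raidz2 pool
    zfs send ztuff/vm@pre-recovery | zfs receive backup/vm-broken

    # check an older snapshot by cloning it first (Frank's trick),
    # rather than rolling back straight away
    zfs clone ztuff/vm@some-older-snap ztuff/vm_fingerscrossed

    # read everything in the clone (mountpoint may differ on your setup);
    # fresh errors will show up in zpool status -v
    tar cf /dev/null /ztuff/vm_fingerscrossed
    zpool status -v ztuff

    # only if the clone reads clean, roll back for real
    # (-r also destroys any snapshots newer than the target)
    zfs rollback -r ztuff/vm@some-older-snap

Will report back whether the <0x482> error survives the rollback.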