Re: [List] Re: zfs corruption at zroot/usr/home:<0x0>
- Reply: Tomek CEDRO : "Re: [List] Re: zfs corruption at zroot/usr/home:<0x0>"
- In reply to: Sad Clouds : "Re: zfs corruption at zroot/usr/home:<0x0>"
Date: Wed, 12 Nov 2025 14:18:54 UTC
On 12/11/2025 09:25, Sad Clouds wrote:
> On Wed, 12 Nov 2025 03:58:34 +0100
> Tomek CEDRO <tomek@cedro.info> wrote:
>
>> Hello world :-)
>>
>> On 14.3-RELEASE-p5 amd64 I have encountered a kernel panic (will
>> report on bugzilla in a moment). After that I found some sites did not
>> load in a web browser, so my first guess was to try zpool status -v
>> and I got this:
>>
>> errors: Permanent errors have been detected in the following files:
>> zroot/usr/home:<0x0>
>>
>> Any guess what does the <0x0> mean and how to fix the situation?
>> It should be a file name right?
>>
>> zpool scrub and resilver did not help :-(
>>
>> Will rolling back a snapshot fix the problem?
>>
>> Any hints appreciated :-)
>> Tomek
>>
>> --
>> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>>
> Hi, I'm not a ZFS expert, but I wonder if this error is related to some
> of the ZFS internal objects, rather than the file data blocks being
> corrupted. In which case, ZFS may not be able to correctly repair it?
>
> I'm currently evaluating ZFS on FreeBSD for some of my storage needs
> and your report is a bit concerning. Are you able to share the details
> on the I/O workloads and the storage geometry you use? Do you have more
> info on the kernel panic message or backtraces?
>
> If you put it all in the bug, then can you please share the bug ID?
>
> Thanks.

I suspect <0x0> refers to the object number within the dataset, with zero
being metadata. A permanent error is bad news.

zpool scrub doesn't fix any errors - well, not exactly. It tries to read
everything, and if it finds an error it will repair it if it can. If you
encounter an error outside of a scrub it gets repaired anyway, where
possible. The point of a scrub is to ensure all your data is still
readable even if it hasn't been read in a while. This is most likely down
to a compound hardware failure - with flaky drives it's still possible to
lose both copies and not know about it (hence doing scrubs).

My advice would be to back up what's remaining to tape ASAP before
anything else.

You can sometimes roll back to an earlier version of a dataset - take a
look and see if it's readable (i.e. mount it or look for it in the .zfs
directory). One good way is to use
"zfs clone zroot/usr/home@snapshotname zroot/usr/home_fingerscrossed"

ZFS is NOT great at detecting failed drives. I'm currently investigating
this for my blog (it was the subject of a question I posted here back in
about January, to which the answer was "hmm"). zfsd, however, does
monitor drive health using devctl and might pick up impending drive
failures before you get to this stage. I'm going through the source code
now to convince myself it works (it's written in C++ and appears to be
influenced by Design Patterns, so it's not exactly clear!).

If the metadata for a dataset is unrecoverable you'll need to destroy and
recreate the dataset from a backup. HOWEVER, I'd be investigating the
health of the drives first. dd them to /dev/null and see what you get -
you can actually do this while ZFS is using them. Also check the console
log for CAM messages; if it's got to that stage you really need to think
about data recovery.

I've put a few example commands for the above in a P.S. below.

Regards, Frank.
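P.S. A few example commands to go with the above; treat them as sketches
and adjust the names to your setup. To see what object <0x0> actually
refers to, zdb can dump it - assuming the pool is imported, and note that
zdb only reads, it never writes:

  # Dump object 0 of the zroot/usr/home dataset (more d's = more detail)
  zdb -dddd zroot/usr/home 0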
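For the "back it up before anything else" part, a fresh snapshot plus
zfs send is the usual route, though with damaged metadata the send may
fail partway - in that case fall back to tar or rsync of whatever is
still readable. The destination path here is only a placeholder:

  zfs snapshot zroot/usr/home@rescue
  zfs send zroot/usr/home@rescue > /backup/home-rescue.zfs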
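For the snapshot route, something along these lines - "snapshotname"
stands in for whatever zfs list actually shows you:

  # List snapshots of the damaged dataset
  zfs list -t snapshot zroot/usr/home

  # Browse one read-only through the hidden .zfs directory
  ls /usr/home/.zfs/snapshot/

  # Or clone it, leaving the original snapshot untouched
  zfs clone zroot/usr/home@snapshotname zroot/usr/home_fingerscrossed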
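zfsd isn't running by default; assuming a stock rc.conf, enabling it is
just:

  sysrc zfsd_enable="YES"
  service zfsd start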
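And for checking the drives themselves - the device name below is a
guess, substitute your own from camcontrol devlist:

  # Read the whole drive; I/O errors will show up here and in the console log
  dd if=/dev/ada0 of=/dev/null bs=1m conv=noerror

  # Watch for CAM errors while it runs
  tail -f /var/log/messages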