Re: [List] Re: zfs corruption at zroot/usr/home:<0x0>

From: Tomek CEDRO <tomek_at_cedro.info>
Date: Thu, 13 Nov 2025 15:42:49 UTC
On Thu, Nov 13, 2025 at 2:06 PM Frank Leonhardt <freebsd-doc@fjl.co.uk> wrote:
> On 12/11/2025 19:20, Tomek CEDRO wrote:
>> Hmm this is brand new NVME drive not really likely to fail. I have the
>> same problem on zraid0 (stripe) array while initially I saw the bad
>> file name with 3 problems (vm image) it now turned into
>> ztuff/vm:<0x482>. Charlie Foxtrot :-(
>
> NVME drives are known to fail early in their life if they're going to fail at all, otherwise they're quite reliable for a long time.
>
> Almost every time I've blamed ZFS in the past (and there have been quite a few occasions) it's turned out to be a hardware problem, even when it seemed okay. Testing subsequently confirmed a flaky drive or controller. A few times I haven't found conclusive proof one way or the other. I believe ZFS is just particularly good at detecting corruption - I've seen corrupted data on UFS2 over the years, but the OS doesn't notice.
> There's always the chance of a bug in the drivers, of course.

Hmm, I will try to boot a diagnostics ISO from the vendor to check
the NVMe drive status.. but it has been working fine for several
months now, it uses the onboard controller and has a big heatsink
installed. This is a Samsung PRO 9100 2TB NVMe with the latest
firmware installed (I know Samsung has released NVMe drives that
would self-destruct because of faulty firmware).

I once noticed such early errors on a raidz2 built from brand new WD
Red HDDs, so I checked every single one of them with a destructive
badblocks run, and one turned out to be faulty and was quickly
replaced. That was the only time I had seen a ZFS error before. Since
then I always put every disk through several read/write badblocks
passes even before first use :-)
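
For reference, this is roughly the destructive pass I mean, using
badblocks from the e2fsprogs port (the device name /dev/da0 is only
an example, and -w overwrites the whole disk, so never run it on a
drive that holds data):

  # write-mode test, show progress, verbose, 4 KiB blocks
  badblocks -wsv -b 4096 /dev/da0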

I have three ZFS pools: two are simple stripes (1x2TB for root,
2x2TB for data) and one is raidz2 (4x4TB for data). I am sure this
was caused by the two kernel panics I triggered by hand during tests.
Not sure if this counts as a "driver" bug, since I was the source of
the problem, but if there is room for improvement in ZFS then I have
just found some.. I would much rather lose the last write than end up
with an inconsistent filesystem and an unknown corruption location
afterwards :-P
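
To re-check the affected pools I will run something along these
lines (pool name ztuff only as an example; zpool status -v lists the
damaged files, or just object numbers like <0x482> once the file
itself is gone):

  zpool scrub ztuff
  zpool status -v ztuff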

Only the raidz2 is unaffected, because it had the parity data to
repair the content from. Now I understand why that "lost" 8TB of
space is required :D The other two pools were in active use during
the panics, hence the data loss. I will replace the old ZFS stripe
and add two disks to the raidz2 at the first occasion when some cash
comes in :-)

With UFS2 I not only got filesystem corruption on every kernel
panic, but, as you say, there were also hidden corruption problems
that fsck could not catch. ZFS is like a dream here.. and look, the
problem is confined to a known dataset, so I can restore just that
dataset instead of the whole disk :-)

> And this is why (as mentioned elsewhere) I do a last-ditch backup of files to tape using tar!
> ZFS is sold as a magic never-lose-data filing system. It's good, but it can't work miracles on flaky hardware. IME, when it goes, it goes.

Yes, I will now re-enable my automatic zfs snapshots in cron, to
keep at least one month of snapshots auto-created every week/day and
then zfs export them. I had this running but got too confident and
disabled it, and look, it would help now :-P
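
A minimal sketch of the kind of /etc/crontab line I mean (dataset
name and schedule are only examples; note that % has to be escaped
as \% inside crontab):

  # daily recursive snapshot at 03:00
  0 3 * * * root /sbin/zfs snapshot -r ztuff@auto-$(date +\%Y-\%m-\%d)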

I am using Blu-ray discs for backups; these are bigger and faster
(also EMP resistant?) than tape, but I really admire the tape
approach :-) I just bought several Sony BD-RE XL 100GB discs
(rewritable) and also have some BD-RE DL (50GB), but these are slow
to write (2x; Blu-ray 1x is 36 Mbps). BD-R and BD-R DL are a lot
faster to write (i.e. 6..12x) but write-once only.. and I have BD-R
XL 128GB with 4x write.

I also got DVD-RAM (2..5x write speed, still slower than 6x DVD-RW),
which in theory can store small portions of data quickly (good for
logs), but FreeBSD's UDF support ends at 1.50 while 2.60 is required
for true random access, and I did not manage to get udfclient to
provide random read/write, so multisession is the only way for now.
Also, a good disc burner with firmware that supports these discs is
required (not all drives can even read them), and write speed
matters too; it makes a difference whether a backup takes 12 hours
or a quarter of that.
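
For the record, a multisession burn could look roughly like this
with growisofs from the sysutils/dvd+rw-tools port (device name
/dev/cd0 and paths are only examples):

  # first session on a blank disc
  growisofs -Z /dev/cd0 -R -J /backup/set1
  # append another session later
  growisofs -M /dev/cd0 -R -J /backup/set2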

> Good luck with recovering the snapshot.
> Regards, Frank.

Thank you Frank!! For now I am backing up the current data to the
discs; it takes some time. I will report back after the simple
snapshot rollback :-)
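
(The rollback itself should be just something along the lines of the
command below; the snapshot name is only an example, and -r destroys
any snapshots newer than the one rolled back to.)

  zfs rollback -r zroot/usr/home@auto-2025-11-01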

Tomek

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info