Sudden zpool checksum errors
Date: Fri, 04 Apr 2025 15:42:26 UTC
Hello.

I've got a box with two zpools:
- 1 mirror on 2 SSDs;
- 1 raidz1 on 12 HDDs.

Suddenly one daily run showed the following:

>   pool: backup
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
>   scan: scrub repaired 3.18M in 16:53:16 with 0 errors on Tue Apr 1 20:16:55 2025
> config:
>
>     NAME          STATE     READ WRITE CKSUM
>     backup        ONLINE       0     0     0
>       raidz1-0    ONLINE       0     0     0
>         da4       ONLINE       0     0     0
>         da10      ONLINE       0     0     0
>         da5       ONLINE       0     0    57
>         da2       ONLINE       0     0     0
>         da8       ONLINE       0     0    25
>         da0       ONLINE       0     0     0
>         da1       ONLINE       0     0    49
>         da12      ONLINE       0     0     8
>         da6       ONLINE       0     0     6
>         da11      ONLINE       0     0     0
>         da9       ONLINE       0     0    56
>         da13      ONLINE       0     0    73
>
> errors: No known data errors

I find it hard to believe that 7 disks out of 12 are failing, or that they all just happened to misbehave on the same day. BTW, SMART says they are OK.

I'm reluctant to blame the RAM (since it's ECC) or the power supply (as it's redundant, 2x800W).

The disks are 16TB TOSHIBA MG09ACA1 connected to a MegaRAID SAS-3 3108 (of course not operating as RAID, and using the mrsas driver).

% freebsd-version
14.2-RELEASE-p2
% zfs --version
zfs-2.2.6-FreeBSD_g33174af15
zfs-kmod-2.2.6-FreeBSD_g33174af15

Is there a known ZFS bug that could explain this?

I've "zpool clear"ed the errors and am waiting to see if they come up again.

bye & Thanks
av.

P.S. Also, I'm quite sure no "administrator accidentally wrote over a portion of the disk using another program" :)