Re: Detecting failing drives - ZFS carries on regardless
Date: Tue, 18 Feb 2025 17:16:52 UTC
On Tue, Feb 18, 2025 at 11:41 AM Carl Johnson <carlj@peak.org> wrote:
> Frank Leonhardt <freebsd-doc@fjl.co.uk> writes:
>
> > As an example, the following is NOT ENOUGH for ZFS to fail a drive,
> > because FreeBSD doesn't offline it. Would you want a drive like this
> > in a production environment?
> >
> > Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00
> > 02 c1 38 00 00 08 00 00
> > Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI
> > Status Error
> > ...
>
> I haven't used it, but you might want to look into zfsd and see if it
> will help. It is supposed to watch for errors and handle them.
>

Great thread. I am currently implementing a huge RAIDZ2 NAS and
wondering a lot about this...

Does SMART monitoring help in these scenarios? Or does ZFS checksumming
and scrubbing detect data corruption before SMART does?

If so, is zfsd the better choice, or should you still monitor and warn
on SMART counters, or both?

Best,
--
Alex