Re: Detecting failing drives - ZFS carries on regardless

From: Alejandro Imass <aimass_at_yabarana.com>
Date: Tue, 18 Feb 2025 17:16:52 UTC

On Tue, Feb 18, 2025 at 11:41 AM Carl Johnson <carlj@peak.org> wrote:

> Frank Leonhardt <freebsd-doc@fjl.co.uk> writes:
>
> > As an example, the following is NOT ENOUGH for ZFS to fail a drive,
> > because FreeBSD doesn't offline it. Would you want a drive like this
> > in a production environment?
> >
> > Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00
> > 02 c1 38 00 00 08 00 00
> > Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI
> > Status Error
> > ...
>
> I haven't used it, but you might want to look into zfsd and see if it
> will help.  It is supposed to watch for errors and handle them.
>
>
Great thread.

I am currently implementing a huge RAIDZ2 NAS and wondering a lot about
this...

Does SMART monitoring help in these scenarios?
Or do ZFS checksumming and scrubbing detect data corruption before SMART
does?
In that case, is zfsd the better choice, or should you still monitor and
warn on SMART counters?
Or both?
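
Whatever the answer turns out to be, CAM errors like the ones Frank quoted
can at least be counted per device from the kernel log. Below is a minimal
sketch, assuming the log lines look like the quoted ones once unwrapped
(for example "(da0:mps0:0:14:0): CAM status: SCSI Status Error"). The
function names and the threshold are my own illustration and are not part
of zfsd or smartmontools.

```python
import re
from collections import Counter

# Matches the CAM peripheral prefix followed by a "CAM status:" message,
# e.g. "(da0:mps0:0:14:0): CAM status: SCSI Status Error".
# Assumption: the log format matches the lines quoted earlier in the thread.
CAM_STATUS = re.compile(r"\((\w+):\w+:\d+:\d+:\d+\): CAM status:")


def count_cam_errors(lines):
    """Return a Counter mapping device name to its number of CAM status lines."""
    counts = Counter()
    for line in lines:
        match = CAM_STATUS.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def devices_over_threshold(lines, threshold=1):
    """Return a sorted list of devices with at least `threshold` CAM status lines."""
    counts = count_cam_errors(lines)
    return sorted(dev for dev, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical usage: the log path is an assumption, adjust for your system.
    with open("/var/log/messages", errors="replace") as log:
        for dev in devices_over_threshold(log, threshold=1):
            print(dev)
```

This only flags devices that log CAM status errors. It does not read SMART
counters or pool state, so it would sit alongside either approach rather
than replace it.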

Best,

-- 
Alex