Dell HBA, ECC reporting and ZFS ECC in zpool status

George Michaelson ggm at algebras.org
Thu Jun 25 01:48:58 UTC 2020


I have three Dell hosts, 730 and 840 series, with an LSI Dell-ized HBA.

All of them got upgraded to 12.1 recently, and then over time started
reporting a large number of correctable ECC error states in zpool
status.

Some of these have turned into unrecoverable errors and, after a disk
replacement, demanded multiple scrubs. But not all. So the ECC report
didn't actually map well to "disk is failing" in a hard sense.

But reading Dell's site I found a web page where they 'fess up that
they promote corrected ECC states in the drive upward in a way which
*may* be being collected by ZFS to report errors, where there isn't
actually a hard 'impending doom' signal coming. I don't actually know
that this disk-level ECC is what ZFS is reporting to me. I do know that
I got a high-cost ECC correction load in user space and wound up having
to re-scrub repeatedly to get the zpool clean.

https://www.dell.com/support/article/en-au/sln316623/excessive-smart-error-rates-logged-for-read-and-verify-ecc-errors-on-certain-enterprise-hard-drives?lang=en
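
A rough sketch of how the per-device counters could be pulled out of
zpool status text, so they can be logged over time and compared by hand
against what SMART says for the same disk. It assumes the usual
NAME/STATE/READ/WRITE/CKSUM table layout; the function name
devices_with_errors is made up for illustration, and the counters are
kept as the strings zpool prints (they may be abbreviated, e.g. "1.2M").

```python
def devices_with_errors(status_text):
    """Return rows from a `zpool status` config table with non-zero counters.

    Assumes the table starts at a NAME/STATE/READ/WRITE/CKSUM header line
    and ends at the next blank line. Counters are returned as printed.
    """
    rows = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:
            # A blank line closes the config table.
            in_table = False
            continue
        if len(fields) < 5:
            # Section labels such as "spares" or "logs" carry no counters.
            continue
        name, state, read, write, cksum = fields[:5]
        if (read, write, cksum) != ("0", "0", "0"):
            rows.append({
                "name": name,
                "state": state,
                "read": read,
                "write": write,
                "cksum": cksum,
            })
    return rows
```

Fed the captured output of zpool status (for example from a cron job),
this gives a short list of the vdevs worth a closer look. It says nothing
about where the errors come from.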

I'm very confused about what to do here. After doing some firmware
updates, and then a zpool scrub, I now have cleared error states in the
zpool. And by moving to the mrsas driver I can now do SMART on the
disks at runtime, but at the cost of not having mfiutil-type HBA
interactions: I can't mark drives into a valid/good state at runtime
any more, because that control logic doesn't look to be in the mrsas
command model. It's camcontrol.

Did something change here? The machines were on various versions of 11
and 12.0 before this and it never cropped up like this: millions of
ECC-corrected events in zpool status. We were worried enough to get
replacement drives on order, before Dell pointed us to this web page.

BTW my track record for PBCK is very high in past times with these
lists. If you (dear reader) push back with 'you lack clue to do the
job at hand' I would not deny it: 40 years a user doesn't make one a
sysadmin.

-G


More information about the freebsd-fs mailing list