Dell HBA, ECC reporting and ZFS ECC in zpool status

Peter Eriksson pen at lysator.liu.se
Thu Jul 2 12:38:05 UTC 2020


Exactly which Dell HBA are you using? We are using the HBA330 in Dell R730xd:s (note: _not_ the H330, which is the same hardware but running the RAID firmware) with Dell's latest firmware for it, with good results, using the “mpr” driver, not “mrsas”. Smartctl works fine, as does camcontrol. This is on FreeBSD 12.1 and 11.3.
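
To see which driver each disk actually sits behind, the output of "camcontrol devlist -v" can be grouped by bus. Below is a minimal Python sketch of that idea. The sample text is made up for illustration (it is not output from any of the machines in this thread), and the function name disks_by_driver is hypothetical.

```python
import re

# Bus header line, e.g. "scbus0 on mpr0 bus 0:"
BUS_RE = re.compile(r"^scbus-?\d+ on (\S+?)\d+ bus \d+:")
# Device line, e.g. "<VENDOR MODEL REV>   at scbus0 target 8 lun 0 (pass0,da0)"
DEV_RE = re.compile(r"^<.*?>\s+at scbus-?\d+ target -?\d+ lun \S+ \((.*?)\)")

# Illustrative sample only, shaped like "camcontrol devlist -v" output.
SAMPLE_DEVLIST = """\
scbus0 on mpr0 bus 0:
<SEAGATE ST4000NM0023 GS0F>        at scbus0 target 8 lun 0 (pass0,da0)
<SEAGATE ST4000NM0023 GS0F>        at scbus0 target 9 lun 0 (pass1,da1)
<>                                 at scbus0 target -1 lun ffffffff ()
scbus1 on ahcich0 bus 0:
<INTEL SSDSC2BB120G4 D2010370>     at scbus1 target 0 lun 0 (pass2,ada0)
scbus-1 on xpt0 bus 0:
<>                                 at scbus-1 target -1 lun ffffffff (xpt0)
"""


def disks_by_driver(devlist_text):
    """Map each da/ada device name to the driver name of its bus."""
    result = {}
    driver = None
    for line in devlist_text.splitlines():
        bus = BUS_RE.match(line)
        if bus:
            # Remember the driver (unit number stripped) for the lines below.
            driver = bus.group(1)
            continue
        dev = DEV_RE.match(line)
        if dev and driver:
            for name in dev.group(1).split(","):
                # Keep only disk peripherals, skip passN, xptN and empty slots.
                if re.fullmatch(r"a?da\d+", name):
                    result[name] = driver
    return result


for disk, drv in sorted(disks_by_driver(SAMPLE_DEVLIST).items()):
    print(disk, "is attached via", drv)
```

On a real host the text would come from running "camcontrol devlist -v" (for instance through subprocess.run) instead of the sample string. Disks listed under "mpr" are on the plain HBA driver; disks listed under "mrsas" are behind the RAID firmware.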

If you have H330:s then you can crossflash them to look like HBA330:s. 

   https://forums.servethehome.com/index.php?threads/flash-crossflash-dell-h330-raid-card-to-hba330-12gbps-hba-it-firmware.25498/

Totally not supported by Dell of course, but it works. At least for me :-)


H330:s can be put into a sort of limited HBA mode where they present individual disks, but we had problems with that, so switching to “HBA330:s” worked much better for us.
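
Once the disks show up as plain da devices, the drive's own error counter log can be read with "smartctl -x". That log is presumably where the corrected read and verify ECC counts from the Dell article quoted below end up. Here is a minimal Python sketch, assuming the usual smartctl layout for SAS drives, that separates corrected counts from uncorrected ones. The sample numbers are invented, and the helper names parse_error_counters and uncorrected_ops are hypothetical.

```python
# Column names are my own labels for the seven numeric columns of the log.
FIELDS = (
    "ecc_fast", "ecc_delayed", "rereads_rewrites", "total_corrected",
    "algorithm_invocations", "gigabytes_processed", "total_uncorrected",
)

# Illustrative sample only, shaped like the "Error counter log" section
# that smartctl prints for SAS drives.
SAMPLE_LOG = """\
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3185468        0         0   3185468          0      41234.567           0
write:        0        0         0         0          0       9876.543           0
verify:   10234        0         0     10234          0         12.345           0
"""


def parse_error_counters(smart_text):
    """Return {"read": {...}, "write": {...}, "verify": {...}}."""
    counters = {}
    for line in smart_text.splitlines():
        parts = line.split()
        # A data row is the operation name followed by seven numbers.
        if len(parts) == 8 and parts[0] in ("read:", "write:", "verify:"):
            values = [float(p) if "." in p else int(p) for p in parts[1:]]
            counters[parts[0].rstrip(":")] = dict(zip(FIELDS, values))
    return counters


def uncorrected_ops(counters):
    """List the operations that report at least one uncorrected error."""
    return [op for op, row in counters.items() if row["total_uncorrected"] > 0]


counters = parse_error_counters(SAMPLE_LOG)
for op, row in counters.items():
    print(op, "corrected:", row["total_corrected"],
          "uncorrected:", row["total_uncorrected"])
print("operations with uncorrected errors:", uncorrected_ops(counters))
```

Treating a large corrected count with zero uncorrected errors as less alarming is only my reading of the Dell article, not a rule; a non-empty list from uncorrected_ops is the case that deserves a closer look.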

- Peter


> On 25 Jun 2020, at 03:48, George Michaelson <ggm at algebras.org> wrote:
> 
> I have three Dell hosts, 730 and 840 series, with an LSI Dell-ized HBA.
> 
> All of them got upgraded to 12.1 recently, and then over time started
> reporting a large number of correctable ECC error states in zpool
> status.
> 
> Some of these have turned into unrecoverable errors, and on disk
> replace demanded multiple scrubs. But, not all. So the ECC report
> didn't actually map well to "disk is failing" in a hard sense.
> 
> But reading Dell I found a web page where they 'fess up that they
> promote upward corrected ECC states in the drive in a way which *may*
> be being collected by ZFS to report errors, where there isn't actually
> a hard 'impending doom' signal coming. I don't actually know this Disk
> level ECC is what ZFS is reporting to me. I do know that I got high
> cost, ECC correction load in user space and wound up having to
> re-scrub to zpool clean repeatedly.
> 
> https://www.dell.com/support/article/en-au/sln316623/excessive-smart-error-rates-logged-for-read-and-verify-ecc-errors-on-certain-enterprise-hard-drives?lang=en
> 
> I'm very confused by what to do here. After doing some firmware
> update, and then zfs scrub I now have cleared error states in the
> zpool. and by moving to the mrsas driver I can now do SMART on the
> disks at runtime, but at a cost of not having mrtutil type HBA
> interactions: I can't mark drives into valid/good state in runtime any
> more because that control logic doesn't look to be in the mrsas
> command model. It's camcontrol.
> 
> Did something change here? the machines were on various states of 11
> and 12.0 before this and it never cropped up like this: Millions of
> ECC corrected events in zpool. We were worried enough to get
> replacement drives on order, before Dell pointed us to this web page.
> 
> BTW my track record for PBCK is very high in past times with these
> lists. If you (dear reader) push back with 'you lack clue to do the
> job at hand' I would not deny: 40 years a user doesn't make one a
> sysadmin.
> 
> -G
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"


