HAST + ZFS self healing? Hot spares?

Per von Zweigbergk pvz at itassistans.se
Fri May 20 00:11:12 UTC 2011


On 20 May 2011, at 01:27, Per von Zweigbergk wrote:

> You're describing taking the entire array offline while you perform work on it.

My apologies, I was a bit too quick reading what you (Freddie Cash) wrote.

What you're describing is relying on ZFS's own redundancy while you replace the failed disk: you bring down the entire HAST resource just so you can replace the failed one of its two disks. The only reason the ZFS array continues to function is that it's redundant in ZFS itself.

Ideally, the HAST resource could continue to remain operational while the failed disk was replaced. After all, it can remain operational while the primary disk has failed, and it can remain operational while the data is being resynchronized, so why would the resource need to be brought down just to transition between these two states?

I guess it's because HAST isn't quite "finished" yet feature-wise, and that particular feature does not yet exist.

Still, I suppose this is good enough. It shows that raidz-ing together a bunch of HAST mirrors solves one and a half of my operational problems: replacing failed drives (by momentarily downing the whole HAST resource while the work is being done) and providing checksumming capability (although not self-healing).

The setup described (a bunch of HAST mirrors in a raidz) will not self-heal entirely. Imagine a bit error occurring while writing to one of the secondary disks. Since that data is never read by ZFS or HAST, the error would remain undetected. To ensure data integrity on both the primary and secondary servers, you'd have to fail over the servers once every N days/weeks/months (depending on your operational requirements) and run a ZFS scrub (zpool scrub) on "both sides" of the HAST resource as part of regular maintenance. It'd probably even be scriptable, assuming you can live with a few seconds of scheduled downtime during the switchover.
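
To make the "scriptable" part a little more concrete, here is a rough Python sketch of one switchover-and-scrub cycle. It only builds and (by default) prints the command sequence; nothing here has been tested against a real HAST cluster. The host names, pool name, and resource names are made up for illustration, and running the commands over ssh is purely my assumption about how one might drive both nodes from one place. The commands themselves (zpool export/import/scrub and hastctl role) are the standard ones. Note that zpool scrub returns immediately, so a real script would also have to poll zpool status before the next switchover.

```python
import subprocess

def switchover_plan(pool, resources, old_primary, new_primary):
    """Build the (host, argv) steps that move the pool to new_primary and scrub it there.

    All names passed in are hypothetical; adapt them to the real hast.conf.
    """
    # Release the pool on the current primary before demoting its HAST resources.
    plan = [(old_primary, ["zpool", "export", pool])]
    plan += [(old_primary, ["hastctl", "role", "secondary", r]) for r in resources]
    # Promote the other node, import the pool there, and start a scrub,
    # so the disks that were "secondary" until now actually get read and verified.
    plan += [(new_primary, ["hastctl", "role", "primary", r]) for r in resources]
    plan.append((new_primary, ["zpool", "import", pool]))
    plan.append((new_primary, ["zpool", "scrub", pool]))
    return plan

def run_plan(plan, dry_run=True):
    """Return the steps as text; with dry_run=False also run each one via ssh (an assumption)."""
    lines = []
    for host, argv in plan:
        lines.append(f"{host}: {' '.join(argv)}")
        if not dry_run:
            # check=True aborts the cycle on the first failing step.
            subprocess.run(["ssh", host] + argv, check=True)
    return lines

# Dry run with made-up names: two HAST resources backing the pool "tank".
for line in run_plan(switchover_plan("tank", ["disk0", "disk1"], "nodeA", "nodeB")):
    print(line)
```

Running the same cycle again later with the two host names swapped would scrub the other side, so over two maintenance windows both copies get verified.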


More information about the freebsd-fs mailing list