HAST + ZFS self healing? Hot spares?

Per von Zweigbergk pvz at itassistans.se
Wed May 18 08:37:59 UTC 2011


On 2011-05-18 09:59, Daniel Kalchev wrote:
> Your idea is to have hot standby server, to replace the primary, 
> should the primary fail (hardware-wise)?
> You probably need CARP in addition to HAST in order to maintain the 
> same shared IP address.
Yes, CARP would be required to handle the actual failover.
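
For completeness, the CARP side should only be a couple of lines in 
rc.conf on each node, roughly along these lines (the interface name, 
vhid, password and address are placeholders, and advskew would be set 
higher on the standby):

cloned_interfaces="carp0"
ifconfig_carp0="vhid 1 pass examplepass advskew 0 192.0.2.10/24"
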
>> Initially, my thoughts land on simply creating HAST resources for the 
>> corresponding pairs of disks and SSDs in servers A and B, and then 
>> using these HAST resources to make up the ZFS pool.
> This would be the most natural decision, especially if you have 
> identical hardware on both servers. Let's call this variant 1.
>
> Variant 2 would be to create local ZFS pools (as you already have) 
> and then create ZVOLs there that are managed by HAST. Then, you will 
> use the HAST provider for whatever storage needs you have. 
> Performance might not be what you expect and you need to trust HAST 
> for the checksumming.
This is a really neat idea, and it is going to be a ton easier to 
configure than anything else.

This would mean that you'd be running a stack looking like:
- ZFS on top of:
- One HAST resource on top of:
- Two ZVOLs (one per server), each on top of:
- A local ZFS pool on top of:
- Local storage (mirrored by ZFS)

This still means the data will be mirrored twice (stored on 4 HDDs in 
total), but the configuration will be a ton cleaner than managing a 
20-resource HAST configuration monstrosity.
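
Roughly, I imagine variant 2 being set up something like this (the 
pool, resource and node names and the addresses are just placeholders):

# on each node: a local mirrored pool with a ZVOL for HAST to mirror
zpool create localpool mirror da0 da1
zfs create -V 500G localpool/hastvol

# /etc/hast.conf, identical on both nodes
resource shared {
        on nodeA {
                local /dev/zvol/localpool/hastvol
                remote 10.0.0.2
        }
        on nodeB {
                local /dev/zvol/localpool/hastvol
                remote 10.0.0.1
        }
}

# on both nodes: initialize the HAST metadata and start hastd
hastctl create shared
service hastd onestart

# on the current primary only:
hastctl role primary shared
zpool create datapool /dev/hast/shared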

It would be an option to run VMFS on top, exporting it over iSCSI, 
rather than running ZFS on top and exporting it over NFS. I have a 
feeling that might mean less overhead in the end, although it's less 
convenient from a management point of view (unless FreeBSD has gained 
the ability to mount VMFS while I wasn't looking).
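
If I do stay with ZFS over NFS, at least the export side is trivial, 
since it is just a property on the dataset. Something along these 
lines, with a made-up dataset and network:

zfs create datapool/vmstore
zfs set sharenfs="-maproot=root -network 10.0.0.0 -mask 255.255.255.0" datapool/vmstore
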
>> 2. ZFS self-healing. As far as I understand it, ZFS does 
>> self-healing, in that all data is checksummed, and if one disk in a 
>> mirror happens to contain corrupted data, ZFS will re-read the same 
>> data from the other disk in the ZFS mirror. I don't see any way this 
>> could currently work in a configuration where ZFS is not mirroring 
>> itself but is instead running on top of HAST. Am I wrong about this? 
>> Or is there any way to achieve this same self-healing effect while 
>> still using HAST?
> HAST is a simple mirror. It only makes sure blocks on the local and 
> remote drives contain the same data. I do not believe it has strong 
> enough checksumming to compare with ZFS. Therefore, your best bet is 
> to use ZFS on top of HAST for the best data protection.
Does it actually make sure the blocks on the local and remote drives 
contain the same data, though? I don't remember reading anything about a 
cross-check between the two drives in case of data corruption like ZFS 
does. Although in your described "variant 2" this won't be a problem.
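
In variant 2, the actual self-healing would then happen in the local 
pools underneath, since those are real ZFS mirrors; a periodic scrub 
there re-reads every block and repairs anything that fails its checksum 
from the other half. With the placeholder pool name from above:

zpool scrub localpool
zpool status -v localpool    # per-device checksum error counters, scrub repair status
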
> In your example, you will need to create 20 HAST resources, one for 
> each disk. Then create ZFS on top of these HAST resources. ZFS will 
> then be able to heal itself in case there are inconsistencies with 
> data on the HAST resources (for whatever reason).
>
> Some reported they used HAST for the SLOG as well. I do not know if 
> using HAST for the L2ARC makes any sense. On failure you will import 
> the pool on the slave node and this will wipe the L2ARC anyway.
Yes, running HAST on the L2ARC doesn't make much sense. I would have to 
run HAST on the ZIL, though, if I opted for Variant 1 (which I don't 
think I will).
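
For the record, variant 1 would look roughly like this (node names, 
devices and addresses are again placeholders; the resource block is 
repeated for each of the twenty disks and for the log SSDs, and the 
remaining mirror pairs are added to the pool the same way):

# /etc/hast.conf excerpt
resource disk0 {
        on nodeA {
                local /dev/da0
                remote 10.0.0.2
        }
        on nodeB {
                local /dev/da0
                remote 10.0.0.1
        }
}

# on the primary, after hastctl create / hastctl role primary for each:
zpool create datapool \
        mirror hast/disk0 hast/disk1 \
        mirror hast/disk2 hast/disk3 \
        log mirror hast/slog0 hast/slog1
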
>> I mean, ideally, ZFS would have a really neat synchronous replication 
>> feature built into it. Or ZFS could be HAST-aware, and know how to 
>> ask HAST to bring it a copy of a block of data on the remote block 
>> device in a HAST mirror in case the checksum on the local block 
>> device doesn't match. Or HAST would itself have some kind of 
>> block-level checksums, and do self-healing itself. (This would 
>> probably be the easiest to implement. The secondary site could even 
>> continually be reading the same data as the primary site is, merely 
>> to check the checksums on disk, not to send it over the wire. It's 
>> not like it's doing anything else useful with that untapped read 
>> performance.)
> With HAST, no (hast) storage providers exist on the secondary node. 
> Therefore, you cannot do any I/O on the secondary node, until it 
> becomes primary.
I did not mean accessing any of the storage on the secondary node 
itself; I meant accessing, from the primary node, the blocks *as stored 
on the secondary node*.

HAST will already do this in case of a read error on the primary node.

