Best practice for high availability ZFS pool

Tue May 17 14:47:02 UTC 2016

> On 17 may 2016 at 15:24, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> 
> On Tue, 17 May 2016, Ben RUBSON wrote:
>>> 
>>> Without completely isolated systems there is always the risk of total failure.  Even with zfs send there is the risk of total failure if the sent data results in corruption on the receiving side.
>> 
>> In this case, could we roll back to one of the previous snapshots on the receiving side?
>> Did you mean that the sent data can totally break the receiving pool, making it unusable or impossible to import? Has this already been seen?
> 
> There is at least one case of zfs send propagating a problem into the receiving pool. I don't know if it broke the pool.  Corrupt data may be sent from one pool to another if it passes checksums.

Do you have a link to this problem? It would be interesting to know whether it was possible to roll back to a previous snapshot and get a consistent pool again.

I think that ZFS send/receive gives better protection than mirroring to a second (or third) JBOD box.
With mirroring, you still have only one ZFS pool.
With send/receive, you have a second, separate ZFS pool (a different data "envelope"), which could (I think) reduce the risk of ending up with a broken or dead pool.
Mirroring over 2 different JBOD boxes, plus send/receive to a third one, is I think a nice solution.
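
To make the send/receive leg a bit more concrete, here is a rough Python sketch that only builds the command line for an incremental replication. The helper name and the dataset, snapshot and host names (tank/data, backup/data, monday, tuesday, box3) are all made up for illustration; nothing is executed, so it is safe to try anywhere.

```python
import shlex


def replication_pipeline(src, dst, prev_snap, new_snap, remote_host=None):
    """Build the shell pipeline for an incremental zfs send/receive.

    Hypothetical helper: every dataset, snapshot and host name is a
    placeholder supplied by the caller. The pipeline is returned as a
    string and never run here.
    """
    # "zfs send -i A B" streams only the changes between snapshots A and B.
    send = ["zfs", "send", "-i", f"{src}@{prev_snap}", f"{src}@{new_snap}"]
    recv_cmd = shlex.join(["zfs", "receive", dst])
    if remote_host:
        # Run the receiving side on the third box over ssh.
        recv_cmd = shlex.join(["ssh", remote_host, recv_cmd])
    return f"{shlex.join(send)} | {recv_cmd}"


pipeline = replication_pipeline("tank/data", "backup/data", "monday", "tuesday")
print(pipeline)
```

The point of the layout is that backup/data lives in its own pool, with its own snapshot history, rather than being another side of the same mirror.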

However, if send/receive makes the receiving pool an exact 1:1 copy of the sending pool, then whatever corrupted the sending pool could reach (and corrupt) the receiving pool as well...
I don't know whether this can actually happen and, if it ever does, whether we would still be able to revert to a previous snapshot, at least on the receiving side...
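
For reference, the mechanics of reverting on the receiving side would look roughly like the sketch below. It only shows how the rollback command would be chosen. It says nothing about whether a pool damaged by a bad stream could still be rolled back, which is exactly the open question. The helper and the snapshot names are hypothetical, and the snapshot list is assumed to be ordered oldest first (for example as printed by `zfs list -H -t snapshot -o name -s creation`).

```python
def rollback_command(snapshots, last_good):
    """Return the zfs rollback command for `last_good` and the snapshots it discards.

    Hypothetical helper: `snapshots` is assumed to be a list of full snapshot
    names for one dataset, oldest first. Nothing is executed here.
    """
    if last_good not in snapshots:
        raise ValueError(f"unknown snapshot: {last_good}")
    newer = snapshots[snapshots.index(last_good) + 1:]
    cmd = ["zfs", "rollback"]
    if newer:
        # Plain "zfs rollback" only goes back to the most recent snapshot;
        # -r is needed to destroy the later ones on the way.
        cmd.append("-r")
    cmd.append(last_good)
    return cmd, newer


snaps = ["backup/data@monday", "backup/data@tuesday", "backup/data@wednesday"]
cmd, discarded = rollback_command(snaps, "backup/data@monday")
print(" ".join(cmd), "discards", discarded)
```

Note that rolling back with -r destroys every snapshot taken after the chosen one, so the later history on the receiving side is lost.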

Ben