Musings on ZFS Backup strategies

Fri Mar 1 15:45:40 UTC 2013

On 3/1/2013 9:36 AM, dweimer wrote:
> On 03/01/2013 8:24 am, Karl Denninger wrote:
>> Dabbling with ZFS now, and giving some thought to how to handle backup
>> strategies.
>>
>> ZFS' snapshot capabilities have forced me to re-think the way that I've
>> handled this.  Previously near-line (and offline) backup was focused on
>> being able to handle disasters (e.g. a RAID adapter goes nuts and
>> scribbles on the entire contents of the array), a double-disk (or worse)
>> failure, and the obvious (e.g. fire), along with the "aw crap, I just
>> rm -rf'd something I'd rather not!" case.
>>
>> ZFS makes snapshots very cheap, which means you can resolve the "aw
>> crap" situation without resorting to backups at all.  This turns the
>> backup situation into a disaster recovery one.
>>
>> And that in turn seems to say that the ideal strategy looks more like:
>>
>> Take a base snapshot immediately and zfs send it to offline storage.
>> Take an incremental at some interval (appropriate for disaster recovery)
>> and zfs send THAT to stable storage.
>>
>> If I then restore the base and the incremental, I get back to where I
>> was when the latest snapshot was taken.  I don't need to keep the
>> incremental snapshot for longer than it takes to zfs send it, so I can do:
>>
>> zfs snapshot pool/some-filesystem@unique-label
>> zfs send -i pool/some-filesystem@base pool/some-filesystem@unique-label
>> zfs destroy pool/some-filesystem@unique-label
>>
>> and that seems to work (and restore) just fine.
>>
>> Am I looking at this the right way here?  Provided that the base backup
>> and incremental are both readable, it appears that I have the disaster
>> case covered, and the online snapshot increments and retention are
>> easily adjusted and cover the "oops" situations without having to resort
>> to the backups at all.
>>
>> This in turn means that keeping more than two incremental dumps offline
>> has little or no value; the second merely being taken to ensure that
>> there is always at least one that has been written to completion without
>> error to apply on top of the base.  That in turn makes the backup
>> storage requirement based only on entropy in the filesystem and not time
>> (where the "tower of Hanoi" style dump hierarchy imposed both a time AND
>> entropy cost on backup media.)
>>
>> Am I missing something here?
>>
>> (Yes, I know, I've been a ZFS resister.... ;-))
>
> I briefly did something like this between two FreeNAS boxes. It seemed
> to work well, but my secondary box's hardware wasn't quite up to par.
> Combine that with the lack of internet bandwidth to a second physical
> location (needed in case of something really disastrous, like a
> tornado or fire destroying my house), and I ended up just using an eSATA
> drive dock and Bacula, with a few external drives rotated regularly
> into my office at work, rather than upgrading the secondary box.
>
> If you have an adequate secondary box, and either offsite
> backups aren't a concern or you have a big enough pipe to a secondary
> location that houses the backup, this should work.
>
> I would recommend testing your incremental snapshot rotation. I never
> did test a restore from anything but the most recent set of data when
> I was running my setup. I did, however, save a week's worth of hourly
> snapshots on a couple of the more rapidly changing data sets.
>
I rotate the disaster disks out to a safe-deposit box at the bank, and
they're geli-encrypted, so if stolen they're worthless to the thief
(other than their cash value as a drive) and if the building goes "poof"
I have the ones in the vault to recover from.  There's the potential for
loss up to the rotation time of course but that is the same risk I had
with all UFS filesystems.

I've tested the restores onto a spare box and it appears to work as
expected...
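
For anyone who wants to try the same cycle, here is a rough sketch of
it as sh functions.  It is not the exact script I run: the dataset
name, the "base" snapshot name, the /backup mount point, the file
names and the a/b slot scheme are all placeholders picked for
illustration.  It writes the base stream once, then alternates the
incremental between two files so that one completed incremental always
survives a failed write.

```shell
#!/bin/sh
# Sketch only: every name below is a hypothetical placeholder.
FS="${FS:-pool/some-filesystem}"   # dataset to protect
BASE="${BASE:-base}"               # long-lived base snapshot name
DEST="${DEST:-/backup}"            # where the offline disk is mounted

# One-time: snapshot the dataset and write the full stream to disk.
backup_base() {
    zfs snapshot "$FS@$BASE" &&
    zfs send "$FS@$BASE" > "$DEST/base.zfs"
}

# Periodic: write everything changed since the base into slot "a" or "b".
# The stream goes to a temporary file first, so the previous good copy
# in that slot is only replaced once the new one is complete.
backup_incr() {
    slot="$1"
    label="incr-$(date +%Y%m%d%H%M%S)"
    zfs snapshot "$FS@$label" || return 1
    if zfs send -i "$FS@$BASE" "$FS@$label" > "$DEST/incr-$slot.zfs.tmp"; then
        mv "$DEST/incr-$slot.zfs.tmp" "$DEST/incr-$slot.zfs"
        rc=0
    else
        rm -f "$DEST/incr-$slot.zfs.tmp"
        rc=1
    fi
    # The incremental snapshot is only needed for the duration of the send.
    zfs destroy "$FS@$label"
    return "$rc"
}

# Disaster recovery: replay the base, then one incremental, into a target.
restore_all() {
    target="$1"
    slot="$2"
    zfs receive "$target" < "$DEST/base.zfs" &&
    zfs receive "$target" < "$DEST/incr-$slot.zfs"
}
```

Usage would be something like "backup_base" once, then "backup_incr a"
and "backup_incr b" on alternating runs, and "restore_all
pool/restored a" on the spare box.  Note that the base snapshot has to
stay on the source pool for the -i sends to keep working, and if the
restored dataset gets modified between the two receives the second one
may need "zfs receive -F".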

Thanks for the comments!

-- 
-- Karl Denninger
/The Market Ticker ®/ <http://market-ticker.org>
Cuda Systems LLC