Musings on ZFS Backup strategies

dweimer dweimer at dweimer.net
Fri Mar 1 15:36:44 UTC 2013


On 03/01/2013 8:24 am, Karl Denninger wrote:
> Dabbling with ZFS now, and giving some thought to how to handle backup
> strategies.
> 
> ZFS' snapshot capabilities have forced me to re-think the way I've
> handled this.  Previously, near-line (and offline) backup was focused
> on being able to handle disasters (e.g. a RAID adapter going nuts and
> scribbling on the entire contents of the array), a double-disk (or
> worse) failure, or the obvious (e.g. fire), along with the "aw crap,
> I just rm -rf'd something I'd rather not!" situation.
> 
> ZFS makes snapshots very cheap, which means you can resolve the "aw
> crap" situation without resorting to backups at all.  This turns the
> backup situation into a disaster recovery one.
> 
> And that in turn seems to say that the ideal strategy looks more like:
> 
> Take a base snapshot immediately and zfs send it to offline storage.
> Take an incremental at some interval (appropriate for disaster
> recovery) and zfs send THAT to stable storage.
> 
> If I then restore the base and snapshot, I get back to where I was
> when the latest snapshot was taken.  I don't need to keep the
> incremental snapshot for longer than it takes to zfs send it, so I
> can do:
> 
> zfs snapshot pool/some-filesystem@unique-label
> zfs send -i pool/some-filesystem@base pool/some-filesystem@unique-label
> zfs destroy pool/some-filesystem@unique-label
> 
> and that seems to work (and restore) just fine.
> 
> Am I looking at this the right way here?  Provided that the base
> backup and incremental are both readable, it appears that I have the
> disaster case covered, and the online snapshot increments and
> retention are easily adjusted and cover the "oops" situations without
> having to resort to the backups at all.
> 
> This in turn means that keeping more than two incremental dumps
> offline has little or no value; the second is merely taken to ensure
> that there is always at least one that has been written to completion
> without error to apply on top of the base.  That in turn makes the
> backup storage requirement based only on entropy in the filesystem
> and not time (whereas the "Tower of Hanoi" style dump hierarchy
> imposed both a time AND an entropy cost on backup media.)
> 
> Am I missing something here?
> 
> (Yes, I know, I've been a ZFS resister.... ;-))
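
For anyone trying this, the full cycle Karl describes works out to
roughly the following sketch; the pool name, the backup file paths,
and "restorepool" are placeholders for illustration, not anything from
his actual setup:

# one time: full send of the base snapshot to backup storage
zfs snapshot pool/some-filesystem@base
zfs send pool/some-filesystem@base > /backup/some-filesystem.base.zfs

# each interval: incremental against the base, then drop the label
zfs snapshot pool/some-filesystem@unique-label
zfs send -i pool/some-filesystem@base \
    pool/some-filesystem@unique-label > /backup/some-filesystem.incr.zfs
zfs destroy pool/some-filesystem@unique-label

# disaster recovery: replay the base, then the last good incremental
zfs receive restorepool/some-filesystem < /backup/some-filesystem.base.zfs
zfs receive restorepool/some-filesystem < /backup/some-filesystem.incr.zfs

The "oops" case never touches those files at all; you can copy the
lost file straight out of the filesystem's .zfs/snapshot/<label>/
directory instead.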

I briefly did something like this between two FreeNAS boxes, and it 
seemed to work well, but my secondary box wasn't quite up to par 
hardware-wise.  Combine that with a lack of the internet bandwidth 
needed to reach a second physical location in case of something really 
disastrous, like a tornado or fire destroying my house, and I ended up 
just using an eSATA drive dock and Bacula, with a few external drives 
rotated regularly into my office at work, rather than upgrading the 
secondary box.

If you have a secondary box that is adequate, and either offsite 
backups aren't a concern or you have a big enough pipe to a secondary 
location that houses the backup, this should work.
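
In the two-box case the transfer is just the same send piped over the
wire; something like this, where "backuphost" and "backuppool" are
made up for the example:

zfs send -i pool/some-filesystem@base pool/some-filesystem@unique-label \
    | ssh backuphost zfs receive -F backuppool/some-filesystem

The -F on the receiving side rolls the target back to its last
received snapshot first, so stray local changes on the backup box
don't abort the receive.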

I would recommend testing your incremental snapshot rotation; I never 
tested a restore from anything but the most recent set of data when I 
was running my setup.  I did, however, save a week's worth of hourly 
snapshots on a couple of the more rapidly changing datasets.
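
A cheap way to run that test is to replay an older backup set into a
scratch dataset and spot-check the contents; "testpool" and the file
names here are hypothetical:

zfs receive testpool/some-filesystem < /backup/some-filesystem.base.zfs
zfs receive testpool/some-filesystem < /backup/some-filesystem.old-incr.zfs
ls /testpool/some-filesystem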

-- 
Thanks,
    Dean E. Weimer
    http://www.dweimer.net/

