Musings on ZFS Backup strategies

Karl Denninger karl at denninger.net
Fri Mar 1 14:25:08 UTC 2013


Dabbling with ZFS now, and giving some thought to how to handle backup
strategies.

ZFS' snapshot capabilities have forced me to re-think the way that I've
handled this.  Previously, near-line (and offline) backup was focused on
being able to handle disasters (e.g. a RAID adapter going nuts and
scribbling on the entire contents of the array), a double-disk (or
worse) failure, or the obvious (fire, etc.), along with the "aw crap, I
just rm -rf'd something I'd rather not have!"

ZFS makes snapshots very cheap, which means you can resolve the "aw
crap" situation without resorting to backups at all.  This turns the
backup situation into a disaster recovery one.
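
For instance, recovering from an errant rm is just a matter of copying
the file back out of the snapshot directory -- something along these
lines, with the dataset and file names purely illustrative:

zfs snapshot pool/home@hourly
# ...later, after the errant rm:
cp /pool/home/.zfs/snapshot/hourly/lost-file /pool/home/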

And that in turn seems to say that the ideal strategy looks more like:

Take a base snapshot immediately and zfs send it to offline storage.
Take an incremental at some interval (appropriate for disaster recovery)
and zfs send THAT to stable storage.
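
Concretely, the base piece would look something like this (the /offline
path below is just a stand-in for whatever the backup target actually
is):

zfs snapshot pool/some-filesystem@base
zfs send pool/some-filesystem@base > /offline/some-filesystem.base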

If I then restore the base and snapshot, I get back to where I was when
the latest snapshot was taken.  I don't need to keep the incremental
snapshot for longer than it takes to zfs send it, so I can do:

zfs snapshot pool/some-filesystem@unique-label
zfs send -i pool/some-filesystem@base pool/some-filesystem@unique-label
zfs destroy pool/some-filesystem@unique-label

and that seems to work (and restore) just fine.
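
For completeness, the restore side is just the two receives applied in
order -- roughly the following, assuming the base was saved as
/offline/some-filesystem.base (per the example above) and the
incremental send was likewise redirected to
/offline/some-filesystem.incremental (the -F on the second receive
rolls the target back to its last snapshot before the increment is
applied):

zfs receive restorepool/some-filesystem < /offline/some-filesystem.base
zfs receive -F restorepool/some-filesystem < /offline/some-filesystem.incremental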

Am I looking at this the right way here?  Provided that the base backup
and incremental are both readable, it appears that I have the disaster
case covered, and the online snapshot increments and retention are
easily adjusted and cover the "oops" situations without having to resort
to the backups at all.

This in turn means that keeping more than two incremental dumps offline
has little or no value; the second is kept merely to ensure that there
is always at least one that has been written to completion without
error and can be applied on top of the base.  That in turn makes the
backup storage requirement depend only on the entropy in the filesystem
and not on time (whereas the "Tower of Hanoi" style dump hierarchy
imposed both a time AND an entropy cost on backup media).
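
If it helps to see it spelled out, the two-copy rotation can be as
simple as the sketch below (file names again purely illustrative; on
the very first pass there is nothing yet to rotate):

zfs snapshot pool/some-filesystem@unique-label
zfs send -i pool/some-filesystem@base pool/some-filesystem@unique-label \
    > /offline/some-filesystem.incremental.new
# only once the new stream is completely written does it displace the old copy
mv /offline/some-filesystem.incremental /offline/some-filesystem.incremental.previous
mv /offline/some-filesystem.incremental.new /offline/some-filesystem.incremental
zfs destroy pool/some-filesystem@unique-label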

Am I missing something here?

(Yes, I know, I've been a ZFS resister.... ;-))

-- 
-- Karl Denninger
/The Market Ticker ®/ <http://market-ticker.org>
Cuda Systems LLC

