Musings on ZFS Backup strategies

Volodymyr Kostyrko c.kworr at gmail.com
Fri Mar 1 17:55:33 UTC 2013


01.03.2013 16:24, Karl Denninger:
> Dabbling with ZFS now, and giving some thought to how to handle backup
> strategies.
>
> ZFS' snapshot capabilities have forced me to re-think the way that I've
> handled this.  Previously near-line (and offline) backup was focused on
> being able to handle both disasters (e.g. RAID adapter goes nuts and
> scribbles on the entire contents of the array), a double-disk (or worse)
> failure, or the obvious (e.g. fire, etc) along with the "aw crap, I just
> rm -rf'd something I'd rather not!"
>
> ZFS makes snapshots very cheap, which means you can resolve the "aw
> crap" situation without resorting to backups at all.  This turns the
> backup situation into a disaster recovery one.
>
> And that in turn seems to say that the ideal strategy looks more like:
>
> Take a base snapshot immediately and zfs send it to offline storage.
> Take an incremental at some interval (appropriate for disaster recovery)
> and zfs send THAT to stable storage.
>
> If I then restore the base and snapshot, I get back to where I was when
> the latest snapshot was taken.  I don't need to keep the incremental
> snapshot for longer than it takes to zfs send it, so I can do:
>
> zfs snapshot pool/some-filesystem@unique-label
> zfs send -i pool/some-filesystem@base pool/some-filesystem@unique-label
> zfs destroy pool/some-filesystem@unique-label
>
> and that seems to work (and restore) just fine.
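
The cycle quoted above can be sketched end-to-end roughly like this; the 
filesystem name (tank/data) and remote host (backuphost) are placeholder 
assumptions, not names from the post:

```shell
#!/bin/sh
# Sketch of the base + incremental scheme described above.
# tank/data and backuphost are hypothetical names; adjust to taste.
FS=tank/data
REMOTE=backuphost

# One-time setup: take the base snapshot and send it in full:
#   zfs snapshot $FS@base
#   zfs send $FS@base | ssh $REMOTE zfs receive -u $FS

# Periodic run: incremental against the base, then drop the marker,
# since only @base needs to stay around locally.
LABEL=$(date -u +%Y%m%d.%H%M%S)
zfs snapshot "$FS@$LABEL"
zfs send -i "$FS@base" "$FS@$LABEL" | ssh "$REMOTE" zfs receive -F "$FS"
zfs destroy "$FS@$LABEL"
```

Note the -F on the receiving side: because every incremental is taken 
against the same @base, the remote copy has to be rolled back to @base 
before each new increment applies.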

Yes, I handle backups the same way. I wrote a simple script that 
synchronizes two filesystems between distant servers. I also use the 
same script to synchronize bushy filesystems (with hundreds of thousands 
of files) where rsync generates too much load to be practical.

https://github.com/kworr/zfSnap/commit/08d8b499dbc2527a652cddbc601c7ee8c0c23301

I left it at that, but I was also planning to write a purger that would 
automatically destroy old snapshots when the pool runs low on space. I 
haven't hit that case yet.
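
Such a purger could look roughly like the sketch below; the pool name 
(tank) and the 80% capacity threshold are assumptions for illustration, 
not the actual script:

```shell
#!/bin/sh
# Rough sketch of a snapshot purger: while the pool sits above a
# capacity threshold, destroy the oldest snapshot in it.
# POOL and LIMIT are hypothetical values.
POOL=tank
LIMIT=80  # percent full

capacity() {
    # zpool prints capacity like "81%"; strip the percent sign.
    zpool list -H -o capacity "$POOL" | tr -d '%'
}

while [ "$(capacity)" -ge "$LIMIT" ]; do
    # Oldest snapshot first: -s creation sorts ascending by creation time.
    oldest=$(zfs list -H -t snapshot -o name -s creation -r "$POOL" | head -n 1)
    [ -n "$oldest" ] || break   # nothing left to purge
    zfs destroy "$oldest"
done
```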

> Am I looking at this the right way here?  Provided that the base backup
> and incremental are both readable, it appears that I have the disaster
> case covered, and the online snapshot increments and retention are
> easily adjusted and cover the "oops" situations without having to resort
> to the backups at all.
>
> This in turn means that keeping more than two incremental dumps offline
> has little or no value; the second merely being taken to insure that
> there is always at least one that has been written to completion without
> error to apply on top of the base.  That in turn makes the backup
> storage requirement based only on entropy in the filesystem and not time
> (where the "tower of Hanoi" style dump hierarchy imposed both a time AND
> entropy cost on backup media.)

Well, snapshots can have value over a longer timeframe, depending on the 
data. Being able to restore a file accidentally deleted two months ago 
already saved $2k for one of our customers.

-- 
Sphinx of black quartz, judge my vow.


More information about the freebsd-stable mailing list