Musings on ZFS Backup strategies

Ronald Klop ronald-freebsd8 at klop.yi.org
Sat Mar 2 15:24:28 UTC 2013


On Fri, 01 Mar 2013 18:55:22 +0100, Volodymyr Kostyrko <c.kworr at gmail.com>  
wrote:

> 01.03.2013 16:24, Karl Denninger:
>> Dabbling with ZFS now, and giving some thought to how to handle backup
>> strategies.
>>
>> ZFS' snapshot capabilities have forced me to re-think the way that I've
>> handled this.  Previously, near-line (and offline) backup was focused on
>> being able to handle disasters (e.g. the RAID adapter goes nuts and
>> scribbles on the entire contents of the array), a double-disk (or worse)
>> failure, or the obvious (fire, etc.), along with the "aw crap, I just
>> rm -rf'd something I'd rather not!" situation.
>>
>> ZFS makes snapshots very cheap, which means you can resolve the "aw
>> crap" situation without resorting to backups at all.  This turns the
>> backup situation into a disaster recovery one.
>>
>> And that in turn seems to say that the ideal strategy looks more like:
>>
>> Take a base snapshot immediately and zfs send it to offline storage.
>> Take an incremental at some interval (appropriate for disaster recovery)
>> and zfs send THAT to stable storage.
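(For reference, a minimal sketch of those two steps. One way to do it is to
write the streams to files; the /backup path and the dated snapshot name are
just stand-ins for whatever your offline storage looks like.)

  # one-time: take the base snapshot and send the full stream offline
  zfs snapshot pool/some-filesystem@base
  zfs send pool/some-filesystem@base > /backup/some-filesystem.base.zfs

  # periodically: take a new snapshot and send only the delta from @base
  zfs snapshot pool/some-filesystem@2013-03-02
  zfs send -i pool/some-filesystem@base pool/some-filesystem@2013-03-02 \
      > /backup/some-filesystem.incr.zfs
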
>>
>> If I then restore the base and snapshot, I get back to where I was when
>> the latest snapshot was taken.  I don't need to keep the incremental
>> snapshot for longer than it takes to zfs send it, so I can do:
>>
>> zfs snapshot pool/some-filesystem@unique-label
>> zfs send -i pool/some-filesystem@base pool/some-filesystem@unique-label
>> zfs destroy pool/some-filesystem@unique-label
>>
>> and that seems to work (and restore) just fine.
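The restore side is roughly the mirror image with zfs receive. Something like
the following, where the target pool name is made up and -F forces a rollback
in case the received filesystem was touched between the two receives:

  zfs receive tank/some-filesystem < /backup/some-filesystem.base.zfs
  zfs receive -F tank/some-filesystem < /backup/some-filesystem.incr.zfs
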
>
> Yes, I handle backups the same way. I wrote a simple script that
> synchronizes two filesystems between distant servers. I also use the
> same script to synchronize bushy filesystems (with hundreds of thousands of


Your filesystems grow a lot of hair? :-)




> files) where rsync generates too much load for synchronizing.
>
> https://github.com/kworr/zfSnap/commit/08d8b499dbc2527a652cddbc601c7ee8c0c23301
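I haven't read through the whole script, but the core of that kind of remote
synchronization is usually just an incremental send piped over ssh, along the
lines of this (host and dataset names invented):

  zfs snapshot pool/data@2013-03-02
  zfs send -i pool/data@2013-03-01 pool/data@2013-03-02 | \
      ssh backuphost zfs receive -F tank/data
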
>
> I left it where it was, but I was also planning to write a purger that
> would automatically destroy snapshots when the pool gets low on
> space. I've never hit that case yet.
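Such a purger doesn't have to be fancy. A rough, untested sketch in sh,
assuming you are happy destroying the oldest snapshots first:

  #!/bin/sh
  # Destroy the oldest snapshots while the pool is more than 90% full.
  POOL=pool
  while [ "$(zpool list -H -o capacity ${POOL} | tr -d %)" -ge 90 ]; do
      oldest=$(zfs list -H -t snapshot -o name -s creation -r ${POOL} | head -1)
      [ -z "${oldest}" ] && break
      zfs destroy "${oldest}"
  done
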
>
>> Am I looking at this the right way here?  Provided that the base backup
>> and incremental are both readable, it appears that I have the disaster
>> case covered, and the online snapshot increments and retention are
>> easily adjusted and cover the "oops" situations without having to resort
>> to the backups at all.
>>
>> This in turn means that keeping more than two incremental dumps offline
>> has little or no value; the second is kept merely to ensure that
>> there is always at least one that has been written to completion without
>> error and can be applied on top of the base.  That in turn makes the backup
>> storage requirement depend only on entropy in the filesystem and not on time
>> (whereas the "tower of Hanoi" style dump hierarchy imposed both a time AND
>> an entropy cost on backup media).
>
> Well, snapshots can have value over a longer timeframe, depending on the
> data. Being able to restore a file accidentally deleted two months ago
> has already saved $2k for one of our customers.
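And for that "restore one old file" case you don't even need a send stream;
as long as the snapshot still exists it is just a copy out of the hidden
.zfs directory, something like this (paths made up):

  cp /pool/home/.zfs/snapshot/2013-01-02/some-file /pool/home/some-file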

