Musings on ZFS Backup strategies

Royce Williams royce at tycho.org
Fri Mar 1 15:29:19 UTC 2013


On Fri, Mar 1, 2013 at 6:06 AM, Ronald Klop <ronald-freebsd8 at klop.yi.org> wrote:
> On Fri, 01 Mar 2013 15:24:53 +0100, Karl Denninger <karl at denninger.net>
> wrote:
>
>> Dabbling with ZFS now, and giving some thought to how to handle backup
>> strategies.
>>
>> ZFS' snapshot capabilities have forced me to re-think the way that I've
>> handled this.  Previously, near-line (and offline) backup was focused on
>> being able to handle disasters (e.g. a RAID adapter going nuts and
>> scribbling on the entire contents of the array), a double-disk (or
>> worse) failure, or the obvious (fire, etc.), along with the "aw crap, I
>> just rm -rf'd something I'd rather not!" case.
>>
>> ZFS makes snapshots very cheap, which means you can resolve the "aw
>> crap" situation without resorting to backups at all.  This turns the
>> backup situation into a disaster recovery one.
>>
>> And that in turn seems to say that the ideal strategy looks more like:
>>
>> Take a base snapshot immediately and zfs send it to offline storage.
>> Take an incremental at some interval (appropriate for disaster recovery)
>> and zfs send THAT to stable storage.
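
Spelled out, the base step might look something like this (the
destination path is only a placeholder):

# one-time full send of the base snapshot to offline storage
zfs snapshot pool/some-filesystem@base
zfs send pool/some-filesystem@base > /offline/some-filesystem-base.zfs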
>>
>> If I then restore the base and the incremental, I get back to where I
>> was when the latest snapshot was taken.  I don't need to keep the
>> incremental snapshot for longer than it takes to zfs send it, so I can do:
>>
>> zfs snapshot pool/some-filesystem@unique-label
>> zfs send -i pool/some-filesystem@base pool/some-filesystem@unique-label
>> zfs destroy pool/some-filesystem@unique-label
>>
>> and that seems to work (and restore) just fine.
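
For completeness, the restore side is the mirror image; a minimal
sketch, assuming the streams were saved to files as above:

# restore: receive the full base stream, then the incremental on top
zfs receive backuppool/some-filesystem < /offline/some-filesystem-base.zfs
# -F rolls the target back to @base if it was modified after the receive
zfs receive -F backuppool/some-filesystem < /offline/some-filesystem-incr.zfs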
>>
>> Am I looking at this the right way here?  Provided that the base backup
>> and incremental are both readable, it appears that I have the disaster
>> case covered, and the online snapshot increments and retention are
>> easily adjusted and cover the "oops" situations without having to resort
>> to the backups at all.
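
The "readable" precondition can be sanity-checked without restoring
anything; a dry-run receive is one way to do it (names are placeholders):

# dry run: parse the stream and report what would be received; writes nothing
zfs receive -n -v backuppool/some-filesystem < /offline/some-filesystem-incr.zfs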
>>
>> This in turn means that keeping more than two incremental dumps offline
>> has little or no value; the second exists only to ensure that there is
>> always at least one incremental that has been written to completion
>> without error and can be applied on top of the base.  That makes the
>> backup storage requirement a function only of the entropy in the
>> filesystem, not of time (whereas the "Tower of Hanoi" style dump
>> hierarchy imposed both a time AND an entropy cost on backup media.)
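
A simple rotation keeps exactly two incrementals on hand (the file
names here are hypothetical):

# keep the previous incremental until the new one is written out in full
mv /offline/some-filesystem-incr.zfs /offline/some-filesystem-incr.prev.zfs
zfs snapshot pool/some-filesystem@unique-label
zfs send -i pool/some-filesystem@base pool/some-filesystem@unique-label \
    > /offline/some-filesystem-incr.zfs
zfs destroy pool/some-filesystem@unique-label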
>>
>> Am I missing something here?
>>
>> (Yes, I know, I've been a ZFS resister.... ;-))
>>
>
> I do the same; the only difference is that I use zfs send -I (capital i),
> so I have all of the snapshots on the backup as well.
> That way the data survives an oops (rm -r) and a fire at the same time. :-)
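
For reference, a send -I replication would look roughly like this (pool
and snapshot names are placeholders):

# replicate @base plus every snapshot taken since, preserving history;
# assumes the backup pool already holds @base from the initial full send
zfs send -I pool/some-filesystem@base pool/some-filesystem@latest | \
    zfs receive backuppool/some-filesystem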

Concur.  There are "disasters" that are not obvious until some time
has passed -- such as security breaches, application problems that
cause quiet data corruption, etc.

I do not know how a live ZFS filesystem could be manipulated by an
intruder, but the possibility is there.

-- 
Royce Williams

