Storing revisions of large files using ZFS snapshots

Tue May 31 20:42:43 UTC 2011

I'm currently looking at the option of using a FreeBSD server with ZFS to store offsite backups.

The primary backup product used (Veeam Backup & Replication) stores its backups in what's called reverse-incremental mode. Basically, this means storing backups as a huge VBK file (one for each job) containing a deduplicated and compressed dump of all the virtual machine files being backed up. The system will also store what are known as "reverse incrementals", i.e. anything it overwrites on a backup pass will be preserved in a file, similar to a traditional incremental backup, except in the other direction.

Since this product doesn't have any real solution for offsite backup replication, I've looked at a few different options and am now seriously considering a combination of ZFS snapshots and rsync.

Basically what would happen is that every night after the backup completes, rsync would be run, copying over only the parts of the synthetic full backup that changed since the previous day. Historic copies of the full backup image, as synchronized by rsync, would be kept using ZFS snapshots. After our retention window closes, I'd just nuke the oldest snapshots from the server.
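To make the nightly cycle concrete, here is a rough sketch of it in Python. The dataset name, source host, paths and retention count are all made up for illustration. I'm also assuming rsync's --inplace option is wanted here, since by default rsync builds a fresh copy of the file and renames it over the old one, which would presumably leave nothing for a snapshot to share.

```python
import subprocess
from datetime import date

DATASET = "tank/veeam"                 # hypothetical ZFS dataset name
SOURCE = "backup@primary:/backups/"    # hypothetical rsync source
TARGET = "/tank/veeam/"                # hypothetical mountpoint of DATASET
KEEP = 14                              # hypothetical retention, in snapshots


def rsync_command(source, target):
    # --inplace updates the existing destination file in place instead of
    # writing a new temporary copy and renaming it over the old one.
    return ["rsync", "-a", "--inplace", source, target]


def snapshot_command(dataset, day):
    # Snapshot names look like tank/veeam@2011-05-31.
    return ["zfs", "snapshot", f"{dataset}@{day.isoformat()}"]


def list_snapshots_command(dataset):
    # Names only, no header, oldest first, this dataset only (depth 1).
    return ["zfs", "list", "-H", "-t", "snapshot", "-o", "name",
            "-s", "creation", "-d", "1", dataset]


def expired(snapshots, keep):
    """Given snapshot names oldest first, return those beyond the window."""
    return snapshots[:max(len(snapshots) - keep, 0)]


def nightly(run=subprocess.run):
    run(rsync_command(SOURCE, TARGET), check=True)
    run(snapshot_command(DATASET, date.today()), check=True)
    listing = run(list_snapshots_command(DATASET), check=True,
                  capture_output=True, text=True)
    for name in expired(listing.stdout.split(), KEEP):
        run(["zfs", "destroy", name], check=True)
```

The idea would be to call nightly() from cron on the offsite box once the backup job has finished. The helpers only build command lines, so they can be checked without touching a real pool.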

We're talking about a file that's around 1 TB even after the backup software does its own inline compression and deduplication (and it's likely to grow as our environment grows), which is kind of impractical to send in its entirety, even over our current 100 Mbit/s leased line to our datacenter.
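For a sense of scale, a quick back-of-the-envelope calculation. It assumes a decimal terabyte and a link that is fully saturated with zero protocol overhead, so the real figure would be worse.

```python
def transfer_hours(size_bytes, link_bits_per_second):
    """Hours needed to push size_bytes over a link at the given raw rate."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 3600


# 1 TB (10**12 bytes) over a 100 Mbit/s line, best case.
full_copy = transfer_hours(1e12, 100e6)
print(f"{full_copy:.1f} hours")  # prints "22.2 hours"
```

So a full copy would eat nearly the whole day between two nightly backups, before the file grows any further.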

First of all, will ZFS do copy-on-write on a block level when it comes to snapshots, or is copy-on-write on ZFS snapshots done on a whole-file level? It would seem that block-level COW would be required for this to even have a chance of working. Please note that I'm not talking about deduplication in ZFS itself, but rather using snapshots as a means to perform a crude kind of deduplication.
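To illustrate what I mean by "crude deduplication", here is a toy model in Python. It is not how ZFS works internally, and the 4-byte block size is purely illustrative. It just counts how many fixed-size blocks differ between two versions of a file, which is the amount of new storage I'd hope each snapshot costs if COW really is block-level.

```python
def changed_blocks(old, new, block_size):
    """Count fixed-size blocks that differ between two versions of a file."""
    length = max(len(old), len(new))
    count = 0
    for offset in range(0, length, block_size):
        if old[offset:offset + block_size] != new[offset:offset + block_size]:
            count += 1
    return count


block_size = 4                 # illustrative only
old = b"AAAABBBBCCCCDDDD"      # yesterday's file, four blocks
new = b"AAAAXXXXCCCCDDDD"      # today's file, one block rewritten
n = changed_blocks(old, new, block_size)
print(n * block_size, "of", len(new), "bytes would need new storage")
# prints "4 of 16 bytes would need new storage"
```

With whole-file COW, the same change would cost all 16 bytes again, which is the case I'm worried about.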

Second, are there any other caveats that I'm likely to run into as I go down this path for storing backups?

Obviously, I'd prefer just trucking over plain old incremental backups, and doing a consolidation job off-site, but the backup software doesn't have any image management software that could consolidate a full backup plus its incrementals into a synthetic full backup. It'll only do it as part of a backup job. Grmbl. But then I wouldn't get to play with the idea of actually storing full backup images for every restore point using filesystem level snapshots. :)