ZFS dedup and replication

krad kraduk at gmail.com
Sat Dec 3 18:16:03 UTC 2011


On 3 December 2011 05:29, Techie <techchavez at gmail.com> wrote:

> Hey Peter,
>
> Thanks for your informative response.
>
> This is odd, I have been waiting for a response to this question for a
> few days and these messages just came through. I am glad they did.
>
> Anyhow, please allow me to explain the whole "tar" thing. I regret
> using it as an example, because no one even addressed the DDT (dedup
> table) part of the question.
>
> You see I wanted to use ZFS as a deduplication disk target for my
> backup applications and use the native replication capabilities of ZFS
> to replicate the virtual backup cartridges. All modern backup apps
> leverage disk as a backup target but some don't do replication.
>
> My idea was to use ZFS to do this. However after testing I came to the
> realization that ZFS deduplication is NOT ideal for "deduping" third
> party backup streams. From what I read, this is due to the fact that
> backup applications put their own metadata in the streams and throw
> off the block alignment. Products like Data Domain and Quantum DXi use
> variable-length blocks and are designed for deduplicating backup
> application streams. ZFS does OK, but nothing compared to the dedup
> ratios seen on these appliances. I used tar as an example and should
> have been more specific. I understand what you are saying about
> replicating every 15 minutes etc. However, since backup applications
> create huge files, an incremental send would need to send the newly
> created huge file. At least that is how I understand it; I may be
> incorrect. In my testing this was the case, but perhaps my syntax was
> not correct.
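>
> For what it's worth, the alignment effect is easy to demonstrate with
> plain shell tools. A minimal sketch (file names and block size are
> invented, and GNU sha256sum/uniq are assumed; FreeBSD spells these
> slightly differently): hash fixed-size blocks of the same data with
> and without a small header prepended. The 8-byte shift means no block
> hashes match, so fixed-block dedup like ZFS's finds nothing to share:
>
>     # 4 MiB of random data, then the same data behind an 8-byte
>     # "backup app" header
>     dd if=/dev/urandom of=/tmp/data bs=131072 count=32
>     { printf 'HDR00001'; cat /tmp/data; } > /tmp/data_hdr
>
>     # hash every 128 KiB block of both files; duplicated hashes would
>     # be dedup hits, but this prints nothing
>     split -b 131072 /tmp/data     /tmp/a_
>     split -b 131072 /tmp/data_hdr /tmp/b_
>     sha256sum /tmp/a_* /tmp/b_* | sort | uniq -w 64 -d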
>
> In any case deduplication appliances when replicating only send the
> changed blocks that don't exist on the target side. To do this they
> have to have knowledge of what exists in the target side "block pool",
> "dedup hash table",or whatever it may be called.
>
> From what I understand, a ZFS file system on the source side has no
> idea of what exists on the target side. I also understand, perhaps
> incorrectly, that zfs send -D only eliminates duplicate blocks within
> the stream it is sending and does not account for blocks that may
> already exist at the target.
>
> As an example, let's say I am using a backup app like Amanda. I do a
> full backup every day to a ZFS-based disk target. Every day after the
> backup completes I do a "zfs send -D -i {snap_yesterday}
> {snap_today} | ssh DESTINATION zfs recv DEST_FS". Now each day's full
> backup will only have maybe a 1% change rate, and this will be
> reflected on the source-side file system. So if I had 5 days of 2 GB
> full backups, the source file system will show maybe 3 GB Alloc in the
> zpool list output. However, since the source does not know about
> duplicate blocks on the target side from yesterday's backup, it sends
> the entire 2 GB full backup from today, only removing any duplicate
> blocks that exist in the stream it is sending. The difference with a
> dedup appliance is that it is aware of duplicate blocks on the target
> side and won't send them.
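>
> Pulled out of the prose, that daily cycle is roughly (pool and dataset
> names are placeholders):
>
>     # snapshot today's state, then send only the delta since yesterday;
>     # -D collapses duplicate blocks within this one stream only
>     zfs snapshot pool/backups@today
>     zfs send -D -i pool/backups@yesterday pool/backups@today | \
>         ssh DESTINATION zfs recv DEST_FS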
>
> This is the reason my original question asked if there were any plans
> to implement a "global DDT" (dedup table) to make the source aware of
> the duplicate blocks already at the destination, so that only unique
> blocks are transferred.
>
> Am I incorrect in my understanding of the ZFS DDT being unique to each
> ZFS file system/pool?
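>
> (For context, what I can inspect locally is a per-pool table, e.g.
> with "tank" as a placeholder pool name:
>
>     zpool status -D tank   # dedup table summary for one pool
>     zdb -DD tank           # more detailed DDT statistics
>
> which is what leads me to think the DDT is tracked pool by pool.)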
>
>
> Thanks
> Jimmy
>
> On Fri, Dec 2, 2011 at 12:04 PM, Peter Maloney
> <peter.maloney at brockmann-consult.de> wrote:
> > On 02.12.2011 15:50, Jeremy Chadwick wrote:
> >> On Fri, Dec 02, 2011 at 03:27:03PM +0100, Michel Le Cocq wrote:
> >>> Is it just me, or is there no attachment?
> >> The mailing list stripped the attachment.  The previous individual will
> >> need to put it up on the web somewhere.
> >>
> > It is possible that I forgot to attach it. I assumed it would be
> > stripped off, but that the people in the to/cc would get it.
> >
> > Here it is on the company website:
> >
> > http://www.brockmann-consult.de/peter2/zfs.tgz
> >
> >
> >
> > Disclaimer/notes:
> > -provided as is... might destroy your system; furthermore, I am not
> > responsible for bodily injury or nuclear war that may result from misuse
> > -there are no unit tests, and no documentation other than a few comments
> > that are possibly only coherent when I read them. For example, it says
> > that it does it recursively and rolls back the destination dataset, but
> > there are a few undocumented cases, which I can't remember, where I
> > needed to do something manual like delete a snapshot or destroy a
> > dataset. Maybe
> > that is all in the past. I don't know.
> > -zfs_repl2.bash, which I wrote myself, is the one that makes snapshots
> > and replicates. The other ksh one is the Oracle one I linked above, and
> > the .sh version of it was just what I was working on to try to make it
> > work reliably, before redoing it all myself (reinventing the wheel is
> > indeed fun).
> > -especially beware of deleteOldSnapshots.bash, which is not well
> > tested and not used yet (and deleteEmptySnapshots.bash, which does
> > not work and I believe cannot work).
> > -granted, transferable, under your choice of any present or future
> > version of the BSD or GPL license
> >
> > And another note: I meant to study these, which might be better versions
> > of the same thing, or something different, but never got around to it:
> >    /usr/ports/sysutils/zfs-replicate/
> >    /usr/ports/sysutils/zfsnap/
> >    /usr/ports/sysutils/zfs-periodic
> >
> >


What we do at work is keep everything simple: we rsync all files off
the box to the archiver box, and at the end of every rsync run we
snapshot the filesystem for that backup client. This gives a very
efficient incremental-forever solution and works with all unix hosts,
ZFS-aware or not.
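
Roughly, each client run is something like this sketch (host names,
dataset layout, and flags are illustrative, not our actual script):

    # pull the client's files into its own dataset on the archiver,
    # then snapshot; each snapshot is a cheap point-in-time "full"
    rsync -aH --delete --numeric-ids client1:/ /archive/client1/
    zfs snapshot archive/client1@$(date +%Y-%m-%d)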

There are some cases where we don't use rsync, though. One of these is
MySQL: rsyncing the data dir would be pointless, as it would produce
inconsistent backups. Since all our MySQL servers are ZFS-based, we
have a script that either stops the slave or issues a global write
lock, depending on whether the database is a slave or a master; it then
flushes, snapshots the mysql-data filesystem, and removes the lock or
restarts the slave. We then just do a zfs incremental send of the data
filesystem. Again, this is very efficient.
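
The MySQL step boils down to something like this sketch of the slave
case (dataset and host names are placeholders, and $PREV/$NEW stand for
the snapshot names; the real script also handles the master case, where
FLUSH TABLES WITH READ LOCK has to be held open in one session while
the snapshot is taken, plus error handling):

    # pause replication so the data dir is quiescent, snapshot, resume
    mysql -e 'STOP SLAVE; FLUSH TABLES;'
    zfs snapshot tank/mysql-data@$NEW
    mysql -e 'START SLAVE;'

    # ship only the blocks changed since the previous snapshot
    zfs send -i tank/mysql-data@$PREV tank/mysql-data@$NEW | \
        ssh archiver zfs recv archive/mysql-data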

More importantly, though, this solution allowed us to reduce our
NetBackup costs considerably, since we cut the number of clients from
many down to just the archivers, which we dumped to tape. That was more
for insurance purposes than for any operational reason.

