ZFS dedup and replication

krad kraduk at gmail.com
Thu Dec 1 16:42:55 UTC 2011


On 1 December 2011 13:03, Peter Maloney
<peter.maloney at brockmann-consult.de> wrote:

> On 12/01/2011 11:20 AM, krad wrote:
> > On 28 November 2011 23:01, Techie <techchavez at gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> Are there any plans to implement sharing of the ZFS dedup table
> >> (DDT), or to make ZFS aware of the duplicate blocks that already
> >> exist on a remote system?
> >>
> >> From how I understand it, the zfs send/recv stream does not know
> >> about the duplicated blocks on the receiving side when using zfs
> >> send -D -i to send only incremental changes.
> >>
> >> Take, for example, an application that I back up each night to a
> >> ZFS file system, and that I want to replicate every night to my
> >> remote site. Each night, the backup creates a tar file on the ZFS
> >> file system. When I go to send an incremental stream, it sends the
> >> entire tar file to the destination, even though over 90% of those
> >> blocks already exist there. Are there any plans to make ZFS aware
> >> of what already exists at the destination site, to eliminate the
> >> need to send duplicate blocks over the wire? I believe zfs send -D
> >> only eliminates the duplicate blocks within the stream.
> >>
> >> Perhaps I am wrong.
> >>
> >>
> >> Thanks
> >> Jimmy
> >
> > Why tar up the stuff? Just do a zfs snapshot, and you bypass the
> > whole issue.
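> > For example, instead of creating a nightly tar file (the dataset
> > name here is made up):
> >
> >        zfs snapshot tank/appdata@nightly-$(date +%Y%m%d)
> >
> > and then zfs send -i can ship just the blocks that changed since the
> > previous night.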
> I was thinking the same thing when I read his message. I don't
> understand it either.
>
> On my system, with 12 TiB in use, what my script basically does is:
>
> - generate a snapshot name
> - take a recursive snapshot
> - ssh to the remote server and compare snapshot lists (find the latest
>   common snapshot, to use as the incremental reference point)
> - if a usable reference point exists, start the incremental send like
>   this (which wipes all changes on the remote system without
>   confirmation):
>
>        zfs send -R -I ${destLastSnap} ${srcLastSnap} | \
>            ssh ${destHost} zfs recv -d -F -v ${destPool}
>
> - if no usable reference point exists, do a full, non-incremental send:
>
>        zfs send -R ${srcLastSnap} | \
>            ssh ${destHost} zfs recv -F -v ${destDataSet}
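>
> The reference-finding logic boils down to something like this sketch
> (heavily simplified and untested; the pool and host names are
> placeholders, and it assumes snapshot names carry a sortable
> timestamp):
>
>        #!/bin/sh
>        srcDataSet=tank            # placeholder
>        destHost=backuphost        # placeholder
>        destPool=backup            # placeholder
>
>        # take a new recursive snapshot with a sortable name
>        srcLastSnap=${srcDataSet}@auto-$(date +%Y%m%d-%H%M)
>        zfs snapshot -r ${srcLastSnap}
>
>        # list the snapshot names (without the dataset part) on both
>        # sides, and pick the newest one that exists on both
>        zfs list -H -t snapshot -o name | grep "^${srcDataSet}@" \
>            | sed 's/.*@//' | sort > /tmp/src.snaps
>        ssh ${destHost} zfs list -H -t snapshot -o name \
>            | grep "^${destPool}" | sed 's/.*@//' | sort > /tmp/dest.snaps
>        common=$(comm -12 /tmp/src.snaps /tmp/dest.snaps | tail -n 1)
>
>        if [ -n "${common}" ]; then
>                # incremental send from the common snapshot
>                # (-F wipes remote changes without confirmation)
>                zfs send -R -I ${srcDataSet}@${common} ${srcLastSnap} | \
>                    ssh ${destHost} zfs recv -d -F -v ${destPool}
>        else
>                # no common snapshot, so do a full send
>                zfs send -R ${srcLastSnap} | \
>                    ssh ${destHost} zfs recv -d -F -v ${destPool}
>        fi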
>
>
> The part about finding the reference snapshot is the most complicated
> part of my script, and it is missing from everything else I found
> online when I was looking for a good solution. For example this script:
> http://blogs.sun.com/clive/resource/zfs_repl.ksh
> found on this page:
> http://blogs.oracle.com/clive/entry/replication_using_zfs
> turned out to be quite fragile, failing completely whenever there was
> a new dataset or a snapshot was missing for some reason. So I suggest
> you look at that one for ideas, but write your own.
>
> The only time my script has failed was when it hit a zfs bug, the same
> one seen here:
>
> http://serverfault.com/questions/66414/cannot-destroy-zfs-snapshot-dataset-already-exists
> I just deleted the leftover clone manually and it worked again.
>
> I thought gzip might save a small amount of time, so I compared the
> speed of
>        zfs send ... | ssh zfs recv ...
> against
>        zfs send ... | gzip -c | ssh 'gunzip -c | zfs recv ...'
> and found little or no difference.
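>
> Concretely, the compressed variant of the incremental send above would
> look something like this (untested sketch, same placeholder variables
> as before):
>
>        zfs send -R -I ${destLastSnap} ${srcLastSnap} | gzip -c | \
>            ssh ${destHost} "gunzip -c | zfs recv -d -F -v ${destPool}"
>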
> But I have no idea why you would use tar.
>
> And just to confirm, I have the same problems with dedup causing severe
> bottlenecks in many places, especially zfs recv and scrub, even though
> I have 48 GB of memory installed and 44 GB available to ZFS.
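>
> To see how big the DDT actually is, and therefore roughly how much RAM
> it wants (a commonly cited figure is around 320 bytes per in-core
> entry), you can ask zdb (the pool name is a placeholder):
>
>        zdb -DD ${poolName}
>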
>
> But I find incremental sends to be very efficient, taking much less
> than a minute (depending on how much data has changed) when run every
> hour. And unless your bandwidth is slow and precious, I recommend
> sending more often than daily, because it is very fast if done often
> enough. I only send hourly because I haven't had time to write scripts
> to clean up the old snapshots; otherwise I would send every 15 minutes,
> or maybe every 15 seconds ;)
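>
> The cleanup could be as simple as this sketch (untested; it assumes the
> sortable @auto-YYYYMMDD-HHMM snapshot names from above and FreeBSD's
> date -v syntax):
>
>        # destroy auto snapshots older than 7 days
>        cutoff=$(date -v-7d +%Y%m%d-%H%M)
>        zfs list -H -t snapshot -o name | grep '@auto-' | \
>        while read snap; do
>                stamp=${snap##*@auto-}
>                # plain string comparison works because the
>                # timestamps sort lexicographically
>                [ "${stamp}" \< "${cutoff}" ] && zfs destroy "${snap}"
>        done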
>
>
> --
>
> --------------------------------------------
> Peter Maloney
> Brockmann Consult
> Max-Planck-Str. 2
> 21502 Geesthacht
> Germany
> Tel: +49 4152 889 300
> Fax: +49 4152 889 333
> E-mail: peter.maloney at brockmann-consult.de
> Internet: http://www.brockmann-consult.de
> --------------------------------------------
>

Sounds like we have been through very similar experiences.

