ZFS cpu requirements, with/out compression and/or dedup

Tue Sep 22 10:32:15 UTC 2015

Or, far more easily, do an rsync from the old dataset to the new one with
the --remove-source-files option, and then drop the old dataset at the end.
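
Roughly what I have in mind, as a sketch only. The dataset names (tank/data, tank/data_new) are made up, the mountpoints are assumed to be /<dataset>, and the run/DRY_RUN wrapper is just a hypothetical helper so that nothing is executed until you have read the printed commands:

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 only after reviewing the printed commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# migrate_with_rsync OLD_DATASET NEW_DATASET
# Assumes each dataset is mounted at /<dataset> (hypothetical layout).
migrate_with_rsync() {
    old="$1"; new="$2"
    # New dataset with dedup off, so the copied blocks are not deduped.
    run zfs create -o dedup=off "$new"
    # Source files are removed as they are transferred successfully;
    # empty directories are left behind in the old dataset.
    run rsync -a --remove-source-files "/$old/" "/$new/"
    # Plain destroy fails if the old dataset still has snapshots or children.
    run zfs destroy "$old"
    run zfs rename "$new" "$old"
}

migrate_with_rsync tank/data tank/data_new
```

Because files are deleted from the old dataset as they are copied, this should need far less free space than a full duplicate, provided no snapshots are pinning the old blocks.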

On 21 September 2015 at 22:41, Matthew Seaman <matthew at freebsd.org> wrote:

> On 21/09/2015 22:13, Peter Jeremy wrote:
> > In general, the downsides of dedup outweigh the benefits.  If you already
> > have the data in ZFS, you can use 'zdb -S' to see what effect rebuilding
> > the pool with dedup enabled would have - how much disk space you will save
> > and how big the DDT is (and hence how much RAM you will need).  If you can
> > afford it, make sure you keep good backups, enable dedup and be ready to
> > nuke the pool and restore from backups if dedup doesn't work out.
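
For the RAM side of that, a back-of-the-envelope sketch. The 320 bytes per in-core DDT entry is the commonly quoted rule of thumb, not a figure from this thread, and the entry count below is invented; take the real number of unique blocks from the totals of the 'zdb -S' histogram for your own pool:

```shell
#!/bin/sh
# Assumption: roughly 320 bytes of RAM per in-core DDT entry (rule of thumb).
DDT_ENTRY_BYTES=320

# ddt_ram_mib ENTRIES
# Prints the estimated DDT size in MiB (integer, rounded down).
ddt_ram_mib() {
    echo $(( $1 * DDT_ENTRY_BYTES / 1024 / 1024 ))
}

# The simulation itself is read-only but can be slow and memory-hungry:
#   zdb -S tank        # "tank" is a placeholder pool name

# Example with an invented count of one million unique blocks:
ddt_ram_mib 1000000    # prints 305
```

Treat the result as a lower bound on what you would want available, since the DDT competes with everything else ZFS caches in RAM.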
>
> Nuking the entire pool is a little heavy handed.  Dedup can be turned on
> and off on a per-dataset basis.  If you have a dataset that had dedup
> enabled, you can remove the effects by zfs send / zfs recv to create a
> pristine un-deduped copy of the data, destroy the original dataset and
> rename the new one to take its place.  Of course, this depends on your
> having enough free space in the pool to be able to duplicate (and then
> some) that dataset.
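
A sketch of that send/recv sequence, with the usual caveats. The dataset name, the _nodedup suffix and the @undedup snapshot name are placeholders, and the run/DRY_RUN wrapper is a hypothetical helper that only prints the commands by default. It also assumes a ZFS version where 'zfs recv' accepts -o; if yours does not, turn dedup off on the parent dataset first so that the received copy inherits dedup=off:

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# undedup_dataset DATASET
undedup_dataset() {
    ds="$1"
    tmp="${ds}_nodedup"     # placeholder name for the fresh copy
    snap="${ds}@undedup"    # placeholder snapshot name

    run zfs snapshot "$snap"
    # Receive with dedup forced off so the copy is written un-deduped.
    run sh -c "zfs send '$snap' | zfs recv -o dedup=off '$tmp'"
    # -r also destroys the original's snapshots and any child datasets.
    run zfs destroy -r "$ds"
    run zfs rename "$tmp" "$ds"
    # The received copy carries the transfer snapshot; drop it when done.
    run zfs destroy "$snap"
}

undedup_dataset tank/data
```

Writes that arrive on the original between the snapshot and the destroy are lost, so quiesce whatever uses the dataset first.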
>
> Failing that, you might be able to 'zpool split' if your pool is
> composed entirely of mirrors.  So long as you're able to do without
> resilience for a while this basically doubles the space you have
> available to play with.  You can then destroy the contents of one of the
> split zpools, and zfs send the data over from the other split pool.
> Unfortunately there isn't a reciprocal 'zfs rejoin' command that undoes
> the splitting, so you'll have to destroy one of the splits and re-add
> the constituent devices back to restore the mirroring in the other
> split.  This is a delicate operation and not one which is forgiving of
> mistakes.
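
For the split route, something along these lines, strictly as a dry-run sketch for a pool made of a single two-way mirror. The pool, dataset and device names (tank, data, ada0, ada1), the _split suffix and the /mnt altroot are all invented, and the run/DRY_RUN wrapper is a hypothetical helper that only prints what it would do:

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# split_and_rebuild POOL DATASET KEEP_DEV SPLIT_DEV
# POOL is assumed to be one mirror of KEEP_DEV and SPLIT_DEV.
split_and_rebuild() {
    pool="$1"; ds="$2"; keep_dev="$3"; split_dev="$4"
    new="${pool}_split"     # placeholder name for the split-off pool

    # Detach SPLIT_DEV into a new pool and import it under an altroot
    # so its mountpoints do not collide with the original pool's.
    run zpool split -R /mnt "$pool" "$new" "$split_dev"
    # Drop the deduped copy in the new pool, then refill it un-deduped.
    run zfs destroy -r "$new/$ds"
    run zfs snapshot "$pool/$ds@undedup"
    run sh -c "zfs send '$pool/$ds@undedup' | zfs recv -o dedup=off '$new/$ds'"
    # Point of no return: the original pool goes away here.
    run zpool destroy "$pool"
    # Restore mirroring by attaching the freed device; this may need -f
    # if zpool objects to the old labels on it.
    run zpool attach "$new" "$split_dev" "$keep_dev"
}

split_and_rebuild tank data ada0 ada1
```

There is no redundancy between the split and the end of the resilver after the attach, so check the new pool carefully before the 'zpool destroy' step.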
>
> And failing that, you can start pushing data over the network, but
> that's hardly different to restoring from backup.  However, either of
> the first two choices should be significantly faster if you have large
> quantities of data to handle.
>
>         Cheers,
>
>         Matthew