ZFS cpu requirements, with/out compression and/or dedup

Peter Jeremy peter at rulingia.com
Mon Sep 21 21:13:52 UTC 2015


On 2015-Sep-21 13:50:38 +0100, krad <kraduk at gmail.com> wrote:
>"It's also 'permanent' in the sense that you have to turn it on with the
>> creation of a dataset and can't disable it without nuking said dataset. "
>
>This is completely untrue, the performance issues with dedup are limited
>to writes only, as it needs to check the DDT table for every write to the
>file system with dedup enabled.

Well, it's partially true.  Once you enable dedup on a dataset, it creates
a DDT and it's not possible to remove the DDT without nuking the dataset.
There are basically 3 operations on a block:
Read a block: DDT is never referenced.
Write a new block: DDT is referenced if dedup is enabled.
Free a block: DDT is always referenced if it exists.
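
For example (pool and dataset names below are placeholders), you can check
whether a dataset currently has dedup enabled, and whether the pool is
already carrying a DDT, with something like:

  # Per-dataset setting - only affects newly written blocks:
  zfs get dedup tank/data
  # Pool-wide DDT statistics, shown if a DDT exists:
  zpool status -D tank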

The usual "fall off a cliff" scenario is when you go to delete a large file
or snapshot on a dataset where dedup has been enabled at some point in the
past, even if it's not enabled now.  Every block in that file or snapshot
is checked against the DDT.  Since the DDT is basically a very large hash
table, this entails lots of random I/O.
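
As a side note (pool name again a placeholder), if a big destroy is grinding
through the DDT, the pool's 'freeing' property gives a rough idea of how much
space is still waiting to be released:

  # Space queued for asynchronous freeing:
  zpool get freeing tank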

On 2015-Sep-21 10:10:46 -0400, Quartz <quartz at sneakertech.com> wrote:
>Also, just for reference: according to the specs each entry in the dedup
>table costs about 320 bytes of memory per block of disk. This means that
>AT BEST (assuming ZFS decides to use full 128K blocks in your case)
>you'll need 2.5GB of ram per 1 TB of used space just for the DDT stuff

And at worst, assuming Advanced Format disks, you'll have 4K blocks and
need 80GB of RAM per 1 TB of used space.
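
For reference, the arithmetic behind those two figures looks roughly like
this (sh, assuming the ~320 bytes per DDT entry quoted above):

  # DDT RAM estimate = (used bytes / block size) * 320 bytes per entry
  echo $(( 1024 * 1024 * 1024 * 1024 / (128 * 1024) * 320 ))  # 128K blocks: ~2.5GB per TB
  echo $(( 1024 * 1024 * 1024 * 1024 / (4 * 1024) * 320 ))    # 4K blocks: ~80GB per TB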

In general, the downsides of dedup outweigh the benefits.  If you already
have the data in ZFS, you can use 'zdb -S' to see what effect rebuilding
the pool with dedup enabled would have - how much disk space you would save
and how big the DDT would be (and hence how much RAM you would need).  If you
can afford it, make sure you keep good backups, enable dedup and be ready to
nuke the pool and restore from backups if dedup doesn't work out.
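
Something like the following (pool name is a placeholder; the exact output
format varies between versions):

  # Simulate dedup on an existing pool.  Read-only, but it walks every
  # block, so it can take a long time on a large pool:
  zdb -S tank
  # The summary line at the end (roughly "dedup = N.NN, ...") is the
  # expected space saving; total simulated DDT entries * ~320 bytes
  # gives a rough RAM estimate.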

On 2015-Sep-21 10:02:16 -0700, Marcus Reid <marcus at blazingdot.com> wrote:
>This is misleading.  lz4 compression is so fast that in the common case
>it _increases_ performance.

This is true of most of the compression algorithms.

>In addition, lz4 has early-abort where it will detect that the data is
>uncompressible, and just write it out when it is instead of compressing
>it.

I'm not sure how lz4 decides data is incompressible without trying to
compress it.  The way ZFS compression works is that it tries to compress
a block.  Unless the compressed data is small enough to fit into a smaller
block (i.e. 2:1 compression or better), the uncompressed data is stored.
(And blocks of NULs are "stored" as holes in the file without attempting
compression.)  In general, unless you know a dataset will always be
filled with pre-compressed data - videos, non-RAW images, distfiles -
you are better off enabling compression.
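
Turning it on and seeing what it buys you is cheap (dataset name is a
placeholder; the setting only affects newly written data):

  # Enable lz4 and later check how well it is doing:
  zfs set compression=lz4 tank/data
  zfs get compression,compressratio tank/data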

-- 
Peter Jeremy