EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?

Daniel Kalchev daniel at digsys.bg
Thu Jul 4 08:02:55 UTC 2013


On 04.07.13 05:51, Jeremy Chadwick wrote:
> On Wed, Jul 03, 2013 at 07:39:58PM -0700, Freddie Cash wrote:
>> On 2013-07-03 7:16 PM, "Jeremy Chadwick" <jdc at koitsu.org> wrote:
>>> On Thu, Jul 04, 2013 at 01:40:07PM +1200, Berend de Boer wrote:
>>>>>>>>> "Jeremy" == Jeremy Chadwick <jdc at koitsu.org> writes:
>>>>
>>>>      Jeremy>   Also, because nobody seems to warn others of this: if
>>>>      Jeremy> you go the ZFS route on FreeBSD, please do not use
>>>>      Jeremy> features like dedup or compression.
>>>>
>>>> Exactly the two reasons why I'm experimenting with FreeBSD on AWS.
>>>>
>>>> Please tell me more.
>>> dedup has immense and crazy memory requirements; the commonly referenced
>>> model (which is in no way precise, it's just a general recommendation)
>>> is that for every 1TB of data you need 1GB of RAM just for the DDT
>>> (deduplication table) -- understand that ZFS's ARC also eats lots of
>>> memory, so when I say 1GB of RAM, I'm talking about that being *purely
>>> dedicated* to DDT.
>> Correction: 1 GB of *ARC* space per TB of *unique* data in the pool. Each
>> unique block in the pool gets an entry in the DDT.
>>
>> You can use L2ARC to store the DDT, although it takes ARC space to track
>> data in L2ARC, so you can't go crazy (512 GB L2 with only 16 GB ARC is a
>> no-no).
>>
>> However, you do need a lot of RAM to make dedupe work, and your I/O does
>> drop through the floor.
> Thanks Freddie -- I didn't know this (re: ARC space per TB of unique
> data); wasn't aware that's where the DDT got placed.  (Actually makes
> sense now that I think about it...)
>
The really bad thing about this is that the DDT competes with 
everything else for ARC space. You don't want to reach the point where 
the DDT trashes the ARC...
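
To put rough numbers on it, here is a minimal back-of-the-envelope 
sketch (plain Python arithmetic). The ~320 bytes per DDT entry and 
~180 bytes of ARC per L2ARC header are ballpark figures, not exact 
values for any particular ZFS release, and the average block sizes are 
assumptions; the point is only that the footprint scales with the 
number of unique blocks, not with raw pool size.

    # Back-of-the-envelope DDT / L2ARC sizing. All constants are
    # assumptions: ~320 bytes of core per DDT entry and ~180 bytes of
    # ARC per L2ARC header are ballpark figures only.

    TiB = 1024 ** 4
    GiB = 1024 ** 3

    def ddt_ram_bytes(unique_data, avg_block=128 * 1024, per_entry=320):
        """RAM needed to hold one DDT entry per unique block."""
        return (unique_data / avg_block) * per_entry

    def l2arc_header_ram_bytes(l2arc_size, avg_block, per_header=180):
        """ARC overhead for tracking blocks that live on L2ARC."""
        return (l2arc_size / avg_block) * per_header

    # 10 TiB of unique data with 128 KiB average blocks: ~25 GiB of DDT.
    print("DDT: %.1f GiB" % (ddt_ram_bytes(10 * TiB) / GiB))

    # A 512 GiB L2ARC: cheap with large blocks, painful with small ones.
    print("L2ARC headers, 128 KiB blocks: %.1f GiB"
          % (l2arc_header_ram_bytes(512 * GiB, 128 * 1024) / GiB))
    print("L2ARC headers,   8 KiB blocks: %.1f GiB"
          % (l2arc_header_ram_bytes(512 * GiB, 8 * 1024) / GiB))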

ZFS with dedup is really "handy" for a non-interactive storage box, 
such as an archive server. Mine gets over a 10x dedup ratio, which 
means I fit the data on 24 disks instead of 240 disks... The extra RAM 
and L2ARC are well worth the cost and the drop in performance.
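
The disk arithmetic is just division by the dedup ratio; a toy example 
(the 480 TB logical size and 2 TB disks are made-up numbers, only the 
10x ratio is from above):

    # Toy arithmetic: disks needed with and without dedup. The 480 TB
    # logical size and 2 TB disks are hypothetical.
    logical_tb, dedup_ratio, disk_tb = 480.0, 10.0, 2.0
    print("without dedup: %d disks" % (logical_tb / disk_tb))                # 240
    print("with dedup:    %d disks" % (logical_tb / dedup_ratio / disk_tb))  # 24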

If you need higher performance from the storage subsystem though, skip 
both dedup and compression, even if they do become bug-free some day.

Which brings us back to AWS. I believe AWS charges for CPU time too, 
which you will happily burn on both dedup and compression. Yet 
another reason to avoid them, unless block storage turns out 
(unlikely) to be more expensive than the CPU.

Daniel

