Disk usage and ZFS deduplication

Marcus Reid marcus at blazingdot.com
Fri Jun 17 07:00:31 UTC 2011


On Tue, Jun 14, 2011 at 09:19:32AM +0200, Per von Zweigbergk wrote:
> I've been following the "Impossible compression ratio on ZFS" thread
> with some interest, and it made me ask myself this:
> 
> Let us say we have a hypothetical zfs filesystem with the equally
> hypothetical files A and B. The filesystem has deduplication enabled.
> Both files have an apparent file size of 100 MB, but 50 MB of that
> data is common between the two files and thus can be deduplicated.
> This would mean that total disk usage would be 150 MB.
> 
> If you use "du" to determine disk size for a deduplicated file, what would
> be the result? Which file would the common data be accounted to? Or
> would it be accounted to both files somehow, in part or in
> full?

Pretty simple to test. du charges the shared data to both files in full:

[root at luna /root]# zfs create -o mountpoint=/dedup -o dedup=on data/dedup 
[root at luna /usr/data]# dd if=/dev/urandom of=set_a_50MiB bs=1m count=50
[root at luna /usr/data]# dd if=/dev/urandom of=set_b_50MiB bs=1m count=50
[root at luna /usr/data]# dd if=/dev/urandom of=set_c_50MiB bs=1m count=50
[root at luna /usr/data]# cat set_a_50MiB set_b_50MiB > file_1
[root at luna /usr/data]# cat set_a_50MiB set_c_50MiB > file_2
[root at luna /usr/data]# cp file_1 /dedup
[root at luna /usr/data]# cp file_2 /dedup
[root at luna /usr/data]# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data   101G  32.8G  68.2G    32%  1.33x  ONLINE  -
[root at luna /usr/data]# cd /dedup
[root at luna /dedup]# du -sk *
102479  file_1
102479  file_2
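
The numbers above can be checked with a little arithmetic. This is only a
sketch: it assumes the two copied files are the only deduplicated data in
the pool, and the variable names are made up for illustration. The extra
79 KB per file over 102400 KB is presumably metadata.

```shell
#!/bin/sh
# Sizes in MiB, taken from the test above (names are illustrative only).
shared=50      # set_a, present in both file_1 and file_2
unique_1=50    # set_b, only in file_1
unique_2=50    # set_c, only in file_2

# What du adds up: each file is charged for all of its own blocks.
logical=$(( (shared + unique_1) + (shared + unique_2) ))

# What has to be stored once the shared blocks are deduplicated.
physical=$(( shared + unique_1 + unique_2 ))

# Ratio of logical to physical data, to two decimals.
ratio=$(awk -v l="$logical" -v p="$physical" 'BEGIN { printf "%.2f", l / p }')

echo "logical=${logical}MiB physical=${physical}MiB ratio=${ratio}x"
```

So du sees 200 MiB in total while only 150 MiB of unique data is stored,
which matches the 1.33x that zpool list shows in the DEDUP column.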

Marcus

