Storage overhead on zvols

Allan Jude allanjude at freebsd.org
Mon Dec 4 23:39:34 UTC 2017


On 12/04/2017 18:19, Dustin Wenz wrote:
> I'm starting a new thread based on the previous discussion in "bhyve uses all available memory during IO-intensive operations" relating to size inflation of bhyve data stored on zvols. I've done some experimenting with this, and I think it will be useful for others.
> 
> The zvols listed here were created with this command:
> 
> 	zfs create -o volmode=dev -o volblocksize=Xk -V 30g vm00/chyves/guests/myguest/diskY
> 
> The zvols were created on a raidz1 pool of four disks. For each zvol, I created a basic zfs filesystem in the guest using all default tuning (128k recordsize, etc). I then copied the same 8.2GB dataset to each filesystem.
> 
> 	volblocksize    size amplification
> 
> 	512B            11.7x
> 	4k              1.45x
> 	8k              1.45x
> 	16k             1.5x
> 	32k             1.65x
> 	64k             1x
> 	128k            1x
> 
> The worst case is with a 512B volblocksize, where the space used is more than 11 times the size of the data stored within the guest. The efficiency gains are non-linear as I continue from 4k and double the block size, with 32k blocks being the second-worst. The amount of wasted space was minimized by using 64k and 128k blocks.
> 
> It would appear that 64k is a good choice for volblocksize if you are using a zvol to back your VM, and the VM is using the virtual device for a zpool. Incidentally, I believe this is the default when creating VMs in FreeNAS.
> 
> 	- .Dustin
> 

As I explained a bit in the other thread, this depends a lot on your
VDEV configuration.

Allocations on RAID-Z* must be padded out to a multiple of 1+p sectors (where
p is the parity level).

So on RAID-Z1, all allocations must be divisible by 2.

Of course, any block smaller than 4k on drives with 4k sectors would be
rounded up to a full sector as well.

So, with a 512-byte volblocksize, each block ends up using 4k for data and 4k
for parity: a waste factor of almost 16x.

4k is a bit better:
Z1: 1 data + 1 parity + 0 padding = 2x
Z2: 1 data + 2 parity + 0 padding = 3x
Z3: 1 data + 3 parity + 0 padding = 4x

8k can be worse, where the RAID-Z padding comes into play:
Z1: 2 data + 1 parity + 1 padding = 2x (expect 1.5x)
Z2: 2 data + 2 parity + 2 padding = 3x (expect 2x)
Z3: 2 data + 3 parity + 3 padding = 4x (expect 2.5x)
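
For anyone who wants to play with the numbers, here is a rough sh sketch of
the arithmetic above (my own back-of-the-envelope, not any official tool),
assuming 4k sectors (ashift=12) and the 4-disk RAID-Z1 from the original post:

	bs=8192; sector=4096; width=4; p=1              # block size, sector size, vdev width, parity level
	data=$(( (bs + sector - 1) / sector ))          # data sectors, rounded up -> 2
	parity=$(( (data + width - p - 1) / (width - p) * p ))  # parity sectors, p per row of width-p data sectors -> 1
	alloc=$(( data + parity ))                      # -> 3
	mult=$(( p + 1 ))                               # allocations are padded to a multiple of 1+p
	alloc=$(( (alloc + mult - 1) / mult * mult ))   # rounded up -> 4 sectors
	echo "$(( alloc * sector )) bytes on disk for $bs bytes of data"  # 16384 vs 8192 = 2x

Change bs, width, and p to reproduce the other rows above.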

Finally, all of these nice even numbers can be thrown out once you enable
compression, since some blocks will compress better than others: an 8k block
that fits into one 4k sector, and so on.
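
If you want to see how much compression is changing the picture on a given
zvol, the existing properties will show it (the path here is just the example
dataset from the original post):

	zfs get compression,compressratio,logicalused,used vm00/chyves/guests/myguest/diskY

Comparing logicalused (space consumed before compression) to used gives a
quick read on the real ratio for your data.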

Also consider that the 'zfs' commands show sizes after estimating the
expected RAID-Z parity space consumption, but they do not account for losses
to padding. The numbers given by the 'zpool' command, on the other hand, are
raw actual storage.
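
A quick way to see both views side by side (again, the pool and dataset names
are just the ones from the original post):

	zfs list -o name,volsize,used,referenced vm00/chyves/guests/myguest/diskY
	zpool list -o name,size,allocated,free vm00

The first shows the parity-adjusted accounting described above; the second
shows raw allocations, including parity and padding.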

-- 
Allan Jude

