ZFS mirror install /mnt is empty

Paul Kraus paul at kraus-haus.org
Tue May 14 13:15:35 UTC 2013

On May 14, 2013, at 12:10 AM, Shane Ambler <FreeBSD at ShaneWare.Biz> wrote:

> When it comes to disk compression I think people overlook the fact that
> it can impact on more than one level.

Compression has effects at multiple levels:

1) CPU resources to compress (and decompress) the data
2) Disk space used
3) I/O to/from disks

> The size of disks these days means that compression doesn't make a big
> difference to storage capacity for most people and 4k blocks mean little
> change in final disk space used.

	The 4K block issue is *huge* if the majority of your data is in files smaller than 4K. It is also significant when you consider that a 5K file will now occupy 8K on disk. I am not an expert on UFS on FreeBSD, but UFS on Solaris uses a default block size of 4K and a fragment size of 1K, so files are stored on disk with 1K resolution (so to speak). By going to a 4K minimum block size you are forcing all data up to the next 4K boundary.
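
	The rounding described above can be sketched in a few lines of Python. This is a simplified model that only rounds the file size up to the allocation unit; it ignores metadata and anything else a real filesystem does, and the sample sizes are made up for illustration.

```python
KIB = 1024


def allocated_bytes(file_size: int, unit: int) -> int:
    """Round file_size up to the next multiple of the allocation unit."""
    if file_size <= 0:
        return 0
    # Ceiling division without floating point.
    return -(-file_size // unit) * unit


# Illustrative sizes only: two small files, the 5K example, and a 1MB file.
for size in (512, 3 * KIB, 5 * KIB, 1024 * KIB):
    at_1k = allocated_bytes(size, 1 * KIB)
    at_4k = allocated_bytes(size, 4 * KIB)
    padding = (at_4k - size) / size * 100
    print(f"{size:>8} bytes: 1K units {at_1k:>8}, 4K units {at_4k:>8}, "
          f"{padding:.0f}% padding at 4K")
```

	The padding percentage shrinks as the file grows, which is why the 4K minimum matters for small files and not for large ones.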

	Now, if the majority of your data is in large files (1MB or more), then the 4K minimum block size probably gets lost in the noise.

	The other factor is the actual compressibility of the data. Most media files (JPEG, MPEG, GIF, PNG, MP3, AAC, etc.) are already compressed, and trying to compress them again is not likely to garner any real reduction in size. In my experience with the default compression algorithm (lzjb), even uncompressed audio files (.AIFF or .WAV) do not compress enough to make the CPU overhead worthwhile.
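
	A quick way to see the difference is to compress two buffers and compare the ratios. The sketch below uses Python's zlib as a stand-in, since lzjb is not in the standard library, so the absolute numbers will not match what ZFS reports. The repetitive text and the random bytes are assumptions standing in for source-like data and already-compressed media.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 1) -> float:
    """Return original size divided by compressed size for one buffer."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


# Repetitive text stands in for compressible data such as source code.
text_like = b"int main(void) { return 0; }\n" * 4096
# Random bytes stand in for already-compressed media (JPEG, MP3, ...).
media_like = os.urandom(len(text_like))

print(f"text-like:  {compression_ratio(text_like):.2f}:1")
print(f"media-like: {compression_ratio(media_like):.2f}:1")
```

	The random buffer should come out at roughly 1:1 or slightly worse, which is the case where the CPU time is spent for nothing.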

> One thing people seem to miss is the fact that compressed files are
> going to reduce the amount of data sent through the bottle neck that is
> the wire between motherboard and drive. While a 3k file compressed to 1k
> still uses a 4k block on disk it does (should) reduce the true data
> transferred to disk. Given a 9.1 source tree using 865M, if it
> compresses to 400M then it is going to reduce the time to read the
> entire tree during compilation. This would impact a 32 thread build more
> than a 4 thread build.

	If the data does not compress well, then you get hit with the CPU overhead of compression for no bandwidth or space benefit. How compressible is the source tree? [Not a loaded question, I haven't tried to compress it]
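
	One way to get a rough answer without changing any dataset properties is to walk the tree and compress it record by record. The function below is a hypothetical helper, not a ZFS tool. It assumes the default 128K recordsize, uses zlib in place of lzjb, and treats a record that does not shrink as stored uncompressed, so the result is only an estimate.

```python
import os
import zlib

RECORD_SIZE = 128 * 1024  # assumes the dataset uses the ZFS default recordsize


def tree_compression_ratio(root: str, record_size: int = RECORD_SIZE) -> float:
    """Estimate raw bytes divided by compressed bytes for all files under root."""
    raw = 0
    packed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, devices and anything else that is not a plain file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    while True:
                        record = handle.read(record_size)
                        if not record:
                            break
                        raw += len(record)
                        # Simplification: a record that does not shrink counts at full size.
                        packed += min(len(record), len(zlib.compress(record, 1)))
            except OSError:
                # Unreadable files are left out of the estimate.
                continue
    return raw / packed if packed else 1.0
```

	Calling tree_compression_ratio("/usr/src") would give a ballpark figure for the source tree. It says nothing about the 4K rounding on disk, only about how much data would have to cross the wire.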

> While it is said that compression adds little overhead, time wise,

	Compression most certainly DOES add overhead in terms of time, based on the speed of your CPU and how busy your system is. My home server is an HP Proliant Micro with a dual core AMD N36 running at 1.3 GHz. Turning on compression hurts performance *if* I am getting less than 1.2:1 compression ratio (5 drive RAIDz2 of 1TB Enterprise disks). Above that the I/O bandwidth reduction due to the compression makes up for the lost CPU cycles. I have managed servers where each case prevailed… CPU limited so compression hurt performance and I/O limited where compression helped performance.
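
	The break-even idea can be put into a toy model. The sketch below assumes compression and disk I/O happen one after the other for each byte, which is a simplification and not how ZFS necessarily schedules the work. The function names and the 100 MB/s and 600 MB/s figures are hypothetical; the figures were picked only because they give a 1.2:1 break-even and were not measured on any system.

```python
import math


def effective_mb_s(disk_mb_s: float, cpu_mb_s: float, ratio: float) -> float:
    """Throughput of uncompressed data when each byte is compressed, then written."""
    cpu_time = 1.0 / cpu_mb_s              # seconds of CPU per MB of input
    disk_time = 1.0 / (ratio * disk_mb_s)  # the disk moves 1/ratio MB per MB of input
    return 1.0 / (cpu_time + disk_time)


def break_even_ratio(disk_mb_s: float, cpu_mb_s: float) -> float:
    """Compression ratio at which effective throughput equals the raw disk rate."""
    if cpu_mb_s <= disk_mb_s:
        # The CPU cannot keep up with the disk, so compression never pays off here.
        return math.inf
    return cpu_mb_s / (cpu_mb_s - disk_mb_s)


# Hypothetical rates: disk at 100 MB/s, compressor at 600 MB/s.
print(f"break-even ratio: {break_even_ratio(100.0, 600.0):.2f}:1")
for ratio in (1.0, 1.2, 2.0):
    print(f"ratio {ratio:.1f}:1 gives {effective_mb_s(100.0, 600.0, ratio):.0f} MB/s "
          f"versus 100 MB/s uncompressed")
```

	In this model a slower CPU or a faster pool pushes the break-even ratio up, which matches the two cases of CPU limited and I/O limited servers.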

> it is
> going to take time to compress the data which is going to increase
> latency. Going from a 6ms platter disk latency to a 0.2ms SSD latency
> gives a noticeable improvement to responsiveness. Adding compression is
> going to bring that back up - possibly higher than 6ms.

	Interesting point. I do not know the data flow through the code well enough to say whether compression has a defined latency component, or whether it is just throughput limited by the CPU cycles needed to do the compression.

> Together these two factors may level out the total time to read a file.
> One question there is whether the zfs cache uses compressed file data
> therefore keeping the latency while eliminating the bandwidth.

	Data cached in the ZFS ARC or L2ARC is uncompressed. Data sent via zfs send / zfs receive is uncompressed; there had been talk of an option to send / receive compressed data, but I do not think it has gone anywhere.

> Personally I have compression turned off (desktop). My thought is that
> the latency added for compression would negate the bandwidth savings.
> For a file server I would consider turning it on as network overhead is
> going to hide the latency.

	Once again, it all depends on the compressibility of the data, the available CPU resources, the speed of the CPU resources, and the I/O bandwidth to/from the drives.

	Note also that RAIDz (including RAIDz2 and RAIDz3) has its own computational overhead, so compression may be a bigger advantage in this case than in the case of a mirror, since the RAID code has less data to process once the data has been compressed.

Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company

More information about the freebsd-questions mailing list