ZFS regimen: scrub, scrub, scrub and scrub again.

Thu Jan 24 13:13:04 UTC 2013

On 2013-01-23 21:22, Wojciech Puchar wrote:
>>> While RAID-Z is already a king of bad performance,
>>
>> I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
>> measurements to back up your claim?
>
> it is clearly described even in ZFS papers. Both on reads and writes it
> gives single drive random I/O performance.

With ZFS and RAID-Z the situation is a bit more complex.

Let's assume a 5-disk raidz1 vdev with ashift=9 (512-byte sectors).

A worst-case scenario would be a random i/o workload that reads random
files of 2048 bytes each. Each file read would require data from 4 disks
(the 5th holds parity and won't be read unless there are errors).
However, if the files were 512 bytes or less, only one disk would be
used; at 1024 bytes, two disks; and so on.
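
To make the arithmetic concrete, here is a small Python sketch of the
simplified model above: one sector per data disk, filled in order,
capped at the number of data disks. The function name and parameters
are made up for illustration; nothing here is a ZFS interface, and it
ignores metadata reads and caching.

```python
import math


def data_disks_read(file_size: int,
                    sector_size: int = 512,
                    total_disks: int = 5,
                    parity_disks: int = 1) -> int:
    """Data disks touched when reading one whole file (simplified model)."""
    data_disks = total_disks - parity_disks
    # Even a tiny file occupies at least one sector.
    sectors = max(1, math.ceil(file_size / sector_size))
    # A file cannot touch more data disks than the vdev has.
    return min(sectors, data_disks)


for size in (512, 1024, 2048):
    print(size, data_disks_read(size))
# 512 1
# 1024 2
# 2048 4
```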

So ZFS is probably not the best choice to store millions of small files 
if random access to whole files is the primary concern.

But let's look at a different scenario: a PostgreSQL database. Here
table data is split and stored in 1GB files. ZFS splits each file into
128KiB records (the recordsize property). Each record is then split
again into 4 columns of 32768 bytes each. A 5th column containing parity
is generated. Each column is then stored on a different disk. You could
think of it as a regular RAID-5 with a stripe size of 32768 bytes.
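
The column size follows directly from the record size and the number of
data disks. A minimal sketch of that division, assuming the record
splits evenly across the data disks (the helper name is hypothetical,
not part of ZFS):

```python
def split_record(recordsize: int = 128 * 1024,
                 total_disks: int = 5,
                 parity_disks: int = 1) -> tuple[int, int]:
    """Return (data columns, bytes per column) for one record."""
    data_disks = total_disks - parity_disks
    if recordsize % data_disks != 0:
        # This sketch only covers the evenly divisible case.
        raise ValueError("record does not split evenly across data disks")
    return data_disks, recordsize // data_disks


columns, column_bytes = split_record()
print(columns, column_bytes)  # 4 32768
```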

PostgreSQL uses 8192-byte pages that fit evenly into both the ZFS record
size and the column size. Each page access requires only a single disk
read. Random i/o performance here should be 5 times that of a single
disk.
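
The "fits evenly" claim can be checked with a few lines of Python. This
is only a sketch of the layout arithmetic described above, with made-up
names; it does not model checksums, caching, or how ZFS actually issues
reads.

```python
RECORD = 128 * 1024    # recordsize from the example above
COLUMN = RECORD // 4   # 4 data columns per record
PAGE = 8192            # PostgreSQL page size


def columns_for_page(page_index: int) -> set[int]:
    """Data columns covered by one page, by position inside its record."""
    start = (page_index * PAGE) % RECORD
    end = start + PAGE - 1
    return set(range(start // COLUMN, end // COLUMN + 1))


pages_per_record = RECORD // PAGE
# No page straddles a column boundary, because COLUMN % PAGE == 0.
print(all(len(columns_for_page(i)) == 1 for i in range(pages_per_record)))
# True
```

With a page size that does not divide the column size (say 12288
bytes), some pages would span two columns in this model.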

For me the reliability ZFS offers is far more important than pure 
performance.