read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID

Sun Jun 28 11:02:04 UTC 2009

"Now we come to the crucial decision ZFS has made for raidz and
raidz2: in raidz and raidz2, the data block is striped across all of
the disks. Instead of a model where a parity stripe is a bunch of data
blocks, each with an independent checksum, ZFS stripes a single data
block (and its parity), with a single checksum, across all the disks
(or as many of them as necessary).

This is a rational implementation decision, but when combined with the
need to verify checksums, it has an important consequence: in ZFS,
reads always involve all disks, because ZFS always must verify the
data block's checksum, which requires reading all of the data block,
which is spread across all of the drives. This is unlike normal RAID-5
or RAID-6, in which a small enough read will only touch one drive, and
means that adding more disks to a ZFS raidz pool does not increase how
many random reads you can do per second.

(A normal RAID-5 or RAID-6 array has a (theoretical) random read IO
capacity equal to the sum of the random IO operations rate of each of
the disks in the array, and so adding another disk adds its IOPs per
second to your read capacity. A ZFS raidz or raidz2 pool instead has a
capacity equal to the slowest disk's IOPs per second, and adding
another disk does nothing to help. Effectively a raidz ZFS gives you a
single disk's read IOPs per second rate.)"

This was on the blog of a Sun engineer, in a post from a few years
ago. Unfortunately I don't have the link; I actually had to go
through my posting history on the Ars Technica forum to even find the
quote in the first place. If the situation has changed and the quote
no longer holds true, it would be nice if someone more knowledgeable
about the performance implications could elaborate on what kind of
performance is to be expected on a raidz system :)
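
To make the quoted model concrete, here is a rough sketch in Python of
the arithmetic it implies. This only covers the small random read case
the quote describes, and the per-disk IOPS figures and function names
are made up for illustration; they are not measurements from any real
pool.

```python
# Theoretical random read capacity under the model in the quote above.
# All numbers are hypothetical per-disk random read IOPS.

def raid5_random_read_iops(disk_iops):
    """Conventional RAID-5/6: a small read touches one disk, so the
    theoretical capacity is the sum over all disks."""
    return sum(disk_iops)

def raidz_random_read_iops(disk_iops):
    """raidz/raidz2 per the quote: each block is striped across the
    disks, so every read waits on the slowest disk involved."""
    return min(disk_iops)

disks = [80, 80, 75, 80]  # assumed values, for illustration only

print("RAID-5 model:", raid5_random_read_iops(disks))  # 315
print("raidz model: ", raidz_random_read_iops(disks))  # 75

# Adding a fifth disk raises the RAID-5 figure but not the raidz one.
more_disks = disks + [80]
print("RAID-5 model, 5 disks:", raid5_random_read_iops(more_disks))  # 395
print("raidz model, 5 disks: ", raidz_random_read_iops(more_disks))  # 75
```

If the quote is still accurate, the second function is the one that
should describe random read behaviour on a raidz pool.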

- Sincerely,
Dan Naumov

On Sun, Jun 28, 2009 at 1:36 PM, Andrew Snow <andrew at modulus.org> wrote:
>> What's confusing is that your results are actually out of place with
>> how ZFS numbers are supposed to look, not mine :) When using ZFS
>> RAIDZ, due to the way parity checking works in ZFS, your pool is
>> SUPPOSED to have throughput of the average single disk from that pool
>> and not some numbers growing skyhigh in a linear fashion.
>
> Could you please elaborate on this and explain it?
>
> - Andrew
>