Very low disk performance on 5.x

Allen bsdlists at rfnj.org
Mon May 2 07:33:21 PDT 2005


At 10:14 5/2/2005, Arne Wörner wrote:
>--- Allen <bsdlists at rfnj.org> wrote:
> > Also you should keep in mind, there could simply be some really
> > goofy controller option enabled, that forces the RAID5 to behave
> > in a "degraded" state for reads -- forcing it to read up all the
> > other disks in the stripe and calculate the XOR again, to make
> > sure the data it read off the disk matches the checksum.  It's
> > rare, but I've seen it before, and it will cause exactly this
> > sort of RAID5 performance inversion.  Since the XOR is
> > recalculated on every write and requires only reading up one
> > sector on a different disk, options that do the above will
> > result in read scores drastically lower than writes to the same
> > array.
> >
>Isn't that compensated by the cache? I mean:
>We would just
>1. read all the blocks that correspond to the requested block,
>2. put them all into the cache,
>3. check the parity bits (XOR should be very fast, especially in
>comparison to the disc read times),
>4. keep them in the cache (some kind of read ahead...),
>5. send the requested block to the driver

Your steps are appropriate, but note that #3 does not hold on cards that 
support RAID5 but lack a hardware XOR engine.  Some of the very cheap 
i960-based cards have this failing, so the XOR itself is slow on top of 
everything else.

However, the cache doesn't play a part in this at all.  Consider the 
difference between these two read cycles; assume a read of just one 
block out of the 4 data blocks in a stripe, on a 5-disk system.

Scenario A, verified read disabled:
1. RAID card reads up one block from appropriate drive.  Done.

Scenario B, verified read enabled:
1. RAID card reads up ALL blocks in the stripe (5 reads).
2. RAID card pretends the drive holding the requested block is 
"degraded", and recalculates the block from the other 3 data blocks 
plus the parity block.
3. RAID card compares the result to what it actually read, then 
reports the value back or tosses some kind of error on a mismatch.
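Scenario B can be sketched in a few lines (again illustrative -- the stripe layout, block size, and `verified_read` helper are invented for this example, not any card's interface):

```python
# Sketch of a "verified read": reconstruct the requested block from
# the rest of the stripe plus parity, and compare it to the block
# actually read off the disk (assumption: 4 data blocks + 1 parity).

from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-sized blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

stripe = [bytes([d] * 4) for d in (0x10, 0x20, 0x30, 0x40)]  # data disks
parity = xor_blocks(stripe)                                  # parity disk

def verified_read(index):
    read_value = stripe[index]                   # Scenario A stops here: 1 read
    others = [b for i, b in enumerate(stripe) if i != index]
    rebuilt = xor_blocks(others + [parity])      # 4 more reads + XOR
    if rebuilt != read_value:
        raise IOError("stripe verification failed")
    return read_value

assert verified_read(2) == stripe[2]
```

The extra four reads and the reconstruction are exactly why this mode makes reads slower than writes: every single-block read costs as much as a degraded-array rebuild of that block.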

You can see that the cache just doesn't play a part in what I was 
describing, which is basically the array performing as though it is 
degraded when in fact it is not, in order to catch failures that would 
otherwise be missed.

More information about the freebsd-performance mailing list