Very low disk performance on 5.x

Allen bsdlists at rfnj.org
Mon May 2 07:33:21 PDT 2005


At 10:14 5/2/2005, Arne Wörner wrote:
>--- Allen <bsdlists at rfnj.org> wrote:
> > Also you should keep in mind, there could simply be some really
> > goofy controller option enabled, that forces the RAID5 to behave
> > in a "degraded" state for reads -- forcing it to read up all the
> > other disks in the stripe and calculate the XOR again, to make
> > sure the data it read off the disk matches the checksum.  It's
> > rare, but I've seen it before, and it will cause exactly this
> > sort of RAID5 performance inversion.  Since the XOR is
> > recalculated on every write and requires only reading up one
> > sector on a different disk, options that do the above will
> > result in read scores drastically lower than writes to the same
> > array.
> >
>Isn't that compensated by the cache? I mean:
>We would just
>1. read all the blocks that correspond to the requested block,
>2. put them all into the cache,
>3. check the parity bits (XOR should be very fast, especially in
>comparison to the disc read times),
>4. keep them in the cache (some kind of read ahead...),
>5. send the requested block to the driver

Your steps are appropriate, but note that #3 does not hold on cards that 
support RAID5 but lack a hardware XOR engine.  Some of the very cheap 
i960-based cards have this failing, so the XOR itself is slow on top of 
everything else.

However, the cache doesn't play a part in this at all.  Consider the 
difference between these two read cycles; assume a read of just one 
block out of the 4 data blocks in a stripe, on a 5-disk system.

Scenario A, verified read disabled:
1. RAID card reads up one block from appropriate drive.  Done.

Scenario B, verified read enabled:
1. RAID card reads up ALL blocks in the stripe (5 reads).
2. RAID card pretends the drive holding the requested block is 
"degraded", and recalculates the block from the other 3 data blocks 
plus the parity block.
3. RAID card compares the result to what it actually read, then 
reports the value back or tosses some kind of error on a mismatch.
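Scenario B can be sketched in a few lines (again illustrative -- the stripe layout, block size, and `verified_read` helper are invented for this example, not any card's interface):

```python
# Sketch of a "verified read": reconstruct the requested block from
# the rest of the stripe plus parity, and compare it to the block
# actually read off the disk (assumption: 4 data blocks + 1 parity).

from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-sized blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

stripe = [bytes([d] * 4) for d in (0x10, 0x20, 0x30, 0x40)]  # data disks
parity = xor_blocks(stripe)                                  # parity disk

def verified_read(index):
    read_value = stripe[index]                   # Scenario A stops here: 1 read
    others = [b for i, b in enumerate(stripe) if i != index]
    rebuilt = xor_blocks(others + [parity])      # 4 more reads + XOR
    if rebuilt != read_value:
        raise IOError("stripe verification failed")
    return read_value

assert verified_read(2) == stripe[2]
```

The extra four reads and the reconstruction are exactly why this mode makes reads slower than writes: every single-block read costs as much as a degraded-array rebuild of that block.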

You can see that the cache just doesn't play a part in what I was 
describing, which is basically the array performing as though it is 
degraded when in fact it is not, in order to catch failures that would 
otherwise be missed.

More information about the freebsd-performance mailing list