vinum performance

Greg 'groggy' Lehey grog at FreeBSD.org
Mon Mar 31 16:49:58 PST 2003


[Format recovered--see http://www.lemis.com/email/email-format.html]

Long/short syndrome, gratuitous empty lines.

On Monday, 31 March 2003 at 16:41:21 -0500, Jason Andresen wrote:
> Jens Rehsack wrote:
>> Jason Andresen wrote:
>>> Mattias Pantzare wrote:
>>>> Lukas Ertl wrote:
>>>>> Ok. But I still don't understand why RAID 5 write performance is
>>>>> _so_ bad.  The CPU is not the bottleneck, it's rather
>>>>> bored. And I don't understand why RAID 0 doesn't give a big
>>>>> boost at all. Is the ahc driver known to be slow?
>>>>
>>>> To do a RAID 5 write you do this:
>>>> 1. Read the old data on the blocks that you will write to.
>>>> 2. Read the corresponding parity data.
>>>> 3. Write the new data.
>>>> 4. Write the new parity.
>>>
>>> Hmm, how about the case where you're writing new data?  You
>>> shouldn't have to do steps 1 & 2, and yet the RAID5 write
>>> performance is still abysmal.

It's possible to optimize away steps 1 and 2 some of the time, but you
can never do it all the time.
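
As a rough illustration of why the two reads are needed, here is a
minimal sketch of the usual RAID-5 XOR parity arithmetic.  It is not
vinum's code; the function names are made up for this example, and the
blocks are plain byte strings rather than disk sectors.

```python
def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Parity after a partial-stripe write.

    Needs the old data and the old parity (steps 1 and 2), because
    new_parity = old_parity XOR old_data XOR new_data.
    """
    if not (len(old_data) == len(old_parity) == len(new_data)):
        raise ValueError("all blocks must have the same length")
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


def full_stripe_parity(blocks: list[bytes]) -> bytes:
    """Parity when every data block of the stripe is being written.

    No reads are required: the parity is the XOR of the new blocks.
    """
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)
```

The second function is the case where steps 1 and 2 disappear: it only
applies when the write covers the whole stripe.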

>> Remember for that case that a block covered by the raid-system may
>> be larger than 512 bytes. I use 32K for my fileserver, so to skip
>> reading old data I had to write 32K blocks at once.

32 kB is far too small.  I recommend about 512 kB.

>> Of course, the system software (either vinum or the controller
>> software) caches a little bit, so if you write enough small data
>> you may get a 32K block (or whatever you use), full.

Well, you need a whole stripe over all subdisks.  On a 7 drive setup,
this would be 3 MB.  FreeBSD can't transfer that much at once.
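
The 3 MB figure is just the stripe size times the number of data
subdisks.  A small sketch of the arithmetic, assuming one subdisk's
worth of each stripe holds parity and taking the 128 kB maximum
transfer size as given (the function names are invented for
illustration):

```python
def full_stripe_kb(drives: int, stripe_kb: int) -> int:
    """Data in one full RAID-5 stripe: one subdisk per stripe holds parity."""
    return (drives - 1) * stripe_kb


def fits_in_one_transfer(drives: int, stripe_kb: int, max_transfer_kb: int = 128) -> bool:
    """True if a full stripe can be written with a single transfer."""
    return full_stripe_kb(drives, stripe_kb) <= max_transfer_kb


width = full_stripe_kb(7, 512)  # 6 data subdisks * 512 kB
print(width, "kB =", width // 1024, "MB")
print("fits in one transfer:", fits_in_one_transfer(7, 512))
```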

The alternative is to reduce the stripe size so that the band is no
larger than the maximum transfer size (currently 128 kB).  With a 7
disk RAID-5 array, this would mean about 24 kB.  If you do
predominantly sequential transfers, for example streaming video or
backups, you could then get away without the first two steps for a
large proportion of the transfers.  The problem is that such small
stripes also cause request fragmentation: a 16 kB transfer (current
UFS block size) will go over two different subdisks 60% of the time.
This will require 6 transfers, not 4.
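
To make the "6 transfers, not 4" concrete, here is a sketch that counts
how many subdisks a request touches and how many transfers a
read-modify-write then needs.  It assumes all the pieces share a single
parity block, and the helper names are hypothetical:

```python
def subdisks_touched(offset_kb: int, size_kb: int, stripe_kb: int) -> int:
    """Number of stripe units (and so data subdisks) a request spans."""
    first = offset_kb // stripe_kb
    last = (offset_kb + size_kb - 1) // stripe_kb
    return last - first + 1


def rmw_transfers(data_subdisks: int) -> int:
    """Transfers for a read-modify-write: read and write each data piece,
    plus one read and one write of the parity block (assumed shared)."""
    return 2 * data_subdisks + 2


# A 16 kB request on 24 kB stripes, at two different starting offsets.
for offset in (0, 16):
    n = subdisks_touched(offset, 16, 24)
    print(f"offset {offset} kB: {n} subdisk(s), {rmw_transfers(n)} transfers")
```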

>>> I get 4565 K/sec on modern ATA/133 HDDs.
>>>
>>> Reading is much better at 91908 K/sec at least.
>
> Well, I'm writing 200MB files most of the time, so the stripe size is
> not an issue.  I'm just wondering why the reads are *20* times faster
> than the writes.

They're not.  I don't know where you get that read figure from, but no
disk can transfer that fast, and you don't appear to be doing multiple
transfers in parallel.

> I think the read performance was CPU limited in this case.

I think you were reading from cache.

> Some of this is probably an oddity with bonnie (Bonnie always
> reports my writes to be about half as fast as the reads), but dd
> thinks otherwise:
>
> (Both of these were on previously untouched files to prevent any
> caching, and the "write" test is on a new file, not rewriting an old one)
> Write speed:
> 81920000 bytes transferred in 3.761307 secs (21779663 bytes/sec)
> Read speed:
> 81920000 bytes transferred in 3.488978 secs (23479655 bytes/sec)
>
> But on the RAID5:
> Write speed:
> 81920000 bytes transferred in 17.651300 secs (4641018 bytes/sec)
> Read speed:
> 81920000 bytes transferred in 4.304083 secs (19033090 bytes/sec)

Yes, this is very close to the 4:1 ratio that I would expect.  If
you're looking for write performance, don't use RAID-5.
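
For reference, the ratios can be checked directly from the dd figures
quoted above.  This is only arithmetic on those numbers; the function
name is made up:

```python
def read_write_ratio(read_bps: int, write_bps: int) -> float:
    """How many times faster reads are than writes."""
    return read_bps / write_bps


# bytes/sec as reported by dd in the quoted message
plain_disk = read_write_ratio(23479655, 21779663)
raid5 = read_write_ratio(19033090, 4641018)

print(f"plain disk: {plain_disk:.2f}:1")
print(f"RAID-5:     {raid5:.2f}:1")
```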

Greg
--
When replying to this message, please take care not to mutilate the
original text.  
For more information, see http://www.lemis.com/email.html
See complete headers for address and phone numbers