vinum and hot-swapping

Greg 'groggy' Lehey grog at FreeBSD.org
Sun Jul 13 17:33:05 PDT 2003


On Sunday, 13 July 2003 at 14:14:53 -0500, Dan Nelson wrote:
> In the last episode (Jul 13), Andrea Venturoli said:
>> ** Reply to note from "Greg 'groggy' Lehey" <grog at freebsd.org> Sat, 12 Jul 2003 17:13:29 +0930
>>> The real performance penalty for RAID-5 is simply that writes require
>>> so much I/O.  Expect 25% of the write performance of RAID-0.
>>
>> Ok, I must ask this: Shouldn't a SCSI system allow parallel writes on
>> different disks? If so, why so much penalty?
>
> Parallel I/Os are already being used.  A short write on a RAID-5 array
> requires you to
>
> 1) Read the original block and the parity block (done in parallel)
>
> 2) XOR the parity block with the original block and the new block

(which takes no time at all).

> 3) Write the new block and the parity block (done in parallel)
>
> Which means that you're doing 4 times the I/O that a plain RAID-5 read
> would do.
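
The three steps above can be sketched as follows. This is a toy in-memory model, not vinum's code: the names `xor_blocks` and `small_write` and the list-of-blocks "disks" are assumptions made for illustration only.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(disks, data_idx, parity_idx, new_block):
    """Read-modify-write of one block; returns the number of disk I/Os."""
    # 1) Read the original block and the parity block (2 reads).
    old_block = disks[data_idx]
    old_parity = disks[parity_idx]
    # 2) New parity = old parity XOR old data XOR new data.
    new_parity = xor_blocks(xor_blocks(old_parity, old_block), new_block)
    # 3) Write the new block and the new parity block (2 writes).
    disks[data_idx] = new_block
    disks[parity_idx] = new_parity
    return 4


# Hypothetical stripe: three data blocks plus one parity block.
data = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
parity = xor_blocks(xor_blocks(data[0], data[1]), data[2])
disks = data + [parity]
ios = small_write(disks, 1, 3, bytes([0xAA, 0xBB, 0xCC, 0xDD]))
```

After the call, the parity block still equals the XOR of the three data blocks, and the update cost four I/Os for one block of user data.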

I think the confusion is that people think that, because the I/O
transfers in each of (1) and (3) are in parallel, this is only twice
the time, not four times.  That's true from a latency point of view,
but not from a throughput point of view: the disks still have to
perform four transfers for each write.
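
The latency versus throughput distinction can be put into numbers with a rough model. All figures here (10 ms per I/O, 4 disks, 100 I/Os per second per disk) are assumed for illustration, and the model ignores seek overlap, caching and queueing.

```python
def write_latency(io_time, phases):
    """Time for one write when each phase's transfers run in parallel."""
    return io_time * phases


def writes_per_second(disks, ios_per_disk_per_sec, ios_per_write):
    """Aggregate write rate when every write consumes ios_per_write I/Os."""
    return disks * ios_per_disk_per_sec / ios_per_write


IO_TIME = 0.010  # seconds per I/O (assumed)
DISKS = 4        # spindles in the array (assumed)
IOPS = 100       # I/Os per second each disk can sustain (assumed)

# RAID-0 small write: one phase, one I/O.
raid0_latency = write_latency(IO_TIME, 1)
raid0_rate = writes_per_second(DISKS, IOPS, 1)

# RAID-5 small write: two phases (read pair, write pair), four I/Os.
raid5_latency = write_latency(IO_TIME, 2)
raid5_rate = writes_per_second(DISKS, IOPS, 4)
```

Under these assumptions a single RAID-5 write takes twice as long as a RAID-0 write, but the array as a whole completes only a quarter as many writes per second, which is where the 25% figure comes from.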

> There's no getting around this problem for small random writes.
> Repeated writes to the same locations only cost two writes, since
> the original and parity blocks are probably still in cache.

Vinum currently doesn't cache the blocks.  That's an issue I'm
thinking about.

> There is a threshold point where this stops being an issue, however.
> When your write size becomes larger than the raid-5 stripe width
> (stripe size * number of data disks), you can simply calculate the
> parity block directly and not have to read anything.  At this point,
> raid-5 magically becomes as efficient as raid-0 :)
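
A minimal sketch of the full-stripe case, using the same terms as above (stripe width = stripe size * number of data disks). The function names and the in-memory layout are hypothetical, and this does not reflect how vinum handles such writes.

```python
from functools import reduce


def stripe_parity(blocks):
    """XOR all data blocks of one stripe together, byte by byte."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def full_stripe_write(payload, stripe_size, data_disks):
    """Split a stripe-width payload into blocks and compute parity.

    Returns (data blocks, parity block, number of writes).  No reads
    are needed, because the parity depends only on the new data.
    """
    stripe_width = stripe_size * data_disks
    if len(payload) != stripe_width:
        raise ValueError("payload must be exactly one stripe width")
    blocks = [payload[i * stripe_size:(i + 1) * stripe_size]
              for i in range(data_disks)]
    return blocks, stripe_parity(blocks), data_disks + 1


# Hypothetical tiny stripe: 4-byte stripe size, 3 data disks.
blocks, parity, writes = full_stripe_write(bytes(range(12)), 4, 3)
```

Here 3 blocks of user data cost 4 writes and no reads, compared with 4 I/Os per single block in the read-modify-write case.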
>
> I don't believe vinum can optimize full-stripe writes, though, since
> FreeBSD can only do I/O in 64k max chunks,

128 kB.

> and since vinum is software instead of battery-backed hardware RAID,
> it cannot hold off on multiple writes until the stripe fills up.

It could, but it would be dangerous.  I've been thinking of offering
it as an option (how often do systems really go down?).

Greg
--
When replying to this message, please copy the original recipients.
If you don't, I may ignore the reply or reply to the original recipients.
For more information, see http://www.lemis.com/questions.html
See complete headers for address and phone numbers