FreeBSD 5.3b7 and poor ATA performance
Chuck Swiger
cswiger at mac.com
Mon Oct 25 17:07:34 PDT 2004
Scott Long wrote:
> Charles Swiger wrote:
[ ...let's pray to the format=flowed gods... ]
>> If you prefer... ...consider using:
>> ----------------------------------------------
>> performance, reliability: RAID-1 mirroring
>> performance, cost: RAID-0 striping
>> reliability, performance: RAID-1 mirroring (+ hot spare, if possible)
>> reliability, cost: RAID-5 (+ hot spare)
>> cost, reliability: RAID-5
>> cost, performance: RAID-0 striping
>
> It's more complex than that.
Certainly. I plead guilty to both generalizing and simplifying
matters...but I'm not the first one to do so! :-) For example, I didn't
mention RAID-10 or -50, although both can do very well if you've got
enough disks. Still, I suspect the table above may be helpful to someone.
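To put rough numbers behind the table, here's a back-of-the-envelope
sketch; the disk count and capacity are invented for illustration, and
real arrays will vary with controller and configuration:

/* Textbook capacity/fault-tolerance numbers for the table above.
 * n and c are assumptions, not measurements. */
#include <stdio.h>

int main(void)
{
    int    n = 4;        /* disks in the array (assumed) */
    double c = 250.0;    /* GB per disk (assumed)        */

    printf("RAID-0: %4.0f GB usable, survives 0 failures\n", n * c);
    printf("RAID-1: %4.0f GB usable, survives 1 failure per mirror pair\n",
           n * c / 2);
    printf("RAID-5: %4.0f GB usable, survives 1 failure\n", (n - 1) * c);
    return 0;
}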
> Are you talking software RAID, PCI RAID, or external RAID?
I didn't specify. I am not sure a specific example would change the
generalization above, since other factors, like the ratio of reads to
writes, also have a significant impact on whether, say, RAID-5 or
RAID-1 is the better choice for a particular case.
However, if you can point me to general counterexamples that would
change the recommendations I made above, I would be happy to consider them.
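To illustrate the read/write-ratio point, here are the textbook
small-write costs; the 30% write fraction below is invented for the
example:

/* Physical I/Os per logical I/O, textbook values:
 * RAID-1 read: 1, write: 2 (one per mirror side)
 * RAID-5 read: 1, small write: 4 (read data, read parity,
 *              write data, write parity) */
#include <stdio.h>

int main(void)
{
    double wf = 0.3;   /* write fraction of the workload (assumed) */
    double raid1 = (1 - wf) * 1 + wf * 2;
    double raid5 = (1 - wf) * 1 + wf * 4;
    printf("physical I/Os per logical I/O: RAID-1 %.1f, RAID-5 %.1f\n",
           raid1, raid5);
    return 0;
}

The more write-heavy the workload, the faster RAID-5's small-write
penalty eats its cost advantage.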
> That affects all three quite a bit. Also, how do you define reliability?
At the physical component layer, reliability is defined by MTBF numbers
for the various failure modes: spindle bearing wear, number of start-stop
cycles, etc. SMART provides some helpful parameters for disks, and there
are the I2C or SMBus mechanisms for doing hardware-level checking of the
controller cards or the motherboard.
At the logical level, considering a RAID system as a whole, reliability
equates to "availability", which can be measured by how long (or whether) the
data on the RAID volume is _correctly_ available to the system.
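In those terms, the usual definition is availability = MTBF / (MTBF +
MTTR). A quick sketch with invented numbers (not vendor data):

/* Availability = MTBF / (MTBF + MTTR), the standard definition.
 * Both figures below are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    double mtbf = 500000.0;  /* hours, assumed drive MTBF          */
    double mttr = 24.0;      /* hours to detect + repair (assumed) */
    printf("single-component availability: %.5f\n",
           mtbf / (mtbf + mttr));
    return 0;
}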
> Do you verify reads on RAID-1 and 5?
That's answered by how you weigh the performance vs. reliability
tradeoff: verifying reads costs throughput but catches corruption sooner.
> Also, what about error recovery?
Are you talking about issues like, "what are your chances of losing data if
two drives fail"?
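If so, a rough way to frame it for RAID-5 is the chance that a second
drive dies inside the rebuild window. A sketch, assuming independent
exponential failures (optimistic for drives from the same batch) and
made-up MTBF and rebuild figures:

/* Chance a second disk fails while a degraded RAID-5 set rebuilds.
 * All inputs are assumptions for illustration.  Link with -lm. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    int    n       = 4;         /* disks in the array (assumed) */
    double mtbf    = 500000.0;  /* hours per disk (assumed)     */
    double rebuild = 12.0;      /* hours to rebuild (assumed)   */
    /* each survivor fails at rate 1/mtbf over the rebuild window */
    double p = 1.0 - exp(-(n - 1) * rebuild / mtbf);
    printf("P(second failure during rebuild) ~ %.6f\n", p);
    return 0;
}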
>> That rule dates back to the early days of SCSI-2, when you could fit
>> about four drives' worth of aggregate throughput onto a 40MB/s
>> ultra-wide bus. The idea behind it is still sound, although the
>> number of drives you can fit obviously changes depending on whether
>> you're talking about ATA-100 or SATA-150.
>
> The formula here is simple:
>
> ATA: 2
> SATA: 1
>
> So the channel transport starts becoming irrelevant now (except when you
> talk about SAS and having bonded channels going to switches). The
> limiting factor again becomes PCI.
I absolutely agree that your consumer-grade 32-bit, 33MHz PCI is a significant
limiting factor and will probably act as a bottleneck even for a
four-disk RAID config.
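The arithmetic is straightforward; the per-disk throughput figure is an
assumption for illustration:

/* Theoretical 32-bit/33MHz PCI bandwidth vs. aggregate disk
 * throughput.  The 50 MB/s per-disk figure is assumed. */
#include <stdio.h>

int main(void)
{
    double pci_mb_s  = 32.0 / 8.0 * 33.0;  /* ~133 MB/s on paper */
    double disk_mb_s = 50.0;               /* per disk (assumed) */
    int    disks     = 4;
    printf("PCI ceiling: %.0f MB/s; %d disks want ~%.0f MB/s\n",
           pci_mb_s, disks, disks * disk_mb_s);
    return 0;
}

And that ceiling is theoretical; shared with other devices, real PCI
throughput comes in well under it.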
> An easy example is the software RAID cards that are based on the Marvell 8
> channel SATA chip. It can drive all 8 drives at max platter speed if you
> have enough PCI bandwidth (and I've tested this recently with FreeBSD 5.3,
> getting >200 MB/s across 4 drives). However, you're talking about
> PCI-X-100 bandwidth at that point, which is not what most people have in
> their desktop systems.
True, although that will gradually change over the next year or two as PCI-X
systems like the AMD Opteron and the G5 Macs get adopted.
Besides, given the quality trends of consumer-grade hard drives, more and more
people are using RAID to save themselves from a dead 16-month-old drive
(brought to you courtesy of vendors H, I, or Q).
> And for reasons of reliability, I wouldn't consider software RAID to
> be something that you would base your server-class storage on other than
> to mirror the boot drive so a failure there doesn't immediately bring
> you down.
If you cannot trust your OS to handle your data properly via software RAID,
why should you trust it to pass valid data on to a hardware RAID
controller?
For example, it seems to me that a failure mode such as a bad memory chip
would result in incorrect data going to the disks regardless of whether you
were using software or hardware RAID.
Ditto for an application-level bug which generates the wrong results. [1]
[ ... ]
> What is interesting is measuring how many single-sector transfers can be
> done per second and how much CPU that consumes. I used to be able
> to get about 11,000 io/s on an aac card on a 5.2-CURRENT system from
> last winter. Now I can only get about 7,000. I'm not sure where the
> problem is yet, unfortunately. I'm using KSE pthreads to generate a
> lot of parallel requests with as little overhead as possible, so maybe
> something there has changed, or maybe something in the I/O path above
> the driver has changed, or maybe something in interrupt handling or
> scheduling has changed. It would be interesting to figure this out
> since this definitely shows a problem.
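For what it's worth, the kind of harness you describe might be sketched
like this; the device path, thread count, and measurement window are all
assumptions on my part, not your actual test code:

/* Minimal sketch: several pthreads issuing single-sector reads at
 * random offsets and counting completions.  Compile with -pthread. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS 8
#define SECTORS  (1024 * 1024)   /* region to seek over (assumed) */

static volatile int done;
static long counts[NTHREADS];

static void *worker(void *arg)
{
    long id = (long)arg;
    char buf[512];
    int fd = open("/dev/ad0", O_RDONLY);   /* hypothetical device */
    if (fd < 0) { perror("open"); return NULL; }
    while (!done) {
        off_t off = (off_t)(random() % SECTORS) * 512;
        if (pread(fd, buf, sizeof(buf), off) == sizeof(buf))
            counts[id]++;
    }
    close(fd);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    long i, total = 0;

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    sleep(10);                   /* 10-second measurement window */
    done = 1;
    for (i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        total += counts[i];
    }
    printf("%ld io/s\n", total / 10);
    return 0;
}

Results from something like this will swing with queue depth and the
driver's interrupt handling, which sounds like exactly what you want to
isolate.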
Thanks for your thoughts.
--
-Chuck
[1]: This is why RAID is still not a substitute for good backups...