Raid 1+0
Michael Powell
nightrecon at hotmail.com
Tue Apr 19 19:51:30 UTC 2016
Steve O'Hara-Smith wrote:
> On Mon, 18 Apr 2016 17:05:22 -0500 (CDT)
> "Valeri Galtsev" <galtsev at kicp.uchicago.edu> wrote:
>
>> Not correct. First of all, in most of the cases, the failures of the
>> drives are independent events
>
> If only that were so. When the drives are as near identical as
> manufacturing can make them and have had very similar histories they can
> be expected to have very similar wear and be similarly close to failure at
> all times, which makes it likely that the load imposed by one failing will
> push another over.
>
And the more of them you place in the same physical enclosure, the more the
vibration patterns, and any platter skew away from perfectly horizontal or
perfectly vertical mounting, combine into complex interference patterns. The
vibrational characteristics of the enclosure matter. In airframe
superstructure testing, vibration sensors (think seismology) are scattered
throughout the structure, and something resembling a gun or an air hammer is
used to bang on a point in order to map how the resulting vibration flows
through the airframe. (Not my field of endeavor; something I learned from my
dad.)
I'm certainly not qualified to debate probability theory. My experience is
anecdotal at best, but many sysadmins have witnessed various forms of drive
failure in RAID arrays. Most have noticed over the years that failures seem
to cluster when all of the drives come from the same manufacturing batch and
lot number. After enough of these, a sysadmin learns to shuffle drives so an
array is not built entirely from a single shipment, and, when one drive does
fail, to swap it out ASAP before another goes and takes the whole array with
it.
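To put rough numbers on the independence question, here is a back-of-envelope
sketch, not a rigorous model; every probability in it is an assumption picked
for illustration, not a measured failure rate:

# Back-of-envelope comparison of independent vs. correlated drive failure
# for the two drives of a mirror pair. All numbers are illustrative
# assumptions, not vendor or field data.

p_fail = 0.03           # assumed chance a given drive fails this year

# Independence assumption: P(both fail) = p * p
p_both_independent = p_fail * p_fail

# Same-batch correlation: assume the surviving twin is far more likely to
# fail once the first one has (conditional probability bumped to 0.30).
p_second_given_first = 0.30
p_both_correlated = p_fail * p_second_given_first

print(f"independent: {p_both_independent:.4%}")   # 0.0900%
print(f"correlated:  {p_both_correlated:.4%}")    # 0.9000%

With those made-up numbers the correlated case is ten times worse, which is
the intuition behind mixing batches.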
Another pattern is simple age. I've seen drives that had run for so many
years that everyone assumed they were fine. Power them down and, poof, just
like that they don't come back. I've had arrays where one drive failed and,
after a power cycle, some of the others would not come back up. The answer to
this is a hot spare plus hot swap.
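A rough way to see why the hot spare helps, again only a sketch with assumed
figures (the MTBF, window lengths, and exponential-lifetime model are all
assumptions for illustration): the risk is roughly the chance that another
drive dies inside the window between the first failure and the end of the
rebuild, and a hot spare shrinks that window.

# Sketch of the exposure-window argument for hot spares. Assumes drive
# lifetimes are exponentially distributed with an assumed MTBF; the MTBF
# and window lengths are guesses for illustration only.
import math

mtbf_hours = 300_000.0          # assumed mean time between failures
rate = 1.0 / mtbf_hours         # per-hour failure rate for one drive

def p_any_fails(n_drives, window_hours):
    """Chance at least one of n surviving drives fails inside the window."""
    return 1.0 - math.exp(-n_drives * rate * window_hours)

# Waiting for a cold swap: say 72 hours before the rebuild even starts,
# plus a 10 hour rebuild. With a hot spare the rebuild starts at once.
print(p_any_fails(3, 72 + 10))   # no hot spare
print(p_any_fails(3, 10))        # hot spare, rebuild starts immediately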
Anecdotal experience is no substitute for rigorous scientific proof. Most
sysadmins are not concerned with such things, but rather with keeping servers
running and data flowing, almost to the point of superstition. Whatever
works - use it.
-Mike