to gmirror or to ZFS

Sat Jul 20 20:26:41 UTC 2013

On Sat, 20 Jul 2013, Steve O'Hara-Smith wrote:

> On Sat, 20 Jul 2013 18:14:20 +0100
> Frank Leonhardt <frank2 at fjl.co.uk> wrote:
>
>> It's worth noting, as a warning for anyone who hasn't been there, that
>> the number of times a second drive in a RAID system fails during a
>> rebuild is higher than would be expected. During a rebuild the remaining
>> drives get thrashed, hot, and if they're on the edge, that's when
>> they're going to go. And at the most inconvenient time. Okay - obvious
>> when you think about it, but this tends to be too late.
>
> 	Having the cabinet stuffed full of nominally identical drives
> bought at the same time from the same supplier tends to add to the
> probability that more than one drive is on the edge when one goes. It's a
> pity there are now only two manufacturers of spinning rust.

This is often presumed to be the reason for double failures close in
time. Common-mode failures such as environment, a defective power
supply or excess voltage can also be blamed. I have to think that the
most common "cause" of a second failure soon after the first is that a
failed drive often isn't detected until a particular sector is read or
written. Since resilvering reads and writes every sector on multiple
disks, including unused sectors, it can "detect" latent problems:
sectors that have been bad since the drive was new but haven't been
used for data yet, or that have gone bad since the last write but
haven't been read since.

The ZFS scrub processes only sectors with data, so it provides only 
partial protection against double failures.
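
To put rough numbers on that, here is a back-of-envelope sketch in
Python. Everything in it is an assumption for illustration, not a
measurement: the drive size, the per-sector chance of a latent defect
(P_BAD), the fraction of the disk holding data (USED_FRACTION), and the
idea that sectors go bad independently of each other.

```python
import math

def p_latent_hit(sectors_read, p_bad):
    """Chance that reading `sectors_read` sectors touches at least one
    latent bad sector, assuming each sector is bad independently with
    probability `p_bad` (an illustrative model, not a drive spec)."""
    # 1 - (1 - p_bad)**sectors_read, computed stably for tiny p_bad
    return -math.expm1(sectors_read * math.log1p(-p_bad))

# All three values below are hypothetical, chosen only for illustration.
TOTAL_SECTORS = 2 * 10**12 // 512   # a 2 TB drive with 512-byte sectors
P_BAD = 1e-10                       # per-sector chance of a latent defect
USED_FRACTION = 0.3                 # share of the disk that holds data

# A full rebuild reads every sector on the surviving drive.
full_rebuild = p_latent_hit(TOTAL_SECTORS, P_BAD)

# A scrub that checks only sectors with data never visits the rest.
unscrubbed = int(TOTAL_SECTORS * (1 - USED_FRACTION))
missed_by_scrub = p_latent_hit(unscrubbed, P_BAD)

print(f"rebuild hits a latent bad sector: {full_rebuild:.3f}")
print(f"bad sector hiding in unscrubbed space: {missed_by_scrub:.3f}")
```

The second figure is the part of the risk that a data-only scrub cannot
remove; it shrinks as the disk fills up.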

Daniel Feenberg
NBER

>
> -- 
> Steve O'Hara-Smith <steve at sohara.org>