A little story of failed raid5 (3ware 8000 series)
Manjunath R Gowda
mgowda82 at gmail.com
Sat Aug 25 21:29:13 PDT 2007
On 8/25/07, Tom Samplonius <tom at samplonius.org> wrote:
> ----- "Artem Kuchin" <matrix at itlegion.ru> wrote:
> > But I don't understand how and why it happened. Only 6 hours ago (the
> > night before) all those files were backed up fine w/o any read error.
> > And now, right after replacing the drive and starting the rebuild, it
> > said that there are bad sectors all over those files. How come?
> What happened to you is an extremely common occurrence. You had a disk
> develop a media failure some time ago, but the controller never detected it,
> because that particular bad area was not read. Your backups worked because
> they never touched this portion of the disk (e.g. empty space, metadata,
> etc). And then another drive developed an electronics failure, which was
> instantly detected, putting the array into degraded mode. When you did a
> rebuild onto a replacement drive, the controller discovered that there was a
> second failed disk, and this is unrecoverable.
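To make the quoted explanation concrete: RAID5 keeps one XOR parity block per stripe, so a single missing block can be recomputed from the others, but a second unreadable block in the same stripe leaves nothing to recompute from. Below is a minimal Python sketch of that idea. It is not how the 3ware firmware is implemented; the names `xor_blocks` and `rebuild` and the 4-byte blocks are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(stripe, missing):
    """Recompute the block at index `missing` from the other blocks in the stripe.

    An unreadable block is represented as None (an assumption of this sketch).
    Raises ValueError if any surviving block is also unreadable.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != missing]
    if any(blk is None for blk in survivors):
        raise ValueError("second unreadable block in stripe: cannot rebuild")
    return xor_blocks(survivors)

# Three data blocks plus one parity block, as on a 4-disk RAID5 stripe.
data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]

# One failed disk: the lost block is recoverable from the rest.
degraded = list(stripe)
degraded[1] = None
recovered = rebuild(degraded, 1)

# A latent bad sector on a second disk in the same stripe: not recoverable.
degraded[2] = None
try:
    rebuild(degraded, 1)
except ValueError as err:
    print("rebuild failed:", err)
```

The second case is what the rebuild ran into: the latent bad sectors only mattered once every remaining block had to be read.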
> RAID, of any level, isn't magic. It is important to understand how it
> works, and realize that drives can fail silently. BTW, if you were using
> RAID1 or RAID10, you would likely have had the same problem (well, RAID10
> can survive _some_ double-disk failures). RAID6 is the only RAID level that
> can survive failure of any two disks.
> The real solution is RAID scrubbing: a low level background process that
> reads every sector of every disk. All of the real RAID systems do this
> (usually scheduled weekly, or every other week). Most 3ware RAID cards don't
> have this feature.
> So rather than not using RAID5 or RAID6 again, you should just not use
> 3ware anymore.
It's a common mistake to assume that regular maintenance is not required,
an assumption implied by "It was working for years".
You're just waiting for some event that could result in a
catastrophic failure like this. It's always better to act than to react
when dealing with critical data.
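As one way of acting first, the scrubbing idea from the quoted text can be approximated from the host: read every block and note where reads fail, so a latent bad area shows up before a rebuild depends on it. The following is a rough Python sketch under stated assumptions, not a replacement for a controller-level scrub (which also checks parity). The function name `scrub` and the 64 KiB block size are my own choices, and it assumes the size of `path` can be obtained with `lseek`, which holds for regular files but may differ by OS for device nodes.

```python
import os

def scrub(path, block_size=64 * 1024):
    """Read `path` from start to end; return (bytes_read, bad_offsets).

    bad_offsets lists the starting offset of every block whose read
    raised an OSError (for example an I/O error from a bad sector).
    """
    bad_offsets = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                chunk = os.pread(fd, want, offset)
            except OSError:
                # Unreadable block: record it and move on to the next one.
                bad_offsets.append(offset)
                offset += want
                continue
            if not chunk:
                break  # unexpected end of file or device
            total += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return total, bad_offsets
```

Run periodically (from cron, say) against each member disk or the array device, a non-empty `bad_offsets` is the early warning that was missing here.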