A little story of failed raid5 (3ware 8000 series)

Jim Pingle lists at pingle.org
Mon Aug 20 17:39:48 PDT 2007


Artem Kuchin wrote:
> A day ago at 11 am I turned off the server,
> pulled out the old drive, installed a new one, turned the server
> back on, and started the rebuild an hour later from a remote location
> via the web interface. After about 5 minutes the machine became
> unresponsive. Tried rebooting - nothing. I went to the machine and
> found that the rebuild had failed (0%) and some data could not be
> read because of bad sectors.

We had a similar failure a few years ago; in fact, every drive in the array
had bad sectors, and one failed completely. The rebuild would not work no
matter what we did. Yet another reason to be sure your RAID disks come from
different manufacturing batches!

> - I cannot make it ignore read errors
> - I cannot figure out which drive has bad sectors
> (maybe someone knows how?)

We recovered from that situation like so:
 1. Power off the server
 2. Pull each drive, attach it to another box without a RAID adapter, and
use diagnostic tools to run a surface scan/remap on each disk. (These were
SCSI disks; not sure if the same applies to PATA/SATA. A rough sketch of
such a scan follows this list.)
 3. Put all drives back, including a replacement for the truly failed drive
 4. Let the array rebuild
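For step 2, if you don't have the vendor's diagnostic tools handy, you can
at least locate the unreadable areas with a plain read-only pass over each
raw disk. The little Python sketch below is only an illustration of that
idea (the /dev/da1 device name is just an example; use whatever node your
disk shows up as): it finds bad spots but does not remap them, so you would
still want the manufacturer's utility or a write pass for actual remapping.

#!/usr/bin/env python3
# Rough read-only surface scan sketch (illustration only).
# Reads the whole device in fixed-size chunks and reports the byte
# offsets of chunks that return an I/O error. It only *finds*
# unreadable areas; it does not remap anything.

import os
import sys

CHUNK = 64 * 1024            # read size; keep it a multiple of the sector size

def scan(path):
    fd = os.open(path, os.O_RDONLY)
    size = os.lseek(fd, 0, os.SEEK_END)   # device size in bytes
    bad = []
    offset = 0
    while offset < size:
        want = min(CHUNK, size - offset)
        try:
            os.pread(fd, want, offset)
        except OSError:
            bad.append(offset)            # likely a bad sector in this chunk
        offset += want
    os.close(fd)
    return bad

if __name__ == "__main__":
    # e.g.: python3 scan.py /dev/da1   (device name is just an example)
    device = sys.argv[1]
    errors = scan(device)
    if errors:
        for off in errors:
            print("read error near byte offset %d" % off)
        sys.exit(1)
    print("no read errors found on %s" % device)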

Many crossed fingers/toes/eyes later, it came back to life. We replaced the
whole box shortly thereafter.

The downside was that the entire server was offline for the duration of the
process, instead of staying online as it would during a normal rebuild.

> So, no raid5 or even raid 6 for me any more. Never!

If it's done properly, with hot spares and other failsafe measures, it isn't
too bad. Sometimes it's the best available option due to budget/hardware/etc
constraints, especially on older systems.

RAID can be a tough beast, though. We had one server that ran fine for
nearly 5 years on a single PATA disk. Two months after I rebuilt it with a
proper SCSI RAID setup, it had a multi-drive failure and bombed. Sometimes
all the safety measures in the world can't make up for what passes for
hardware quality these days...

Jim

