"Fixing" a RAID

Tuc at T-B-O-H ml at t-b-o-h.net
Thu Jun 19 15:06:39 UTC 2008


> 
> I recently had this happen to me on an 8 x 1 TB RAID-5 array on a
> Highpoint RocketRAID 2340 controller. For some unknown reason two drives
> developed unreadable sectors within hours of each other. To make a long
> story short, the way I "fixed" this was to:
> 
	Not FreeBSD related, so you can delete now if not interested...

	We had a 1.5TB NetApp filer at my previous place. It was originally
backed up by another 1.5TB filer taking snapshots every few hours. After
a few years, the customer decided it was "too safe," so they used the second
filer for something else. A month later, we had a double disk failure in the
same volume.

	The NetApp freaked out and rebooted, but when it came back up it marked
one disk dead and the other as fine. Since there was a hot spare, it started
to attempt a rebuild. It took 9 hours for a 72 GB disk, and the half-failed
drive sounded like it was putting the head through the media with lead shot
in it. The filer performed at about half speed during that time. The SECOND
the rebuild finished and the software claimed the array was in optimal
mode, we pulled the bad disk out and replaced it with a fresh
disk. That rebuild went fine. Pulled the failed disk, and put another disk
in for hot spare.

	Not sure if it's a testament to NetApp, or to our and the customer's
luck. They had specifically not wanted backups, and rebuilding the data
would have taken months, many man-hours, and lost revenue for the site.

	Ever since then, I try to get disks made at different times and
from different batches. You figure that if they were MADE around the same
time, they will most likely DIE around the same time. :)

			Tuc


More information about the freebsd-questions mailing list