6.0 on Dell 1850 with PERC4e/DC RAID?
Doug Ambrisko
ambrisko at ambrisko.com
Fri Jan 13 10:26:24 PST 2006
Mike Tancsa writes:
| At 11:59 AM 13/01/2006, Doug Ambrisko wrote:
| >|
| >| That's lame. Under what condition does it happen, do you know?
| >
| >Running RAID 10, a drive was swapped and the rebuild started on the
| >replacement drive. The rebuild complained about the source drive
| >for the mirror rebuild having read errors that couldn't be recovered.
| >It continued on and finished re-creating the mirror. Then the RAID
| >proceeded to a background init, which they normally do, and started
| >failing that and restarting the background init over and over again.
| >The box changed the RAID from degraded to optimal when the rebuild
| >completed (with errors). Doing a dd of the entire RAID logical device
| >returned an error at the bad sector since it couldn't recover that.
| >The RAID controller reported an I/O error and still left the RAID as
| >optimal.
| >
| >We reported this and were told that's the way it is designed :-(
|
| Interesting timing as I ran into this sort of situation on the
| weekend on a 3ware drive in RAID1. The card had complained for a week
| about read errors on drive 1. We thought we would wait until the
| weekend maintenance window to swap it out. Sadly, before that
| window, drive zero totally died a horrible death. We popped in a new
| drive on port zero, started the rebuild, and it crapped out saying
| there was a read error on drive 1. However, there is a check box
| that says continue the build, even with errors on the source drive.
With Adaptec we used to do a verify of each disk before a swap
to increase our chances of a successful disk swap. Adaptec was
a little heavy-handed: if you are running on the last disk of the
mirror and it has a read error, it will fail the drive. If you have
a RAID 10 then you lose 1/2 the file system :-( I'd rather just
get the read error back to the OS than lose the entire drive.
| This setup seems to give you the best of both worlds. We did a quick
| check of the resultant files compared to backups and only a couple
| were toasted. (The box is going to be retired in a month, so even if
| there is other hidden fs corruption, as long as it holds out for
| another 3 weeks we don't care too much). The correct approach would
| be to do a total
| restore of course, but this was good enough for us in this
| situation. I guess the question is, is this RAID1 a proper mirror
| given that there are hard errors on the drive on port 1?
That sounds like a good controller, assuming it says the RAID is still
degraded and not optimal. I assume "optimal" means everything
is fine and it is safe to read the entire volume.
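
For what it's worth, the "dd of the entire RAID logical device" check
can be approximated in a few lines of Python. This is only a sketch:
the function name, the block size, the max_errors cutoff and the
injectable reader argument are all made up for illustration and are
not part of any controller or vendor tool. It reads the device front
to back and, unlike a plain dd, skips past a block that returns an
I/O error so you get a list of every bad offset instead of just the
first one.

```python
import os

def find_unreadable_blocks(path, block_size=1024 * 1024,
                           max_errors=100, reader=os.pread):
    """Read `path` sequentially; return (bytes_read, bad_offsets).

    `reader` defaults to os.pread and is a parameter only so that read
    failures can be simulated (hypothetical hook, not a real tool option).
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while len(bad_offsets) < max_errors:
            try:
                chunk = reader(fd, block_size, offset)
            except OSError:
                # Unrecoverable read: note the offset and skip one block.
                bad_offsets.append(offset)
                offset += block_size
                continue
            if not chunk:
                break  # end of device or file
            bytes_read += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Run it (as root) against the logical device node for the array. If
the controller calls the volume "optimal" and this still returns a
non-empty list of offsets, you are in the situation described above.
The max_errors cutoff is there so a completely dead device does not
make the loop run forever.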
Doug A.