RAID Gone Wild - One Array Split Into Two

Alex Kirk alex at schnarff.com
Sun Mar 1 16:12:33 PST 2009


First off, I realize that this may be more of a lower-level hardware  
question than is appropriate to ask here, but I'm at a real loss, and  
have no idea who else to ask...so I apologize in advance if I'm being  
a pest.

That said: I've got a FreeBSD 7.0/stable box that is used as the
development server for a live system I administer. The dev box recently
crapped out on me, and I realized that its power supply had kicked the
bucket. After I went out and replaced the power supply, it booted right
back up and I ssh'd in, but when I ran my first userland command - "w",
FWIW - it froze up solid. I got one more SSH session in to try to
figure out WTF was going on before it wouldn't even log me in any more.

After a couple of hard reboots, I decided to attach a monitor to it to  
see what was going on. It turns out that the RAID5 array on the system  
had really lost its mind - all four devices that were part of the  
array were listed as being offline, which of course meant that the  
system could no longer boot (as it was booting off of the RAID). The  
controller is an integrated Intel Matrix DHC7R, built onto the  
motherboard.

I looked around the web a bit to try to figure out how to fix this,  
and ran across a couple of forum posts (which I can unfortunately no  
longer seem to find) suggesting that this particular controller was  
prone to an issue where hard power-downs would sometimes make the  
drives go offline, and that I needed to boot from CD to re-initialize  
them into their previous state. I tried first with an Ubuntu Linux CD  
I had handy - which promptly freaked out and dropped me into an  
emergency shell - and then the FreeBSD 7.0 boot-only disc. The latter  
was a bit more helpful, because I got this diagnostic:

ar0: WARNING - parity protection lost, RAID5 array in DEGRADED mode
ar0: 715418MB <Intel MatrixRAID RAID5 (stripe 64KB)> status: DEGRADED
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad8 at ata4-master
ar0: disk2 READY using ad6 at ata3-master
ar0: disk3 DOWN no device found for this subdisk
ar1: 715418MB <Intel MatrixRAID RAID5 (stripe 64KB)> status: BROKEN
ar1: disk0 DOWN no device found for this subdisk
ar1: disk1 DOWN no device found for this subdisk
ar1: disk2 DOWN no device found for this subdisk
ar1: disk3 READY using ad10 at ata5-master

Now I can see that my problem is that I've somehow got *two* RAID  
devices, both improperly configured, whereas I'd only had one before.
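
To make the split easier to see, here is a rough Python sketch that
tabulates which physical disk sits in which subdisk slot of each array,
using only the lines pasted above. It is just a reading aid, not
anything the ata driver does; the names slot_map and merged_view are
made up for illustration.

```python
import re

# Diagnostic lines copied verbatim from the boot messages above.
LOG = """\
ar0: WARNING - parity protection lost, RAID5 array in DEGRADED mode
ar0: 715418MB <Intel MatrixRAID RAID5 (stripe 64KB)> status: DEGRADED
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad8 at ata4-master
ar0: disk2 READY using ad6 at ata3-master
ar0: disk3 DOWN no device found for this subdisk
ar1: 715418MB <Intel MatrixRAID RAID5 (stripe 64KB)> status: BROKEN
ar1: disk0 DOWN no device found for this subdisk
ar1: disk1 DOWN no device found for this subdisk
ar1: disk2 DOWN no device found for this subdisk
ar1: disk3 READY using ad10 at ata5-master
"""

# Matches only the per-subdisk lines; the device name is absent for DOWN slots.
SLOT = re.compile(r"^(ar\d+): disk(\d+) (READY|DOWN)(?: using (\w+))?")


def slot_map(log):
    """Return {array: {slot: device or None}} for every subdisk line."""
    arrays = {}
    for line in log.splitlines():
        m = SLOT.match(line)
        if m:
            array, slot, _state, dev = m.groups()
            arrays.setdefault(array, {})[int(slot)] = dev
    return arrays


def merged_view(arrays):
    """Overlay the arrays slot by slot.

    Returns {slot: device}, or None if two arrays claim the same slot.
    """
    merged = {}
    for slots in arrays.values():
        for slot, dev in slots.items():
            if dev is None:
                continue  # slot reported DOWN in this array
            if slot in merged:
                return None  # conflicting claims, cannot overlay
            merged[slot] = dev
    return merged


arrays = slot_map(LOG)
print(arrays)
print(merged_view(arrays))
```

On the output above, the merged view comes out as slots 0 through 3
holding ad4, ad8, ad6 and ad10 with no overlap, which is at least
consistent with one four-disk array whose members have ended up split
across two entries. That is an observation about the log, not a
diagnosis of what the controller's metadata actually says.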

Does anyone have a clue how I can fix this, preferably while retaining  
my data? I could wipe the box if necessary, but I'd really prefer not  
to, as that would be a huge pain in the butt.

Thanks,
Alex Kirk

