RAID 5 - serious problem

Jeremy Chadwick koitsu at FreeBSD.org
Wed Oct 15 14:01:34 UTC 2008


On Wed, Oct 15, 2008 at 03:51:19PM +0200, Jon Theil Nielsen wrote:
> 2008/10/15 Jeremy Chadwick <koitsu at freebsd.org>
> 
> > On Wed, Oct 15, 2008 at 02:32:25PM +0200, Jon Theil Nielsen wrote:
> > > Dear list,
> > >
> > > Something happened that I don't think should be possible. I "lost" all
> > > three disks in my RAID 5 array simultaneously after approx. two years
> > > without any problem. And I fear I will never see my data again. But I
> > > really hope some of you clever persons can give me some hints. My
> > > system is:
> > > FreeBSD 7.0-Release
> > > Intel D975XBX2 motherboard (Intel Matrix Storage Technology)
> >
> > Are you using the Matrix Storage Technology?  If so, immediately stop.
> > FreeBSD's support for this is very, very bad, and will nearly guarantee
> > data loss.  There are many of us who have tried it, and it's known to
> > be buggy on FreeBSD.
> >
> > http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
> >
> > I recommend you stop using this feature and start using ZFS or gvinum
> > for what you need.
> >
> > > 3 WD Raptor 74 GB in a RAID 5 array
> > > 1 WD Raptor 150 GB as a standalone disk
> > > / and /var mounted on the standalone, /usr on the RAID 5
> > > I believe what happened was that one of the disks didn't respond for
> > > such a long time that it was marked "bad". And afterwards the same
> > > thing happened for the other disks. When I try to boot the system, all
> > > three disks are marked "Offline".
> > > The BIOS utility for the host controller has no option to force the disks
> > > back online.
> > > I have another machine with a S5000XVN board and Intel Embedded Server
> > > RAID Technology II. The BIOS configuration utility on this board has the
> > > option to force offline drives back online.
> >
> > Any "embedded" RAID is usually BIOS RAID managed by a "software RAID
> > IC" (e.g. an IC on the motherboard that handles LBA/CHS addressing for
> > creating a pseudo-array, while the OS still does all of the management
> > and nothing is off-loaded).
> >
> > > I am very desperate not to lose my data, so I don't know if I dare moving
> > > the drives to the other machine and try to make them online again. Do you
> > > think I should try?
> >
> > No, but you might not have any choice.  It honestly sounds like the
> > metadata on your disks is in a bad state.
> >
> > I would recommend you try booting Linux, since their support for
> > MatrixRAID is significantly better/more advanced.  Ideally, you should
> > be able to bring the RAID members back online using their tools, then
> > reboot into FreeBSD and cross your fingers that your data becomes
> > accessible.  Once accessible, offload it somewhere immediately, and
> > follow my above recommendations.
> >
> > > In general, are there any procedures I can try to recover my RAID array?
> > > Or is the offline status definitive, and all data definitely lost? I
> > > guess some specialized companies have the expertise to recover lost data
> > > from a broken RAID array, but I don't know. And I don't know the price
> > > of such a service.
> > > I would really, really appreciate any kind of help.
> > > I have backups of most user data, but not of the system configuration
> > > (and maybe not even the databases).  This is of course pretty stupid. In
> > > the future, I will not rely on RAID 5 as a foolproof solution.
> >
> > RAID 5 is a fine solution, but you have learned a very valuable lesson,
> > one which I will enclose in asterisks to make it crystal clear: ***RAID
> > DOES NOT REPLACE BACKUPS***.  Repeat this mantra over and over until you
> > accept it.  :-)
> >
> > --
> > | Jeremy Chadwick                                jdc at parodius.com |
> > | Parodius Networking                       http://www.parodius.com/ |
> > | UNIX Systems Administrator                  Mountain View, CA, USA |
> > | Making life hard for others since 1977.              PGP: 4BD6C0CB |
> >
> Hi Jeremy,
> 
> Thanks for your advice. As I understand you, the best bet is to boot from
> Linux and try to repair.

> And that trying with my other controller might be the second best.

You risk corrupting or losing the metadata using another controller.
The two controllers are *not* identical; just because they're Intel
doesn't mean they speak the same metadata format.  :-)

> Would it be an idea to try to run some sort of Linux live CD?  I have
> no machines with Linux installed.

Yes, absolutely.  I assume any Linux distribution which uses libata
should be able to speak to Intel MatrixRAID disks and BIOSes.  Linux
refers to this feature as "Intel SATA RAID" or "Intel Software RAID".
Any present-day 2.6.x kernel uses libata; the newer the better.

I do not know how to manipulate or interface with MatrixRAID on Linux.
You will have to Google for how to get support in that regard.  My
quick searches turn up the following useful links:

http://linux-ata.org/faq-sata-raid.html
http://gentoo-wiki.com/HOWTO_Gentoo_Install_on_Bios_(Onboard)_RAID
http://www.intel.com/support/chipsets/imsm/sb/cs-020663.htm
http://iswraid.sourceforge.net/  (old/outdated from the look of it)

It would appear the tool to manipulate the metadata is called
dmraid(8).  I believe the -a flag is what you might require, but I do
not know for sure:

http://www.linuxmanpages.com/man8/dmraid.8.php
http://people.redhat.com/~heinzm/sw/dmraid/
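
As a rough, untested sketch of what a first session from a Linux live CD
might look like, assuming dmraid is installed and recognizes the Intel
metadata on the disks: the flags below (-r, -s, -ay) are taken from the
dmraid(8) man page, while the run() wrapper and the DRY_RUN variable are
purely illustrative helpers and not part of dmraid.  By default the script
only prints the commands; set DRY_RUN=0 and run as root to execute them.

```shell
#!/bin/sh
# Sketch only: inspect BIOS RAID metadata with dmraid from a Linux live CD.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run dmraid -r     # list block devices that carry RAID metadata
run dmraid -s     # show the RAID sets that were discovered, with status
run dmraid -ay    # try to activate all discovered sets via device-mapper
```

If activation works, the array should show up under /dev/mapper/, at which
point copying the data off to another disk comes before anything else.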

Final note: I *will not* be held responsible for what happens if
you use any of these tools.  I have absolutely no experience doing any
of what I've described; I'm simply articulating the route I'd choose in
this scenario.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



More information about the freebsd-hardware mailing list