Has anybody EVER successfully recovered VINUM?

Scott Mitchell scott+freebsd at fishballoon.org
Wed Dec 8 02:08:13 PST 2004


On Tue, Dec 07, 2004 at 09:45:51PM -0800, orville weyrich wrote:
> I have been trying to figure out how to get VINUM to
> recognize a new disk after a disk failure, and no luck
> at all.
> 
> I cannot find instructions in the official
> documentation, nor in the FreeBSD Diary.
> 
> Lots of places tell how to build a VINUM system. 
> Nobody ever talks about how to recover from a disk
> failure.
> 
> Can someone PLEASE help me recover?  I have already
> posted complete information to this list, with no
> answer.  I will give a short version now and provide
> more info if requested.

Hi Orville,

We have successfully replaced failed drives in Vinum RAID-5 volumes several
times, on a couple of different machines.  Obviously no guarantees that
these steps will work for you, but this is what we did:

1. Pull the failed drive.  The machines in question use Intel L440GX+
   boards with hot-swap SCSI drive cages, so this was as simple as popping
   out the dead drive.

2. Plug in a new drive.  The hardware and FreeBSD seemed quite happy with
   this, whether because the new drive was identical to the old one, or our
   SCSI layer is just smart enough to deal with this, I don't know.  I've
   never had to replace a failed drive with a larger one, so I have no idea
   if this would work or not.

3. Clean any existing partition table and boot blocks off the new drive,
   just in case:

	# dd if=/dev/zero of=/dev/da0 bs=1k count=1

4. Put a default BSD disklabel on the new drive:

	# disklabel -w -r da0 auto

5. Edit the disklabel to add the Vinum partition:

	# disklabel -e -r da0

   Just copy the 'c' partition line to 'a', change the partition type to
   'vinum' and clear out the 'fsize' and 'bsize' fields.  You can always
   check the disklabel on one of the other drives to see what it should
   look like.  Of course this assumes you're dedicating the entire disk to
   the Vinum partition.  If replacing the failed drive with a larger disk,
   adjust accordingly - the important thing is that the new partition is
   the same size as the old one.

6. Tell vinum to restart the failed subdisk:

	# vinum start raid.p0.s0

7. Wait ages while the new disk is 'revived'.
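
For what it's worth, steps 3 to 6 can be strung together roughly like this.
It is only a sketch: the disk name (da0) and subdisk name (raid.p0.s0) are
the ones from our setup, and the DRY_RUN switch and the run() helper are
made up for this example, not anything vinum provides.  With DRY_RUN left
at 1 it only prints what it would do.

```shell
#!/bin/sh
# Sketch of steps 3-6.  DISK and SUBDISK are assumptions taken from the
# example above; change them to match your own system.
DISK="${DISK:-da0}"
SUBDISK="${SUBDISK:-raid.p0.s0}"
# Hypothetical safety switch: 1 = only print the commands, 0 = run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it as given.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

replace_steps() {
    # Step 3: wipe any old partition table and boot blocks.
    run dd if=/dev/zero of="/dev/$DISK" bs=1k count=1
    # Step 4: write a default BSD disklabel.
    run disklabel -w -r "$DISK" auto
    # Step 5: edit the label by hand to add the 'vinum' partition.
    run disklabel -e -r "$DISK"
    # Step 6: restart the failed subdisk so vinum starts reviving it.
    run vinum start "$SUBDISK"
}

replace_steps
```

Step 5 still opens an editor, so this is not unattended.  Check the dry-run
output carefully before setting DRY_RUN=0, since the dd line will happily
wipe whichever disk you point it at.  While the revive is running, 'vinum
list' should show the state of the subdisk.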

I was quite impressed that the volume remained available with users
accessing it throughout this procedure :-)  To play it safe you might want
to unmount the volume before starting.

There was one instance where the machine was for some reason rebooted after
the drive failure - vinum somehow completely forgot about that subdisk when
it came back up.  I *think* we fixed this by generating a new vinum config
just for that subdisk, something like:

	drive d0 device /dev/da0a

Once this was done vinum seemed happy to rebuild on the new drive.
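
From memory, feeding that line to vinum went something like the sketch
below.  The drive name (d0) and device (/dev/da0a) are from our machine and
are assumptions for yours, and the temporary file name is arbitrary.  The
vinum commands are left as comments because I can't promise this is exactly
what we typed.

```shell
#!/bin/sh
# Write a one-line vinum config describing only the replacement drive.
# Assumption: the lost drive was called d0 and the new disk is da0.
CONF="$(mktemp /tmp/vinum-d0.XXXXXX)"
printf 'drive d0 device /dev/da0a\n' > "$CONF"

# Show the file so it can be checked before vinum sees it.
cat "$CONF"

# Then, as root (not run by this sketch):
#   vinum create "$CONF"
#   vinum start raid.p0.s0
```

Compare the drive name against what 'vinum list' reports for the surviving
drives before creating anything, so the new entry matches the name the
plex's subdisk expects.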

This procedure has worked for us several times, but I have no idea if it's
the 'right' way to do this.  Maybe we're just really lucky.  So if you
follow these instructions and end up trashing your disks, it's not my
fault.  I *really* recommend having a recent backup before starting,
assuming the volume is still readable, just in case.

Good luck!

	Scott

-- 
===========================================================================
Scott Mitchell           | PGP Key ID | "Eagles may soar, but weasels
Cambridge, England       | 0x54B171B9 |  don't get sucked into jet engines"
scott at fishballoon.org | 0xAA775B8B |      -- Anon


More information about the freebsd-questions mailing list