gvinum and raid5

Marco Haddad freebsd-lists at ideo.com.br
Fri Nov 2 18:33:20 PDT 2007


I must say that I had strong faith in vinum too. I used it on a dozen
servers to build raid5 volumes, especially when hard drives were small and
unreliable. So naturally I had a few crashes, but replacing the failed disk
was easy and the rebuild worked every time.

I started using gvinum when I ran into the first SCSI controller not
supported by vinum. As gvinum solved vinum's problem with that controller,
it immediately received the same faith I had in vinum. I kept using gvinum
many times after that, until my faith was shaken by a hard disk crash: I
could not get the replacement drive added to the raid5 volume. After a lot
of banging my head against the wall, I came up with the following
workaround procedure to replace a failed disk.

I used this procedure just today to replace a SATA hard disk that I
suspect was the cause of an intermittent failure, with such success that I
began to think it isn't so bad... Anyway, I'll describe a simple example in
order to get your comments.

Suppose a simple system with three hard disks: ad0, ad1 and ad2. They were
fdisked and labeled identically. ad0s1a is / and ad0s1d, ad1s1d and ad2s1d
are all the same size and are used by gvinum as drives AD0, AD1 and AD2.
Each drive holds a single subdisk, and the three subdisks are joined in a
raid5 plex forming the volume VOL. The gvinum create script would be the
following:

drive AD0 device /dev/ad0s1d
drive AD1 device /dev/ad1s1d
drive AD2 device /dev/ad2s1d
volume VOL
  plex org raid5 128K
    sd drive AD0
    sd drive AD1
    sd drive AD2
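
Assuming the script above is saved in a file, say raid5.conf (the file
name and the /mnt mount point below are just examples), loading it and
putting the volume to use would look roughly like this:

gvinum create raid5.conf      # hypothetical file name
newfs /dev/gvinum/VOL         # new filesystem on the raid5 volume
mount /dev/gvinum/VOL /mnt    # example mount point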

Suppose ad1 crashes and gvinum marks it as down. With the command "gvinum l"
we would get something like this:

3 drives:
D AD0    State: up      /dev/ad0s1d    ...
D AD1    State: down    /dev/ad1s1d    ...
D AD2    State: up      /dev/ad2s1d    ...

1 volumes:
V VOL    State: up    ...

1 plexes:
P VOL.p0    R5 State: degraded    ...

3 subdisks:
S VOL.p0.s0    State: up      D: AD0    ...
S VOL.p0.s1    State: down    D: AD1    ...
S VOL.p0.s2    State: up      D: AD2    ...

The first thing I do: edit fstab and comment out the line that mounts
/dev/gvinum/VOL wherever it was mounted. This is necessary because, once
the volume is mounted, gvinum cannot perform most operations, and umount
alone doesn't do the trick. Then I shut down the system, replace the hard
disk and bring it back up.
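
With the volume mounted on /mnt (just an illustration), the commented-out
fstab line would look like:

#/dev/gvinum/VOL    /mnt    ufs    rw    2    2

and the shutdown is the usual:

shutdown -p now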

At this point the first weird thing can be noted. With 'gvinum l' you
would get something like this:

2 drives:
D AD0    State: up    /dev/ad0s1d    ...
D AD2    State: up    /dev/ad2s1d    ...

1 volumes:
V VOL    State: up    ...

1 plexes:
P VOL.p0    R5 State: up    ...

3 subdisks:
S VOL.p0.s0    State: up    D: AD0    ...
S VOL.p0.s1    State: up    D: AD1    ...
S VOL.p0.s2    State: up    D: AD2    ...

What? AD1 is gone, OK, but why is the subdisk VOL.p0.s1 up? And that makes
the plex up instead of degraded. The first time I saw this I got quite a
scare.

The next step is to fdisk and label the new disk just like the old one.
The new disk can be bigger, but the partition ad1s1d must, I think, be the
same size as before.
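
A sketch of one way to do that, assuming ad0 survived and had the same
layout as the old ad1 (double-check against your own layout before
running anything):

fdisk -I ad1                        # initialize the new disk with one slice
bsdlabel ad0s1 > /tmp/ad0.label     # save the label of a surviving disk
bsdlabel -R ad1s1 /tmp/ad0.label    # write that label to the new slice

Copying the label this way keeps ad1s1d at the old size even if the new
disk is bigger.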

At this point it should be enough to use gvinum create with a script file
containing only the line:

drive AD1 device /dev/ad1s1d

but gvinum would panic with that, and the system would lock up or dump
core. So something weird must be done instead: remove all the gvinum
objects with 'gvinum rm'. Yes, just to make it clear, in this case the
commands would be:

gvinum rm -r AD0
gvinum rm -r AD2
gvinum rm VOL
gvinum rm VOL.p0
gvinum rm VOL.p0.s1

Then we can use 'gvinum create' with the original script to recreate all
the objects.
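
Using the hypothetical file name from before, that would be:

gvinum create raid5.conf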

Now it is all up again, but it isn't quite right yet. The subdisk
VOL.p0.s1 must be marked as stale with:

gvinum setstate -f stale VOL.p0.s1

This brings the plex back to degraded mode, and we can use:

gvinum start VOL

to rebuild it. The rebuild may take about an hour per 100 GB of volume
space, so we'd better grab some lunch...

The progress can be seen at any time with:

gvinum ls
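
To watch it without retyping the command, a trivial shell loop does the
job, for example:

while true; do gvinum ls | grep VOL.p0.s1; sleep 60; done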

After that, a 'fsck -t ufs /dev/gvinum/VOL' will probably catch some
errors left behind from when the drive went down.

Now we just need to uncomment that line in fstab and reboot.

I think there's no easier way...

Marco Haddad

On 11/2/07, Peter Giessel <pgiessel at mac.com> wrote:
> On Friday, November 02, 2007, at 01:04AM, "Joe Koberg" <joe at rootnode.com>
> wrote:
> >Ulf Lilleengen wrote:
> >> On Wed, Oct 31, 2007 at 12:14:18 -0300, Marco Haddad wrote:
> >>
> >>> I found in recent research that a lot of people say gvinum should not be
> >>> trusted, when it comes to raid5. I began to get worried. Am I alone using
> >>>
> >> I'm working on it, and there are definitely people still using it. (I've
> >> received a number of private mails as well as those seen on this list.)
> >> IMO, gvinum can be trusted when it comes to raid5. I've not experienced
> >> any corruption-bugs or anything like that with it.
> >>
> >
> >The source of the mistrust may be the fact that few software-only RAID-5
> >systems can guarantee write consistency across a multi-drive
> >read-update-write cycle in the case of, e.g., power failure.
> That may be the true source, but my source of mistrust comes from a few
> drive failures and gvinum's inability to rebuild the replaced drive.
> Worked fine under vinum in tests, tried the same thing in gvinum (granted,
> this was under FreeBSD 5), and the array failed to rebuild.
> I can't be 100% sure it wasn't a flaky ATA controller rather than gvinum's
> fault, and I no longer have access to the box to play with, but when I was
> playing with gvinum, replacing a failed drive usually resulted in panics.
