fsck'ing a Vinum RAID5 volume (and a stale drive)

Daniel Eriksson daniel_k_eriksson at telia.com
Fri Jun 25 20:35:27 PDT 2004


Benjamin P. Keating wrote:

> I've found on the net that I can switch the state by doing:
> 
> $ vinum setstate up backup.p0 backup.p0.s3

Ouch, this is a bad move. You just told vinum to start using the stale (= out
of date) disc as if it were up to date and nothing was wrong with it.
Basically you have trashed your data, possibly beyond repair.

Why was the disc in a stale state? If it developed bad blocks that could not
be remapped, then vinum marked it as stale and continued to use the other
discs in degraded mode (just as it should). Even if the disc did not break
(maybe just connection problems or something), once vinum marked it as stale
any further write to the array immediately invalidated the data on the disc,
which means the only way to bring it back up is to go through a proper
rebuild of the data.
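
To make that concrete, here is a toy model in Python of one RAID-5 stripe.
It is my own illustration, not vinum code: the four-disc layout, the fixed
parity disc and the names xor_blocks, degraded_read and degraded_write are
all made up for the example. It shows a write landing on the array while
disc 2 is stale, and what you get back if that disc is then forced "up":

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

PARITY = 3  # assumption: parity for this stripe lives on disc 3
STALE = 2   # assumption: disc 2 is the one that dropped out

# discs[i] is what is physically stored on disc i for this stripe.
discs = [b"AAAA", b"BBBB", b"CCCC"]
discs.append(xor_blocks(discs))

def degraded_read(discs, missing):
    """Reconstruct the missing disc's block from all the other discs."""
    return xor_blocks([blk for i, blk in enumerate(discs) if i != missing])

def degraded_write(discs, missing, new_block):
    """Write a block that belongs on the missing disc.

    The missing disc cannot be touched, so only the parity absorbs the write.
    """
    survivors = [blk for i, blk in enumerate(discs) if i not in (missing, PARITY)]
    discs[PARITY] = xor_blocks(survivors + [new_block])

# The array keeps running in degraded mode and something gets written.
degraded_write(discs, STALE, b"ZZZZ")

correct = degraded_read(discs, STALE)  # what the array should return
forced_up = discs[STALE]               # what the stale disc still holds

print(correct, forced_up)  # b'ZZZZ' b'CCCC'
```

The stale disc still holds the old block, and nothing on it says so. Once it
is flagged "up", reads are served from it directly and the old data appears
in the middle of the current data.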

> I rebooted, the state is "up" so I unmounted the volume to fsck it. Is
> this approach correct? does this do anything productive or just forces
> the state label to change and do nothing to the drives? I don't feel
> confident that it did anything and Im having a VERY hard time finding
> documentation on this.

Your approach is not correct. You should have paid attention to the vinum
manpage which says this about setstate: "This bypasses the usual consistency
mechanism of vinum and should be used only for recovery purposes.  It is
possible to crash the system by incorrect use of this command."

It is unfortunate that the manpage says "for recovery", since people can
misread that as meaning setstate is how you recover from a crashed disc. It
is not: setstate should not be needed during normal operation, even when a
disc in a RAID-5 array crashes.

> Im assuming I'll want to answer yes to at least some of those. Can I
> say yes to all of them? What errors should I say no to? I have no idea
> whats bad bad and whats correctable.

An fsck of a degraded RAID-5 array should not normally report any errors. The
errors you are seeing are there because you have forced out-of-date (stale)
data into the middle of the filesystem, messing up pretty much the entire
filesystem.

> Now. because this is a raid5 volume, are some of these fsck prompts
> false  positives? ie; fsck is giving a error but really it's fine as
> it's raid5?

No, that is not how RAID-5 works. RAID-5 protects the stored data by
ensuring that it can still be read and written even if one disc fails. RAID
arrays work at a level below the filesystem, and they generally don't know
anything about the actual filesystem, so an error that fsck reports is a
real error in the filesystem.
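
A small Python sketch of that layering, under simplifying assumptions of my
own (a single stripe, parity fixed on the last disc, hypothetical helper
names make_stripe and read_stripe). Whichever single disc is missing, the
layer above gets exactly the same bytes as from a healthy array:

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Store the data blocks plus one parity block (kept last for simplicity)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def read_stripe(stripe, failed=None):
    """Return the data blocks, reconstructing the failed disc's block if any."""
    blocks = list(stripe)
    if failed is not None:
        others = [blk for i, blk in enumerate(stripe) if i != failed]
        blocks[failed] = xor_blocks(others)
    return blocks[:-1]  # the parity block is never handed to the filesystem

data = [b"supe", b"rblo", b"ck!!"]
stripe = make_stripe(data)

healthy = read_stripe(stripe)
degraded_views = [read_stripe(stripe, failed=i) for i in range(len(stripe))]

print(all(view == healthy for view in degraded_views))  # True
```

Since a properly degraded array returns the same bytes as a healthy one,
fsck has nothing to be confused by. Inconsistencies it finds are really
there in the data it was given.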


Your only hope now is that nothing has actually been written to the array.
If fsck changed things around then you are probably out of luck. If nothing
has been written, then reset the failed disc to the "stale/down" state
(which should hopefully put the array back in degraded mode) and then try
fsck again. If you are lucky you should see no errors at all.

If the disc isn't physically broken, then the proper way to get your array
back to the original "all up" state is to run "vinum start backup.p0.s3".
I'm not sure if this will rebuild all the data automatically (it does for
RAID-1 arrays), but if it doesn't then I guess you also need to run
"vinum rebuildparity backup.p0".
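
Whichever command ends up doing it, a rebuild conceptually has to recompute
every block on the stale disc from the other discs. Here is a rough Python
sketch of that idea. It is not how vinum is implemented: the four-disc
array, the rotating parity placement and the names build_array and
rebuild_disc are assumptions made for the illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

NDISCS = 4  # assumption: three data blocks plus one parity block per stripe

def build_array(data, block=4):
    """Lay data out in stripes, rotating the parity disc from stripe to stripe.

    Assumes len(data) is a multiple of block * (NDISCS - 1).
    """
    chunks = [data[i:i + block] for i in range(0, len(data), block)]
    per_stripe = NDISCS - 1
    stripes = []
    for start in range(0, len(chunks), per_stripe):
        blocks = chunks[start:start + per_stripe]
        parity_disc = (start // per_stripe) % NDISCS
        blocks.insert(parity_disc, xor_blocks(blocks))
        stripes.append(blocks)
    return stripes

def rebuild_disc(stripes, disc):
    """Recompute every block on `disc` from the other discs in its stripe."""
    for blocks in stripes:
        others = [blk for i, blk in enumerate(blocks) if i != disc]
        blocks[disc] = xor_blocks(others)

array = build_array(b"0123456789abcdefghijklmn")
good = [list(blocks) for blocks in array]  # reference copy of the healthy array

# Disc 2 went stale: pretend every block on it holds out-of-date contents.
for blocks in array:
    blocks[2] = b"old!"

rebuild_disc(array, 2)
rebuilt_ok = array == good
print(rebuilt_ok)  # True
```

Note that the rebuild never reads the stale disc, it only overwrites it.
That is the opposite of setstate, which trusts whatever is on the disc.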

/Daniel Eriksson
