Serious bug in vinum?
João Carlos Mendes Luís
jonny at jonny.eng.br
Tue Mar 30 19:14:21 PST 2004
Greg 'groggy' Lehey wrote:
> On Tuesday, 30 March 2004 at 14:37:00 +0200, Lukas Ertl wrote:
>
>>On Fri, 26 Mar 2004, Joao Carlos Mendes Luis wrote:
>>
>>
>>> I think this should be like:
>>>
>>> if (plex->state > plex_corrupt) { /* something accessible, */
>>>
>>> Or, in other words, volume state is up only if plex state is degraded
>>>or better.
>>
>>You are right, this is a bug,
>
> No, see my reply.
I think "maybe" is the best answer here.
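To make the two positions concrete, here is a minimal sketch of both checks.
The enum is a simplified, hypothetical ordering of plex states (only the
relative order of corrupt, degraded and up matters), not a copy of the vinum
headers, and both helper functions are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified ordering; the real vinum enum has more states. */
enum plexstate { plex_down, plex_corrupt, plex_degraded, plex_up };

/* Lenient check: a corrupt (partially accessible) plex keeps the volume up. */
static bool volume_up_lenient(const enum plexstate *plexes, size_t n)
{
    assert(plexes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (plexes[i] >= plex_corrupt)
            return true;
    return false;
}

/* Strict check: the volume is up only if some plex is degraded or better. */
static bool volume_up_strict(const enum plexstate *plexes, size_t n)
{
    assert(plexes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (plexes[i] > plex_corrupt)
            return true;
    return false;
}
```

For a volume with a single corrupt plex, the lenient check reports the volume
as up and the strict one reports it as down; for any plex that is degraded or
better, the two agree.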
>>The correct solution, of course, is to check if the data is valid
>>before changing the volume state, but that might turn out to be a
>>very complex check.
>
>
> Well, the minimum correct solution is to return an error if somebody
> tries to access the inaccessible part of the volume. That should
> happen, and I'm confused that it doesn't appear to be doing so in this
> case.
>
> On Tuesday, 30 March 2004 at 11:07:55 -0300, João Carlos Mendes Luís wrote:
>
>>Greg 'groggy' Lehey wrote:
>>
>>>On Tuesday, 30 March 2004 at 0:32:38 -0300, João Carlos Mendes Luís wrote:
>>>
>>>Basically, this is a feature and not a bug. A plex that is corrupt is
>>>still partially accessible, so we should allow access to it. If you
>>>have two striped plexes both striped between two disks, with the same
>>>stripe size, and one plex starts on the first drive, and the other on
>>>the second, and one drive dies, then each plex will lose half of its
>>>data, every second stripe. But the volume will be completely
>>>accessible.
>>
>> A good idea if you have both stripe and mirror, to avoid discarding the
>>whole disk. But, IMHO, if some part of the disk is inaccessible, the volume
>>should go down, and IFF the operator wants to try recovery, should use the
>>setstate command. This is the safe state.
>
> setstate is not safe. It bypasses a lot of consistency checking.
That's why it should be done only by a human operator, and only after
checking the physical disk. I use setstate frequently, when I have my wizard
hat on, but I know the consequences of doing that. If I have someone watching, I
carefully explain to them *not* to repeat that. ;-)
>
> One possibility would be:
>
> 1. Based on the plex states, check if all of the volume is still
> accessible.
> 2. If not, take the volume into a "flaky" state.
This is easy if the volume is composed of a single plex (my case, and the
case of most people who need only a big and "unsafe" disk), where unsafe means
a disk that is either available or not available, and not half a disk. At
least for me.
If the volume has more than one plex, then you could think of an algorithm
that exploits this redundancy.
But, IMO, a disk with half of it unavailable is hardly an "up and ok" one.
Also note that, instead of turning the whole subdisk stale when a single
I/O fails, the error could be passed up to the layer above. But this, too,
only works with single-plex stripe or concat configurations.
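A rough sketch of what steps 1 and 2 could look like, using Greg's example of
two striped plexes that start on different drives. This is a toy model under
stated assumptions, not vinum code: every name is hypothetical, and I assume
that plex p places stripe s on drive (start_drive[p] + s) % ndrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: a plex starting on start_drive puts stripe s on
 * drive (start_drive + s) % ndrives. */
static bool stripe_ok(size_t start_drive, size_t stripe,
                      const bool *drive_alive, size_t ndrives)
{
    return drive_alive[(start_drive + stripe) % ndrives];
}

/* Step 1: true only if every stripe is readable from at least one plex. */
static bool volume_fully_accessible(const size_t *start_drive, size_t nplexes,
                                    const bool *drive_alive, size_t ndrives,
                                    size_t nstripes)
{
    assert(ndrives > 0);
    for (size_t s = 0; s < nstripes; s++) {
        bool readable = false;
        for (size_t p = 0; p < nplexes && !readable; p++)
            readable = stripe_ok(start_drive[p], s, drive_alive, ndrives);
        if (!readable)
            return false;   /* step 2 would mark the volume "flaky" here */
    }
    return true;
}
```

With two drives, one of them dead, and two plexes starting on drives 0 and 1,
every stripe is still readable from one plex or the other. With a single plex
on the same two drives, every second stripe is lost and the check fails, which
is the case I care about.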
> 3. *Somehow* ensure that the volume can't be accessed again as a file
> system until it has been remounted.
> 4. Refuse to remount the file system without the -f option.
>
> The last two are outside the scope of Vinum, of course.
And again this violates the layering approach. I thought newfs -v had been
enough...
The first time I used vinum I was happily thinking that I would mix 4
whole disks (except for boot and swap partitions, of course) and create a new
pseudo disk, which I would again disklabel and repartition for the expected
use. Say, for example, that I want to have /var and /usr on different
partitions, but I want both with mirroring. With real world vinum I need to
create 2 vinum partitions on real disks, and have 2 vinum volumes.
AFAIK, -current and GEOM fix this, right? My last experience with
RaidFrame ended in a panic, right from disk creation. But I must confess I did
not try that hard, since vinum and -stable were working for me. I have not been
a -current hacker for a long time now.
Greg, I like vinum, and I have used it since its release in FreeBSD. Before
that I used ccd(4). When 5.x is stable, I will use GEOM, vinum or raidframe.
But I really think *ix is great for its reusability, recursion and modularity,
and vinum breaks this. If vinum creates a virtual disk, it should behave like a
real disk.
Jonny
--
João Carlos Mendes Luís - Networking Engineer - jonny at jonny.eng.br
More information about the freebsd-bugs
mailing list