kern/79035: gvinum unable to create a striped set of mirrored sets/plexes

Sven Willenberger sven at dmv.com
Sat Mar 19 22:10:06 PST 2005


The following reply was made to PR kern/79035; it has been noted by GNATS.

From: Sven Willenberger <sven at dmv.com>
To: "Greg 'groggy' Lehey" <grog at FreeBSD.org>
Cc: freebsd-gnats-submit at FreeBSD.org
Subject: Re: kern/79035: gvinum unable to create a striped set of mirrored
 sets/plexes
Date: Sun, 20 Mar 2005 01:00:29 -0500

 Greg 'groggy' Lehey presumably uttered the following on 03/20/05 00:21:
 > On Saturday, 19 March 2005 at 23:43:00 -0500, Sven Willenberger wrote:
 > 
 >>Greg 'groggy' Lehey presumably uttered the following on 03/19/05 22:11:
 >>
 >>>On Sunday, 20 March 2005 at  2:04:34 +0000, Sven Willenberger wrote:
 >>>
 >>>
 >>>>Under the current implementation of gvinum it is possible to create
 >>>>a mirrored set of striped plexes but not a striped set of mirrored
 >>>>plexes. For purposes of resiliency the latter configuration is
 >>>>preferred as illustrated by the following example:
 >>>>
 >>>>Use 6 disks to create one of 2 different scenarios.
 >>>>
 >>>>1) Using the current abilities of gvinum, create 2 striped sets of
 >>>>3 disks each (A1 A2 A3 and B1 B2 B3), then create a mirror of those
 >>>>2 sets such that A(123) mirrors B(123). In this situation, if any
 >>>>drive in Set A fails, one still has a working set in Set B. If any
 >>>>drive now fails in Set B, the system is shot.
 >>>
 >>>No, this is not correct.  The plex ("set") only fails when all drives
 >>>in it fail.
 >>
 >>I hope the following diagrams better illustrate what I was trying to
 >>point out. Data is striped across all the A's, and that is mirrored to
 >>the B stripes:
 >>
 >>...
 >>
 >>If A1 fails, then the A stripe set cannot function (much as in RAID 0,
 >>one disk failure kills the set), meaning that B is now the array:
 > 
 > 
 > No, this is not correct.
 > 
 > 
 >>>>Thus a stripe of mirrors (rather than a mirror of striped sets) is
 >>>>a more resilient and fault-tolerant setup for a multi-disk array.
 >>>
 >>>No, you're misunderstanding the current implementation.
 >>
 >>Perhaps I am ... but unless gvinum somehow reconstructs a 3 disk stripe
 >>into a 2 disk stripe in the event one disk fails, I am not sure how.
 > 
 > 
 > Well, you have the source code.  It's not quite the way you look at
 > it.  It doesn't have stripes: it has plexes.  And they can be
 > incomplete.  If a read to a plex hits a "hole", it automatically
 > retries via (possibly all) the other plexes.  Only when all plexes
 > have a hole in the same place does the transfer fail.
 > 
 > You might like to (re)read http://www.vinumvm.org/vinum/intro.html.
 > 
 > Greg
 > --
 > See complete headers for address and phone numbers.
 
 I guess I just needed someone to come out and say what you just said.
 Rereading the manual led me to the point of confusion that brought me
 to this question in the first place. Quoting (from The-Big-Picture):
 
 "    *
 
        Although a plex represents the complete data of a volume, it is 
 possible for parts of the representation to be physically missing, 
 either by design (by not defining a subdisk for parts of the plex) or by 
 accident (as a result of the failure of a drive).
      * A volume is a collection of between one and eight plexes. Each 
 plex represents the data in the volume, so more than one plex provides 
 mirroring. As long as at least one plex can provide the data for the 
 complete address range of the volume, the volume is fully functional."
 
 The first bullet would seem to imply that partial plexes are OK. However,
 the last sentence (which implies that one plex must provide the data for
 the complete address range) suggests that the volume still needs at least
 one complete plex in order to function. I could not find any indication
 that it could combine "partial" plexes into a fully functioning volume,
 so I am glad you pointed that out. This would indicate that the solution
 I seek is already available (and now I can test it :) )
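 
 For the archives, the configuration I intend to test looks roughly like
 the following: two striped plexes in one volume, which (given that reads
 hitting a hole in one plex retry via the other plex) should give the
 resilience I was after. The drive device names below are only
 placeholders for my six disks, and the zero subdisk lengths just mean
 "use all the available space on the drive":
 
      # six example drives; substitute the real device names
      drive a1 device /dev/da0s1h
      drive a2 device /dev/da1s1h
      drive a3 device /dev/da2s1h
      drive b1 device /dev/da3s1h
      drive b2 device /dev/da4s1h
      drive b3 device /dev/da5s1h
 
      volume data
        # first striped plex across the "A" drives
        plex org striped 512k
          sd length 0 drive a1
          sd length 0 drive a2
          sd length 0 drive a3
        # second striped plex across the "B" drives mirrors the first
        plex org striped 512k
          sd length 0 drive b1
          sd length 0 drive b2
          sd length 0 drive b3
 
 If I am reading gvinum(8) correctly, this should be loadable with
 "gvinum create configfile".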
 
 Sven

