Sven Willenberger
2005-Apr-08 15:28 UTC
kern/79035: gvinum unable to create a striped set of mirrored sets/plexes
On Sun, 2005-03-20 at 15:51 +1030, Greg 'groggy' Lehey wrote:
> On Saturday, 19 March 2005 at 23:43:00 -0500, Sven Willenberger wrote:
> > Greg 'groggy' Lehey presumably uttered the following on 03/19/05 22:11:
> >> On Sunday, 20 March 2005 at 2:04:34 +0000, Sven Willenberger wrote:
> >>
> >>> Under the current implementation of gvinum it is possible to create
> >>> a mirrored set of striped plexes but not a striped set of mirrored
> >>> plexes. For purposes of resiliency the latter configuration is
> >>> preferred as illustrated by the following example:
> >>>
> >>> Use 6 disks to create one of 2 different scenarios.
> >>>
> >>> 1) Using the current abilities of gvinum create 2 striped sets using
> >>> 3 disks each: A1 A2 A3 and B1 B2 B3 then create a mirror of those 2
> >>> sets such that A(123) mirrors B(123). In this situation if any drive
> >>> in Set A fails, one still has a working set with Set B. If any drive
> >>> now fails in Set B, the system is shot.
> >>
> >> No, this is not correct. The plex ("set") only fails when all drives
> >> in it fail.
> >
> > I hope the following diagrams better illustrate what I was trying to
> > point out. Data striped across all the A's and that is mirrored to the B
> > Stripes:
> >
> > ...
> >
> > If A1 fails, then the A Stripe set cannot function (much like in Raid 0,
> > one disk fails the set) meaning that B now is the array:
>
> No, this is not correct.
>
> >>> Thus the striping of mirrors (rather than a mirror of striped sets)
> >>> is a more resilient and fault-tolerant setup of a multi-disk array.
> >>
> >> No, you're misunderstanding the current implementation.
> >
> > Perhaps I am ... but unless gvinum somehow reconstructs a 3 disk stripe
> > into a 2 disk stripe in the event one disk fails, I am now sure how.
>
> Well, you have the source code. It's not quite the way you look at
> it. It doesn't have stripes: it has plexes. And they can be
> incomplete.
> If a read to a plex hits a "hole", it automatically
> retries via (possibly all) the other plexes. Only when all plexes
> have a hole in the same place does the transfer fail.
>
> You might like to (re)read http://www.vinumvm.org/vinum/intro.html.
>

I was really hoping that the "holes in the plex" functioning was going
to work, but my tests have shown otherwise. I created a gvinum array
consisting of (A striped B) mirror (C striped D), which is the only such
mirror/stripe combination allowed by gvinum for four drives. We have:

    _________
    | A   B |__
    |_______|  |
               |Mirror
    _________  |
    | C   D |--|
    |_______|

Based on what the "plex hole" theory states, Drive A and Drive D could
both fail and the system would read through the holes and pick up data
from B and C (or the converse if B and C failed), functionally
equivalent to a stripe of mirrors. To fail a drive I rebooted
single-user, used dd to write /dev/zero over the beginning of the disk,
and then ran fdisk.

drive d device /dev/da4s1h
drive c device /dev/da3s1h
drive b device /dev/da2s1h
drive a device /dev/da1s1h
volume home
plex name home.p1 org striped 960s vol home
plex name home.p0 org striped 960s vol home
sd name home.p1.s1 drive d len 71681280s driveoffset 265s plex home.p1 plexoffset 960s
sd name home.p1.s0 drive c len 71681280s driveoffset 265s plex home.p1 plexoffset 0s
sd name home.p0.s1 drive b len 71681280s driveoffset 265s plex home.p0 plexoffset 960s
sd name home.p0.s0 drive a len 71681280s driveoffset 265s plex home.p0 plexoffset 0s

In my case:

                          Fail B      Fail B and C
A = /dev/da1s1h           up          up
B = /dev/da2s1h           down        down
C = /dev/da3s1h           up          down
D = /dev/da4s1h           up          up

1 Volume
V home2                   up          down (!)

2 Plexes
P home.p0 (A and B)       down        down
P home.p1 (C and D)       up          down

4 Subdisks
S home.p0.s0 (A)          up          up
S home.p0.s1 (B)          down        down
S home.p1.s0 (C)          up          down
S home.p1.s1 (D)          up          up

Based on this, failing the one drive did in fact fail the plex (home.p0).
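
To make the difference between the two behaviours concrete, here is a minimal Python sketch of the four-drive layout above. It is only a toy model, not gvinum code: the function names are hypothetical, and it assumes the two plexes are aligned so that subdisk 0 of one plex mirrors subdisk 0 of the other (same stripe size and subdisk length, as in the configuration shown). One function models what I observed (a single dead subdisk takes its whole plex down); the other models what the "plex hole" description would imply (a read succeeds as long as some plex has a live copy of that stripe column).

```python
from itertools import combinations

# Layout from the post: home.p0 stripes over drives a and b,
# home.p1 stripes over drives c and d, and the two plexes mirror each other.
PLEXES = {"home.p0": ["a", "b"], "home.p1": ["c", "d"]}
DRIVES = [d for subdisks in PLEXES.values() for d in subdisks]


def volume_up_whole_plex(failed):
    """Observed behaviour: one dead subdisk marks its whole plex down,
    so the volume needs at least one plex with every drive alive."""
    return any(
        all(d not in failed for d in subdisks) for subdisks in PLEXES.values()
    )


def volume_up_pass_through(failed):
    """Assumed "plex hole" behaviour: each stripe column only needs one
    surviving copy in any plex (equivalent to a stripe of mirrors)."""
    columns = zip(*PLEXES.values())  # pairs up (a, c) and (b, d)
    return all(any(d not in failed for d in column) for column in columns)


def survivable(model, n_failed=2):
    """List the failure sets of size n_failed that leave the volume up."""
    return [set(f) for f in combinations(DRIVES, n_failed) if model(set(f))]


if __name__ == "__main__":
    for name, model in [("whole-plex (observed)", volume_up_whole_plex),
                        ("pass-through (expected)", volume_up_pass_through)]:
        ok = survivable(model)
        print(f"{name}: survives {len(ok)} of 6 two-drive failures")
        print("  b and c failed ->", "up" if model({"b", "c"}) else "down")
```

Under these assumptions the whole-plex model survives only the two-drive failures that fall inside a single plex, while the pass-through model survives every combination except losing both copies of the same column. The "Fail B and C" column in the table above matches the whole-plex model.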
Although at that point I realized that failing either drive on the other
plex would also fail that plex and also the volume, I went ahead and
failed drive C also. The result was a failed volume.

With the failed B drive, once I bsdlabeled the disk to include the vinum
slice, I got the message that the plex was now stale (instead of down).
A simple gvinum start home2 changed the state to degraded and the system
rebuilt the array. When both drives failed I had to work a bit of a
kludge in. I ran gvinum setstate -f up home.p1.s0, then gvinum start
home.p0. At that point the system rebuilt itself and it would appear the
data is intact .. I have not completely tested or verified that last
statement however.

In essence, I had hoped that the "plex hole" mechanism would provide the
functional equivalent of my feature request (the ability to create a
striped set of mirrors), but that did not come to fruition. So please
note this as either a re-request for that feature or a bug report in
that the pass-through feature of gvinum plexes is broken.

Sven