Dave Sherohman
2018-Feb-26 11:15 UTC
[Gluster-users] Quorum in distributed-replicate volume
I've configured 6 bricks as distributed-replicated with replica 2,
expecting that all active bricks would be usable so long as a quorum of
at least 4 live bricks is maintained.

However, I have just found

http://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/

which states that "In a replica 2 volume... If we set the client-quorum
option to auto, then the first brick must always be up, irrespective of
the status of the second brick. If only the second brick is up, the
subvolume becomes read-only."

Does this apply only to a two-brick replica 2 volume, or does it apply to
all replica 2 volumes, even if they have, say, 6 bricks total?

If it does apply to distributed-replicated volumes with >2 bricks, what's
the reasoning for it? I would expect that, if the cluster splits into
brick 1 by itself and bricks 2-3-4-5-6 still together, then brick 1 will
recognize that it doesn't have volume-wide quorum and reject writes, thus
allowing brick 2 to remain authoritative and able to accept writes.

-- 
Dave Sherohman
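For context, a 6-brick, replica 2 distributed-replicate volume like the one
described here would typically be created along these lines (the server
names and brick paths below are purely illustrative, not taken from the
thread):

  # three replica-2 pairs, six bricks total (hypothetical hosts/paths)
  gluster volume create myvol replica 2 \
      server1:/bricks/b1 server2:/bricks/b2 \
      server3:/bricks/b3 server4:/bricks/b4 \
      server5:/bricks/b5 server6:/bricks/b6
  gluster volume start myvol

Newer Gluster releases warn about split-brain risk when a plain replica 2
volume is created, which is essentially the issue discussed in the replies
below.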
Karthik Subrahmanya
2018-Feb-26 12:15 UTC
[Gluster-users] Quorum in distributed-replicate volume
Hi Dave,

On Mon, Feb 26, 2018 at 4:45 PM, Dave Sherohman <dave at sherohman.org> wrote:

> I've configured 6 bricks as distributed-replicated with replica 2,
> expecting that all active bricks would be usable so long as a quorum of
> at least 4 live bricks is maintained.

Client quorum is configured per replica subvolume, not for the entire
volume. Since you have a distributed-replicated volume with replica 2, the
data has 2 copies, and taking quorum across the total number of bricks in
the volume, as in your scenario, could lead to split-brains.

> However, I have just found
>
> http://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/
>
> Which states that "In a replica 2 volume... If we set the client-quorum
> option to auto, then the first brick must always be up, irrespective of
> the status of the second brick. If only the second brick is up, the
> subvolume becomes read-only."

By default, client-quorum is "none" in a replica 2 volume.

> Does this apply only to a two-brick replica 2 volume or does it apply to
> all replica 2 volumes, even if they have, say, 6 bricks total?

It applies to all replica 2 volumes, whether they have just 2 bricks or
more. The total brick count in the volume doesn't matter for quorum; what
matters is the number of bricks which are up in the particular replica
subvolume.

> If it does apply to distributed-replicated volumes with >2 bricks,
> what's the reasoning for it? I would expect that, if the cluster splits
> into brick 1 by itself and bricks 2-3-4-5-6 still together, then brick 1
> will recognize that it doesn't have volume-wide quorum and reject
> writes, thus allowing brick 2 to remain authoritative and able to accept
> writes.

If I understood your configuration correctly, it should look something
like this (please correct me if I am wrong):

replica-1: bricks 1 & 2
replica-2: bricks 3 & 4
replica-3: bricks 5 & 6

Since quorum is per replica, if it is set to auto then the first brick of
the particular replica subvolume needs to be up to perform the fop. In
replica 2 volumes you can end up in split-brains. It would be great if you
could consider configuring an arbiter or replica 3 volume. You can find
more details about their advantages over replica 2 volumes in the same
document.

Regards,
Karthik
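As a side note, a quick way to check which client-quorum setting a volume
is actually using (the volume name "myvol" below is a placeholder):

  # show the effective client-quorum setting for the volume
  gluster volume get myvol cluster.quorum-type

  # replica 2 defaults to "none"; "auto" enables the
  # first-brick-must-be-up behaviour described above
  gluster volume set myvol cluster.quorum-type auto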
Dave Sherohman
2018-Feb-26 12:44 UTC
[Gluster-users] Quorum in distributed-replicate volume
On Mon, Feb 26, 2018 at 05:45:27PM +0530, Karthik Subrahmanya wrote:

> > "In a replica 2 volume... If we set the client-quorum option to
> > auto, then the first brick must always be up, irrespective of the
> > status of the second brick. If only the second brick is up, the
> > subvolume becomes read-only."
>
> By default, client-quorum is "none" in a replica 2 volume.

I'm not sure where I saw the directions saying to set it, but I do have
"cluster.quorum-type: auto" in my volume configuration. (And I think
that's client quorum, but feel free to correct me if I've misunderstood
the docs.)

> It applies to all replica 2 volumes, whether they have just 2 bricks or
> more. The total brick count in the volume doesn't matter for quorum;
> what matters is the number of bricks which are up in the particular
> replica subvolume.

Thanks for confirming that.

> If I understood your configuration correctly, it should look something
> like this (please correct me if I am wrong):
> replica-1: bricks 1 & 2
> replica-2: bricks 3 & 4
> replica-3: bricks 5 & 6

Yes, that's correct.

> Since quorum is per replica, if it is set to auto then the first brick
> of the particular replica subvolume needs to be up to perform the fop.
>
> In replica 2 volumes you can end up in split-brains.

How would that happen if bricks which are not in (cluster-wide) quorum
refuse to accept writes? I'm not seeing the reason for using individual
subvolume quorums instead of full-volume quorum.

> It would be great if you could consider configuring an arbiter or
> replica 3 volume.

I can. My bricks are 2x850G and 4x11T, so I can repurpose the small
bricks as arbiters with minimal effect on capacity. What would be the
sequence of commands needed to:

1) Move all data off of bricks 1 & 2
2) Remove that replica from the cluster
3) Re-add those two bricks as arbiters

(And did I miss any additional steps?)

Unfortunately, I've been running a few months already with the current
configuration and there are several virtual machines running off the
existing volume, so I'll need to reconfigure it online if possible.

-- 
Dave Sherohman
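A rough sketch of the kind of command sequence this would involve, assuming
hypothetical host names and brick paths (this is not from the thread;
verify against the current Gluster documentation and test before touching a
live volume):

  # steps 1 & 2: drain and remove the replica pair holding bricks 1 & 2
  gluster volume remove-brick myvol server1:/bricks/b1 server2:/bricks/b2 start
  gluster volume remove-brick myvol server1:/bricks/b1 server2:/bricks/b2 status  # wait for "completed"
  gluster volume remove-brick myvol server1:/bricks/b1 server2:/bricks/b2 commit

  # step 3: re-add the freed disks as arbiter bricks, one per remaining
  # replica pair (old brick directories usually need to be wiped, or fresh
  # paths used, before Gluster will accept them again)
  gluster volume add-brick myvol replica 3 arbiter 1 server1:/bricks/arb1 server2:/bricks/arb2

  # trigger a heal so the arbiter bricks get populated with metadata
  gluster volume heal myvol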