Jeevan Patnaik
2019-Jan-25 08:10 UTC
[Gluster-users] Is it required for a node to meet quorum over all the nodes in storage pool?
Hi,

I'm going through the concepts of quorum and split-brain in clusters in general, and trying to understand GlusterFS quorums again, which I previously found difficult to understand accurately.

When we talk about server quorum, my understanding is that the concept is similar to STONITH in a cluster, i.e., we shoot the node that probably has issues, or bring its bricks down, preventing access altogether. But I don't get how the quorum is calculated.

My understanding:
In a distributed replicated volume,
1. All bricks in a replica set should receive the same data writes, and hence a quorum of at least 51% must be met within each replica set.

Now consider the following 3x replica configuration:
ServerA,B,C,D,E,F -> brickA,B,C,D,E,F respectively, and serverG in the storage pool without any brick.

Scenario:
ServerA,B,F form a partition, i.e., they are isolated from the other nodes in the storage pool.

The bricks on serverA,B,C belong to the same sub-volume. Hence, if we consider quorum over sub-volumes, A and B meet quorum for the only sub-volume they participate in and can serve the corresponding bricks, and the corresponding brick on C should go down.

But when we consider quorum over the storage pool, C,D,E,G meet quorum whereas A,B,F do not. Hence, the bricks on A,B,F should fail. And for C, quorum still will not be met for its sub-volume, so it will go into read-only mode. The sub-volume on D and E should work normally.

So, assuming that only sub-volume quorum is considered, we don't have any downtime on sub-volumes, but we have two partitions, and if clients can access both, they can still read and write on both partitions separately and without data conflict. The split-brain problem arises when some clients can access one partition and some the other.

If quorum is considered for the entire storage pool, then this split-brain will not be seen, as the problem nodes will be dead.

So why is it not mandatory to enable server quorum to avoid this split-brain issue?
I also assume that the quorum percentage should be greater than 50%. There's an option to set a custom percentage. Why is it required? If all that is required is to kill the problem partition (group of nodes) by identifying whether it has the largest possible share (i.e. greater than 50%), does the percentage really matter?

Thanks in advance!

Regards,
Jeevan.
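The sub-volume reasoning in the scenario above can be checked with a few lines of arithmetic. The following is only a sketch of that reasoning, not GlusterFS code: the sub-volume names, the helper `subvolume_quorum`, and the assumption that bricks D, E, F form the second replica set are all made up for illustration, and "quorum" here simply means a strict majority of a replica set's bricks being reachable.

```python
# Hypothetical layout: two replica-3 sub-volumes, one brick per server.
# The grouping of D, E, F into the second set is an assumption.
REPLICA_SETS = {
    "subvol-0": {"A", "B", "C"},
    "subvol-1": {"D", "E", "F"},
}


def subvolume_quorum(partition, replica_sets):
    """For each sub-volume, report whether the servers in `partition`
    hold a strict majority of that sub-volume's bricks."""
    return {
        name: 2 * len(bricks & partition) > len(bricks)
        for name, bricks in replica_sets.items()
    }


# The two sides of the network split described in the post.
side_one = subvolume_quorum({"A", "B", "F"}, REPLICA_SETS)
side_two = subvolume_quorum({"C", "D", "E", "G"}, REPLICA_SETS)

print("A,B,F side:  ", side_one)
print("C,D,E,G side:", side_two)
```

Under these assumptions, each sub-volume has a majority on exactly one side of the split, which matches the statement that neither sub-volume suffers downtime when only sub-volume quorum is considered.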
Atin Mukherjee
2019-Jan-31 03:33 UTC
[Gluster-users] Is it required for a node to meet quorum over all the nodes in storage pool?
On Fri, Jan 25, 2019 at 1:41 PM Jeevan Patnaik <g1patnaik at gmail.com> wrote:

> Hi,
>
> I'm just going through the concepts of quorum and split-brains with a
> cluster in general, and trying to understand GlusterFS quorums again which
> I previously found difficult to accurately understand.
>
> When we talk about server quorums, what I understand is that the concept
> is similar to STONITH in cluster i.e., we shoot the node that probably have
> issues/ make the bricks down preventing access at all. But I don't get how
> it calculates quorum.
>
> My understanding:
> In a distributed replicated volume,
> 1. All bricks in a replica set should have same data writes and hence, it
> is required to meet atleast 51% quorum on those replica sets. Now
> considering following 3x replica configuration:
> ServerA,B,C,D,E,F-> brickA,B,C,D,E,F respectively and serverG without any
> brick in storage pool.
>

Please note that server quorum isn't calculated based on the number of active bricks, but rather on the number of active nodes in the cluster. So in this case, even though server G doesn't host any bricks in the storage pool, quorum is decided by comparing the total number of servers/peers in the cluster with the number of active peers in the cluster. If you're interested in knowing more about it, please refer to staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/server-quorum or github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-server-quorum.c#L281 (in case you are happy to browse the source code and understand the logic).

> Scenario:
> ServerA,B,F formed a partition i.e., they are isolated with other nodes in
> storage pool.
>
> But serverA,B,C bricks are of same sub-volume, Hence if we consider quorum
> over sub-volumes, A and B meets quorum for it's only participating
> sub-volume and can serve the corresponding bricks. And the corresponding
> bricks on C should go down.
>
> But when we consider quorum over storage pool, C,D,E,G meets quorum
> whereas A,B,F is not.
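As a rough illustration of the peer-counting rule described in the reply, the check might be sketched in Python as below. This is not glusterd's code: the function name `server_quorum_met`, the strict-majority default, and the way a custom percentage is compared are assumptions made for the example, and the linked glusterd-server-quorum.c remains the authority on the real logic.

```python
def server_quorum_met(active_peers, total_peers, ratio_percent=None):
    """Sketch of a pool-wide quorum check that counts peers, not bricks.

    Assumptions (not taken from the GlusterFS source):
    - with no custom ratio, quorum means a strict majority of peers;
    - with a custom ratio, quorum means the active share of peers is
      at least `ratio_percent`.
    """
    if ratio_percent is None:
        return 2 * active_peers > total_peers
    return active_peers * 100 >= total_peers * ratio_percent


# Storage pool from the thread: servers A..G. G hosts no brick but is
# still a peer, so it counts towards the total.
POOL = {"A", "B", "C", "D", "E", "F", "G"}
side_one = {"A", "B", "F"}
side_two = {"C", "D", "E", "G"}

side_one_ok = server_quorum_met(len(side_one), len(POOL))
side_two_ok = server_quorum_met(len(side_two), len(POOL))
# Hypothetical stricter setting of 60%, chosen only for illustration.
side_two_ok_at_60 = server_quorum_met(len(side_two), len(POOL), ratio_percent=60)

print("A,B,F side meets quorum:  ", side_one_ok)
print("C,D,E,G side meets quorum:", side_two_ok)
print("C,D,E,G side at 60%:      ", side_two_ok_at_60)
```

With seven peers, the three-node side falls short of a majority while the four-node side has one, and server G tips the balance even though it hosts no brick. Under the assumed ratio handling, a stricter percentage would leave neither side with quorum.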