thr3ads.net - CentOS - [CentOS] two 2-node clusters or one 4-node cluster? [Jul 2018]

Gianluca Cecchi

2018-Jul-05 15:27 UTC

[CentOS] two 2-node clusters or one 4-node cluster?

Hello,
I'm planning the migration of two current clusters based on CentOS 6.x with
Cman/Rgmanager to CentOS 7.x with Corosync/Pacemaker.

As the clusters and their services are on the same subnet, and there are no
particular security concerns differentiating them, I'm also evaluating the
option of merging the two clusters into a single 4-node one during the
upgrade.

Currently I'm testing a virtual 4-node CentOS 7.4 cluster inside oVirt 4.2
and things seem to behave well.

Before going deeper into testing, I'd like to ask the community how many
CentOS 7.x clusters with more than two nodes are in place, and what the
experience has been in terms of increased latency, communication overhead,
etc. when scaling out.

General feedback on CentOS 6 and scaling the number of cluster nodes is
also welcome.

Thanks in advance,
Gianluca

Digimer

2018-Jul-05 17:10 UTC

[CentOS] two 2-node clusters or one 4-node cluster?

On 2018-07-05 11:27 AM, Gianluca Cecchi wrote:
> Hello,
> I'm planning migration of current two clusters based on CentOS 6.x with
> Cman/Rgmanager going to CentOS 7.x and Corosync/Pacemaker.
> 
> As the clusters and their services are on the same subnet, and there no
> particular security concerns differentiating them, I'm also evaluating the
> option to transform the two clusters into a unique 4-node one during the
> upgrade.
> 
> Currently I'm testing a virtual 4-node CentOS 7.4 cluster inside oVirt 4.2
> and things seem to behave well.
> 
> Before going further in deep with tests and so on, I'd like to check with
> the community about how many CentOS 7.x clusters composed by more than two
> nodes are in place and what are the feedbacks on them in terms of
> incremented latency/communication, ecc scaling out.
> 
> Also general feedback related to CentOS 6 and scalability of cluster nodes
> number is welcome.
> 
> Thanks in advance,
> Gianluca
I always prioritize simplicity and isolation, so I vote for 2x 2-nodes.
There is no effective benefit to 3+ nodes (quorum is arguably helpful,
but proper stonith, which you need anyway, makes it mostly a moot point).

Keep in mind: if your services are critical enough to justify an HA
cluster, they're probably important enough that any hardware efficiency
savings don't offset the added complexity/overhead of larger clusters.
Lastly, with 2x 2-node, you could lose two nodes (one per cluster) and
still be operational. If you lose 2 nodes of a four node cluster, you're
offline.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein's brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

Alexander Dalloz

2018-Jul-05 19:07 UTC

[CentOS] two 2-node clusters or one 4-node cluster?

On 05.07.2018 at 17:27, Gianluca Cecchi wrote:
> Hello,
> I'm planning migration of current two clusters based on CentOS 6.x with
> Cman/Rgmanager going to CentOS 7.x and Corosync/Pacemaker.
> 
> As the clusters and their services are on the same subnet, and there no
> particular security concerns differentiating them, I'm also evaluating the
> option to transform the two clusters into a unique 4-node one during the
> upgrade.
> 
> Currently I'm testing a virtual 4-node CentOS 7.4 cluster inside oVirt 4.2
> and things seem to behave well.
> 
> Before going further in deep with tests and so on, I'd like to check with
> the community about how many CentOS 7.x clusters composed by more than two
> nodes are in place and what are the feedbacks on them in terms of
> incremented latency/communication, ecc scaling out.
> 
> Also general feedback related to CentOS 6 and scalability of cluster nodes
> number is welcome.
> 
> Thanks in advance,
> Gianluca
From my point of view such classical cluster setups are so 2000s,
outdated by modern infrastructure concepts you see implemented in
Kubernetes, OpenShift or cloud solutions in general. It's commonly
summarized in the phrase "pets versus cattle". You don't want clusters
to be treated as pets. They have always been difficult to maintain.

Obviously I don't know what you run on your old cluster and whether you 
can migrate to a modern setup instead of replicating it on a current 
major release. You didn't give us details.

Alexander

Warren Young

2018-Jul-06 16:17 UTC

[CentOS] two 2-node clusters or one 4-node cluster?

On Jul 5, 2018, at 11:10 AM, Digimer <lists at alteeve.ca> wrote:
>
> There is no effective benefit to 3+ nodes
That depends on your requirements and definition of terms.

For one very useful (and thus popular) definition of "success" in distributed
computing, you need at least four nodes, and that only allows you to catch a
single failed node at a time:

http://lamport.azurewebsites.net/pubs/reaching.pdf

See section 4 for the mathematical proof. This level of redundancy is necessary
to achieve what is called Byzantine fault tolerance, which is where we cannot
trust all of the nodes implicitly, and thus must achieve consistency by
consensus and cross-checks.

A less stringent success criterion assumes a less malign environment and allows
for a minimum of three nodes to prevent system failure in the face of a single
fault at a time:

http://lamport.azurewebsites.net/pubs/lower-bound.pdf

With your suggested 2-node setup, you only get replication, which is not the
same thing as distributed fault tolerance:

http://paxos.systems/how.html

To see the distinction, consider what happens if one node in a mirrored pair
goes down and then the one remaining up is somehow corrupted before the downed
node comes back up: the second will be corrupted as soon as it comes back up
because it mirrors the corruption.

The n >= 2f+1 criterion prevents this problem in the face of normal hardware
or software failure, with a minimum of 3 nodes required to reliably detect and
cope with a single failure at a time.

The more stringent n >= 3m+1 criterion prevents this problem in the face of
nodes that may be actively hostile, with 4 nodes being required to reliably
catch a single traitorous node.
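
To make the two bounds concrete, here is a minimal sketch in Python. The function names are my own, chosen for illustration; the formulas are the crash-fault bound n >= 2f+1 and the Byzantine bound n >= 3m+1 from the Lamport papers linked in this message, which give 3 and 4 nodes respectively for a single fault.

```python
def min_nodes_crash(f: int) -> int:
    """Smallest cluster that still has a majority after f crash failures (n >= 2f+1)."""
    return 2 * f + 1


def min_nodes_byzantine(m: int) -> int:
    """Smallest cluster that can tolerate m traitorous nodes (n >= 3m+1)."""
    return 3 * m + 1


if __name__ == "__main__":
    for faults in (1, 2):
        print(
            f"faults={faults}: "
            f"crash-tolerant needs {min_nodes_crash(faults)} nodes, "
            f"Byzantine-tolerant needs {min_nodes_byzantine(faults)} nodes"
        )
```

Neither bound is met by a 2-node cluster, even for a single fault.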

That terminology comes from one of the most important papers in distributed
computing, "The Byzantine Generals Problem," co-authored by Leslie Lamport, who
was also involved in all of the above work:

https://www.microsoft.com/en-us/research/uploads/prod/2016/12/The-Byzantine-Generals-Problem.pdf

And who is Leslie Lamport? He is the 2013 Turing Award winner, which is as
close to the Nobel Prize as you can get with work in pure computer science:

https://en.wikipedia.org/wiki/Leslie_Lamport

So, if you want to argue with the papers above, you'd better bring some pretty
good arguments. :)

If his current affiliation with Microsoft bothers you, realize that he did all
of this work prior to joining Microsoft in 2001.

He's also the "La" in LaTeX. :)

> (quorum is arguably helpful, but proper stonith, which you need anyway,
> makes it mostly a moot point).
STONITH is orthogonal to the concepts expressed in the CAP theorem:

https://en.wikipedia.org/wiki/CAP_theorem

It is mathematically impossible to escape the restrictions of the CAP theorem.
I've seen people try, but it inevitably amounts to Humpty Dumpty logic:
"When I use a word," Humpty Dumpty said, in rather a scornful tone,
"it means just what I choose it to mean - neither more nor less." You can
win as many arguments as you like if you get to redefine the foundational terms
upon which the argument is based to suit your needs at different stages of the
argument.

With that understanding, we can say that a 2-node setup results in one of the
following consequences:

OPTION 1: Give up Consistency (C)
---------------------------------
If you give up consistency, you get an AP system:

- While one node in a mirrored pair is down, the other is up giving you
availability (A), assuming all clients can normally see both nodes.

- Since you give up on consistency (C), you can put the two nodes in different
data centers to gain partition tolerance (P) over the data paths within and
between those data centers. This only gets you availability as well if both data
centers can be seen by all clients.

In other words, you can tolerate either a loss of a single node or a partition
between them, but not both at the same time, and while either condition applies,
you cannot guarantee consistency in query replies.

This mode of operation is sometimes called "eventual consistency," meaning that
it's expected that there will be periods of time where multiple nodes are online
but they don't all respond to identical queries with the same data.

OPTION 2: Require Consistency
-----------------------------
In order to get consistency, a 2-node system behaves like it is a bare-minimum
quorum in a faux 3-node cluster, with the third node always MIA. As soon as one
of the two "remaining" nodes goes down, you have two bad choices:

1. Continue to treat it as a cluster. Since you have no second node to check
your transactions against to maintain consistency, the cluster must stop write
operations until the downed node is restored. Read-only operations can
continue, if we're willing to accept the risk of the remaining system somehow
becoming corrupted without going down entirely.

2. Split the cluster so that the remaining node is a standalone instance, which
means you have no distributed system any more, so the CAP theorem no longer
applies. You might continue to have C-as-in-ACID, if your software stack is
ACID-compliant in the first place, but you lose C-as-in-CAP until the second
system comes back up and is restored to full operation.

We can have a CA cluster in one of the two senses above. Only the first is a
true cluster, though, and it's not useful in many applications, since it blocks
writes.

CP has no useful meaning in this situation, since a network partition will split
the cluster, so that you don't actually have partition tolerance. This is why
the minimum number of nodes is 3: so that a quorum remains on one side of the
partition!
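
The quorum argument can be checked with a few lines of Python. This is only a sketch of a plain strict-majority rule with one vote per node; the helper names are mine and it does not model any particular cluster stack.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Strict majority: more than half of all votes must be present."""
    return votes_present > total_votes // 2


def quorate_sides(sides: list[int]) -> list[int]:
    """Given the sizes of each side of a partition, return the sides that keep quorum."""
    total = sum(sides)
    return [size for size in sides if has_quorum(size, total)]


if __name__ == "__main__":
    # 2 nodes split 1/1: neither side holds a majority.
    print("2-node split 1/1:", quorate_sides([1, 1]))
    # 3 nodes split 2/1: the 2-node side keeps quorum.
    print("3-node split 2/1:", quorate_sides([2, 1]))
    # 4 nodes split 2/2: again no side holds a majority.
    print("4-node split 2/2:", quorate_sides([2, 2]))
```

With an odd node count, any two-way split leaves exactly one side with a majority; with an even count, an even split leaves none.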

Regardless of which path you pick above, you lose something relative to an
n >= 2f+1 or an n >= 3m+1 cluster. TANSTAAFL.

> Lastly, with 2x 2-node, you could lose two nodes
> (one per cluster) and still be operational. If you lose 2 nodes of a
> four node cluster, you're offline.
I don't see how both statements can be true under the same CAP design
restrictions, as laid out above. I think if you analyze the configuration of
your different scenarios, you'll find that they're not in the same CAP regime.

Lacking more information, I'm guessing this 2x2 configuration you're thinking of
is AP, and your 4-min-3 configuration is CP.

A 4-node AP configuration could lose 2 nodes and keep running, as long as all
clients can see at least one of the remaining nodes, since you've given up on
continual consistency.

Warren Young

2018-Jul-06 16:27 UTC

[CentOS] two 2-node clusters or one 4-node cluster?

On Jul 5, 2018, at 1:07 PM, Alexander Dalloz <ad+lists at uni-x.org> wrote:
> 
> classical cluster setups are so 2000s. Outdated by modern infrastructure
> concepts you see implemented in Kubernetes, OpenShift or cloud solutions in
> general. It's commonly summarized in the phrase "pets versus cattle". You
> don't want clusters to be treated as pets.
That depends on how expensive it is to grow the herd, to extend your metaphor.

For example, if you have a big DBMS, it's probably much, much faster to maintain
a live spare that you can fail over to instantly than it is to spin up a new VM
with OpenKuberStack™ and clone the complete DBMS over the Internet.  At 1
Gbit/sec, every 10 TB costs you about a day of replication time!
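
The "about a day" figure is easy to verify. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Gbit = 10^9 bits) and a link that is fully saturated with no protocol overhead, so real transfers would take longer; the function name is just for illustration.

```python
def replication_hours(terabytes: float, gbit_per_sec: float) -> float:
    """Hours needed to copy `terabytes` of data over a `gbit_per_sec` link.

    Assumes decimal units and a fully saturated link with no overhead.
    """
    bits = terabytes * 1e12 * 8            # bytes -> bits
    seconds = bits / (gbit_per_sec * 1e9)  # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    hours = replication_hours(10, 1)
    print(f"10 TB at 1 Gbit/sec: {hours:.1f} hours")
```

That works out to roughly 22 hours for 10 TB, before any overhead.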

The herd-of-cattle model and follow-ons like "serverless" assume you can spin a
new one up in a second or so.  That ain't always possible.

Gianluca Cecchi

2018-Jul-07 13:54 UTC

[CentOS] two 2-node clusters or one 4-node cluster?

On Thu, Jul 5, 2018 at 7:10 PM, Digimer <lists at alteeve.ca> wrote:

First of all, thanks for all your answers, all useful in one way or another. I
have yet to dig deeply enough into Warren's considerations, but I will do
it, I promise! Very interesting arguments.
Alexander's concerns are valid in an ideal world, but when your role is
that of an IT consultant and you are not responsible for the budget or for
the department, it is not so easy to convince people about emerging concepts
that, by their nature, are not yet so rock solid and accepted.
In my working life I have had the fortune to be on both sides of the IT
chair, so I think I'm able to see all the points.
E.g. in 2004 I was the IT manager of a small company (without responsibility
for the budget, so I had to convince my CEO at the time; company revenue about
50 million euros) and I migrated the physical environment to VMware and
a Dell CX300 SAN, but it was not so easy, believe me. I left the company
at the end of 2007 and the same untouched 3-year-old environment ran for
another 4 years without any modification or problems.
And bear with me: at least in Italy in 2004, it wasn't such a common
environment to set up for production.

> I always prioritize simplicity and isolation, so I vote for 2x 2-nodes.
> There is no effective benefit to 3+ nodes (quorum is arguably helpful,
> but proper stonith, which you need anyway, makes it mostly a moot point).

In this particular scenario I run several Oracle RDBMS instances. They are
currently distributed as 3 big ones on the first cluster and 7 smaller ones
on the other, with a chance of growing.
So in my case I think I can spread the load better and have better high
availability.

> Keep in mind; If your services are critical enough to justify an HA
> cluster, they're probably important enough that adding the
> complexity/overhead of larger clusters doesn't offset any hardware
> efficiency savings.

Probably true for the old RHCS stack, based on Cman/Rgmanager. But from various
tests it seems Corosync/Pacemaker is much smoother at managing clusters of
more than 2 nodes.

> Lastly, with 2x 2-node, you could lose two nodes
> (one per cluster) and still be operational. If you lose 2 nodes of a
> four node cluster, you're offline.

This is true with the default configuration, but you can configure Auto Tie
Breaker (ATB), as you can see in "man votequorum" or on an example web page
here:
https://www.systutorials.com/docs/linux/man/5-votequorum/

I just tested and verified it on my virtual 4-node cluster based on CentOS 7.4,
where I have:

- modified corosync.conf on all nodes
- pcs cluster stop --all
- pcs cluster start --all
- waited a few minutes for resources to start
- shut down cl3 and cl4

and this is the situation at the end, without downtime and with the cluster
quorate:

[root@cl1 ~]# pcs status
Cluster name: clorarhv1
Stack: corosync
Current DC: intracl2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Sat Jul  7 15:25:47 2018
Last change: Thu Jul  5 18:09:52 2018 by root via crm_resource on intracl2

4 nodes configured
15 resources configured

Online: [ intracl1 intracl2 ]
OFFLINE: [ intracl3 intracl4 ]

Full list of resources:

 Resource Group: DB1
     LV_DB1_APPL (ocf::heartbeat:LVM): Started intracl1
     DB1_APPL (ocf::heartbeat:Filesystem): Started intracl1
     LV_DB1_CTRL (ocf::heartbeat:LVM): Started intracl1
     LV_DB1_DATA (ocf::heartbeat:LVM): Started intracl1
     LV_DB1_RDOF (ocf::heartbeat:LVM): Started intracl1
     LV_DB1_REDO (ocf::heartbeat:LVM): Started intracl1
     LV_DB1_TEMP (ocf::heartbeat:LVM): Started intracl1
     DB1_CTRL (ocf::heartbeat:Filesystem): Started intracl1
     DB1_DATA (ocf::heartbeat:Filesystem): Started intracl1
     DB1_RDOF (ocf::heartbeat:Filesystem): Started intracl1
     DB1_REDO (ocf::heartbeat:Filesystem): Started intracl1
     DB1_TEMP (ocf::heartbeat:Filesystem): Started intracl1
     VIP_DB1 (ocf::heartbeat:IPaddr2): Started intracl1
     oracledb_DB1 (ocf::heartbeat:oracle): Started intracl1
     oralsnr_DB1 (ocf::heartbeat:oralsnr): Started intracl1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@cl1 ~]#
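
As I read "man votequorum", with auto_tie_breaker enabled an exact 50/50 split is resolved in favour of the partition that contains the lowest node ID. The Python sketch below is only a toy model of that rule, not corosync's actual algorithm; the function name is made up and I am assuming node IDs 1 to 4 correspond to intracl1 to intracl4 with one vote each.

```python
def is_quorate(alive_ids: set[int], all_ids: set[int],
               auto_tie_breaker: bool = False) -> bool:
    """Toy model: strict majority, plus a lowest-node-ID tie-breaker on 50/50 splits."""
    votes = len(alive_ids)
    total = len(all_ids)
    if votes > total // 2:
        return True  # plain majority
    if auto_tie_breaker and votes * 2 == total:
        # Exactly half the votes: the side holding the lowest node ID wins.
        return min(all_ids) in alive_ids
    return False


if __name__ == "__main__":
    cluster = {1, 2, 3, 4}
    print("nodes 1,2 alive, ATB off:", is_quorate({1, 2}, cluster))
    print("nodes 1,2 alive, ATB on: ", is_quorate({1, 2}, cluster, auto_tie_breaker=True))
    print("nodes 3,4 alive, ATB on: ", is_quorate({3, 4}, cluster, auto_tie_breaker=True))
```

In this model the half that does not hold the lowest node ID stops, so which pair of nodes survives a 2/2 split depends on where the node IDs are placed.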

This 4-node scenario also seems suitable for my needs because each of
the current 2-node clusters is a stretched one, with one node in site A
and the other in site B.
In the future scenario there will be 2 nodes of the new cluster in site A
and 2 nodes in site B, so the failure of a site will take out 2 nodes, but
with the setting above I can still provide all 10 RDBMS services, spread
across the two surviving nodes, deciding where to put them rather than
being forced onto a single node.

BTW: there is also the last_man_standing option, which I can set so that I
can also tolerate the loss of site B and, while that is not yet resolved, the
loss of one of the two surviving nodes in site A (in that case I will
possibly disable some less critical services or tolerate degraded performance).

In that case the configuration would be (not tested yet):

 quorum {
           provider: corosync_votequorum
           expected_votes: 4
           last_man_standing: 1
           auto_tie_breaker: 1
       }

Note that it is applied only on node loss and not when a node leaves the
cluster in a clean state, for which there is the "allow_downscale" option,
which does not seem to be fully supported at the moment.
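
To see why last_man_standing matters after a site loss, here is the vote arithmetic as I understand it from "man votequorum": the quorum threshold is expected_votes // 2 + 1, and last_man_standing lowers expected_votes to the number of surviving votes once the remaining partition has been stable for the configured window. This Python sketch only does the arithmetic, with one vote per node assumed; it is not a model of corosync itself.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum with a simple majority rule."""
    return expected_votes // 2 + 1


if __name__ == "__main__":
    expected = 4
    print(f"4 nodes up: expected_votes={expected}, quorum={quorum_threshold(expected)}")

    # Site B is lost: 2 votes remain, below the threshold of 3, so only
    # the tie-breaker keeps the surviving half quorate.
    alive = 2
    print(f"site B lost: votes={alive}, quorum={quorum_threshold(expected)}")

    # Assumed last_man_standing behaviour: expected_votes is recalculated
    # to the surviving vote count once the window has elapsed.
    expected = alive
    print(f"after recalculation: expected_votes={expected}, quorum={quorum_threshold(expected)}")

    # One more node is lost: 1 vote out of 2 is again an exact 50% split,
    # which is why the tie-breaker is needed for the final step as well.
    alive = 1
    print(f"one more node lost: votes={alive}, quorum={quorum_threshold(expected)}")
```
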

Cheers,
Gianluca
