In 2-node clusters, never allow cman or rgmanager to start on boot. A
node will reboot for one of two reasons: it was fenced, or it was taken
down for scheduled maintenance. In the former case, you want to review
what happened before restoring it to the cluster. In the latter case, a
human is already there to start it by hand. This is good advice for
3+ node clusters as well.
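On EL6 that means disabling the SysV init scripts; a minimal sketch
(service names as shipped with the RHCS packages; clvmd only applies if
you use clustered LVM):

```shell
# Keep the cluster stack from starting automatically at boot (EL6 SysV init)
chkconfig cman off
chkconfig rgmanager off
chkconfig clvmd off   # only if clustered LVM is in use
```

After a fence or a maintenance reboot, start them by hand with
`service cman start` followed by `service rgmanager start` once you are
satisfied the node is healthy.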
As an aside, the default timeout to wait for the peer on start is 6
seconds, which I find too short. I raise it to 30 seconds with:
<fence_daemon post_join_delay="30" />
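For context, fence_daemon goes directly under <cluster> in cluster.conf;
a sketch against your config below (remember to bump config_version
whenever you change the file):

```xml
<cluster config_version="8" name="web-cluster">
	<fence_daemon post_join_delay="30"/>
	<!-- clusternodes, cman, fencedevices unchanged -->
</cluster>
```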
As for the fence-on-start, it could be a network issue. Have you tried
unicast instead of multicast? Try this:
<cman transport="udpu" expected_votes="1" two_node="1"/>
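Also, your logs show "error from agent" before the eventual success, so
it is worth running the fence agent by hand from each node to see the
actual error. A sketch, using the credentials from your posted config;
the option letters are from memory, so check `fence_rhevm -h`:

```shell
# Manually test fencing from web2 against web3's "plug" (VM name).
# -a = RHEV-M address, -l = login, -p = password, -z = use SSL,
# -n = plug/port, -o = action (use "status" for a safe, read-only test).
fence_rhevm -a 192.168.1.1 -l admin@internal -p secret -z \
    -n web3.cluster -o status
```

If the status call fails here, it will fail when fenced calls it, too,
and the error message is usually much more informative on the command
line.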
A slight comment:
> When the cluster becomes quorate,
Nodes are always quorate in 2-node clusters.
digimer
On 29/10/14 04:44 AM, aditya hilman wrote:
> Hi Guys,
>
> I'm using CentOS 6.5 as a guest on RHEV, with RHCS, for a clustered web
> environment.
> The environment:
> web1.example.com
> web2.example.com
>
> When the cluster becomes quorate, web1 is rebooted by web2. When web2 comes
> back up, web2 reboots web1.
> Does anybody know how to solve this "fence loop"?
> master_wins="1" is not working properly, and neither is qdisk.
> Below is the cluster.conf. I re-created a "fresh" cluster, but the fence
> loop still exists.
>
> <?xml version="1.0"?>
> <cluster config_version="7" name="web-cluster">
> 	<clusternodes>
> 		<clusternode name="web2.cluster" nodeid="1">
> 			<fence>
> 				<method name="fence-web2">
> 					<device name="fence-rhevm" port="web2.cluster"/>
> 				</method>
> 			</fence>
> 		</clusternode>
> 		<clusternode name="web3.cluster" nodeid="2">
> 			<fence>
> 				<method name="fence-web3">
> 					<device name="fence-rhevm" port="web3.cluster"/>
> 				</method>
> 			</fence>
> 		</clusternode>
> 	</clusternodes>
> 	<cman expected_votes="1" two_node="1"/>
> 	<fencedevices>
> 		<fencedevice agent="fence_rhevm" ipaddr="192.168.1.1"
> 		 login="admin@internal" name="fence-rhevm" passwd="secret" ssl="on"/>
> 	</fencedevices>
> </cluster>
>
>
> Log : /var/log/messages
> Oct 29 07:34:04 web2 corosync[1182]: [QUORUM] Members[1]: 1
> Oct 29 07:34:08 web2 fenced[1242]: fence web3.cluster dev 0.0 agent
> fence_rhevm result: error from agent
> Oct 29 07:34:08 web2 fenced[1242]: fence web3.cluster failed
> Oct 29 07:34:12 web2 fenced[1242]: fence web3.cluster success
> Oct 29 07:34:12 web2 clvmd: Cluster LVM daemon started - connected to CMAN
> Oct 29 07:34:12 web2 rgmanager[1790]: I am node #1
> Oct 29 07:34:12 web2 rgmanager[1790]: Resource Group Manager Starting
>
>
> Thanks
>
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?