Hey all:

It seems like my question is related to HA, DRBD and Xen, hence posting to all of the lists at once.

I have a two-node setup with Xen 3.0.3, drbd82 and heartbeat 2 under CentOS 5.2. While testing this cluster for high availability, I noticed some issues:

1) domA is running on node1. When I manually shut down node1, sometimes domA is migrated automatically to node2 and sometimes it is restarted on node2. Why is this happening?

2) domA is running on node1. When I pull out the network cable, domA is restarted on node2 with no problem. But when node1 comes back, domA is not migrated back to node1, and if I run "xm list" on node1, I see "migrating-domain". This is complicating everything.

My ha.cf file looks like this:

    logfacility local0
    udpport 694
    keepalive 1
    deadtime 5
    warntime 3
    initdead 10
    ucast eth0 10.42.40.198
    ucast eth0 10.42.40.26
    auto_failback on
    watchdog /dev/watchdog
    debugfile /var/log/ha-debug
    node ha1.domain.local
    node ha2.domain.local

Help!

Thanks in advance,
Paras.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
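The timing directives in the ha.cf above (keepalive, warntime, deadtime, initdead) depend on each other, so a quick consistency check can help before blaming failover behaviour on something else. Below is a minimal Python sketch, not part of heartbeat: `parse_ha_cf` and `check_timings` are hypothetical helper names, the values are assumed to be plain seconds (no "ms" suffix), and the two rules checked are rules of thumb (warnings should fire before a node is declared dead, and initdead is commonly recommended to be at least twice deadtime), not an exhaustive validation.

```python
# Sketch only: parse_ha_cf and check_timings are hypothetical helpers,
# not functions shipped with heartbeat.
HA_CF = """\
logfacility local0
udpport 694
keepalive 1
deadtime 5
warntime 3
initdead 10
ucast eth0 10.42.40.198
ucast eth0 10.42.40.26
auto_failback on
watchdog /dev/watchdog
debugfile /var/log/ha-debug
node ha1.domain.local
node ha2.domain.local
"""


def parse_ha_cf(text):
    """Return {directive: [argument strings]}; repeated directives accumulate."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        directives.setdefault(name, []).append(value)
    return directives


def check_timings(directives):
    """Return a list of warnings; assumes values are plain seconds."""
    def seconds(name):
        return float(directives[name][-1])

    keepalive, warntime = seconds("keepalive"), seconds("warntime")
    deadtime, initdead = seconds("deadtime"), seconds("initdead")

    warnings = []
    if not keepalive < warntime < deadtime:
        warnings.append("expected keepalive < warntime < deadtime")
    if initdead < 2 * deadtime:
        warnings.append("initdead should be at least twice deadtime")
    return warnings


config = parse_ha_cf(HA_CF)
problems = check_timings(config)
print(problems or "timing directives look consistent")
```

With the values posted above, both rules of thumb hold, which suggests the timing directives themselves are not the obvious cause of the two issues.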
> Hey all:
>
> It seems like my question is related to HA, DRBD and Xen, hence posting
> to all of the lists at once.
> I have a two-node setup with Xen 3.0.3, drbd82 and heartbeat 2 under
> CentOS 5.2. While testing this cluster for high availability, I noticed
> some issues:
>
> 1) domA is running on node1. When I manually shut down node1, sometimes
> domA is migrated automatically to node2 and sometimes it is restarted
> on node2. Why is this happening?
> 2) domA is running on node1. When I pull out the network cable, domA is
> restarted on node2 with no problem. But when node1 comes back, domA is
> not migrated back to node1, and if I run "xm list" on node1, I see
> "migrating-domain". This is complicating everything.

1) Most likely live migration fails for some reason, and domA is therefore restarted on node2 instead. It could be a timer issue or a problem with the release of resources. You should be able to see something in the logs written during shutdown on node1.

2) Heartbeat on node1 will sense an error and try to migrate domA to node2 when node1 is up again. But node2 has already started domA, so you basically have domA running on both nodes. To avoid split-brain situations like this you should really use a STONITH device that can reboot the other node. A hardware device connected via serial cable is the most secure option; a cheaper alternative is a soft STONITH device that reboots the other node via SSH or telnet. You probably need to tweak heartbeat as well so that it performs further checks, for example testing connectivity to your gateway.

Do you have two NICs in both nodes, or are you running DRBD, HA and data traffic over the same NIC?

Regards,
Daniel
http://www.asplund.nu/xencluster.html
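The idea behind combining STONITH with a gateway connectivity test can be illustrated with a small decision table. This is a hedged Python sketch of the reasoning only, not how heartbeat implements it: `Action` and `decide` are hypothetical names, and the two boolean inputs stand in for whatever heartbeat and ping checks the cluster actually performs.

```python
from enum import Enum


class Action(Enum):
    KEEP_RUNNING = "keep running local domUs"
    TAKE_OVER = "fence the peer, then start its domUs here"
    STAND_DOWN = "do not start the peer's domUs; this node is isolated"


def decide(peer_heartbeat_ok, gateway_reachable):
    """Decide what a node should do, as seen from that node.

    peer_heartbeat_ok: heartbeats from the other node are still arriving.
    gateway_reachable: this node can still reach the gateway.
    """
    if peer_heartbeat_ok:
        # Cluster is healthy: nothing to take over.
        return Action.KEEP_RUNNING
    if gateway_reachable:
        # Peer is silent but our own network works: the peer is the
        # likely casualty, so fence it BEFORE starting its domUs.
        return Action.TAKE_OVER
    # Peer is silent and the gateway is unreachable: we are the one
    # cut off (e.g. our cable was pulled), so we must not take over.
    return Action.STAND_DOWN


# Cable pulled on node1: node1 sees neither peer nor gateway,
# node2 loses the peer but still reaches the gateway.
for node, peer_ok, gw_ok in [("node1", False, False), ("node2", False, True)]:
    print(node, "->", decide(peer_ok, gw_ok).value)
```

In the cable-pull scenario this means node2 is the one that fences node1, and node1 never tries to run domA a second time.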
On Sun, Oct 5, 2008 at 3:54 AM, Daniel Asplund <danielsaori@gmail.com> wrote:

> 1) Most likely live migration fails for some reason and therefore the
> domA is restarted in node2. Could be a timer issue or a problem with
> release of resources. You should be able to see something from the
> logs during shutdown on node1.
>
> 2) heartbeat on node1 will sense an error and try to migrate domA to
> node2 when node1 is up again. But the node2 has already started domA
> and you basically have domA running on both nodes. To avoid split
> situations like this you should really use a STONITH device that can
> reboot the other node, a hardware device connected via serial cable is
> most secure, but a cheaper alternative is to use soft stonith device
> that can reboot the other node via SSH or telnet. You probably need to
> tweak heartbeat as well to allow it to do further checks, for example
> test connectivity to your gateway.

Yes, it seems I need STONITH. At least for now I want to use the ssh STONITH plugin for testing purposes. One thing I am confused about is how to configure STONITH and what the typical practice is. In the above scenario, should node1 be rebooted, or node2?
What I did was add "stonith_host * ssh node2" to ha.cf on node1, and "stonith_host * ssh node1" to ha.cf on node2. But this is not working. Is that the way to configure STONITH? I have checked linux-ha.org and Google, but the confusion persists. What I want is this: if there is a network outage on node1, it should be automatically rebooted or shut down, migrating all domUs to node2.

> Do you have two NICs in both nodes or are you running DRBD, HA and
> data traffic over same NIC?

Daniel, yes, I have 2 NICs in both nodes.

> Regards, Daniel
> http://www.asplund.nu/xencluster.html

Thanks,
Paras.