Tijl Van den Broeck
2006-Dec-19 14:55 UTC
[Xen-users] bonding combined with network-bridge fails heartbeat cluster on dom0
When setting up heartbeat on several dom0s to use EVMS on SLES10, I've hit a problem. Bonding with network-bridge was solved with a custom network script posted by someone on the list:

bond-network-script:
    #!/bin/sh
    dir=$(dirname "$0")
    "$dir/network-bridge" "$@" vifnum=0 netdev=bond0

With that, domU networking and bond0 were operational.

Whenever the networking for Xen is started, the EVMS-HA cluster dies: it loses connectivity over multicast. tcpdump sees the packets coming and going. On the "bad" side it sees the other node going away. On the "healthy" side (Xen networking not yet started) heartbeat logs the following:

    Dec 19 15:50:06 xendev heartbeat: [6676]: ERROR: Message hist queue is filling up (200 messages in queue)
    Dec 19 15:50:06 xendev heartbeat: [6676]: debug: hist->ackseq =2459
    Dec 19 15:50:06 xendev heartbeat: [6676]: debug: hist->lowseq =2633, hist->hiseq=2833
    Dec 19 15:50:06 xendev heartbeat: [6676]: debug: expecting from xendev2
    Dec 19 15:50:06 xendev heartbeat: [6676]: debug: it's ackseq=2459

As soon as I run

    /etc/xen/scripts/bond-networking-script stop

all the trouble is over.

Is this a specific multicast issue with the combination bonding + heartbeat + network-bridge? Has anyone got a config like this tested or running?

greetings
Tijl Van den Broeck

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Tijl Van den Broeck
2006-Dec-20 09:13 UTC
Re: [Xen-users] bonding combined with network-bridge fails heartbeat cluster on dom0
On 12/20/06, Ulrich Windl <ulrich.windl@rz.uni-regensburg.de> wrote:
> On 19 Dec 2006 at 15:55, Tijl Van den Broeck wrote:
>
> > When setting up heartbeat for several dom0's for usage of EVMS on
> > SLES10 I've hit a problem. Bonding with network-bridge was solved with
> > a custom network script from someone of the list:
> >
> > bond-network-script:
> > #!/bin/sh
> > dir=$(dirname "$0")
> > "$dir/network-bridge" "$@" vifnum=0 netdev=bond0
>
> I did not manage to get a stable bridge networking with bonding and SLES10 that
> way. Maybe tell us which bonding mode you are using at least.
>

I've tested it using mode 1 and mode 5. The NICs are HP NetXtreme II BCM5708S Gigabit Ethernet, if that's relevant.

Now that you mention it, I did have some issues inside the domUs: I had to "fix" the gateway's ARP table entry with a tiny boot-up script, since the domU appears to receive a corrupted ARP entry. After that fix everything is stable, failover works and speed is as good as it gets.

Thinking about that fix, it could be that ARP in dom0 gets corrupted as well, so I tested adding fixed ARP entries on my heartbeat dom0s... but heartbeat still goes down if I bring up the Xen networking. Other IP traffic works just fine.
Ulrich Windl
2006-Dec-20 12:35 UTC
Re: [Xen-users] bonding combined with network-bridge fails heartbeat cluster on dom0
On 20 Dec 2006 at 10:13, Tijl Van den Broeck wrote:

[...]
> Thinking about that fix, it could be arp in dom0 gets corrupt as well,
> so I tested adding fixed arp entries in my heartbeat dom0's... but
> still, heartbeat goes down if I bring up the xen networking. Other IP
> traffic works just fine.

Have you tried tcpdump (looking for heartbeat packets) or ethereal on the domU device? And then on the pdev (bond) device?

Regards,
Ulrich
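A per-interface capture along the lines Ulrich suggests could be scripted roughly as follows. This is only a sketch: the helper name hb_capture_cmd is hypothetical, the interface names (bond0, xenbr0, vif0.0) are taken from elsewhere in this thread, and UDP port 694 is assumed to be the heartbeat port in use. The function prints the tcpdump command instead of running it, so the list can be reviewed first.

```shell
#!/bin/sh
# Sketch: print one tcpdump command per interface so heartbeat traffic
# can be compared hop by hop (physical/bond device, bridge, vif).
# Assumption: heartbeat uses UDP port 694; pass another port as $2 if not.

hb_capture_cmd() {
    iface=$1
    port=${2:-694}
    # -n: do not resolve names, -i: interface to listen on
    printf 'tcpdump -n -i %s udp port %s\n' "$iface" "$port"
}

# Interface names are assumptions taken from this thread; adjust as needed.
for iface in bond0 xenbr0 vif0.0; do
    hb_capture_cmd "$iface"
done
```

Running each printed command as root in its own terminal shows on which hop the packets from the other node stop appearing.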
Tijl Van den Broeck
2006-Dec-21 11:40 UTC
Re: [Xen-users] bonding combined with network-bridge fails heartbeat cluster on dom0
On 12/20/06, Ulrich Windl <ulrich.windl@rz.uni-regensburg.de> wrote:
> Have you tried tcpdump (looking for heartbeat packets) or ethereal for the DomU
> device? Then on the pdev (bond) device?
>
> Regards,
> Ulrich
>

My primary concern for the moment is to fix dom0 networking before thinking about the domUs :-)

I've run tcpdumps in a healthy environment (just eth0 + heartbeat + xenbr0) and in the bonded one... the contents of the HA packets are exactly the same: NS_ackmgs and status ones.

I looked into the ARP issues, but realised I couldn't have any, because heartbeat is multicasting, and the ARP multicast flags seem to be set. I enabled multicast in the function create_bridge from xen-network-common.sh, so the bridge should support multicast.

Perhaps the software bridge is not forwarding the multicasts properly in xenbr0. I also tried the network-bridge script from Xen 3.0.3 under SLES 10, but that one doesn't bring up the network at all.
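Whether the bridge device really carries the MULTICAST flag after such a script change can be checked from the `ip link show` output. The sketch below assumes iproute2 is installed and that the bridge is called xenbr0; the helper name has_multicast_flag is hypothetical. Note that the flag only says the interface accepts multicast; it does not prove that the bridge forwards it.

```shell
#!/bin/sh
# Sketch: succeed if the first line of `ip link show dev IFACE` output
# (read from stdin) lists MULTICAST among the interface flags,
# e.g. "5: xenbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ..."

has_multicast_flag() {
    # Extract the text between < and > on line 1, split the flags onto
    # separate lines, and look for an exact MULTICAST entry.
    sed -n '1s/^[^<]*<\([^>]*\)>.*/\1/p' | tr ',' '\n' | grep -qx 'MULTICAST'
}

# Example use (as root; the interface name is an assumption):
#   ip link show dev xenbr0 | has_multicast_flag \
#       || ip link set dev xenbr0 multicast on
```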
Tijl Van den Broeck
2006-Dec-22 09:36 UTC
Re: [Xen-users] bonding combined with network-bridge fails heartbeat cluster on dom0
Issue resolved, in a weird and strange manner which I'm not fond of.

Running lots of tcpdumps taught me that multicast traffic reached the following interfaces (from and to). Here xendev is the normal eth0 + Xen bridging machine, xendev2 is the bond0 + Xen bridging machine, and the tcpdumps were taken on xendev2:

    xenbr0:  multicasts xendev
    vif0.0:  multicasts xendev
    bond0:   multicasts xendev, xendev2

The bridge doesn't "forward" multicasts (although it should: I modified the scripts so that it should, but it doesn't). I tried setting ha.cf to

    mcast xenbr0 224.224.224.1 694 2 0

but that didn't help either.

Having narrowed this down to a software bridge multicast problem prompted me to try unicast instead. Setting the following allowed the HA cluster to come back up:

    xendev:/etc/ha.d/ha.cf     "ucast eth0 xendev2"
    xendev2:/etc/ha.d/ha.cf    "ucast xenbr0 xendev"

Ulrich, if you want I can mail you the entire setup so you can try to reproduce it and get bonding working as well. I've read about your weird bonding issues too (http://lists.xensource.com/archives/html/xen-users/2006-11/msg00504.html), but two bonding interfaces could of course cause some more problems.
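The switch from mcast to ucast in ha.cf can be scripted as a small filter. This is a sketch under assumptions: the helper name to_ucast is hypothetical, it simply comments out every existing mcast line and appends one ucast line, and the result should be reviewed by hand before it replaces /etc/ha.d/ha.cf.

```shell
#!/bin/sh
# Sketch: read an ha.cf on stdin, comment out any "mcast ..." directives,
# and append a single "ucast IFACE PEER" directive on stdout.

to_ucast() {
    iface=$1
    peer=$2
    # Prefix lines that start with "mcast" with "#" (ha.cf comment marker).
    sed 's/^[[:space:]]*mcast[[:space:]]/#&/'
    printf 'ucast %s %s\n' "$iface" "$peer"
}

# Example use, with the values reported in this thread:
#   on xendev:   to_ucast eth0 xendev2  < /etc/ha.d/ha.cf > /tmp/ha.cf.new
#   on xendev2:  to_ucast xenbr0 xendev < /etc/ha.d/ha.cf > /tmp/ha.cf.new
```

The two nodes need different interface arguments because, as reported above, the bonded node talks through xenbr0 while the plain node uses eth0.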