Darren Thompson
2007-Nov-12 04:29 UTC
Re: [Xen-users] bonding combined with network-bridge fails heartbeat cluster on dom0
Help! I have a problem that I cannot resolve, and I am requesting your help to get in touch with the appropriate people to assist. I have successfully used the configuration instructions and customised XEN bonding script sourced from:

http://vandelande.com/guides/howto%20setup%20XEN%20using%20network%20bonding%20on%20SLES10.html

(original and modified XEN script attached), and it works well on single servers.

Problem Summary:
I can get any two of the three components (XEN, Heartbeat and bonding) working together, but when all three are used at once, Heartbeat fails to communicate and the servers kill each other via STONITH owing to a "split brain" condition. I initially encountered this problem on HP blade servers, but I have since recreated the same issue using VMware VMs (VM configs attached), so I am fairly confident that it is not a hardware-related issue. This configuration (without Heartbeat clustering) works on single servers without issue. If I do not team the NICs, both XEN and Heartbeat appear to work as expected, so the problem is the combination of all three.

Detailed description:
- Two servers running SLES10 SP1. Each has two network cards (a physical constraint on HP blades, hence the desire to use bonding to increase availability).
- The network cards are bonded to create a virtual bond0 interface (NIC and teaming config files attached).
- The two servers are configured to run Heartbeat (config files attached).
- When booted to the non-XEN kernel, both the NIC bonding and Heartbeat work without issue.
- When booted to the XEN kernel, Heartbeat fails to communicate (the protocol - broadcast, multicast or unicast - makes no difference), yet the servers can communicate successfully in all other respects.

I found this message thread on XenSource:

http://lists.xensource.com/archives/html/xen-users/2006-12/msg00650.html

The "work-around" described there does not appear to work in my case and is otherwise not suitable, but it does indicate that this issue has been identified previously and not resolved.

Since I now have this problem configuration running under VMware, I can provide a wealth of scripts, error logs etc. (I have attached the VMware config files for the VM servers to make it easier for someone to recreate my exact configuration.)

Nasty Work Around:
I have found that taking one of the two network cards out of the team and configuring it to connect to the same network allows Heartbeat to work. Although this completely defeats the purpose of using teaming, it does indicate that the issue lies in the way XEN modifies the bonding driver at startup. I can also get the servers to work if I do not attempt any bonding at all but configure the NICs separately: eth0 for XEN and eth1 for Heartbeat.

For anyone without the attachments, the general shape of the configuration is sketched at the end of this message (all names and addresses there are illustrative, not copied from the attached files).

Feel free to pass my contact details on to anyone you think can assist.

Regards

Darren Thompson
Professional Services Engineer
AkurIT
Level 24, Santos House
91 King William Street
Adelaide SA 5000 Australia
Tel: +61 8 8233 5873
Fax: +61 8 8233 5911
Mobile: +61 0400 640 414
Mail: darrent@akurit.com.au
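The bonded interface on SLES10 looks roughly like the following (standard /etc/sysconfig/network syntax; the mode, address and slave names here are illustrative, not copied from my attached files):

  # /etc/sysconfig/network/ifcfg-bond0 (illustrative values)
  BOOTPROTO='static'
  STARTMODE='auto'
  IPADDR='192.168.1.1'
  NETMASK='255.255.255.0'
  BONDING_MASTER='yes'
  BONDING_MODULE_OPTS='mode=active-backup miimon=100'
  BONDING_SLAVE0='eth0'
  BONDING_SLAVE1='eth1'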
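On the XEN side the how-to follows the usual wrapper pattern: xend is pointed at a small script that calls the stock network-bridge with the bond as its netdev, so the bridge is built over bond0 rather than the default eth0. This is only the general shape, not the modified script I have attached:

  #!/bin/sh
  # /etc/xen/scripts/network-bridge-bond (illustrative wrapper name)
  # Hand everything through to the stock script, but build the Xen
  # bridge over bond0 instead of the default eth0.
  dir=$(dirname "$0")
  "$dir/network-bridge" "$@" netdev=bond0

and in /etc/xen/xend-config.sxp:

  # point xend at the wrapper instead of the stock script
  (network-script network-bridge-bond)

(The same effect can be had without a wrapper by quoting the arguments, e.g. (network-script 'network-bridge netdev=bond0').)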
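The Heartbeat layer is a plain ha.cf on both nodes; the only difference between the failing configuration and the "nasty work-around" is which interface the heartbeat runs over (values again illustrative):

  # /etc/ha.d/ha.cf (illustrative)
  keepalive 2
  deadtime 30
  warntime 10
  initdead 120
  udpport 694
  auto_failback on
  node nodea
  node nodeb
  # failing case: heartbeat over the bonded interface
  bcast bond0
  # (mcast bond0 225.0.0.1 694 1 0 and ucast bond0 <peer-ip> fail the same way)
  # work-around: pull one NIC out of the team and heartbeat over it instead
  # bcast eth1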