Hi,

I'm trying to set up a small CTDB cluster on an IPoIB (InfiniBand) network. When I start the cluster with a set of public IP addresses, the public addresses do not come online. So I removed the public address configuration, started a single node by hand, and then tried to add a public address as follows:

[root@gp-1-0 ctdb]# ctdb ip
Public IPs on node 0
[root@gp-1-0 ctdb]# ctdb ip -Y
:Public IP:Node:ActiveInterface:AvailableInterfaces:ConfiguredInterfaces:
[root@gp-1-0 ctdb]# ctdb addip 192.168.5.151/24 ib0
2014/05/20 06:44:31.202611 [180506]: client/ctdb_client.c:2184 ctdb_control for takeover_ip failed
2014/05/20 06:44:31.202746 [180506]: Failed to take over IP on node 0
2014/05/20 06:44:31.202762 [180506]: Failed to move ip to node 0
[root@gp-1-0 ctdb]# ctdb ip -Y
:Public IP:Node:ActiveInterface:AvailableInterfaces:ConfiguredInterfaces:
:192.168.5.151:-1:::ib0:

As you can see, the address is listed, but it is not assigned to any node (Node is -1 and ActiveInterface is empty), so the VIP doesn't start. I know I can create virtual IPs on the ib0 interface, so I'm hoping there's something obvious I'm missing.

Versions are a bit out of date (running on CentOS 6.5):

[root@gp-1-0 ctdb]# rpm -qa ctdb samba
samba-3.6.9-151.el6_4.1.x86_64
ctdb-1.0.114.5-3.el6.x86_64

I've done this heaps of times with Ethernet public addresses, but never with an IB interface. Maybe it's something in the eventscript?

Pointers and advice welcome.

Regards,

Malcolm.

I have found a workaround that is sufficient to bring the public addresses online. It also works for moving IP addresses and for the usual rebalancing when nodes are started and stopped. There is still an ugly warning in the logs, namely:

2014/05/20 08:28:19.983280 [16459]: common/system_linux.c:120 not an ethernet address family (0x20)
2014/05/20 08:28:19.983332 [16459]: server/ctdb_takeover.c:240 sending of arp failed on iface 'ib0' (Invalid argument)

but the system appears to be working, so I'm going to call it good until I can revisit it properly (I'm somewhat under the gun at the moment).

In the end, all I did was add a link test to the existing ib* branch of the case statement in the 10.interface eventscript:

[root@gp-1-0 ~]# diff -au /etc/ctdb/orig/events.d/10.interface /etc/ctdb/events.d/10.interface
--- /etc/ctdb/orig/events.d/10.interface	2014-05-20 08:10:11.000000000 +0100
+++ /etc/ctdb/events.d/10.interface	2014-05-20 07:52:11.344221132 +0100
@@ -78,6 +78,22 @@
     case $IFACE in
     ib*)
         # we dont know how to test ib links
+        # mjcowe: workaround to test the *ibX connection.
+        # This is made complicated by the fact that TS cards
+        # use "qib" for the kernel device but "ib" for the
+        # ipoib interface.
+        cat /sys/class/infiniband/*$IFACE/ports/1/state | grep -q '4: ACTIVE' || {
+            echo "ERROR: No link on the public network interface $IFACE"
+            fail=1
+            test -n "$OLDLINK" && {
+                ctdb setifacelink $IFACE down
+            }
+            continue
+        }
+        test -n "$OLDLINK" && {
+            ok=1 # we only set ok for interfaces known to ctdbd
+            ctdb setifacelink $IFACE up
+        }
         ;;
     *)
         [ -z "$IFACE" ] || {

Sorry for the noise (and the poor quality of the comments in the code).

Malcolm.
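
For anyone wanting to try the link test outside the eventscript, the check in the diff above can be pulled out into a small function. This is only a sketch: ib_port_active is a hypothetical helper name, not part of CTDB, and it assumes the sysfs state file reports an active port as "4: ACTIVE", which is what the grep in the patch relies on.

```shell
#!/bin/sh
# Hypothetical helper (not part of CTDB): ib_port_active STATE_FILE
# Returns 0 if the given sysfs port state file reports "4: ACTIVE",
# and 1 if the file is missing, unreadable, or reports another state.
ib_port_active() {
    state_file=$1
    # A missing or unreadable file is treated as "no link".
    [ -r "$state_file" ] || return 1
    # Same pattern as the eventscript patch, anchored at line start.
    grep -q '^4: ACTIVE' "$state_file"
}
```

On the node in this thread it would be called along the lines of `ib_port_active /sys/class/infiniband/qib0/ports/1/state && echo "link up"`. The path is an assumption based on the *$IFACE glob in the patch matching a qib0 device for ib0; on hardware whose device name does not end in the interface name, the mapping has to be worked out some other way, and port 1 is hard-coded in the patch as well.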