Hi,
I've set up a simple ctdb cluster; the config file was copied from an
existing system.
This is what happens:
Node 1, alone:
Number of nodes:2
pnn:0 10.0.0.1 OK (THIS NODE)
pnn:1 10.0.0.2 DISCONNECTED|UNHEALTHY|INACTIVE
Generation:1369816268
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0
Node 1, after starting ctdb on Node 2:
Number of nodes:2
pnn:0 10.0.0.1 OK (THIS NODE)
pnn:1 10.0.0.2 UNHEALTHY
Generation:1369816268
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0
Node 1, one minute later:
Number of nodes:2
pnn:0 10.0.0.1 OK (THIS NODE)
pnn:1 10.0.0.2 DISCONNECTED|UNHEALTHY|INACTIVE
Generation:1369816268
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0
Node 2:
Number of nodes:2
pnn:0 10.0.0.1 DISCONNECTED|UNHEALTHY|INACTIVE
pnn:1 10.0.0.2 OK (THIS NODE)
Generation:2125944281
Size:1
hash:0 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1
=> Result: split brain. Both nodes hold the public IP.
Node 1 log:
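To make the split-brain symptom in the status outputs above concrete, here is a minimal Python sketch (an illustration only, not part of ctdb) that parses the plain-text `ctdb status` output as quoted in this post and applies one heuristic: each node marks the other DISCONNECTED and names itself recovery master, while the two sides also report different generation numbers. The names `parse_status` and `looks_split` and the heuristic itself are hypothetical; the parser only understands the format shown here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches node lines such as "pnn:0 10.0.0.1 OK (THIS NODE)".
NODE_RE = re.compile(r"^pnn:(\d+)\s+(\S+)\s+(\S+)(\s+\(THIS NODE\))?$")


@dataclass
class Status:
    nodes: dict[int, tuple[str, set[str]]]  # pnn -> (address, flags)
    this_pnn: int
    generation: int
    recmaster: int


def parse_status(text: str) -> Status:
    """Parse the plain-text `ctdb status` output as quoted in this post."""
    nodes: dict[int, tuple[str, set[str]]] = {}
    this_pnn = generation = recmaster = None
    for raw in text.strip().splitlines():
        line = raw.strip()
        m = NODE_RE.match(line)
        if m:
            pnn = int(m.group(1))
            nodes[pnn] = (m.group(2), set(m.group(3).split("|")))
            if m.group(4):
                this_pnn = pnn
        elif line.startswith("Generation:"):
            generation = int(line.split(":", 1)[1])
        elif line.startswith("Recovery master:"):
            recmaster = int(line.split(":", 1)[1])
    if None in (this_pnn, generation, recmaster):
        raise ValueError("incomplete ctdb status output")
    return Status(nodes, this_pnn, generation, recmaster)


def looks_split(a: Status, b: Status) -> bool:
    """Hypothetical heuristic: each side marks the other DISCONNECTED and
    acts as its own recovery master, so each partition runs on its own."""
    a_sees_b_down = "DISCONNECTED" in a.nodes[b.this_pnn][1]
    b_sees_a_down = "DISCONNECTED" in b.nodes[a.this_pnn][1]
    return (a_sees_b_down and b_sees_a_down
            and a.recmaster == a.this_pnn
            and b.recmaster == b.this_pnn)


# Status output copied from the post (Node 1 one minute later, and Node 2).
node1 = """
Number of nodes:2
pnn:0 10.0.0.1 OK (THIS NODE)
pnn:1 10.0.0.2 DISCONNECTED|UNHEALTHY|INACTIVE
Generation:1369816268
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0
"""
node2 = """
Number of nodes:2
pnn:0 10.0.0.1 DISCONNECTED|UNHEALTHY|INACTIVE
pnn:1 10.0.0.2 OK (THIS NODE)
Generation:2125944281
Size:1
hash:0 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1
"""
s1, s2 = parse_status(node1), parse_status(node2)
print(looks_split(s1, s2), s1.generation != s2.generation)  # True True
```

The differing generation numbers (1369816268 vs 2125944281) and recovery masters (0 vs 1) indicate that each node ran its own recovery independently, which fits both nodes taking over the public address.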
2014/07/03 16:07:59.033170 [33243]: Starting CTDBD as pid : 33243
2014/07/03 16:07:59.036903 [33243]: Vacuuming is disabled for persistent database group_mapping.tdb
2014/07/03 16:07:59.040167 [33243]: Vacuuming is disabled for persistent database account_policy.tdb
2014/07/03 16:07:59.043457 [33243]: Vacuuming is disabled for persistent database share_info.tdb
2014/07/03 16:07:59.046547 [33243]: Vacuuming is disabled for persistent database secrets.tdb
2014/07/03 16:07:59.049848 [33243]: Vacuuming is disabled for persistent database registry.tdb
2014/07/03 16:07:59.052966 [33243]: Vacuuming is disabled for persistent database passdb.tdb
2014/07/03 16:07:59.053005 [33243]: Freeze priority 1
2014/07/03 16:07:59.053378 [33243]: Freeze priority 2
2014/07/03 16:07:59.053602 [33243]: Freeze priority 3
2014/07/03 16:07:59.229670 [33243]: Freeze priority 1
2014/07/03 16:07:59.229780 [33243]: Freeze priority 2
2014/07/03 16:07:59.229863 [33243]: Freeze priority 3
2014/07/03 16:07:59.247015 [33243]: Set DeterministicIPs to 0
2014/07/03 16:07:59.253600 [33243]: Set NoIpFailback to 1
2014/07/03 16:08:03.235484 [33287]: Taking out recovery lock from recovery daemon
2014/07/03 16:08:03.235584 [33287]: Take the recovery lock
2014/07/03 16:08:03.236070 [33287]: Recovery lock taken successfully
2014/07/03 16:08:03.236198 [33287]: Recovery lock taken successfully by recovery daemon
2014/07/03 16:08:03.237080 [33243]: Freeze priority 1
2014/07/03 16:08:03.237189 [33243]: Freeze priority 2
2014/07/03 16:08:03.237274 [33243]: Freeze priority 3
2014/07/03 16:08:03.424076 [33243]: Thawing priority 1
2014/07/03 16:08:03.424117 [33243]: Release freeze handler for prio 1
2014/07/03 16:08:03.424147 [33243]: Thawing priority 2
2014/07/03 16:08:03.424160 [33243]: Release freeze handler for prio 2
2014/07/03 16:08:03.424184 [33243]: Thawing priority 3
2014/07/03 16:08:03.424195 [33243]: Release freeze handler for prio 3
2014/07/03 16:08:03.748739 [33287]: Resetting ban count to 0 for all nodes
2014/07/03 16:08:14.760888 [33287]: Trigger takeoverrun
2014/07/03 16:08:18.574646 [33243]: Starting SMB services: [ OK ]
2014/07/03 16:08:18.575198 [33243]: Register srvid 18302628885633695744 for client 65746
2014/07/03 16:08:18.575789 [33243]: Deregister srvid 18302628885633695744 for client 65746
2014/07/03 16:08:18.588310 [33243]: Register srvid 18302628885633695744 for client 65746
2014/07/03 16:08:18.591688 [33243]: Deregister srvid 18302628885633695744 for client 65746
2014/07/03 16:08:18.936008 [33287]: Trigger takeoverrun
2014/07/03 16:08:20.288537 [33287]: Trigger takeoverrun
2014/07/03 16:08:23.891691 [33243]: Node became HEALTHY. Ask recovery master 0 to perform ip reallocation
2014/07/03 16:10:39.962127 [33287]: client/ctdb_client.c:759 control timed out. reqid:67831 opcode:80 dstnode:1
2014/07/03 16:10:39.962203 [33287]: client/ctdb_client.c:870 ctdb_control_recv failed
2014/07/03 16:10:39.962219 [33287]: Async operation failed with state 3, opcode:80
2014/07/03 16:10:39.962235 [33287]: Async wait failed - fail_count=1
2014/07/03 16:10:39.962251 [33287]: server/ctdb_recoverd.c:251 Failed to read node capabilities.
2014/07/03 16:10:39.962264 [33287]: server/ctdb_recoverd.c:3041 Unable to update node capabilities.
2014/07/03 16:11:00.984133 [33287]: client/ctdb_client.c:759 control timed out. reqid:67841 opcode:80 dstnode:1
2014/07/03 16:11:00.984201 [33287]: client/ctdb_client.c:870 ctdb_control_recv failed
2014/07/03 16:11:00.984217 [33287]: Async operation failed with state 3, opcode:80
2014/07/03 16:11:00.984234 [33287]: Async wait failed - fail_count=1
2014/07/03 16:11:00.984285 [33287]: server/ctdb_recoverd.c:251 Failed to read node capabilities.
2014/07/03 16:11:00.984301 [33287]: server/ctdb_recoverd.c:3041 Unable to update node capabilities.
2014/07/03 16:11:04.261771 [33287]: ctdb_control error: 'node is disconnected'
2014/07/03 16:11:04.261821 [33287]: ctdb_control error: 'node is disconnected'
2014/07/03 16:11:04.261841 [33287]: Async operation failed with ret=-1 res=-1 opcode=80
2014/07/03 16:11:04.261854 [33287]: Async wait failed - fail_count=1
2014/07/03 16:11:04.261884 [33287]: server/ctdb_recoverd.c:251 Failed to read node capabilities.
2014/07/03 16:11:04.261896 [33287]: server/ctdb_recoverd.c:3041 Unable to update node capabilities.
2014/07/03 16:11:04.261920 [33287]: client/ctdb_client.c:706 reqid 67841 not found
2014/07/03 16:11:04.261947 [33287]: client/ctdb_client.c:706 reqid 67831 not found
Node 2 log:
2014/07/03 16:10:15.590428 [17182]: Starting CTDBD as pid : 17182
2014/07/03 16:10:15.594254 [17182]: Vacuuming is disabled for persistent database account_policy.tdb
2014/07/03 16:10:15.597875 [17182]: Vacuuming is disabled for persistent database registry.tdb
2014/07/03 16:10:15.601015 [17182]: Vacuuming is disabled for persistent database secrets.tdb
2014/07/03 16:10:15.604113 [17182]: Vacuuming is disabled for persistent database share_info.tdb
2014/07/03 16:10:15.607215 [17182]: Vacuuming is disabled for persistent database passdb.tdb
2014/07/03 16:10:15.610304 [17182]: Vacuuming is disabled for persistent database group_mapping.tdb
2014/07/03 16:10:15.610342 [17182]: Freeze priority 1
2014/07/03 16:10:15.610689 [17182]: Freeze priority 2
2014/07/03 16:10:15.610959 [17182]: Freeze priority 3
2014/07/03 16:10:15.787984 [17182]: Freeze priority 1
2014/07/03 16:10:15.788078 [17182]: Freeze priority 2
2014/07/03 16:10:15.788162 [17182]: Freeze priority 3
System Details:
Red Hat 6.5
Nodes:
10.0.0.1
10.0.0.2
public_addresses:
10.98.81.2/24 bond0
Ctdb:
CTDB_RECOVERY_LOCK=/mnt/media23/.ctdb_lock/lock.file
CTDB_DEBUGLEVEL=ERR
CTDB_MANAGES_SAMBA=yes
CTDB_PUBLIC_INTERFACE=bond0
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_SET_NoIpFailback=1
CTDB_SET_DeterministicIPs=0
The recovery lock file lives on a StorNext filesystem.
Any help would be appreciated.
Cheers
Axel
--
View this message in context:
http://samba.2283325.n4.nabble.com/ctdb-split-brain-nodes-doesn-t-see-each-other-tp4668664.html
Sent from the Samba - General mailing list archive at Nabble.com.