Hi,

I am setting up a two-node Samba cluster with CTDB in AWS, with the nodes in two different subnets. All IP ports are open between the two subnets. I am initially forming the cluster with one node and will add the second node after CTDB has started. I am not using CTDB public_addresses because AWS does not support VIPs. I am using 64-bit Amazon Linux with two NICs defined: eth0 is the primary NIC and eth1 is the private-IP NIC. With clustering off and without CTDB, Samba works.

I need to get this running for a project. The only errors reported are in /var/log/log.ctdb. Please help.

CTDB configuration:

I edited /etc/sysconfig/ctdb to change the following from the defaults:

  CTDB_RECOVERY_LOCK="/samba/samba_lock"
  CTDB_NODES=/etc/ctdb/nodes
  CTDB_DEBUGLEVEL=3

I edited /etc/ctdb/nodes to add the internal (private) IP address of eth1.

The complete /var/log/log.ctdb:

2013/04/09 16:09:59.881679 [30574]: CTDB starting on node
2013/04/09 16:09:59.886133 [30575]: Starting CTDBD as pid : 30575
2013/04/09 16:09:59.886305 [30575]: Set scheduler to SCHED_FIFO
2013/04/09 16:09:59.886637 [30575]: ctdb chose network address 10.22.1.20:4379 pnn 0
2013/04/09 16:09:59.887035 [30575]: server/eventscript.c:800 Starting eventscript init
2013/04/09 16:09:59.969022 [30575]: 10.interface: No public addresses file found. Nothing to do for 10.interfaces
2013/04/09 16:10:00.246654 [30575]: server/eventscript.c:486 Eventscript init finished with state 0
2013/04/09 16:10:00.248978 [30575]: Keepalive monitoring has been started
2013/04/09 16:10:00.249024 [30575]: Monitoring has been started
2013/04/09 16:10:00.249057 [30575]: server/eventscript.c:800 Starting eventscript setup
2013/04/09 16:10:00.249415 [recoverd:30648]: monitor_cluster starting
2013/04/09 16:10:00.251621 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17870283321406128128
2013/04/09 16:10:00.251760 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17870564796382838784
2013/04/09 16:10:00.251858 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17870846271359549440
2013/04/09 16:10:00.251952 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17365880163140632576
2013/04/09 16:10:00.252050 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17582052945254416384
2013/04/09 16:10:00.252150 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17942340915444056064
2013/04/09 16:10:00.252243 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17798225727368200192
2013/04/09 16:10:00.252332 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=18014398509481984000
2013/04/09 16:10:00.252422 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=18086456103519911936
2013/04/09 16:10:00.252511 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=18087019053473333248
2013/04/09 16:10:00.252600 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=18158513697557839872
2013/04/09 16:10:00.252688 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17654110539292344320
2013/04/09 16:10:00.252776 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=18086737578496622592
2013/04/09 16:10:00.253577 [recoverd:30648]: server/ctdb_recoverd.c:3415 Initial recovery master set - forcing election
2013/04/09 16:10:00.253609 [recoverd:30648]: server/ctdb_recoverd.c:2521 Force an election
2013/04/09 16:10:00.253673 [30575]: Freeze priority 1
2013/04/09 16:10:00.253783 [30575]: Freeze priority 2
2013/04/09 16:10:00.253901 [30575]: Freeze priority 3
2013/04/09 16:10:00.254181 [recoverd:30648]: server/ctdb_recoverd.c:2005 Send election request to all active nodes
2013/04/09 16:10:01.249677 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:02.249961 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:03.250141 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:03.257560 [recoverd:30648]: server/ctdb_recoverd.c:1055 Election timed out
2013/04/09 16:10:03.258563 [recoverd:30648]: The interfaces status has changed on local node 0 - force takeover run
2013/04/09 16:10:03.258805 [recoverd:30648]: Trigger takeoverrun
2013/04/09 16:10:03.259041 [recoverd:30648]: server/ctdb_recoverd.c:2702 Node:0 was in recovery mode. Restart recovery process
2013/04/09 16:10:03.259071 [recoverd:30648]: server/ctdb_recoverd.c:1555 Starting do_recovery
2013/04/09 16:10:03.259085 [recoverd:30648]: Taking out recovery lock from recovery daemon
2013/04/09 16:10:03.259108 [recoverd:30648]: Take the recovery lock
2013/04/09 16:10:03.267903 [recoverd:30648]: Recovery lock taken successfully
2013/04/09 16:10:03.267933 [recoverd:30648]: ctdb_recovery_lock: Got recovery lock on '/mnt/prod-assets/samba/samba_lock'
2013/04/09 16:10:03.268052 [recoverd:30648]: Recovery lock taken successfully by recovery daemon
2013/04/09 16:10:03.268071 [recoverd:30648]: server/ctdb_recoverd.c:1592 Recovery initiated due to problem with node 0
2013/04/09 16:10:03.268190 [recoverd:30648]: server/ctdb_recoverd.c:1617 Recovery - created remote databases
2013/04/09 16:10:03.268211 [recoverd:30648]: server/ctdb_recoverd.c:1624 Recovery - updated db priority for all databases
2013/04/09 16:10:03.268351 [30575]: Freeze priority 1
2013/04/09 16:10:03.268455 [30575]: Freeze priority 2
2013/04/09 16:10:03.268552 [30575]: Freeze priority 3
2013/04/09 16:10:03.268723 [30575]: server/ctdb_recover.c:1035 startrecovery eventscript has been invoked
2013/04/09 16:10:03.268744 [30575]: Monitoring has been disabled
2013/04/09 16:10:03.268763 [30575]: server/eventscript.c:800 Starting eventscript startrecovery
2013/04/09 16:10:03.617562 [30575]: server/eventscript.c:486 Eventscript startrecovery finished with state 0
2013/04/09 16:10:03.618061 [30575]: Control modflags on node 0 - Unchanged - flags 0x2
2013/04/09 16:10:03.618127 [recoverd:30648]: server/ctdb_recoverd.c:1661 Recovery - updated flags
2013/04/09 16:10:03.618311 [recoverd:30648]: server/ctdb_recoverd.c:1705 started transactions on all nodes
2013/04/09 16:10:03.618333 [recoverd:30648]: server/ctdb_recoverd.c:1718 Recovery - starting database commits
2013/04/09 16:10:03.618389 [30575]: server/ctdb_freeze.c:408 healthy_nodes[0]
2013/04/09 16:10:03.618450 [recoverd:30648]: server/ctdb_recoverd.c:1730 Recovery - committed databases
2013/04/09 16:10:03.618621 [recoverd:30648]: server/ctdb_recoverd.c:1780 Recovery - updated vnnmap
2013/04/09 16:10:03.618717 [recoverd:30648]: server/ctdb_recoverd.c:1789 Recovery - updated recmaster
2013/04/09 16:10:03.618916 [30575]: Control modflags on node 0 - Unchanged - flags 0x2
2013/04/09 16:10:03.618973 [recoverd:30648]: server/ctdb_recoverd.c:1806 Recovery - updated flags
2013/04/09 16:10:03.619034 [30575]: server/ctdb_recover.c:665 Recovery mode set to NORMAL
2013/04/09 16:10:03.619053 [30575]: Thawing priority 1
2013/04/09 16:10:03.619066 [30575]: Release freeze handler for prio 1
2013/04/09 16:10:03.619110 [30575]: Thawing priority 2
2013/04/09 16:10:03.619126 [30575]: Release freeze handler for prio 2
2013/04/09 16:10:03.619150 [30575]: Thawing priority 3
2013/04/09 16:10:03.619164 [30575]: Release freeze handler for prio 3
2013/04/09 16:10:03.622723 [recoverd:30648]: server/ctdb_recoverd.c:1815 Recovery - disabled recovery mode
2013/04/09 16:10:03.623218 [recoverd:30648]: Disabling ip check for 9 seconds
2013/04/09 16:10:03.623228 [30575]: Running eventscripts with arguments ipreallocated
2013/04/09 16:10:03.623260 [30575]: Monitoring has been disabled
2013/04/09 16:10:03.623283 [30575]: server/eventscript.c:800 Starting eventscript ipreallocated
2013/04/09 16:10:03.971720 [30575]: server/eventscript.c:486 Eventscript ipreallocated finished with state 0
2013/04/09 16:10:03.971788 [30575]: Monitoring has been enabled
2013/04/09 16:10:03.972044 [30575]: Recovery has finished
2013/04/09 16:10:03.972067 [30575]: Monitoring has been disabled
2013/04/09 16:10:03.972083 [30575]: server/eventscript.c:800 Starting eventscript recovered
2013/04/09 16:10:04.250561 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:04.250613 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2013/04/09 16:10:04.322804 [30575]: server/eventscript.c:486 Eventscript recovered finished with state 0
2013/04/09 16:10:04.322870 [30575]: Monitoring has been enabled
2013/04/09 16:10:04.322983 [recoverd:30648]: server/ctdb_recoverd.c:1841 Recovery - finished the recovered event
2013/04/09 16:10:04.323022 [recoverd:30648]: server/ctdb_recoverd.c:1847 Recovery complete
2013/04/09 16:10:04.323038 [recoverd:30648]: Resetting ban count to 0 for all nodes
2013/04/09 16:10:04.323057 [recoverd:30648]: Just finished a recovery. New recoveries will now be supressed for the rerecovery timeout (10 seconds)
2013/04/09 16:10:05.251440 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:05.251473 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2013/04/09 16:10:06.251582 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:06.251634 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2013/04/09 16:10:07.251744 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:07.251775 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2013/04/09 16:10:08.251886 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:08.251925 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2013/04/09 16:10:09.252062 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:09.252117 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.

Thanks for any help.

Chuck
reinke at mac.com