Dave Lawrence
2011-Apr-27 07:10 UTC
[Samba] CTDB / Samba4. Nodes don't become healthy on first startup
Hi,

I'm trying to bring up a three-node cluster using CTDB and Samba4 on a GlusterFS clustered file system. The filesystem itself seems to be working just fine, but CTDB doesn't seem happy.

If I start a single node up:

service ctdb start

After about 10 seconds it becomes healthy, starts Samba and takes over all 3 IP addresses.

However, when I start up subsequent nodes, they simply refuse to become healthy. None of the nodes appear to agree about who is the recovery master. The recovery lock file is always zero bytes. If I start all 3 nodes at once, none of them become healthy.

Somewhere along the line I once managed to get two nodes healthy at once, and they agreed on who was recmaster. This was a fluke that I cannot reproduce.

Typical log messages:

2011/04/27 07:33:23.633970 [25303]: CTDB_WAIT_UNTIL_RECOVERED
2011/04/27 07:33:23.634002 [25303]: server/ctdb_monitor.c:232 generation is INVALID. Wait one more second
2011/04/27 07:33:23.695058 [recoverd:25365]: server/ctdb_recoverd.c:1812 Send election request to all active nodes
2011/04/27 07:33:24.196099 [recoverd:25365]: server/ctdb_recoverd.c:1812 Send election request to all active nodes

So I have a number of questions:

1) What data is CTDB actually managing in the case of Samba4? Presumably the temporary .tdb files that get created under /usr/local/samba - do I need to tell CTDB about this location?

2) The reclock file is stored on the clustered filesystem, at a location that is not part of a network share of any sort. Is this correct?

3) Could this be a problem with GlusterFS?

4) Is it OK that the public and private IP ranges are on the same physical network? There's certainly no sign of any problem with the servers communicating with each other on either range using other protocols, e.g. ssh.

I am using a recent build of CTDB from Git. I had similar experiences with the Ubuntu and Debian packages.
Our servers are all VMs, as follows:

node0:
  Guest: Ubuntu 10.04
  Host: OpenVZ (linux 2.6.32 x86_64 with Ubuntu config and openvz patch)
  Network: Guest has full control of eth1

node1:
  Guest: Ubuntu 10.04 x86
  Host: VMWare ESXI 4 x86_64
  Network: Guest adapter bridged to host

node2:
  Guest: Ubuntu 10.04 x86_64
  Host: VMWare Player (Windows 7)
  Network: Guest adapter bridged to host

Our config:

/etc/default/ctdb

CTDB_RECOVERY_LOCK="mnt/data/lockfile"
CTDB_PUBLIC_INTERFACE=eth1 # varies per server
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_MANAGES_SAMBA=yes
CTDB_SAMBA_CHECK_PORTS="445" # work around grep error in log
CTDB_MANAGES_WINBIND=yes
CTDB_SERVICE_SMB=samba4 #name of our init script
CTDB_NODES=/etc/ctdb/nodes
CTDB_DBDIR=/var/ctdb
CTDB_DBDIR_PERSISTENT=/var/ctdb/persistent
CTDB_LOGFILE=/var/log/log.ctdb
CTDB_DEBUGLEVEL=3

/etc/ctdb/public_addresses

192.168.2.119/24 eth1
192.168.2.120/24 eth1
192.168.2.121/24 eth1

(network adapter varies)

/etc/ctdb/nodes

10.0.0.1
10.0.0.2
10.0.0.3

Excerpt from /etc/network/interfaces

auto eth1
iface eth1 inet static
    address 192.168.2.164
    netmask 255.255.255.0
    gateway 192.168.2.1
    up route add -net 192.168.3.0/24 gw 192.168.2.165
    up route add -net 192.168.4.0/24 gw 192.168.2.156

auto eth1:0
iface eth1:0 inet static
    address 10.0.0.1
    netmask 255.0.0.0

Note that the static address 192.168.2.164 is not one of the takeover addresses but is in the same range. I have also tried with eth1 configured for ONLY the 10.x.x.x range.

Thanks for listening!
Dave Lawrence
2011-Apr-27 07:33 UTC
[Samba] CTDB / Samba4. Nodes don't become healthy on first startup
On 27/04/11 08:10, Dave Lawrence wrote:
> Our config:
> /etc/default/ctdb
> CTDB_RECOVERY_LOCK="mnt/data/lockfile"

No matter how many times you check the config, there's always something obviously stupid that you don't notice until after you've gone public.

Should be

CTDB_RECOVERY_LOCK="/mnt/data/lockfile"

(leading forward slash)

I'll be back.

> CTDB_SAMBA_CHECK_PORTS="445" # work around grep error in log

I'll explain this in a different thread. I think there is a bug.
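A missing leading slash like this is easy to overlook by eye, so here is a minimal sketch of a sanity check that would flag it. It assumes the config file consists of plain KEY=value shell assignments as quoted above. The function name find_relative_paths and the PATH_KEYS list are illustrative assumptions, not anything provided by CTDB.

```python
import shlex

# Settings from the config quoted above whose values are filesystem paths.
# This list is an assumption for illustration; it does not come from CTDB.
PATH_KEYS = {
    "CTDB_RECOVERY_LOCK",
    "CTDB_PUBLIC_ADDRESSES",
    "CTDB_NODES",
    "CTDB_DBDIR",
    "CTDB_DBDIR_PERSISTENT",
    "CTDB_LOGFILE",
}


def find_relative_paths(config_text):
    """Return (key, value) pairs whose path value lacks a leading slash."""
    problems = []
    for line in config_text.splitlines():
        # shlex removes the quotes and, with comments=True, any "# ..." comment.
        tokens = shlex.split(line, comments=True)
        if len(tokens) != 1 or "=" not in tokens[0]:
            continue  # blank line, comment, or not a simple assignment
        key, value = tokens[0].split("=", 1)
        if key in PATH_KEYS and not value.startswith("/"):
            problems.append((key, value))
    return problems


if __name__ == "__main__":
    sample = '''
CTDB_RECOVERY_LOCK="mnt/data/lockfile"
CTDB_NODES=/etc/ctdb/nodes
CTDB_SERVICE_SMB=samba4 #name of our init script
'''
    for key, value in find_relative_paths(sample):
        print(f"{key} is not an absolute path: {value}")
```

Run against the original config, this reports only CTDB_RECOVERY_LOCK. It checks the shape of the path and nothing else, so it says nothing about whether the lock file actually sits on the clustered filesystem.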