Krishnan Parthasarathi
2012-May-11 12:52 UTC
[Samba] CTDB daemon crashed on bringing down one node in the cluster
All,

I have a 3-node CTDB cluster which serves 4 'public addresses'. The /etc/ctdb/public_addresses file is node specific and is present at that path on each participating node. All the nodes run RHEL 6.2. Other ctdb config files, such as "nodes", are placed on a shared filesystem mounted at a known location (say, /gluster/lock).

On starting the CTDB service on all the nodes, "ctdb status" showed things were fine: all nodes were "OK" and connected. To test the failover behaviour, I brought down one of the nodes. "ctdb status", when run on one of the (up) nodes, gave the following output:

[root@<nodename> ~]# ctdb status
Number of nodes:4
pnn:0 x.y.z.a    DISCONNECTED|BANNED|UNHEALTHY|INACTIVE
pnn:1 x.y.z.b    BANNED|UNHEALTHY|INACTIVE (THIS NODE)
pnn:2 x.y.z.c    DISCONNECTED|UNHEALTHY|INACTIVE
pnn:3 x.y.z.d    OK
Generation:INVALID
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:3
Recovery mode:RECOVERY (1)
Recovery master:3

In the above (edited) output, pnn:2 is the node that was brought down. I also observed that ctdb had crashed with signal 6 (SIGABRT) on pnn:0. The stack trace was not very useful. I am new to ctdb; I would like to know if there is any way I can get more useful stack traces on subsequent crashes (if any). Is there something that I may have missed? Could somebody give me pointers on how I can debug this issue?

cheers,
krish
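For what it's worth, one thing I am considering before the next crash is raising the daemon's log verbosity and allowing core dumps, so a usable backtrace survives. This is only a sketch assuming the RHEL 6 sysconfig layout; the exact variable names and paths may differ with your CTDB version:

```shell
# /etc/sysconfig/ctdb -- assumes this CTDB build reads CTDB_LOGFILE and
# CTDB_DEBUGLEVEL; check your version's documentation.
CTDB_LOGFILE=/var/log/log.ctdb
CTDB_DEBUGLEVEL=DEBUG

# Allow ctdbd to dump core, and pick a writable location for cores
# (run before starting the ctdb service):
ulimit -c unlimited
echo '/var/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern

# After a crash, the core can be inspected with gdb, e.g.:
#   gdb /usr/sbin/ctdbd /var/tmp/core.ctdbd.<pid>
#   (gdb) bt full
# Installing the ctdb debuginfo package first should make the
# backtrace symbols readable.
```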