zhu.shangzhong at zte.com.cn
2018-Sep-05 10:49 UTC
[Samba] [ctdb]Unable to run startrecovery event(if mail contentis encrypted, please see the attached file)
Thanks Martin!

We are using the ctdb 4.6.10.

> Are you able to recreate this every time? Sometimes? Rarely?

Rarely.

> Note that you're referring to nodes 1, 2, 3 while CTDB numbers the
> nodes 0, 1, 2. In fact, the situation is a little more confused than this:

This is my mistake. CTDB numbers the nodes 0, 1, 2.

# ctdb status
Number of nodes:3
pnn:0 10.231.8.67 OK
pnn:1 10.231.8.65 OK
pnn:2 10.231.8.66 OK (THIS NODE)

# ctdb ip
Public IPs on node 2
10.231.8.68 1
10.231.8.69 2
10.231.8.70 0

-----------------------------------
Re: [Samba] [ctdb]Unable to run startrecovery event(if mail contentis encrypted, please see the attached file)

Thanks for reporting this. It looks very interesting and we will fix it all as soon as we understand it! :-)

On Wed, 5 Sep 2018 16:29:31 +0800 (CST), "zhu.shangzhong--- via samba" wrote:

> There is a 3 nodes ctdb cluster is running. When one of 3 nodes is
> powered down, lots of logs will be wrote to log.ctdb.

Can you please let us know what version of Samba/CTDB you're using?

Note that you're referring to nodes 1, 2, 3 while CTDB numbers the nodes 0, 1, 2. In fact, the situation is a little more confused than this:

> Power down node3
> The node1 log is as follow:
> 2018/09/04 04:29:33.402108 ctdbd[10129]: 10.231.8.65:4379: node 10.231.8.67:4379 is dead: 1 connected
> 2018/09/04 04:29:33.414817 ctdbd[10129]: Tearing down connection to dead node :0

It appears that the node you're calling node 3 is the one CTDB calls node 0! Can you please post the output of "ctdb status" when all nodes are up and running?
I'm guessing that your nodes file looks like:

  10.231.8.67
  10.231.8.65
  10.231.8.66

This:

> node1: repeat logs:
> 2018/09/04 04:35:06.414369 ctdbd[10129]: Recovery has started
> 2018/09/04 04:35:06.414944 ctdbd[10129]: connect() failed, errno=111
> 2018/09/04 04:35:06.415076 ctdbd[10129]: Unable to run startrecovery event

is due to this:

> 2018/09/04 04:29:55.570212 ctdb-eventd[10131]: Bad talloc magic value - wrong talloc version used/mixed
> 2018/09/04 04:29:57.240533 ctdbd[10129]: Eventd went away

We have fixed a similar issue in some versions. When we know what version you are running then we can say whether it is a known issue or a new issue.

I have been working on the following issue for most of this week:

> 2018/09/04 04:29:52.465663 ctdbd[10129]: This node (1) is now the recovery master
> 2018/09/04 04:29:55.468771 ctdb-recoverd[11302]: Election period ended
> 2018/09/04 04:29:55.469404 ctdb-recoverd[11302]: Node 2 has changed flags - now 0x8 was 0x0
> 2018/09/04 04:29:55.469475 ctdb-recoverd[11302]: Remote node 2 had flags 0x8, local had 0x0 - updating local
> 2018/09/04 04:29:55.469514 ctdb-recoverd[11302]: ../ctdb/server/ctdb_recoverd.c:1267 Starting do_recovery
> 2018/09/04 04:29:55.469525 ctdb-recoverd[11302]: Attempting to take recovery lock (/share-fs/export/ctdb/.ctdb/reclock)
> 2018/09/04 04:29:55.563522 ctdb-recoverd[11302]: Unable to take recovery lock - contention
> 2018/09/04 04:29:55.563573 ctdb-recoverd[11302]: Unable to get recovery lock - aborting recovery and ban ourself for 300 seconds
> 2018/09/04 04:29:55.563585 ctdb-recoverd[11302]: Banning node 1 for 300 seconds

Are you able to recreate this every time? Sometimes? Rarely?

I hadn't seen this until recently and I'm now worried that it is more widespread than we realise.

Thanks...

peace & happiness,
martin
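
The node-numbering mix-up in this thread comes down to CTDB numbering nodes from 0 in the order their addresses appear in the nodes file. A minimal Python sketch of that mapping, assuming the three-line nodes file guessed at in the message; the helper name pnn_map is hypothetical and not part of CTDB, and the sketch only skips blank lines rather than handling every nodes-file convention:

```python
def pnn_map(nodes_text):
    """Return {pnn: address}, numbering non-blank lines from 0 in file order.

    Hypothetical helper for illustration only; it is not a CTDB API.
    """
    addresses = [line.strip() for line in nodes_text.splitlines() if line.strip()]
    return {pnn: addr for pnn, addr in enumerate(addresses)}


# Nodes file contents as guessed in the message (an assumption, not confirmed).
NODES_FILE = """\
10.231.8.67
10.231.8.65
10.231.8.66
"""

if __name__ == "__main__":
    for pnn, addr in pnn_map(NODES_FILE).items():
        print(f"pnn:{pnn} {addr}")
```

Under that assumption the first address, 10.231.8.67, is pnn 0, which agrees with the "ctdb status" output posted in the reply and with the "Tearing down connection to dead node :0" log line.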