tu.qiuping
2023-Nov-26 13:13 UTC
[Samba] CTDB: some problems about disconnecting the private network of ctdb leader nodes
Hello, everyone. My ctdb version is 4.17.7.

My ctdb cluster configuration is correct and the cluster was healthy before the operation. The cluster has three nodes, namely host-192-168-34-164, host-192-168-34-165, and host-192-168-34-166, and node host-192-168-34-164 was the leader before the operation.

I conducted network oscillation testing on node host-192-168-34-164: I took down the interface of ctdb's private network at 19:18:54.091439, and the node then started to do recovery. What puzzles me is that at 19:18:59.822903 this node timed out obtaining the lock, the log shows "Time out getting recovery lock, allowing recovery mode set any way", and then host-192-168-34-164 took over all the virtual IPs.

I checked the ctdb source code and found that lines 578 to 582 of the file samba/ctdb/server/ctdb_recover.c state: "Timeout. Consider this a success, not a failure, as we failed to set the recovery lock which is what we wanted. This can be caused by the cluster filesystem being very slow to arbitrate locks immediately after a node failure."

I am puzzled why a timeout getting the reclock is considered successful. Although a slow cluster filesystem may cause a reclock timeout, disconnecting the private network of the leader node can also cause this situation. In that case the disconnected node takes over all the virtual IPs, which conflict with the virtual IPs of the other, healthy nodes. So, is it inappropriate to treat a reclock timeout as success in this situation?

The logs of the three nodes are attached.
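To make the behaviour I am asking about concrete, here is a minimal sketch of the decision I am describing. This is not the actual CTDB code; the names and types in it are made up by me for illustration:

  /* Illustrative sketch only -- not the real CTDB source. */
  #include <stdio.h>

  enum lock_result { LOCK_ACQUIRED, LOCK_DENIED, LOCK_TIMEOUT };

  /* Stand-in for the helper that tries to take the recovery lock. */
  static enum lock_result try_take_recovery_lock(void)
  {
          /* Pretend the cluster filesystem never answered in time,
           * which is what happens when the private network is down. */
          return LOCK_TIMEOUT;
  }

  int main(void)
  {
          switch (try_take_recovery_lock()) {
          case LOCK_ACQUIRED:
                  /* Being able to take the lock here is the failure
                   * case: another node should already be holding it. */
                  printf("ERROR: recovery lock was obtainable\n");
                  return 1;
          case LOCK_DENIED:
                  /* Expected case: the lock is held elsewhere. */
                  printf("OK: recovery lock held by another node\n");
                  return 0;
          case LOCK_TIMEOUT:
                  /* The case I am puzzled about: a timeout is also
                   * treated as success, even though it can simply mean
                   * this node cannot reach the lock at all. */
                  printf("Timeout also treated as success\n");
                  return 0;
          }
          return 0;
  }

As far as I can tell, the LOCK_TIMEOUT branch cannot distinguish a slow cluster filesystem from a node that has been disconnected from it.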
Martin Schwenke
2023-Nov-28 02:22 UTC
[Samba] CTDB: some problems about disconnecting the private network of ctdb leader nodes
Hi,

On Sun, 26 Nov 2023 21:13:21 +0800, "tu.qiuping via samba"
<samba at lists.samba.org> wrote:

> My ctdb version is 4.17.7
>
> My ctdb cluster configuration is correct and the cluster was healthy
> before the operation.
>
> My cluster has three nodes, namely host-192-168-34-164,
> host-192-168-34-165, and host-192-168-34-166. And the node
> host-192-168-34-164 was the leader before the operation.
>
> I conducted network oscillation testing on node host-192-168-34-164:
> I took down the interface of ctdb's private network at
> 19:18:54.091439. Then this node started to do recovery. What I am
> puzzled about is that at 19:18:59.822903, this node timed out
> obtaining the lock, the log shows "Time out getting recovery lock,
> allowing recovery mode set any way", and then host-192-168-34-164
> took over all the virtual IPs.
>
> I checked the ctdb source code and found that lines 578 to 582 of
> the file samba/ctdb/server/ctdb_recover.c state: "Timeout.
> Consider this a success, not a failure, as we failed to set the
> recovery lock which is what we wanted. This can be caused by
> the cluster filesystem being very slow to arbitrate locks
> immediately after a node failure."
>
> I am puzzled why a timeout getting the reclock is considered
> successful. Although a slow cluster filesystem may cause a reclock
> timeout, disconnecting the private network of the leader node can
> also cause this situation. Therefore, this disconnected node will
> take over all the virtual IPs, which will conflict with the virtual
> IPs of the other, healthy nodes. So, is it inappropriate to treat a
> reclock timeout as success in this situation?

It is considered successful because it wasn't able to take the lock in
a conflicting way.

This part of the code doesn't have anything to do with selecting the
leader node. It is a sanity check at the end of recovery to ensure
that the cluster lock can't be taken by nodes/processes that should
not be able to take it. Given the way this lock is now used to
determine the leader, this check is definitely in the wrong place,
since this is a cluster management issue, not a recovery issue.

If the leader becomes disconnected then the cluster lock will need to
be released (e.g. by the underlying filesystem) before another node
can be leader. 2 nodes should never be leader at the same time.

> The logs of the three nodes are attached.

I think mailman might have stripped the logs... :-(

I think I need to see the logs before I can say anything very
useful...

peace & happiness,
martin
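P.S. For reference while you re-send the logs: the cluster lock being
discussed is the one configured via the "recovery lock" option in the
[cluster] section of ctdb.conf, and it must live on the shared cluster
filesystem so that only one node at a time can hold it. Something like
the following (the path here is just an example):

  [cluster]
          recovery lock = /clusterfs/.ctdb/reclock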