tu.qiuping
2023-Jun-12 16:46 UTC
[Samba] Creating network oscillation on the leader node results in split-brain
Hello, everyone. My ctdb version is 4.17.7.

My ctdb cluster configuration is correct and the cluster was healthy before the test. The cluster has three nodes: 192.168.40.131 (node 0), 192.168.40.132 (node 1), and 192.168.40.133 (node 2), and node 192.168.40.133 was the leader.

I ran network oscillation testing against node 192.168.40.133. After a period of time, the cluster lock update on that node failed, and the lock was taken over by node 0. Surprisingly, after node 0 acquired the lock, it sent a message with leader=0 to node 1, but did not send it to node 2. A short time later, node 0 and node 1 received a broadcast with leader=2, and at that point node 0 did not release the lock yet believed it was no longer the leader, even though the network of node 2 was healthy at that time. When I then restored the network of node 2, node 1 and node 2 kept trying to acquire the lock and reported the error: Unable to take cluster lock - contention.

The logs of the three nodes are attached. Rough sketches of the kind of cluster-lock configuration involved and of how the network flapping can be reproduced are at the end of this message, after the status output.

ctdb status on node 0:

[root@host-192-168-40-131 ~]# ctdb status
Number of nodes:3
pnn:0 192.168.40.131 OK (THIS NODE)
pnn:1 192.168.40.132 OK
pnn:2 192.168.40.133 UNHEALTHY
Generation:629720908
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:RECOVERY (1)
Leader:UNKNOWN

ctdb status on node 1:

[root@host-192-168-40-132 tecs]# ctdb status
Number of nodes:3
pnn:0 192.168.40.131 OK
pnn:1 192.168.40.132 OK (THIS NODE)
pnn:2 192.168.40.133 UNHEALTHY
Generation:629720908
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:RECOVERY (1)
Leader:UNKNOWN

ctdb status on node 2:

[root@host-192-168-40-133 tecs]# ctdb status
Number of nodes:3
pnn:0 192.168.40.131 UNHEALTHY
pnn:1 192.168.40.132 UNHEALTHY
pnn:2 192.168.40.133 OK (THIS NODE)
Generation:1185443889
Size:1
hash:0 lmaster:2
Recovery mode:RECOVERY (1)
Leader:UNKNOWN
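For reference, the leader in this setup is whichever node holds the cluster (recovery) lock, so the relevant configuration is just the lock setting and the nodes list. Below is only a minimal sketch of that kind of configuration, assuming the lock lives on a shared filesystem mounted at /shared (the path is a placeholder, not my exact setting):

# /etc/ctdb/ctdb.conf (relevant section only)
[cluster]
    # every node must point at the same file on shared storage
    recovery lock = /shared/.ctdb/reclock

# /etc/ctdb/nodes (identical on all three nodes, one private address per line)
192.168.40.131
192.168.40.132
192.168.40.133

Since the lock decides leadership, the behaviour above amounts to node 0 holding the lock while still accepting a leader=2 broadcast, which is exactly the split-brain I am worried about.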
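For anyone trying to reproduce the oscillation, a simple flapping loop on the leader node produces the same up/down pattern. This is only a sketch, assuming iptables is available and that dropping traffic from the two peer addresses is enough to take the node's cluster network down (if the cluster lock lives on networked storage, that path may need to be disturbed as well); it is not my exact test procedure:

# run on node 2 (192.168.40.133): flap the interconnect every 10 seconds
while true; do
    # "network down": drop everything coming from the other cluster nodes
    iptables -I INPUT -s 192.168.40.131 -j DROP
    iptables -I INPUT -s 192.168.40.132 -j DROP
    sleep 10
    # "network up": remove the rules again
    iptables -D INPUT -s 192.168.40.131 -j DROP
    iptables -D INPUT -s 192.168.40.132 -j DROP
    sleep 10
done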