Strahil Nikolov
2020-Dec-27 20:20 UTC
[Gluster-users] cannot resolve split-brain via 'select one brick as source'
Hi All,

I'm currently playing around with Gluster 8.3 and geo-replication. I have created a 'replica 3' volume as the source and a 'replica 2' volume as the destination. In this setup geo-replication is working quite well, but during my tests I have managed to cause a split-brain in some of the files:

[root@glustere mnt]# gluster volume info secondary

Volume Name: secondary
Type: Distributed-Replicate
Volume ID: 1b5717ee-aa9b-4eff-9989-ad4f0388b86c
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: glusterd:/bricks/brick-d1/brick
Brick2: glustere:/bricks/brick-e2/brick
Brick3: glustere:/bricks/brick-e1/brick
Brick4: glusterd:/bricks/brick-d2/brick
Options Reconfigured:
cluster.quorum-count: 1
cluster.quorum-type: fixed
features.read-only: on
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
performance.quick-read: off
performance.client-io-threads: off
cluster.enable-shared-storage: enable

[root@glustere mnt]# gluster volume heal secondary info summary
Brick glusterd:/bricks/brick-d1/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick glustere:/bricks/brick-e2/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick glustere:/bricks/brick-e1/brick
Status: Connected
Total Number of entries: 2641
Number of entries in heal pending: 0
Number of entries in split-brain: 2641
Number of entries possibly healing: 0

Brick glusterd:/bricks/brick-d2/brick
Status: Connected
Total Number of entries: 2641
Number of entries in heal pending: 0
Number of entries in split-brain: 2641
Number of entries possibly healing: 0

As node "glustere" was the last one rebooted, the source of truth should be 'glusterd:/bricks/brick-d2/brick'.
I have tried to tell gluster that, but it doesn't want to resolve the split-brain:

[root@glustere mnt]# gluster volume heal secondary split-brain source-brick glusterd:/bricks/brick-d2/brick | tail -n 5
Lookup failed on gfid:dbfb3794-4a1a-4540-b235-2fdce4d21d6a:Transport endpoint is not connected.
Lookup failed on gfid:d78fdf9c-1c00-4712-8229-cfc10b009ad3:Transport endpoint is not connected.
Status: Connected
Number of healed entries: 1

The logs clearly show that something is wrong:

[2020-12-27 20:14:07.100074] W [MSGID: 108027] [afr-common.c:2857:afr_attempt_readsubvol_set] 0-secondary-replicate-1: no read subvols for dbfb3794-4a1a-4540-b235-2fdce4d21d6a
[2020-12-27 20:14:07.100106] I [MSGID: 109063] [dht-layout.c:641:dht_layout_normalize] 0-secondary-dht: Found anomalies [{path=dbfb3794-4a1a-4540-b235-2fdce4d21d6a}, {gfid=dbfb3794-4a1a-4540-b235-2fdce4d21d6a}, {holes=1}, {overlaps=0}]
[2020-12-27 20:14:07.100903] W [MSGID: 108027] [afr-common.c:2857:afr_attempt_readsubvol_set] 0-secondary-replicate-1: no read subvols for d78fdf9c-1c00-4712-8229-cfc10b009ad3
[2020-12-27 20:14:07.100935] I [MSGID: 109063] [dht-layout.c:641:dht_layout_normalize] 0-secondary-dht: Found anomalies [{path=d78fdf9c-1c00-4712-8229-cfc10b009ad3}, {gfid=d78fdf9c-1c00-4712-8229-cfc10b009ad3}, {holes=1}, {overlaps=0}]

Yet, if I specify the file in the previous command, the heal is OK:

[root@glustere mnt]# gluster volume heal secondary split-brain source-brick glusterd:/bricks/brick-d2/brick gfid:dbfb3794-4a1a-4540-b235-2fdce4d21d6a
Healed gfid:dbfb3794-4a1a-4540-b235-2fdce4d21d6a.
[root@glustere mnt]# gluster volume heal secondary split-brain source-brick glusterd:/bricks/brick-d2/brick gfid:d78fdf9c-1c00-4712-8229-cfc10b009ad3
Healed gfid:d78fdf9c-1c00-4712-8229-cfc10b009ad3.

[2020-12-27 20:16:27.471113] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-secondary-replicate-1: performing metadata selfheal on dbfb3794-4a1a-4540-b235-2fdce4d21d6a
[2020-12-27 20:16:27.473157] I [MSGID: 108026] [afr-self-heal-common.c:1744:afr_log_selfheal] 0-secondary-replicate-1: Completed metadata selfheal on dbfb3794-4a1a-4540-b235-2fdce4d21d6a. sources=[1] sinks=0
[2020-12-27 20:16:38.303151] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-secondary-replicate-1: performing metadata selfheal on d78fdf9c-1c00-4712-8229-cfc10b009ad3
[2020-12-27 20:16:38.305144] I [MSGID: 108026] [afr-self-heal-common.c:1744:afr_log_selfheal] 0-secondary-replicate-1: Completed metadata selfheal on d78fdf9c-1c00-4712-8229-cfc10b009ad3. sources=[1] sinks=0

I thought that 'source-brick' solves both data and metadata split-brains. Am I wrong?

Best Regards,
Strahil Nikolov
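
P.S. Since healing works when each gfid is passed explicitly, the per-file workaround could be scripted. What follows is only a minimal sketch, not a tested procedure: the function names extract_gfids and heal_each and the DRY_RUN switch are hypothetical names made up for illustration, and the only gluster invocation used is the per-gfid 'split-brain source-brick' command shown above. It assumes the entries appear in the input text as 'gfid:<uuid>'; entries reported by path would not be picked up.

```shell
#!/usr/bin/env bash
# Sketch: resolve split-brain entries one gfid at a time.
# Assumption: entries show up in the input as "gfid:<uuid>".

# Print each unique "gfid:<uuid>" found on stdin, one per line, in first-seen order.
extract_gfids() {
    grep -oE 'gfid:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
        | awk '!seen[$0]++'
}

# heal_each VOLUME SOURCE_BRICK
# Reads gfids on stdin and heals them one by one from SOURCE_BRICK.
# DRY_RUN (hypothetical switch, default 1) only prints the commands.
heal_each() {
    local volume=$1 brick=$2 gfid
    while IFS= read -r gfid; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "gluster volume heal $volume split-brain source-brick $brick $gfid"
        else
            # </dev/null keeps the CLI from consuming the gfid list on stdin.
            gluster volume heal "$volume" split-brain source-brick "$brick" "$gfid" </dev/null
        fi
    done
}
```

For example, the "Lookup failed on gfid:..." lines from the bulk attempt could be saved to a file and piped through 'extract_gfids | heal_each secondary glusterd:/bricks/brick-d2/brick' to review the commands first, then rerun with DRY_RUN=0 to actually execute them.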