Strahil Nikolov
2020-Dec-27 20:20 UTC
[Gluster-users] cannot resolve split-brain via 'select one brick as source'
Hi All,
I'm currently playing around with Gluster 8.3 and geo-replication.
I have created a 'replica 3' volume as the source and a 'replica 2'
volume as the destination.
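For reference, the setup was roughly like this (a rough sketch from memory; the
source-side hostnames/bricks and the volume name 'primary' below are
illustrative, only the secondary bricks match the 'volume info' further down):

# source side: 3-way replica
gluster volume create primary replica 3 \
    glustera:/bricks/brick-a1/brick \
    glusterb:/bricks/brick-b1/brick \
    glusterc:/bricks/brick-c1/brick
gluster volume start primary

# destination side: 2 x 2 distributed-replicate
# (gluster warns that replica 2 volumes are prone to split-brain)
gluster volume create secondary replica 2 \
    glusterd:/bricks/brick-d1/brick glustere:/bricks/brick-e2/brick \
    glustere:/bricks/brick-e1/brick glusterd:/bricks/brick-d2/brick
gluster volume start secondary

# geo-replication session from source to destination
gluster volume geo-replication primary glusterd::secondary create push-pem
gluster volume geo-replication primary glusterd::secondary start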
In this setup geo-replication is working quite fine, but during my tests I
managed to cause a split-brain in some of the files:
[root@glustere mnt]# gluster volume info secondary
Volume Name: secondary
Type: Distributed-Replicate
Volume ID: 1b5717ee-aa9b-4eff-9989-ad4f0388b86c
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: glusterd:/bricks/brick-d1/brick
Brick2: glustere:/bricks/brick-e2/brick
Brick3: glustere:/bricks/brick-e1/brick
Brick4: glusterd:/bricks/brick-d2/brick
Options Reconfigured:
cluster.quorum-count: 1
cluster.quorum-type: fixed
features.read-only: on
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
performance.quick-read: off
performance.client-io-threads: off
cluster.enable-shared-storage: enable
[root@glustere mnt]# gluster volume heal secondary info summary
Brick glusterd:/bricks/brick-d1/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick glustere:/bricks/brick-e2/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick glustere:/bricks/brick-e1/brick
Status: Connected
Total Number of entries: 2641
Number of entries in heal pending: 0
Number of entries in split-brain: 2641
Number of entries possibly healing: 0
Brick glusterd:/bricks/brick-d2/brick
Status: Connected
Total Number of entries: 2641
Number of entries in heal pending: 0
Number of entries in split-brain: 2641
Number of entries possibly healing: 0
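The affected entries themselves (they show up as gfid entries for me) can be
listed with:

gluster volume heal secondary info split-brain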
As node "glustere" was the last one rebooted, the source of truth should be
'glusterd:/bricks/brick-d2/brick'.
I have tried to tell Gluster that, but it doesn't resolve the
split-brain:
[root@glustere mnt]# gluster volume heal secondary split-brain source-brick glusterd:/bricks/brick-d2/brick | tail -n 5
Lookup failed on gfid:dbfb3794-4a1a-4540-b235-2fdce4d21d6a:Transport endpoint is not connected.
Lookup failed on gfid:d78fdf9c-1c00-4712-8229-cfc10b009ad3:Transport endpoint is not connected.
Status: Connected
Number of healed entries: 1
In the logs it's clear that something is wrong:
[2020-12-27 20:14:07.100074] W [MSGID: 108027]
[afr-common.c:2857:afr_attempt_readsubvol_set] 0-secondary-replicate-1: no read
subvols for dbfb3794-4a1a-4540-b235-2fdce4d21d6a
[2020-12-27 20:14:07.100106] I [MSGID: 109063]
[dht-layout.c:641:dht_layout_normalize] 0-secondary-dht: Found anomalies
[{path=dbfb3794-4a1a-4540-b235-2fdce4d21d6a},
{gfid=dbfb3794-4a1a-4540-b235-2fdce4d21d6a}, {holes=1}, {overlaps=0}]
[2020-12-27 20:14:07.100903] W [MSGID: 108027]
[afr-common.c:2857:afr_attempt_readsubvol_set] 0-secondary-replicate-1: no read
subvols for d78fdf9c-1c00-4712-8229-cfc10b009ad3
[2020-12-27 20:14:07.100935] I [MSGID: 109063]
[dht-layout.c:641:dht_layout_normalize] 0-secondary-dht: Found anomalies
[{path=d78fdf9c-1c00-4712-8229-cfc10b009ad3},
{gfid=d78fdf9c-1c00-4712-8229-cfc10b009ad3}, {holes=1}, {overlaps=0}]
Yet, if I specify the file in the previous command, the heal is OK:
[root@glustere mnt]# gluster volume heal secondary split-brain source-brick glusterd:/bricks/brick-d2/brick gfid:dbfb3794-4a1a-4540-b235-2fdce4d21d6a
Healed gfid:dbfb3794-4a1a-4540-b235-2fdce4d21d6a.
[root@glustere mnt]# gluster volume heal secondary split-brain source-brick glusterd:/bricks/brick-d2/brick gfid:d78fdf9c-1c00-4712-8229-cfc10b009ad3
Healed gfid:d78fdf9c-1c00-4712-8229-cfc10b009ad3.
[2020-12-27 20:16:27.471113] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-secondary-replicate-1: performing metadata selfheal on
dbfb3794-4a1a-4540-b235-2fdce4d21d6a
[2020-12-27 20:16:27.473157] I [MSGID: 108026]
[afr-self-heal-common.c:1744:afr_log_selfheal] 0-secondary-replicate-1:
Completed metadata selfheal on dbfb3794-4a1a-4540-b235-2fdce4d21d6a.
sources=[1] sinks=0
[2020-12-27 20:16:38.303151] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-secondary-replicate-1: performing metadata selfheal on
d78fdf9c-1c00-4712-8229-cfc10b009ad3
[2020-12-27 20:16:38.305144] I [MSGID: 108026]
[afr-self-heal-common.c:1744:afr_log_selfheal] 0-secondary-replicate-1:
Completed metadata selfheal on d78fdf9c-1c00-4712-8229-cfc10b009ad3.
sources=[1] sinks=0
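Since healing entry by entry works, a loop over the reported gfids looks like a
possible workaround (rough, untested sketch; it assumes 'heal info split-brain'
reports the entries in '<gfid:...>' form):

gluster volume heal secondary info split-brain \
    | awk '/^<gfid:/ { gsub(/[<>]/, ""); print }' \
    | sort -u \
    | while read -r entry; do
          gluster volume heal secondary split-brain source-brick \
              glusterd:/bricks/brick-d2/brick "$entry"
      done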
I thought that 'source-brick' resolves both data and metadata
split-brains. Am I wrong?
Best Regards,
Strahil Nikolov