Hi all,

I have 4 glusterd servers running a single glusterfs volume. The volume was created with the gluster command line, with no changes from the defaults, and the same machines all mount the volume using the native glusterfs client:

[root@localhost ~]# gluster volume create datastore replica 2 transport tcp \
    192.168.253.1:/glusterfs/primary 192.168.253.3:/glusterfs/secondary \
    192.168.253.2:/glusterfs/primary 192.168.253.4:/glusterfs/secondary \
    192.168.253.3:/glusterfs/primary 192.168.253.1:/glusterfs/secondary \
    192.168.253.4:/glusterfs/primary 192.168.253.2:/glusterfs/secondary

[root@localhost ~]# cat /etc/fstab
...
/dev/cciss/c0d0p6         /glusterfs/primary    ext4       defaults,noatime  1 2
/dev/cciss/c0d1p6         /glusterfs/secondary  ext4       defaults,noatime  1 2
192.168.253.1:/datastore  /mnt/datastore        glusterfs  defaults,_netdev  0 0

[root@localhost ~]# gluster volume info

Volume Name: datastore
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 192.168.253.1:/glusterfs/primary
Brick2: 192.168.253.3:/glusterfs/secondary
Brick3: 192.168.253.2:/glusterfs/primary
Brick4: 192.168.253.4:/glusterfs/secondary
Brick5: 192.168.253.3:/glusterfs/primary
Brick6: 192.168.253.1:/glusterfs/secondary
Brick7: 192.168.253.4:/glusterfs/primary
Brick8: 192.168.253.2:/glusterfs/secondary

The platform is not yet carrying production data and I have been testing the redundancy of the setup (pulling cables, etc.). All of my servers are now logging messages like the following every minute or so:

[2010-11-11 14:18:49.636327] I [afr-common.c:672:afr_lookup_done] datastore-replicate-0: split brain detected during lookup of /.
[2010-11-11 14:18:49.636388] I [afr-common.c:716:afr_lookup_done] datastore-replicate-0: background meta-data data self-heal triggered. path: /
[2010-11-11 14:18:49.636863] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] datastore-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2010-11-11 14:18:49.637080] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] datastore-replicate-0: background meta-data data self-heal completed on /
[2010-11-11 14:18:49.637561] I [afr-common.c:672:afr_lookup_done] datastore-replicate-0: split brain detected during lookup of /.
[2010-11-11 14:18:49.637588] I [afr-common.c:716:afr_lookup_done] datastore-replicate-0: background meta-data data self-heal triggered. path: /
[2010-11-11 14:18:49.638064] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] datastore-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2010-11-11 14:18:49.638265] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] datastore-replicate-0: background meta-data data self-heal completed on /

Can anyone tell me what I need to do to fix this?

Thanks,
Aaron
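
P.S. In case it helps with diagnosis, the only checks I can think to run are along these lines, comparing the ownership/permissions and the AFR changelog attributes of the brick root directories on each server. This is a sketch only: it assumes getfattr from the attr package is available, and trusted.afr is my understanding of the attribute prefix the replicate translator uses.

    # compare owner, group and octal mode of each brick root
    stat -c '%U:%G %a %n' /glusterfs/primary /glusterfs/secondary
    # dump the AFR changelog attributes (assumed trusted.afr prefix) in hex
    getfattr -d -m trusted.afr -e hex /glusterfs/primary /glusterfs/secondary

If the uid/gid/mode or the trusted.afr values differ between the two bricks of a replica pair, is it enough to chown/chmod the stale brick root to match its partner, or is there more to it than that?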