Tomasz Chmielewski
2013-Jan-08 17:33 UTC
[Gluster-users] frequent split-brain detected, aborting selfheal; background meta-data self-heal failed
Hi,

I'm seeing rather frequent (several times per minute) log entries like:

[2013-01-08 16:45:03.399791] I [afr-common.c:1038:afr_launch_self_heal] 0-shared-replicate-0: background meta-data self-heal triggered. path: /lfd/techstudiolfc/pub
[2013-01-08 16:45:03.400224] I [afr-self-heal-common.c:705:afr_mark_sources] 0-shared-replicate-0: split-brain possible, no source detected
[2013-01-08 16:45:03.400253] E [afr-self-heal-metadata.c:512:afr_sh_metadata_fix] 0-shared-replicate-0: Unable to self-heal permissions/ownership of '/lfd/techstudiolfc/pub' (possible split-brain). Please fix the file on all backend volumes
[2013-01-08 16:45:03.400417] I [afr-self-heal-metadata.c:81:afr_sh_metadata_done] 0-shared-replicate-0: split-brain detected, aborting selfheal of /lfd/techstudiolfc/pub
[2013-01-08 16:45:03.400453] E [afr-self-heal-common.c:2074:afr_self_heal_completion_cbk] 0-shared-replicate-0: background meta-data self-heal failed on /lfd/techstudiolfc/pub

However, when checking the affected directory, the permissions/ownerships seem to be identical on both servers:

[root@ca1.sg1 /]# ls -ld /data/gluster/lfd/techstudiolfc/pub
drwxr-xr-x 2 userftp userftp 4096 Jun 6 2012 /data/gluster/lfd/techstudiolfc/pub

[root@ca1.sg1 /]# attr -l /data/gluster/lfd/techstudiolfc/pub
Attribute "gfid" has a 16 byte value for /data/gluster/lfd/techstudiolfc/pub
Attribute "afr.shared-client-0" has a 12 byte value for /data/gluster/lfd/techstudiolfc/pub
Attribute "afr.shared-client-1" has a 12 byte value for /data/gluster/lfd/techstudiolfc/pub

[root@ca2.sg1 /]# ls -ld /data/gluster/lfd/techstudiolfc/pub
drwxr-xr-x 2 userftp userftp 4096 Jun 6 2012 /data/gluster/lfd/techstudiolfc/pub

[root@ca2.sg1 /]# attr -l /data/gluster/lfd/techstudiolfc/pub
Attribute "gfid" has a 16 byte value for /data/gluster/lfd/techstudiolfc/pub
Attribute "afr.shared-client-0" has a 12 byte value for /data/gluster/lfd/techstudiolfc/pub
Attribute "afr.shared-client-1" has a 12 byte value for /data/gluster/lfd/techstudiolfc/pub

What could be the problem?

I'm using glusterfs 3.2.6 on Debian Squeeze, and seeing the very same problem on different servers. It only seems to affect directories.

--
Tomasz Chmielewski
http://wpkg.org
Tomasz Chmielewski
2013-Jan-08 17:44 UTC
[Gluster-users] frequent split-brain detected, aborting selfheal; background meta-data self-heal failed
On 01/08/2013 06:33 PM, Tomasz Chmielewski wrote:

> [root@ca2.sg1 /]# attr -l /data/gluster/lfd/techstudiolfc/pub
> Attribute "gfid" has a 16 byte value for /data/gluster/lfd/techstudiolfc/pub
> Attribute "afr.shared-client-0" has a 12 byte value for /data/gluster/lfd/techstudiolfc/pub
> Attribute "afr.shared-client-1" has a 12 byte value for /data/gluster/lfd/techstudiolfc/pub

Perhaps that would be useful, too - it differs on both servers (trusted.afr.shared-client-0 and trusted.afr.shared-client-1).

What's its meaning? What situations could lead to them being different?

[root@ca1.sg1 /]# getfattr -m . -d -e hex /data/gluster/lfd/techstudiolfc/pub
getfattr: Removing leading '/' from absolute path names
# file: data/gluster/lfd/techstudiolfc/pub
trusted.afr.shared-client-0=0x000000000000000000000000
trusted.afr.shared-client-1=0x000000000000001d00000000
trusted.gfid=0x3700ee06f8f74ebc853ee8277c107ec2

[root@ca2.sg1 /]# getfattr -m . -d -e hex /data/gluster/lfd/techstudiolfc/pub
getfattr: Removing leading '/' from absolute path names
# file: data/gluster/lfd/techstudiolfc/pub
trusted.afr.shared-client-0=0x000000000000000300000000
trusted.afr.shared-client-1=0x000000000000000000000000
trusted.gfid=0x3700ee06f8f74ebc853ee8277c107ec2

--
Tomasz Chmielewski
http://wpkg.org
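
A note on reading those values: as far as I understand the AFR changelog format, each 12-byte trusted.afr.* value is three big-endian 32-bit counters of pending operations, in the order data, metadata, entry. The sketch below decodes the four values from the getfattr output above under that assumption. The helper name decode_afr_changelog is made up for illustration and is not part of any GlusterFS tool.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a 12-byte trusted.afr.* value into three 32-bit counters.

    Assumed layout (not confirmed in this thread): data, metadata, entry,
    each a big-endian unsigned 32-bit pending-operation count.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected a 12-byte value, got %d bytes" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output on each server.
bricks = {
    "ca1": {
        "trusted.afr.shared-client-0": "0x000000000000000000000000",
        "trusted.afr.shared-client-1": "0x000000000000001d00000000",
    },
    "ca2": {
        "trusted.afr.shared-client-0": "0x000000000000000300000000",
        "trusted.afr.shared-client-1": "0x000000000000000000000000",
    },
}

for host, attrs in bricks.items():
    for name, value in attrs.items():
        print(host, name, decode_afr_changelog(value))
```

If that layout is right, ca1 records 29 pending metadata operations against shared-client-1, and ca2 records 3 pending metadata operations against shared-client-0, with the data and entry counters at zero everywhere. Assuming shared-client-0 is the brick on ca1 and shared-client-1 is the brick on ca2 (the thread does not say), each copy is blaming the other for metadata changes, which would be consistent with the "split-brain possible, no source detected" message even though ls -ld shows the same mode and owner on both sides.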