Thomas Besser
2011-Jul-14 14:42 UTC
[Gluster-users] self-heal/possible split-brain problem...
Hi all,

I have a Gluster replica (3.1.4) across two servers.

The client log says:

[2011-07-14 08:33:23.661543] I [afr-common.c:672:afr_lookup_done] 0-storage1-replicate-0: split brain detected during lookup of /poolserver/hdb.raw.
[2011-07-14 08:33:23.661579] I [afr-common.c:716:afr_lookup_done] 0-storage1-replicate-0: background data self-heal triggered. path: /poolserver/hdb.raw
[2011-07-14 08:33:23.662216] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage1-replicate-0: Unable to self-heal contents of '/poolserver/hdb.raw' (possible split-brain). Please delete the file from all but the preferred subvolume.
[2011-07-14 08:33:23.662442] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage1-replicate-0: background data self-heal completed on /poolserver/hdb.raw

-----------server-node-01---------------

root@vsh01:/# getfattr -m . -d -e hex srv/glusterfs/poolserver/hdb.raw
# file: srv/glusterfs/poolserver/hdb.raw
trusted.afr.storage1-client-0=0x000000000000000000000000
trusted.afr.storage1-client-1=0x2d0000020000000000000000
trusted.gfid=0xa23bee795ac1450d89228bf8e38b915c

root@vsh01:/# ls -l srv/glusterfs/poolserver/hdb.raw
-rw-r--r-- 1 kvm kvm 241591910400 14. Jul 15:41 srv/glusterfs/poolserver/hdb.raw

------------server-node-02---------------

root@vsh02:/# getfattr -m . -d -e hex srv/glusterfs/poolserver/hdb.raw
# file: srv/glusterfs/poolserver/hdb.raw
trusted.afr.storage1-client-0=0x010000250000000000000000
trusted.afr.storage1-client-1=0x000000000000000000000000
trusted.gfid=0xa23bee795ac1450d89228bf8e38b915c

root@vsh02:/# ls -l srv/glusterfs/poolserver/hdb.raw
-rw-r--r-- 1 kvm kvm 241591910400 14. Jul 15:41 srv/glusterfs/poolserver/hdb.raw

------------------------------------

The files on both server nodes look the same, and both are being changed by the VM running on them! So perhaps only the extended attributes are wrong.

Is there a way to change this manually, without deleting the file on one side?
I'm also confused by the last line of the client log, which tells me "data self-heal completed on /poolserver/hdb.raw". I would take that to be a success message!?

Regards
Thomas
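
To make the getfattr output above easier to read, here is a minimal sketch that splits a trusted.afr value into its counters. It assumes the usual AFR changelog layout of three big-endian 32-bit counters (data, metadata, entry pending operations); that layout comes from Gluster's AFR documentation, not from this thread, and the helper name decode_afr is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: split a 12-byte trusted.afr.* value into three counters.
# Assumed layout (not stated in this thread): data, metadata, entry,
# each a 32-bit big-endian integer, i.e. 8 hex digits apiece.
decode_afr() {
    hex=${1#0x}                                   # drop the leading "0x"
    data=$(printf '%s' "$hex" | cut -c1-8)        # bytes 0-3
    meta=$(printf '%s' "$hex" | cut -c9-16)       # bytes 4-7
    entry=$(printf '%s' "$hex" | cut -c17-24)     # bytes 8-11
    echo "data=$((0x$data)) metadata=$((0x$meta)) entry=$((0x$entry))"
}

# Values copied from the getfattr output in the message above.
echo "vsh01 client-1: $(decode_afr 0x2d0000020000000000000000)"
echo "vsh02 client-0: $(decode_afr 0x010000250000000000000000)"
```

Under that assumption, vsh01 holds a non-zero data counter against client-1 and vsh02 holds one against client-0, so each brick marks the other as stale. That mutual accusation would match the "possible split-brain" message in the client log.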
Thomas Besser
2011-Jul-22 07:38 UTC
[Gluster-users] [solved] Re: self-heal/possible split-brain problem...
On 14.07.2011 16:42, Thomas Besser wrote:
> have a gluster replica (3.1.4) over two servers.
>
> client log says:
>
> [2011-07-14 08:33:23.661543] I [afr-common.c:672:afr_lookup_done] 0-storage1-replicate-0: split brain detected during lookup of /poolserver/hdb.raw.
> [2011-07-14 08:33:23.661579] I [afr-common.c:716:afr_lookup_done] 0-storage1-replicate-0: background data self-heal triggered. path: /poolserver/hdb.raw
> [2011-07-14 08:33:23.662216] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage1-replicate-0: Unable to self-heal contents of '/poolserver/hdb.raw' (possible split-brain). Please delete the file from all but the preferred subvolume.
> [2011-07-14 08:33:23.662442] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage1-replicate-0: background data self-heal completed on /poolserver/hdb.raw
>
> -----------server-node-01---------------
>
> root@vsh01:/# getfattr -m . -d -e hex srv/glusterfs/poolserver/hdb.raw
> # file: srv/glusterfs/poolserver/hdb.raw
> trusted.afr.storage1-client-0=0x000000000000000000000000
> trusted.afr.storage1-client-1=0x2d0000020000000000000000
> trusted.gfid=0xa23bee795ac1450d89228bf8e38b915c
>
> root@vsh01:/# ls -l srv/glusterfs/poolserver/hdb.raw
> -rw-r--r-- 1 kvm kvm 241591910400 14. Jul 15:41 srv/glusterfs/poolserver/hdb.raw
>
> ------------server-node-02---------------
>
> root@vsh02:/# getfattr -m . -d -e hex srv/glusterfs/poolserver/hdb.raw
> # file: srv/glusterfs/poolserver/hdb.raw
> trusted.afr.storage1-client-0=0x010000250000000000000000
> trusted.afr.storage1-client-1=0x000000000000000000000000
> trusted.gfid=0xa23bee795ac1450d89228bf8e38b915c
>
> root@vsh02:/# ls -l srv/glusterfs/poolserver/hdb.raw
> -rw-r--r-- 1 kvm kvm 241591910400 14. Jul 15:41 srv/glusterfs/poolserver/hdb.raw
>
> ------------------------------------
>
> The files on both server nodes looks like the same and both getting changed
> from the running VM on it! So perhaps only the extended attributes are
> wrong.
>
> Is there a way to change this manually? Without deleting the file on one
> side.

I manually set the extended attributes on both server nodes to "0x000000000000000000000000", like this:

setfattr -n trusted.afr.storage1-client-0 -v 0x000000000000000000000000 /srv/glusterfs/poolserver/hdb.raw

It worked!

Regards
Thomas
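
The reply shows the command for trusted.afr.storage1-client-0 only. Since both attributes were reportedly zeroed on both nodes, the full step might look like the sketch below. It only prints the setfattr commands so they can be reviewed before being run as root on each server; the function name reset_cmds is hypothetical, and the attribute names and brick path are the ones from this thread. Zeroing the counters tells Gluster the copies are in sync, so it is only sensible when the file contents really are identical on both bricks.

```sh
#!/bin/sh
# Hypothetical dry-run helper: print the setfattr commands that clear both
# AFR changelog attributes on one brick file. Nothing is modified here.
ZERO=0x000000000000000000000000

reset_cmds() {
    for attr in trusted.afr.storage1-client-0 trusted.afr.storage1-client-1; do
        echo "setfattr -n $attr -v $ZERO $1"
    done
}

# Brick path as used in the thread; run the printed commands on each server.
reset_cmds /srv/glusterfs/poolserver/hdb.raw
```

After running the printed commands on each server, getfattr -m . -d -e hex on the same path should show both trusted.afr values as all zeros.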