Alexis Huxley
2016-Jun-19  15:31 UTC
[Gluster-users] easily provoked unrecoverable split brain
As per the quickstart guide, I'm setting up a replicated volume on two
test (KVM) VMs fiori2 and torchio2 as follows:
	mkfs -t xfs -i size=512 -f /dev/vdb1          #  on both
	mount /dev/vdb1 /vol/brick0                   #  on both
	gluster peer probe torchio2	              #  on fiori2
	gluster peer probe fiori2                     #  on torchio2
	mkdir /vol/brick0/vmimages                    #  on both
	gluster volume create vmimages replica 2 \
            torchio2:/vol/brick0/vmimages \
            fiori2:/vol/brick0/vmimages               #  on fiori2
	mount -t glusterfs fiori2:/vmimages /mnt      #  on both
Then I pull the virtual network cable out of one host (with 'virsh
domif-setlink fiori2 vnet10 down') and then run:
	ls /mnt                                       #  on both (wait for timeouts to elapse)
	uname -n > /mnt/hostname		      #  on both (create conflict)
Then I put the cable back, wait a bit and then run:
	torchio2# cat /mnt/hostname
	cat: /mnt/hostname: Input/output error
	torchio2# 
I'm deliberately trying to provoke split-brain, so this I/O error
is no surprise.
The real problem comes when I try to recover from it:
	fiori2# gluster volume heal vmimages info
	Brick torchio2:/vol/brick0/vmimages
	/ - Is in split-brain
	
	/hostname 
	Number of entries: 2
	
	Brick fiori2:/vol/brick0/vmimages
	/ - Is in split-brain
	
	/hostname 
	Number of entries: 2
	
	fiori2# gluster volume heal vmimages split-brain source-brick torchio2:/vol/brick0/vmimages
	'source-brick' option used on a directory (gfid:00000000-0000-0000-0000-000000000001). Performing conservative merge.
	Healing gfid:00000000-0000-0000-0000-000000000001 failed:Operation not permitted.
	Healing gfid:73dce70e-bb3e-40a2-bec9-4741399b6b72 failed:Transport endpoint is not connected.
	Number of healed entries: 0
	fiori2# 
and the I/O error remains.
I've also tried it the manual/fattr way, but that itself also
produces I/O errors:
	fiori2# getfattr -d -m . -e hex /mnt/hostname
	getfattr: /mnt/hostname: Input/output error
	fiori2# 
I've done some googling, but not turned up any references to
split-brain with "Operation not permitted" or "Transport endpoint is
not connected". Am I doing something wrong? Is this a known bug?
Is there a workaround?
For info, I'm using:
	fiori2# cat /etc/issue
	Ubuntu 16.04 LTS \n \l
	
	fiori2# uname -a
	Linux fiori2 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
	fiori2# dpkg -l | grep gluster
	ii  glusterfs-client                   3.7.6-1ubuntu1                  amd64   clustered file-system (client package)
	ii  glusterfs-common                   3.7.6-1ubuntu1                  amd64   GlusterFS common libraries and translator modules
	ii  glusterfs-server                   3.7.6-1ubuntu1                  amd64   clustered file-system (server package)
	fiori2# 
I understand that two nodes are not optimal; occasional split-brain
is acceptable so long as I can recover from it. Up to now, for
a clustered filesystem on my VM servers, I've been using DRBD+OCFS2,
but the NFS3 interaction has been glitchy, so now I'm doing some
tests with GlusterFS.
Any advice gratefully received! Thanks!
Alexis
Ravishankar N
2016-Jun-20  01:41 UTC
[Gluster-users] easily provoked unrecoverable split brain
On 06/19/2016 09:01 PM, Alexis Huxley wrote:
> As per the quickstart guide, I'm setting up a replicated volume on two
> test (KVM) VMs fiori2 and torchio2 as follows:
>
>	mkfs -t xfs -i size=512 -f /dev/vdb1          #  on both
>	mount /dev/vdb1 /vol/brick0                   #  on both
>	gluster peer probe torchio2                   #  on fiori2
>	gluster peer probe fiori2                     #  on torchio2
>	mkdir /vol/brick0/vmimages                    #  on both
>	gluster volume create vmimages replica 2 \
>	    torchio2:/vol/brick0/vmimages \
>	    fiori2:/vol/brick0/vmimages               #  on fiori2
>	mount -t glusterfs fiori2:/vmimages /mnt      #  on both
>
> Then I pull the virtual network cable out of one host (with 'virsh
> domif-setlink fiori2 vnet10 down') and then run:
>
>	ls /mnt                                       #  on both (wait for timeouts to elapse)
>	uname -n > /mnt/hostname                      #  on both (create conflict)
Since you are creating the file each time from the clients (as opposed
to modifying an existing file), you end up with the same file having
different gfids on the two bricks. The split-brain resolution commands
cannot be used to fix gfid split-brains or entry split-brains (i.e. same
file name but different file type). You would need to remove the file
and all its hard links (including the one in the .glusterfs folder)
directly from one of the bricks and then trigger a heal. See 'Fixing
Directory entry split-brain' in
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md.
-Ravi
> Then I put the cable back, wait a bit and then run:
>
>	torchio2# cat /mnt/hostname
>	cat: /mnt/hostname: Input/output error
>	torchio2#
>
> I'm deliberately trying to provoke split-brain, so this I/O error
> is no surprise.
>
> The real problem comes when I try to recover from it:
>
>	fiori2# gluster volume heal vmimages info
>	Brick torchio2:/vol/brick0/vmimages
>	/ - Is in split-brain
>
>	/hostname
>	Number of entries: 2
>
>	Brick fiori2:/vol/brick0/vmimages
>	/ - Is in split-brain
>
>	/hostname
>	Number of entries: 2
>
>	fiori2# gluster volume heal vmimages split-brain source-brick torchio2:/vol/brick0/vmimages
>	'source-brick' option used on a directory (gfid:00000000-0000-0000-0000-000000000001). Performing conservative merge.
>	Healing gfid:00000000-0000-0000-0000-000000000001 failed:Operation not permitted.
>	Healing gfid:73dce70e-bb3e-40a2-bec9-4741399b6b72 failed:Transport endpoint is not connected.
>	Number of healed entries: 0
>	fiori2#
>
> and the I/O error remains.
>
> I've also tried it the manual/fattr way, but that itself also
> produces I/O errors:
>
>	fiori2# getfattr -d -m . -e hex /mnt/hostname
>	getfattr: /mnt/hostname: Input/output error
>	fiori2#
>
> I've done some googling, but not turned up any references to
> split-brain with "Operation not permitted" or "Transport endpoint is
> not connected". Am I doing something wrong? Is this a known bug?
> Is there a workaround?
>
> For info, I'm using:
>
>	fiori2# cat /etc/issue
>	Ubuntu 16.04 LTS \n \l
>
>	fiori2# uname -a
>	Linux fiori2 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>	fiori2# dpkg -l | grep gluster
>	ii  glusterfs-client                   3.7.6-1ubuntu1                  amd64   clustered file-system (client package)
>	ii  glusterfs-common                   3.7.6-1ubuntu1                  amd64   GlusterFS common libraries and translator modules
>	ii  glusterfs-server                   3.7.6-1ubuntu1                  amd64   clustered file-system (server package)
>	fiori2#
>
> I understand that two nodes are not optimal; occasional split-brain
> is acceptable so long as I can recover from it. Up to now, for
> a clustered filesystem on my VM servers, I've been using DRBD+OCFS2,
> but the NFS3 interaction has been glitchy, so now I'm doing some
> tests with GlusterFS.
>
> Any advice gratefully received! Thanks!
>
> Alexis
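The manual fix described in the reply above (remove the file and its
.glusterfs hard link from one brick, then trigger a heal) might look
roughly like the shell sketch below. It is an illustration under stated
assumptions, not a tested procedure: the function names
gfid_to_backend_path and discard_brick_copy are made up here, and the
sketch assumes the usual brick layout in which a regular file's extra
hard link sits at .glusterfs/<first two hex digits>/<next two>/<gfid>
under the brick root. Which brick's copy to discard is a decision for
the administrator. Everything operates on the brick directory on the
server, never on the client mount.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of GlusterFS.

# Convert the hex value of the trusted.gfid xattr (as printed by
# "getfattr -n trusted.gfid -e hex FILE") into the relative path of the
# matching hard link under the brick's .glusterfs directory.
gfid_to_backend_path() {
    hex=${1#0x}                        # drop a leading "0x" if present
    # Insert dashes to get the canonical 8-4-4-4-12 UUID form.
    uuid=$(printf '%s\n' "$hex" | sed -E \
        's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/')
    a=$(printf '%s' "$uuid" | cut -c1-2)
    b=$(printf '%s' "$uuid" | cut -c3-4)
    printf '.glusterfs/%s/%s/%s\n' "$a" "$b" "$uuid"
}

# Remove one file and its .glusterfs hard link from a brick.
# Run as root on the server whose copy is to be discarded.
#   $1 = brick root, e.g. /vol/brick0/vmimages
#   $2 = path of the file relative to the brick root, e.g. hostname
discard_brick_copy() {
    brick=$1
    rel=$2
    # Read the gfid from the brick copy; this works on the brick even
    # when the same call on the client mount returns an I/O error.
    hex=$(getfattr --absolute-names -n trusted.gfid -e hex "$brick/$rel" |
        sed -n 's/^trusted\.gfid=//p')
    if [ -z "$hex" ]; then
        echo "no trusted.gfid found on $brick/$rel" >&2
        return 1
    fi
    rm -f -- "$brick/$(gfid_to_backend_path "$hex")" "$brick/$rel"
}

# Example, on the server whose copy should lose (not run automatically):
#   discard_brick_copy /vol/brick0/vmimages hostname
#   gluster volume heal vmimages
```

Comparing the trusted.gfid value of /vol/brick0/vmimages/hostname on
both servers beforehand would confirm that this is a gfid mismatch
rather than a data or metadata split-brain.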