Henrik Juul Pedersen
2017-Dec-20 18:26 UTC
[Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware
Hi,

I have the following volume:

Volume Name: virt_images
Type: Replicate
Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
Status: Started
Snapshot Count: 2
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: virt3:/data/virt_images/brick
Brick2: virt2:/data/virt_images/brick
Brick3: printserver:/data/virt_images/brick (arbiter)
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.barrier: disable
features.scrub: Active
features.bitrot: on
nfs.rpc-auth-allow: on
server.allow-insecure: on
user.cifs: off
features.shard: off
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.disable: on
transport.address-family: inet
server.outstanding-rpc-limit: 512

After a server reboot (brick 1), a single file has become unavailable:

# touch fedora27.qcow2
touch: setting times of 'fedora27.qcow2': Input/output error

Looking at the split-brain status from the client-side CLI:

# getfattr -n replica.split-brain-status fedora27.qcow2
# file: fedora27.qcow2
replica.split-brain-status="The file is not under data or metadata split-brain"

However, the client-side log mentions a split brain:

[2017-12-20 18:05:23.570762] E [MSGID: 108008] [afr-transaction.c:2629:afr_write_txn_refresh_done] 0-virt_images-replicate-0: Failing SETATTR on gfid 7a36937d-52fc-4b55-a932-99e2328f02ba: split-brain observed. [Input/output error]
[2017-12-20 18:05:23.576046] W [MSGID: 108027] [afr-common.c:2733:afr_discover_done] 0-virt_images-replicate-0: no read subvols for /fedora27.qcow2
[2017-12-20 18:05:23.578149] W [fuse-bridge.c:1153:fuse_setattr_cbk] 0-glusterfs-fuse: 182: SETATTR() /fedora27.qcow2 => -1 (Input/output error)

= Server side

No mention of a possible split brain:

# gluster volume heal virt_images info split-brain
Brick virt3:/data/virt_images/brick
Status: Connected
Number of entries in split-brain: 0

Brick virt2:/data/virt_images/brick
Status: Connected
Number of entries in split-brain: 0

Brick printserver:/data/virt_images/brick
Status: Connected
Number of entries in split-brain: 0

The info command shows the file:

# gluster volume heal virt_images info
Brick virt3:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick virt2:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick printserver:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

The heal and heal full commands do nothing, and I can't find anything in the logs about them trying and failing to fix the file.

Trying to manually resolve the split brain from the CLI gives the following:

# gluster volume heal virt_images split-brain source-brick virt3:/data/virt_images/brick /fedora27.qcow2
Healing /fedora27.qcow2 failed: File not in split-brain.
Volume heal failed.

The attrs from virt2 and virt3 are as follows:

[root@virt2 brick]# getfattr -d -m . -e hex fedora27.qcow2
# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-1=0x000002280000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000
trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a49eb0000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001

# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-2=0x000003ef0000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000
trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a2fbe0000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001

I don't know how to find similar information from the arbiter...

Versions are the same on all three systems:

# glusterd --version
glusterfs 3.12.2

# gluster volume get all cluster.op-version
Option Value
------ -----
cluster.op-version 31202

I might try upgrading to version 3.13.0 tomorrow, but I want to hear you out first.

How do I fix this? Do I have to manually change the file attributes?

Also, in the guides for manual resolution through setfattr, all the bricks are listed with a "trusted.afr.<volume>-client-<brick>" attribute. But on my system (as can be seen above), each brick only shows attributes for the other bricks. So which attributes should be changed, and to what?

I hope someone might know a solution. If you need any more information, I'll try to provide it. I can probably change the virtual machine to another image for now.

Best regards,
Henrik Juul Pedersen
LIAB ApS
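For readers following along, the trusted.afr.* values in the getfattr output above can be decoded rather than read as raw hex. The sketch below assumes the usual AFR changelog layout (12 bytes holding three big-endian 32-bit counters for pending data, metadata and entry operations); the function name decode_afr_changelog is made up for this illustration and is not part of any Gluster tool.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value into (data, metadata, entry) counters.

    Assumes the usual AFR layout: three unsigned 32-bit integers in
    big-endian byte order, 12 bytes in total.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return struct.unpack(">III", raw)


# Values copied from the two getfattr outputs in the message above.
blames_client_1 = decode_afr_changelog("0x000002280000000000000000")
blames_client_2 = decode_afr_changelog("0x000003ef0000000000000000")
blames_client_3 = decode_afr_changelog("0x000000000000000000000000")

print(blames_client_1)  # (552, 0, 0)
print(blames_client_2)  # (1007, 0, 0)
print(blames_client_3)  # (0, 0, 0)
```

Read this way, one copy records pending data operations against client-1 and the other against client-2, while neither records anything against client-3. Which brick corresponds to which client index is not established by the output alone, so treat any mapping as something to confirm on the system itself.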
Ben Turner
2017-Dec-21 04:25 UTC
[Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware
Here is the process for resolving split brain on replica 2:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html

It should be pretty much the same for replica 3; you change the xattrs with something like:

# setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a

When I try to decide which copy to use, I normally run things like:

# stat /<path to brick>/path/to/file

Check the access and change times of the file on the back-end bricks. I normally pick the copy with the latest access/change times. I'll also check:

# md5sum /<path to brick>/path/to/file

Compare the hashes of the file on both bricks to see if the data actually differs. If the data is the same, it makes choosing the proper replica easier.

Any idea how you got into this situation? Did you have a loss of network connectivity? I see you are using server-side quorum; maybe check the logs for any loss of quorum. I wonder if there was a loss of quorum and some sort of race condition was hit:

http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls

"Unlike in client-quorum where the volume becomes read-only when quorum is lost, loss of server-quorum in a particular node makes glusterd kill the brick processes on that node (for the participating volumes) making even reads impossible."

I wonder if the killing of brick processes could have led to some sort of race condition where writes were serviced on one brick and the arbiter but not on the other brick?

If you can find a reproducer for this, please open a BZ with it. I have been seeing something similar (I think), but I haven't been able to run the issue down yet.

-b
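The stat and md5sum comparison suggested above can be sketched as a small script. This is only an illustration: describe_copy and same_data are hypothetical helper names, and the temporary files stand in for the copies of the file on each data brick. On an arbiter volume the arbiter brick holds only metadata, so only the two data bricks are worth comparing this way.

```python
import hashlib
import os
import tempfile


def describe_copy(path, chunk_size=1 << 20):
    """Return size, timestamps and MD5 of one copy of a file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large images do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    info = os.stat(path)
    return {
        "path": path,
        "size": info.st_size,
        "mtime": info.st_mtime,
        "ctime": info.st_ctime,
        "md5": digest.hexdigest(),
    }


def same_data(copy_a, copy_b):
    """True when two described copies have identical size and checksum."""
    return copy_a["size"] == copy_b["size"] and copy_a["md5"] == copy_b["md5"]


# Demo with temporary files standing in for the per-brick copies.
with tempfile.TemporaryDirectory() as scratch:
    described = []
    for name, payload in (("brick_a", b"same"), ("brick_b", b"same"),
                          ("brick_c", b"different")):
        path = os.path.join(scratch, name)
        with open(path, "wb") as handle:
            handle.write(payload)
        described.append(describe_copy(path))
    copy_a, copy_b, copy_c = described

print(same_data(copy_a, copy_b))  # True
print(same_data(copy_a, copy_c))  # False
```

In real use the paths would be the file's location under each brick directory, read directly on the respective servers. If the checksums match, the choice of source matters less; if they differ, the timestamps returned alongside the checksum are the ones to weigh.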
Karthik Subrahmanya
2017-Dec-21 06:18 UTC
[Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware
Hey,

Can you give us the volume info output for this volume?
Why are you not able to get the xattrs from the arbiter brick? You get them the same way as on the data bricks.
The changelog xattrs are named trusted.afr.virt_images-client-{1,2,3} in the getxattr outputs you have provided. Did you do a remove-brick and add-brick at any time? Otherwise they would usually be trusted.afr.virt_images-client-{0,1,2}.

To overcome this scenario you can do what Ben Turner suggested: select the source copy and change the xattrs manually.
I suspect that it has hit the "arbiter becoming source for data heal" bug, but to confirm that we also need the xattrs on the arbiter brick.

Regards,
Karthik
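When changing the xattrs manually, the value passed to setfattr is the same 12-byte changelog, written as hex. The sketch below assumes the usual AFR layout (three big-endian 32-bit counters for data, metadata and entry) and only builds the command text; it does not run anything. The helper names are invented for this example, and which attribute to change on which brick depends on the copy chosen as source.

```python
import struct


def encode_afr_changelog(data=0, metadata=0, entry=0):
    """Pack three pending counters into a setfattr-style hex value."""
    return "0x" + struct.pack(">III", data, metadata, entry).hex()


def setfattr_command(xattr_name, value, brick_file):
    """Format (but do not execute) a setfattr invocation."""
    return "setfattr -n {} -v {} {}".format(xattr_name, value, brick_file)


# Reproduces the example command quoted earlier in the thread:
# a metadata counter of 1, with data and entry counters at 0.
example = setfattr_command(
    "trusted.afr.vol-client-0",
    encode_afr_changelog(metadata=1),
    "/gfs/brick-b/a",
)
print(example)

# An all-zero value means "nothing pending against that brick".
print(encode_afr_changelog())  # 0x000000000000000000000000
```

Printing the command instead of executing it leaves room to check the attribute name against the actual getfattr output first, which matters here because the client indices on this volume are {1,2,3} rather than the usual {0,1,2}.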