I was checking on the client fuse mount and saw:

[2015-09-26 14:41:29.417267] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-devstatic-replicate-1: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 15 ] [ 1 0 ] ]
[2015-09-26 14:41:29.418063] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-devstatic-replicate-1: metadata self heal failed, on /

After reviewing your doc to build understanding, https://github.com/gluster/glusterdocs/blob/master/Troubleshooting/split-brain.md, I found that http://thr3ads.net/gluster-users/2013/11/2710016-Unable-to-self-heal-contents-of-gfid-00000000-0000-0000-0000-000000000001 describes the same issue with '/'. Now I believe all is clear.

Client log:

[2015-09-26 14:53:35.662325] I [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-devstatic-replicate-1: metadata self heal is successfully completed, metadata self heal from source devstatic-client-2 to devstatic-client-3, metadata - Pending matrix: [ [ 0 0 ] [ 0 0 ] ], on /
[2015-09-26 14:53:35.667537] I [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-devstatic-replicate-0: metadata self heal is successfully completed, metadata self heal from source devstatic-client-0 to devstatic-client-1, metadata - Pending matrix: [ [ 0 0 ] [ 0 0 ] ], on /

Gluster storage CLI output. The gluster volume heal devstatic info split-brain report is now clean:

[root at omdx1b51 ~]# gluster volume heal devstatic info split-brain
Gathering list of split brain entries on volume devstatic has been successful

Brick omhq1b4e:/static/content
Number of entries: 0

Brick omdx1b50:/static/content
Number of entries: 0

Brick omhq1b4f:/static/content
Number of entries: 0

Brick omdx1b51:/static/content
Number of entries: 0

Please let me know if there are any steps I've missed or additional areas to look at.

Thank you,
Khoi Mai
Union Pacific Railroad
Distributed Engineering & Architecture
Senior Project Engineer


From: Ravishankar N <ravishankar at redhat.com>
To: Khoi Mai <KHOIMAI at up.com>
Date: 09/26/2015 12:59 AM
Subject: Re: [Gluster-users] glusterfs3.4.2-1 split-brain question

On 09/26/2015 10:37 AM, Khoi Mai wrote:

I'd like to run the afr attr reset on omdx1b51; does that make omhq1b4f the winning source?

Yes. Resetting trusted.afr.devstatic-client-2 on omdx1b51 makes omhq1b4f the source, because it blames omdx1b51 via trusted.afr.devstatic-client-3.

Or do I run the commands on the server I want to be the source? For example below? So, the intended changes are:

On omdx1b51, for trusted.afr.devstatic-client-2:
0x000000000000000600000000 to 0x000000000000000000000000

Hence execute
setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /static/content/
then
gluster volume heal devstatic

Thank you for your help!
Khoi Mai
Union Pacific Railroad
Distributed Engineering & Architecture
Senior Project Engineer
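Putting Ravishankar's reply together with the change Khoi describes, the reset with omhq1b4f as the source would presumably look like the sketch below. This is only an illustration assembled from the thread, not a command run on this cluster: the setfattr line quoted above appears to carry over the doc's generic example name (trusted.afr.vol-client-0), whereas for this volume the xattr to reset is trusted.afr.devstatic-client-2 and the target value is all zeros. Confirm the current value with getfattr on the brick before changing anything.

# on omdx1b51, run against the brick path (not the fuse mount):
setfattr -n trusted.afr.devstatic-client-2 -v 0x000000000000000000000000 /static/content/

# then trigger a heal and re-check:
gluster volume heal devstatic
gluster volume heal devstatic info split-brain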
From: Ravishankar N <ravishankar at redhat.com>
To: Khoi Mai <KHOIMAI at up.com>
Cc: gluster-users at gluster.org
Date: 09/25/2015 09:04 PM
Subject: Re: [Gluster-users] glusterfs3.4.2-1 split-brain question

On 09/25/2015 07:40 PM, Khoi Mai wrote:

I think I found it from your github doc: the quota size does not match between the replicate pair. I don't know if that would make the difference. I apologize, I cannot use fpaste.org or pastebin.com due to policies at my company.

I'm not sure quota xattrs are handled in AFR in glusterfs-3.4. There doesn't seem to be any split-brain in the first replica pair, since the afr xattrs all seem to be zero. For the second replica pair, they are in metadata split-brain (though unlikely to be due to the quota-size xattr). You can pick one brick as the source, reset the appropriate afr xattr, and run `gluster v heal volname` once.

[root at omhq1b4e ~]# getfattr -d -m . -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.afr.devstatic-client-0=0x000000000000000000000000
trusted.afr.devstatic-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x0000018000000000ffffffffffffffff
trusted.glusterfs.quota.size=0x0000006f303e4e00
trusted.glusterfs.volume-id=0x75832afbf20e40188d748550a92233fc

[root at omdx1b50 ~]# getfattr -d -m . -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.afr.devstatic-client-0=0x000000000000000000000000
trusted.afr.devstatic-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x0000018000000000ffffffffffffffff
trusted.glusterfs.quota.size=0x00000081bfca4e00
trusted.glusterfs.volume-id=0x75832afbf20e40188d748550a92233fc

[root at omhq1b4f ~]# getfattr -d -m . -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.afr.devstatic-client-2=0x000000000000000000000000
trusted.afr.devstatic-client-3=0x000000000000000900000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x0000018000000000ffffffffffffffff
trusted.glusterfs.quota.size=0x00000076b9b20800
trusted.glusterfs.volume-id=0x75832afbf20e40188d748550a92233fc

[root at omdx1b51 ~]# getfattr -d -m . -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.afr.devstatic-client-2=0x000000000000000600000000
trusted.afr.devstatic-client-3=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x0000018000000000ffffffffffffffff
trusted.glusterfs.quota.size=0x0000006eb4e0b000
trusted.glusterfs.volume-id=0x75832afbf20e40188d748550a92233fc

Khoi Mai
Union Pacific Railroad
Distributed Engineering & Architecture
Senior Project Engineer
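As a worked reading of the two non-zero trusted.afr values in the output above, using the changelog layout described in the split-brain doc linked in this thread (the first 8 hex digits count pending data operations, the next 8 pending metadata operations, the last 8 pending entry operations):

omhq1b4f: trusted.afr.devstatic-client-3 = 0x 00000000 00000009 00000000  -> 9 pending metadata changes blamed on omdx1b51
omdx1b51: trusted.afr.devstatic-client-2 = 0x 00000000 00000006 00000000  -> 6 pending metadata changes blamed on omhq1b4f

Each brick accuses only the other's metadata, which is why '/' is reported as a metadata split-brain rather than a data or entry split-brain, and why resetting one of these two xattrs is enough to pick a source.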
From: Khoi Mai/UPC
To: Ravishankar N <ravishankar at redhat.com>
Cc: gluster-users at gluster.org
Date: 09/25/2015 09:01 AM
Subject: Re: [Gluster-users] glusterfs3.4.2-1 split-brain question

The gfid looks the same. I'm not sure what gluster volume heal info split-brain is reporting when the GFID matches, and for all 4 nodes in the devstatic volume.

[root at omhq1b4f ~]# getfattr -h -d -m trusted.gfid -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.gfid=0x00000000000000000000000000000001

[root at omhq1b4f ~]# stat /static/content/
  File: `/static/content/'
  Size: 4096        Blocks: 8          IO Block: 4096   directory
Device: fd02h/64770d    Inode: 536871040   Links: 90
Access: (0775/drwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-02-02 09:06:27.073528000 -0600
Modify: 2014-12-23 10:13:00.823641000 -0600
Change: 2015-09-25 08:42:44.524336543 -0500
[root at omhq1b4f ~]#

[root at omdx1b51 ~]# getfattr -h -d -m trusted.gfid -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.gfid=0x00000000000000000000000000000001

[root at omdx1b51 ~]# stat /static/content/
  File: `/static/content/'
  Size: 4096        Blocks: 8          IO Block: 4096   directory
Device: fd02h/64770d    Inode: 536871040   Links: 90
Access: (0775/drwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-02-02 09:06:27.073528000 -0600
Modify: 2014-12-23 10:13:00.823641000 -0600
Change: 2015-09-25 08:42:44.526287950 -0500

Khoi Mai
Union Pacific Railroad
Distributed Engineering & Architecture
Senior Project Engineer


From: Ravishankar N <ravishankar at redhat.com>
To: Khoi Mai <KHOIMAI at UP.COM>, gluster-users at gluster.org
Date: 09/25/2015 03:13 AM
Subject: Re: [Gluster-users] glusterfs3.4.2-1 split-brain question

On 09/25/2015 07:48 AM, Khoi Mai wrote:

I have a 4-node distributed-replicated gluster farm.

Volume Name: devstatic
Type: Distributed-Replicate
Volume ID: 75832afb-f20e-4017-8d74-8550a92233fd
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: omhq1b4e:/static/content
Brick2: omdx1b50:/static/content
Brick3: omhq1b4f:/static/content
Brick4: omdx1b51:/static/content
Options Reconfigured:
features.quota-deem-statfs: on
server.allow-insecure: on
network.ping-timeout: 10
performance.lazy-open: off
performance.write-behind: on
features.quota: on
geo-replication.indexing: off
server.statedump-path: /tmp/
diagnostics.brick-log-level: CRITICAL

When I query heal split-brain info I get the following:

[root at omhq1b4e ~]# gluster volume heal devstatic info split-brain
Gathering list of split brain entries on volume devstatic has been successful

Brick omhq1b4e:/static/content
Number of entries: 0

Brick omdx1b50:/static/content
Number of entries: 0

Brick omhq1b4f:/static/content
Number of entries: 43
at                    path on brick
-----------------------------------
2015-09-24 18:50:20   /
2015-09-24 18:50:20   /
2015-09-24 18:52:01   /
2015-09-24 19:10:22   /

Brick omdx1b51:/static/content
Number of entries: 42
at                    path on brick
-----------------------------------
2015-09-24 18:51:58   /
2015-09-24 18:51:59   /
2015-09-24 19:01:59   /
2015-09-24 19:11:59   /

With '/' reported on both bricks of the same replica pair, how would I safely resolve this issue? Is it really going to require me to delete the root of each node and heal? I hope not; the entire volume is about 1TB.

No, it is likely that the root is only in metadata split-brain. What does the getfattr output of '/' show on the bricks? https://github.com/gluster/glusterdocs/blob/master/Troubleshooting/split-brain.md should tell you how to resolve split-brains.
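Gathering the output Ravishankar asks for would look roughly like the following, run as root on each of the four bricks (a sketch only; the brick path /static/content is taken from the volume info above):

getfattr -d -m . -e hex /static/content/

It is the trusted.afr.devstatic-client-* values in that output, rather than the gfid or quota xattrs, that show whether '/' is in data, metadata, or entry split-brain.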
Thank you,
Khoi Mai
Union Pacific Railroad
Distributed Engineering & Architecture
Senior Project Engineer