Alessandro Ipe
2015-Mar-12 10:33 UTC
[Gluster-users] Input/output error when trying to access a file on client
Hi,

"gluster volume heal md1 info split-brain" returns approximately 2000 files (already divided by 2 due to the replicate volume), so manually repairing each split-brain is unfeasible. Before scripting some procedure, I need to be sure that I will not harm the gluster system further.

Moreover, I noticed that the messages printed in the logs are all about directories, e.g.

[2015-03-12 10:06:53.423856] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-1: Unable to self-heal contents of '/root' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 1 ] [ 1 0 ] ]
[2015-03-12 10:06:53.424005] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: Unable to self-heal contents of '/root' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 1 ] [ 1 0 ] ]
[2015-03-12 10:06:53.424110] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-1: metadata self heal failed, on /root
[2015-03-12 10:06:53.424290] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-2: metadata self heal failed, on /root

Getting the attributes of that directory on each brick gives me, for the first:

# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw=
trusted.glusterfs.dht=0sAAAAAQAAAACqqqqq/////w=

and for the second:

# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw=
trusted.glusterfs.dht=0sAAAAAQAAAACqqqqq/////w=

so it seems that they are both rigorously identical. However, according to your split-brain tutorial, neither of them has 0x000000000000000000000000. What does 0sAAAAAAAAAAAAAAAA mean, in fact?
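[Editorial aside on the question above: the "0s" prefix in getfattr output marks a base64-encoded value, so the hex form can be recovered with a quick decode using plain coreutils, no gluster needed:]

```shell
# Decode a getfattr base64 value (with the "0s" prefix stripped) to hex.
# 16 base64 chars decode to 12 bytes, the size of an AFR changelog xattr.
val="AAAAAAAAAAAAAAAA"
printf '%s' "$val" | base64 -d | od -An -tx1 | tr -d ' \n'; echo
# -> 000000000000000000000000
```

[That is, 0sAAAAAAAAAAAAAAAA is exactly the all-zero value 0x000000000000000000000000 the tutorial refers to, just printed in base64 rather than hex.]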
Should I change both attributes on each directory to 0x000000000000000000000000?

Many thanks,
A.

On Wednesday 11 March 2015 08:02:56 Krutika Dhananjay wrote:
Hi,
Have you gone through https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md [1]? If not, could you go through that once and try the steps given there? Do let us know if something is not clear in the doc.
-Krutika

From: "Alessandro Ipe" <Alessandro.Ipe at meteo.be>
To: gluster-users at gluster.org
Sent: Wednesday, March 11, 2015 4:54:09 PM
Subject: Re: [Gluster-users] Input/output error when trying to access a file on client

Well, it is even worse. Now doing an "ls -R" on the volume results in a lot of

[2015-03-11 11:18:31.957505] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: Unable to self-heal contents of '/library' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 1 0 ] ]
[2015-03-11 11:18:31.957692] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-2: metadata self heal failed, on /library

I am desperate...

[1] https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
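[Editorial aside: if resetting the pending changelog turns out to be the right fix per the split-brain doc Krutika linked, the per-directory repair could be scripted roughly as below. This is a hedged sketch, not a verified procedure: the brick path, the client index (md1-client-3) and the directory list are illustrative placeholders drawn from this thread's output, and DRYRUN=echo only prints the commands instead of running them:]

```shell
# Hedged sketch: reset the pending AFR changelog on the NON-preferred
# replica only, as the split-brain doc describes. Brick path, client
# index and directory list below are illustrative examples.
DRYRUN=echo                         # drop this to actually run setfattr
BRICK=/data/glusterfs/md1/brick1    # example brick root
for dir in root; do                 # example: directories to repair
  $DRYRUN setfattr -n trusted.afr.md1-client-3 \
    -v 0x000000000000000000000000 "$BRICK/$dir"
done
```

[Run it first with DRYRUN=echo, inspect the printed commands against the heal info output, and only then execute for real on one directory as a test.]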
Alessandro Ipe
2015-Mar-12 11:45 UTC
[Gluster-users] Input/output error when trying to access a file on client
Hi,

Actually, my gluster volume is distribute-replicate, so I should provide the attributes on all the bricks. Here they are:

1. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw=
trusted.glusterfs.dht=0sAAAAAQAAAACqqqqq/////w=

2. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw=
trusted.glusterfs.dht=0sAAAAAQAAAACqqqqq/////w=

3. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-3=0sAAAAAAAAAAEAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw=
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAVVVVVA=

4. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0sAAAAAAAAAAEAAAAA
trusted.afr.md1-client-3=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw=
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAVVVVVA=

5. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-5=0sAAAAAAAAAAEAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw=
trusted.glusterfs.dht=0sAAAAAQAAAABVVVVVqqqqqQ=

6. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0sAAAAAAAAAAEAAAAA
trusted.afr.md1-client-5=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw=
trusted.glusterfs.dht=0sAAAAAQAAAABVVVVVqqqqqQ=

so it seems in fact that there are discrepancies between 3-4 and 5-6 (the replicate pairs).

A.

On Thursday 12 March 2015 11:33:00 Alessandro Ipe wrote:
> Hi, "gluster volume heal md1 info split-brain" returns approximately 2000
> files (already divided by 2 due to the replicate volume). So manually
> repairing each split-brain is unfeasible. [...]
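[Editorial aside: decoding the one non-zero value from the brick pairs above shows which changelog is pending. Assuming AFR's usual 12-byte layout of three 32-bit big-endian counters (data, metadata, entry), a quick check:]

```shell
# Decode the non-zero trusted.afr value seen on the discrepant bricks.
printf '%s' "AAAAAAAAAAEAAAAA" | base64 -d | od -An -tx1 | tr -d ' \n'; echo
# -> 000000000000000100000000
# i.e. data=0x00000000, metadata=0x00000001, entry=0x00000000
```

[Only the metadata changelog is pending, which matches the "metadata self heal failed" log messages: the directory contents appear to agree and only metadata (e.g. permissions or ownership) differs between the replicas.]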