Ravishankar N
2016-Mar-04 12:01 UTC
[Gluster-users] [Gluster-devel] Query on healing process
On 03/04/2016 12:10 PM, ABHISHEK PALIWAL wrote:
> Hi Ravi,
>
> 3. On the rebooted node, do you have ssl enabled by any chance? There is a bug for "Not able to fetch volfile" when ssl is enabled:
> https://bugzilla.redhat.com/show_bug.cgi?id=1258931
>
> ->>>>> I have checked, but ssl is disabled and I am still getting these errors:
>
> # gluster volume heal c_glusterfs info
> c_glusterfs: Not able to fetch volfile from glusterd
> Volume heal failed.

Ok, just to confirm: are glusterd and the other brick processes running after this node rebooted?
When you run the above command, you need to check /var/log/glusterfs/glfsheal-volname.log for errors. Setting the client-log-level to DEBUG would give you a more verbose message.
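[A minimal way to act on that suggestion, assuming the volume name c_glusterfs used in this thread; the glfsheal log file is named after the volume, and diagnostics.client-log-level is the volume option behind the "client-log-level" advice:

    # raise the client log level, re-run the failing command, then inspect the glfsheal log
    gluster volume set c_glusterfs diagnostics.client-log-level DEBUG
    gluster volume heal c_glusterfs info
    tail -n 50 /var/log/glusterfs/glfsheal-c_glusterfs.log
    # restore the default log level afterwards
    gluster volume reset c_glusterfs diagnostics.client-log-level
]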
> # gluster volume heal c_glusterfs info split-brain
> c_glusterfs: Not able to fetch volfile from glusterd
> Volume heal failed.
>
> And based on your observation I understood that this is not a split-brain problem, but is there any way through which we can find out the files which are not in split-brain but are also not in sync?

`gluster volume heal c_glusterfs info split-brain` should give you the files that need heal.

> # getfattr -m . -d -e hex /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> getfattr: Removing leading '/' from absolute path names
> # file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> trusted.afr.c_glusterfs-client-0=0x000000000000000000000000
> trusted.afr.c_glusterfs-client-2=0x000000000000000000000000
> trusted.afr.c_glusterfs-client-4=0x000000000000000000000000
> trusted.afr.c_glusterfs-client-6=0x000000000000000000000000
> trusted.afr.c_glusterfs-client-8=0x000000060000000000000000   // because client-8 is the latest client in our case, and the leading 8 digits (00000006...) indicate that there is something pending in the changelog data
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.version=0x000000000000001356d86c0c000217fd
> trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
>
> # lhsh 002500 getfattr -m . -d -e hex /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> getfattr: Removing leading '/' from absolute path names
> # file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> trusted.afr.c_glusterfs-client-1=0x000000000000000000000000   // and here we can say that there is no split-brain, but the file is out of sync
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.version=0x000000000000001156d86c290005735c
> trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
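[A note on reading those values: each trusted.afr.<volume>-client-<N> xattr is 12 bytes, conventionally interpreted as three big-endian 32-bit counters of pending data, metadata and entry operations. A small illustration in plain bash (not a gluster command), using the value quoted above:

    # split a trusted.afr value into its data/metadata/entry pending counters
    v=000000060000000000000000   # trusted.afr.c_glusterfs-client-8 with the 0x prefix stripped
    echo "data=$((16#${v:0:8})) metadata=$((16#${v:8:8})) entry=$((16#${v:16:8}))"
    # prints: data=6 metadata=0 entry=0

A non-zero data counter on one brick with all-zero counters on the other is the "needs heal, but not split-brain" state being discussed here.]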
> # gluster volume info
>
> Volume Name: c_glusterfs
> Type: Replicate
> Volume ID: c6a61455-d378-48bf-ad40-7a3ce897fc9c
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.32.0.48:/opt/lvmdir/c2/brick
> Brick2: 10.32.1.144:/opt/lvmdir/c2/brick
> Options Reconfigured:
> performance.readdir-ahead: on
> network.ping-timeout: 4
> nfs.disable: on
>
> # gluster volume info
>
> Volume Name: c_glusterfs
> Type: Replicate
> Volume ID: c6a61455-d378-48bf-ad40-7a3ce897fc9c
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.32.0.48:/opt/lvmdir/c2/brick
> Brick2: 10.32.1.144:/opt/lvmdir/c2/brick
> Options Reconfigured:
> performance.readdir-ahead: on
> network.ping-timeout: 4
> nfs.disable: on
>
> # gluster --version
> glusterfs 3.7.8 built on Feb 17 2016 07:49:49
> Repository revision: git://git.gluster.com/glusterfs.git
> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
>
> # gluster volume heal info heal-failed
> Usage: volume heal <VOLNAME> [enable | disable | full |statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] |info [healed | heal-failed | split-brain] |split-brain {bigger-file <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]}]
>
> # gluster volume heal c_glusterfs info heal-failed
> Command not supported. Please use "gluster volume heal c_glusterfs info" and logs to find the heal information.
>
> # lhsh 002500
>  _______ _____ _____ _____ __ _ _ _ _ _
>  | |_____] |_____] | | | \ | | | \___/
>  |_____ | | |_____ __|__ | \_| |_____| _/ \_
>
> 002500> gluster --version
> glusterfs 3.7.8 built on Feb 17 2016 07:49:49
> Repository revision: git://git.gluster.com/glusterfs.git
> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
> 002500>
>
> Regards,
> Abhishek
>
> On Thu, Mar 3, 2016 at 4:54 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>
> On Thu, Mar 3, 2016 at 4:10 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>
> Hi,
>
> On 03/03/2016 11:14 AM, ABHISHEK PALIWAL wrote:
>> Hi Ravi,
>>
>> As I discussed earlier, I investigated this issue and found that healing is not triggered because the "gluster volume heal c_glusterfs info split-brain" command shows no entries, even though the file appears to be in a split-brain situation.
>
> Couple of observations from the 'commands_output' file.
>
> getfattr -d -m . -e hex opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> The afr xattrs do not indicate that the file is in split brain:
> # file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> trusted.afr.c_glusterfs-client-1=0x000000000000000000000000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.version=0x000000000000000b56d6dd1d000ec7a9
> trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
>
> getfattr -d -m . -e hex opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> trusted.afr.c_glusterfs-client-0=0x000000080000000000000000
> trusted.afr.c_glusterfs-client-2=0x000000020000000000000000
> trusted.afr.c_glusterfs-client-4=0x000000020000000000000000
> trusted.afr.c_glusterfs-client-6=0x000000020000000000000000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.version=0x000000000000000b56d6dcb7000c87e7
> trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
>
> 1. There doesn't seem to be a split-brain going by the trusted.afr* xattrs.
>
> If it is not a split-brain problem, then how can I resolve this?
>
> 2. You seem to have re-used the bricks from another volume/setup. For replica 2, only trusted.afr.c_glusterfs-client-0 and trusted.afr.c_glusterfs-client-1 must be present, but I see 4 xattrs - client-0, 2, 4 and 6.
>
> Could you please suggest why these entries are there? I am not able to work out the scenario. I am rebooting one board multiple times to reproduce the issue, and after every reboot I do a remove-brick and add-brick on the same volume for the second board.
>
> 3. On the rebooted node, do you have ssl enabled by any chance? There is a bug for "Not able to fetch volfile" when ssl is enabled:
> https://bugzilla.redhat.com/show_bug.cgi?id=1258931
>
> Btw, for data and metadata split-brains you can use the gluster CLI
> https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md
> instead of modifying the file from the back end.
>
> But you are saying it is not a split-brain problem, and the split-brain command is not showing any file, so how can I find the bigger file? Also, in my case the file size is fixed at 2 MB; it is overwritten every time.
>
> -Ravi
>
>> So, what I have done is manually delete the gfid entry of that file from the .glusterfs directory and follow the instructions in the following link to do the heal:
>>
>> https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
>>
>> and this works fine for me.
>>
>> But my question is why the split-brain command does not show any file in its output.
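[For reference, the CLI-based resolution mentioned above looks roughly as follows, going by the usage text quoted earlier in this message. It only applies when a file really is reported in split-brain, and since the log file here has a fixed 2 MB size the source-brick variant is the relevant one; the brick and path below are simply the ones from this thread, and the brick named must be the one holding the good copy:

    # see what gluster itself considers to be in split-brain
    gluster volume heal c_glusterfs info split-brain
    # for a genuine data/metadata split-brain, pick the good copy explicitly
    gluster volume heal c_glusterfs split-brain source-brick 10.32.0.48:/opt/lvmdir/c2/brick /logfiles/availability/CELLO_AVAILABILITY2_LOG.xml

The file argument is the path as seen from the volume root, not the brick mount.]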
>> Here I am attaching all the logs which I got from the node for you, and also the output of the commands from both of the boards.
>>
>> In this tar file two directories are present:
>>
>> 000300 - logs for the board which is running continuously
>> 002500 - logs for the board which was rebooted
>>
>> I am waiting for your reply; please help me out on this issue.
>>
>> Thanks in advance.
>>
>> Regards,
>> Abhishek
>>
>> On Fri, Feb 26, 2016 at 1:21 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>
>> On Fri, Feb 26, 2016 at 10:28 AM, Ravishankar N <ravishankar at redhat.com> wrote:
>>
>> On 02/26/2016 10:10 AM, ABHISHEK PALIWAL wrote:
>>> Yes correct
>>
>> Okay, so when you say the files are not in sync until some time, are you getting stale data when accessing from the mount?
>> I'm not able to figure out why heal info shows zero when the files are not in sync, despite all IO happening from the mounts. Could you provide the output of getfattr -d -m . -e hex /brick/file-name from both bricks when you hit this issue?
>>
>> I'll provide the logs once I get them. Here "delay" means we are powering on the second board after 10 minutes.
>>
>>> On Feb 26, 2016 9:57 AM, "Ravishankar N" <ravishankar at redhat.com> wrote:
>>>
>>> Hello,
>>>
>>> On 02/26/2016 08:29 AM, ABHISHEK PALIWAL wrote:
>>>> Hi Ravi,
>>>>
>>>> Thanks for the response.
>>>>
>>>> We are using GlusterFS 3.7.8.
>>>>
>>>> Here is the use case:
>>>>
>>>> We have a logging file which saves logs of the events for every board of a node, and these files are kept in sync using glusterfs. The system is in replica 2 mode, which means that when one brick in a replicated volume goes offline, the glusterd daemons on the other nodes keep track of all the files that are not replicated to the offline brick. When the offline brick becomes available again, the cluster initiates a healing process, replicating the updated files to that brick. But in our case, we see that the log file of one board is not in sync and its format is corrupted, i.e. the files are not in sync.
>>>
>>> Just to understand you correctly, you have mounted the 2-node replica-2 volume on both these nodes and are writing to a logging file from the mounts, right?
>>>
>>>> Even the outcome of "gluster volume heal c_glusterfs info" shows that there are no pending heals.
>>>>
>>>> Also, the logging file which is updated is of fixed size and the new entries wrap around, overwriting the old entries.
>>>>
>>>> This way we have seen that after a few restarts the contents of the same file on the two bricks are different, but the volume heal info shows zero entries.
>>>>
>>>> Solution:
>>>>
>>>> But when we tried to put a delay > 5 min before the healing, everything works fine.
>>>>
>>>> Regards,
>>>> Abhishek
>>>>
>>>> On Fri, Feb 26, 2016 at 6:35 AM, Ravishankar N <ravishankar at redhat.com> wrote:
>>>>
>>>> On 02/25/2016 06:01 PM, ABHISHEK PALIWAL wrote:
>>>>> Hi,
>>>>>
>>>>> Here, I have one query regarding the time taken by the healing process.
>>>>> In the current two-node setup, when we reboot one node the self-healing process starts in less than a 5-minute interval on the board, which results in the corruption of some files' data.
>>>> Heal should start immediately after the brick process comes up. What version of gluster are you using? What do you mean by corruption of data? Also, how did you observe that the heal started after 5 minutes?
>>>> -Ravi
>>>>
>>>>> And to resolve it I searched on Google and found the following link:
>>>>> https://support.rackspace.com/how-to/glusterfs-troubleshooting/
>>>>>
>>>>> It mentions that the healing process can take up to 10 minutes to start.
>>>>>
>>>>> Here is the statement from the link:
>>>>>
>>>>> "Healing replicated volumes
>>>>>
>>>>> When any brick in a replicated volume goes offline, the glusterd daemons on the remaining nodes keep track of all the files that are not replicated to the offline brick. When the offline brick becomes available again, the cluster initiates a healing process, replicating the updated files to that brick. The start of this process can take up to 10 minutes, based on observation."
>>>>>
>>>>> After allowing more than 5 minutes, the file corruption problem was resolved.
>>>>>
>>>>> So, here my question is: is there any way through which we can reduce the time taken by the healing process to start?
>>>>>
>>>>> Regards,
>>>>> Abhishek Paliwal
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>>> --
>>>> Regards
>>>> Abhishek Paliwal
>>
>> --
>> Regards
>> Abhishek Paliwal
>
> --
> Regards
> Abhishek Paliwal
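[On the quoted question about reducing the time before healing starts: a hedged sketch of the usual knobs, assuming they are available in this 3.7.8 build. The self-heal daemon's periodic crawl is governed by cluster.heal-timeout, which defaults to 600 seconds and would line up with the "up to 10 minutes" observation, and a heal can also be kicked off by hand instead of waiting for the next crawl:

    # trigger an index heal immediately
    gluster volume heal c_glusterfs
    # or force a full heal of the volume
    gluster volume heal c_glusterfs full
    # optionally shorten the self-heal daemon's crawl interval (seconds)
    gluster volume set c_glusterfs cluster.heal-timeout 120
    # confirm the self-heal daemon is running on both nodes
    gluster volume status c_glusterfs
]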
ABHISHEK PALIWAL
2016-Mar-04 12:53 UTC
[Gluster-users] [Gluster-devel] Query on healing process
On Fri, Mar 4, 2016 at 5:31 PM, Ravishankar N <ravishankar at redhat.com> wrote:

> On 03/04/2016 12:10 PM, ABHISHEK PALIWAL wrote:
>> Hi Ravi,
>>
>> 3. On the rebooted node, do you have ssl enabled by any chance? There is a bug for "Not able to fetch volfile" when ssl is enabled:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1258931
>>
>> ->>>>> I have checked, but ssl is disabled and I am still getting these errors:
>>
>> # gluster volume heal c_glusterfs info
>> c_glusterfs: Not able to fetch volfile from glusterd
>> Volume heal failed.
>
> Ok, just to confirm: are glusterd and the other brick processes running after this node rebooted?
> When you run the above command, you need to check /var/log/glusterfs/glfsheal-volname.log for errors. Setting client-log-level to DEBUG would give you a more verbose message.

Yes, glusterd and the other brick processes are running fine. I have checked the /var/log/glusterfs/glfsheal-volname.log file without log-level=DEBUG. Here are the logs from that file:

[2016-03-02 13:51:39.059440] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-03-02 13:51:39.072172] W [MSGID: 101012] [common-utils.c:2776:gf_get_reserved_ports] 0-glusterfs: could not open the file /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports info [No such file or directory]
[2016-03-02 13:51:39.072228] W [MSGID: 101081] [common-utils.c:2810:gf_process_reserved_ports] 0-glusterfs: Not able to get reserved ports, hence there is a possibility that glusterfs may consume reserved port
[2016-03-02 13:51:39.072583] E [socket.c:2278:socket_connect_finish] 0-gfapi: connection to 127.0.0.1:24007 failed (Connection refused)
[2016-03-02 13:51:39.072663] E [MSGID: 104024] [glfs-mgmt.c:738:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected) [Transport endpoint is not connected]
[2016-03-02 13:51:39.072700] I [MSGID: 104025] [glfs-mgmt.c:744:mgmt_rpc_notify] 0-glfs-mgmt: Exhausted all volfile servers [Transport endpoint is not connected]

>> # gluster volume heal c_glusterfs info split-brain
>> c_glusterfs: Not able to fetch volfile from glusterd
>> Volume heal failed.
>>
>> And based on your observation I understood that this is not a split-brain problem, but is there any way through which we can find out the files which are not in split-brain but are also not in sync?
>
> `gluster volume heal c_glusterfs info split-brain` should give you the files that need heal.

I have run the "gluster volume heal c_glusterfs info split-brain" command, but it does not show the file which is out of sync. That is the issue: the file is not in sync on the two bricks, yet the split-brain command does not list it as needing heal. That is why I am asking whether there is any command other than this split-brain command, so that I can find the files that require a heal operation but are not displayed in the output of "gluster volume heal c_glusterfs info split-brain".
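[Two things worth checking at this point, sketched with the host names and paths from this thread; the checksum comparison is only a quick back-end check on the bricks, not a gluster command for this purpose. The "Connection refused" on 127.0.0.1:24007 in the log above suggests glfsheal could not reach glusterd at that moment, and an out-of-sync copy that heal info misses can still be spotted by comparing the file directly on the two bricks:

    # is glusterd up and listening on the management port on the rebooted board?
    gluster volume status c_glusterfs
    netstat -ltn | grep 24007
    # compare the copy on each brick directly (run on both 10.32.0.48 and 10.32.1.144)
    md5sum /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
    stat -c '%s %Y' /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
    getfattr -d -m . -e hex /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml

Differing checksums together with all-zero trusted.afr values on both bricks would confirm the "out of sync but nothing pending" state described above.]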
--

Regards
Abhishek Paliwal