thr3ads.net - Gluster users - [Gluster-users] [Gluster-devel] Query on healing process [Mar 2016]


ABHISHEK PALIWAL

2016-Mar-04 06:40 UTC

[Gluster-users] [Gluster-devel] Query on healing process

Hi Ravi,

3. On the rebooted node, do you have ssl enabled by any chance? There is a
bug for "Not able to fetch volfile" when ssl is enabled:
https://bugzilla.redhat.com/show_bug.cgi?id=1258931

->>>>> I have checked: SSL is disabled, but I am still getting these
errors:

# gluster volume heal c_glusterfs info
c_glusterfs: Not able to fetch volfile from glusterd
Volume heal failed.

# gluster volume heal c_glusterfs info split-brain
c_glusterfs: Not able to fetch volfile from glusterd
Volume heal failed.

Based on your observation, I understand that this is not a split-brain
problem, but *is there any way to find the files that are not in
split-brain and yet not in sync?*

# getfattr -m . -d -e hex /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
getfattr: Removing leading '/' from absolute path names
# file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
trusted.afr.c_glusterfs-client-0=0x000000000000000000000000
trusted.afr.c_glusterfs-client-2=0x000000000000000000000000
trusted.afr.c_glusterfs-client-4=0x000000000000000000000000
trusted.afr.c_glusterfs-client-6=0x000000000000000000000000
trusted.afr.c_glusterfs-client-8=0x000000060000000000000000   // client-8 is the latest client in our case, and the first 8 hex digits (00000006) say there is something pending in the data changelog.
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000001356d86c0c000217fd
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae

# lhsh 002500 getfattr -m . -d -e hex /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
getfattr: Removing leading '/' from absolute path names
# file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
trusted.afr.c_glusterfs-client-1=0x000000000000000000000000   // here we can say that there is no split-brain, but the file is out of sync.
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000001156d86c290005735c
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
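
The `trusted.afr.*` values above can be decoded mechanically, which helps with spotting files that are out of sync without being in split-brain. A minimal sketch, assuming the AFR changelog layout commonly described for GlusterFS 3.7: each value is 12 bytes holding three big-endian 32-bit counters of pending data, metadata and entry operations, in that order. `decode_afr`, `parse_getfattr` and `unexpected_clients` are hypothetical helpers, not part of any Gluster tool; the sample lines are copied from the local brick's output above.

```python
import re
from typing import Dict, List

# Matches getfattr lines such as
# "trusted.afr.c_glusterfs-client-8=0x000000060000000000000000".
AFR_LINE = re.compile(r"^trusted\.afr\.(?P<name>[\w.-]+)=0x(?P<hex>[0-9a-fA-F]{24})$")


def decode_afr(hex_value: str) -> Dict[str, int]:
    """Split a 12-byte AFR changelog value into its three pending counters.

    Assumed layout: data, metadata, entry, each a big-endian 32-bit integer.
    """
    raw = bytes.fromhex(hex_value)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    data, metadata, entry = (int.from_bytes(raw[i:i + 4], "big") for i in (0, 4, 8))
    return {"data": data, "metadata": metadata, "entry": entry}


def parse_getfattr(text: str) -> Dict[str, Dict[str, int]]:
    """Map each trusted.afr.<suffix> xattr in getfattr output to its counters."""
    result = {}
    for line in text.splitlines():
        match = AFR_LINE.match(line.strip())
        if match:
            result[match.group("name")] = decode_afr(match.group("hex"))
    return result


def unexpected_clients(changelog: Dict[str, Dict[str, int]], volname: str, replica: int) -> List[str]:
    """Return <vol>-client-N xattrs outside client-0 .. client-(replica-1).

    Hypothetical check based on Ravi's remark that a replica 2 volume
    should only carry client-0 and client-1 changelogs.
    """
    prefix = f"{volname}-client-"
    expected = {f"{prefix}{i}" for i in range(replica)}
    return sorted(name for name in changelog if name.startswith(prefix) and name not in expected)


if __name__ == "__main__":
    # Lines copied from the getfattr output on the local brick.
    local_brick = """\
trusted.afr.c_glusterfs-client-0=0x000000000000000000000000
trusted.afr.c_glusterfs-client-2=0x000000000000000000000000
trusted.afr.c_glusterfs-client-4=0x000000000000000000000000
trusted.afr.c_glusterfs-client-6=0x000000000000000000000000
trusted.afr.c_glusterfs-client-8=0x000000060000000000000000
trusted.afr.dirty=0x000000000000000000000000
"""
    changelog = parse_getfattr(local_brick)
    for name, counters in changelog.items():
        print(name, counters)
    print("unexpected:", unexpected_clients(changelog, "c_glusterfs", 2))
```

Run against the local brick's lines, this reports `{'data': 6, 'metadata': 0, 'entry': 0}` for `c_glusterfs-client-8` and flags client-2, -4, -6 and -8 as unexpected for a replica 2 volume, while the rebooted brick's single `client-1` value decodes to all zeros. Read together, one brick records pending data writes against the other and the other blames nobody, which fits the "not a split-brain" reading.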

# gluster volume info

Volume Name: c_glusterfs
Type: Replicate
Volume ID: c6a61455-d378-48bf-ad40-7a3ce897fc9c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.32.0.48:/opt/lvmdir/c2/brick
Brick2: 10.32.1.144:/opt/lvmdir/c2/brick
Options Reconfigured:
performance.readdir-ahead: on
network.ping-timeout: 4
nfs.disable: on


# gluster volume info

Volume Name: c_glusterfs
Type: Replicate
Volume ID: c6a61455-d378-48bf-ad40-7a3ce897fc9c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.32.0.48:/opt/lvmdir/c2/brick
Brick2: 10.32.1.144:/opt/lvmdir/c2/brick
Options Reconfigured:
performance.readdir-ahead: on
network.ping-timeout: 4
nfs.disable: on

# gluster --version
glusterfs 3.7.8 built on Feb 17 2016 07:49:49
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.
# gluster volume heal info heal-failed
Usage: volume heal <VOLNAME> [enable | disable | full |statistics
[heal-count [replica <HOSTNAME:BRICKNAME>]] |info [healed | heal-failed |
split-brain] |split-brain {bigger-file <FILE> |source-brick
<HOSTNAME:BRICKNAME> [<FILE>]}]
# gluster volume heal c_glusterfs info heal-failed
Command not supported. Please use "gluster volume heal c_glusterfs info" and logs to find the heal information.
# lhsh 002500

002500> gluster --version
glusterfs 3.7.8 built on Feb 17 2016 07:49:49
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.
002500>

Regards,
Abhishek

On Thu, Mar 3, 2016 at 4:54 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>
> On Thu, Mar 3, 2016 at 4:10 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>
>> Hi,
>>
>> On 03/03/2016 11:14 AM, ABHISHEK PALIWAL wrote:
>>
>> Hi Ravi,
>>
>> As discussed earlier, I investigated this issue and found
>> that healing is not triggered because the "gluster volume heal c_glusterfs
>> info split-brain" command does not show any entries, even though the
>> file is in split-brain.
>>
>>
>> Couple of observations from the 'commands_output' file.
>>
>> getfattr -d -m . -e hex
>> opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
>> The afr xattrs do not indicate that the file is in split brain:
>> # file:
>> opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
>> trusted.afr.c_glusterfs-client-1=0x000000000000000000000000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.bit-rot.version=0x000000000000000b56d6dd1d000ec7a9
>> trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
>>
>>
>>
>> getfattr -d -m . -e hex
>> opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
>> trusted.afr.c_glusterfs-client-0=0x000000080000000000000000
>> trusted.afr.c_glusterfs-client-2=0x000000020000000000000000
>> trusted.afr.c_glusterfs-client-4=0x000000020000000000000000
>> trusted.afr.c_glusterfs-client-6=0x000000020000000000000000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.bit-rot.version=0x000000000000000b56d6dcb7000c87e7
>> trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
>>
>> 1. There doesn't seem to be a split-brain going by the trusted.afr*
>> xattrs.
>>
>
> If it is not a split-brain problem, then how can I resolve this?
>
>
>> 2. You seem to have re-used the bricks from another volume/setup. For
>> replica 2, only trusted.afr.c_glusterfs-client-0 and
>> trusted.afr.c_glusterfs-client-1 must be present but I see 4 xattrs -
>> client-0,2,4 and 6
>>
>
> Could you please suggest why these entries are there? I am not able
> to work out the scenario. I am rebooting one board multiple times to
> reproduce the issue, and after every reboot I do a remove-brick and
> add-brick on the same volume for the second board.
>
>
>> 3. On the rebooted node, do you have ssl enabled by any chance? There is
>> a bug for "Not able to fetch volfile" when ssl is enabled:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1258931
>>
>> Btw, for data and metadata split-brains you can use the gluster CLI
>> https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md
>> instead of modifying the file from the back end.
>>
>
> But you are saying it is not a split-brain problem, and the split-brain
> command is not showing any file, so how can I find the bigger file?
> Also, in my case the file size is fixed at 2MB; it is overwritten every time.
>
>>
>> -Ravi
>>
>>
>> So what I did was manually delete the gfid entry of that file from the
>> .glusterfs directory and follow the instructions in the following link
>> to do the heal:
>>
>> https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
>>
>> and this works fine for me.
>>
>> But my question is: why does the split-brain command not show any file in
>> its output?
>>
>> I am attaching all the logs I collected from the nodes, along with
>> the command output from both boards.
>>
>> The tar file contains two directories:
>>
>> 000300 - logs for the board that runs continuously
>> 002500 - logs for the board that is rebooted
>>
>> I am waiting for your reply; please help me out with this issue.
>>
>> Thanks in advance.
>>
>> Regards,
>> Abhishek
>>
>> On Fri, Feb 26, 2016 at 1:21 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>
>>> On Fri, Feb 26, 2016 at 10:28 AM, Ravishankar N <ravishankar at redhat.com> wrote:
>>>
>>>> On 02/26/2016 10:10 AM, ABHISHEK PALIWAL wrote:
>>>>
>>>> Yes correct
>>>>
>>>>
>>>> Okay, so when you say the files are not in sync until some time, are
>>>> you getting stale data when accessing from the mount?
>>>> I'm not able to figure out why heal info shows zero when the files are
>>>> not in sync, despite all IO happening from the mounts. Could you provide
>>>> the output of getfattr -d -m . -e hex /brick/file-name from both bricks
>>>> when you hit this issue?
>>>>
>>>> I'll provide the logs once I get them. Here, "delay" means we power on
>>>> the second board after 10 minutes.
>>>>
>>>>
>>>> On Feb 26, 2016 9:57 AM, "Ravishankar N" <ravishankar at redhat.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> On 02/26/2016 08:29 AM, ABHISHEK PALIWAL wrote:
>>>>>
>>>>> Hi Ravi,
>>>>>
>>>>> Thanks for the response.
>>>>>
>>>>> We are using GlusterFS 3.7.8.
>>>>>
>>>>> Here is the use case:
>>>>>
>>>>> We have a logging file which saves logs of the events for every board
>>>>> of a node, and these files are kept in sync using GlusterFS. The system is
>>>>> in replica 2 mode, which means that when one brick in a replicated volume
>>>>> goes offline, the glusterd daemons on the other nodes keep track of all
>>>>> the files that are not replicated to the offline brick. When the offline
>>>>> brick becomes available again, the cluster initiates a healing process,
>>>>> replicating the updated files to that brick. But in our case, we see that
>>>>> the log file of one board is not in sync and its format is corrupted,
>>>>> i.e. the files are not in sync.
>>>>>
>>>>>
>>>>> Just to understand you correctly: you have mounted the 2-node
>>>>> replica-2 volume on both these nodes and are writing to a logging file
>>>>> from the mounts, right?
>>>>>
>>>>>
>>>>> Even the output of "gluster volume heal c_glusterfs info" shows that
>>>>> there are no pending heals.
>>>>>
>>>>> Also, the logging file that is updated is of fixed size, and new
>>>>> entries wrap around, overwriting the old entries.
>>>>>
>>>>> This way, we have seen that after a few restarts the contents of the
>>>>> same file on the two bricks are different, but volume heal info shows
>>>>> zero entries.
>>>>>
>>>>> Solution:
>>>>>
>>>>> When we added a delay of more than 5 minutes before the healing,
>>>>> everything worked fine.
>>>>>
>>>>> Regards,
>>>>> Abhishek
>>>>>
>>>>> On Fri, Feb 26, 2016 at 6:35 AM, Ravishankar N <ravishankar at redhat.com> wrote:
>>>>>
>>>>>> On 02/25/2016 06:01 PM, ABHISHEK PALIWAL wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a query regarding the time taken by the healing
>>>>>> process.
>>>>>> In our current two-node setup, when we reboot one node, the
>>>>>> self-healing process starts within 5 minutes on the board, which
>>>>>> results in corruption of some files' data.
>>>>>>
>>>>>>
>>>>>> Heal should start immediately after the brick process comes up. What
>>>>>> version of gluster are you using? What do you mean by corruption of data?
>>>>>> Also, how did you observe that the heal started after 5 minutes?
>>>>>> -Ravi
>>>>>>
>>>>>>
>>>>>> To resolve it, I searched on Google and found the following
>>>>>> link:
>>>>>>
>>>>>> https://support.rackspace.com/how-to/glusterfs-troubleshooting/
>>>>>>
>>>>>> It mentions that the healing process can take up to
>>>>>> 10 minutes to start.
>>>>>>
>>>>>> Here is the statement from the link:
>>>>>>
>>>>>> "Healing replicated volumes
>>>>>>
>>>>>> When any brick in a replicated volume goes offline, the glusterd
>>>>>> daemons on the remaining nodes keep track of all the files that are not
>>>>>> replicated to the offline brick. When the offline brick becomes available
>>>>>> again, the cluster initiates a healing process, replicating the updated
>>>>>> files to that brick. *The start of this process can take up to 10
>>>>>> minutes, based on observation.*"
>>>>>>
>>>>>> After allowing more than 5 minutes, the file corruption
>>>>>> problem was resolved.
>>>>>>
>>>>>> So my question is: is there any way we can reduce the
>>>>>> time taken by the healing process to start?
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Abhishek Paliwal
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-devel mailing list
>>>>>> Gluster-devel at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>> Abhishek Paliwal
>>>>
>>>
>>> --
>>> Regards
>>> Abhishek Paliwal
>>
>> --
>> Regards
>> Abhishek Paliwal
>
> --
> Regards
> Abhishek Paliwal
>


-- 
Regards
Abhishek Paliwal

Ravishankar N

2016-Mar-04 12:01 UTC

[Gluster-users] [Gluster-devel] Query on healing process

On 03/04/2016 12:10 PM, ABHISHEK PALIWAL wrote:
> Hi Ravi,
>
> 3. On the rebooted node, do you have ssl enabled by any chance? There
> is a bug for "Not able to fetch volfile" when ssl is enabled:
> https://bugzilla.redhat.com/show_bug.cgi?id=1258931
>
> ->>>>> I have checked: SSL is disabled, but I am still getting these errors:
>
> # gluster volume heal c_glusterfs info
> c_glusterfs: Not able to fetch volfile from glusterd
> Volume heal failed.
>
OK, just to confirm: are glusterd and the other brick processes running
after this node rebooted?
When you run the above command, check
/var/log/glusterfs/glfsheal-<volname>.log for errors. Setting
client-log-level to DEBUG would give you more verbose messages.
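
Following this suggestion, the heal-info log can be filtered for errors after re-running the failing command. A minimal sketch, assuming the usual GlusterFS log layout in which a single severity letter (E for error, W for warning, I for info, D for debug, and so on) follows the bracketed timestamp, and assuming the file is named glfsheal-<volname>.log under /var/log/glusterfs as described above. `severe_lines` and `scan_heal_log` are hypothetical helpers, not part of Gluster, and the set of letters counted as "severe" is an assumption.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Assumed GlusterFS log layout:
# "[<timestamp>] <level letter> [<source location>] <translator>: <message>"
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+(?P<rest>.*)$")

# Severity letters treated here as "error or worse" (assumption).
SEVERE = {"M", "A", "C", "E"}


def severe_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, level, remainder) for error-or-worse log lines."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") in SEVERE:
            yield match.group("ts"), match.group("level"), match.group("rest")


def scan_heal_log(volname: str, log_dir: str = "/var/log/glusterfs") -> List[Tuple[str, str, str]]:
    """Collect severe lines from <log_dir>/glfsheal-<volname>.log; [] if it is missing."""
    path = Path(log_dir) / f"glfsheal-{volname}.log"
    if not path.is_file():
        return []
    with path.open(errors="replace") as handle:
        return list(severe_lines(handle))


if __name__ == "__main__":
    for ts, level, rest in scan_heal_log("c_glusterfs"):
        print(ts, level, rest)
```

Lines that do not match the assumed layout (continuation lines, stack dumps) are skipped rather than reported, so this is a quick first pass, not a replacement for reading the log.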
> # gluster volume heal c_glusterfs info split-brain
> c_glusterfs: Not able to fetch volfile from glusterd
> Volume heal failed.
>
>
> Based on your observation, I understand that this is not a split-brain
> problem, but *is there any way to find the files that are not in
> split-brain and yet not in sync?*
`gluster volume heal c_glusterfs info` should give you the files
that need heal.
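
Whether an entry counts as needing heal, or as split-brain, comes down to which brick's changelog blames which. A minimal sketch of that decision, simplified from how AFR is generally described: it ignores `trusted.afr.dirty`, treats one changelog type (for example, data) at a time, and skips AFR's other tie-breaking rules. A brick is a heal sink if another brick's changelog blames it, and the file is in split-brain if every brick is blamed. `classify` and the matrix layout are hypothetical, not Gluster APIs.

```python
from typing import Dict, List, Union

Verdict = Dict[str, Union[str, List[int]]]


def classify(pending: List[List[int]]) -> Verdict:
    """Simplified AFR-style verdict for one changelog type (for example, data).

    pending[i][j] is the counter stored on brick i in the xattr for brick j,
    i.e. how many operations brick i believes brick j has missed. Diagonal
    entries are ignored; real AFR also consults trusted.afr.dirty and other rules.
    """
    n = len(pending)
    # A brick is "accused" when some other brick holds a non-zero counter for it.
    accused = {j for i in range(n) for j in range(n) if i != j and pending[i][j] > 0}
    if not accused:
        return {"state": "in sync", "sources": list(range(n)), "sinks": []}
    sources = [b for b in range(n) if b not in accused]
    if not sources:
        # Every brick blames another one: no clean copy to heal from.
        return {"state": "split-brain", "sources": [], "sinks": sorted(accused)}
    return {"state": "needs heal", "sources": sources, "sinks": sorted(accused)}


if __name__ == "__main__":
    # Brick 0: local brick (client-8 data counter = 6); brick 1: rebooted brick (all zeros).
    print(classify([[0, 6], [0, 0]]))
```

With the data counters from the getfattr output in this thread, treating the local brick as brick 0, the rebooted board's brick as brick 1, and the client-8 value as the local brick's changelog for brick 1 (the poster's reading), this prints `{'state': 'needs heal', 'sources': [0], 'sinks': [1]}`: one brick blames the other, so it is a pending heal rather than a split-brain.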

