thr3ads.net - Gluster users - [Gluster-users] self-heal not working [Aug 2017]

mabi

2017-Aug-23 17:01 UTC

[Gluster-users] self-heal not working

I just saw the following bug which was fixed in 3.8.15:

https://bugzilla.redhat.com/show_bug.cgi?id=1471613

Is it possible that the problem I described in this post is related to that bug?
> -------- Original Message --------
> Subject: Re: [Gluster-users] self-heal not working
> Local Time: August 22, 2017 11:51 AM
> UTC Time: August 22, 2017 9:51 AM
> From: ravishankar at redhat.com
> To: mabi <mabi at protonmail.ch>
> Ben Turner <bturner at redhat.com>, Gluster Users <gluster-users at gluster.org>
>
> On 08/22/2017 02:30 PM, mabi wrote:
>
>> Thanks for the additional hints, I have the following 2 questions first:
>>
>> - In order to launch the index heal is the following command correct:
>> gluster volume heal myvolume
>
> Yes
>
>> - If I run a "volume start force" will it have any short disruptions on my clients which mount the volume through FUSE? If yes, how long? This is a production system, which is why I am asking.
>
> No. You can actually create a test volume on your personal Linux box to try these kinds of things without needing multiple machines. This is how we develop and test our patches :)
> `gluster volume create testvol replica 3 /home/mabi/bricks/brick{1..3} force` and so on.
>
> HTH,
> Ravi
>
>>> -------- Original Message --------
>>> Subject: Re: [Gluster-users] self-heal not working
>>> Local Time: August 22, 2017 6:26 AM
>>> UTC Time: August 22, 2017 4:26 AM
>>> From: ravishankar at redhat.com
>>> To: mabi <mabi at protonmail.ch>, Ben Turner <bturner at redhat.com>
>>> Gluster Users <gluster-users at gluster.org>
>>>
>>> Explore the following:
>>>
>>> - Launch index heal and look at the glustershd logs of all bricks for possible errors
>>>
>>> - See if the glustershd in each node is connected to all bricks.
>>>
>>> - If not try to restart shd by `volume start force`
>>>
>>> - Launch index heal again and try.
>>>
>>> - Try debugging the shd log by setting client-log-level to DEBUG temporarily.
>>>
>>> On 08/22/2017 03:19 AM, mabi wrote:
>>>
>>>> Sure, it doesn't look like a split brain based on the output:
>>>>
>>>> Brick node1.domain.tld:/data/myvolume/brick
>>>> Status: Connected
>>>> Number of entries in split-brain: 0
>>>>
>>>> Brick node2.domain.tld:/data/myvolume/brick
>>>> Status: Connected
>>>> Number of entries in split-brain: 0
>>>>
>>>> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>> Status: Connected
>>>> Number of entries in split-brain: 0
>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>> Local Time: August 21, 2017 11:35 PM
>>>>> UTC Time: August 21, 2017 9:35 PM
>>>>> From: bturner at redhat.com
>>>>> To: mabi <mabi at protonmail.ch>
>>>>> Gluster Users <gluster-users at gluster.org>
>>>>>
>>>>> Can you also provide:
>>>>>
>>>>> gluster v heal <my vol> info split-brain
>>>>>
>>>>> If it is split brain just delete the incorrect file from the brick and run heal again. I haven't tried this with arbiter but I assume the process is the same.
>>>>>
>>>>> -b
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "mabi" <mabi at protonmail.ch>
>>>>>> To: "Ben Turner" <bturner at redhat.com>
>>>>>> Cc: "Gluster Users" <gluster-users at gluster.org>
>>>>>> Sent: Monday, August 21, 2017 4:55:59 PM
>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>
>>>>>> Hi Ben,
>>>>>>
>>>>>> So it is really a 0 kBytes file everywhere (all nodes including the arbiter
>>>>>> and from the client).
>>>>>> Here below you will find the output you requested. Hopefully that will help
>>>>>> to find out why this specific file is not healing... Let me know if you need
>>>>>> any more information. Btw node3 is my arbiter node.
>>>>>>
>>>>>> NODE1:
>>>>>>
>>>>>> STAT:
>>>>>> File: '/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png'
>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>> Device: 24h/36d Inode: 10033884 Links: 2
>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>> Birth: -
>>>>>>
>>>>>> GETFATTR:
>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhuknAAlJAg==
>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=
>>>>>>
>>>>>> NODE2:
>>>>>>
>>>>>> STAT:
>>>>>> File: '/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png'
>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>> Device: 26h/38d Inode: 10031330 Links: 2
>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>> Modify: 2017-08-14 17:11:46.403704181 +0200
>>>>>> Change: 2017-08-14 17:11:46.403704181 +0200
>>>>>> Birth: -
>>>>>>
>>>>>> GETFATTR:
>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhu6wAA8Hpw==
>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=
>>>>>>
>>>>>> NODE3:
>>>>>> STAT:
>>>>>> File: /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>> Size: 0 Blocks: 0 IO Block: 4096 regular empty file
>>>>>> Device: ca11h/51729d Inode: 405208959 Links: 2
>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>> Modify: 2017-08-14 17:04:55.530681000 +0200
>>>>>> Change: 2017-08-14 17:11:46.604380051 +0200
>>>>>> Birth: -
>>>>>>
>>>>>> GETFATTR:
>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZe6ejAAKPAg==
>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=
>>>>>>
>>>>>> CLIENT GLUSTER MOUNT:
>>>>>> STAT:
>>>>>> File: "/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png"
>>>>>> Size: 0 Blocks: 0 IO Block: 131072 regular empty file
>>>>>> Device: 1eh/30d Inode: 11897049013408443114 Links: 1
>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>> Birth: -
>>>>>>
>>>>>> > -------- Original Message --------
>>>>>> > Subject: Re: [Gluster-users] self-heal not working
>>>>>> > Local Time: August 21, 2017 9:34 PM
>>>>>> > UTC Time: August 21, 2017 7:34 PM
>>>>>> > From: bturner at redhat.com
>>>>>> > To: mabi <mabi at protonmail.ch>
>>>>>> > Gluster Users <gluster-users at gluster.org>
>>>>>> >
>>>>>> > ----- Original Message -----
>>>>>> >> From: "mabi" <mabi at protonmail.ch>
>>>>>> >> To: "Gluster Users" <gluster-users at gluster.org>
>>>>>> >> Sent: Monday, August 21, 2017 9:28:24 AM
>>>>>> >> Subject: [Gluster-users] self-heal not working
>>>>>> >>
>>>>>> >> Hi,
>>>>>> >>
>>>>>> >> I have a replica 2 with arbiter GlusterFS 3.8.11 cluster and there is
>>>>>> >> currently one file listed to be healed, as you can see below, but it never gets
>>>>>> >> healed by the self-heal daemon:
>>>>>> >>
>>>>>> >> Brick node1.domain.tld:/data/myvolume/brick
>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>> >> Status: Connected
>>>>>> >> Number of entries: 1
>>>>>> >>
>>>>>> >> Brick node2.domain.tld:/data/myvolume/brick
>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>> >> Status: Connected
>>>>>> >> Number of entries: 1
>>>>>> >>
>>>>>> >> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>> >> Status: Connected
>>>>>> >> Number of entries: 1
>>>>>> >>
>>>>>> >> As once recommended on this mailing list I have mounted that glusterfs volume
>>>>>> >> temporarily through fuse/glusterfs and ran a "stat" on that file which is
>>>>>> >> listed above but nothing happened.
>>>>>> >>
>>>>>> >> The file itself is available on all 3 nodes/bricks but on the last node it
>>>>>> >> has a different date. By the way this file is 0 kBytes big. Is that maybe
>>>>>> >> the reason why the self-heal does not work?
>>>>>> >
>>>>>> > Is the file actually 0 bytes or is it just 0 bytes on the arbiter (0 bytes
>>>>>> > are expected on the arbiter, it just stores metadata)? Can you send us the
>>>>>> > output from stat on all 3 nodes:
>>>>>> >
>>>>>> > $ stat <file on back end brick>
>>>>>> > $ getfattr -d -m - <file on back end brick>
>>>>>> > $ stat <file from gluster mount>
>>>>>> >
>>>>>> > Let's see what things look like on the back end, it should tell us why
>>>>>> > healing is failing.
>>>>>> >
>>>>>> > -b
>>>>>> >
>>>>>> >>
>>>>>> >> And how can I now get this file to heal?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> Mabi

Ravishankar N

2017-Aug-24 03:58 UTC

Unlikely. In your case only the afr.dirty is set, not the 
afr.volname-client-xx xattr.

`gluster volume set myvolume diagnostics.client-log-level DEBUG` is right.
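
For readers following along, here is a minimal sketch of how the `trusted.afr.dirty` value from the getfattr output earlier in the thread can be decoded. It assumes GNU `base64` and `od` are installed; `decode_afr` is a hypothetical helper name, not a Gluster tool. AFR changelog xattrs hold three 32-bit big-endian counters (data, metadata, entry), and `getfattr` prints binary values as base64 with a `0s` prefix.

```shell
#!/bin/sh
# Hypothetical helper: decode an AFR changelog xattr value as printed by
# `getfattr -d` (base64 with a leading "0s").
decode_afr() {
    # Strip the "0s" marker, decode, and dump the bytes as one hex string.
    hex=$(printf '%s' "${1#0s}" | base64 -d | od -An -v -tx1 | tr -d ' \n')
    # Three 32-bit big-endian counters: data, metadata, entry.
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((0x$data))" "$((0x$meta))" "$((0x$entry))"
}

# Value reported identically on node1, node2 and node3 in this thread.
decode_afr 0sAAAAAQAAAAAAAAAA
# prints: data=1 metadata=0 entry=0
```

Read this way, every brick carries a dirty mark for a data operation, while no `trusted.afr.<volname>-client-N` xattr is present to blame a particular brick, which matches the observation above.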



mabi

2017-Aug-24 10:08 UTC

Thanks for confirming the command. I have now set client-log-level to DEBUG,
run a heal, and attached the glustershd log files of all 3 nodes to this
mail.
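
The sequence described here, using the index heal and log-level commands confirmed earlier in the thread, might look like the following. This is a dry-run sketch under stated assumptions: `run`, `heal_debug_start`, `heal_debug_stop` and the `APPLY` variable are hypothetical helpers, and `myvolume` is the placeholder volume name used in the thread. The self-heal daemon log is usually `/var/log/glusterfs/glustershd.log` on each node, though the path can differ between installations.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set APPLY=1 in the environment to really run the commands.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Raise the client log level, launch an index heal, list pending entries.
heal_debug_start() {
    run gluster volume set "$1" diagnostics.client-log-level DEBUG
    run gluster volume heal "$1"
    run gluster volume heal "$1" info
}

# Put the log level back to its default once the logs have been collected.
heal_debug_stop() {
    run gluster volume reset "$1" diagnostics.client-log-level
}

heal_debug_start myvolume
```

The two steps are kept separate on purpose: the heal runs in the background, so the glustershd logs on every node should be inspected before `heal_debug_stop` returns the log level to its default.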

The volume concerned is called myvol-pro; the other 3 volumes have had no
problems so far.

Also note that in the meantime it looks like the file has been deleted by the
user, so the heal info command no longer shows the file name but
just its GFID, which is:

gfid:1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea

Hope that helps for debugging this issue.
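
The GFID above can be cross-checked against the `trusted.gfid` xattr from the earlier getfattr output. The sketch below assumes GNU `base64`, `od` and `sed -E`. The helper names are hypothetical, the xattr value is passed with its full `==` base64 padding, and the brick path is the one given for node1 in the thread. On a brick, Gluster keeps a hard link for each regular file under `.glusterfs/<first two hex digits>/<next two>/<gfid>`.

```shell
#!/bin/sh
# Hypothetical helper: turn a base64 trusted.gfid value (with or without
# getfattr's "0s" prefix) into the usual dashed GFID form.
gfid_from_xattr() {
    hex=$(printf '%s' "${1#0s}" | base64 -d | od -An -v -tx1 | tr -d ' \n')
    printf '%s\n' "$hex" |
        sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

# Hypothetical helper: build the .glusterfs path for a GFID on a brick.
gfid_backend_path() {
    brick=$1
    gfid=$2
    d1=$(printf '%s' "$gfid" | cut -c1-2)
    d2=$(printf '%s' "$gfid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$gfid"
}

gfid=$(gfid_from_xattr 0sGYXiM9XuTj6lGs8LX58q6g==)
printf '%s\n' "$gfid"
# prints: 1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
gfid_backend_path /data/myvolume/brick "$gfid"
# prints: /data/myvolume/brick/.glusterfs/19/85/1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
```

The decoded value matches the GFID reported by heal info, so the remaining entry refers to the same file. Whether the `.glusterfs` link still exists on each brick after the deletion is something to check with `stat` rather than assume.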
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gshdlogs.zip
Type: application/zip
Size: 14526 bytes
Desc: not available
URL:
<http://lists.gluster.org/pipermail/gluster-users/attachments/20170824/7177a78c/attachment.zip>
