thr3ads.net - Gluster users - [Gluster-users] [External] Re: Self Heal Confusion [Dec 2018]

Davide Obbi

2018-Dec-31 07:58 UTC

[Gluster-users] [External] Re: Self Heal Confusion

If the long GFID does not correspond to any file, it could mean the file has
been deleted by a client mounting the volume. I think this happens when the
delete is issued while the number of active bricks is below the quorum
majority, or when a second brick is taken down while another is still down or
has not finished its self-heal; the latter is more likely.
It would be interesting to see:
- which version of glusterfs you are running; it happened to me with 3.12
- the volume quorum rules: "gluster volume get vol all | grep quorum"

To clean it up, if I remember correctly, it should be possible to delete the
GFID entries from the brick mounts on the glusterfs server nodes that report
the files to heal.
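
For locating those entries, here is a minimal sketch, assuming the usual brick layout in which each GFID is stored under .glusterfs in two levels of subdirectories named after its first four hex characters. The brick path and the GFID below are hypothetical placeholders, not values from this thread. The function only prints the path, so the entry can be inspected (for example with ls -l) before anything is removed.

```shell
#!/bin/sh
# Map a GFID to the path of its entry under a brick's .glusterfs directory.
# Assumed layout: <brick>/.glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>
gfid_path() {
    brick=$1
    gfid=$2
    d1=$(printf '%s' "$gfid" | cut -c1-2)   # first two characters of the GFID
    d2=$(printf '%s' "$gfid" | cut -c3-4)   # next two characters of the GFID
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$gfid"
}

# Hypothetical brick path and GFID, for illustration only.
gfid_path /data/brick1 0123abcd-4567-89ab-cdef-0123456789ab
```

This has to be run against the brick directory on each server node that reports the entry, not against the client mount.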

As a side note, you might want to consider changing the self-heal timeout to
a more aggressive schedule with the cluster.heal-timeout option.
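
A rough sketch of how that option could be changed: volume options such as cluster.heal-timeout are set through the gluster CLI ("gluster volume set") rather than by editing a file. The volume name "myvol" and the value 300 are assumptions chosen for illustration. The helper below only prints the command so it can be reviewed before being run on a server node.

```shell
#!/bin/sh
# Print (do not run) the command that sets the self-heal crawl interval.
# "myvol" and 300 are placeholders, not values taken from this thread.
heal_timeout_cmd() {
    vol=$1
    secs=$2
    case $secs in
        ''|*[!0-9]*)
            # Reject empty or non-numeric values before building the command.
            echo "seconds must be a whole number" >&2
            return 1
            ;;
    esac
    printf 'gluster volume set %s cluster.heal-timeout %s\n' "$vol" "$secs"
}

heal_timeout_cmd myvol 300
```

The value is in seconds, and as far as I know the default is 600, so anything lower makes the self-heal daemon crawl more often. The current setting can be checked with "gluster volume get myvol cluster.heal-timeout".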

Brett Holcomb

2018-Dec-31 09:34 UTC

That is probably the case as a lot of files were deleted some time ago.

I'm on version 5.2 but was on 3.12 until about a week ago.

Here is the quorum info. I'm running a distributed replicated volume
in a 2 x 3 = 6 layout.

cluster.quorum-type          auto
cluster.quorum-count         (null)
cluster.server-quorum-type   off
cluster.server-quorum-ratio  0
cluster.quorum-reads         no

Where exactly do I remove the gfid entries from - the .glusterfs
directory? Do I just delete all the directories and files under this
directory?

Where do I put the cluster.heal-timeout option - which file?

I think you've hit on the cause of the issue. Thinking back, we've had
some extended power outages, and due to a misconfiguration in the swap
file device name a couple of the nodes did not come up. I didn't
catch it for a while, so maybe the deletes occurred then.

Thank you.

On 12/31/18 2:58 AM, Davide Obbi wrote:
> if the long GFID does not correspond to any file it could mean the
> file has been deleted by the client mounting the volume. I think this 
> is caused when the delete was issued and the number of active bricks 
> were not reaching quorum majority or a second brick was taken down 
> while another was down or did not finish the selfheal, the latter more 
> likely.
> It would be interesting to see:
> - what version of glusterfs you running, it happened to me with 3.12
> - volume quorum rules: "gluster volume get vol all | grep quorum"
>
> To clean it up if i remember correctly it should be possible to delete 
> the gfid entries from the brick mounts on the glusterfs server nodes 
> reporting the files to heal.
>
> As a side note you might want to consider changing the selfheal 
> timeout to more agressive schedule in cluster.heal-timeout option
