On 01/17/2015 05:28 AM, Kyle Harris wrote:
> Hello,
>
> I created a post a few days ago named "Turning Off Self Heal Options
> Don't Appear Work?" which can be found at the following link:
> http://www.gluster.org/pipermail/gluster-users/2015-January/020114.html
>
> I never got a response so I decided to set up a test in a lab
> environment. I am able to reproduce the same thing so I'm hoping
> someone can help me.
>
> I have discovered over time that if a single node in a 3-node
> replicated cluster with many small files is off for any length of
> time, when it comes back on-line, it does a great deal of self-healing
> that can cause the glusterfs and glusterfsd processes to spike on the
> machines to a degree that makes them unusable. I only have one
> volume, with a client mount on each server where it hosts many
> websites running PHP. All is fine until the healing process goes into
> overdrive.
>
> So, I attempted to turn off self-healing by setting the following
> three settings:
> gluster volume set gv0 cluster.data-self-heal off
> gluster volume set gv0 cluster.entry-self-heal off
> gluster volume set gv0 cluster.metadata-self-heal off
hi Kyle,
Krutika wanted to send a response to you today, but we spent the
whole day debugging a bug. Let me answer some of the things we already
discussed on behalf of Krutika.
Krutika (CCed) has found one issue where self-heal was still
triggered even when some of the options were turned off. But if all
three options are turned off, I think no heals would be done from the
mount process. glustershd can still do heals, though. To disable that
healing as well, we need to turn off the self-heal daemon using
'gluster volume set <volname> self-heal-daemon off'.
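To make the full set of settings concrete, disabling both the
client-side heals and the daemon would look like this (volume name gv0
taken from your volume info; adjust as needed):

```shell
# Disable the three client-side (mount-triggered) self-heal types
gluster volume set gv0 cluster.data-self-heal off
gluster volume set gv0 cluster.entry-self-heal off
gluster volume set gv0 cluster.metadata-self-heal off

# Disable the self-heal daemon (glustershd) as well
gluster volume set gv0 cluster.self-heal-daemon off
```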
> Note that I would rather not set gv0 cluster.self-heal-daemon off as
> then I can't see what needs healing such that I can do it at a later
> time. Those settings appear to have no effect at all.
Ah! 3.6.2 will be able to give the output of 'gluster volume heal
<volname> info' even when self-heal-daemon is turned off.
> Here is how I reproduced this in my lab:
>
> Output from "gluster volume info gv0":
> Volume Name: gv0
> Type: Replicate
> Volume ID: a55f8619-0789-4a1c-9cda-a903bc908fd1
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.1.116:/export/brick1
> Brick2: 192.168.1.140:/export/brick1
> Brick3: 192.168.1.123:/export/brick1
> Options Reconfigured:
> cluster.metadata-self-heal: off
> cluster.entry-self-heal: off
> cluster.data-self-heal: off
>
> This was done using the latest version of gluster as of this writing,
> v3.6.1 installed on CentOS 6.6 using the rpms available from the
> gluster web site.
>
> Here is how I tested:
> - With all 3 nodes up, I put 4 simple text files on the cluster
> - I then turned one node off
> - Next I made a change to 2 of the text files
> - Then I brought the previously turned off node back up
>
> Upon doing so, I see far more than 2 of the following message in the
> glusterhd.log:
>
> [2015-01-15 23:19:30.471384] I
> [afr-self-heal-entry.c:545:afr_selfheal_entry_do] 0-gv0-replicate-0:
> performing entry selfheal on 00000000-0000-0000-0000-000000000001
> [2015-01-15 23:19:30.494714] I
> [afr-self-heal-common.c:476:afr_log_selfheal] 0-gv0-replicate-0:
> Completed entry selfheal on 00000000-0000-0000-0000-000000000001.
> source=0 sinks>
> Questions:
> - So is this a bug?
The log seems to suggest that it didn't find any 'sinks' to heal to,
so it wouldn't have done any file creations/deletions. Maybe we should
fix the log, or see if there is more to that bug.
> - Why am I seeing "entry selfheal" messages when this feature is
> supposed to be turned off?
Because glustershd can still do self-heals, as we didn't disable
it.
> - Also, why am I seeing far more selfheal messages than 2 when I only
> changed 2 files while the single node was down?
At the moment, I believe they are just log messages and not really
heals. But we will need to look further to find out if there is more to
it.
> - Finally, how do I really turn off these selfheals that are taking
> place without completely turning off the cluster.self-heal-daemon for
> reasons mentioned above?
There are 2 workarounds for this until 3.6.2 is released:
1) We can turn self-heal-daemon off. Whenever we want to see the files
that need healing, we can turn it on, see the information, and turn it
off again immediately. This broken functionality made it to 3.6.1
because I couldn't re-implement the feature for afrv2 in time for the
release. Sorry about that!
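A minimal sketch of that toggle cycle, assuming volume gv0 and that you
want to keep a snapshot of the pending-heal list (the /tmp path is just
an example):

```shell
# Turn the daemon on just long enough to collect heal info
gluster volume set gv0 cluster.self-heal-daemon on
gluster volume heal gv0 info > /tmp/pending-heals.txt
gluster volume set gv0 cluster.self-heal-daemon off
```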
2) The other way is to inspect the gfids of the files that need
healing directly, by looking at the directory
<brick-path>/.glusterfs/indices/xattrop. This is where the
self-heal-daemon looks to find the files that need healing.
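As a rough sketch of that inspection, here is a small shell function
that lists gfid-named entries in a brick's xattrop index (the index
path comes from the description above; the gfid-shaped filename filter
is an assumption about how the entries are named, so treat it as
illustrative):

```shell
# List pending-heal gfids recorded in a brick's xattrop index.
# The filename filter (8-4-4-4-12 hex groups) is an assumed gfid shape.
pending_gfids() {
    brick="$1"
    ls -1 "$brick/.glusterfs/indices/xattrop" 2>/dev/null \
        | grep -E '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'
}

# Example, using a brick path from your volume info:
# pending_gfids /export/brick1
```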
You were saying you know a way to make machines unusable by triggering
self-heals. It would be very good if we could replicate that test in
our lab. Do you have any pointers that would let us do the same?
Pranith
> Thank you for any insight you may be able to provide on this.
>
> --
> Kyle
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users