Assuming your bricks are up... yes, the heal count should be decreasing.

There is/was a bug wherein self-heal would stop healing but would still be running. I don't know whether your version is affected, but the remedy is simply to restart the self-heal daemon: force-start one of the volumes that has heals pending. The bricks are already running, but the force start will cause shd to restart and, assuming this is the problem, healing should begin...

$ gluster vol start my-pending-heal-vol force

Others could better comment on the status of the bug.

-John

On Thu, Dec 20, 2018 at 5:45 PM Brett Holcomb <biholcomb at l1049h.com> wrote:
> I have one volume that has 85 pending entries in healing and two more
> volumes with 58,854 entries in heal pending. These numbers are from
> the volume heal info summary command. They have stayed constant for two
> days now. I've read the Gluster docs and many more. The Gluster docs
> just give some commands, and non-Gluster docs basically repeat them.
> Given that it appears no self-healing is going on for my volumes, I am
> confused as to why.
>
> 1. If a self-heal daemon is listed on a host (all of mine show one with
> a volume status command), can I assume it's enabled and running?
>
> 2. I assume the volume that has all the self-heals pending has some
> serious issues, even though I can access the files and directories on
> it. If self-heal is running, shouldn't the numbers be decreasing?
>
> It appears to me self-heal is not working properly, so how do I get it to
> start working, or should I delete the volume and start over?
>
> I'm running Gluster 5.2 on CentOS 7, latest and updated.
>
> Thank you.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
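[Editor's note] John's restart-then-watch suggestion can be sketched as below. This is a minimal sketch, not gospel: the volume name and the exact field labels in the `heal info summary` output are assumptions based on typical gluster 5.x output, so check them against your own `gluster vol heal <vol> info summary` before relying on the awk filter.

```shell
# Restart shd without touching the bricks, then re-check the pending count:
#   gluster vol start projects force            # "projects" is an assumed volume name
#   gluster vol heal projects info summary      # run again later; counts should drop
#
# Summing the pending count out of the summary output can be done with awk.
# Demonstrated here on a captured sample (field labels are assumptions):
sample_summary='
Brick host1:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 85
Number of entries in heal pending: 85
Number of entries in split-brain: 0
'
echo "$sample_summary" | awk -F': ' '/heal pending/ {sum += $2} END {print sum}'
```

If the printed total shrinks between runs, shd is making progress; if it stays flat after a force start, the stalled-shd bug John mentions is less likely to be the cause.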
Not sure what I'm looking for, but nothing jumps out and says "I broke here" <G>. I see some event_dispatch_epoll_worker messages about failing to dispatch a handler in all the logs. From what I can tell, it's connecting to the volume fine.

On 12/20/18 8:26 PM, John Strunk wrote:
> Assuming your bricks are up... yes, the heal count should be decreasing.
> [...]
> $ gluster vol start my-pending-heal-vol force
> [...]
Very strange. I see this in glusterd.log:

[2018-12-22 23:53:47.216743] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler

After force-starting the volume and doing a

gluster vol heal projects full

this is in the glustershd log, so I assume it started:

[2018-12-22 22:54:22.328897] I [MSGID: 114046] [client-handshake.c:1107:client_setvolume_cbk] 0-projects-client-5: Connected to projects-client-5, attached to remote volume '/srv/gfs01/Projects'.

This shows up in the glfsheal-projects.log file:

[2018-12-22 23:53:41.916773] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler

I'm not sure what it's trying to tell me when it fails to dispatch a handler. From what I could find, there were issues in the early 5.0 builds with some of these errors coming up, but a patch was included early on. I am on 5.2.

I'll keep digging.

On 12/20/18 8:26 PM, John Strunk wrote:
> Assuming your bricks are up... yes, the heal count should be decreasing.
> [...]
> $ gluster vol start my-pending-heal-vol force
> [...]
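[Editor's note] When digging through the logs, it helps to know whether the "Failed to dispatch handler" errors are ongoing or a one-off from startup. A minimal sketch, assuming the usual /var/log/glusterfs log directory (adjust to your install):

```shell
# Count occurrences per log file to see which daemons are affected and how often:
#   grep -c "Failed to dispatch handler" /var/log/glusterfs/*.log
#
# The same filter, demonstrated on captured sample lines from this thread:
sample_log='[2018-12-22 23:53:47.216743] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2018-12-22 22:54:22.328897] I [MSGID: 114046] [client-handshake.c:1107:client_setvolume_cbk] 0-projects-client-5: Connected to projects-client-5'
echo "$sample_log" | grep -c "Failed to dispatch handler"
```

A count that keeps climbing after the force start would suggest the epoll errors are live rather than leftover startup noise.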
Still no change in the heals pending. I found this reference, https://archive.fosdem.org/2017/schedule/event/glusterselinux/attachments/slides/1876/export/events/attachments/glusterselinux/slides/1876/fosdem.pdf, which mentions the default SELinux context for a brick and says that internal operations such as self-heal and rebalance should be ignored, but it does not elaborate on what "ignored" means - is it just not doing self-heal, or something else?

I did set SELinux to permissive and nothing changed. I'll try setting the bricks to the context mentioned in this PDF and see what happens.

On 12/20/18 8:26 PM, John Strunk wrote:
> Assuming your bricks are up... yes, the heal count should be decreasing.
> [...]
> $ gluster vol start my-pending-heal-vol force
> [...]
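[Editor's note] Relabeling a brick to the gluster SELinux context is a system-configuration step; a sketch of the usual approach is below. The type name glusterd_brick_t is taken from the FOSDEM slides linked above, and the brick path /srv/gfs01/Projects is from the glustershd log earlier in this thread - verify both against your own system (and that the type exists in your policy) before running anything.

```
# Record the label rule for the brick path, then reapply labels on disk:
semanage fcontext -a -t glusterd_brick_t "/srv/gfs01/Projects(/.*)?"
restorecon -Rv /srv/gfs01/Projects

# Confirm the new context on the brick root:
ls -Zd /srv/gfs01/Projects
```

Note that since setting SELinux to permissive did not change the heal counts, a labeling problem is unlikely to be the root cause here; this only rules it out cleanly.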