thr3ads.net - Gluster users - [Gluster-users] v3.6.1 vs v3.5.2 self heal

Vince Loschiavo

2014-Nov-19 16:20 UTC

[Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

Hello Gluster Community,

I have been using the Nagios monitoring scripts, mentioned in the below
thread, on 3.5.2 with great success. The most useful of these is the self
heal.

However, I've just upgraded to 3.6.1 in the lab and the self-heal daemon
has become quite aggressive.  I continually get alerts/warnings on 3.6.1
that virt disk images need self-heal, then they clear.  This is not the
case on 3.5.2.

Configuration:
2-node, 2-brick replicated volume with a 2x1GB LAG network between the peers,
using this volume as a QEMU/KVM virt image store through the FUSE mount on
CentOS 6.5.

Example:
on 3.5.2:
gluster volume heal volumename info: shows the bricks and number of
entries to be healed: 0

On v3.5.2 - During normal gluster operations, I can run this command over
and over again, 2-4 times per second, and it will always show 0 entries to
be healed.  I've used this as an indicator that the bricks are
synchronized.
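
A minimal sketch of the zero-entries check described above, in Python. Only
the CLI command comes from this message. The sample output layout (a "Brick"
line, optional entry lines, then a "Number of entries: N" line per brick) is
an assumption, as are the brick paths and the function names
count_heal_entries, run_heal_info and nagios_status, which are hypothetical.

```python
import re
import subprocess
from typing import Tuple

# Matches the per-brick counter line (assumed output format).
ENTRY_LINE = re.compile(r"^Number of entries:\s*(\d+)\s*$", re.MULTILINE)


def count_heal_entries(heal_info_output: str) -> int:
    """Sum the 'Number of entries: N' counters over all bricks."""
    return sum(int(n) for n in ENTRY_LINE.findall(heal_info_output))


def run_heal_info(volume: str) -> str:
    """Run the command quoted in the message and return its stdout."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def nagios_status(entries: int) -> Tuple[int, str]:
    """Map an entry count to a Nagios plugin exit code (0 = OK, 1 = WARNING)."""
    if entries == 0:
        return 0, "OK - 0 entries to be healed"
    return 1, f"WARNING - {entries} entries to be healed"


# Hypothetical sample text; brick names and paths are made up for illustration.
SAMPLE = """\
Brick node1:/export/brick1
Number of entries: 0

Brick node2:/export/brick1
/images/vm1.img
Number of entries: 1
"""
```

On a real node this could be used as
nagios_status(count_heal_entries(run_heal_info("volumename"))); a check built
this way treats any non-zero count as a problem, which is the behavior that
worked on 3.5.2.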

Last night, I upgraded to 3.6.1 in the lab and I'm seeing different behavior.
Running gluster volume heal volumename info, during normal operations,
will show a file out of sync, seemingly between every block written to disk
then synced to the peer.  I can run the command over and over again, 2-4
times per second, and it will almost always show something out of sync.
The individual files change, meaning:

Example:
1st run: shows file 1 out of sync
2nd run: shows file 2 and file 3 out of sync but file 1 is now in sync (not
in the list)
3rd run: shows file 3 and file 4 out of sync but file 1 and 2 are in sync
(not in the list).
...
nth run: shows 0 files out of sync
nth+1 run: shows file 3 and 12 out of sync.

From looking at the virtual machines running off this gluster volume,
it's obvious that gluster is working well.  However, this obviously plays havoc
with Nagios and alerts.  Nagios will run the heal info and get different
and non-useful results each time, and will send alerts.

Is this behavior change (3.5.2 vs 3.6.1) expected?  Is there a way to tune
the settings or change the monitoring method to get better results into
Nagios?

Thank you,

-- 
-Vince Loschiavo

On Wed, Nov 19, 2014 at 4:35 AM, Humble Devassy Chirammal <
humble.devassy at gmail.com> wrote:
> Hi Gopu,
>
> Awesome !!
>
> We can have a Gluster blog about this implementation.
>
> --Humble
>
> On Wed, Nov 19, 2014 at 5:38 PM, Gopu Krishnan <gopukrishnantec at gmail.com>
> wrote:
>
>> Thanks for all your help... I was able to configure nagios using the
>> glusterfs plugin. Following link shows how I configured it. Hope it helps
>> someone else:
>>
>> http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/
>>
>> On Sun, Nov 16, 2014 at 11:44 AM, Humble Devassy Chirammal <
>> humble.devassy at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Please look at this thread
>>> http://gluster.org/pipermail/gluster-users.old/2014-June/017819.html
>>>
>>> Btw, if you are around, we have a talk on same topic in upcoming
>>> GlusterFS India meetup.
>>>
>>> Details can be fetched from:
>>> http://www.meetup.com/glusterfs-India/
>>>
>>> --Humble
>>>
>>> On Sun, Nov 16, 2014 at 11:23 AM, Gopu Krishnan <
>>> gopukrishnantec at gmail.com> wrote:
>>>
>>>> How can we monitor the glusters and alert us if something happened
>>>> wrong. I found some nagios plugins and didn't work until this time. I am
>>>> still experimenting with those. Any suggestions would be much helpful

Humble Devassy Chirammal

2014-Nov-19 17:52 UTC

Hi Vince,

It could be a behavioural change in how the heal process output is captured
in the latest GlusterFS. If that is the case, we may tune the interval at
which Nagios collects the heal info output, or some other settings, to avoid
continuous alerts. I am Ccing the gluster nagios devs.

--Humble
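
One way to picture the kind of tuning suggested above is a gate that only
raises an alert after several consecutive non-zero samples, similar in spirit
to how Nagios retries a check (max_check_attempts, retry_interval) before a
problem becomes a hard state. The Python sketch below is illustrative only:
the class name ConsecutiveFailureGate and the threshold of 3 are assumptions,
not settings taken from this thread or from the Gluster Nagios plugins.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveFailureGate:
    """Signal an alert only after `threshold` non-zero samples in a row."""

    threshold: int = 3  # assumed value, chosen for illustration
    streak: int = 0     # current run of consecutive non-zero samples

    def update(self, entries: int) -> bool:
        """Feed one heal-info entry count; return True if an alert is due."""
        if entries > 0:
            self.streak += 1
        else:
            self.streak = 0  # any clean sample resets the run
        return self.streak >= self.threshold


gate = ConsecutiveFailureGate(threshold=3)
decisions = [gate.update(n) for n in [1, 2, 0, 1, 1, 1, 0]]
# decisions == [False, False, False, False, False, True, False]
```

A count-based gate like this only helps if clean (zero-entry) samples occur
often enough to reset the streak. If the count is non-zero on almost every
sample, as reported for 3.6.1, it may still fire.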


Anuradha Talur

2014-Nov-21 06:01 UTC

> Is this behavior change (3.5.2 vs 3.6.1) expected? Is there a way to tune the
> settings or change the monitoring method to get better results into Nagios?

In 3.6.1 the heal info command works differently from how it does in 3.5.2. In
3.6.1, it is the self-heal daemon that gathers the entries that might need
healing. Currently, in 3.6.1, there is no way to distinguish, while listing,
between a file that is being healed and a file with ongoing I/O. Hence files
under normal operation are also listed in the output of the heal info command.

-- 
Thanks,
Anuradha.
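
Since heal info on 3.6.1 cannot tell a file under heal from a file with
ongoing I/O, one monitoring-side workaround is to alert only on entries that
stay listed across several consecutive samples. The Python sketch below is a
hedged illustration of that idea, not something proposed in the thread: the
class name PersistentEntryTracker and the window of 3 samples are
assumptions, and the file names mirror the example runs from the original
question.

```python
from collections import deque
from typing import Deque, Iterable, Set


class PersistentEntryTracker:
    """Keep the last `window` samples and report entries present in all of them."""

    def __init__(self, window: int = 3) -> None:
        self.window = window
        self.samples: Deque[Set[str]] = deque(maxlen=window)

    def add_sample(self, entries: Iterable[str]) -> Set[str]:
        """Record one heal-info listing; return entries seen in every sample."""
        self.samples.append(set(entries))
        if len(self.samples) < self.window:
            return set()  # not enough history yet to call anything persistent
        return set.intersection(*self.samples)


# Replay of the example runs: entries come and go, so nothing persists.
tracker = PersistentEntryTracker(window=3)
runs = [{"file1"}, {"file2", "file3"}, {"file3", "file4"}, set(), {"file3", "file12"}]
transient = [tracker.add_sample(r) for r in runs]

# A file that stays listed in every sample is reported once the window fills.
stuck_tracker = PersistentEntryTracker(window=3)
stuck = [stuck_tracker.add_sample(r) for r in [{"a", "x"}, {"a", "y"}, {"a"}]]
```

This is a heuristic, not a guarantee. A disk image under continuous writes
could appear in every sample without needing heal, so the window and the
sampling interval would have to be chosen with the workload in mind.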
