thr3ads.net - Gluster users - [Gluster-users] Nagios monitoring of replicated volumes.. [Apr 2009]

Cory Meyer

2009-Apr-02 20:39 UTC

[Gluster-users] Nagios monitoring of replicated volumes..

Has anyone found a decent way to monitor GlusterFS volumes?
I'm currently using Nagios and Cacti to take care of basic CPU, load,
memory, and raw disk I/O. I also need to monitor GlusterFS status and make
sure all volumes are available.

My test environment is 6 servers with 6 AFR volumes, each of which is shared
between 2 of those servers. All volumes are mounted on each server.

The checks I'm testing out so far include a simple Bash script that writes
the current Unix timestamp and hostname to a file once a minute. Each server
does this only on the volumes that it stores:
echo "$(uname -n):$(date +%s)" > /mnt/gluster01/CHECK_FILE
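
A minimal sketch of how that one-liner might be wrapped for cron. This is not the pastebin script: the function names write_check_file and write_all and the MOUNTS variable are hypothetical, and the default mount point is simply the one from the example above.

```shell
#!/bin/bash
# Sketch only: MOUNTS, write_check_file and write_all are hypothetical names.
# MOUNTS should list only the volumes that this server stores.
MOUNTS="${MOUNTS:-/mnt/gluster01}"

# Write "hostname:unix_timestamp" to CHECK_FILE under the given mount point.
# Returns non-zero if the mount point is missing or the write fails.
write_check_file() {
    local mount="$1"
    [ -d "$mount" ] || return 1
    echo "$(uname -n):$(date +%s)" > "$mount/CHECK_FILE"
}

# Write to every configured mount point and report failures on stderr.
write_all() {
    local mount rc=0
    for mount in $MOUNTS; do
        if ! write_check_file "$mount"; then
            echo "cannot write $mount/CHECK_FILE" >&2
            rc=1
        fi
    done
    return $rc
}
```

Calling write_all at the end of the script and running the script from cron once a minute would give the behavior described above.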

The Nagios NRPE daemon would then execute a Perl script on each of the
clients. This script goes through each of the Gluster mount points,
comparing the timestamp in the CHECK_FILE to the current system time and
alarming if the timestamp is off by more than a minute. Another test, which
hasn't been implemented yet, would compare the contents of the CHECK_FILE
with the data that is on the raw disk.
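
The timestamp comparison can be sketched roughly as follows. The check described above is a Perl script; this is a Bash stand-in, not a port of it. The names check_mount and MAX_AGE and the message texts are hypothetical. The return values follow the standard Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).

```shell
#!/bin/bash
# Sketch only: check_mount, MAX_AGE and the message texts are hypothetical.
MAX_AGE="${MAX_AGE:-60}"   # seconds; alarm when off by more than a minute

# Print a status line for one mount point and return a Nagios plugin
# exit code: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
# The optional second argument overrides "now" (handy for testing).
check_mount() {
    local mount="$1" now="${2:-$(date +%s)}"
    local file="$mount/CHECK_FILE" line="" stamp age

    if [ ! -r "$file" ]; then
        echo "UNKNOWN: cannot read $file"
        return 3
    fi

    # CHECK_FILE holds "hostname:timestamp"; keep what follows the last colon.
    read -r line < "$file" || true
    stamp="${line##*:}"
    case "$stamp" in
        ''|*[!0-9]*) echo "UNKNOWN: bad timestamp in $file"; return 3 ;;
    esac

    # Use the absolute difference so a timestamp in the future also alarms.
    age=$((now - stamp))
    if [ "$age" -lt 0 ]; then age=$((-age)); fi

    if [ "$age" -gt "$MAX_AGE" ]; then
        echo "CRITICAL: $file is off by ${age}s (written by ${line%%:*})"
        return 2
    fi
    echo "OK: $file is off by ${age}s"
    return 0
}
```

An NRPE command would call check_mount for each mount point and exit with the worst code seen. With the writer running once a minute, a threshold of exactly 60 seconds leaves little slack, so a somewhat larger MAX_AGE may be needed in practice.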

Bash code to write the timestamps, executed via cron once a minute.
(write_timestamps.sh)
http://glusterfs.pastebin.com/m5a220a6

Perl code to compare the timestamps, executed on the client.
(check_glusterfs_mounts.pl)
http://glusterfs.pastebin.com/m2f057a77

Any ideas/questions/comments?

Stas Oskin

2009-Apr-02 21:05 UTC

[Gluster-users] Nagios monitoring of replicated volumes..

Hi.

This is an interesting topic indeed.

I'm planning to have each server ping its AFR pair and, if one of them
goes down, to run ls -lR on the mount the moment it comes back up.
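
A rough sketch of that plan, meant to be run periodically (for example from cron). Everything named here is an assumption for illustration: PEER, MOUNT, STATE_FILE, the function names, and the placeholder address. The ping options assume the Linux iputils ping, where -W is a timeout in seconds.

```shell
#!/bin/bash
# Sketch only: PEER, MOUNT, STATE_FILE and the function names are hypothetical.
PEER="${PEER:-192.0.2.10}"          # placeholder address of the AFR pair
MOUNT="${MOUNT:-/mnt/gluster01}"    # GlusterFS mount point to walk
STATE_FILE="${STATE_FILE:-/var/tmp/afr_peer.state}"

# Return 0 if the peer answers a single ping within 2 seconds.
peer_is_up() {
    ping -c 1 -W 2 "$PEER" > /dev/null 2>&1
}

# Walk the whole mount, as proposed above, discarding the listing.
walk_mount() {
    ls -lR "$MOUNT" > /dev/null
}

# Compare the peer's state with the one recorded on the previous run and
# walk the mount only on a down -> up transition.
check_peer() {
    local previous=up current=down
    if [ -f "$STATE_FILE" ]; then
        previous=$(cat "$STATE_FILE")
    fi
    if peer_is_up; then
        current=up
    fi
    echo "$current" > "$STATE_FILE"

    if [ "$previous" = down ] && [ "$current" = up ]; then
        echo "peer $PEER is back, walking $MOUNT"
        walk_mount
    else
        echo "peer $PEER is $current"
    fi
}
```

Keeping the last known state in a file means the walk runs once per recovery instead of on every run. A single missed ping marks the peer as down here, so a real check would probably want a few retries before recording the change.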

Perhaps others can share additional ideas?

Regards.


Christopher Hawkins

2009-Apr-03 12:13 UTC

[Gluster-users] Nagios monitoring of replicated volumes..

I wrote a script to do something similar. Here's a modified version that
will verify working GlusterFS mounts in general. All you need is a path that
is the same on all nodes being checked and on the node performing the check,
and passwordless ssh into the Gluster client nodes. For testing I just made a
tmp directory right inside the GlusterFS mount and used that:


#!/bin/bash

# All nodes must have the same path to this directory
TEMP_DIR=/cluster/tmp
SSH="ssh -q -l root -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o ConnectTimeout=5"

check_node() {
  # ssh into the node and have it write its hostname into a temp file
  # in a gluster mounted directory. If we can read it from here and it's
  # correct, the node is online with 100% certainty
  local ip=$1 file expected
  file=$(mktemp -p "$TEMP_DIR") || { echo "cannot create temp file"; return 1; }
  $SSH "$ip" "hostname > $file"

  # For each IP address listed in /a_node_ip_list (one per line, read by
  # the loop below), we need to get the hostname from /etc/hosts.
  # Make sure it's in there
  expected=$(awk -v ip="$ip" '$1 == ip {print $2; exit}' /etc/hosts)

  # An empty name must not count as a match (ssh failed or no hosts entry)
  if [ -n "$expected" ] && [ "$expected" = "$(cat "$file")" ]
  then
    echo "confirmed online"
  else
    echo "not online. Call someone!"
  fi
}

echo
echo "GlusterFS status:"
echo

for ip in $(cat /a_node_ip_list)
do
  echo -n "checking $ip...  "
  check_node "$ip"
done

# Clean up
rm -f "$TEMP_DIR"/tmp.*

exit 0




_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
