Cory Meyer
2009-Apr-02 20:39 UTC
[Gluster-users] Nagios monitoring of replicated volumes..
Has anyone found a decent way to monitor GlusterFS volumes? I'm currently using Nagios and Cacti to take care of basic CPU, load, memory, and raw disk I/O. I also need to monitor GlusterFS status and make sure all volumes are available.

My test environment is 6 servers with 6 AFR volumes, each of which is shared between 2 of those servers. All volumes are mounted on each server.

The checks I'm testing out so far include a simple Bash script that writes the current Unix timestamp and hostname to a file once a minute. Each server does this only on the volumes that it stores:

    echo "$(uname -n):$(date +%s)" > /mnt/gluster01/CHECK_FILE

The Nagios NRPE daemon then executes a Perl script on each of the clients. This script goes through each of the Gluster mount points, compares the timestamp in the CHECK_FILE to the current system time, and raises an alarm if the timestamp is off by more than a minute. Another test, which hasn't been implemented yet, would compare the contents of the CHECK_FILE with the data that is on the raw disk.

Bash code to write the timestamps, executed via cron once a minute (write_timestamps.sh):
http://glusterfs.pastebin.com/m5a220a6

Perl code to compare the timestamps, executed on the client (check_glusterfs_mounts.pl):
http://glusterfs.pastebin.com/m2f057a77

Any ideas/questions/comments?
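The Perl check itself is only available at the pastebin link, so the following is not that script. It is a minimal Bash sketch of the comparison described above, assuming the CHECK_FILE holds a single "hostname:timestamp" line as written by the echo command. The function name check_freshness, the MAX_AGE variable, and the 60-second default are hypothetical; the return values follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/bash
# Hypothetical freshness check, not the original check_glusterfs_mounts.pl.
# Assumes CHECK_FILE contains one line of the form "hostname:unix_timestamp".

MAX_AGE=${MAX_AGE:-60}   # assumed threshold in seconds ("more than a minute")

# check_freshness FILE [NOW]
# Prints a Nagios-style status line and returns 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN).
# NOW defaults to the current time; passing it explicitly makes testing easier.
check_freshness() {
    local file=$1 now=${2:-$(date +%s)}
    local line host stamp age

    # An unreadable or empty file usually means the mount is gone or hung
    if ! line=$(head -n 1 "$file" 2>/dev/null) || [ -z "$line" ]; then
        echo "CRITICAL: cannot read $file"
        return 2
    fi

    host=${line%%:*}    # text before the first colon
    stamp=${line##*:}   # text after the last colon

    # Refuse anything that is not a plain decimal number
    case $stamp in
        ''|*[!0-9]*) echo "UNKNOWN: bad timestamp in $file"; return 3 ;;
    esac

    age=$(( now - stamp ))
    if [ "$age" -gt "$MAX_AGE" ]; then
        echo "CRITICAL: $file last written by $host ${age}s ago"
        return 2
    fi
    echo "OK: $file written by $host ${age}s ago"
    return 0
}
```

A wrapper would call check_freshness once per mount point, for example check_freshness /mnt/gluster01/CHECK_FILE, and exit with the worst status seen. Note that this sketch only flags stale timestamps; a timestamp in the future (clock skew between servers) passes as OK, so the clocks need to be kept in sync.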
Stas Oskin
2009-Apr-02 21:05 UTC
[Gluster-users] Nagios monitoring of replicated volumes..
Hi.

This is an interesting topic indeed.

I'm planning to have each server ping its AFR pair and, if one of them goes down, to run ls -lR on the mount the moment it comes back up.

Perhaps others can share additional ideas?

Regards.
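The plan above can be sketched roughly as follows. This is only an illustration under stated assumptions: the peer hostname, mount point, and state file path (PEER, MOUNT, STATE_FILE) are made-up defaults, the function names are hypothetical, and the ping flags are those of Linux iputils. The recursive listing is presumably meant to make the client look up every file so that AFR can repair stale copies, but the message does not spell that out.

```shell
#!/bin/bash
# Hypothetical sketch of "ping the AFR pair, walk the mount when it returns".
# PEER, MOUNT and STATE_FILE are assumed names, not taken from the thread.

PEER=${PEER:-gluster02}
MOUNT=${MOUNT:-/mnt/gluster01}
STATE_FILE=${STATE_FILE:-/var/run/afr_peer_state}

# next_action PREVIOUS CURRENT
# Prints "heal" only on a down -> up transition, otherwise "none".
next_action() {
    if [ "$1" = "down" ] && [ "$2" = "up" ]; then
        echo heal
    else
        echo none
    fi
}

# One polling pass, intended to be run periodically (for example from cron).
poll_peer() {
    local prev cur

    # With no recorded state yet, assume the peer was up
    prev=$(cat "$STATE_FILE" 2>/dev/null || echo up)

    # One ICMP echo request with a 2 second timeout (Linux iputils flags)
    if ping -c 1 -W 2 "$PEER" >/dev/null 2>&1; then
        cur=up
    else
        cur=down
    fi

    if [ "$(next_action "$prev" "$cur")" = "heal" ]; then
        # Walk the whole tree once the peer is reachable again
        ls -lR "$MOUNT" >/dev/null
    fi

    # Remember the state for the next pass
    echo "$cur" > "$STATE_FILE"
}

# poll_peer   # uncomment when installing this as a cron job
```

Keeping the transition logic in next_action separate from the ping makes it easy to check without a network. A reachable host does not guarantee that the GlusterFS server process on it is running, so a ping is a weaker signal than reading a file through the mount.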
Christopher Hawkins
2009-Apr-03 12:13 UTC
[Gluster-users] Nagios monitoring of replicated volumes..
I wrote a script to do something similar. Here's a modified version that will verify working glusterfs mounts in general. All you need is a path that is the same on all nodes being checked and on the node performing the check, and passwordless ssh into the gluster client nodes. For testing I just made a tmp directory right inside the glusterfs mount and used that:

    #!/bin/bash

    # All nodes must have the same path to this directory
    TEMP_DIR=/cluster/tmp

    SSH="ssh -q -l root -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o ConnectTimeout=5"

    check_node() {
      # ssh into the node and have it write its hostname into a temp file
      # in a gluster mounted directory. If we can read it from here and it's
      # correct, the node is online with 100% certainty
      local ip=$1
      local file
      file=$(mktemp -p "$TEMP_DIR")
      $SSH "$ip" "hostname > $file"

      # For any ip addresses listed in /a_node_ip_list (list them one per
      # line), we need to get the hostname from /etc/hosts. Make sure it's
      # in there. Match the address field exactly so that 10.0.0.1 does not
      # also match 10.0.0.10
      if test "$(awk -v ip="$ip" '$1 == ip {print $2; exit}' /etc/hosts)" = "$(cat "$file")"
      then
        echo "confirmed online"
      else
        echo "not online. Call someone!"
      fi
    }

    echo
    echo "GlusterFS status:"
    echo

    for ip in $(cat /a_node_ip_list)
    do
      echo -n "checking $ip...  "
      check_node "$ip"
    done

    # Clean up
    rm -f "$TEMP_DIR"/tmp.*

    exit 0