Mark Nipper
2012-Jun-28 21:06 UTC
[Gluster-users] necessary improvements in documentation and monitoring
We are currently dipping our toes in Gluster using the 3.3 GA RPM packages. So far almost everything has been working acceptably. However, based on what we've run across so far, I do have some suggested improvements you might consider to help the project out long term.

The first problem we ran into has to do with monitoring. We're moving our KVM images from a DRBD HA cluster, which exported LVM logical volumes on top of DRBD over iSCSI, to qcow2 files stored on Gluster. Needless to say, Gluster is a joy in comparison to working with the complexity of the DRBD/HA/iSCSI stack. But Gluster seems to be sorely lacking in the monitoring department. It's quite trivial to get the status of your DRBD volumes by looking at the output of /proc/drbd, even as an unprivileged user. Getting anything even resembling this in Gluster requires, at the very least, root permissions.

I'm attaching a script I wrote which I think is giving us at least partial visibility into the status of our Gluster volumes via Nagios. (3.3 is our first real attempt at using Gluster, so I'm probably making some gross assumptions about the health of the volumes in this script.) Hopefully the list allows the shell script through. If not, I'll post a URL to it elsewhere. Please suggest any changes you think might make that script more robust or useful. Like I said, it's just the first pass at trying to get something workable into Nagios.

Secondly, coming from a DRBD background, there are a few things that seem like obvious omissions in the Gluster Administration Guide. The first relates to the upgrade process. What is the upgrade process (if one even exists) for moving between minor point releases of Gluster? Can you upgrade one brick in, say, a replicated volume to the next point release, reboot, wait for synchronization or healing to occur, and then rinse and repeat across the other bricks?
This isn't really spelled out anywhere, and that's quite distressing given that one of the major features of Gluster is its always-on capability. The DRBD documentation, as an example, points out that upgrades between minor point releases are fully supported (no protocol-breaking incompatibilities, for example) and clearly illustrates the process of upgrading one node at a time. I understand that this would only be really useful on volumes which are in fact replicated somehow in Gluster. With any other configuration you'd obviously have to expect some kind of downtime.

The second omission has to do with what might simply be a current design limitation. In attempting a rolling upgrade as discussed above, there doesn't seem to be a clean way to shut down any given brick in a volume. Currently we're simply typing reboot on one of the bricks, and then we seemingly get hit by network.ping-timeout: the other server brick and all our KVM guests hang for around 50 seconds. This was rather unexpected behavior to say the least (especially with the other server brick hanging completely as well; we're accustomed to the KVM guests themselves hanging and waiting indefinitely for their iSCSI-backed LVM volume to come back during a DRBD transition, for example).

I see the detach option, but there isn't a reattach. Do you just re-add the brick into the volume? Does it incur the same amount of synchronization overhead as simply rebooting one of the bricks, or does it resynchronize the entire volume from scratch? Does it avoid the network.ping-timeout problem completely? Again, this seems like an area where things could be more clearly spelled out, as it seems like something an administrator would commonly be affected by in the routine maintenance of servers.

Those are the biggest issues we've seen for now. We have one KVM guest which keeps ending up with a read-only root file system.
But since none of the other guests are doing it so far, and I don't see any chatter in any of the Gluster logs about that qcow2 file, I'm assuming for the time being that the disk image was somehow already corrupt coming from the DRBD-backed LVM volume it was on before I used qemu-img to convert it to qcow2 on top of this Gluster volume. I'm going to recreate the VM from scratch to verify that it's a lower-level disk image problem and not a Gluster problem at this point, as the other VMs have been behaving okay.

Thanks for reading!

-- 
Mark Nipper
nipsy at bitgnome.net (XMPP)
+1 979 575 3193
-
"All existence is conditioned."
 -- Shakyamuni Buddha
-------------- next part --------------
#!/bin/bash

# This Nagios script was written against version 3.3 of Gluster.  Older
# versions will most likely not work at all with this monitoring script.
#
# Gluster currently requires elevated permissions to do anything.  In order to
# accommodate this, you need to allow your Nagios user some additional
# permissions via sudo.  The line you want to add will look something like the
# following in /etc/sudoers (or something equivalent):
#
# Defaults:nagios !requiretty
# nagios ALL=(root) NOPASSWD:/usr/sbin/gluster peer status,/usr/sbin/gluster volume list,/usr/sbin/gluster volume heal [[\:graph\:]]* info
#
# That should give us all the access we need to check the status of any
# currently defined peers and volumes.

# define some variables
ME=$(basename -- "$0")
SUDO="/usr/bin/sudo"
PIDOF="/sbin/pidof"
GLUSTER="/usr/sbin/gluster"
PEERSTATUS="peer status"
VOLLIST="volume list"
VOLHEAL1="volume heal"
VOLHEAL2="info"
# error flags; empty means no error has been seen
peererror=""
volerror=""

# check for commands
for cmd in "$SUDO" "$PIDOF" "$GLUSTER"; do
	if [ ! -x "$cmd" ]; then
		echo "$ME UNKNOWN - $cmd not found"
		exit 3
	fi
done

# check for glusterd (management daemon)
if ! "$PIDOF" glusterd &>/dev/null; then
	echo "$ME CRITICAL - glusterd management daemon not running"
	exit 2
fi

# check for glusterfsd (brick daemon)
if ! "$PIDOF" glusterfsd &>/dev/null; then
	echo "$ME CRITICAL - glusterfsd brick daemon not running"
	exit 2
fi

# get peer status (the *STATUS/VOL* variables are left unquoted on purpose
# so that they split into separate gluster arguments)
peerstatus="peers: "
for peer in $("$SUDO" "$GLUSTER" $PEERSTATUS | grep '^Hostname: ' | awk '{print $2}'); do
	# the state is the parenthesized word at the end of the "State: " line
	state=$("$SUDO" "$GLUSTER" $PEERSTATUS | grep -A 2 "^Hostname: $peer$" | grep '^State: ' | sed -nre 's/.* \(([[:graph:]]+)\)$/\1/p')
	if [ "$state" != "Connected" ]; then
		peererror=1
	fi
	peerstatus+="$peer/$state "
done

# get volume status
volstatus="volumes: "
for vol in $("$SUDO" "$GLUSTER" $VOLLIST); do
	thisvolerror=0
	# one "Number of entries: " line is expected per brick
	for entries in $("$SUDO" "$GLUSTER" $VOLHEAL1 "$vol" $VOLHEAL2 | grep '^Number of entries: ' | awk '{print $4}'); do
		if [ "$entries" -gt 0 ]; then
			volerror=1
			thisvolerror=$((thisvolerror + entries))
		fi
	done
	volstatus+="$vol/$thisvolerror unsynchronized entries "
done

# drop extra space
peerstatus=${peerstatus:0:${#peerstatus}-1}
volstatus=${volstatus:0:${#volstatus}-1}

# set status according to whether any errors occurred
if [ -n "$peererror" ] || [ -n "$volerror" ]; then
	status="CRITICAL"
else
	status="OK"
fi

# actual Nagios output
echo "$ME $status $peerstatus $volstatus"

# exit with appropriate value
if [ -n "$peererror" ] || [ -n "$volerror" ]; then
	exit 2
else
	exit 0
fi
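
The heal-count parsing in the script above can be tried out without a running cluster. What follows is only a minimal sketch, assuming the heal info output contains one "Number of entries: N" line per brick (which is what the script greps for). The function name sum_heal_entries and the sample text are hypothetical and were not captured from a real Gluster installation.

```shell
#!/bin/bash
# Hypothetical helper: sum every "Number of entries: N" line read from stdin.
# Assumes one such line per brick, as the monitoring script's grep implies.
sum_heal_entries() {
    # "total + 0" forces a numeric 0 when no line matched
    awk '/^Number of entries: / { total += $4 } END { print total + 0 }'
}

# Made-up sample input for illustration only.
sample='Brick server1:/export/brick1
Number of entries: 2
Brick server2:/export/brick1
Number of entries: 3'

# Feed the sample through the helper; the two counts are added together.
printf '%s\n' "$sample" | sum_heal_entries
```

With the sample above this prints 5. Doing the summing in a single awk pass also avoids the inner shell loop, though whether that is worth changing in the real check is a matter of taste.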