Hello all,

I am curious to know if anybody out there is monitoring their Gluster systems in some way, and if so, how you've decided to go about it.

I currently use Monit ( http://mmonit.com ) to do some basic checks:

- is the glusterfs process in memory?
- is the glusterfs process a zombie?
- can a file be read on the gluster mount?
- can a file be written on the gluster mount?

One of the features of Monit is a simple connection test, whereby Monit will connect to the port of a given service, send a string, and then test the response. For example, one could connect to the SMTP service and issue a HELO, then check the response for validity.

I would like very much to implement this feature, however I'm not sure that Gluster has any "plaintext" interactions of this sort. Is anyone aware of a way to implement this functionality? The idea here would be to test whether the port is connectable /and/ that the process behind it is responding intelligently.

I ask because a few weeks ago we had a problem where the glusterfs process was in memory, but wasn't actually accepting connections or interactions - a connection test would have found this problem immediately.

Thank you, all!

--
Daniel Maher <dma+gluster AT witbe DOT net>
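For illustration only, here is a rough Python sketch of two of the checks described above: a read/write probe on the mount, and a bare TCP connection test. It is not Monit configuration and not a Gluster API. The function names, the probe file name and the timeouts are all made up for the example, and the host and port to test are left to the caller. Note that a plain TCP connect only shows that the port accepts connections; it does not prove the process behind it is "responding intelligently".

```python
import socket
import subprocess
import sys


def port_accepts_connections(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False


# Body of the child process: write a probe file, read it back, remove it.
# The probe file name is hypothetical.
_PROBE = (
    "import os, sys\n"
    "path = os.path.join(sys.argv[1], '.monitor_probe_%d' % os.getpid())\n"
    "with open(path, 'w') as f:\n"
    "    f.write('probe')\n"
    "with open(path) as f:\n"
    "    ok = f.read() == 'probe'\n"
    "os.remove(path)\n"
    "sys.exit(0 if ok else 1)\n"
)


def mount_is_read_write(mount_point, timeout=10.0):
    """Return True if a file can be written and read back under mount_point.

    The probe runs in a child process so that a hung mount blocks the child
    rather than the monitor; the child is sent a kill signal on timeout.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", _PROBE, mount_point],
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

One caveat: if the child ends up stuck in uninterruptible I/O on a hung mount, the kill may not take effect promptly, so the timeout here is a best effort rather than a guarantee.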
I do something a bit simpler. Since my gluster filesystem has a web server on top, I have a CGI script which returns the hostname and a timestamp. I do a wget from another machine: if it times out, something's wrong; if it returns an error code, something's wrong; if it returns the hostname and a timestamp which is different from the last timestamp I got, then it's working.

The real problem with any monitoring is that there's no really good way to automate fixing the failed gluster mount. In my experience, when the gluster mount fails it does one of a couple of things:

1) The glusterfs process is still alive but only certain parts of the filesystem are inaccessible. I think they've fixed most of these problems in the recent versions of 1.4. However, when this happens, there are lots of zombied processes and often a df of the mount point hangs, but some files can be accessed while others can't. Killing the glusterfs process doesn't always release the file locks, so you can't unmount and remount the filesystem without finding and killing off all the hung processes.

2) The glusterfs process is dead, but you have some zombied things with file descriptors open on the mount point. In this case, sometimes a fuser -c will show the processes and you can kill them; other times not.

So, if there were an easy way of automating remounting the filesystem, then we'd be in good shape, but since there isn't, all we can do is get notified that it's broken and then take manual steps. A simple method of knowing it's broken is sufficient for my purposes.

At 06:37 AM 12/8/2008, Daniel Maher wrote:
> Hello all, I am curious to know if anybody out there is monitoring
> their Gluster systems in some way, and if so, how you've decided to
> go about it.
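The wget check described in the reply above could be sketched in Python roughly as follows. This is only a guess at the shape of it: the response format (hostname and timestamp separated by whitespace), the function names and the timeout are assumptions for the example, not details of the actual CGI script.

```python
import urllib.error
import urllib.request


def fetch_heartbeat(url, timeout=10.0):
    """Return the response body as text, or None on timeout or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError) for 4xx/5xx responses,
        # and a timeout surfaces as URLError or OSError.
        return None


def evaluate_heartbeat(body, expected_host, last_timestamp):
    """Return (healthy, timestamp_to_remember).

    Assumed body format: "<hostname> <timestamp>". The check passes only
    if the hostname matches and the timestamp differs from the last one.
    """
    if body is None:
        return False, last_timestamp
    parts = body.split()
    if len(parts) != 2 or parts[0] != expected_host:
        return False, last_timestamp
    timestamp = parts[1]
    return timestamp != last_timestamp, timestamp
```

A monitoring loop would call fetch_heartbeat() on the CGI URL, pass the result to evaluate_heartbeat(), store the returned timestamp for the next run, and send a notification whenever the first value is False.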