thr3ads.net - Gluster users - [Gluster-users] monitoring glusterfs ? [Dec 2008]

Daniel Maher

2008-Dec-08 14:37 UTC

[Gluster-users] monitoring glusterfs ?

Hello all,

I am curious to know if anybody out there is monitoring their Gluster 
systems in some way, and if so, how you've decided to go about it.

I currently use Monit ( http://mmonit.com ) to do some basic checks :
- is the glusterfs process in memory ?
- is the glusterfs process a zombie ?
- can a file be read on the gluster mount ?
- can a file be written on the gluster mount ?
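
For readers who want the last two checks outside Monit, here is a minimal Python sketch, not the Monit configuration described above. The function names and the ".probe-" file prefix are made up for illustration. The probe runs in a daemon thread so that a hung mount shows up as a timeout rather than blocking the checker itself.

```python
import os
import tempfile
import threading


def write_then_read(directory):
    """Write a small probe file in `directory`, read it back, then delete it."""
    payload = os.urandom(16)
    fd, path = tempfile.mkstemp(prefix=".probe-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())
        with open(path, "rb") as handle:
            return handle.read() == payload
    finally:
        os.unlink(path)


def mount_is_healthy(directory, timeout=10.0):
    """Return True only if the write/read probe succeeds within `timeout` seconds."""
    result = {}

    def worker():
        try:
            result["ok"] = write_then_read(directory)
        except OSError:
            result["ok"] = False

    # Daemon thread: a probe stuck on a hung mount must not block interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    # No result after the timeout means the probe is still blocked: treat as failure.
    return result.get("ok", False)
```

Calling mount_is_healthy() on the mount point and alerting on False covers both the "cannot read or write" and the "I/O hangs" cases; the timeout value is a guess to be tuned.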

One of the features of Monit is a simple connection test, whereby Monit 
will connect to the port of a given service, send a string, and then 
test the response.  For example, one could connect to the SMTP service 
and issue a HELO, then check the response for validity.

I would like very much to implement this feature, however I'm not sure 
that Gluster has any "plaintext" interactions of this sort.  Is anyone 
aware of a way to implement this functionality ?  The idea here would be 
to test whether the port is connectable /and/ that the process behind it 
is responding intelligently.  I ask because a few weeks ago we had a 
problem where the glusterfs process was in memory, but wasn't actually 
accepting connections or interactions - a connection test would have 
found this problem immediately.
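
As a rough illustration of the "is the port connectable" half of this idea, here is a short Python sketch. It is an assumption-laden stand-in, not a Monit rule: the function name is invented, the host and port are placeholders for whatever the glusterfs server actually listens on, and no protocol exchange is attempted because no plaintext greeting is known here.

```python
import socket


def port_accepts_connections(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) is established within `timeout`."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Note the limitation: the kernel completes the handshake for a listening socket as long as its backlog has room, so this test can succeed even when the process behind the port has stopped servicing connections. It catches "nothing is listening", not "listening but unresponsive".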

Thank you, all !


-- 
Daniel Maher <dma+gluster AT witbe DOT net>

Keith Freedman

2008-Dec-08 21:37 UTC

[Gluster-users] monitoring glusterfs ?

I do something a bit simpler.
Since my gluster filesystem has a web server on 
top, I have a CGI script which returns the hostname and a timestamp.

I do a wget from another machine. If it times 
out, then something's wrong; if it returns an 
error code, then something's wrong; if it returns 
the hostname and a timestamp which is different 
from the last timestamp I got, then it's working.
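
The check just described could be sketched in Python roughly as follows. This is not the actual script: the function names are invented, and the response format (hostname and timestamp on one line, separated by whitespace) is an assumption about what the CGI returns.

```python
import http.client
import urllib.request


def fetch_heartbeat(url, timeout=10.0):
    """Fetch the heartbeat page and return its body as stripped text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace").strip()


def heartbeat_ok(body, expected_host, last_timestamp):
    """True if `body` is '<expected_host> <timestamp>' with a timestamp that changed."""
    parts = body.split()
    if len(parts) != 2:
        return False
    host, timestamp = parts
    return host == expected_host and timestamp != last_timestamp


def check(url, expected_host, last_timestamp, timeout=10.0):
    """Return (healthy, timestamp_to_remember) for one polling round."""
    try:
        body = fetch_heartbeat(url, timeout)
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections and HTTP error codes all count as failure.
        return False, last_timestamp
    if not heartbeat_ok(body, expected_host, last_timestamp):
        return False, last_timestamp
    return True, body.split()[1]
```

The caller keeps the returned timestamp between runs, so a cached or frozen page that repeats the old value is reported as a failure.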

The real problem with any monitoring is that 
there's no really good way to automate fixing the failed gluster mount.

In my experience, when the gluster mount fails, it does a couple of things:
1) The glusterfs process is still alive but only 
certain parts of the filesystem are 
inaccessible. I think they've fixed most of these 
problems in the recent versions of 1.4. 
However, when this happens, there are lots of 
zombied processes and often a df of the mount 
point hangs, but some files can be accessed 
while others can't. Killing the glusterfs 
process doesn't always release the file locks, so 
you can't unmount and remount the filesystem 
without finding and killing off all the hung processes.
2) The glusterfs process is dead, but you have some 
zombied things with file descriptors open on the 
mount point. In this case, sometimes a fuser -c 
will show the processes and you can kill them; other times not.

So, if there were an easy way of automating 
remounting the filesystem, then we'd be in good 
shape. But since there isn't, all we can do is get 
notified that it's broken and then take manual 
steps, so a simple method of knowing it's broken is sufficient for my
purposes.

At 06:37 AM 12/8/2008, Daniel Maher wrote:
>Hello all,
>
>I am curious to know if anybody out there is monitoring their Gluster
>systems in some way, and if so, how you've decided to go about it.
>
>I currently use Monit ( http://mmonit.com ) to do some basic checks :
>- is the glusterfs process in memory ?
>- is the glusterfs process a zombie ?
>- can a file be read on the gluster mount ?
>- can a file be written on the gluster mount ?
>
>One of the features of Monit is a simple connection test, whereby Monit
>will connect to the port of a given service, send a string, and then
>test the response.  For example, one could connect to the SMTP service
>and issue a HELO, then check the response for validity.
>
>I would like very much to implement this feature, however I'm not sure
>that Gluster has any "plaintext" interactions of this sort.  Is anyone
>aware of a way to implement this functionality ?  The idea here would be
>to test whether the port is connectable /and/ that the process behind it
>is responding intelligently.  I ask because a few weeks ago we had a
>problem where the glusterfs process was in memory, but wasn't actually
>accepting connections or interactions - a connection test would have
>found this problem immediately.
>
>Thank you, all !
>
>--
>Daniel Maher <dma+gluster AT witbe DOT net>
