Pranith Kumar Karampuri
2018-Jul-23 14:03 UTC
[Gluster-users] Subject: Help needed in improving monitoring in Gluster
Hi,

We want gluster's monitoring/observability to be as easy as possible going forward. As part of reaching this goal we are starting this initiative to improve existing APIs/commands and add new APIs/commands to gluster, so that the admin can integrate it with whichever monitoring tool he/she likes. The gluster-prometheus project hosted at https://github.com/gluster/gluster-prometheus is the direction in which we feel metrics can be collected and distributed from a Gluster cluster, enabling analysis and visualization.

As a first step we want to hear from you what you feel needs to be addressed. Here are some questions we came up with:

1) How do you monitor if the volumes/gluster management plane are behaving as expected?
2) How do you monitor performance of the volumes/gluster management plane?
3) What are the problems at the moment that take very long before you find that gluster is not behaving as expected?
4) Are there any gaps that need to be addressed which will add missing information in the existing commands?
5) What are the aspects of gluster that you wish to monitor but are not easily able to?
6) What existing monitoring commands in gluster do you wish to use at regular intervals but you don't because they are too slow/error-prone?

We will be converting the responses we receive until the 30th of this month into GitHub issues and will come up with a roadmap for the first release of this project.

We appreciate your insights and feedback.

Thanks in advance,
On behalf of the team (GitHub handles): Pranith (@pranithk), Venkata (@vredara), Sridhar (@sseshasa).
Maarten van Baarsel
2018-Jul-23 14:54 UTC
[Gluster-users] Subject: Help needed in improving monitoring in Gluster
On 23/07/18 16:03, Pranith Kumar Karampuri wrote:

> We want gluster's monitoring/observability to be as easy as possible
> going forward. As part of reaching this goal we are starting this
> initiative to add improvements to existing apis/commands and create new
> apis/commands to gluster so that the admin can integrate it with
> whichever monitoring tool he/she likes. The gluster-prometheus project
> hosted at https://github.com/gluster/gluster-prometheus is the direction
> in which we feel metrics can be collected and distributed from a Gluster
> cluster enabling analysis and visualization.
>
> As a first step we want to hear from you what you feel needs to be
> addressed.

Regarding monitoring: I would love to see in my monitoring that geo-replication is working as intended. At the moment I'm faking georep monitoring by having a process touch a file on every volume (every server involved in gluster touches a different file) and checking the mtime on the slave.

However, I discovered that this is not foolproof: if the georep run stops for whatever reason, the mtime of the monitored file still keeps getting updated, probably because the file is touched too often, even though the georep is not complete. I've also seen that a crashed glusterd escapes this monitoring.

What would also be fun is some kind of monitoring where you can find out why gluster is running at X MB/sec where Y MB/sec is expected (a bit of a large target, that).

I once tried monitoring the output of 'gluster volume status all', but that only works if everything is OK; with some network problems you can wait for hours for output, which then causes more problems.

Also, I've checked the example output at https://github.com/gluster/gluster-prometheus: would JSON or something like that be friendlier to parse than the "[parameter] { [details] } [number]" format?

thanks,
Maarten.
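
On the parsing question: the "[parameter] { [details] } [number]" layout looks like the Prometheus text exposition format, so a thin conversion step may be enough for tools that prefer JSON. Below is a minimal Python sketch of such a conversion, not anything shipped by gluster-prometheus. The metric name `gluster_brick_up` and its labels are hypothetical, invented only for illustration, and the regular expressions cover just the common `name{label="value"} number` shape (escape sequences inside label values are left as-is).

```python
import json
import re

# One sample line: name, optional {labels}, value, optional integer timestamp.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# One label pair inside the braces: key="value" (backslash escapes tolerated).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Turn exposition-format text into a list of plain dicts."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # Not a sample line this sketch understands.
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append({"name": name, "labels": labels, "value": float(value)})
    return samples


# Hypothetical input: the metric name and labels below are made up.
EXAMPLE = """
# HELP gluster_brick_up hypothetical metric, for illustration only
# TYPE gluster_brick_up gauge
gluster_brick_up{volume="vol0",host="server1"} 1
gluster_brick_up{volume="vol0",host="server2"} 0
"""

if __name__ == "__main__":
    print(json.dumps(parse_metrics(EXAMPLE), indent=2))
```

Each sample becomes a dict with `name`, `labels` and `value` keys, which `json.dumps` can serialize directly for ordinary numeric values. A real deployment would more likely rely on an existing Prometheus client library's parser than on hand-written regular expressions.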