Hi,

I was wondering how people here monitor puppet runs on the clients. For puppet 0.25.x I enabled reporting and then wrote a nagios plugin to parse the YAML report files that each client returned after a run. Specifically I was looking for any 'failures' or 'failed_restarts'.

Unfortunately with 2.6.2 the format of those YAML files has not only changed but also varies hugely for different hosts depending on how the run went. Plus the sheer size of these files now means it takes too long for PyYAML to parse them (even for only 40-odd hosts).

In fact, I don't understand what the YAML reports are useful for - they don't appear to realistically be either human or machine readable.

Anyway, what other approaches are there? I'd like to simply see 2 things:
1) If there were any failures during the puppet run on the client
2) When the last puppet run on each client was (i.e. if it was more than 50 mins ago raise a warning)

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
R.I.Pienaar
2010-Nov-11 14:28 UTC
Re: [Puppet Users] Monitor puppet runs on clients with nagios
----- "Tim" <tedwards@eso.org> wrote:
> Hi,
>
> I was wondering how people here monitor puppet runs on the clients.
> For puppet 0.25.x I enabled reporting and then wrote a nagios plugin
> to parse the YAML report files that each client returned after a run.
> Specifically I was looking for any 'failures' or 'failed_restarts'.

For detailed monitoring right now the reports are the only option, unfortunately. Ideally http://projects.puppetlabs.com/issues/4339 would get implemented so we can do this better and on the node, but alas no joy yet.

> 1) If there were any failures during the puppet run on the client
> 2) When the last puppet run on each client was (ie. if it was more
> than 50 mins ago raise a warning)

You can check the ages of the localconfig cache or the state file. The state file gets touched on every run, so its age tells you whether puppet is running. The localconfig cache gets updated on each compile, so its age tells you whether the node is getting new catalogs, i.e. that there aren't any obvious syntax errors, your master is up, etc.
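The file-age check described above could be sketched roughly as follows. This is only an illustration, not anyone's actual plugin: the `STATE_FILE` path is an assumed default and should be replaced with whatever your own statefile (or localconfig) setting points to, the function name `check_age` is made up for this example, and the 50-minute threshold is simply the figure from the original question.

```python
import os
import time

# Assumed location of the agent's state file; adjust to your own setup.
STATE_FILE = "/var/lib/puppet/state/state.yaml"
# Threshold taken from the original question (50 minutes).
WARN_SECONDS = 50 * 60

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_age(path, warn_seconds, now=None):
    """Return (nagios_exit_code, message) based on the file's mtime."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        # A missing or unreadable file means we cannot judge freshness.
        return UNKNOWN, "UNKNOWN: cannot stat %s (%s)" % (path, exc)
    minutes = int(age // 60)
    if age > warn_seconds:
        return WARNING, "WARNING: %s last touched %d min ago" % (path, minutes)
    return OK, "OK: %s last touched %d min ago" % (path, minutes)


def main():
    """Print the status line and return the exit code for Nagios."""
    code, message = check_age(STATE_FILE, WARN_SECONDS)
    print(message)
    return code
```

To use it as a plugin, a wrapper would end with `sys.exit(main())` so that Nagios sees the exit code. Pointing the same function at the localconfig cache instead would test whether new catalogs are arriving rather than whether the agent is running.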
Ohad Levy
2010-Nov-11 14:52 UTC
Re: [Puppet Users] Monitor puppet runs on clients with nagios
On Thu, Nov 11, 2010 at 4:09 PM, Tim <tedwards@eso.org> wrote:
> Anyway what other approaches are there? I'd like to simply see 2
> things:
> 1) If there were any failures during the puppet run on the client
> 2) When the last puppet run on each client was (ie. if it was more
> than 50 mins ago raise a warning)

Some Foreman users already use its API [1] to provide that same information to nagios.

Ohad

[1] http://theforeman.org/projects/foreman/wiki/API
Doug Warner
2010-Nov-12 13:45 UTC
Re: [Puppet Users] Monitor puppet runs on clients with nagios
I use tmz's puppetstatus scripts [1] [2] and they work great for checking the last run time from Nagios. I also have reports set up with tagmail to send me anything with "err" in it.

-Doug

[1] http://markmail.org/message/m6xi34aljso4w5qq
[2] http://tmz.fedorapeople.org/scripts/puppetstatus/
In the end I just changed my script to grep for 'Failed' in the report YAML files. My script already uses the time of the most recent report YAML file to detect if it's been too long since the most recent report (e.g. if the puppetd process has died or something).

I'll wait for http://projects.puppetlabs.com/issues/4339 to be completed, I think.

Tim
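For anyone wanting to try the same approach, here is a rough sketch of it, not Tim's actual script. It assumes one directory of `*.yaml` reports per host, scans the newest file line by line for the string 'Failed' instead of parsing the YAML, and uses that file's modification time for the staleness test. The function names and the mapping onto Nagios states (stale report gives WARNING, 'Failed' gives CRITICAL) are assumptions made for illustration.

```python
import glob
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def newest_report(host_dir):
    """Return the most recently modified *.yaml file in host_dir, or None."""
    reports = glob.glob(os.path.join(host_dir, "*.yaml"))
    if not reports:
        return None
    return max(reports, key=os.path.getmtime)


def report_has_failure(path, needle=b"Failed"):
    """Plain substring scan, line by line; no YAML parsing is done."""
    with open(path, "rb") as handle:
        return any(needle in line for line in handle)


def check_host(host_dir, max_age_seconds, now=None):
    """Return (nagios_exit_code, message) for one host's report directory."""
    now = time.time() if now is None else now
    report = newest_report(host_dir)
    if report is None:
        return UNKNOWN, "UNKNOWN: no reports in %s" % host_dir
    age = now - os.path.getmtime(report)
    # Staleness is tested first: an old report says little about the present.
    if age > max_age_seconds:
        return WARNING, "WARNING: newest report is %d min old" % int(age // 60)
    if report_has_failure(report):
        return CRITICAL, "CRITICAL: 'Failed' found in %s" % os.path.basename(report)
    return OK, "OK: newest report is %d min old" % int(age // 60)
```

Because the scan reads the file as bytes and stops at the first match, it avoids the parsing cost that made PyYAML too slow, at the price of possible false positives wherever the word 'Failed' appears in an unrelated message.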
Nicolas Szalay
2010-Nov-16 07:56 UTC
Re: [Puppet Users] Monitor puppet runs on clients with nagios
On Thursday, 11 November 2010 at 06:09 -0800, Tim wrote:
> Hi,

Hello,

> Anyway what other approaches are there? I'd like to simply see 2
> things:
> 1) If there were any failures during the puppet run on the client
> 2) When the last puppet run on each client was (ie. if it was more
> than 50 mins ago raise a warning)

I check point 2 with the help of mcollective and its puppetd agent. See http://www.rottenbytes.info/?p=387 for more information.

Regards,

Nico.
Brian Gallew
2010-Nov-18 05:27 UTC
Re: [Puppet Users] Monitor puppet runs on clients with nagios
I've been thinking about this myself, and I've come up with a few possibilities.

1) Leverage the reports on the puppet master. This could be done with a daemon that watched /var/lib/puppet/reports, for instance.
2) Leverage the reports on the puppet clients. Each puppet run could ship the report of the previous puppet run off to nagios via a custom function. It would run behind, though, which is an issue.
3) Leverage Dashboard/Foreman. Both of those have APIs that can be queried to determine host status and get the errors from the report.
4) Leverage puppet's report subsystem: create another report (e.g. "nagios") and have it send Nagios the correct information.

Of all the choices here, I like 4 the best, and it's what I'm planning on implementing when I've got a stock of round tuits. Basically, I'll get the report status and use send_nsca to send the results to Nagios. Alternatively, if the rest of the team insists that Nagios should do active polling, then I'll write a check that will query either Foreman or ask the DB directly (whichever is easier).
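The send_nsca hand-off in option 4 might look something like the sketch below. A real Puppet report processor would be written in Ruby; this Python version only illustrates the passive-check step and is not Brian's implementation. It relies on send_nsca's default input format (host, service description, return code and plugin output, separated by tabs, one result per line). The service name, the config file path, and the idea of deriving the status from failure counts are all assumptions for illustration.

```python
import subprocess

# Standard Nagios return codes used in passive check results.
OK, CRITICAL = 0, 2


def status_from_counts(failures, failed_restarts):
    """Map hypothetical report counters onto a Nagios status and message."""
    if failures or failed_restarts:
        return CRITICAL, "%d failures, %d failed restarts" % (
            failures, failed_restarts)
    return OK, "puppet run completed without failures"


def format_passive_result(host, service, code, message):
    """Build one send_nsca input line (tab-delimited, newline-terminated)."""
    # Tabs and newlines in the message would corrupt the record.
    text = message.replace("\t", " ").replace("\n", " ")
    return "%s\t%s\t%d\t%s\n" % (host, service, code, text)


def send_result(line, nagios_host, send_nsca="send_nsca",
                config="/etc/nagios/send_nsca.cfg"):
    """Pipe one result line to send_nsca; the config path is an assumption."""
    proc = subprocess.Popen([send_nsca, "-H", nagios_host, "-c", config],
                            stdin=subprocess.PIPE)
    proc.communicate(line.encode("utf-8"))
    return proc.returncode
```

For example, `format_passive_result("web01", "puppet", *status_from_counts(2, 0))` builds the line that would be passed to `send_result`. The matching Nagios service has to be defined as a passive check under the same host and service names for the result to be accepted.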
James Turnbull
2010-Nov-18 07:02 UTC
Re: [Puppet Users] Monitor puppet runs on clients with nagios
Brian Gallew wrote:
> I've been thinking about this myself, and I've come up with a few
> possibilities.

Brian,

You might want to also have a look at:

http://projects.puppetlabs.com/issues/4339

James

--
Puppet Labs - http://www.puppetlabs.com