Nick Moffitt
2010-Dec-20 18:57 UTC
[Puppet Users] "# Only restart if we're actually running"
I'd like to know the best way to fix the refresh/restart behavior of Service resources without using ensure => running.

I know that this is an unpopular requirement, but I do not want puppet to restart dying services before my monitoring system notices. If a service is fragile, I want to be woken up at 3am. In the worst case, ensure => running could restart my service every ten minutes, nagios could check it a few seconds after, and it could die again a few seconds past that. With the right harmonics a service could be effectively 99% downtime and ensure => running would prevent me from finding out.

I looked into writing a provider to fix this, but it appears that the provider.restart doesn't even get *called* by the core service type unless we're ensure => running or status comes back as running. Now I *do* want the system to enforce the running state at the moment a configuration change has sent a refresh to the service, but not otherwise!

So how can I best do this? Ideally I'd like for the ensure => running behavior to obey something like the Exec resource's "refreshonly" parameter. It seems like this is up at the type level, but is there a simple way to monkey-patch this for now?

--
Hey, how come nobody here in the future
has a time machine except me?

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Mark Stanislav
2010-Dec-20 19:00 UTC
Re: [Puppet Users] "# Only restart if we're actually running"
Nick,

I would recommend using Nagios event handlers for this if you want Nagios to essentially take the reins of this problem. That way you will get your alerts and Nagios can react by starting the service again after x number of failures.

I understand you may want to do this through Puppet for architecture reasons, but that's just the way I'd do it given what you've described. Good luck!

-Mark
Nick Moffitt
2010-Dec-20 19:24 UTC
Re: [Puppet Users] "# Only restart if we're actually running"
Mark Stanislav:
> I would recommend using Nagios event handlers for this if you want
> Nagios to essentially take the reins of this problem. That way you
> will get your alerts and Nagios can react by starting the service
> again after x number of failures.

Actually, this is kind of the opposite of what I want. I want a human to have to restart the service, because otherwise it doesn't present enough pain for the problem to be fixed more permanently. I have situations where I semi-regularly restart a bloating service, but that's about as heinous as I'll get.

Once you get used to automated systems propping up your daemons, the decay spreads until you encounter a serious intractable downtime event. I need the relevant people to feel panic when this happens.

--
01234567 <- The amazing* Indent-O-Meter!
^
*: Indent-O-Meter may not actually amaze.
Mark Stanislav
2010-Dec-20 19:40 UTC
Re: [Puppet Users] "# Only restart if we're actually running"
On Dec 20, 2010, at 2:24 PM, Nick Moffitt wrote:
> Actually, this is kind of the opposite of what I want. I want a human
> to have to restart the service, because otherwise it doesn't present
> enough pain for the problem to be fixed more permanently. I have
> situations where I semi-regularly restart a bloating service, but that's
> about as heinous as I'll get.
>
> Once you get used to automated systems propping up your daemons, the
> decay spreads until you encounter a serious intractable downtime event.
> I need the relevant people to feel panic when this happens.

Fault tolerant infrastructure should be the point. Nagios will still blow up their e-mail, pager, phone, and IMs until a threshold is hit, and when the service is restarted by the event handler, they will get another e-mail.

Why not just take a downtime report (soft + hard states) and, if it breaches a given threshold, treat that as the sign that a fix needs to be implemented? That, or reduce the number of failures needed to reach a hard state, so that it's very apparent that a PROBLEM beyond a dead service once a year is happening.

It appears that you are trying to solve a training problem rather than an infrastructure automation problem, which is probably why Puppet & Nagios aren't an 'easy' fix for it. But I digress; perhaps someone will have a Puppet answer for you nonetheless.

Good luck, Nick!

-Mark
Nick Moffitt
2010-Dec-20 21:33 UTC
Re: [Puppet Users] "# Only restart if we're actually running"
Mark Stanislav:
> Fault tolerant infrastructure should be the point.

Absolutely, but the granularity of nagios and puppet (Every half hour? Every ten minutes? Every five?) is simply too coarse to qualify as fault-tolerance. Propping a broken service back on its feet at this frequency is worse than nothing, in my opinion.

We absolutely design properly highly-available services, but patching over serious crashes at even a one minute resolution would give us false confidence in our architecture.

--
"No, I ain't got a fax machine! I also ain't got an
Apple IIc, polio, or a falcon!"
    -- Ray, Achewood 2006-11-22
Nigel Kersten
2010-Dec-21 01:11 UTC
Re: [Puppet Users] "# Only restart if we're actually running"
On Mon, Dec 20, 2010 at 1:33 PM, Nick Moffitt <nick@zork.net> wrote:
> Absolutely, but the granularity of nagios and puppet (Every half hour?
> Every ten minutes? Every five?) is simply too coarse to qualify as
> fault-tolerance. Propping a broken service back on its feet at this
> frequency is worse than nothing, in my opinion.
>
> We absolutely design properly highly-available services, but patching
> over serious crashes at even a one minute resolution would give us false
> confidence in our architecture.

Can you use the "basic" service provider with fully-specified start/stop/restart commands to achieve what you need?
Nick Moffitt
2010-Dec-21 08:14 UTC
Re: [Puppet Users] "# Only restart if we're actually running"
Nigel Kersten:
> Can you use the "basic" service provider with fully-specified
> start/stop/restart commands to achieve what you need?

Are you suggesting that I override the start command to a noop, and make sure the restart command works in that scenario? Thinking that over, it has potential. I suppose it would refrain from starting a crashed service, but it would pass the test in type/service.rb that's been causing me grief:

    # Basically just a synonym for restarting. Used to respond
    # to events.
    def refresh
      # Only restart if we're actually running
      if (@parameters[:ensure] || newattr(:ensure)).retrieve == :running
        provider.restart
      else
        debug "Skipping restart; service is not running"
      end
    end

I think I'll continue to use the fully-specified provider to gain the enablement features among other things.

Thank you, that seems eminently sensible!

--
You are not entitled to your opinions.
Bill Proud
2010-Dec-21 10:03 UTC
[Puppet Users] Re: "# Only restart if we're actually running"
On Dec 20, 7:57 pm, Nick Moffitt <n...@zork.net> wrote:
> So how can I best do this? Ideally I'd like for the ensure => running
> behavior to obey something like the Exec resource's "refreshonly"
> parameter. It seems like this is up at the type level, but is there a
> simple way to monkey-patch this for now?

Interesting requirement.

I would have thought that the simplest solution would be to not use a service at all but instead notify an exec from the file resource for the configuration. The exec could run a simple script that checks if the application is running and restarts it if it is.
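The exec-based approach described in this message could be sketched in Puppet roughly as follows. This is a minimal, untested sketch rather than anything posted to the list: haproxy is borrowed from elsewhere in the thread as a stand-in for "the application", the exec title and the module source path are hypothetical, and it assumes the init script's status action exits non-zero when the daemon is down.

```puppet
# Hypothetical example: haproxy stands in for "the application".
file { '/etc/haproxy/haproxy.cfg':
  # Assumed module layout; point this at wherever the config really lives.
  source => 'puppet:///modules/haproxy/haproxy.cfg',
  notify => Exec['restart-haproxy-if-running'],
}

exec { 'restart-haproxy-if-running':
  command     => '/etc/init.d/haproxy restart',
  # Skip the restart unless the status check exits 0 (assumed to mean "running").
  onlyif      => '/etc/init.d/haproxy status',
  # Do nothing on ordinary runs; act only when the file above changes.
  refreshonly => true,
}
```

With refreshonly set, a routine run never touches the daemon, and with onlyif a config change does not revive a dead one either. Whether that second property is wanted depends on the requirement being solved.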
Nick Moffitt
2010-Dec-21 12:02 UTC
Re: [Puppet Users] Re: "# Only restart if we're actually running"
Bill Proud:
> I would have thought that the simplest solution would be to not use a
> service at all but instead notify an exec from the file resource for
> the configuration. The exec could run a simple script that checks if
> the application is running and restarts it if it is.

The trouble with execs is that they're so open-ended that a "puppet agent -t --noop" can't predict what will happen after one occurs. With a service it can at least assume that the refresh happened successfully and any notifications can trickle onward from there. With an exec it just says "um, except now we're noop. You're on your own from here, mate." It's like that old usenet rant about "given what you just did, it's perfectly within spec for the compiler to make demons fly out your nose!" Anything at all could happen!

I feel like execs are something of a misfeature, but I'd be hard pressed to figure out how to live completely without them. I'm glad I have them available to do things like generate quick unique ssl snakeoil certificates or ssh keys, but I would hate to rely on them for something as central as the service resource.

--
Though the great song return no more
There's keen delight in what we have:
The rattle of pebbles on the shore
Under the receding wave.
    -- W. B. Yeats
Nick Moffitt
2010-Dec-21 16:41 UTC
Re: [Puppet Users] "# Only restart if we're actually running"
Nick Moffitt:
> Are you suggesting that I override the start command to a noop, and make
> sure the restart command works in that scenario? Thinking that over, it
> has potential. I suppose it would refrain from starting a crashed
> service, but it would pass the test in type/provider.rb that's been
> causing me grief:

Unfortunately that doesn't seem to work:

    notice: /Stage[main]/Haproxy::Config/File[/etc/default/haproxy]/content: content changed '{md5}a1f2deb7c7a10e55dc7c971a2288f5d4' to '{md5}9854e65621b62147b91ebc2c02cce1c2'
    notice: /Stage[main]/Haproxy::Config/File[/etc/default/haproxy]/mode: mode changed '644' to '444'
    info: /Stage[main]/Haproxy::Config/File[/etc/default/haproxy]: Scheduling refresh of Service[haproxy]
    info: /Stage[main]/Haproxy::Config/File[/etc/default/haproxy]: Scheduling refresh of Service[haproxy]
    debug: Service[haproxy](provider=debian): Executing '/etc/init.d/haproxy status'
    debug: Service[haproxy](provider=debian): Executing '/bin/true'
    notice: /Stage[main]/Haproxy::Service/Service[haproxy]/ensure: ensure changed 'stopped' to 'running'
    debug: Service[haproxy](provider=debian): Executing '/etc/init.d/haproxy status'
    debug: /Stage[main]/Haproxy::Service/Service[haproxy]: Skipping restart; service is not running
    notice: /Stage[main]/Haproxy::Service/Service[haproxy]: Triggered 'refresh' from 4 events

So it really is checking the currently-running status of the service, and not whether you have set ensure => running. And this happens up in the type code, well outside the provider's bailiwick.

I'm rather disappointed. Nearly every init script I use will start a downed service if you run /etc/init.d/foo restart (often by running an unneeded stop, followed by a start), but it seems that Puppet has engineered the service type not to follow this behavior.

Of course I could override the status command to /bin/true as well, but that saddens me greatly:

    info: /Stage[main]/Haproxy::Config/File[/etc/haproxy/haproxy.cfg]: Scheduling refresh of Service[haproxy]
    info: /Stage[main]/Haproxy::Config/File[/etc/haproxy/haproxy.cfg]: Scheduling refresh of Service[haproxy]
    debug: Service[haproxy](provider=debian): Executing '/bin/true'
    debug: Service[haproxy](provider=debian): Executing '/bin/true'
    debug: Service[haproxy](provider=debian): Executing '/etc/init.d/haproxy restart'
    notice: /Stage[main]/Haproxy::Service/Service[haproxy]: Triggered 'refresh' from 4 events

I suppose if I'm only interested in unconditional restarts and enablement, neutering the status command is probably not that disastrous. It just seems a bridge too far to me, somehow. Still, I guess that's the path I'm taking now.

--
BitKeeper, how quaint.
    -- Alan Cox
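Going by the debug output in this message, the workaround amounts to something like the resource below. This is a reconstruction under assumptions, not the manifest actually in use (which was never posted): the parameter set, the use of enable, and the subscribe target are guesses chosen to be consistent with the commands shown in the log.

```puppet
service { 'haproxy':
  enable    => true,
  # Always exits 0, so the service is always reported as running
  # and the refresh check in the service type passes.
  status    => '/bin/true',
  # A no-op start: Puppet cannot revive a crashed daemon by itself.
  start     => '/bin/true',
  # Only invoked on a refresh; the init script's restart is assumed
  # to bring the daemon up whether or not it was running.
  restart   => '/etc/init.d/haproxy restart',
  subscribe => File['/etc/haproxy/haproxy.cfg'],
}
```

The cost of this design is that Puppet can no longer tell whether the daemon is up, so detecting a dead service is left entirely to monitoring.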
Nigel Kersten
2010-Dec-21 16:47 UTC
Re: [Puppet Users] "# Only restart if we're actually running"
On Tue, Dec 21, 2010 at 12:14 AM, Nick Moffitt <nick@zork.net> wrote:
> Are you suggesting that I override the start command to a noop, and make
> sure the restart command works in that scenario? Thinking that over, it
> has potential.

I guess I'm suggesting that you come up with a set of commands for start/stop/restart that gives you the behavior you want.

I'd try and see if you can achieve this with provider => base, and if you can, you can then either:

* write a defined DSL type that abstracts away the complexity in the commands you're specifying.
* write a native Ruby provider that does the same thing, delivered with pluginsync.

The first is quicker to scaffold, but means you won't be using "service" anymore for each resource instance; the second will be a matter of specifying a specific provider with each "service" resource instance, or as a resource default.
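The defined-type option might look like the sketch below. The type name `refresh_only_service` and its `initscript` parameter are made up for illustration, and the /bin/true overrides are just one assumed choice of commands, not something prescribed in this message.

```puppet
# Hypothetical defined type; name and parameter are illustrative only.
define refresh_only_service ($initscript) {
  service { $name:
    enable  => true,
    # Neutered status and start: the service always looks "running",
    # and Puppet never starts it outside of a refresh.
    status  => '/bin/true',
    start   => '/bin/true',
    restart => "${initscript} restart",
  }
}

# Each instance is declared as the wrapper, not as "service".
refresh_only_service { 'haproxy':
  initscript => '/etc/init.d/haproxy',
  subscribe  => File['/etc/haproxy/haproxy.cfg'],
}
```

A refresh sent to the wrapper should propagate to the service resource it contains, which is what makes the subscribe on the instance useful.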
Craig Miskell
2010-Dec-21 19:25 UTC
Re: [Puppet Users] Re: "# Only restart if we're actually running"
Nick Moffitt wrote:
> I feel like execs are something of a misfeature, but I'd be hard pressed
> to figure out how to live completely without them. I'm glad I have them
> available to do things like generate quick unique ssl snakeoil
> certificates or ssh keys, but I would hate to rely on them for something
> as central as the service resource.

In that case, how about modifying the /etc/init.d/<app> script (assuming Linux and a standard sort of init.d script here) so that the restart and/or status check behaves differently in the two cases?

In the case of a crashed/dead app, the PID file will exist, but there won't be a process on that PID (or rather, there won't be a correctly named process with that PID). That's a dead app, so don't restart even if Puppet (or anyone else) asks you to. If there *is* an app at that PID, then the app is running and needs to be restarted/reloaded as requested.

--
Craig Miskell
Senior Systems Administrator
Opus International Consultants
OpenVMS: The OS with uptimes longer than MS Windows support policies
    -- Browser window title of the www.openvms.org website
Alan Barrett
2010-Dec-22 11:59 UTC
Re: [Puppet Users] "# Only restart if we're actually running"
On Mon, 20 Dec 2010, Nick Moffitt wrote:
> With the right harmonics a service could be effectively 99%
> downtime and ensure => running would prevent me from finding out.

The puppet logs would report that the service was being started over and over. I don't use Puppet Dashboard, but perhaps it can do enough log analysis to help?

--apb (Alan Barrett)
Thomas Bellman
2011-Jan-05 14:15 UTC
Re: [Puppet Users] "# Only restart if we're actually running"
On 2010-12-21 09:14, Nick Moffitt wrote:
> Are you suggesting that I override the start command to a noop, and make
> sure the restart command works in that scenario? Thinking that over, it
> has potential. I suppose it would refrain from starting a crashed
> service, [...]

Some init scripts support a 'condrestart' action, that will only restart the service if it is already running. For example, on a Fedora machine:

    # service sshd status
    openssh-daemon (pid 21829) is running...
    # service sshd condrestart
    Stopping sshd: [ OK ]
    Starting sshd: [ OK ]
    # service sshd status
    openssh-daemon (pid 21900) is running...
    # kill -9 21900
    # service sshd status
    openssh-daemon dead but pid file exists
    # service sshd condrestart
    # service sshd status
    openssh-daemon dead but pid file exists

You should be able to write something like this (using the sshd service as an example):

    service {
        "sshd":
            enable => true, ensure => undef,
            hasstatus => true,
            restart => "/sbin/service sshd condrestart",
            subscribe => File["/etc/ssh/sshd_config"];
    }

I *think* this works the way you want. You would have to make sure that the init script has a condrestart action that works properly. If not, you would have to write your own command in the restart parameter that works the way you want.

/Bellman
Nick Moffitt
2011-Jan-05 14:57 UTC
Re: [Puppet Users] "# Only restart if we're actually running"
Thomas Bellman:
> Some init scripts support a 'condrestart' action, that will only
> restart the service if it is already running.

That is one behavior I was trying to avoid. I wanted a service to be started or restarted iff it had been notified. Neutering the start and stop commands did this handily, it seems.

--
"If, as they say, God spanked the town for being over frisky,
why did He burn the churches down and save Hotaling's whisky?"
    -- 1906 SF Earthquake rhyme