Hello List,

We are using Puppet to manage a growing number of Debian Etch based servers (currently 70). Since upgrading to 0.22.4 we have run into a problem where services are not restarted at Puppet's request.

One example is the Nagios remote plugin executor daemon (nrpe). It runs daemonized and its configuration is located in /etc/nagios/nrpe.cfg. This file is managed through Puppet (fileserver). The init script is located in /etc/init.d/nagios-nrpe-server.

Here is the relevant snippet of the manifest:

    file { "/etc/nagios/nrpe.cfg":
        notify  => service[nagios-nrpe-server],
        require => Package["nagios-nrpe-server"],
    }

    service { nagios-nrpe-server:
        hasrestart => true
    }

On any change to nrpe.cfg the file is synced to the Puppet client, and puppetd writes the following messages when running in debug mode:

    info: Filebucket[/var/lib/puppet/clientbucket]: Adding /etc/nagios/nrpe.cfg(d74c3c576cbc9c400407f679af44ed0e)
    info: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]: Filebucketed to puppet with sum d74c3c576cbc9c400407f679af44ed0e
    debug: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]/checksum: Replacing /etc/nagios/nrpe.cfg checksum {md5}d74c3c576cbc9c400407f679af44ed0e with {md5}efe2fd5379d213d40f4ddf76ec1d2fcd
    notice: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]/source: replacing from source puppet://puppet.mm-karton.com:8141/files//etc/nagios/nrpe.cfg with contents {md5}efe2fd5379d213d40f4ddf76ec1d2fcd
    info: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]: Scheduling refresh of Service[nagios-nrpe-server]
    notice: //base/nagios/Service[nagios-nrpe-server]: Triggering 'refresh' from 1 dependencies
    debug: Calling fileserver.describe
    debug: Calling fileserver.describe
    debug: Calling fileserver.describe
    debug: Calling fileserver.describe

As you can see, puppetd says the service needs to be refreshed, but the refresh never happens.

If I remember correctly, in earlier versions I was able to see which command Puppet tries to invoke to refresh a service.

It is also reproducible with the squid configuration file, which is generated via template(). puppetd says the service needs to be refreshed, but it never is.

So far I have tried:

* specifying hasrestart => true
* specifying the restart command (/etc/init.d/nagios-nrpe-server restart)
* doing it the other way round, with a subscribe to the nrpe.cfg file

Does anyone have a clue why the services do not get restarted?

Thanks,
Andreas
Hi Andreas

On May 25, 2007, at 12:40, Andreas Unterkircher wrote:
> service { nagios-nrpe-server:
>     hasrestart => true
> }

This is just a hunch: I think Puppet uses the name to look for the service in the process list. So maybe you should try to provide a pattern parameter:

    pattern => "/usr/sbin/nrpe-server",

(I don't know if it's the right pattern...!)

--
Kind regards
Juri Rischel Jensen
Fab:IT ApS
Vesterbrogade 50
DK-1620 København
Tel: 70 202 407 / Fax: 33 313 640
www.fab-it.dk / juri@fab-it.dk
On May 25, 2007, at 5:40 AM, Andreas Unterkircher wrote:
> notice: //base/nagios/Service[nagios-nrpe-server]: Triggering 'refresh' from 1 dependencies

This is where Puppet is trying to restart the service. So, it's trying to do so, it's just not succeeding. Are you sure that '/etc/init.d/nagios-nrpe-server restart' works? Because that's what Puppet is doing here.

> It's also reproducable with the squid configuration file which is
> generated via template(). puppetd says the service needs to refreshed,
> but it never gets so.
>
> I have tried now:
>
> * specifying hasrestart => true
> * specifying restart command (/etc/init.d/nagios-nrpe-server restart)
> * tried the other way with subscribe to the nrpe.cfg file
>
> Anyone has a clue why the services do not get restarted?

My guess is that the 'restart' command is not working in the init script.

--
The leader of Jamestown was "John Smith" (not his real name), under whose direction the colony engaged in a number of activities, primarily related to starving.
    -- Dave Barry, "Dave Barry Slept Here"
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
Hi Juri,

> This is just a hunch: I think puppet uses the name to look for the
> service in the process list. So maybe you should try to provide a
> pattern parameter:

Thanks for the hint. I tried specifying the pattern (/usr/sbin/nrpe, or a regexp like the one below) but it didn't help.

    service { "nagios-nrpe-server":
        hasrestart => true,
        pattern    => ".*/usr/sbin/nrpe.*",
    }

    debug: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]/checksum: Replacing /etc/nagios/nrpe.cfg checksum {md5}d74c3c576cbc9c400407f679af44ed0e with {md5}efe2fd5379d213d40f4ddf76ec1d2fcd
    notice: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]/source: replacing from source puppet://puppet.mm-karton.com:8141/files//etc/nagios/nrpe.cfg with contents {md5}efe2fd5379d213d40f4ddf76ec1d2fcd
    info: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]: Scheduling refresh of Service[nagios-nrpe-server]
    notice: //base/nagios/Service[nagios-nrpe-server]: Triggering 'refresh' from 1 dependencies
    debug: Calling fileserver.describe
    debug: Calling fileserver.describe
    debug: Calling fileserver.describe
    ...

For squid, the init script name and the process name are the same.

Cheers,
Andreas
Quoting Luke Kanies <luke@madstop.com>:
> My guess is that the 'restart' command is not working in the init
> script.

It does, it works like a charm:

    someserver:~# /etc/init.d/nagios-nrpe-server restart
    Stopping nagios-nrpe: nagios-nrpe.
    Starting nagios-nrpe: nagios-nrpe.
    someserver:~# echo $?

What I am missing, though: I clearly remember that previous versions of Puppet printed the command they were about to invoke. Or am I confused?

Cheers,
Andreas
On May 25, 2007, at 10:12 AM, Andreas Unterkircher wrote:
> Quoting Luke Kanies <luke@madstop.com>:
>> My guess is that the 'restart' command is not working in the init
>> script.
>
> Sure, it work like a charm:
>
> someserver:~# /etc/init.d/nagios-nrpe-server restart
> Stopping nagios-nrpe: nagios-nrpe.
> Starting nagios-nrpe: nagios-nrpe.
> someserver:~# echo $?
>
> But what I miss - I strongly remember that in previous
> versions puppet wrote out the command it tries to invoke.
> Or am I confused?

You are correct; it should produce a debug message telling you the command being run. What version are you using?

--
To have a right to do a thing is not at all the same as to be right in doing it.
    -- G. K. Chesterton
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
On May 25, 2007, at 10:07 AM, Andreas Unterkircher wrote:
>
> Thanks for that hint. I tried to specify the pattern (/usr/sbin/nrpe
> or a regexp like below) but it didn't helped.
>
> service { "nagios-nrpe-server":
>     hasrestart => true,
>     pattern => ".*/usr/sbin/nrpe.*",
> }

Any Ruby regex should work.

> debug: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]/checksum: Replacing /etc/nagios/nrpe.cfg checksum {md5}d74c3c576cbc9c400407f679af44ed0e with {md5}efe2fd5379d213d40f4ddf76ec1d2fcd
> notice: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]/source: replacing from source puppet://puppet.mm-karton.com:8141/files//etc/nagios/nrpe.cfg with contents {md5}efe2fd5379d213d40f4ddf76ec1d2fcd
> info: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]: Scheduling refresh of Service[nagios-nrpe-server]
> notice: //base/nagios/Service[nagios-nrpe-server]: Triggering 'refresh' from 1 dependencies

Ok, so we've established that for some reason your services are getting triggered but that trigger is not actually restarting the services.

What platform are you on and what Puppet version are you using?

--
The salesman asked me what size I wore, I told him extra-medium.
    -- Stephen Wright
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
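To illustrate Juri's hunch and Luke's remark that any Ruby regex should work as a pattern, here is a minimal Ruby sketch of looking up a PID by matching a pattern against `ps -ef` output. It is not Puppet's code: the helper name `pid_for` and the sample process listing are made up for this example, and the assumption that the lookup works roughly like this rests only on the debug output in this thread ("Executing 'ps -ef'", "PID is ...").

```ruby
# Hypothetical helper (not Puppet's implementation): return the PID of the
# first process whose `ps -ef` line matches the given pattern, or nil.
def pid_for(pattern, ps_output)
  regex = Regexp.new(pattern)
  ps_output.each_line do |line|
    next unless regex.match(line)
    # In `ps -ef` output the second whitespace-separated column is the PID.
    return line.split[1].to_i
  end
  nil
end

# Made-up sample listing, for illustration only.
SAMPLE_PS = <<-PS
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 09:00 ?        00:00:01 init [2]
nagios    4242     1  0 09:01 ?        00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
PS

puts pid_for(".*/usr/sbin/nrpe.*", SAMPLE_PS).inspect
puts pid_for("nagios-nrpe-server", SAMPLE_PS).inspect
```

In this sample the init script name (nagios-nrpe-server) does not appear in the process list, which is why a separate pattern can be needed when the script name and the daemon name differ.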
Quoting Luke Kanies <luke@madstop.com>:
> What platform are you on and what Puppet version are you using?

The platform is Debian Etch (stable). Both the Puppet clients and the puppetmaster server are running version 0.22.4. Ruby is version 1.8.5.

Cheers,
Andreas
On May 25, 2007, at 10:33 AM, Andreas Unterkircher wrote:
> Quoting Luke Kanies <luke@madstop.com>:
>> What platform are you on and what Puppet version are you using?
>
> Plattform is Debian etch (stable). puppet as well as the puppetmaster
> server are using version 0.22.4. ruby is used in version 1.8.5

This is what I get from my Debian box:

    debug: Service[apache2](provider=debian): Executing 'ps -ef'
    debug: Service[apache2](provider=debian): PID is 32632
    notice: //basenode/culain/webserver/Service[apache2]: Triggering 'refresh' from 1 dependencies
    debug: Service[apache2](provider=debian): Executing 'ps -ef'
    debug: Service[apache2](provider=debian): PID is 32632
    debug: Service[apache2](provider=debian): Executing '/etc/init.d/apache2 stop'
    debug: Service[apache2](provider=debian): Executing '/etc/init.d/apache2 start'

So, you should be getting those logs.

Can you try explicitly specifying the debian provider? I can't imagine that would be the problem, but then, I'm pretty confused on what could be the problem at all.

--
Dawkins's Law of Adversarial Debate: When two incompatible beliefs are advocated with equal intensity, the truth does not lie half way between them.
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
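The log above shows a stop followed by a start, while earlier in the thread Luke says that with hasrestart set Puppet runs the init script's restart command. A rough Ruby sketch of that choice might look as follows. The function name `restart_commands` is hypothetical, and the exact rule is an assumption pieced together from those two statements, not taken from Puppet's source.

```ruby
# Hypothetical sketch (not Puppet's implementation): which init-script
# commands would be run to restart a service.
def restart_commands(name, hasrestart)
  script = "/etc/init.d/#{name}"
  if hasrestart
    # The init script is trusted to provide a working 'restart' action.
    ["#{script} restart"]
  else
    # Otherwise fall back to an explicit stop followed by a start.
    ["#{script} stop", "#{script} start"]
  end
end

puts restart_commands("apache2", false).inspect
puts restart_commands("nagios-nrpe-server", true).inspect
```

Either way, a debug line per executed command would be expected, which is exactly what is missing from Andreas's output.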
Quoting Luke Kanies <luke@madstop.com>:
> Can you try explicitly specifying the debian provider?

Ok, I specified it like this:

    service { "nagios-nrpe-server":
        provider   => debian,
        hasrestart => true,
    }

But sadly the result is the same again:

    debug: Calling fileserver.describe
    debug: Calling fileserver.describe
    debug: Calling fileserver.describe
    debug: Calling fileserver.describe
    debug: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]: Changing source
    debug: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]: 1 change(s)
    debug: Calling fileserver.retrieve
    info: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]: Filebucketed to puppet with sum d74c3c576cbc9c400407f679af44ed0e
    debug: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]/checksum: Replacing /etc/nagios/nrpe.cfg checksum {md5}d74c3c576cbc9c400407f679af44ed0e with {md5}efe2fd5379d213d40f4ddf76ec1d2fcd
    notice: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]/source: replacing from source puppet://puppet.mm-karton.com:8141/files//etc/nagios/nrpe.cfg with contents {md5}efe2fd5379d213d40f4ddf76ec1d2fcd
    info: //base/nagios/remotefile[/etc/nagios/nrpe.cfg]/File[/etc/nagios/nrpe.cfg]: Scheduling refresh of Service[nagios-nrpe-server]
    notice: //base/nagios/Service[nagios-nrpe-server]: Triggering 'refresh' from 1 dependencies
    debug: Calling fileserver.describe
    debug: Calling fileserver.describe
    debug: Calling fileserver.describe
    debug: Calling fileserver.describe

Do you have any hint as to where I could place some debug messages in the Ruby source code? Is this functionality in component.rb?

Cheers,
Andreas
On May 25, 2007, at 11:30 AM, Andreas Unterkircher wrote:
> Quoting Luke Kanies <luke@madstop.com>:
>> Can you try explicitly specifying the debian provider?
>
> Ok, I specified it like this:
>
> service { "nagios-nrpe-server":
>     provider => debian,
>     hasrestart => true,
> }
>
> But sadly again the same:
[...]

Very strange.

> Do you have any hint where to place some debug messages in the ruby
> source code? Is this functionality in component.rb?

Look in lib/puppet/provider/service/base.rb (for the 'refresh' and 'texecute' methods) or lib/puppet/provider/service/init.rb (for the 'restart' method).

You should be able to modify 'texecute' to just print the command being executed.

--
Always be wary of any helpful item that weighs less than its operating manual.
    -- Terry Pratchett
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
Quoting Luke Kanies <luke@madstop.com>:
> Look in lib/puppet/provider/service/base.rb (for the 'refresh' and
> 'texecute' methods) or lib/puppet/provider/service/init.rb (for the
> 'restart' method).

Since you wrote "refresh", here is the content of the service directory:

    unki@srv-vis-31:/usr/lib/ruby/1.8/puppet/provider/service$ ls -l
    total 32
    -rw-r--r-- 1 root root 4513 2007-05-25 19:24 base.rb
    -rw-r--r-- 1 root root  932 2006-12-28 09:06 debian.rb
    -rw-r--r-- 1 root root 1349 2006-12-28 01:24 gentoo.rb
    -rw-r--r-- 1 root root 4227 2006-12-28 09:06 init.rb
    -rw-r--r-- 1 root root 1703 2007-03-19 23:13 redhat.rb
    -rw-r--r-- 1 root root 2325 2007-03-18 18:35 smf.rb
    unki@srv-vis-31:/usr/lib/ruby/1.8/puppet/provider/service$ grep -i refresh *
    unki@srv-vis-31:/usr/lib/ruby/1.8/puppet/provider/service$

None of the files contains the word "refresh".

> You should be able to modify 'texecute' to just print the command
> being executed.

I've added this to base.rb:

    ...
    def texecute(type, command, fof = true)
        print "Command to be executed: type: $type, cmd: $command \n"
        begin
    ...

But it never gets called. Neither does restart in init.rb, where I tried to print the value of @model[:hasrestart].

Cheers,
Andreas
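A side note on the debug line above: Ruby interpolates with #{...}, so $type and $command inside a double-quoted string are printed literally rather than replaced by the method arguments. It made no difference here, because the method was never reached, but a version that does show the values could look like this minimal sketch. The helper name `describe_command` and the example argument values are hypothetical.

```ruby
# Hypothetical helper: build the debug message with real interpolation.
def describe_command(type, command)
  "Command to be executed: type: #{type}, cmd: #{command}"
end

# For comparison: "$type" in a double-quoted string is not interpolated.
literal = "type: $type"

puts describe_command(:restart, "/etc/init.d/nagios-nrpe-server restart")
puts literal
```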
On May 25, 2007, at 12:34 PM, Andreas Unterkircher wrote:
> Quoting Luke Kanies <luke@madstop.com>:
>
>> Look in lib/puppet/provider/service/base.rb (for the 'refresh' and
>> 'texecute' methods) or lib/puppet/provider/service/init.rb (for the
>> 'restart' method).
>
> Because you wrote "refresh", this is the content of the service
> directory:
>
> unki@srv-vis-31:/usr/lib/ruby/1.8/puppet/provider/service$ ls -l
> total 32
> -rw-r--r-- 1 root root 4513 2007-05-25 19:24 base.rb
> -rw-r--r-- 1 root root  932 2006-12-28 09:06 debian.rb
> -rw-r--r-- 1 root root 1349 2006-12-28 01:24 gentoo.rb
> -rw-r--r-- 1 root root 4227 2006-12-28 09:06 init.rb
> -rw-r--r-- 1 root root 1703 2007-03-19 23:13 redhat.rb
> -rw-r--r-- 1 root root 2325 2007-03-18 18:35 smf.rb
> unki@srv-vis-31:/usr/lib/ruby/1.8/puppet/provider/service$ grep -i refresh *
> unki@srv-vis-31:/usr/lib/ruby/1.8/puppet/provider/service$
>
> But none of the files contain any word of "refresh".

Sorry; the method in base.rb is 'restart', not 'refresh'.

>> You should be able to modify 'texecute' to just print the command
>> being executed.
>
> I've added this to base.rp:
>
> ...
> def texecute(type, command, fof = true)
>     print "Command to be executed: type: $type, cmd: $command \n"
>     begin
> ...
>
> But it never gets called. Also not restart in init.rb where I tried
> to print the value of @model[:hasrestart].

I just figured it out. type/service.rb has this code in its refresh() method (which is what the transaction calls):

    if ens = @parameters[:ensure] and ens.should == :running and ens.retrieve == :running
        provider.restart
    end

If I remember your example correctly, you don't specify a value for 'ensure'. Thus, this code will return false, meaning that refresh() is being called on the resource but it's not calling restart() on the provider.

Try adding ensure => running to your services.

I don't know what the right solution here is; maybe it's a documentation problem? Or there should be a warning if neither ensure nor enabled is being managed?

--
The most likely way for the world to be destroyed, most experts agree, is by accident. That's where we come in; we're computer professionals. We cause accidents.
    -- Nathaniel Borenstein
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
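The guard Luke quotes can be modelled in a few lines of plain Ruby to make the three cases explicit. This is only a simplified stand-in: the function name `restart_on_refresh?` and its two arguments are invented for the sketch and replace Puppet's real parameter objects (`ens.should` and `ens.retrieve` in the quoted code).

```ruby
# Simplified model of the quoted refresh() guard, not Puppet's real classes.
# ensure_should: the value of 'ensure' in the manifest, or nil if unmanaged.
# ensure_is:     the state found on the system (:running or :stopped).
def restart_on_refresh?(ensure_should, ensure_is)
  ensure_should == :running && ensure_is == :running
end

puts restart_on_refresh?(nil, :running)       # 'ensure' not set in the manifest
puts restart_on_refresh?(:running, :running)  # managed and running
puts restart_on_refresh?(:running, :stopped)  # managed but currently stopped
```

The first case is the one from this thread: the refresh is triggered and logged, but without 'ensure' the provider's restart is never called.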
On 5/25/2007 2:19 PM, Luke Kanies wrote:
> If I remember your example correctly, you don't specify a value for
> 'ensure'. Thus, this code will return false, meaning that refresh()
> is being called on the resource but it's not calling restart() on the
> provider.
>
> Try adding ensure => running to your services.
>
> I don't know what the right solution here is; maybe it's a
> documentation problem? Or there should be a warning if neither
> ensure nor enabled is being managed?

My initial guess would be to give a warning. I'd expect that if I define a service to be managed, I have requirements on that service's availability: either it should start on bootup and stay running (cron, sshd), be disabled entirely (ntpd on Xen hosts), etc.

The only situation I can imagine for not defining ensure or enabled would be if:

1. I was perfectly happy with the service being running or not, and
2. I would want to be able to restart it automatically whenever some other event happened.

But point 2 tends to contradict point 1. You can't technically restart a service that wasn't already running.

--
Mike Renfro / R&D Engineer, Center for Manufacturing Research,
931 372-3601 / Tennessee Technological University -- renfro@tntech.edu
On May 25, 2007, at 3:05 PM, Mike Renfro wrote:
>
> My initial guess would be to give a warning. I'd expect that if I
> define a service to be managed, I have requirements on that service's
> availability: either it should start on bootup and stay running (cron,
> sshd), be disabled entirely (ntpd on Xen hosts), etc.

I continually get burned when making assumptions about what people expect from Puppet; I think services used to default to 'running', but I had to disable it because of complaints.

That being said, there is now a warning if neither 'ensure' nor 'enabled' is specified.

> The only situation I can imagine for not defining ensure or enabled
> would be if:
>
> 1. I was perfectly happy with the service being running or not, and
> 2. I would want to be able to restart it automatically whenever some
> other event happened.
>
> But point 2 tends to contradict point 1. You can't technically
> restart a service that wasn't already running.

I agree.

--
The world tolerates conceit from those who are successful, but not from anybody else.
    -- John Blake
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
Hi Luke,

Quoting Luke Kanies <luke@madstop.com>:
> Try adding ensure => running to your services.

You are right: with "ensure => running" everything works! Thank you!

> I don't know what the right solution here is; maybe it's a
> documentation problem? Or there should be a warning if neither
> ensure nor enabled is being managed?

I would say that if you configure a notify/subscribe, you normally want a running process to be notified of changes to a Puppet-controlled file.

But suppose that, for whatever reason, you have manually stopped this process on all nodes and only want to roll out a new config file. In that situation there should be no need to change the Puppet manifest and remove the "ensure" just so that puppetd does not start the service again.

What about this as the default behaviour when ensure => running is not specified: if a process is there, restart it; if no process is there, do nothing except replace the files. And have it well documented :)

For me Puppet is not service monitoring software that ensures the availability of services. That is a job for other software... but as an available Puppet option it's fine.

Cheers,
Andreas
On May 26, 2007, at 2:12 AM, Andreas Unterkircher wrote:
> Hi Luke,
>
> Quoting Luke Kanies <luke@madstop.com>:
>> Try adding ensure => running to your services.
>
> You are right "ensure => running" and everything works!
> Thank you!

Sorry it took so long to figure out.

>> I don't know what the right solution here is; maybe it's a
>> documentation problem? Or there should be a warning if neither
>> ensure nor enabled is being managed?
>
> I would say, if you configure a notify/subscribe then you normally
> want that a running process gets notified of changes in a puppet
> controlled file.

I specifically added that check because someone else expected this behaviour. I think it's reasonable to require that you specify a value for 'ensure' if you want the service restarted.

> But if there is a situation (why ever) where on all nodes this process
> has been stopped by yourself manually and you only want to rollout a
> new config file, there should be no need to change puppets manifest
> and remove the "ensure" so that puppetd does not start the service
> again.

I don't really understand what you're saying -- it should only restart the service when it's running.

> What about a default behavior (when ensure => running is not
> specified), if a process is there, restart it, if no process is
> there, don't do anything and only replace the files. And well
> documented :)

Hmm. What do others think about this? It'll be relatively annoying to implement, and it would be behaviour very different from other resource types.

> For me puppet is not an service monitoring software which ensure
> availability of services. This is a job for other software... but
> as a available puppet option it's fine.

Sure, but Puppet does need to know how to start services.

--
The conception of two people living together for twenty-five years without having a cross word suggests a lack of spirit only to be admired in sheep.
    -- Alan Patrick Herbert
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
Quoting Luke Kanies <luke@madstop.com>:
>> But if there is a situation (why ever) where on all nodes this process
>> has been stopped by yourself manually and you only want to rollout a
>> new config file, there should be no need to change puppets manifest
>> and remove the "ensure" so that puppetd does not start the service
>> again.
>
> I don't really understand what you're saying -- it should only
> restart the service when it's running

I mean it should restart a service if the service is already running. If not, it should leave the service alone. That is what I would expect from notify/subscribe on a service when ensure => running is not specified.

Suppose you are planning a PHP upgrade on your Apache webservers. You have stopped Apache manually on x webservers and you want to roll out a new httpd.conf before the upgrade takes place. After that you upgrade PHP on each of these x webservers with apt. In this situation Puppet should not care about the service status when serving httpd.conf. Just an example...

So this is my argument that Puppet should not require ensure => running in order to notify an already running service.

> Hmm. What do others think about this? It'll be relatively annoying
> to implement, and it would be behaviour very different from other
> resource types.

I wouldn't put too much effort into that. How about this: if ensure => running is not specified, Puppet could use the status option of the init script and refresh the service only when it is active.

Cheers,
Andreas
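Andreas's proposal can be written down as a small decision function, which makes it easier to compare with the existing check. This is a sketch of the suggestion only, not of anything Puppet implements: the function name `restart_on_refresh_proposed?` is invented, and `currently_running` stands in for whatever a status check (for example the init script's status action) would report.

```ruby
# Sketch of the proposed behaviour (hypothetical, not implemented in Puppet).
# ensure_should:     the 'ensure' value from the manifest, or nil if unset.
# currently_running: true if a status check reports the service as active.
def restart_on_refresh_proposed?(ensure_should, currently_running)
  # Proposal: with no 'ensure', restart only a service that is already up.
  return currently_running if ensure_should.nil?
  # With 'ensure' set, keep the existing rule from the thread.
  ensure_should == :running && currently_running
end

puts restart_on_refresh_proposed?(nil, true)       # running, unmanaged
puts restart_on_refresh_proposed?(nil, false)      # stopped by hand
puts restart_on_refresh_proposed?(:running, true)  # managed and running
```

In the Apache/PHP example above, the second case applies: the new httpd.conf is delivered, and the manually stopped service is left alone.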
On May 27, 2007, at 1:21 AM, Andreas Unterkircher wrote:
>
> I mean it should restart a service if the service is already running.
> If not, don't touch the service. This would be my first guess if I
> think about how notify/subscribe will affect a service (if ensure =>
> running is not specified).

That's basically how it works now, except that Puppet assumes that it should ignore services if you aren't telling Puppet it should be running (which is different from whether the service is running).

> Think about you are planing a PHP upgrade on your Apache webservers.
> Now you have stopped Apache manually on x webservers and you want to
> roll out a new httpd.conf before the upgrade takes place. After
> this you upgrade PHP on every of this x webservers with apt. In this
> situation puppet should not care about the service status when serving
> httpd.conf. Only a example...

Hmm. I generally only think about those cases where you're using Puppet for all of your work, so I can see why I didn't think of this case.

> So this is my argue that puppet should not require ensure => running
> to notify an already running service.
>
>> Hmm. What do others think about this? It'll be relatively annoying
>> to implement, and it would be behaviour very different from other
>> resource types.
>
> I wouldn't spend too much effort into that. What about if ensure =>
> running is not specified, puppet could use the status option of the
> init script and refresh it only when it's active.

Go ahead and open an enhancement request for it. The feature should be easy, but the test code will be annoying.

--
Today at work an ethernet switch decided to take the 'N' out of NVRAM
    -- Richard Letts
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com