Andreas N
2012-Jan-05 00:23 UTC
[Puppet Users] Puppet agent hangs after running a few hours, defunct sh process
Hi,

On a node running Puppet 2.7.9 from apt.puppetlabs.com, using Ubuntu 10.04.3, the agent hangs after a few hours of operation. I have to kill -9 it, nothing else helps. Obviously, this is unfortunate.

Looking at ps -ef I see this:

    root  4842  4594  0 Jan04 pts/0  00:00:55 /usr/bin/ruby1.8 /usr/bin/puppet agent --verbose --no-daemonize --debug
    root  9803  4842  0 Jan04 pts/0  00:00:00 [sh] <defunct>

It seems a defunct sh process is responsible. This has happened before on that node, so I started the agent with the command line arguments you see above. Unfortunately the produced debug logs don't look any different from the debug logs on a node where I haven't observed this behavior. The logs from the last run can be found here, nonetheless: http://pastie.org/3128200

The problem seems to happen regularly on that particular node, but I looked around other nodes we have running and it seems to happen on a few others as well. These nodes don't have anything in common (not even the puppet master) but do have a few common modules applied.

Could this be caused by one of those modules? How would I go about debugging? Or does anyone already know what's going on here?

Thanks,
Andreas

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/z6W5nxo-DqAJ.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
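Since the post mentions checking other nodes for the same symptom, here is a minimal sketch of how defunct children and their parents could be listed in one pass. The helper name filter_zombies is hypothetical; it only assumes input lines of the form "pid ppid stat command", which is what the ps invocation in the note below produces on Linux.

```shell
# Keep only zombie (defunct) processes from "pid ppid stat command" lines
# read on stdin, and print "pid ppid command" for each of them.
# A process state that starts with Z marks a zombie.
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}
```

Against the live process table this would be run as `ps -e -o pid= -o ppid= -o stat= -o comm= | filter_zombies`. On the node described above, the second column would be expected to show the agent's PID (4842) as the parent of the defunct sh.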
Andreas N
2012-Jan-06 07:06 UTC
[Puppet Users] Re: Puppet agent hangs after running a few hours, defunct sh process
Wow, it took quite a while for my post to reach this group. No idea why, is it moderated?

Anyway, this problem seems to also happen with agents running Puppet 2.7.6, although apparently less frequently. I'm almost positive it must have something to do with a module but I wouldn't know how or where to begin debugging.

Does anyone have any ideas?

Thanks,
Andreas
jcbollinger
2012-Jan-06 16:31 UTC
[Puppet Users] Re: Puppet agent hangs after running a few hours, defunct sh process
On Jan 6, 1:06 am, Andreas N <d...@pseudoterminal.org> wrote:
> Wow, it took quite a while for my post to reach this group. No idea why, is
> it moderated?
>
> Anyway, this problem seems to also happen with agents running Puppet 2.7.6,
> although apparently less frequently. I'm almost positive it must have
> something to do with a module but I wouldn't know how or where to begin
> debugging.
>
> Does anyone have any ideas?

Nothing in your log suggests that the Puppet agent is doing any work when it fails. It appears to apply a catalog successfully, then create a report successfully, then nothing else. That doesn't seem like a problem in a module. Nevertheless, you could try removing classes from the affected node's configuration and testing whether Puppet still freezes.

You said the agent runs for several hours before it hangs. Does it perform multiple successful runs during that time? That also would tend to counterindicate a problem in your manifests.

I'm suspicious that something else on your systems is interfering with the Puppet process; some kind of service manager, for example. You'll have to say whether that's a reasonable guess. Alternatively, you may have a system-level bug; there have been a few Ruby bugs and kernel regressions that interfered with Puppet operation.

You could try using strace to determine where the failure happens, though that's not as simple as it may sound.

You could also try just sidestepping the problem by using cron to launch puppetd --runonce at your desired intervals, instead of leaving puppetd running in daemon mode. A fair number of people seem to run Puppet that way, and it has some advantages.

John
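The cron workaround suggested above can be sketched as follows. This is an illustration under assumptions, not a tested deployment: it assumes the one-shot run on Puppet 2.7 is spelled `puppet agent --onetime --no-daemonize` (taking the "--runonce" in the message to mean that), that the binary lives at /usr/bin/puppet as in the ps output earlier in the thread, and that a 30-minute interval is wanted. The helper name puppet_cron_line is hypothetical.

```shell
# Print a cron.d-style entry that runs the agent once at two fixed minutes
# of every hour, 30 minutes apart, as root.
# $1 is the first minute (0-29); choosing a different value on each node
# spreads the load on the puppet master.
puppet_cron_line() {
    first=$1
    second=$((first + 30))
    printf '%s,%s * * * * root /usr/bin/puppet agent --onetime --no-daemonize\n' "$first" "$second"
}
```

For example, `puppet_cron_line 7` prints an entry for minutes 7 and 37; redirecting that into a file under /etc/cron.d (the file name is up to you) would replace the long-running daemon, which should then be stopped and disabled.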
Andreas N
2012-Jan-08 03:40 UTC
[Puppet Users] Re: Puppet agent hangs after running a few hours, defunct sh process
On Friday, January 6, 2012 5:31:34 PM UTC+1, jcbollinger wrote:
> Nothing in your log suggests that the Puppet agent is doing any work
> when it fails. It appears to apply a catalog successfully, then
> create a report successfully, then nothing else. That doesn't seem
> like a problem in a module. Nevertheless, you could try removing
> classes from the affected node's configuration and testing whether
> Puppet still freezes.

John, thanks for your reply. I'll be deploying a node that includes no modules at all and see if a zombie process appears again.

> You said the agent runs for several hours before it hangs. Does it
> perform multiple successful runs during that time? That also would
> tend to counterindicate a problem in your manifests.

Yes, the agents perform several runs (with no changes to the catalog) and then simply freeze up, waiting for the defunct sh process to return.

> I'm suspicious that something else on your systems is interfering with
> the Puppet process; some kind of service manager, for example. You'll
> have to say whether that's a reasonable guess. Alternatively, you may
> have a system-level bug; there have been a few Ruby bugs and kernel
> regressions that interfered with Puppet operation.

Those are all pretty plain Ubuntu 10.04.3 server installations (both i386 and x86_64), especially the ones I deployed this week, which aren't in production yet. What kind of service manager could there even be that interferes?

> You could try using strace to determine where the failure happens,
> though that's not as simple as it may sound.

Simply trying to strace the zombie process only results in an "Operation not permitted". The agent process shows these lines repeatedly:

    Process 3741 attached - interrupt to quit
    select(8, [7], NULL, NULL, {1, 723393}) = 0 (Timeout)
    sigprocmask(SIG_BLOCK, NULL, []) = 0
    sigprocmask(SIG_BLOCK, NULL, []) = 0
    select(8, [7], NULL, NULL, {2, 0}) = 0 (Timeout)
    sigprocmask(SIG_BLOCK, NULL, []) = 0
    sigprocmask(SIG_BLOCK, NULL, []) = 0
    ...

That doesn't tell me anything other than that the puppet agent is blocking on select() with a timeout of two seconds.

> You could also try just sidestepping the problem by using cron to
> launch puppetd --runonce at your desired intervals, instead of leaving
> puppetd running in daemon mode. A fair number of people seem to run
> Puppet that way, and it has some advantages.

Thanks, that's a good idea that I will probably have to resort to if the problem doesn't go away.

Andreas
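A note on the "Operation not permitted" result: a defunct process has already exited, so there is nothing left for strace to attach to. Its process-table entry only lingers until the parent collects it with wait(), which is why the parent is the process worth examining. On Linux, the state and parent can be confirmed from /proc. A minimal sketch, where the helper name state_and_parent is hypothetical:

```shell
# Summarise a Linux /proc/<pid>/status file read from stdin.
# Prints "<state-letter> <parent-pid>", e.g. "Z 4842" for a zombie
# whose parent is PID 4842.
state_and_parent() {
    awk '/^State:/ { state = $2 }
         /^PPid:/  { ppid = $2 }
         END       { print state, ppid }'
}
```

It would be used as `state_and_parent < /proc/PID/status`, with PID replaced by that of the defunct sh.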
Nigel Kersten
2012-Jan-08 04:26 UTC
Re: [Puppet Users] Re: Puppet agent hangs after running a few hours, defunct sh process
On Thu, Jan 5, 2012 at 11:06 PM, Andreas N <daff@pseudoterminal.org> wrote:
> Wow, it took quite a while for my post to reach this group. No idea why,
> is it moderated?

We moderate the first post from everyone to stop spam getting through.

This sucks, but it sucks less than the other alternatives of moderating every post, or approving membership manually.
Andreas N
2012-Jan-08 04:45 UTC
Re: [Puppet Users] Re: Puppet agent hangs after running a few hours, defunct sh process
On Sunday, January 8, 2012 5:26:50 AM UTC+1, Nigel Kersten wrote:
> We moderate the first post from everyone to stop spam getting through.
>
> This sucks, but it sucks less than the other alternatives of moderating
> every post, or approving membership manually.

Nigel, good to know, thanks!

Andreas
jcbollinger
2012-Jan-09 14:56 UTC
[Puppet Users] Re: Puppet agent hangs after running a few hours, defunct sh process
On Jan 7, 9:40 pm, Andreas N <d...@pseudoterminal.org> wrote:
> On Friday, January 6, 2012 5:31:34 PM UTC+1, jcbollinger wrote:
>
> > Nothing in your log suggests that the Puppet agent is doing any work
> > when it fails. It appears to apply a catalog successfully, then
> > create a report successfully, then nothing else. That doesn't seem
> > like a problem in a module. Nevertheless, you could try removing
> > classes from the affected node's configuration and testing whether
> > Puppet still freezes.
>
> John, thanks for your reply. I'll be deploying a node that includes no
> modules at all and see if a zombie process appears again.
>
> > You said the agent runs for several hours before it hangs. Does it
> > perform multiple successful runs during that time? That also would
> > tend to counterindicate a problem in your manifests.
>
> Yes, the agents perform several runs (with no changes to the catalog) and
> then simply freeze up, waiting for the defunct sh process to return.
>
> > I'm suspicious that something else on your systems is interfering with
> > the Puppet process; some kind of service manager, for example. You'll
> > have to say whether that's a reasonable guess. Alternatively, you may
> > have a system-level bug; there have been a few Ruby bugs and kernel
> > regressions that interfered with Puppet operation.
>
> Those are all pretty plain Ubuntu 10.04.3 server installations (both i386
> and x86_64), especially the ones I deployed this week, which aren't in
> production yet. What kind of service manager could there even be that
> interferes?

I was thinking along the lines of an intrusion detection system, or perhaps a monitoring / management tool such as Nagios. That's not to say that I suspect Nagios in particular -- a lot of people seem to use it together with Puppet with great success. It sounds like such a thing is not in your picture, however.

> > You could try using strace to determine where the failure happens,
> > though that's not as simple as it may sound.
>
> Simply trying to strace the zombie process only results in an "Operation
> not permitted". The agent process shows these lines repeatedly:
>
> Process 3741 attached - interrupt to quit
> select(8, [7], NULL, NULL, {1, 723393}) = 0 (Timeout)
> sigprocmask(SIG_BLOCK, NULL, []) = 0
> sigprocmask(SIG_BLOCK, NULL, []) = 0
> select(8, [7], NULL, NULL, {2, 0}) = 0 (Timeout)
> sigprocmask(SIG_BLOCK, NULL, []) = 0
> sigprocmask(SIG_BLOCK, NULL, []) = 0
> ...
>
> That doesn't tell me anything other than that the puppet agent is blocking
> on select() with a timeout of two seconds.

I kinda meant to trace a new agent process so as to catch whatever happens when it transitions to the non-functional state. Nevertheless, the trace does yield a bit of information. In particular, it shows that the agent is not fully blocked. In that case, the fact that it has a defunct child process that it has not collected makes me suspect a Ruby bug even more. I am also a bit curious what the open FD 7 that Puppet is selecting on might be, but I don't think that's directly related to your issue.

I suggest you compare the Ruby and kernel versions installed on the affected nodes to those installed on unaffected nodes. It may also be useful to compare the Puppet configuration (/etc/puppet/puppet.conf) on failing nodes to that on non-failing nodes to see whether any options are set differently. I am especially curious as to whether the 'listen' option might be enabled when it does not need to be (or does it?), but there might be other significant differences.

John
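The three checks suggested above (trace a fresh agent, identify FD 7, compare versions and settings between nodes) could be sketched like this. The function names are hypothetical, the agent command line is the one from the ps output earlier in the thread, and it is assumed that `puppet agent --configprint listen` is available to print the effective value of the 'listen' setting.

```shell
# Print the values worth diffing between a hanging node and a healthy one.
# A tool that is missing is reported as "unavailable" instead of failing.
node_fingerprint() {
    echo "kernel=$(uname -r)"
    echo "ruby=$(ruby1.8 -v 2>/dev/null || echo unavailable)"
    echo "listen=$(puppet agent --configprint listen 2>/dev/null || echo unavailable)"
}

# Start a fresh agent under strace, following every child it forks (-f),
# with timestamps (-tt), and write the trace to the file named by $1.
trace_new_agent() {
    strace -f -tt -o "$1" /usr/bin/puppet agent --verbose --no-daemonize --debug
}

# Show what a file descriptor of a running process points at.
# $1 = PID, $2 = descriptor number (Linux /proc only).
fd_target() {
    readlink "/proc/$1/fd/$2"
}
```

Running `node_fingerprint` on an affected and an unaffected node and diffing the two outputs covers the version comparison; `fd_target 3741 7` would answer the FD 7 question for the agent traced earlier. The trace file from `trace_new_agent` grows for as long as the agent runs, so it needs a location with enough free space for several hours of output.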
Jo Rhett
2012-Jan-09 17:40 UTC
Re: [Puppet Users] Re: Puppet agent hangs after running a few hours, defunct sh process
On Jan 7, 2012, at 7:40 PM, Andreas N wrote:
> That doesn't tell me anything other than that the puppet agent is blocking on select() with a timeout of two seconds.

Sounds like #10418. Check your kernel version.
https://projects.puppetlabs.com/issues/10418

--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source and other randomness
jcbollinger
2012-Jan-10 16:41 UTC
[Puppet Users] Re: Puppet agent hangs after running a few hours, defunct sh process
On Jan 9, 11:40 am, Jo Rhett <jrh...@netconsonance.com> wrote:
> On Jan 7, 2012, at 7:40 PM, Andreas N wrote:
>
> > That doesn't tell me anything other than that the puppet agent is blocking on select() with a timeout of two seconds.
>
> Sounds like #10418. Check your kernel version.
> https://projects.puppetlabs.com/issues/10418

It sounds similar, but 10418 is specific to a particular RedHat / CentOS kernel, and the OP is observing his problem on Ubuntu. My awareness of that issue is one of the reasons I advised the OP to look at kernel versions, however.

John
Jo Rhett
2012-Jan-10 21:01 UTC
Re: [Puppet Users] Puppet agent hangs after running a few hours, defunct sh process
The comments in the redhat bug indicated that this breakage came from upstream, as did the fix. So it's entirely possible that this bug appeared in some Debian kernels, but I don't know which.

On Jan 10, 2012, at 8:41 AM, jcbollinger wrote:
> On Jan 9, 11:40 am, Jo Rhett <jrh...@netconsonance.com> wrote:
>> On Jan 7, 2012, at 7:40 PM, Andreas N wrote:
>>
>>> That doesn't tell me anything other than that the puppet agent is blocking on select() with a timeout of two seconds.
>>
>> Sounds like #10418. Check your kernel version.
>> https://projects.puppetlabs.com/issues/10418
>
> It sounds similar, but 10418 is specific to a particular RedHat /
> CentOS kernel, and the OP is observing his problem on Ubuntu. My
> awareness of that issue is one of the reasons I advised the OP to look
> at kernel versions, however.
>
> John

--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source and other randomness