Hi,

I've recently started to have a problem where some of my clients' puppetd processes are locking up (the puppetdlock file is several hours old). My server is running puppet 2.7.12 on CentOS 6.2 and my clients are running puppet 2.7.12 on Scientific Linux 6.2. If I check the puppetdlock file, it contains the pid of the currently "running" puppet. If I restart puppetd, it's fine for a while, but sooner or later I end up in the same state. If I run strace against the puppetd, I get:

    # strace -p 10726
    Process 10726 attached - interrupt to quit
    select(8, [7], NULL, NULL, {1, 560249}) = 0 (Timeout)
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    select(8, [7], NULL, NULL, {2, 0}) = 0 (Timeout)
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    select(8, [7], NULL, NULL, {2, 0}) = 0 (Timeout)
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    select(8, [7], NULL, NULL, {2, 0}) = 0 (Timeout)
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    ^C <unfinished ...>
    Process 10726 detached

If I run lsof, I get:

    # lsof -p 10726
    COMMAND   PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
    puppetd 10726 root  cwd    DIR                8,1     4096        2 /
    puppetd 10726 root  rtd    DIR                8,1     4096        2 /
    puppetd 10726 root  txt    REG                8,1    10576  8151417 /usr/bin/ruby
    [...]
    puppetd 10726 root  mem    REG                8,1    26050  8153796 /usr/lib64/gconv/gconv-modules.cache
    puppetd 10726 root   0r    CHR                1,3      0t0     3820 /dev/null
    puppetd 10726 root   1w    CHR                1,3      0t0     3820 /dev/null
    puppetd 10726 root   2w    CHR                1,3      0t0     3820 /dev/null
    puppetd 10726 root   3r   FIFO                0,8      0t0 17283753 pipe
    puppetd 10726 root   4w   FIFO                0,8      0t0 17283753 pipe
    puppetd 10726 root   5u   unix 0xffff88013680b0c0      0t0 17283804 socket
    puppetd 10726 root   6u    REG                8,1     6045  3145906 /var/log/puppet/http.log
    puppetd 10726 root   7u   IPv4           17283830      0t0      TCP *:8139 (LISTEN)

If I look at what puppet is running:

    # ps -elfw | grep 10726
    5 S root 10726     1 0 81 1 - 61549 poll_s 15:15 ? 00:00:17 /usr/bin/ruby /usr/sbin/puppetd --debug --verbose
    0 Z root 11429 10726 0 81 1 -     0 exit   15:39 ? 00:00:00 [sh] <defunct>

Help?

...dave

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
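Dave's diagnosis boils down to two checks: the lock file is hours old, and the PID written in it still belongs to a live process. A minimal shell sketch of both checks (not from the thread), assuming GNU coreutils `stat` and a lock file whose first line is a PID; the function names are hypothetical, and the lock path used in the usage line is an assumption, so confirm where your agent keeps its lock.

```sh
#!/bin/sh
# Sketch: does the puppetd lock look like a hung run?
# Assumptions (not from the thread): GNU coreutils `stat`, and a lock file
# whose first line is the PID of the agent that holds it.

# Print a file's age in seconds, based on its modification time.
file_age() {
    local now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$1" 2>/dev/null) || return 1
    echo $(( now - mtime ))
}

# Succeed if LOCK exists, holds a PID, is older than MAX_AGE seconds, and
# that PID is still a live process. kill -0 sends no signal; it only tests
# whether the process exists (run as root, or it fails for other users' PIDs).
lock_looks_hung() {
    local lock max_age pid age
    lock=$1
    max_age=$2
    [ -f "$lock" ] || return 1
    pid=$(head -n 1 "$lock")
    [ -n "$pid" ] || return 1          # empty lock: no PID to check
    age=$(file_age "$lock") || return 1
    [ "$age" -gt "$max_age" ] && kill -0 "$pid" 2>/dev/null
}
```

For example, `lock_looks_hung /var/lib/puppet/state/puppetdlock 3600 && echo "puppetd may be hung"`. A lock whose PID no longer exists points to a crashed run rather than a hang, so the function deliberately reports it as not hung.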
On Tue, Apr 10, 2012 at 2:24 PM, David Alden <dave@alden.name> wrote:
> If I look at what puppet is running:
>
> # ps -elfw | grep 10726
> 5 S root 10726 1 0 81 1 - 61549 poll_s 15:15 ? 00:00:17
> /usr/bin/ruby /usr/sbin/puppetd --debug --verbose
> 0 Z root 11429 10726 0 81 1 - 0 exit 15:39 ? 00:00:00
> [sh] <defunct>

Any chance you've looked at your log files to see what that last-forked shell is doing? It seems like a process isn't exiting (maybe it doesn't want to give up a handle) and the puppet client is sitting around waiting for it?

Cheers - RVT
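RVT's question about the defunct `[sh]` child can be checked mechanically by listing the zombie children of the puppetd PID. A minimal sketch, assuming the procps `ps` shipped with EL6 (`--ppid` selects processes by parent PID, and a trailing `=` in `-o` suppresses the header line); the function names are hypothetical. It only shows whether unreaped children exist, not why the agent is waiting.

```sh
#!/bin/sh
# Keep only zombie rows from "PID STAT COMMAND" lines on stdin.
# A STAT value starting with Z marks a defunct (exited but unreaped) process.
filter_zombies() {
    awk '$2 ~ /^Z/ { print $1, $3 }'
}

# List zombie children of the given parent PID (e.g. the puppetd PID).
zombie_children() {
    ps -o pid=,stat=,comm= --ppid "$1" | filter_zombies
}
```

For example, `zombie_children 10726`. Keeping the `awk` filter in its own function means it can be checked against captured `ps` output without a live process.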
Hello,

Seems to be the same as described here:
https://groups.google.com/forum/?fromgroups#!topic/puppet-users/N1XcMTth7mE

Got the same issue with puppet 2.6.14 and various kernel versions.

Can't find what's going wrong, so I guess I'll have to downgrade until everything runs fine.

JB
On Tue, Apr 10, 2012 at 2:35 PM, Jean Baptiste FAVRE <jean.baptiste.favre@gmail.com> wrote:
> Hello,
> Seems to be the same as described here:
> https://groups.google.com/forum/?fromgroups#!topic/puppet-users/N1XcMTth7mE
>
> Got the same issue with puppet 2.6.14 and various kernels version.
>
> Can't find what's going rong, so I guess I'll have to downgrade until
> everything run fine

Why do you need to downgrade instead of upgrading to 2.7?

-Jeff
On Tue, Apr 10, 2012 at 2:24 PM, David Alden <dave@alden.name> wrote:
> If I run strace against the puppetd, I get:
>
> # strace -p 10726
> Process 10726 attached - interrupt to quit
> select(8, [7], NULL, NULL, {1, 560249}) = 0 (Timeout)
> rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
> rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
> select(8, [7], NULL, NULL, {2, 0}) = 0 (Timeout)
> [...]
> ^C <unfinished ...>
> Process 10726 detached

Hey Dave, is this running the stock Ruby that comes with EL6?

-Jeff
Because not all of my recipes are fully 2.7-compatible yet :-/ And I won't be able to rewrite them fast enough.

And finally, since the hang issue also occurs with some 2.7 releases, I'm not sure the upgrade would be worth it.

JB

On 10/04/2012 23:41, Jeff McCune wrote:
> Why do you need to downgrade instead of upgrade to 2.7?
>
> -Jeff
I have the same issue, I think, with 2.7.6 and 2.7.11:
https://projects.puppetlabs.com/issues/13000
Forgot to mention: I see this misbehaviour on many servers with:
- puppet 2.6.14
- facter 1.6.5
- Ubuntu 10.04 with the 2.6.32-317-ec2 kernel

I also see it with other kernels, but less often. Puppet and facter are backported in-house (i.e., not installed from apt.puppetlabs.com).

JB
Hi,

Thanks for all of the replies so far.

On Apr 10, 2012, at 5:37 PM, Jean Baptiste FAVRE wrote:
> Seems to be the same as described here:
> https://groups.google.com/forum/?fromgroups#!topic/puppet-users/N1XcMTth7mE

Yes it does.

On Apr 11, 2012, at 3:16 AM, Elias Abacioglu wrote:
> I have the same issue I think. With 2.7.6 and 2.7.11.
> https://projects.puppetlabs.com/issues/13000

Yup - that also describes the same issue. At least I'm not alone in this. :-)

...dave
Hi,

On Apr 10, 2012, at 5:43 PM, Jeff McCune wrote:
> Hey Dave, is this running the stock Ruby that comes with EL6 ?

Yup:

    ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux]

This seems to be the same as:
https://projects.puppetlabs.com/issues/13000

...dave
Sorry I missed this thread until now.

Are you running with --listen? If so, try:

    echo "" | nc localhost 8139

and see if that helps.

https://projects.puppetlabs.com/issues/12185 is tracking an issue where puppetd, running with --listen, gets stuck in a socket read loop. Any data whatsoever sent to port 8139 will cause puppetd to continue where it left off (sometimes days ago).

I've mainly seen this on CentOS 6.2 personally. I've set up one of my test clients with a version of ruby patched for caller_for_all_threads and with the "xray" rubygem, and will try to get a thread dump the next time it happens.

-Jason
Hello,

Yes, my clients run with the listen option. I had already seen this bug report, which is why I restarted puppet agent without this option yesterday, to see what happens. I'll see the results when I get to work :)

JB
Hi,

On Apr 11, 2012, at 7:38 PM, Jason Antman wrote:
> Are you running with --listen? If so, try:
> echo "" | nc localhost 8139
> and see if that helps.

Yup - that did it. Time to modify my check-puppet script to do the echo. :-)

...thnx,
...dave
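Dave's check-puppet script isn't shown in the thread. What follows is a hypothetical sketch of such a watchdog, combining a stale-lock test with Jason's `echo "" | nc localhost 8139` workaround. The lock path, the 60-minute default and the function names are assumptions; `-maxdepth` and `-mmin` are GNU find options, and some nc variants keep the connection open after stdin closes, so check whether yours needs a timeout option.

```sh
#!/bin/sh
# Hypothetical check-puppet sketch (not Dave's actual script).
# Assumptions: lock path and threshold are illustrative; GNU find; nc installed.

# Succeed if FILE exists and was last modified more than MINUTES ago.
# find prints the path only when the -mmin test matches; a missing file
# makes find fail silently, so the output is empty and the test fails.
older_than_minutes() {
    [ -n "$(find "$1" -maxdepth 0 -mmin "+$2" 2>/dev/null)" ]
}

# Send an empty line to the agent's --listen port (issue #12185 workaround:
# any data on that port lets a stuck puppetd continue).
nudge_listener() {
    echo "" | nc "${1:-localhost}" "${2:-8139}"
}

# Nudge the listener only when the lock is older than the threshold.
check_puppet() {
    local lock max_minutes
    lock=${1:-/var/lib/puppet/state/puppetdlock}
    max_minutes=${2:-60}
    if older_than_minutes "$lock" "$max_minutes"; then
        echo "check-puppet: $lock older than $max_minutes min, nudging port 8139" >&2
        nudge_listener localhost 8139
    fi
}
```

Called from cron as `check_puppet /var/lib/puppet/state/puppetdlock 60`, it would leave agents with a fresh (or absent) lock alone and only poke the listener of one whose lock has gone untouched for an hour.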