Hi All,

In my set-up, I've got a cron job that triggers a Puppet run every 20
minutes. I've found that on approximately 13 nodes (out of 166), puppetd
just hangs. I have to go in, kill the process, remove
/var/lib/puppet/state/puppetdlock, and run Puppet again, and then it's
fine.

After a while it just hangs again, so I have to go in, kill the process,
and so on.

Any ideas?

Thanks!
Gonzalo
If you are like me, the problem is that the Ruby for your platform
sucks. The webstack Ruby 1.8.7 for Solaris 10 has a nasty tendency to
hang (for the daemons) and core dump for individual runs. Individual
runs out of a crontab are the most reliable way I've found to make it
all work.

On Tue, Feb 7, 2012 at 7:11 PM, Gonzalo Servat <gservat@gmail.com> wrote:
> Hi All,
>
> In my set-up, I've got a cron job that triggers a Puppet run every 20
> minutes. I've found that on approximately 13 nodes (out of 166),
> puppetd just hangs. I have to go in, kill the process, remove
> /var/lib/puppet/state/puppetdlock, and run Puppet again, and then it's
> fine.
>
> After a while it just hangs again, so I have to go in, kill the
> process, and so on.
>
> Any ideas?
>
> Thanks!
> Gonzalo
On Wed, Feb 8, 2012 at 3:25 PM, Brian Gallew <geek@gallew.org> wrote:
> If you are like me, the problem is that the Ruby for your platform
> sucks. The webstack Ruby 1.8.7 for Solaris 10 has a nasty tendency to
> hang (for the daemons) and core dump for individual runs. Individual
> runs out of a crontab are the most reliable way I've found to make it
> all work.

This is ruby-1.8.7.299-7.el6_1.1 and I am running Puppet out of crontab,
but it's still hanging frequently. Right about now it has hung again on
several nodes.

Any ideas?

- Gonzalo
On Tue, Feb 7, 2012 at 23:56, Gonzalo Servat <gservat@gmail.com> wrote:
> On Wed, Feb 8, 2012 at 3:25 PM, Brian Gallew <geek@gallew.org> wrote:
>> If you are like me, the problem is that the Ruby for your platform
>> sucks. The webstack Ruby 1.8.7 for Solaris 10 has a nasty tendency to
>> hang (for the daemons) and core dump for individual runs. Individual
>> runs out of a crontab are the most reliable way I've found to make it
>> all work.
>
> This is ruby-1.8.7.299-7.el6_1.1 and I am running Puppet out of
> crontab, but it's still hanging frequently. Right about now it has
> hung again on several nodes.
>
> Any ideas?

Red Hat released some updated kernels that reintroduced a bug from the
2.6.13 Linux kernel. You can run any of the code in this gist to check
whether your kernel suffers from it: https://gist.github.com/441278

The C code is obviously a pretty good choice, as it excludes Ruby
entirely from the problem space, and will confirm whether that is your
root cause.

(The bug is that select() on a file in /proc hangs for a long time,
possibly forever, and Ruby will use select() on a file if there are
enough handles open. This happens in some daemon configurations.)

--
Daniel Pittman ⎋ Puppet Labs Developer – http://puppetlabs.com
♲ Made with 100 percent post-consumer electrons
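In case the gist ever disappears: the following is a minimal sketch of
the kind of check it describes, reconstructed from the description above
rather than copied from the gist. The default path /proc/uptime is an
assumption; the gist's actual test apparently names a disk device, hence
the sda/vda substitution mentioned in the next message. On a healthy
kernel, select() on an ordinary file returns immediately (regular files
are always readable); on an affected kernel it can block for the full
timeout or longer.

/* Sketch of a select-on-/proc check in the spirit of the gist above.
 * A reconstruction under stated assumptions, not the gist's code. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/time.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/proc/uptime";
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return 1;
    }

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { 5, 0 };   /* generous five-second timeout */

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    int rc = select(fd + 1, &rfds, NULL, NULL, &tv); /* should not block */
    gettimeofday(&t1, NULL);

    double elapsed = (t1.tv_sec - t0.tv_sec)
                   + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("select() returned %d after %.3f seconds: %s\n",
           rc, elapsed, elapsed < 1.0 ? "good" : "bad");
    close(fd);
    return elapsed < 1.0 ? 0 : 2;
}

Compile with something like "cc -o select-check select-check.c" and run
it against the /proc path the gist names.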
On Thu, Feb 9, 2012 at 5:44 AM, Daniel Pittman <daniel@puppetlabs.com> wrote:
> Red Hat released some updated kernels that reintroduced a bug from the
> 2.6.13 Linux kernel. You can run any of the code in this gist to check
> whether your kernel suffers from it: https://gist.github.com/441278
>
> The C code is obviously a pretty good choice, as it excludes Ruby
> entirely from the problem space, and will confirm whether that is your
> root cause.
>
> (The bug is that select() on a file in /proc hangs for a long time,
> possibly forever, and Ruby will use select() on a file if there are
> enough handles open. This happens in some daemon configurations.)

Hi Daniel,

I tried the C code (with vda instead of sda, as this is a VM using
virtio) and the result matched the "good" section of the URL you pasted.

Stracing a hung puppetd run, I see an endless stream of these:

select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
gettimeofday({1328740663, 962461}, NULL) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0

The process looks like this:

/usr/bin/ruby /usr/sbin/puppetd --pluginsync --ignorecache
--no-usecacheonfailure --onetime --no-daemonize --logdest syslog
--environment=production --server=puppet-server --report

Any other ideas?

- Gonzalo
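A note on reading that trace (an interpretation, not something stated in
the thread): select() called with an nfds of 0 and all three descriptor
sets NULL watches nothing at all; it is simply a portable timed sleep,
and it is the idiom MRI Ruby 1.8's green-thread scheduler uses to idle.
A loop of one-second select() timeouts therefore suggests the
interpreter is alive and polling for a runnable thread, rather than
blocked in a single kernel call. A trivial C illustration of the
select-as-sleep idiom:

/* select() with no descriptor sets is a pure timed sleep.  Each call
 * here sleeps one second and returns 0 (timeout), matching the
 * select(0, NULL, NULL, NULL, {1, 0}) = 0 lines in the strace above. */
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

int main(void)
{
    int i;
    for (i = 0; i < 3; i++) {
        struct timeval tv = { 1, 0 };              /* {1, 0}: one second */
        int rc = select(0, NULL, NULL, NULL, &tv); /* no fds watched */
        printf("select returned %d (timeout)\n", rc);
    }
    return 0;
}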
On Wed, Feb 8, 2012 at 14:40, Gonzalo Servat <gservat@gmail.com> wrote:
> I tried the C code (with vda instead of sda, as this is a VM using
> virtio) and the result matched the "good" section of the URL you
> pasted.
>
> Stracing a hung puppetd run, I see an endless stream of these:
>
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> gettimeofday({1328740663, 962461}, NULL) = 0
> rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
> [..snip..]

Damn. Well, at least we eliminated one possible cause. Is there any
chance you can run with `--debug` enabled on one of the failing
machines, and see if that points to the right place? Otherwise we have
to start to get into some fairly heavy ways to figure out what is going
on. We can't trivially reproduce this in-house, though we will keep
trying.

--
Daniel Pittman ⎋ Puppet Labs Developer – http://puppetlabs.com
♲ Made with 100 percent post-consumer electrons
> Damn. Well, at least we eliminated one possible cause. Is there any
> chance you can run with `--debug` enabled on one of the failing
> machines, and see if that points to the right place? Otherwise we
> have to start to get into some fairly heavy ways to figure out what
> is going on.

OK, I'm now running it with --debug into separate log files, to compare
working and non-working runs. Unfortunately the hung Puppet doesn't seem
to reveal anything interesting in the logs. A working Puppet run looks
like this:

[..stuff..]
debug: Finishing transaction 70131030874760
debug: Loaded state in 0.01 seconds
info: Retrieving plugin
debug: file_metadata supports formats: b64_zlib_yaml marshal pson raw yaml; using pson
debug: Using cached certificate for ca
debug: Using cached certificate for mtsldrp118.sirca.org.au
debug: Using cached certificate_revocation_list for ca
debug: Finishing transaction 70131030519320
info: Loading facts in /var/lib/puppet/lib/facter/server_class.rb
[...more custom facts loading...]
debug: catalog supports formats: b64_zlib_yaml dot marshal pson raw yaml; using pson
debug: Puppet::Type::Package::ProviderRpm: Executing '/bin/rpm --version'
debug: Puppet::Type::Package::ProviderAptrpm: Executing '/bin/rpm -ql rpm'
debug: Puppet::Type::Package::ProviderYum: Executing '/bin/rpm --version'
[..etc..]

A broken Puppet run shows:

[..stuff..]
debug: /File[/var/lib/puppet/state]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/clientbucket]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/client_data]: Autorequiring File[/var/lib/puppet]
debug: Finishing transaction 69910666048880
debug: /File[/var/lib/puppet/lib]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/state]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/ssl/certs]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/ssl]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/ssl/private]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/facts]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/ssl/crl.pem]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/ssl/certs/ca.pem]: Autorequiring File[/var/lib/puppet/ssl/certs]
debug: Finishing transaction 69910666553940
debug: Using cached certificate for ca
debug: Using cached certificate for puppetclient.mydomain
debug: Finishing transaction 69910665891720
debug: Loaded state in 0.01 seconds
info: Retrieving plugin
debug: file_metadata supports formats: b64_zlib_yaml marshal pson raw yaml; using pson
debug: Using cached certificate for ca
debug: Using cached certificate for puppetclient.mydomain
debug: Using cached certificate_revocation_list for ca
debug: Finishing transaction 69910665535980

That's it. Nothing else in the output. Strace on the puppetd process
shows repetitions of what I pasted in an earlier email:

select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
gettimeofday({1328767567, 900875}, NULL) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
gettimeofday({1328767567, 901663}, NULL) = 0

Would appreciate any suggestions you have on this.

Regards,
Gonzalo
On Thu, Feb 9, 2012 at 5:08 PM, Gonzalo Servat <gservat@gmail.com> wrote:
>> Damn. Well, at least we eliminated one possible cause. Is there any
>> chance you can run with `--debug` enabled on one of the failing
>> machines, and see if that points to the right place? Otherwise we
>> have to start to get into some fairly heavy ways to figure out what
>> is going on.
>
> OK, I'm now running it with --debug into separate log files, to
> compare working and non-working runs. Unfortunately the hung Puppet
> doesn't seem to reveal anything interesting in the logs. A working
> Puppet run looks like this:

[..snip..]

Hi Daniel,

I'm seeing an increasing number of nodes with Puppet hangs; puppetd now
just hangs on 26 nodes. Any ideas on what I can try?

I've tried removing all the Puppet configuration for the hanging nodes,
but it doesn't help, so it looks like a client-side problem rather than
something in the catalog that gets applied to them.

The kernel on the hanging nodes is 2.6.32-131.17.1.el6.x86_64. It would
be nice if all the nodes running that kernel had this problem, but
unfortunately there are other nodes using the same kernel that are not
hanging.

Thanks in advance.
Gonzalo
On Thu, Feb 9, 2012 at 17:19, Gonzalo Servat <gservat@gmail.com> wrote:
> I'm seeing an increasing number of nodes with Puppet hangs; puppetd
> now just hangs on 26 nodes. Any ideas on what I can try?
>
> I've tried removing all the Puppet configuration for the hanging
> nodes, but it doesn't help, so it looks like a client-side problem
> rather than something in the catalog that gets applied to them.

Sorry for not getting back to this sooner. If you are running 2.7.10,
can you try removing the file
`puppet/util/instrumentation/listeners/process_name.rb` and see if that
fixes the problem? We have some reports that it can cause hangs, and
eliminating it will make sure this doesn't descend from there.

Otherwise we really have to start getting into more aggressive
debugging. Would you be comfortable doing some hacking / patching of
the code to narrow this down, and/or installing some development tools
on one of the nodes that triggers the hang?

--
Daniel Pittman ⎋ Puppet Labs Developer – http://puppetlabs.com
♲ Made with 100 percent post-consumer electrons
On Wed, Feb 15, 2012 at 11:02 AM, Daniel Pittman <daniel@puppetlabs.com> wrote:
> Sorry for not getting back to this sooner. If you are running 2.7.10,
> can you try removing the file
> `puppet/util/instrumentation/listeners/process_name.rb` and see if
> that fixes the problem?

No worries, Daniel. Yes, it did fix the problem, and I did actually
raise a bug on this (to avoid doubling up work):
http://projects.puppetlabs.com/issues/12588

- Gonzalo
On Tue, Feb 14, 2012 at 16:28, Gonzalo Servat <gservat@gmail.com> wrote:
> On Wed, Feb 15, 2012 at 11:02 AM, Daniel Pittman <daniel@puppetlabs.com>
> wrote:
>> Sorry for not getting back to this sooner. If you are running 2.7.10,
>> can you try removing the file
>> `puppet/util/instrumentation/listeners/process_name.rb` and see if
>> that fixes the problem?
>
> No worries, Daniel. Yes, it did fix the problem, and I did actually
> raise a bug on this (to avoid doubling up work):
> http://projects.puppetlabs.com/issues/12588

Oh, awesome. This is why the bug system works better than just email -
someone else noticed and fixed it up. :)

That code will be gone in the next release, and won't return until it is
better behaved.

--
Daniel Pittman ⎋ Puppet Labs Developer – http://puppetlabs.com
♲ Made with 100 percent post-consumer electrons