Chris
2010-Dec-14 08:24 UTC
[Puppet Users] puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Hi

I recently upgraded my puppet masters (and clients) from 0.24.8 to 2.6.4.

Previously, my busiest puppet master would hover around a 0.9 load average; after the upgrade, its load hovers around 5.

I am running Passenger and MySQL-based stored configs.

Checking my running processes, ruby (puppetmasterd) shoots up to 99% CPU and stays there for a few seconds before dropping again. Often there are four of these running simultaneously, pegging each core at 99% CPU.

It seems that there has been a serious performance regression between 0.24 and 2.6 for my configuration. I hope the following can help work out where...

I ran puppetmasterd through a profiler to find the root cause of this (http://boojum.homelinux.org/profile.svg). The main problem appears to be in /usr/lib/ruby/site_ruby/1.8/puppet/parser/ast/resource.rb, in the evaluate function.

I added a few timing commands around various sections of that function to get a breakdown of the time spent inside it. The two most expensive calls are

---
paramobjects = parameters.collect { |param|
  param.safeevaluate(scope)
}
---

and

---
resource_titles.flatten.collect { |resource_title|
  exceptwrap :type => Puppet::ParseError do
    resource = Puppet::Parser::Resource.new(
      fully_qualified_type, resource_title,
      :parameters => paramobjects,
      :file => self.file,
      :line => self.line,
      :exported => self.exported,
      :virtual => virt,
      :source => scope.source,
      :scope => scope,
      :strict => true
    )

    if resource.resource_type.is_a? Puppet::Resource::Type
      resource.resource_type.instantiate_resource(scope, resource)
    end
    scope.compiler.add_resource(scope, resource)
    scope.compiler.evaluate_classes([resource_title], scope, false) if fully_qualified_type == 'class'
    resource
  end
}.reject { |resource| resource.nil? }
---

Unfortunately, that is about the limit of my current Ruby skills. What else can be looked at to bring 2.6 back up to the performance of 0.24?
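Ruby's stdlib Benchmark module is one way to add that kind of section timing; a minimal sketch of the approach (not the exact instrumentation used above):

---
require 'benchmark'

# Inside Puppet::Parser::AST::Resource#evaluate, where `parameters` and
# `scope` are in scope. Declare the variable first so the assignment
# inside the block remains visible afterwards (Ruby 1.8 block scoping):
paramobjects = nil
elapsed = Benchmark.realtime do
  paramobjects = parameters.collect { |param| param.safeevaluate(scope) }
end
Puppet.notice "parameter evaluation took #{elapsed} seconds"
---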
Ken Barber
2010-Dec-14 23:40 UTC
[Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Hi Chris,

Sorry - I can't say I'm seeing this performance issue myself with my setup :-(. I'm not an expert on that part of the code.

Having said that, it's probably DSL parsing related (possibly a recursion somewhere)... I'd focus on your content, not just the Ruby, to see what part of the Puppet DSL is causing it. Strip your content right back and add bits back in slowly. I think that would make your report very useful if it turns out to be a bug, and perhaps you can find a workaround that way as well.

That's just my 2c. Good luck :-).

ken.

On Tuesday, December 14, 2010 8:24:55 AM UTC, Chris wrote:
> [...]
Nigel Kersten
2010-Dec-15 00:48 UTC
Re: [Puppet Users] puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Tue, Dec 14, 2010 at 12:24 AM, Chris <iwouldratherbesleepingnow@gmail.com> wrote:
> I recently upgraded my puppet masters (and clients) from 0.24.8 to 2.6.4.
>
> Previously, my busiest puppet master would hover around a 0.9 load
> average; after the upgrade, its load hovers around 5.
> [...]
> It seems that there has been a serious performance regression between
> 0.24 and 2.6 for my configuration.

Some useful info would be:

OS
OS version
Ruby version
Apache version/worker model
Passenger version

> [...]

--
Nigel Kersten - Puppet Labs - http://www.puppetlabs.com
Chris
2010-Dec-15 07:10 UTC
[Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
> Some useful info would be:
>
> OS
> OS version
> Ruby version
> Apache version/worker model
> Passenger version

CentOS 5.2
ruby-1.8.5-5.el5_3.7
httpd-2.2.3-31.el5.centos.2
rubygem-passenger-2.2.11-2el5.ecn
rubygem-rails-2.1.1-2.el5
rubygem-rack-1.1.0-1el5
Brice Figureau
2010-Dec-15 10:42 UTC
Re: [Puppet Users] puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Tue, 2010-12-14 at 00:24 -0800, Chris wrote:
> Checking my running processes, ruby (puppetmasterd) shoots up to 99%
> CPU and stays there for a few seconds before dropping again. Often
> there are four of these running simultaneously, pegging each core at
> 99% CPU.

I would say this is perfectly normal. Compiling the catalog is a hard and complex problem and requires CPU.

The difference between 0.24.8 and 2.6 (or 0.25, for that matter) is that some performance issues have been fixed. Those issues made the master mostly I/O bound under 0.24; in later versions it is mostly CPU bound.

Now compare the compilation time under 0.24.8 and 2.6 and you should see that it has dropped drastically (allowing more compilations to fit in the same amount of time). The other side of the coin is that your master now needs transient bursts of high CPU usage.

I don't really get what the issue is with using 100% of the CPU. You're paying about the same price when your CPU is busy as when it's idle, so that shouldn't make a difference :)

If it is an issue, reduce the concurrency of your setup (run fewer compilations in parallel, implement splay time, etc.).

> It seems that there has been a serious performance regression between
> 0.24 and 2.6 for my configuration.

I think it's the reverse that happened.

> I ran puppetmasterd through a profiler to find the root cause of this
> (http://boojum.homelinux.org/profile.svg). The main problem appears
> to be in /usr/lib/ruby/site_ruby/1.8/puppet/parser/ast/resource.rb, in
> the evaluate function.
> [...]

Yes, this is what the compiler does during compilation: evaluating resources and parameters. The more resources you use, the more time and CPU the compilation will take.
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!
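Splay time is an agent-side setting; a minimal sketch of the relevant puppet.conf entries, assuming 2.6 setting names (values are illustrative):

---
# /etc/puppet/puppet.conf on each agent
[agent]
    runinterval = 1800   # run every 30 minutes
    splay       = true   # sleep a random delay before each run...
    splaylimit  = 1800   # ...of up to this many seconds, spreading check-ins
---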
Chris
2010-Dec-15 13:28 UTC
[Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Dec 15, 12:42 pm, Brice Figureau <brice-pup...@daysofwonder.com> wrote:
> I would say this is perfectly normal. Compiling the catalog is a hard
> and complex problem and requires CPU.
>
> The difference between 0.24.8 and 2.6 (or 0.25, for that matter) is
> that some performance issues have been fixed. Those issues made the
> master mostly I/O bound under 0.24; in later versions it is mostly CPU
> bound.

If we were talking about CPU usage only, I would agree with you. But in this case the load average of the machine has gone up more than 5x, and a high load average indicates processes not getting enough runtime. To me that is an indication that 2.6 is performing worse than 0.24: previously, on average, all processes got enough runtime and did not have to wait for system resources; now processes are sitting in the run queue, waiting for a chance to run.

> I don't really get what the issue is with using 100% of the CPU.

That's not the issue, just an indication of what is causing it.

> You're paying about the same price when your CPU is busy as when it's
> idle, so that shouldn't make a difference :)

Generally true, but this is on a VM which is also running some of my RADIUS and proxy instances, amongst others.

> If it is an issue, reduce the concurrency of your setup (run fewer
> compilations in parallel, implement splay time, etc.).

Splay has been enabled since 0.24.

My Apache MaxClients is set to 15 to limit concurrency.
Trevor Vaughan
2010-Dec-15 14:12 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
What is your CPU-to-puppetmaster-instance ratio? I've had decent luck with 1 CPU to 2 puppetmaster processes, but not much above that.

If you need dedicated resources for other tasks, you may want to ensure that you don't have more masters spawning than you have processors.

Trevor

On Wed, Dec 15, 2010 at 8:28 AM, Chris <iwouldratherbesleepingnow@gmail.com> wrote:
> [...]

--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699
tvaughan@onyxpoint.com

-- This account not approved for unencrypted proprietary information --
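With Passenger, one way to enforce such a ratio is to cap the pool size in the Apache config; a minimal sketch assuming Passenger 2.x directive names (the numbers are illustrative, e.g. 2 masters per core on a 4-core box):

---
# Apache vhost for the puppet master
PassengerMaxPoolSize 8      # at most 8 puppetmasterd processes
PassengerPoolIdleTime 300   # reap idle masters after 5 minutes
---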
Brice Figureau
2010-Dec-15 17:15 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Wed, 2010-12-15 at 05:28 -0800, Chris wrote:
> If we were talking about CPU usage only, I would agree with you. But
> in this case the load average of the machine has gone up more than 5x,
> and a high load average indicates processes not getting enough
> runtime. To me that is an indication that 2.6 is performing worse than
> 0.24 [...]

Load is not necessarily an indication of a problem: it can also mean some tasks are waiting on I/O, not only on CPU. The only real issue under load is when service time goes beyond an acceptable value; otherwise you can't say whether it's bad or not. If you see some hosts reporting timeouts, that is an indication that service time is not good :)

BTW, do you run your MySQL storedconfig instance on the same server? You can activate thin_storeconfigs to reduce the load on the MySQL db.

> My Apache MaxClients is set to 15 to limit concurrency.

I think this is too many unless you have 8 cores. As Trevor said in another e-mail in this thread, 2 puppetmasters per core is best.

Now it all depends on your number of nodes and sleep time. I suggest you use ext/puppet-load to find your setup's real concurrency.
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!
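thin_storeconfigs is a master-side puppet.conf setting; a minimal sketch, assuming the 2.6 setting name:

---
# /etc/puppet/puppet.conf on the master
[master]
    storeconfigs      = true
    thin_storeconfigs = true   # store only facts and exported resources,
                               # not whole catalogs, lightening the DB load
---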
Ashley Penney
2010-Dec-15 18:27 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
This issue is definitely a problem. I have a support ticket in with Puppet Labs about the same thing. My CPU remains at 100% almost constantly, and it slows things down significantly. If you strace it, you can see that very little appears to be going on. This is absolutely not normal behavior. Even when I had one client checking in, all cores were fully used.

On Wed, Dec 15, 2010 at 12:15 PM, Brice Figureau <brice-puppet@daysofwonder.com> wrote:
> [...]
Disconnect
2010-Dec-15 18:35 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
"me too". All the logs show nice quick compilations but the actual wall clock to get anything done is HUGE. Dec 15 13:10:29 puppet puppet-master[31406]: Compiled catalog for puppet.foo.com in environment production in 21.52 seconds Dec 15 13:10:51 puppet puppet-agent[8251]: Caching catalog for puppet.foo.com That was almost 30 minutes ago. Since then, it has sat there doing nothing... $ sudo strace -p 8251 Process 8251 attached - interrupt to quit select(7, [6], [], [], {866, 578560} lsof shows: puppetd 8251 root 6u IPv4 11016045 0t0 TCP puppet.foo.com:33065->puppet.foo.com:8140 (ESTABLISHED) On Wed, Dec 15, 2010 at 1:27 PM, Ashley Penney <apenney@gmail.com> wrote:> This issue is definitely a problem. I have a support ticket in with Puppet > Labs about the same thing. My CPU remains at 100% almost constantly and it > slows things down significantly. If you strace it you can see that very > little appears to be going on. This is absolutely not normal behavior. > Even when I had 1 client checking in I had all cores fully used. > > > On Wed, Dec 15, 2010 at 12:15 PM, Brice Figureau < > brice-puppet@daysofwonder.com> wrote: > >> On Wed, 2010-12-15 at 05:28 -0800, Chris wrote: >> > >> > On Dec 15, 12:42 pm, Brice Figureau <brice-pup...@daysofwonder.com> >> > wrote: >> > > On Tue, 2010-12-14 at 00:24 -0800, Chris wrote: >> > > > Hi >> > > >> > > > I recently upgraded my puppet masters (and clients) from 0.24.8 to >> > > > 2.6.4 >> > > >> > > > Previously, my most busy puppet master would hover around about 0.9 >> > > > load average, after the upgrade, its load hovers around 5 >> > > >> > > > I am running passenger and mysql based stored configs. >> > > >> > > > Checking my running processes, ruby (puppetmasterd) shoots up to 99% >> > > > cpu load and stays there for a few seconds before dropping again. >> > > > Often there are 4 of these running simultaneously, pegging each core >> > > > at 99% cpu. >> > > >> > > I would say it is perfectly normal. Compiling the catalog is a hard >> and >> > > complex problem and requires CPU. >> > > >> > > The difference between 0.24.8 and 2.6 (or 0.25 for what matters) is >> that >> > > some performance issues have been fixed. Those issues made the master >> be >> > > more I/O bound under 0.24, but now mostly CPU bound in later versions. >> > >> > If we were talking about only cpu usage, I would agree with you. But >> > in this case, the load average of the machine has gone up over 5x. >> > And as high load average indicates processes not getting enough >> > runtime, in this case it is an indication to me that 2.6 is performing >> > worse than 0.24 (previously, on average, all processes got enough >> > runtime and did not have to wait for system resources, now processes >> > are sitting in the run queue, waiting to get a chance to run) >> >> Load is not necessarily an indication of an issue. It can also mean some >> tasks are waiting for I/O not only CPU. >> The only real issue under load is if service time is beyond an >> admissible value, otherwise you can''t say it''s bad or not. >> If you see some hosts reporting timeouts, then it''s an indication that >> service time is not good :) >> >> BTW, do you run your mysql storedconfig instance on the same server? >> You can activate thin_storeconfigs to reduce the load on the mysql db. >> >> > > >> > > I don''t really get what is the issue about using 100% of CPU? 
>> > Thats not the issue, just an indication of what is causing it >> > >> > > >> > > You''re paying about the same price when your CPU is used and when it''s >> > > idle, so that shouldn''t make a difference :) >> > Generally true, but this is a on VM which is also running some of my >> > radius and proxy instances, amongst others. >> > >> > > >> > > If that''s an issue, reduce the concurrency of your setup (run less >> > > compilation in parallel, implement splay time, etc...). >> > splay has been enabled since 0.24 >> > >> > My apache maxclients is set to 15 to limit concurrency. >> >> I think this is too many except if you have 8 cores. As Trevor said in >> another e-mail in this thread, 2PM/Core is the best. >> >> Now it all depends on your number of nodes and sleeptime. I suggest you >> use ext/puppet-load to find your setup real concurrency. >> -- >> Brice Figureau >> Follow the latest Puppet Community evolutions on www.planetpuppet.org! >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Puppet Users" group. >> To post to this group, send email to puppet-users@googlegroups.com. >> To unsubscribe from this group, send email to >> puppet-users+unsubscribe@googlegroups.com<puppet-users%2Bunsubscribe@googlegroups.com> >> . >> For more options, visit this group at >> http://groups.google.com/group/puppet-users?hl=en. >> >> > -- > You received this message because you are subscribed to the Google Groups > "Puppet Users" group. > To post to this group, send email to puppet-users@googlegroups.com. > To unsubscribe from this group, send email to > puppet-users+unsubscribe@googlegroups.com<puppet-users%2Bunsubscribe@googlegroups.com> > . > For more options, visit this group at > http://groups.google.com/group/puppet-users?hl=en. >-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Disconnect
2010-Dec-15 18:38 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
One addendum - the box is absolutely not I/O or CPU bound:

Cpu(s): 83.0%us, 13.1%sy, 0.0%ni, 2.5%id, 0.0%wa, 0.1%hi, 1.3%si, 0.0%st

(64-bit KVM VM with 6 3.5GHz amd64 CPUs, on an LVM partition - raw disk - with 5G of RAM but only 3G in use. PLENTY of power, and monitoring supports that.)

On Wed, Dec 15, 2010 at 1:35 PM, Disconnect <dc.disconnect@gmail.com> wrote:
> [...]
Brice Figureau
2010-Dec-15 19:14 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On 15/12/10 19:35, Disconnect wrote:
> "Me too." All the logs show nice quick compilations, but the actual
> wall-clock time to get anything done is HUGE.
>
> Dec 15 13:10:29 puppet puppet-master[31406]: Compiled catalog for
> puppet.foo.com in environment production in 21.52 seconds

This looks long.

> Dec 15 13:10:51 puppet puppet-agent[8251]: Caching catalog for puppet.foo.com
>
> That was almost 30 minutes ago. Since then, it has sat there doing nothing...
>
> $ sudo strace -p 8251
> Process 8251 attached - interrupt to quit
> select(7, [6], [], [], {866, 578560}
>
> lsof shows:
> puppetd 8251 root 6u IPv4 11016045 0t0 TCP puppet.foo.com:33065->puppet.foo.com:8140 (ESTABLISHED)

Note: we were talking about the puppet master taking 100% CPU, but you're apparently looking at the puppet agent, which is a different story.
--
Brice Figureau
My Blog: http://www.masterzen.fr/
Brice Figureau
2010-Dec-15 19:15 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On 15/12/10 19:27, Ashley Penney wrote:
> This issue is definitely a problem. I have a support ticket in with
> Puppet Labs about the same thing. My CPU remains at 100% almost
> constantly, and it slows things down significantly. If you strace it,
> you can see that very little appears to be going on. This is absolutely
> not normal behavior. Even when I had one client checking in, all cores
> were fully used.

I do agree that this is not the correct behavior. I suggest you strace the master, or use any other Ruby introspection technique, to find what part of it is taking the CPU.
--
Brice Figureau
My Blog: http://www.masterzen.fr/
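A minimal strace invocation along those lines, attaching to one of the busy master processes (the PID is illustrative):

---
# Summarise which syscalls dominate, then detach with Ctrl-C:
sudo strace -c -p 31392
# Or log timestamped calls to a file for later inspection:
sudo strace -tt -p 31392 -o /tmp/master.strace
---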
Disconnect
2010-Dec-15 19:24 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Wed, Dec 15, 2010 at 2:14 PM, Brice Figureau <brice-puppet@daysofwonder.com> wrote:
> Note: we were talking about the puppet master taking 100% CPU, but
> you're apparently looking at the puppet agent, which is a different story.

The agent isn't taking CPU; it is hanging, waiting for the master to do anything. (The run I quoted earlier eventually ended with a timeout.) The master has pegged the CPUs, and it seems to be related to file resources:

$ ps auxw | grep master
puppet 31392 74.4 4.7 361720 244348 ? R 10:42 162:06 Rack: /usr/share/puppet/rack/puppetmasterd
puppet 31396 70.0 4.9 369524 250200 ? R 10:42 152:32 Rack: /usr/share/puppet/rack/puppetmasterd
puppet 31398 66.2 3.9 318828 199472 ? R 10:42 144:10 Rack: /usr/share/puppet/rack/puppetmasterd
puppet 31400 66.6 4.9 369992 250588 ? R 10:42 145:04 Rack: /usr/share/puppet/rack/puppetmasterd
puppet 31406 68.6 3.9 318292 200992 ? R 10:42 149:31 Rack: /usr/share/puppet/rack/puppetmasterd
puppet 31414 67.0 2.4 243800 124476 ? R 10:42 146:00 Rack: /usr/share/puppet/rack/puppetmasterd

Dec 15 13:42:23 puppet puppet-master[31406]: Compiled catalog for puppet.foo.com in environment production in 30.83 seconds
Dec 15 13:42:49 puppet puppet-agent[10515]: Caching catalog for puppet.foo.com
Dec 15 14:00:18 puppet puppet-agent[10515]: Applying configuration version '1292438512'
...
Dec 15 14:14:56 puppet puppet-agent[10515]: Finished catalog run in 882.43 seconds

Changes:
    Total: 6
Events:
    Success: 6
    Total: 6
Resources:
    Changed: 6
    Out of sync: 6
    Total: 287
Time:
    Config retrieval: 72.20
    Cron: 0.05
    Exec: 32.42
    File: 752.33
    Filebucket: 0.00
    Mount: 0.98
    Package: 6.13
    Schedule: 0.02
    Service: 9.09
    Ssh authorized key: 0.07
    Sysctl: 0.00

real 34m56.066s
user 1m6.030s
sys 0m26.590s
Brice Figureau
2010-Dec-15 19:43 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On 15/12/10 20:24, Disconnect wrote:
> The agent isn't taking CPU; it is hanging, waiting for the master to
> do anything. (The run I quoted earlier eventually ended with a
> timeout.) The master has pegged the CPUs, and it seems to be related
> to file resources:

Oh, I see.

> $ ps auxw | grep master
> puppet 31392 74.4 4.7 361720 244348 ? R 10:42 162:06 Rack: /usr/share/puppet/rack/puppetmasterd
> [...]

Note that they're all in the running state. That means there are none left to serve file content if they are all busy for several seconds (in our case around 20) compiling catalogs.

> Dec 15 13:42:23 puppet puppet-master[31406]: Compiled catalog for
> puppet.foo.com in environment production in 30.83 seconds
> [...]
> Resources:
>     Total: 287

That's not a big number.

> Time:
>     Config retrieval: 72.20

This is also suspect.

>     File: 752.33

Indeed.

That just means your masters are so busy serving catalogs that they barely have time to serve files. One possibility is to offload file content (see one of my blog posts about this: http://www.masterzen.fr/2010/03/21/more-puppet-offloading/).

How many nodes are you compiling at the same time? Apparently you have 6 master processes running at high CPU usage.

As I said earlier, I really advise people to try puppet-load (which can be found in the ext/ directory of the source tarball since puppet 2.6) to exercise load against a master. This will help you find your actual concurrency.

But if it's a bug, could this be an issue with Passenger?
--
Brice Figureau
My Blog: http://www.masterzen.fr/
Disconnect
2010-Dec-15 20:10 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
As a datapoint, this exact config (with mongrel_cluster) was working great under 0.25.x, with fewer, slower CPUs, slower storage (VM image files) and 2G of RAM...

I gave puppet-load a try, but it is throwing errors that I don't have time to dig into today:

debug: reading facts from: puppet.foo.com.yaml
/var/lib/gems/1.8/gems/em-http-request-0.2.15/lib/em-http/request.rb:72:in `send_request': uninitialized constant EventMachine::ConnectionError (NameError)
        from /var/lib/gems/1.8/gems/em-http-request-0.2.15/lib/em-http/request.rb:59:in `setup_request'
        from /var/lib/gems/1.8/gems/em-http-request-0.2.15/lib/em-http/request.rb:49:in `get'
        from ./puppet-load.rb:272:in `spawn_request'
        from ./puppet-load.rb:334:in `spawn'

Running about 250 nodes, every 30 minutes.

On Wed, Dec 15, 2010 at 2:43 PM, Brice Figureau <brice-puppet@daysofwonder.com> wrote:
> [...]
Brice Figureau
2010-Dec-15 21:45 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On 15/12/10 21:10, Disconnect wrote:
> As a datapoint, this exact config (with mongrel_cluster) was working
> great under 0.25.x, with fewer, slower CPUs, slower storage (VM image
> files) and 2G of RAM...

So I ask it again: could it be a problem with Passenger rather than an issue with Puppet itself?

It would really be interesting to use some Ruby introspection[1] to find exactly where the CPU time is spent in those masters. Could it be that under Passenger everything is reparsed instead of just compiled? (I simply don't know, just throwing out some ideas.)

I myself use nginx + mongrel, but have only a dozen nodes, so I don't really qualify.

> I gave puppet-load a try, but it is throwing errors that I don't have
> time to dig into today:
> [...] uninitialized constant EventMachine::ConnectionError (NameError)

Could it be that you're missing EventMachine?

> Running about 250 nodes, every 30 minutes.

Did you try to use mongrel? Do you use splay time?

Just some math (which might be totally wrong), to give an idea of how I think we can compute the optimal scaling case:

With 250 nodes and a sleep time of 30 minutes, we need to handle 250 compiles in every 30-minute span. If we assume a concurrency of 2, with all nodes evenly spaced in time, each master process must compile 125 nodes in 30 minutes. If each compilation takes about 10s, that comes to 1250s, or roughly 20 minutes, so you have some room for growth :)

During those 20 minutes your 2 master processes will consume 100% CPU. Since the CPU is busy for only about two thirds of the 30-minute span, you'll consume roughly 66% of all your available CPU...

Hope that helps,

[1]: http://projects.puppetlabs.com/projects/1/wiki/Puppet_Introspection
--
Brice Figureau
My Blog: http://www.masterzen.fr/
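The same back-of-the-envelope estimate as a tiny Ruby sketch (all numbers illustrative):

---
nodes        = 250      # agents checking in
interval     = 30 * 60  # seconds between runs for each agent
concurrency  = 2        # master processes compiling in parallel
compile_time = 10       # seconds per catalog compilation

# Seconds of compile work each master process must do per interval:
busy = (nodes / concurrency.to_f) * compile_time
puts "each process is busy #{(busy / 60).round} of every #{interval / 60} minutes"
puts "CPU utilisation: #{(100 * busy / interval).round}%"
# => each process is busy 21 of every 30 minutes
# => CPU utilisation: 69%
---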
Ashley Penney
2010-Dec-16 00:47 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Just to reply to this - like I said earlier, I can hit this problem with one node checking in against the puppetmaster. All the puppetmasterd processes use maximum CPU. It's not a scaling issue, considering that serving one node is certainly not going to max out a newish physical server.

On Wed, Dec 15, 2010 at 4:45 PM, Brice Figureau <brice-puppet@daysofwonder.com> wrote:
> [...]
Nigel Kersten
2010-Dec-16 01:25 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Wed, Dec 15, 2010 at 4:47 PM, Ashley Penney <apenney@gmail.com> wrote:
> Just to reply to this - like I said earlier, I can hit this problem
> with one node checking in against the puppetmaster. All the
> puppetmasterd processes use maximum CPU. [...]

That is definitely a problem.

Does this happen as soon as a node checks in? Or as soon as you start the Passenger processes?

Can you post a sanitized strace somewhere?

--
Nigel Kersten - Puppet Labs - http://www.puppetlabs.com
Brice Figureau
2010-Dec-16 09:25 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Wed, 2010-12-15 at 19:47 -0500, Ashley Penney wrote:
> Just to reply to this - like I said earlier I can get this problem
> with 1 node checking in against puppetmaster. All the puppetmasterd
> processes use maximum CPU. It's not a scaling issue, considering that
> serving one node is certainly not going to max out a newish physical
> server.

This looks like a bug to me.

Do your manifests use many file sources? And/or recursive file resources? It's possible that those masters are spending their time checksumming files.

Like I said earlier in the thread, the only real way to know is to use Puppet introspection:
http://projects.puppetlabs.com/projects/1/wiki/Puppet_Introspection
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!
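To see why recursive file resources can hurt, remember the master may end up checksumming every file under a mount when serving it. A stand-alone Ruby illustration of that per-file cost (a sketch, not the master's actual code path; the directory is a placeholder):

---
require 'digest/md5'
require 'find'

mount = '/etc/puppet/files'  # placeholder path
count = 0
Find.find(mount) do |path|
  next unless File.file?(path)
  Digest::MD5.hexdigest(File.read(path))  # the per-file work that adds up
  count += 1
end
puts "checksummed #{count} files"
---

Run against a large mount, this makes it obvious how recursive file serving can keep a master busy even for a single node.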
Leonid Batizhevsky
2010-Dec-16 12:49 UTC
Re: [Puppet Users] puppetmaster 100%cpu usage on 2.6 (not on 0.24)
I had the same issue running puppetmaster and puppetd on the same host. When I updated Ruby to 1.8.7 Enterprise, it resolved the problem for me.

Leonid S. Batizhevsky

On Tue, Dec 14, 2010 at 11:24, Chris <iwouldratherbesleepingnow@gmail.com> wrote:
> Hi
>
> I recently upgraded my puppet masters (and clients) from 0.24.8 to
> 2.6.4
>
> [...]
Nigel Kersten
2010-Dec-17 00:03 UTC
Re: [Puppet Users] puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Thu, Dec 16, 2010 at 4:49 AM, Leonid Batizhevsky <the.leonko@gmail.com> wrote:
> I had the same issue running puppetmaster and puppetd on the same
> host. When I updated Ruby to 1.8.7 Enterprise, it resolved the problem
> for me.
> Leonid S. Batizhevsky

For the sake of the archives, what version did you upgrade *from*, Leonid?

> On Tue, Dec 14, 2010 at 11:24, Chris
> <iwouldratherbesleepingnow@gmail.com> wrote:
>> [...]

--
Nigel Kersten - Puppet Labs - http://www.puppetlabs.com
Leonid Batizhevsky
2010-Dec-17 16:27 UTC
Re: [Puppet Users] puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Ruby or Puppet? I started with 0.25.x (from the EPEL repo, and not for long) and Ruby 1.8.5. Then I updated to 2.6.0 and saw memory problems. I started to google and found:
http://projects.puppetlabs.com/projects/1/wiki/Puppet_Red_Hat_Centos
"The 1.8.5 branch of Ruby shipped with RHEL5 can exhibit memory leaks."
I upgraded to Ruby Enterprise 1.8.7 and it solved my problems!

Leonid S. Batizhevsky

On Fri, Dec 17, 2010 at 03:03, Nigel Kersten <nigel@puppetlabs.com> wrote:
> For the sake of the archives, what version did you upgrade *from*, Leonid?
Ashley Penney
2010-Dec-17 16:39 UTC
Re: [Puppet Users] puppetmaster 100%cpu usage on 2.6 (not on 0.24)
As a datapoint, I experience this problem on RHEL6:

ruby-1.8.7.299-4.el6.x86_64

Gems:
passenger (3.0.0)
rack (1.2.1)
rack-mount (0.6.13)
rack-test (0.5.6)
rails (3.0.3)

On Fri, Dec 17, 2010 at 11:27 AM, Leonid Batizhevsky <the.leonko@gmail.com> wrote:
> Ruby or Puppet?
> I started with 0.25.x (from the EPEL repo, and not for long) and Ruby 1.8.5.
> Then I updated to 2.6.0 and saw memory problems.
> [...]
Leonid Batizhevsky
2011-Jan-08 22:51 UTC
Re: [Puppet Users] puppetmaster 100%cpu usage on 2.6 (not on 0.24)
No, I have not. Maybe try playing with the passenger workers' time-to-live?

Leonid S. Batizhevsky

On Fri, Dec 17, 2010 at 19:39, Ashley Penney <apenney@gmail.com> wrote:
> As a datapoint, I experience this problem on RHEL6:
> ruby-1.8.7.299-4.el6.x86_64
> Gems:
> passenger (3.0.0)
> rack (1.2.1)
> rack-mount (0.6.13)
> rack-test (0.5.6)
> rails (3.0.3)
> [...]
Micah Anderson
2011-Jan-25 22:11 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Brice Figureau <brice-puppet@daysofwonder.com> writes:

> On 15/12/10 19:27, Ashley Penney wrote:
>> This issue is definitely a problem. I have a support ticket in with
>> Puppet Labs about the same thing. My CPU remains at 100% almost
>> constantly and it slows things down significantly. If you strace it you
>> can see that very little appears to be going on. This is absolutely not
>> normal behavior. Even when I had 1 client checking in I had all cores
>> fully used.
>
> I do agree that it's not the correct behavior. I suggest you strace
> or use any other ruby introspection technique to find what part of the
> master is taking CPU.

I'm having a similar problem with 2.6.3. At this point, I can't get reliable puppet runs, and I'm not sure what to do.

What seems to happen is that things are working fine at the beginning. Catalog compiles peg the CPU for the puppet process that is doing them and take anywhere between 20 and 75 seconds. Then things get drastically worse after 4 compiles (note: I have four mongrels too, coincidence?); catalog compiles shoot up to 115, 165, 209, 268, 273, 341, 418, 546, 692, 774, 822, then 1149 seconds... then things are really hosed. Sometimes hosts will fail outright and complain about weird things, like:

Jan 25 14:04:34 puppetmaster puppet-master[30294]: Host is missing hostname and/or domain: gull.example.com
Jan 25 14:04:55 puppetmaster puppet-master[30294]: Failed to parse template site-apt/local.list: Could not find value for 'lsbdistcodename' at /etc/puppet/modules/site-apt/manifests/init.pp:4 on node gull.example.com

All four of my mongrels are constantly pegged, doing 40-50% of the CPU each, occupying all available CPUs. They never settle down. I've got 74 nodes checking in now; it doesn't seem like that many, but perhaps I've reached a tipping point with my puppetmaster (it's a dual 1GHz, 2GiB of RAM machine)?

I've tried a large number of different things to attempt to work around this:

0. reduced my node check-in times to be once an hour (and splayed randomly)

1. turned on puppetqd/stomp queuing

This didn't seem to make a difference; it's off now.

2. turned on thin stored configs

This sort of helped a little, but not enough.

3. tried to upgrade rails from 2.3.5 (the debian version) to 2.3.10

I didn't see any appreciable difference here. I ended up going back to 2.3.5 because that was the packaged version.

4. tried to offload file content via nginx[1]

This maybe helped a little, but it's clear that the problem isn't the fileserving; it seems to be something in the catalog compilation.

5. tried to cache catalogs by adding an http front-end cache and expiring that cache when manifests are updated[1]

I'm not sure this works at all.

6. set 'fair' queuing in my nginx.conf[3]

This seemed to help for a few days, but then things got bad again.

7. set --http_compression

I'm not sure if this actually hurts the master or not (because it has to now occupy the CPU compressing catalogs?)

8. tried to follow the introspection technique[2]
This wasn't so easy to do; I had to operate really fast, because if I was too slow the thread would exit, or it would get hung up on:

[Thread 0xb6194b70 (LWP 25770) exited]
[New Thread 0xb6194b70 (LWP 25806)]

Eventually I did manage to get somewhere:

0xb74f1b16 in memcpy () from /lib/i686/cmov/libc.so.6
(gdb) session-ruby
(gdb) redirect_stdout
$1 = 2
(gdb)
$2 = 2
(gdb) eval "caller"
$3 = 3
(gdb) rb_object_counts
Cannot get thread event message: debugger service failed
An error occurred while in a function called from GDB.
Evaluation of the expression containing the function
(rb_eval_string_protect) will be abandoned.
When the function is done executing, GDB will silently stop.
(gdb) eval "total = \[\[ObjectSpace\]\].each_object(Array)\{\|x\| puts '---'; puts x.inspect \}; puts \\"---\\nTotal Arrays: \#{total}\\""
Invalid character '\' in expression.

... then nothing.

In the tail:

root@puppetmaster:/tmp# tail -f ruby-debug.28724
207 Puppet::Util::LoadedFile["/usr/lib/ruby/1.8/active_record/base.rb:2746:in `attributes='", "/usr/lib/ruby/1.8/active_record/base.rb:2742:in `each'", "/usr/lib/ruby/1.8/active_record/base.rb:2742:in `attributes='", "/usr/lib/ruby/1.8/active_record/base.rb:2438:in `initialize'", "/usr/lib/ruby/1.8/active_record/reflection.rb:162:in `new'", "/usr/lib/ruby/1.8/active_record/reflection.rb:162:in `build_association'", "/usr/lib/ruby/1.8/active_record/associations/association_collection.rb:423:in `build_record'", "/usr/lib/ruby/1.8/active_record/associations/association_collection.rb:102:in `build'", "/usr/lib/ruby/1.8/puppet/rails/host.rb:145:in `merge_facts'", "/usr/lib/ruby/1.8/puppet/rails/host.rb:144:in `each'", "/usr/lib/ruby/1.8/puppet/rails/host.rb:144:in `merge_facts'", "/usr/lib/ruby/1.8/puppet/rails/host.rb:140:in `each'", "/usr/lib/ruby/1.8/puppet/rails/host.rb:140:in `merge_facts'", "/usr/lib/ruby/1.8/puppet/indirector/facts/active_record.rb:32:in `save'", "/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:256:in `save'", "/usr/lib/ruby/1.8/puppet/node/facts.rb:15:in `save'", "/usr/lib/ruby/1.8/puppet/indirector.rb:64:in `save'", "/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:25:in `extract_facts_from_request'", "/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:30:in `find'", "/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:193:in `find'", "/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'", "/usr/lib/ruby/1.8/puppet/network/http/handler.rb:101:in `do_find'", "/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `send'", "/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `process'", "/usr/lib/ruby/1.8/mongrel.rb:159:in `process_client'", "/usr/lib/ruby/1.8/mongrel.rb:158:in `each'", "/usr/lib/ruby/1.8/mongrel.rb:158:in `process_client'", "/usr/lib/ruby/1.8/mongrel.rb:285:in `run'", "/usr/lib/ruby/1.8/mongrel.rb:285:in `initialize'", "/usr/lib/ruby/1.8/mongrel.rb:285:in `new'", "/usr/lib/ruby/1.8/mongrel.rb:285:in `run'", "/usr/lib/ruby/1.8/mongrel.rb:268:in `initialize'", "/usr/lib/ruby/1.8/mongrel.rb:268:in `new'", "/usr/lib/ruby/1.8/mongrel.rb:268:in `run'", "/usr/lib/ruby/1.8/puppet/network/http/mongrel.rb:22:in `listen'", "/usr/lib/ruby/1.8/puppet/network/server.rb:127:in `listen'", "/usr/lib/ruby/1.8/puppet/network/server.rb:142:in `start'", "/usr/lib/ruby/1.8/puppet/daemon.rb:124:in `start'", "/usr/lib/ruby/1.8/puppet/application/master.rb:114:in `main'", "/usr/lib/ruby/1.8/puppet/application/master.rb:46:in `run_command'", "/usr/lib/ruby/1.8/puppet/application.rb:287:in `run'", "/usr/lib/ruby/1.8/puppet/application.rb:393:in `exit_on_fail'", "/usr/lib/ruby/1.8/puppet/application.rb:287:in `run'", "/usr/lib/ruby/1.8/puppet/util/command_line.rb:55:in `execute'", "/usr/bin/puppet:4"]

190 Puppet::Parser::AST::CaseStatement
181 ZAML::Label
170 Puppet::Parser::AST::Default
152 ActiveRecord::DynamicFinderMatch
152 ActiveRecord::DynamicScopeMatch
150 ActiveSupport::OrderedHash
148 OptionParser::Switch::RequiredArgument
138 YAML::Syck::Node
125 Range
124 Puppet::Parser::AST::IfStatement
117 ActiveRecord::Errors
115 Puppet::Provider::Confine::Exists
109 Puppet::Parser::AST::Selector
108 UnboundMethod
107 File::Stat
99 Puppet::Parameter::Value
90 Bignum
86 OptionParser::Switch::NoArgument
85 Puppet::Util::Settings::Setting
80 Puppet::Indirector::Request
75 Puppet::Parser::AST::ComparisonOperator
74 Puppet::Parser::Lexer::Token
73 Puppet::Parser::AST::ResourceOverride
70 ActiveRecord::ConnectionAdapters::MysqlColumn
66 Sync
65 StringIO
64 Binding
62 ActiveSupport::Callbacks::Callback
61 Puppet::Util::Settings::FileSetting
58 Puppet::Provider::ConfineCollection
56 Mysql::Result
52 Puppet::Module
47 Puppet::Network::AuthStore::Declaration
46 IPAddr
39 Puppet::Util::Settings::BooleanSetting
38 Thread
36 Puppet::Util::Autoload
35 Mysql
35 ActiveRecord::ConnectionAdapters::MysqlAdapter
34 Puppet::Parser::AST::Not
28 Puppet::Type::MetaParamLoglevel
28 Puppet::Type::File
28 Puppet::Type::File::ParameterPurge
28 Puppet::Type::File::ParameterLinks
28 Puppet::Type::File::Ensure
28 Puppet::Type::File::ParameterBackup
28 Puppet::Type::File::ParameterReplace
28 Puppet::Type::File::ParameterProvider
28 Puppet::Type::File::ParameterPath
28 Puppet::Type::File::ProviderPosix
28 Puppet::Type::File::ParameterChecksum

but then it seemed to stop logging entirely...

I'm available on IRC to try more advanced debugging, just ping me (hacim). I'd really like things to function again!

micah

1. http://www.masterzen.fr/2010/03/21/more-puppet-offloading/
2. http://projects.puppetlabs.com/projects/1/wiki/Puppet_Introspection
3. http://www.mail-archive.com/puppet-users@googlegroups.com/msg13692.html
Felix Frank
2011-Jan-26 09:21 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
> What seems to happen is that things are working fine at the
> beginning. Catalog compiles peg the CPU for the puppet process that is
> doing them and take anywhere between 20 and 75 seconds. Then things
> get drastically worse after 4 compiles (note: I have four mongrels
> too, coincidence?); catalog compiles shoot up to 115, 165, 209, 268,
> 273, 341, 418, 546, 692, 774, 822, then 1149 seconds... then things
> are really hosed. Sometimes hosts will fail outright and complain
> about weird things, like:
>
> Jan 25 14:04:34 puppetmaster puppet-master[30294]: Host is missing hostname and/or domain: gull.example.com
> Jan 25 14:04:55 puppetmaster puppet-master[30294]: Failed to parse template site-apt/local.list: Could not find value for 'lsbdistcodename' at /etc/puppet/modules/site-apt/manifests/init.pp:4 on node gull.example.com
>
> All four of my mongrels are constantly pegged, doing 40-50% of the CPU
> each, occupying all available CPUs. They never settle down. I've got 74
> nodes checking in now; it doesn't seem like that many, but perhaps
> I've reached a tipping point with my puppetmaster (it's a dual 1GHz,
> 2GiB of RAM machine)?

Hmm, some quick math: you have 74 nodes that (I assume) check in at least once every 1800 seconds. Each compile takes above 40 seconds on average, so all compiles (if run serially) take some 3000 seconds. Of course, seeing as you have two cores on that machine, you can take advantage of some concurrency, but in the ideal case you're down to 1500 seconds, which leaves you with little room to breathe (and in real life, concurrency will not be that efficient).

I propose you restructure your manifests so that they compile faster (if at all possible) or scale up your master. What you're watching is probably just overload and resource thrashing.

Do you have any idea why each individual compilation takes that long? I see a 15-20 second compile now and again, but most compiles are under 3 seconds in my case (but then, that's with 4 2.4GHz cores, so it doesn't necessarily compare).

Regards,
Felix
Brice Figureau
2011-Jan-26 10:13 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Tue, 2011-01-25 at 17:11 -0500, Micah Anderson wrote:
> Brice Figureau <brice-puppet@daysofwonder.com> writes:
>
> All four of my mongrels are constantly pegged, doing 40-50% of the CPU
> each, occupying all available CPUs. They never settle down. I've got 74
> nodes checking in now; it doesn't seem like that many, but perhaps
> I've reached a tipping point with my puppetmaster (it's a dual 1GHz,
> 2GiB of RAM machine)?

The puppetmaster is mostly CPU bound. Since you have only 2 CPUs, you shouldn't try to achieve a concurrency of 4 (which your mongrels are trying to do), otherwise more than one request will be accepted by one mongrel process and each thread will contend for the CPU. The bad news is that the ruby MRI uses green threading, so the second thread will only run when the first one either sleeps, does I/O, or relinquishes the CPU voluntarily. In other words, it will only run when the first thread has finished its compilation.

Now you have 74 nodes, with a worst-case compilation time of 75s (which is a lot); that translates to 74*75 = 5550s of compilation time. With a concurrency of 2, that's still 2775s of compilation time per round of <insert here your default sleep time>. With the default 30min of sleep time and assuming perfect scheduling, that's larger than a round of sleep time, which means you won't ever finish compiling nodes before the first node asks again for a catalog.

And I'm talking only about compilation. If your manifests use file sourcing, you must also add this to the equation.

Another explanation of the issue is swapping. You mention your server has 2GiB of RAM. Are you sure your 4 mongrel processes after some time still fit in the physical RAM (along with the other things running on the server)? Maybe your server is constantly swapping.
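The green-threading limitation described above is easy to demonstrate: on MRI 1.8, two CPU-bound threads take about as long as running the work twice serially (an illustrative sketch added for this writeup, not part of the original message):

---
require 'benchmark'

work = lambda { 3_000_000.times { } }

serial = Benchmark.realtime { 2.times { work.call } }
threaded = Benchmark.realtime do
  threads = (1..2).map { Thread.new { work.call } }
  threads.each { |t| t.join }
end

# On green threads both figures come out about the same:
# the two threads never run on two cores at once.
printf("serial: %.1fs  threaded: %.1fs\n", serial, threaded)
---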
So you can do several things to get better performance:

* reduce the number of nodes that check in at a single time (i.e. increase sleep time)
* reduce the time it takes to compile a catalog:
  + which includes not using storeconfigs (or using puppetqd or thin_storeconfigs instead)
  + check the server is not swapping
  + reduce the number of mongrel instances, to artificially reduce the concurrency (this is counter-intuitive, I know)
  + use a "better" ruby interpreter like Ruby Enterprise Edition (for several reasons this one has a better GC and a better memory footprint)
  + cache compiled catalogs in nginx
  + offload file content serving in nginx
  + use passenger instead of mongrel

Note: you can use puppet-load (in the 2.6 source distribution) to simulate concurrent nodes asking for catalogs. This is really helpful to size a puppetmaster and check the real concurrency a stack/hardware can give.

> I've tried a large number of different things to attempt to work around
> this:
>
> 0. reduced my node check-in times to be once an hour (and splayed
> randomly)
>
> 1. turned on puppetqd/stomp queuing
>
> This didn't seem to make a difference; it's off now.
>
> 2. turned on thin stored configs
>
> This sort of helped a little, but not enough.
>
> 3. tried to upgrade rails from 2.3.5 (the debian version) to 2.3.10
>
> I didn't see any appreciable difference here. I ended up going back to
> 2.3.5 because that was the packaged version.

Since you seem to use Debian, make sure you use either the latest ruby from lenny-backports (or REE), as they fixed an issue with pthreads and CPU consumption:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579229

> 4. tried to offload file content via nginx[1]
>
> This maybe helped a little, but it's clear that the problem isn't the
> fileserving; it seems to be something in the catalog compilation.

Actually offloading only helps when the puppet agent needs the file content, which happens only when the file changes or when the node doesn't have it yet. In practice this helps only for new nodes.

> 5. tried to cache catalogs by adding an http front-end cache and
> expiring that cache when manifests are updated[1]
>
> I'm not sure this works at all.

This should have helped, because this would prevent the puppetmaster from even being called. You might check your nginx configuration then.

> 6. set 'fair' queuing in my nginx.conf[3]
>
> This seemed to help for a few days, but then things got bad again.
>
> 7. set --http_compression
>
> I'm not sure if this actually hurts the master or not (because it has
> to now occupy the CPU compressing catalogs?)

This is a client option, and you need the collaboration of nginx for it to work. This will certainly add more burden on your master CPU, because nginx now has to gzip everything you're sending.

> 8. tried to follow the introspection technique[2]
>
> This wasn't so easy to do; I had to operate really fast, because if I
> was too slow the thread would exit, or it would get hung up on:
>
> [Thread 0xb6194b70 (LWP 25770) exited]
> [New Thread 0xb6194b70 (LWP 25806)]

When you attach gdb, how many threads are running?

> Eventually I did manage to get somewhere:
>
> 0xb74f1b16 in memcpy () from /lib/i686/cmov/libc.so.6
> (gdb) session-ruby
> (gdb) redirect_stdout
> $1 = 2
> (gdb)
> $2 = 2
> (gdb) eval "caller"
> $3 = 3
> (gdb) rb_object_counts
> Cannot get thread event message: debugger service failed
> An error occurred while in a function called from GDB.
> Evaluation of the expression containing the function
> (rb_eval_string_protect) will be abandoned.
> When the function is done executing, GDB will silently stop.
> (gdb) eval "total = \[\[ObjectSpace\]\].each_object(Array)\{\|x\| puts '---'; puts x.inspect \}; puts \\"---\\nTotal Arrays: \#{total}\\""
> Invalid character '\' in expression.
>
> ... then nothing.
>
> In the tail:
>
> root@puppetmaster:/tmp# tail -f ruby-debug.28724
> [...]
This is just the objects in use at a given time. What is more interesting is where the CPU time is spent (i.e. getting a stack trace would be helpful, but not easy).

> but then it seemed to stop logging entirely...
>
> I'm available on IRC to try more advanced debugging, just ping me
> (hacim). I'd really like things to function again!

I'll ping you, but I'm just really busy for the next couple of days :(
--
Brice Figureau
My Blog: http://www.masterzen.fr/
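Brice mentions puppet-load above; for a sense of what such a load test does, here is a minimal Ruby stand-in (a sketch with several assumptions: the hostname, port, and node names are placeholders, and the master has to accept these requests, e.g. a test instance that does not enforce client certificate verification):

---
require 'net/https'
require 'benchmark'

def fetch_catalog(node)
  http = Net::HTTP.new('puppetmaster', 8140)  # placeholder host/port
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE  # test setup only
  http.get("/production/catalog/#{node}", 'Accept' => 'pson')
end

threads = %w[node1 node2 node3 node4].map do |name|
  Thread.new do
    time = Benchmark.realtime { fetch_catalog(name) }
    printf("%s: %.1fs\n", name, time)
  end
end
threads.each { |t| t.join }
---

Unlike the compilation itself, these requests are I/O from the client's side, so plain Ruby threads are enough to generate real concurrency on the master.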
Micah Anderson
2011-Jan-26 14:44 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Felix Frank <felix.frank@alumni.tu-berlin.de> writes:

> I propose you restructure your manifests so that they compile
> faster (if at all possible) or scale up your master. What you're
> watching is probably just overload and resource thrashing.

I'm interested in ideas for what good steps are for restructuring manifests so they compile faster, or at least methods for identifying problematic areas in manifests.

> Do you have any idea why each individual compilation takes that long?

It wasn't before. Before things start spinning, compilation times are between 9 seconds and 60 seconds, usually averaging just shy of 30 seconds.

micah
Felix Frank
2011-Jan-26 14:46 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On 01/26/2011 03:44 PM, Micah Anderson wrote:
> Felix Frank <felix.frank@alumni.tu-berlin.de> writes:
>
> [...]
>
> I'm interested in ideas for what good steps are for restructuring
> manifests so they compile faster, or at least methods for
> identifying problematic areas in manifests.

Are there many templates or uses of the file() function? Do you make heavy use of modules and the autoloader?

>> Do you have any idea why each individual compilation takes that long?
>
> It wasn't before. Before things start spinning, compilation times are
> between 9 seconds and 60 seconds, usually averaging just shy of 30
> seconds.

That's still quite considerable IMO.

Regards,
Felix
Micah Anderson
2011-Jan-26 15:11 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Brice Figureau <brice-puppet@daysofwonder.com> writes:

> On Tue, 2011-01-25 at 17:11 -0500, Micah Anderson wrote:
>> Brice Figureau <brice-puppet@daysofwonder.com> writes:
>>
>> All four of my mongrels are constantly pegged, doing 40-50% of the CPU
>> each, occupying all available CPUs. They never settle down. I've got 74
>> nodes checking in now; it doesn't seem like that many, but perhaps
>> I've reached a tipping point with my puppetmaster (it's a dual 1GHz,
>> 2GiB of RAM machine)?
>
> The puppetmaster is mostly CPU bound. Since you have only 2 CPUs, you
> shouldn't try to achieve a concurrency of 4 (which your mongrels are
> trying to do), otherwise more than one request will be accepted by one
> mongrel process and each thread will contend for the CPU. The bad news
> is that the ruby MRI uses green threading, so the second thread will
> only run when the first one either sleeps, does I/O, or relinquishes
> the CPU voluntarily. In other words, it will only run when the first
> thread has finished its compilation.

Ok, that is a good thing to know. I wasn't aware that ruby was not able to do that.

> Now you have 74 nodes, with a worst-case compilation time of 75s (which is
> a lot); that translates to 74*75 = 5550s of compilation time.
> With a concurrency of 2, that's still 2775s of compilation time per
> round of <insert here your default sleep time>. With the default 30min
> of sleep time and assuming perfect scheduling, that's larger
> than a round of sleep time, which means you won't ever finish
> compiling nodes before the first node asks again for a catalog.

I'm doing 60 minutes of sleep time, which is 3600 seconds an hour; the concurrency of 2 giving me 2775s of compile time per hour does keep me under the 3600 seconds... assuming scheduling is perfect, which it very likely is not.

> And I'm talking only about compilation. If your manifests use file
> sourcing, you must also add this to the equation.

As explained, I set up your nginx method for offloading file sourcing.

> Another explanation of the issue is swapping. You mention your server
> has 2GiB of RAM. Are you sure your 4 mongrel processes after some time
> still fit in the physical RAM (along with the other things running on the
> server)?
> Maybe your server is constantly swapping.

I'm actually doing fine on memory, not dipping into swap. I've watched i/o to see if I could identify either a swap or disk problem, but didn't notice very much happening there. The CPU usage of the mongrel processes is pretty much where everything is spending its time.

I've been wondering if I have some loop in a manifest or something that is causing them to just spin.

> So you can do several things to get better performance:
> * reduce the number of nodes that check in at a single time (i.e. increase
> sleep time)

I've already reduced it to once per hour, but I could consider reducing it more.

> * reduce the time it takes to compile a catalog:
> + which includes not using storeconfigs (or using puppetqd or
> thin_storeconfigs instead)

I need to use storeconfigs, and as detailed in my original message, I've tried puppetqd and it didn't do much for me.
thin_storeconfigs did help, and I'm still using it, so this one has already been done too.

> + check the server is not swapping

Not swapping.

> + reduce the number of mongrel instances, to artificially reduce the
> concurrency (this is counter-intuitive, I know)

Ok, I'm backing off to two mongrels to see how well that works.

> + use a "better" ruby interpreter like Ruby Enterprise Edition (for
> several reasons this one has a better GC and a better memory footprint)

I'm pretty sure my problem isn't memory, so I'm not sure if these will help much.

> + cache compiled catalogs in nginx

Doing this.

> + offload file content serving in nginx

Doing this.

> + use passenger instead of mongrel

I tried to switch to passenger, and things were much worse. Actually, passenger worked fine with 0.25, but when I upgraded I couldn't get it to function anymore. I actually had to go back to nginx to get things functioning again.

>> 3. tried to upgrade rails from 2.3.5 (the debian version) to 2.3.10
>>
>> I didn't see any appreciable difference here. I ended up going back to
>> 2.3.5 because that was the packaged version.
>
> Since you seem to use Debian, make sure you use either the latest ruby
> from lenny-backports (or REE), as they fixed an issue with pthreads and CPU
> consumption:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579229

I'm using Debian Squeeze, which has the same version you are mentioning from lenny backports (2.3.5).

>> 5. tried to cache catalogs by adding an http front-end cache and
>> expiring that cache when manifests are updated[1]
>>
>> I'm not sure this works at all.
>
> This should have helped, because this would prevent the puppetmaster
> from even being called. You might check your nginx configuration then.

Hmm. According to jamesturnbull, the rest terminus shouldn't allow you to request any node's catalog, so I'm not sure how this can work at all... but in case I've got something screwed up in my nginx.conf, I'd really be happy if you could have a look at it; it's possible that I misunderstood something from your blog post!
Here it is:

user www-data;
worker_processes 2;

error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
  # In a reverse proxy situation, max_clients becomes
  # max_clients = worker_processes * worker_connections/4
  worker_connections 2048;
}

http {
  default_type application/octet-stream;

  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;
  large_client_header_buffers 1024 2048k;
  client_max_body_size 150m;
  proxy_buffers 128 4k;
  keepalive_timeout 65;

  gzip on;
  gzip_min_length 1000;
  gzip_types text/plain;

  ssl on;
  ssl_certificate /var/lib/puppet/ssl/certs/puppetmaster.pem;
  ssl_certificate_key /var/lib/puppet/ssl/private_keys/puppetmaster.pem;
  ssl_client_certificate /var/lib/puppet/ssl/ca/ca_crt.pem;
  ssl_ciphers SSLv2:-LOW:-EXPORT:RC4+RSA;
  ssl_session_cache shared:SSL:8m;
  ssl_session_timeout 5m;
  proxy_read_timeout 600;

  upstream puppet_mongrel {
    fair;
    server 127.0.0.1:18140;
    server 127.0.0.1:18141;
    server 127.0.0.1:18142;
    server 127.0.0.1:18143;
  }

  log_format noip '0.0.0.0 - $remote_user [$time_local] '
                  '"$request" $status $body_bytes_sent '
                  '"$http_referer" "$http_user_agent"';

  proxy_cache_path /var/cache/nginx/cache levels=1:2 keys_zone=puppetcache:10m;

  server {
    listen 8140;
    access_log /var/log/nginx/access.log noip;
    ssl_verify_client required;

    root /etc/puppet;

    # make sure we serve everything
    # as raw
    types { }
    default_type application/x-raw;

    # serve static files for the [files] mountpoint
    location /production/file_content/files/ {
      allow 172.16.0.0/16;
      allow 10.0.1.0/8;
      allow 127.0.0.1/8;
      deny all;

      alias /etc/puppet/files/;
    }

    # serve modules' files sections
    location ~ /production/file_content/[^/]+/files/ {
      # it is advisable to have some access rules here
      allow 172.16.0.0/16;
      allow 10.0.1.0/8;
      allow 127.0.0.1/8;
      deny all;

      root /etc/puppet/modules;

      # rewrite /production/file_content/module/files/file.txt
      # to /module/file.txt
      rewrite ^/production/file_content/([^/]+)/files/(.+)$ $1/$2 break;
    }

    # Variables
    # $ssl_cipher          the cipher used for the established SSL connection
    # $ssl_client_serial   the serial number of the client certificate
    # $ssl_client_s_dn     the subject DN of the client certificate
    # $ssl_client_i_dn     the issuer DN of the client certificate
    # $ssl_protocol        the protocol of the established SSL connection

    location / {
      proxy_pass http://puppet_mongrel;
      proxy_redirect off;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Client-Verify SUCCESS;
      proxy_set_header X-SSL-Subject $ssl_client_s_dn;
      proxy_set_header X-SSL-Issuer $ssl_client_i_dn;
      proxy_buffer_size 16k;
      proxy_buffers 8 32k;
      proxy_busy_buffers_size 64k;
      proxy_temp_file_write_size 64k;
      proxy_read_timeout 540;

      # we handle catalogs differently
      # because we want to cache them
      location /production/catalog {
        proxy_pass http://puppet_mongrel;
        proxy_redirect off;

        # it is a good thing to actually restrict who
        # can ask for a catalog (especially for cached
        # catalogs)
        allow 172.16.0.0/16;
        allow 10.0.1.0/8;
        allow 127.0.0.1/8;
        deny all;

        # where to cache contents
        proxy_cache puppetcache;

        # we cache content by catalog host
        # we could also use $args to take into account request
        # facts, but those change too often (ie uptime or memory)
        # to be really useful
        proxy_cache_key $uri;

        # define how long to cache responses

        # normal catalogs will be cached 2 weeks
        proxy_cache_valid 200 302 301 2w;

        # errors are not cached long
        proxy_cache_valid 500 403 1m;

        # the rest is cached a little bit
        proxy_cache_valid any 30m;
      }

      # catch-all location for other termini
      location / {
        proxy_pass http://puppet_mongrel;
        proxy_redirect off;
      }
    }
  }

  server {
    listen 8141;
    ssl_verify_client off;
    root /var/empty;
    access_log /var/log/nginx/access.log noip;

    location / {
      proxy_pass http://puppet_mongrel;
      proxy_redirect off;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Client-Verify FAILURE;
      proxy_set_header X-SSL-Subject $ssl_client_s_dn;
      proxy_set_header X-SSL-Issuer $ssl_client_i_dn;
    }
  }
}

>> 7. set --http_compression
>>
>> I'm not sure if this actually hurts the master or not (because it has
>> to now occupy the CPU compressing catalogs?)
>
> This is a client option, and you need the collaboration of nginx for it
> to work. This will certainly add more burden on your master CPU, because
> nginx now has to gzip everything you're sending.

Yeah, I have the gzip compression turned on in nginx, but I don't really need it and my master could use the break.

>> 8. tried to follow the introspection technique[2]
>>
>> This wasn't so easy to do; I had to operate really fast, because if I
>> was too slow the thread would exit, or it would get hung up on:
>>
>> [Thread 0xb6194b70 (LWP 25770) exited]
>> [New Thread 0xb6194b70 (LWP 25806)]
>
> When you attach gdb, how many threads are running?

I'm not sure, how can I determine that? I just had the existing 4 mongrel processes.

>> (gdb) eval "total = \[\[ObjectSpace\]\].each_object(Array)\{\|x\| puts '---'; puts x.inspect \}; puts \\"---\\nTotal Arrays: \#{total}\\""
>> Invalid character '\' in expression.

The above seemed to be a problem with the expression on the wiki page; does anyone know what that should be so gdb doesn't have a problem with it?

>> I'm available on IRC to try more advanced debugging, just ping me
>> (hacim). I'd really like things to function again!
>
> I'll ping you, but I'm just really busy for the next couple of days :(

Thanks for any help or ideas, I'm out of ideas myself so anything helps!

micah
Brice Figureau
2011-Jan-26 15:35 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Wed, 2011-01-26 at 09:44 -0500, Micah Anderson wrote:
> Felix Frank <felix.frank@alumni.tu-berlin.de> writes:
>
> [...]
>
> I'm interested in ideas for what good steps are for restructuring
> manifests so they compile faster, or at least methods for
> identifying problematic areas in manifests.
>
>> Do you have any idea why each individual compilation takes that long?
>
> It wasn't before. Before things start spinning, compilation times are
> between 9 seconds and 60 seconds, usually averaging just shy of 30
> seconds.

Do you use an External Node Classifier?
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!
Brice Figureau
2011-Jan-26 16:23 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Wed, 2011-01-26 at 10:11 -0500, Micah Anderson wrote:
> Brice Figureau <brice-puppet@daysofwonder.com> writes:
>
> [...]
>
> I'm actually doing fine on memory, not dipping into swap. I've watched
> i/o to see if I could identify either a swap or disk problem, but didn't
> notice very much happening there. The CPU usage of the mongrel processes
> is pretty much where everything is spending its time.
>
> I've been wondering if I have some loop in a manifest or something that
> is causing them to just spin.

I don't think that's the problem. There could be some ruby internals issues playing here, but I doubt something in your manifest creates a loop. What is strange is that you mentioned that the very first catalog compilations were fine, but then the compilation time increases.

> > So you can do several things to get better performance:
> > * reduce the number of nodes that check in at a single time (i.e. increase
> > sleep time)
>
> I've already reduced it to once per hour, but I could consider reducing it
> more.

That would be interesting. This would help us know if the problem is too much load/concurrency from your clients or a problem in the master itself.
BTW, what's the load on the server?

> > * reduce the time it takes to compile a catalog:
> > + which includes not using storeconfigs (or using puppetqd or
> > thin_storeconfigs instead)
>
> I need to use storeconfigs, and as detailed in my original message, I've
> tried puppetqd and it didn't do much for me. thin_storeconfigs did help,
> and I'm still using it, so this one has already been done too.
>
> > + check the server is not swapping
>
> Not swapping.

OK, good.

> > + reduce the number of mongrel instances, to artificially reduce the
> > concurrency (this is counter-intuitive, I know)
>
> Ok, I'm backing off to two mongrels to see how well that works.

Let me know if that changes something.

> > + use a "better" ruby interpreter like Ruby Enterprise Edition (for
> > several reasons this one has a better GC and a better memory footprint)
>
> I'm pretty sure my problem isn't memory, so I'm not sure if these will
> help much.

Well, having a better GC means the ruby interpreter becomes faster at allocating and recycling objects. In the end that means the overall memory footprint can be better, but it also means it will spend much less time doing garbage collection (i.e. it will use the CPU for your code and not for tidying stuff).

> > + cache compiled catalogs in nginx
>
> Doing this.
>
> > + offload file content serving in nginx
>
> Doing this.
>
> > + use passenger instead of mongrel
>
> I tried to switch to passenger, and things were much worse. Actually,
> passenger worked fine with 0.25, but when I upgraded I couldn't get it
> to function anymore. I actually had to go back to nginx to get things
> functioning again.
>
> >> 3. tried to upgrade rails from 2.3.5 (the debian version) to 2.3.10
> >>
> >> I didn't see any appreciable difference here. I ended up going back to
> >> 2.3.5 because that was the packaged version.
> >
> > Since you seem to use Debian, make sure you use either the latest ruby
> > from lenny-backports (or REE), as they fixed an issue with pthreads and CPU
> > consumption:
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579229
>
> I'm using Debian Squeeze, which has the same version you are mentioning
> from lenny backports (2.3.5).

I was talking about the ruby1.8 package, not rails. Make sure you use the squeeze version or the lenny-backports one.

> >> 5. tried to cache catalogs by adding an http front-end cache and
> >> expiring that cache when manifests are updated[1]
> >>
> >> I'm not sure this works at all.
> >
> > This should have helped, because this would prevent the puppetmaster
> > from even being called. You might check your nginx configuration then.
>
> Hmm. According to jamesturnbull, the rest terminus shouldn't allow you
> to request any node's catalog, so I'm not sure how this can work at
> all... but in case I've got something screwed up in my nginx.conf, I'd
> really be happy if you could have a look at it; it's possible that I
> misunderstood something from your blog post! Here it is:

When a client asks for a catalog, nginx checks whether it has already cached it; if it has and the cache is still fresh, it serves it, otherwise it asks a puppetmaster for the same REST url and then caches what the master returns.

It's easy to check whether nginx is caching the catalog: have a look into /var/cache/nginx/cache and see if there are some files containing some of your catalogs.

Puppet doesn't send the necessary caching headers right now, and I'm not sure how nginx deals with that. I hope it would still cache (through the virtue of proxy_cache_valid).
What version of nginx are you using?

> server {
> listen 8140;
> access_log /var/log/nginx/access.log noip;
> ssl_verify_client required;

Make that:
ssl_verify_client optional;

And remove the second server{} block, and make sure your clients do not
use a different ca_port. But only if you use nginx >= 0.7.64

> root /etc/puppet;
>
> # make sure we serve everything
> # as raw
> types { }
> default_type application/x-raw;
>
> # serve static files for the [files] mountpoint
> location /production/file_content/files/ {
> allow 172.16.0.0/16;
> allow 10.0.1.0/8;
> allow 127.0.0.1/8;
> deny all;
>
> alias /etc/puppet/files/;
> }
>
> # serve modules'' files sections
> location ~ /production/file_content/[^/]+/files/ {
> # it is advisable to have some access rules here
> allow 172.16.0.0/16;
> allow 10.0.1.0/8;
> allow 127.0.0.1/8;
> deny all;
>
> root /etc/puppet/modules;
>
> # rewrite /production/file_content/module/files/file.txt
> # to /module/file.txt
> rewrite ^/production/file_content/([^/]+)/files/(.+)$ $1/$2 break;
> }
>
> # Variables
> # $ssl_cipher returns the cipher used for the established SSL connection
> # $ssl_client_serial returns the serial number of the client certificate for the established SSL connection
> # $ssl_client_s_dn returns the subject DN of the client certificate for the established SSL connection
> # $ssl_client_i_dn returns the issuer DN of the client certificate for the established SSL connection
> # $ssl_protocol returns the protocol of the established SSL connection
>
> location / {
> proxy_pass http://puppet_mongrel;
> proxy_redirect off;
> proxy_set_header Host $host;
> proxy_set_header X-Real-IP $remote_addr;
> proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
> proxy_set_header X-Client-Verify SUCCESS;

If you used ssl_verify_client as I explained above, this should be:
proxy_set_header X-Client-Verify $ssl_client_verify

> proxy_set_header X-SSL-Subject $ssl_client_s_dn;
> proxy_set_header X-SSL-Issuer $ssl_client_i_dn;
> proxy_buffer_size 16k;
> proxy_buffers 8 32k;
> proxy_busy_buffers_size 64k;
> proxy_temp_file_write_size 64k;
> proxy_read_timeout 540;
>
> # we handle catalogs differently
> # because we want to cache them
> location /production/catalog {

Warning: this ^^ will work only if your nodes are in the "production"
environment. Adjust for your environments.

> proxy_pass http://puppet_mongrel;
> proxy_redirect off;
>
> # it is a good thing to actually restrict who
> # can ask for a catalog (especially for cached
> # catalogs)
> allow 172.16.0.0/16;
> allow 10.0.1.0/8;
> allow 127.0.0.1/8;
> deny all;
>
> # where to cache contents
> proxy_cache puppetcache;
>
> # we cache content by catalog host
> # we could also use $args to take into account request
> # facts, but those change too often (ie uptime or memory)
> # to be really useful
> proxy_cache_key $uri;
>
> # define how long to cache responses
>
> # normal catalogs will be cached 2 weeks
> proxy_cache_valid 200 302 301 2w;
>
> # errors are not cached long
> proxy_cache_valid 500 403 1m;
>
> # the rest is cached a little bit
> proxy_cache_valid any 30m;
> }
>
> # catch-all location for other termini
> location / {

You already have a location ''/'' above.
Are you sure nginx is correctly using this configuration?
Try:
nginx -t
It will check your configuration.

> proxy_pass http://puppet_mongrel;
> proxy_redirect off;
> }
> }
> }
> server {
> listen 8141;
> ssl_verify_client off;
> root /var/empty;
> access_log /var/log/nginx/access.log noip;
>
> location / {
> proxy_pass http://puppet_mongrel;
> proxy_redirect off;
> proxy_set_header Host $host;
> proxy_set_header X-Real-IP $remote_addr;
> proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
> proxy_set_header X-Client-Verify FAILURE;
> proxy_set_header X-SSL-Subject $ssl_client_s_dn;
> proxy_set_header X-SSL-Issuer $ssl_client_i_dn;
> }
> }
> }

This server{} wouldn''t be needed if you use ssl_verify_client as
explained above.

> >> 7. set --http_compression
> >>
> >> I''m not sure if this actually hurts the master or not (because it has
> >> to now occupy the CPU compressing catalogs?)
> >
> > This is a client option, and you need the collaboration of nginx for it
> > to work. This will certainly add more burden on your master CPU, because
> > nginx now has to gzip everything you''re sending.
>
> Yeah, I have the gzip compression turned on in nginx, but I don''t really
> need it and my master could use the break.

Actually your nginx is only compressing text/plain documents, so it
won''t compress your catalogs.

> >> 8. tried to follow the introspection technique[2]
> >>
> >> this wasn''t so easy to do, I had to operate really fast, because if I
> >> was too slow the thread would exit, or it would get hung up on:
> >>
> >> [Thread 0xb6194b70 (LWP 25770) exited]
> >> [New Thread 0xb6194b70 (LWP 25806)]
> >
> > When you attach gdb, how many threads are running?
>
> I''m not sure, how can I determine that? I just had the existing 4
> mongrel processes.

Maybe you can first try to display the full C trace for all threads:
thread apply all bt

Then resume everything and, 2 to 5 seconds later, take another snapshot
with the command above. Comparing the two traces might help us
understand what the process is doing.

HTH,
-- 
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
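Condensing Brice''s corrections, the single-server layout he is steering Micah towards would look roughly like the sketch below. This is an editor''s condensation, not a config from the thread; it assumes nginx >= 0.7.64 (for ssl_verify_client optional) and reuses the upstream and cache-zone names quoted above:

server {
    listen 8140;
    # ''optional'' lets one server block handle both verified agents and
    # initial certificate requests, replacing the second server{} on 8141
    ssl_verify_client optional;

    location /production/catalog {
        proxy_pass http://puppet_mongrel;
        # pass the real verification result instead of a hardcoded SUCCESS
        proxy_set_header X-Client-Verify $ssl_client_verify;
        proxy_cache       puppetcache;
        proxy_cache_key   $uri;
        proxy_cache_valid 200 301 302 2w;
    }

    location / {
        proxy_pass http://puppet_mongrel;
        proxy_set_header X-Client-Verify $ssl_client_verify;
    }
}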
Brice Figureau
2011-Jan-26 16:30 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Wed, 2011-01-26 at 10:11 -0500, Micah Anderson wrote:
> http {
> default_type application/octet-stream;
>
> sendfile on;
> tcp_nopush on;
> tcp_nodelay on;
>
> large_client_header_buffers 1024 2048k;
> client_max_body_size 150m;
> proxy_buffers 128 4k;
>
> keepalive_timeout 65;
>
> gzip on;
> gzip_min_length 1000;
> gzip_types text/plain;
>
> ssl on;
> ssl_certificate /var/lib/puppet/ssl/certs/puppetmaster.pem;
> ssl_certificate_key /var/lib/puppet/ssl/private_keys/puppetmaster.pem;
> ssl_client_certificate /var/lib/puppet/ssl/ca/ca_crt.pem;
> ssl_ciphers SSLv2:-LOW:-EXPORT:RC4+RSA;
> ssl_session_cache shared:SSL:8m;
> ssl_session_timeout 5m;
>
> proxy_read_timeout 600;
> upstream puppet_mongrel {
> fair;
> server 127.0.0.1:18140;
> server 127.0.0.1:18141;
> server 127.0.0.1:18142;
> server 127.0.0.1:18143;
> }
> log_format noip ''0.0.0.0 - $remote_user [$time_local] ''
> ''"$request" $status $body_bytes_sent ''
> ''"$http_referer" "$http_user_agent"'';
>
> proxy_cache_path /var/cache/nginx/cache levels=1:2 keys_zone=puppetcache:10m;

Make this:

proxy_cache_path /var/cache/nginx/cache levels=1:2 keys_zone=puppetcache:50m inactive=300m

The default inactive is 10 minutes, which is too low for a sleeptime of
60 minutes, and makes it possible for a cached catalog to be evicted.
-- 
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Micah Anderson
2011-Jan-26 19:47 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Felix Frank <felix.frank@alumni.tu-berlin.de> writes:

> On 01/26/2011 03:44 PM, Micah Anderson wrote:
>> Felix Frank <felix.frank@alumni.tu-berlin.de> writes:
>>
>>> I propose you need to restructure your manifest so that it compiles
>>> faster (if at all possible) or scale up your master. What you''re
>>> watching is probably just overload and resource thrashing.
>>
>> I''m interested in ideas for what are good steps for restructuring
>> manifests so they can compile faster, or at least methods for
>> identifying problematic areas in manifests.
>
> Are there many templates or use of the file() function?

Yes, there are quite a few. I''m not really sure the best way to count
them. I have 288 ''source => "$fileserver"'' lines in my
manifests. Another ~160 of them in various modules. As far as templates
go, I have ~77 "content => template(...)" lines in my manifests and
another 55 in modules.

> Do you make heavy use of modules and the autoloader?

I do make heavy use of modules, I have about 50 of them. I''m importing
18 of them in my manifests/modules.pp. I think, if they are set up
right, I only need to import one of those, and I''ve been slowly paring
those down. I presume that by ''the autoloader'' you mean those
modules which aren''t explicitly included somewhere?

>>> Do you have any idea why each individual compilation takes that long?
>>
>> It wasn''t before. Before things start spinning, compilation times are
>> between 9 seconds and 60 seconds, usually averaging just shy of 30
>> seconds.
>
> That''s still quite considerable IMO.

Actually, looking at my logs, compile time was averaging around
15 seconds each, some taking very little time at all. When things go
bad, it''s more or less a thundering herd and the times start going up
and up.

micah

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
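For counting those, a throwaway Ruby sketch does the job; the search root is an assumption, and the regexes only approximate what the parser actually sees:

# Tally file-sourcing and template usage across manifests and modules.
counts = Hash.new(0)
Dir.glob('/etc/puppet/**/*.pp').each do |manifest|
  text = File.read(manifest)
  counts['source']   += text.scan(/source\s*=>/).size
  counts['template'] += text.scan(/content\s*=>\s*template\(/).size
end
counts.each { |kind, n| puts "#{kind}: #{n}" }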
Micah Anderson
2011-Jan-26 19:48 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Brice Figureau <brice-puppet@daysofwonder.com> writes:

> On Wed, 2011-01-26 at 09:44 -0500, Micah Anderson wrote:
>> Felix Frank <felix.frank@alumni.tu-berlin.de> writes:
>>
>> > I propose you need to restructure your manifest so that it compiles
>> > faster (if at all possible) or scale up your master. What you''re
>> > watching is probably just overload and resource thrashing.
>>
>> I''m interested in ideas for what are good steps for restructuring
>> manifests so they can compile faster, or at least methods for
>> identifying problematic areas in manifests.
>>
>> > Do you have any idea why each individual compilation takes that long?
>>
>> It wasn''t before. Before things start spinning, compilation times are
>> between 9 seconds and 60 seconds, usually averaging just shy of 30
>> seconds.
>
> Do you use an External Node Classifier?

I do not.

micah

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Micah Anderson
2011-Jan-26 20:40 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Brice Figureau <brice-puppet@daysofwonder.com> writes:

> On Wed, 2011-01-26 at 10:11 -0500, Micah Anderson wrote:
>> I''ve been wondering if I have some loop in a manifest or something that
>> is causing them to just spin.
>
> I don''t think that''s the problem. There could be some ruby internals
> issues playing here, but I doubt something in your manifest creates a
> loop.
>
> What is strange is that you mentioned that the very first catalog
> compilations were fine, but then the compilation time increases.

Yes, and it increases quite rapidly. Interesting to note that the first
few compile times are basically within range of what I was experiencing
before things started to tip over (the last few days).
I''m struggling to think of anything I could have changed, but so far I
haven''t come up with anything.

>> > So you can do several things to get better performance:
>> > * reduce the number of nodes that check in at a single time (ie increase
>> > sleep time)
>>
>> I''ve already reduced to once per hour, but I could consider reducing it
>> more.
>
> That would be interesting. This would help us know if the problem is too
> much load/concurrency from your clients or a problem in the master
> itself.

I''ll need to set up mcollective to do that, I believe. Right now I''m
setting up a cronjob like this:

"<%= scope.function_fqdn_rand([''59'']) %> * * * *"

which results in a cronjob (on one host):

6 * * * * root /usr/sbin/puppetd --onetime --no-daemonize --config=/etc/puppet/puppet.conf --color false | grep -E ''(^err:|^alert:|^emerg:|^crit:)''

> BTW, what''s the load on the server?

The server is dedicated to puppetmaster. When I had four mongrels
running it was basically at 4 constantly. Now that I''ve backed it down
to 2 mongrels, it''s:

11:57:41 up 58 days, 21:20, 2 users, load average: 2.31, 1.97, 2.02

>> Not swapping.
>
> OK, good.

Just as a confirmation of this... vmstat shows no si/so happening, and
very high numbers in the CPU user column. Very little bi/bo, and low sys
values. Context switches are a bit high... this clearly points to the
process eating CPU, not any disk/memory/swap scenario.

>> > + Reduce the number of mongrel instances, to artificially reduce the
>> > concurrency (this is counter-intuitive I know)
>>
>> Ok, I''m backing off to two mongrels to see how well that works.
>
> Let me know if that changes something.

Doesn''t seem to help. Compiles start out low, and are inching up
(started at 27, and now they are at 120 seconds).

>> > + use a "better" ruby interpreter like Ruby Enterprise Edition (for
>> > several reasons this one has better GC, better memory footprint).
>>
>> I''m pretty sure my problem isn''t memory, so I''m not sure if these will
>> help much.
>
> Well, having a better GC means that the ruby interpreter will become
> faster at allocating and recycling objects. In the end that means
> the overall memory footprint can be better, but it also means it will
> spend much less time doing garbage collection (ie the CPU is used for
> your code and not for tidying up).

That could be interesting. I haven''t tried REE or jruby on debian
before; I suppose it''s worth a try.

>> >> 3. tried to upgrade rails from 2.3.5 (the debian version) to 2.3.10
>> >>
>> >> I didn''t see any appreciable difference here. I ended up going back to
>> >> 2.3.5 because that was the packaged version.
>> >
>> > Since you seem to use Debian, make sure you use either the latest ruby
>> > lenny backports (or REE) as they fixed an issue with pthreads and CPU
>> > consumption:
>> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579229
>>
>> I''m using Debian Squeeze, which has the same version you are mentioning
>> from lenny backports (2.3.5).
>
> I was talking about the ruby1.8 package, not rails. Make sure you use
> the squeeze version or the lenny-backports one.

Yep, I''m using the squeeze ruby1.8, which is 1.8.7.302-2

>> >> 5. tried to cache catalogs through adding a http front-end cache and
>> >> expiring that cache when manifests are updated[1]
>> >>
>> >> I''m not sure this works at all.
>> >
>> > This should have helped because this would prevent the puppetmaster from
You might check your nginx configuration then.It wasn''t really caching before, because of the nginx parameter you pointed out in a previous message. But now it seems like it is: find /var/cache/nginx/cache -type f |wc 29 29 1769> What version of nginx are you using?0.7.67-3> Make that: > ssl_verify_client optional; > > And remove the second server{} block, and make sure your clients do not > use a different ca_port. But only if you use nginx >= 0.7.64Ok, that second server block was for the cert request... but sounds like if I tweak the verify to optional, I dont need that. I''m sure the clients aren''t using a different ca_port (except for the initial node bootstrap). I''ve changed that and removed the block.> If you used ssl_verify_client as I explained above, this should be: > proxy_set_header X-Client-Verify $ssl_client_verifyChanged.>> # we handle catalog differently >> # because we want to cache them >> location /production/catalog { > > Warning: this ^^ will work only if your nodes are in the "production" > environment. Adjust for your environments./etc/puppet/puppet.conf has: environment = production I do occasionally use development environments, but rarely enough that not having caching is ok.> You already have a location ''/'' above. > Are you sure nginx is correctly using this configuration? > Try: > nginx -t > it will check your configurationHm, good catch. nginx -t seems ok with it, but I''ve removed the extra location ''/'' just in case.> This server{} wouldn''t be needed if you use the ssl_verify_client as > explained above.Removed.>> >> 7. set --http_compression >> >> >> >> I''m not sure if this actually hurts the master or not (because it has >> >> to now occupy the CPU compressing catalogs?) >> > >> > This is a client option, and you need the collaboration of nginx for it >> > to work. This will certainly add more burden on your master CPU, because >> > nginx now has to gzip everything you''re sending. >> >> Yeah, I have the gzip compression turned on in nginx, but I dont really >> need it and my master could use the break. > > Actually your nginx are only compressing text/plain documents, so it > won''t compress your catalogs.Ah, interesting! Well, again... I''m turning it off on the nodes, its not needed.>> >> 8. tried to follow the introspection technique[2] >> >> >> >> this wasn''t so easy to do, I had to operate really fast, because if I >> >> was too slow the thread would exit, or it would get hung up on: >> >> >> >> [Thread 0xb6194b70 (LWP 25770) exited] >> >> [New Thread 0xb6194b70 (LWP 25806)] >> > >> > When you attach gdb, how many threads are running? >> >> I''m not sure, how can I determine that? I just had the existing 4 >> mongrel processes. > > Maybe you can first try to display the full C trace for all threads: > thread apply all bt > > Then, resume everything, and 2 to 5s take another snapshot with the > command above. Comparing the two trace might help us understand what the > process is doing.Now that I''ve fixed up the nginx.conf and caching is actually happening, I''ve noticed that catalog compiles are 10s, 14s, 19s, 10s, 25s, 8s and things haven''t fallen over yet, so its much better right now. I''m going to let this run for an hour or two and if things are still bad, I''ll look at the thread traces. m -- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To post to this group, send email to puppet-users@googlegroups.com. 
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
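The ERB cron splay Micah sets up above can also be expressed directly in the Puppet DSL with fqdn_rand; a minimal sketch, with an illustrative resource title and command rather than anything from the thread:

cron { 'puppet-agent':
  command => '/usr/sbin/puppetd --onetime --no-daemonize',
  user    => 'root',
  # fqdn_rand picks a stable per-host value in 0..59, spreading
  # agent check-ins across the hour
  minute  => fqdn_rand(60),
}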
Felix Frank
2011-Jan-27 09:57 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
>> Are there many templates or use of the file() function?
>
> Yes, there are quite a few. I''m not really sure the best way to count
> them. I have 288 ''source => "$fileserver"'' lines in my

Those don''t hurt compilation.

> manifests. Another ~160 of them in various modules. As far as templates
> go, I have ~77 "content => template(...)" lines in my manifests and
> another 55 in modules.

If in doubt, try to lose a greater portion of them to see if compile
times are affected. But that won''t be very practical.

>> Do you make heavy use of modules and the autoloader?
>
> I do make heavy use of modules, I have about 50 of them. I''m importing
> 18 of them in my manifests/modules.pp. I think, if they are set up
> right, I only need to import one of those, and I''ve been slowly paring
> those down. I presume that by ''the autoloader'' you mean those
> modules which aren''t explicitly included somewhere?

Yes. If your structure leads to all modules eventually being included
on all nodes, you''re possibly wasting CPU cycles during compilation.

I recently had a script rename all my classes to <module_name>::classname
(including all references, i.e. includes etc.) and got rid of all import
statements. I didn''t notice any change in compilation time, but the
manifests are now generally less messy, so there really are no downsides
for me.

>>>> Do you have any idea why each individual compilation takes that long?
>>>
>>> It wasn''t before. Before things start spinning, compilation times are
>>> between 9 seconds and 60 seconds, usually averaging just shy of 30
>>> seconds.
>>
>> That''s still quite considerable IMO.
>
> Actually, looking at my logs, compile time was averaging around
> 15 seconds each, some taking very little time at all. When things go
> bad, it''s more or less a thundering herd and the times start going up
> and up.

Reminiscent of what I saw before moving away from Webrick. Seeing as you
noticed in the other branch that tuning nginx (was it?) helped you, I
still think you were just overloaded before.

Regards,
Felix

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
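Felix''s rename-and-autoload scheme looks roughly like the following; the ntp module is a hypothetical example, not one of Micah''s:

# modules/ntp/manifests/init.pp -- autoloaded when 'include ntp' is seen,
# so no import statement is needed:
class ntp {
  package { 'ntp': ensure => installed }
}

# modules/ntp/manifests/server.pp -- the file name maps to the class name,
# following the <module_name>::classname convention:
class ntp::server {
  include ntp
}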
Udo Waechter
2011-Jan-31 18:11 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Hi. I am just reading this thread, and it strikes me that we have the
same problems with 2.6.3.

Since upgrading from 2.6.2 to .3, puppetmaster shows the behaviour
described in this thread. We have about 160 clients, and the
puppetmaster is now an 8-core, 8GB RAM kvm instance. We had this with 4
cores and 4 gigs of RAM; "double-sizing" the VM did not change a thing!

We use passenger 2.2.11debian-2 and apache 2.2.16-3, ruby1.8 from
squeeze.

Puppetmaster works fine after restart, then after about 2-3 hours it
becomes pretty unresponsive; catalog runs go up to 120 seconds and more
(the baseline being something like 10 seconds).

I need to restart apache/puppetmaster about once a day. When I do that I
need to:

* stop apache
* kill (still running) puppetmasters (with SIGKILL!), some are always
left running at "CPU 100%"
* start apache

Something is very weird there, and there were no fundamental changes to
the manifests/modules.

The only thing that really changed is the VM itself. It was XEN (for
years), we switched to KVM with kernel 2.6.35

Another strange thing:

puppet clients take a lot longer to run nowadays. A machine usually took
about 40-50 seconds for one run. When puppetmaster goes crazy it now
takes ages (500 seconds and even more).

Something is weird there...
--udo,

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Nan Liu
2011-Jan-31 18:16 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Mon, Jan 31, 2011 at 10:11 AM, Udo Waechter
<udo.waechter@uni-osnabrueck.de> wrote:
> Puppetmaster works fine after restart, then after about 2-3 hours it
> becomes pretty unresponsive; catalog runs go up to 120 seconds and more
> (the baseline being something like 10 seconds).
>
> Another strange thing:
>
> puppet clients take a lot longer to run nowadays. A machine usually took
> about 40-50 seconds for one run. When puppetmaster goes crazy it now
> takes ages (500 seconds and even more).

When it takes longer, is the agent simply spending more time on
config_retrieval? You can find this metric in stored reports. I would
not focus on the agent if the delays are caused by compilation delays
on the master.

Thanks,

Nan

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
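A crude way to pull that metric out of reports stored on the master; this scans the YAML as text rather than deserializing it (loading report objects requires puppet itself), and the reports path is the default reportdir, which may differ:

# Print the config_retrieval timing line from each stored report.
Dir.glob('/var/lib/puppet/reports/*/*.yaml').each do |path|
  File.foreach(path) do |line|
    puts "#{path}: #{line.strip}" if line.include?('config_retrieval')
  end
end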
Brice Figureau
2011-Jan-31 21:43 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On 31/01/11 19:11, Udo Waechter wrote:
> Hi.
>
> I am just reading this thread, and it strikes me that we have the
> same problems with 2.6.3.
>
> Since upgrading from 2.6.2 to .3, puppetmaster shows the behaviour
> described in this thread. We have about 160 clients, and the
> puppetmaster is now an 8-core, 8GB RAM kvm instance. We had this with
> 4 cores and 4 gigs of RAM; "double-sizing" the VM did not change a
> thing!
>
> We use passenger 2.2.11debian-2 and apache 2.2.16-3, ruby1.8 from
> squeeze.

I see a pattern here. It seems Micah (see a couple of mails above in
this thread) has about the same setup, except he''s using mongrels.

It would be great to try a non-debian ruby (hint: Ruby Enterprise
Edition for instance) to see if that''s any better.

Do you use storeconfigs?

> Puppetmaster works fine after restart, then after about 2-3 hours it
> becomes pretty unresponsive; catalog runs go up to 120 seconds and
> more (the baseline being something like 10 seconds).

With 160 hosts, a 30 min sleeptime, and a compilation time of 10s, you
need 1600 cpu-seconds to build catalogs for your whole fleet.
With a concurrency of 8 cores (assuming you use a pool of 8 passenger
apps), that''s 200s per core, which is way less than the max of 1800s you
can accommodate in a 30 min time-frame. Of course this assumes an evenly
distributed load and perfect concurrency, but still, you have plenty of
available resources. So I conclude this is not normal.

> I need to restart apache/puppetmaster about once a day. When I do
> that I need to:
>
> * stop apache
> * kill (still running) puppetmasters (with SIGKILL!), some are always
> left running at "CPU 100%"
> * start apache

Does stracing/ltracing the process show something useful?

> Something is very weird there, and there were no fundamental changes
> to the manifests/modules.
>
> The only thing that really changed is the VM itself. It was XEN (for
> years), we switched to KVM with kernel 2.6.35
>
> Another strange thing:
>
> puppet clients take a lot longer to run nowadays. A machine usually
> took about 40-50 seconds for one run. When puppetmaster goes crazy it
> now takes ages (500 seconds and even more).

If your masters are busy, chances are your clients have to wait longer
to be served catalogs, sourced files, or file metadata. This can
dramatically increase the run time.

> Something is weird there... --udo,

Indeed.
-- 
Brice Figureau
My Blog: http://www.masterzen.fr/

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
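The same capacity arithmetic as the earlier sketch, with Udo''s figures plugged in; again a rough model, not a benchmark:

# 160 nodes, 10s compiles, 8 workers; check a few agent intervals.
nodes, compile_s, workers = 160, 10, 8
[30, 60, 120].each do |sleep_min|
  per_core = (nodes * compile_s) / workers   # CPU-seconds per core per round
  puts "sleep #{sleep_min}m: #{per_core}s per core vs #{sleep_min * 60}s available"
end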
John Warburton
2011-Jan-31 21:50 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On 1 February 2011 08:43, Brice Figureau
<brice-puppet@daysofwonder.com> wrote:
> On 31/01/11 19:11, Udo Waechter wrote:
>> Do you use storeconfigs?

Speaking of resource hogs, do you run the puppet labs dashboard on the
same host? I had a similar setup (on crusty old Sun kit, mind), and
found a big performance hit from clients writing their reports to the
puppetmaster and the master then writing those reports to the dashboard.
Everything calmed down once I moved the dashboard to another host.

John

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Waechter Udo
2011-Feb-01 10:30 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Hi,
On 31.01.2011, at 22:43, Brice Figureau wrote:

> On 31/01/11 19:11, Udo Waechter wrote:
>> [..]
>> We use passenger 2.2.11debian-2 and apache 2.2.16-3, ruby1.8 from
>> squeeze.
>
> I see a pattern here. It seems Micah (see a couple of mails above in
> this thread) has about the same setup, except he''s using mongrels.
>
> It would be great to try a non-debian ruby (hint: Ruby Enterprise
> Edition for instance) to see if that''s any better.

Well, since this behaviour turned up with 2.6.3, I did not think about
blaming it on another tool, like ruby. I will try RubyEE though.

> Do you use storeconfigs?

Yes, A LOT! Nowadays with stompserver and puppetqd. I did switch it off
already and that did not change a (performance) thing.

>> Puppetmaster works fine after restart, then after about 2-3 hours it
>> becomes pretty unresponsive; catalog runs go up to 120 seconds and
>> more (the baseline being something like 10 seconds).
>
> With 160 hosts, a 30 min sleeptime, and a compilation time of 10s, you
> need 1600 cpu-seconds to build catalogs for your whole fleet.
> With a concurrency of 8 cores (assuming you use a pool of 8 passenger
> apps), that''s 200s per core, which is way less than the max of 1800s
> you can accommodate in a 30 min time-frame. Of course this assumes an
> evenly distributed load and perfect concurrency, but still, you have
> plenty of available resources. So I conclude this is not normal.

Nope, like I said, we had the puppetmaster running as a VM with 4 cores
and 4 gigs of RAM. This worked fine since 0.22.x; now it''s twice as big
(if this comparison holds) and performance is worse than ever.

Also, we do not do 30 minute puppet runs. We do them every hour for
workstations and every 2 hours for servers, each with a random sleep of
up to half that interval. The load on the server is pretty evenly
distributed. Once or twice a day there are some peaks, but those are not
critical at all.
Thanks,
udo.
Waechter Udo
2011-Feb-01 10:31 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Hi,
On 31.01.2011, at 22:50, John Warburton wrote:

> On 1 February 2011 08:43, Brice Figureau <brice-puppet@daysofwonder.com> wrote:
> On 31/01/11 19:11, Udo Waechter wrote:
> > Do you use storeconfigs?
>
> Speaking of resource hogs, do you run the puppet labs dashboard on the
> same host? I had a similar setup (on crusty old Sun kit, mind), and
> found a big performance hit from clients writing their reports to the
> puppetmaster and the master then writing those reports to the dashboard.
> Everything calmed down once I moved the dashboard to another host.

Yes I do, but I always did... Even if this is not a good idea,
performance was acceptable until 2.6.3. Something must have changed
there.
--udo.
Ashley Penney
2011-Feb-01 15:30 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
This is the crux of the situation for me too - Puppetlabs blame it on a
Ruby bug that hasn''t been resolved with RHEL6 (in my situation), but
this wasn''t an issue until .3 for me too. I feel the fact that many of
us have had this problem since upgrading means it can be fixed within
Puppet, rather than Ruby, because it was fine before.

On Tue, Feb 1, 2011 at 5:31 AM, Waechter Udo
<udo.waechter@uni-osnabrueck.de> wrote:
> Yes I do, but I always did... Even if this is not a good idea,
> performance was acceptable until 2.6.3. Something must have changed
> there.
> --udo.

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Brice Figureau
2011-Feb-01 17:14 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Tue, 2011-02-01 at 10:30 -0500, Ashley Penney wrote:
> This is the crux of the situation for me too - Puppetlabs blame it on
> a Ruby bug that hasn''t been resolved with RHEL6 (in my situation), but
> this wasn''t an issue until .3 for me too. I feel the fact that many
> of us have had this problem since upgrading means it can be fixed
> within Puppet, rather than Ruby, because it was fine before.

Do you mean puppet 2.6.2 wasn''t exhibiting this problem?

-- 
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Ashley Penney
2011-Feb-01 19:35 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Yes, it didn''t happen with the earlier versions of 2.6.

On Tue, Feb 1, 2011 at 12:14 PM, Brice Figureau
<brice-puppet@daysofwonder.com> wrote:
> On Tue, 2011-02-01 at 10:30 -0500, Ashley Penney wrote:
> > This is the crux of the situation for me too - Puppetlabs blame it on
> > a Ruby bug that hasn''t been resolved with RHEL6 (in my situation), but
> > this wasn''t an issue until .3 for me too. I feel the fact that many
> > of us have had this problem since upgrading means it can be fixed
> > within Puppet, rather than Ruby, because it was fine before.
>
> Do you mean puppet 2.6.2 wasn''t exhibiting this problem?
>
> --
> Brice Figureau
> Follow the latest Puppet Community evolutions on www.planetpuppet.org!

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Brice Figureau
2011-Feb-01 19:45 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On 01/02/11 20:35, Ashley Penney wrote:
> Yes, it didn''t happen with the earlier versions of 2.6.

If it''s easy for you to reproduce the issue, you really should git
bisect it and tell puppetlabs which commit is the root cause (the
difference between 2.6.2 and 2.6.3 is not that big). This way, they''ll
certainly be able to fix it.

Do we have a redmine ticket to track this issue?
-- 
Brice Figureau
My Blog: http://www.masterzen.fr/

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Udo Waechter
2011-Feb-01 20:17 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On 01.02.2011, at 18:14, Brice Figureau wrote:

> On Tue, 2011-02-01 at 10:30 -0500, Ashley Penney wrote:
>> This is the crux of the situation for me too - Puppetlabs blame it on
>> a Ruby bug that hasn''t been resolved with RHEL6 (in my situation), but
>> this wasn''t an issue until .3 for me too. I feel the fact that many
>> of us have had this problem since upgrading means it can be fixed
>> within Puppet, rather than Ruby, because it was fine before.
>
> Do you mean puppet 2.6.2 wasn''t exhibiting this problem?

Yes for me.
--udo.

-- 
:: udo waechter - root@zoide.net :: N 52º16''30.5" E 8º3''10.1"
:: genuine input for your ears: http://auriculabovinari.de
:: your eyes: http://ezag.zoide.net
:: your brain: http://zoide.net

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Ashley Penney
2011-Feb-07 16:23 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Because I like to live dangerously I upgraded to 2.6.5, and it seems
like this has resolved the CPU problem completely for me.

On Tue, Feb 1, 2011 at 3:17 PM, Udo Waechter
<udo.waechter@uni-osnabrueck.de> wrote:
>
> On 01.02.2011, at 18:14, Brice Figureau wrote:
>
> > On Tue, 2011-02-01 at 10:30 -0500, Ashley Penney wrote:
> >> This is the crux of the situation for me too - Puppetlabs blame it on
> >> a Ruby bug that hasn''t been resolved with RHEL6 (in my situation), but
> >> this wasn''t an issue until .3 for me too. I feel the fact that many
> >> of us have had this problem since upgrading means it can be fixed
> >> within Puppet, rather than Ruby, because it was fine before.
> >
> > Do you mean puppet 2.6.2 wasn''t exhibiting this problem?
> Yes for me.
> --udo.
>
> --
> :: udo waechter - root@zoide.net :: N 52º16''30.5" E 8º3''10.1"
> :: genuine input for your ears: http://auriculabovinari.de
> :: your eyes: http://ezag.zoide.net
> :: your brain: http://zoide.net

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Brice Figureau
2011-Feb-07 18:56 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On 07/02/11 17:23, Ashley Penney wrote:
> Because I like to live dangerously I upgraded to 2.6.5, and it seems
> like this has resolved the CPU problem completely for me.

Did you upgrade the master, or the master and all the nodes?

I had a discussion about this issue with Nigel during the week-end, and
he said something really interesting that I hadn''t thought about:
it might be possible that the reports generated by 2.6.3 were larger
than they were in previous versions.

It is then possible that the CPU time taken to unserialize and process
those larger reports is the root cause of the high CPU usage.

It''d be great if one of the people having the problem could disable
reports to see if that''s the culprit.

And if this is the case, we should at least log how long it takes to
process a report on the master.
-- 
Brice Figureau
My Blog: http://www.masterzen.fr/

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
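One quick way to test the bigger-reports theory from the master side; a sketch assuming the default reportdir and YAML report files:

# Average on-disk report size; compare before and after upgrading agents.
sizes = Dir.glob('/var/lib/puppet/reports/*/*.yaml').map { |f| File.size(f) }
unless sizes.empty?
  avg = sizes.inject(0) { |sum, n| sum + n } / sizes.length
  puts "#{sizes.length} reports, average #{avg} bytes"
end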
Ashley Penney
2011-Feb-07 19:15 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
I just upgraded the master; I was too lazy to do the nodes yet.

On Mon, Feb 7, 2011 at 1:56 PM, Brice Figureau
<brice-puppet@daysofwonder.com> wrote:
> On 07/02/11 17:23, Ashley Penney wrote:
> > Because I like to live dangerously I upgraded to 2.6.5, and it seems
> > like this has resolved the CPU problem completely for me.
>
> Did you upgrade the master, or the master and all the nodes?
>
> I had a discussion about this issue with Nigel during the week-end, and
> he said something really interesting that I hadn''t thought about:
> it might be possible that the reports generated by 2.6.3 were larger
> than they were in previous versions.
>
> It is then possible that the CPU time taken to unserialize and process
> those larger reports is the root cause of the high CPU usage.
>
> It''d be great if one of the people having the problem could disable
> reports to see if that''s the culprit.
>
> And if this is the case, we should at least log how long it takes to
> process a report on the master.
> --
> Brice Figureau
> My Blog: http://www.masterzen.fr/

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Udo Waechter
2011-Feb-10 14:55 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
Hello,
I am one of those who have this problem. Some people suggested using
Ruby Enterprise. I looked at its installation; it looked a little bit
time-consuming, so I did not try that one out.
I upgraded to debian squeeze (of course), and the problem persists.

Thus I did some tests:

1. got ruby from "Ubuntu Meerkat":
libruby1.8 1.8.7.299-2
ruby1.8 1.8.7.299-2
ruby1.8-dev 1.8.7.299-2

Same problem (debian is 1.8.7.302 I think); with ruby from ubuntu lucid
(1.8.7.249) the problem is the same. I guess we can rule out debian''s
ruby here.

2. I reported that after stopping apache, stray master processes remain
and sit at 100% cpu. I did an strace on those processes and they do this
(whatever that means):

$ strace -p 1231
Process 1231 attached - interrupt to quit
brk(0xa49a000) = 0xa49a000
brk(0xbf51000) = 0xbf51000
brk(0xda09000) = 0xda09000
brk(0xa49a000) = 0xa49a000
brk(0xbf52000) = 0xbf52000
brk(0xda09000) = 0xda09000
brk(0xa49a000) = 0xa49a000
brk(0xbf52000) = 0xbf52000
brk(0xda09000) = 0xda09000
^CProcess 1231 detached

3. I have now disabled reports, let''s see what happens.

Thanks for the effort and have a nice day.
udo.

On 07.02.2011, at 19:56, Brice Figureau wrote:

> Did you upgrade the master, or the master and all the nodes?
>
> I had a discussion about this issue with Nigel during the week-end, and
> he said something really interesting that I hadn''t thought about:
> it might be possible that the reports generated by 2.6.3 were larger
> than they were in previous versions.
>
> It is then possible that the CPU time taken to unserialize and process
> those larger reports is the root cause of the high CPU usage.

-- 
:: udo waechter - root@zoide.net :: N 52º16''30.5" E 8º3''10.1"
:: genuine input for your ears: http://auriculabovinari.de
:: your eyes: http://ezag.zoide.net
:: your brain: http://zoide.net

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Brice Figureau
2011-Feb-10 15:22 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Thu, 2011-02-10 at 15:55 +0100, Udo Waechter wrote:
> Hello,
> I am one of those who have this problem. Some people suggested using
> Ruby Enterprise. I looked at its installation; it looked a little bit
> time-consuming, so I did not try that one out.
> I upgraded to debian squeeze (of course), and the problem persists.
>
> Thus I did some tests:
>
> 1. got ruby from "Ubuntu Meerkat":
> libruby1.8 1.8.7.299-2
> ruby1.8 1.8.7.299-2
> ruby1.8-dev 1.8.7.299-2
>
> Same problem (debian is 1.8.7.302 I think); with ruby from ubuntu lucid
> (1.8.7.249) the problem is the same. I guess we can rule out debian''s
> ruby here.
>
> 2. I reported that after stopping apache, stray master processes remain
> and sit at 100% cpu. I did an strace on those processes and they do this
> (whatever that means):
>
> $ strace -p 1231
> Process 1231 attached - interrupt to quit
> brk(0xa49a000) = 0xa49a000
> brk(0xbf51000) = 0xbf51000
> brk(0xda09000) = 0xda09000
> brk(0xa49a000) = 0xa49a000
> brk(0xbf52000) = 0xbf52000
> brk(0xda09000) = 0xda09000
> brk(0xa49a000) = 0xa49a000
> brk(0xbf52000) = 0xbf52000
> brk(0xda09000) = 0xda09000
> ^CProcess 1231 detached

This process is allocating memory like crazy :)

> 3. I have now disabled reports, let''s see what happens.

Are you still on puppet 2.6.3?
Can you upgrade to 2.6.5 to see if that''s better, as reported by one
other user?

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Patrick
2011-Feb-10 19:40 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On Feb 10, 2011, at 6:55 AM, Udo Waechter wrote:
> Hello,
> I am one of those who have this problem. Some people suggested using
> Ruby Enterprise. I looked at its installation; it looked a little bit
> time-consuming, so I did not try that one out.

Well, I find it takes about 30 min at the most, saves on RAM, and causes
puppet to use a little more CPU. Here''s what I did. This method requires
a compiler. You can also do everything up to (but not including) step 5
without affecting puppet. It''s also easy to reverse.

1) Changed /usr/share/puppet/rack/puppetmasterd/config.ru to use an
absolute path to the folder. Need this line:
$:.unshift(''/usr/lib/ruby/1.8/'')

2) Install the dependencies for the compile:
package { "libssl-dev": ensure => present }
package { "libsqlite3-dev": ensure => present }
package { ''libmysql++-dev'': ensure => present }
package { ''libpq-dev'': ensure => present }
package { ''apache2-prefork-dev'': ensure => present }
package { ''libapr1-dev'': ensure => present }
package { ''libaprutil1-dev'': ensure => present }

3) Installed RubyEE from their universal package.

4) Added a passengerEE mod to /etc/apache2/mods-available/

/etc/apache2/mods-available/passengeree.load:
LoadModule passenger_module /opt/ruby-enterprise-1.8.7-2010.02/lib/ruby/gems/1.8/gems/passenger-2.2.15/ext/apache2/mod_passenger.so
PassengerRoot /opt/ruby-enterprise-1.8.7-2010.02/lib/ruby/gems/1.8/gems/passenger-2.2.15
PassengerRuby /opt/ruby-enterprise-1.8.7-2010.02/bin/ruby

5) Disable the old passenger and enable the new one:
a2dismod passenger
a2enmod passengeree
service apache2 restart

If things don''t work, do this to re-enable your old passenger:
a2enmod passenger
a2dismod passengeree
service apache2 restart

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
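To confirm Apache actually picked up the REE-backed module after step 5 (paths as in Patrick''s steps, which may differ on your system), something like:

apache2ctl -t -D DUMP_MODULES | grep -i passenger
/opt/ruby-enterprise-1.8.7-2010.02/bin/ruby -v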
John Warburton
2011-Feb-10 23:55 UTC
Re: [Puppet Users] Re: puppetmaster 100%cpu usage on 2.6 (not on 0.24)
On 8 February 2011 06:15, Ashley Penney <apenney@gmail.com> wrote:
> I just upgraded the master; I was too lazy to do the nodes yet.
>
> On Mon, Feb 7, 2011 at 1:56 PM, Brice Figureau
> <brice-puppet@daysofwonder.com> wrote:
>> On 07/02/11 17:23, Ashley Penney wrote:
>> > Because I like to live dangerously I upgraded to 2.6.5, and it seems
>> > like this has resolved the CPU problem completely for me.
>>
>> Did you upgrade the master, or the master and all the nodes?

Was that upgrade to 2.6.5rc2? It seems there has been a nice patch to
speed up large HTTP POSTs & PUTs. Since 2.6.x reports can be large (I
have some approaching 1 MB), this might be where the problem was:

https://projects.puppetlabs.com/projects/puppet/wiki/Release_Notes#2.6.5
https://projects.puppetlabs.com/issues/6257

John

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.