Michael Conigliaro
2009-May-04 14:41 UTC
[Puppet Users] long catalog run times and random connection timeouts
Hello,

I've been seeing some strange behavior for the last few days, and I'm not sure what else to do to troubleshoot it. It started when two of my puppet clients (always the same two) suddenly began to take forever to finish their catalog runs. I'm not sure what changed around 5pm last Thursday to cause this, but well, check out the logs...

Apr 30 16:50:50 emsdb01 puppetd[11363]: Finished catalog run in 4.59 seconds
Apr 30 17:26:29 emsdb01 puppetd[11363]: Finished catalog run in 292.65 seconds

Apr 30 16:43:05 emsweb01 puppetd[1345]: Finished catalog run in 4.47 seconds
Apr 30 17:18:27 emsweb01 puppetd[1345]: Finished catalog run in 292.53 seconds

Now when emsdb01 and emsweb01 are checking in, all the other clients seem to queue up behind them. So I end up seeing...

Conection timeout calling puppetmaster.getconfig: execution expired

And...

Could not retrieve catalog: Connection Timeout

from all the other clients. For those that don't time out, I see a flurry of "finished catalog run" as they all complete at the same time (but with excessive catalog run times, presumably because everyone else got held up by emsdb01 and emsweb01).

When I run "puppetd --test --debug" on emsdb01 and emsweb01, I don't see anything unusual happening. Both servers are basically idle while the catalog run is happening. What I do see is that they get hung up on these tasks for some reason:

debug: Calling puppetmaster.getconfig
debug: Calling fileserver.describe

When I look at the puppetmaster as this is going on, I don't see anything unusual in regards to server load there either. At first, I suspected some kind of network connectivity issue, but I have run several tcpdumps, and everything seems to be connecting fine. Is there something else I should be looking at to determine the cause of this problem?
--
Michael Conigliaro
Computer Analyst
Fuss & O'Neill Technologies
www.fandotech.com

You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en
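A quick way to spot when the slowdown began is to scan the client syslog for "Finished catalog run" lines and flag the slow ones. The following is only a minimal sketch, assuming the log lines look exactly like the four quoted in the message above; the `slow_runs` helper and the 60-second threshold are hypothetical choices made for illustration and are not part of Puppet.

```python
import re

# Matches syslog lines of the form quoted in the message, e.g.
# "Apr 30 17:26:29 emsdb01 puppetd[11363]: Finished catalog run in 292.65 seconds"
RUN_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+puppetd\[\d+\]:\s+"
    r"Finished catalog run in (?P<seconds>[\d.]+) seconds"
)


def slow_runs(lines, threshold=60.0):
    """Return (host, timestamp, seconds) for every run slower than threshold.

    The threshold is an arbitrary assumption; pick one that suits your site.
    """
    slow = []
    for line in lines:
        match = RUN_RE.match(line)
        if match is None:
            continue  # not a "Finished catalog run" line
        seconds = float(match.group("seconds"))
        if seconds > threshold:
            slow.append((match.group("host"), match.group("stamp"), seconds))
    return slow


# The four log lines from the message, used here as sample input.
SAMPLE = [
    "Apr 30 16:50:50 emsdb01 puppetd[11363]: Finished catalog run in 4.59 seconds",
    "Apr 30 17:26:29 emsdb01 puppetd[11363]: Finished catalog run in 292.65 seconds",
    "Apr 30 16:43:05 emsweb01 puppetd[1345]: Finished catalog run in 4.47 seconds",
    "Apr 30 17:18:27 emsweb01 puppetd[1345]: Finished catalog run in 292.53 seconds",
]

if __name__ == "__main__":
    for host, stamp, seconds in slow_runs(SAMPLE):
        print(f"{host} {stamp} {seconds:.2f}s")
```

Feeding it a real log file instead of `SAMPLE` (for example the lines of /var/log/messages, if that is where puppetd logs on your system) would show the first slow run per host, which narrows down when the change happened.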
Ohad Levy
2009-May-04 14:47 UTC
[Puppet Users] Re: long catalog run times and random connection timeouts
Are you using webrick?
Michael Conigliaro
2009-May-04 14:55 UTC
[Puppet Users] Re: long catalog run times and random connection timeouts
I'm actually not sure. How do I determine that? I just use the Red Hat RPMs from the EPEL repository, and I don't remember seeing an option for that anywhere.
Nigel Kersten
2009-May-04 15:34 UTC
[Puppet Users] Re: long catalog run times and random connection timeouts
What versions of puppet and facter are you using?

--
Nigel Kersten
nigelk@google.com
System Administrator
Google, Inc.
Michael Conigliaro
2009-May-04 15:43 UTC
[Puppet Users] Re: long catalog run times and random connection timeouts
[root@emsdb01 ~]# puppetd --version
0.24.8
[root@emsdb01 ~]# facter --version
1.5.2
Paul Lathrop
2009-May-04 19:40 UTC
[Puppet Users] Re: long catalog run times and random connection timeouts
On Mon, May 4, 2009 at 7:55 AM, Michael Conigliaro <mconigliaro@fandotech.com> wrote:
> I'm actually not sure. How do I determine that? I just use the redhat
> rpms from the epel repository, and I don't remember seeing an option for
> that anywhere.

If you aren't sure what you are using, you are using webrick. The problem you're running into is probably the scalability wall. How many clients are you running?

Take a look at the http://reductivelabs.com/trac/puppet/wiki/PuppetScalability page. I have had great success with Mongrel+Nginx, but Passenger also looks promising.

--Paul
Nigel Kersten
2009-May-04 20:20 UTC
[Puppet Users] Re: long catalog run times and random connection timeouts
On Mon, May 4, 2009 at 12:40 PM, Paul Lathrop <paul@tertiusfamily.net> wrote:
> Take a look at the
> http://reductivelabs.com/trac/puppet/wiki/PuppetScalability page. I
> have had great success with Mongrel+Nginx, but Passenger also looks
> promising.

We're seeing an incredible difference under very heavy load with Passenger. Passenger is a much simpler setup to maintain as well, and you get all the existing stuff around Apache for free, which makes capacity planning a lot simpler.
Michael Conigliaro
2009-May-04 21:45 UTC
[Puppet Users] Re: long catalog run times and random connection timeouts
Ok guys, this was a tough nut to crack, but I think I figured it out.

This problem only occurred on clients that lived within a certain security zone behind my firewall. When a client was on the same vlan as the puppetmaster, everything worked fine. As soon as I moved it into any one of a particular set of vlans (all within the same security zone on my firewall), I got this slowness problem. I spent most of my time trying to figure out why/how my firewall could be causing things to be slow rather than just denying the connections altogether. But I digress...

The root cause was that I did not have reverse dns records set up for any of these vlans. Using tcpdump, I was able to see that every time a client connects, the puppetmaster attempts a reverse dns lookup on the client's ip. I'm not exactly sure why yet, but dns lookups against nonexistent in-addr.arpa domains take *for-freaking-ever* on my network. Once I set up the reverse lookup zone and added the necessary ptr records, catalog runs were completing in a few seconds again.

I hope someone out there benefits from this thread, because I was pulling my hair out over this problem!
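For anyone who wants to check whether they are hitting the same thing, timing a reverse lookup for a client address from the puppetmaster host is a cheap test. The following is a minimal sketch, assuming Python is available on the puppetmaster; the `time_reverse_lookup` helper and its `resolver` parameter are hypothetical names introduced for illustration, and the address in the usage block is a placeholder, not one from this thread. It only measures what the system resolver does and says nothing about how Puppet itself performs the lookup.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Time a reverse (PTR) lookup for ip.

    Returns (hostname or None, elapsed seconds). The resolver argument is
    an illustrative hook so the lookup can be swapped out; by default the
    system resolver is used through socket.gethostbyaddr.
    """
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError;
        # a missing PTR record ends up here.
        hostname = None
    elapsed = time.monotonic() - start
    return hostname, elapsed


if __name__ == "__main__":
    # Placeholder address; substitute the IPs of the slow clients.
    for ip in ["127.0.0.1"]:
        name, elapsed = time_reverse_lookup(ip)
        print(f"{ip}: {name or 'no PTR record'} ({elapsed:.2f}s)")
```

A lookup that fails quickly is harmless; one that takes many seconds before failing would match the behavior described above, where lookups against nonexistent in-addr.arpa zones were slow rather than refused.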
Jean-Baptiste Quenot
2009-Oct-08 10:00 UTC
[Puppet Users] Re: long catalog run times and random connection timeouts
2009/5/4 Michael Conigliaro <mconigliaro@fandotech.com>:
> The root cause was that I did not have reverse dns records set up for
> any of these vlans. Using tcpdump, I was able to see that every time a
> client connects, the puppetmaster attempts a reverse dns lookup on the
> client's ip. I'm not exactly sure why yet, but dns lookups against
> nonexistent in-addr.arpa domains take *for-freaking-ever* on my network.
> Once I set up the reverse lookup zone and added the necessary ptr
> records, catalog runs were completing in a few seconds again.

Indeed, Puppet is completely unusable when the reverse DNS entries are not declared, and especially when DNS timeouts are experienced. I filed an issue for this:

http://projects.reductivelabs.com/issues/2708

Please feel free to comment with your own experience to have a more comprehensive bug report.

--
Jean-Baptiste Quenot
http://jbq.caraldi.com/