We are moving to have our nagios servers generate their nagios configs based on what services are installed on specific hosts (as well as the hosts registering themselves). What we have found is that our runtimes have gone through the roof on this and I'm trying to figure out why (summary below from a puppet run). The config pull takes a while, but the majority of the time is spent on the client side. Running puppet with -d has a large chunk of this time with nothing being updated on the screen and one processor core being pegged. We're running 2.6.9 on SL6.0 x86_64.

I'm not sure if I have an unreasonable number of resources and I need to do things differently, or if I have a problem on my client I need to address. Any insight or direction to go down to continue debugging? (Ignore the errors; they are a side effect of this being a test build on a test machine.)

    # time puppetd -td
    ...
    debug: file_metadata supports formats: b64_zlib_yaml marshal pson raw yaml; using pson
    [stall for a long time]
    debug: file_metadata supports formats: b64_zlib_yaml marshal pson raw yaml; using pson
    ...
    notice: Finished catalog run in 1094.09 seconds
    Changes:
      Total: 2
    Events:
      Success: 2
      Failure: 3
      Total: 5
    Resources:
      Skipped: 1
      Total: 1566
      Changed: 2
      Failed: 3
      Out of sync: 5
    Time:
      User: 0.00
      Filebucket: 0.00
      Host: 0.00
      Mount: 0.01
      Ssh authorized key: 0.01
      Schedule: 0.01
      Class: 0.01
      Yumrepo: 0.03
      Package: 0.23
      Total: 108.87
      Last run: 1315860016
      File: 19.57
      Exec: 29.27
      Config retrieval: 52.20
      Service: 7.53

    real    20m47.707s
    user    18m12.793s
    sys     0m39.436s
    #

Thanks,
jl
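For context, the exported-resources pattern the thread is discussing typically looks something like the sketch below; the resource titles, check command and target paths are illustrative assumptions, not the poster's actual manifests:

    # On every monitored host: export a host entry and one service per
    # installed check (all names below are examples).
    @@nagios_host { $::fqdn:
      address => $::ipaddress,
      use     => 'generic-host',
      target  => '/etc/nagios/conf.d/hosts.cfg',
    }

    @@nagios_service { "check_ssh_${::fqdn}":
      host_name           => $::fqdn,
      service_description => 'SSH',
      check_command       => 'check_ssh',
      use                 => 'generic-service',
      target              => '/etc/nagios/conf.d/services.cfg',
    }

    # On the nagios server: collect everything that was exported.
    Nagios_host <<| |>>
    Nagios_service <<| |>>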
Hi,

On 11-09-12 04:43 PM, Justin Lambert wrote:
> We are moving to have our nagios servers generate their nagios configs
> based on what services are installed on specific hosts (as well as the
> hosts registering themselves). What we have found is that our runtimes
> have gone through the roof on this and I'm trying to figure out why
> (summary below from a puppet run). The config pull takes a while, but
> the majority of the time is spent on the client side. Running puppet
> with -d has a large chunk of this time with nothing being updated on the
> screen and one processor core being pegged. We're running 2.6.9 on
> SL6.0 x86_64.

What db backend are you using for stored configs?

If you're using the sqlite3 backend, I'd recommend switching to mysql or postgresql. The sqlite3 backend is mainly there for easing puppet dev, but it's way too slow for production use.

> I'm not sure if I have an unreasonable number of resources and I need to
> do things differently or if I have a problem on my client I need to
> address. Any insight or direction to go down to continue debugging?

Normally the client run time shouldn't change much with or without exporting nagios resources, except on the Nagios server (the one extracting the puppet resources).

In my experience, exporting native Nagios resources on Nagios clients and collecting them on the Nagios server doesn't seem to scale very well. But still, it's usable with around 100 hosts and 500 services.

--
Gabriel Filion
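For reference, the stored-configs backend is selected in puppet.conf on the master. A minimal postgresql stanza might look like the following; the values are illustrative, so verify the setting names against the 2.6.x configuration reference:

    [master]
        storeconfigs = true
        dbadapter    = postgresql
        dbname       = puppet
        dbuser       = puppet
        dbpassword   = secret
        dbserver     = localhost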
Thanks for the response. We're using Postgres, and the catalog build seems a bit slow, but nothing compared to the client runtime, which is where I've been focusing. Your assessment is correct: it is just the nagios server that is extremely slow (~20 mins); there is minimal/no impact to the client machines.

We're at about the 100 hosts, but have closer to 1500 services - maybe we have exceeded what storeconfigs can do then. If that is the case, is there a recommended alternative that isn't manually maintaining config files? It seems like most of the processing time is spent client side and I haven't been able to figure out why. Even doing an md5sum on all of the files from the CLI takes less than 2 seconds.
On 11-09-12 05:41 PM, Justin Lambert wrote:
> Thanks for the response. We're using Postgres, and the catalog build
> seems a bit slow, but nothing compared to the client runtime, which is
> where I've been focusing. Your assessment is correct: it is just the
> nagios server that is extremely slow (~20 mins); there is minimal/no
> impact to the client machines.
>
> We're at about the 100 hosts, but have closer to 1500 services - maybe
> we have exceeded what storeconfigs can do then.

hmm.. so yeah, you've hit the same kind of very bad scaling from the nagios native resources as I've experienced. Seeing how bad it becomes with that number of services is now convincing me that I want to change methods.

> If that is the case, is there a recommended alternative that isn't
> manually maintaining config files?

One alternative would be to use file templates, combined with concatenated_file resources (from David Schmidt's 'puppet-common' module). That way, for every host and service definition (and other nagios config items), you can export a file whose contents will be verified by md5 sum. Every file that you export to the nagios server should notify a concatenation exec that binds everything together; a rough sketch of the pattern follows below.

The good thing with this method is that you can manage the module directory (where the different config file excerpts are stored) with 'purge => true' so that only exported resources are present in the final nagios configuration (something that native types don't handle very well -- or actually handle very badly).

> It seems like most of the processing time is spent client side and I
> haven't been able to figure out why. Even doing an md5sum on all of the
> files from the CLI takes less than 2 seconds.

I haven't traced the thing, but from what I could understand, most of the time is spent resolving relationships between exported nagios resources and ensuring that all the exported resources are unique. To verify this, you could set up postgres to log SQL requests and check what gets requested during one run.

--
Gabriel Filion
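A minimal sketch of that pattern, shown here with a plain exec-based concatenation instead of the actual concatenated_file interface from puppet-common (whose exact parameters aren't reproduced here); every path, tag and template name is an illustrative assumption:

    # On each monitored host: export one fragment file per nagios config
    # item (template path and tag are examples).
    @@file { "/etc/nagios/fragments/host_${::fqdn}.cfg":
      content => template('nagios/host.cfg.erb'),
      tag     => 'nagios-fragment',
      notify  => Exec['concat-nagios-config'],
    }

    # On the nagios server: collect the fragments into a purged directory,
    # so fragments from decommissioned hosts disappear...
    file { '/etc/nagios/fragments':
      ensure  => directory,
      recurse => true,
      purge   => true,
    }

    File <<| tag == 'nagios-fragment' |>>

    # ...and rebuild the final config whenever any fragment changes.
    # (Add a notify to your nagios service resource as needed.)
    exec { 'concat-nagios-config':
      command     => '/bin/sh -c "cat /etc/nagios/fragments/*.cfg > /etc/nagios/nagios_generated.cfg"',
      refreshonly => true,
    }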
> > We're at about the 100 hosts, but have closer to 1500 services - maybe
> > we have exceeded what storeconfigs can do then.
>
> hmm.. so yeah, you've hit the same kind of very bad scaling from the
> nagios native resources as I've experienced. Seeing how bad it becomes
> with that number of services is now convincing me that I want to change
> methods.

I'm just not convinced yet that the issue is on the stored config side. The config retrieval/catalog build is 2-3 mins (long, I agree), but that's nothing compared to the other 17 minutes the client is busy with no output in debug mode and one CPU core pegged at 100% by puppet. What is it doing then?

> > If that is the case, is there a recommended alternative that isn't
> > manually maintaining config files?
>
> One alternative would be to use file templates, combined with
> concatenated_file resources (from David Schmidt's 'puppet-common'
> module). [...]

We actually wrote our own nagios module rather than using the built-in types. We are exporting one cfg file per host for the host config, plus one subdir per host which contains all of the checks for that host (one file per check), and we can then use purge on the directory to clean up removed hosts/services; a sketch of this layout follows below.

I know puppet does an md5sum on each of the files, but since the md5sum binary runs over all of /etc (the binary being invoked once per file using find -exec) in less than 3 seconds, it seems like something else is going on. I assume puppet is using a ruby md5 method; I haven't tested it, but I can't believe there is that significant a difference over a binary invocation.

> > It seems like most of the processing time is spent client side and I
> > haven't been able to figure out why. Even doing an md5sum on all of
> > the files from the CLI takes less than 2 seconds.
>
> I haven't traced the thing, but from what I could understand, most of
> the time is spent resolving relationships between exported nagios
> resources and ensuring that all the exported resources are unique. To
> verify this, you could set up postgres to log SQL requests and check
> what gets requested during one run.

This is going to be another issue as this scales, but right now if I could get puppet runs to <5 mins on the nagios server I would be happy enough to move on and come back later. I should probably do this, though, so I understand how the queries are structured and whether there is any way I can add more dependency information to the data we are feeding the DB to make the queries more efficient.

I appreciate the thoughts and feedback. Let me know if there is any other way to get more debug info that might help figure out what is going on, or if there is a doNothing() method somewhere deep in the puppet magic.
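A rough sketch of the layout described above; the paths, check names and templates are illustrative guesses, not the actual module:

    # Exported from each monitored host: the host config file...
    @@file { "/etc/nagios/hosts/${::fqdn}.cfg":
      content => template('nagios/host.cfg.erb'),
      tag     => 'nagios',
    }

    # ...a per-host subdirectory, and one file per check inside it.
    @@file { "/etc/nagios/checks/${::fqdn}":
      ensure => directory,
      tag    => 'nagios',
    }

    @@file { "/etc/nagios/checks/${::fqdn}/check_ssh.cfg":
      content => template('nagios/check_ssh.cfg.erb'),
      tag     => 'nagios',
    }

    # On the nagios server: collect, and purge whatever is no longer
    # exported (force is needed to remove whole per-host subdirectories).
    file { ['/etc/nagios/hosts', '/etc/nagios/checks']:
      ensure  => directory,
      recurse => true,
      purge   => true,
      force   => true,
    }

    File <<| tag == 'nagios' |>>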
On Tue, Sep 13, 2011 at 12:41 AM, Justin Lambert <jlambert@localmatters.com> wrote:
> We're at about the 100 hosts, but have closer to 1500 services - maybe we
> have exceeded what storeconfigs can do then. If that is the case, is there
> a recommended alternative that isn't manually maintaining config files?

While it would require you to generate the templates yourself, you can use the foreman query script [1] to get the data you need based on all sorts of conditions.

Ohad

[1] - https://github.com/ohadlevy/puppet-foreman/blob/master/foreman/lib/puppet/parser/functions/foreman.rb
> We're at about the 100 hosts, but have closer to 1500 services - maybe we
> have exceeded what storeconfigs can do then. If that is the case, is there
> a recommended alternative that isn't manually maintaining config files?

Just to be clear: after the catalog has been compiled and sent to the client, storeconfigs is not involved at all anymore. So if your compile time is reasonable but execution time on the client is slow, then you have to look at the client side, not at storeconfigs. Storeconfigs just adds a few selects (and inserts) of additional resources that weren't directly in your manifest, but they end up in the catalog the same way any other resource does.

~pete
> The good thing with this method is that you can manage the module
> directory (where the different config file excerpts are stored) with
> 'purge => true' so that only exported resources are present in the final
> nagios configuration (something that native types don't handle very well
> -- or actually handle very badly).

Yes, because purging the directory would unexport the resources of the decommissioned host. But since other resources might have been exported as well, which can't be purged by that trick, I would still recommend cleaning up decommissioned hosts with the "puppet node clean" face action, which got partially merged into 2.7.3 and hopefully will be fully merged (with the parts important for our discussion) into 2.7.4.

~pete
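A typical invocation of that action might look like the following; the --unexport flag is an assumption here, so verify with 'puppet help node clean' on your version before relying on it:

    # Remove a decommissioned host's certificate and cached data, and mark
    # its exported resources as unexported (flag name assumed):
    puppet node clean --unexport old-host.example.com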
I had a similar problem with my nagios server, where the catalog run took about 500 seconds for about 100 nodes with about 1000 services, most of which were generated with exported resources/stored configs. We use the Naginator resources in puppet.

However, the main speed issue was not with the fetching of these resources but with calculating the checksums of all the files I used (to see if there were changes compared to the master). As was suggested in the 0.25.x documentation, we put our hosts and our services in different .cfg files (in my case we chose to have a file per host or hostgroup, with all of its services included). Yesterday, however, I changed this to only a couple of files: in services.cfg, for example, all services are now kept; same thing for hosts.cfg, hostgroups.cfg and commands.cfg.

My puppet run now takes about 160 seconds, which is still very slow in my opinion but a big gain compared to the 500 seconds before. The compile time has pretty much stayed the same (about 80 seconds), but on the client side we gained a lot of time.

Maybe you can try whether the same change works for you?

The MD5 checksumming in Puppet does seem to be very slow. We also don't understand why it takes so long, but apparently it does.

Kind regards,
Bart
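One way to steer all collected naginator resources into a couple of files is an attribute override on the collector; a sketch, with illustrative paths:

    # On the nagios server: override the target of every collected
    # resource so each type lands in a single file instead of one file
    # per host or service (paths are examples).
    Nagios_host <<| |>> {
      target => '/etc/nagios/conf.d/hosts.cfg',
    }

    Nagios_service <<| |>> {
      target => '/etc/nagios/conf.d/services.cfg',
    }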
And as an additional note, there is a related bug/feature open for it: http://projects.puppetlabs.com/issues/5650
Thanks for everyone's comments - I finally found where the large chunk of client time was going. Client runs are still slow, but 5 mins is a lot better than 20 minutes.

The issue in our case was having the directory that contained the nagios config files managed by puppet (purge => true, recurse => true, force => true) while also having a source set on it, pushing some real files in addition to the files coming from the stored configs. The process of merging those together seems to have taken about 12 minutes in our case.

Thanks again,
jl
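For readers hitting the same wall, the combination described above would look roughly like this; the path and source URL are illustrative:

    # The slow combination: the same directory is recursively purged and
    # also populated from a recursive source, on top of the files arriving
    # via collected exported resources.
    file { '/etc/nagios/conf.d':
      ensure  => directory,
      recurse => true,
      purge   => true,
      force   => true,
      source  => 'puppet:///modules/nagios/conf.d',  # static files merged in here
    }

One plausible workaround -- an assumption on my part, since the thread does not say how it was ultimately resolved -- is to serve the static files from a separate, non-purged directory so that the purged directory holds only the collected exports.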