Matthew Nicholson
2012-Nov-14 17:37 UTC
[Puppet Users] ssh key collection in puppetdb slow queries/blocking nodes
I was working on this a bit w/ grim_radical/nlewis on Friday afternoon, and wanted to loop back; figured at least starting the conversation here again was a good idea.

We have started having problems with one query slowing our puppetdb way, way down, and seemingly blocking other queries/node runs.

We are an HPC shop with about 2K nodes. Of those, about 1500 do the following:

@@sshkey { "${hostname}":
  host_aliases => ["$fqdn", "$ipaddress"],
  type         => "rsa",
  key          => $sshrsakey,
  ensure       => present,
}

Sshkey <<| type == "rsa" |>> { ensure => present }

That's it. Nothing really crazy/special in there. This ends up as:

LOG: duration: 5690.773 ms execute <unnamed>: select results.* from
(SELECT certname_catalogs.certname, catalog_resources.resource,
catalog_resources.type, catalog_resources.title, catalog_resources.tags,
catalog_resources.exported, catalog_resources.sourcefile,
catalog_resources.sourceline, rp.name, rp.value FROM catalog_resources
JOIN certname_catalogs USING(catalog) LEFT OUTER JOIN resource_params rp
USING(resource) INNER JOIN certnames ON certname_catalogs.certname =
certnames.name WHERE (catalog_resources.type = $1) AND
(catalog_resources.exported = $2) AND (certnames.deactivated IS NULL) AND
(NOT ((certname_catalogs.certname = $3))) AND (catalog_resources.resource
IN (SELECT rp.resource FROM resource_params rp WHERE rp.name = $4 AND
rp.value = $5))) results LIMIT 50001
DETAIL: parameters: $1 = 'Sshkey', $2 = 't', $3 = 'hero4209', $4 = 'type', $5 = '"rsa"'

Would adding an index on this be an option (I'm not a huge Postgres guru; maybe I'm using the wrong terms)?

As soon as we commented out the collection, like:

#Sshkey <<| type == "rsa" |>> { ensure => present }

things all go back to normal, and nodes run nice and quickly. With that in there, nodes would hang mid-run and start timing out. Our 2K nodes are on a 2-hour run interval.

Any help/thoughts? I'm on IRC as sjoeboo as well.
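(Editorial aside: the shape of the slow part of that query, the parameter-lookup subquery over resource_params, can be reproduced in miniature. The sketch below is not PuppetDB code; it uses Python's sqlite3 as a stand-in for Postgres, with invented data and a hypothetical index name, just to show what an index on the WHERE columns buys.)

```python
# Toy reproduction of the slow subquery's shape. sqlite3 stands in for
# Postgres here; table contents and the index name are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE resource_params (resource TEXT, name TEXT, value TEXT)")

# ~1500 exported sshkey resources, each with a 'type' = '"rsa"' parameter,
# plus other parameters the subquery has to wade through.
for i in range(1500):
    cur.execute("INSERT INTO resource_params VALUES (?, 'type', '\"rsa\"')", (f"res{i}",))
    cur.execute("INSERT INTO resource_params VALUES (?, 'key', 'AAAA...')", (f"res{i}",))

subquery = ("SELECT resource FROM resource_params "
            "WHERE name = 'type' AND value = '\"rsa\"'")

# Without an index, the inner subquery from the log line is a full table scan:
plan_before = [row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + subquery)]
print(plan_before)  # a SCAN over resource_params

# With a composite index on the two WHERE columns, it becomes an index lookup:
cur.execute("CREATE INDEX idx_rp_name_value ON resource_params (name, value)")
plan_after = [row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + subquery)]
print(plan_after)  # a SEARCH using idx_rp_name_value
```

The same before/after comparison on the real database would use Postgres's `EXPLAIN ANALYZE` rather than sqlite's `EXPLAIN QUERY PLAN`.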
--
Matthew Nicholson

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
jcbollinger
2012-Nov-15 14:53 UTC
[Puppet Users] Re: ssh key collection in puppetdb slow queries/blocking nodes
On Wednesday, November 14, 2012 11:37:23 AM UTC-6, Matt wrote:
> We have started having problems with one query slowing our puppetdb way,
> way down, and seemingly blocking other queries/node runs.
> [...]
> Would adding an index on this be an option (I'm not a huge Postgres guru;
> maybe I'm using the wrong terms)?

First, make sure you are using thin_storeconfigs. After making the switch (if it is a switch), it may take some time for all nodes' changes to propagate to the DB, but the result should be a lot fewer rows in your DB. That could speed you up far more than any indexing.

Also, I presume that you are already using the PostgreSQL back end instead of the built-in one, but if not then you should switch now. The built-in back end is simply not up to the task of handling so many nodes efficiently.

The query itself looks like it could use some optimization, but that's out of your hands unless you want to hack on Puppet itself.

I don't know which columns may be indexed already, and I didn't find any documentation of the schema at PL or in puppetlabs' GitHub (what's up with that, PL?). It might indeed be the case that adding indexes on one or more key columns would help you out, but you really ought to tackle this in a systematic manner. Specifically, use a query analyzer (I presume Postgres has one) to identify the expensive parts of that query, and consider adding indices that will improve those parts (e.g. indices on columns of long tables that serve as join columns or WHERE criteria). Lather, rinse, repeat until it's good enough or you can't do any better.

Be aware also that time saved in the query will be partially offset by time consumed in maintaining each additional index you create. If you choose strategically then you could conceivably see a dramatic overall gain, but don't go overboard.

John
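(Editorial aside: the "indices on join columns" part of John's advice can also be sketched in miniature. Again this is Python's sqlite3 standing in for Postgres; the table and column names mirror the PuppetDB query, but the data and the index name are invented for illustration.)

```python
# Toy illustration of the analyze/index/re-analyze loop: check the plan,
# index the join column, check again. sqlite3 stands in for Postgres;
# data and the index name are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE certnames (name TEXT, deactivated TEXT)")
cur.execute("CREATE TABLE certname_catalogs (certname TEXT, catalog TEXT)")
for i in range(2000):
    cur.execute("INSERT INTO certnames VALUES (?, NULL)", (f"node{i}",))
    cur.execute("INSERT INTO certname_catalogs VALUES (?, ?)", (f"node{i}", f"cat{i}"))

query = ("SELECT cc.catalog FROM certname_catalogs cc "
         "JOIN certnames c ON cc.certname = c.name "
         "WHERE c.deactivated IS NULL")

plan_before = [row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + query)]
print(plan_before)

# Index the join column, then re-check the plan:
cur.execute("CREATE INDEX idx_cc_certname ON certname_catalogs (certname)")
plan_after = [row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + query)]
print(plan_after)
```

On the real system you would run `EXPLAIN ANALYZE` in psql instead, since it shows actual row counts and timings rather than just the chosen plan.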
Deepak Giridharagopal
2012-Nov-15 20:26 UTC
Re: [Puppet Users] Re: ssh key collection in puppetdb slow queries/blocking nodes
We (grim_radical, nlew, and cprice) have continued working with Matt (sjoeboo) on the #puppet IRC channel over the last few days. Apologies for not updating this thread accordingly!

On Thu, Nov 15, 2012 at 7:53 AM, jcbollinger <John.Bollinger@stjude.org> wrote:
> First, make sure you are using thin_storeconfigs. After making the switch
> (if it is a switch), it may take some time for all nodes' changes to
> propagate to the DB, but the result should be a lot fewer rows in your DB.
> That could speed you up far more than any indexing.

Just an FYI... thin_storeconfigs doesn't have an effect when using PuppetDB, as we always store the entire catalog every time. I'm confident we can come up with a perf fix without resorting to storing significantly less information, but that largely depends on how clever we are. :)

> I don't know which columns may be indexed already, and I didn't find any
> documentation of the schema at PL or in puppetlabs' GitHub (what's up with
> that, PL?). It might indeed be the case that adding indexes on one or more
> key columns would help you out, but you really ought to tackle this in a
> systematic manner.

There are two pieces to this: there is the puppetdb query that's formulated on the puppet side, inside the puppetdb "resource" terminus. Then there's the translation of that query to low-level SQL, which happens inside of the PuppetDB daemon.

The schema is defined here:
https://github.com/puppetlabs/puppetdb/blob/master/src/com/puppetlabs/puppetdb/scf/migrate.clj

It is currently represented as a "base" schema, with migrations on top that modify it. We should probably have a complementary version of this code that has the entire schema in one shot, instead of building it up incrementally. Or, at a minimum, a dev document that outlines the schema. I'd very much welcome some community help on that!

> Specifically, use a query analyzer (I presume Postgres has one) to
> identify the expensive parts of that query, and consider adding indices
> that will improve those parts (e.g. indices on columns of long tables that
> serve as join columns or WHERE criteria). Lather, rinse, repeat until it's
> good enough or you can't do any better.

This is great advice, and it's the way we've been approaching debugging the problem on the IRC channel. We've learned a few things:

* The plan for the query when the problem was originally spotted looked like so: http://explain.depesz.com/s/gssC
* We added an index on the "exported" field in catalog_resources. We re-analyzed the query when the system was mostly idle, and got the following plan: http://explain.depesz.com/s/I7e
* That plan looks *significantly* better after that change, but once the load on the database machine ramped back up again, the plan looked like so: http://explain.depesz.com/s/en6
* The current catalog duplication rate Matt is seeing is somewhat low, < 50%. That could be causing bloat in the table, resulting in suboptimal performance. We're now experimenting with changing the puppetdb GC interval to clean out old rows more regularly, and seeing if that correlates with improved performance.
* We've also changed the index we added to a partial index, so that index scans are cheaper.

Stay tuned, and if anyone is interested in helping out with the debugging effort, we're talking about this in #puppet on Freenode!

deepak
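(Editorial aside: the partial-index change mentioned above can be sketched in miniature too. sqlite3, which also supports partial indexes, stands in for Postgres here; the data and the index name are invented. The idea is that since the collection query always filters on exported = 't', an index restricted to exported rows stays small and cheap.)

```python
# Sketch of a partial index: only rows matching the index's WHERE clause
# are indexed. sqlite3 stands in for Postgres; data and names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE catalog_resources "
            "(catalog TEXT, resource TEXT, type TEXT, exported TEXT)")
for i in range(2000):
    # Most resources in a catalog are not exported; only the sshkeys are.
    cur.execute("INSERT INTO catalog_resources VALUES (?, ?, 'File', 'f')",
                (f"cat{i}", f"file{i}"))
    cur.execute("INSERT INTO catalog_resources VALUES (?, ?, 'Sshkey', 't')",
                (f"cat{i}", f"key{i}"))

# Partial index: only the exported rows are indexed, so the index covers
# half the table here (and far less in a realistic catalog).
cur.execute("CREATE INDEX idx_cr_exported ON catalog_resources (type) "
            "WHERE exported = 't'")

# A query whose WHERE clause implies the index's WHERE clause can use it:
plan = [row[-1] for row in cur.execute(
    "EXPLAIN QUERY PLAN SELECT resource FROM catalog_resources "
    "WHERE type = 'Sshkey' AND exported = 't'")]
print(plan)
```

In Postgres the equivalent would be `CREATE INDEX ... ON catalog_resources (type) WHERE exported = true;`, and `EXPLAIN` would confirm whether the planner picks it up under load.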
Jakov Sosic
2012-Nov-22 00:43 UTC
Re: [Puppet Users] Re: ssh key collection in puppetdb slow queries/blocking nodes
On 11/15/2012 09:26 PM, Deepak Giridharagopal wrote:
> We (grim_radical, nlew, and cprice) have continued working with Matt
> (sjoeboo) on the #puppet IRC channel over the last few days. Apologies
> for not updating this thread accordingly!

jcbollinger asked about documentation of the database schema; you seem to have missed that part for some reason?
Deepak Giridharagopal
2012-Nov-22 01:20 UTC
Re: [Puppet Users] Re: ssh key collection in puppetdb slow queries/blocking nodes
On Wed, Nov 21, 2012 at 5:43 PM, Jakov Sosic <jsosic@srce.hr> wrote:
> jcbollinger asked about documentation of the database schema; you seem
> to have missed that part for some reason?

From my earlier reply:

"The schema is defined here:
https://github.com/puppetlabs/puppetdb/blob/master/src/com/puppetlabs/puppetdb/scf/migrate.clj

It is currently represented as a "base" schema, with migrations on top that modify it. We should probably have a complementary version of this code that has the entire schema in one shot, instead of building it up incrementally. Or, at a minimum, a dev document that outlines the schema. I'd very much welcome some community help on that!"

deepak
Jakov Sosic
2012-Nov-22 03:28 UTC
Re: [Puppet Users] Re: ssh key collection in puppetdb slow queries/blocking nodes
On 11/22/2012 02:20 AM, Deepak Giridharagopal wrote:
> From my earlier reply:

Ah, sorry, my mistake. I didn't see that, because the first part of your answer was top-posted, so I didn't even bother to scroll further down... Thank you nevertheless :)

--
Jakov Sosic
www.srce.unizg.hr