Chuck
2013-Jan-14 20:40 UTC
[Puppet Users] Puppetdb will setting gc-interval to 0 disable it
I have multiple PuppetDB servers pointed at the same database. Would setting gc-interval to 0 disable it on the extra PuppetDB nodes?
Ken Barber
2013-Jan-14 20:46 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
I believe you'll get an assertion failure by setting it to zero. In a way you probably don't want it to be pinned to one node, because if that node is removed from the cluster or goes down you'll lose garbage collection entirely. However, we don't have a nice way today of splaying the interval or anything magical like that. Is having it run in multiple places causing you an issue today?

On Mon, Jan 14, 2013 at 8:40 PM, Chuck <csschwe@gmail.com> wrote:
> I have multiple PuppetDB servers pointed at the same database.
> Would setting gc-interval to 0 disable it on the extra PuppetDB nodes?
Chuck
2013-Jan-15 01:42 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
Well, I currently have 7,000 nodes in my PuppetDB instance and 8 PuppetDB servers in various geographic locations. Garbage collection seems to be a pretty intensive operation on the Postgres DB server in the current design.

We have also been having an issue that started at the beginning of 2013 that we have not been able to narrow down yet (we were on PuppetDB 1.0.1 for a long time with no issues; we updated to 1.0.5 last week with no improvement).

I was planning on having my Puppet manifest for the PuppetDB servers elect one or two master PuppetDB servers by querying for the active PuppetDB servers. So I would be dynamically electing the master(s) based on an algorithm.
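A minimal sketch of that election idea, expressed as a query against PuppetDB's backing database. The certnames table and its name/deactivated columns are assumptions about the 1.x schema (verify against your own database), and the 'puppetdb%' filter is a purely hypothetical naming convention:

-- Hypothetical sketch: every PuppetDB node runs the same query and treats
-- the first one or two certnames returned as the elected GC "masters".
SELECT name
FROM   certnames
WHERE  deactivated IS NULL          -- still-active nodes only
  AND  name LIKE 'puppetdb%'        -- hypothetical naming convention
ORDER  BY name                      -- deterministic, so all nodes agree
LIMIT  2;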
Chuck
2013-Jan-15 01:53 UTC
[Puppet Users] Re: Puppetdb will setting gc-interval to 0 disable it
PuppetDB dashboard snapshot: https://lh6.googleusercontent.com/-0HsHXGsQq6k/UPS2Vq4XAwI/AAAAAAAACrw/sXsbRupQVXY/s1600/Screen+Shot+2013-01-14+at+7.50.12+PM.png
Ken Barber
2013-Jan-15 15:55 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
> Well I currently have 7,000 nodes in my PuppetDB instance and 8 PuppetDB
> servers in various geographic locations. Garbage collection seems to be a
> pretty intensive operation on the Postgres DB server in the current design.

That's a shame to hear. What's the impact for you? I see from your screenshot it's averaging 16 seconds; is that what you are seeing?

> We have also been having an issue that started at the beginning of 2013 that
> we have not been able to narrow down yet (we were on PuppetDB 1.0.1 for a
> long time with no issues; we updated to 1.0.5 last week with no improvement).

This issue: is it related to the gc problem or something else? Can you elaborate?

> I was planning on having my Puppet manifest for the PuppetDB servers elect
> one or two master PuppetDB servers by querying for the active PuppetDB
> servers. So I would be dynamically electing the master(s) based on an algorithm.

Sounds good to me. Do you want to raise a feature request for this?

http://projects.puppetlabs.com/projects/puppetdb

Just more or less cover what you've said about wanting to disable it by setting it to zero. I've already more or less got a patch that can disable it when zero (it's really just a conditional clause in the right place), but I haven't done any thorough testing or gotten review on it yet.

ken.
Chuck
2013-Jan-15 16:12 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
On Tuesday, January 15, 2013 9:55:48 AM UTC-6, Ken Barber wrote:
> That's a shame to hear. What's the impact for you? I see from your
> screenshot it's averaging 16 seconds; is that what you are seeing?

This is normally what we see, 8 times over of course.

> This issue: is it related to the gc problem or something else? Can you elaborate?

We are still trying to determine what is happening. We just see a huge spike in the number of rows being deleted.

> Sounds good to me. Do you want to raise a feature request for this?

Yeah, I can do that.
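One illustrative way to watch that delete spike from the database side is PostgreSQL's standard pg_stat_user_tables statistics view (this query is not from the original exchange, just a sketch):

-- Illustrative: cumulative deleted tuples per table since the last stats
-- reset, plus dead tuples still awaiting vacuum.
SELECT relname, n_tup_del, n_dead_tup
FROM   pg_stat_user_tables
ORDER  BY n_tup_del DESC;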
Ken Barber
2013-Jan-15 19:27 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
Hey Chuck,

I've had a chat with my colleagues and they raised some concerns about your gc performance. I wouldn't mind drilling into that problem, if that's okay with you?

For starters, what are your settings looking like? In particular I'm interested in node-ttl-seconds and report-ttl-seconds.

FYI, there are a few actions that occur when gc kicks in:

* sweeping stale nodes
* sweeping stale reports
* removing any unassociated data (catalogs or params)

It would be useful to find out which of these is taking up most of your time.

I wouldn't mind a dump of your logs in a specific debug mode if that's at all possible; this should give me a rough idea of the time the sweeper spends doing its various deletes, so I can at least get a focus on your issue. In particular, the bits between 'Starting database garbage collection' and 'Finished database garbage collection'.

The easiest way to get some debugging is to modify your log4j.properties (usually in /etc/puppetdb) and add the line:

log4j.logger.com.jolbox.bonecp=DEBUG

This will log *ALL* SQL statements, so it might be heavy; I'm not sure if you can isolate one of your nodes for this purpose or somehow mitigate that risk. We only need one heavy GC run, so the node probably doesn't need to be actively receiving requests and such.

ken.
Chuck
2013-Jan-15 20:24 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
I will work on setting up a PuppetDB server I can put into debug mode. For now, here is a "normal" GC run's log output and my configuration files.

Thanks,

2013-01-15 19:20:57,767 INFO  [clojure-agent-send-off-pool-12] [cli.services] Starting database garbage collection
2013-01-15 19:26:40,383 WARN  [clojure-agent-send-off-pool-12] [jdbc.internal] Query slower than 10s threshold: actual execution time: 342.6148 seconds; Query: DELETE FROM catalogs WHERE NOT EXISTS (SELECT * FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); Query Params:
2013-01-15 19:33:29,304 WARN  [clojure-agent-send-off-pool-12] [jdbc.internal] Query slower than 10s threshold: actual execution time: 408.9210 seconds; Query: DELETE FROM resource_params WHERE NOT EXISTS (SELECT * FROM catalog_resources cr WHERE cr.resource=resource_params.resource); Query Params:
2013-01-15 19:33:29,308 INFO  [clojure-agent-send-off-pool-12] [cli.services] Finished database garbage collection
2013-01-15 19:33:29,308 INFO  [clojure-agent-send-off-pool-12] [cli.services] Starting sweep of stale nodes (1 day threshold)
2013-01-15 19:33:29,390 INFO  [clojure-agent-send-off-pool-12] [cli.services] Finished sweep of stale nodes (1 day threshold)

##########################################################
#
# config.ini
#

# See README.md for more thorough explanations of each section and
# option.

[global]
# Store mq/db data in a custom directory
vardir = /var/lib/puppetdb
# Use an external log4j config file
logging-config = /etc/puppetdb/conf.d/../log4j.properties

# Maximum number of results that a resource query may return
resource-query-limit = 20000

[command-processing]
# How many command-processing threads to use, defaults to (CPUs / 2)
threads = 10

##########################################################
#
# database.ini
#

[database]
# For the embedded DB: org.hsqldb.jdbcDriver
# For PostgreSQL: org.postgresql.Driver
# Defaults to embedded DB
classname = org.postgresql.Driver

# For the embedded DB: hsqldb
# For PostgreSQL: postgresql
# Defaults to embedded DB
subprotocol = postgresql

# For the embedded DB: file:/path/to/database;hsqldb.tx=mvcc;sql.syntax_pgs=true
# For PostgreSQL: //host:port/databaseName
# Defaults to embedded DB located in <vardir>/db
subname = //PUPPETDB:5432/puppet

# Connect as a specific user
username = PUPPETDBUSER

# Use a specific password
password = PASSWORDPUPPETDB

# How often (in minutes) to compact the database
# gc-interval = 60

# Number of seconds before any SQL query is considered 'slow'; offending
# queries will not be interrupted, but will be logged at the WARN log level.
log-slow-statements = 10

# How old (in days) to deactivate nodes that have not reported
node-ttl-days = 1

# How often to send a keepalive
conn-keep-alive = 4

# How long to keep connections open
conn-max-age = 30
Ken Barber
2013-Jan-15 21:19 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
So that looks like it's taking ages ... some questions for you:

* What version of PostgreSQL are you running, and on what distro & version exactly?
* Are you using the distro's packages or someone else's?
* Is autovacuum switched on at all (check your postgresql.conf)?
* Do you know the last time the database was vacuumed and/or analyzed? Try this link for a way to check this: http://heatware.net/databases/how-when-postgres-tables-auto-vacuum-analyze/
* Can you get me some box metrics (memory size, CPU count, disk spindle speeds, etc.) for your PostgreSQL box, so I can get an idea of the throughput available to you?
* Maybe any information about the PostgreSQL tuning items; shared_buffers is probably a good start, or have you left the tuning alone?

You probably don't need debug with log-slow-statements; I wasn't aware that was an option, but I think it gives a good amount of data for this problem.

ken.
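For the tuning questions above, the relevant values can be pulled in one go from PostgreSQL's standard pg_settings catalog view (illustrative only; the setting names listed are the usual suspects, not an exhaustive list):

-- Illustrative: report autovacuum and the main memory-tuning settings.
SELECT name, setting, unit
FROM   pg_settings
WHERE  name IN ('autovacuum', 'shared_buffers', 'work_mem',
                'maintenance_work_mem', 'effective_cache_size');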
Chuck
2013-Jan-15 22:07 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
On Tuesday, January 15, 2013 3:19:29 PM UTC-6, Ken Barber wrote:
> * What version of PostgreSQL are you running, and on what distro & version exactly?

RHEL 6.3, EnterpriseDB ppas-9.1.4.

> * Are you using the distro's packages or someone else's?

Provided by EnterpriseDB.

> * Is autovacuum switched on at all (check your postgresql.conf)?

Yes. We also ran a manual VACUUM ANALYZE with no noticeable change in performance.

> * Do you know the last time the database was vacuumed and/or analyzed?

puppet=# select relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze from pg_stat_user_tables;
         relname         |          last_vacuum          |        last_autovacuum        |         last_analyze          |       last_autoanalyze
-------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
 catalog_resources       |                               | 2013-01-12 15:16:10.188342+00 | 2013-01-09 23:26:34.068919+00 | 2013-01-15 18:27:11.166982+00
 certname_facts_metadata |                               | 2013-01-15 19:22:13.932096+00 | 2012-11-06 22:02:25.469113+00 | 2013-01-15 19:22:15.308925+00
 certnames               | 2013-01-03 22:22:11.416002+00 | 2013-01-15 21:17:47.046022+00 | 2013-01-03 22:22:11.574618+00 | 2013-01-15 21:17:48.14814+00
 resource_params         |                               | 2013-01-05 08:04:04.856417+00 | 2012-11-06 21:25:40.15877+00  | 2013-01-15 15:56:52.589316+00
 classes                 |                               | 2013-01-12 09:29:53.118022+00 | 2012-11-06 22:02:40.547686+00 | 2013-01-15 16:24:04.909758+00
 catalogs                |                               | 2013-01-15 21:18:15.342477+00 | 2012-11-06 22:02:25.182041+00 | 2013-01-15 21:18:16.493626+00
 certname_facts          |                               | 2013-01-15 21:17:39.514162+00 | 2013-01-09 18:00:16.10573+00  | 2013-01-15 21:25:08.780617+00
 tags                    |                               | 2013-01-15 21:44:16.074229+00 | 2012-11-06 22:02:24.75951+00  | 2013-01-15 21:48:19.751198+00
 edges                   |                               | 2013-01-12 11:54:54.182207+00 | 2012-11-06 21:33:24.162324+00 | 2013-01-15 17:27:19.061491+00
 certname_catalogs       |                               | 2013-01-15 21:18:54.227264+00 | 2013-01-09 23:26:38.684536+00 | 2013-01-15 21:18:55.369032+00
 schema_migrations       |                               |                               | 2012-11-06 22:02:40.637773+00 |
(11 rows)

> * Can you get me some box metrics (memory size, CPU count, disk spindle speeds, etc.) for your PostgreSQL box?

It is a VMware VM with the following properties:

8 CPUs
16 GB of memory
Attached to an enterprise-class SAN

The database is on dev253-12.

sar disk output:

09:40:01 PM        DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz    await    svctm    %util
09:50:01 PM     dev8-0      2.28     33.73     21.09     24.03      0.01     2.23     1.38     0.32
09:50:01 PM    dev8-16      2.24      0.23   1501.34    670.68      0.03    11.20     1.50     0.34
09:50:01 PM    dev8-32      0.57      0.01      7.28     12.68      0.00     3.42     3.11     0.18
09:50:01 PM    dev8-48     10.07      0.53   1591.25    158.02      0.02     1.74     1.64     1.66
09:50:01 PM    dev8-64    354.77  16164.46   2231.28     51.85      2.80     7.89     1.81    64.30
09:50:01 PM   dev253-0      0.78     33.70      0.62     43.93      0.00     3.91     2.47     0.19
09:50:01 PM   dev253-1      0.00      0.00      0.00      0.00      0.00     0.00     0.00     0.00
09:50:01 PM   dev253-2      0.00      0.00      0.00      0.00      0.00     0.00     0.00     0.00
09:50:01 PM   dev253-3      1.67      0.05     13.28      8.01      0.00     1.21     0.38     0.06
09:50:01 PM   dev253-4      0.38      0.00      3.06      8.00      0.00     1.64     0.72     0.03
09:50:01 PM   dev253-5      0.43      0.00      3.45      8.00      0.00     1.28     0.66     0.03
09:50:01 PM   dev253-6      0.00      0.00      0.00      0.00      0.00     0.00     0.00     0.00
09:50:01 PM   dev253-7      0.09      0.04      0.68      8.00      0.00     0.72     0.45     0.00
09:50:01 PM   dev253-8      0.00      0.00      0.00      0.00      0.00     0.00     0.00     0.00
09:50:01 PM   dev253-9      0.00      0.00      0.00      0.00      0.00     0.00     0.00     0.00
09:50:01 PM  dev253-10    187.70      0.23   1501.34      8.00      2.62    13.95     0.02     0.34
09:50:01 PM  dev253-11    198.97      0.53   1591.25      8.00      0.53     2.67     0.08     1.66
09:50:01 PM  dev253-12    534.17  16164.46   2231.28     34.44      4.24     7.93     1.20    64.32
09:50:01 PM  dev253-13      0.91      0.01      7.28      8.00      0.00     5.24     1.96     0.18
Average:        dev8-0      3.21     73.51     24.19     30.43      0.02     7.48     4.18     1.34
Average:       dev8-16      3.36      1.14   2568.18    763.92      0.08    22.78     1.09     0.37
Average:       dev8-32      0.66      0.97      7.42     12.74      0.00     1.81     1.55     0.10
Average:       dev8-48      9.95      1.60   1373.40    138.23      0.01     1.51     1.42     1.41
Average:       dev8-64    140.65   6834.41   2298.08     64.93      1.19     8.49     1.82    25.66
Average:      dev253-0      1.34     45.41      1.08     34.72      0.02    15.92     5.52     0.74
Average:      dev253-1      0.03      0.20      0.07      8.00      0.00     2.18     0.67     0.00
Average:      dev253-2      0.04      0.23      0.08      8.00      0.00    10.40     2.11     0.01
Average:      dev253-3      1.89      9.17     13.84     12.20      0.01     3.85     1.67     0.31
Average:      dev253-4      0.46      0.21      3.43      8.00      0.00     2.26     0.91     0.04
Average:      dev253-5      0.81     16.84      4.94     26.88      0.01     7.93     2.83     0.23
Average:      dev253-6      0.02      0.19      0.00      8.00      0.00     0.34     0.32     0.00
Average:      dev253-7      0.12      0.23      0.74      8.10      0.00     1.41     0.99     0.01
Average:      dev253-8      0.02      0.19      0.00      8.00      0.00     0.33     0.32     0.00
Average:      dev253-9      0.04      0.36      0.00      8.00      0.00     0.52     0.52     0.00
Average:     dev253-10    321.11      0.74   2568.18      8.00      9.20    28.65     0.01     0.36
Average:     dev253-11    171.74      1.20   1373.40      8.00      0.43     2.53     0.08     1.42
Average:     dev253-12    339.64   6834.01   2298.08     26.89      3.01     8.84     0.76    25.66
Average:     dev253-13      0.98      0.57      7.42      8.19      0.00     2.26     1.01     0.10

> * Maybe any information about the PostgreSQL tuning items; shared_buffers is probably a good start, or have you left the tuning alone?

shared_buffers = 4096MB
work_mem = 8MB
maintenance_work_mem = 32MB
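To put those I/O numbers in context, the on-disk size of the tables GC has to walk can be checked with standard PostgreSQL size functions (illustrative only, not part of the original exchange):

-- Illustrative: total on-disk size of each user table, largest first.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM   pg_stat_user_tables
ORDER  BY pg_total_relation_size(relid) DESC;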
Chuck
2013-Jan-16 14:33 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
Here is an example of locks taking a long time to acquire in the Postgres DB:

2013-01-12 03:10:55 UTC|puppetdb|4553|DELETE waiting|IPADDRESS|puppet LOG: process 4553 acquired ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 12166509.285 ms
2013-01-12 03:10:56 UTC|puppetdb|4553|DELETE waiting|IPADDRESS|puppet LOG: process 4553 still waiting for ShareLock on transaction 32181035 after 1000.100 ms
2013-01-12 03:11:25 UTC|puppetdb|5953|UPDATE waiting|IPADDRESS|puppet LOG: process 5953 still waiting for ShareLock on transaction 32245132 after 1000.148 ms
2013-01-12 03:11:26 UTC|puppetdb|5953|UPDATE waiting|IPADDRESS|puppet LOG: process 5953 acquired ShareLock on transaction 32245132 after 1813.791 ms
2013-01-12 03:13:48 UTC|puppetdb|4553|DELETE waiting|IPADDRESS|puppet LOG: process 4553 acquired ShareLock on transaction 32181035 after 172731.472 ms
2013-01-12 03:13:48 UTC|puppetdb|3346|DELETE waiting|IPADDRESS|puppet LOG: process 3346 acquired ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 12261437.817 ms
2013-01-12 03:13:49 UTC|puppetdb|3346|DELETE waiting|IPADDRESS|puppet LOG: process 3346 still waiting for ShareLock on transaction 32196872 after 1000.516 ms
2013-01-12 03:18:08 UTC|puppetdb|3346|DELETE waiting|IPADDRESS|puppet LOG: process 3346 acquired ShareLock on transaction 32196872 after 260360.667 ms
2013-01-12 03:18:08 UTC|puppetdb|27293|DELETE waiting|IPADDRESS|puppet LOG: process 27293 acquired ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 12475027.712 ms
2013-01-12 03:18:09 UTC|puppetdb|27293|DELETE waiting|IPADDRESS|puppet LOG: process 27293 still waiting for ShareLock on transaction 32197208 after 1000.078 ms
2013-01-12 03:18:33 UTC|puppetdb|17853|DELETE waiting|IPADDRESS|puppet LOG: process 17853 still waiting for ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 1000.212 ms
2013-01-12 03:19:49 UTC|puppetdb|20980|DELETE waiting|IPADDRESS|puppet LOG: process 20980 still waiting for ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 1000.733 ms
2013-01-12 03:20:37 UTC|puppetdb|28432|DELETE waiting|IPADDRESS|puppet LOG: process 28432 still waiting for ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 1000.283 ms
Ken Barber
2013-Jan-17 17:17 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
Hmm. I wonder whether this is an artifact of there simply being a lot of rows to delete, or of the delete query itself taking too long. It would seem odd to me if there were a lot of deletes; that implies a lot of orphans, which shouldn't be a regular thing, unless this points to an orphan problem that we need to address.

What are the results of these queries?

select count(*) from catalogs where not exists (SELECT * FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash);

select count(*) from resource_params where not exists (SELECT * FROM catalog_resources cr WHERE cr.resource=resource_params.resource);

explain analyze verbose select count(*) from catalogs where not exists (SELECT * FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash);

explain analyze verbose select count(*) from resource_params where not exists (SELECT * FROM catalog_resources cr WHERE cr.resource=resource_params.resource);

And if you run the queries without the explain, do you see any results come back at any point in time? This might help us understand how much is pending deletion.

As far as I can see, all the columns in those queries have corresponding indexes, so I would be surprised if it's an inefficient SQL query.

As for the DELETE locks, it starts to make me wonder how much of this load is a side-effect of the problem reaching a threshold where contention on the DB is high and we're now in a spiral of backlogging that slows the overall times down. With 8 nodes, and a delete taking 6 minutes or so ... that's 48 minutes of gc.

Almost makes you want a way to run the GC on one node only, to remove the contention so you can analyse the problem properly ;-). I presume this is production, and you can't just somehow make that happen, can you? ie. get only 1 node running. Maybe making the GC interval on a few nodes excessive (like 24 hours instead of 60 minutes) might emulate this effect without us having to patch. I'd be interested in seeing how this affects your SQL times in the logs when only 1 node is running a GC at a time instead of 8, just to remove contention from the equation.

ken.
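For concreteness, the workaround described above would amount to something like this in database.ini on all but one (or two) of the PuppetDB nodes. This is only a sketch, and it assumes gc-interval is still read in minutes, as in the config posted earlier in the thread:

[database]
# ... keep the existing classname, subprotocol, subname, username and
# password settings from database.ini ...

# Run GC on this node only once a day (the interval is in minutes), so it
# is effectively out of the picture; leave a single node at the default of
# 60 minutes so orphaned catalogs and resource params still get cleaned up.
gc-interval = 1440

Once a disable-on-zero option exists (the patch mentioned earlier in the thread), setting gc-interval = 0 on the secondary nodes would be the cleaner way to achieve the same thing.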
On Wed, Jan 16, 2013 at 2:33 PM, Chuck <csschwe@gmail.com> wrote:> Here is an example of locks taking a long time to acquire in the postgres DB > > 2013-01-12 03:10:55 UTC|puppetdb|4553|DELETE waiting|IPADDRESS|puppet LOG: > process 4553 acquired ExclusiveLock on tuple (0,102) of relation 16811 of > database 16513 after 12166509.285 ms > 2013-01-12 03:10:56 UTC|puppetdb|4553|DELETE waiting| IPADDRESS |puppet LOG: > process 4553 still waiting for ShareLock on transaction 32181035 after > 1000.100 ms > 2013-01-12 03:11:25 UTC|puppetdb|5953|UPDATE waiting| IPADDRESS |puppet LOG: > process 5953 still waiting for ShareLock on transaction 32245132 after > 1000.148 ms > 2013-01-12 03:11:26 UTC|puppetdb|5953|UPDATE waiting| IPADDRESS |puppet LOG: > process 5953 acquired ShareLock on transaction 32245132 after 1813.791 ms > 2013-01-12 03:13:48 UTC|puppetdb|4553|DELETE waiting| IPADDRESS |puppet LOG: > process 4553 acquired ShareLock on transaction 32181035 after 172731.472 ms > 2013-01-12 03:13:48 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet LOG: > process 3346 acquired ExclusiveLock on tuple (0,102) of relation 16811 of > database 16513 after 12261437.817 ms > 2013-01-12 03:13:49 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet LOG: > process 3346 still waiting for ShareLock on transaction 32196872 after > 1000.516 ms > 2013-01-12 03:18:08 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet LOG: > process 3346 acquired ShareLock on transaction 32196872 after 260360.667 ms > 2013-01-12 03:18:08 UTC|puppetdb|27293|DELETE waiting| IPADDRESS |puppet > LOG: process 27293 acquired ExclusiveLock on tuple (0,102) of relation > 16811 of database 16513 after 12475027.712 ms > 2013-01-12 03:18:09 UTC|puppetdb|27293|DELETE waiting| IPADDRESS |puppet > LOG: process 27293 still waiting for ShareLock on transaction 32197208 > after 1000.078 ms > 2013-01-12 03:18:33 UTC|puppetdb|17853|DELETE waiting| IPADDRESS |puppet > LOG: process 17853 still waiting for ExclusiveLock on tuple (0,102) of > relation 16811 of database 16513 after 1000.212 ms > 2013-01-12 03:19:49 UTC|puppetdb|20980|DELETE waiting| IPADDRESS |puppet > LOG: process 20980 still waiting for ExclusiveLock on tuple (0,102) of > relation 16811 of database 16513 after 1000.733 ms > 2013-01-12 03:20:37 UTC|puppetdb|28432|DELETE waiting| IPADDRESS |puppet > LOG: process 28432 still waiting for ExclusiveLock on tuple (0,102) of > relation 16811 of database 16513 after 1000.283 ms > > On Tuesday, January 15, 2013 4:07:11 PM UTC-6, Chuck wrote: >> >> >> >> On Tuesday, January 15, 2013 3:19:29 PM UTC-6, Ken Barber wrote: >>> >>> So that looks like its taking ages ... some questions for you: >>> >>> * What version of Postgresql are you running and on what distro & >>> version exactly? >> >> RHEL 6.3 >> Enterprise DB ppas-9.1.4 >> >>> * Are you using the distros packages or someone elses? >> >> Provided by Enterprise DB >> >>> * Is autovacuum switched on at all (check your postgresql.conf)? >> >> Yes. We also ran manual vacuum analyze with no noticeable changes in >> performance. >>> >>> * Do you know the last time the database was vacuum''d and/or analyzed? 
>>> Try this link for a way to check this: >>> >>> http://heatware.net/databases/how-when-postgres-tables-auto-vacuum-analyze/ >> >> puppet=# select relname,last_vacuum, last_autovacuum, last_analyze, >> last_autoanalyze from pg_stat_user_tables; >> relname | last_vacuum | >> last_autovacuum | last_analyze | >> last_autoanalyze >> >> -------------------------+-------------------------------+-------------------------------+-------------------------------+------------------------------- >> catalog_resources | | 2013-01-12 >> 15:16:10.188342+00 | 2013-01-09 23:26:34.068919+00 | 2013-01-15 >> 18:27:11.166982+00 >> certname_facts_metadata | | 2013-01-15 >> 19:22:13.932096+00 | 2012-11-06 22:02:25.469113+00 | 2013-01-15 >> 19:22:15.308925+00 >> certnames | 2013-01-03 22:22:11.416002+00 | 2013-01-15 >> 21:17:47.046022+00 | 2013-01-03 22:22:11.574618+00 | 2013-01-15 >> 21:17:48.14814+00 >> resource_params | | 2013-01-05 >> 08:04:04.856417+00 | 2012-11-06 21:25:40.15877+00 | 2013-01-15 >> 15:56:52.589316+00 >> classes | | 2013-01-12 >> 09:29:53.118022+00 | 2012-11-06 22:02:40.547686+00 | 2013-01-15 >> 16:24:04.909758+00 >> catalogs | | 2013-01-15 >> 21:18:15.342477+00 | 2012-11-06 22:02:25.182041+00 | 2013-01-15 >> 21:18:16.493626+00 >> certname_facts | | 2013-01-15 >> 21:17:39.514162+00 | 2013-01-09 18:00:16.10573+00 | 2013-01-15 >> 21:25:08.780617+00 >> tags | | 2013-01-15 >> 21:44:16.074229+00 | 2012-11-06 22:02:24.75951+00 | 2013-01-15 >> 21:48:19.751198+00 >> edges | | 2013-01-12 >> 11:54:54.182207+00 | 2012-11-06 21:33:24.162324+00 | 2013-01-15 >> 17:27:19.061491+00 >> certname_catalogs | | 2013-01-15 >> 21:18:54.227264+00 | 2013-01-09 23:26:38.684536+00 | 2013-01-15 >> 21:18:55.369032+00 >> schema_migrations | | >> | 2012-11-06 22:02:40.637773+00 | >> (11 rows) >> >>> >>> * Can you get me some box metrics, memory size, cpu size/number, disk >>> spindle speeds etc. of your postgresql box? So I can get an idea of >>> the throughput available to you. 
>> >> It is a VMware VM >> the VM has the following properties: >> 8 CPU >> 16 GB of memory >> Attached to an Enterprise class SAN >> >> The database is on dev253-12 >> >> SAR DISK output >> 09:40:01 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz >> await svctm %util >> 09:50:01 PM dev8-0 2.28 33.73 21.09 24.03 0.01 >> 2.23 1.38 0.32 >> 09:50:01 PM dev8-16 2.24 0.23 1501.34 670.68 0.03 >> 11.20 1.50 0.34 >> 09:50:01 PM dev8-32 0.57 0.01 7.28 12.68 0.00 >> 3.42 3.11 0.18 >> 09:50:01 PM dev8-48 10.07 0.53 1591.25 158.02 0.02 >> 1.74 1.64 1.66 >> 09:50:01 PM dev8-64 354.77 16164.46 2231.28 51.85 2.80 >> 7.89 1.81 64.30 >> 09:50:01 PM dev253-0 0.78 33.70 0.62 43.93 0.00 >> 3.91 2.47 0.19 >> 09:50:01 PM dev253-1 0.00 0.00 0.00 0.00 0.00 >> 0.00 0.00 0.00 >> 09:50:01 PM dev253-2 0.00 0.00 0.00 0.00 0.00 >> 0.00 0.00 0.00 >> 09:50:01 PM dev253-3 1.67 0.05 13.28 8.01 0.00 >> 1.21 0.38 0.06 >> 09:50:01 PM dev253-4 0.38 0.00 3.06 8.00 0.00 >> 1.64 0.72 0.03 >> 09:50:01 PM dev253-5 0.43 0.00 3.45 8.00 0.00 >> 1.28 0.66 0.03 >> 09:50:01 PM dev253-6 0.00 0.00 0.00 0.00 0.00 >> 0.00 0.00 0.00 >> 09:50:01 PM dev253-7 0.09 0.04 0.68 8.00 0.00 >> 0.72 0.45 0.00 >> 09:50:01 PM dev253-8 0.00 0.00 0.00 0.00 0.00 >> 0.00 0.00 0.00 >> 09:50:01 PM dev253-9 0.00 0.00 0.00 0.00 0.00 >> 0.00 0.00 0.00 >> 09:50:01 PM dev253-10 187.70 0.23 1501.34 8.00 2.62 >> 13.95 0.02 0.34 >> 09:50:01 PM dev253-11 198.97 0.53 1591.25 8.00 0.53 >> 2.67 0.08 1.66 >> 09:50:01 PM dev253-12 534.17 16164.46 2231.28 34.44 4.24 >> 7.93 1.20 64.32 >> 09:50:01 PM dev253-13 0.91 0.01 7.28 8.00 0.00 >> 5.24 1.96 0.18 >> Average: dev8-0 3.21 73.51 24.19 30.43 0.02 >> 7.48 4.18 1.34 >> Average: dev8-16 3.36 1.14 2568.18 763.92 0.08 >> 22.78 1.09 0.37 >> Average: dev8-32 0.66 0.97 7.42 12.74 0.00 >> 1.81 1.55 0.10 >> Average: dev8-48 9.95 1.60 1373.40 138.23 0.01 >> 1.51 1.42 1.41 >> Average: dev8-64 140.65 6834.41 2298.08 64.93 1.19 >> 8.49 1.82 25.66 >> Average: dev253-0 1.34 45.41 1.08 34.72 0.02 >> 15.92 5.52 0.74 >> Average: dev253-1 0.03 0.20 0.07 8.00 0.00 >> 2.18 0.67 0.00 >> Average: dev253-2 0.04 0.23 0.08 8.00 0.00 >> 10.40 2.11 0.01 >> Average: dev253-3 1.89 9.17 13.84 12.20 0.01 >> 3.85 1.67 0.31 >> Average: dev253-4 0.46 0.21 3.43 8.00 0.00 >> 2.26 0.91 0.04 >> Average: dev253-5 0.81 16.84 4.94 26.88 0.01 >> 7.93 2.83 0.23 >> Average: dev253-6 0.02 0.19 0.00 8.00 0.00 >> 0.34 0.32 0.00 >> Average: dev253-7 0.12 0.23 0.74 8.10 0.00 >> 1.41 0.99 0.01 >> Average: dev253-8 0.02 0.19 0.00 8.00 0.00 >> 0.33 0.32 0.00 >> Average: dev253-9 0.04 0.36 0.00 8.00 0.00 >> 0.52 0.52 0.00 >> Average: dev253-10 321.11 0.74 2568.18 8.00 9.20 >> 28.65 0.01 0.36 >> Average: dev253-11 171.74 1.20 1373.40 8.00 0.43 >> 2.53 0.08 1.42 >> Average: dev253-12 339.64 6834.01 2298.08 26.89 3.01 >> 8.84 0.76 25.66 >> Average: dev253-13 0.98 0.57 7.42 8.19 0.00 >> 2.26 1.01 0.10 >> >>> * Maybe any information about the postgresql tuning items - >>> shared_buffers is probably a good start, or have you left the tuning >>> alone? >> >> shared_buffers = 4096MB >> work_mem = 8MB >> maintenance_work_mem = 32MB >>> >>> >>> You probably don''t need debug with log-slow-statements, I wasn''t aware >>> that was an option but I think that gives a good amount of data for >>> this problem. >>> >>> On Tue, Jan 15, 2013 at 8:24 PM, Chuck <css...@gmail.com> wrote: >>> > I will work on setting up a puppetdb server I can put into debug mode. >>> > For >>> > now here is a "normal" GC run log output and my configuration files. 
>>> > >>> > Thanks, >>> > >>> > >>> > 2013-01-15 19:20:57,767 INFO [clojure-agent-send-off-pool-12] >>> > [cli.services] Starting database garbage collection >>> > 2013-01-15 19:26:40,383 WARN [clojure-agent-send-off-pool-12] >>> > [jdbc.internal] Query slower than 10s threshold: actual execution >>> > time: >>> > 342.6148 seconds; Query: DELETE FROM catalogs WHERE NOT EXISTS (SELECT >>> > * >>> > FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); Query >>> > Params: >>> > 2013-01-15 19:33:29,304 WARN [clojure-agent-send-off-pool-12] >>> > [jdbc.internal] Query slower than 10s threshold: actual execution >>> > time: >>> > 408.9210 seconds; Query: DELETE FROM resource_params WHERE NOT EXISTS >>> > (SELECT * FROM catalog_resources cr WHERE >>> > cr.resource=resource_params.resource); Query Params: >>> > 2013-01-15 19:33:29,308 INFO [clojure-agent-send-off-pool-12] >>> > [cli.services] Finished database garbage collection >>> > 2013-01-15 19:33:29,308 INFO [clojure-agent-send-off-pool-12] >>> > [cli.services] Starting sweep of stale nodes (1 day threshold) >>> > 2013-01-15 19:33:29,390 INFO [clojure-agent-send-off-pool-12] >>> > [cli.services] Finished sweep of stale nodes (1 day threshold) >>> > >>> > ########################################################## >>> > # >>> > # config.ini >>> > # >>> > >>> > # See README.md for more thorough explanations of each section and >>> > # option. >>> > >>> > [global] >>> > # Store mq/db data in a custom directory >>> > vardir = /var/lib/puppetdb >>> > # Use an external log4j config file >>> > logging-config = /etc/puppetdb/conf.d/../log4j.properties >>> > >>> > # Maximum number of results that a resource query may return >>> > resource-query-limit = 20000 >>> > >>> > [command-processing] >>> > # How many command-processing threads to use, defaults to (CPUs / 2) >>> > threads = 10 >>> > >>> > ########################################################## >>> > # >>> > # database.ini >>> > # >>> > >>> > [database] >>> > # For the embedded DB: org.hsqldb.jdbcDriver >>> > # For PostgreSQL: org.postgresql.Driver >>> > # Defaults to embedded DB >>> > classname = org.postgresql.Driver >>> > >>> > # For the embedded DB: hsqldb >>> > # For PostgreSQL: postgresql >>> > # Defaults to embedded DB >>> > subprotocol = postgresql >>> > >>> > # For the embedded DB: >>> > file:/path/to/database;hsqldb.tx=mvcc;sql.syntax_pgs=true >>> > # For PostgreSQL: //host:port/databaseName >>> > # Defaults to embedded DB located in <vardir>/db >>> > subname = //PUPPETDB:5432/puppet >>> > >>> > # Connect as a specific user >>> > username = PUPPETDBUSER >>> > >>> > # Use a specific password >>> > password = PASSWORDPUPPETDB >>> > >>> > # How often (in minutes) to compact the database >>> > # gc-interval = 60 >>> > >>> > # Number of seconds before any SQL query is considered ''slow''; >>> > offending >>> > # queries will not be interrupted, but will be logged at the WARN log >>> > level. >>> > log-slow-statements = 10 >>> > >>> > # How old (in days) to deactivate nodes that have not reported >>> > node-ttl-days = 1 >>> > >>> > # How often to send a keepalive >>> > conn-keep-alive = 4 >>> > >>> > # How long to keep connections open >>> > conn-max-age = 30 >>> > >>> > >>> > >>> > On Tuesday, January 15, 2013 1:27:58 PM UTC-6, Ken Barber wrote: >>> >> >>> >> Hey Chuck, >>> >> >>> >> I''ve had a chat with my colleagues and they raised some concerns about >>> >> your gc performance. I wouldn''t mind drilling into that problem if >>> >> thats okay with you? 
>>> >> >>> >> For starters - what are your settings looking like? In particular I''m >>> >> interested in node-ttl-seconds and report-ttl-seconds. >>> >> >>> >> FYI there are a few actions that occur when gc kicks in: >>> >> >>> >> * sweeping stale nodes >>> >> * sweeping stale reports >>> >> * removing any unassociated data (catalogs or params) >>> >> >>> >> It would be useful to find out which of these are taking up the most >>> >> amount of your time. >>> >> >>> >> I wouldn''t mind a dump of your logs when in a specific debug mode if >>> >> thats at all possible, this should give me a rough idea of the time >>> >> the sweeper spends doing its various deletes so I can at least get a >>> >> focus on your issue. In particular the bits between ''Starting database >>> >> garbage collection'' and ''Finished database garbage collection''. >>> >> >>> >> The easiest way to get some debugging is to modify your >>> >> log4j.properties (usually in /etc/puppetdb) and add the line: >>> >> >>> >> log4j.logger.com.jolbox.bonecp=DEBUG >>> >> >>> >> This will log *ALL* SQL statements, so it might be heavy - I''m not >>> >> sure if you can isolate one of your nodes for this purpose or somehow >>> >> mitigate this risk ... we only need a heavy GC run, it probably >>> >> doesn''t need to be actively receiving requests and such ... >>> >> >>> >> ken. >>> >> >>> >> On Tue, Jan 15, 2013 at 4:12 PM, Chuck <css...@gmail.com> wrote: >>> >> > >>> >> > >>> >> > On Tuesday, January 15, 2013 9:55:48 AM UTC-6, Ken Barber wrote: >>> >> >> >>> >> >> > Well I currently have 7,000 nodes in my PuppetDB instance and 8 >>> >> >> > puppetdb >>> >> >> > servers in various geographic locations. The garbage collect >>> >> >> > seems >>> >> >> > to >>> >> >> > be a >>> >> >> > pretty intensive operation on the Postgres DB server in the >>> >> >> > current >>> >> >> > design. >>> >> >> >>> >> >> Thats a shame to hear. What''s the impact for you? I see from your >>> >> >> screenshot its averaging 16 seconds, is that what you are seeing? >>> >> > >>> >> > >>> >> > This is normally what we see 8 times of course. >>> >> > >>> >> >> >>> >> >> > We have also been having an issue that started at the beginning >>> >> >> > of >>> >> >> > 2013 >>> >> >> > that >>> >> >> > we have not been able to narrow down at this time (we were on >>> >> >> > puppetdb >>> >> >> > 1.0.1 >>> >> >> > for a long time with no issues. Updated to 1.0.5 last week with >>> >> >> > no >>> >> >> > improvement) >>> >> >> >>> >> >> This issue, is it related to the gc problem or something else? Can >>> >> >> you >>> >> >> elaborate? >>> >> >> >>> >> > We are still trying to determine what is happening. We just see a >>> >> > huge >>> >> > spike in the number of rows being deleted. >>> >> > >>> >> >> >>> >> >> > I was planning on having my puppet manifest, for the puppetdb >>> >> >> > servers, >>> >> >> > elect >>> >> >> > one or two master puppetdb servers by querying for active >>> >> >> > puppetdb >>> >> >> > servers. >>> >> >> > So I would be dynamically electing the master(s) based on an >>> >> >> > algorithm. >>> >> >> >>> >> >> Sounds good to me. Do you want to raise a feature request for this? >>> >> > >>> >> > >>> >> > Yeah I can do that. >>> >> > >>> >> >> >>> >> >> http://projects.puppetlabs.com/projects/puppetdb >>> >> >> >>> >> >> Just more or less cover what you''ve said about wanting to disable >>> >> >> by >>> >> >> setting to zero. 
I''ve already more or less got a patch that can >>> >> >> disable it when zero (its really just a conditional clause in the >>> >> >> right place), but I haven''t done any thorough testing or gotten >>> >> >> review >>> >> >> on it yet. >>> >> >> >>> >> >> ken. >>> >> > >>> >> > -- >>> >> > You received this message because you are subscribed to the Google >>> >> > Groups >>> >> > "Puppet Users" group. >>> >> > To view this discussion on the web visit >>> >> > https://groups.google.com/d/msg/puppet-users/-/OkVPHnluxpcJ. >>> >> > >>> >> > To post to this group, send email to puppet...@googlegroups.com. >>> >> > To unsubscribe from this group, send email to >>> >> > puppet-users...@googlegroups.com. >>> >> > For more options, visit this group at >>> >> > http://groups.google.com/group/puppet-users?hl=en. >>> > >>> > -- >>> > You received this message because you are subscribed to the Google >>> > Groups >>> > "Puppet Users" group. >>> > To view this discussion on the web visit >>> > https://groups.google.com/d/msg/puppet-users/-/vuZyS8chcXYJ. >>> > >>> > To post to this group, send email to puppet...@googlegroups.com. >>> > To unsubscribe from this group, send email to >>> > puppet-users...@googlegroups.com. >>> > For more options, visit this group at >>> > http://groups.google.com/group/puppet-users?hl=en. > > -- > You received this message because you are subscribed to the Google Groups > "Puppet Users" group. > To view this discussion on the web visit > https://groups.google.com/d/msg/puppet-users/-/jnWvUJepjzYJ. > > To post to this group, send email to puppet-users@googlegroups.com. > To unsubscribe from this group, send email to > puppet-users+unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/puppet-users?hl=en.-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Chuck
2013-Jan-17 21:04 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
On Thursday, January 17, 2013 11:17:27 AM UTC-6, Ken Barber wrote:> > Hmm. I wonder if this is an artifact of the query for the delete > taking too long, or the fact that there the query for the delete is > taking too long. It would seem odd to me if there were a lot of > deletes, this implies a lot of orphans which shouldn''t be a regular > thing, unless this points to an orphan problem that we need to > address. > > What are the results of these queries? > > select count(*) from catalogs where not exists (SELECT * FROM > certname_catalogs cc WHERE cc.catalog=catalogs.hash); > > puppet=# select count(*) from puppet.catalogs where not exists (SELECT *FROM puppet.certname_catalogs cc WHERE cc.catalog=catalogs.hash); count ------- 18 (1 row)> select count(*) from resource_params where not exists(SELECT * FROM > catalog_resources cr WHERE cr.resource=resource_params.resource); >puppet=# select count(*) from puppet.resource_params where not exists(SELECT * FROM puppet.catalog_resources cr WHERE cr.resource=resource_params.resource); count ------- 0 (1 row)> explain analyze verbose select count(*) from catalogs where not exists > (SELECT * FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); > > puppet=# explain analyze verbose select count(*) from puppet.catalogswhere not exists puppet-# (SELECT * FROM puppet.certname_catalogs cc WHERE cc.catalog=catalogs.hash); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=2920.43..2920.44 rows=1 width=0) (actual time=63.301..63.302 rows=1 loops=1) Output: count(*) -> Hash Anti Join (cost=1381.86..2899.52 rows=8365 width=0) (actual time=29.369..63.285 rows=18 loops=1) Hash Cond: ((catalogs.hash)::text = (cc.catalog)::text) -> Seq Scan on puppet.catalogs (cost=0.00..1219.70 rows=29870 width=41) (actual time=0.022..10.124 rows=21523 loops=1) Output: catalogs.hash, catalogs.api_version, catalogs.catalog_version, catalogs.mmr_resend_indicator -> Hash (cost=1113.05..1113.05 rows=21505 width=41) (actual time=26.394..26.394 rows=21505 loops=1) Output: cc.catalog Buckets: 4096 Batches: 1 Memory Usage: 1534kB -> Seq Scan on puppet.certname_catalogs cc (cost=0.00..1113.05 rows=21505 width=41) (actual time=0.016..12.341 rows=21505 loops=1) Output: cc.catalog Total runtime: 63.380 ms (12 rows)> explain analyze verbose select count(*) from resource_params where not > exists(SELECT * FROM catalog_resources cr WHERE > cr.resource=resource_params.resource); > > puppet=# explain analyze verbose select count(*) frompuppet.resource_params where not puppet-# exists(SELECT * FROM puppet.catalog_resources cr WHERE puppet(# cr.resource=puppet.resource_params.resource); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ------- Aggregate (cost=1257861.13..1257861.14 rows=1 width=0) (actual time=18486.103..18486.103 rows=1 loops=1) Output: count(*) -> Nested Loop Anti Join (cost=0.00..1254977.05 rows=1153633 width=0) (actual time=18486.098..18486.098 rows=0 loops=1) -> Seq Scan on puppet.resource_params (cost=0.00..32052.34 rows=1171134 width=41) (actual time=0.005..264.081 rows=1166567 loops=1) Output: resource_params.resource, resource_params.name, resource_params.value, resource_params.mmr_resend_indicator -> Index Scan using idx_catalog_resources_resource on puppet.catalog_resources cr (cost=0.00..3196.19 
rows=3116 width=41) (actual time=0.015..0.015 rows=1 loops=11 66567) Output: cr.catalog, cr.resource, cr.type, cr.title, cr.tags, cr.exported, cr.sourcefile, cr.sourceline, cr.mmr_resend_indicator Index Cond: ((cr.resource)::text = (resource_params.resource)::text) Total runtime: 18486.163 ms (9 rows)> And if you run the queries without the explain, do you see any results > come back at any point in time? This might help us understand how much > is pending deletion. > > puppet=# select count(*) from puppet.catalogs where not existspuppet-# (SELECT * FROM puppet.certname_catalogs cc WHERE cc.catalog=catalogs.hash); count ------- 20 (1 row) puppet=# select count(*) from puppet.resource_params where not puppet-# exists(SELECT * FROM puppet.catalog_resources cr WHERE puppet(# cr.resource=puppet.resource_params.resource); count ------- 0 (1 row)> As far as I can see all the columns in those queries have > corresponding indexes. So I would be surprised if its an inefficient > SQL query. > > As far as the DELETE locks, I starts to make me wonder how much of > this load is a side-effect of this problem reaching a threshold > whereby contention on the DB is high and now we''re getting a spiral of > backlogging causing the overall times to slow down. With 8 nodes, and > a delete taking 6 minutes or so ... thats 48 minutes of gc. > > Almost makes you want a way to run the GC on one node only to remove > the contention so you can analyse the problem properly ;-). I presume > this is production, and you can''t just somehow make that happen can > you? ie. get only 1 node running. Maybe making the GC on a few nodes > excessive (like 24 hours instead of 60 minutes) might emulate this > effect without us having to patch. > > Correct. I don''t have a way to reduce this down to one node due to itbeing production.> I''d be interested in seeing how this affects your SQL times in the > logs when only 1 node is running a GC at a time instead of 8, just to > remove contention from the equation. > > ken. 
> > On Wed, Jan 16, 2013 at 2:33 PM, Chuck <css...@gmail.com <javascript:>> > wrote: > > Here is an example of locks taking a long time to acquire in the > postgres DB > > > > 2013-01-12 03:10:55 UTC|puppetdb|4553|DELETE waiting|IPADDRESS|puppet > LOG: > > process 4553 acquired ExclusiveLock on tuple (0,102) of relation 16811 > of > > database 16513 after 12166509.285 ms > > 2013-01-12 03:10:56 UTC|puppetdb|4553|DELETE waiting| IPADDRESS |puppet > LOG: > > process 4553 still waiting for ShareLock on transaction 32181035 after > > 1000.100 ms > > 2013-01-12 03:11:25 UTC|puppetdb|5953|UPDATE waiting| IPADDRESS |puppet > LOG: > > process 5953 still waiting for ShareLock on transaction 32245132 after > > 1000.148 ms > > 2013-01-12 03:11:26 UTC|puppetdb|5953|UPDATE waiting| IPADDRESS |puppet > LOG: > > process 5953 acquired ShareLock on transaction 32245132 after 1813.791 > ms > > 2013-01-12 03:13:48 UTC|puppetdb|4553|DELETE waiting| IPADDRESS |puppet > LOG: > > process 4553 acquired ShareLock on transaction 32181035 after 172731.472 > ms > > 2013-01-12 03:13:48 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet > LOG: > > process 3346 acquired ExclusiveLock on tuple (0,102) of relation 16811 > of > > database 16513 after 12261437.817 ms > > 2013-01-12 03:13:49 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet > LOG: > > process 3346 still waiting for ShareLock on transaction 32196872 after > > 1000.516 ms > > 2013-01-12 03:18:08 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet > LOG: > > process 3346 acquired ShareLock on transaction 32196872 after 260360.667 > ms > > 2013-01-12 03:18:08 UTC|puppetdb|27293|DELETE waiting| IPADDRESS |puppet > > LOG: process 27293 acquired ExclusiveLock on tuple (0,102) of relation > > 16811 of database 16513 after 12475027.712 ms > > 2013-01-12 03:18:09 UTC|puppetdb|27293|DELETE waiting| IPADDRESS |puppet > > LOG: process 27293 still waiting for ShareLock on transaction 32197208 > > after 1000.078 ms > > 2013-01-12 03:18:33 UTC|puppetdb|17853|DELETE waiting| IPADDRESS |puppet > > LOG: process 17853 still waiting for ExclusiveLock on tuple (0,102) of > > relation 16811 of database 16513 after 1000.212 ms > > 2013-01-12 03:19:49 UTC|puppetdb|20980|DELETE waiting| IPADDRESS |puppet > > LOG: process 20980 still waiting for ExclusiveLock on tuple (0,102) of > > relation 16811 of database 16513 after 1000.733 ms > > 2013-01-12 03:20:37 UTC|puppetdb|28432|DELETE waiting| IPADDRESS |puppet > > LOG: process 28432 still waiting for ExclusiveLock on tuple (0,102) of > > relation 16811 of database 16513 after 1000.283 ms > > > > On Tuesday, January 15, 2013 4:07:11 PM UTC-6, Chuck wrote: > >> > >> > >> > >> On Tuesday, January 15, 2013 3:19:29 PM UTC-6, Ken Barber wrote: > >>> > >>> So that looks like its taking ages ... some questions for you: > >>> > >>> * What version of Postgresql are you running and on what distro & > >>> version exactly? > >> > >> RHEL 6.3 > >> Enterprise DB ppas-9.1.4 > >> > >>> * Are you using the distros packages or someone elses? > >> > >> Provided by Enterprise DB > >> > >>> * Is autovacuum switched on at all (check your postgresql.conf)? > >> > >> Yes. We also ran manual vacuum analyze with no noticeable changes in > >> performance. > >>> > >>> * Do you know the last time the database was vacuum''d and/or analyzed? 
> >>> Try this link for a way to check this: > >>> > >>> > http://heatware.net/databases/how-when-postgres-tables-auto-vacuum-analyze/ > >> > >> puppet=# select relname,last_vacuum, last_autovacuum, last_analyze, > >> last_autoanalyze from pg_stat_user_tables; > >> relname | last_vacuum | > >> last_autovacuum | last_analyze | > >> last_autoanalyze > >> > >> > -------------------------+-------------------------------+-------------------------------+-------------------------------+------------------------------- > > >> catalog_resources | | 2013-01-12 > >> 15:16:10.188342+00 | 2013-01-09 23:26:34.068919+00 | 2013-01-15 > >> 18:27:11.166982+00 > >> certname_facts_metadata | | 2013-01-15 > >> 19:22:13.932096+00 | 2012-11-06 22:02:25.469113+00 | 2013-01-15 > >> 19:22:15.308925+00 > >> certnames | 2013-01-03 22:22:11.416002+00 | 2013-01-15 > >> 21:17:47.046022+00 | 2013-01-03 22:22:11.574618+00 | 2013-01-15 > >> 21:17:48.14814+00 > >> resource_params | | 2013-01-05 > >> 08:04:04.856417+00 | 2012-11-06 21:25:40.15877+00 | 2013-01-15 > >> 15:56:52.589316+00 > >> classes | | 2013-01-12 > >> 09:29:53.118022+00 | 2012-11-06 22:02:40.547686+00 | 2013-01-15 > >> 16:24:04.909758+00 > >> catalogs | | 2013-01-15 > >> 21:18:15.342477+00 | 2012-11-06 22:02:25.182041+00 | 2013-01-15 > >> 21:18:16.493626+00 > >> certname_facts | | 2013-01-15 > >> 21:17:39.514162+00 | 2013-01-09 18:00:16.10573+00 | 2013-01-15 > >> 21:25:08.780617+00 > >> tags | | 2013-01-15 > >> 21:44:16.074229+00 | 2012-11-06 22:02:24.75951+00 | 2013-01-15 > >> 21:48:19.751198+00 > >> edges | | 2013-01-12 > >> 11:54:54.182207+00 | 2012-11-06 21:33:24.162324+00 | 2013-01-15 > >> 17:27:19.061491+00 > >> certname_catalogs | | 2013-01-15 > >> 21:18:54.227264+00 | 2013-01-09 23:26:38.684536+00 | 2013-01-15 > >> 21:18:55.369032+00 > >> schema_migrations | | > >> | 2012-11-06 22:02:40.637773+00 | > >> (11 rows) > >> > >>> > >>> * Can you get me some box metrics, memory size, cpu size/number, disk > >>> spindle speeds etc. of your postgresql box? So I can get an idea of > >>> the throughput available to you. 
> >> > >> It is a VMware VM > >> the VM has the following properties: > >> 8 CPU > >> 16 GB of memory > >> Attached to an Enterprise class SAN > >> > >> The database is on dev253-12 > >> > >> SAR DISK output > >> 09:40:01 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz > >> await svctm %util > >> 09:50:01 PM dev8-0 2.28 33.73 21.09 24.03 0.01 > >> 2.23 1.38 0.32 > >> 09:50:01 PM dev8-16 2.24 0.23 1501.34 670.68 0.03 > >> 11.20 1.50 0.34 > >> 09:50:01 PM dev8-32 0.57 0.01 7.28 12.68 0.00 > >> 3.42 3.11 0.18 > >> 09:50:01 PM dev8-48 10.07 0.53 1591.25 158.02 0.02 > >> 1.74 1.64 1.66 > >> 09:50:01 PM dev8-64 354.77 16164.46 2231.28 51.85 2.80 > >> 7.89 1.81 64.30 > >> 09:50:01 PM dev253-0 0.78 33.70 0.62 43.93 0.00 > >> 3.91 2.47 0.19 > >> 09:50:01 PM dev253-1 0.00 0.00 0.00 0.00 0.00 > >> 0.00 0.00 0.00 > >> 09:50:01 PM dev253-2 0.00 0.00 0.00 0.00 0.00 > >> 0.00 0.00 0.00 > >> 09:50:01 PM dev253-3 1.67 0.05 13.28 8.01 0.00 > >> 1.21 0.38 0.06 > >> 09:50:01 PM dev253-4 0.38 0.00 3.06 8.00 0.00 > >> 1.64 0.72 0.03 > >> 09:50:01 PM dev253-5 0.43 0.00 3.45 8.00 0.00 > >> 1.28 0.66 0.03 > >> 09:50:01 PM dev253-6 0.00 0.00 0.00 0.00 0.00 > >> 0.00 0.00 0.00 > >> 09:50:01 PM dev253-7 0.09 0.04 0.68 8.00 0.00 > >> 0.72 0.45 0.00 > >> 09:50:01 PM dev253-8 0.00 0.00 0.00 0.00 0.00 > >> 0.00 0.00 0.00 > >> 09:50:01 PM dev253-9 0.00 0.00 0.00 0.00 0.00 > >> 0.00 0.00 0.00 > >> 09:50:01 PM dev253-10 187.70 0.23 1501.34 8.00 2.62 > >> 13.95 0.02 0.34 > >> 09:50:01 PM dev253-11 198.97 0.53 1591.25 8.00 0.53 > >> 2.67 0.08 1.66 > >> 09:50:01 PM dev253-12 534.17 16164.46 2231.28 34.44 4.24 > >> 7.93 1.20 64.32 > >> 09:50:01 PM dev253-13 0.91 0.01 7.28 8.00 0.00 > >> 5.24 1.96 0.18 > >> Average: dev8-0 3.21 73.51 24.19 30.43 0.02 > >> 7.48 4.18 1.34 > >> Average: dev8-16 3.36 1.14 2568.18 763.92 0.08 > >> 22.78 1.09 0.37 > >> Average: dev8-32 0.66 0.97 7.42 12.74 0.00 > >> 1.81 1.55 0.10 > >> Average: dev8-48 9.95 1.60 1373.40 138.23 0.01 > >> 1.51 1.42 1.41 > >> Average: dev8-64 140.65 6834.41 2298.08 64.93 1.19 > >> 8.49 1.82 25.66 > >> Average: dev253-0 1.34 45.41 1.08 34.72 0.02 > >> 15.92 5.52 0.74 > >> Average: dev253-1 0.03 0.20 0.07 8.00 0.00 > >> 2.18 0.67 0.00 > >> Average: dev253-2 0.04 0.23 0.08 8.00 0.00 > >> 10.40 2.11 0.01 > >> Average: dev253-3 1.89 9.17 13.84 12.20 0.01 > >> 3.85 1.67 0.31 > >> Average: dev253-4 0.46 0.21 3.43 8.00 0.00 > >> 2.26 0.91 0.04 > >> Average: dev253-5 0.81 16.84 4.94 26.88 0.01 > >> 7.93 2.83 0.23 > >> Average: dev253-6 0.02 0.19 0.00 8.00 0.00 > >> 0.34 0.32 0.00 > >> Average: dev253-7 0.12 0.23 0.74 8.10 0.00 > >> 1.41 0.99 0.01 > >> Average: dev253-8 0.02 0.19 0.00 8.00 0.00 > >> 0.33 0.32 0.00 > >> Average: dev253-9 0.04 0.36 0.00 8.00 0.00 > >> 0.52 0.52 0.00 > >> Average: dev253-10 321.11 0.74 2568.18 8.00 9.20 > >> 28.65 0.01 0.36 > >> Average: dev253-11 171.74 1.20 1373.40 8.00 0.43 > >> 2.53 0.08 1.42 > >> Average: dev253-12 339.64 6834.01 2298.08 26.89 3.01 > >> 8.84 0.76 25.66 > >> Average: dev253-13 0.98 0.57 7.42 8.19 0.00 > >> 2.26 1.01 0.10 > >> > >>> * Maybe any information about the postgresql tuning items - > >>> shared_buffers is probably a good start, or have you left the tuning > >>> alone? > >> > >> shared_buffers = 4096MB > >> work_mem = 8MB > >> maintenance_work_mem = 32MB > >>> > >>> > >>> You probably don''t need debug with log-slow-statements, I wasn''t aware > >>> that was an option but I think that gives a good amount of data for > >>> this problem. 
> >>> > >>> On Tue, Jan 15, 2013 at 8:24 PM, Chuck <css...@gmail.com> wrote: > >>> > I will work on setting up a puppetdb server I can put into debug > mode. > >>> > For > >>> > now here is a "normal" GC run log output and my configuration files. > >>> > > >>> > Thanks, > >>> > > >>> > > >>> > 2013-01-15 19:20:57,767 INFO [clojure-agent-send-off-pool-12] > >>> > [cli.services] Starting database garbage collection > >>> > 2013-01-15 19:26:40,383 WARN [clojure-agent-send-off-pool-12] > >>> > [jdbc.internal] Query slower than 10s threshold: actual execution > >>> > time: > >>> > 342.6148 seconds; Query: DELETE FROM catalogs WHERE NOT EXISTS > (SELECT > >>> > * > >>> > FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); Query > >>> > Params: > >>> > 2013-01-15 19:33:29,304 WARN [clojure-agent-send-off-pool-12] > >>> > [jdbc.internal] Query slower than 10s threshold: actual execution > >>> > time: > >>> > 408.9210 seconds; Query: DELETE FROM resource_params WHERE NOT > EXISTS > >>> > (SELECT * FROM catalog_resources cr WHERE > >>> > cr.resource=resource_params.resource); Query Params: > >>> > 2013-01-15 19:33:29,308 INFO [clojure-agent-send-off-pool-12] > >>> > [cli.services] Finished database garbage collection > >>> > 2013-01-15 19:33:29,308 INFO [clojure-agent-send-off-pool-12] > >>> > [cli.services] Starting sweep of stale nodes (1 day threshold) > >>> > 2013-01-15 19:33:29,390 INFO [clojure-agent-send-off-pool-12] > >>> > [cli.services] Finished sweep of stale nodes (1 day threshold) > >>> > > >>> > ########################################################## > >>> > # > >>> > # config.ini > >>> > # > >>> > > >>> > # See README.md for more thorough explanations of each section and > >>> > # option. > >>> > > >>> > [global] > >>> > # Store mq/db data in a custom directory > >>> > vardir = /var/lib/puppetdb > >>> > # Use an external log4j config file > >>> > logging-config = /etc/puppetdb/conf.d/../log4j.properties > >>> > > >>> > # Maximum number of results that a resource query may return > >>> > resource-query-limit = 20000 > >>> > > >>> > [command-processing] > >>> > # How many command-processing threads to use, defaults to (CPUs / 2) > >>> > threads = 10 > >>> > > >>> > ########################################################## > >>> > # > >>> > # database.ini > >>> > # > >>> > > >>> > [database] > >>> > # For the embedded DB: org.hsqldb.jdbcDriver > >>> > # For PostgreSQL: org.postgresql.Driver > >>> > # Defaults to embedded DB > >>> > classname = org.postgresql.Driver > >>> > > >>> > # For the embedded DB: hsqldb > >>> > # For PostgreSQL: postgresql > >>> > # Defaults to embedded DB > >>> > subprotocol = postgresql > >>> > > >>> > # For the embedded DB: > >>> > file:/path/to/database;hsqldb.tx=mvcc;sql.syntax_pgs=true > >>> > # For PostgreSQL: //host:port/databaseName > >>> > # Defaults to embedded DB located in <vardir>/db > >>> > subname = //PUPPETDB:5432/puppet > >>> > > >>> > # Connect as a specific user > >>> > username = PUPPETDBUSER > >>> > > >>> > # Use a specific password > >>> > password = PASSWORDPUPPETDB > >>> > > >>> > # How often (in minutes) to compact the database > >>> > # gc-interval = 60 > >>> > > >>> > # Number of seconds before any SQL query is considered ''slow''; > >>> > offending > >>> > # queries will not be interrupted, but will be logged at the WARN > log > >>> > level. 
> >>> > log-slow-statements = 10 > >>> > > >>> > # How old (in days) to deactivate nodes that have not reported > >>> > node-ttl-days = 1 > >>> > > >>> > # How often to send a keepalive > >>> > conn-keep-alive = 4 > >>> > > >>> > # How long to keep connections open > >>> > conn-max-age = 30 > >>> > > >>> > > >>> > > >>> > On Tuesday, January 15, 2013 1:27:58 PM UTC-6, Ken Barber wrote: > >>> >> > >>> >> Hey Chuck, > >>> >> > >>> >> I''ve had a chat with my colleagues and they raised some concerns > about > >>> >> your gc performance. I wouldn''t mind drilling into that problem if > >>> >> thats okay with you? > >>> >> > >>> >> For starters - what are your settings looking like? In particular > I''m > >>> >> interested in node-ttl-seconds and report-ttl-seconds. > >>> >> > >>> >> FYI there are a few actions that occur when gc kicks in: > >>> >> > >>> >> * sweeping stale nodes > >>> >> * sweeping stale reports > >>> >> * removing any unassociated data (catalogs or params) > >>> >> > >>> >> It would be useful to find out which of these are taking up the > most > >>> >> amount of your time. > >>> >> > >>> >> I wouldn''t mind a dump of your logs when in a specific debug mode > if > >>> >> thats at all possible, this should give me a rough idea of the time > >>> >> the sweeper spends doing its various deletes so I can at least get > a > >>> >> focus on your issue. In particular the bits between ''Starting > database > >>> >> garbage collection'' and ''Finished database garbage collection''. > >>> >> > >>> >> The easiest way to get some debugging is to modify your > >>> >> log4j.properties (usually in /etc/puppetdb) and add the line: > >>> >> > >>> >> log4j.logger.com.jolbox.bonecp=DEBUG > >>> >> > >>> >> This will log *ALL* SQL statements, so it might be heavy - I''m not > >>> >> sure if you can isolate one of your nodes for this purpose or > somehow > >>> >> mitigate this risk ... we only need a heavy GC run, it probably > >>> >> doesn''t need to be actively receiving requests and such ... > >>> >> > >>> >> ken. > >>> >> > >>> >> On Tue, Jan 15, 2013 at 4:12 PM, Chuck <css...@gmail.com> wrote: > >>> >> > > >>> >> > > >>> >> > On Tuesday, January 15, 2013 9:55:48 AM UTC-6, Ken Barber wrote: > >>> >> >> > >>> >> >> > Well I currently have 7,000 nodes in my PuppetDB instance and > 8 > >>> >> >> > puppetdb > >>> >> >> > servers in various geographic locations. The garbage collect > >>> >> >> > seems > >>> >> >> > to > >>> >> >> > be a > >>> >> >> > pretty intensive operation on the Postgres DB server in the > >>> >> >> > current > >>> >> >> > design. > >>> >> >> > >>> >> >> Thats a shame to hear. What''s the impact for you? I see from > your > >>> >> >> screenshot its averaging 16 seconds, is that what you are > seeing? > >>> >> > > >>> >> > > >>> >> > This is normally what we see 8 times of course. > >>> >> > > >>> >> >> > >>> >> >> > We have also been having an issue that started at the > beginning > >>> >> >> > of > >>> >> >> > 2013 > >>> >> >> > that > >>> >> >> > we have not been able to narrow down at this time (we were on > >>> >> >> > puppetdb > >>> >> >> > 1.0.1 > >>> >> >> > for a long time with no issues. Updated to 1.0.5 last week > with > >>> >> >> > no > >>> >> >> > improvement) > >>> >> >> > >>> >> >> This issue, is it related to the gc problem or something else? > Can > >>> >> >> you > >>> >> >> elaborate? > >>> >> >> > >>> >> > We are still trying to determine what is happening. We just see > a > >>> >> > huge > >>> >> > spike in the number of rows being deleted. 
> >>> >> > > >>> >> >> > >>> >> >> > I was planning on having my puppet manifest, for the puppetdb > >>> >> >> > servers, > >>> >> >> > elect > >>> >> >> > one or two master puppetdb servers by querying for active > >>> >> >> > puppetdb > >>> >> >> > servers. > >>> >> >> > So I would be dynamically electing the master(s) based on an > >>> >> >> > algorithm. > >>> >> >> > >>> >> >> Sounds good to me. Do you want to raise a feature request for > this? > >>> >> > > >>> >> > > >>> >> > Yeah I can do that. > >>> >> > > >>> >> >> > >>> >> >> http://projects.puppetlabs.com/projects/puppetdb > >>> >> >> > >>> >> >> Just more or less cover what you''ve said about wanting to > disable > >>> >> >> by > >>> >> >> setting to zero. I''ve already more or less got a patch that can > >>> >> >> disable it when zero (its really just a conditional clause in > the > >>> >> >> right place), but I haven''t done any thorough testing or gotten > >>> >> >> review > >>> >> >> on it yet. > >>> >> >> > >>> >> >> ken. > >>> >> > > >>> >> > -- > >>> >> > You received this message because you are subscribed to the > Google > >>> >> > Groups > >>> >> > "Puppet Users" group. > >>> >> > To view this discussion on the web visit > >>> >> > https://groups.google.com/d/msg/puppet-users/-/OkVPHnluxpcJ. > >>> >> > > >>> >> > To post to this group, send email to puppet...@googlegroups.com. > >>> >> > To unsubscribe from this group, send email to > >>> >> > puppet-users...@googlegroups.com. > >>> >> > For more options, visit this group at > >>> >> > http://groups.google.com/group/puppet-users?hl=en. > >>> > > >>> > -- > >>> > You received this message because you are subscribed to the Google > >>> > Groups > >>> > "Puppet Users" group. > >>> > To view this discussion on the web visit > >>> > https://groups.google.com/d/msg/puppet-users/-/vuZyS8chcXYJ. > >>> > > >>> > To post to this group, send email to puppet...@googlegroups.com. > >>> > To unsubscribe from this group, send email to > >>> > puppet-users...@googlegroups.com. > >>> > For more options, visit this group at > >>> > http://groups.google.com/group/puppet-users?hl=en. > > > > -- > > You received this message because you are subscribed to the Google > Groups > > "Puppet Users" group. > > To view this discussion on the web visit > > https://groups.google.com/d/msg/puppet-users/-/jnWvUJepjzYJ. > > > > To post to this group, send email to puppet...@googlegroups.com<javascript:>. > > > To unsubscribe from this group, send email to > > puppet-users...@googlegroups.com <javascript:>. > > For more options, visit this group at > > http://groups.google.com/group/puppet-users?hl=en. >-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/vqcNuCBG3_kJ. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Chuck
2013-Jan-18 00:28 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
This is unconfirmed at this point.

It seems like every 4 days, starting Jan 4, 2013 (interestingly, it started in 2013 and never happened in 2012), the catalog or resource hash changes, causing the entire catalog_resources table to insert new entries, and then possibly GC deletes the old entries.

If this holds true we will see this behavior again on Jan 20, 2013. Hopefully we will have a better idea of what is going on then.

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/APtEGbv578QJ.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
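One low-impact way to test that hypothesis would be to snapshot a few row counts around the suspected window and compare them afterwards. A sketch, using only tables already shown in this thread (it could be run periodically, for example from cron, with the output appended to a file):

-- A burst of replaced catalogs should show up as the catalogs count climbing
-- well above certname_catalogs (those extra rows are the orphans GC later
-- deletes), along with a corresponding jump in catalog_resources.
SELECT now() AS sampled_at,
       (SELECT count(*) FROM puppet.catalogs)          AS catalogs,
       (SELECT count(*) FROM puppet.certname_catalogs) AS certname_catalogs,
       (SELECT count(*) FROM puppet.catalog_resources) AS catalog_resources;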
Ken Barber
2013-Jan-22 19:27 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
Does this happen across all nodes? This is an indication that you might have a resource that affects a large set of nodes and that suddenly changes every 4 days.

In the catalogs table, the 'hash' is just a hash of the catalogue data: if anything in the catalogue changes, it changes and a new entry is created. The database garbage collection is there to remove the orphaned entries. So one could really consider this part of normal operation; if your catalogues are constantly changing, then the garbage collection runs are bigger.

Did you see this happen on Sunday?

On Fri, Jan 18, 2013 at 12:28 AM, Chuck <csschwe@gmail.com> wrote:
> This is unconfirmed at this point.
>
> It seems like every 4 days starting Jan 4. 2013 (interesting is started in
> 2013 and never happened in 2012) like the Catalog or Resource Hash changes,
> causing the entire catalog_resource table to insert new entries, then
> possibly GC deletes the old entries.
>
> If this holds true we will see this behavior again on Jan. 20, 2013.
> Hopefully we will have a better idea of what is going on.

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Chuck
2013-Jan-22 19:40 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
We didn't notice anything on Sunday. We have a decent number of resources that affect all nodes, which may explain the occasional performance issues. We have also been messing around with the indexes, as they were getting HUGE. We also started promoting our Puppet changes on a schedule, so we affect more nodes at a time than we have in the past.

On Tuesday, January 22, 2013 1:27:15 PM UTC-6, Ken Barber wrote:
>
> Does this happen across all nodes? This is an indication you might
> have a resource that affects a large set of nodes that suddenly
> changes every 4 days.
>
> In the catalogs table, the 'hash' is just a hash of the catalogue
> data, if anything in the catalogue changes - it changes. And new
> entries are created. The database garbage collection is there to
> remove the orphaned entries. So really one could consider this part of
> normal operation, if your catalogues are constantly changing then the
> garbage collection runs are bigger.
>
> Did you see this happen on Sunday?
>
> On Fri, Jan 18, 2013 at 12:28 AM, Chuck <css...@gmail.com> wrote:
> > This is unconfirmed at this point.
> >
> > It seems like every 4 days starting Jan 4. 2013 (interesting is started in
> > 2013 and never happened in 2012) like the Catalog or Resource Hash changes,
> > causing the entire catalog_resource table to insert new entries, then
> > possibly GC deletes the old entries.
> >
> > If this holds true we will see this behavior again on Jan. 20, 2013.
> > Hopefully we will have a better idea of what is going on.

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/38QnH1c20UcJ.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
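On the point about the indexes getting huge: the per-index sizes are easy to pull out of the standard statistics views, which would also show which specific indexes are worth attention. A sketch against the puppet schema used above:

-- Largest indexes first, with the table each one belongs to.
SELECT relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 WHERE schemaname = 'puppet'
 ORDER BY pg_relation_size(indexrelid) DESC
 LIMIT 10;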
Erik Dalén
2013-Jan-23 08:46 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
One of the largest indexes was not needed and removed in the latest version of puppetdb. So you might want to try out that version to reduce the index sizes. On Jan 22, 2013 8:41 PM, "Chuck" <csschwe@gmail.com> wrote:> We didn''t notice anything on Sunday. We have a decent number of resources > that effect all nodes. This may explain the ocasional performance issues. > We have also been messing around with the Indexes as they were getting > HUGE. We also started promoting our Puppet changes on a schedule so we > effect more nodes at a time than we have in the past. > > On Tuesday, January 22, 2013 1:27:15 PM UTC-6, Ken Barber wrote: >> >> Does this happen across all nodes? This is an indication you might >> have a resource that affects a large set of nodes that suddenly >> changes every 4 days. >> >> In the catalogs table, the ''hash'' is just a hash of the catalogue >> data, if anything in the catalogue changes - it changes. And new >> entries are created. The database garbage collection is there to >> remove the orphaned entries. So really one could consider this part of >> normal operation, if your catalogues are constantly changing then the >> garbage collection runs are bigger. >> >> Did you see this happen on Sunday? >> >> On Fri, Jan 18, 2013 at 12:28 AM, Chuck <css...@gmail.com> wrote: >> > This is unconfirmed at this point. >> > >> > It seems like every 4 days starting Jan 4. 2013 (interesting is started >> in >> > 2013 and never happened in 2012) like the Catalog or Resource Hash >> changes, >> > causing the entire catalog_resource table to insert new entries, then >> > possibly GC deletes the old entries. >> > >> > If this holds true we will see this behavior again on Jan. 20, 2013. >> > Hopefully we will have a better idea of what is going on. >> > >> > -- >> > You received this message because you are subscribed to the Google >> Groups >> > "Puppet Users" group. >> > To view this discussion on the web visit >> > https://groups.google.com/d/**msg/puppet-users/-/**APtEGbv578QJ<https://groups.google.com/d/msg/puppet-users/-/APtEGbv578QJ>. >> >> > >> > To post to this group, send email to puppet...@googlegroups.com. >> > To unsubscribe from this group, send email to >> > puppet-users...@**googlegroups.com. >> > For more options, visit this group at >> > http://groups.google.com/**group/puppet-users?hl=en<http://groups.google.com/group/puppet-users?hl=en>. >> >> > -- > You received this message because you are subscribed to the Google Groups > "Puppet Users" group. > To view this discussion on the web visit > https://groups.google.com/d/msg/puppet-users/-/38QnH1c20UcJ. > To post to this group, send email to puppet-users@googlegroups.com. > To unsubscribe from this group, send email to > puppet-users+unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/puppet-users?hl=en. >-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Chuck
2013-Jan-23 19:47 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
On Wednesday, January 23, 2013 8:20:34 AM UTC-6, Ken Barber wrote:> > > We didn''t notice anything on Sunday. We have a decent number of > resources > > that effect all nodes. This may explain the ocasional performance > issues. > > Sure, more specifically you''ll get catalog replaces in the database if > you have resources that are always ''changing''. This might be a dynamic > parameter to a resource, a dynamic title or alias. The hash doesn''t > care, any change means that hash is different, thus things become > candidates for garbage collection. > >Yes this does seem to be an issue since we are updating 7k - 8k nodes over the course of an hour. We don''t really have any dynamic catalogs, but we are updating our code in ways that does create the new catalog hashes.> > We have also been messing around with the Indexes as they were getting > HUGE. > > What did you do to the indexes Chuck? Did you rebuild them, if so - > which ones - and do you think it helped? >Still collecting information on this.> Also, I''ve realised my mistake in asking for explain plans for the > select queries I gave you. It didn''t take into account the cascades > during delete, better that I had asked you for explain plans on the > deletes instead: > > explain analyze verbose DELETE FROM catalogs WHERE NOT EXISTS (SELECT > * FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); >puppet=# explain analyze verbose DELETE FROM puppet.catalogs WHERE NOT EXISTS (SELECT puppet(# * FROM puppet.certname_catalogs cc WHERE cc.catalog=catalogs.hash); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Delete on puppet.catalogs (cost=764.75..1501.94 rows=985 width=12) (actual time=226.633..226.633 rows=0 loops=1) -> Hash Anti Join (cost=764.75..1501.94 rows=985 width=12) (actual time=41.310..100.652 rows=4326 loops=1) Output: catalogs.ctid, cc.ctid Hash Cond: ((catalogs.hash)::text = (cc.catalog)::text) -> Seq Scan on puppet.catalogs (cost=0.00..637.63 rows=8663 width=47) (actual time=0.386..46.478 rows=12006 loops=1) Output: catalogs.ctid, catalogs.hash -> Hash (cost=668.78..668.78 rows=7678 width=47) (actual time=39.635..39.635 rows=7680 loops=1) Output: cc.ctid, cc.catalog Buckets: 1024 Batches: 1 Memory Usage: 593kB -> Seq Scan on puppet.certname_catalogs cc (cost=0.00..668.78 rows=7678 width=47) (actual time=0.022..35.959 rows=7680 loops=1) Output: cc.ctid, cc.catalog Trigger RI_ConstraintTrigger_16919 for constraint catalog_resources_catalog_fkey: time=220008.002 calls=4326 Trigger RI_ConstraintTrigger_16924 for constraint certname_catalogs_catalog_fkey: time=374.236 calls=4326 Trigger RI_ConstraintTrigger_16944 for constraint classes_catalog_fkey: time=45119.841 calls=4326 Trigger RI_ConstraintTrigger_16949 for constraint edges_catalog_fkey: time=395772.761 calls=4326 Trigger RI_ConstraintTrigger_16954 for constraint tags_catalog_fkey: time=46050.487 calls=4326 Total runtime: 707572.852 ms (17 rows)> explain analyze verbose DELETE FROM resource_params WHERE NOT EXISTS > (SELECT * FROM catalog_resources cr WHERE > cr.resource=resource_params.resource); > > puppet=# explain analyze verbose DELETE FROM puppet.resource_params WHERENOT EXISTS puppet-# (SELECT * FROM puppet.catalog_resources cr WHERE puppet(# cr.resource=puppet.resource_params.resource); QUERY PLAN 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Delete on puppet.resource_params (cost=0.00..34821.24 rows=422373 width=12) (actual time=177204.058..177204.058 rows=0 loops=1) -> Nested Loop Anti Join (cost=0.00..34821.24 rows=422373 width=12) (actual time=1029.091..177055.468 rows=389 loops=1) Output: resource_params.ctid, cr.ctid -> Seq Scan on puppet.resource_params (cost=0.00..25337.86 rows=438886 width=47) (actual time=0.014..597.273 rows=446297 loops=1) Output: resource_params.ctid, resource_params.resource -> Index Scan using idx_catalog_resources_resource on puppet.catalog_resources cr (cost=0.00..62.39 rows=3155 width=47) (actual time=0.394..0.394 rows=1 loops=446297) Output: cr.ctid, cr.resource Index Cond: ((cr.resource)::text = (resource_params.resource)::text) Total runtime: 177204.131 ms> Also without a real delete in the first SQL, the second delete > wouldn''t have triggered any removal. > > ken. >-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/7TS_DCm09uMJ. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
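In the first DELETE plan above, nearly all of the roughly 707 seconds is spent in the foreign-key constraint triggers (catalog_resources, edges, classes and tags), that is, in the per-row work on the child tables rather than in the anti-join itself. If the referencing catalog columns on those child tables lack indexes, each deleted parent row forces a scan of the child table. A quick, read-only way to check which indexes actually exist there, offered as a sketch only:

-- Indexes currently defined on the child tables referenced by the catalogs
-- foreign keys; a 'catalog' column should appear in an indexdef for each.
SELECT tablename, indexname, indexdef
  FROM pg_indexes
 WHERE schemaname = 'puppet'
   AND tablename IN ('catalog_resources', 'certname_catalogs',
                     'classes', 'edges', 'tags');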