Chuck
2013-Jan-14 20:40 UTC
[Puppet Users] Puppetdb will setting gc-interval to 0 disable it
I have multiple PuppetDB servers pointed at the same database. Would setting gc-interval to 0 disable it on the extra PuppetDB nodes?
Ken Barber
2013-Jan-14 20:46 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
I believe you'll get an assertion failure by setting it to zero. In a way you probably don't want it to be pinned to one node, because if that node is removed from the cluster or goes down you'll lose garbage collection entirely. However, we don't have a nice way today of splaying the interval or anything magical like that. Is having it run in multiple places causing you an issue today?

On Mon, Jan 14, 2013 at 8:40 PM, Chuck <csschwe@gmail.com> wrote:
> I have multiple PuppetDB servers pointed at the same database.
> Would setting gc-interval to 0 disable it on the extra PuppetDB nodes?
Chuck
2013-Jan-15 01:42 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
Well, I currently have 7,000 nodes in my PuppetDB instance and 8 PuppetDB servers in various geographic locations. Garbage collection seems to be a pretty intensive operation on the Postgres DB server in the current design.

We have also been having an issue that started at the beginning of 2013 that we have not been able to narrow down yet (we were on PuppetDB 1.0.1 for a long time with no issues; we updated to 1.0.5 last week with no improvement).

I was planning on having my Puppet manifest for the PuppetDB servers elect one or two master PuppetDB servers by querying for the active PuppetDB servers. So I would be dynamically electing the master(s) based on an algorithm.
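A minimal sketch of that election idea, expressed as a query against PuppetDB's backing database. The certnames table and its name/deactivated columns are assumptions about the 1.x schema (verify against your own database), and the 'puppetdb%' filter is a purely hypothetical naming convention:

-- Hypothetical sketch: every PuppetDB node runs the same query and treats
-- the first one or two certnames returned as the elected GC "masters".
SELECT name
FROM   certnames
WHERE  deactivated IS NULL          -- still-active nodes only
  AND  name LIKE 'puppetdb%'        -- hypothetical naming convention
ORDER  BY name                      -- deterministic, so all nodes agree
LIMIT  2;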
Chuck
2013-Jan-15 01:53 UTC
[Puppet Users] Re: Puppetdb will setting gc-interval to 0 disable it
PuppetDB dashboard snapshot: https://lh6.googleusercontent.com/-0HsHXGsQq6k/UPS2Vq4XAwI/AAAAAAAACrw/sXsbRupQVXY/s1600/Screen+Shot+2013-01-14+at+7.50.12+PM.png
Ken Barber
2013-Jan-15 15:55 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
> Well I currently have 7,000 nodes in my PuppetDB instance and 8 PuppetDB
> servers in various geographic locations. Garbage collection seems to be a
> pretty intensive operation on the Postgres DB server in the current design.

That's a shame to hear. What's the impact for you? I see from your screenshot it's averaging 16 seconds; is that what you are seeing?

> We have also been having an issue that started at the beginning of 2013 that
> we have not been able to narrow down yet (we were on PuppetDB 1.0.1 for a
> long time with no issues; we updated to 1.0.5 last week with no improvement).

This issue: is it related to the gc problem or something else? Can you elaborate?

> I was planning on having my Puppet manifest for the PuppetDB servers elect
> one or two master PuppetDB servers by querying for the active PuppetDB
> servers. So I would be dynamically electing the master(s) based on an algorithm.

Sounds good to me. Do you want to raise a feature request for this?

http://projects.puppetlabs.com/projects/puppetdb

Just more or less cover what you've said about wanting to disable it by setting it to zero. I've already more or less got a patch that can disable it when zero (it's really just a conditional clause in the right place), but I haven't done any thorough testing or gotten review on it yet.

ken.
Chuck
2013-Jan-15 16:12 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
On Tuesday, January 15, 2013 9:55:48 AM UTC-6, Ken Barber wrote:
> That's a shame to hear. What's the impact for you? I see from your
> screenshot it's averaging 16 seconds; is that what you are seeing?

This is normally what we see, 8 times over of course.

> This issue: is it related to the gc problem or something else? Can you elaborate?

We are still trying to determine what is happening. We just see a huge spike in the number of rows being deleted.

> Sounds good to me. Do you want to raise a feature request for this?

Yeah, I can do that.
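One illustrative way to watch that delete spike from the database side is PostgreSQL's standard pg_stat_user_tables statistics view (this query is not from the original exchange, just a sketch):

-- Illustrative: cumulative deleted tuples per table since the last stats
-- reset, plus dead tuples still awaiting vacuum.
SELECT relname, n_tup_del, n_dead_tup
FROM   pg_stat_user_tables
ORDER  BY n_tup_del DESC;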
Ken Barber
2013-Jan-15 19:27 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
Hey Chuck,

I've had a chat with my colleagues and they raised some concerns about your gc performance. I wouldn't mind drilling into that problem, if that's okay with you?

For starters, what are your settings looking like? In particular I'm interested in node-ttl-seconds and report-ttl-seconds.

FYI, there are a few actions that occur when gc kicks in:

* sweeping stale nodes
* sweeping stale reports
* removing any unassociated data (catalogs or params)

It would be useful to find out which of these is taking up most of your time.

I wouldn't mind a dump of your logs in a specific debug mode if that's at all possible; this should give me a rough idea of the time the sweeper spends doing its various deletes, so I can at least get a focus on your issue. In particular, the bits between 'Starting database garbage collection' and 'Finished database garbage collection'.

The easiest way to get some debugging is to modify your log4j.properties (usually in /etc/puppetdb) and add the line:

log4j.logger.com.jolbox.bonecp=DEBUG

This will log *ALL* SQL statements, so it might be heavy; I'm not sure if you can isolate one of your nodes for this purpose or somehow mitigate that risk. We only need one heavy GC run, so the node probably doesn't need to be actively receiving requests and such.

ken.
Chuck
2013-Jan-15 20:24 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
I will work on setting up a PuppetDB server I can put into debug mode. For now, here is a "normal" GC run's log output and my configuration files.

Thanks,

2013-01-15 19:20:57,767 INFO  [clojure-agent-send-off-pool-12] [cli.services] Starting database garbage collection
2013-01-15 19:26:40,383 WARN  [clojure-agent-send-off-pool-12] [jdbc.internal] Query slower than 10s threshold: actual execution time: 342.6148 seconds; Query: DELETE FROM catalogs WHERE NOT EXISTS (SELECT * FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); Query Params:
2013-01-15 19:33:29,304 WARN  [clojure-agent-send-off-pool-12] [jdbc.internal] Query slower than 10s threshold: actual execution time: 408.9210 seconds; Query: DELETE FROM resource_params WHERE NOT EXISTS (SELECT * FROM catalog_resources cr WHERE cr.resource=resource_params.resource); Query Params:
2013-01-15 19:33:29,308 INFO  [clojure-agent-send-off-pool-12] [cli.services] Finished database garbage collection
2013-01-15 19:33:29,308 INFO  [clojure-agent-send-off-pool-12] [cli.services] Starting sweep of stale nodes (1 day threshold)
2013-01-15 19:33:29,390 INFO  [clojure-agent-send-off-pool-12] [cli.services] Finished sweep of stale nodes (1 day threshold)

##########################################################
#
# config.ini
#

# See README.md for more thorough explanations of each section and
# option.

[global]
# Store mq/db data in a custom directory
vardir = /var/lib/puppetdb
# Use an external log4j config file
logging-config = /etc/puppetdb/conf.d/../log4j.properties

# Maximum number of results that a resource query may return
resource-query-limit = 20000

[command-processing]
# How many command-processing threads to use, defaults to (CPUs / 2)
threads = 10

##########################################################
#
# database.ini
#

[database]
# For the embedded DB: org.hsqldb.jdbcDriver
# For PostgreSQL: org.postgresql.Driver
# Defaults to embedded DB
classname = org.postgresql.Driver

# For the embedded DB: hsqldb
# For PostgreSQL: postgresql
# Defaults to embedded DB
subprotocol = postgresql

# For the embedded DB: file:/path/to/database;hsqldb.tx=mvcc;sql.syntax_pgs=true
# For PostgreSQL: //host:port/databaseName
# Defaults to embedded DB located in <vardir>/db
subname = //PUPPETDB:5432/puppet

# Connect as a specific user
username = PUPPETDBUSER

# Use a specific password
password = PASSWORDPUPPETDB

# How often (in minutes) to compact the database
# gc-interval = 60

# Number of seconds before any SQL query is considered 'slow'; offending
# queries will not be interrupted, but will be logged at the WARN log level.
log-slow-statements = 10

# How old (in days) to deactivate nodes that have not reported
node-ttl-days = 1

# How often to send a keepalive
conn-keep-alive = 4

# How long to keep connections open
conn-max-age = 30
Ken Barber
2013-Jan-15 21:19 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
So that looks like it's taking ages ... some questions for you:

* What version of PostgreSQL are you running, and on what distro & version exactly?
* Are you using the distro's packages or someone else's?
* Is autovacuum switched on at all (check your postgresql.conf)?
* Do you know the last time the database was vacuumed and/or analyzed? Try this link for a way to check this: http://heatware.net/databases/how-when-postgres-tables-auto-vacuum-analyze/
* Can you get me some box metrics (memory size, CPU count, disk spindle speeds, etc.) for your PostgreSQL box, so I can get an idea of the throughput available to you?
* Maybe any information about the PostgreSQL tuning items; shared_buffers is probably a good start, or have you left the tuning alone?

You probably don't need debug with log-slow-statements; I wasn't aware that was an option, but I think it gives a good amount of data for this problem.

ken.
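For the tuning questions above, the relevant values can be pulled in one go from PostgreSQL's standard pg_settings catalog view (illustrative only; the setting names listed are the usual suspects, not an exhaustive list):

-- Illustrative: report autovacuum and the main memory-tuning settings.
SELECT name, setting, unit
FROM   pg_settings
WHERE  name IN ('autovacuum', 'shared_buffers', 'work_mem',
                'maintenance_work_mem', 'effective_cache_size');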
Chuck
2013-Jan-15 22:07 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
On Tuesday, January 15, 2013 3:19:29 PM UTC-6, Ken Barber wrote:
> * What version of PostgreSQL are you running, and on what distro & version exactly?

RHEL 6.3, EnterpriseDB ppas-9.1.4.

> * Are you using the distro's packages or someone else's?

Provided by EnterpriseDB.

> * Is autovacuum switched on at all (check your postgresql.conf)?

Yes. We also ran a manual VACUUM ANALYZE with no noticeable change in performance.

> * Do you know the last time the database was vacuumed and/or analyzed?

puppet=# select relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze from pg_stat_user_tables;
         relname         |          last_vacuum          |        last_autovacuum        |         last_analyze          |       last_autoanalyze
-------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
 catalog_resources       |                               | 2013-01-12 15:16:10.188342+00 | 2013-01-09 23:26:34.068919+00 | 2013-01-15 18:27:11.166982+00
 certname_facts_metadata |                               | 2013-01-15 19:22:13.932096+00 | 2012-11-06 22:02:25.469113+00 | 2013-01-15 19:22:15.308925+00
 certnames               | 2013-01-03 22:22:11.416002+00 | 2013-01-15 21:17:47.046022+00 | 2013-01-03 22:22:11.574618+00 | 2013-01-15 21:17:48.14814+00
 resource_params         |                               | 2013-01-05 08:04:04.856417+00 | 2012-11-06 21:25:40.15877+00  | 2013-01-15 15:56:52.589316+00
 classes                 |                               | 2013-01-12 09:29:53.118022+00 | 2012-11-06 22:02:40.547686+00 | 2013-01-15 16:24:04.909758+00
 catalogs                |                               | 2013-01-15 21:18:15.342477+00 | 2012-11-06 22:02:25.182041+00 | 2013-01-15 21:18:16.493626+00
 certname_facts          |                               | 2013-01-15 21:17:39.514162+00 | 2013-01-09 18:00:16.10573+00  | 2013-01-15 21:25:08.780617+00
 tags                    |                               | 2013-01-15 21:44:16.074229+00 | 2012-11-06 22:02:24.75951+00  | 2013-01-15 21:48:19.751198+00
 edges                   |                               | 2013-01-12 11:54:54.182207+00 | 2012-11-06 21:33:24.162324+00 | 2013-01-15 17:27:19.061491+00
 certname_catalogs       |                               | 2013-01-15 21:18:54.227264+00 | 2013-01-09 23:26:38.684536+00 | 2013-01-15 21:18:55.369032+00
 schema_migrations       |                               |                               | 2012-11-06 22:02:40.637773+00 |
(11 rows)

> * Can you get me some box metrics (memory size, CPU count, disk spindle speeds, etc.) for your PostgreSQL box?

It is a VMware VM with the following properties:

8 CPUs
16 GB of memory
Attached to an enterprise-class SAN

The database is on dev253-12.

sar disk output:

09:40:01 PM        DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz    await    svctm    %util
09:50:01 PM     dev8-0      2.28     33.73     21.09     24.03      0.01     2.23     1.38     0.32
09:50:01 PM    dev8-16      2.24      0.23   1501.34    670.68      0.03    11.20     1.50     0.34
09:50:01 PM    dev8-32      0.57      0.01      7.28     12.68      0.00     3.42     3.11     0.18
09:50:01 PM    dev8-48     10.07      0.53   1591.25    158.02      0.02     1.74     1.64     1.66
09:50:01 PM    dev8-64    354.77  16164.46   2231.28     51.85      2.80     7.89     1.81    64.30
09:50:01 PM   dev253-0      0.78     33.70      0.62     43.93      0.00     3.91     2.47     0.19
09:50:01 PM   dev253-1      0.00      0.00      0.00      0.00      0.00     0.00     0.00     0.00
09:50:01 PM   dev253-2      0.00      0.00      0.00      0.00      0.00     0.00     0.00     0.00
09:50:01 PM   dev253-3      1.67      0.05     13.28      8.01      0.00     1.21     0.38     0.06
09:50:01 PM   dev253-4      0.38      0.00      3.06      8.00      0.00     1.64     0.72     0.03
09:50:01 PM   dev253-5      0.43      0.00      3.45      8.00      0.00     1.28     0.66     0.03
09:50:01 PM   dev253-6      0.00      0.00      0.00      0.00      0.00     0.00     0.00     0.00
09:50:01 PM   dev253-7      0.09      0.04      0.68      8.00      0.00     0.72     0.45     0.00
09:50:01 PM   dev253-8      0.00      0.00      0.00      0.00      0.00     0.00     0.00     0.00
09:50:01 PM   dev253-9      0.00      0.00      0.00      0.00      0.00     0.00     0.00     0.00
09:50:01 PM  dev253-10    187.70      0.23   1501.34      8.00      2.62    13.95     0.02     0.34
09:50:01 PM  dev253-11    198.97      0.53   1591.25      8.00      0.53     2.67     0.08     1.66
09:50:01 PM  dev253-12    534.17  16164.46   2231.28     34.44      4.24     7.93     1.20    64.32
09:50:01 PM  dev253-13      0.91      0.01      7.28      8.00      0.00     5.24     1.96     0.18
Average:        dev8-0      3.21     73.51     24.19     30.43      0.02     7.48     4.18     1.34
Average:       dev8-16      3.36      1.14   2568.18    763.92      0.08    22.78     1.09     0.37
Average:       dev8-32      0.66      0.97      7.42     12.74      0.00     1.81     1.55     0.10
Average:       dev8-48      9.95      1.60   1373.40    138.23      0.01     1.51     1.42     1.41
Average:       dev8-64    140.65   6834.41   2298.08     64.93      1.19     8.49     1.82    25.66
Average:      dev253-0      1.34     45.41      1.08     34.72      0.02    15.92     5.52     0.74
Average:      dev253-1      0.03      0.20      0.07      8.00      0.00     2.18     0.67     0.00
Average:      dev253-2      0.04      0.23      0.08      8.00      0.00    10.40     2.11     0.01
Average:      dev253-3      1.89      9.17     13.84     12.20      0.01     3.85     1.67     0.31
Average:      dev253-4      0.46      0.21      3.43      8.00      0.00     2.26     0.91     0.04
Average:      dev253-5      0.81     16.84      4.94     26.88      0.01     7.93     2.83     0.23
Average:      dev253-6      0.02      0.19      0.00      8.00      0.00     0.34     0.32     0.00
Average:      dev253-7      0.12      0.23      0.74      8.10      0.00     1.41     0.99     0.01
Average:      dev253-8      0.02      0.19      0.00      8.00      0.00     0.33     0.32     0.00
Average:      dev253-9      0.04      0.36      0.00      8.00      0.00     0.52     0.52     0.00
Average:     dev253-10    321.11      0.74   2568.18      8.00      9.20    28.65     0.01     0.36
Average:     dev253-11    171.74      1.20   1373.40      8.00      0.43     2.53     0.08     1.42
Average:     dev253-12    339.64   6834.01   2298.08     26.89      3.01     8.84     0.76    25.66
Average:     dev253-13      0.98      0.57      7.42      8.19      0.00     2.26     1.01     0.10

> * Maybe any information about the PostgreSQL tuning items; shared_buffers is probably a good start, or have you left the tuning alone?

shared_buffers = 4096MB
work_mem = 8MB
maintenance_work_mem = 32MB
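To put those I/O numbers in context, the on-disk size of the tables GC has to walk can be checked with standard PostgreSQL size functions (illustrative only, not part of the original exchange):

-- Illustrative: total on-disk size of each user table, largest first.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM   pg_stat_user_tables
ORDER  BY pg_total_relation_size(relid) DESC;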
Chuck
2013-Jan-16 14:33 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
Here is an example of locks taking a long time to acquire in the Postgres DB:

2013-01-12 03:10:55 UTC|puppetdb|4553|DELETE waiting|IPADDRESS|puppet LOG: process 4553 acquired ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 12166509.285 ms
2013-01-12 03:10:56 UTC|puppetdb|4553|DELETE waiting|IPADDRESS|puppet LOG: process 4553 still waiting for ShareLock on transaction 32181035 after 1000.100 ms
2013-01-12 03:11:25 UTC|puppetdb|5953|UPDATE waiting|IPADDRESS|puppet LOG: process 5953 still waiting for ShareLock on transaction 32245132 after 1000.148 ms
2013-01-12 03:11:26 UTC|puppetdb|5953|UPDATE waiting|IPADDRESS|puppet LOG: process 5953 acquired ShareLock on transaction 32245132 after 1813.791 ms
2013-01-12 03:13:48 UTC|puppetdb|4553|DELETE waiting|IPADDRESS|puppet LOG: process 4553 acquired ShareLock on transaction 32181035 after 172731.472 ms
2013-01-12 03:13:48 UTC|puppetdb|3346|DELETE waiting|IPADDRESS|puppet LOG: process 3346 acquired ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 12261437.817 ms
2013-01-12 03:13:49 UTC|puppetdb|3346|DELETE waiting|IPADDRESS|puppet LOG: process 3346 still waiting for ShareLock on transaction 32196872 after 1000.516 ms
2013-01-12 03:18:08 UTC|puppetdb|3346|DELETE waiting|IPADDRESS|puppet LOG: process 3346 acquired ShareLock on transaction 32196872 after 260360.667 ms
2013-01-12 03:18:08 UTC|puppetdb|27293|DELETE waiting|IPADDRESS|puppet LOG: process 27293 acquired ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 12475027.712 ms
2013-01-12 03:18:09 UTC|puppetdb|27293|DELETE waiting|IPADDRESS|puppet LOG: process 27293 still waiting for ShareLock on transaction 32197208 after 1000.078 ms
2013-01-12 03:18:33 UTC|puppetdb|17853|DELETE waiting|IPADDRESS|puppet LOG: process 17853 still waiting for ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 1000.212 ms
2013-01-12 03:19:49 UTC|puppetdb|20980|DELETE waiting|IPADDRESS|puppet LOG: process 20980 still waiting for ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 1000.733 ms
2013-01-12 03:20:37 UTC|puppetdb|28432|DELETE waiting|IPADDRESS|puppet LOG: process 28432 still waiting for ExclusiveLock on tuple (0,102) of relation 16811 of database 16513 after 1000.283 ms
Ken Barber
2013-Jan-17 17:17 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
Hmm. I wonder whether this is an artifact of there simply being a lot of rows to delete, or of the delete query itself taking too long. It would seem odd to me if there were a lot of deletes; that implies a lot of orphans, which shouldn't be a regular thing, unless this points to an orphan problem that we need to address.

What are the results of these queries?

select count(*) from catalogs where not exists (SELECT * FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash);

select count(*) from resource_params where not exists (SELECT * FROM catalog_resources cr WHERE cr.resource=resource_params.resource);

explain analyze verbose select count(*) from catalogs where not exists (SELECT * FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash);

explain analyze verbose select count(*) from resource_params where not exists (SELECT * FROM catalog_resources cr WHERE cr.resource=resource_params.resource);

And if you run the queries without the explain, do you see any results come back at any point in time? This might help us understand how much is pending deletion.

As far as I can see, all the columns in those queries have corresponding indexes, so I would be surprised if it's an inefficient SQL query.

As for the DELETE locks, it starts to make me wonder how much of this load is a side-effect of the problem reaching a threshold where contention on the DB is high and we're now in a spiral of backlogging that slows the overall times down. With 8 nodes, and a delete taking 6 minutes or so ... that's 48 minutes of gc.

Almost makes you want a way to run the GC on one node only, to remove the contention so you can analyse the problem properly ;-). I presume this is production, and you can't just somehow make that happen, can you? ie. get only 1 node running. Maybe making the GC interval on a few nodes excessive (like 24 hours instead of 60 minutes) might emulate this effect without us having to patch. I'd be interested in seeing how this affects your SQL times in the logs when only 1 node is running a GC at a time instead of 8, just to remove contention from the equation.

ken.
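For concreteness, the workaround described above would amount to something like this in database.ini on all but one (or two) of the PuppetDB nodes. This is only a sketch, and it assumes gc-interval is still read in minutes, as in the config posted earlier in the thread:

[database]
# ... keep the existing classname, subprotocol, subname, username and
# password settings from database.ini ...

# Run GC on this node only once a day (the interval is in minutes), so it
# is effectively out of the picture; leave a single node at the default of
# 60 minutes so orphaned catalogs and resource params still get cleaned up.
gc-interval = 1440

Once a disable-on-zero option exists (the patch mentioned earlier in the thread), setting gc-interval = 0 on the secondary nodes would be the cleaner way to achieve the same thing.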
On Wed, Jan 16, 2013 at 2:33 PM, Chuck <csschwe@gmail.com> wrote:> Here is an example of locks taking a long time to acquire in the postgres DB > > 2013-01-12 03:10:55 UTC|puppetdb|4553|DELETE waiting|IPADDRESS|puppet LOG: > process 4553 acquired ExclusiveLock on tuple (0,102) of relation 16811 of > database 16513 after 12166509.285 ms > 2013-01-12 03:10:56 UTC|puppetdb|4553|DELETE waiting| IPADDRESS |puppet LOG: > process 4553 still waiting for ShareLock on transaction 32181035 after > 1000.100 ms > 2013-01-12 03:11:25 UTC|puppetdb|5953|UPDATE waiting| IPADDRESS |puppet LOG: > process 5953 still waiting for ShareLock on transaction 32245132 after > 1000.148 ms > 2013-01-12 03:11:26 UTC|puppetdb|5953|UPDATE waiting| IPADDRESS |puppet LOG: > process 5953 acquired ShareLock on transaction 32245132 after 1813.791 ms > 2013-01-12 03:13:48 UTC|puppetdb|4553|DELETE waiting| IPADDRESS |puppet LOG: > process 4553 acquired ShareLock on transaction 32181035 after 172731.472 ms > 2013-01-12 03:13:48 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet LOG: > process 3346 acquired ExclusiveLock on tuple (0,102) of relation 16811 of > database 16513 after 12261437.817 ms > 2013-01-12 03:13:49 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet LOG: > process 3346 still waiting for ShareLock on transaction 32196872 after > 1000.516 ms > 2013-01-12 03:18:08 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet LOG: > process 3346 acquired ShareLock on transaction 32196872 after 260360.667 ms > 2013-01-12 03:18:08 UTC|puppetdb|27293|DELETE waiting| IPADDRESS |puppet > LOG: process 27293 acquired ExclusiveLock on tuple (0,102) of relation > 16811 of database 16513 after 12475027.712 ms > 2013-01-12 03:18:09 UTC|puppetdb|27293|DELETE waiting| IPADDRESS |puppet > LOG: process 27293 still waiting for ShareLock on transaction 32197208 > after 1000.078 ms > 2013-01-12 03:18:33 UTC|puppetdb|17853|DELETE waiting| IPADDRESS |puppet > LOG: process 17853 still waiting for ExclusiveLock on tuple (0,102) of > relation 16811 of database 16513 after 1000.212 ms > 2013-01-12 03:19:49 UTC|puppetdb|20980|DELETE waiting| IPADDRESS |puppet > LOG: process 20980 still waiting for ExclusiveLock on tuple (0,102) of > relation 16811 of database 16513 after 1000.733 ms > 2013-01-12 03:20:37 UTC|puppetdb|28432|DELETE waiting| IPADDRESS |puppet > LOG: process 28432 still waiting for ExclusiveLock on tuple (0,102) of > relation 16811 of database 16513 after 1000.283 ms > > On Tuesday, January 15, 2013 4:07:11 PM UTC-6, Chuck wrote: >> >> >> >> On Tuesday, January 15, 2013 3:19:29 PM UTC-6, Ken Barber wrote: >>> >>> So that looks like its taking ages ... some questions for you: >>> >>> * What version of Postgresql are you running and on what distro & >>> version exactly? >> >> RHEL 6.3 >> Enterprise DB ppas-9.1.4 >> >>> * Are you using the distros packages or someone elses? >> >> Provided by Enterprise DB >> >>> * Is autovacuum switched on at all (check your postgresql.conf)? >> >> Yes. We also ran manual vacuum analyze with no noticeable changes in >> performance. >>> >>> * Do you know the last time the database was vacuum''d and/or analyzed? 
>>> Try this link for a way to check this: >>> >>> http://heatware.net/databases/how-when-postgres-tables-auto-vacuum-analyze/ >> >> puppet=# select relname,last_vacuum, last_autovacuum, last_analyze, >> last_autoanalyze from pg_stat_user_tables; >> relname | last_vacuum | >> last_autovacuum | last_analyze | >> last_autoanalyze >> >> -------------------------+-------------------------------+-------------------------------+-------------------------------+------------------------------- >> catalog_resources | | 2013-01-12 >> 15:16:10.188342+00 | 2013-01-09 23:26:34.068919+00 | 2013-01-15 >> 18:27:11.166982+00 >> certname_facts_metadata | | 2013-01-15 >> 19:22:13.932096+00 | 2012-11-06 22:02:25.469113+00 | 2013-01-15 >> 19:22:15.308925+00 >> certnames | 2013-01-03 22:22:11.416002+00 | 2013-01-15 >> 21:17:47.046022+00 | 2013-01-03 22:22:11.574618+00 | 2013-01-15 >> 21:17:48.14814+00 >> resource_params | | 2013-01-05 >> 08:04:04.856417+00 | 2012-11-06 21:25:40.15877+00 | 2013-01-15 >> 15:56:52.589316+00 >> classes | | 2013-01-12 >> 09:29:53.118022+00 | 2012-11-06 22:02:40.547686+00 | 2013-01-15 >> 16:24:04.909758+00 >> catalogs | | 2013-01-15 >> 21:18:15.342477+00 | 2012-11-06 22:02:25.182041+00 | 2013-01-15 >> 21:18:16.493626+00 >> certname_facts | | 2013-01-15 >> 21:17:39.514162+00 | 2013-01-09 18:00:16.10573+00 | 2013-01-15 >> 21:25:08.780617+00 >> tags | | 2013-01-15 >> 21:44:16.074229+00 | 2012-11-06 22:02:24.75951+00 | 2013-01-15 >> 21:48:19.751198+00 >> edges | | 2013-01-12 >> 11:54:54.182207+00 | 2012-11-06 21:33:24.162324+00 | 2013-01-15 >> 17:27:19.061491+00 >> certname_catalogs | | 2013-01-15 >> 21:18:54.227264+00 | 2013-01-09 23:26:38.684536+00 | 2013-01-15 >> 21:18:55.369032+00 >> schema_migrations | | >> | 2012-11-06 22:02:40.637773+00 | >> (11 rows) >> >>> >>> * Can you get me some box metrics, memory size, cpu size/number, disk >>> spindle speeds etc. of your postgresql box? So I can get an idea of >>> the throughput available to you. 
>> >> It is a VMware VM >> the VM has the following properties: >> 8 CPU >> 16 GB of memory >> Attached to an Enterprise class SAN >> >> The database is on dev253-12 >> >> SAR DISK output >> 09:40:01 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz >> await svctm %util >> 09:50:01 PM dev8-0 2.28 33.73 21.09 24.03 0.01 >> 2.23 1.38 0.32 >> 09:50:01 PM dev8-16 2.24 0.23 1501.34 670.68 0.03 >> 11.20 1.50 0.34 >> 09:50:01 PM dev8-32 0.57 0.01 7.28 12.68 0.00 >> 3.42 3.11 0.18 >> 09:50:01 PM dev8-48 10.07 0.53 1591.25 158.02 0.02 >> 1.74 1.64 1.66 >> 09:50:01 PM dev8-64 354.77 16164.46 2231.28 51.85 2.80 >> 7.89 1.81 64.30 >> 09:50:01 PM dev253-0 0.78 33.70 0.62 43.93 0.00 >> 3.91 2.47 0.19 >> 09:50:01 PM dev253-1 0.00 0.00 0.00 0.00 0.00 >> 0.00 0.00 0.00 >> 09:50:01 PM dev253-2 0.00 0.00 0.00 0.00 0.00 >> 0.00 0.00 0.00 >> 09:50:01 PM dev253-3 1.67 0.05 13.28 8.01 0.00 >> 1.21 0.38 0.06 >> 09:50:01 PM dev253-4 0.38 0.00 3.06 8.00 0.00 >> 1.64 0.72 0.03 >> 09:50:01 PM dev253-5 0.43 0.00 3.45 8.00 0.00 >> 1.28 0.66 0.03 >> 09:50:01 PM dev253-6 0.00 0.00 0.00 0.00 0.00 >> 0.00 0.00 0.00 >> 09:50:01 PM dev253-7 0.09 0.04 0.68 8.00 0.00 >> 0.72 0.45 0.00 >> 09:50:01 PM dev253-8 0.00 0.00 0.00 0.00 0.00 >> 0.00 0.00 0.00 >> 09:50:01 PM dev253-9 0.00 0.00 0.00 0.00 0.00 >> 0.00 0.00 0.00 >> 09:50:01 PM dev253-10 187.70 0.23 1501.34 8.00 2.62 >> 13.95 0.02 0.34 >> 09:50:01 PM dev253-11 198.97 0.53 1591.25 8.00 0.53 >> 2.67 0.08 1.66 >> 09:50:01 PM dev253-12 534.17 16164.46 2231.28 34.44 4.24 >> 7.93 1.20 64.32 >> 09:50:01 PM dev253-13 0.91 0.01 7.28 8.00 0.00 >> 5.24 1.96 0.18 >> Average: dev8-0 3.21 73.51 24.19 30.43 0.02 >> 7.48 4.18 1.34 >> Average: dev8-16 3.36 1.14 2568.18 763.92 0.08 >> 22.78 1.09 0.37 >> Average: dev8-32 0.66 0.97 7.42 12.74 0.00 >> 1.81 1.55 0.10 >> Average: dev8-48 9.95 1.60 1373.40 138.23 0.01 >> 1.51 1.42 1.41 >> Average: dev8-64 140.65 6834.41 2298.08 64.93 1.19 >> 8.49 1.82 25.66 >> Average: dev253-0 1.34 45.41 1.08 34.72 0.02 >> 15.92 5.52 0.74 >> Average: dev253-1 0.03 0.20 0.07 8.00 0.00 >> 2.18 0.67 0.00 >> Average: dev253-2 0.04 0.23 0.08 8.00 0.00 >> 10.40 2.11 0.01 >> Average: dev253-3 1.89 9.17 13.84 12.20 0.01 >> 3.85 1.67 0.31 >> Average: dev253-4 0.46 0.21 3.43 8.00 0.00 >> 2.26 0.91 0.04 >> Average: dev253-5 0.81 16.84 4.94 26.88 0.01 >> 7.93 2.83 0.23 >> Average: dev253-6 0.02 0.19 0.00 8.00 0.00 >> 0.34 0.32 0.00 >> Average: dev253-7 0.12 0.23 0.74 8.10 0.00 >> 1.41 0.99 0.01 >> Average: dev253-8 0.02 0.19 0.00 8.00 0.00 >> 0.33 0.32 0.00 >> Average: dev253-9 0.04 0.36 0.00 8.00 0.00 >> 0.52 0.52 0.00 >> Average: dev253-10 321.11 0.74 2568.18 8.00 9.20 >> 28.65 0.01 0.36 >> Average: dev253-11 171.74 1.20 1373.40 8.00 0.43 >> 2.53 0.08 1.42 >> Average: dev253-12 339.64 6834.01 2298.08 26.89 3.01 >> 8.84 0.76 25.66 >> Average: dev253-13 0.98 0.57 7.42 8.19 0.00 >> 2.26 1.01 0.10 >> >>> * Maybe any information about the postgresql tuning items - >>> shared_buffers is probably a good start, or have you left the tuning >>> alone? >> >> shared_buffers = 4096MB >> work_mem = 8MB >> maintenance_work_mem = 32MB >>> >>> >>> You probably don''t need debug with log-slow-statements, I wasn''t aware >>> that was an option but I think that gives a good amount of data for >>> this problem. >>> >>> On Tue, Jan 15, 2013 at 8:24 PM, Chuck <css...@gmail.com> wrote: >>> > I will work on setting up a puppetdb server I can put into debug mode. >>> > For >>> > now here is a "normal" GC run log output and my configuration files. 
>>> > >>> > Thanks, >>> > >>> > >>> > 2013-01-15 19:20:57,767 INFO [clojure-agent-send-off-pool-12] >>> > [cli.services] Starting database garbage collection >>> > 2013-01-15 19:26:40,383 WARN [clojure-agent-send-off-pool-12] >>> > [jdbc.internal] Query slower than 10s threshold: actual execution >>> > time: >>> > 342.6148 seconds; Query: DELETE FROM catalogs WHERE NOT EXISTS (SELECT >>> > * >>> > FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); Query >>> > Params: >>> > 2013-01-15 19:33:29,304 WARN [clojure-agent-send-off-pool-12] >>> > [jdbc.internal] Query slower than 10s threshold: actual execution >>> > time: >>> > 408.9210 seconds; Query: DELETE FROM resource_params WHERE NOT EXISTS >>> > (SELECT * FROM catalog_resources cr WHERE >>> > cr.resource=resource_params.resource); Query Params: >>> > 2013-01-15 19:33:29,308 INFO [clojure-agent-send-off-pool-12] >>> > [cli.services] Finished database garbage collection >>> > 2013-01-15 19:33:29,308 INFO [clojure-agent-send-off-pool-12] >>> > [cli.services] Starting sweep of stale nodes (1 day threshold) >>> > 2013-01-15 19:33:29,390 INFO [clojure-agent-send-off-pool-12] >>> > [cli.services] Finished sweep of stale nodes (1 day threshold) >>> > >>> > ########################################################## >>> > # >>> > # config.ini >>> > # >>> > >>> > # See README.md for more thorough explanations of each section and >>> > # option. >>> > >>> > [global] >>> > # Store mq/db data in a custom directory >>> > vardir = /var/lib/puppetdb >>> > # Use an external log4j config file >>> > logging-config = /etc/puppetdb/conf.d/../log4j.properties >>> > >>> > # Maximum number of results that a resource query may return >>> > resource-query-limit = 20000 >>> > >>> > [command-processing] >>> > # How many command-processing threads to use, defaults to (CPUs / 2) >>> > threads = 10 >>> > >>> > ########################################################## >>> > # >>> > # database.ini >>> > # >>> > >>> > [database] >>> > # For the embedded DB: org.hsqldb.jdbcDriver >>> > # For PostgreSQL: org.postgresql.Driver >>> > # Defaults to embedded DB >>> > classname = org.postgresql.Driver >>> > >>> > # For the embedded DB: hsqldb >>> > # For PostgreSQL: postgresql >>> > # Defaults to embedded DB >>> > subprotocol = postgresql >>> > >>> > # For the embedded DB: >>> > file:/path/to/database;hsqldb.tx=mvcc;sql.syntax_pgs=true >>> > # For PostgreSQL: //host:port/databaseName >>> > # Defaults to embedded DB located in <vardir>/db >>> > subname = //PUPPETDB:5432/puppet >>> > >>> > # Connect as a specific user >>> > username = PUPPETDBUSER >>> > >>> > # Use a specific password >>> > password = PASSWORDPUPPETDB >>> > >>> > # How often (in minutes) to compact the database >>> > # gc-interval = 60 >>> > >>> > # Number of seconds before any SQL query is considered ''slow''; >>> > offending >>> > # queries will not be interrupted, but will be logged at the WARN log >>> > level. >>> > log-slow-statements = 10 >>> > >>> > # How old (in days) to deactivate nodes that have not reported >>> > node-ttl-days = 1 >>> > >>> > # How often to send a keepalive >>> > conn-keep-alive = 4 >>> > >>> > # How long to keep connections open >>> > conn-max-age = 30 >>> > >>> > >>> > >>> > On Tuesday, January 15, 2013 1:27:58 PM UTC-6, Ken Barber wrote: >>> >> >>> >> Hey Chuck, >>> >> >>> >> I''ve had a chat with my colleagues and they raised some concerns about >>> >> your gc performance. I wouldn''t mind drilling into that problem if >>> >> thats okay with you? 
>>> >> >>> >> For starters - what are your settings looking like? In particular I''m >>> >> interested in node-ttl-seconds and report-ttl-seconds. >>> >> >>> >> FYI there are a few actions that occur when gc kicks in: >>> >> >>> >> * sweeping stale nodes >>> >> * sweeping stale reports >>> >> * removing any unassociated data (catalogs or params) >>> >> >>> >> It would be useful to find out which of these are taking up the most >>> >> amount of your time. >>> >> >>> >> I wouldn''t mind a dump of your logs when in a specific debug mode if >>> >> thats at all possible, this should give me a rough idea of the time >>> >> the sweeper spends doing its various deletes so I can at least get a >>> >> focus on your issue. In particular the bits between ''Starting database >>> >> garbage collection'' and ''Finished database garbage collection''. >>> >> >>> >> The easiest way to get some debugging is to modify your >>> >> log4j.properties (usually in /etc/puppetdb) and add the line: >>> >> >>> >> log4j.logger.com.jolbox.bonecp=DEBUG >>> >> >>> >> This will log *ALL* SQL statements, so it might be heavy - I''m not >>> >> sure if you can isolate one of your nodes for this purpose or somehow >>> >> mitigate this risk ... we only need a heavy GC run, it probably >>> >> doesn''t need to be actively receiving requests and such ... >>> >> >>> >> ken. >>> >> >>> >> On Tue, Jan 15, 2013 at 4:12 PM, Chuck <css...@gmail.com> wrote: >>> >> > >>> >> > >>> >> > On Tuesday, January 15, 2013 9:55:48 AM UTC-6, Ken Barber wrote: >>> >> >> >>> >> >> > Well I currently have 7,000 nodes in my PuppetDB instance and 8 >>> >> >> > puppetdb >>> >> >> > servers in various geographic locations. The garbage collect >>> >> >> > seems >>> >> >> > to >>> >> >> > be a >>> >> >> > pretty intensive operation on the Postgres DB server in the >>> >> >> > current >>> >> >> > design. >>> >> >> >>> >> >> Thats a shame to hear. What''s the impact for you? I see from your >>> >> >> screenshot its averaging 16 seconds, is that what you are seeing? >>> >> > >>> >> > >>> >> > This is normally what we see 8 times of course. >>> >> > >>> >> >> >>> >> >> > We have also been having an issue that started at the beginning >>> >> >> > of >>> >> >> > 2013 >>> >> >> > that >>> >> >> > we have not been able to narrow down at this time (we were on >>> >> >> > puppetdb >>> >> >> > 1.0.1 >>> >> >> > for a long time with no issues. Updated to 1.0.5 last week with >>> >> >> > no >>> >> >> > improvement) >>> >> >> >>> >> >> This issue, is it related to the gc problem or something else? Can >>> >> >> you >>> >> >> elaborate? >>> >> >> >>> >> > We are still trying to determine what is happening. We just see a >>> >> > huge >>> >> > spike in the number of rows being deleted. >>> >> > >>> >> >> >>> >> >> > I was planning on having my puppet manifest, for the puppetdb >>> >> >> > servers, >>> >> >> > elect >>> >> >> > one or two master puppetdb servers by querying for active >>> >> >> > puppetdb >>> >> >> > servers. >>> >> >> > So I would be dynamically electing the master(s) based on an >>> >> >> > algorithm. >>> >> >> >>> >> >> Sounds good to me. Do you want to raise a feature request for this? >>> >> > >>> >> > >>> >> > Yeah I can do that. >>> >> > >>> >> >> >>> >> >> http://projects.puppetlabs.com/projects/puppetdb >>> >> >> >>> >> >> Just more or less cover what you''ve said about wanting to disable >>> >> >> by >>> >> >> setting to zero. 
I''ve already more or less got a patch that can >>> >> >> disable it when zero (its really just a conditional clause in the >>> >> >> right place), but I haven''t done any thorough testing or gotten >>> >> >> review >>> >> >> on it yet. >>> >> >> >>> >> >> ken. >>> >> > >>> >> > -- >>> >> > You received this message because you are subscribed to the Google >>> >> > Groups >>> >> > "Puppet Users" group. >>> >> > To view this discussion on the web visit >>> >> > https://groups.google.com/d/msg/puppet-users/-/OkVPHnluxpcJ. >>> >> > >>> >> > To post to this group, send email to puppet...@googlegroups.com. >>> >> > To unsubscribe from this group, send email to >>> >> > puppet-users...@googlegroups.com. >>> >> > For more options, visit this group at >>> >> > http://groups.google.com/group/puppet-users?hl=en. >>> > >>> > -- >>> > You received this message because you are subscribed to the Google >>> > Groups >>> > "Puppet Users" group. >>> > To view this discussion on the web visit >>> > https://groups.google.com/d/msg/puppet-users/-/vuZyS8chcXYJ. >>> > >>> > To post to this group, send email to puppet...@googlegroups.com. >>> > To unsubscribe from this group, send email to >>> > puppet-users...@googlegroups.com. >>> > For more options, visit this group at >>> > http://groups.google.com/group/puppet-users?hl=en. > > -- > You received this message because you are subscribed to the Google Groups > "Puppet Users" group. > To view this discussion on the web visit > https://groups.google.com/d/msg/puppet-users/-/jnWvUJepjzYJ. > > To post to this group, send email to puppet-users@googlegroups.com. > To unsubscribe from this group, send email to > puppet-users+unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/puppet-users?hl=en.-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Chuck
2013-Jan-17 21:04 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
On Thursday, January 17, 2013 11:17:27 AM UTC-6, Ken Barber wrote:> > Hmm. I wonder if this is an artifact of the query for the delete > taking too long, or the fact that there the query for the delete is > taking too long. It would seem odd to me if there were a lot of > deletes, this implies a lot of orphans which shouldn''t be a regular > thing, unless this points to an orphan problem that we need to > address. > > What are the results of these queries? > > select count(*) from catalogs where not exists (SELECT * FROM > certname_catalogs cc WHERE cc.catalog=catalogs.hash); > > puppet=# select count(*) from puppet.catalogs where not exists (SELECT *FROM puppet.certname_catalogs cc WHERE cc.catalog=catalogs.hash); count ------- 18 (1 row)> select count(*) from resource_params where not exists(SELECT * FROM > catalog_resources cr WHERE cr.resource=resource_params.resource); >puppet=# select count(*) from puppet.resource_params where not exists(SELECT * FROM puppet.catalog_resources cr WHERE cr.resource=resource_params.resource); count ------- 0 (1 row)> explain analyze verbose select count(*) from catalogs where not exists > (SELECT * FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); > > puppet=# explain analyze verbose select count(*) from puppet.catalogswhere not exists puppet-# (SELECT * FROM puppet.certname_catalogs cc WHERE cc.catalog=catalogs.hash); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=2920.43..2920.44 rows=1 width=0) (actual time=63.301..63.302 rows=1 loops=1) Output: count(*) -> Hash Anti Join (cost=1381.86..2899.52 rows=8365 width=0) (actual time=29.369..63.285 rows=18 loops=1) Hash Cond: ((catalogs.hash)::text = (cc.catalog)::text) -> Seq Scan on puppet.catalogs (cost=0.00..1219.70 rows=29870 width=41) (actual time=0.022..10.124 rows=21523 loops=1) Output: catalogs.hash, catalogs.api_version, catalogs.catalog_version, catalogs.mmr_resend_indicator -> Hash (cost=1113.05..1113.05 rows=21505 width=41) (actual time=26.394..26.394 rows=21505 loops=1) Output: cc.catalog Buckets: 4096 Batches: 1 Memory Usage: 1534kB -> Seq Scan on puppet.certname_catalogs cc (cost=0.00..1113.05 rows=21505 width=41) (actual time=0.016..12.341 rows=21505 loops=1) Output: cc.catalog Total runtime: 63.380 ms (12 rows)> explain analyze verbose select count(*) from resource_params where not > exists(SELECT * FROM catalog_resources cr WHERE > cr.resource=resource_params.resource); > > puppet=# explain analyze verbose select count(*) frompuppet.resource_params where not puppet-# exists(SELECT * FROM puppet.catalog_resources cr WHERE puppet(# cr.resource=puppet.resource_params.resource); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ------- Aggregate (cost=1257861.13..1257861.14 rows=1 width=0) (actual time=18486.103..18486.103 rows=1 loops=1) Output: count(*) -> Nested Loop Anti Join (cost=0.00..1254977.05 rows=1153633 width=0) (actual time=18486.098..18486.098 rows=0 loops=1) -> Seq Scan on puppet.resource_params (cost=0.00..32052.34 rows=1171134 width=41) (actual time=0.005..264.081 rows=1166567 loops=1) Output: resource_params.resource, resource_params.name, resource_params.value, resource_params.mmr_resend_indicator -> Index Scan using idx_catalog_resources_resource on puppet.catalog_resources cr (cost=0.00..3196.19 
rows=3116 width=41) (actual time=0.015..0.015 rows=1 loops=11 66567) Output: cr.catalog, cr.resource, cr.type, cr.title, cr.tags, cr.exported, cr.sourcefile, cr.sourceline, cr.mmr_resend_indicator Index Cond: ((cr.resource)::text = (resource_params.resource)::text) Total runtime: 18486.163 ms (9 rows)> And if you run the queries without the explain, do you see any results > come back at any point in time? This might help us understand how much > is pending deletion. > > puppet=# select count(*) from puppet.catalogs where not existspuppet-# (SELECT * FROM puppet.certname_catalogs cc WHERE cc.catalog=catalogs.hash); count ------- 20 (1 row) puppet=# select count(*) from puppet.resource_params where not puppet-# exists(SELECT * FROM puppet.catalog_resources cr WHERE puppet(# cr.resource=puppet.resource_params.resource); count ------- 0 (1 row)> As far as I can see all the columns in those queries have > corresponding indexes. So I would be surprised if its an inefficient > SQL query. > > As far as the DELETE locks, I starts to make me wonder how much of > this load is a side-effect of this problem reaching a threshold > whereby contention on the DB is high and now we''re getting a spiral of > backlogging causing the overall times to slow down. With 8 nodes, and > a delete taking 6 minutes or so ... thats 48 minutes of gc. > > Almost makes you want a way to run the GC on one node only to remove > the contention so you can analyse the problem properly ;-). I presume > this is production, and you can''t just somehow make that happen can > you? ie. get only 1 node running. Maybe making the GC on a few nodes > excessive (like 24 hours instead of 60 minutes) might emulate this > effect without us having to patch. > > Correct. I don''t have a way to reduce this down to one node due to itbeing production.> I''d be interested in seeing how this affects your SQL times in the > logs when only 1 node is running a GC at a time instead of 8, just to > remove contention from the equation. > > ken. 
> > On Wed, Jan 16, 2013 at 2:33 PM, Chuck <css...@gmail.com <javascript:>> > wrote: > > Here is an example of locks taking a long time to acquire in the > postgres DB > > > > 2013-01-12 03:10:55 UTC|puppetdb|4553|DELETE waiting|IPADDRESS|puppet > LOG: > > process 4553 acquired ExclusiveLock on tuple (0,102) of relation 16811 > of > > database 16513 after 12166509.285 ms > > 2013-01-12 03:10:56 UTC|puppetdb|4553|DELETE waiting| IPADDRESS |puppet > LOG: > > process 4553 still waiting for ShareLock on transaction 32181035 after > > 1000.100 ms > > 2013-01-12 03:11:25 UTC|puppetdb|5953|UPDATE waiting| IPADDRESS |puppet > LOG: > > process 5953 still waiting for ShareLock on transaction 32245132 after > > 1000.148 ms > > 2013-01-12 03:11:26 UTC|puppetdb|5953|UPDATE waiting| IPADDRESS |puppet > LOG: > > process 5953 acquired ShareLock on transaction 32245132 after 1813.791 > ms > > 2013-01-12 03:13:48 UTC|puppetdb|4553|DELETE waiting| IPADDRESS |puppet > LOG: > > process 4553 acquired ShareLock on transaction 32181035 after 172731.472 > ms > > 2013-01-12 03:13:48 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet > LOG: > > process 3346 acquired ExclusiveLock on tuple (0,102) of relation 16811 > of > > database 16513 after 12261437.817 ms > > 2013-01-12 03:13:49 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet > LOG: > > process 3346 still waiting for ShareLock on transaction 32196872 after > > 1000.516 ms > > 2013-01-12 03:18:08 UTC|puppetdb|3346|DELETE waiting| IPADDRESS |puppet > LOG: > > process 3346 acquired ShareLock on transaction 32196872 after 260360.667 > ms > > 2013-01-12 03:18:08 UTC|puppetdb|27293|DELETE waiting| IPADDRESS |puppet > > LOG: process 27293 acquired ExclusiveLock on tuple (0,102) of relation > > 16811 of database 16513 after 12475027.712 ms > > 2013-01-12 03:18:09 UTC|puppetdb|27293|DELETE waiting| IPADDRESS |puppet > > LOG: process 27293 still waiting for ShareLock on transaction 32197208 > > after 1000.078 ms > > 2013-01-12 03:18:33 UTC|puppetdb|17853|DELETE waiting| IPADDRESS |puppet > > LOG: process 17853 still waiting for ExclusiveLock on tuple (0,102) of > > relation 16811 of database 16513 after 1000.212 ms > > 2013-01-12 03:19:49 UTC|puppetdb|20980|DELETE waiting| IPADDRESS |puppet > > LOG: process 20980 still waiting for ExclusiveLock on tuple (0,102) of > > relation 16811 of database 16513 after 1000.733 ms > > 2013-01-12 03:20:37 UTC|puppetdb|28432|DELETE waiting| IPADDRESS |puppet > > LOG: process 28432 still waiting for ExclusiveLock on tuple (0,102) of > > relation 16811 of database 16513 after 1000.283 ms > > > > On Tuesday, January 15, 2013 4:07:11 PM UTC-6, Chuck wrote: > >> > >> > >> > >> On Tuesday, January 15, 2013 3:19:29 PM UTC-6, Ken Barber wrote: > >>> > >>> So that looks like its taking ages ... some questions for you: > >>> > >>> * What version of Postgresql are you running and on what distro & > >>> version exactly? > >> > >> RHEL 6.3 > >> Enterprise DB ppas-9.1.4 > >> > >>> * Are you using the distros packages or someone elses? > >> > >> Provided by Enterprise DB > >> > >>> * Is autovacuum switched on at all (check your postgresql.conf)? > >> > >> Yes. We also ran manual vacuum analyze with no noticeable changes in > >> performance. > >>> > >>> * Do you know the last time the database was vacuum''d and/or analyzed? 
> >>> Try this link for a way to check this: > >>> > >>> > http://heatware.net/databases/how-when-postgres-tables-auto-vacuum-analyze/ > >> > >> puppet=# select relname,last_vacuum, last_autovacuum, last_analyze, > >> last_autoanalyze from pg_stat_user_tables; > >> relname | last_vacuum | > >> last_autovacuum | last_analyze | > >> last_autoanalyze > >> > >> > -------------------------+-------------------------------+-------------------------------+-------------------------------+------------------------------- > > >> catalog_resources | | 2013-01-12 > >> 15:16:10.188342+00 | 2013-01-09 23:26:34.068919+00 | 2013-01-15 > >> 18:27:11.166982+00 > >> certname_facts_metadata | | 2013-01-15 > >> 19:22:13.932096+00 | 2012-11-06 22:02:25.469113+00 | 2013-01-15 > >> 19:22:15.308925+00 > >> certnames | 2013-01-03 22:22:11.416002+00 | 2013-01-15 > >> 21:17:47.046022+00 | 2013-01-03 22:22:11.574618+00 | 2013-01-15 > >> 21:17:48.14814+00 > >> resource_params | | 2013-01-05 > >> 08:04:04.856417+00 | 2012-11-06 21:25:40.15877+00 | 2013-01-15 > >> 15:56:52.589316+00 > >> classes | | 2013-01-12 > >> 09:29:53.118022+00 | 2012-11-06 22:02:40.547686+00 | 2013-01-15 > >> 16:24:04.909758+00 > >> catalogs | | 2013-01-15 > >> 21:18:15.342477+00 | 2012-11-06 22:02:25.182041+00 | 2013-01-15 > >> 21:18:16.493626+00 > >> certname_facts | | 2013-01-15 > >> 21:17:39.514162+00 | 2013-01-09 18:00:16.10573+00 | 2013-01-15 > >> 21:25:08.780617+00 > >> tags | | 2013-01-15 > >> 21:44:16.074229+00 | 2012-11-06 22:02:24.75951+00 | 2013-01-15 > >> 21:48:19.751198+00 > >> edges | | 2013-01-12 > >> 11:54:54.182207+00 | 2012-11-06 21:33:24.162324+00 | 2013-01-15 > >> 17:27:19.061491+00 > >> certname_catalogs | | 2013-01-15 > >> 21:18:54.227264+00 | 2013-01-09 23:26:38.684536+00 | 2013-01-15 > >> 21:18:55.369032+00 > >> schema_migrations | | > >> | 2012-11-06 22:02:40.637773+00 | > >> (11 rows) > >> > >>> > >>> * Can you get me some box metrics, memory size, cpu size/number, disk > >>> spindle speeds etc. of your postgresql box? So I can get an idea of > >>> the throughput available to you. 
> >> > >> It is a VMware VM > >> the VM has the following properties: > >> 8 CPU > >> 16 GB of memory > >> Attached to an Enterprise class SAN > >> > >> The database is on dev253-12 > >> > >> SAR DISK output > >> 09:40:01 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz > >> await svctm %util > >> 09:50:01 PM dev8-0 2.28 33.73 21.09 24.03 0.01 > >> 2.23 1.38 0.32 > >> 09:50:01 PM dev8-16 2.24 0.23 1501.34 670.68 0.03 > >> 11.20 1.50 0.34 > >> 09:50:01 PM dev8-32 0.57 0.01 7.28 12.68 0.00 > >> 3.42 3.11 0.18 > >> 09:50:01 PM dev8-48 10.07 0.53 1591.25 158.02 0.02 > >> 1.74 1.64 1.66 > >> 09:50:01 PM dev8-64 354.77 16164.46 2231.28 51.85 2.80 > >> 7.89 1.81 64.30 > >> 09:50:01 PM dev253-0 0.78 33.70 0.62 43.93 0.00 > >> 3.91 2.47 0.19 > >> 09:50:01 PM dev253-1 0.00 0.00 0.00 0.00 0.00 > >> 0.00 0.00 0.00 > >> 09:50:01 PM dev253-2 0.00 0.00 0.00 0.00 0.00 > >> 0.00 0.00 0.00 > >> 09:50:01 PM dev253-3 1.67 0.05 13.28 8.01 0.00 > >> 1.21 0.38 0.06 > >> 09:50:01 PM dev253-4 0.38 0.00 3.06 8.00 0.00 > >> 1.64 0.72 0.03 > >> 09:50:01 PM dev253-5 0.43 0.00 3.45 8.00 0.00 > >> 1.28 0.66 0.03 > >> 09:50:01 PM dev253-6 0.00 0.00 0.00 0.00 0.00 > >> 0.00 0.00 0.00 > >> 09:50:01 PM dev253-7 0.09 0.04 0.68 8.00 0.00 > >> 0.72 0.45 0.00 > >> 09:50:01 PM dev253-8 0.00 0.00 0.00 0.00 0.00 > >> 0.00 0.00 0.00 > >> 09:50:01 PM dev253-9 0.00 0.00 0.00 0.00 0.00 > >> 0.00 0.00 0.00 > >> 09:50:01 PM dev253-10 187.70 0.23 1501.34 8.00 2.62 > >> 13.95 0.02 0.34 > >> 09:50:01 PM dev253-11 198.97 0.53 1591.25 8.00 0.53 > >> 2.67 0.08 1.66 > >> 09:50:01 PM dev253-12 534.17 16164.46 2231.28 34.44 4.24 > >> 7.93 1.20 64.32 > >> 09:50:01 PM dev253-13 0.91 0.01 7.28 8.00 0.00 > >> 5.24 1.96 0.18 > >> Average: dev8-0 3.21 73.51 24.19 30.43 0.02 > >> 7.48 4.18 1.34 > >> Average: dev8-16 3.36 1.14 2568.18 763.92 0.08 > >> 22.78 1.09 0.37 > >> Average: dev8-32 0.66 0.97 7.42 12.74 0.00 > >> 1.81 1.55 0.10 > >> Average: dev8-48 9.95 1.60 1373.40 138.23 0.01 > >> 1.51 1.42 1.41 > >> Average: dev8-64 140.65 6834.41 2298.08 64.93 1.19 > >> 8.49 1.82 25.66 > >> Average: dev253-0 1.34 45.41 1.08 34.72 0.02 > >> 15.92 5.52 0.74 > >> Average: dev253-1 0.03 0.20 0.07 8.00 0.00 > >> 2.18 0.67 0.00 > >> Average: dev253-2 0.04 0.23 0.08 8.00 0.00 > >> 10.40 2.11 0.01 > >> Average: dev253-3 1.89 9.17 13.84 12.20 0.01 > >> 3.85 1.67 0.31 > >> Average: dev253-4 0.46 0.21 3.43 8.00 0.00 > >> 2.26 0.91 0.04 > >> Average: dev253-5 0.81 16.84 4.94 26.88 0.01 > >> 7.93 2.83 0.23 > >> Average: dev253-6 0.02 0.19 0.00 8.00 0.00 > >> 0.34 0.32 0.00 > >> Average: dev253-7 0.12 0.23 0.74 8.10 0.00 > >> 1.41 0.99 0.01 > >> Average: dev253-8 0.02 0.19 0.00 8.00 0.00 > >> 0.33 0.32 0.00 > >> Average: dev253-9 0.04 0.36 0.00 8.00 0.00 > >> 0.52 0.52 0.00 > >> Average: dev253-10 321.11 0.74 2568.18 8.00 9.20 > >> 28.65 0.01 0.36 > >> Average: dev253-11 171.74 1.20 1373.40 8.00 0.43 > >> 2.53 0.08 1.42 > >> Average: dev253-12 339.64 6834.01 2298.08 26.89 3.01 > >> 8.84 0.76 25.66 > >> Average: dev253-13 0.98 0.57 7.42 8.19 0.00 > >> 2.26 1.01 0.10 > >> > >>> * Maybe any information about the postgresql tuning items - > >>> shared_buffers is probably a good start, or have you left the tuning > >>> alone? > >> > >> shared_buffers = 4096MB > >> work_mem = 8MB > >> maintenance_work_mem = 32MB > >>> > >>> > >>> You probably don''t need debug with log-slow-statements, I wasn''t aware > >>> that was an option but I think that gives a good amount of data for > >>> this problem. 
> >>> > >>> On Tue, Jan 15, 2013 at 8:24 PM, Chuck <css...@gmail.com> wrote: > >>> > I will work on setting up a puppetdb server I can put into debug > mode. > >>> > For > >>> > now here is a "normal" GC run log output and my configuration files. > >>> > > >>> > Thanks, > >>> > > >>> > > >>> > 2013-01-15 19:20:57,767 INFO [clojure-agent-send-off-pool-12] > >>> > [cli.services] Starting database garbage collection > >>> > 2013-01-15 19:26:40,383 WARN [clojure-agent-send-off-pool-12] > >>> > [jdbc.internal] Query slower than 10s threshold: actual execution > >>> > time: > >>> > 342.6148 seconds; Query: DELETE FROM catalogs WHERE NOT EXISTS > (SELECT > >>> > * > >>> > FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); Query > >>> > Params: > >>> > 2013-01-15 19:33:29,304 WARN [clojure-agent-send-off-pool-12] > >>> > [jdbc.internal] Query slower than 10s threshold: actual execution > >>> > time: > >>> > 408.9210 seconds; Query: DELETE FROM resource_params WHERE NOT > EXISTS > >>> > (SELECT * FROM catalog_resources cr WHERE > >>> > cr.resource=resource_params.resource); Query Params: > >>> > 2013-01-15 19:33:29,308 INFO [clojure-agent-send-off-pool-12] > >>> > [cli.services] Finished database garbage collection > >>> > 2013-01-15 19:33:29,308 INFO [clojure-agent-send-off-pool-12] > >>> > [cli.services] Starting sweep of stale nodes (1 day threshold) > >>> > 2013-01-15 19:33:29,390 INFO [clojure-agent-send-off-pool-12] > >>> > [cli.services] Finished sweep of stale nodes (1 day threshold) > >>> > > >>> > ########################################################## > >>> > # > >>> > # config.ini > >>> > # > >>> > > >>> > # See README.md for more thorough explanations of each section and > >>> > # option. > >>> > > >>> > [global] > >>> > # Store mq/db data in a custom directory > >>> > vardir = /var/lib/puppetdb > >>> > # Use an external log4j config file > >>> > logging-config = /etc/puppetdb/conf.d/../log4j.properties > >>> > > >>> > # Maximum number of results that a resource query may return > >>> > resource-query-limit = 20000 > >>> > > >>> > [command-processing] > >>> > # How many command-processing threads to use, defaults to (CPUs / 2) > >>> > threads = 10 > >>> > > >>> > ########################################################## > >>> > # > >>> > # database.ini > >>> > # > >>> > > >>> > [database] > >>> > # For the embedded DB: org.hsqldb.jdbcDriver > >>> > # For PostgreSQL: org.postgresql.Driver > >>> > # Defaults to embedded DB > >>> > classname = org.postgresql.Driver > >>> > > >>> > # For the embedded DB: hsqldb > >>> > # For PostgreSQL: postgresql > >>> > # Defaults to embedded DB > >>> > subprotocol = postgresql > >>> > > >>> > # For the embedded DB: > >>> > file:/path/to/database;hsqldb.tx=mvcc;sql.syntax_pgs=true > >>> > # For PostgreSQL: //host:port/databaseName > >>> > # Defaults to embedded DB located in <vardir>/db > >>> > subname = //PUPPETDB:5432/puppet > >>> > > >>> > # Connect as a specific user > >>> > username = PUPPETDBUSER > >>> > > >>> > # Use a specific password > >>> > password = PASSWORDPUPPETDB > >>> > > >>> > # How often (in minutes) to compact the database > >>> > # gc-interval = 60 > >>> > > >>> > # Number of seconds before any SQL query is considered ''slow''; > >>> > offending > >>> > # queries will not be interrupted, but will be logged at the WARN > log > >>> > level. 
> >>> > log-slow-statements = 10 > >>> > > >>> > # How old (in days) to deactivate nodes that have not reported > >>> > node-ttl-days = 1 > >>> > > >>> > # How often to send a keepalive > >>> > conn-keep-alive = 4 > >>> > > >>> > # How long to keep connections open > >>> > conn-max-age = 30 > >>> > > >>> > > >>> > > >>> > On Tuesday, January 15, 2013 1:27:58 PM UTC-6, Ken Barber wrote: > >>> >> > >>> >> Hey Chuck, > >>> >> > >>> >> I''ve had a chat with my colleagues and they raised some concerns > about > >>> >> your gc performance. I wouldn''t mind drilling into that problem if > >>> >> thats okay with you? > >>> >> > >>> >> For starters - what are your settings looking like? In particular > I''m > >>> >> interested in node-ttl-seconds and report-ttl-seconds. > >>> >> > >>> >> FYI there are a few actions that occur when gc kicks in: > >>> >> > >>> >> * sweeping stale nodes > >>> >> * sweeping stale reports > >>> >> * removing any unassociated data (catalogs or params) > >>> >> > >>> >> It would be useful to find out which of these are taking up the > most > >>> >> amount of your time. > >>> >> > >>> >> I wouldn''t mind a dump of your logs when in a specific debug mode > if > >>> >> thats at all possible, this should give me a rough idea of the time > >>> >> the sweeper spends doing its various deletes so I can at least get > a > >>> >> focus on your issue. In particular the bits between ''Starting > database > >>> >> garbage collection'' and ''Finished database garbage collection''. > >>> >> > >>> >> The easiest way to get some debugging is to modify your > >>> >> log4j.properties (usually in /etc/puppetdb) and add the line: > >>> >> > >>> >> log4j.logger.com.jolbox.bonecp=DEBUG > >>> >> > >>> >> This will log *ALL* SQL statements, so it might be heavy - I''m not > >>> >> sure if you can isolate one of your nodes for this purpose or > somehow > >>> >> mitigate this risk ... we only need a heavy GC run, it probably > >>> >> doesn''t need to be actively receiving requests and such ... > >>> >> > >>> >> ken. > >>> >> > >>> >> On Tue, Jan 15, 2013 at 4:12 PM, Chuck <css...@gmail.com> wrote: > >>> >> > > >>> >> > > >>> >> > On Tuesday, January 15, 2013 9:55:48 AM UTC-6, Ken Barber wrote: > >>> >> >> > >>> >> >> > Well I currently have 7,000 nodes in my PuppetDB instance and > 8 > >>> >> >> > puppetdb > >>> >> >> > servers in various geographic locations. The garbage collect > >>> >> >> > seems > >>> >> >> > to > >>> >> >> > be a > >>> >> >> > pretty intensive operation on the Postgres DB server in the > >>> >> >> > current > >>> >> >> > design. > >>> >> >> > >>> >> >> Thats a shame to hear. What''s the impact for you? I see from > your > >>> >> >> screenshot its averaging 16 seconds, is that what you are > seeing? > >>> >> > > >>> >> > > >>> >> > This is normally what we see 8 times of course. > >>> >> > > >>> >> >> > >>> >> >> > We have also been having an issue that started at the > beginning > >>> >> >> > of > >>> >> >> > 2013 > >>> >> >> > that > >>> >> >> > we have not been able to narrow down at this time (we were on > >>> >> >> > puppetdb > >>> >> >> > 1.0.1 > >>> >> >> > for a long time with no issues. Updated to 1.0.5 last week > with > >>> >> >> > no > >>> >> >> > improvement) > >>> >> >> > >>> >> >> This issue, is it related to the gc problem or something else? > Can > >>> >> >> you > >>> >> >> elaborate? > >>> >> >> > >>> >> > We are still trying to determine what is happening. We just see > a > >>> >> > huge > >>> >> > spike in the number of rows being deleted. 
> >>> >> > > >>> >> >> > >>> >> >> > I was planning on having my puppet manifest, for the puppetdb > >>> >> >> > servers, > >>> >> >> > elect > >>> >> >> > one or two master puppetdb servers by querying for active > >>> >> >> > puppetdb > >>> >> >> > servers. > >>> >> >> > So I would be dynamically electing the master(s) based on an > >>> >> >> > algorithm. > >>> >> >> > >>> >> >> Sounds good to me. Do you want to raise a feature request for > this? > >>> >> > > >>> >> > > >>> >> > Yeah I can do that. > >>> >> > > >>> >> >> > >>> >> >> http://projects.puppetlabs.com/projects/puppetdb > >>> >> >> > >>> >> >> Just more or less cover what you''ve said about wanting to > disable > >>> >> >> by > >>> >> >> setting to zero. I''ve already more or less got a patch that can > >>> >> >> disable it when zero (its really just a conditional clause in > the > >>> >> >> right place), but I haven''t done any thorough testing or gotten > >>> >> >> review > >>> >> >> on it yet. > >>> >> >> > >>> >> >> ken. > >>> >> > > >>> >> > -- > >>> >> > You received this message because you are subscribed to the > Google > >>> >> > Groups > >>> >> > "Puppet Users" group. > >>> >> > To view this discussion on the web visit > >>> >> > https://groups.google.com/d/msg/puppet-users/-/OkVPHnluxpcJ. > >>> >> > > >>> >> > To post to this group, send email to puppet...@googlegroups.com. > >>> >> > To unsubscribe from this group, send email to > >>> >> > puppet-users...@googlegroups.com. > >>> >> > For more options, visit this group at > >>> >> > http://groups.google.com/group/puppet-users?hl=en. > >>> > > >>> > -- > >>> > You received this message because you are subscribed to the Google > >>> > Groups > >>> > "Puppet Users" group. > >>> > To view this discussion on the web visit > >>> > https://groups.google.com/d/msg/puppet-users/-/vuZyS8chcXYJ. > >>> > > >>> > To post to this group, send email to puppet...@googlegroups.com. > >>> > To unsubscribe from this group, send email to > >>> > puppet-users...@googlegroups.com. > >>> > For more options, visit this group at > >>> > http://groups.google.com/group/puppet-users?hl=en. > > > > -- > > You received this message because you are subscribed to the Google > Groups > > "Puppet Users" group. > > To view this discussion on the web visit > > https://groups.google.com/d/msg/puppet-users/-/jnWvUJepjzYJ. > > > > To post to this group, send email to puppet...@googlegroups.com<javascript:>. > > > To unsubscribe from this group, send email to > > puppet-users...@googlegroups.com <javascript:>. > > For more options, visit this group at > > http://groups.google.com/group/puppet-users?hl=en. >-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/vqcNuCBG3_kJ. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Chuck
2013-Jan-18 00:28 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
This is unconfirmed at this point.

It seems like every 4 days, starting Jan 4, 2013 (interestingly, it started in 2013 and never happened in 2012), the catalog or resource hash changes, causing the entire catalog_resources table to insert new entries, and then possibly GC deletes the old entries.

If this holds true we will see this behavior again on Jan 20, 2013. Hopefully we will have a better idea of what is going on then.

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/APtEGbv578QJ.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
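One low-impact way to test that hypothesis would be to snapshot a few row counts around the suspected window and compare them afterwards. A sketch, using only tables already shown in this thread (it could be run periodically, for example from cron, with the output appended to a file):

-- A burst of replaced catalogs should show up as the catalogs count climbing
-- well above certname_catalogs (those extra rows are the orphans GC later
-- deletes), along with a corresponding jump in catalog_resources.
SELECT now() AS sampled_at,
       (SELECT count(*) FROM puppet.catalogs)          AS catalogs,
       (SELECT count(*) FROM puppet.certname_catalogs) AS certname_catalogs,
       (SELECT count(*) FROM puppet.catalog_resources) AS catalog_resources;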
Ken Barber
2013-Jan-22 19:27 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
Does this happen across all nodes? This is an indication that you might have a resource that affects a large set of nodes and that suddenly changes every 4 days.

In the catalogs table, the 'hash' is just a hash of the catalogue data: if anything in the catalogue changes, it changes and a new entry is created. The database garbage collection is there to remove the orphaned entries. So one could really consider this part of normal operation; if your catalogues are constantly changing, then the garbage collection runs are bigger.

Did you see this happen on Sunday?

On Fri, Jan 18, 2013 at 12:28 AM, Chuck <csschwe@gmail.com> wrote:
> This is unconfirmed at this point.
>
> It seems like every 4 days starting Jan 4. 2013 (interesting is started in
> 2013 and never happened in 2012) like the Catalog or Resource Hash changes,
> causing the entire catalog_resource table to insert new entries, then
> possibly GC deletes the old entries.
>
> If this holds true we will see this behavior again on Jan. 20, 2013.
> Hopefully we will have a better idea of what is going on.

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Chuck
2013-Jan-22 19:40 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
We didn't notice anything on Sunday. We have a decent number of resources that affect all nodes, which may explain the occasional performance issues. We have also been messing around with the indexes, as they were getting HUGE. We also started promoting our Puppet changes on a schedule, so we affect more nodes at a time than we have in the past.

On Tuesday, January 22, 2013 1:27:15 PM UTC-6, Ken Barber wrote:
>
> Does this happen across all nodes? This is an indication you might
> have a resource that affects a large set of nodes that suddenly
> changes every 4 days.
>
> In the catalogs table, the 'hash' is just a hash of the catalogue
> data, if anything in the catalogue changes - it changes. And new
> entries are created. The database garbage collection is there to
> remove the orphaned entries. So really one could consider this part of
> normal operation, if your catalogues are constantly changing then the
> garbage collection runs are bigger.
>
> Did you see this happen on Sunday?
>
> On Fri, Jan 18, 2013 at 12:28 AM, Chuck <css...@gmail.com> wrote:
> > This is unconfirmed at this point.
> >
> > It seems like every 4 days starting Jan 4. 2013 (interesting is started in
> > 2013 and never happened in 2012) like the Catalog or Resource Hash changes,
> > causing the entire catalog_resource table to insert new entries, then
> > possibly GC deletes the old entries.
> >
> > If this holds true we will see this behavior again on Jan. 20, 2013.
> > Hopefully we will have a better idea of what is going on.

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/38QnH1c20UcJ.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
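On the point about the indexes getting huge: the per-index sizes are easy to pull out of the standard statistics views, which would also show which specific indexes are worth attention. A sketch against the puppet schema used above:

-- Largest indexes first, with the table each one belongs to.
SELECT relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 WHERE schemaname = 'puppet'
 ORDER BY pg_relation_size(indexrelid) DESC
 LIMIT 10;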
Erik Dalén
2013-Jan-23 08:46 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
One of the largest indexes was not needed and removed in the latest version of puppetdb. So you might want to try out that version to reduce the index sizes. On Jan 22, 2013 8:41 PM, "Chuck" <csschwe@gmail.com> wrote:> We didn''t notice anything on Sunday. We have a decent number of resources > that effect all nodes. This may explain the ocasional performance issues. > We have also been messing around with the Indexes as they were getting > HUGE. We also started promoting our Puppet changes on a schedule so we > effect more nodes at a time than we have in the past. > > On Tuesday, January 22, 2013 1:27:15 PM UTC-6, Ken Barber wrote: >> >> Does this happen across all nodes? This is an indication you might >> have a resource that affects a large set of nodes that suddenly >> changes every 4 days. >> >> In the catalogs table, the ''hash'' is just a hash of the catalogue >> data, if anything in the catalogue changes - it changes. And new >> entries are created. The database garbage collection is there to >> remove the orphaned entries. So really one could consider this part of >> normal operation, if your catalogues are constantly changing then the >> garbage collection runs are bigger. >> >> Did you see this happen on Sunday? >> >> On Fri, Jan 18, 2013 at 12:28 AM, Chuck <css...@gmail.com> wrote: >> > This is unconfirmed at this point. >> > >> > It seems like every 4 days starting Jan 4. 2013 (interesting is started >> in >> > 2013 and never happened in 2012) like the Catalog or Resource Hash >> changes, >> > causing the entire catalog_resource table to insert new entries, then >> > possibly GC deletes the old entries. >> > >> > If this holds true we will see this behavior again on Jan. 20, 2013. >> > Hopefully we will have a better idea of what is going on. >> > >> > -- >> > You received this message because you are subscribed to the Google >> Groups >> > "Puppet Users" group. >> > To view this discussion on the web visit >> > https://groups.google.com/d/**msg/puppet-users/-/**APtEGbv578QJ<https://groups.google.com/d/msg/puppet-users/-/APtEGbv578QJ>. >> >> > >> > To post to this group, send email to puppet...@googlegroups.com. >> > To unsubscribe from this group, send email to >> > puppet-users...@**googlegroups.com. >> > For more options, visit this group at >> > http://groups.google.com/**group/puppet-users?hl=en<http://groups.google.com/group/puppet-users?hl=en>. >> >> > -- > You received this message because you are subscribed to the Google Groups > "Puppet Users" group. > To view this discussion on the web visit > https://groups.google.com/d/msg/puppet-users/-/38QnH1c20UcJ. > To post to this group, send email to puppet-users@googlegroups.com. > To unsubscribe from this group, send email to > puppet-users+unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/puppet-users?hl=en. >-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Chuck
2013-Jan-23 19:47 UTC
Re: [Puppet Users] Puppetdb will setting gc-interval to 0 disable it
On Wednesday, January 23, 2013 8:20:34 AM UTC-6, Ken Barber wrote:> > > We didn''t notice anything on Sunday. We have a decent number of > resources > > that effect all nodes. This may explain the ocasional performance > issues. > > Sure, more specifically you''ll get catalog replaces in the database if > you have resources that are always ''changing''. This might be a dynamic > parameter to a resource, a dynamic title or alias. The hash doesn''t > care, any change means that hash is different, thus things become > candidates for garbage collection. > >Yes this does seem to be an issue since we are updating 7k - 8k nodes over the course of an hour. We don''t really have any dynamic catalogs, but we are updating our code in ways that does create the new catalog hashes.> > We have also been messing around with the Indexes as they were getting > HUGE. > > What did you do to the indexes Chuck? Did you rebuild them, if so - > which ones - and do you think it helped? >Still collecting information on this.> Also, I''ve realised my mistake in asking for explain plans for the > select queries I gave you. It didn''t take into account the cascades > during delete, better that I had asked you for explain plans on the > deletes instead: > > explain analyze verbose DELETE FROM catalogs WHERE NOT EXISTS (SELECT > * FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash); >puppet=# explain analyze verbose DELETE FROM puppet.catalogs WHERE NOT EXISTS (SELECT puppet(# * FROM puppet.certname_catalogs cc WHERE cc.catalog=catalogs.hash); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Delete on puppet.catalogs (cost=764.75..1501.94 rows=985 width=12) (actual time=226.633..226.633 rows=0 loops=1) -> Hash Anti Join (cost=764.75..1501.94 rows=985 width=12) (actual time=41.310..100.652 rows=4326 loops=1) Output: catalogs.ctid, cc.ctid Hash Cond: ((catalogs.hash)::text = (cc.catalog)::text) -> Seq Scan on puppet.catalogs (cost=0.00..637.63 rows=8663 width=47) (actual time=0.386..46.478 rows=12006 loops=1) Output: catalogs.ctid, catalogs.hash -> Hash (cost=668.78..668.78 rows=7678 width=47) (actual time=39.635..39.635 rows=7680 loops=1) Output: cc.ctid, cc.catalog Buckets: 1024 Batches: 1 Memory Usage: 593kB -> Seq Scan on puppet.certname_catalogs cc (cost=0.00..668.78 rows=7678 width=47) (actual time=0.022..35.959 rows=7680 loops=1) Output: cc.ctid, cc.catalog Trigger RI_ConstraintTrigger_16919 for constraint catalog_resources_catalog_fkey: time=220008.002 calls=4326 Trigger RI_ConstraintTrigger_16924 for constraint certname_catalogs_catalog_fkey: time=374.236 calls=4326 Trigger RI_ConstraintTrigger_16944 for constraint classes_catalog_fkey: time=45119.841 calls=4326 Trigger RI_ConstraintTrigger_16949 for constraint edges_catalog_fkey: time=395772.761 calls=4326 Trigger RI_ConstraintTrigger_16954 for constraint tags_catalog_fkey: time=46050.487 calls=4326 Total runtime: 707572.852 ms (17 rows)> explain analyze verbose DELETE FROM resource_params WHERE NOT EXISTS > (SELECT * FROM catalog_resources cr WHERE > cr.resource=resource_params.resource); > > puppet=# explain analyze verbose DELETE FROM puppet.resource_params WHERENOT EXISTS puppet-# (SELECT * FROM puppet.catalog_resources cr WHERE puppet(# cr.resource=puppet.resource_params.resource); QUERY PLAN 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Delete on puppet.resource_params (cost=0.00..34821.24 rows=422373 width=12) (actual time=177204.058..177204.058 rows=0 loops=1) -> Nested Loop Anti Join (cost=0.00..34821.24 rows=422373 width=12) (actual time=1029.091..177055.468 rows=389 loops=1) Output: resource_params.ctid, cr.ctid -> Seq Scan on puppet.resource_params (cost=0.00..25337.86 rows=438886 width=47) (actual time=0.014..597.273 rows=446297 loops=1) Output: resource_params.ctid, resource_params.resource -> Index Scan using idx_catalog_resources_resource on puppet.catalog_resources cr (cost=0.00..62.39 rows=3155 width=47) (actual time=0.394..0.394 rows=1 loops=446297) Output: cr.ctid, cr.resource Index Cond: ((cr.resource)::text = (resource_params.resource)::text) Total runtime: 177204.131 ms> Also without a real delete in the first SQL, the second delete > wouldn''t have triggered any removal. > > ken. >-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/7TS_DCm09uMJ. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
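In the first DELETE plan above, nearly all of the roughly 707 seconds is spent in the foreign-key constraint triggers (catalog_resources, edges, classes and tags), that is, in the per-row work on the child tables rather than in the anti-join itself. If the referencing catalog columns on those child tables lack indexes, each deleted parent row forces a scan of the child table. A quick, read-only way to check which indexes actually exist there, offered as a sketch only:

-- Indexes currently defined on the child tables referenced by the catalogs
-- foreign keys; a 'catalog' column should appear in an indexdef for each.
SELECT tablename, indexname, indexdef
  FROM pg_indexes
 WHERE schemaname = 'puppet'
   AND tablename IN ('catalog_resources', 'certname_catalogs',
                     'classes', 'edges', 'tags');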