marguin
2013-Sep-24 18:22 UTC
[Puppet Users] puppetdb query result exceeding the default of 20000, only 14 nodes should it be?
I just recently replaced the ActiveRecord storeconfigs with PuppetDB (1.4.0) and I am working through the various configurations. I only have 14 nodes being managed, and I am hitting the default limit for resource-query-limit. I can remedy it by setting it to 50000, but I would think that with the few hosts I have, the default should be fine. Am I wrong about that? For the most part I am using the default values that were installed when I installed the module (installed via Puppet).

I am running with PostgreSQL on CentOS 5.8. Installed RPMs:

postgresql-libs-8.1.23-6.el5_8
postgresql-8.1.23-6.el5_8
postgresql-libs-8.1.23-6.el5_8
postgresql-server-8.1.23-6.el5_8

Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to puppet-users+unsubscribe@googlegroups.com. To post to this group, send email to puppet-users@googlegroups.com. Visit this group at http://groups.google.com/group/puppet-users. For more options, visit https://groups.google.com/groups/opt_out.
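For anyone following along: the setting in question is resource-query-limit, which lives in the [database] section of the PuppetDB configuration. A minimal sketch of the change (the conf.d path shown is typical for package installs and may differ on your system):

```ini
; /etc/puppetdb/conf.d/database.ini  (path may vary by install)
[database]
; Maximum number of results a resource query may return before
; PuppetDB rejects the query (default: 20000).
resource-query-limit = 50000
```

PuppetDB must be restarted for the new limit to take effect.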
Ken Barber
2013-Sep-25 13:25 UTC
Re: [Puppet Users] puppetdb query result exceeding the default of 20000, only 14 nodes should it be?
> I just recently replaced the ActiveRecord storeconfigs with PuppetDB
> (1.4.0) and I am working through the various configurations. I only have
> 14 nodes being managed, and I am hitting the default limit for
> resource-query-limit. [...]

Well, the ratio of resources to nodes does seem high, but you can check how many resources you have in PuppetDB by looking at the dashboard (see "Resources in the population"). Navigate to the URL on your PuppetDB, for example: http://puppetdb1.vm:8080/dashboard/index.html.

Since Puppet doesn't put a limit on the number of resources per node, it's hard to say if your case indicates a problem somewhere. It does sound exceptional, but not unlikely (I've seen some nodes with 10k resources apiece, for example).

In the future this resource limit will hopefully be removed, but for now it is there as an administrative stop to keep large queries from blowing the JVM heap and causing an OutOfMemory crash. A new feature causes resources to be streamed instead of loaded into memory before being served, which puts less strain on the heap.

ken.
Matthew Arguin
2013-Sep-25 18:46 UTC
Re: [Puppet Users] puppetdb query result exceeding the default of 20000, only 14 nodes should it be?
Thanks Ken. Looking at my dashboard I show about 13K resources with about 23% duplication. Does that sound reasonable for 14 nodes?

JVM heap: ~317M used (500M max)
Nodes in the population: 14
Resources in the population: 12,559
Resource duplication (% of resources stored): 23.3%
Catalog duplication (% of catalogs encountered): 93.0%
Command queue depth: 0

On Wed, Sep 25, 2013 at 9:25 AM, Ken Barber <ken@puppetlabs.com> wrote:

> Well, the ratio of resources to nodes does seem high, but you can check
> how many resources you have in PuppetDB by looking at the dashboard
> (see "Resources in the population"). [...]
Christopher Wood
2013-Sep-26 03:17 UTC
Re: [Puppet Users] puppetdb query result exceeding the default of 20000, only 14 nodes should it be?
On Wed, Sep 25, 2013 at 02:25:50PM +0100, Ken Barber wrote:

(SNIP)

> Since Puppet doesn't put a limit on # of resources per node, its hard
> to say if your case is a problem somewhere. It does however sound
> exceptional but not unlikely (I've seen some nodes with 10k resources
> a-piece for example).

Now I'm curious about:

- who these people are
- why they need 10,000 resources per host
- how they keep track of everything
- how long an agent run takes, and how much CPU/RAM it takes
- how they troubleshoot the massive debug output

among other things.
David Schmitt
2013-Sep-26 07:33 UTC
Re: [Puppet Users] puppetdb query result exceeding the default of 20000, only 14 nodes should it be?
On 26.09.2013 05:17, Christopher Wood wrote:

> who these people are

Me, for example.

> why they need 10,000 resources per host

Such numbers are easy to reach when every service exports a nagios check into a central server.

> how they keep track of everything

High modularity. See below.

> how long an agent run takes

Ages. The biggest node I know takes around 44 minutes to run.

> and how much cpu/ram an agent run takes

Too much.

> and how they troubleshoot the massive debug output

Since these 10k+ resources are 99% the same, there is not much to troubleshoot.

Regards, David
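For readers unfamiliar with the pattern David describes, exported nagios checks usually look something like the following minimal sketch (the check name, parameters, and target path are hypothetical, not taken from anyone's actual manifests):

```puppet
# On every monitored node: export a nagios check describing this host.
# @@ marks the resource as exported: it is stored in PuppetDB rather
# than applied locally.
@@nagios_service { "check_ssh_${::fqdn}":
  host_name           => $::fqdn,
  check_command       => 'check_ssh',
  service_description => 'SSH',
  use                 => 'generic-service',
  target              => '/etc/nagios/conf.d/puppet_services.cfg',
}

# On the central monitoring host: collect every exported check.
Nagios_service <<| |>>
```

Every node contributes several of these per service, and all of them are stored (and queried) through PuppetDB, which is how exported monitoring checks come to dominate the resource counts discussed in this thread.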
Matthew Arguin
2013-Sep-26 15:20 UTC
Re: [Puppet Users] puppetdb query result exceeding the default of 20000, only 14 nodes should it be?
So my reasoning behind the initial question/post is due largely to being unfamiliar with PuppetDB, I would say. We do export a lot of resources in our Puppet deployment due to the nagios checks. In poking around on the groups, I came across this post: https://groups.google.com/forum/#!topic/puppet-users/z1kjqwko1iA

I was especially interested in the comment posted by windowsrefund at the bottom and am trying to understand it, because it seems like he is saying that I could reduce the amount of duplication of exported resources, but I am not entirely sure.

Basic questions: Is it "bad" to have resource duplication? Is it "good" to have catalog duplication? Should I just forget about the 20000 default on the query param, or should I be aiming to tune my Puppet deployment to work within it? (Currently set to 50000 to stop the issue.)

If I did not mention it previously, the heap is currently set to 1G and, looking at the spark line, I seem to be maxing out right now at about 500MB.

On Thu, Sep 26, 2013 at 3:33 AM, David Schmitt <david@dasz.at> wrote:

> Such numbers are easy to reach when every service exports a nagios check
> into a central server. [...]
David Schmitt
2013-Sep-26 15:54 UTC
Re: [Puppet Users] puppetdb query result exceeding the default of 20000, only 14 nodes should it be?
Re,

Well, the big question is: do you really HAVE 10k resources, or do you have a bug in your manifest? When you have ruled out obvious bugs in your manifest (like exporting stuff thrice), you can turn to tuning your setup:

* I found significant savings in grouping common service checks into a single template.
* The nagios_* types are "nice", but just deploying file snippets is MUCH cheaper.

Regards, David

On 26.09.2013 17:20, Matthew Arguin wrote:

> So my reasoning behind the initial question/post is due largely to
> being unfamiliar with PuppetDB, I would say. We do export a lot of
> resources in our Puppet deployment due to the nagios checks. [...]
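A sketch of the file-snippet approach David mentions (the filenames, tag, and template path are hypothetical): instead of exporting one nagios_* resource per check, each node exports a single file containing all of its checks, rendered from one template.

```puppet
# On each monitored node: one exported resource per host, not per
# check. The (hypothetical) template renders all of this host's
# service definitions into a single nagios config snippet.
@@file { "/etc/nagios/conf.d/${::fqdn}.cfg":
  ensure  => file,
  content => template('monitoring/host_checks.cfg.erb'),
  tag     => 'nagios-snippet',
}

# On the monitoring server: collect the exported snippets and reload
# nagios whenever one of them changes.
File <<| tag == 'nagios-snippet' |>> ~> Service['nagios']
```

This cuts the exported-resource count from roughly (checks x hosts) down to one per host, at the cost of losing per-check granularity in the catalog.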
Deepak Giridharagopal
2013-Sep-26 18:27 UTC
Re: [Puppet Users] puppetdb query result exceeding the default of 20000, only 14 nodes should it be?
On Sep 26, 2013, at 8:20 AM, Matthew Arguin <matthewarguin@gmail.com> wrote:

> Basic questions: Is it "bad" to have resource duplication? Is it "good"
> to have catalog duplication? Should I just forget about the 20000
> default on the query param, or should I be aiming to tune my Puppet
> deployment to work within it? (Currently set to 50000 to stop the
> issue.)

A few definitions that may help (I should really add this to the FAQ!):

A resource is considered "duplicated" if it exists, identically, on more than one system. More specifically: if a resource with the same type, title, parameters, and other metadata exists on more than one node in PuppetDB, then that resource is considered duplicated. So a resource duplication rate of, say, 40% means that 60% of your resources exist only on one system. I like to think of this as the "snowflake quotient"... it's a measurement of how many of your resources are unique and beautiful snowflakes.

A catalog is considered "duplicated" if it's identical to the previous catalog that PuppetDB has stored. So if you have a node foo.com, run puppet on it twice, and the catalog hasn't changed for that system (you haven't made a config change that affects that system between runs), then that's considered a catalog duplicate.

Internally, PuppetDB uses both of these concepts to improve performance. If a new catalog is exactly the same as the previously stored one for a node, then there's no need to use up IO to store it again. Similarly, if a catalog contains 90% the same resources that already exist on other nodes, PuppetDB doesn't need to store those resources either (rather, we can just store pointers to already-existing data in the database).

Now, are the numbers you posted good/bad? In the field, we overwhelmingly see resource duplication and catalog duplication in the 85-95% range. So I'd say that your low resource duplication rate is atypical. It may indicate that you are perhaps not leveraging abstractions in your Puppet code, or it could be that you really, truly have a large number of unique resources. One thing I can definitely say, though, is that the higher your resource duplication rate, the faster PuppetDB will run.

Now, regarding the max query results: I'd set that to whatever works for you. If you're doing queries that return a huge number of results, then feel free to bump that setting up. The only caveat is, as mentioned before, you need to make sure you give PuppetDB enough heap to actually deal with that size of result set.

Lastly, as Ken Barber indicated, we've already merged in code that eliminates the need for that setting. We now stream resource query results to the client on-the-fly, avoiding batching things up in memory first. This results in much lower memory usage, and greatly reduces the time before the client gets the first result. So... problem solved? :)

deepak
Matthew Arguin
2013-Sep-26 18:38 UTC
Re: [Puppet Users] puppetdb query result exceeding the default of 20000, only 14 nodes should it be?
Deepak, thank you very much for that detail; it certainly clears up some things for me. I am guessing that I might have an exec or something somewhere (complete conjecture at this point) on a machine or two that might be leading to my low percentage on the resource duplication, since my catalog percentage is at 96%...

David, it is completely possible that there is a bug in my manifest somewhere. My coworker has been doing most of the work getting our Puppet system set up this way and I am working my way into it, so I will certainly be trying to see if there is duplication that need not be there. However, I do know that we have a pretty sizable number of checks (at least in my opinion, for the number of servers) for a total of 14 nodes: about 1100 active checks and 300 passive.

On Thu, Sep 26, 2013 at 2:27 PM, Deepak Giridharagopal <deepak@puppetlabs.com> wrote:

> A few definitions that may help (I should really add this to the
> FAQ!): [...]
David Schmitt
2013-Sep-27 12:12 UTC
Re: [Puppet Users] puppetdb query result exceeding the default of 20000, only 14 nodes should it be?
On 26.09.2013 20:38, Matthew Arguin wrote:> David, completely possible that there is a bug in my manifest somewhere, > my coworker has been doing most of the work getting our puppet system > set up this way and i am working my way in to it, so i will certainly be > trying to see if there is duplication that needs not be there, however, > i do know that we have a pretty sizable number of check (at least in my > opinion for the number of servers) for a total of 14 nodes (about 1100 > active checks and 300 passive).so that should result in the order of 2800 resources for the checks (1100+300, once for the actual node and once for the monitoring host). That would indicate that you are storing 8-10 resources per check, which seems to indicate a certain "potential" for "optimisation". Regards, David> > > On Thu, Sep 26, 2013 at 2:27 PM, Deepak Giridharagopal > <deepak@puppetlabs.com <mailto:deepak@puppetlabs.com>> wrote: > > > On Sep 26, 2013, at 8:20 AM, Matthew Arguin <matthewarguin@gmail.com > <mailto:matthewarguin@gmail.com>> wrote: > >> So my reasoning behind the initial question/post again is due >> largely to being unfamiliar with puppetdb i would say. We do >> export a lot of resources in our puppet deployment due to the >> nagios checks. In poking around on the groups, i came across this >> post: https://groups.google.com/forum/#!topic/puppet-users/z1kjqwko1iA >> >> i was especially interested in the comment posted by windowsrefund >> at the bottom and trying to understand that because it seems like >> he is saying that i could reduce the amount of duplication of >> exported resources, but i am not entirely sure. >> >> Basic questions: Is it "bad" to have resource duplication? Is it >> "good" to have catalog duplication? Should i just forget about >> the 20000 default on the query param or should i be aiming to tune >> my puppet deployment to work towards that? (currently set to >> 50000 to stop the issue). 
>> A few definitions that may help (I should really add this to the FAQ!):
>>
>> A resource is considered "duplicated" if it exists, identically, on
>> more than one system. More specifically: if a resource with the same
>> type, title, parameters, and other metadata exists on more than one
>> node in PuppetDB, then that resource is considered duplicated. So a
>> resource duplication rate of, say, 40% means that 60% of your resources
>> exist only on one system. I like to think of this as the "snowflake
>> quotient": it's a measurement of how many of your resources are unique
>> and beautiful snowflakes.
>>
>> A catalog is considered "duplicated" if it's identical to the previous
>> catalog that PuppetDB has stored. So if you have a node foo.com, run
>> puppet on it twice, and the catalog hasn't changed for that system (you
>> haven't made a config change that affects that system between runs),
>> then that's considered a catalog duplicate.
>>
>> Internally, PuppetDB uses both of these concepts to improve
>> performance. If a new catalog is exactly the same as the previously
>> stored one for a node, then there's no need to use up IO to store it
>> again. Similarly, if a catalog contains 90% the same resources that
>> already exist on other nodes, PuppetDB doesn't need to store those
>> resources either; rather, we can just store pointers to
>> already-existing data in the database.
>>
>> Now, are the numbers you posted good or bad? In the field, we
>> overwhelmingly see resource duplication and catalog duplication in the
>> 85-95% range, so I'd say that your low resource duplication rate is
>> atypical. It may indicate that you are not leveraging abstractions in
>> your Puppet code, or it could be that you really, truly have a large
>> number of unique resources. One thing I can definitely say, though, is
>> that the higher your resource duplication rate, the faster PuppetDB
>> will run.
>> Now, regarding the max query results: I'd set that to whatever works
>> for you. If you're doing queries that return a huge number of results,
>> then feel free to bump that setting up. The only caveat is, as
>> mentioned before, you need to make sure you give PuppetDB enough heap
>> to actually deal with a result set of that size.
>>
>> Lastly, as Ken Barber indicated, we've already merged in code that
>> eliminates the need for that setting. We now stream resource query
>> results to the client on the fly, avoiding batching things up in memory
>> first. This results in much lower memory usage and greatly reduces the
>> time before the client gets the first result. So... problem solved? :)
>>
>> deepak
>>
>>> If I did not mention it previously, heap is currently set to 1G and,
>>> looking at the spark line, I seem to be maxing out right now at about
>>> 500MB.
>>>
>>> On Thu, Sep 26, 2013 at 3:33 AM, David Schmitt <david@dasz.at> wrote:
>>>
>>>> On 26.09.2013 05:17, Christopher Wood wrote:
>>>>
>>>>> On Wed, Sep 25, 2013 at 02:25:50PM +0100, Ken Barber wrote:
>>>>> (SNIP)
>>>>>
>>>>>> http://puppetdb1.vm:8080/dashboard/index.html. Since Puppet doesn't
>>>>>> put a limit on the # of resources per node, it's hard to say if
>>>>>> your case is a problem somewhere. It does however sound exceptional
>>>>>> but not unlikely (I've seen some nodes with 10k resources apiece,
>>>>>> for example).
>>>>>
>>>>> Now I'm curious about
>>>>>
>>>>> who these people are
>>>>
>>>> Me, for example.
>>>>
>>>>> why they need 10,000 resources per host
>>>>
>>>> Such numbers are easy to reach when every service exports a Nagios
>>>> check to a central server.
>>>>
>>>>> how they keep track of everything
>>>>
>>>> High modularity. See below.
>>>>
>>>>> how long an agent run takes
>>>>
>>>> Ages. The biggest node I know takes around 44 minutes to run.
>>>>
>>>>> and how much cpu/ram an agent run takes
>>>>
>>>> Too much.
>>>>> and how they troubleshoot the massive debug output
>>>>
>>>> Since these 10k+ resources are 99% the same, there is not much to
>>>> troubleshoot.
>>>>
>>>> Regards, David
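[Editor's note: Deepak's "snowflake quotient" is simple arithmetic over
two counts you can read off the PuppetDB dashboard: resources across all
catalogs versus distinct resources actually stored. A rough sketch, with
illustrative numbers rather than figures from the thread:]

```python
def duplication_rate(total_resources, unique_stored):
    """Fraction of catalog resources PuppetDB did not need to store
    separately because an identical resource already existed on
    another node. 0.0 means every resource is a unique snowflake."""
    if total_resources == 0:
        return 0.0
    return 1.0 - unique_stored / total_resources

# Illustrative numbers: 20000 resources across all catalogs, of which
# 12000 are distinct. 40% duplicated, 60% unique snowflakes.
rate = duplication_rate(20000, 12000)
print(f"resource duplication: {rate:.0%}")  # -> resource duplication: 40%
```

By Deepak's field numbers (85-95% duplication being typical), a rate this
low would be the signal to look for missing abstractions in the manifests.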
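[Editor's note: for readers hitting the same limit, the two knobs
discussed in this thread live in separate places. The paths below are the
usual RPM locations on CentOS and may differ on other installs; per the
PuppetDB 1.x configuration docs, the setting disappears once the
streaming release described above ships.]

```ini
# /etc/puppetdb/conf.d/config.ini (typical RPM layout, PuppetDB 1.x)
[database]
# Cap on rows a single resource query may return; the default is 20000.
resource-query-limit = 50000

# Heap is set separately, e.g. in /etc/sysconfig/puppetdb:
#   JAVA_ARGS="-Xmx1g"
```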