I've been poking around the web docs and don't see an answer to this yet: is there any pre-existing functionality in Puppet to limit parallelism?

Example: let's say I want all machines to run some sort of job that updates a central database with information about the state of each Puppet client. Let's also say I have 1000 machines, so if all of them decide to do it at the exact same time, it would be a Bad Thing. Ideally I would like some kind of Puppet mechanism that says, "run this script, but only if there are fewer than 10 other machines doing the same thing at this particular moment".
Bruce Richardson
2010-Oct-11 22:05 UTC
Re: [Puppet Users] serialized or limited parallelism
On Mon, Oct 11, 2010 at 02:09:54PM -0700, Philip Brown wrote:

> Example: let's say I want all machines to run some sort of job that
> updates a central database with information about the state of each
> Puppet client.
>
> Let's also say I have 1000 machines, so if all of them decide to do it
> at the exact same time, it would be a Bad Thing. Ideally I would like
> some kind of Puppet mechanism that says, "run this script, but only if
> there are fewer than 10 other machines doing the same thing at this
> particular moment".

It may simply be that your example isn't the best expression of what you want, but I wouldn't do it that way.

The first point is that I don't see Puppet as a way of making hosts do things; I see it as a way of making them conform to a desired state. If that's a state where they can pass information back (or have it gathered from them), then fine, but I'd use Puppet to set that up, not be the conduit.

A more prosaic reason for not using Puppet as the information transport mechanism is that a Puppet run already puts enough of a load spike on the client and master; I don't see a good reason to add to that. I also think that such a mechanism would be complex and fragile and have real problems scaling.

I would look at using SNMP traps to relay that information; it wouldn't be hard to use Puppet to create a distributed system which could cope with the load.

Do you have an example that would make a more direct case for Puppet having the kind of mechanism you're looking for?

--
Bruce

If the universe were simple enough to be understood, we would be too simple to understand it.
Philip,

There are a couple of ways. If your clients run puppetd in daemon mode, then look at these settings in the client configuration file:

  splay = true
  splaylimit = 1800

We run our Puppet clients as a wrapper from cron and make use of the fqdn_rand() function, which generates a repeatable random number seeded on the fqdn (plus any other arguments you pass):

  cron { "puppet client hourly noop":
    user    => root,
    minute  => fqdn_rand(60, "noop"),
    command => "puppet_wrapper_script.sh",
  }

Regards

John

On 12 October 2010 08:09, Philip Brown <phil.googlenews@bolthole.com> wrote:

> I've been poking around the web docs and don't see an answer to this
> yet: is there any pre-existing functionality in Puppet to limit
> parallelism?
>
> Example: let's say I want all machines to run some sort of job that
> updates a central database with information about the state of each
> Puppet client.
>
> Let's also say I have 1000 machines, so if all of them decide to do it
> at the exact same time, it would be a Bad Thing. Ideally I would like
> some kind of Puppet mechanism that says, "run this script, but only if
> there are fewer than 10 other machines doing the same thing at this
> particular moment".

--
John Warburton
Ph: 0417 299 600
Email: jwarburton@gmail.com
One word: Idempotence.

If you have to do what you are asking, ur doin' it wrong.

On Mon, Oct 11, 2010 at 2:09 PM, Philip Brown <phil.googlenews@bolthole.com> wrote:

> I've been poking around the web docs and don't see an answer to this
> yet: is there any pre-existing functionality in Puppet to limit
> parallelism?
>
> Example: let's say I want all machines to run some sort of job that
> updates a central database with information about the state of each
> Puppet client.
>
> Let's also say I have 1000 machines, so if all of them decide to do it
> at the exact same time, it would be a Bad Thing. Ideally I would like
> some kind of Puppet mechanism that says, "run this script, but only if
> there are fewer than 10 other machines doing the same thing at this
> particular moment".

--
http://about.me/scoot
http://twitter.com/ohlol
On Oct 11, 5:45 pm, Scott Smith <sc...@ohlol.net> wrote:

> One word: Idempotence.
>
> If you have to do what you are asking, ur doin' it wrong.

Well, not necessarily. It can still be idempotent. The 'state' we wish to maintain can be "machine has a self-generated/updated entry in the central database that is no older than X hours". So after the first successful run, it will not fully trigger again until X hours have passed. And that's fine.

But... maybe I am thinking about this all wrong after all. Possibly the better way is to give facter a custom set of facts... and then, since facter output gets automatically stored on the master when puppet runs on the client, that would be a better source for the repository...? Hmm....
Take a look at mcollective. You can query your facter facts on all your systems. You can also use puppetcommanderd, an mcollective add-on that will schedule runs of your clients matching a filter (on facts or classes), spread out over a specified time period.

On Oct 11, 2010, at 8:44 PM, Philip Brown <phil.googlenews@bolthole.com> wrote:

> On Oct 11, 5:45 pm, Scott Smith <sc...@ohlol.net> wrote:
>> One word: Idempotence.
>>
>> If you have to do what you are asking, ur doin' it wrong.
>
> Well, not necessarily. It can still be idempotent.
>
> The 'state' we wish to maintain can be "machine has a
> self-generated/updated entry in the central database that is no older
> than X hours". So after the first successful run, it will not fully
> trigger again until X hours have passed. And that's fine.
>
> But... maybe I am thinking about this all wrong after all. Possibly
> the better way is to give facter a custom set of facts... and then,
> since facter output gets automatically stored on the master when
> puppet runs on the client, that would be a better source for the
> repository...?
> Hmm....
Bruce Richardson
2010-Oct-12 08:44 UTC
Re: [Puppet Users] Re: serialized or limited parallelism
On Mon, Oct 11, 2010 at 06:44:20PM -0700, Philip Brown wrote:

> But... maybe I am thinking about this all wrong after all. Possibly
> the better way is to give facter a custom set of facts... and then,
> since facter output gets automatically stored on the master when
> puppet runs on the client

If you're running storeconfigs, which isn't a given for everybody, not least because it can cause scaling problems of its own (a memcached backend for storeconfigs would be such a win).

> , that would be a better source for the repository...?

If the data you are feeding back really is directly relevant and important to your Puppet configuration rather than incidental, then yes. If you hit scaling issues, though, you'll either need to look at an alternative method of delivering and storing the data that does scale well, or enhance your Puppet infrastructure (the mcollective suggestion was a good one).

Is it crucial that the data be current at the beginning of any Puppet run? Is the data itself used in configuring the host, or is it just the presence of current data that has an effect on how/when Puppet runs?

--
Bruce

Explode! Thousands of lemmings can't be wrong.
On Mon, Oct 11, 2010 at 06:44:20PM -0700, Philip Brown wrote:

> But... maybe I am thinking about this all wrong after all. Possibly
> the better way is to give facter a custom set of facts... and then,
> since facter output gets automatically stored on the master when
> puppet runs on the client

On Oct 12, 1:44 am, Bruce Richardson <itsbr...@workshy.org> wrote:

> Is it crucial that the data be current at the beginning of any Puppet
> run? Is the data itself used in configuring the host, or is it just the
> presence of current data that has an effect on how/when Puppet runs?

Well... this PARTICULAR data is more just inventory-type data; it is not crucial to the Puppet run itself at all.

However, in the future we will have more interest in adding on [random scripts that need to be run on all machines, but not at exactly the same time].
Nigel Kersten
2010-Oct-12 16:02 UTC
Re: [Puppet Users] Re: serialized or limited parallelism
On Tue, Oct 12, 2010 at 8:48 AM, Philip Brown <phil.googlenews@bolthole.com> wrote:

> On Mon, Oct 11, 2010 at 06:44:20PM -0700, Philip Brown wrote:
>> But... maybe I am thinking about this all wrong after all. Possibly
>> the better way is to give facter a custom set of facts... and then,
>> since facter output gets automatically stored on the master when
>> puppet runs on the client
>
> On Oct 12, 1:44 am, Bruce Richardson <itsbr...@workshy.org> wrote:
>> Is it crucial that the data be current at the beginning of any Puppet
>> run? Is the data itself used in configuring the host, or is it just the
>> presence of current data that has an effect on how/when Puppet runs?
>
> Well... this PARTICULAR data is more just inventory-type data; it is
> not crucial to the Puppet run itself at all.
>
> However, in the future we will have more interest in adding on [random
> scripts that need to be run on all machines, but not at exactly the
> same time].

Then it sounds like you should either:

* construct a defined type around an exec that has an onlyif/unless parameter to check the freshness of the db entry, or
* write a fact to make the same check, and use that in your manifests.

I'm a little confused though, as your initial post seems to describe an orchestration problem, but your clarification simply makes it sound like you want to implement a TTL check on the db entry for each node?

--
nigel
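A rough sketch of the first suggestion, purely as an illustration: the script path, stamp file and four-hour window below are placeholders, not anything from this thread. The exec refreshes the central entry and touches a local stamp file, and the unless test skips the run while that stamp is still fresh, so the resource stays idempotent in the "no older than X hours" sense.

  # All names here are hypothetical; wrap this in a defined type if you
  # want the TTL as a parameter, as suggested above.
  exec { 'refresh-inventory-entry':
    command => '/bin/sh -c "/usr/local/bin/update_inventory.sh && touch /var/tmp/inventory_updated"',
    # Skip while the local stamp file is younger than ~4 hours.
    unless  => '/bin/sh -c "find /var/tmp/inventory_updated -mmin -240 2>/dev/null | grep -q ."',
    path    => ['/bin', '/usr/bin', '/usr/local/bin'],
  }

The fact-based variant would expose the same freshness test as a custom fact and guard the exec (or a whole class) with a conditional on that fact.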
Bruce Richardson
2010-Oct-12 17:49 UTC
Re: [Puppet Users] Re: serialized or limited parallelism
On Tue, Oct 12, 2010 at 08:48:30AM -0700, Philip Brown wrote:

>> Is it crucial that the data be current at the beginning of any Puppet
>> run? Is the data itself used in configuring the host, or is it just the
>> presence of current data that has an effect on how/when Puppet runs?
>
> Well... this PARTICULAR data is more just inventory-type data; it is
> not crucial to the Puppet run itself at all.

Then I really do advise you not to use Puppet as the transport. Use Puppet to set up the data collection and transmission/retrieval processes and let them run independently.

> However, in the future we will have more interest in adding on random
> scripts that need to be run on all machines, but not at exactly the
> same time.

And will *those* be integral to Puppet's operation, or are you just hoping that Puppet can provide you with a way of managing parallel execution?

--
Bruce

If the universe were simple enough to be understood, we would be too simple to understand it.
R.I.Pienaar
2010-Oct-12 18:01 UTC
Re: [Puppet Users] Re: serialized or limited parallelism
hello,

----- "Bruce Richardson" <itsbruce@workshy.org> wrote:

> On Tue, Oct 12, 2010 at 08:48:30AM -0700, Philip Brown wrote:
>> Well... this PARTICULAR data is more just inventory-type data; it is
>> not crucial to the Puppet run itself at all.

Just expanding on the mcollective suggestion a bit, since it sounds like it really is what you want.

Mcollective will run on each node as a daemon. You can do command and control of your machines with it, but that's a side issue in this case. For your inventory setup, though, it has an optional system called 'registration'. The registration system will send on an interval, say 30 or 3000 seconds; you get to decide what is in the registration data and you get to decide where to put it.

I'd say wherever you put it needs to be very, very fast. You can't be updating hundreds of rows in a relational database; rather, you want to think about something like a document-oriented NoSQL database. I have code available that sends all the puppet classes and facts on a node to a mongodb instance, from where you can build your web frontends or whatever you want.
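On the agent side, registration is just a couple of settings in each node's server.cfg, roughly along these lines (the plugin name and interval shown are only an example; check the mcollective registration docs for your version):

  # Hypothetical server.cfg fragment: send registration data every 300
  # seconds using a registration plugin. The receiver that writes the
  # data into mongodb (or wherever) is configured separately.
  registration = Meta
  registerinterval = 300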
On Oct 11, 2:09 pm, Philip Brown <phil.googlen...@bolthole.com> wrote:

> Is there any pre-existing functionality in Puppet to limit parallelism?

There's nothing inherent in puppet besides the 'splay' options. The other common solution is to use the $rand_from_fqdn custom fact pattern. That said, puppet nodes typically won't know of the existence/state of other nodes.

> Ideally I would like some kind of Puppet mechanism that says, "run this
> script, but only if there are fewer than 10 other machines doing the
> same thing at this particular moment".

I think this is a different tool, which I use mcollective for. Capturing registration metadata is pretty simple; see R.I.'s suggestion. For orchestrating concurrent actions, look at 'mc-puppetd runall' for an example. It runs puppetd on all of the available nodes with a specified concurrency. Writing your own version of that plugin should be trivial.
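For example, something along these lines keeps at most ten nodes running at once (the concurrency argument is from memory, so check the plugin's own help for the exact syntax):

  # Kick off puppetd on every discovered node, at most 10 concurrently.
  mc-puppetd runall 10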