I've been poking around the web docs and don't see an answer to this yet: is there any pre-existing functionality in Puppet to limit parallelism?

Example: let's say I want all machines to run some sort of job that updates a central database with information about the state of each Puppet client. Let's also say I have 1000 machines, so if all of them decide to do it at the exact same time, it would be a Bad Thing. Ideally I would like some kind of Puppet mechanism that says, "run this script, but only if there are fewer than 10 other machines doing the same thing at this particular moment".
Bruce Richardson
2010-Oct-11 22:05 UTC
Re: [Puppet Users] serialized or limited parallelism
On Mon, Oct 11, 2010 at 02:09:54PM -0700, Philip Brown wrote:

> Example: let's say I want all machines to run some sort of job that
> updates a central database with information about the state of each
> Puppet client.
>
> Let's also say I have 1000 machines, so if all of them decide to do it
> at the exact same time, it would be a Bad Thing. Ideally I would like
> some kind of Puppet mechanism that says, "run this script, but only if
> there are fewer than 10 other machines doing the same thing at this
> particular moment".

It may simply be that your example isn't the best expression of what you want, but I wouldn't do it that way.

The first point is that I don't see Puppet as a way of making hosts do things; I see it as a way of making them conform to a desired state. If that's a state where they can pass information back (or have it gathered from them), then fine, but I'd use Puppet to set that up, not be the conduit.

A more prosaic reason for not using Puppet as the information transport mechanism is that a Puppet run already puts enough of a load spike on the client and master; I don't see a good reason to add to that. I also think that such a mechanism would be complex and fragile and have real problems scaling.

I would look at using SNMP traps to relay that information; it wouldn't be hard to use Puppet to create a distributed system which could cope with the load.

Do you have an example that would make a more direct case for Puppet having the kind of mechanism you're looking for?

--
Bruce

If the universe were simple enough to be understood, we would be too simple to understand it.
Philip,

There are a couple of ways. If your clients run puppetd in daemon mode, then look at these settings in the client configuration file:

  splay = true
  splaylimit = 1800

We run our Puppet clients as a wrapper from cron and make use of the fqdn_rand() function, which generates a repeatable random number seeded on the fqdn (plus any other arguments you pass):

  cron { "puppet client hourly noop":
    user    => root,
    minute  => fqdn_rand(60, "noop"),
    command => "puppet_wrapper_script.sh",
  }

Regards

John

On 12 October 2010 08:09, Philip Brown <phil.googlenews@bolthole.com> wrote:

> I've been poking around the web docs and don't see an answer to this
> yet: is there any pre-existing functionality in Puppet to limit
> parallelism?
>
> Example: let's say I want all machines to run some sort of job that
> updates a central database with information about the state of each
> Puppet client.
>
> Let's also say I have 1000 machines, so if all of them decide to do it
> at the exact same time, it would be a Bad Thing. Ideally I would like
> some kind of Puppet mechanism that says, "run this script, but only if
> there are fewer than 10 other machines doing the same thing at this
> particular moment".

--
John Warburton
Ph: 0417 299 600
Email: jwarburton@gmail.com
One word: Idempotence.

If you have to do what you are asking, ur doin' it wrong.

On Mon, Oct 11, 2010 at 2:09 PM, Philip Brown <phil.googlenews@bolthole.com> wrote:

> I've been poking around the web docs and don't see an answer to this
> yet: is there any pre-existing functionality in Puppet to limit
> parallelism?
>
> Example: let's say I want all machines to run some sort of job that
> updates a central database with information about the state of each
> Puppet client.
>
> Let's also say I have 1000 machines, so if all of them decide to do it
> at the exact same time, it would be a Bad Thing. Ideally I would like
> some kind of Puppet mechanism that says, "run this script, but only if
> there are fewer than 10 other machines doing the same thing at this
> particular moment".

--
http://about.me/scoot
http://twitter.com/ohlol
On Oct 11, 5:45 pm, Scott Smith <sc...@ohlol.net> wrote:

> One word: Idempotence.
>
> If you have to do what you are asking, ur doin' it wrong.

Well, not necessarily. It can still be idempotent. The 'state' we wish to maintain can be "machine has a self-generated/updated entry in the central database that is no older than X hours". So after the first successful run, it will not fully trigger again until X hours have passed. And that's fine.

But... maybe I am thinking about this all wrong after all. Possibly the better way is to give facter a custom set of facts... and then, since facter output gets automatically stored on the master when puppet runs on the client, that would be a better source for the repository...? Hmm....
Take a look at mcollective. You can query your facter facts on all your systems. You can also use puppetcommanderd, an mcollective add-on that will schedule runs of your clients matching a filter (on facts or classes), spread out over a specified time period.

On Oct 11, 2010, at 8:44 PM, Philip Brown <phil.googlenews@bolthole.com> wrote:

> On Oct 11, 5:45 pm, Scott Smith <sc...@ohlol.net> wrote:
>> One word: Idempotence.
>>
>> If you have to do what you are asking, ur doin' it wrong.
>
> Well, not necessarily. It can still be idempotent.
>
> The 'state' we wish to maintain can be "machine has a
> self-generated/updated entry in the central database that is no older
> than X hours". So after the first successful run, it will not fully
> trigger again until X hours have passed. And that's fine.
>
> But... maybe I am thinking about this all wrong after all. Possibly
> the better way is to give facter a custom set of facts... and then,
> since facter output gets automatically stored on the master when
> puppet runs on the client, that would be a better source for the
> repository...?
> Hmm....
Bruce Richardson
2010-Oct-12 08:44 UTC
Re: [Puppet Users] Re: serialized or limited parallelism
On Mon, Oct 11, 2010 at 06:44:20PM -0700, Philip Brown wrote:

> But... maybe I am thinking about this all wrong after all. Possibly
> the better way is to give facter a custom set of facts... and then,
> since facter output gets automatically stored on the master when
> puppet runs on the client

If you're running storeconfigs, which isn't a given for everybody, not least because it can cause scaling problems of its own (a memcached backend for storeconfigs would be such a win).

> , that would be a better source for the repository...?

If the data you are feeding back really is directly relevant and important to your Puppet configuration rather than incidental, then yes. If you hit scaling issues, though, you'll either need to look at an alternative method of delivering and storing the data that does scale well, or enhance your Puppet infrastructure (the mcollective suggestion was a good one).

Is it crucial that the data be current at the beginning of any Puppet run? Is the data itself used in configuring the host, or is it just the presence of current data that has an effect on how/when Puppet runs?

--
Bruce

Explode! Thousands of lemmings can't be wrong.
On Mon, Oct 11, 2010 at 06:44:20PM -0700, Philip Brown wrote:

> But... maybe I am thinking about this all wrong after all. Possibly
> the better way is to give facter a custom set of facts... and then,
> since facter output gets automatically stored on the master when
> puppet runs on the client

On Oct 12, 1:44 am, Bruce Richardson <itsbr...@workshy.org> wrote:

> Is it crucial that the data be current at the beginning of any Puppet
> run? Is the data itself used in configuring the host, or is it just the
> presence of current data that has an effect on how/when Puppet runs?

Well... this PARTICULAR data is more just inventory-type data; it is not crucial to the Puppet run itself at all.

However, in the future we will have more interest in adding on [random scripts that need to be run on all machines, but not at exactly the same time].
Nigel Kersten
2010-Oct-12 16:02 UTC
Re: [Puppet Users] Re: serialized or limited parallelism
On Tue, Oct 12, 2010 at 8:48 AM, Philip Brown <phil.googlenews@bolthole.com> wrote:

> On Mon, Oct 11, 2010 at 06:44:20PM -0700, Philip Brown wrote:
>> But... maybe I am thinking about this all wrong after all. Possibly
>> the better way is to give facter a custom set of facts... and then,
>> since facter output gets automatically stored on the master when
>> puppet runs on the client
>
> On Oct 12, 1:44 am, Bruce Richardson <itsbr...@workshy.org> wrote:
>> Is it crucial that the data be current at the beginning of any Puppet
>> run? Is the data itself used in configuring the host, or is it just the
>> presence of current data that has an effect on how/when Puppet runs?
>
> Well... this PARTICULAR data is more just inventory-type data; it is
> not crucial to the Puppet run itself at all.
>
> However, in the future we will have more interest in adding on [random
> scripts that need to be run on all machines, but not at exactly the
> same time].

Then it sounds like you should either:

* construct a defined type around an exec that has an onlyif/unless parameter to check the freshness of the db entry, or
* write a fact to make the same check, and use that in your manifests.

I'm a little confused though, as your initial post seems to describe an orchestration problem, but your clarification simply makes it sound like you want to implement a TTL check on the db entry for each node?

--
nigel
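A rough sketch of the first suggestion, purely as an illustration: the script path, stamp file and four-hour window below are placeholders, not anything from this thread. The exec refreshes the central entry and touches a local stamp file, and the unless test skips the run while that stamp is still fresh, so the resource stays idempotent in the "no older than X hours" sense.

  # All names here are hypothetical; wrap this in a defined type if you
  # want the TTL as a parameter, as suggested above.
  exec { 'refresh-inventory-entry':
    command => '/bin/sh -c "/usr/local/bin/update_inventory.sh && touch /var/tmp/inventory_updated"',
    # Skip while the local stamp file is younger than ~4 hours.
    unless  => '/bin/sh -c "find /var/tmp/inventory_updated -mmin -240 2>/dev/null | grep -q ."',
    path    => ['/bin', '/usr/bin', '/usr/local/bin'],
  }

The fact-based variant would expose the same freshness test as a custom fact and guard the exec (or a whole class) with a conditional on that fact.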
Bruce Richardson
2010-Oct-12 17:49 UTC
Re: [Puppet Users] Re: serialized or limited parallelism
On Tue, Oct 12, 2010 at 08:48:30AM -0700, Philip Brown wrote:

>> Is it crucial that the data be current at the beginning of any Puppet
>> run? Is the data itself used in configuring the host, or is it just the
>> presence of current data that has an effect on how/when Puppet runs?
>
> Well... this PARTICULAR data is more just inventory-type data; it is
> not crucial to the Puppet run itself at all.

Then I really do advise you not to use Puppet as the transport. Use Puppet to set up the data collection and transmission/retrieval processes and let them run independently.

> However, in the future we will have more interest in adding on random
> scripts that need to be run on all machines, but not at exactly the
> same time.

And will *those* be integral to Puppet's operation, or are you just hoping that Puppet can provide you with a way of managing parallel execution?

--
Bruce

If the universe were simple enough to be understood, we would be too simple to understand it.
R.I.Pienaar
2010-Oct-12 18:01 UTC
Re: [Puppet Users] Re: serialized or limited parallelism
hello,

----- "Bruce Richardson" <itsbruce@workshy.org> wrote:

> On Tue, Oct 12, 2010 at 08:48:30AM -0700, Philip Brown wrote:
>> Well... this PARTICULAR data is more just inventory-type data; it is
>> not crucial to the Puppet run itself at all.

Just expanding on the mcollective suggestion a bit, since it sounds like it really is what you want.

Mcollective will run on each node as a daemon. You can do command and control of your machines with it, but that's a side issue in this case. For your inventory setup, though, it has an optional system called 'registration'. The registration system will send on an interval, say 30 or 3000 seconds; you get to decide what is in the registration data and you get to decide where to put it.

I'd say wherever you put it needs to be very, very fast. You can't be updating hundreds of rows in a relational database; rather, you want to think about something like a document-oriented NoSQL database. I have code available that sends all the puppet classes and facts on a node to a mongodb instance, from where you can build your web frontends or whatever you want.
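On the agent side, registration is just a couple of settings in each node's server.cfg, roughly along these lines (the plugin name and interval shown are only an example; check the mcollective registration docs for your version):

  # Hypothetical server.cfg fragment: send registration data every 300
  # seconds using a registration plugin. The receiver that writes the
  # data into mongodb (or wherever) is configured separately.
  registration = Meta
  registerinterval = 300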
On Oct 11, 2:09 pm, Philip Brown <phil.googlen...@bolthole.com> wrote:

> Is there any pre-existing functionality in Puppet to limit parallelism?

There's nothing inherent in puppet besides the 'splay' options. The other common solution is to use the $rand_from_fqdn custom fact pattern. That said, puppet nodes typically won't know of the existence/state of other nodes.

> Ideally I would like some kind of Puppet mechanism that says, "run this
> script, but only if there are fewer than 10 other machines doing the
> same thing at this particular moment".

I think this is a different tool, which I use mcollective for. Capturing registration metadata is pretty simple; see R.I.'s suggestion. For orchestrating concurrent actions, look at 'mc-puppetd runall' for an example. It runs puppetd on all of the available nodes with a specified concurrency. Writing your own version of that plugin should be trivial.
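For example, something along these lines keeps at most ten nodes running at once (the concurrency argument is from memory, so check the plugin's own help for the exact syntax):

  # Kick off puppetd on every discovered node, at most 10 concurrently.
  mc-puppetd runall 10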