Hi,

My environment is composed of ~250 workstations hitting a single puppetmaster server, which has been working fairly well up until now. The most recent change has been a migration of a lot of remote file copy objects which were previously handled with cfengine.

Client-side puppetd calls to the puppetmaster.getconfig method are taking unreasonably long, on the order of 2-3 minutes. It typically takes the server about 1 to 2 seconds to compile the configuration for each node.

In an effort to mitigate the problem, I've switched all fileserver operations to another server process, ensuring only CA methods and configuration methods are being called from the default server process. This is described in:
http://reductivelabs.com/cgi-bin/puppet.cgi/wiki/PuppetScalability

Even after offloading all fileserver operations, the getconfig method is taking a minute or more on average. I'm currently running puppet every half hour from cron with a 15-minute splay.

I'm wondering if anyone else has suggestions or insight into reducing response time in a setup like this.

Cheers,
-- 
Jeff McCune
The Ohio State University
Department of Mathematics
Systems Manager
Hey Jeff,

I think that this points back to a previous discussion about the lack of parallelism in puppet, with an individual puppet server not being able to support multiple actions in parallel. I can see three potential solutions to your problem.

First, you could set up a different type of file serving mechanism, such as SSL-ified HTTP or Subversion. Second, you could split your configurations down even further and set up additional puppet server processes to support different node sets via different ports, making sure that your nodes are roughly load balanced. Last, fully load balance your puppet server processes via DNS and possibly NAT, so that multiple identical server processes handle your clients in a round-robin setup.

Of course, there is the possibility that I'm completely wrong, because I haven't tried any of these :-).

Trevor

On 2/22/07, Jeff McCune <mccune@math.ohio-state.edu> wrote:
> My environment is composed of ~250 workstations hitting a single
> puppetmaster server, which has been working fairly well up until now.
> [...]
Jeff McCune wrote:
> My environment is composed of ~250 workstations hitting a single
> puppetmaster server, which has been working fairly well up until now.
> The most recent change has been a migration of a lot of remote file copy
> objects which were previously handled with cfengine.

Grr... Looks like it's worse than simply being slow to respond.

Even with the split file server, I'm now seeing these on my clients:

debug: Calling fileserver.describe
err: Could not call fileserver.describe: #<Errno::ECONNRESET: Connection reset by peer>
err: /default/dispatcher/cfengine/cfile[root-140.254.93.76.pub]/File[/var/cfengine/ppkeys/root-140.254.93.76.pub]/source: Could not describe /cf_master_dynamic/manage-files/ppkeys/root-140.254.93.76.pub: Connection reset by peer

Ideas?

-- 
Jeff McCune
The Ohio State University
Department of Mathematics
Systems Manager
On Thu, 2007-02-22 at 10:51 -0500, Jeff McCune wrote:
> Client-side puppetd calls to the puppetmaster.getconfig method are
> taking unreasonably long, on the order of 2-3 minutes. It typically
> takes the server about 1 to 2 seconds to compile the configuration for
> each node.

That's not good .. are those measurements based on all clients hitting the server, or do you see them with only a single client accessing the server?

> I'm wondering if anyone else has suggestions or insight into reducing
> response time in a setup like this.

Another band-aid you can try is some simple round-robin load balancing. I'd guess that you could even run several server processes on the same machine listening on different ports.

The real fix of course is to understand better what takes the server so long to compile the manifest and fix that .. but that might take a while ;)

David
David Lutterkort wrote:
> On Thu, 2007-02-22 at 10:51 -0500, Jeff McCune wrote:
>> Client-side puppetd calls to the puppetmaster.getconfig method are
>> taking unreasonably long, on the order of 2-3 minutes. It typically
>> takes the server about 1 to 2 seconds to compile the configuration for
>> each node.
>
> That's not good .. are those measurements based on all clients hitting
> the server, or do you see them with only a single client accessing the
> server?

All clients. My test server, running on an alternate port of the same host with only my test client hitting it, compiles the configuration in approximately 1 second, and the getconfig method doesn't take much longer than that from the client's perspective.

I'm looking at subversion as a replacement. Dynamic content will still have to reside in puppet, but the static files might as well be subversion working copies on each node's local disk.

Cheers,
-- 
Jeff McCune
The Ohio State University
Department of Mathematics
Systems Manager
On Thu, Feb 22, 2007 at 11:02:38AM -0500, Jeff McCune wrote:
> Grr... Looks like it's worse than simply being slow to respond.
>
> Even with the split file server, I'm now seeing these on my clients:
>
> debug: Calling fileserver.describe
> err: Could not call fileserver.describe: #<Errno::ECONNRESET: Connection
> reset by peer>

When I start getting ECONNRESETs, the only thing that has reliably worked for me to resolve them is a restart of the puppetmaster. I've never looked particularly deeply into the problem, though.

- Matt
On Thu, Feb 22, 2007 at 10:51:38AM -0500, Jeff McCune wrote:
> In an effort to mitigate the problem, I've switched all fileserver
> operations to another server process, ensuring only CA methods and
> configuration methods are being called from the default server process.
> This is described in:
> http://reductivelabs.com/cgi-bin/puppet.cgi/wiki/PuppetScalability

Ick. I just have localised servers accessible from a "puppetfiles" DNS entry, and get all my files from that. Much less effort involved.

> Even after offloading all fileserver operations, the getconfig method is
> taking a minute or more on average. I'm currently running puppet every
> half hour from cron with a 15-minute splay.
>
> I'm wondering if anyone else has suggestions or insight into reducing
> response time in a setup like this.

I had this problem when I had all machines hammering one puppetmaster for everything. Switching to inlining as many files as possible (content => template(...) instead of source => puppet://), and having individual puppetfiles servers per subdomain, has removed my problems entirely (though the load on my puppetmaster is a bit smaller than yours).

I think the problem is that the puppetmaster can only handle one manifest compilation at a time, so as soon as you have getconfig requests coming in faster than the puppetmaster can handle them, you get a backlog of requests and your times start shooting up. That was the symptom I was seeing -- once the puppetmaster got overloaded, it was all over.

Unfortunately, simply multi-threading Webrick (which is apparently supported) won't help, as when the puppetmaster is compiling a manifest it tends to chew all CPU for a while (and Ruby's green threads preclude multi-core performance enhancements). On top of that, the thread-safety of the puppetmaster is an unknown (to me, anyway) quantity.

Modifying Webrick to spawn multiple child processes to process requests in parallel (a la Apache's worker MPM) would improve performance on multi-{CPU,core} machines, but would involve modifying webrick. A composite approach, of multi-threading webrick and spawning separate processes to handle the actual manifest compilation, is theoretically possible, but scary at a practical level. DRb might help here, but I've never used it.

Finally, there's making the puppetmaster less of a resource hog. I can't comment on that really, since I don't know how efficient it is at the moment -- perhaps turning off the --run-like-a-dog option would do the trick, or maybe it's already going as fast as it possibly can.

- Matt

-- 
Politics and religion are just like software and hardware. They all suck,
the documentation is provably incorrect, and all the vendors tell lies.
-- Andrew Dalgleish, in the Monastery
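[To make the inlining Matt describes concrete, here are the two forms side by side. The first pulls the file through the puppetmaster's fileserver on every run; the second renders the content into the compiled manifest, so no fileserver round-trips are needed. The file name, mount name, and template path are invented for this example, not taken from Matt's manifests.]

    # Served through the central fileserver: each client makes
    # fileserver.describe/retrieve calls back to the puppetmaster.
    file { "/etc/ntp.conf":
        source => "puppet://puppet/files/ntp.conf",
        owner  => root,
        mode   => 644
    }

    # Inlined: the rendered file body travels inside the compiled
    # manifest itself.
    file { "/etc/ntp.conf":
        content => template("ntp/ntp.conf.erb"),
        owner   => root,
        mode    => 644
    }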
On Feb 23, 2007, at 1:25 AM, Matthew Palmer wrote:
> I had this problem when I had all machines hammering one puppetmaster for
> everything. Switching to inlining as many files as possible (content =>
> template(...) instead of source => puppet://), and having individual
> puppetfiles servers per subdomain, has removed my problems entirely
> (though the load on my puppetmaster is a bit smaller than yours).

This is probably a good idea when possible. It might even be a good idea to encourage people to move entirely to it, but I haven't thought about it enough yet.

> I think the problem is that the puppetmaster can only handle one manifest
> compilation at a time, so as soon as you have getconfig requests coming in
> faster than the puppetmaster can handle them, you get a backlog of
> requests and your times start shooting up. That was the symptom I was
> seeing -- once the puppetmaster got overloaded, it was all over.

Yeah, I think the single-threaded nature of webrick is the problem.

> Unfortunately, simply multi-threading Webrick (which is apparently
> supported) won't help, as when the puppetmaster is compiling a manifest it
> tends to chew all CPU for a while (and Ruby's green threads preclude
> multi-core performance enhancements). On top of that, the thread-safety of
> the puppetmaster is an unknown (to me, anyway) quantity.

As far as I have been able to tell, the master should be thread safe, but I haven't been able to test that. Parsing is not stateful, so it should be fine.

> Modifying Webrick to spawn multiple child processes to process requests in
> parallel (a la Apache's worker MPM) would improve performance on
> multi-{CPU,core} machines, but would involve modifying webrick.

Well, I could certainly just fork, compile, and exit, but then you could pretty easily fork-bomb a machine.

> A composite approach, of multi-threading webrick and spawning separate
> processes to handle the actual manifest compilation, is theoretically
> possible, but scary at a practical level. DRb might help here, but I've
> never used it.

I'm currently working on replacing webrick with Mongrel; as in, this is the main thing I've been working on the last couple of weeks. I'm at the "functional prototype" phase, and I hope to have something more production-worthy within a week or two. However, I don't know how much this will really help; it should at least be better on multi-core machines, but I don't yet know how much better. I would definitely like volunteers to be prepared to test when it is ready.

> Finally, there's making the puppetmaster less of a resource hog. I can't
> comment on that really, since I don't know how efficient it is at the
> moment -- perhaps turning off the --run-like-a-dog option would do the
> trick, or maybe it's already going as fast as it possibly can.

I haven't spent much time on performance optimizations in the compiler, because individual compiles only take a couple of seconds, which means a single process should be able to, on average, handle at least 12 nodes a minute (at 5 seconds a compile) and thus 360 nodes every half an hour, assuming perfect serialization. With a single mongrel cluster running on one multi-core box, with one process per core, you should scale pretty easily to hundreds of nodes, and if you get much beyond that, you're going to want to have multiple servers anyway, I expect.

We clearly need to switch fileserving to plain HTTP instead of xmlrpc, although it won't be straightforward to make this backward compatible.
-- 
It's impossible to foresee the consequences of being clever.
-- Christopher Strachey
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
Perhaps it could help: for file serving I use this little define, since transferring phpMyAdmin with the puppet file type is simply too slow, as it checks every file for owner, mode, etc. :)

define filevault(
    $source,
    $destination,
    $recursif = 'true',
    $deletesource = 'delete',
    $exclude = '',
    $pwf = '/usr/local/.aqadmin/etc/rsyncd.pwf'
) {
    file { $pwf:
        content => 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
        before  => Exec["rsync-$source-$destination"],
        mode    => 600
    }

    exec { "rsync-$source-$destination":
        command => $recursif ? {
            true  => "rsync --password-file=$pwf -raz --exclude=$exclude --$deletesource aqueossync@$puppetserver::share_rsync/$source $destination",
            false => "rsync --password-file=$pwf -az --exclude=$exclude --$deletesource aqueossync@$puppetserver::share_rsync/$source $destination"
        }
    }
}

and use it like this:

filevault { phpmyadmin:
    source       => 'webaps/phpMyAdmin/',
    destination  => $chemin,
    recursif     => true,
    deletesource => delete,
    exclude      => 'config.inc.php'
}

Just set up an rsync server on the puppetserver host to serve your files, and use this define instead of the file type to move files. This is not mega secure, as rsyncd does not encrypt data at all (perhaps puppet encrypts files before sending them?). Beware: as you can see, the --delete flag is used by default, so use at your own risk :)

-- 
Cordialement,
Ghislain ADNET.
AQUEOS.
Note: any support request or domain order sent by email will be refused; for that, please use https://support.aqueos.net. For contact details: http://www.aqueos.com/aqueos-services-informatiques-societe.php Fax: 01.72.70.32.66
On Fri, Feb 23, 2007 at 02:54:07PM +0000, Luke Kanies wrote:
> On Feb 23, 2007, at 1:25 AM, Matthew Palmer wrote:
>> I had this problem when I had all machines hammering one puppetmaster for
>> everything. Switching to inlining as many files as possible (content =>
>> template(...) instead of source => puppet://), and having individual
>> puppetfiles servers per subdomain, has removed my problems entirely
>> (though the load on my puppetmaster is a bit smaller than yours).
>
> This is probably a good idea when possible. It might even be a good
> idea to encourage people to move entirely to it, but I haven't
> thought about it enough yet.

You can't do it for recursive file transfers (which are the ones that get real big, real quick), and it does make the manifest data being sent quite large, which can be a problem.

>> I think the problem is that the puppetmaster can only handle one manifest
>> compilation at a time, so as soon as you have getconfig requests coming in
>> faster than the puppetmaster can handle them, you get a backlog of
>> requests and your times start shooting up. That was the symptom I was
>> seeing -- once the puppetmaster got overloaded, it was all over.
>
> Yeah, I think the single-threaded nature of webrick is the problem.

Webrick. Is. Not. Single. Threaded. Where did this idea come from? When Rails uses webrick, it only processes a single request at a time, but that's because Rails isn't thread safe.

>> Modifying Webrick to spawn multiple child processes to process requests in
>> parallel (a la Apache's worker MPM) would improve performance on
>> multi-{CPU,core} machines, but would involve modifying webrick.
>
> Well, I could certainly just fork, compile, and exit, but then you
> could pretty easily fork-bomb a machine.

As you can with anything that runs out of inetd. If you need to, keep a count of how many active compilations are running and limit it somehow.

>> A composite approach, of multi-threading webrick and spawning separate
>> processes to handle the actual manifest compilation, is theoretically
>> possible, but scary at a practical level. DRb might help here, but I've
>> never used it.
>
> I'm currently working on replacing webrick with Mongrel; as in, this
> is the main thing I've been working on the last couple of weeks. I'm
> at the "functional prototype" phase, and I hope to have something
> more production-worthy within a week or two. However, I don't know
> how much this will really help; it should at least be better on
> multi-core machines, but I don't yet know how much better. I would
> definitely like volunteers to be prepared to test when it is ready.

Last time we talked about this, Mongrel was single-threaded, lacked SSL support, and didn't really qualify as a "real" webserver. Does this switch to Mongrel mean that we're going to have to start running Apache in front of our puppetmasters?

> We clearly need to switch fileserving to plain HTTP instead of
> xmlrpc, although it won't be straightforward to make this backward
> compatible.

Can this be supported without losing the ability to not download a file unless we actually need it? I would have thought that we're still going to need some interface to say "does this file have the appropriate checksum?".

- Matt

-- 
(And don't even mention the Army Of Cultists that pop up every time you
claim that it might be less than absolutely perfect for every purpose ever
conceived.) -- Dave Brown, ASR, on MacOS X
On Fri, Feb 23, 2007 at 04:14:20PM +0100, ADNET Ghislain wrote:
> exec { "rsync-$source-$destination":

Hmm. There's an interesting possibility. Anyone know of a librsync-ruby?

- Matt

-- 
I don't do veggies if I can help it. -- stevo
If you could see your colon, you'd be horrified. -- Iain Broadfoot
If he could see his colon, he'd be management. -- David Scheidt
On Feb 23, 2007, at 10:23 PM, Matthew Palmer wrote:
> You can't do it for recursive file transfers (which are the ones that get
> real big, real quick), and it does make the manifest data being sent quite
> large, which can be a problem.

True on both accounts. I clearly don't have a good solution.

> Webrick. Is. Not. Single. Threaded. Where did this idea come from?
> When Rails uses webrick, it only processes a single request at a time, but
> that's because Rails isn't thread safe.

I guess I just thought it was, and maybe I thought Puppet would be behaving better if webrick were multithreaded. Either way, it doesn't matter all that much, because they aren't real threads, so a single process can't scale across multiple cores.

The two problems with webrick right now are that it can't use all the cores on a machine, and that something's happening during compilation that causes an exponential increase in compile times when too many machines hit it. The only way to fix the first problem is with forking, and I have no idea of the source of the second problem, and thus I've no idea how to fix it. It's quite possible that it's somehow intrinsic to some of my code, but I don't think so.

> As you can with anything that runs out of inetd. If you need to, keep a
> count of how many active compilations are running and limit it somehow.

True, but most of those apps are a bit more lightweight than Puppet. Point taken, though.

> Last time we talked about this, Mongrel was single-threaded, lacked SSL
> support, and didn't really qualify as a "real" webserver. Does this switch
> to Mongrel mean that we're going to have to start running Apache in front of
> our puppetmasters?

If you want the scaling that Mongrel provides, yes. I'll always support the degenerate, out-of-the-box usage of webrick, because it's wicked easy and clearly works fine for small sites. If we can find a way to make webrick scale, I'm all for that, because it sure is easy, but it seemed easier to do what everyone else is doing rather than figuring it all out on my own.

I agree that mongrel isn't a great solution, but the only real options are mod_ruby plus apache (everyone seems to hate mod_ruby and it seems to be poorly maintained), fcgi + something (apparently this is fine as long as you don't use apache, because apache's fcgi configs are heinous), or mongrel + something. Pretty much everyone is now using mongrel, so I've been looking at using it with pound, which is the only lightweight webserver I've found that supports client certs. The cert info would be sent to mongrel as part of the http request.

I'd love not to have to do it this way, so maybe I should just start forking per request. Jeff, are you willing to test this? It's about two lines of code to see how it works for you.

>> We clearly need to switch fileserving to plain HTTP instead of
>> xmlrpc, although it won't be straightforward to make this backward
>> compatible.
>
> Can this be supported without losing the ability to not download a file
> unless we actually need it? I would have thought that we're still going to
> need some interface to say "does this file have the appropriate checksum?".

Yeah; file description would still be over xmlrpc, only the file retrieval would be plain html.

-- 
The Washington Bullets are changing their name. The owners no longer want
their team's name to be associated with crime. So from now on the team
will be known as The Bullets.
-- Paul Harvey, quoting Argus Hamilton
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
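[For reference, the "fork per request" idea with a cap on concurrent compiles, as Matt suggests above, could be sketched roughly like this in Ruby. handle_request and compile_config are made-up names for illustration, not Puppet's real internals, and the throttle is deliberately crude.]

    # Rough sketch of fork-per-request with a concurrency limit; the
    # names handle_request and compile_config are hypothetical.
    require 'thread'

    MAX_COMPILES = 4          # assumed limit, e.g. one process per core
    $active = 0
    $lock   = Mutex.new

    def handle_request(request)
      # Crude throttle: wait until a compile slot is free.
      sleep 0.1 while $lock.synchronize { $active } >= MAX_COMPILES
      $lock.synchronize { $active += 1 }

      pid = fork do
        compile_config(request) # child does the expensive compile and exits
        exit!(0)
      end

      Thread.new do
        Process.wait(pid)       # reap the child, then free the slot
        $lock.synchronize { $active -= 1 }
      end
    end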
On Feb 23, 2007, at 10:24 PM, Matthew Palmer wrote:
> On Fri, Feb 23, 2007 at 04:14:20PM +0100, ADNET Ghislain wrote:
>> exec { "rsync-$source-$destination":
>
> Hmm. There's an interesting possibility. Anyone know of a librsync-ruby?

There's an rsync library, but it's not compatible with rsync. I didn't bother checking for ruby bindings when I found that out.

The only problem with using rsync instead of the internal fileserver is that you'd need to generate events when files changed, which is the reason why I didn't do it in the first place. Well, that, and it's a whole different authentication infrastructure and transport.

However, my plans went somewhat awry, since you can't subscribe to files that are within recursive copies (e.g., if you're recursively copying /etc/ssh, you can't subscribe directly to /etc/ssh/sshd_config), because the relationships have to be realized before the recursion happens. Although, as I type this, I realize I might have found a way to fix this.

-- 
Everything that is really great and inspiring is created by the
individual who can labor in freedom. -- Albert Einstein
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
On Fri, Feb 23, 2007 at 10:49:16PM +0100, Luke Kanies wrote:
> I agree that mongrel isn't a great solution, but the only real
> options are mod_ruby plus apache (everyone seems to hate mod_ruby and
> it seems to be poorly maintained), fcgi + something (apparently this
> is fine as long as you don't use apache, because apache's fcgi
> configs are heinous), or mongrel + something.

That's the irony -- mod_ruby sucks and is Apache-only, FCGI only works half-decently on lighttpd, but lighty doesn't do SSL at all well.

> Pretty much everyone is now using mongrel, so I've been looking at using
> it with pound, which is the only lightweight webserver I've found that
> supports client certs. The cert info would be sent to mongrel as part of
> the http request.

Presumably with Mongrel only listening on localhost, to limit the chance of someone going around the front-end?

>> We clearly need to switch fileserving to plain HTTP instead of
>> xmlrpc, although it won't be straightforward to make this backward
>> compatible.
>
> Yeah; file description would still be over xmlrpc, only the file
> retrieval would be plain html.

I hope you meant "plain HTTP" there. With the file descs still being an XML-RPC call, you're not going to get much performance improvement without improving the performance of the file description code. The HTTP requests to the actual file will still need to be handled by dynamic code, too, to ensure that only authorised clients are getting the files.

- Matt

-- 
"[the average computer user] has been served so poorly that he expects his
system to crash all the time, and we witness a massive worldwide
distribution of bug-ridden software for which we should be deeply ashamed."
-- Edsger Dijkstra
On Sat, 2007-02-24 at 10:07 +1100, Matthew Palmer wrote:
> I hope you meant "plain HTTP" there. With the file descs still being an
> XML-RPC call, you're not going to get much performance improvement without
> improving the performance of the file description code. The HTTP requests
> to the actual file will still need to be handled by dynamic code, too, to
> ensure that only authorised clients are getting the files.

The big difference is that over plain HTTP you can easily stream the file without reading the whole thing into memory. That's not possible with XMLRPC.

David
On Feb 24, 2007, at 12:07 AM, Matthew Palmer wrote:
> That's the irony -- mod_ruby sucks and is Apache-only, FCGI only works
> half-decently on lighttpd, but lighty doesn't do SSL at all well.

I.e., there are no great options. Do you think I should focus on a fork-based system instead of mongrel?

I can only imagine that mongrel's speed is based on its html parsing, but... I don't really do any html parsing in Puppet, so I'm not that concerned about it.

> Presumably with Mongrel only listening on localhost, to limit the chance of
> someone going around the front-end?

Essentially, yes. Of course, you could set up a load balancer in front of a private network, so it wouldn't have to be just localhost, but that would be the default.

>> Yeah; file description would still be over xmlrpc, only the file
>> retrieval would be plain html.
>
> I hope you meant "plain HTTP" there. With the file descs still being an
> XML-RPC call, you're not going to get much performance improvement without
> improving the performance of the file description code. The HTTP requests
> to the actual file will still need to be handled by dynamic code, too, to
> ensure that only authorised clients are getting the files.

Yes, of course; that's what I get for writing email when I'm exhausted.

Some significant part of the slowness is definitely the file description code, but the absolute killer is base64-encoding the files and then decoding them on the client, which requires a minimum of 3 copies of the file in memory and takes a decent amount of cpu.

Really, we need to switch to REST instead of xmlrpc, which will reduce overhead all around, but that's less urgent than this fileserving change. If anyone's excited about doing this, I'd love to get it done, but it's disappointingly low on my priority list.

-- 
It is curious that physical courage should be so common in the world
and moral courage so rare. -- Mark Twain
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
On Feb 24, 2007, at 2:33 AM, David Lutterkort wrote:
> The big difference is that over plain HTTP you can easily stream the
> file without reading the whole thing into memory. That's not possible
> with XMLRPC.

Exactly -- I get to use the native web server's (and client's) ability to send files around, and I don't have to touch them at all, other than handling the authentication and authorization.

-- 
Kai's Example Dilemma: A good analogy is like a diagonal frog.
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
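[To make the contrast concrete: an XML-RPC response has to hold the whole file, Base64-encoded, in memory, while plain HTTP lets the server hand the file over in small chunks. The sketch below is generic Ruby for illustration only, not Puppet's actual server code.]

    require 'base64'

    # XML-RPC style: the whole file is read and encoded, so the raw and
    # encoded copies both sit in memory at once.
    def xmlrpc_payload(path)
      Base64.encode64(File.read(path))
    end

    # Plain-HTTP style: stream the file in 8 KB chunks without ever
    # holding the whole thing.
    def stream_file(path, socket)
      File.open(path, 'rb') do |f|
        while chunk = f.read(8192)
          socket.write(chunk)
        end
      end
    end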
>> That's the irony -- mod_ruby sucks and is Apache-only, FCGI only works
>> half-decently on lighttpd, but lighty doesn't do SSL at all well.
>
> I.e., there are no great options. Do you think I should focus on a
> fork-based system instead of mongrel?

Right now, it seems like having a system that runs "right out of the box" is a very useful piece to have. If we can maintain that and add little pieces to scale it, that seems to be the way to go rather than convert over to something else right at this moment. Without SSL et al, the bar to entry for the puppet install is going to go up, and that seems fairly critical right now. Webrick sounds like the most complete base to use, at least for now.

If we think it'll fix things for now, I'd vote for a straight fork, or a series of forked-off drb workers (they can hang around, can be an inherent way of limiting connections, can be spread to other systems, etc).

--mac
On Sat, Feb 24, 2007 at 11:19:32AM +0100, Luke Kanies wrote:
> On Feb 24, 2007, at 12:07 AM, Matthew Palmer wrote:
>> That's the irony -- mod_ruby sucks and is Apache-only, FCGI only works
>> half-decently on lighttpd, but lighty doesn't do SSL at all well.
>
> I.e., there are no great options. Do you think I should focus on a
> fork-based system instead of mongrel?

I honestly don't know. In a "grand plan" sense, making Webrick suck less would be a big win for a lot of things, but it doesn't seem like *anyone* wants to take that job on.

> I can only imagine that mongrel's speed is based on its html parsing,
> but... I don't really do any html parsing in Puppet, so I'm not that
> concerned about it.

I think mongrel's speed is based on it being half a webserver.

> Some significant part of the slowness is definitely the file
> description code, but the absolute killer is base64-encoding the
> files and then decoding them on the client, which requires a minimum
> of 3 copies of the file in memory and takes a decent amount of cpu.

Yeah, David's mention of "file transfer over XML-RPC" reminded me of the horrors there.

- Matt

-- 
How about "suspender snapping three martini lunching mahogany tabled
conference room equipped with overhead projector dwelling golden parachute
flying bill gates specifying buzzword spewing computerworld and datamation
reading trend bandwagoneering meeting going morons". -- Tom O'Toole
On Feb 25, 2007, at 1:51 AM, Chris McEniry wrote:
> Right now, it seems like having a system that runs "right out of the
> box" is a very useful piece to have. If we can maintain that and
> add little pieces to scale it, that seems to be the way to go rather
> than convert over to something else right at this moment. Without
> SSL et al, the bar to entry for the puppet install is going to go
> up, and that seems fairly critical right now.

I absolutely guarantee that Puppet will always work out of the box, using webrick or whatever. My plan until this thread started was to support multiple servers, like Rails does, such that you'd start with webrick, and if you had scaling problems you could switch to a higher-powered server.

> Webrick sounds like the most complete base to use, at least for now.
> If we think it'll fix things for now, I'd vote for a straight fork, or a
> series of forked-off drb workers (they can hang around, can be an inherent
> way of limiting connections, can be spread to other systems, etc).

Anyone have any expertise in this area and interested in helping? I've no real idea even where to start, other than being pretty convinced that it's probably a bad idea to fork for every connection I get.

Also, is anyone who has plenty of nodes in a position to test any of this, or is anyone willing to otherwise find a way to load-test the server?

-- 
Normal is getting dressed in clothes that you buy for work and driving
through traffic in a car that you are still paying for - in order to get to
the job you need to pay for the clothes and the car, and the house you
leave vacant all day so you can afford to live in it. -- Ellen DeGeneres
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
On Feb 25, 2007, at 6:26 AM, Matthew Palmer wrote:
>> I.e., there are no great options. Do you think I should focus on a
>> fork-based system instead of mongrel?
>
> I honestly don't know. In a "grand plan" sense, making Webrick suck less
> would be a big win for a lot of things, but it doesn't seem like *anyone*
> wants to take that job on.

I hadn't even thought about modifying the webrick core, just about doing other stuff outside (like forking) to make it behave better for Puppet's usage. I've no real idea where Webrick's suckage comes from, so I'm not much in a position to make it better. It doesn't help that I don't actually even know how webrick sucks, other than knowing that it can't run across more than one core and that multiple compiles seem to cause exponential slowdowns.

>> I can only imagine that mongrel's speed is based on its html parsing,
>> but... I don't really do any html parsing in Puppet, so I'm not that
>> concerned about it.
>
> I think mongrel's speed is based on it being half a webserver.

Well, I know it's got some C code for the performance-sensitive part, including a C-based html parser, but I wouldn't have thought that html parsing had a significant effect on web server speed. It'd be interesting to drop a different html parser into webrick to see if that made things better.

-- 
A classic is something that everybody wants to have read and nobody
wants to read. -- Mark Twain
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
On Thu, Feb 22, 2007 at 10:51:38AM -0500, Jeff McCune wrote:
> My environment is composed of ~250 workstations hitting a single
> puppetmaster server, which has been working fairly well up until now.
> The most recent change has been a migration of a lot of remote file copy
> objects which were previously handled with cfengine.
> [...]
> I'm wondering if anyone else has suggestions or insight into reducing
> response time in a setup like this.

Hi,

I tried to reproduce this on my systems without any success: restarting puppet on 50 nodes (compile time is 2-5 secs normally for each one), the server manages to serve *all* clients in under 30 secs. There are less than a dozen files in the fileserver for those nodes, though.

Kostas
Luke Kanies wrote:
> On Feb 23, 2007, at 10:23 PM, Matthew Palmer wrote:
>> You can't do it for recursive file transfers (which are the ones that get
>> real big, real quick), and it does make the manifest data being sent quite
>> large, which can be a problem.
>
> True on both accounts. I clearly don't have a good solution.

My current solution is a discrete subversion repository which only the puppet clients can authenticate to.

>> Webrick. Is. Not. Single. Threaded. Where did this idea come from?
>> When Rails uses webrick, it only processes a single request at a time, but
>> that's because Rails isn't thread safe.
>
> I guess I just thought it was, and maybe I thought Puppet would be
> behaving better if webrick were multithreaded.

FWIW: http://wiki.rubyonrails.org/rails/pages/HowTosWorkerThreads

"First some background: WebRick is a single threaded server. If a request from a browser comes in for dynamic content (e.g. a .rhtml page) there is a single Mutex lock that gets acquired and held while the request is being processed."

>> Last time we talked about this, Mongrel was single-threaded, lacked SSL
>> support, and didn't really qualify as a "real" webserver. Does this switch
>> to Mongrel mean that we're going to have to start running Apache in front of
>> our puppetmasters?
>
> If you want the scaling that Mongrel provides, yes. I'll always
> support the degenerate, out-of-the-box usage of webrick, because it's
> wicked easy and clearly works fine for small sites. If we can find a
> way to make webrick scale, I'm all for that, because it sure is easy,
> but it seemed easier to do what everyone else is doing rather than
> figuring it all out on my own.

I'd be quite happy running apache in front of puppet. It gives us a *lot* of functionality. Case in point: Subversion repositories.

> I agree that mongrel isn't a great solution, but the only real
> options are mod_ruby plus apache (everyone seems to hate mod_ruby and
> it seems to be poorly maintained), fcgi + something (apparently this
> is fine as long as you don't use apache, because apache's fcgi
> configs are heinous), or mongrel + something. Pretty much everyone
> is now using mongrel, so I've been looking at using it with pound,
> which is the only lightweight webserver I've found that supports
> client certs. The cert info would be sent to mongrel as part of the
> http request.

I tried to get lighttpd checking client certs about a year ago. The developers stated it probably won't ever happen.

> I'd love not to have to do it this way, so maybe I should just start
> forking per request. Jeff, are you willing to test this? It's about
> two lines of code to see how it works for you.

I'd love to test forking per request.

> Yeah; file description would still be over xmlrpc, only the file
> retrieval would be plain html.

I think this is the way to go. Scaling HTTP servers is a problem that's already been solved, so we may as well take advantage of existing solutions.

Cheers,
-- 
Jeff McCune
The Ohio State University
Department of Mathematics
Systems Manager
Luke Kanies wrote:
> Also, is anyone who has plenty of nodes in a position to test any of
> this, or is anyone willing to otherwise find a way to load-test the
> server?

Count me in. I'm guessing it'd help if we sprinkle some benchmarking checks into the master and take tcpdumps while N nodes connect; we'd probably gain a lot more insight.

-- 
Jeff McCune
The Ohio State University
Department of Mathematics
Systems Manager
Kostas Georgiou wrote:
> I tried to reproduce this on my systems without any success: restarting
> puppet on 50 nodes (compile time is 2-5 secs normally for each one), the
> server manages to serve *all* clients in under 30 secs. There are less
> than a dozen files in the fileserver for those nodes, though.

Try it with one recursive file copy with about 10 directories and 30 files in the tree...

-- 
Jeff McCune
The Ohio State University
Department of Mathematics
Systems Manager
On Mon, Feb 26, 2007 at 09:53:27AM -0500, Jeff McCune wrote:
> Kostas Georgiou wrote:
>> I tried to reproduce this on my systems without any success: restarting
>> puppet on 50 nodes (compile time is 2-5 secs normally for each one), the
>> server manages to serve *all* clients in under 30 secs. There are less
>> than a dozen files in the fileserver for those nodes, though.
>
> Try it with one recursive file copy with about 10 directories and 30
> files in the tree...

Yes, that is really slow with even one client. I have some systems that do a recursive copy on a directory with ~500 files, and it takes around 120 secs even when everything is already synced and the server is idle.

It looks to me like the client fires an xmlrpc call for each file in a serial manner (does it open a new connection every time?), and the latency kills the speed.

Changing the describe, list, and retrieve rpc calls to be able to accept arrays/lists will probably improve performance a lot more than threads or forks in the server; it will also drop the cpu load by a lot, since SSL connections are quite heavy.

Kostas
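[To illustrate the kind of batching Kostas means: one round-trip describing a whole list of paths rather than one fileserver.describe call, and one SSL handshake, per file. The method name "fileserver.describe_all", the URL, and the return format below are hypothetical, not part of Puppet's actual XML-RPC interface, and client-certificate authentication is left out entirely.]

    require 'xmlrpc/client'

    # Hypothetical batched describe: the names and formats are invented.
    server = XMLRPC::Client.new2("https://puppet.example.com:8140/RPC2")
    paths  = ["/files/app/index.php", "/files/app/config.inc.php"]

    # Today (roughly): paths.each { |p| server.call("fileserver.describe", p) }
    # Batched instead: one call describing everything at once.
    descriptions = server.call("fileserver.describe_all", paths)
    descriptions.each do |path, checksum|
      puts "#{path} => #{checksum}"
    end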
On Sun, 2007-02-25 at 09:38 +0100, Luke Kanies wrote:
>> I think mongrel's speed is based on it being half a webserver.
>
> Well, I know it's got some C code for the performance-sensitive part,
> including a C-based html parser, but I wouldn't have thought that
> html parsing had a significant effect on web server speed. It'd be
> interesting to drop a different html parser into webrick to see if
> that made things better.

Where exactly does it have to do HTML parsing? If webrick wants to parse HTML when used with puppet, there's something really fishy going on.

From the rest of the thread, it sounds, though, like focusing on making the fileserver faster might be a serious win (especially in light of Kostas' and Jeff's experiments with how scalability changes when larger amounts of fileserving are involved) - that also has the advantage that it won't require hairy threading or preforking biz, only better batching of requests for describe/download of files.

David
On Mon, Feb 26, 2007 at 09:45:22AM -0500, Jeff McCune wrote:
> FWIW: http://wiki.rubyonrails.org/rails/pages/HowTosWorkerThreads
>
> "First some background: WebRick is a single threaded server. If a
> request from a browser comes in for dynamic content (e.g. a .rhtml page)
> there is a single Mutex lock that gets acquired and held while the
> request is being processed."

The only place mutexes are used in webrick is in the authentication code. The non-thread-safe code is Rails, and it's Rails that sets the global mutex to avoid problems.

- Matt

-- 
"As far as I'm concerned, spammers are nothing more than electronic
home-invasion gangs." -- Andy Markley
Kostas Georgiou wrote:
> On Mon, Feb 26, 2007 at 09:53:27AM -0500, Jeff McCune wrote:
>> Try it with one recursive file copy with about 10 directories and 30
>> files in the tree...
>
> Yes, that is really slow with even one client. I have some systems that do
> a recursive copy on a directory with ~500 files, and it takes around 120
> secs even when everything is already synced and the server is idle.
>
> It looks to me like the client fires an xmlrpc call for each file in a
> serial manner (does it open a new connection every time?), and the latency
> kills the speed.
>
> Changing the describe, list, and retrieve rpc calls to be able to accept
> arrays/lists will probably improve performance a lot more than threads or
> forks in the server; it will also drop the cpu load by a lot, since SSL
> connections are quite heavy.

I haven't looked at improving the fileserver methods yet, but as a stop-gap solution, here's what I've settled on:

I have a subversion repository using AuthzSVNAccessFile for access control, and have set up a single account with read-only access to the repository. All static content goes into the subversion repository; the dynamic content, of which there is little, reverts back to non-recursive fileserver methods.

I haven't switched all 200+ nodes over yet, but so far it looks like it's much more responsive, which is to be expected. I think there's only 4 calls to fileserver.* on each node now.

I sync the repository down to each node like so:

file { "/Support/vault":
    ensure => directory,
    owner  => 0,
    group  => 0,
    mode   => 750
}

exec { "FileCache":
    path    => "/usr/bin:/bin:/opt/local/bin:/usr/local/bin",
    command => "svn co --non-interactive \
        --username=selfupdate \
        --password=password \
        https://manage.math.ohio-state.edu/svn/siteconfig/trunk \
        /Support/vault/cache",
    require => File["/Support/vault"]
}

I then have a file_from_cache component, which always requires Exec["FileCache"]:

define file_from_cache(
    $source = false,
    $sourcedir = "/Support/vault/cache",
    $destdir = false,
    $owner = 0,
    $group = 0,
    $mode = 0640,
    $recurse = true,
    $ignore = ".svn",
    $backup = false,
    $require = false
) {
    $source_real = $source ? {
        false   => $name,
        default => $source
    }
    $name_real = $destdir ? {
        false   => $name,
        default => "$destdir/$name"
    }
    $require_real = $require ? {
        false   => Exec["FileCache"],
        default => [ Exec["FileCache"], $require ]
    }

    file { $name_real:
        source  => "$sourcedir/$source_real",
        owner   => $owner,
        group   => $group,
        mode    => $mode,
        recurse => $recurse,
        ignore  => $ignore,
        backup  => $backup,
        require => $require_real
    }
}

-- 
Jeff McCune
The Ohio State University
Department of Mathematics
Systems Manager
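[A call to that define might look something like the following; the destination file, the source path under the working copy, and the mode are invented for illustration, not taken from Jeff's manifests.]

    # Hypothetical usage: install /etc/issue from the node's local
    # subversion working copy rather than the puppet fileserver.
    file_from_cache { "/etc/issue":
        source => "common/issue",   # i.e. /Support/vault/cache/common/issue
        mode   => 0644
    }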
On Mon, Feb 26, 2007 at 11:03:53AM -0800, David Lutterkort wrote:
> On Sun, 2007-02-25 at 09:38 +0100, Luke Kanies wrote:
>> Well, I know it's got some C code for the performance-sensitive part,
>> including a C-based html parser, but I wouldn't have thought that
>> html parsing had a significant effect on web server speed. It'd be
>> interesting to drop a different html parser into webrick to see if
>> that made things better.
>
> Where exactly does it have to do HTML parsing? If webrick wants to
> parse HTML when used with puppet, there's something really fishy going
> on.

I think he meant "HTTP request parser".

-- 
Ceri Storey <cez@necrofish.org.uk>
'What I really want is "apt-get smite"' -- Rob Partington
http://unix.culti.st/
On Feb 27, 2007, at 4:10 AM, Ceri Storey wrote:
> On Mon, Feb 26, 2007 at 11:03:53AM -0800, David Lutterkort wrote:
>> Where exactly does it have to do HTML parsing? If webrick wants to
>> parse HTML when used with puppet, there's something really fishy going
>> on.
>
> I think he meant "HTTP request parser".

I was basing my email on this post:

http://www.zedshaw.com/tips/ragel_state_charts.html

But yeah, it looks like I meant http parser, not html.

-- 
Sometimes I think we're alone. Sometimes I think we're not.
In either case, the thought is staggering. -- R. Buckminster Fuller
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
On Feb 26, 2007, at 1:03 PM, David Lutterkort wrote:
> From the rest of the thread, it sounds, though, like focusing on making
> the fileserver faster might be a serious win (especially in light of
> Kostas' and Jeff's experiments with how scalability changes when larger
> amounts of fileserving are involved) - that also has the advantage that
> it won't require hairy threading or preforking biz, only better batching
> of requests for describe/download of files.

This should definitely be a goal, but it won't be sufficient on its own. I've got a customer (whose machines are checking in every 15 minutes and who regularly reboots all of them at the same time) who's having trouble even though they're not using the fileserver.

-- 
The remarkable thing about Shakespeare is that he really is very good,
in spite of all the people who say he is very good. -- Robert Graves
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
On Feb 26, 2007, at 11:11 AM, Kostas Georgiou wrote:
> Yes, that is really slow with even one client. I have some systems that do
> a recursive copy on a directory with ~500 files, and it takes around 120
> secs even when everything is already synced and the server is idle.
>
> It looks to me like the client fires an xmlrpc call for each file in a
> serial manner (does it open a new connection every time?), and the latency
> kills the speed.
>
> Changing the describe, list, and retrieve rpc calls to be able to accept
> arrays/lists will probably improve performance a lot more than threads or
> forks in the server; it will also drop the cpu load by a lot, since SSL
> connections are quite heavy.

I knew when I was setting it up that it would not scale well, although I didn't, of course, know exactly how poorly it would function.

I'd love to be able to spend a month solving this problem, but I'm afraid it would actually take something like that. File recursion is some of the most difficult code in all of Puppet, and it's implemented in this poor-performing fashion because that's the only way I could figure out how to do it, basically.

All of the file recursion code needs to be rethought completely, and, as has been asked for about ten million times, Puppet needs to support more file transfer protocols beyond its own. But I've got too many other things on my plate, and I need some help from someone who can come up with a better way to solve these problems, since I can't figure it out.

-- 
As a general rule, don't solve puzzles that open portals to Hell.
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
On Feb 26, 2007, at 8:51 AM, Jeff McCune wrote:
> Luke Kanies wrote:
>> Also, is anyone who has plenty of nodes in a position to test any
>> of this, or is anyone willing to otherwise find a way to load-test
>> the server?
>
> Count me in. I'm guessing it'd help if we sprinkle some benchmarking
> checks into the master and take tcpdumps while N nodes connect; we'd
> probably gain a lot more insight.

Yeah, I definitely need to add more benchmarking in.

-- 
When one admits that nothing is certain one must, I think, also admit
that some things are much more nearly certain than others. -- Bertrand Russell
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com