Paul Butcher
2006-Sep-20 10:18 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
We have been searching for a Rails deployment architecture which works for us for some time. We've recently moved from Apache 1.3 + FastCGI to Apache 2.2 + mod_proxy_balancer + mongrel_cluster, and it's a significant improvement. But it still exhibits serious performance problems. We have the beginnings of a fix that we would like to share.

To illustrate the problem, imagine a two-element mongrel cluster running a Rails app containing the following simple controller:

  class HomeController < ApplicationController
    def fast
      sleep 1
      render :text => "I'm fast"
    end

    def slow
      sleep 10
      render :text => "I'm slow"
    end
  end

and the following test script:

  #!/usr/bin/env ruby

  require File.dirname(__FILE__) + '/config/boot'
  require File.dirname(__FILE__) + '/config/environment'
  require 'net/http'

  end_time = 1.minute.from_now
  fast_count = 0
  slow_count = 0

  fastthread = Thread.start do
    while Time.now < end_time do
      Net::HTTP.get 'localhost', '/home/fast'
      fast_count += 1
    end
  end

  slowthread = Thread.start do
    while Time.now < end_time do
      Net::HTTP.get 'localhost', '/home/slow'
      slow_count += 1
    end
  end

  fastthread.join
  slowthread.join

  puts "Fast: #{fast_count}"
  puts "Slow: #{slow_count}"

In this scenario, there will be two requests outstanding at any time, one "fast" and one "slow". You would expect approximately 60 fast and 6 slow GETs to complete over the course of a minute. This is not what happens; approximately 12 fast and 6 slow GETs complete per minute. The reason is that mod_proxy_balancer assumes it can send multiple requests to each mongrel, so fast requests end up waiting behind slow requests even when there is an idle mongrel available.

We've experimented with various different configurations for mod_proxy_balancer without successfully solving this issue. As far as we can tell, all other popular load balancers (Pound, Pen, balance) behave in roughly the same way.

This is causing us real problems. Our user interface is very time-sensitive. For common user actions, a page refresh delay of more than a couple of seconds is unacceptable. What we're finding is that if we have (say) a reporting page which takes 10 seconds to display (an entirely acceptable delay for a rarely-used report) then our users are seeing similar delays on pages which should be virtually instantaneous (and would be, if their requests were directed to idle servers). Worse, we're occasionally seeing unnecessary timeouts because requests are queuing up on one server.

The real solution to the problem would be to remove Rails' inability to handle more than one thread. In the absence of that solution, however, we've implemented (in Ruby) what might be the world's smallest load-balancer. It only ever sends a single request to each member of the cluster at a time. It's called HighWire and is available on RubyForge (no Gem yet - it's on the list of things to do!):

  svn checkout svn://rubyforge.org/var/svn/highwire

Using this instead of mod_proxy_balancer, and running the same test script above, we see approximately 54 fast and 6 slow requests per minute.

HighWire is very young and has a way to go. It's not had any serious optimization or testing, and there are a bunch of things that need doing before it can really be considered production ready. But it does work for us, and does produce a significant performance improvement.

Please check it out and let us know what you think.

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?
MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
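
For context, the Apache 2.2 front end being described is configured along these lines. This is only an illustrative sketch: the balancer name and the mongrel ports 8000/8001 are assumptions, not details from Paul's actual setup:

  # Illustrative mod_proxy_balancer configuration for a two-mongrel cluster
  # (balancer name and ports are assumed for the example)
  <Proxy balancer://mongrel_cluster>
    BalancerMember http://127.0.0.1:8000
    BalancerMember http://127.0.0.1:8001
  </Proxy>
  ProxyPass / balancer://mongrel_cluster/
  ProxyPassReverse / balancer://mongrel_cluster/

With a configuration like this, mod_proxy_balancer hands each new request to a member according to its own load-balancing bookkeeping, without checking whether that member is already busy with a long-running request, which is the behaviour the test above exposes.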
Jens Kraemer
2006-Sep-20 17:07 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
Hi!

On Wed, Sep 20, 2006 at 11:18:53AM +0100, Paul Butcher wrote:
> We have been searching for a Rails deployment architecture which works for
> us for some time. We've recently moved from Apache 1.3 + FastCGI to Apache
> 2.2 + mod_proxy_balancer + mongrel_cluster, and it's a significant
> improvement. But it still exhibits serious performance problems.

hey, cool, I really like the simplicity of your approach. However I tried to solve your problem with Pen, and here's what I got:

standard pen setup, no special options besides 'no sticky sessions' and 'non blocking mode':

  pen -fndr 9000 localhost:9001 localhost:9002

  Fast: 13
  Slow: 6

just as predicted by you. However, if I limit *one* of the mongrels to 1 connection at a time, I get this:

  pen -fdnr 9000 localhost:9001:1 localhost:9002

  Fast: 59
  Slow: 6

When I limit both backend servers to only 1 connection, the test script always bails out, since pen doesn't keep connections queued the way your queue does - it closes them if it finds no server to dispatch to. So it needs at least one backend process to pile connections up at when the need arises.

I have no idea if this is a solution that would be useful under real life conditions, but found the behaviour quite interesting.

I also raised the number of threads requesting the fast action, which led to more successful requests on the fast action, and (sometimes) fewer on the slow one, i.e.

  Fast: 67-69
  Slow: 5-6

with 10 threads accessing the fast action.

Whatever load balancing you use, you'll always need to have more Mongrels than there are concurrent requests for 'slow' actions to avoid delays for clients requesting a 'fast' action. So maybe, if the slow actions are well known, one could just reserve a pool of Mongrels for these slow actions, and another one for the fast ones...

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer           kraemer at webit.de
Schnorrstraße 76                                 Tel +49 351 46766 0
D-01069 Dresden                                  Fax +49 351 46766 66
Vishnu Gopal
2006-Sep-20 17:36 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On 9/20/06, Paul Butcher <paul at paulbutcher.com> wrote:
>
> [..]
> implemented (in Ruby) what might be the world's smallest load-balancer. It
> only ever sends a single request to each member of the cluster at a time.
> [..]

What happens to the other requests? Are they queued up? And if so, how does that solve the problem?

> Please check it out and let us know what you think.

Will do. Looks interesting :-)

Vish
Zed Shaw
2006-Sep-20 19:21 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On Wed, 2006-09-20 at 11:18 +0100, Paul Butcher wrote:

First off, it's awesome that you're writing a solution to your problem. Very cool and looks like it didn't take you much. Maybe I'll work up something in C that does this, since it's a commonly requested solution.

> This is causing us real problems. Our user interface is very time-sensitive.
> For common user actions, a page refresh delay of more than a couple of
> seconds is unacceptable.

But, if your interface is time-sensitive then why does it have actions that take too much time? See the conundrum there?

> What we're finding is that if we have (say) a
> reporting page which takes 10 seconds to display (an entirely acceptable
> delay for a rarely-used report) ....
<snip>

See, right there. "Reporting" is usually a goldmine for stuff you can push out to a backgroundrb processor. The best pattern here is not "ok, let's write a load balancer in ruby so all processes are as fast as one ruby process" but instead to redesign that rails action to use backgroundrb.

This will get you over the hump, but you'll seriously have to look at carving those unpredictable actions out and making them shorter.

--
Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.
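
To make the pattern Zed is describing concrete, here is a toy, in-process Ruby sketch of the offload-and-poll idea. It is not BackgrounDRb's actual API (its worker classes and MiddleMan interface differ); the ReportJob name, the token scheme, and the thread-based execution are assumptions chosen only to keep the illustration self-contained:

  # Toy offload-and-poll sketch (NOT BackgrounDRb's API; names are illustrative).
  # The slow work runs outside the request/response cycle: one action enqueues a
  # job and returns immediately, another polls for the result until it is ready.
  require 'thread'

  class ReportJob
    @jobs  = {}
    @mutex = Mutex.new

    # Start the slow work and hand back a token the client can poll with.
    def self.enqueue(params)
      token = "job-#{rand(1_000_000)}"
      @mutex.synchronize { @jobs[token] = :running }
      Thread.new do
        result = expensive_report(params)           # the ten-second work happens here
        @mutex.synchronize { @jobs[token] = result }
      end
      token
    end

    # :running until the job finishes, then the result.
    def self.status(token)
      @mutex.synchronize { @jobs[token] }
    end

    def self.expensive_report(params)
      sleep 10                                      # stand-in for the real report query
      "report for #{params.inspect}"
    end
  end

  # Usage sketch: one controller action would call ReportJob.enqueue and render a
  # "still working" page; a second action would call ReportJob.status(token) and
  # render the report once it is no longer :running.

The key property is that the mongrel serving the request is released in milliseconds, so the long-running work never keeps other requests queued behind it.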
Paul Butcher
2006-Sep-20 19:45 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
Vishnu Gopal <g.vishnu at gmail.com> wrote:
> What happens to the other requests? Are they queued up? And if so, how
> does that solve the problem?

It depends on what you mean by "the other" requests. The problem with mod_proxy_balancer is that you can get into a situation where all mongrels but one are idle, yet it sends the next request to the busy mongrel, not one of the idle ones. HighWire will always send each incoming request to an idle mongrel if there are any. If there aren't any, then yes, it will queue them up. But it will only do so if there aren't any idle mongrels. That's the difference.

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
Paul Butcher
2006-Sep-20 20:14 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
Zed Shaw <zedshaw at zedshaw.com> wrote:
> But, if your interface is time sensitive then why does it have actions
> that take too much time? See the conundrum there?

I was kinda expecting that response, Zed, but I didn't want to rob you of the pleasure of saying it ;-)

We are in fact using backgroundrb for a number of long-running actions. It's fantastic, but it isn't ever going to solve the problem for us completely. There are a whole bunch of reasons, which may be specific to our particular application, but I doubt it. First the less convincing arguments:

1) We're lazy (in a good way). Our application is used by hundreds of our employees, 24 hours a day. We, correctly, invest a great deal of time making sure that the things that they do over and over again are as efficient as they possibly can be. We don't, however, believe that it's appropriate for us to spend time optimising things which are only used by one or two guys once or twice a day. It's not as if we have a lack of things to spend our time on, and we'd rather spend that time on things that really matter, rather than optimising things which only need to be optimised because the performance characteristics of Rails mean that one poorly performing action results in *all* actions performing poorly.

2) We're not infallible. Sometimes we screw up and release a version of the software where one of our actions takes longer than it should. This is bad if it only affects the action we screwed up, but if that's all it does then it's not a disaster. If one slow action screws up *all* the actions, however, that is a disaster.

3) Even if we spend all of the time we possibly can optimising actions, different actions will always take different amounts of time to execute. It's not as if there's "one true" execution time for an action. The bad effects become apparent whenever you have any two actions which take noticeably different amounts of time to execute. Yes, in the example that I gave it was 1 second and 10 seconds. But the same basic effect would be there if we were talking about 1ms and 10ms.

Those arguments would be enough, I believe. But in our case there's one much stronger argument which trumps all the above IMHO:

4) It's not possible to predict the time that an action, even a heavily optimised one, will take. The vagaries of caching, swapping etc. mean that this is simply out of our hands. In our case, for example, many of our actions involve fulltext searches of the database (we have no choice about this - it's fundamental to the nature of what we do). The performance of a fulltext search is *extremely* unpredictable (in MySQL anyway). There can be occasions where the first time a particular search is done, it takes 10 seconds. The second time, because the database has "got the idea", it can take a few milliseconds. Backgroundrb cannot solve this problem - if we sent all of these actions to backgroundrb, pretty much *every* action would end up being sent to backgroundrb, and then we'd end up with a very similar load balancing problem - it just becomes a problem of load balancing to backgroundrb instead of to mongrel.

Make sense?

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
Paul Butcher
2006-Sep-20 20:18 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
Jens Kraemer <kraemer at webit.de> wrote:
> However, if I limit *one* of the mongrels to 1 connection at a time, I get
> this:
> pen -fdnr 9000 localhost:9001:1 localhost:9002
> Fast: 59
> Slow: 6

That's interesting! Thanks. I did try a similar configuration of mod_proxy_balancer, but it didn't have the same effect. Maybe I'll play with Pen some more...

> So maybe, if the slow actions are well known, one could just reserve a
> pool of Mongrels for these slow actions, and another one for the fast
> ones...

We did consider this approach. It "feels wrong" to us though, and as I said in my response to Zed's mail, for our application at least it's not always possible to identify "slow" actions ahead of time :-(

Thanks for the information though!

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
Joshua Sierles
2006-Sep-20 22:14 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
I have to chime in and agree with Zed here. We had similar problems at MOG, and came to the conclusion that solving the delegation problem is just curing the symptoms. By systematically offloading a lot of slow requests to background processes, we got more flexibility and the ability to check the progress of such events. I'd say the general consensus is that any page that needs more than a few seconds to load needs optimization or offloading.

Joshua Sierles
snacktime
2006-Sep-21 01:23 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
Might be interesting to give Perlbal (http://www.danga.com) a try as an HTTP proxy. It looks extremely configurable and supports SSL. They also use an interesting technique where the backend node can return a reproxy command if it's busy, and the proxy can then send the request to another node. It also appears to be able to send connections only to nodes that are free to service a request.
Alexander Lazic
2006-Sep-21 14:08 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On Mit 20.09.2006 11:18, Paul Butcher wrote:
>We have been searching for a Rails deployment architecture which works
>for us for some time. We've recently moved from Apache 1.3 + FastCGI to
>Apache 2.2 + mod_proxy_balancer + mongrel_cluster, and it's a
>significant improvement. But it still exhibits serious performance
>problems.

Have you ever used haproxy http://haproxy.1wt.eu/ ?! It has the following feature which can help you:

--- http://haproxy.1wt.eu/download/1.2/doc/haproxy-en.txt
3.4) Limiting the number of concurrent sessions on each server

  weight
  minconn
  maxconn
---

This tool can also check the availability of your backends. For SSL you need an SSL-termination tool such as stunnel or delegate, or whichever software you prefer. On the haproxy site there is a patch for stunnel for the X-Forwarded-For header, if you need it ;-)

Regards

Alex
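
A minimal haproxy configuration along those lines might look like the sketch below. The listen port, the mongrel ports, and the server names are assumptions made for illustration; the per-server "maxconn 1" is the feature from section 3.4 of the documentation quoted above:

  # Illustrative haproxy front end for two mongrels (ports and names assumed).
  # "maxconn 1" lets each mongrel handle only one request at a time; further
  # requests queue inside haproxy until a backend becomes free.
  listen rails_cluster 0.0.0.0:9000
      mode http
      balance roundrobin
      server mongrel1 127.0.0.1:9001 check maxconn 1
      server mongrel2 127.0.0.1:9002 check maxconn 1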
Paul Butcher
2006-Sep-21 19:06 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
> Have you ever used haproxy http://haproxy.1wt.eu/ ?!

In a word, no :-)

> It has the following feature which can help you:

Thanks - sounds interesting. We'll give it a go!

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
hemant
2006-Sep-21 19:17 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On 9/20/06, Paul Butcher <paul at paulbutcher.com> wrote:
>
> The real solution to the problem would be to remove Rails' inability to
> handle more than one thread. In the absence of that solution, however, we've
> implemented (in Ruby) what might be the world's smallest load-balancer. It
> only ever sends a single request to each member of the cluster at a time.
> It's called HighWire and is available on RubyForge (no Gem yet - it's on the
> list of things to do!):
>
> svn checkout svn://rubyforge.org/var/svn/highwire
>
> Using this instead of mod_proxy_balancer, and running the same test script
> above, we see approximately 54 fast and 6 slow requests per minute.
>
> HighWire is very young and has a way to go. It's not had any serious
> optimization or testing, and there are a bunch of things that need doing
> before it can really be considered production ready. But it does work for
> us, and does produce a significant performance improvement.
>
> Please check it out and let us know what you think.

(*Laziness kicks in*) How do you find... which one of the mongrels is idle?

--
There was only one Road; that it was like a great river: its springs were at every doorstep, and every path was its tributary.
Alexander Lazic
2006-Sep-22 05:30 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On Don 21.09.2006 20:06, Paul Butcher wrote:
>> Have you ever used haproxy http://haproxy.1wt.eu/ ?!
>
>In a word, no :-)
>
>> It has the following feature which can help you:
>
>Thanks - sounds interesting. We'll give it a go!

Please let us/me know your experience ;-)

Thanks && Regards

Alex
Paul Butcher
2006-Sep-22 08:51 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
> (*Laziness kicks in*) How do you find... which one of the mongrels is idle?

It's a fantastically subtle and complicated algorithm. Not. The main thread handles incoming connections, and then there's one thread per worker. The main thread does this:

  while request = server.accept
    queue.push request
  end

And each worker does this:

  loop do
    request = queue.pop
    # Handle the request
    ...
  end

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
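
To fill in the surrounding machinery, here is a self-contained sketch of the same idea as a standalone Ruby program. This is not HighWire's actual source; the ports, constants, and the deliberately naive HTTP relaying (a single read for the request, relying on the backend to close the connection after responding) are assumptions made purely to keep the example short:

  #!/usr/bin/env ruby
  # Sketch of a one-request-per-backend proxy (illustrative only, not HighWire).
  # An acceptor thread pushes client sockets onto a queue; one worker thread per
  # backend pops a socket, relays the request to its mongrel, and streams the
  # response back. Because each worker owns exactly one backend, a request is
  # only ever dispatched to an idle backend; otherwise it waits in the queue.
  require 'socket'
  require 'thread'

  BACKENDS    = [['127.0.0.1', 9001], ['127.0.0.1', 9002]]  # assumed mongrel ports
  LISTEN_PORT = 9000

  queue  = Queue.new
  server = TCPServer.new('0.0.0.0', LISTEN_PORT)

  BACKENDS.each do |backend_host, backend_port|
    Thread.new(backend_host, backend_port) do |host, port|
      loop do
        client  = queue.pop                     # blocks until a request is waiting
        backend = nil
        begin
          backend = TCPSocket.new(host, port)
          # Naive relay: read one chunk of the request and pass it through.
          # (Real code would parse headers, Content-Length, keep-alive, etc.)
          backend.write(client.readpartial(64 * 1024))
          # Stream the response back until the backend closes the connection.
          while chunk = backend.read(16 * 1024)
            client.write(chunk)
          end
        rescue EOFError, Errno::EPIPE, Errno::ECONNRESET, Errno::ECONNREFUSED
          # Drop the connection on error; a real balancer would retry elsewhere.
        ensure
          backend.close if backend
          client.close
        end
      end
    end
  end

  # Acceptor: enqueue every incoming connection. Requests only pile up when
  # *all* backends are busy, which is exactly the property mod_proxy_balancer
  # lacks in the scenario described earlier in the thread.
  loop { queue.push(server.accept) }

One design note: because the queue is shared, a slow request never pins fast requests to a particular backend; whichever worker frees up first takes the next waiting connection.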
snacktime
2006-Sep-25 06:07 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
I was taking a look at the source for mod_proxy_balancer this evening, and it looks like the simplest solution to this whole problem is to just add another balance method. The functions that determine which balancer member gets the next request are pretty short. I'm a pretty bad C programmer, but I subscribed to the apache-dev list this evening and maybe I can get someone there to assist me on the finer points.

From what I saw, it looks like you could keep an array of the balancer members that are handling a request, and then implement a short wait cycle if all the members are busy. Have it write out a warning to the apache log when it has to wait, so you know when to be adding more mongrels.

I'll let everyone know how it goes.

Chris
Alexander Lazic
2006-Sep-25 11:55 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On Son 24.09.2006 23:07, snacktime wrote:
>
>I was taking a look at the source for mod_proxy_balancer this evening,
>and it looks like the simplest solution to this whole problem is to
>just add another balance method. The functions that determine which
>balancer member gets the next request are pretty short. I'm a pretty
>bad C programmer, but I subscribed to the apache-dev list this evening
>and maybe I can get someone there to assist me on the finer points.

I think it would be possible to use this setup:

       +------------------------+
       |apache_mod_proxy_balance|
       | or any other webserver |
       +------------------------+
                   |
                   V
     +----------------------------+
     | haproxy maxconn 1 minconn 1|
     +----------------------------+
           |                |
           V                V
    +-------------+  +-------------+
    |mongrel_rails|  |mongrel_rails|
    +-------------+  +-------------+

Thoughts?!

Pros:
- haproxy balances across all the mongrel_rails instances
- it checks whether the backends are still alive
- it queues the incoming requests

Cons:
- more complex setup
- you need to manage more programs/tools

Regards

Aleks