Paul Butcher
2006-Sep-20 10:18 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
We have been searching for a Rails deployment architecture which works for us for some time. We've recently moved from Apache 1.3 + FastCGI to Apache 2.2 + mod_proxy_balancer + mongrel_cluster, and it's a significant improvement. But it still exhibits serious performance problems. We have the beginnings of a fix that we would like to share.

To illustrate the problem, imagine a two-element mongrel cluster running a Rails app containing the following simple controller:

  class HomeController < ApplicationController
    def fast
      sleep 1
      render :text => "I'm fast"
    end

    def slow
      sleep 10
      render :text => "I'm slow"
    end
  end

and the following test script:

  #!/usr/bin/env ruby

  require File.dirname(__FILE__) + '/config/boot'
  require File.dirname(__FILE__) + '/config/environment'
  require 'net/http'

  end_time = 1.minute.from_now
  fast_count = 0
  slow_count = 0

  fastthread = Thread.start do
    while Time.now < end_time do
      Net::HTTP.get 'localhost', '/home/fast'
      fast_count += 1
    end
  end

  slowthread = Thread.start do
    while Time.now < end_time do
      Net::HTTP.get 'localhost', '/home/slow'
      slow_count += 1
    end
  end

  fastthread.join
  slowthread.join

  puts "Fast: #{fast_count}"
  puts "Slow: #{slow_count}"

In this scenario, there will be two requests outstanding at any time, one "fast" and one "slow". You would expect approximately 60 fast and 6 slow GETs to complete over the course of a minute. This is not what happens; approximately 12 fast and 6 slow GETs complete per minute. The reason is that mod_proxy_balancer assumes it can send multiple requests to each mongrel, so fast requests end up waiting behind slow requests even when there is an idle mongrel available.

We've experimented with various different configurations for mod_proxy_balancer without successfully solving this issue. As far as we can tell, all other popular load balancers (Pound, Pen, balance) behave in roughly the same way.

This is causing us real problems. Our user interface is very time-sensitive. For common user actions, a page refresh delay of more than a couple of seconds is unacceptable. What we're finding is that if we have (say) a reporting page which takes 10 seconds to display (an entirely acceptable delay for a rarely-used report) then our users are seeing similar delays on pages which should be virtually instantaneous (and would be, if their requests were directed to idle servers). Worse, we're occasionally seeing unnecessary timeouts because requests are queuing up on one server.

The real solution to the problem would be to remove Rails' inability to handle more than one thread. In the absence of that solution, however, we've implemented (in Ruby) what might be the world's smallest load-balancer. It only ever sends a single request to each member of the cluster at a time. It's called HighWire and is available on RubyForge (no Gem yet - it's on the list of things to do!):

  svn checkout svn://rubyforge.org/var/svn/highwire

Using this instead of mod_proxy_balancer, and running the same test script above, we see approximately 54 fast and 6 slow requests per minute.

HighWire is very young and has a way to go. It's not had any serious optimization or testing, and there are a bunch of things that need doing before it can really be considered production ready. But it does work for us, and does produce a significant performance improvement.

Please check it out and let us know what you think.

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?
MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
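
For context, the Apache 2.2 front end being described is configured along these lines. This is only an illustrative sketch: the balancer name and the mongrel ports 8000/8001 are assumptions, not details from Paul's actual setup:

  # Illustrative mod_proxy_balancer configuration for a two-mongrel cluster
  # (balancer name and ports are assumed for the example)
  <Proxy balancer://mongrel_cluster>
    BalancerMember http://127.0.0.1:8000
    BalancerMember http://127.0.0.1:8001
  </Proxy>
  ProxyPass / balancer://mongrel_cluster/
  ProxyPassReverse / balancer://mongrel_cluster/

With a configuration like this, mod_proxy_balancer hands each new request to a member according to its own load-balancing bookkeeping, without checking whether that member is already busy with a long-running request, which is the behaviour the test above exposes.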
Jens Kraemer
2006-Sep-20 17:07 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
Hi!

On Wed, Sep 20, 2006 at 11:18:53AM +0100, Paul Butcher wrote:
> We have been searching for a Rails deployment architecture which works for
> us for some time. We've recently moved from Apache 1.3 + FastCGI to Apache
> 2.2 + mod_proxy_balancer + mongrel_cluster, and it's a significant
> improvement. But it still exhibits serious performance problems.

hey, cool, I really like the simplicity of your approach. However I tried to solve your problem with Pen, and here's what I got:

standard pen setup, no special options besides 'no sticky sessions' and 'non blocking mode':

  pen -fndr 9000 localhost:9001 localhost:9002

  Fast: 13
  Slow: 6

just as predicted by you. However, if I limit *one* of the mongrels to 1 connection at a time, I get this:

  pen -fdnr 9000 localhost:9001:1 localhost:9002

  Fast: 59
  Slow: 6

When I limit both backend servers to only 1 connection, the test script always bails out, since pen doesn't keep connections queued the way your queue does - it closes them if it finds no server to dispatch to. So it needs at least one backend process to pile connections up at when the need arises.

I have no idea if this is a solution that would be useful under real life conditions, but found the behaviour quite interesting.

I also raised the number of threads requesting the fast action, which led to more successful requests on the fast action, and (sometimes) fewer on the slow one, i.e.

  Fast: 67-69
  Slow: 5-6

with 10 threads accessing the fast action.

Whatever load balancing you use, you'll always need to have more Mongrels than there are concurrent requests for 'slow' actions to avoid delays for clients requesting a 'fast' action. So maybe, if the slow actions are well known, one could just reserve a pool of Mongrels for these slow actions, and another one for the fast ones...

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer           kraemer at webit.de
Schnorrstraße 76                                 Tel +49 351 46766 0
D-01069 Dresden                                  Fax +49 351 46766 66
Vishnu Gopal
2006-Sep-20 17:36 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On 9/20/06, Paul Butcher <paul at paulbutcher.com> wrote:
>
> [..]
> implemented (in Ruby) what might be the world's smallest load-balancer. It
> only ever sends a single request to each member of the cluster at a time.
> [..]

What happens to the other requests? Are they queued up? And if so, how does that solve the problem?

> Please check it out and let us know what you think.

Will do. Looks interesting :-)

Vish
Zed Shaw
2006-Sep-20 19:21 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On Wed, 2006-09-20 at 11:18 +0100, Paul Butcher wrote:

First off, it's awesome that you're writing a solution to your problem. Very cool and looks like it didn't take you much. Maybe I'll work up something in C that does this, since it's a commonly requested solution.

> This is causing us real problems. Our user interface is very time-sensitive.
> For common user actions, a page refresh delay of more than a couple of
> seconds is unacceptable.

But, if your interface is time-sensitive then why does it have actions that take too much time? See the conundrum there?

> What we're finding is that if we have (say) a
> reporting page which takes 10 seconds to display (an entirely acceptable
> delay for a rarely-used report) ....
<snip>

See, right there. "Reporting" is usually a goldmine for stuff you can push out to a backgroundrb processor. The best pattern here is not "ok, let's write a load balancer in ruby so all processes are as fast as one ruby process" but instead to redesign that rails action to use backgroundrb.

This will get you over the hump, but you'll seriously have to look at carving those unpredictable actions out and making them shorter.

--
Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.
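
To make the pattern Zed is describing concrete, here is a toy, in-process Ruby sketch of the offload-and-poll idea. It is not BackgrounDRb's actual API (its worker classes and MiddleMan interface differ); the ReportJob name, the token scheme, and the thread-based execution are assumptions chosen only to keep the illustration self-contained:

  # Toy offload-and-poll sketch (NOT BackgrounDRb's API; names are illustrative).
  # The slow work runs outside the request/response cycle: one action enqueues a
  # job and returns immediately, another polls for the result until it is ready.
  require 'thread'

  class ReportJob
    @jobs  = {}
    @mutex = Mutex.new

    # Start the slow work and hand back a token the client can poll with.
    def self.enqueue(params)
      token = "job-#{rand(1_000_000)}"
      @mutex.synchronize { @jobs[token] = :running }
      Thread.new do
        result = expensive_report(params)           # the ten-second work happens here
        @mutex.synchronize { @jobs[token] = result }
      end
      token
    end

    # :running until the job finishes, then the result.
    def self.status(token)
      @mutex.synchronize { @jobs[token] }
    end

    def self.expensive_report(params)
      sleep 10                                      # stand-in for the real report query
      "report for #{params.inspect}"
    end
  end

  # Usage sketch: one controller action would call ReportJob.enqueue and render a
  # "still working" page; a second action would call ReportJob.status(token) and
  # render the report once it is no longer :running.

The key property is that the mongrel serving the request is released in milliseconds, so the long-running work never keeps other requests queued behind it.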
Paul Butcher
2006-Sep-20 19:45 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
Vishnu Gopal <g.vishnu at gmail.com> wrote:
> What happens to the other requests? Are they queued up? And if so, how
> does that solve the problem?

It depends on what you mean by "the other" requests. The problem with mod_proxy_balancer is that you can get into a situation where all mongrels but one are idle, yet it sends the next request to the busy mongrel, not one of the idle ones. HighWire will always send each incoming request to an idle mongrel if there are any. If there aren't any, then yes, it will queue them up. But it will only do so if there aren't any idle mongrels. That's the difference.

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
Paul Butcher
2006-Sep-20 20:14 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
Zed Shaw <zedshaw at zedshaw.com> wrote:
> But, if your interface is time sensitive then why does it have actions
> that take too much time? See the conundrum there?

I was kinda expecting that response, Zed, but I didn't want to rob you of the pleasure of saying it ;-)

We are in fact using backgroundrb for a number of long-running actions. It's fantastic, but it isn't ever going to solve the problem for us completely. There are a whole bunch of reasons, which may be specific to our particular application, but I doubt it. First the less convincing arguments:

1) We're lazy (in a good way). Our application is used by hundreds of our employees, 24 hours a day. We, correctly, invest a great deal of time making sure that the things that they do over and over again are as efficient as they possibly can be. We don't, however, believe that it's appropriate for us to spend time optimising things which are only used by one or two guys once or twice a day. It's not as if we have a lack of things to spend our time on, and we'd rather spend that time on things that really matter, rather than optimising things which only need to be optimised because the performance characteristics of Rails mean that one poorly performing action results in *all* actions performing poorly.

2) We're not infallible. Sometimes we screw up and release a version of the software where one of our actions takes longer than it should. This is bad if it only affects the action we screwed up, but if that's all it does then it's not a disaster. If one slow action screws up *all* the actions, however, that is a disaster.

3) Even if we spend all of the time we possibly can optimising actions, different actions will always take different amounts of time to execute. It's not as if there's "one true" execution time for an action. The bad effects become apparent whenever you have any two actions which take noticeably different amounts of time to execute. Yes, in the example that I gave it was 1 second and 10 seconds. But the same basic effect would be there if we were talking about 1ms and 10ms.

Those arguments would be enough, I believe. But in our case there's one much stronger argument which trumps all the above IMHO:

4) It's not possible to predict the time that an action, even a heavily optimised one, will take. The vagaries of caching, swapping etc. mean that this is simply out of our hands. In our case, for example, many of our actions involve fulltext searches of the database (we have no choice about this - it's fundamental to the nature of what we do). The performance of a fulltext search is *extremely* unpredictable (in MySQL anyway). There can be occasions where the first time a particular search is done, it takes 10 seconds. The second time, because the database has "got the idea", it can take a few milliseconds. Backgroundrb cannot solve this problem - if we sent all of these actions to backgroundrb, pretty much *every* action would end up being sent to backgroundrb, and then we'd end up with a very similar load balancing problem - it just becomes a problem of load balancing to backgroundrb instead of to mongrel.

Make sense?

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
Paul Butcher
2006-Sep-20 20:18 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
Jens Kraemer <kraemer at webit.de> wrote:
> However, if I limit *one* of the mongrels to 1 connection at a time, I get
> this:
> pen -fdnr 9000 localhost:9001:1 localhost:9002
> Fast: 59
> Slow: 6

That's interesting! Thanks. I did try a similar configuration of mod_proxy_balancer, but it didn't have the same effect. Maybe I'll play with Pen some more...

> So maybe, if the slow actions are well known, one could just reserve a
> pool of Mongrels for these slow actions, and another one for the fast
> ones...

We did consider this approach. It "feels wrong" to us though, and as I said in my response to Zed's mail, for our application at least it's not always possible to identify "slow" actions ahead of time :-(

Thanks for the information though!

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
Joshua Sierles
2006-Sep-20 22:14 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
I have to chime in and agree with Zed here. We had similar problems at MOG, and came to the conclusion that solving the delegation problem is just curing the symptoms. By systematically offloading a lot of slow requests to background processes, we got more flexibility and the ability to check the progress of such events. I'd say the general consensus is that any page that needs more than a few seconds to load needs optimization or offloading.

Joshua Sierles
snacktime
2006-Sep-21 01:23 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
Might be interesting to give Perlbal (http://www.danga.com) a try as an HTTP proxy. It looks extremely configurable and supports SSL. They also use an interesting technique where the backend node can return a reproxy command if it's busy, and the proxy can then send the request to another node. It also appears to be able to send connections only to nodes that are free to service a request.
Alexander Lazic
2006-Sep-21 14:08 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On Mit 20.09.2006 11:18, Paul Butcher wrote:
>We have been searching for a Rails deployment architecture which works
>for us for some time. We've recently moved from Apache 1.3 + FastCGI to
>Apache 2.2 + mod_proxy_balancer + mongrel_cluster, and it's a
>significant improvement. But it still exhibits serious performance
>problems.

Have you ever used haproxy http://haproxy.1wt.eu/ ?! It has the following feature which can help you:

--- http://haproxy.1wt.eu/download/1.2/doc/haproxy-en.txt
3.4) Limiting the number of concurrent sessions on each server

  weight
  minconn
  maxconn
---

This tool can also check the availability of your backends. For SSL you need an SSL-termination tool such as stunnel or delegate, or whichever software you prefer. On the haproxy site there is a patch for stunnel for the X-Forwarded-For header, if you need it ;-)

Regards

Alex
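
A minimal haproxy configuration along those lines might look like the sketch below. The listen port, the mongrel ports, and the server names are assumptions made for illustration; the per-server "maxconn 1" is the feature from section 3.4 of the documentation quoted above:

  # Illustrative haproxy front end for two mongrels (ports and names assumed).
  # "maxconn 1" lets each mongrel handle only one request at a time; further
  # requests queue inside haproxy until a backend becomes free.
  listen rails_cluster 0.0.0.0:9000
      mode http
      balance roundrobin
      server mongrel1 127.0.0.1:9001 check maxconn 1
      server mongrel2 127.0.0.1:9002 check maxconn 1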
Paul Butcher
2006-Sep-21 19:06 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
> Have you ever used haproxy http://haproxy.1wt.eu/ ?!

In a word, no :-)

> It has the following feature which can help you:

Thanks - sounds interesting. We'll give it a go!

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
hemant
2006-Sep-21 19:17 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On 9/20/06, Paul Butcher <paul at paulbutcher.com> wrote:
>
> The real solution to the problem would be to remove Rails' inability to
> handle more than one thread. In the absence of that solution, however, we've
> implemented (in Ruby) what might be the world's smallest load-balancer. It
> only ever sends a single request to each member of the cluster at a time.
> It's called HighWire and is available on RubyForge (no Gem yet - it's on the
> list of things to do!):
>
> svn checkout svn://rubyforge.org/var/svn/highwire
>
> Using this instead of mod_proxy_balancer, and running the same test script
> above, we see approximately 54 fast and 6 slow requests per minute.
>
> HighWire is very young and has a way to go. It's not had any serious
> optimization or testing, and there are a bunch of things that need doing
> before it can really be considered production ready. But it does work for
> us, and does produce a significant performance improvement.
>
> Please check it out and let us know what you think.

(*Laziness kicks in*) How do you find... which one of the mongrels is idle?

--
There was only one Road; that it was like a great river: its springs were at every doorstep, and every path was its tributary.
Alexander Lazic
2006-Sep-22 05:30 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On Don 21.09.2006 20:06, Paul Butcher wrote:
>> Have you ever used haproxy http://haproxy.1wt.eu/ ?!
>
>In a word, no :-)
>
>> It has the following feature which can help you:
>
>Thanks - sounds interesting. We'll give it a go!

Please let us/me know your experience ;-)

Thanks && Regards

Alex
Paul Butcher
2006-Sep-22 08:51 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
> (*Laziness kicks in*) How do you find... which one of the mongrels is idle?

It's a fantastically subtle and complicated algorithm. Not. The main thread handles incoming connections, and then there's one thread per worker. The main thread does this:

  while request = server.accept
    queue.push request
  end

And each worker does this:

  loop do
    request = queue.pop
    # Handle the request
    ...
  end

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
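
To fill in the surrounding machinery, here is a self-contained sketch of the same idea as a standalone Ruby program. This is not HighWire's actual source; the ports, constants, and the deliberately naive HTTP relaying (a single read for the request, relying on the backend to close the connection after responding) are assumptions made purely to keep the example short:

  #!/usr/bin/env ruby
  # Sketch of a one-request-per-backend proxy (illustrative only, not HighWire).
  # An acceptor thread pushes client sockets onto a queue; one worker thread per
  # backend pops a socket, relays the request to its mongrel, and streams the
  # response back. Because each worker owns exactly one backend, a request is
  # only ever dispatched to an idle backend; otherwise it waits in the queue.
  require 'socket'
  require 'thread'

  BACKENDS    = [['127.0.0.1', 9001], ['127.0.0.1', 9002]]  # assumed mongrel ports
  LISTEN_PORT = 9000

  queue  = Queue.new
  server = TCPServer.new('0.0.0.0', LISTEN_PORT)

  BACKENDS.each do |backend_host, backend_port|
    Thread.new(backend_host, backend_port) do |host, port|
      loop do
        client  = queue.pop                     # blocks until a request is waiting
        backend = nil
        begin
          backend = TCPSocket.new(host, port)
          # Naive relay: read one chunk of the request and pass it through.
          # (Real code would parse headers, Content-Length, keep-alive, etc.)
          backend.write(client.readpartial(64 * 1024))
          # Stream the response back until the backend closes the connection.
          while chunk = backend.read(16 * 1024)
            client.write(chunk)
          end
        rescue EOFError, Errno::EPIPE, Errno::ECONNRESET, Errno::ECONNREFUSED
          # Drop the connection on error; a real balancer would retry elsewhere.
        ensure
          backend.close if backend
          client.close
        end
      end
    end
  end

  # Acceptor: enqueue every incoming connection. Requests only pile up when
  # *all* backends are busy, which is exactly the property mod_proxy_balancer
  # lacks in the scenario described earlier in the thread.
  loop { queue.push(server.accept) }

One design note: because the queue is shared, a slow request never pins fast requests to a particular backend; whichever worker frees up first takes the next waiting connection.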
snacktime
2006-Sep-25 06:07 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
I was taking a look at the source for mod_proxy_balancer this evening, and it looks like the simplest solution to this whole problem is to just add another balance method. The functions that determine which balancer member gets the next request are pretty short. I'm a pretty bad C programmer, but I subscribed to the apache-dev list this evening and maybe I can get someone there to assist me on the finer points.

From what I saw, it looks like you could keep an array of the balancer members that are handling a request, and then implement a short wait cycle if all the members are busy. Have it write out a warning to the apache log when it has to wait, so you know when to be adding more mongrels.

I'll let everyone know how it goes.

Chris
Alexander Lazic
2006-Sep-25 11:55 UTC
[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution
On Son 24.09.2006 23:07, snacktime wrote:
>
>I was taking a look at the source for mod_proxy_balancer this evening,
>and it looks like the simplest solution to this whole problem is to
>just add another balance method. The functions that determine which
>balancer member gets the next request are pretty short. I'm a pretty
>bad C programmer, but I subscribed to the apache-dev list this evening
>and maybe I can get someone there to assist me on the finer points.

I think it would be possible to use this setup:

       +------------------------+
       |apache_mod_proxy_balance|
       | or any other webserver |
       +------------------------+
                   |
                   V
     +----------------------------+
     | haproxy maxconn 1 minconn 1|
     +----------------------------+
           |                |
           V                V
    +-------------+  +-------------+
    |mongrel_rails|  |mongrel_rails|
    +-------------+  +-------------+

Thoughts?!

Pros:
- haproxy balances across all the mongrel_rails instances
- it checks whether the backends are still alive
- it queues the incoming requests

Cons:
- more complex setup
- you need to manage more programs/tools

Regards

Aleks