Dean Holdren
2006-Mar-09 19:24 UTC
[Rails] How to scale mysql servers for a rails application?
I'm a developer working on an application that will potentially be used by around 500,000 users on a daily basis, plus some internal apps communicating with it via ActionWebServices, also with potentially high demand. Our Operations team is helping us define the necessary system architecture, and I have one remaining question: what is the best way to scale the database? I have no expertise in this area, but considering that one web server can talk to only one database ip/host (as configured in database.yml), what is the best option?

Note: I'm using MySQL 4.x, and there will be N web servers hosting the Rails application behind a load balancer (user -> web load balancer -> web server).

0) No db scaling - one database ip/host for all web servers. Single point of failure, and potentially too much load for one server to handle.

1) DB load balancing with database replication - a layer of load balancing between the web servers and multiple databases, which requires that the databases perform replication amongst themselves. In this configuration, database.yml will point to the db load balancer.

2) Partition sets of the web servers to talk to one database per set, with database replication. I.e. set A consists of 3 web servers (web1, web2, web3) which all communicate with dbA; set B consists of 4 web servers (web4, web5, web6, web7) which all communicate with dbB. There is replication between dbA and dbB.

Our biggest concern is failover/availability, so if one database goes down, we can still continue. This more or less rules out option 2, unless our web load balancer can somehow register a web server as unavailable when the database it uses is unavailable.

What types of architectures are high-demand Rails applications using? I am limited by the database I'm using, which is currently MySQL 4.x. If there is a good reason to move to MySQL 5.x for a feature that will help in this capacity, please let me know. Does anyone have experience using the clustering capability of MySQL?
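For option 1, the application itself only needs to know the load balancer's address; a minimal sketch of what database.yml might look like in that setup (all hostnames and credentials here are hypothetical):

```yaml
# Hypothetical database.yml for option 1: Rails points at a DB load
# balancer, which distributes queries across the replicated databases.
production:
  adapter: mysql
  database: myapp_production    # assumed database name
  host: db-lb.internal.example  # the load balancer, not any one DB server
  username: myapp
  password: secret
```

From the Rails side nothing else changes; failover behavior then lives entirely in the load-balancing layer.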
Tom Mornini
2006-Mar-09 19:47 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 9, 2006, at 11:24 AM, Dean Holdren wrote:

> Note: I'm using MySQL 4.x, and there will be N web servers hosting the
> rails application behind a load-balancer.
> (user->web-load-balancer->webserver)

I'd recommend a three-tier setup: web server, app server and DB server(s).

> 0) No db scaling - One database ip/host for all web servers - single
> point of failure, potentially too much load for one server to handle

Not an option for you...

> 1) DB Load balancing with database replication - a layer of load
> balancing between the webservers and multiple databases - requires
> that the databases perform replication amongst themselves. in this
> configuration, the database.yml will point to the db-load-balancer.

What a wonderful world!

http://www.mysql.com/products/database/cluster/
http://www.continuent.com/index.php?option=com_content&task=view&id=210&Itemid=173
http://www.openminds.co.uk/high_availability_solutions/databases/postgresql.htm
http://www.linuxlabs.com/clusgres.html

> 2) partition sets of the web servers to talk to one database per set,
> with database replication.
> i.e. set A consists of 3 web servers, web1, web2, web3 which all
> communicate with dbA, set B consists of 4 web servers, web4, web5,
> web6, web7 which all communicate with dbB. There is replication
> between dbA and dbB.

Ugggh. :-)

--
-- Tom Mornini
Adam Fields
2006-Mar-09 20:44 UTC
[Rails] How to scale mysql servers for a rails application?
On Thu, Mar 09, 2006 at 11:47:42AM -0800, Tom Mornini wrote:
[...]
> > 0) No db scaling - One database ip/host for all web servers - single
> > point of failure, potentially too much load for one server to handle
>
> Not an option for you...

That's true. :)

> > 1) DB Load balancing with database replication - a layer of load
> > balancing between the webservers and multiple databases - requires
> > that the databases perform replication amongst themselves. in this
> > configuration, the database.yml will point to the db-load-balancer.
>
> What a wonderful world!
>
> http://www.mysql.com/products/database/cluster/
> http://www.continuent.com/index.php?option=com_content&task=view&id=210&Itemid=173
> http://www.openminds.co.uk/high_availability_solutions/databases/postgresql.htm
> http://www.linuxlabs.com/clusgres.html

Last time I checked, the MySQL cluster was based on the Emic clustering. While this will give you pretty good throughput, it's still got some connection limitations.

I've done this a lot with PHP and other platforms, not so much with Rails yet. I don't know how database pooling works in Rails, but if you've got a lot of Apache processes running, you may be in danger of exhausting the MySQL connection limits on the server.

> > 2) partition sets of the web servers to talk to one database per set,
> > with database replication.
> > i.e. set A consists of 3 web servers, web1, web2, web3 which all
> > communicate with dbA, set B consists of 4 web servers, web4, web5,
> > web6, web7 which all communicate with dbB. There is replication
> > between dbA and dbB.
>
> Ugggh. :-)

This is not a terrible configuration if you're able to segregate reads and writes. Depends on the application. Selecting against the slave is a pretty common scaling technique, although it requires some infrastructure. I haven't seen a Rails installation do it yet.
Dean, if you want to contact me offlist, I do general MySQL and application scaling and architecture consulting (including failover and replication). I offer a discount for Rails applications.

--
- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.adamfields.com ]

[ http://www.aquick.org/blog ] ............ Blog
[ http://www.adamfields.com/resume.html ] .. Experience
[ http://www.flickr.com/photos/fields ] .... Photos
[ http://www.aquicki.com/wiki ] ............ Wiki
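The read/write segregation Adam mentions can be sketched as a thin routing layer. This is an illustrative sketch only: the "connections" are stubs standing in for real MySQL connections, and the SELECT-sniffing heuristic is an assumption for the example, not how any particular library does it:

```ruby
# Minimal sketch of read/write splitting: SELECTs go to the replica
# (slave), all other statements go to the master.
class StubConnection
  attr_reader :log
  def initialize(name)
    @name, @log = name, []
  end

  def execute(sql)
    @log << sql
    @name # return which server handled the statement
  end
end

class SplitConnection
  def initialize(master, replica)
    @master, @replica = master, replica
  end

  # Naive routing: statements that start with SELECT are reads.
  def execute(sql)
    target = sql.strip.downcase.start_with?("select") ? @replica : @master
    target.execute(sql)
  end
end

master  = StubConnection.new(:master)
replica = StubConnection.new(:replica)
db = SplitConnection.new(master, replica)
db.execute("SELECT * FROM users WHERE id = 1")   # routed to the replica
db.execute("UPDATE users SET name = 'x'")        # routed to the master
```

The "infrastructure" Adam alludes to is everything this sketch ignores: replication lag (a read right after a write may not see it), transactions that mix reads and writes, and failover when the slave disappears.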
Tom Mornini
2006-Mar-09 21:12 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 9, 2006, at 12:44 PM, Adam Fields wrote:

> Last time I checked, the MySQL cluster was based on the Emic
> clustering. While this will give you pretty good throughput, it's
> still got some connection limitations.
>
> I've done this a lot with PHP and other platforms, not so much with
> rails yet. I don't know how the database pooling works in rails, but
> if you've got a lot of apache processes running, you may be in danger
> of exhausting the mysql connection limitations on the server.

Generally speaking, you get connection "pooling" in Rails via lots of front-end web server (Apache and/or Lighttpd) connections being proxied back to a far smaller number of application processes (FCGI, SCGI, Mongrel, or WEBrick). Rails would then have 1 connection per application process, so the number of Apache processes running is largely irrelevant.

--
-- Tom Mornini
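Tom's point can be made concrete with back-of-the-envelope arithmetic; the process counts below are hypothetical, chosen only to show that the connection count tracks the app tier, not the web tier:

```ruby
# DB connections track application processes, not front-end processes.
web_servers       = 16
apache_per_server = 256   # Apache's default MaxClients
fcgi_per_server   = 10    # hypothetical Rails FCGI backends per server

front_end_processes = web_servers * apache_per_server  # 4096
app_processes       = web_servers * fcgi_per_server    # 160

# One DB connection per Rails process:
db_connections = app_processes                         # 160, not 4096
```

So even with thousands of Apache processes accepting client sockets, the database only ever sees as many connections as there are Rails backends.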
Adam Fields
2006-Mar-10 02:33 UTC
[Rails] How to scale mysql servers for a rails application?
On Thu, Mar 09, 2006 at 01:12:47PM -0800, Tom Mornini wrote:
[...]
> Generally speaking, you get connection "pooling" in Rails via lots of
> front-end web server (Apache and/or Lighttpd) connections being proxied
> back to a far smaller number of application processes (FCGI, SCGI,
> Mongrel, or WEBrick). Rails would then have 1 connection per application
> process, so the number of Apache processes running is largely irrelevant.

Makes sense.

What happens with simultaneous requests within the same application process? How do you deal with resource contention on the same connection, long-running queries, and potential blocking?

--
- Adam
Tom Mornini
2006-Mar-10 04:24 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 9, 2006, at 6:33 PM, Adam Fields wrote:

> On Thu, Mar 09, 2006 at 01:12:47PM -0800, Tom Mornini wrote:
> [...]
>> Generally speaking, you get connection "pooling" in Rails via lots of
>> front-end web server (Apache and/or Lighttpd) connections being proxied
>> back to a far smaller number of application processes (FCGI, SCGI,
>> Mongrel, or WEBrick). Rails would then have 1 connection per application
>> process, so the number of Apache processes running is largely irrelevant.
>
> Makes sense.
>
> What happens with simultaneous requests within the same application
> process? How do you deal with resource contention on the same
> connection, long running queries, and potential blocking?

The same thing connection pools do. :-)

Simultaneous requests are run on separate application processes.

You never have resource contention on the same connection, because you have one connection per concurrent thread.

Long-running queries take up a connection (and corresponding application process) for a long time. If you don't have enough backends to handle the situation you mention, you add more.

Ever since LAMP scaling was pioneered in the late 90s, there's been discussion on this subject. It generally breaks down like this: by splitting up the web server from the application server, you take away the main bottleneck, which is (somewhat surprisingly) the connection to the client, who generally has much lower bandwidth than the app cluster. This was particularly true back in the modem days.

I cannot remember the exact numbers we discovered back then, but it was somewhere in the neighborhood of 1/8 to 1/16 the number of backend processes as we had web sockets, assuming the web sockets served static content themselves.

So, in effect, you get connection pooling, just in a different architecture.

--
-- Tom Mornini
Adam Fields
2006-Mar-10 05:02 UTC
[Rails] How to scale mysql servers for a rails application?
On Thu, Mar 09, 2006 at 08:24:31PM -0800, Tom Mornini wrote:
> On Mar 9, 2006, at 6:33 PM, Adam Fields wrote:
>
>> What happens with simultaneous requests within the same application
>> process? How do you deal with resource contention on the same
>> connection, long running queries, and potential blocking?
>
> The same thing connection pools do. :-)

Well, not really. Since connection pools are available intraprocess, if you have two threads running competing queries, you can assign them each a connection from the pool.

Say, hypothetically, that each user is going to hit a common transaction table. So you have a few thousand users, each on their own Apache thread, but all sharing one Rails FCGI backend process, and thus one db connection between them. You may get resource contention on that transaction table, because of that sharing, since only one query can execute at a time on the connection.

Unless I'm misunderstanding what you said - it's not clear. Is there one db connection per execution thread, or one db connection per application process?

> Simultaneous requests are run on separate application processes.
>
> You never have resource contention on the same connection, because
> you have one connection per concurrent thread.

But then this devolves to the same case where you have a separate db connection per Apache process. If you have one connection per client, what difference does it make if it's assigned via an Apache process or a Rails execution thread?
In that case you do have the possibility of exhausting the number of possible connections.

> Long running queries take up a connection (and corresponding application
> process) for a long time.
>
> If you don't have enough backends to handle the situation you mention,
> you add more.
>
> Ever since LAMP scaling was pioneered in the late 90s, there's been
> discussion on this subject. It generally breaks down like this:
>
> By splitting up the web server from the application server, you take away
> the main bottleneck, which is (somewhat surprisingly) the connection to
> the client, who generally has much lower bandwidth than the app cluster.
> This was particularly true back in the modem days.
>
> I cannot remember the exact numbers we discovered back then, but it was
> somewhere in the neighborhood of 1/8 to 1/16 the number of backend
> processes as we had web sockets, assuming the web sockets served static
> content themselves.
>
> So, in effect, you get connection pooling, just in a different
> architecture.

--
- Adam
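Adam's exhaustion worry is easy to quantify. Using figures that appear later in the thread (a comfortable MySQL ceiling around 4,000 connections, Apache's default of 256 processes per server), a one-connection-per-Apache-process model collides with the limit at a quite ordinary cluster size:

```ruby
# If every Apache process held its own DB connection, default limits
# collide quickly. 4,000 is the MySQL connection ceiling cited in this
# thread; 256 is Apache's default process cap per server.
mysql_max_connections = 4_000
apache_per_server     = 256
web_servers           = 16

needed    = apache_per_server * web_servers   # 4096 connections
exhausted = needed > mysql_max_connections    # true: the DB runs out first
```

With one connection per Rails backend instead, the same 16-server farm needs only as many connections as it has backend processes.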
Tom Mornini
2006-Mar-10 06:44 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 9, 2006, at 9:02 PM, Adam Fields wrote:

>>> What happens with simultaneous requests within the same application
>>> process? How do you deal with resource contention on the same
>>> connection, long running queries, and potential blocking?
>>
>> The same thing connection pools do. :-)
>
> Well, not really. Since connection pools are available intraprocess,
> if you have two threads running competing queries, you can assign them
> each a connection from the pool.

Yes, but each connection can only be used by one thread at a time...

> Say, hypothetically, that each user is going to hit a common
> transaction table. So you have a few thousand users, each on their own
> apache thread, but all sharing one rails fcgi backend process, and
> thus one db connection between them. You may get resource contention
> on that transaction table, because of that sharing, since only one
> query can execute at a time on the connection.

Yes, that would be a problem, but not necessarily more so than the pooled model, except that you've artificially set the connection/FCGI ratio at a few thousand to one. Certainly you wouldn't run the same few thousand users with a pool of one connection, would you?

> Is there one db connection per execution thread, or one db connection
> per application process?

Per application process, which is also per execution thread, as the Rails internals are not multi-threaded.

>> Simultaneous requests are run on separate application processes.
>>
>> You never have resource contention on the same connection, because
>> you have one connection per concurrent thread.
>
> But then this devolves to the same case where you have a separate db
> connection per apache process. If you have one connection per client,
> what difference does it make if it's assigned via an apache process or
> a rails execution thread?
> In that case you do have the possibility of exhausting the number of
> possible connections.

Because the reality is this: you get many more HTTP requests (not necessarily handled by Apache... there are other servers) than you get application requests. The HTTP server serves static pages, images, CSS and Javascript files.

Additionally, see comments below which you did not comment on.

>> Ever since LAMP scaling was pioneered in the late 90s, there's been
>> discussion on this subject. It generally breaks down like this:
>>
>> [...]
>>
>> So, in effect, you get connection pooling, just in a different
>> architecture.

It breaks down like this, for the same fundamental reasons:

In the connection-pooled world, there are clearly rules of thumb and tuning involved in determining the correct number of pooled connections per client connection:

http://edocs.bea.com/wls/docs70/perform/WLSTuning.html

If I read that correctly, by default WebLogic creates a connection pool equal to 33% of application threads, and states that the most significant performance increase from connection pooling comes from maintaining the connection, as opposed to reducing resource utilization.

The three-tier LAMP scaling model provides the same benefits in a different way.
As I stated, in the past I've seen application-specific tuning of the LAMP configuration at a ratio of 1/8-1/16 application processes per HTTP connection, and I'm willing to bet that's a fairly similar number to the tuning for the connection pooling ratio.

Here's some nasty application art:

Web client <---> HTTP Server <---> App Server <---> DB
  1,000             1,000            125           125

So, for 1,000 simultaneous web requests, you need 125 DB connections (in this example), and in my experience perhaps only 67 or so DB connections.

The reason for this is that the latency and bandwidth limits between the web client and the HTTP server, *plus* the fact that many of those connections require NO app server utilization (static files and images), combine to produce resource contention in the first stage that is between 8 and 16 times the resource contention of the 2nd and 3rd stages combined.

I've heard that 37signals and other large-scale applications are running far more front-end processes per backend process than the 8-16 that I've described. The reason for this would be the dramatic increase in performance of today's hardware versus the far smaller increase in client-side bandwidth during that same period.

The latency of the connections also plays an enormous role in these issues. Packet times between the web client and HTTP server are likely to be in the 30-60 ms range, while the intra-farm packet latencies are sub-millisecond.

So, to sum it up, even for dynamic requests, a minimum latency from the client to the HTTP server would be in the 60-120 ms range, even if the request only took one packet in each direction. That's only about 8 requests per second.

The vast majority of the dynamic requests that are proxied back to the application servers will be handled in just a few milliseconds.

These are the reasons why you can handle more than one client-side connection per application process.

--
-- Tom Mornini
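Tom's arithmetic can be checked directly; the ~8-requests-per-second figure falls straight out of the round-trip time, and the 125-connection figure out of the 1/8 ratio:

```ruby
# Client-to-HTTP-server round trips dominate: even a one-packet request
# with a one-packet response costs two one-way trips.
one_way_ms       = 60                     # upper end of the 30-60 ms range
round_trip_ms    = 2 * one_way_ms         # 120 ms minimum per request
requests_per_sec = 1000 / round_trip_ms   # ~8 requests/sec per client socket

# The front-end/back-end ratio from the diagram above:
web_requests   = 1_000
db_connections = web_requests / 8         # 125, the 1/8 rule of thumb
```

A backend process answering in a few milliseconds can therefore serve many client sockets that are each stuck waiting on wide-area latency.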
Dylan Stamat
2006-Mar-10 14:33 UTC
[Rails] How to scale mysql servers for a rails application?
Just wanted to jump in and thank you guys for hashing this out in this post! I'm sure this area of expertise is a huge weakness for a lot of web app developers (myself at least), and these types of threads end up helping tremendously at some point :)

On 3/9/06, Tom Mornini <tmornini@infomania.com> wrote:
[...]
Adam Fields
2006-Mar-10 15:49 UTC
[Rails] How to scale mysql servers for a rails application?
On Thu, Mar 09, 2006 at 10:44:37PM -0800, Tom Mornini wrote:
> On Mar 9, 2006, at 9:02 PM, Adam Fields wrote:
>
>>>> What happens with simultaneous requests within the same application
>>>> process? How do you deal with resource contention on the same
>>>> connection, long running queries, and potential blocking?
>>>
>>> The same thing connection pools do. :-)
>>
>> Well, not really. Since connection pools are available intraprocess,
>> if you have two threads running competing queries, you can assign them
>> each a connection from the pool.
>
> Yes, but each connection can only be used by one thread at a time...

To be clear, there are really four architectures we're talking about. "Connection" below always means db connection, and a client user is assumed to be using the database (see below for more on that):

1) Non-persistent connections: each client opens its own connection and closes it when it's done. Basically 1:1 clients to connections.

2) Persistent connections: each client opens its own connection, but if there's one already open, that's used. Definitely 1:1 clients to connections, and in fact this can be worse, because clients are probably holding open persistent connections even when they're not using them. This may make sense if the connection overhead is large.

3) "Real" DB pooling: the application server / connection manager hands out db connections as needed for each query and reclaims them back to the pool when they're done. There's basically no relation between the number of connections and the number of client users.

4) "Thread"-based connections: each application thread (or process; it's not important how it's implemented) gets its own connection. Many client users may share an application thread's connection.

The first two are basic PHP models, #3 is how Java does it if you're doing it right, and if I'm understanding it, #4 is the Rails method.
Correct?

>> Say, hypothetically, that each user is going to hit a common
>> transaction table. So you have a few thousand users, each on their own
>> apache thread, but all sharing one rails fcgi backend process, and
>> thus one db connection between them. You may get resource contention
>> on that transaction table, because of that sharing, since only one
>> query can execute at a time on the connection.
>
> Yes, that would be a problem, but not necessarily more so than the
> pooled model, except that you've artificially set the connection/FCGI
> ratio at a few thousand to one. Certainly you wouldn't run the same
> few thousand users with a pool of one connection, would you?

No - but that's a limitation of the thread-connection model. Do I need to spawn a whole new FCGI process if I saturate my db (or any other external resource, for that matter)? That seems like an inefficient shotgun scaling mechanism.

>> Is there one db connection per execution thread, or one db connection
>> per application process?
>
> Per application process, which is also per execution thread, as the
> Rails internals are not multi-threaded.

That makes sense.

>>> Simultaneous requests are run on separate application processes.
>>>
>>> You never have resource contention on the same connection, because
>>> you have one connection per concurrent thread.
>>
>> But then this devolves to the same case where you have a separate db
>> connection per apache process. If you have one connection per client,
>> what difference does it make if it's assigned via an apache process or
>> a rails execution thread? In that case you do have the possibility of
>> exhausting the number of possible connections.
>
> Because the reality is this: you get many more HTTP requests (not
> necessarily handled by Apache... there are other servers) than you get
> application requests.
> The HTTP server serves static pages, images, CSS and Javascript files.

Well, in a large application where you're in danger of exhausting the number of simultaneous connections on the database (which, for MySQL under normal conditions, is about 4,000, although I've been able to push it to around 10,000), you're probably pushing static and/or cached files out from somewhere else: either a CDN, or some sort of front-end caching mechanism. On 16 webservers, each of which runs the default max of 256 Apache processes (or 4 webservers if you push it to 1024), that's enough to crash the database under heavy load. Granted, this is larger than the average application, but that's why they call it "scaling".

I've seen cases where even static files were invoking the PHP engine because of misconfiguration, and that obviously is a problem that compounds this. No idea if that's a possible mistake to make in Rails. I hope not.

> Additionally, see comments below which you did not comment on.

I was waiting to respond until I understood what you were saying. :)

>> Long running queries take up a connection (and corresponding
>> application process) for a long time.
>>
>> If you don't have enough backends to handle the situation you
>> mention, you add more.
>>
>> Ever since LAMP scaling was pioneered in the late 90s, there's been
>> discussion on this subject. It generally breaks down like this:
>>
>> [...]
> >> > >>So, in effect, you get connection pooling, just in a different > >>architecture. > > It breaks down like this, for the same fundamental reasons: > > In the connection pooled world, there are clearly rules of thumb and > tuning > involved in determining the correct number of pools per client > connection: > > http://edocs.bea.com/wls/docs70/perform/WLSTuning.html > > If I read the correctly by default WebLogic creates a connection pool > equal > to 33% of application threads, and states that the most significant > performance > increase caused by connection pooling comes from maintaining the > connection as > opposed to reducing resource utilization.If your connection overhead is high, as it is with Oracle. This is MUCH less of an issue with mysql, and in fact, you''ll often get better performance by not using persistent connections and letting the connections cycle away and not be held beyond when they''re needed.> The three-tier LAMP scaling model provides the same benefits in a > different > way. As I stated, in the past I''ve seen application specific tuning > of the > LAMP configuration at ration of 1/8-1/16 application processes per HTTP > connection, and I''m willing to bet that''s a fairly similar number to the > tuning for the connection pooling ratio. > > Here''s some nasty application art::)> Web client <------------> HTTP Server <------------> App Server <------------> DB > 1,000 1,000 125 125 > > So, for 1,000 simultaneous web requests, you need 125 DB connections > (in this example) and in my experience perhaps only 67 or so DB connections. 
> > The reason for this is the latency and bandwidth limits between the > web client and the HTTP server *plus* the fact that many of those > connections require NO App Server utilization (static files and > images) combine to produce resource contention in the first stage > that are between 8 and 16 times the resource contention the 2nd and > third combined.It depends on what kind of queries your running and how long they take on average, but okay for the general case.> I''ve heard that 37Signals and other large scale applications are > running far more front end process per backend process than the 8-16 > that I''ve described. The reason for this would be the dramatic > increase in performance of today''s hardware -vs- the far smaller > increase in client side bandwidth during that same period.Yes, I can definitely see that.> The latency of the connections also plays an enormous role in these > issues. Packet times between the web client and HTTP server are > likely to be in the 30-60 ms range, while the intra-farm packet > latencies are sub-millisecond. > > So, to sum it up, even for dynamic requests, a minimum latency from > the client to the HTTP server would be in the 60-120 ms range, even > if the request only took one packet in each direction. That''s only > about 8 requests per second. > > The vast majority of the dynamic requests that will be proxied back > to the application servers will be handled in just a few milliseconds. > > These are the reasons why you can handle more than one client side > connections per application process.Latency is not the only issue though - this equation is heavily dependent on the efficiency of your backend queries, how long they take to run, and how they stack up with respect to each other than the db''s own resource contention algorithms. 
For example, if you''re doing a lot of big file uploads, you may be holding open db connections for the whole length of that connection, which may be minutes, even if you only use the db at the very beginning to record the transaction. Perhaps what you say is true for many apps, but I''m also interested in the difficult boundary cases. :) -- - Adam ** Expert Technical Project and Business Management **** System Performance Analysis and Architecture ****** [ http://www.adamfields.com ] [ http://www.aquick.org/blog ] ............ Blog [ http://www.adamfields.com/resume.html ].. Experience [ http://www.flickr.com/photos/fields ] ... Photos [ http://www.aquicki.com/wiki ].............Wiki
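The throughput arithmetic quoted above (60-120 ms round trips yielding
"about 8 requests per second") is easy to sanity-check. A quick Ruby
sketch, with the millisecond figures taken straight from the discussion
and purely illustrative:

```ruby
# For fully serialized requests, throughput is bounded by round-trip time:
# one request completes per round trip, so requests/sec = 1000 / RTT in ms.
def serialized_reqs_per_sec(round_trip_ms)
  1000.0 / round_trip_ms
end

puts serialized_reqs_per_sec(120)  # client <-> HTTP server, worst case: ~8/sec
puts serialized_reqs_per_sec(1)    # intra-farm, sub-ms rounded up: ~1000/sec
```

This is why a front-end process tied up talking to a slow client can do
so little work per second, while a backend process on the farm's LAN can
turn around orders of magnitude more requests in the same time.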
Adam Fields
2006-Mar-10 16:02 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 06:33:15AM -0800, Dylan Stamat wrote:

> Just wanted to jump in and thank you guys for hashing this out in this
> post! I'm sure this area of expertise is a huge weakness of a lot of
> web app developers (myself at least), and these types of threads end
> up helping tremendously at some point :)

Happy to help.

Remember that when you need an expensive consultant to bail you out of
a problem by Monday. :)

--
				- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.adamfields.com ]

[ http://www.aquick.org/blog ] ............ Blog
[ http://www.adamfields.com/resume.html ].. Experience
[ http://www.flickr.com/photos/fields ] ... Photos
[ http://www.aquicki.com/wiki ].............Wiki
Tom Mornini
2006-Mar-10 16:08 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 10, 2006, at 8:02 AM, Adam Fields wrote:

> On Fri, Mar 10, 2006 at 06:33:15AM -0800, Dylan Stamat wrote:
>> Just wanted to jump in and thank you guys for hashing this out in
>> this post! I'm sure this area of expertise is a huge weakness of a
>> lot of web app developers (myself at least), and these types of
>> threads end up helping tremendously at some point :)
>
> Happy to help.
>
> Remember that when you need an expensive consultant to bail you out of
> a problem by Monday. :)

+1 :-)

--
-- Tom Mornini
Tom Mornini
2006-Mar-10 16:29 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 10, 2006, at 7:49 AM, Adam Fields wrote:

> On Thu, Mar 09, 2006 at 10:44:37PM -0800, Tom Mornini wrote:
>> On Mar 9, 2006, at 9:02 PM, Adam Fields wrote:
>>
>>>>> What happens with simultaneous requests within the same
>>>>> application process? How do you deal with resource contention on
>>>>> the same connection, long running queries, and potential blocking?
>>>>
>>>> The same thing connection pools do. :-)
>>>
>>> Well, not really. Since connection pools are available intraprocess,
>>> if you have two threads running competing queries, you can assign
>>> them each a connection from the pool.
>>
>> Yes, but each connection can only be used by one thread at a time...
>
> To be clear, there are really four architectures we're talking about.
> Connection below always means db connection, and a client user is
> assumed to be using the database (see below for more on that):
>
> 1) Non-persistent connections: each client opens its own connection,
>    and closes it when it's done. Basically 1:1 clients to connections.

Common examples: CGI and mod_perl without Apache::DBI

> 2) Persistent connections: each client opens its own connection, but
>    if there's one already open, that's used. Definitely 1:1 clients to
>    connections, and in fact this can be worse, because clients are
>    probably holding open persistent connections even if they're not
>    using them. This may make sense if the connection overhead is
>    large.

Common examples: mod_perl with Apache::DBI

Additionally, I believe you're throwing in the assumption that app
connections are equal to HTTP connections.

> 3) "Real" DB pooling: the application server / connection manager
>    hands out db connections as needed for each query and reclaims them
>    back to the pool when they're done. There's basically no relation
>    between the number of connections and the number of client users.

Common examples: Java

> 4) "Thread"-based connections: each application thread (or process,
>    it's not important how it's implemented) gets its own connection.
>    Many client users may share an application thread connection.

Yes, and only dynamic requests get handled by the process that holds
the DB connection open, and the HTTP connection to the client is NOT
handled by the application thread.

> The first two are basic PHP models, #3 is how Java does it if you're
> doing it right, and if I'm understanding it, #4 is the rails method.
>
> Correct?

Yes, with clarifying comments.

>>> Say, hypothetically, that each user is going to hit a common
>>> transaction table. [...]
>>
>> Yes, that would be a problem, but not necessarily more so than the
>> pooled model [...]
>
> No - but that's a limitation of the thread connection model. Do I need
> to spawn a whole new fcgi process if I saturate my db (or any other
> external resource, for that matter)? That seems like an inefficient
> shotgun scaling mechanism.

I don't understand you here. The whole new FCGI process is spawned in
advance, and tuned over time to match traffic patterns. Generally
speaking, in the high ends of scalability we're discussing, you'd add
FCGI processes by plugging in a new application server.

[...]

>> Because the reality is this: you get more HTTP requests (not
>> necessarily handled by Apache...there are other servers) than you get
>> application requests. [...]
>
> Well, in a large application where you're in danger of exhausting the
> number of simultaneous connections on the database [...] you're
> probably pushing static and/or cached files out from somewhere else:
> either a CDN, or some sort of front-end caching mechanism.

Yes, that's the front end proxy servers in the drawing below...

> On 16 webservers, each of which runs the default max of 256 apache
> processes (or 4 webservers if you push it to 1024), that's enough to
> crash the database under heavy load. Granted, this is larger than the
> average application, but that's why they call it "scaling".

But these connections are the client side HTTP connections, and DO NOT
have DB connections. :-)

> I've seen cases where even static files were invoking the php engine
> because of misconfiguration, and that obviously is a problem that
> compounds this. No idea if that's a possible mistake to make in rails.
> I hope not.

Well, you know what they say about pigs...lipstick doesn't help. :-)

>> If I read this correctly, by default WebLogic creates a connection
>> pool equal to 33% of application threads, and states that the most
>> significant performance increase caused by connection pooling comes
>> from maintaining the connection, as opposed to reducing resource
>> utilization.
>
> If your connection overhead is high, as it is with Oracle. This is
> MUCH less of an issue with mysql, and in fact, you'll often get better
> performance by not using persistent connections and letting the
> connections cycle away rather than be held beyond when they're needed.

I find it very hard to believe that connecting per request is more
efficient than persistent connections, though I do agree that MySQL
connects incredibly faster than Oracle, and would therefore show a much
smaller improvement -vs- Oracle.

But, I can only imagine that's faster if you're opening connections
that *aren't going to be used*. If that's the case, then all bets are
off.

>> So, for 1,000 simultaneous web requests, you need 125 DB connections
>> (in this example) and in my experience perhaps only 67 or so DB
>> connections. [...]
>
> It depends on what kind of queries you're running and how long they
> take on average, but okay for the general case.

Oh, absolutely! The ratio needs to be tuned per-application, and is
dependent on many variables, perhaps the most volatile of which is the
application code itself. Efficient apps will have higher HTTP/app
ratios than inefficient apps.

>> The latency of the connections also plays an enormous role in these
>> issues. [...] These are the reasons why you can handle more than one
>> client side connection per application process.
>
> Latency is not the only issue though - this equation is heavily
> dependent on the efficiency of your backend queries, how long they
> take to run, and how they stack up with respect to each other and the
> db's own resource contention algorithms. [...]

Yes, again, there are many complex variables that play into the
equation. The reason I mentioned latency is that I've found again and
again in my career that serial latencies are more often the cause of
performance problems than resource limitations, though the latencies
often cause a particular architecture to be starved for resources. :-)

I find again and again that many people have a hard time
conceptualizing how fast computers are. They think several milliseconds
is incredibly fast, where to a modern computer it's incredibly slow. I
cannot tell you how many times I've seen applications running slowly
due to serialized latencies, while showing little to no disk and/or
processor utilization, and people suggesting that throwing more and/or
faster hardware at it was going to fix the problem. :-)

> Perhaps what you say is true for many apps, but I'm also interested in
> the difficult boundary cases. :)

Me too. While I gave some specific examples of the most common issues,
we're in agreement that there's more than one way to cause an
application to perform poorly. :-)

Again, just remember the pig!

--
-- Tom Mornini
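For anyone comparing the four models in this exchange, model #3 ("real"
pooling) can be sketched in a few lines of Ruby. The `ConnectionPool`
class and the string stand-ins for DB handles below are illustrative
only, not a Rails or MySQL API:

```ruby
# Minimal sketch of "real" DB pooling: a fixed set of connections is
# handed out per query and reclaimed when the query finishes, so the
# number of DB connections is independent of the number of client users.
class ConnectionPool
  def initialize(size)
    @queue = Queue.new                        # thread-safe FIFO from Ruby core
    size.times { |i| @queue << "conn-#{i}" }  # stand-ins for real DB handles
  end

  # Check a connection out for the duration of the block, then reclaim it.
  def with_connection
    conn = @queue.pop        # blocks if every connection is checked out
    yield conn
  ensure
    @queue << conn if conn
  end
end

# Five "queries" share a pool of two connections.
pool = ConnectionPool.new(2)
results = (1..5).map { |n| pool.with_connection { |c| "query #{n} on #{c}" } }
puts results
```

Model #4 (the per-process Rails arrangement discussed here) is then the
degenerate case: each process owns exactly one connection and never
returns it to a shared pool.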
Adam Fields
2006-Mar-10 17:31 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 08:29:22AM -0800, Tom Mornini wrote:

[...]

>> 1) Non-persistent connections: each client opens its own connection,
>>    and closes it when it's done. Basically 1:1 clients to
>>    connections.
>
> Common examples: CGI and mod_perl without Apache::DBI

Also PHP with standard connect.

>> 2) Persistent connections: each client opens its own connection, but
>>    if there's one already open, that's used. [...] This may make
>>    sense if the connection overhead is large.
>
> Common examples: mod_perl with Apache::DBI

Also PHP with pconnect.

> Additionally, I believe you're throwing in the assumption that app
> connections are equal to HTTP connections.

Not necessarily, but an HTTP connection is what I refer to when I say
"client user".

>> 3) "Real" DB pooling: the application server / connection manager
>>    hands out db connections as needed for each query and reclaims
>>    them back to the pool when they're done. There's basically no
>>    relation between the number of connections and the number of
>>    client users.
>
> Common examples: Java
>
>> 4) "Thread"-based connections: each application thread (or process,
>>    it's not important how it's implemented) gets its own connection.
>>    Many client users may share an application thread connection.
>
> Yes, and only dynamic requests get handled by the process that holds
> the DB connection open, and the HTTP connection to the client is NOT
> handled by the application thread.

Yes, I'm clear on that - it's handled by the web server, and the
connection to the rails process is brokered in some way (fcgi probably,
at this point).

>> The first two are basic PHP models, #3 is how Java does it if you're
>> doing it right, and if I'm understanding it, #4 is the rails method.
>>
>> Correct?
>
> Yes, with clarifying comments.

Good. :)

[...]

>> No - but that's a limitation of the thread connection model. [...]
>
> I don't understand you here. The whole new FCGI process is spawned in
> advance, and tuned over time to match traffic patterns. Generally
> speaking, in the high ends of scalability we're discussing, you'd add
> FCGI processes by plugging in a new application server.

So that's my real question - what's an "application server" in the
rails scaling model? If each application process gets 1 database
connection to be shared among all of the client users (HTTP
connections), what happens if you saturate that, assuming that some
reasonably large number of them require database access (if they don't,
it's easy)? Do you have to add a whole other rails process, or is there
a way to add additional db connections within the context of one?

[...]

> Yes, that's the front end proxy servers in the drawing below...
>
>> On 16 webservers, each of which runs the default max of 256 apache
>> processes (or 4 webservers if you push it to 1024), that's enough to
>> crash the database under heavy load. Granted, this is larger than the
>> average application, but that's why they call it "scaling".
>
> But these connections are the client side HTTP connections, and DO NOT
> have DB connections. :-)

Not necessarily. Ideally, everything that gets past your proxy cache
requires a database connection, because if it didn't, the proxy cache
should probably be able to handle it. Agreed though, most applications
probably aren't pushing this kind of fully dynamic traffic. Some may be
though. I want to know how rails deals with that if they do.

>> I've seen cases where even static files were invoking the php engine
>> because of misconfiguration, and that obviously is a problem that
>> compounds this. No idea if that's a possible mistake to make in
>> rails. I hope not.
>
> Well, you know what they say about pigs...lipstick doesn't help. :-)

Hey, look what list we're on.

[...]

> I find it very hard to believe that connecting per request is more
> efficient than persistent connections, though I do agree that MySQL
> connects incredibly faster than Oracle, and would therefore show a
> much smaller improvement -vs- Oracle.
>
> But, I can only imagine that's faster if you're opening connections
> that *aren't going to be used*. If that's the case, then all bets are
> off.

It's not so much that they aren't being used, but that they aren't
being used for the full duration of the request. Under normal
circumstances, pconnects don't close when you're done with them, until
your request is fully handled. That may be keeping a db connection
locked up for full seconds (or even minutes, in some cases), even
though it's no longer technically needed. In that case, you need to use
non-persistent connections to cycle them effectively intra-request.
I've seen it happen.

[...]

> I find again and again that many people have a hard time
> conceptualizing how fast computers are. [...] I cannot tell you how
> many times I've seen applications running slowly due to serialized
> latencies, while showing little to no disk and/or processor
> utilization, and people suggesting that throwing more and/or faster
> hardware at it was going to fix the problem. :-)

Sometimes it does, but this is why resource contention analysis (of all
different kinds) is important.

>> Perhaps what you say is true for many apps, but I'm also interested
>> in the difficult boundary cases. :)
>
> Me too. While I gave some specific examples of the most common issues,
> we're in agreement that there's more than one way to cause an
> application to perform poorly. :-)
>
> Again, just remember the pig!

Often, it's impossible to forget. But we do what we can. :)

--
				- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.adamfields.com ]

[ http://www.aquick.org/blog ] ............ Blog
[ http://www.adamfields.com/resume.html ].. Experience
[ http://www.flickr.com/photos/fields ] ... Photos
[ http://www.aquicki.com/wiki ].............Wiki
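The intra-request connection cycling described in this exchange can be
sketched as follows. `FakeDB` is a hypothetical stand-in for a driver
(not a real API), used only to show the connection being released
before slow non-DB work such as a big upload:

```ruby
# Tracks how many connections are currently checked out, so we can
# observe that none are held during the slow non-DB phase of a request.
class FakeDB
  attr_reader :open_connections

  def initialize
    @open_connections = 0
  end

  # Hold a connection only for the duration of the block.
  def with_connection
    @open_connections += 1
    yield
  ensure
    @open_connections -= 1
  end
end

db = FakeDB.new

# Non-persistent style: connect only around the actual query...
db.with_connection { :record_the_transaction }

# ...so during the minutes-long upload that follows, no connection is held.
puts db.open_connections  # 0
```

With pconnect-style persistence, the connection would instead stay
checked out until the whole request finished, which is exactly the
lockup Adam describes.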
Tom Mornini
2006-Mar-10 20:47 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 10, 2006, at 9:31 AM, Adam Fields wrote:

> So that's my real question - what's an "application server" in the
> rails scaling model?

A machine with one or more processes executing Rails code.

> If each application process gets 1 database connection to be shared
> among all of the client users (HTTP connections), what happens if you
> saturate that, assuming that some reasonably large number of them
> require database access (if they don't, it's easy)? Do you have to add
> a whole other rails process, or is there a way to add additional db
> connections within the context of one?

Well, that's just it...and seems to be a point of confusion between us.
In the pictures I've drawn and the systems I've described, you can tune
the number of application processes (and therefore DB connections) per
HTTP connection.

> Not necessarily. Ideally, everything that gets past your proxy cache
> requires a database connection, because if it didn't, the proxy cache
> should probably be able to handle it. Agreed though, most applications
> probably aren't pushing this kind of fully dynamic traffic. Some may
> be though. I want to know how rails deals with that if they do.

Well, in the LAMP scaling model, EVERY application pushes that sort of
load *at the application process*, as all other requests are handled by
the HTTP front end. The application processes *only* serve dynamic
content, and I'm quite sure you understand this.

>>> I've seen cases where even static files were invoking the php engine
>>> because of misconfiguration, and that obviously is a problem that
>>> compounds this. No idea if that's a possible mistake to make in
>>> rails. I hope not.
>>
>> Well, you know what they say about pigs...lipstick doesn't help. :-)
>
> Hey, look what list we're on.

I don't understand this comment at all.

In response to your previous statement, it's *absolutely* possible to
misconfigure Rails production server environments, and I cannot imagine
that you think that any highly scalable system has infallible
configuration.

>> I find it very hard to believe that connecting per request is more
>> efficient than persistent connections, though I do agree that MySQL
>> connects incredibly faster than Oracle, and would therefore show a
>> much smaller improvement -vs- Oracle.
>>
>> But, I can only imagine that's faster if you're opening connections
>> that *aren't going to be used*. If that's the case, then all bets are
>> off.
>
> It's not so much that they aren't being used, but that they aren't
> being used for the full duration of the request. Under normal
> circumstances, pconnects don't close when you're done with them, until
> your request is fully handled. That may be keeping a db connection
> locked up for full seconds (or even minutes, in some cases), even
> though it's no longer technically needed. In that case, you need to
> use non-persistent connections to cycle them effectively
> intra-request. I've seen it happen.

Ah, now this is an interesting point that I had failed to consider.

Yes! Connection pooling would be more efficient in the case where the
majority of dynamic processing time were spent in non-DB related code.
I think that this must be a relatively uncommon situation, though with
data coming from external sources (i.e. web services) I can see this
becoming more prevalent in the future.

Thanks for clarifying that advantage of pooling.

>> I find again and again that many people have a hard time
>> conceptualizing how fast computers are. They think several
>> milliseconds is incredibly fast, where to a modern computer it's
>> incredibly slow. I cannot tell you how many times I've seen
>> applications running slowly due to serialized latencies, while
>> showing little to no disk and/or processor utilization, and people
>> suggesting that throwing more and/or faster hardware at it was going
>> to fix the problem. :-)
>
> Sometimes it does, but this is why resource contention analysis (of
> all different kinds) is important.

I'm sure we'll agree that the most common non-trivial scaling problems
are not solvable via simple hardware addition, and that's my point.

>>> Perhaps what you say is true for many apps, but I'm also interested
>>> in the difficult boundary cases. :)
>>
>> Me too. While I gave some specific examples of the most common
>> issues, we're in agreement that there's more than one way to cause an
>> application to perform poorly. :-)
>>
>> Again, just remember the pig!
>
> Often, it's impossible to forget. But we do what we can. :)

Yes, bill by the hour for creative solutions, no? :-)

--
-- Tom Mornini
Adam Fields
2006-Mar-10 21:32 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 12:47:44PM -0800, Tom Mornini wrote:

> On Mar 10, 2006, at 9:31 AM, Adam Fields wrote:
>
>> So that's my real question - what's an "application server" in the
>> rails scaling model?
>
> A machine with one or more processes executing Rails code.
>
>> If each application process gets 1 database connection to be shared
>> among all of the client users (HTTP connections), what happens if you
>> saturate that [...]? Do you have to add a whole other rails process,
>> or is there a way to add additional db connections within the context
>> of one?
>
> Well, that's just it...and seems to be a point of confusion between
> us. In the pictures I've drawn and the systems I've described, you can
> tune the number of application processes (and therefore DB
> connections) per HTTP connection.

I don't think that's a point of confusion. The point of confusion is
"why". Adding a whole other application process seems pretty
heavyweight (extra ram and cpu utilization, mostly) when all you need
is just another db connection.

>> Not necessarily. Ideally, everything that gets past your proxy cache
>> requires a database connection, because if it didn't, the proxy cache
>> should probably be able to handle it. Agreed though, most
>> applications probably aren't pushing this kind of fully dynamic
>> traffic. Some may be though. I want to know how rails deals with that
>> if they do.
>
> Well, in the LAMP scaling model, EVERY application pushes that sort of
> load *at the application process*, as all other requests are handled
> by the HTTP front end. The application processes *only* serve dynamic
> content, and I'm quite sure you understand this.

Yes. I'm getting at questioning the edge cases where the overwhelming
majority of your application traffic is fully dynamic and hitting your
application processes.

>>>> I've seen cases where even static files were invoking the php
>>>> engine because of misconfiguration, and that obviously is a problem
>>>> that compounds this. No idea if that's a possible mistake to make
>>>> in rails. I hope not.
>>>
>>> Well, you know what they say about pigs...lipstick doesn't help. :-)
>>
>> Hey, look what list we're on.
>
> I don't understand this comment at all.

Oh, that was just a feeble attempt at humor. The hope is that rails is
less of a pig. :)

> In response to your previous statement, it's *absolutely* possible to
> misconfigure Rails production server environments, and I cannot
> imagine that you think that any highly scalable system has infallible
> configuration.

Absolutely true.

[...]

>> locked up for full seconds (or even minutes, in some cases), even
>> though it's no longer technically needed. In that case, you need to
>> use non-persistent connections to cycle them effectively
>> intra-request. I've seen it happen.
>
> Ah, now this is an interesting point that I had failed to consider.
>
> Yes! Connection pooling would be more efficient in the case where
> the majority of dynamic processing time were spent in non-DB related
> code. I think that this must be a relatively uncommon situation,
> though with data coming from external sources (i.e. web services)
> I can see this becoming more prevalent in the future.
>
> Thanks for clarifying that advantage of pooling.

This is exactly why some people look at rails and say "where's the real
application services layer?". But I suspect that that's a simplistic
answer and there's a more elegant one in there somewhere.

[...]

> I'm sure we'll agree that the most common non-trivial scaling problems
> are not solvable via simple hardware addition, and that's my point.

Agreed.

[...]

> Yes, bill by the hour for creative solutions, no? :-)

Agreed.

--
				- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.adamfields.com ]

[ http://www.aquick.org/blog ] ............ Blog
[ http://www.adamfields.com/resume.html ].. Experience
[ http://www.flickr.com/photos/fields ] ... Photos
[ http://www.aquicki.com/wiki ].............Wiki
Justin Forder
2006-Mar-10 22:30 UTC
[Rails] How to scale mysql servers for a rails application?
Tom Mornini wrote:
> On Mar 10, 2006, at 9:31 AM, Adam Fields wrote:
>> It's not so much that [persistent db connections] aren't being used,
>> but that they aren't being used for the full duration of the request.
>> Under normal circumstances, pconnects don't close when you're done
>> with them, until your request is fully handled. That may be keeping a
>> db connection locked up for full seconds (or even minutes, in some
>> cases), even though it's no longer technically needed. In that case,
>> you need to use non-persistent connections to cycle them effectively
>> intra-request. I've seen it happen.
>
> Ah, now this is an interesting point that I had failed to consider.
>
> Yes! Connection pooling would be more efficient in the case where
> the majority of dynamic processing time were spent in non-DB related
> code. I think that this must be a relatively uncommon situation,
> though with data coming from external sources (i.e. web services)
> I can see this becoming more prevalent in the future.
>
> Thanks for clarifying that advantage of pooling.

From my production.log:

Processing HistoryController#list (for 217.169.11.194 at 2006-03-10 20:30:15) [GET]
  Parameters: {"action"=>"list", "controller"=>"history"}
Rendering within layouts/application
Rendering history/list
Completed in 0.09403 (10 reqs/sec) | Rendering: 0.06680 (71%) | DB: 0.02177 (23%) | 200 OK [http://real.host.deleted/history/list]

.. from a small application on a Mac mini running
Apache-lighty-fcgi-Rails-MySQL (all on the same machine, under Tiger),
and that's one of the higher records in terms of reported DB time.

This application displays pages of data with embedded graphs
(sparklines) - the graphs are generated from data provided in the URL.
All requests are for dynamic data, either HTML or PNG.

Averaging over the last 2000 requests, the time reported by Rails for
the DB contribution is 6.14% of the total (rendering is 38.6%).
If I remove the records which have 0.00000 as the DB time (which will
be the image requests) I am left with 850 records. For these, the DB
contribution is 12.4% (rendering is 77.8%).

The rendered pages contain tables with many links, and these use
link_to, which Stefan Kaes has said is slow - so my figures could
change with tuning. But they do show clearly that the majority of time
can be spent in non-DB code.

regards

   Justin
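[Editorial note: Justin's averages can be reproduced from a
production.log with a few lines of Ruby. A sketch, assuming the Rails
1.x "Completed in ... | DB: ..." line format shown above; the regexp,
method name, and sample lines here are illustrative:]

```ruby
# Average share of request time spent in the DB, computed from Rails
# 1.x log lines of the form:
#   Completed in 0.09403 (...) | Rendering: 0.06680 (71%) | DB: 0.02177 (23%) | ...
LINE_RE = /Completed in (\d+\.\d+) .*? DB: (\d+\.\d+)/

def db_share(lines)
  pairs = lines.map { |l| LINE_RE.match(l) }.compact
  # Drop requests that never touched the DB (e.g. the image requests),
  # as Justin does for his second figure.
  timed = pairs.map { |m| [m[1].to_f, m[2].to_f] }.reject { |_, db| db.zero? }
  return 0.0 if timed.empty?
  total, db = timed.transpose.map { |col| col.inject(0.0) { |s, x| s + x } }
  db / total
end

sample = [
  'Completed in 0.09403 (10 reqs/sec) | Rendering: 0.06680 (71%) | DB: 0.02177 (23%) | 200 OK',
  'Completed in 0.05000 (20 reqs/sec) | Rendering: 0.04000 (80%) | DB: 0.00000 (0%) | 200 OK'
]
puts format('DB share: %.1f%%', db_share(sample) * 100)
```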
Adam Fields
2006-Mar-10 22:54 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 10:30:19PM +0000, Justin Forder wrote:
[...]
> The rendered pages contain tables with many links, and these use
> link_to, which Stefan Kaes has said is slow - so my figures could change
> with tuning. But they do show clearly that the majority of time can be
> spent in non-DB code.

That's not the whole picture though. That total aggregate time could
be spread out throughout a large number of DB calls. Theoretically,
you shouldn't be disconnecting from the DB between calls (although in
a rails model, maybe the application is timeslicing the DB connection
between requests - I don't know the answer to that), but what's really
important is the processing time after the last DB call in a request.
Any time there is definitely wasted from the point of view of keeping
a DB connection open.

Whether this is an issue probably depends on how a request ties up the
DB connection within the rails execution context, and I'm not clear on
how that works.
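[Editorial note: Adam's point - that time after the last DB call is
wasted if the request pins a connection - is what a pool avoids. A
minimal sketch in plain Ruby; `TinyPool` and the string stand-ins for
connection handles are hypothetical, not Rails or MySQL API:]

```ruby
require 'thread'

# A toy connection pool: a request checks a connection out only for the
# DB phase and returns it before the (possibly slow) rendering phase,
# instead of holding one persistent connection for the whole request.
class TinyPool
  def initialize(size)
    @queue = Queue.new
    size.times { |i| @queue << "conn-#{i}" } # stand-ins for real DB handles
  end

  # Yield a connection; return it to the pool as soon as the block ends.
  def with_connection
    conn = @queue.pop
    begin
      yield conn
    ensure
      @queue << conn
    end
  end

  def available
    @queue.size
  end
end

pool = TinyPool.new(2)
data = pool.with_connection { |c| "rows fetched via #{c}" }
# Rendering happens here with no connection held:
page = "<p>#{data}</p>"
```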
Justin Forder
2006-Mar-11 00:39 UTC
[Rails] How to scale mysql servers for a rails application?
Adam Fields wrote:
> On Fri, Mar 10, 2006 at 10:30:19PM +0000, Justin Forder wrote:
> [...]
>> The rendered pages contain tables with many links, and these use
>> link_to, which Stefan Kaes has said is slow - so my figures could change
>> with tuning. But they do show clearly that the majority of time can be
>> spent in non-DB code.
>
> That's not the whole picture though. That total aggregate time could
> be spread out throughout a large number of DB calls. Theoretically,
> you shouldn't be disconnecting from the DB between calls (although in
> a rails model, maybe the application is timeslicing the DB connection
> between requests - I don't know the answer to that), but what's really
> important is the processing time after the last DB call in a
> request. Any time there is definitely wasted from the point of view of
> keeping a DB connection open.

It's true that if you are not careful you can find yourself making
calls to the DB while rendering the response (because of lazy
loading). It's preferable IMHO to build the complete model before
rendering it, and that is what my application does. So, for a given
request, all the rendering time comes *after* all the DB time (and the
DB time should be from one or two calls to the DB).

> Whether this is an issue probably depends on how a request ties up the
> DB connection within the rails execution context, and I'm not clear on
> how that works.

As I mentioned before, this is running under FCGI, and each Rails FCGI
process holds an open DB connection. These are the persistent
connections Tom was referring to. This is a straightforward model, and
for nearly all applications you don't need a large number of FCGI
processes, so I don't see a problem with having a DB connection per
process. I just posted my figures to show that this model doesn't load
the individual DB connections heavily.

regards

   Justin
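[Editorial note: the lazy-loading pitfall Justin describes can be shown
without ActiveRecord. In plain Ruby, a lazily loaded attribute runs its
"query" the first time it is read - and if that first read happens
inside a template, DB time lands in the middle of rendering. Touching
the data before the render phase (in Rails 1.x, eager loading such as
`Order.find(:all, :include => :line_items)`) keeps all loading ahead of
it. The `LazyHistory` class below is hypothetical:]

```ruby
# Simulates a lazily loaded association: the "DB call" is deferred
# until the first read of #entries.
class LazyHistory
  def initialize
    @loaded = false
  end

  def entries
    unless @loaded        # first access stands in for a deferred DB call
      @entries = %w[a b c]
      @loaded = true
    end
    @entries
  end

  def loaded?
    @loaded
  end
end

history = LazyHistory.new
history.entries           # touch the data *before* rendering starts
html = history.entries.map { |e| "<li>#{e}</li>" }.join
```

With the explicit touch, all "DB" work finishes before the HTML is
built; drop that line and the load would fire mid-render instead.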
Tom Mornini
2006-Mar-11 03:38 UTC
[Rails] How to scale mysql servers for a rails application?
Interesting data, Justin!

My experience suggests that most applications
don't generate large numbers of rendered images,
but my view may be hopelessly Web 1.0.

--
-- Tom Mornini

On Mar 10, 2006, at 2:30 PM, Justin Forder wrote:

> [...]
Tom Mornini
2006-Mar-11 03:45 UTC
[Rails] How to scale mysql servers for a rails application?
Oh, additionally, as you add app servers to the
system, the DB will become loaded, while the
app server load will remain fairly constant.

This would tend to skew loaded results toward
higher DB time ratios.

Of course you can scale the DB through a variety
of methods, but generally speaking, DBs don't
scale as easily as the application layer.

--
-- Tom Mornini

On Mar 10, 2006, at 7:38 PM, Tom Mornini wrote:

> [...]
Adam Fields
2006-Mar-11 03:52 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 07:45:05PM -0800, Tom Mornini wrote:
> Oh, additionally, as you add app servers to the
> system, the DB will become loaded, while the
> app server load will remain fairly constant.
>
> This would tend to skew loaded results toward
> higher DB time ratios.
>
> Of course you can scale the DB through a variety
> of methods, but generally speaking, DBs don't
> scale as easily as the application layer.

We've come full circle - this discussion started with a question about
how to scale MySQL to multiple servers. It's easier if you have a real
db connection pool broker. :)
Adam Fields
2006-Mar-11 03:58 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 07:38:49PM -0800, Tom Mornini wrote:
> Interesting data, Justin!
>
> My experience suggests that most applications
> don't generate large numbers of rendered images,
> but my view may be hopelessly Web 1.0.

I thought SVG would get some more traction sooner, but it hasn't
really caught on.