Dean Holdren
2006-Mar-09 19:24 UTC
[Rails] How to scale mysql servers for a rails application?
I'm a developer working on an application that will potentially be used by around 500,000 users on a daily basis, plus some internal apps communicating with it via ActionWebServices, also with potentially high demand. Our Operations team is helping us define the necessary system architecture, and I have one remaining question: what is the best way to scale the database? I have no expertise in this area, but considering that one web server can talk to only one database ip/host (as configured in database.yml), what is the best option?

Note: I'm using MySQL 4.x, and there will be N web servers hosting the Rails application behind a load balancer (user -> web load balancer -> web server).

0) No db scaling - one database ip/host for all web servers. Single point of failure, and potentially too much load for one server to handle.

1) DB load balancing with database replication - a layer of load balancing between the web servers and multiple databases, which requires that the databases perform replication amongst themselves. In this configuration, database.yml will point to the db load balancer.

2) Partition sets of the web servers to talk to one database per set, with database replication. I.e. set A consists of 3 web servers (web1, web2, web3) which all communicate with dbA; set B consists of 4 web servers (web4, web5, web6, web7) which all communicate with dbB. There is replication between dbA and dbB.

Our biggest concern is failover/availability, so if one database goes down, we can still continue. This more or less rules out option 2, unless our web load balancer can somehow register a web server as unavailable when the database it uses is unavailable.

What types of architectures are high-demand Rails applications using? I am limited by the database I'm using, which is currently MySQL 4.x. If there is a good reason to move to MySQL 5.x for a feature that will help in this capacity, please let me know. Does anyone have experience using the clustering capability of MySQL?
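For option 1, the application itself only needs to know the load balancer's address; a minimal sketch of what database.yml might look like in that setup (all hostnames and credentials here are hypothetical):

```yaml
# Hypothetical database.yml for option 1: Rails points at a DB load
# balancer, which distributes queries across the replicated databases.
production:
  adapter: mysql
  database: myapp_production    # assumed database name
  host: db-lb.internal.example  # the load balancer, not any one DB server
  username: myapp
  password: secret
```

From the Rails side nothing else changes; failover behavior then lives entirely in the load-balancing layer.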
Tom Mornini
2006-Mar-09 19:47 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 9, 2006, at 11:24 AM, Dean Holdren wrote:

> Note: I'm using MySQL 4.x, and there will be N web servers hosting the
> rails application behind a load-balancer.
> (user->web-load-balancer->webserver)

I'd recommend a three-tier setup: web server, app server and DB server(s).

> 0) No db scaling - One database ip/host for all web servers - single
> point of failure, potentially too much load for one server to handle

Not an option for you...

> 1) DB Load balancing with database replication - a layer of load
> balancing between the webservers and multiple databases - requires
> that the databases perform replication amongst themselves. in this
> configuration, the database.yml will point to the db-load-balancer.

What a wonderful world!

http://www.mysql.com/products/database/cluster/
http://www.continuent.com/index.php?option=com_content&task=view&id=210&Itemid=173
http://www.openminds.co.uk/high_availability_solutions/databases/postgresql.htm
http://www.linuxlabs.com/clusgres.html

> 2) partition sets of the web servers to talk to one database per set,
> with database replication.
> i.e. set A consists of 3 web servers, web1, web2, web3 which all
> communicate with dbA, set B consists of 4 web servers, web4, web5,
> web6, web7 which all communicate with dbB. There is replication
> between dbA and dbB.

Ugggh. :-)

--
-- Tom Mornini
Adam Fields
2006-Mar-09 20:44 UTC
[Rails] How to scale mysql servers for a rails application?
On Thu, Mar 09, 2006 at 11:47:42AM -0800, Tom Mornini wrote:
[...]
> > 0) No db scaling - One database ip/host for all web servers - single
> > point of failure, potentially too much load for one server to handle
>
> Not an option for you...

That's true. :)

> > 1) DB Load balancing with database replication - a layer of load
> > balancing between the webservers and multiple databases - requires
> > that the databases perform replication amongst themselves. in this
> > configuration, the database.yml will point to the db-load-balancer.
>
> What a wonderful world!
>
> http://www.mysql.com/products/database/cluster/
> http://www.continuent.com/index.php?option=com_content&task=view&id=210&Itemid=173
> http://www.openminds.co.uk/high_availability_solutions/databases/postgresql.htm
> http://www.linuxlabs.com/clusgres.html

Last time I checked, the MySQL cluster was based on the Emic clustering. While this will give you pretty good throughput, it's still got some connection limitations.

I've done this a lot with PHP and other platforms, not so much with Rails yet. I don't know how database pooling works in Rails, but if you've got a lot of Apache processes running, you may be in danger of exhausting the MySQL connection limits on the server.

> > 2) partition sets of the web servers to talk to one database per set,
> > with database replication.
> > i.e. set A consists of 3 web servers, web1, web2, web3 which all
> > communicate with dbA, set B consists of 4 web servers, web4, web5,
> > web6, web7 which all communicate with dbB. There is replication
> > between dbA and dbB.
>
> Ugggh. :-)

This is not a terrible configuration if you're able to segregate reads and writes. Depends on the application. Selecting against the slave is a pretty common scaling technique, although it requires some infrastructure. I haven't seen a Rails installation do it yet.
Dean, if you want to contact me offlist, I do general MySQL and application scaling and architecture consulting (including failover and replication). I offer a discount for Rails applications.

--
- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.adamfields.com ]

[ http://www.aquick.org/blog ] ............ Blog
[ http://www.adamfields.com/resume.html ] .. Experience
[ http://www.flickr.com/photos/fields ] .... Photos
[ http://www.aquicki.com/wiki ] ............ Wiki
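The read/write segregation Adam mentions can be sketched as a thin routing layer. This is an illustrative sketch only: the "connections" are stubs standing in for real MySQL connections, and the SELECT-sniffing heuristic is an assumption for the example, not how any particular library does it:

```ruby
# Minimal sketch of read/write splitting: SELECTs go to the replica
# (slave), all other statements go to the master.
class StubConnection
  attr_reader :log
  def initialize(name)
    @name, @log = name, []
  end

  def execute(sql)
    @log << sql
    @name # return which server handled the statement
  end
end

class SplitConnection
  def initialize(master, replica)
    @master, @replica = master, replica
  end

  # Naive routing: statements that start with SELECT are reads.
  def execute(sql)
    target = sql.strip.downcase.start_with?("select") ? @replica : @master
    target.execute(sql)
  end
end

master  = StubConnection.new(:master)
replica = StubConnection.new(:replica)
db = SplitConnection.new(master, replica)
db.execute("SELECT * FROM users WHERE id = 1")   # routed to the replica
db.execute("UPDATE users SET name = 'x'")        # routed to the master
```

The "infrastructure" Adam alludes to is everything this sketch ignores: replication lag (a read right after a write may not see it), transactions that mix reads and writes, and failover when the slave disappears.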
Tom Mornini
2006-Mar-09 21:12 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 9, 2006, at 12:44 PM, Adam Fields wrote:

> Last time I checked, the MySQL cluster was based on the Emic
> clustering. While this will give you pretty good throughput, it's
> still got some connection limitations.
>
> I've done this a lot with PHP and other platforms, not so much with
> rails yet. I don't know how the database pooling works in rails, but
> if you've got a lot of apache processes running, you may be in danger
> of exhausting the mysql connection limitations on the server.

Generally speaking, you get connection "pooling" in Rails via lots of front-end web server (Apache and/or Lighttpd) connections being proxied back to a far smaller number of application processes (FCGI, SCGI, Mongrel, or WEBrick). Rails would then have 1 connection per application process, so the number of Apache processes running is largely irrelevant.

--
-- Tom Mornini
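Tom's point can be made concrete with back-of-the-envelope arithmetic; the process counts below are hypothetical, chosen only to show that the connection count tracks the app tier, not the web tier:

```ruby
# DB connections track application processes, not front-end processes.
web_servers       = 16
apache_per_server = 256   # Apache's default MaxClients
fcgi_per_server   = 10    # hypothetical Rails FCGI backends per server

front_end_processes = web_servers * apache_per_server  # 4096
app_processes       = web_servers * fcgi_per_server    # 160

# One DB connection per Rails process:
db_connections = app_processes                         # 160, not 4096
```

So even with thousands of Apache processes accepting client sockets, the database only ever sees as many connections as there are Rails backends.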
Adam Fields
2006-Mar-10 02:33 UTC
[Rails] How to scale mysql servers for a rails application?
On Thu, Mar 09, 2006 at 01:12:47PM -0800, Tom Mornini wrote:
[...]
> Generally speaking, you get connection "pooling" in Rails via lots of
> front-end web server (Apache and/or Lighttpd) connections being proxied
> back to a far smaller number of application processes (FCGI, SCGI,
> Mongrel, or WEBrick). Rails would then have 1 connection per application
> process, so the number of Apache processes running is largely irrelevant.

Makes sense.

What happens with simultaneous requests within the same application process? How do you deal with resource contention on the same connection, long-running queries, and potential blocking?

--
- Adam
Tom Mornini
2006-Mar-10 04:24 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 9, 2006, at 6:33 PM, Adam Fields wrote:

> On Thu, Mar 09, 2006 at 01:12:47PM -0800, Tom Mornini wrote:
> [...]
>> Generally speaking, you get connection "pooling" in Rails via lots of
>> front-end web server (Apache and/or Lighttpd) connections being proxied
>> back to a far smaller number of application processes (FCGI, SCGI,
>> Mongrel, or WEBrick). Rails would then have 1 connection per application
>> process, so the number of Apache processes running is largely irrelevant.
>
> Makes sense.
>
> What happens with simultaneous requests within the same application
> process? How do you deal with resource contention on the same
> connection, long running queries, and potential blocking?

The same thing connection pools do. :-)

Simultaneous requests are run on separate application processes.

You never have resource contention on the same connection, because you have one connection per concurrent thread.

Long-running queries take up a connection (and corresponding application process) for a long time. If you don't have enough backends to handle the situation you mention, you add more.

Ever since LAMP scaling was pioneered in the late 90s, there's been discussion on this subject. It generally breaks down like this: by splitting up the web server from the application server, you take away the main bottleneck, which is (somewhat surprisingly) the connection to the client, who generally has much lower bandwidth than the app cluster. This was particularly true back in the modem days.

I cannot remember the exact numbers we discovered back then, but it was somewhere in the neighborhood of 1/8 to 1/16 the number of backend processes as we had web sockets, assuming the web sockets served static content themselves.

So, in effect, you get connection pooling, just in a different architecture.

--
-- Tom Mornini
Adam Fields
2006-Mar-10 05:02 UTC
[Rails] How to scale mysql servers for a rails application?
On Thu, Mar 09, 2006 at 08:24:31PM -0800, Tom Mornini wrote:
> On Mar 9, 2006, at 6:33 PM, Adam Fields wrote:
>
>> What happens with simultaneous requests within the same application
>> process? How do you deal with resource contention on the same
>> connection, long running queries, and potential blocking?
>
> The same thing connection pools do. :-)

Well, not really. Since connection pools are available intraprocess, if you have two threads running competing queries, you can assign them each a connection from the pool.

Say, hypothetically, that each user is going to hit a common transaction table. So you have a few thousand users, each on their own Apache thread, but all sharing one Rails FCGI backend process, and thus one db connection between them. You may get resource contention on that transaction table, because of that sharing, since only one query can execute at a time on the connection.

Unless I'm misunderstanding what you said - it's not clear. Is there one db connection per execution thread, or one db connection per application process?

> Simultaneous requests are run on separate application processes.
>
> You never have resource contention on the same connection, because
> you have one connection per concurrent thread.

But then this devolves to the same case where you have a separate db connection per Apache process. If you have one connection per client, what difference does it make if it's assigned via an Apache process or a Rails execution thread?
In that case you do have the possibility of exhausting the number of possible connections.

> Long running queries take up a connection (and corresponding application
> process) for a long time.
>
> If you don't have enough backends to handle the situation you mention,
> you add more.
>
> Ever since LAMP scaling was pioneered in the late 90s, there's been
> discussion on this subject. It generally breaks down like this:
>
> By splitting up the web server from the application server, you take away
> the main bottleneck, which is (somewhat surprisingly) the connection to
> the client, who generally has much lower bandwidth than the app cluster.
> This was particularly true back in the modem days.
>
> I cannot remember the exact numbers we discovered back then, but it was
> somewhere in the neighborhood of 1/8 to 1/16 the number of backend
> processes as we had web sockets, assuming the web sockets served static
> content themselves.
>
> So, in effect, you get connection pooling, just in a different
> architecture.

--
- Adam
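Adam's exhaustion worry is easy to quantify. Using figures that appear later in the thread (a comfortable MySQL ceiling around 4,000 connections, Apache's default of 256 processes per server), a one-connection-per-Apache-process model collides with the limit at a quite ordinary cluster size:

```ruby
# If every Apache process held its own DB connection, default limits
# collide quickly. 4,000 is the MySQL connection ceiling cited in this
# thread; 256 is Apache's default process cap per server.
mysql_max_connections = 4_000
apache_per_server     = 256
web_servers           = 16

needed    = apache_per_server * web_servers   # 4096 connections
exhausted = needed > mysql_max_connections    # true: the DB runs out first
```

With one connection per Rails backend instead, the same 16-server farm needs only as many connections as it has backend processes.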
Tom Mornini
2006-Mar-10 06:44 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 9, 2006, at 9:02 PM, Adam Fields wrote:

>>> What happens with simultaneous requests within the same application
>>> process? How do you deal with resource contention on the same
>>> connection, long running queries, and potential blocking?
>>
>> The same thing connection pools do. :-)
>
> Well, not really. Since connection pools are available intraprocess,
> if you have two threads running competing queries, you can assign them
> each a connection from the pool.

Yes, but each connection can only be used by one thread at a time...

> Say, hypothetically, that each user is going to hit a common
> transaction table. So you have a few thousand users, each on their own
> apache thread, but all sharing one rails fcgi backend process, and
> thus one db connection between them. You may get resource contention
> on that transaction table, because of that sharing, since only one
> query can execute at a time on the connection.

Yes, that would be a problem, but not necessarily more so than the pooled model, except that you've artificially set the connection/FCGI ratio at a few thousand to one. Certainly you wouldn't run the same few thousand users with a pool of one connection, would you?

> Is there one db connection per execution thread, or one db connection
> per application process?

Per application process, which is also per execution thread, as the Rails internals are not multi-threaded.

>> Simultaneous requests are run on separate application processes.
>>
>> You never have resource contention on the same connection, because
>> you have one connection per concurrent thread.
>
> But then this devolves to the same case where you have a separate db
> connection per apache process. If you have one connection per client,
> what difference does it make if it's assigned via an apache process or
> a rails execution thread?
> In that case you do have the possibility of exhausting the number of
> possible connections.

Because the reality is this: you get many more HTTP requests (not necessarily handled by Apache... there are other servers) than you get application requests. The HTTP server serves static pages, images, CSS and Javascript files.

Additionally, see comments below which you did not comment on.

>> Ever since LAMP scaling was pioneered in the late 90s, there's been
>> discussion on this subject. It generally breaks down like this:
>>
>> [...]
>>
>> So, in effect, you get connection pooling, just in a different
>> architecture.

It breaks down like this, for the same fundamental reasons:

In the connection-pooled world, there are clearly rules of thumb and tuning involved in determining the correct number of pooled connections per client connection:

http://edocs.bea.com/wls/docs70/perform/WLSTuning.html

If I read that correctly, by default WebLogic creates a connection pool equal to 33% of application threads, and states that the most significant performance increase from connection pooling comes from maintaining the connection, as opposed to reducing resource utilization.

The three-tier LAMP scaling model provides the same benefits in a different way.
As I stated, in the past I've seen application-specific tuning of the LAMP configuration at a ratio of 1/8-1/16 application processes per HTTP connection, and I'm willing to bet that's a fairly similar number to the tuning for the connection pooling ratio.

Here's some nasty application art:

Web client <---> HTTP Server <---> App Server <---> DB
  1,000             1,000            125           125

So, for 1,000 simultaneous web requests, you need 125 DB connections (in this example), and in my experience perhaps only 67 or so DB connections.

The reason for this is that the latency and bandwidth limits between the web client and the HTTP server, *plus* the fact that many of those connections require NO app server utilization (static files and images), combine to produce resource contention in the first stage that is between 8 and 16 times the resource contention of the 2nd and 3rd stages combined.

I've heard that 37signals and other large-scale applications are running far more front-end processes per backend process than the 8-16 that I've described. The reason for this would be the dramatic increase in performance of today's hardware versus the far smaller increase in client-side bandwidth during that same period.

The latency of the connections also plays an enormous role in these issues. Packet times between the web client and HTTP server are likely to be in the 30-60 ms range, while the intra-farm packet latencies are sub-millisecond.

So, to sum it up, even for dynamic requests, a minimum latency from the client to the HTTP server would be in the 60-120 ms range, even if the request only took one packet in each direction. That's only about 8 requests per second.

The vast majority of the dynamic requests that are proxied back to the application servers will be handled in just a few milliseconds.

These are the reasons why you can handle more than one client-side connection per application process.

--
-- Tom Mornini
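Tom's arithmetic can be checked directly; the ~8-requests-per-second figure falls straight out of the round-trip time, and the 125-connection figure out of the 1/8 ratio:

```ruby
# Client-to-HTTP-server round trips dominate: even a one-packet request
# with a one-packet response costs two one-way trips.
one_way_ms       = 60                     # upper end of the 30-60 ms range
round_trip_ms    = 2 * one_way_ms         # 120 ms minimum per request
requests_per_sec = 1000 / round_trip_ms   # ~8 requests/sec per client socket

# The front-end/back-end ratio from the diagram above:
web_requests   = 1_000
db_connections = web_requests / 8         # 125, the 1/8 rule of thumb
```

A backend process answering in a few milliseconds can therefore serve many client sockets that are each stuck waiting on wide-area latency.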
Dylan Stamat
2006-Mar-10 14:33 UTC
[Rails] How to scale mysql servers for a rails application?
Just wanted to jump in and thank you guys for hashing this out in this post! I'm sure this area of expertise is a huge weakness for a lot of web app developers (myself at least), and these types of threads end up helping tremendously at some point :)

On 3/9/06, Tom Mornini <tmornini@infomania.com> wrote:
[...]
Adam Fields
2006-Mar-10 15:49 UTC
[Rails] How to scale mysql servers for a rails application?
On Thu, Mar 09, 2006 at 10:44:37PM -0800, Tom Mornini wrote:
> On Mar 9, 2006, at 9:02 PM, Adam Fields wrote:
>
>>>> What happens with simultaneous requests within the same application
>>>> process? How do you deal with resource contention on the same
>>>> connection, long running queries, and potential blocking?
>>>
>>> The same thing connection pools do. :-)
>>
>> Well, not really. Since connection pools are available intraprocess,
>> if you have two threads running competing queries, you can assign them
>> each a connection from the pool.
>
> Yes, but each connection can only be used by one thread at a time...

To be clear, there are really four architectures we're talking about. "Connection" below always means db connection, and a client user is assumed to be using the database (see below for more on that):

1) Non-persistent connections: each client opens its own connection and closes it when it's done. Basically 1:1 clients to connections.

2) Persistent connections: each client opens its own connection, but if there's one already open, that's used. Definitely 1:1 clients to connections, and in fact this can be worse, because clients are probably holding open persistent connections even when they're not using them. This may make sense if the connection overhead is large.

3) "Real" DB pooling: the application server / connection manager hands out db connections as needed for each query and reclaims them back to the pool when they're done. There's basically no relation between the number of connections and the number of client users.

4) "Thread"-based connections: each application thread (or process; it's not important how it's implemented) gets its own connection. Many client users may share an application thread's connection.

The first two are basic PHP models, #3 is how Java does it if you're doing it right, and if I'm understanding it, #4 is the Rails method.
Correct?

>> Say, hypothetically, that each user is going to hit a common
>> transaction table. So you have a few thousand users, each on their own
>> apache thread, but all sharing one rails fcgi backend process, and
>> thus one db connection between them. You may get resource contention
>> on that transaction table, because of that sharing, since only one
>> query can execute at a time on the connection.
>
> Yes, that would be a problem, but not necessarily more so than the
> pooled model, except that you've artificially set the connection/FCGI
> ratio at a few thousand to one. Certainly you wouldn't run the same
> few thousand users with a pool of one connection, would you?

No - but that's a limitation of the thread-connection model. Do I need to spawn a whole new FCGI process if I saturate my db (or any other external resource, for that matter)? That seems like an inefficient shotgun scaling mechanism.

>> Is there one db connection per execution thread, or one db connection
>> per application process?
>
> Per application process, which is also per execution thread, as the
> Rails internals are not multi-threaded.

That makes sense.

>>> Simultaneous requests are run on separate application processes.
>>>
>>> You never have resource contention on the same connection, because
>>> you have one connection per concurrent thread.
>>
>> But then this devolves to the same case where you have a separate db
>> connection per apache process. If you have one connection per client,
>> what difference does it make if it's assigned via an apache process or
>> a rails execution thread? In that case you do have the possibility of
>> exhausting the number of possible connections.
>
> Because the reality is this: you get many more HTTP requests (not
> necessarily handled by Apache... there are other servers) than you get
> application requests.
> The HTTP server serves static pages, images, CSS and Javascript files.

Well, in a large application where you're in danger of exhausting the number of simultaneous connections on the database (which, for MySQL under normal conditions, is about 4,000, although I've been able to push it to around 10,000), you're probably pushing static and/or cached files out from somewhere else: either a CDN, or some sort of front-end caching mechanism. On 16 webservers, each of which runs the default max of 256 Apache processes (or 4 webservers if you push it to 1024), that's enough to crash the database under heavy load. Granted, this is larger than the average application, but that's why they call it "scaling".

I've seen cases where even static files were invoking the PHP engine because of misconfiguration, and that obviously is a problem that compounds this. No idea if that's a possible mistake to make in Rails. I hope not.

> Additionally, see comments below which you did not comment on.

I was waiting to respond until I understood what you were saying. :)

>> Long running queries take up a connection (and corresponding
>> application process) for a long time.
>>
>> If you don't have enough backends to handle the situation you
>> mention, you add more.
>>
>> Ever since LAMP scaling was pioneered in the late 90s, there's been
>> discussion on this subject. It generally breaks down like this:
>>
>> [...]
> >> > >>So, in effect, you get connection pooling, just in a different > >>architecture. > > It breaks down like this, for the same fundamental reasons: > > In the connection pooled world, there are clearly rules of thumb and > tuning > involved in determining the correct number of pools per client > connection: > > http://edocs.bea.com/wls/docs70/perform/WLSTuning.html > > If I read the correctly by default WebLogic creates a connection pool > equal > to 33% of application threads, and states that the most significant > performance > increase caused by connection pooling comes from maintaining the > connection as > opposed to reducing resource utilization.If your connection overhead is high, as it is with Oracle. This is MUCH less of an issue with mysql, and in fact, you''ll often get better performance by not using persistent connections and letting the connections cycle away and not be held beyond when they''re needed.> The three-tier LAMP scaling model provides the same benefits in a > different > way. As I stated, in the past I''ve seen application specific tuning > of the > LAMP configuration at ration of 1/8-1/16 application processes per HTTP > connection, and I''m willing to bet that''s a fairly similar number to the > tuning for the connection pooling ratio. > > Here''s some nasty application art::)> Web client <------------> HTTP Server <------------> App Server <------------> DB > 1,000 1,000 125 125 > > So, for 1,000 simultaneous web requests, you need 125 DB connections > (in this example) and in my experience perhaps only 67 or so DB connections. 
> > The reason for this is the latency and bandwidth limits between the > web client and the HTTP server *plus* the fact that many of those > connections require NO App Server utilization (static files and > images) combine to produce resource contention in the first stage > that are between 8 and 16 times the resource contention the 2nd and > third combined.It depends on what kind of queries your running and how long they take on average, but okay for the general case.> I''ve heard that 37Signals and other large scale applications are > running far more front end process per backend process than the 8-16 > that I''ve described. The reason for this would be the dramatic > increase in performance of today''s hardware -vs- the far smaller > increase in client side bandwidth during that same period.Yes, I can definitely see that.> The latency of the connections also plays an enormous role in these > issues. Packet times between the web client and HTTP server are > likely to be in the 30-60 ms range, while the intra-farm packet > latencies are sub-millisecond. > > So, to sum it up, even for dynamic requests, a minimum latency from > the client to the HTTP server would be in the 60-120 ms range, even > if the request only took one packet in each direction. That''s only > about 8 requests per second. > > The vast majority of the dynamic requests that will be proxied back > to the application servers will be handled in just a few milliseconds. > > These are the reasons why you can handle more than one client side > connections per application process.Latency is not the only issue though - this equation is heavily dependent on the efficiency of your backend queries, how long they take to run, and how they stack up with respect to each other than the db''s own resource contention algorithms. 
For example, if you''re doing a lot of big file uploads, you may be holding open db connections for the whole length of that connection, which may be minutes, even if you only use the db at the very beginning to record the transaction. Perhaps what you say is true for many apps, but I''m also interested in the difficult boundary cases. :) -- - Adam ** Expert Technical Project and Business Management **** System Performance Analysis and Architecture ****** [ http://www.adamfields.com ] [ http://www.aquick.org/blog ] ............ Blog [ http://www.adamfields.com/resume.html ].. Experience [ http://www.flickr.com/photos/fields ] ... Photos [ http://www.aquicki.com/wiki ].............Wiki
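The throughput arithmetic quoted above (60-120 ms round trips yielding
"about 8 requests per second") is easy to sanity-check. A quick Ruby
sketch, with the millisecond figures taken straight from the discussion
and purely illustrative:

```ruby
# For fully serialized requests, throughput is bounded by round-trip time:
# one request completes per round trip, so requests/sec = 1000 / RTT in ms.
def serialized_reqs_per_sec(round_trip_ms)
  1000.0 / round_trip_ms
end

puts serialized_reqs_per_sec(120)  # client <-> HTTP server, worst case: ~8/sec
puts serialized_reqs_per_sec(1)    # intra-farm, sub-ms rounded up: ~1000/sec
```

This is why a front-end process tied up talking to a slow client can do
so little work per second, while a backend process on the farm's LAN can
turn around orders of magnitude more requests in the same time.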
Adam Fields
2006-Mar-10 16:02 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 06:33:15AM -0800, Dylan Stamat wrote:

> Just wanted to jump in and thank you guys for hashing this out in this
> post! I'm sure this area of expertise is a huge weakness of a lot of
> web app developers (myself at least), and these types of threads end
> up helping tremendously at some point :)

Happy to help.

Remember that when you need an expensive consultant to bail you out of
a problem by Monday. :)

--
				- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.adamfields.com ]

[ http://www.aquick.org/blog ] ............ Blog
[ http://www.adamfields.com/resume.html ].. Experience
[ http://www.flickr.com/photos/fields ] ... Photos
[ http://www.aquicki.com/wiki ].............Wiki
Tom Mornini
2006-Mar-10 16:08 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 10, 2006, at 8:02 AM, Adam Fields wrote:

> On Fri, Mar 10, 2006 at 06:33:15AM -0800, Dylan Stamat wrote:
>> Just wanted to jump in and thank you guys for hashing this out in
>> this post! I'm sure this area of expertise is a huge weakness of a
>> lot of web app developers (myself at least), and these types of
>> threads end up helping tremendously at some point :)
>
> Happy to help.
>
> Remember that when you need an expensive consultant to bail you out of
> a problem by Monday. :)

+1 :-)

--
-- Tom Mornini
Tom Mornini
2006-Mar-10 16:29 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 10, 2006, at 7:49 AM, Adam Fields wrote:

> On Thu, Mar 09, 2006 at 10:44:37PM -0800, Tom Mornini wrote:
>> On Mar 9, 2006, at 9:02 PM, Adam Fields wrote:
>>
>>>>> What happens with simultaneous requests within the same
>>>>> application process? How do you deal with resource contention on
>>>>> the same connection, long running queries, and potential blocking?
>>>>
>>>> The same thing connection pools do. :-)
>>>
>>> Well, not really. Since connection pools are available intraprocess,
>>> if you have two threads running competing queries, you can assign
>>> them each a connection from the pool.
>>
>> Yes, but each connection can only be used by one thread at a time...
>
> To be clear, there are really four architectures we're talking about.
> Connection below always means db connection, and a client user is
> assumed to be using the database (see below for more on that):
>
> 1) Non-persistent connections: each client opens its own connection,
>    and closes it when it's done. Basically 1:1 clients to connections.

Common examples: CGI and mod_perl without Apache::DBI

> 2) Persistent connections: each client opens its own connection, but
>    if there's one already open, that's used. Definitely 1:1 clients to
>    connections, and in fact this can be worse, because clients are
>    probably holding open persistent connections even if they're not
>    using them. This may make sense if the connection overhead is
>    large.

Common examples: mod_perl with Apache::DBI

Additionally, I believe you're throwing in the assumption that app
connections are equal to HTTP connections.

> 3) "Real" DB pooling: the application server / connection manager
>    hands out db connections as needed for each query and reclaims them
>    back to the pool when they're done. There's basically no relation
>    between the number of connections and the number of client users.

Common examples: Java

> 4) "Thread"-based connections: each application thread (or process,
>    it's not important how it's implemented) gets its own connection.
>    Many client users may share an application thread connection.

Yes, and only dynamic requests get handled by the process that holds
the DB connection open, and the HTTP connection to the client is NOT
handled by the application thread.

> The first two are basic PHP models, #3 is how Java does it if you're
> doing it right, and if I'm understanding it, #4 is the rails method.
>
> Correct?

Yes, with clarifying comments.

>>> Say, hypothetically, that each user is going to hit a common
>>> transaction table. [...]
>>
>> Yes, that would be a problem, but not necessarily more so than the
>> pooled model [...]
>
> No - but that's a limitation of the thread connection model. Do I need
> to spawn a whole new fcgi process if I saturate my db (or any other
> external resource, for that matter)? That seems like an inefficient
> shotgun scaling mechanism.

I don't understand you here. The whole new FCGI process is spawned in
advance, and tuned over time to match traffic patterns. Generally
speaking, in the high ends of scalability we're discussing, you'd add
FCGI processes by plugging in a new application server.

[...]

>> Because the reality is this: you get more HTTP requests (not
>> necessarily handled by Apache...there are other servers) than you get
>> application requests. [...]
>
> Well, in a large application where you're in danger of exhausting the
> number of simultaneous connections on the database [...] you're
> probably pushing static and/or cached files out from somewhere else:
> either a CDN, or some sort of front-end caching mechanism.

Yes, that's the front end proxy servers in the drawing below...

> On 16 webservers, each of which runs the default max of 256 apache
> processes (or 4 webservers if you push it to 1024), that's enough to
> crash the database under heavy load. Granted, this is larger than the
> average application, but that's why they call it "scaling".

But these connections are the client side HTTP connections, and DO NOT
have DB connections. :-)

> I've seen cases where even static files were invoking the php engine
> because of misconfiguration, and that obviously is a problem that
> compounds this. No idea if that's a possible mistake to make in rails.
> I hope not.

Well, you know what they say about pigs...lipstick doesn't help. :-)

>> If I read this correctly, by default WebLogic creates a connection
>> pool equal to 33% of application threads, and states that the most
>> significant performance increase caused by connection pooling comes
>> from maintaining the connection, as opposed to reducing resource
>> utilization.
>
> If your connection overhead is high, as it is with Oracle. This is
> MUCH less of an issue with mysql, and in fact, you'll often get better
> performance by not using persistent connections and letting the
> connections cycle away rather than be held beyond when they're needed.

I find it very hard to believe that connecting per request is more
efficient than persistent connections, though I do agree that MySQL
connects incredibly faster than Oracle, and would therefore show a much
smaller improvement -vs- Oracle.

But, I can only imagine that's faster if you're opening connections
that *aren't going to be used*. If that's the case, then all bets are
off.

>> So, for 1,000 simultaneous web requests, you need 125 DB connections
>> (in this example) and in my experience perhaps only 67 or so DB
>> connections. [...]
>
> It depends on what kind of queries you're running and how long they
> take on average, but okay for the general case.

Oh, absolutely! The ratio needs to be tuned per-application, and is
dependent on many variables, perhaps the most volatile of which is the
application code itself. Efficient apps will have higher HTTP/app
ratios than inefficient apps.

>> The latency of the connections also plays an enormous role in these
>> issues. [...] These are the reasons why you can handle more than one
>> client side connection per application process.
>
> Latency is not the only issue though - this equation is heavily
> dependent on the efficiency of your backend queries, how long they
> take to run, and how they stack up with respect to each other and the
> db's own resource contention algorithms. [...]

Yes, again, there are many complex variables that play into the
equation. The reason I mentioned latency is that I've found again and
again in my career that serial latencies are more often the cause of
performance problems than resource limitations, though the latencies
often cause a particular architecture to be starved for resources. :-)

I find again and again that many people have a hard time
conceptualizing how fast computers are. They think several milliseconds
is incredibly fast, where to a modern computer it's incredibly slow. I
cannot tell you how many times I've seen applications running slowly
due to serialized latencies, while showing little to no disk and/or
processor utilization, and people suggesting that throwing more and/or
faster hardware at it was going to fix the problem. :-)

> Perhaps what you say is true for many apps, but I'm also interested in
> the difficult boundary cases. :)

Me too. While I gave some specific examples of the most common issues,
we're in agreement that there's more than one way to cause an
application to perform poorly. :-)

Again, just remember the pig!

--
-- Tom Mornini
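For anyone comparing the four models in this exchange, model #3 ("real"
pooling) can be sketched in a few lines of Ruby. The `ConnectionPool`
class and the string stand-ins for DB handles below are illustrative
only, not a Rails or MySQL API:

```ruby
# Minimal sketch of "real" DB pooling: a fixed set of connections is
# handed out per query and reclaimed when the query finishes, so the
# number of DB connections is independent of the number of client users.
class ConnectionPool
  def initialize(size)
    @queue = Queue.new                        # thread-safe FIFO from Ruby core
    size.times { |i| @queue << "conn-#{i}" }  # stand-ins for real DB handles
  end

  # Check a connection out for the duration of the block, then reclaim it.
  def with_connection
    conn = @queue.pop        # blocks if every connection is checked out
    yield conn
  ensure
    @queue << conn if conn
  end
end

# Five "queries" share a pool of two connections.
pool = ConnectionPool.new(2)
results = (1..5).map { |n| pool.with_connection { |c| "query #{n} on #{c}" } }
puts results
```

Model #4 (the per-process Rails arrangement discussed here) is then the
degenerate case: each process owns exactly one connection and never
returns it to a shared pool.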
Adam Fields
2006-Mar-10 17:31 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 08:29:22AM -0800, Tom Mornini wrote:

[...]

>> 1) Non-persistent connections: each client opens its own connection,
>>    and closes it when it's done. Basically 1:1 clients to
>>    connections.
>
> Common examples: CGI and mod_perl without Apache::DBI

Also PHP with standard connect.

>> 2) Persistent connections: each client opens its own connection, but
>>    if there's one already open, that's used. [...] This may make
>>    sense if the connection overhead is large.
>
> Common examples: mod_perl with Apache::DBI

Also PHP with pconnect.

> Additionally, I believe you're throwing in the assumption that app
> connections are equal to HTTP connections.

Not necessarily, but an HTTP connection is what I refer to when I say
"client user".

>> 3) "Real" DB pooling: the application server / connection manager
>>    hands out db connections as needed for each query and reclaims
>>    them back to the pool when they're done. There's basically no
>>    relation between the number of connections and the number of
>>    client users.
>
> Common examples: Java
>
>> 4) "Thread"-based connections: each application thread (or process,
>>    it's not important how it's implemented) gets its own connection.
>>    Many client users may share an application thread connection.
>
> Yes, and only dynamic requests get handled by the process that holds
> the DB connection open, and the HTTP connection to the client is NOT
> handled by the application thread.

Yes, I'm clear on that - it's handled by the web server, and the
connection to the rails process is brokered in some way (fcgi probably,
at this point).

>> The first two are basic PHP models, #3 is how Java does it if you're
>> doing it right, and if I'm understanding it, #4 is the rails method.
>>
>> Correct?
>
> Yes, with clarifying comments.

Good. :)

[...]

>> No - but that's a limitation of the thread connection model. [...]
>
> I don't understand you here. The whole new FCGI process is spawned in
> advance, and tuned over time to match traffic patterns. Generally
> speaking, in the high ends of scalability we're discussing, you'd add
> FCGI processes by plugging in a new application server.

So that's my real question - what's an "application server" in the
rails scaling model? If each application process gets 1 database
connection to be shared among all of the client users (HTTP
connections), what happens if you saturate that, assuming that some
reasonably large number of them require database access (if they don't,
it's easy)? Do you have to add a whole other rails process, or is there
a way to add additional db connections within the context of one?

[...]

> Yes, that's the front end proxy servers in the drawing below...
>
>> On 16 webservers, each of which runs the default max of 256 apache
>> processes (or 4 webservers if you push it to 1024), that's enough to
>> crash the database under heavy load. Granted, this is larger than the
>> average application, but that's why they call it "scaling".
>
> But these connections are the client side HTTP connections, and DO NOT
> have DB connections. :-)

Not necessarily. Ideally, everything that gets past your proxy cache
requires a database connection, because if it didn't, the proxy cache
should probably be able to handle it. Agreed though, most applications
probably aren't pushing this kind of fully dynamic traffic. Some may be
though. I want to know how rails deals with that if they do.

>> I've seen cases where even static files were invoking the php engine
>> because of misconfiguration, and that obviously is a problem that
>> compounds this. No idea if that's a possible mistake to make in
>> rails. I hope not.
>
> Well, you know what they say about pigs...lipstick doesn't help. :-)

Hey, look what list we're on.

[...]

> I find it very hard to believe that connecting per request is more
> efficient than persistent connections, though I do agree that MySQL
> connects incredibly faster than Oracle, and would therefore show a
> much smaller improvement -vs- Oracle.
>
> But, I can only imagine that's faster if you're opening connections
> that *aren't going to be used*. If that's the case, then all bets are
> off.

It's not so much that they aren't being used, but that they aren't
being used for the full duration of the request. Under normal
circumstances, pconnects don't close when you're done with them, until
your request is fully handled. That may be keeping a db connection
locked up for full seconds (or even minutes, in some cases), even
though it's no longer technically needed. In that case, you need to use
non-persistent connections to cycle them effectively intra-request.
I've seen it happen.

[...]

> I find again and again that many people have a hard time
> conceptualizing how fast computers are. [...] I cannot tell you how
> many times I've seen applications running slowly due to serialized
> latencies, while showing little to no disk and/or processor
> utilization, and people suggesting that throwing more and/or faster
> hardware at it was going to fix the problem. :-)

Sometimes it does, but this is why resource contention analysis (of all
different kinds) is important.

>> Perhaps what you say is true for many apps, but I'm also interested
>> in the difficult boundary cases. :)
>
> Me too. While I gave some specific examples of the most common issues,
> we're in agreement that there's more than one way to cause an
> application to perform poorly. :-)
>
> Again, just remember the pig!

Often, it's impossible to forget. But we do what we can. :)

--
				- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.adamfields.com ]

[ http://www.aquick.org/blog ] ............ Blog
[ http://www.adamfields.com/resume.html ].. Experience
[ http://www.flickr.com/photos/fields ] ... Photos
[ http://www.aquicki.com/wiki ].............Wiki
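The intra-request connection cycling described in this exchange can be
sketched as follows. `FakeDB` is a hypothetical stand-in for a driver
(not a real API), used only to show the connection being released
before slow non-DB work such as a big upload:

```ruby
# Tracks how many connections are currently checked out, so we can
# observe that none are held during the slow non-DB phase of a request.
class FakeDB
  attr_reader :open_connections

  def initialize
    @open_connections = 0
  end

  # Hold a connection only for the duration of the block.
  def with_connection
    @open_connections += 1
    yield
  ensure
    @open_connections -= 1
  end
end

db = FakeDB.new

# Non-persistent style: connect only around the actual query...
db.with_connection { :record_the_transaction }

# ...so during the minutes-long upload that follows, no connection is held.
puts db.open_connections  # 0
```

With pconnect-style persistence, the connection would instead stay
checked out until the whole request finished, which is exactly the
lockup Adam describes.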
Tom Mornini
2006-Mar-10 20:47 UTC
[Rails] How to scale mysql servers for a rails application?
On Mar 10, 2006, at 9:31 AM, Adam Fields wrote:

> So that's my real question - what's an "application server" in the
> rails scaling model?

A machine with one or more processes executing Rails code.

> If each application process gets 1 database connection to be shared
> among all of the client users (HTTP connections), what happens if you
> saturate that, assuming that some reasonably large number of them
> require database access (if they don't, it's easy)? Do you have to add
> a whole other rails process, or is there a way to add additional db
> connections within the context of one?

Well, that's just it...and seems to be a point of confusion between us.
In the pictures I've drawn and the systems I've described, you can tune
the number of application processes (and therefore DB connections) per
HTTP connection.

> Not necessarily. Ideally, everything that gets past your proxy cache
> requires a database connection, because if it didn't, the proxy cache
> should probably be able to handle it. Agreed though, most applications
> probably aren't pushing this kind of fully dynamic traffic. Some may
> be though. I want to know how rails deals with that if they do.

Well, in the LAMP scaling model, EVERY application pushes that sort of
load *at the application process*, as all other requests are handled by
the HTTP front end. The application processes *only* serve dynamic
content, and I'm quite sure you understand this.

>>> I've seen cases where even static files were invoking the php engine
>>> because of misconfiguration, and that obviously is a problem that
>>> compounds this. No idea if that's a possible mistake to make in
>>> rails. I hope not.
>>
>> Well, you know what they say about pigs...lipstick doesn't help. :-)
>
> Hey, look what list we're on.

I don't understand this comment at all.

In response to your previous statement, it's *absolutely* possible to
misconfigure Rails production server environments, and I cannot imagine
that you think that any highly scalable system has infallible
configuration.

>> I find it very hard to believe that connecting per request is more
>> efficient than persistent connections, though I do agree that MySQL
>> connects incredibly faster than Oracle, and would therefore show a
>> much smaller improvement -vs- Oracle.
>>
>> But, I can only imagine that's faster if you're opening connections
>> that *aren't going to be used*. If that's the case, then all bets are
>> off.
>
> It's not so much that they aren't being used, but that they aren't
> being used for the full duration of the request. Under normal
> circumstances, pconnects don't close when you're done with them, until
> your request is fully handled. That may be keeping a db connection
> locked up for full seconds (or even minutes, in some cases), even
> though it's no longer technically needed. In that case, you need to
> use non-persistent connections to cycle them effectively
> intra-request. I've seen it happen.

Ah, now this is an interesting point that I had failed to consider.

Yes! Connection pooling would be more efficient in the case where the
majority of dynamic processing time were spent in non-DB related code.
I think that this must be a relatively uncommon situation, though with
data coming from external sources (i.e. web services) I can see this
becoming more prevalent in the future.

Thanks for clarifying that advantage of pooling.

>> I find again and again that many people have a hard time
>> conceptualizing how fast computers are. They think several
>> milliseconds is incredibly fast, where to a modern computer it's
>> incredibly slow. I cannot tell you how many times I've seen
>> applications running slowly due to serialized latencies, while
>> showing little to no disk and/or processor utilization, and people
>> suggesting that throwing more and/or faster hardware at it was going
>> to fix the problem. :-)
>
> Sometimes it does, but this is why resource contention analysis (of
> all different kinds) is important.

I'm sure we'll agree that the most common non-trivial scaling problems
are not solvable via simple hardware addition, and that's my point.

>>> Perhaps what you say is true for many apps, but I'm also interested
>>> in the difficult boundary cases. :)
>>
>> Me too. While I gave some specific examples of the most common
>> issues, we're in agreement that there's more than one way to cause an
>> application to perform poorly. :-)
>>
>> Again, just remember the pig!
>
> Often, it's impossible to forget. But we do what we can. :)

Yes, bill by the hour for creative solutions, no? :-)

--
-- Tom Mornini
Adam Fields
2006-Mar-10 21:32 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 12:47:44PM -0800, Tom Mornini wrote:

> On Mar 10, 2006, at 9:31 AM, Adam Fields wrote:
>
>> So that's my real question - what's an "application server" in the
>> rails scaling model?
>
> A machine with one or more processes executing Rails code.
>
>> If each application process gets 1 database connection to be shared
>> among all of the client users (HTTP connections), what happens if you
>> saturate that [...]? Do you have to add a whole other rails process,
>> or is there a way to add additional db connections within the context
>> of one?
>
> Well, that's just it...and seems to be a point of confusion between
> us. In the pictures I've drawn and the systems I've described, you can
> tune the number of application processes (and therefore DB
> connections) per HTTP connection.

I don't think that's a point of confusion. The point of confusion is
"why". Adding a whole other application process seems pretty
heavyweight (extra ram and cpu utilization, mostly) when all you need
is just another db connection.

>> Not necessarily. Ideally, everything that gets past your proxy cache
>> requires a database connection, because if it didn't, the proxy cache
>> should probably be able to handle it. Agreed though, most
>> applications probably aren't pushing this kind of fully dynamic
>> traffic. Some may be though. I want to know how rails deals with that
>> if they do.
>
> Well, in the LAMP scaling model, EVERY application pushes that sort of
> load *at the application process*, as all other requests are handled
> by the HTTP front end. The application processes *only* serve dynamic
> content, and I'm quite sure you understand this.

Yes. I'm getting at questioning the edge cases where the overwhelming
majority of your application traffic is fully dynamic and hitting your
application processes.

>>>> I've seen cases where even static files were invoking the php
>>>> engine because of misconfiguration, and that obviously is a problem
>>>> that compounds this. No idea if that's a possible mistake to make
>>>> in rails. I hope not.
>>>
>>> Well, you know what they say about pigs...lipstick doesn't help. :-)
>>
>> Hey, look what list we're on.
>
> I don't understand this comment at all.

Oh, that was just a feeble attempt at humor. The hope is that rails is
less of a pig. :)

> In response to your previous statement, it's *absolutely* possible to
> misconfigure Rails production server environments, and I cannot
> imagine that you think that any highly scalable system has infallible
> configuration.

Absolutely true.

[...]

>> locked up for full seconds (or even minutes, in some cases), even
>> though it's no longer technically needed. In that case, you need to
>> use non-persistent connections to cycle them effectively
>> intra-request. I've seen it happen.
>
> Ah, now this is an interesting point that I had failed to consider.
>
> Yes! Connection pooling would be more efficient in the case where
> the majority of dynamic processing time were spent in non-DB related
> code. I think that this must be a relatively uncommon situation,
> though with data coming from external sources (i.e. web services)
> I can see this becoming more prevalent in the future.
>
> Thanks for clarifying that advantage of pooling.

This is exactly why some people look at rails and say "where's the real
application services layer?". But I suspect that that's a simplistic
answer and there's a more elegant one in there somewhere.

[...]

> I'm sure we'll agree that the most common non-trivial scaling problems
> are not solvable via simple hardware addition, and that's my point.

Agreed.

[...]

> Yes, bill by the hour for creative solutions, no? :-)

Agreed.

--
				- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.adamfields.com ]

[ http://www.aquick.org/blog ] ............ Blog
[ http://www.adamfields.com/resume.html ].. Experience
[ http://www.flickr.com/photos/fields ] ... Photos
[ http://www.aquicki.com/wiki ].............Wiki
Justin Forder
2006-Mar-10 22:30 UTC
[Rails] How to scale mysql servers for a rails application?
Tom Mornini wrote:
> On Mar 10, 2006, at 9:31 AM, Adam Fields wrote:
>> It's not so much that [persistent db connections] aren't being used,
>> but that they aren't being used for the full duration of the request.
>> Under normal circumstances, pconnects don't close when you're done
>> with them, until your request is fully handled. That may be keeping a
>> db connection locked up for full seconds (or even minutes, in some
>> cases), even though it's no longer technically needed. In that case,
>> you need to use non-persistent connections to cycle them effectively
>> intra-request. I've seen it happen.
>
> Ah, now this is an interesting point that I had failed to consider.
>
> Yes! Connection pooling would be more efficient in the case where
> the majority of dynamic processing time were spent in non-DB related
> code. I think that this must be a relatively uncommon situation,
> though with data coming from external sources (i.e. web services)
> I can see this becoming more prevalent in the future.
>
> Thanks for clarifying that advantage of pooling.

From my production.log:

Processing HistoryController#list (for 217.169.11.194 at 2006-03-10 20:30:15) [GET]
  Parameters: {"action"=>"list", "controller"=>"history"}
Rendering within layouts/application
Rendering history/list
Completed in 0.09403 (10 reqs/sec) | Rendering: 0.06680 (71%) | DB: 0.02177 (23%) | 200 OK [http://real.host.deleted/history/list]

.. from a small application on a Mac mini running
Apache-lighty-fcgi-Rails-MySQL (all on the same machine, under Tiger),
and that's one of the higher records in terms of reported DB time.

This application displays pages of data with embedded graphs
(sparklines) - the graphs are generated from data provided in the URL.
All requests are for dynamic data, either HTML or PNG.

Averaging over the last 2000 requests, the time reported by Rails for
the DB contribution is 6.14% of the total (rendering is 38.6%).
If I remove the records which have 0.00000 as the DB time (which will
be the image requests) I am left with 850 records. For these, the DB
contribution is 12.4% (rendering is 77.8%).

The rendered pages contain tables with many links, and these use
link_to, which Stefan Kaes has said is slow - so my figures could
change with tuning. But they do show clearly that the majority of time
can be spent in non-DB code.

regards

   Justin
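[Editorial note: Justin's averages can be reproduced from a
production.log with a few lines of Ruby. A sketch, assuming the Rails
1.x "Completed in ... | DB: ..." line format shown above; the regexp,
method name, and sample lines here are illustrative:]

```ruby
# Average share of request time spent in the DB, computed from Rails
# 1.x log lines of the form:
#   Completed in 0.09403 (...) | Rendering: 0.06680 (71%) | DB: 0.02177 (23%) | ...
LINE_RE = /Completed in (\d+\.\d+) .*? DB: (\d+\.\d+)/

def db_share(lines)
  pairs = lines.map { |l| LINE_RE.match(l) }.compact
  # Drop requests that never touched the DB (e.g. the image requests),
  # as Justin does for his second figure.
  timed = pairs.map { |m| [m[1].to_f, m[2].to_f] }.reject { |_, db| db.zero? }
  return 0.0 if timed.empty?
  total, db = timed.transpose.map { |col| col.inject(0.0) { |s, x| s + x } }
  db / total
end

sample = [
  'Completed in 0.09403 (10 reqs/sec) | Rendering: 0.06680 (71%) | DB: 0.02177 (23%) | 200 OK',
  'Completed in 0.05000 (20 reqs/sec) | Rendering: 0.04000 (80%) | DB: 0.00000 (0%) | 200 OK'
]
puts format('DB share: %.1f%%', db_share(sample) * 100)
```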
Adam Fields
2006-Mar-10 22:54 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 10:30:19PM +0000, Justin Forder wrote:
[...]
> The rendered pages contain tables with many links, and these use
> link_to, which Stefan Kaes has said is slow - so my figures could change
> with tuning. But they do show clearly that the majority of time can be
> spent in non-DB code.

That's not the whole picture though. That total aggregate time could
be spread out throughout a large number of DB calls. Theoretically,
you shouldn't be disconnecting from the DB between calls (although in
a rails model, maybe the application is timeslicing the DB connection
between requests - I don't know the answer to that), but what's really
important is the processing time after the last DB call in a request.
Any time there is definitely wasted from the point of view of keeping
a DB connection open.

Whether this is an issue probably depends on how a request ties up the
DB connection within the rails execution context, and I'm not clear on
how that works.
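[Editorial note: Adam's point - that time after the last DB call is
wasted if the request pins a connection - is what a pool avoids. A
minimal sketch in plain Ruby; `TinyPool` and the string stand-ins for
connection handles are hypothetical, not Rails or MySQL API:]

```ruby
require 'thread'

# A toy connection pool: a request checks a connection out only for the
# DB phase and returns it before the (possibly slow) rendering phase,
# instead of holding one persistent connection for the whole request.
class TinyPool
  def initialize(size)
    @queue = Queue.new
    size.times { |i| @queue << "conn-#{i}" } # stand-ins for real DB handles
  end

  # Yield a connection; return it to the pool as soon as the block ends.
  def with_connection
    conn = @queue.pop
    begin
      yield conn
    ensure
      @queue << conn
    end
  end

  def available
    @queue.size
  end
end

pool = TinyPool.new(2)
data = pool.with_connection { |c| "rows fetched via #{c}" }
# Rendering happens here with no connection held:
page = "<p>#{data}</p>"
```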
Justin Forder
2006-Mar-11 00:39 UTC
[Rails] How to scale mysql servers for a rails application?
Adam Fields wrote:
> On Fri, Mar 10, 2006 at 10:30:19PM +0000, Justin Forder wrote:
> [...]
>> The rendered pages contain tables with many links, and these use
>> link_to, which Stefan Kaes has said is slow - so my figures could change
>> with tuning. But they do show clearly that the majority of time can be
>> spent in non-DB code.
>
> That's not the whole picture though. That total aggregate time could
> be spread out throughout a large number of DB calls. Theoretically,
> you shouldn't be disconnecting from the DB between calls (although in
> a rails model, maybe the application is timeslicing the DB connection
> between requests - I don't know the answer to that), but what's really
> important is the processing time after the last DB call in a
> request. Any time there is definitely wasted from the point of view of
> keeping a DB connection open.

It's true that if you are not careful you can find yourself making
calls to the DB while rendering the response (because of lazy
loading). It's preferable IMHO to build the complete model before
rendering it, and that is what my application does. So, for a given
request, all the rendering time comes *after* all the DB time (and the
DB time should be from one or two calls to the DB).

> Whether this is an issue probably depends on how a request ties up the
> DB connection within the rails execution context, and I'm not clear on
> how that works.

As I mentioned before, this is running under FCGI, and each Rails FCGI
process holds an open DB connection. These are the persistent
connections Tom was referring to. This is a straightforward model, and
for nearly all applications you don't need a large number of FCGI
processes, so I don't see a problem with having a DB connection per
process. I just posted my figures to show that this model doesn't load
the individual DB connections heavily.

regards

   Justin
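[Editorial note: the lazy-loading pitfall Justin describes can be shown
without ActiveRecord. In plain Ruby, a lazily loaded attribute runs its
"query" the first time it is read - and if that first read happens
inside a template, DB time lands in the middle of rendering. Touching
the data before the render phase (in Rails 1.x, eager loading such as
`Order.find(:all, :include => :line_items)`) keeps all loading ahead of
it. The `LazyHistory` class below is hypothetical:]

```ruby
# Simulates a lazily loaded association: the "DB call" is deferred
# until the first read of #entries.
class LazyHistory
  def initialize
    @loaded = false
  end

  def entries
    unless @loaded        # first access stands in for a deferred DB call
      @entries = %w[a b c]
      @loaded = true
    end
    @entries
  end

  def loaded?
    @loaded
  end
end

history = LazyHistory.new
history.entries           # touch the data *before* rendering starts
html = history.entries.map { |e| "<li>#{e}</li>" }.join
```

With the explicit touch, all "DB" work finishes before the HTML is
built; drop that line and the load would fire mid-render instead.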
Tom Mornini
2006-Mar-11 03:38 UTC
[Rails] How to scale mysql servers for a rails application?
Interesting data, Justin!

My experience suggests that most applications
don't generate large numbers of rendered images,
but my view may be hopelessly Web 1.0.

--
-- Tom Mornini

On Mar 10, 2006, at 2:30 PM, Justin Forder wrote:

> [...]
Tom Mornini
2006-Mar-11 03:45 UTC
[Rails] How to scale mysql servers for a rails application?
Oh, additionally, as you add app servers to the
system, the DB will become loaded, while the
app server load will remain fairly constant.

This would tend to skew loaded results toward
higher DB time ratios.

Of course you can scale the DB through a variety
of methods, but generally speaking, DBs don't
scale as easily as the application layer.

--
-- Tom Mornini

On Mar 10, 2006, at 7:38 PM, Tom Mornini wrote:

> [...]
Adam Fields
2006-Mar-11 03:52 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 07:45:05PM -0800, Tom Mornini wrote:
> Oh, additionally, as you add app servers to the
> system, the DB will become loaded, while the
> app server load will remain fairly constant.
>
> This would tend to skew loaded results toward
> higher DB time ratios.
>
> Of course you can scale the DB through a variety
> of methods, but generally speaking, DBs don't
> scale as easily as the application layer.

We've come full circle - this discussion started with a question about
how to scale MySQL to multiple servers. It's easier if you have a real
db connection pool broker. :)
Adam Fields
2006-Mar-11 03:58 UTC
[Rails] How to scale mysql servers for a rails application?
On Fri, Mar 10, 2006 at 07:38:49PM -0800, Tom Mornini wrote:
> Interesting data, Justin!
>
> My experience suggests that most applications
> don't generate large numbers of rendered images,
> but my view may be hopelessly Web 1.0.

I thought SVG would get some more traction sooner, but it hasn't
really caught on.