I started to title this as "approaches to scaling Rails", but realized it's not really a Rails-specific problem.

I'm currently working with a small team designing the architecture for our application. A previous version of this product was rolled out to a smaller user base using another framework, so we have a pretty good idea of the basic traffic patterns, data models, etc. But now we are going to expand it to a much larger scale, and we already have clients committed to using it.

So basically we are debating how much scalability to put in now versus later, and naturally we don't all agree. I'm very much in favor of retaining what's good in Rails as much as we can, even if we have to do some extra work on the backend to make that happen. My attitude is that keeping the benefits that we get from a framework like Rails should be a primary goal, not an afterthought. The folks who have worked a lot on distributed systems and spent less time on the web side tend to see it differently. They are quicker to build in the scaling right from the start and toss out whole chunks of Rails such as ActiveRecord and a good part of the template/rendering system. Doing things like fetching 90% of the data via Ajax calls directly to the backend, and not having Rails deal with much of the data/content at all. They don't want to use an RDBMS at all and think very little of MVC. In their eyes these are just natural sacrifices you have to make if you want to scale.

My approach is pretty much the opposite: use Rails as it's designed to be used from the start, and add in scalability later. If you need to design certain things to scale from the start, then put some time into seeing how you can make them work with Rails, instead of tossing out Rails if it doesn't work with what you know.

Here are a couple of concrete examples of the different ways we are approaching some scaling issues.

The backend folks want to start with a flattened/hierarchical data store and not use an RDBMS at all.
Some of this isn't even a scaling issue; they don't even like using a relational system to represent data models. While I agree that eventually it will make sense to move some data out of the database, I don't agree with just dumping the RDBMS entirely from the start.

Then we have the guys who want to do everything in JavaScript. The Rails template system would pretty much just spit out the template, and async Ajax calls would go directly to the backend. JavaScript would be used not only for fetching the data, but for most of the display logic.

So anyway, it's a challenging debate for me. It's difficult to convey the advantages of a coherent framework like Rails to guys who haven't spent much time using a web framework, and I think that's a big part of the problem.

Chris

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
On Oct 20, 2007, at 3:48 PM, snacktime wrote:

> I'm currently working with a small team designing the architecture for our application. A previous version of this product was rolled out to a smaller user base using another framework, so we have a pretty good idea of the basic traffic patterns, data models, etc. But now we are going to expand it to a much larger scale, and we already have clients committed to using it.

How large is "much larger" -- how many machines would you expect to need, and how many db queries would that generate?

> So basically we are debating how much scalability to put in now versus later, and naturally we don't all agree. I'm very much in favor of retaining what's good in Rails as much as we can, even if we have to do some extra work on the backend to make that happen.

If agility is critical (the app is tweaked frequently), I would think you will not want very much difference between your dev and prod environments. You can spend a lot of $ on people's time for gains that could just as easily be bought with hardware. The overall benefit is team efficiency.

> My attitude is that keeping the benefits that we get from a framework like Rails should be a primary goal, not an afterthought. The folks who have worked a lot on distributed systems and spent less time on the web side tend to see it differently. They are quicker to build in the scaling right from the start and toss out whole chunks of Rails such as ActiveRecord and a good part of the template/rendering system.

And a Formula 1 engineer would love to toss out the driver so that the aerodynamics of the body can be better optimized. Stupid driver just gets in the way! I have worked in several engineering fields (mechanical, electrical, chemical, and software), and without fail, the majority of textbook engineers insist on optimizing their part at the expense of others. It's what they were indoctrinated to do. Very few appreciate entire systems.
Those who do end up being the best engineers you'll ever work with, and they'll be a part of the best products ever made in their fields.

What is your role in this process? Cog in the wheel, or do you have influence over the direction after considering their input?

> Doing things like fetching 90% of the data via Ajax calls directly to the backend, and not having Rails deal with much of the data/content at all. They don't want to use an RDBMS at all and think very little of MVC. In their eyes these are just natural sacrifices you have to make if you want to scale.

Scaling can be handled many ways, so this could be hogwash, or it could be relevant. But there are some very large RDBMS systems out there, so this sounds a little too much like limited-capacity brains rather than limited-capacity RDBMSs to me.

> A couple of concrete examples of the different ways we are approaching some scaling issues.
>
> The backend folks want to start with a flattened/hierarchical data store and not use an RDBMS at all. Some of this isn't even a scaling issue; they don't even like using a relational system to represent data models. While I agree that eventually it will make sense to move some data out of the database, I don't agree with just dumping the RDBMS entirely from the start.

If you can, get the team to focus on business value added. If you deployed today as-is, could the business meet its goal and make some $ (or whatever its purpose is)? If yes, deploy it! If you're going to change it anyway, then it would be better to deploy it now as-is, earn money to pay for the changes, and get real data/feedback to help direct what those changes should be. Refactor it in stages to get from "here" to "there." It gives everyone time to feel out one subsystem at a time. Focus on changing the ones that matter the most. If some dude hates RDBMSs just because, well, that's a bird of a different feather.

> Then we have the guys who want to do everything in JavaScript.
> The Rails template system would pretty much just spit out the template, and async Ajax calls would go directly to the backend. JavaScript would not only be used for fetching the data, but for most of the display logic.

Weird.

> So anyway, it's a challenging debate for me. It's difficult to convey the advantages of a coherent framework like Rails to guys who haven't spent much time using a web framework, and I think that's a big part of the problem.

People with one view of the world are always a problem.

It's hard to give some objective options without knowing what type of overall scale you're talking about. Advice for 10 systems might not make sense for 1,000 systems. These guys might have a point, they might not. Reality is probably somewhere in the middle -- getting everyone to admit that first is probably the hardest part.

-- gw
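Greg's advice to refactor in stages and "feel out one subsystem at a time" is easier to act on if application code reaches each data store through a narrow interface, so that one subsystem can later move off the RDBMS without touching its callers. The sketch below only illustrates that idea. It is written in Python for a self-contained example rather than in Rails. `UserStore`, `SqlUserStore`, `KeyValueUserStore` and `onboard` are hypothetical names, and the in-memory `sqlite3` database and plain `dict` merely stand in for a production RDBMS and a flat key/value store.

```python
import sqlite3
from abc import ABC, abstractmethod


class UserStore(ABC):
    """Hypothetical interface: callers depend on this, not on a backend."""

    @abstractmethod
    def save(self, login: str, org: str) -> None: ...

    @abstractmethod
    def logins_for(self, org: str) -> list[str]: ...


class SqlUserStore(UserStore):
    """Stage 1: the data lives in a relational database (in-memory SQLite here)."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE users (login TEXT PRIMARY KEY, org TEXT NOT NULL)")

    def save(self, login: str, org: str) -> None:
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "INSERT INTO users (login, org) VALUES (?, ?)", (login, org))

    def logins_for(self, org: str) -> list[str]:
        rows = self.conn.execute(
            "SELECT login FROM users WHERE org = ? ORDER BY login", (org,))
        return [login for (login,) in rows]


class KeyValueUserStore(UserStore):
    """A later stage: a flat key/value layout (a plain dict stands in here)."""

    def __init__(self) -> None:
        self.by_org: dict[str, set[str]] = {}

    def save(self, login: str, org: str) -> None:
        self.by_org.setdefault(org, set()).add(login)

    def logins_for(self, org: str) -> list[str]:
        return sorted(self.by_org.get(org, set()))


def onboard(store: UserStore, org: str, logins: list[str]) -> list[str]:
    """Application code: works unchanged with either backend."""
    for login in logins:
        store.save(login, org)
    return store.logins_for(org)


for store in (SqlUserStore(), KeyValueUserStore()):
    print(type(store).__name__, onboard(store, "acme", ["bob", "alice"]))
# SqlUserStore ['alice', 'bob']
# KeyValueUserStore ['alice', 'bob']
```

The two backends are deliberately not identical. The SQL version rejects a duplicate login through its primary key (raising `sqlite3.IntegrityError`), while the dict version would quietly accept the same login under two organizations. Guarantees like that are what has to be rebuilt by hand when a subsystem moves off the RDBMS, so they are worth pricing in before the move.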
On Sat, 20 Oct 2007 15:48:15 -0700, snacktime wrote:

> The backend folks want to start with a flattened/hierarchical data store and not use an RDBMS at all. Some of this isn't even a scaling issue; they don't even like using a relational system to represent data models. While I agree that eventually it will make sense to move some data out of the database, I don't agree with just dumping the RDBMS entirely from the start.

Having been down this road a few times... YAGNI. No matter how big you get.

Even at AOL, we ended up moving to RDBMSs for some of our highest-volume databases, because, at their core, they WERE databases. All the work we did to make flat files feel database-like gave us all the overhead of an RDBMS with none of the advantages.

Now, if your data isn't actually a good fit for an RDBMS, then you shouldn't use one. But then you probably shouldn't use Rails, either, because that's what it's good at. But if you have relational data - and most of us do - then you want a relational database. RDBMS vendors have spent a few decades focused solely on optimizing RDBMS-like queries. They're really good at it now, whether you need row-level locking, or replication, or transactions, or efficient indices, or BLOBs, or complex joins.

If your engineers really think they will grow beyond the abilities of, say, Oracle, they can check out Tandem - it implements SQL at the *drive controller* level. Talk about fast.

-- 
Jay Levitt                |
Boston, MA                | My character doesn't like it when they
Faster: jay at jay dot fm | cry or shout or hit.
http://www.jay.fm         | - Kristoffer
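Jay's list of what an RDBMS already provides (transactions, efficient indices, joins) is easy to underestimate until you have to rebuild it on top of flat files. As a rough illustration, here is a sketch assuming a hypothetical `organizations`/`users` schema loosely modeled on Chris's description. It uses Python's built-in `sqlite3` module only as a stand-in for a production database such as Oracle or MySQL. A unique constraint, an index, an all-or-nothing transaction, and an aggregate join each take a line or two of SQL. `add_org_with_users` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE users (
        id              INTEGER PRIMARY KEY,
        organization_id INTEGER NOT NULL REFERENCES organizations (id),
        login           TEXT NOT NULL UNIQUE  -- uniqueness enforced by the database
    );
    -- lets lookups of an organization's users avoid a full table scan
    CREATE INDEX index_users_on_organization_id ON users (organization_id);
""")


def add_org_with_users(conn, org_name, logins):
    """Create an organization and its users as one all-or-nothing transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        cur = conn.execute("INSERT INTO organizations (name) VALUES (?)", (org_name,))
        conn.executemany(
            "INSERT INTO users (organization_id, login) VALUES (?, ?)",
            [(cur.lastrowid, login) for login in logins],
        )
    return cur.lastrowid


add_org_with_users(conn, "Acme", ["alice", "bob"])
try:
    # 'alice' already exists, so this violates the UNIQUE constraint...
    add_org_with_users(conn, "Globex", ["carol", "alice"])
except sqlite3.IntegrityError:
    pass  # ...and the Globex row and 'carol' are rolled back with it

rows = conn.execute("""
    SELECT o.name, COUNT(u.id)
    FROM organizations AS o
    LEFT JOIN users AS u ON u.organization_id = o.id
    GROUP BY o.id
    ORDER BY o.name
""").fetchall()
print(rows)  # [('Acme', 2)]
```

Reproducing the rollback alone on flat files would mean writing your own journaling and recovery code. That is the "overhead of an RDBMS with none of the advantages" Jay describes.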
On 10/20/07, Greg Willits <lists-0Bv1hcaDFPRk211Z5VL+QA@public.gmane.org> wrote:

> On Oct 20, 2007, at 3:48 PM, snacktime wrote:
>
> > I'm currently working with a small team designing the architecture for our application. A previous version of this product was rolled out to a smaller user base using another framework, so we have a pretty good idea of the basic traffic patterns, data models, etc. But now we are going to expand it to a much larger scale, and we already have clients committed to using it.
>
> How large is "much larger" -- how many machines would you expect to need, and how many db queries would that generate?

Unfortunately I can't give too much detail about the specific type of application this is. One side is a social networking app. The other side is an administrative web app for a specific set of large organizations that manage large amounts of data. Then we tie the two together. Each organization has anywhere from 50,000 to 10 million or so users.

The previous system was getting a couple million hits per day, all to pages that required at least one db query. The current estimate is that we will have 10 times that volume within a year or two, but it could be considerably higher towards the end of year 2.

I do have considerable influence, as we are making the decision as a group and all have pretty much an equal say.

Thanks for the input,

Chris
On Oct 20, 2007, at 6:09 PM, snacktime wrote:

> On 10/20/07, Greg Willits <lists-0Bv1hcaDFPRk211Z5VL+QA@public.gmane.org> wrote:
>> On Oct 20, 2007, at 3:48 PM, snacktime wrote:
>>
>>> I'm currently working with a small team designing the architecture for our application. A previous version of this product was rolled out to a smaller user base using another framework, so we have a pretty good idea of the basic traffic patterns, data models, etc. But now we are going to expand it to a much larger scale, and we already have clients committed to using it.
>>
>> How large is "much larger" -- how many machines would you expect to need, and how many db queries would that generate?
>
> Unfortunately I can't give too much detail about the specific type of application this is. One side is a social networking app. The other side is an administrative web app for a specific set of large organizations that manage large amounts of data. Then we tie the two together. Each organization has anywhere from 50,000 to 10 million or so users. The previous system was getting a couple million hits per day, all to pages that required at least one db query. The current estimate is that we will have 10 times that volume within a year or two, but it could be considerably higher towards the end of year 2.
>
> I do have considerable influence, as we are making the decision as a group and all have pretty much an equal say.
>
> Thanks for the input,

Sounds like Jay has some concrete advice; I just wanted to lend some philosophical support :-)

So, some basic math gives a bit of a feel for the scale:

- 2 million hits per day (I'll assume "hit" means a page load)
- let's say there are 4 queries per page (hardly any page ever has just one, it seems)
- 8 million queries per day
- 93 queries per second.
Even if your traffic is generally compressed into an 8 or 10 hour bell curve, that's about 300 queries per second.

I'm no db guru when it comes to optimization, but poking around various sources, it doesn't seem like that is all that high of a number. It sounds like you have multiple apps to deliver, so OK, that number goes up by some factor, but I would assume that with each app/client there's an income to cover the costs of all this traffic.

Anyway, I'd have your db guys "prove" that an RDBMS on suitably prepped hardware really can't do the job you need, and compare the cost of the two approaches at various points along your predicted ramp-up. Time probably doesn't matter, just scale: at scaling of X, cost = Y(a); at scaling of 2X, cost = Y(b). Not that these numbers are the end of the debate, but they help attach some "realm of reality" data to the emotional arguments.

Plus, if you scale huge, that means getting more clients. Does that mean hiring more developers? If so, how long will it take them to be functional using architecture X vs. Y? Are people available who are ready to work on X? Same for Y? How many people can be hired off the street to maintain architecture X vs. Y?

I've drawn plenty of boxes and circles on whiteboards, but that's only a part of the story. I'd explore all the peripheral impacts on product development, maintenance, the sales process, etc.

-- gw
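Greg's back-of-the-envelope numbers can be reproduced, and extended to the 10x growth Chris expects, with a few lines of arithmetic. This is a sketch under Greg's own assumptions: 4 queries per page, with traffic either spread over 24 hours or squeezed into an 8 to 10 hour window. `query_rates` is a hypothetical helper. Squeezing the day into a window still only gives the average for that window. The top of a bell curve sits above its average, so real peak rates would be somewhat higher.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR  # 86,400


def query_rates(hits_per_day, queries_per_hit, busy_hours):
    """Return (average queries/sec over a full day,
    average queries/sec if all traffic lands inside busy_hours)."""
    queries_per_day = hits_per_day * queries_per_hit
    over_day = queries_per_day / SECONDS_PER_DAY
    over_window = queries_per_day / (busy_hours * SECONDS_PER_HOUR)
    return over_day, over_window


QUERIES_PER_PAGE = 4  # Greg's assumption

for label, hits in [("today", 2_000_000), ("10x growth", 20_000_000)]:
    over_day, over_8h = query_rates(hits, QUERIES_PER_PAGE, busy_hours=8)
    _, over_10h = query_rates(hits, QUERIES_PER_PAGE, busy_hours=10)
    print(f"{label}: {over_day:.0f} qps over 24 h; "
          f"{over_10h:.0f} (10 h window) to {over_8h:.0f} (8 h window)")
# today: 93 qps over 24 h; 222 (10 h window) to 278 (8 h window)
# 10x growth: 926 qps over 24 h; 2222 (10 h window) to 2778 (8 h window)
```

Under those assumptions today's load is about 93 queries per second across the day and roughly 220 to 280 inside the busy window. The 10x case is roughly 930 across the day and 2,200 to 2,800 in the window. These are the kinds of figures worth putting next to each architecture's cost.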
On Oct 20, 2007, at 11:59 PM, Greg Willits wrote:

> - 93 queries per second.
[...]
> I'm no db guru when it comes to optimization, but poking around various sources, it doesn't seem like that is all that high of a number.

Twitter was handling 600 requests per second against MySQL 6 months ago, and I imagine it's only gone up since then. MySQL isn't necessarily the best choice for high-load concurrent access, either.

http://www.slideshare.net/Blaine/scaling-twitter

-faisal
I tend to think that your numbers are small for a "commercial" database with the right memory and hardware. For our production apps we use Oracle on the backend. There's no reason to go to "flat" files at all.

Use NGINX and mongrel-cluster on the front side. You can use Oracle and an Oracle Cluster on the back. You can even use in-memory tables as well. But from your numbers it's no more than two shelves of blades, and the right storage. Of course it might be fun to use some of Sun's new multi-core machines for the front and back, so a single blade could do 480 requests per second on the front end. :)

Oracle can pull the load, with the right associations. I suspect that if it's mostly reads, even a "clustered" MySQL can easily pull your load. The numbers "sound" large, but broken down to the "per second" level, they seem quite reasonable, even small.
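The "per second" point also works in reverse. Given a per-node throughput, you can estimate how many front-end machines a load needs. This is a minimal sketch with several assumptions: the poster's speculative figure of 480 requests per second per blade, Chris's 10x growth estimate (20 million hits per day), and Greg's 8-hour busy window. The headroom fraction is purely an illustrative assumption. Running machines well below full capacity leaves room for spikes and for losing a node. `nodes_needed` is a hypothetical helper.

```python
import math


def nodes_needed(peak_rps, rps_per_node, headroom=0.5):
    """Smallest number of nodes that can serve peak_rps while each node
    runs at no more than (1 - headroom) of its rated throughput."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_node = rps_per_node * (1 - headroom)
    return math.ceil(peak_rps / usable_per_node)


# 10x of today's ~2 million hits/day, squeezed into an 8-hour window: ~694 requests/sec.
peak = 20_000_000 / (8 * 60 * 60)

print(nodes_needed(peak, rps_per_node=480, headroom=0.0))  # 2
print(nodes_needed(peak, rps_per_node=480, headroom=0.5))  # 3
```

Whatever the real per-node figure turns out to be, the answer comes out as a handful of front-end machines rather than a farm, which is the point being made above. The database tier needs its own separate estimate, starting from the query rates rather than the page rates.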