Hi,

Anyone got an idea of how many web and database servers I'd need to push out 10,000 dynamic pages per second? Fairly simple pages and database queries. I'd appreciate recommendations for hardware.

The clients for this project are anticipating large amounts of burst traffic.

Joe
On 7/28/06, Joe Van Dyk <joevandyk@gmail.com> wrote:
> Anyone got an idea of how many web and database servers I'd need to
> push out 10,000 dynamic pages per second? Fairly simple pages and
> database queries. I'd appreciate recommendations for hardware.
>
> The clients for this project are anticipating large amounts of burst traffic.

What makes them think they will get that much traffic? I can probably count on one hand the number of sites that do that much, burst or otherwise. Sounds fishy to me.
On 29/07/06, Joe Van Dyk <joevandyk@gmail.com> wrote:
> Anyone got an idea of how many web and database servers I'd need to
> push out 10,000 dynamic pages per second?

I'd say it'll take about 100 times as much as it would take to push out 100 dynamic pages per second...

Seriously, you need a WHOLE lot of info about the app, infrastructure, hardware etc. before anyone could make any such recommendation. Anyone who says otherwise has no idea what they're talking about.

If you get a sample of the hardware you intend to use, then load up your app, tune it and push it as hard as it can go, you'll get some sort of idea. If it can handle 100 pages/second, then you need ~100 times as much hardware as you've already got. Yep, I know that's overly simplistic, but it's relatively cheap, simple to extrapolate, and it'll get you within 20-50% of a reasonable estimate in dollar terms.

You'll have to factor in load balancing hardware (which you won't be able to drive to breaking point without a sizeably greater investment), database replication and the costs of running a suitable data centre at some point. To counter that, you'll probably be able to squeeze some more pages/second out of your existing hardware, and in any case the price of hardware grunt continues to drop, so by the time you've bought your 10k pages/second setup, the cost will have fallen considerably.

Alternatively, you could just listen to the guy who replies e.g. "27 web servers and 13 database servers", and accept that at face value ;->

Sorry if that's very little help, but at least it's the truth. Nobody could answer your question without doing a lot of research on your specific app first.

Regards
Dave M.
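P.S. In code form, the back-of-envelope above is just linear extrapolation. A minimal Ruby sketch; every number below is a placeholder to swap for your own load-test results:

  # Naive linear capacity estimate from a single-box load test.
  # All inputs are placeholders -- measure your own app first.
  target_rate   = 10_000  # pages/second you want to survive
  measured_rate = 100     # pages/second one test box sustained
  headroom      = 1.5     # margin for bursts and box failures

  boxes = (target_rate / measured_rate.to_f * headroom).ceil
  puts "~#{boxes} app servers, before the load balancers and DB tier"
  # => ~150 app servers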
On 7/29/06, Joe Van Dyk <joevandyk@gmail.com> wrote:
> Anyone got an idea of how many web and database servers I'd need to
> push out 10,000 dynamic pages per second? Fairly simple pages and
> database queries.

That question is rather hard to answer, since "dynamic" isn't really well-specified... Your best bet would be to use caching heavily; in the best of situations you could just serve static pages, and then it's up to the web server - not Rails.
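In Rails terms, the cheapest version of that is page caching, which writes the rendered page to disk so the front-end web server can serve it with no Rails involvement at all. A minimal sketch - the controller, action and model names here are invented:

  # Caching is on by default in the production environment:
  # ActionController::Base.perform_caching = true

  class ProductsController < ApplicationController
    # After the first request, Rails writes the rendered HTML
    # under public/ and Apache/lighttpd serves it as a plain
    # static file from then on.
    caches_page :index

    def index
      @products = Product.find(:all)
    end
  end

When the data changes, expire_page (or a cache sweeper) deletes the file and the next request regenerates it.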
On 7/28/06, Joe Van Dyk <joevandyk@gmail.com> wrote:
> The clients for this project are anticipating large amounts of burst traffic.

(I think that estimate is quite a bit on the high end of what will actually happen, but anyways.)

I want to be able to tell them that going up to that number of visits will be just a matter of adding another machine or ten to the rack at the data center. In other words, I'd like to have a hardware and software setup that could handle that number of visits simply by adding another web server.

Thanks,
Joe
On 7/28/06, Joe Van Dyk <joevandyk@gmail.com> wrote:
> I want to be able to tell them that going up to that number of visits
> will be just a matter of adding another machine or ten to the rack at
> the data center. In other words, I'd like to have a hardware and
> software setup that could handle that number of visits simply by
> adding another web server.

Even assuming that they will get, say, 5-10 million hits per day, if the site is database driven it's more than likely going to take more than just adding X number of servers per X amount of traffic. How you set up your database servers to handle reads/writes would probably be one of the bigger issues, as well as how you handle your sessions and caching.

I would start out with some of the basics in place, like a database cluster of some type and hardware-based load balancers on the front end such as ServerIrons. Some things are easier to change once you get going than others. Switching from a single database to a cluster while you are already getting a million hits a day is not fun.

You will also be spending some money on routers, probably something like the Cisco 28XX or 38XX series. You could easily use 20 Mbps or so when bursting.

And if your clients have unrealistic expectations, I would be very, very careful. Personally I tell my clients to be prepared for the worst, and only if they accept that will I work for them. That way when something does go wrong (and it will), they won't be coming back to you yelling and screaming. They might not like things going wrong, but they will remember that you told them that things like this were bound to happen, and to be prepared for it.
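As a rough sanity check on that bandwidth figure (the 20 KB average page weight below is an assumption, not a measurement):

  # Back-of-envelope: sustained bandwidth for a given request rate.
  hits_per_sec   = 115          # roughly 10 million hits/day
  avg_page_bytes = 20 * 1024    # assumed 20 KB average response
  mbps = hits_per_sec * avg_page_bytes * 8 / 1_000_000.0
  puts "~#{mbps.round} Mbps sustained"   # => ~19 Mbps

At the 10,000 pages/second burst figure, the same arithmetic gives well over a gigabit, which is a different class of connectivity entirely.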
10,000,000 (ten million) hits a day is only 115.74 hits per second. 10,000 hits per second is 864,000,000 hits per day.

There's no way I'm going to believe that your client is going to be getting 864,000,000 page views per day. Any company that is getting that many page views per day, even in bursts, already has the architecture and infrastructure in place to handle it, so they wouldn't be asking your advice. Furthermore, they'd know HOW to handle that kind of load, so again, they wouldn't be asking an outsider for help.

Find out how many hits your customer is going to REALISTICALLY be expecting, then come back and re-submit your question.

This is of course excluding sites whose sole purpose is to support massive botnets and other tools of evil.

-masukomi
On 7/30/06, kate rhodes <masukomi@gmail.com> wrote:
> 10,000,000 (ten million) hits a day is only 115.74 hits per second.
> 10,000 hits per second is 864,000,000 hits per day.
>
> There's no way I'm going to believe that your client is going to be getting
> 864,000,000 page views per day.

I thought I said it's a lot of burst traffic (meaning a lot of traffic in a short amount of time). The site is not going to sustain that much traffic throughout the day.

Joe
Tim Perrett
2006-Jul-30 19:23 UTC
[Rails] Re: Re: Dynamically generating 10k pages per second
Still... even to be getting anywhere near that kind of traffic, the previous responses are correct - it just doesn't sound right!!

Any idea on what actual web server you're going to be using? lighttpd is capable of dynamic load balancing, so if you haven't already, take a good long hard look at it!

Tim
Francis Cianfrocca
2006-Jul-30 20:08 UTC
[Rails] Re: Dynamically generating 10k pages per second
I assume 10,000 hits/second is an average, and I also assume it's a global average, so the peak rate at particular times will be closer to 40,000/second. You are among the very top sliver of the most heavily trafficked sites in the world. Many companies (I assume you're a company) in this position create their own application software on top of modified kernels. (I've been involved in several such efforts with traffic loads similar to yours.)

One thing you will not do, if you're like most people, is use an RDBMS to back this site. You'll probably design your own well-customized and highly denormalized data-query system. There are a lot of different approaches to this, but the commercial value of such a well-trafficked site is such that you should already have lined up more than enough funding to do this job right. And there are plenty of Internet-bubble veterans around who've been there and done that, that you can hire.

I'm still trying to decide if you're playing with us here.
> I thought I said it's a lot of burst traffic (meaning a lot of traffic
> in a short amount of time). The site is not going to sustain that
> much traffic throughout the day.

Doesn't really matter, it's still out in lala land IMO.

Look, the other guy is right, this just isn't how it's done. Sharp business people know how to find talent, whether it's because they have experience in the industry, or through VCs who know how to get talent, or simply because they are smart enough to talk to people who have done it before. If they had any clue at all they would know that they needed people with prior experience in this area. It's just common sense.

That said, let's assume for the moment that it's legit and you are going to do this. Why haven't you given any details? Several people have asked for more detail, and they are right in saying that no one can give you any meaningful information without a lot more detail. You have some of the sharpest minds in the Ruby community here that would be willing to help, but you essentially deny their help by not giving them the information they need to help you. Which is another reason people are probably disinclined to believe this whole thing is legitimate.

In any case, good luck with it all.
On 7/30/06, Francis Cianfrocca <garbagecat10@gmail.com> wrote:
> I assume 10,000 hits/second is an average, and I also assume it's a
> global average, so the peak rate at particular times will be closer to
> 40,000/second.

It's not. 5-10k hits per second would be at the very high end for a short amount of time.

Joe
On Jul 30, 2006, at 9:02 AM, kate rhodes wrote:
> There's no way I'm going to believe that your client is going to be
> getting 864,000,000 page views per day. Any company that is getting
> that many page views per day, even in bursts, already has the
> architecture and infrastructure in place to handle it, so they
> wouldn't be asking your advice.

(Sorry to pick on you, Kate, because this fits many others as well.)

I cannot imagine why anyone would be so closed-minded.

Does *anything* that you mention in your response fit Google, eBay, or Amazon at inception?

Is it *impossible* to imagine that someone has a good idea, has done some research, is *slightly* over-optimistic (but not necessarily wrong!), and wants to get an idea of what it might take to handle that sort of load?

--
-- Tom Mornini
Nathaniel Brown
2006-Jul-31 06:18 UTC
[Rails] Re: Dynamically generating 10k pages per second
Rasmus has done several talks on how to architect a system which can handle the load of a place such as Yahoo.

Might be worth sifting through his slides for the diagrams he mentioned. Whether it's PHP or Rails, they are similar enough that you can leverage his knowledge when you get down to something like system architecture.

Not sure why everyone is jumping down this guy's throat. Who cares if he landed a client like Digg or YouTube? He was just asking how it would be done.

-NSHB

On 7/30/06, Tom Mornini <tmornini@infomania.com> wrote:
> Is it *impossible* to imagine that someone has a good idea, has done
> some research, is *slightly* over-optimistic (but not necessarily
> wrong!), and wants to get an idea of what it might take to handle that
> sort of load?

--
Kind regards,

Nathaniel Brown
President & CEO
Inimit Innovations Inc. - http://inimit.com
Nathaniel Brown
2006-Jul-31 06:19 UTC
[Rails] Re: Dynamically generating 10k pages per second
By the way, there is a rather large archive of PHP talks at http://talks.php.net

-NSHB

--
Kind regards,

Nathaniel Brown
President & CEO
Inimit Innovations Inc. - http://inimit.com
On 7/30/06, Nathaniel Brown <nshb@inimit.com> wrote:
> Rasmus has done several talks on how to architect a system which can handle
> the load of a place such as Yahoo.

Any chance you could link to those slides?

> Not sure why everyone is jumping down this guy's throat. Who cares if he
> landed a client like Digg or YouTube? He was just asking how it would be
> done.

:-)

The intent of my question was to figure out what changes when you move from, say, 100 dynamic pages per second, which my laptop can handle, to 1000 dynamic pages per second, which probably a couple of servers could handle, to 10k pages per second. I probably should've phrased the original question better.

Again, this is a large amount of traffic in a *burst*, a short amount of time. Think victoriasecret.com advertised during the Super Bowl. Not quite at that level, but the general idea applies.

In my situation, handling 500 to 1,000 dynamic pages per second without any slowdown would be great and, quite honestly, is probably all we'll ever need. But the folks I'm doing this for want to be assured that it's not too difficult to go higher. I don't have much experience at that level of performance, hence the question.

Another question: assuming I've got some initial architecture in place, how do I test everything? Using the Apache benchmark 'ab' program seems to only measure the performance of downloading one single page, so it wouldn't measure the effect of having a couple of images, JavaScript includes, CSS files, etc.

Thanks for all your responses, even the snarky ones. :-D

Joe
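P.S. The crudest approach I can think of is a little multi-threaded Ruby driver along these lines (the URLs and numbers are invented placeholders, and MRI's green threads make it a blunt instrument) - would something like this be representative, or is a dedicated tool like httperf, which can replay multi-URL sessions, the saner route?

  require 'net/http'
  require 'uri'

  # Hypothetical page plus its assets -- substitute your own.
  URLS = %w[
    http://localhost:3000/
    http://localhost:3000/stylesheets/site.css
    http://localhost:3000/javascripts/app.js
    http://localhost:3000/images/logo.png
  ].map { |u| URI.parse(u) }

  CONCURRENCY = 20  # simulated simultaneous visitors
  PAGEVIEWS   = 50  # full page loads per simulated visitor

  start = Time.now
  threads = (1..CONCURRENCY).map do
    Thread.new do
      PAGEVIEWS.times do
        # One "pageview": fetch the page and every asset,
        # roughly what a browser does on a cold cache.
        URLS.each { |uri| Net::HTTP.get_response(uri) }
      end
    end
  end
  threads.each { |t| t.join }

  elapsed = Time.now - start
  puts "#{(CONCURRENCY * PAGEVIEWS / elapsed).round} pageviews/sec"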
Rimantas Liubertas
2006-Jul-31 11:36 UTC
[Rails] Re: Dynamically generating 10k pages per second
<...>
> Is it *impossible* to imagine that someone has a good idea, has done
> some research, is *slightly* over-optimistic (but not necessarily
> wrong!), and wants to get an idea of what it might take to handle that
> sort of load?

https://gettingreal.37signals.com/samples/37s-scale-later.pdf

Regards,
Rimantas
--
http://rimantas.com/
Francis Cianfrocca
2006-Jul-31 14:36 UTC
[Rails] Re: Re: Dynamically generating 10k pages per second
Joe Van Dyk wrote:
> Again, this is a large amount of traffic in a *burst*, a short amount
> of time. Think victoriasecret.com advertised during the Super Bowl.
> Not quite at that level, but the general idea applies.
>
> In my situation, handling 500 to 1,000 dynamic pages per second
> without any slowdown would be great and, quite honestly, is probably
> all we'll ever need. But the folks I'm doing this for want to be
> assured that it's not too difficult to go higher. I don't have much
> experience at that level of performance, hence the question.

You're not defining what a "burst" is. The issue you will face at very high load levels is this: depending on a lot of factors in your application and in your infrastructure, you may find that scalability barriers emerge at particular load levels that are not easy to break through simply by adding hardware. How close can you get your architecture to true shared-nothing? At the end of the day, you're sharing the network infrastructure among machines, so you can't really get all the way to shared-nothing.

In my experience on extremely high-load sites, I've seen the barriers emerge the earliest in the RDBMS. (And I've seen people try to address this with enormously expensive computers and Oracle licenses, which only gets you so far. A far better answer is not to use an RDBMS.) Another barrier is when you try to use a standard web server like Apache and it runs out of gas. At this point most people write their own custom web server, generally using an event-driven model, that knows a lot about how their dynamic data is structured. Yes, this breaks down all of the commonly-accepted principles of software engineering. And yes, this breakdown is easy to cost-justify at the highest load-level requirements. (Remember, Google went so far as to write their own file system, a very odd duck with engineering parameters that don't match any application I can imagine, apart from Google.) At really high levels, I've even seen the interpacket delay on switched Ethernet links inside the server farm become the bottleneck.

Your most important question is about the economics of the site. When you get those 10,000/second bursts, is it feasible for the site simply to fail and start sending 503 responses until it recovers? As ugly as it sounds to a techie, that is often the right answer from a business point of view. On the other hand, what if most of the actual value of the site is created in those few seconds of top load? (Victoria's Secret, your example, got huge publicity out of those high-traffic events, which crashed the first time they ran it. But this is a very unusual example.) In this case, it makes sense for you to design to the highest-traffic case, and as I'm suggesting, you may hit barriers that you will have to solve in very non-standard ways.
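For what it's worth, the "fail and send 503s" policy is easy to express at the application layer, though in practice you'd push it down into the load balancer or front-end web server. A minimal Rails sketch; the saturation check is hypothetical and stubbed out, since choosing the right signal is the hard part:

  class ApplicationController < ActionController::Base
    before_filter :shed_load_when_saturated

    private

    # Hypothetical saturation signal: in practice this might watch
    # dispatcher queue depth, DB connection wait times, or a flag
    # that a monitoring daemon flips in memcached.
    def overloaded?
      false  # stub -- wire up your own check here
    end

    def shed_load_when_saturated
      if overloaded?
        headers['Retry-After'] = '30'
        render :text => 'Service temporarily unavailable', :status => 503
        return false  # halt the filter chain
      end
    end
  end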
On 7/31/06, Rimantas Liubertas <rimantas@gmail.com> wrote:
> > Is it *impossible* to imagine that someone has a good idea, has done
> > some research, is *slightly* over-optimistic (but not necessarily
> > wrong!), and wants to get an idea of what it might take to handle that
> > sort of load?
>
> https://gettingreal.37signals.com/samples/37s-scale-later.pdf

I just knew someone would post a link to that. :-D
Joe Van Dyk
2006-Jul-31 14:56 UTC
[Rails] Re: Re: Dynamically generating 10k pages per second
On 7/31/06, Francis Cianfrocca <garbagecat10@gmail.com> wrote:
> You're not defining what a "burst" is.

Perhaps I'm using the term incorrectly. "An abrupt, intense increase; a rush: a burst of speed; fitful bursts of wind." is how I mean it.

> In my experience on extremely high-load sites, I've seen the barriers
> emerge the earliest in the RDBMS. (And I've seen people try to address
> this with enormously expensive computers and Oracle licenses, which only
> gets you so far. A far better answer is not to use an RDBMS.)

Huh, first time I've heard that.

> Your most important question is about the economics of the site. When
> you get those 10,000/second bursts, is it feasible for the site simply
> to fail and start sending 503 responses until it recovers? As ugly as it
> sounds to a techie, that is often the right answer from a business point
> of view.

I realize it depends entirely on the application and usage patterns, but at what point does "standard shared-nothing" practice break down? I assume it's somewhere in between 1000 to 10000 requests per second?
Jeff Pritchard
2006-Jul-31 15:06 UTC
[Rails] Re: Re: Dynamically generating 10k pages per second
I think one little 386 box with a 56k modem connection will do nicely. Anything that is amazing enough to make that many people run to their computer all at once is worth waiting for!

jp
Ben Bleything
2006-Jul-31 15:43 UTC
[Rails] Re: Re: Dynamically generating 10k pages per second
On Mon, Jul 31, 2006, Francis Cianfrocca wrote:
> In my experience on extremely high-load sites, I've seen the barriers
> emerge the earliest in the RDBMS. (And I've seen people try to address
> this with enormously expensive computers and Oracle licenses, which only
> gets you so far. A far better answer is not to use an RDBMS.)

A less extreme solution to the database bottleneck is to cache heavily. Assuming you have the resources for it (a bunch of RAM on a few servers), memcached can be extremely effective at alleviating the db bottleneck.

http://danga.com/memcached/
http://deveiate.org/projects/RMemCache (the original client)
http://dev.robotcoop.com/Libraries/ (memcache-client and cached_model)

Robot Coop's library is said to be faster than the original, but I haven't tested. cached_model makes it very easy to cache simple queries (see documentation for details).

Stefan Kaes also discusses storing sessions in memcached for additional speed boosts:

http://www.railsexpress.de/blog/articles/2006/01/24/using-memcached-for-ruby-on-rails-session-storage

In general, the railsexpress blog is an awesome resource for performance tricks.

> Another barrier is when you try to use a standard web server like
> Apache and it runs out of gas. At this point most people write their
> own custom web server, generally using an event-driven model, that
> knows a lot about how their dynamic data is structured.

I'm not sure I buy this. The high-load sites I'm familiar with still use Apache (LiveJournal and Slashdot are two good examples). While you will get a large performance boost out of writing your own specialty infrastructure, it's not worth the time for the vast majority of sites (even those doing more traffic than Joe is talking about).

The mantra of successful scaling is almost always "scale out, not up". More machines and better load balancing is almost always better than bigger machines running fancier software. Load balancing is an easier problem to solve than reinventing the (HTTP) wheel.

Ben
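P.S. To make the read-through pattern concrete, here's a minimal sketch against memcache-client; the Story model, the key name and the 60-second TTL are all invented for illustration:

  require 'memcache'

  CACHE = MemCache.new('localhost:11211', :namespace => 'myapp')

  # Hypothetical read-through cache for an expensive query:
  # hit memcached first, fall back to the database, then prime
  # the cache so the next minute of requests never touches it.
  def top_stories
    CACHE.get('top_stories') || begin
      stories = Story.find(:all, :order => 'score DESC', :limit => 20)
      CACHE.set('top_stories', stories, 60)
      stories
    end
  end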
bbqDude
2006-Jul-31 15:51 UTC
[Rails] Re: Re: Re: Dynamically generating 10k pages per second
If I could get that many hits per day, my wife and I would be working from Hawaii in a nice beach house with a lifted Tundra truck parked outside.
On Jul 31, 2006, at 4:36 AM, Rimantas Liubertas wrote:
> > Is it *impossible* to imagine that someone has a good idea, has done
> > some research, is *slightly* over-optimistic (but not necessarily
> > wrong!), and wants to get an idea of what it might take to handle that
> > sort of load?
>
> https://gettingreal.37signals.com/samples/37s-scale-later.pdf

He didn't say he was going to go out and scale, he said he wanted to know what was required to do so.

It's conceivable, for instance, that the economics of the project require high loads before it'll ever be profitable. Many people, perhaps many of those who read the book, and perhaps those who wrote it, might look down their noses at such projects.

However, with that attitude, we might not have light bulbs or the electrical power to run them, as those were projects that both required massive scale before they made any money at all. And those guys put a *lot* of thought into making sure they could scale long before it was required.

I read the book, and agree with the premises. But the book doesn't mean the OP:

1) Was insane
2) Has insane customers
3) Should be embarrassed for asking

--
-- Tom Mornini
Francis Cianfrocca
2006-Jul-31 16:23 UTC
[Rails] Re: Re: Re: Dynamically generating 10k pages per second
Ben Bleything wrote:
> I'm not sure I buy this. The high-load sites I'm familiar with still
> use Apache (LiveJournal and Slashdot are two good examples). While you
> will get a large performance boost out of writing your own specialty
> infrastructure, it's not worth the time for the vast majority of sites
> (even those doing more traffic than Joe is talking about).
>
> The mantra of successful scaling is almost always "scale out, not up".
> More machines and better load balancing is almost always better than
> bigger machines running fancier software. Load balancing is an easier
> problem to solve than reinventing the (HTTP) wheel.

It sounds like you habitually build dynamic web sites that sustain 10,000 hits/second or more. I've only worked on a handful of them, and they were all well-known sites with very large amounts of VC funding behind them. The vast majority of sites should be built with commodity tools and commodity hardware; they don't make economic sense otherwise, that's obvious. But for really serious, high-value applications with extremely large working sets, I can tell you firsthand that the investment in custom software really can pay off. (And I've already written low-drag, event-driven custom HTTP servers, multiple times - it's not as hard as some may think.)

I have no doubt that a site like Slashdot can scale easily enough with commodity software. Try something like DoubleClick or Google, though. (But of course we're off-topic, since the OP has already clarified that his scale requirements are not sustained.)
Ben Bleything
2006-Jul-31 16:54 UTC
[Rails] Re: Re: Re: Dynamically generating 10k pages per second
On Mon, Jul 31, 2006, Francis Cianfrocca wrote:
> It sounds like you habitually build dynamic web sites that sustain
> 10,000 hits/second or more. I've only worked on a handful of them, and
> they were all well-known sites with very large amounts of VC funding
> behind them.

I'm not sure why you think it sounds like that. I said "the sites I'm familiar with" and gave two examples that are much smaller than DoubleClick and Google. That said, they're also much closer to the theoretical site the OP was talking about than a site like Google.

I would suggest that if we really were talking about a site on the level of Google (which we're not), building a custom HTTP layer will only defer the application-level bottleneck. Sooner or later it's going to come back to Rails. Since we're talking about a Rails app, in my opinion it only makes sense to consider commodity (ie, outward) scaling.

I have no doubt that what you're talking about is highly effective, and given an appropriate level of resources it would be the best solution... but how many sites that are planning on maybe peaking at 10k hits/sec are in a position to invest that much?

> I have no doubt that a site like Slashdot can scale easily enough with
> commodity software. Try something like DoubleClick or Google, though.

Yep, and the former is what I was talking about.

Ben
> (Sorry to pick on you, Kate, because this fits many others as well.)
>
> I cannot imagine why anyone would be so closed-minded.
>
> Does *anything* that you mention in your response fit Google, eBay,
> or Amazon at inception?

No worries Tom, and you're right, none of that matches Google, eBay or Amazon at inception. But neither do the numbers. Unless there's something illegitimate going on, companies don't suddenly get 10k hits per second. LiveJournal is a good example. They started off with a fairly standard webapp and their infrastructure (software and hardware) grew along with the load. But none of these companies, AFAIK, called up a contractor and said "OMFG 10k hits help!".

I think you're totally right about them being overly optimistic. And unfortunately that puts Joe in a bad position that's the result of them not understanding how unlikely that kind of load is, and not realizing just how dramatic the difference is between a normal webapp infrastructure and one that can handle that kind of load. Hopefully they have deep pockets and Joe will be able to actually deliver something that can support that. If you do, Joe, I'm sure we'd love to hear your solution.

As for real advice, I have to agree with Francis. The largest site I've worked on averaged about 3,000,000 page views per day (mostly during business hours), and we were only able to maintain quick response times due to having a custom-built search indexing system. We had a ridiculously large database of items, and the system was so thoroughly optimized that it took like a day and a half to generate an updated index. I scoffed at it when I was first hired, but I soon became a convert. When you deal with loads like that, you seriously need to consider hiring some brilliant old-school coder with a greying beard.

--
- kate = masukomi
Francis Cianfrocca
2006-Aug-01 12:31 UTC
[Rails] Re: Re: Re: Re: Dynamically generating 10k pages per second
Ben, I'll take one more bite at this apple, and my apologies for continuing the threadjack. You said this:

> The high-load sites I'm familiar with still
> use Apache (LiveJournal and Slashdot are two good examples).

which I took to mean that you are part of the engineering team for these sites and others with similar loads, so you have firsthand knowledge. (I don't know how much traffic LiveJournal gets.)

The actual application profile determines to a great degree what kind of scalability you will need, and there are interesting tradeoffs all over the place. It's not universally true, for example, that a site becomes superlinearly more valuable with usage. (Put differently, Metcalfe's Law may not be true in general.) That makes it reasonable for people approaching a site that may someday become really big to use the commodity-software, outward-scaling approach, which in essence considers development effort (including time-to-market) to be more expensive than operational costs.

My real point (borne out by first-hand experience) is twofold:

First, with extremely large sites (and there are so few of them in reality that each one is a special case, and there really are no universally-applicable best practices), you have to consider that operational costs at a certain point really do outweigh development costs. As an extreme example, Eric Schmidt has said that one of the biggest costs Google faces is for *electricity.* Well-engineered custom software can be the difference between economically possible and not-possible.

Second, with some problems, outward scaling simply can't be made to work. One of your examples is Slashdot. Think about what /. does and you can see that there are multiple points where scalability can be enhanced by partitioning working sets, introducing update latency, etc. etc. But look at something like an ad-serving network, which relies on a vast working set that is in constant flux. You can't just scale that up by adding more machines and more switched network segments. Very early in that process, your replication traffic alone will swamp your internal network. (I've seen that myself, which is why I mentioned Ethernet interpacket delays as a critical factor.)

There are many places for Ruby in such an environment. I'm working on one now that uses a lot of Ruby, but RoR was not an appropriate choice. We're using a custom HTTP server that includes a bit of Ruby code, though.
Ben Bleything
2006-Aug-01 15:41 UTC
[Rails] Re: Re: Re: Re: Dynamically generating 10k pages per second
On Tue, Aug 01, 2006, Francis Cianfrocca wrote:
> > The high-load sites I'm familiar with still
> > use Apache (LiveJournal and Slashdot are two good examples).
>
> which I took to mean that you are part of the engineering team for these
> sites and others with similar loads, so you have firsthand knowledge. (I
> don't know how much traffic LiveJournal gets.)

I'm sorry if I gave you that impression. I did work for LiveJournal, and while I was in the room with the engineers and heard a lot of the scaling discussion, I was not actively a part of engineering. What I know of Slashdot comes from hearing discussions between the LJ and Slashdot engineers about the topic, and from their talks at OSCON and the like. Far from first-hand knowledge, for sure, but recall that all I said was that they use Apache, and it's undeniable that Slashdot is a high-traffic site :)

> The actual application profile determines to a great degree what kind of
> scalability you will need, and there are interesting tradeoffs all over
> the place.

Of course. I should note that I don't disagree with you at all. Maybe I'm off by an order of magnitude, but a site which *might* burst to 10k hits/sec just does not strike me as a Google-level site. Since that's what the OP was talking about, discussing the techniques of those sites (while academically interesting) doesn't seem to have bearing on the conversation.

> First, with extremely large sites (and there are so few of them in
> reality that each one is a special case, and there really are no
> universally-applicable best practices), you have to consider that
> operational costs at a certain point really do outweigh development
> costs.

True.

> Second, with some problems, outward scaling simply can't be made to
> work. [...] But look at something like an ad-serving network, which
> relies on a vast working set that is in constant flux. You can't just
> scale that up by adding more machines and more switched network
> segments.

Again, I agree completely. But (as you mention below), a CRUD-based framework probably doesn't make any sense in an ad-serving environment. Frankly, that's a much more specialized problem than the average application, and so again seems tangential to the discussion. Of course, this is all based on my assumption that the OP is building the typical Rails site, since he didn't say anything to the contrary.

I appreciate that very large sites can benefit from customized software and clever infrastructure. But I also recognize (as was mentioned previously!) that companies doing those kinds of sites are unlikely to elect to use Rails unless it's appropriate for their site. In the context of a Rails application, outward scaling is going to be more effective than upward scaling. That's all I was ever saying :)

> There are many places for Ruby in such an environment. I'm working on
> one now that uses a lot of Ruby, but RoR was not an appropriate choice.
> We're using a custom HTTP server that includes a bit of Ruby code,
> though.

Very interesting. Does the server encapsulate the application logic as well, or can it serve other applications? How is the Ruby stuff tied in?

Ben

ps - I think this is close enough to the topic that it's not really a threadjack. People interested in very large sites will be reading this thread, and a discussion of when and how you have to move past your framework is interesting and valuable.
Francis Cianfrocca
2006-Aug-01 18:29 UTC
[Rails] Re: Re: Re: Re: Re: Dynamically generating 10k pages per second
Ben Bleything wrote:
> Very interesting. Does the server encapsulate the application logic as
> well, or can it serve other applications? How is the Ruby stuff tied
> in?

It's a general server we've used on several applications now. The hard-core guts of it are coded in C, and we put a Ruby-extension wrapper around it so it can be started as a Ruby program. That lets us add functionality in Ruby for handling some of the requests. With roughly equivalent dynamically-generated loads, it's maybe twenty times faster than RoR+Apache+FastCGI on the same hardware. When we build plain old CRUD sites with this technology, we generally use a component-based framework rather than an action-based one - seems to make everything easier.

> ps - I think this is close enough to the topic that it's not really a
> threadjack. People interested in very large sites will be reading this
> thread, and a discussion of when and how you have to move past your
> framework is interesting and valuable.

It might be really interesting to ask what the outer limits are of RoR plus your typical clustered RDBMS engine. Any web site that sustains 1,000 dynamic requests per second is a major piece of engineering, and probably not that rare a requirement either. (Being able to burst up to 10,000/sec isn't really interesting without knowing how wide and how frequent the bursts are.) It would be great to determine and publish best practices for such sites, especially if they can be combined with expected cost metrics. (For example, Java partisans might argue that the 3-5x development cost increment of using J2EE is well compensated for by requiring less hardware and infrastructure at runtime, but that might not turn out to be true.)

And of course not all CRUD applications are created equal. Some are read-many-write-few, so caching result sets gives a big win and permits easy outward scaling. But some are read-many-write-many. Those are harder.