We're using Unicorn to serve a Rails app on a few app servers built on Amazon EC2 instances. Each of the xlarge EC2 instances has the equivalent of 8 CPUs, but it seems like our Unicorn master and 8 workers are only utilizing the first CPU. We've been watching the CPU graphs from collectd data when the website is under load, and only cpu-0 shows any activity ... the others seem to be idle, or minimally used by other services.

I had assumed that the OS would automatically allocate the Unicorn worker processes to use multiple CPUs, but now I'm not sure. I couldn't find anything about this in the Unicorn docs (except for the mention of the worker_processes configuration option, which seems to imply that multiple CPUs would be used). Is there something that I'm not doing?

Our EC2 instances are running Ubuntu 10.04 LTS with Linux kernel 2.6.32.

Thanks in advance for any insights or suggestions.

Nate Clark
Pivotal Labs Singapore
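For context, a unicorn config for this kind of setup typically boils down to something like the sketch below. The paths and values here are illustrative assumptions, not the actual config from the original post; worker_processes only controls how many workers the master forks, and CPU placement is left entirely to the kernel scheduler.

    # config/unicorn.rb -- an illustrative sketch, not the poster's config;
    # paths and values below are assumptions
    worker_processes 8                      # one per (virtual) core is a common starting point
    working_directory "/var/www/app/current"

    listen 8080, :backlog => 64             # TCP listener; a UNIX socket path works here too
    timeout 30

    pid "/var/www/app/shared/pids/unicorn.pid"
    stderr_path "/var/www/app/shared/log/unicorn.stderr.log"
    stdout_path "/var/www/app/shared/log/unicorn.stdout.log"

    preload_app true

    before_fork do |server, worker|
      # the master's DB connection shouldn't be shared across forked workers
      defined?(ActiveRecord::Base) and ActiveRecord::Base.connection.disconnect!
    end

    after_fork do |server, worker|
      defined?(ActiveRecord::Base) and ActiveRecord::Base.establish_connection
    end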
Hi Nate,

> We've been watching the CPU graphs from collectd data when the website
> is under load, and only cpu-0 shows any activity ... the others seem
> to be idle, or minimally used by other services.

I don't think you can rely on the numbers collectd (or top) gives you when measuring from within the hypervisor powering your EC2 instance. The only reliable source of CPU utilization is CloudWatch, as that measures from outside your instances.

I've used an array of xlarge instances myself, each running 17 Unicorn workers serving a Rails app, consuming 4GB and leaving 3GB free, with no swap. That worked well for us under high load. It couldn't have handled that load if all 17 Unicorn workers had been served by one of those 8 virtual cores.

Cheers,
Lawrence
Lawrence,

I've suspected that it may be a monitoring problem and not a Unicorn problem, but I'm not yet convinced either way. Our monitoring via collectd is done through RightScale. They have a lot of experience with EC2, so I'd assume that it is monitoring properly. Also, our other services (MySQL, for example) are showing activity on multiple cores under load, so that leads me to believe that the monitoring is working in at least some cases.

I wasn't aware of the CloudWatch service until now; that looks interesting ... I'll check it out.

Has anyone else experienced a problem like this?

Nate

On Tue, May 31, 2011 at 8:10 PM, Lawrence Pit <lawrence.pit at gmail.com> wrote:
> I don't think you can rely on the numbers collectd (or top) gives you
> when measuring from within the hypervisor powering your EC2 instance.
> The only reliable source of CPU utilization is CloudWatch, as that
> measures from outside your instances.
We experience the same problem. I believe the problem has more to do with the kernel CPU scheduler than anything else. If you figure out a reliable way to spread the load, I'd like to hear it.

Clifton King
Development
clifton at orgsync.com
512-940-7744

Sent from my phone.

On May 31, 2011, at 7:20 AM, Nate Clark <nate at pivotallabs.com> wrote:
> I've suspected that it may be a monitoring problem and not a Unicorn
> problem, but I'm not yet convinced either way.
Nate Clark <nate at pivotallabs.com> wrote:
> Each of the xlarge EC2 instances has the equivalent of 8 CPUs, but it
> seems like our Unicorn master and 8 workers are only utilizing the
> first CPU. We've been watching the CPU graphs from collectd data when
> the website is under load, and only cpu-0 shows any activity ... the
> others seem to be idle, or minimally used by other services.

What is your request rate and average response time for the application? If requests come in more quickly than one worker can respond, /then/ the kernel may start using more workers. However, it looks like your application is simply responding fast enough to keep up with the requests coming in.

> I had assumed that the OS would automatically allocate the Unicorn
> worker processes to use multiple CPUs, but now I'm not sure.

The kernel does all the balancing work.

-- 
Eric Wong
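One way to check whether requests are in fact backing up, rather than guessing from CPU graphs, is the raindrops gem, which can read listen-queue statistics on Linux. A rough sketch follows; the listener address is an assumption and should match whatever the unicorn config actually listens on.

    # check_listen_queue.rb -- a rough sketch; the address below is an
    # assumption and should match the unicorn "listen" directive. Linux-only.
    require 'raindrops'

    stats = Raindrops::Linux.tcp_listener_stats(["0.0.0.0:8080"])

    stats.each do |addr, s|
      # "queued" is the number of clients waiting in the listen queue;
      # anything consistently above zero means the workers are not keeping up
      puts "#{addr}: active=#{s.active} queued=#{s.queued}"
    end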
Clifton King <cliftonk at gmail.com> wrote:
> We experience the same problem. I believe the problem has more to do
> with the kernel CPU scheduler than anything else. If you figure out a
> reliable way to spread the load, I'd like to hear it.

Load not being spread is /not/ a problem unless there are requests that get stuck in the listen queue.

If no requests are actually stuck in the queue (light load), the kernel is right to put requests into the most recently used worker, since it can get better CPU cache behavior this way.

== The real problem

Under high loads (many cores, fast responses), Unicorn currently uses more resources because of non-blocking accept() + select(). This isn't a noticeable problem for most machines (1-16 cores).

Future versions of Unicorn may take advantage of /blocking/ accept() optimizations under Linux. Rainbows! already lets you take advantage of this behavior [1] if you meet the following requirements:

* Ruby 1.9.x under Linux
* only one listen socket (if worker_connections == 1 under Rainbows!)
* use ThreadPool|XEpollThreadPool|XEpollThreadSpawn|XEpoll

I haven't had a chance to benchmark any of this on very big machines, so I have no idea how well it actually works compared to Unicorn, only how well it works in theory :)

Blocking accept() under Ruby 1.9.x + Linux should distribute load evenly across workers in all situations, even in the non-busy cases where load distribution doesn't matter (your case :).

[1] - http://rainbows.rubyforge.org/Rainbows/XEpollThreadPool.html

-- 
Eric Wong
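For illustration, a Rainbows! config meeting those requirements might look something like the sketch below. This is a sketch only, not a tested setup; the worker count, port, and worker_connections value are assumptions.

    # rainbows.conf.rb -- a sketch only, not a tested configuration;
    # assumes Ruby 1.9.x on Linux and a single listen socket, per the
    # requirements listed above
    Rainbows! do
      use :XEpollThreadPool     # one of the concurrency models listed above
      worker_connections 1      # 1 keeps unicorn-like one-request-per-worker behavior
    end

    worker_processes 8          # same directive as unicorn
    listen 8080                 # a single listen socket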
Thanks Eric, I had expected that to be the case (we are under light load as of now).

On Tue, May 31, 2011 at 10:48 AM, Eric Wong <normalperson at yhbt.net> wrote:
> Load not being spread is /not/ a problem unless there are requests that
> get stuck in the listen queue.
Thanks for the responses, all.

Eric, you were right, our load was not enough. We had just started load testing our app, and I think we started with too many app servers and not enough load. Once we cranked up the load and used fewer instances, we're now definitely seeing all CPU cores being utilized. I was not aware that the kernel would optimize like you described.

Once we did start seeing heavier load, our collectd data and htop were reporting usage on the virtual cores correctly.

Thanks again, very happy with the results so far,

Nate