We're using Unicorn to serve a Rails app on a few app servers built on Amazon EC2 instances. Each of the xlarge EC2 instances has the equivalent of 8 CPUs, but it seems like our Unicorn master and 8 workers are only utilizing the first CPU. We've been watching the CPU graphs from collectd data when the website is under load, and only cpu-0 shows any activity ... the others seem to be idle, or minimally used by other services.

I had assumed that the OS would automatically allocate the Unicorn worker processes to use multiple CPUs, but now I'm not sure. I couldn't find anything about this in the Unicorn docs (except for the mention of the worker_processes configuration option, which seems to imply that multiple CPUs would be used). Is there something that I'm not doing?

Our EC2 instances are running Ubuntu 10.04 LTS with Linux kernel 2.6.32.

Thanks in advance for any insights or suggestions.

Nate Clark
Pivotal Labs Singapore
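For context, a unicorn config for this kind of setup typically boils down to something like the sketch below. The paths and values here are illustrative assumptions, not the actual config from the original post; worker_processes only controls how many workers the master forks, and CPU placement is left entirely to the kernel scheduler.

    # config/unicorn.rb -- an illustrative sketch, not the poster's config;
    # paths and values below are assumptions
    worker_processes 8                      # one per (virtual) core is a common starting point
    working_directory "/var/www/app/current"

    listen 8080, :backlog => 64             # TCP listener; a UNIX socket path works here too
    timeout 30

    pid "/var/www/app/shared/pids/unicorn.pid"
    stderr_path "/var/www/app/shared/log/unicorn.stderr.log"
    stdout_path "/var/www/app/shared/log/unicorn.stdout.log"

    preload_app true

    before_fork do |server, worker|
      # the master's DB connection shouldn't be shared across forked workers
      defined?(ActiveRecord::Base) and ActiveRecord::Base.connection.disconnect!
    end

    after_fork do |server, worker|
      defined?(ActiveRecord::Base) and ActiveRecord::Base.establish_connection
    end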
Hi Nate,

> We've been watching the CPU graphs from collectd data when the website
> is under load, and only cpu-0 shows any activity ... the others seem
> to be idle, or minimally used by other services.

I don't think you can rely on the numbers collectd (or top) gives you when measuring from within the hypervisor powering your EC2 instance. The only reliable source of CPU utilization is CloudWatch, as that measures from outside your instances.

I've used an array of xlarge instances myself, each running 17 Unicorn workers serving a Rails app, consuming 4GB and leaving 3GB free, with no swap. That worked well for us under high load. It couldn't have handled that load if all 17 Unicorn workers had been served by one of those 8 virtual cores.

Cheers,
Lawrence
Lawrence,

I've suspected that it may be a monitoring problem and not a Unicorn problem, but I'm not yet convinced either way. Our monitoring via collectd is done through RightScale. They have a lot of experience with EC2, so I'd assume that it is monitoring properly. Also, our other services (MySQL, for example) are showing activity on multiple cores under load, so that leads me to believe that the monitoring is working in at least some cases.

I wasn't aware of the CloudWatch service until now; that looks interesting ... I'll check it out.

Has anyone else experienced a problem like this?

Nate

On Tue, May 31, 2011 at 8:10 PM, Lawrence Pit <lawrence.pit at gmail.com> wrote:
> I don't think you can rely on the numbers collectd (or top) gives you
> when measuring from within the hypervisor powering your EC2 instance.
> The only reliable source of CPU utilization is CloudWatch, as that
> measures from outside your instances.
We experience the same problem. I believe the problem has more to do with the kernel CPU scheduler than anything else. If you figure out a reliable way to spread the load, I'd like to hear it.

Clifton King
Development
clifton at orgsync.com
512-940-7744

Sent from my phone.

On May 31, 2011, at 7:20 AM, Nate Clark <nate at pivotallabs.com> wrote:
> I've suspected that it may be a monitoring problem and not a Unicorn
> problem, but I'm not yet convinced either way.
Nate Clark <nate at pivotallabs.com> wrote:
> Each of the xlarge EC2 instances has the equivalent of 8 CPUs, but it
> seems like our Unicorn master and 8 workers are only utilizing the
> first CPU. We've been watching the CPU graphs from collectd data when
> the website is under load, and only cpu-0 shows any activity ... the
> others seem to be idle, or minimally used by other services.

What is your request rate and average response time for the application? If requests come in more quickly than one worker can respond, /then/ the kernel may start using more workers. However, it looks like your application is simply responding fast enough to keep up with the requests coming in.

> I had assumed that the OS would automatically allocate the Unicorn
> worker processes to use multiple CPUs, but now I'm not sure.

The kernel does all the balancing work.

-- 
Eric Wong
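One way to check whether requests are in fact backing up, rather than guessing from CPU graphs, is the raindrops gem, which can read listen-queue statistics on Linux. A rough sketch follows; the listener address is an assumption and should match whatever the unicorn config actually listens on.

    # check_listen_queue.rb -- a rough sketch; the address below is an
    # assumption and should match the unicorn "listen" directive. Linux-only.
    require 'raindrops'

    stats = Raindrops::Linux.tcp_listener_stats(["0.0.0.0:8080"])

    stats.each do |addr, s|
      # "queued" is the number of clients waiting in the listen queue;
      # anything consistently above zero means the workers are not keeping up
      puts "#{addr}: active=#{s.active} queued=#{s.queued}"
    end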
Clifton King <cliftonk at gmail.com> wrote:
> We experience the same problem. I believe the problem has more to do
> with the kernel CPU scheduler than anything else. If you figure out a
> reliable way to spread the load, I'd like to hear it.

Load not being spread is /not/ a problem unless there are requests that get stuck in the listen queue.

If no requests are actually stuck in the queue (light load), the kernel is right to put requests into the most recently used worker, since it can get better CPU cache behavior this way.

== The real problem

Under high loads (many cores, fast responses), Unicorn currently uses more resources because of non-blocking accept() + select(). This isn't a noticeable problem for most machines (1-16 cores).

Future versions of Unicorn may take advantage of /blocking/ accept() optimizations under Linux. Rainbows! already lets you take advantage of this behavior [1] if you meet the following requirements:

* Ruby 1.9.x under Linux
* only one listen socket (if worker_connections == 1 under Rainbows!)
* use ThreadPool|XEpollThreadPool|XEpollThreadSpawn|XEpoll

I haven't had a chance to benchmark any of this on very big machines, so I have no idea how well it actually works compared to Unicorn, only how well it works in theory :)

Blocking accept() under Ruby 1.9.x + Linux should distribute load evenly across workers in all situations, even in the non-busy cases where load distribution doesn't matter (your case :).

[1] - http://rainbows.rubyforge.org/Rainbows/XEpollThreadPool.html

-- 
Eric Wong
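For illustration, a Rainbows! config meeting those requirements might look something like the sketch below. This is a sketch only, not a tested setup; the worker count, port, and worker_connections value are assumptions.

    # rainbows.conf.rb -- a sketch only, not a tested configuration;
    # assumes Ruby 1.9.x on Linux and a single listen socket, per the
    # requirements listed above
    Rainbows! do
      use :XEpollThreadPool     # one of the concurrency models listed above
      worker_connections 1      # 1 keeps unicorn-like one-request-per-worker behavior
    end

    worker_processes 8          # same directive as unicorn
    listen 8080                 # a single listen socket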
Thanks Eric, I had expected that to be the case (we are under light load as of now).

On Tue, May 31, 2011 at 10:48 AM, Eric Wong <normalperson at yhbt.net> wrote:
> Load not being spread is /not/ a problem unless there are requests that
> get stuck in the listen queue.
Thanks for the responses, all.

Eric, you were right, our load was not enough. We had just started load testing our app, and I think we started with too many app servers and not enough load. Once we cranked up the load and used fewer instances, we're now definitely seeing all CPU cores being utilized. I was not aware that the kernel would optimize like you described.

Once we did start seeing heavier load, our collectd data and htop were reporting usage on the virtual cores correctly.

Thanks again, very happy with the results so far,

Nate