Tom Preston-Werner
2009-Sep-18 04:54 UTC
502s with Nginx, Unicorn, and Unix Domain Sockets
I'm doing some benchmarking on our new Rackspace frontend machines (8 core, 16GB) and running into some problems with the Unix domain socket setup. At high request rates (on simple pages) I'm getting a lot of HTTP 502 errors from Nginx. Nothing shows up in the Unicorn error log, but Nginx has the following in its error log:

2009/09/17 19:36:52 [error] 28277#0: *524824 connect() to unix:/data/github/current/tmp/sockets/unicorn.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 172.17.1.5, server: github.com, request: "GET /site/junk HTTP/1.1", upstream: "http://unix:/data/github/current/tmp/sockets/unicorn.sock:/site/junk", host: "github.com"

This problem does not exist with the nginx -> haproxy -> unicorn setup. Thinking this might be a file descriptor problem, I upped the fd limit to 32768 with no luck. Then I tried upping net.core.somaxconn to 262144, which also had no effect. I thought I'd ask about the problem here to see if anyone knows a simple solution that I'm missing. Perhaps there is an Nginx configuration directive I need?

Thanks. Unicorn rocks!

Tom

--
Tom Preston-Werner
GitHub Cofounder
http://tom.preston-werner.com
github.com/mojombo
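(For reference, the nginx side of a setup like this is typically just an upstream block pointing at the Unicorn socket. The sketch below is a minimal assumption based on the socket path in the error log above, not the actual GitHub configuration; the upstream name and fail_timeout setting are illustrative.)

# Sketch of an nginx -> Unicorn unix-socket proxy; only the socket path
# comes from the error log above, everything else is illustrative.
upstream unicorn_app {
    # fail_timeout=0 is commonly used with a Unicorn upstream so nginx
    # keeps retrying the socket rather than marking it unavailable
    server unix:/data/github/current/tmp/sockets/unicorn.sock fail_timeout=0;
}

server {
    listen 80;
    server_name github.com;

    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_pass http://unicorn_app;
    }
}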
Tom Preston-Werner <tom at github.com> wrote:

> I'm doing some benchmarking on our new Rackspace frontend machines (8
> core, 16GB) and running into some problems with the Unix domain socket
> setup. At high request rates (on simple pages) I'm getting a lot of
> HTTP 502 errors from Nginx. Nothing shows up in the Unicorn error log,
> but Nginx has the following in its error log:

Hi Tom,

At what request rates were you running into this? Also, how large are your responses? It could be the listen() backlog overflowing if Unicorn isn't logging anything. Anything in the system/kernel logs (doubtful, actually)?

Does increasing the listen :backlog parameter work? Default is 1024 (which is pretty high already); maybe try a higher number along with the net.core.netdev_max_backlog sysctl.

Is there a large discrepancy between the times your benchmark client logs, the request time nginx logs, and whatever Rails/Rack logs for request times for any particular request? If the Rails/Rack times all seem consistently low but your nginx/benchmark numbers have some weird spikes/outliers, then some requests are stuck in the kernel listen backlog.

How much of the 8 cores is being used on those boxes when this starts happening?

> 2009/09/17 19:36:52 [error] 28277#0: *524824 connect() to
> unix:/data/github/current/tmp/sockets/unicorn.sock failed (11:
> Resource temporarily unavailable) while connecting to upstream,
> client: 172.17.1.5, server: github.com, request: "GET /site/junk
> HTTP/1.1", upstream:
> "http://unix:/data/github/current/tmp/sockets/unicorn.sock:/site/junk",
> host: "github.com"

Raising proxy_connect_timeout in nginx may be a workaround; what is it set to now? On the other hand, keeping it (and :backlog in Unicorn) low would give better indications for failover to other hosts.

> This problem does not exist with the nginx -> haproxy -> unicorn
> setup. Thinking this might be a file descriptor problem, I upped the
> fd limit to 32768 with no luck. Then I tried upping net.core.somaxconn
> to 262144 which also had no effect. I thought I'd ask about the
> problem here to see if anyone knows a simple solution that I'm
> missing. Perhaps there is an Nginx configuration directive I need?
> Thanks. Unicorn rocks!

Definitely not a file descriptor problem (at least not inside Unicorn). Also, I'm not sure there's a reason to keep haproxy between nginx and Unicorn... maybe haproxy in front of the entire cluster of servers.

Are you already hitting higher request rates (and more consistent times logged by client/nginx) with nginx -> unicorn/unix vs. nginx -> unicorn/tcp (localhost)?

Under extremely high loads, 502s may actually be wanted since they allow failover to a less loaded box if there's uneven balancing; but we really need to have numbers on the request rates.

--
Eric Wong
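(A minimal sketch of the proxy_connect_timeout workaround mentioned above, assuming an upstream block like the one sketched earlier. The timeout value is illustrative; nginx's default is 60s, and as noted, a low timeout fails over to other hosts sooner.)

# Workaround sketch: give nginx longer to complete connect() to a busy
# socket before it gives up and returns a 502. Illustrative value only.
location / {
    proxy_connect_timeout 75s;
    proxy_pass http://unicorn_app;  # hypothetical upstream name from the earlier sketch
}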
Hi Tom, any updates on this? I'd really like to get to the bottom of this, thanks!

--
Eric Wong
Tom Preston-Werner
2009-Sep-19 20:23 UTC
502s with Nginx, Unicorn, and Unix Domain Sockets
On Thu, Sep 17, 2009 at 11:48 PM, Eric Wong <normalperson at yhbt.net> wrote:

> At what request rates were you running into this? Also how large are
> your responses? It could be the listen() backlog overflowing if Unicorn
> isn't logging anything.

I was hitting the 502s at about 1300 req/sec and 80% CPU utilization. Response size was only a few bytes + headers. I was just testing a very simple string response from our Rails app to make sure our setup could tolerate very high request rates.

> Does increasing the listen :backlog parameter work? Default is 1024
> (which is pretty high already), maybe try a higher number along with the
> net.core.netdev_max_backlog sysctl.

This was the first thing I tried after getting your response, and it seems that upping the :backlog to 2048 solves the 502 problem! I'm now able to get 1500 req/sec out of Unicorn/UNIX (as opposed to 1350 req/sec with the TCP/HAProxy setup). I'm quite satisfied with this result, and I think this is how we'll end up deploying the app.

Thanks for your help, and I'll try to keep you updated on how our installation performs and if I see any strange behavior under normal traffic.

Tom
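(For reference, the change described above would look roughly like this in a Unicorn config file. Only the socket path and the 2048 backlog come from this thread; worker_processes and timeout are illustrative assumptions.)

# unicorn.rb sketch; values other than the socket path and :backlog are assumed
worker_processes 8

# :backlog is passed through to listen(2); the default of 1024 was
# overflowing at ~1300 req/sec, and raising it to 2048 stopped the 502s
listen "/data/github/current/tmp/sockets/unicorn.sock", :backlog => 2048

timeout 30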
Tom Preston-Werner <tom at github.com> wrote:

> On Thu, Sep 17, 2009 at 11:48 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > At what request rates were you running into this? Also how large are
> > your responses? It could be the listen() backlog overflowing if Unicorn
> > isn't logging anything.
>
> I was hitting the 502s at about 1300 req/sec and 80% CPU utilization.
> Response size was only a few bytes + headers. I was just testing a
> very simple string response from our Rails app to make sure our setup
> could tolerate very high request rates.

Yup, as I suspected: your UNIX socket setup was maxing out right around where your TCP setup was maxing out. TCP is just better at handling/recovering from errors.

> > Does increasing the listen :backlog parameter work? Default is 1024
> > (which is pretty high already), maybe try a higher number along with the
> > net.core.netdev_max_backlog sysctl.
>
> This was the first thing I tried after getting your response, and it
> seems that upping the :backlog to 2048 solves the 502 problem! I'm now
> able to get 1500 req/sec out of Unicorn/UNIX (as opposed to 1350
> req/sec with the TCP/HAProxy setup). I'm quite satisfied with this
> result, and I think this is how we'll end up deploying the app.

Good to know it worked! However, I do hesitate to recommend a large listen() backlog for production. It can interfere with monitoring/failover/load-balancing in multi-server setups even if it looks good in benchmarks. I'll make a separate call-for-testing mailing list post related to this subject in a bit...

> Thanks for your help, and I'll try to keep you updated on how our
> installation performs and if I see any strange behavior under normal
> traffic.

No problem, thanks for the feedback! It's great to know people actually use it.

--
Eric Wong