Hello,

I'm trying to fix a problem I'm experiencing with my Rails application
hosted at Heroku. I have one POST request which hangs, and with the help
of a customized rack-timeout gem
(https://github.com/tompesman/rack-timeout) I managed to get a
stacktrace: https://gist.github.com/tompesman/7b13e02d349aacc720e0

How can I debug this further to get to the bottom of this? Is this a
Rack or a unicorn problem?

Thanks!

--
Tom Pesman
Tom Pesman <tom at tnux.net> wrote:
> Hello,
>
> I'm trying to fix a problem I'm experiencing with my Rails
> application hosted at Heroku. I have one POST request which hangs, and
> with the help of a customized rack-timeout gem
> (https://github.com/tompesman/rack-timeout) I managed to get a
> stacktrace: https://gist.github.com/tompesman/7b13e02d349aacc720e0
>
> How can I debug this further to get to the bottom of this? Is this a
> Rack or a unicorn problem?

It's a client or proxy problem.

The request was too large to be transferred within your configured
timeout, or the client or proxy layer was too slow at transferring the
POST to unicorn, or the host running unicorn was too overloaded/slow to
buffer the request [1].

Factors:

1) Disk/filesystem/memory speed on the (client|proxy) talking to unicorn
2) Disk/filesystem/memory speed on the host running unicorn
3) The network link between the (client|proxy) <-> unicorn

I don't know about Heroku, but nginx will fully buffer the request body
before sending it to unicorn, so all 3 factors are within your control.

Does Heroku limit (or allow limiting of) the size of request bodies?
Maybe a bad client sent a gigantic request. nginx limits request bodies
to 1M by default (the client_max_body_size config directive); a rough
in-app approximation is sketched below.

[1] unicorn buffers request bodies to TMPDIR via TeeInput
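If Heroku gives you no way to cap request body sizes at the edge, a
rough approximation can live in the app itself. This is only a sketch of
a hypothetical Rack middleware (the MaxBodySize name and the 1M cap are
made up, not anything unicorn or rack-timeout ship with); it rejects
requests whose declared Content-Length exceeds a limit, before the body
is ever read:

class MaxBodySize # hypothetical name
  MAX = 1024 * 1024 # 1M cap, mirroring the nginx default

  def initialize(app)
    @app = app
  end

  def call(env)
    # Decide from the declared length alone, without touching the
    # (possibly stalled) body.  Chunked requests declare no length,
    # so they pass through untouched here.
    if env['CONTENT_LENGTH'].to_i > MAX
      [413, { 'Content-Type' => 'text/plain' },
       ["Request Entity Too Large\n"]]
    else
      @app.call(env)
    end
  end
end

Load it early in config.ru (use MaxBodySize) so it runs before anything
reads the body. Note it cannot help when a client stalls partway through
a small body; only buffering upstream of unicorn fixes that.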
> Tom Pesman <tom at tnux.net> wrote:
>> How can I debug this further to get to the bottom of this? Is this a
>> Rack or a unicorn problem?
>
> It's a client or proxy problem.
>
> The request was too large to be transferred within your configured
> timeout, or the client or proxy layer was too slow at transferring the
> POST to unicorn, or the host running unicorn was too overloaded/slow to
> buffer the request [1].
[...]

I agree with you if the POST request has a file to upload, but the
requests we're dealing with do not upload a file and are actually quite
small. Example request (this is not a request that has failed; those are
hard to catch): https://gist.github.com/tompesman/2e8ca417cdecb7580c9f

I've also made a dump of the env variable in the customized rack-timeout
middleware. This is a request that has failed:
https://gist.github.com/tompesman/1217d4f02aa33efcd873

Can I modify my customized rack-timeout gem to get more information to
debug this problem?
https://github.com/tompesman/rack-timeout/blob/master/lib/rack/timeout.rb

Cheers,

--
Tom Pesman
Tom Pesman <tom at tnux.net> wrote:
> Eric Wong wrote:
>> It's a client or proxy problem.
[...]
> I agree with you if the POST request has a file to upload, but the
> requests we're dealing with do not upload a file and are actually quite
> small.

Do you have error logs from the proxy Heroku uses?

Even with small requests, clients/networks can fail to send the entire
request. nginx will log prematurely aborted client requests; check
whether whatever proxy Heroku uses does the same.

> Can I modify my customized rack-timeout gem to get more information to
> debug this problem?
> https://github.com/tompesman/rack-timeout/blob/master/lib/rack/timeout.rb

Your env.inspect should show you @bytes_read in the Unicorn::TeeInput
object before the timeout was hit.
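For example, something along these lines in your timeout rescue path
(only a sketch; @bytes_read is an internal instance variable with no
public accessor that I know of, so it has to be pried out with
instance_variable_get and may change between unicorn versions):

# sketch: log how much of the declared body unicorn had buffered
# by the time the timeout fired
def log_input_state(env)
  input = env['rack.input'] # a Unicorn::TeeInput under unicorn
  read  = input.instance_variable_get(:@bytes_read) # internal detail
  warn "timeout: CONTENT_LENGTH=#{env['CONTENT_LENGTH']} " \
       "bytes_read=#{read.inspect}"
end

If bytes_read stays at zero while CONTENT_LENGTH is non-zero, the body
never reached unicorn at all.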
> Do you have error logs from the proxy Heroku uses?

The only output the Heroku proxy (called the router at Heroku) gives is
something like this:

2013-03-11 15:55:58+00:00 heroku router - - at=info method=POST
path=/api/v1/games/2257907/move.json host=aaa.bbb.ccc
fwd="xxx.xxx.xxx.xxx/NX" dyno=web.1 queue=0 wait=8ms connect=9ms
service=5019ms status=500 bytes=0

> Even with small requests, clients/networks can fail to send the entire
> request. nginx will log prematurely aborted client requests; check
> whether whatever proxy Heroku uses does the same.

I think it's hard to debug this with Heroku as the router doesn't give
more output than this. I'm in contact with Heroku and I'll try to debug
it with them. For me there are two problems with this issue: the waiting
affects other requests, and it's happening frequently (say multiple
times per minute at 2k-4k rpm). How can I reduce the impact while not
aborting valid requests?

>> Can I modify my customized rack-timeout gem to get more information to
>> debug this problem?
>> https://github.com/tompesman/rack-timeout/blob/master/lib/rack/timeout.rb
>
> Your env.inspect should show you @bytes_read in the Unicorn::TeeInput
> object before the timeout was hit.

This is the output of env.inspect:
https://gist.github.com/tompesman/1217d4f02aa33efcd873

It shows a CONTENT_LENGTH of 1351, and the "rack.input" =>
Unicorn::TeeInput object has @bytes_read == 0.

So what is happening here?

1. The Heroku router/proxy sends the header to unicorn.
2. Unicorn receives the request and gives the request to Rack.
3. Rack receives the request and asks Unicorn for the contents of the
   request.
4. Unicorn should give the content to Rack.

Now it gets interesting: how can I see the raw contents of the request?
Or are we certain that it isn't there because @bytes_read == 0?

Thanks!

--
Tom Pesman
Tom Pesman <tom at tnux.net> wrote:
> I think it's hard to debug this with Heroku as the router doesn't give
> more output than this. I'm in contact with Heroku and I'll try to debug
> it with them. For me there are two problems with this issue: the waiting
> affects other requests, and it's happening frequently (say multiple
> times per minute at 2k-4k rpm). How can I reduce the impact while not
> aborting valid requests?

The best way is to use nginx. Otherwise, lowering your rack-timeout is
probably the best option. More below...

> This is the output of env.inspect:
> https://gist.github.com/tompesman/1217d4f02aa33efcd873
>
> It shows a CONTENT_LENGTH of 1351, and the "rack.input" =>
> Unicorn::TeeInput object has @bytes_read == 0.
[...]
> Now it gets interesting: how can I see the raw contents of the request?
> Or are we certain that it isn't there because @bytes_read == 0?

The request body doesn't seem to be there, presumably because Heroku
isn't sending it.

Does Heroku fully buffer the request body before sending it to unicorn?
nginx fully buffers, and this is why I can only recommend nginx for slow
clients.

The proxy -> unicorn transfer speed should _never_ depend on the
client -> proxy transfer speed:

1) client ------------------ (slow) --------------> proxy
2) proxy (nginx) --- (as fast as the connection goes) ---> unicorn

With nginx, 1) and 2) are decoupled and serialized. This adds latency,
but it is the only way for multiprocess servers like unicorn to
efficiently handle slow clients.
> The request body doesn't seem to be there, presumably because Heroku
> isn't sending it.
>
> Does Heroku fully buffer the request body before sending it to unicorn?
> nginx fully buffers, and this is why I can only recommend nginx for slow
> clients.
[...]
> With nginx, 1) and 2) are decoupled and serialized. This adds latency,
> but it is the only way for multiprocess servers like unicorn to
> efficiently handle slow clients.

I have some new information. Heroku buffers the headers of an HTTP
request, but it does not buffer the body of POST requests. Because of
that I switched to Rainbows!, and the responsiveness of the application
increased dramatically. Right now I'm using this configuration:

worker_processes 4
timeout 15
preload_app true

Rainbows! do
  use :EventMachine
  worker_connections 50
  client_max_body_size 5 * 1024 * 1024 # 5 megabytes
  client_header_buffer_size 2 * 1024   # 2 kilobytes
  timeout 10
end

With high load the performance drops. Is EventMachine the right choice
for this situation (a Rails application with slow POST requests, and up
to 50% of requests being POSTs)? Will increasing worker_connections
help?

I'm planning to write a blog post about this, to tell Heroku not to
advise unicorn for Rails applications but to use Rainbows!, and to
suggest a correct/optimised configuration.

Thanks!

Tom
Tom Pesman <tom at tnux.net> wrote:
> I have some new information. Heroku buffers the headers of an HTTP
> request, but it does not buffer the body of POST requests. Because of
> that I switched to Rainbows!, and the responsiveness of the application
> increased dramatically.

Wow, the Heroku proxy is completely unsuitable for unicorn because of
the lack of body buffering. Heroku really needs to do body buffering to
support unicorn properly (perhaps using nginx...). It's a shame they
charge people to deploy unicorn improperly like this.

> With high load the performance drops. Is EventMachine the right choice
> for this situation (a Rails application with slow POST requests, and up
> to 50% of requests being POSTs)? Will increasing worker_connections
> help?

Probably not; at least, it won't improve the _consistency_ of
performance without changing your app.

The problem with buffering in Rainbows!+EM is that the buffering happens
after the work is distributed to the different worker processes
(Rainbows!+EM is still single-threaded).

> I'm planning to write a blog post about this, to tell Heroku not to
> advise unicorn for Rails applications but to use Rainbows!, and to
> suggest a correct/optimised configuration.

The easiest solution is to use nginx with unicorn; I still believe this
is the best configuration for most web apps.

I don't believe there's a universal Rainbows! configuration which is
good for most apps. With Rainbows!+EM, the buffering happens at a late
stage, after the workload is distributed, so you end up with
head-of-queue blocking inside each process.

All of the modules _without_ "rack.input streaming" fully buffer request
bodies (similar to how Rainbows!+EM does it):
http://rainbows.rubyforge.org/Summary.html

CoolioThread{Pool,Spawn} should help you with the workload distribution
problem if your app is thread-safe, but these both do body buffering.

However, the multithreaded options _with_ "rack.input streaming" still
give pretty good performance and let you get around the workload
distribution problem of the single-threaded + internal buffering
options. You'll also reduce disk seek overhead with input streaming.

For reference, nginx+unicorn is single-threaded, but with external
buffering (nginx does the buffering, not unicorn).
> [...]
> For reference, nginx+unicorn is single-threaded, but with external
> buffering (nginx does the buffering, not unicorn).

Thank you for your extensive response!

To verify my thoughts on this: if I want to prevent the head-of-queue
blocking behaviour, I need to pick a module without "rack.input
streaming", but we need to make sure the body is buffered before it's
given to a worker. On which property do I select the correct module?

Are the Coolio modules still a sensible choice, given that the Coolio
project isn't actively maintained?
Tom Pesman <tom at tnux.net> wrote:
> To verify my thoughts on this: if I want to prevent the head-of-queue
> blocking behaviour, I need to pick a module without "rack.input
> streaming", but we need to make sure the body is buffered before it's
> given to a worker. On which property do I select the correct module?

Erm, no. You'll have head-of-queue blocking behavior with any
single-threaded Rainbows! option and worker_connections > 1.

Unless you rely on Cool.io- or EM-specific features in your app,
:EventMachine and :Coolio (and :Epoll/:XEpoll for Linux-only) are the
same from an app perspective in Rainbows!

Using multi-threaded application dispatch (CoolioThread{Pool,Spawn})
will mitigate and reduce head-of-queue blocking. Likewise for
XEpollThreadPool/XEpollThreadSpawn (all the *Epoll* stuff is
Linux-only).

Alternatively, you won't get _any_ head-of-queue blocking in a
multi-threaded app: you can use Thread{Pool,Spawn} (see the sketch at
the end of this message). Sorry for the confusion.

> Are the Coolio modules still a sensible choice, given that the Coolio
> project isn't actively maintained?

Fwiw, Cool.io works pretty well in my experience.

Last I checked, EM still had a problem with calling close(2) directly
instead of rb_io_close(), triggering Errno::EBADF. There's a bug opened
by somebody on the ruby-core team, but I'm not sure if it ever got
resolved...

I can also help fix Cool.io bugs since it's written in C, but I can't
fix EM bugs: C++ is too big for my tiny brain.
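If your app is thread-safe, a starting point might look something like
this (a minimal sketch only; the numbers are placeholders to tune for
your workload, not recommendations):

worker_processes 4
preload_app true

Rainbows! do
  use :ThreadPool        # multi-threaded dispatch, keeps rack.input streaming
  worker_connections 30  # threads per worker process; tune for your app
  client_max_body_size 5 * 1024 * 1024 # keep the 5M cap from your EM config
end

ThreadSpawn is the same idea with one thread spawned per connection
instead of a fixed pool.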
On Wed, Apr 3, 2013 at 6:36 AM, Eric Wong <normalperson at yhbt.net> wrote:
> Fwiw, Cool.io works pretty well in my experience.
[...]
> I can also help fix Cool.io bugs since it's written in C, but I can't
> fix EM bugs: C++ is too big for my tiny brain.

Last time I tried Cool.io it would occasionally crash with assertion
errors. It's been a while since then, but given that not much
development has happened for a long time, I guess not much will have
changed by now.

I could try to get the assertion errors again and see if anyone wants to
step in, but personally, since the author is heading to celluloid-io, I
am not too interested in fixing Cool.io...

Off topic: I have an EM issue with Ruby 2.0 which hangs our tests, and
after interrupting the hung test, the following tests would yield:
"RuntimeError: eventmachine not initialized: evma_signal_loopbreak"

I have no idea what's going on... :( I hope we can leave EM soon...
"Lin Jen-Shin (godfat)" <godfat at godfat.org> wrote:> On Wed, Apr 3, 2013 at 6:36 AM, Eric Wong <normalperson at yhbt.net> wrote: > > Fwiw, Cool.io works pretty well in my experience. > [...] > > I can also help fix Cool.io bugs since it''s written in C, but I can''t > > fix EM bugs: C++ is too big for my tiny brain. > > Last time I tried Cool.io it would occasionally crash with assertion errors. > It''s been a while from now, but I guess since there''s no much development > happened for a long time, it won''t change too much right now. > > I could try to get the assertion errors again and see if anyone wants to step > in, but personally since the author is heading to celluloid-io, I am not too > interested in fixing Cool.io...I''d be willing to look into this and send a fix to coolio upstream if you can help me reproduce it. No guarantees there''ll be a release even if I patch it, though, that''s up to Tony Arcieri (Coolio maintainer).