We're using unicornctl restart with the default before/after hook behavior, which is to reap old Unicorn workers via SIGQUIT after the new one has finished booting.

Unfortunately, while the new workers are forking and begin processing requests, we're still seeing significant spikes in our haproxy request queue. It seems as if after we restart, the unwarmed workers get swamped by the incoming requests. As far as I can tell, the momentary loss of capacity we experience translates fairly quickly into a thundering herd.

We've experimented with rolling restarts at the server level, but these do not resolve the problem.

I'm curious if we could do a more granular, application-level rolling restart, perhaps using TTOU instead of QUIT to progressively dial down the old workers one at a time, forking new ones to replace them incrementally. Has anyone tried anything like that before? Or are there any other suggestions (short of "add more capacity")?

-- Tony Arcieri
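For illustration, a minimal sketch of what the TTOU-based rolling restart described above might look like as a restart script. This is not from the thread; the pid-file paths, worker count, and sleep intervals are assumptions.

    # rolling_restart.rb -- hypothetical sketch, not a drop-in script
    PID     = "/var/run/unicorn.pid"
    OLD_PID = "#{PID}.oldbin"   # unicorn renames the old master's pid file on USR2
    WORKERS = 8

    Process.kill(:USR2, File.read(PID).to_i)   # re-exec a new master (and its workers)
    sleep 1 until File.exist?(OLD_PID)         # wait for the re-exec to begin

    old_master = File.read(OLD_PID).to_i
    WORKERS.times do
      Process.kill(:TTOU, old_master)          # retire one old worker...
      sleep 5                                  # ...and give the new workers time to warm up
    end
    Process.kill(:QUIT, old_master)            # old master has no workers left; shut it down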
I remember seeing a gist of a cap script a year or so ago that did something like you're suggesting with TTOU. I know unicorn supports TTOU, but I've never personally done anything different than just using QUIT.

- Alex Sharp
Tony Arcieri <tony.arcieri at gmail.com> wrote:
> We're using unicornctl restart with the default before/after hook
> behavior, which is to reap old Unicorn workers via SIGQUIT after the
> new one has finished booting.
>
> Unfortunately, while the new workers are forking and begin processing
> requests, we're still seeing significant spikes in our haproxy request
> queue. It seems as if after we restart, the unwarmed workers get
> swamped by the incoming requests. As far as I can tell, the momentary
> loss of capacity we experience translates fairly quickly into a
> thundering herd.
>
> We've experimented with rolling restarts at the server level but these
> do not resolve the problem.

So it's one haproxy -> multiple nginx/unicorn servers?

Do you mark the server down or lower the weight in haproxy when deploying the Ruby app? (Perhaps using monitor-uri in haproxy.)

If the server is down to haproxy, but still up, you can send some warmup requests to the server before enabling the monitor-uri for haproxy.
> Unfortunately, while the new workers are forking and begin processing
> requests, we're still seeing significant spikes in our haproxy request
> queue. It seems as if after we restart, the unwarmed workers get
> swamped by the incoming requests.

Perhaps it's possible to warm up the workers in the unicorn after_fork block?

Cheers,
Lawrence
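A minimal sketch of what warming up in after_fork might look like; the specific calls here (an ActiveRecord reconnect and a throwaway query) are assumptions and would need to match whatever the app actually initializes lazily.

    # config/unicorn.rb -- hypothetical after_fork warmup sketch
    after_fork do |server, worker|
      # Per-process connection setup first: sockets must not be shared across the fork.
      ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)

      # Exercise the expensive lazily-initialized code paths before this worker
      # starts serving traffic; unicorn only calls accept() after this hook returns.
      begin
        ActiveRecord::Base.connection.execute("SELECT 1") if defined?(ActiveRecord::Base)
        # e.g. also force route/template/serializer caches to build here
      rescue => e
        server.logger.warn("warmup failed in worker #{worker.nr}: #{e.message}")
      end
    end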
On Thu, Nov 29, 2012 at 3:32 PM, Eric Wong <normalperson at yhbt.net> wrote:
> So it's one haproxy -> multiple nginx/unicorn servers?

Confirm.

> Do you mark the server down or lower the weight in haproxy when
> deploying the Ruby app? (Perhaps using monitor-uri in haproxy)

I assume you're talking about when we did rolling restarts at the server level? We don't do this presently, as we deploy to all servers at the same time. Doing single-server-at-a-time deploys still resulted in a backlog in our queue, which speaks to our capacity problems. It's being worked on, but we're looking for an interim solution. It'd also be nice not to have to do this, as it further extends the length of our (already gruelingly long) deploy process.

-- Tony Arcieri
On Thu, Nov 29, 2012 at 3:34 PM, Lawrence Pit <lawrence.pit at gmail.com> wrote:
> Perhaps it's possible to warm up the workers in the unicorn after_fork block?

Are people doing this in production (i.e. moving the termination of the old master from before_fork to after_fork)? My worry is that during this warming process you will have 2X the normal number of Unicorn workers active at the same time, which could potentially lead to exhaustion of system resources (i.e. RAM).

-- Tony Arcieri
Tony Arcieri <tony.arcieri at gmail.com> wrote:
> On Thu, Nov 29, 2012 at 3:34 PM, Lawrence Pit <lawrence.pit at gmail.com> wrote:
> >
> > Perhaps it's possible to warm up the workers in the unicorn after_fork block?
>
> Are people doing this in production (i.e. moving the termination of
> the old master from before_fork to after_fork)? My worry is that
> during this warming process you will have 2X the normal number of
> Unicorn workers active at the same time, which could potentially lead
> to exhausting of system resources (i.e. RAM)

I haven't done any terminations in the *_fork hooks for a long time. I just let 2x the normal workers run for a bit before sending SIGQUIT.

That said, I usually have plenty of RAM (and DB connections) to spare. Excessive CPU-bound loads are handled very well nowadays.
On 11/29/12 3:34 PM, Lawrence Pit wrote:
>> Unfortunately, while the new workers are forking and begin processing
>> requests, we're still seeing significant spikes in our haproxy request
>> queue. It seems as if after we restart, the unwarmed workers get
>> swamped by the incoming requests.
>
> Perhaps it's possible to warm up the workers in the unicorn after_fork block?

I've successfully applied this methodology to a nasty rails app that had a lot of latent initialization upon first request. Each worker gets a unique private secondary listen port, and each worker sends a warm-up request to a prior worker in the after_fork hook.

In our environment the load balancer drains each host as it's being deployed, and this does affect the length of deployment across many hosts in a cluster, but the warmup bucket brigade is effective at making sure workers on that host are responsive when they get added back to the available pool.

A better solution is to use a profiler to identify what extra work is being done when an unwarm worker gets its first request, and move that work into an initialization step which occurs before fork when run with app preload enabled.
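A rough sketch of the bucket-brigade setup described above; the port base, warmup URL, and timeout are invented for illustration, and the per-worker listener uses the pattern from unicorn's documented after_fork example.

    # config/unicorn.rb -- hypothetical "warmup bucket brigade" sketch
    require "net/http"

    WARMUP_PORT_BASE = 9293

    after_fork do |server, worker|
      # Give each worker a private secondary listener in addition to the shared socket.
      server.listen("127.0.0.1:#{WARMUP_PORT_BASE + worker.nr}", :tries => -1, :delay => 5)

      next if worker.nr == 0   # the first worker has no prior worker to warm up

      # Send a warm-up request to the previously forked worker's private port.
      begin
        http = Net::HTTP.new("127.0.0.1", WARMUP_PORT_BASE + worker.nr - 1)
        http.read_timeout = 60
        http.start { |h| h.get("/") }   # "/" is a placeholder; use a representative URL
      rescue => e
        server.logger.warn("warmup of worker #{worker.nr - 1} failed: #{e.message}")
      end
    end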
On Thu, Nov 29, 2012 at 5:28 PM, Devin Ben-Hur <dbenhur at whitepages.com> wrote:
> A better solution is to use a profiler to identify what extra work is being
> done when an unwarm worker gets its first request and move that work into an
> initialization step which occurs before fork when run with app preload
> enabled.

I've done that; unfortunately, that work is connection setup, which must happen after forking, as otherwise file descriptors would wind up shared between processes.

-- Tony Arcieri
In my experience, high loads and contention are a common issue when restarting the unicorn master process. In a previous project we dealt with this by:

1) performing some warmup requests in the master before starting to fork workers;
2) replacing old workers slowly, by having each new worker send a TTOU to the old master in after_fork and having the new master sleep for a couple of seconds between spawning workers.

It was a couple of years ago, so the details are not fresh, but IIRC before tuning a restart took 5-10 seconds, followed by load climbing to 10-20 (on a 4 proc machine) with a 2-5 minute slow recovery of long request times. In particularly pathological cases requests can start timing out, which results in workers being killed and new workers needing to warm up on an already overloaded system. After tuning, the rolling restart took 30-40 seconds, but the load barely budged and the request processing times stayed constant.

.seth

On Nov 29, 2012, at 5:24 PM, Eric Wong <normalperson at yhbt.net> wrote:
> Tony Arcieri <tony.arcieri at gmail.com> wrote:
>> On Thu, Nov 29, 2012 at 3:34 PM, Lawrence Pit <lawrence.pit at gmail.com> wrote:
>>>
>>> Perhaps it's possible to warm up the workers in the unicorn after_fork block?
>>
>> Are people doing this in production (i.e. moving the termination of
>> the old master from before_fork to after_fork)? My worry is that
>> during this warming process you will have 2X the normal number of
>> Unicorn workers active at the same time, which could potentially lead
>> to exhausting of system resources (i.e. RAM)
>
> I haven't done any terminations in the *_fork hooks for a long time.
> I just let 2x the normal workers run for a bit before sending SIGQUIT.
>
> That said, I usually have plenty of RAM (and DB connections) to spare.
> Excessive CPU-bound loads are handled very well nowadays.
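A rough sketch of the TTOU-in-after_fork approach Seth describes above. This is not his original configuration; the sleep interval is a guess, and the old-master pid-file handling mirrors unicorn's stock example config.

    # config/unicorn.rb -- hypothetical incremental-swap sketch
    before_fork do |server, worker|
      sleep 2 if worker.nr > 0   # runs in the new master: space out worker spawns
    end

    after_fork do |server, worker|
      # ... per-process connection setup / warmup would go here ...

      # Each freshly forked worker retires one of the old master's workers.
      old_pid = "#{server.config[:pid]}.oldbin"
      if old_pid != server.pid && File.exist?(old_pid)
        begin
          Process.kill(:TTOU, File.read(old_pid).to_i)
        rescue Errno::ENOENT, Errno::ESRCH
        end
      end
      # The old master itself still needs a QUIT (e.g. from the deploy script)
      # once all of its workers have been retired.
    end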
On Thu, Nov 29, 2012 at 3:32 PM, Eric Wong <normalperson at yhbt.net> wrote:
> If the server is down to haproxy, but still up, you can send some warmup
> requests to the server before enabling the monitor-uri for haproxy.

I've heard various solutions for exactly how to do warmup in this thread. Anyone have specific recommendations? Should I spin off a background thread that hits the local instance with a request/requests then does the SIGQUIT-style switchover?

-- Tony Arcieri
Tony Arcieri <tony.arcieri at gmail.com> wrote:
> On Thu, Nov 29, 2012 at 3:32 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > If the server is down to haproxy, but still up, you can send some warmup
> > requests to the server before enabling the monitor-uri for haproxy.
>
> I've heard various solutions for exactly how to do warmup in this
> thread. Anyone have specific recommendations? Should I spin off a
> background thread that hits the local instance with a request/requests
> then does the SIGQUIT-style switchover?

I usually put that logic in the deployment script (probably just with "curl -sf"), but a background thread would probably work.

I think it's a good idea anyway to ensure your newly-deployed app is configured and running properly before throwing real traffic at it.
On Fri, Nov 30, 2012 at 2:27 PM, Eric Wong <normalperson at yhbt.net> wrote:
> I usually put that logic in the deployment script (probably just
> with "curl -sf"), but a background thread would probably work.

Are you doing something different than unicornctl restart? It seems like with unicornctl restart:

1) our deployment automation doesn't know when the restart has finished, since unicornctl is just sending signals
2) we don't have any way to send requests specifically to the new worker instead of the old one

Perhaps I'm misreading the unicorn source code, but here's what I see happening:

1) the old unicorn master forks a new master. They share the same TCP listen socket, but only the old master continues accepting requests
2) the new master loads the Rails app and runs the before_fork hook. It seems like normally this hook would send SIGQUIT to the old master, causing it to close its TCP listen socket
3) the new master forks and begins accepting on the TCP listen socket
4) the new workers run the after_fork hook and begin accepting requests

It seems like if we remove the logic which reaps the old master in the before_fork hook and attempt to warm the workers in the after_fork hook, then we're stuck in a state where both the old master and the new master are accepting requests but the new workers have not yet been warmed up. Is this the case, and if so, is there a way we can prevent the new master from accepting requests until warmup is complete?

Or how would we change the way we restart unicorn to support our deployment automation (Capistrano, in this case) handling starting and healthchecking a new set of workers? Would we have to start the new master on a separate port and use e.g. nginx to handle the switchover?

Something which doesn't involve massive changes to the way we presently restart Unicorn (i.e. unicornctl restart) would probably be the most practical solution for us. We have a "real solution" for all of these problems in the works. What I'm looking for in the interim is a band-aid.

-- Tony Arcieri
Tony Arcieri <tony.arcieri at gmail.com> wrote:
> On Fri, Nov 30, 2012 at 2:27 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > I usually put that logic in the deployment script (probably just
> > with "curl -sf"), but a background thread would probably work.
>
> Are you doing something different than unicornctl restart? It seems
> like with unicornctl restart

I'm actually not sure what "unicornctl" is... Is it this? https://gist.github.com/1207003

I normally use a shell script (similar to examples/init.sh in the unicorn source tree).

> 1) our deployment automation doesn't know when the restart has
> finished, since unicornctl is just sending signals
> 2) we don't have any way to send requests specifically to the new
> worker instead of the old one
>
> Perhaps I'm misreading the unicorn source code, but here's what I see happening:
>
> 1) the old unicorn master forks a new master. They share the same TCP
> listen socket, but only the old master continues accepting requests

Correct.

> 2) the new master loads the Rails app and runs the before_fork hook. It
> seems like normally this hook would send SIGQUIT to the old master,
> causing it to close its TCP listen socket

Correct, if you're using preload_app true. Keep in mind you're never required to use the before_fork hook to send SIGQUIT.

> 3) the new master forks and begins accepting on the TCP listen socket

accept() never runs on the master, only in the workers.

> 4) the new workers run the after_fork hook and begin accepting requests

Instead of sending HTTP requests to warm up, can you put internal warmup logic in your after_fork hook? The worker won't accept a request until after_fork is done running.

Hell, maybe you can even use Rack::Mock in your after_fork to fake requests without going through sockets. (Random idea, I've never tried it.)

> It seems like if we remove the logic which reaps the old master in the
> before_fork hook and attempt to warm the workers in the after_fork
> hook, then we're stuck in a state where both the old master and the new
> master are accepting requests but the new workers have not yet been
> warmed up.

Yes, but if you have enough resources, the split should be even.

> Is this the case, and if so, is there a way we can prevent the new
> master from accepting requests until warmup is complete?

If the new processes never accept requests, can they ever complete warmup? :)

> Or how would we change the way we restart unicorn to support our
> deployment automation (Capistrano, in this case) handling starting and
> healthchecking a new set of workers?
>
> Would we have to start the new master on a separate port and use
> e.g. nginx to handle the switchover?

Maybe using a separate port for the new master will work.

> Something which doesn't involve massive changes to the way we
> presently restart Unicorn (i.e. unicornctl restart) would probably be
> the most practical solution for us. We have a "real solution" for all
> of these problems in the works. What I'm looking for in the interim is
> a band-aid.

It sounds like you're really in a bad spot :< Honestly, I've never had this combination of problems to deal with.
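An untested sketch of the Rack::Mock idea above. It assumes preload_app true, that server.app exposes the loaded Rack application, and that "/" and "/login" stand in for whatever endpoints are slow when cold.

    # config/unicorn.rb -- hypothetical socketless warmup via Rack::MockRequest
    require "rack/mock"

    after_fork do |server, worker|
      # Per-process connections first (must not be shared across the fork).
      ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)

      # Drive fake requests through the app before this worker starts accept()ing.
      mock = Rack::MockRequest.new(server.app)
      %w(/ /login).each do |path|
        res = mock.get(path)
        server.logger.warn("warmup #{path} -> #{res.status}") if res.status >= 400
      end
    end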