Mark Mccraw
2012-Jul-17 00:33 UTC
Any signal other than -9 causes full CPU utilization by master unicorn process on FreeBSD
Hi There!

I'm having a devil of a time figuring out a weird issue I'm running into. I have unicorn configured to start 4 worker processes, and that works great. However, when it's time to cycle the app, everything goes haywire. By trial and error, I have narrowed it down to this: sending any signal to the master process other than SIGKILL fails miserably. No new master process is created, as described in the documentation, nothing happens to the existing workers, nothing gets written to any log, and if I run top -u, I can see that very quickly the master ramps up to 100% CPU utilization. This happens if I run 'kill -HUP <master pid>', 'kill -USR2 <master pid>', even 'kill -QUIT <master pid>'.

Here's what I'm running on:

uname -a
FreeBSD bb20web04.unx.sas.com 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 root at farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64

ruby -v
ruby 1.9.3p0 (2011-10-30 revision 33570) [amd64-freebsd9]

gem list | grep unicorn
unicorn (4.3.1)

My unicorn.rb file is pasted at the bottom. It should be noted that I have tried every permutation of this I can think of to narrow down the problematic part (setting preload_app to false, commenting out preload_app, commenting out before_exec, before_fork, after_fork, commenting out the START_CTX[0] bit, etc.), but things always fail the same way, so I'm guessing it's not the config, but I'm open to anything.

Any suggestions at all are greatly appreciated. I'd love to know how to interrupt the master process when it's slamming the CPU and get a stack trace, but I have no idea how in ruby. Any thoughts?

Thanks!
Mark

APP_ROOT="/usr/local/rails/partsdb/current"
working_directory APP_ROOT
pid "#{APP_ROOT}/tmp/pids/unicorn.pid"
stderr_path "#{APP_ROOT}/log/unicorn.log"
stdout_path "#{APP_ROOT}/log/unicorn.log"
Unicorn::HttpServer::START_CTX[0] = "#{APP_ROOT}/bin/unicorn"

rails_env = ENV['RAILS_ENV'] || 'production'
worker_processes 4
timeout 120

# Speed up worker spawn times
preload_app true

listen "/tmp/unicorn.sock", :backlog => 10
listen "bb20web04:8080", :backlog => 1024

before_exec do |server|
  ENV["BUNDLE_GEMFILE"] = "#{APP_ROOT}/Gemfile"
end

before_fork do |server, worker|
  ##
  # When sent a USR2, Unicorn will suffix its pidfile with .oldbin and
  # immediately start loading up a new version of itself (loaded with a new
  # version of our app). When this new Unicorn is completely loaded
  # it will begin spawning workers. The first worker spawned will check to
  # see if an .oldbin pidfile exists. If so, this means we've just booted up
  # a new Unicorn and need to tell the old one that it can now die. To do so
  # we send it a QUIT.
  #
  # Using this method we get 0 downtime deploys.
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end

  old_pid = APP_ROOT + '/tmp/pids/unicorn.pid.oldbin'
  if File.exists?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end
end

after_fork do |server, worker|
  ##
  # Unicorn master loads the app then forks off workers - because of the way
  # Unix forking works, we need to make sure we aren't using any of the parent's
  # sockets, e.g. db connection
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
  end
  # Redis and Memcached would go here but their connections are established
  # on demand, so the master never opens a socket
end
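
On the stack-trace question: one common approach is to install a signal handler that prints the backtrace of every live thread. The sketch below assumes a POSIX Ruby. The names dump_thread_backtraces and install_backtrace_trap are hypothetical helpers, not unicorn APIs, and SIGCONT is only assumed to be unused in your process. One caveat: if signal handling itself is what has wedged the interpreter, as described in this post, an in-process handler may never run, and attaching a native debugger such as gdb to the master pid may be the only way to see where it is spinning.

```ruby
# Hypothetical helper (not part of unicorn): write the backtrace of every
# live thread in this process to +io+.
def dump_thread_backtraces(io = $stderr)
  Thread.list.each do |thread|
    io.puts "Thread #{thread.object_id} (#{thread.status}):"
    # Thread#backtrace returns nil for a thread that has already died.
    (thread.backtrace || []).each { |frame| io.puts "  #{frame}" }
  end
  io
end

# Assumption: the chosen signal is free in your process. unicorn's master
# already handles HUP, USR1, USR2, QUIT, TTIN, TTOU and WINCH, so avoid those.
def install_backtrace_trap(signal = "CONT", io = $stderr)
  Signal.trap(signal) { dump_thread_backtraces(io) }
end

install_backtrace_trap # afterwards: kill -CONT <pid> dumps to stderr
```

The handler only uses plain IO writes, so it avoids taking locks inside the trap block.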
Eric Wong
2012-Jul-17 02:05 UTC
Any signal other than -9 causes full CPU utilization by master unicorn process on FreeBSD
Mark Mccraw <Mark.Mccraw at sas.com> wrote:
> Hi There!
>
> I'm having a devil of a time figuring out a weird issue I'm running
> into. I have unicorn configured to start 4 worker processes, and that
> works great. However, when it's time to cycle the app, everything
> goes haywire. By trial and error, I have narrowed it down to this:
> sending any signal to the master process other than SIGKILL fails
> miserably. No new master process is created, as described in the
> documentation, nothing happens to the existing workers, nothing gets
> written to any log, and if I run top -u, I can see that very quickly
> the master ramps up to 100% CPU utilization. This happens if I run
> 'kill -HUP <master pid>', 'kill -USR2 <master pid>', even 'kill -QUIT
> <master pid>'.

This sounds like a Ruby/FreeBSD bug we've seen before. My script
in http://mid.gmane.org/20120201181445.GA31624 at dcvr.yhbt.net should
reproduce the issue w/o unicorn.

> ruby 1.9.3p0 (2011-10-30 revision 33570) [amd64-freebsd9]

I think this is a Ruby bug that was fixed in 1.9.3-p30 according to
naruse:
http://mid.gmane.org/CAK6HhsppWVPijWLyZMwcKueYDT5sZroGv6ADXkgreht3aLfR9A at mail.gmail.com

Since 1.9.3 p194 is the latest, can you try that out and confirm the
fix? I don't remember whether the other bug reporter confirmed this
issue was fixed by upgrading Ruby.

Thanks.
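
The linked reproduction script is not included in this message. As a rough stand-in, here is a minimal sketch of a signal-delivery smoke test that does not need unicorn. It is not Eric's script: signal_delivered? is a hypothetical name, the 5-second deadline is an arbitrary assumption, and it requires a Ruby with fork. A child traps a signal, wakes itself through a self-pipe while blocked in IO.select, and reports back to the parent.

```ruby
# Hypothetical smoke test, NOT the script linked in this thread.
# Returns true if a forked child handles +signal+ within +deadline+ seconds.
def signal_delivered?(signal = "USR2", deadline = 5)
  ready_r, ready_w = IO.pipe # child -> parent: "trap installed"
  done_r, done_w = IO.pipe   # child -> parent: "signal handled"

  pid = fork do
    ready_r.close
    done_r.close
    wake_r, wake_w = IO.pipe # self-pipe: handler -> child's main loop
    trap(signal) { wake_w.write_nonblock(".") }
    ready_w.syswrite(".")    # unbuffered, so exit! cannot lose it
    # Block in select, much as a master process does while idle.
    done_w.syswrite("ok") if IO.select([wake_r], nil, nil, deadline)
    exit!(0)
  end

  ready_w.close
  done_w.close
  ready_r.read(1)            # wait until the child's trap is in place
  Process.kill(signal, pid)
  # A hung or spinning child never writes, so select times out here.
  IO.select([done_r], nil, nil, deadline) ? done_r.read(2) == "ok" : false
ensure
  if pid
    Process.kill("KILL", pid) rescue nil # clean up a stuck child
    Process.wait(pid) rescue nil
  end
end

puts signal_delivered? ? "signal handled" : "signal NOT handled"
```

Running it before and after an interpreter upgrade gives a quick yes/no check, though passing it does not prove the original bug is absent.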
Mark Mccraw
2012-Jul-17 11:56 UTC
Any signal other than -9 causes full CPU utilization by master unicorn process on FreeBSD
On Jul 16, 2012, at 10:05 PM, Eric Wong wrote:
> Mark Mccraw <Mark.Mccraw at sas.com> wrote:
>> Hi There!
>>
>> I'm having a devil of a time figuring out a weird issue I'm running
>> into. I have unicorn configured to start 4 worker processes, and that
>> works great. However, when it's time to cycle the app, everything
>> goes haywire. By trial and error, I have narrowed it down to this:
>> sending any signal to the master process other than SIGKILL fails
>> miserably. No new master process is created, as described in the
>> documentation, nothing happens to the existing workers, nothing gets
>> written to any log, and if I run top -u, I can see that very quickly
>> the master ramps up to 100% CPU utilization. This happens if I run
>> 'kill -HUP <master pid>', 'kill -USR2 <master pid>', even 'kill -QUIT
>> <master pid>'.
>
> This sounds like a Ruby/FreeBSD bug we've seen before. My script
> in http://mid.gmane.org/20120201181445.GA31624 at dcvr.yhbt.net should
> reproduce the issue w/o unicorn.

You are absolutely correct! Your script replicates the problem perfectly.

>> ruby 1.9.3p0 (2011-10-30 revision 33570) [amd64-freebsd9]
>
> I think this is a Ruby bug that was fixed in 1.9.3-p30 according to
> naruse:
> http://mid.gmane.org/CAK6HhsppWVPijWLyZMwcKueYDT5sZroGv6ADXkgreht3aLfR9A at mail.gmail.com
>
> Since 1.9.3 p194 is the latest, can you try that out and confirm the
> fix? I don't remember whether the other bug reporter confirmed this
> issue was fixed by upgrading Ruby.

We're upgrading now to see what happens. I'm so glad you knew about this. There's no telling how long it would have taken me to question the ruby interpreter implementation, and since it's FreeBSD, I never would have found it by googling. Thanks for hours (days?) of my life back.

Mark
Mark Mccraw
2012-Jul-17 21:23 UTC
Any signal other than -9 causes full CPU utilization by master unicorn process on FreeBSD
On Jul 17, 2012, at 7:56 AM, Mark McCraw wrote:
>
> On Jul 16, 2012, at 10:05 PM, Eric Wong wrote:
>
>> Mark Mccraw <Mark.Mccraw at sas.com> wrote:
>>> Hi There!
>>>
>>> I'm having a devil of a time figuring out a weird issue I'm running
>>> into. I have unicorn configured to start 4 worker processes, and that
>>> works great. However, when it's time to cycle the app, everything
>>> goes haywire. By trial and error, I have narrowed it down to this:
>>> sending any signal to the master process other than SIGKILL fails
>>> miserably. No new master process is created, as described in the
>>> documentation, nothing happens to the existing workers, nothing gets
>>> written to any log, and if I run top -u, I can see that very quickly
>>> the master ramps up to 100% CPU utilization. This happens if I run
>>> 'kill -HUP <master pid>', 'kill -USR2 <master pid>', even 'kill -QUIT
>>> <master pid>'.
>>
>> This sounds like a Ruby/FreeBSD bug we've seen before. My script
>> in http://mid.gmane.org/20120201181445.GA31624 at dcvr.yhbt.net should
>> reproduce the issue w/o unicorn.
>
> You are absolutely correct! Your script replicates the problem perfectly.
>
>>> ruby 1.9.3p0 (2011-10-30 revision 33570) [amd64-freebsd9]
>>
>> I think this is a Ruby bug that was fixed in 1.9.3-p30 according to
>> naruse:
>> http://mid.gmane.org/CAK6HhsppWVPijWLyZMwcKueYDT5sZroGv6ADXkgreht3aLfR9A at mail.gmail.com
>>
>> Since 1.9.3 p194 is the latest, can you try that out and confirm the
>> fix? I don't remember whether the other bug reporter confirmed this
>> issue was fixed by upgrading Ruby.
>
> We're upgrading now to see what happens. I'm so glad you knew about this.
> There's no telling how long it would have taken me to question the ruby interpreter implementation, and
> since it's FreeBSD, I never would have found it by googling.
> Thanks for hours (days?) of my life back.
>
> Mark
>

Just to follow up and close out the thread - Eric's recollection was spot on. We upgraded ruby on our FreeBSD server to the latest thing, and the problem completely disappeared. Thanks again!
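
For anyone who wants a boot-time reminder on hosts that have not been upgraded yet, a small guard near the top of unicorn.rb is one option. This is only a sketch: suspect_interpreter? is a hypothetical name, and the threshold of 194 is an assumption taken from the release named in this thread, since the exact patchlevel that contains the fix is not established here.

```ruby
# Hypothetical guard, e.g. for the top of unicorn.rb.
# Assumption: patchlevel 194 is treated as "known good" only because it is
# the release named in this thread; the real fix may have landed earlier.
def suspect_interpreter?(platform = RUBY_PLATFORM,
                         version = RUBY_VERSION,
                         patchlevel = RUBY_PATCHLEVEL,
                         known_good = 194)
  platform.include?("freebsd") && version == "1.9.3" && patchlevel < known_good
end

if suspect_interpreter?
  warn "ruby #{RUBY_VERSION}p#{RUBY_PATCHLEVEL} on #{RUBY_PLATFORM}: " \
       "signals to the unicorn master may hang it; consider upgrading Ruby"
end
```

The check takes its inputs as parameters so it can be exercised without actually running on the affected platform.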
Eric Wong
2012-Jul-17 22:17 UTC
Any signal other than -9 causes full CPU utilization by master unicorn process on FreeBSD
Mark Mccraw <Mark.Mccraw at sas.com> wrote:
> On Jul 17, 2012, at 7:56 AM, Mark McCraw wrote:
> > We're upgrading now to see what happens. I'm so glad you knew about
> > this. There's no telling how long it would have taken me to
> > question the ruby interpreter implementation, and since it's
> > FreeBSD, I never would have found it by googling. Thanks for hours
> > (days?) of my life back.
>
> Just to follow up and close out the thread - Eric's recollection was
> spot on. We upgraded ruby on our FreeBSD server to the latest thing,
> and the problem completely disappeared. Thanks again!

Thanks for confirming this fix!

Fwiw, the Ruby core team probably uses/tests on GNU/Linux more than any other platform. Bugs on less common development platforms (especially w.r.t. tricky thread/fork/signal handling issues) may go unnoticed elsewhere.

If you're focused on using Ruby + *BSD in a production system, I suggest testing/fixing/reporting issues against the Ruby development branches as much as possible before they hit production :)