First off: our clusters are LVS-balanced Apache 2.2.3 + mod_proxy_balancer + gem mongrel 0.3.13.3 / mongrel_cluster 0.2 + memcached / gem memcache_client + gem rails 1.1.6 on Debian boxen, with a pgcluster backend.

On 2 of our deployed clusters, we are getting the "spinning mongrel" problem. Because the clusters are very low volume right now, it takes days to collect a spinner, which makes it difficult to debug what the problem might be.

From what I've been following on the list, the appropriate next debugging step is to send a SIGUSR1 signal to the spinning mongrel to get it to spit out what it thinks it is currently running. All we get from this is the following:

  /usr/lib/ruby/gems/1.8/gems/mongrel-0.3.13.3/lib/mongrel.rb:982:in `join': SIGUSR1 (SignalException)
      from /usr/lib/ruby/gems/1.8/gems/mongrel-0.3.13.3/lib/mongrel.rb:982:in `join'
      from /usr/lib/ruby/gems/1.8/gems/mongrel-0.3.13.3/lib/mongrel.rb:982:in `join'
      from /usr/lib/ruby/gems/1.8/gems/mongrel-0.3.13.3/bin/mongrel_rails:136:in `run'
      from /usr/lib/ruby/gems/1.8/gems/mongrel-0.3.13.3/lib/mongrel/command.rb:199:in `run'
      from /usr/lib/ruby/gems/1.8/gems/mongrel-0.3.13.3/bin/mongrel_rails:235
      from /usr/bin/mongrel_rails:18

And that is only when a mongrel responds to SIGUSR1 at all; the output is the same every time. About 10-20% of the time, a mongrel process won't respond to SIGUSR1, SIGHUP, or anything other than SIGKILL.

In either case, what should my next step be? The current plan is to run mongrel in debugging mode and beat the hell out of it, hoping to trigger the spinning behavior. After (during?) this, I guess I'll strace / gdb it to death. Any other suggestions?

- Ian C. Blenke <ian at blenke.com> http://ian.blenke.com/
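Since some spinners only go away on SIGKILL, a stop-then-escalate routine is handy for clearing them once you have gathered what you need (anything you want from strace or gdb has to be collected before the KILL, which destroys the process state). A minimal Ruby sketch, assuming the target process belongs to the same user and a modern Ruby (2.1+, for `Process.clock_gettime`, which the Ruby 1.8 in this thread does not have); `alive?` and `signal_then_kill` are hypothetical names, not part of mongrel or mongrel_cluster:

```ruby
# Hypothetical helper, not part of mongrel or mongrel_cluster.
# Signal 0 delivers nothing; it only checks that the pid still exists.
def alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
end

# Send +signal+ to +pid+, wait up to +grace+ seconds for it to exit,
# then fall back to SIGKILL, which a process can neither trap nor ignore.
# Returns :exited or :killed. Errno::EPERM (someone else's process) propagates.
def signal_then_kill(pid, signal = "TERM", grace = 5.0, poll = 0.1)
  Process.kill(signal, pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    return :exited unless alive?(pid)
    sleep poll
  end
  Process.kill("KILL", pid)
  :killed
rescue Errno::ESRCH
  :exited # the process was already gone when we tried to signal it
end
```

For a process that has stopped servicing Ruby-level signal handlers, the TERM is simply never acted on and the helper reports `:killed` after the grace period.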
On Wed, 2006-09-20 at 18:26 -0400, Ian C. Blenke wrote:
> From what I've been following on the list, the appropriate next
> debugging step is to send a SIGUSR1 signal to the spinning mongrel to
> get it to spit out what it thinks it is currently running.

USR2 is the debugging signal. You actually want to turn it on a few minutes after they start, and it's safe to turn on in production (not much of a performance hit).

--
Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.
On Wed, 2006-09-20 at 19:22 -0700, Zed Shaw wrote:
> USR2 is the debugging signal. You actually want to turn it on a few
> minutes after they start, and it's safe to turn on in production (not
> much of a performance hit).

I'm such an idiot. It is USR1. You've got the 0.3.13.3 version of mongrel, so you need to upgrade if you want to get the USR1 debugging.

Sorry.

Zed
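The debugging signal is described as something you switch on after startup and can leave on in production. A rough sketch of that toggle pattern in plain Ruby, assuming a modern Ruby where `Thread#backtrace` exists; `DebugToggle`, `log_request` and `dump_threads` are hypothetical names and this is not Mongrel's actual USR1 code. The trap handler only flips a flag, keeping the work done in signal context minimal, and when the flag is off the per-request cost is a single boolean check:

```ruby
# Hypothetical sketch of a signal-toggled debug mode; not Mongrel's code.
class DebugToggle
  attr_reader :enabled

  def initialize(io)
    @io = io
    @enabled = false
  end

  # Each delivery of +signal+ flips debugging on or off.
  def install(signal = "USR1")
    Signal.trap(signal) { @enabled = !@enabled }
  end

  # Called from request-handling code; costs one check while debugging is off.
  def log_request(path)
    return unless @enabled
    @io.puts("[debug] thread #{Thread.current.object_id} handling #{path}")
  end

  # Writes every live thread's backtrace, which is what you want from a spinner.
  def dump_threads
    Thread.list.each do |t|
      @io.puts("Thread #{t.object_id} (#{t.status})")
      (t.backtrace || []).each { |line| @io.puts("  #{line}") }
    end
  end
end
```

In a server, `dump_threads` would be called from ordinary (non-trap) code, for example a periodic check of `enabled`, rather than from inside the trap block.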
Zed Shaw <zedshaw at zedshaw.com> writes:
> USR2 is the debugging signal. You actually want to turn it on a few
> minutes after they start, and it's safe to turn on in production (not
> much of a performance hit).

Uh, USR2 is "restart", right? At least that's what Mongrel claims when it starts up.

Steve
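The mix-up matters because each signal means something different to a running server: sending USR2 when you meant USR1 restarts the process instead of debugging it. One way to keep the advertised meanings honest is to drive both trap registration and the startup message from a single table. A hypothetical sketch: only USR2 => restart comes from this thread, the other entries are illustrative, and the message format is not Mongrel's:

```ruby
# Hypothetical signal table; entries other than USR2 => restart are illustrative.
SIGNAL_ACTIONS = {
  "TERM" => "stop",
  "USR2" => "restart",
  "USR1" => "toggle debug"
}.freeze

# Builds the startup message from the same table used to install traps,
# so the message cannot drift from what is actually handled.
def signal_banner(actions)
  "signals: " + actions.map { |sig, name| "#{sig} => #{name}" }.join(", ")
end

# Registers one trap per table entry; the block receives the action name.
def install_signals(actions, &handler)
  actions.each do |sig, name|
    Signal.trap(sig) { handler.call(name) }
  end
end

puts signal_banner(SIGNAL_ACTIONS)
# prints "signals: TERM => stop, USR2 => restart, USR1 => toggle debug"
```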