thr3ads.net - Rails - [Rails] FastCGI processes sometimes 'hang' [Jan 2006]

John Jeffery

2006-Jan-13 00:07 UTC

[Rails] FastCGI processes sometimes 'hang'

I am running a RoR application on Apache 1.3/RedHat 7.3/MySQL 3.1.23
(old versions, I know, but upgrading to the latest versions is not
practical for a number of reasons). There are 5 RoR FastCGI processes
configured using FastCgiServer.

What I am finding is that, after a while, some of the FastCGI processes
seem to 'hang'. They no longer process requests, and the only way to
remove them is to use "kill -9".

When all 5 FastCGI processes enter this state, my production site no 
longer works.

Has anyone else had a similar problem? Is there an elegant work-around 
that I can use to detect these dead processes and kill them?

John Jeffery

2006-Jan-13 00:31 UTC

[Rails] Re: FastCGI processes sometimes 'hang'

I should probably also mention that I suspect that the problem has 
something to do with garbage collection.

My reason for thinking this is that I initially had the garbage 
collector configured to clean up every 25 requests:

i.e.:

    RailsFCGIHandler.process! nil, 25


But when I changed it back to automatic GC,

    RailsFCGIHandler.process!

I found that the processes ran for a significantly longer amount of 
time.
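
The idea of collecting garbage once every N requests, rather than letting the collector run whenever it likes, can be sketched as follows. This is a minimal illustration and not the RailsFCGIHandler source; the class name `PeriodicGC` and the method `request_finished` are hypothetical.

```ruby
# Hypothetical sketch of "collect every N requests". Automatic GC is switched
# off, and a full collection is forced once per `period` finished requests.
class PeriodicGC
  attr_reader :collections

  def initialize(period)
    @period = period
    @countdown = period
    @collections = 0
    GC.disable # no automatic collections between the forced ones
  end

  # Call once after each request. Returns true when a collection was run.
  def request_finished
    @countdown -= 1
    return false if @countdown > 0

    GC.enable
    GC.start   # force a full collection now
    GC.disable
    @countdown = @period
    @collections += 1
    true
  end

  # Hand control back to the automatic collector.
  def release
    GC.enable
  end
end
```

With a period of 25, memory is only reclaimed at those checkpoints, so a single request that allocates heavily can grow the process considerably before the next collection.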


-- 
Posted via http://www.ruby-forum.com/.

Jon Smirl

2006-Jan-15 02:07 UTC

[Rails] FastCGI processes sometimes 'hang'

On 1/12/06, John Jeffery <jjeffery@sp.com.au> wrote:
> I am running a RoR application on Apache 1.3/RedHat 7.3/MySQL 3.1.23
> (old versions, I know, but upgrading to the latest versions is not
> practical for a number of reasons). There are 5 RoR FastCGI processes
> configured using FastCgiServer.
>
> What I am finding is that, after a while, some of the FastCGI processes
> seem to 'hang'. They no longer process requests, and the only way to
> remove them is to use "kill -9".
>
> When all 5 FastCGI processes enter this state, my production site no
> longer works.
>
> Has anyone else had a similar problem? Is there an elegant work-around
> that I can use to detect these dead processes and kill them?

I am experiencing something similar. Apache at my hosting provider is
configured to send a signal -USR1 to the fcgi processes every four
hours in order to make them exit and restart. What seems to be
happening is that the FCGI process receives the USR1 and doesn't exit
until the next request. Meanwhile Apache thinks it has killed the
process and doesn't send it any more requests. After a while I reach
my process limit with processes stuck in this state. kill -9 will kill
them and get things going again.

I have been playing around with changes to dispatch.fcgi. Here's my
current code, but it isn't always working correctly.

if ENV["RAILS_ENV"] == "production"

  ENV['GEM_PATH'] = '/home/jonsmirl/gems'

  class MyRailsFCGIHandler < RailsFCGIHandler

    def initialize(log_file_path = nil, gc_request_period = nil)
      super(log_file_path, gc_request_period)
      trap('TERM', method(:exit_now_handler).to_proc)
    end

    def process!(provider = FCGI)
      # Make a note of $" so we can safely reload this instance.
      mark!

      run_gc! if gc_request_period

      usr1 = trap("USR1", "DEFAULT")
      provider.each_cgi do |cgi|
        trap("USR1", usr1)
        process_request(cgi)

        case when_ready
          when :reload
            reload!
          when :restart
            close_connection(cgi)
            restart!
          when :exit
            close_connection(cgi)
            break
        end

        gc_countdown
        trap("USR1", "DEFAULT")
      end

      GC.enable
      dispatcher_log :info, "terminated gracefully"

    rescue SystemExit => exit_error
      dispatcher_log :info, "terminated by explicit exit"

    rescue Object => fcgi_error
      # retry on errors that would otherwise have terminated the FCGI process,
      # but only if they occur more than 10 seconds apart.
      if !(SignalException === fcgi_error) && Time.now - @last_error_on > 10
        @last_error_on = Time.now
        dispatcher_error(fcgi_error, "almost killed by this error")
        retry
      else
        dispatcher_error(fcgi_error, "killed by this error")
      end
    end

    def exit_now_handler(signal)
      dispatcher_log :info, "ignoring request to terminate immediately"
    end
  end

  MyRailsFCGIHandler.process! nil, 50

else

  RailsFCGIHandler.process! nil, 50
end

--
Jon Smirl
jonsmirl@gmail.com

John Jeffery

2006-Jan-15 06:34 UTC

[Rails] Re: FastCGI processes sometimes 'hang'

Jon Smirl wrote:
>
> I am experiencing something similar. Apache at my hosting provider is
> configured to send a signal -USR1 to the fcgi processes every four
> hours in order to make them exit and restart. What seems to be
> happening is that the FCGI process receives the USR1 and doesn't exit
> until the next request. Meanwhile Apache thinks it has killed the
> process and doesn't send it any more requests. After a while I reach
> my process limit with processes stuck in this state. kill -9 will kill
> them and get things going again.
> 
Thanks for the hint Jon. I had thought about modifying the
RailsFCGIHandler so that the process exits after (say) 25 requests
instead of invoking the garbage collector. I was not, however, aware of
the USR1 signal thing. I think I will play around with the
RailsFCGIHandler and see if I get more reliability.
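
A minimal sketch of the "exit after N requests" idea, assuming the FastCGI process manager starts a replacement when a worker exits. The class `RequestBudget` is hypothetical and works on any enumerable of requests; it does not use the real FCGI or RailsFCGIHandler API.

```ruby
# Hypothetical sketch: serve at most `max_requests` requests, then stop so
# the caller can exit the process and let a fresh worker take over.
class RequestBudget
  attr_reader :served

  def initialize(max_requests)
    raise ArgumentError, "max_requests must be positive" unless max_requests > 0
    @max_requests = max_requests
    @served = 0
  end

  # Yields each request to the block. Returns :exhausted when the budget is
  # used up, or :drained when the request source ended first.
  def serve(requests)
    requests.each do |request|
      yield request
      @served += 1
      return :exhausted if @served >= @max_requests
    end
    :drained
  end
end
```

In a dispatcher script the requests would come from the FastCGI accept loop, and the script would simply exit once `serve` returns `:exhausted`.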

-- 
Posted via http://www.ruby-forum.com/.

Jon Smirl

2006-Jan-15 17:04 UTC

[Rails] Re: FastCGI processes sometimes 'hang'

On 1/15/06, John Jeffery <ax01@sp.com.au> wrote:
> Jon Smirl wrote:
> >
> > I am experiencing something similar. Apache at my hosting provider is
> > configured to send a signal -USR1 to the fcgi processes every four
> > hours in order to make them exit and restart. What seems to be
> > happening is that the FCGI process receives the USR1 and doesn't exit
> > until the next request. Meanwhile Apache thinks it has killed the
> > process and doesn't send it any more requests. After a while I reach
> > my process limit with processes stuck in this state. kill -9 will kill
> > them and get things going again.
> >
>
> Thanks for the hint Jon. I had thought about modifying the
> RailsFCGIHandler so that the process exits after (say) 25 requests
> instead of invoking the garbage collector. I was not, however, aware of
> the USR1 signal thing. I think I will play around with the
> RailsFCGIHandler and see if I get more reliability.

In the Ruby fcgi gem there is a file called README.signals. It
describes what needs to be done to make Apache fcgi work correctly.
The problem is that the Rails fcgi_handler.rb does not implement what
that file says to do.

My hosting provider is doing a 'graceful' Apache restart every four
hours. Apache sends out the USR1 signals as described in
README.signals. Without changing the Rails fcgi_handler code, the USR1
signal gets queued and the process doesn't exit, since Rails has
registered a USR1 handler. Queuing is what Ruby is supposed to do if the
main thread is stuck in the select(). The USR1 signal will be dequeued
and handled when the select() completes. But Apache has restarted and
is disconnected from the process, so the select() never completes and
the process never exits. After a while these build up and I reach my
process limit. At that point all of the processes will be running but
they are disconnected from Apache, and then you start getting a
permanent Error 500.
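
One common way to keep a signal from sitting behind a select() that never returns is the self-pipe technique: the handler records the request and writes a byte to a pipe that the select() is also watching, so the loop wakes up at once. The sketch below illustrates that technique in plain Ruby, under the assumption that the accept loop can be expressed as an `IO.select`. It is not what fcgi_handler.rb or the fcgi gem does, and the class `GracefulLoop` is hypothetical.

```ruby
# Hypothetical self-pipe sketch: the signal handler only sets a flag and wakes
# the loop; the loop decides when it is safe to stop (never mid-request).
class GracefulLoop
  attr_reader :handled, :exit_requested

  def initialize(signal = "USR1")
    @signal = signal
    @wake_r, @wake_w = IO.pipe
    @exit_requested = false
    @handled = 0
    @installed = false
  end

  def install_trap
    return if @installed
    @previous = Signal.trap(@signal) do
      @exit_requested = true
      begin
        @wake_w.write_nonblock("x") # wake a select() that is waiting
      rescue IO::WaitWritable, IOError, SystemCallError
        # Pipe full or closed: the flag is already set, nothing more to do.
      end
    end
    @installed = true
  end

  # Reads one line per "request" from request_io until the signal arrives,
  # the input ends, or the optional timeout expires. Returns the number of
  # requests handled.
  def run(request_io, timeout = nil)
    install_trap
    until @exit_requested
      ready, = IO.select([request_io, @wake_r], nil, nil, timeout)
      break if ready.nil?                      # timed out
      next unless ready.include?(request_io)   # woken by the signal only
      line = request_io.gets
      break if line.nil?                       # request source closed
      yield line if block_given?
      @handled += 1
    end
    @handled
  ensure
    Signal.trap(@signal, @previous || "DEFAULT")
    @installed = false
  end
end
```

Because the exit flag is checked only between requests, a request that is already in progress finishes before the loop stops.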

Another way I am getting Error 500 at dreamhost is via SIGTERM. They
seem to have a supervisor process out there that looks for 'extra'
FCGI processes and sends them a SIGTERM. TERM is bad because it makes
fcgi exit even if it is in the middle of processing a request. That's
a guaranteed way to get an intermittent Error 500.

After a while I end up in a steady dance of disconnected processes
getting TERM to kill them, which works. But the TERM is also hitting
random good processes. Thus the random Error 500 behavior at
dreamhost. The site keeps running but it is really broken.

My solution to this is to disable kill from TERM and make USR1 work
correctly. This seems to be working, but this is a slow, long-term
problem and it is hard to tell if I really have eliminated the Error
500s.

One part I don't understand is why the selects in the disconnected
processes don't complete after the Apache graceful restart. It seems
like these sockets should be getting closed, causing the select to
return nil, but this doesn't happen. I haven't figured out whether Apache
isn't closing the socket or Ruby is broken on completing the select
when the socket closes. If the select() completed, the queued USR1
signal would run and the process would exit. I've tried playing with
the code in this area and I keep getting the processes stuck in zombie
state.

--
Jon Smirl
jonsmirl@gmail.com
