Gianni Jacklone
2005-Aug-10 16:45 UTC
Image Uploading with FastCGI Woes, App Stops Responding to Requests
Hello Rails Community,
I''m having an issue with an application of mine where periodically the
fastcgi processes running on the box are turning to zombies, requests
stop being handled, and I''ve got to manually restart apache to bring
the
app back to life. I''ve put together a little trail of what''s
going on
based on data from my fastcgi.crash.log, and I figured the community
might be able to help me figure out just what''s going on, and hopefully
help me fix it.
Here''s the skinny:
Rails 0.13.1 running on OS X Server Panther, Apache 1.3 FastCGI setup.
15 server processes, big timeout value of 1800 (My app involves
uploading large image files)
Every now and again my app stops responding to requests. Logging in over
ssh shows memory usuage is quite normal, fastcgi processes hover between
35-40MB a piece, and I''ve got plenty memory to spare. A look into my
fastcgi.crash.log right after I notice the apps stops responding, shows
entries like these:
09/Aug/2005:13:22:57 :: 795] Dispatcher failed to catch: protocol error
(FCGI::Stream::ProtocolError)
/opt/local/lib/ruby/1.8/cgi.rb:1015:in `read''
/opt/local/lib/ruby/1.8/cgi.rb:1015:in `read_multipart''
/opt/local/lib/ruby/1.8/cgi.rb:983:in `loop''
/opt/local/lib/ruby/1.8/cgi.rb:983:in `read_multipart''
/opt/local/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/cgi_ext/raw_post_data_fix.rb:13:in
`initialize_query''
/opt/local/lib/ruby/1.8/cgi.rb:2269:in `initialize''
(eval):16:in `initialize''
/opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in `new''
/opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in
`each_cgi''
/opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in
`each''
/opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in
`each_cgi''
/opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb:55:in
`process!''
/opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb:21:in
`process!''
/Library/WebServer/Documents/intranet/dispatch.fcgi:24
almost killed by this error
My app involves heavy uploading of files, 20MB uploads are uncommon, but
do occur. We''re thinking of putting limits on the downloads because of
bandwidth issues and how long uploading a 20MB set of images takes over
slow connections, but I''m not sure if file size is the cause of the
FCGI::Stream::ProtocolError, but it is a hunch.
I can''t seem to reproduce this Exception on my dev box though, or my
integration server. Poking around through the code that the stacktrace
points to, it appears that someting goes wrong in cgi.rb when reading
the image data from the post that contains the image being uploaded.
The FCGI::Stream::ProtocolError is thrown from the fcgi.c file, I''m not
sure what this exception means however, perhaps someone reading this knows?
Occassionaly I see log entries like these which seem similar:
[09/Aug/2005:13:47:51 :: 792] Dispatcher failed to catch: bad content
body (EOFError)
/opt/local/lib/ruby/1.8/cgi.rb:1020:in `read_multipart''
/opt/local/lib/ruby/1.8/cgi.rb:983:in `loop''
/opt/local/lib/ruby/1.8/cgi.rb:983:in `read_multipart''
/opt/local/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/cgi_ext/raw_post_data_fix.rb:13:in
`initialize_query''
unc /opt/local/lib/ruby/1.8/cgi.rb:2269:in `initialize''
(eval):16:in `initialize''
/opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in `new''
/opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in
`each_cgi''
/opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in
`each''
/opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in
`each_cgi''
/opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb:55:in
`process!''
/opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb:21:in
`process!''
/Library/WebServer/Documents/intranet/dispatch.fcgi:24
almost killed by this error
However this error I am able to reproduce. If I submit a form with a
file upload and then hit Firefox''s stop button in the middle of the
request before it completes, this exception is thrown. It does NOT seem
to leave my app in an unstable state however, I did this manual trick to
cause this exception 10 times in a row and the app behaved just fine.
These are the only two exceptions that I find in my fastcgi.crash.log,
so I''m assuming that the first exception: FCGI::Stream::ProtocolError
is
the culprit. Looking through the fcgi_handler.rb file in Rails reveals
that there''s a retry mechanism being conducted after exceptions are
caught. In the process! method there''s this code:
rescue Object => fcgi_error
# retry on errors that would otherwise have terminated the FCGI process,
# but only if they occur more than 10 seconds apart.
if !(SignalException === fcgi_error) && Time.now - @last_error_on
> 10
@last_error_on = Time.now
dispatcher_error(fcgi_error, "almost killed by this error")
retry
else
dispatcher_error(fcgi_error, "killed by this error")
end
end
Now I''m thinking maybe when this FCGI::Stream::ProtocolError exception
is thrown it makes a zombie out of the fcgi process, and this retry
mechanism might be doing the same to the other processes, leaving me
with a bunch of zombie processes and thus my app stops responding.
The other thing I''ve noted is that never do these processes get killed.
I never see "killed by this error" in my logs, so maybe fcgi processes
that should be getting killed aren''t?
Hopefully some kind soul smarter than myself can help shed some light as
to what is going on and what I can do about it. Thanks for just reading
this message if you''ve made it this far.
I''ve thought about modifying fcgi_handler.rb to explicity catch
FCGI::Stream::ProtocolError, and terminate the fcgi process explicity.
Does anyone think that would help? I figured I''d run all this by the
experts first.
Oh, and could someone explain to me what the Garbage Collection period
parameter is all about, and if setting it to some value might help in my
case?
Thanks and Best,
Gianni
Ryan Carver
2005-Aug-10 17:24 UTC
Re: Image Uploading with FastCGI Woes, App Stops Responding to Requests
I haven''t tried to track it down yet, but we also have many zombied processes hanging around, though none of the errors you see, and no file uploads. this is under apache/freebsd, but I''m not sure of the exact fcgi details. On Aug 10, 2005, at 9:45 AM, Gianni Jacklone wrote:> Hello Rails Community, > > I''m having an issue with an application of mine where periodically > the fastcgi processes running on the box are turning to zombies, > requests stop being handled, and I''ve got to manually restart > apache to bring the app back to life. I''ve put together a little > trail of what''s going on based on data from my fastcgi.crash.log, > and I figured the community might be able to help me figure out > just what''s going on, and hopefully help me fix it. > > Here''s the skinny: > > Rails 0.13.1 running on OS X Server Panther, Apache 1.3 FastCGI > setup. 15 server processes, big timeout value of 1800 (My app > involves uploading large image files) > > Every now and again my app stops responding to requests. Logging in > over ssh shows memory usuage is quite normal, fastcgi processes > hover between 35-40MB a piece, and I''ve got plenty memory to spare. > A look into my fastcgi.crash.log right after I notice the apps > stops responding, shows entries like these: > > 09/Aug/2005:13:22:57 :: 795] Dispatcher failed to catch: protocol > error (FCGI::Stream::ProtocolError) > /opt/local/lib/ruby/1.8/cgi.rb:1015:in `read'' > /opt/local/lib/ruby/1.8/cgi.rb:1015:in `read_multipart'' > /opt/local/lib/ruby/1.8/cgi.rb:983:in `loop'' > /opt/local/lib/ruby/1.8/cgi.rb:983:in `read_multipart'' > /opt/local/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/ > action_controller/cgi_ext/raw_post_data_fix.rb:13:in > `initialize_query'' > /opt/local/lib/ruby/1.8/cgi.rb:2269:in `initialize'' > (eval):16:in `initialize'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in `new'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in > `each_cgi'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in > `each'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in > `each_cgi'' > /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb: > 55:in `process!'' > /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb: > 21:in `process!'' > /Library/WebServer/Documents/intranet/dispatch.fcgi:24 > almost killed by this error > > My app involves heavy uploading of files, 20MB uploads are > uncommon, but do occur. We''re thinking of putting limits on the > downloads because of bandwidth issues and how long uploading a 20MB > set of images takes over slow connections, but I''m not sure if file > size is the cause of the FCGI::Stream::ProtocolError, but it is a > hunch. > > I can''t seem to reproduce this Exception on my dev box though, or > my integration server. Poking around through the code that the > stacktrace points to, it appears that someting goes wrong in cgi.rb > when reading the image data from the post that contains the image > being uploaded. The FCGI::Stream::ProtocolError is thrown from the > fcgi.c file, I''m not sure what this exception means however, > perhaps someone reading this knows? > > Occassionaly I see log entries like these which seem similar: > > [09/Aug/2005:13:47:51 :: 792] Dispatcher failed to catch: bad > content body (EOFError) > /opt/local/lib/ruby/1.8/cgi.rb:1020:in `read_multipart'' > /opt/local/lib/ruby/1.8/cgi.rb:983:in `loop'' > /opt/local/lib/ruby/1.8/cgi.rb:983:in `read_multipart'' > /opt/local/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/ > action_controller/cgi_ext/raw_post_data_fix.rb:13:in > `initialize_query'' > unc /opt/local/lib/ruby/1.8/cgi.rb:2269:in `initialize'' > (eval):16:in `initialize'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in `new'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in > `each_cgi'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in > `each'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in > `each_cgi'' > /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb: > 55:in `process!'' > /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb: > 21:in `process!'' > /Library/WebServer/Documents/intranet/dispatch.fcgi:24 > almost killed by this error > > However this error I am able to reproduce. If I submit a form with > a file upload and then hit Firefox''s stop button in the middle of > the request before it completes, this exception is thrown. It does > NOT seem to leave my app in an unstable state however, I did this > manual trick to cause this exception 10 times in a row and the app > behaved just fine. > > These are the only two exceptions that I find in my > fastcgi.crash.log, so I''m assuming that the first exception: > FCGI::Stream::ProtocolError is the culprit. Looking through the > fcgi_handler.rb file in Rails reveals that there''s a retry > mechanism being conducted after exceptions are caught. In the > process! method there''s this code: > > rescue Object => fcgi_error > # retry on errors that would otherwise have terminated the FCGI > process, > # but only if they occur more than 10 seconds apart. > if !(SignalException === fcgi_error) && Time.now - > @last_error_on > 10 > @last_error_on = Time.now > dispatcher_error(fcgi_error, "almost killed by this error") > retry > else > dispatcher_error(fcgi_error, "killed by this error") > end > end > > Now I''m thinking maybe when this FCGI::Stream::ProtocolError > exception is thrown it makes a zombie out of the fcgi process, and > this retry mechanism might be doing the same to the other > processes, leaving me with a bunch of zombie processes and thus my > app stops responding. > > The other thing I''ve noted is that never do these processes get > killed. I never see "killed by this error" in my logs, so maybe > fcgi processes that should be getting killed aren''t? > > Hopefully some kind soul smarter than myself can help shed some > light as to what is going on and what I can do about it. Thanks for > just reading this message if you''ve made it this far. > > I''ve thought about modifying fcgi_handler.rb to explicity catch > FCGI::Stream::ProtocolError, and terminate the fcgi process > explicity. Does anyone think that would help? I figured I''d run all > this by the experts first. > > Oh, and could someone explain to me what the Garbage Collection > period parameter is all about, and if setting it to some value > might help in my case? > > Thanks and Best, > Gianni > > > > > > _______________________________________________ > Rails mailing list > Rails-1W37MKcQCpIf0INCOvqR/iCwEArCW2h5@public.gmane.org > http://lists.rubyonrails.org/mailman/listinfo/rails >