Gianni Jacklone
2005-Aug-10 16:45 UTC
Image Uploading with FastCGI Woes, App Stops Responding to Requests
Hello Rails Community, I''m having an issue with an application of mine where periodically the fastcgi processes running on the box are turning to zombies, requests stop being handled, and I''ve got to manually restart apache to bring the app back to life. I''ve put together a little trail of what''s going on based on data from my fastcgi.crash.log, and I figured the community might be able to help me figure out just what''s going on, and hopefully help me fix it. Here''s the skinny: Rails 0.13.1 running on OS X Server Panther, Apache 1.3 FastCGI setup. 15 server processes, big timeout value of 1800 (My app involves uploading large image files) Every now and again my app stops responding to requests. Logging in over ssh shows memory usuage is quite normal, fastcgi processes hover between 35-40MB a piece, and I''ve got plenty memory to spare. A look into my fastcgi.crash.log right after I notice the apps stops responding, shows entries like these: 09/Aug/2005:13:22:57 :: 795] Dispatcher failed to catch: protocol error (FCGI::Stream::ProtocolError) /opt/local/lib/ruby/1.8/cgi.rb:1015:in `read'' /opt/local/lib/ruby/1.8/cgi.rb:1015:in `read_multipart'' /opt/local/lib/ruby/1.8/cgi.rb:983:in `loop'' /opt/local/lib/ruby/1.8/cgi.rb:983:in `read_multipart'' /opt/local/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/cgi_ext/raw_post_data_fix.rb:13:in `initialize_query'' /opt/local/lib/ruby/1.8/cgi.rb:2269:in `initialize'' (eval):16:in `initialize'' /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in `new'' /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in `each_cgi'' /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in `each'' /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in `each_cgi'' /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb:55:in `process!'' /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb:21:in `process!'' /Library/WebServer/Documents/intranet/dispatch.fcgi:24 almost killed by this error My app involves heavy uploading of files, 20MB uploads are uncommon, but do occur. We''re thinking of putting limits on the downloads because of bandwidth issues and how long uploading a 20MB set of images takes over slow connections, but I''m not sure if file size is the cause of the FCGI::Stream::ProtocolError, but it is a hunch. I can''t seem to reproduce this Exception on my dev box though, or my integration server. Poking around through the code that the stacktrace points to, it appears that someting goes wrong in cgi.rb when reading the image data from the post that contains the image being uploaded. The FCGI::Stream::ProtocolError is thrown from the fcgi.c file, I''m not sure what this exception means however, perhaps someone reading this knows? Occassionaly I see log entries like these which seem similar: [09/Aug/2005:13:47:51 :: 792] Dispatcher failed to catch: bad content body (EOFError) /opt/local/lib/ruby/1.8/cgi.rb:1020:in `read_multipart'' /opt/local/lib/ruby/1.8/cgi.rb:983:in `loop'' /opt/local/lib/ruby/1.8/cgi.rb:983:in `read_multipart'' /opt/local/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/cgi_ext/raw_post_data_fix.rb:13:in `initialize_query'' unc /opt/local/lib/ruby/1.8/cgi.rb:2269:in `initialize'' (eval):16:in `initialize'' /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in `new'' /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in `each_cgi'' /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in `each'' /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in `each_cgi'' /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb:55:in `process!'' /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb:21:in `process!'' /Library/WebServer/Documents/intranet/dispatch.fcgi:24 almost killed by this error However this error I am able to reproduce. If I submit a form with a file upload and then hit Firefox''s stop button in the middle of the request before it completes, this exception is thrown. It does NOT seem to leave my app in an unstable state however, I did this manual trick to cause this exception 10 times in a row and the app behaved just fine. These are the only two exceptions that I find in my fastcgi.crash.log, so I''m assuming that the first exception: FCGI::Stream::ProtocolError is the culprit. Looking through the fcgi_handler.rb file in Rails reveals that there''s a retry mechanism being conducted after exceptions are caught. In the process! method there''s this code: rescue Object => fcgi_error # retry on errors that would otherwise have terminated the FCGI process, # but only if they occur more than 10 seconds apart. if !(SignalException === fcgi_error) && Time.now - @last_error_on > 10 @last_error_on = Time.now dispatcher_error(fcgi_error, "almost killed by this error") retry else dispatcher_error(fcgi_error, "killed by this error") end end Now I''m thinking maybe when this FCGI::Stream::ProtocolError exception is thrown it makes a zombie out of the fcgi process, and this retry mechanism might be doing the same to the other processes, leaving me with a bunch of zombie processes and thus my app stops responding. The other thing I''ve noted is that never do these processes get killed. I never see "killed by this error" in my logs, so maybe fcgi processes that should be getting killed aren''t? Hopefully some kind soul smarter than myself can help shed some light as to what is going on and what I can do about it. Thanks for just reading this message if you''ve made it this far. I''ve thought about modifying fcgi_handler.rb to explicity catch FCGI::Stream::ProtocolError, and terminate the fcgi process explicity. Does anyone think that would help? I figured I''d run all this by the experts first. Oh, and could someone explain to me what the Garbage Collection period parameter is all about, and if setting it to some value might help in my case? Thanks and Best, Gianni
Ryan Carver
2005-Aug-10 17:24 UTC
Re: Image Uploading with FastCGI Woes, App Stops Responding to Requests
I haven''t tried to track it down yet, but we also have many zombied processes hanging around, though none of the errors you see, and no file uploads. this is under apache/freebsd, but I''m not sure of the exact fcgi details. On Aug 10, 2005, at 9:45 AM, Gianni Jacklone wrote:> Hello Rails Community, > > I''m having an issue with an application of mine where periodically > the fastcgi processes running on the box are turning to zombies, > requests stop being handled, and I''ve got to manually restart > apache to bring the app back to life. I''ve put together a little > trail of what''s going on based on data from my fastcgi.crash.log, > and I figured the community might be able to help me figure out > just what''s going on, and hopefully help me fix it. > > Here''s the skinny: > > Rails 0.13.1 running on OS X Server Panther, Apache 1.3 FastCGI > setup. 15 server processes, big timeout value of 1800 (My app > involves uploading large image files) > > Every now and again my app stops responding to requests. Logging in > over ssh shows memory usuage is quite normal, fastcgi processes > hover between 35-40MB a piece, and I''ve got plenty memory to spare. > A look into my fastcgi.crash.log right after I notice the apps > stops responding, shows entries like these: > > 09/Aug/2005:13:22:57 :: 795] Dispatcher failed to catch: protocol > error (FCGI::Stream::ProtocolError) > /opt/local/lib/ruby/1.8/cgi.rb:1015:in `read'' > /opt/local/lib/ruby/1.8/cgi.rb:1015:in `read_multipart'' > /opt/local/lib/ruby/1.8/cgi.rb:983:in `loop'' > /opt/local/lib/ruby/1.8/cgi.rb:983:in `read_multipart'' > /opt/local/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/ > action_controller/cgi_ext/raw_post_data_fix.rb:13:in > `initialize_query'' > /opt/local/lib/ruby/1.8/cgi.rb:2269:in `initialize'' > (eval):16:in `initialize'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in `new'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in > `each_cgi'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in > `each'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in > `each_cgi'' > /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb: > 55:in `process!'' > /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb: > 21:in `process!'' > /Library/WebServer/Documents/intranet/dispatch.fcgi:24 > almost killed by this error > > My app involves heavy uploading of files, 20MB uploads are > uncommon, but do occur. We''re thinking of putting limits on the > downloads because of bandwidth issues and how long uploading a 20MB > set of images takes over slow connections, but I''m not sure if file > size is the cause of the FCGI::Stream::ProtocolError, but it is a > hunch. > > I can''t seem to reproduce this Exception on my dev box though, or > my integration server. Poking around through the code that the > stacktrace points to, it appears that someting goes wrong in cgi.rb > when reading the image data from the post that contains the image > being uploaded. The FCGI::Stream::ProtocolError is thrown from the > fcgi.c file, I''m not sure what this exception means however, > perhaps someone reading this knows? > > Occassionaly I see log entries like these which seem similar: > > [09/Aug/2005:13:47:51 :: 792] Dispatcher failed to catch: bad > content body (EOFError) > /opt/local/lib/ruby/1.8/cgi.rb:1020:in `read_multipart'' > /opt/local/lib/ruby/1.8/cgi.rb:983:in `loop'' > /opt/local/lib/ruby/1.8/cgi.rb:983:in `read_multipart'' > /opt/local/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/ > action_controller/cgi_ext/raw_post_data_fix.rb:13:in > `initialize_query'' > unc /opt/local/lib/ruby/1.8/cgi.rb:2269:in `initialize'' > (eval):16:in `initialize'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in `new'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:600:in > `each_cgi'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in > `each'' > /opt/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in > `each_cgi'' > /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb: > 55:in `process!'' > /opt/local/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/fcgi_handler.rb: > 21:in `process!'' > /Library/WebServer/Documents/intranet/dispatch.fcgi:24 > almost killed by this error > > However this error I am able to reproduce. If I submit a form with > a file upload and then hit Firefox''s stop button in the middle of > the request before it completes, this exception is thrown. It does > NOT seem to leave my app in an unstable state however, I did this > manual trick to cause this exception 10 times in a row and the app > behaved just fine. > > These are the only two exceptions that I find in my > fastcgi.crash.log, so I''m assuming that the first exception: > FCGI::Stream::ProtocolError is the culprit. Looking through the > fcgi_handler.rb file in Rails reveals that there''s a retry > mechanism being conducted after exceptions are caught. In the > process! method there''s this code: > > rescue Object => fcgi_error > # retry on errors that would otherwise have terminated the FCGI > process, > # but only if they occur more than 10 seconds apart. > if !(SignalException === fcgi_error) && Time.now - > @last_error_on > 10 > @last_error_on = Time.now > dispatcher_error(fcgi_error, "almost killed by this error") > retry > else > dispatcher_error(fcgi_error, "killed by this error") > end > end > > Now I''m thinking maybe when this FCGI::Stream::ProtocolError > exception is thrown it makes a zombie out of the fcgi process, and > this retry mechanism might be doing the same to the other > processes, leaving me with a bunch of zombie processes and thus my > app stops responding. > > The other thing I''ve noted is that never do these processes get > killed. I never see "killed by this error" in my logs, so maybe > fcgi processes that should be getting killed aren''t? > > Hopefully some kind soul smarter than myself can help shed some > light as to what is going on and what I can do about it. Thanks for > just reading this message if you''ve made it this far. > > I''ve thought about modifying fcgi_handler.rb to explicity catch > FCGI::Stream::ProtocolError, and terminate the fcgi process > explicity. Does anyone think that would help? I figured I''d run all > this by the experts first. > > Oh, and could someone explain to me what the Garbage Collection > period parameter is all about, and if setting it to some value > might help in my case? > > Thanks and Best, > Gianni > > > > > > _______________________________________________ > Rails mailing list > Rails-1W37MKcQCpIf0INCOvqR/iCwEArCW2h5@public.gmane.org > http://lists.rubyonrails.org/mailman/listinfo/rails >