Emmett Shear
2008-May-29 20:07 UTC
[Mongrel] Error: Mongrel timed out this thread: too many open files
I just switched to Mongrel, and it's been working much better than my previous lighttpd/fastcgi setup. So thanks for the awesomeness.

My current problem: once or twice an hour, I get the following error in production:

Mongrel timed out this thread: too many open files

I never get it in testing or on our staging server. Any ideas what would cause that? It doesn't *appear* particularly correlated with load to me, but I'm only receiving notifications after the fact, so I can't be sure.

Thanks,
Emmett
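The error above is the EMFILE errno surfacing through Mongrel: the process has hit its per-process file-descriptor limit. A rough way to check how close a process is to that limit, sketched in Ruby (the `/proc/self/fd` path is Linux-specific, hence the guard; in production you would inspect the mongrel process rather than the script itself):

```ruby
# Compare the soft fd limit with the number of descriptors currently open.
# EMFILE ("too many open files") fires when the open count reaches the soft limit.
soft, hard = Process.getrlimit(:NOFILE)
puts "fd limit: soft=#{soft} hard=#{hard}"

if File.directory?("/proc/self/fd")
  # Each entry here is one open descriptor of this process (Linux only).
  puts "open fds: #{Dir.children('/proc/self/fd').size}"
end
```

If the open-fd count climbs steadily toward the soft limit under load, something is leaking descriptors; if it sits flat, the limit itself is simply too low for the connection volume.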
Zed A. Shaw
2008-May-29 21:02 UTC
[Mongrel] Error: Mongrel timed out this thread: too many open files
On Thu, 29 May 2008 13:07:27 -0700, "Emmett Shear" <emmett at justin.tv> wrote:

> I just switched to Mongrel, and it's been working much better than my
> previous lighttpd/fastcgi setup. So thanks for the awesomeness.
>
> My current problem: once or twice an hour, I get the following error in
> production:
>
> Mongrel timed out this thread: too many open files
>
> I never get it in testing or on our staging server. Any ideas what would
> cause that? It doesn't *appear* particularly correlated with load to me,
> but I'm only receiving notifications after the fact, so I can't be sure.

A couple of things cause this. One is that the mongrel is overloaded with too many connections, so it can't accept any more.

If there isn't that much load on the server, then it's more likely that you are leaking an open file here or there. If you are writing code like this:

a = open("blah.txt")
a.write("hi")
a.close()

then you are probably leaking files. Look for that, and translate it to the block form:

open("blah.txt") {|a| a.write("hi") }

That's probably the #1 mistake people make coming from other languages.

--
Zed A. Shaw
- Hate: http://savingtheinternetwithhate.com/
- Good: http://www.zedshaw.com/
- Evil: http://yearofevil.com/
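The point of the block form is that `File.open` closes the handle on the way out of the block, whether the block finishes normally or raises. A minimal sketch (using a temp-file path so it is self-contained) that demonstrates this:

```ruby
require "tmpdir"

path = File.join(Dir.tmpdir, "blah.txt")

handle = nil
begin
  # Block form: File.open yields the handle and closes it when the
  # block exits -- normally or via an exception.
  File.open(path, "w") do |a|
    handle = a
    a.write("hi")
    raise "simulated failure mid-write"
  end
rescue RuntimeError
  # The exception still propagates, but the file was closed anyway.
end

puts handle.closed?   # => true
```

With the non-block style, the same simulated failure would skip `a.close` entirely and leak one descriptor per request until the process hits EMFILE.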
Brian Weaver
2008-Jun-01 02:11 UTC
[Mongrel] Error: Mongrel timed out this thread: too many open files
Emmett,

Contrary to what Zed's message seems to imply, there is nothing inherently wrong with code like:

a = File.open("blah.txt")
a.write("hi!")
a.close()

You simply need to understand that if any error occurs during a.write(...) or a similar call, then a.close will not be invoked. If you use error handling like

a = File.open("blah.txt")
begin
  a.write("hi!")
ensure
  a.close()
end

then you ensure that the file is actually closed regardless of any exception. Of course, a block like that is kind of ugly, so it's better to do what Zed suggested and associate a code block with the open call. That way the file is closed even if the block faults; it's just cleaner syntax.

Here are some links that explain it too:

http://www.meshplex.org/wiki/Ruby/File_handling_Input_Output
http://www.math.hokudai.ac.jp/~gotoken/ruby/ruby-uguide/uguide25.html

-- Brian

On Thu, May 29, 2008 at 5:02 PM, Zed A. Shaw <zedshaw at zedshaw.com> wrote:
> [snip]

--
/* insert witty comment here */
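The `begin`/`ensure` pattern from the message above can be exercised directly; a small self-contained sketch (temp-file path assumed) showing that the `ensure` clause closes the file even when the write path raises:

```ruby
require "tmpdir"

path = File.join(Dir.tmpdir, "blah.txt")

a = File.open(path, "w")
begin
  begin
    a.write("hi!")
    raise "simulated disk error"   # pretend something failed after the write
  ensure
    a.close   # runs whether or not an exception was raised
  end
rescue RuntimeError
  # The exception propagated past the ensure; the handle is already closed.
end

puts a.closed?   # => true
```

This is exactly what `File.open` with a block does for you internally, which is why the block form is the idiomatic choice.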
Emmett Shear
2008-Jun-01 04:03 UTC
[Mongrel] Error: Mongrel timed out this thread: too many open files
Looks like I was overloading the mongrels with connections... I took down the number of connections allowed in HAProxy and it looks like the problem went away. So, thanks!

This has uncovered a new problem, though, one that's truly baffling me:

- Start up mongrel instances. Everything is awesome. Site is fast, life is good.
- Wait 30-40 minutes.
- Observe that updates and inserts in the database (postgres) are becoming slow. And by slow, I mean 30-40 seconds for a simple insert or update that previously took less than 0.1 seconds. Load on the DB server itself remains nominal; less than 2 on an 8-core box. No error messages of importance that I can see. Inserts and updates from other sources (script/console, psql) are fast.

This started happening just after switching from fcgi to mongrels. Could it be that something is different about how it handles database connections? Was I relying on some kind of bug before?

E

On Sat, May 31, 2008 at 7:11 PM, Brian Weaver <cmdrclueless at gmail.com> wrote:
> [snip]
Zed A. Shaw
2008-Jun-01 06:07 UTC
[Mongrel] Error: Mongrel timed out this thread: too many open files
On Sat, 31 May 2008 21:03:21 -0700, "Emmett Shear" <emmett at justin.tv> wrote:

> - Observe that updates and inserts in the database (postgres) are becoming
> slow. And by slow, I mean 30-40 seconds for a simple insert or update where
> it previously took less than 0.1 seconds. Load on DB server itself remains
> nominal; less than 2 on an 8 core box. No error messages of importance that
> I can see. Inserts and updates from other sources (script/console, psql)
> are fast.

Well, it sounds like your site already has some traffic. Without getting into a remote debugging session, have you checked your indexes to make sure you're adding the right ones to the right columns?

If you were, say, entering a ton of strings into a DB and then querying for them with insane LIKE clauses, you'd see this kind of behavior. As you added more rows, your app would get slower and slower.

--
Zed A. Shaw
- Hate: http://savingtheinternetwithhate.com/
- Good: http://www.zedshaw.com/
- Evil: http://yearofevil.com/
Emmett Shear
2008-Jun-01 21:23 UTC
[Mongrel] Error: Mongrel timed out this thread: too many open files
At first, I thought I'd messed up something in the database too. But running the *exact* same updates and inserts against the production database, through the console, yields normal, fast results. The *only* place I see these 30-40 second updates/inserts is from mongrels that have been under load for a while; I don't see the slowness when running the exact same things from the console, or from the old FCGI setup.

What could be different about doing the database queries in Mongrel that could cause this? I'm not too clear on exactly how Mongrel differs from FCGI, other than being faster and not using FCGI (the protocol). Could it be possible that the database connections are longer lived, or somehow shared between multiple threads, or something like that?

I start from the assumption that Mongrel does things the right way and that I've made some mistake in configuring my application, but I'm at a loss as to where to start looking.

Thanks,
Emmett

On Sat, May 31, 2008 at 11:07 PM, Zed A. Shaw <zedshaw at zedshaw.com> wrote:
> [snip]
Tikhon Bernstam
2008-Jun-02 13:04 UTC
[Mongrel] Error: Mongrel timed out this thread: too many open files
Hi Emmett,

I think I've seen the problem you've described when using acts_as_ferret + a ferret DRb server (though I'll assume you aren't actually using ferret -- as the inimitable Engine Yard guys pointed out this weekend during one of their talks, ferret is a common cause of problems for their users. I haven't played with ferret in months, btw, so this example might be outdated, but it illustrates a more general problem, I think).

In the ferret case, the problem, I believe, is that when you have some model Foo that uses acts_as_ferret and you call foo.save, the COMMIT on the save transaction occurs *after* the ferret after_create/after_update hooks. So the COMMIT occurs *after* the call to the ferret DRb server. Normally this is ok, but if you are indexing large amounts of text, say, or the DRb server gets busy for whatever reason, we saw that the save transactions can suddenly take a long time.

The example above illustrates a more general point, I think -- be careful with what you're doing in your AR hooks. Again, the problem is that when you save your AR object, that save is wrapped in a transaction, and the COMMIT on that transaction occurs after AR hooks like after_create. To verify this, here's a simple example:

# script/generate model foo && rake db:migrate

class Foo < ActiveRecord::Base
  after_create { sleep 10 }
end

# then from script/console
foo = Foo.create
# now watch your database -- the transaction begins, but the COMMIT
# doesn't occur until after the 10 seconds of sleep

So what plugins are you using? And are you using any interesting AR hooks that could potentially take a long time (like talking to a DRb server or uploading files to S3 as an after_create, for example)?

Best,
Tikhon Bernstam
Co-founder, Scribd.com
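The ordering described above can be sketched without Rails at all. The names below (`with_transaction`, the event log) are illustrative stand-ins, not ActiveRecord API; the point is only that anything run inside the transaction wrapper delays the COMMIT:

```ruby
# Fake transaction wrapper: callbacks fire inside it, so a slow hook
# delays the COMMIT -- the ordering acts_as_ferret users run into.
def with_transaction(events)
  events << "BEGIN"
  yield
  events << "COMMIT"
end

events = []
with_transaction(events) do
  events << "INSERT"
  events << "after_create hook"   # imagine sleep(10) or a DRb call here
end

p events
# => ["BEGIN", "INSERT", "after_create hook", "COMMIT"]
```

Because "after_create hook" lands before "COMMIT", every second the hook spends (sleeping, indexing, talking to DRb) is a second the row stays uncommitted and its locks stay held.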
Nicolas Escobar
2008-Jun-03 14:17 UTC
[Mongrel] Error: Mongrel timed out this thread: too many open files
Tikhon Bernstam wrote:

> I think I've seen the problem you've described when using acts_as_ferret
> + ferret DRb server

I have exactly the same problem. Initially I was running acts_as_ferret with a DRb server in a mongrel_cluster and it was working ok. Then I changed a field in a table and restarted the mongrel_cluster. It was then that it stopped working (same error as posted). I went back to the backup version and dropped the field that I had created, but the same thing happens. In the 'development' environment with a single instance of mongrel it works, though, using acts_as_ferret and the DRb server.

--
Posted via http://www.ruby-forum.com/.