Hi list, I''m seeing a couple of issues with Mongrel. I''m running FreeBSD 6.1 and have previously been told that there are known conflicts between this and Mongrel, yet I hope these issues will be resolved with time. I''m overloading Mongrel with httperf on my local workstation. Mongrel is started directly with the mongrel_rails command and there is only one mongrel running. Soon I will start seeing: Server overloaded with N processors (M max). Dropping connection. Wed Sep 13 00:03:05 CEST 2006: Reaping N threads for slow workers because of ''max processors'' Now this is fine, but sooner or later I start getting the following in the mongrel.log: Thread #<Thread:0x8f1a364 sleep> is too old, killing. Wed Sep 13 00:34:42 CEST 2006: Error calling Dispatcher.dispatch #<Sync_m::Err::UnknownLocker: Thread(#<Thread:0x8f1a364 run>) not locked.> /usr/local/lib/ruby/1.8/sync.rb:57:in `Fail'' /usr/local/lib/ruby/1.8/sync.rb:63:in `Fail'' /usr/local/lib/ruby/1.8/sync.rb:183:in `sync_unlock'' /usr/local/lib/ruby/1.8/sync.rb:231:in `synchronize'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/rails.rb:83:in `process'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:580:in `process_client'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:579:in `process_client'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:686:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:686:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:673:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:267:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:266:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:127:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/command.rb:203:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:231 /usr/local/bin/mongrel_rails:18 Often I get many, many of these in succession. Usually mongrel will stop responding shortly after these start showing up, but not always. Now when Mongrel dies it sometimes uses up all of the CPU time it can get it''s hands on. I tried to get a backtrace from gdb by attaching to the running process and asking nicely for a backtrace, here''s what gdb tells me (not sure if it really tells that much): #0 0x2823e9fb in nanosleep () from /lib/libc.so.6 #1 0x281823fe in _nanosleep () from /usr/lib/libpthread.so.2 #2 0x28182602 in nanosleep () from /usr/lib/libpthread.so.2 #3 0x281790eb in select () from /usr/lib/libpthread.so.2 #4 0x280ad34c in rb_thread_wait_for () from /usr/local/lib/libruby18.so.18 #5 0x280e4dec in rb_f_sleep () from /usr/local/lib/libruby18.so.18 #6 0x280a40ac in rb_call0 () from /usr/local/lib/libruby18.so.18 #7 0x280a4c5d in rb_call () from /usr/local/lib/libruby18.so.18 #8 0x2809f369 in rb_eval () from /usr/local/lib/libruby18.so.18 #9 0x280a2444 in rb_yield_0 () from /usr/local/lib/libruby18.so.18 #10 0x280a285e in rb_f_loop () from /usr/local/lib/libruby18.so.18 #11 0x280a40be in rb_call0 () from /usr/local/lib/libruby18.so.18 #12 0x280a4c5d in rb_call () from /usr/local/lib/libruby18.so.18 #13 0x2809f369 in rb_eval () from /usr/local/lib/libruby18.so.18 #14 0x2809e275 in rb_eval () from /usr/local/lib/libruby18.so.18 #15 0x280a2444 in rb_yield_0 () from /usr/local/lib/libruby18.so.18 #16 0x280ae89f in rb_thread_yield () from /usr/local/lib/libruby18.so.18 #17 0x280ae5ed in rb_thread_start_0 () from /usr/local/lib/libruby18.so.18 #18 0x280ae9a7 in rb_thread_initialize () from /usr/local/lib/libruby18.so.18 #19 0x280a4098 in rb_call0 () from /usr/local/lib/libruby18.so.18 #20 0x280a4c5d in rb_call () from /usr/local/lib/libruby18.so.18 #21 0x280a507d in rb_funcall2 () from /usr/local/lib/libruby18.so.18 #22 0x280a785e in rb_obj_call_init () from /usr/local/lib/libruby18.so.18 #23 0x280ae8dc in rb_thread_s_new () from /usr/local/lib/libruby18.so.18 #24 0x280a40ac in rb_call0 () from /usr/local/lib/libruby18.so.18 #25 0x280a4c5d in rb_call () from /usr/local/lib/libruby18.so.18 #26 0x2809f11c in rb_eval () from /usr/local/lib/libruby18.so.18 #27 0x2809e275 in rb_eval () from /usr/local/lib/libruby18.so.18 #28 0x2809fe04 in rb_eval () from /usr/local/lib/libruby18.so.18 #29 0x280a480e in rb_call0 () from /usr/local/lib/libruby18.so.18 #30 0x280a4c5d in rb_call () from /usr/local/lib/libruby18.so.18 #31 0x2809f11c in rb_eval () from /usr/local/lib/libruby18.so.18 #32 0x280a480e in rb_call0 () from /usr/local/lib/libruby18.so.18 #33 0x280a4c5d in rb_call () from /usr/local/lib/libruby18.so.18 #34 0x2809f11c in rb_eval () from /usr/local/lib/libruby18.so.18 #35 0x280a480e in rb_call0 () from /usr/local/lib/libruby18.so.18 #36 0x280a4c5d in rb_call () from /usr/local/lib/libruby18.so.18 #37 0x2809f11c in rb_eval () from /usr/local/lib/libruby18.so.18 #38 0x2809daae in rb_eval () from /usr/local/lib/libruby18.so.18 #39 0x2809ac9d in eval_node () from /usr/local/lib/libruby18.so.18 #40 0x280a6588 in rb_load () from /usr/local/lib/libruby18.so.18 #41 0x280a6842 in rb_f_load () from /usr/local/lib/libruby18.so.18 #42 0x280a40ac in rb_call0 () from /usr/local/lib/libruby18.so.18 #43 0x280a4c5d in rb_call () from /usr/local/lib/libruby18.so.18 #44 0x2809f369 in rb_eval () from /usr/local/lib/libruby18.so.18 #45 0x2809ac9d in eval_node () from /usr/local/lib/libruby18.so.18 #46 0x2809b229 in ruby_exec_internal () from /usr/local/lib/libruby18.so.18 #47 0x2809b28d in ruby_exec () from /usr/local/lib/libruby18.so.18 #48 0x2809b2c1 in ruby_run () from /usr/local/lib/libruby18.so.18 #49 0x08048649 in main () This corresponds to what top tells me (nanslp). Additionally I tried reattaching to the process a couple of times with gdb, and each time the backtrace looked the same. I''ve been trying to dig through the Mongrel code but I haven''t fully grasped what''s going on with all the threads in there, so I hope someone more insightful might be able to solve these issues. A couple of further observations: The above seems to be replicating whether I run with numprocs set to 1 or to 1024. Also I''m a bit surprised that I can from time to time experience that there are one more processor than the max allowed, I assume this is due to the threading and most likely is not a problem. I would really like to get this issue resolved so I can get back to using Mongrel for production on FreeBSD and if necessary I am willing to spend some time trying to resolve these issues, but I will probably need some help if I am to succeed. So please let me know if I can provide any further information or give me a pointer to where to go from here. -- Cheers, - Jacob Atzen
On Wed, 2006-09-13 at 16:00 +0200, Jacob Atzen wrote:> Hi list, > > I''m seeing a couple of issues with Mongrel. I''m running FreeBSD 6.1 and > have previously been told that there are known conflicts between this > and Mongrel, yet I hope these issues will be resolved with time. > > I''m overloading Mongrel with httperf on my local workstation. Mongrel is > started directly with the mongrel_rails command and there is only one > mongrel running. Soon I will start seeing: >This guy walks into the doctors office and says, "Doc! It hurts when I do this." He then sticks his finger in his eye and gouges out his cornea and eats it. The doctor yells, "What the hell did you do that for?!" Guy says, "So I can test how much damage my eye can take. See, it sucks, I need a stronger eye." In other words, congratulations you''ve overloaded Mongrel with httperf. That''s not too hard since Ruby only has 1024 files and you''re probably hitting a rails action that opens loads and loads of files or just plain maxing it out. The way mongrel defends itself is to try closing things off, but there''s not much it actually can do. It''s at the mercy of Ruby and that''s that. End of story. So, there''s no "issue" other than you need: 1) To tune your rails actions to be faster. 2) More mongrel processes. 3) To not do this to mongrel. 4) Maybe more machines if this is the load you have to handle. You should also get the 0.3.13.4 pre-release and use the USR1 logging (mentioned on this list many times) to debug which rails action is taking the longest and holding up the other threads. Then tune that rails action up to service the requests faster. Hope that helps. -- Zed A. Shaw http://www.zedshaw.com/ http://mongrel.rubyforge.org/ http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.
On Wed, Sep 13, 2006 at 03:24:25PM -0700, Zed Shaw wrote:> On Wed, 2006-09-13 at 16:00 +0200, Jacob Atzen wrote: > > Hi list, > > > > I''m seeing a couple of issues with Mongrel. I''m running FreeBSD 6.1 and > > have previously been told that there are known conflicts between this > > and Mongrel, yet I hope these issues will be resolved with time. > > > > I''m overloading Mongrel with httperf on my local workstation. Mongrel is > > started directly with the mongrel_rails command and there is only one > > mongrel running. Soon I will start seeing: > > > > This guy walks into the doctors office and says, "Doc! It hurts when I > do this." He then sticks his finger in his eye and gouges out his > cornea and eats it. The doctor yells, "What the hell did you do that > for?!" Guy says, "So I can test how much damage my eye can take. See, > it sucks, I need a stronger eye."I will try to explain my motivation for the tests performed. I was seeing Mongrel processes freezing in the sense that they stopped handling incoming requests on a production system. As part of the debugging of this I tried to create a setup where I minimized the external factors (Apache, Rails to a certain extent) and putting load on this to see if I could reproduce this behaviour. Now if anyone has suggestions on better ways of doing this I''m all ears.> In other words, congratulations you''ve overloaded Mongrel with httperf. > That''s not too hard since Ruby only has 1024 files and you''re probably > hitting a rails action that opens loads and loads of files or just plain > maxing it out.I''m seeing the same behaviour with 256 threads and lsof shows a stable amount of files opened by Ruby (around 290) throughout the running of the test. Additionally sockstat shows 255 sockets opened by Ruby throughout. My rails action is trivial, this is the implementation: def test render :text => "Just testing" end> The way mongrel defends itself is to try closing things off, but there''s > not much it actually can do. It''s at the mercy of Ruby and that''s that. > End of story. > > So, there''s no "issue" other than you need: > > 1) To tune your rails actions to be faster. > 2) More mongrel processes. > 3) To not do this to mongrel. > 4) Maybe more machines if this is the load you have to handle. > > You should also get the 0.3.13.4 pre-release and use the USR1 logging > (mentioned on this list many times) to debug which rails action is > taking the longest and holding up the other threads. Then tune that > rails action up to service the requests faster. > > Hope that helps.Of course I should have mentioned the following in my previous post, I''m sorry for the omission: - I am running the 0.3.13.4 pre-release fetched this tuesday. - The Rails action is minimized as shown above. - I am using USR1 debugging and seeing output resembling: Wed Sep 13 00:34:36 CEST 2006: 156 threads sync_waiting for /pages/test, 1000 still active in mongrel. Now Rails is telling me it can complete 1000+ requests a second, so I interpret this number as it''s Ruby/Mongrel/Whatever that''s using the majority of the time processing the requests. Of course my interpretation may be wrong. I''m not really concerned about performance. The problem is that Mongrel from time to time will end up in what appears to be an endless loop where it stops responding to anything besides a kill -9. I''ve noticed that both the open socket count and the open file count stays constant when this happens, which to me is a further indication that something is stuck somewhere. Finally the problem occurs after very different amounts of requests, sometimes it will show after only 30-40000 hits from httperf, sometimes I can get 10 times that. I will not start guessing as to why this is. I don''t mind Mongrel trying to defend itself and closing things off. And if there''s something inherent in Ruby or whatever which means that if Mongrel gets loaded it may suddenly stop responding I will just go back to mod_fcgid. In conclusion I believe there is an issue, namely that Mongrel from time to time freezes totally and never recovers. I hope the above explanation clarifies my position. If you do not think this is an issue that''s fine and I will not pursue the matter further. -- Sincerely, - Jacob Atzen
I am wondering if this might arise because of a potential race condition within Sync? A ''Thread.critical = true'' does not affect threads that did not exist at the time that it was called, and the lock() in Sync does quite a bit of work between when it declares a critical area and when a thread either gets the lock or does not. If a bunch of threads are killed very quickly, one of the remaining threads will get into that thread critical section, but I think it may be possible for a newly spawned thread to also get into there before the other one finishes its work. If that happens, and two threads think they have the lock, while Sync thinks only one does, then when the one that does not have the lock calls unlock, you get the error that you reported. Jacob, I have attached a version of thread.rb that makes a slight change to Mutex to avoid the memory management issues that the stock one runs into with regard to Array. It is otherwise identical to the stock thread.rb. I am curious if you replace your thread.rb with this one, and flip Mongrel back to using Mutex, if you can still get this same failure mode to happen? Kirk Haines -------------- next part -------------- A non-text attachment was scrubbed... Name: thread.rb Type: application/x-ruby Size: 8406 bytes Desc: not available Url : http://rubyforge.org/pipermail/mongrel-users/attachments/20060914/1195f96a/attachment.bin
On Thu, Sep 14, 2006 at 08:07:38AM -0600, Kirk Haines wrote:> I am wondering if this might arise because of a potential race > condition within Sync? A ''Thread.critical = true'' does not affect > threads that did not exist at the time that it was called, and the > lock() in Sync does quite a bit of work between when it declares a > critical area and when a thread either gets the lock or does not. > > If a bunch of threads are killed very quickly, one of the remaining > threads will get into that thread critical section, but I think it may > be possible for a newly spawned thread to also get into there before > the other one finishes its work. > > If that happens, and two threads think they have the lock, while Sync > thinks only one does, then when the one that does not have the lock > calls unlock, you get the error that you reported. > > Jacob, I have attached a version of thread.rb that makes a slight > change to Mutex to avoid the memory management issues that the stock > one runs into with regard to Array. It is otherwise identical to the > stock thread.rb. I am curious if you replace your thread.rb with this > one, and flip Mongrel back to using Mutex, if you can still get this > same failure mode to happen?How do I get Mongrel to use Mutex? -- Cheers, - Jacob Atzen
On 9/14/06, Jacob Atzen <jacob at jacobatzen.dk> wrote:> How do I get Mongrel to use Mutex?In your mongrel/rails.rb, right at the beginning: require ''sync'' Comment that out and add require ''thread'' Use the thread lib I sent to you, as it should fix the problem that caused Zed to want to switch from Mutex to Sync in the first place. Then, just below that, in this code: class RailsHandler < Mongrel::HttpHandler attr_reader :files attr_reader :guard @@file_only_methods = ["GET","HEAD"] def initialize(dir, mime_map = {}) @files = Mongrel::DirHandler.new(dir,false) @guard = Sync.new That last line, comment it out and add: @guard = Mutex.new Then try your test again and see if you can reproduce that mode of failure. Kirk Haines
On Thu, Sep 14, 2006 at 10:04:56AM -0600, Kirk Haines wrote:> On 9/14/06, Jacob Atzen <jacob at jacobatzen.dk> wrote: > > > How do I get Mongrel to use Mutex? > > In your mongrel/rails.rb, right at the beginning: > > require ''sync'' > > Comment that out and add > > require ''thread'' > > Use the thread lib I sent to you, as it should fix the problem that > caused Zed to want to switch from Mutex to Sync in the first place. > > Then, just below that, in this code: > > class RailsHandler < Mongrel::HttpHandler > attr_reader :files > attr_reader :guard > @@file_only_methods = ["GET","HEAD"] > > def initialize(dir, mime_map = {}) > @files = Mongrel::DirHandler.new(dir,false) > @guard = Sync.new > > That last line, comment it out and add: > > @guard = Mutex.new > > Then try your test again and see if you can reproduce that mode of failure.A little heads-up before the weekend, I''ve been running with the above modifications and a few more to make the mutex calls run. So far I''ve been processing ~28000 rails requests on a mongrel with a limit of 64 processors. I will perform further testing and supply a full diff of my modifications early next week. -- Cheers, - Jacob Atzen
On Fri, Sep 15, 2006 at 03:02:11PM +0200, Jacob Atzen wrote:> A little heads-up before the weekend, I''ve been running with the above > modifications and a few more to make the mutex calls run. So far I''ve > been processing ~28000 rails requests on a mongrel with a limit of 64 > processors. I will perform further testing and supply a full diff of my > modifications early next week.I have now performed some more extensive testing based on the suggestions by Kirk and have found Mongrel crashing when I raise the number of processors. I have attached a diff of my altarations to rails.rb in case anyone is interested. So alas, this did not cure the problems. I will try to upgrade mongrel to the new release and see how that holds up. Here is a snippet from mongrel.log up to the point where Mongrel crashed: Server overloaded with 512 processors (512 max). Dropping connection. Fri Sep 15 21:45:46 CEST 2006: Reaping 512 threads for slow workers because of ''max processors'' Thread #<Thread:0x8dc971c sleep> is too old, killing. Fri Sep 15 21:45:46 CEST 2006: Error calling Dispatcher.dispatch #<Mongrel::TimeoutError: Mongrel timed out this thread: max processors> /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:637:in `lock'' /usr/local/lib/ruby/1.8/thread.rb:133:in `synchronize'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/rails.rb:88:in `process'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:580:in `process_client'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:579:in `process_client'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:686:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:686:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:673:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:267:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:266:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:127:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/command.rb:203:in `run'' /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:231 /usr/local/bin/mongrel_rails:18 Thread #<Thread:0x8fcf14c run> is too old, killing. Fri Sep 15 21:45:46 CEST 2006: ERROR: Mongrel timed out this thread: max processors Thread #<Thread:0x8cc6b6c dead> is too old, killing. Thread #<Thread:0x8d15050 dead> is too old, killing. Thread #<Thread:0x8f294e0 dead> is too old, killing. Thread #<Thread:0x8f6c394 dead> is too old, killing. Thread #<Thread:0x8fac7c8 dead> is too old, killing. Thread #<Thread:0x8fc41c0 dead> is too old, killing. Thread #<Thread:0x8fc8590 dead> is too old, killing. Thread #<Thread:0x8fd0830 run> is too old, killing. /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:637:in `run'': Mongrel timed out this thread: max processors (Mongrel::T imeoutError) from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:686:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:673:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:267:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:266:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:127:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/command.rb:203:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:231 from /usr/local/bin/mongrel_rails:18 -- Cheers, - Jacob Atzen -------------- next part -------------- 9c9 < require ''sync'' ---> require ''thread''45c45 < @guard = Sync.new ---> @guard = Mutex.new83c83 < @guard.synchronize(:EX) { ---> @guard.synchronize {100c100 < STDERR.puts "#{Time.now}: #{@guard.sync_waiting.length} threads sync_waiting for #{event}, #{self.listener.workers.list.length} still active in mongrel." ---> #STDERR.puts "#{Time.now}: #{@guard.sync_waiting.length} threads sync_waiting for #{event}, #{self.listener.workers.list.length} still active in mongrel."