Hello all, For the past couple of weeks I have been spending some time debugging a couple of issues I was having with Mongrel when I put load on it. I have seen two distinct issues: 1. Mongrel stopped responding as if in an endless loop. 2. Mongrel crashed when severely loaded. I believe to have resolved these two issues and have attached patches which shows the resolution (simple as it is). Explanation of the patches is given below. The first problem is handled by the patch to sync.rb from the standard library. What is happening here is that when sync_unlock is called Thread.critical is set to true. Now if the thread is not the sync_ex_locker an exception is thrown without Thread.critical being set to false. This in turn resulted in a situation where the mongrel_sleeper_thread (configurator.rb:270) was the only thread getting back on the cpu and Thread.critical stayed true. The patch simply ensures that Thread.critical is set to false upon leaving sync.rb. I am not sure if this is really the correct way to handle this issue though. As some famous programmers have been known to say "select() ain''t broken" so I''m not really sure what to think of this. The second problem stems from the fact that Mongrel uses the Thread#abort_on_exception. I''m not sure why this is even in there, as the documentation says: When set to true, causes all threads (including the main program) to abort if an exception is raised in thr. The process will effectively exit(0). The patch simply removes the abort_on_exception from mongrel.rb. After applying this patch I have been unable to make Mongrel crash. Finally I have provided a debug patch for the Sync library which simply adds a lot of debug output to STDERR. I believe it might be of use in future performance optimizations as there seems to be happening a lot of work managing the queued up clients. -- Cheers, - Jacob Atzen -------------- next part -------------- Index: lib/mongrel.rb ==================================================================--- lib/mongrel.rb (revision 353) +++ lib/mongrel.rb (working copy) @@ -687,7 +687,6 @@ reap_dead_workers("max processors") else thread = Thread.new(client) {|c| process_client(c) } - thread.abort_on_exception = true thread[:started_on] = Time.now @workers.add(thread) -------------- next part -------------- --- sync.rb Sun Oct 1 21:02:28 2006 +++ sync.new.rb Sun Oct 1 21:05:28 2006 @@ -131,8 +131,10 @@ def sync_try_lock(mode = EX) return unlock if sync_mode == UN + print_critical("sync_try_lock", "1", "true") Thread.critical = true ret = sync_try_lock_sub(sync_mode) + print_critical("sync_try_lock", "2", "false") Thread.critical = false ret end @@ -140,22 +142,27 @@ def sync_lock(m = EX) return unlock if m == UN - until (Thread.critical = true; sync_try_lock_sub(m)) + until (print_critical("sync_lock", "1", "true"); Thread.critical = true; sync_try_lock_sub(m)) if sync_sh_locker[Thread.current] sync_upgrade_waiting.push [Thread.current, sync_sh_locker[Thread.current]] sync_sh_locker.delete(Thread.current) else + STDERR.print "[sync_lock:2] Pushing #{Thread.current.inspect} behind #{sync_waiting.size} others\n" sync_waiting.push Thread.current end + print_critical("sync_lock", "3", "false") Thread.stop end + print_critical("sync_lock", "4", "false") Thread.critical = false self end def sync_unlock(m = EX) + print_critical("sync_unlock", "1", "true") Thread.critical = true if sync_mode == UN + print_critical("sync_unlock", "2", "false") Thread.critical = false Err::UnknownLocker.Fail(Thread.current) end @@ -165,6 +172,7 @@ runnable = false case m when UN + print_critical("sync_unlock", "3", "false") Thread.critical = false Err::UnknownLocker.Fail(Thread.current) @@ -173,13 +181,17 @@ if (self.sync_ex_count = sync_ex_count - 1) == 0 self.sync_ex_locker = nil if sync_sh_locker.include?(Thread.current) + STDERR.print "[sync_unlock] Setting sync_mode = SH\n" self.sync_mode = SH else + STDERR.print "[sync_unlock] Setting sync_mode = UN\n" self.sync_mode = UN end runnable = true end else + # Patching criticalities when exceptions are thrown + print_critical("sync_unlock", "4", "false") Thread.critical = false Err::UnknownLocker.Fail(Thread.current) end @@ -191,6 +203,7 @@ if (sync_sh_locker[Thread.current] = count - 1) == 0 sync_sh_locker.delete(Thread.current) if sync_sh_locker.empty? and sync_ex_count == 0 + STDERR.print "[sync_unlock] Setting sync_mode = UN\n" self.sync_mode = UN runnable = true end @@ -205,6 +218,11 @@ end wait = sync_upgrade_waiting self.sync_upgrade_waiting = [] + for w, v in wait + STDERR.print "[sync_unlock:5] Starting thread #{w.inspect}\n" + end + + print_critical("sync_unlock", "6", "false") Thread.critical = false for w, v in wait @@ -213,22 +231,31 @@ else wait = sync_waiting self.sync_waiting = [] + print_critical("sync_unlock", "7", "false") Thread.critical = false for w in wait + STDERR.print "[sync_unlock:8] Running #{w.inspect}\n" w.run end end end - + print_critical("sync_unlock", "9", "false") Thread.critical = false self end + def print_critical(method, count, bool) + STDERR.print "[#{method}:#{count}] Thread.critical = #{bool} #{Thread.current.inspect}\n" + end + def sync_synchronize(mode = EX) begin + STDERR.print "[sync_synchronize] Getting lock #{Thread.current.inspect}\n" sync_lock(mode) + STDERR.print "[sync_synchronize] Yielding #{Thread.current.inspect}\n" yield ensure + STDERR.print "[sync_synchronize] Unlocking #{Thread.current.inspect}\n" sync_unlock end end @@ -292,6 +319,7 @@ ret = false end else + print_critical("sync_try_lock_sub", "1", "false") Thread.critical = false Err::LockModeFailer.Fail mode end -------------- next part -------------- --- sync.orig.rb Sun Oct 1 20:57:39 2006 +++ sync.rb Sun Oct 1 21:02:28 2006 @@ -180,6 +180,7 @@ runnable = true end else + Thread.critical = false Err::UnknownLocker.Fail(Thread.current) end
On Sun, 1 Oct 2006 21:38:12 +0200 Jacob Atzen <jacob at jacobatzen.dk> wrote:> Hello all, > > For the past couple of weeks I have been spending some time debugging a > couple of issues I was having with Mongrel when I put load on it. I have > seen two distinct issues: > > 1. Mongrel stopped responding as if in an endless loop. > 2. Mongrel crashed when severely loaded. >Cool, glad you took the time to figure this out.> I believe to have resolved these two issues and have attached patches > which shows the resolution (simple as it is). Explanation of the patches > is given below. > > The first problem is handled by the patch to sync.rb from the standard > library. What is happening here is that when sync_unlock is called > Thread.critical is set to true. Now if the thread is not the > sync_ex_locker an exception is thrown without Thread.critical being set > to false. This in turn resulted in a situation where the > mongrel_sleeper_thread (configurator.rb:270) was the only thread getting > back on the cpu and Thread.critical stayed true. The patch simply > ensures that Thread.critical is set to false upon leaving sync.rb. >Ok, is there a way to fix this without having people backpatch their ruby? Also, why were you the only one having this problem? I''d like to know how the error is caused if you could explain it.> I am not sure if this is really the correct way to handle this issue > though. As some famous programmers have been known to say "select() > ain''t broken" so I''m not really sure what to think of this. >Interesting side note is that when you mix threads and select in ruby then the interpreter can randomly decide it''s in deadlock without much explanation. Damned if you do, also if you don''t.> The second problem stems from the fact that Mongrel uses the > Thread#abort_on_exception. I''m not sure why this is even in there, as > the documentation says: > > When set to true, causes all threads (including the main > program) to abort if an exception is raised in thr. The process > will effectively exit(0). > > The patch simply removes the abort_on_exception from mongrel.rb. After > applying this patch I have been unable to make Mongrel crash. >The abort was put in there to catch exceptions that are "leaking" through and not being caught. I''m curious what exception was being thrown that *none* of the damn begin/rescue blocks catches. We seriously need a begin/rescue-every-damn-thing-no-matter-what construct that actually works. Now, I believe the problem you''ll have is when this exception leaks through that thread will become dead and eventually you''ll fill up and Mongrel will die anyway. If you can, try to find out what exception causes this so I can have the server kill off its threads properly by handling this yet another annoying random exception.> Finally I have provided a debug patch for the Sync library which simply > adds a lot of debug output to STDERR. I believe it might be of use in > future performance optimizations as there seems to be happening a lot of > work managing the queued up clients.This might not be needed as the ruby-core guys finally started taking a serious look at how array works and we can probably switch back to Mutex in the near future. Thanks again Jacob, if you can answer the questions I had so I can work on a fix that doesn''t involve updating ruby. Also an explanation as to why you were having these problems will help people decide if they should apply the patches too. -- Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu http://www.zedshaw.com/ http://mongrel.rubyforge.org/ http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.
On Sun, Oct 01, 2006 at 04:39:07PM -0700, Zed A. Shaw wrote:> Ok, is there a way to fix this without having people backpatch their > ruby? Also, why were you the only one having this problem? I''d like > to know how the error is caused if you could explain it.I''m not sure about this. I''ll investigate further and see what I can come up with.> > The second problem stems from the fact that Mongrel uses the > > Thread#abort_on_exception. I''m not sure why this is even in there, as > > the documentation says: > > > > When set to true, causes all threads (including the main > > program) to abort if an exception is raised in thr. The process > > will effectively exit(0). > > > > The patch simply removes the abort_on_exception from mongrel.rb. After > > applying this patch I have been unable to make Mongrel crash. > > > > The abort was put in there to catch exceptions that are "leaking" > through and not being caught. I''m curious what exception was being > thrown that *none* of the damn begin/rescue blocks catches. We > seriously need a begin/rescue-every-damn-thing-no-matter-what > construct that actually works. > > Now, I believe the problem you''ll have is when this exception leaks > through that thread will become dead and eventually you''ll fill up and > Mongrel will die anyway. If you can, try to find out what exception > causes this so I can have the server kill off its threads properly by > handling this yet another annoying random exception.I''m not sure this is really a problem. It seems that threads are killed of nicely anyhow. But I''ll dig into it and report back on this one too. -- Cheers, - Jacob Atzen
On Sun, Oct 01, 2006 at 04:39:07PM -0700, Zed A. Shaw wrote:> > The second problem stems from the fact that Mongrel uses the > > Thread#abort_on_exception. I''m not sure why this is even in there, as > > the documentation says: > > > > When set to true, causes all threads (including the main > > program) to abort if an exception is raised in thr. The process > > will effectively exit(0). > > > > The patch simply removes the abort_on_exception from mongrel.rb. After > > applying this patch I have been unable to make Mongrel crash. > > > > The abort was put in there to catch exceptions that are "leaking" > through and not being caught. I''m curious what exception was being > thrown that *none* of the damn begin/rescue blocks catches. We > seriously need a begin/rescue-every-damn-thing-no-matter-what > construct that actually works.Some further digging gives another pointer towards the problem. I am seeing some backtraces that look like this (beware, the line numbers might be a little off): /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:646:in `run'': Mongrel timed out this thread: max processors#<Thread:0x8f28d9c run> (Mongrel::TimeoutError) from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:704:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:684:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:268:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:266:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:127:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/command.rb:211:in `run'' from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:231 from /usr/local/bin/mongrel_rails:18 The mongrel.rb:646 is the w.raise() call and 704 is: thread = Thread.new(client) { |c| What I make of this is that somehow the thread is put aside as soon as it''s created, that is _before_ the process_client() call and when the thread is then killed off there''s nothing to catch the exception. Now I am seeing exceptions being cast at line 646 before this one, but they have different backtraces, typically involving sync.rb and seems to be handled more properly. As far as I can tell they are all handled by the "Error calling Dispatcher.dispatch" rescue block. Another interesting note is that after an exception is cast at this exact point and with the above backtrace all processes go into the aborting state. And when they''re all killed off Mongrel dies. It''s actually quite clear as the following log exerpt will show: % grep -ni timeout log/mongrel.14.log 127011:Tue Oct 03 21:57:28 CEST 2006: Error calling Dispatcher.dispatch #<Mongrel::TimeoutError: Mongrel timed out this thread: max processors#<Thread:0x8fe760c run>> from #<Thread:0x8fe760c run> 203806:/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:646:in `run'': Mongrel timed out this thread: max processors#<Thread:0x8f28d9c run> (Mongrel::TimeoutError) % grep -n "aborting" log/mongrel.14.log | head -n 3 203815:[sync_synchronize] Unlocking #<Thread:0x8f3a72c aborting> 203816:[sync_unlock:1] Thread.critical = true #<Thread:0x8f3a72c aborting> 203817:[sync_unlock:4] Thread.critical = false #<Thread:0x8f3a72c aborting> I have tried adding a begin/rescue/end around the process_client() call like this: begin process_client(thread_client); rescue Object STDERR.print "ERROR: RESCUING BEFORE CALL\n" end And it seems to be doing the job. At least I haven''t been able to make Mongrel crash yet, with whatever little testing I''ve been doing. Yet I have seen the error message show up a couple of times. What are your thoughts on this? Does it make sense?> Now, I believe the problem you''ll have is when this exception leaks > through that thread will become dead and eventually you''ll fill up and > Mongrel will die anyway. If you can, try to find out what exception > causes this so I can have the server kill off its threads properly by > handling this yet another annoying random exception.Actually from what I''ve seen so far, the threads seems to be killed of properly, if somewhat slow. I''m printing the process list in every run of the sleeper thread, and from what I can see Mongrel will clean out the threads even when the exception leaks through. From my understanding when an exception gets to the top level of the thread, the thread is simply killed off. Feel free to correct me if I''m wrong. It does seem that some threads keep lingering in Mongrel even after all requests have died. But this seems to be the case whether the abort flag is set or not. It looks as if some enter an eternal sleeping state. I don''t think it''s much of a problem stability wise though, as they''ll just get killed of whenever the reaper gets to work.> This might not be needed as the ruby-core guys finally started taking > a serious look at how array works and we can probably switch back to > Mutex in the near future.I have no idea how Mutex handles locking, but sync seems to be doing a whole lot of work over and over again. It seems to me it''s really meant for smaller locking queues.> Thanks again Jacob, if you can answer the questions I had so I can > work on a fix that doesn''t involve updating ruby. Also an explanation > as to why you were having these problems will help people decide if > they should apply the patches too.I''m still not sure why I''m the only one seeing these problems. Maybe others are seing them too and just not being aware of them. Maybe they only show up when the Mongrels gets severely loaded. Maybe I''m simply the only one butchering poor Mongrels for fun in my spare-time? ;-) In regards to the sync issues the jury is still out on that one. I will need to get a deeper understanding of how sync works to get to the bottom of it, but I''ll continue poking around. -- Cheers, - Jacob Atzen
On 10/3/06, Jacob Atzen <jacob at jacobatzen.dk> wrote:> I have no idea how Mutex handles locking, but sync seems to be doing a > whole lot of work over and over again. It seems to me it''s really meant > for smaller locking queues.They handle locking in more or less the same way. What is really substantially different is the unlocking. Sync goes and wakes up the entire queue of waiting threads. They scramble for the lock again, and all but one (probably the first one to be woken up) fail to get it and are relocked. If you have a big thread queue, it is a lot of extra work. Mutex, on the other hand, just handles it one thread at a time, so it is _MUCH_ more efficient with the unlocking.> In regards to the sync issues the jury is still out on that one. I will > need to get a deeper understanding of how sync works to get to the > bottom of it, but I''ll continue poking around.That with Mutex. It''t very simple to understand. Then go take a look at Sync. You have to shift through an uglier, more antiquated coding style, and there is a bunch of extra code to deal with the different types of locks that Sync supports, but if you filter through that to find the lock and unlock pieces of code for exclusive locks, you''ll quickly start to understand the differences. Mutex is also so very simple that you''ll see why I prefer to use a custom version that works around the Array memory mismanagement that bites the standard one over Sync, which is ugly enough to make the baby Jesus cry. Kirk Haines
On 10/3/06, Jacob Atzen <jacob at jacobatzen.dk> wrote:> > I''m still not sure why I''m the only one seeing these problems. Maybe > others are seing them too and just not being aware of them. Maybe they > only show up when the Mongrels gets severely loaded. Maybe I''m simply > the only one butchering poor Mongrels for fun in my spare-time? ;-)I see the same behavior with a handler that does some slow, blocking IO (resolving symlinks on heavily loaded NFS servers). I worked around it by using more Mongrels instead of more worker threads and by backgrounding one long-running task which could safely send an HTTP response before completing. It''d be cool if Mongrel forked itself (up to a max # of processes) to alleviate the issues that arise with Rails handler locking and with blocking IO due to Ruby''s green threads. Mongrel cluster could work but it feels like overkill when forking would do the job with zero config (minus Win32). jeremy -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/mongrel-users/attachments/20061003/dcd4c6cf/attachment.html
On 10/3/06, Jeremy Kemper <jeremy at bitsweat.net> wrote:> On 10/3/06, Jacob Atzen <jacob at jacobatzen.dk> wrote: > > I''m still not sure why I''m the only one seeing these problems. Maybe > > others are seing them too and just not being aware of them. Maybe they > > only show up when the Mongrels gets severely loaded. Maybe I''m simply > > the only one butchering poor Mongrels for fun in my spare-time? ;-) > > I see the same behavior with a handler that does some slow, blocking IO > (resolving symlinks on heavily loaded NFS servers). I worked around it by > using more Mongrels instead of more worker threads and by backgrounding one > long-running task which could safely send an HTTP response before > completing. > > It''d be cool if Mongrel forked itself (up to a max # of processes) to > alleviate the issues that arise with Rails handler locking and with > blocking IO due to Ruby''s green threads. Mongrel cluster could work but it > feels like overkill when forking would do the job with zero config (minus > Win32).Jeremy, I''m working on a improved mongrel_service (for win32) that will handle the mongrel_cluster scenario in a simplified way. Still, configuration will be required, but each time you need to change the parameters for your mongrels, wouldn''t need to remove/install cycle, just change the config file and restart the service. Also, with the new schema will get child process monitoring and re spawning if they are needed. (Still, don''t know how to differentiate a sleeping-dead-zombie mongrel from the rest, but guess will figure it out ;)> > jeremy > _______________________________________________ > Mongrel-users mailing list > Mongrel-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mongrel-users > >-- Luis Lavena Multimedia systems - Leaders are made, they are not born. They are made by hard effort, which is the price which all of us must pay to achieve any goal that is worthwhile. Vince Lombardi
"Zed A. Shaw" <zedshaw at zedshaw.com> writes:> On Sun, 1 Oct 2006 21:38:12 +0200 > Jacob Atzen <jacob at jacobatzen.dk> wrote: > >> Hello all, >> >> For the past couple of weeks I have been spending some time debugging a >> couple of issues I was having with Mongrel when I put load on it. I have >> seen two distinct issues: >> >> 1. Mongrel stopped responding as if in an endless loop. >> 2. Mongrel crashed when severely loaded. >> > > Cool, glad you took the time to figure this out. > >> I believe to have resolved these two issues and have attached patches >> which shows the resolution (simple as it is). Explanation of the patches >> is given below. >> >> The first problem is handled by the patch to sync.rb from the standard >> library. What is happening here is that when sync_unlock is called >> Thread.critical is set to true. Now if the thread is not the >> sync_ex_locker an exception is thrown without Thread.critical being set >> to false. This in turn resulted in a situation where the >> mongrel_sleeper_thread (configurator.rb:270) was the only thread getting >> back on the cpu and Thread.critical stayed true. The patch simply >> ensures that Thread.critical is set to false upon leaving sync.rb. > > Ok, is there a way to fix this without having people backpatch their > ruby? Also, why were you the only one having this problem? I''d > like to know how the error is caused if you could explain it.He''s definitely not the only one having this problem. I''ve been working on it too as I''ve had time. I--and possibly others--have been wasting time with USR1 trying to figure out what *my* code must have been doing wrong. Once I realized that wasn''t helping, I had got as far as commenting out the sleeper thread so that I could get a useful backtrace, but then it looked like it my query that was hanging. I see now that I was stupidly glossing over the most important frame. thread.rb:100:in `lock'' oci8.rb:126:in `do_ocicall'' oci8.rb:232:in `commit'' (eval):3:in `commit'' active_record/connection_adapters/oracle_adapter.rb:295:in `commit_db_transaction'' active_record/connection_adapters/abstract/database_stateme ... For people who have this problem, the commonality is likely to be (1) the Oracle adapter, (2) actions that create threads. If other (mysql) adapters don''t use threads then that may explain it. Steve
Zed A. Shaw
2006-Oct-04 23:25 UTC
[Mongrel] Mongrel pre-release 0.3.13.5(Was: Mongrel woes fixed)
On Sun, 1 Oct 2006 21:38:12 +0200 Jacob Atzen <jacob at jacobatzen.dk> wrote:> Hello all, > > For the past couple of weeks I have been spending some time debugging a > couple of issues I was having with Mongrel when I put load on it. I have > seen two distinct issues: > > 1. Mongrel stopped responding as if in an endless loop. > 2. Mongrel crashed when severely loaded.Alright, I made the slight change to remove the abort on exception call, and a small tweak to abort if the received size of a body does not match the content-length. I also made a slight clarity change to the USR1 logging so that it reports only the action that is blocking all the other actions. Previously it was more confused that this, so I simplified it. People experiencing problems of either 99% CPU or crashing should grab the pre-release and run your server with it. If you see a stack trace saying the body lengths don''t match, send them to me. If you crash, send me the stack trace as well. I''ll also be looking at another suggested change from Jacob, but I think this will clean up nicely. And as usual you install with: gem install mongrel --source=http://mongrel.rubyforge.org/releases/ Thanks a bunch folks! -- Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu http://www.zedshaw.com/ http://mongrel.rubyforge.org/ http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.
On Tue, 3 Oct 2006 16:51:55 -0700 "Jeremy Kemper" <jeremy at bitsweat.net> wrote:> On 10/3/06, Jacob Atzen <jacob at jacobatzen.dk> wrote: > > > > I''m still not sure why I''m the only one seeing these problems. Maybe > > others are seing them too and just not being aware of them. Maybe they > > only show up when the Mongrels gets severely loaded. Maybe I''m simply > > the only one butchering poor Mongrels for fun in my spare-time? ;-) > > > I see the same behavior with a handler that does some slow, blocking IO > (resolving symlinks on heavily loaded NFS servers). I worked around it by > using more Mongrels instead of more worker threads and by backgrounding one > long-running task which could safely send an HTTP response before > completing. > > It''d be cool if Mongrel forked itself (up to a max # of processes) to > alleviate the issues that arise with Rails handler locking and with > blocking IO due to Ruby''s green threads. Mongrel cluster could work but it > feels like overkill when forking would do the job with zero config (minus > Win32).It actually wouldn''t be hard to write a simple HttpHandler that was inserted before rails which just called fork to continue the request, but I think it''d require a shift in how the response is processed in Mongrel. The problem with forking servers is they are slow and they end up having the same problems we''re seeing with threads just with processes (which eat up ass loads more ram than a dead thread). I''m really more interested in something that''s in-between full on 50M forked processes for each request and nasty ruby threading causing problems. Another thought might be to have such a forking handler for Mongrel, but make it selective so that it only forked the troublesome requests. -- Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu http://www.zedshaw.com/ http://mongrel.rubyforge.org/ http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.
On Wed, 04 Oct 2006 14:50:19 -0700 Steven Lumos <steven at lumos.us> wrote:> "Zed A. Shaw" <zedshaw at zedshaw.com> writes: > > > On Sun, 1 Oct 2006 21:38:12 +0200 > > Jacob Atzen <jacob at jacobatzen.dk> wrote: > > > > Ok, is there a way to fix this without having people backpatch their > > ruby? Also, why were you the only one having this problem? I''d > > like to know how the error is caused if you could explain it. > > He''s definitely not the only one having this problem. I''ve been > working on it too as I''ve had time. I--and possibly others--have been > wasting time with USR1 trying to figure out what *my* code must have > been doing wrong. Once I realized that wasn''t helping, I had got as > far as commenting out the sleeper thread so that I could get a useful > backtrace, but then it looked like it my query that was hanging. I > see now that I was stupidly glossing over the most important frame. > > thread.rb:100:in `lock'' > oci8.rb:126:in `do_ocicall'' > oci8.rb:232:in `commit'' > (eval):3:in `commit'' > active_record/connection_adapters/oracle_adapter.rb:295:in `commit_db_transaction'' > active_record/connection_adapters/abstract/database_stateme ... > > For people who have this problem, the commonality is likely to be (1) > the Oracle adapter, (2) actions that create threads. If other (mysql) > adapters don''t use threads then that may explain it.Ah, but I''m wondering if there''s a possible deadlock in oci8.rb when it does that lock call. Ruby''s supposed to detect those. Also, if the oci8 driver is kicking up threads to do its requests, and it isn''t cleaning them out, then you''ll quickly run out of file descriptors. Are you also seeing tons of connections to your oracle db? Like one per request? -- Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu http://www.zedshaw.com/ http://mongrel.rubyforge.org/ http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.
Hey Jacob, did you get this resolved? I flaked out on it so sorry it was back in October. :-) On Tue, 3 Oct 2006 23:13:26 +0200 Jacob Atzen <jacob at jacobatzen.dk> wrote:> On Sun, Oct 01, 2006 at 04:39:07PM -0700, Zed A. Shaw wrote: > > > The second problem stems from the fact that Mongrel uses the > > > Thread#abort_on_exception. I''m not sure why this is even in there, as > > > the documentation says: > > > > > > When set to true, causes all threads (including the main > > > program) to abort if an exception is raised in thr. The process > > > will effectively exit(0). > > > > > > The patch simply removes the abort_on_exception from mongrel.rb. After > > > applying this patch I have been unable to make Mongrel crash. > > > > > > > The abort was put in there to catch exceptions that are "leaking" > > through and not being caught. I''m curious what exception was being > > thrown that *none* of the damn begin/rescue blocks catches. We > > seriously need a begin/rescue-every-damn-thing-no-matter-what > > construct that actually works. > > Some further digging gives another pointer towards the problem. I am > seeing some backtraces that look like this (beware, the line numbers > might be a little off): > > /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:646:in `run'': Mongrel timed out this thread: max processors#<Thread:0x8f28d9c run> (Mongrel::TimeoutError) > from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:704:in `run'' > from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:684:in `run'' > from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:268:in `run'' > from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:266:in `run'' > from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:127:in `run'' > from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/command.rb:211:in `run'' > from /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:231 > from /usr/local/bin/mongrel_rails:18 > > The mongrel.rb:646 is the w.raise() call and 704 is: > > thread = Thread.new(client) { |c| > > What I make of this is that somehow the thread is put aside as soon as > it''s created, that is _before_ the process_client() call and when the > thread is then killed off there''s nothing to catch the exception. > > Now I am seeing exceptions being cast at line 646 before this one, but > they have different backtraces, typically involving sync.rb and seems to > be handled more properly. As far as I can tell they are all handled by > the "Error calling Dispatcher.dispatch" rescue block. > > Another interesting note is that after an exception is cast at this > exact point and with the above backtrace all processes go into the > aborting state. And when they''re all killed off Mongrel dies. It''s > actually quite clear as the following log exerpt will show: > > > % grep -ni timeout log/mongrel.14.log > 127011:Tue Oct 03 21:57:28 CEST 2006: Error calling Dispatcher.dispatch #<Mongrel::TimeoutError: Mongrel timed out this thread: max processors#<Thread:0x8fe760c run>> from #<Thread:0x8fe760c run> > 203806:/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:646:in `run'': Mongrel timed out this thread: max processors#<Thread:0x8f28d9c run> (Mongrel::TimeoutError) > > % grep -n "aborting" log/mongrel.14.log | head -n 3 > 203815:[sync_synchronize] Unlocking #<Thread:0x8f3a72c aborting> > 203816:[sync_unlock:1] Thread.critical = true #<Thread:0x8f3a72c aborting> > 203817:[sync_unlock:4] Thread.critical = false #<Thread:0x8f3a72c aborting> > > > I have tried adding a begin/rescue/end around the process_client() call > like this: > > begin > process_client(thread_client); > rescue Object > STDERR.print "ERROR: RESCUING BEFORE CALL\n" > end > > And it seems to be doing the job. At least I haven''t been able to make > Mongrel crash yet, with whatever little testing I''ve been doing. Yet I > have seen the error message show up a couple of times. > > What are your thoughts on this? Does it make sense? > > > Now, I believe the problem you''ll have is when this exception leaks > > through that thread will become dead and eventually you''ll fill up and > > Mongrel will die anyway. If you can, try to find out what exception > > causes this so I can have the server kill off its threads properly by > > handling this yet another annoying random exception. > > Actually from what I''ve seen so far, the threads seems to be killed of > properly, if somewhat slow. I''m printing the process list in every run > of the sleeper thread, and from what I can see Mongrel will clean out > the threads even when the exception leaks through. From my understanding > when an exception gets to the top level of the thread, the thread is > simply killed off. Feel free to correct me if I''m wrong. > > It does seem that some threads keep lingering in Mongrel even after all > requests have died. But this seems to be the case whether the abort flag > is set or not. It looks as if some enter an eternal sleeping state. I > don''t think it''s much of a problem stability wise though, as they''ll > just get killed of whenever the reaper gets to work. > > > This might not be needed as the ruby-core guys finally started taking > > a serious look at how array works and we can probably switch back to > > Mutex in the near future. > > I have no idea how Mutex handles locking, but sync seems to be doing a > whole lot of work over and over again. It seems to me it''s really meant > for smaller locking queues. > > > Thanks again Jacob, if you can answer the questions I had so I can > > work on a fix that doesn''t involve updating ruby. Also an explanation > > as to why you were having these problems will help people decide if > > they should apply the patches too. > > I''m still not sure why I''m the only one seeing these problems. Maybe > others are seing them too and just not being aware of them. Maybe they > only show up when the Mongrels gets severely loaded. Maybe I''m simply > the only one butchering poor Mongrels for fun in my spare-time? ;-) > > In regards to the sync issues the jury is still out on that one. I will > need to get a deeper understanding of how sync works to get to the > bottom of it, but I''ll continue poking around. > > -- > Cheers, > - Jacob Atzen > _______________________________________________ > Mongrel-users mailing list > Mongrel-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mongrel-users-- Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu http://www.zedshaw.com/ http://www.awprofessional.com/title/0321483502 -- The Mongrel Book http://mongrel.rubyforge.org/ http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.