thr3ads.net - Mongrel users - [Mongrel] Mongrel woes fixed [Oct 2006]

If this information is useful, please help other people find it:
Share via:

Jacob Atzen

2006-Oct-01 19:38 UTC

[Mongrel] Mongrel woes fixed

Hello all,

For the past couple of weeks I have been spending some time debugging a
couple of issues I was having with Mongrel when I put load on it. I have
seen two distinct issues:

1. Mongrel stopped responding as if in an endless loop.
2. Mongrel crashed when severely loaded.

I believe to have resolved these two issues and have attached patches
which shows the resolution (simple as it is). Explanation of the patches
is given below.

The first problem is handled by the patch to sync.rb from the standard
library. What is happening here is that when sync_unlock is called
Thread.critical is set to true. Now if the thread is not the
sync_ex_locker an exception is thrown without Thread.critical being set
to false. This in turn resulted in a situation where the
mongrel_sleeper_thread (configurator.rb:270) was the only thread getting
back on the cpu and Thread.critical stayed true. The patch simply
ensures that Thread.critical is set to false upon leaving sync.rb.

I am not sure if this is really the correct way to handle this issue
though. As some famous programmers have been known to say "select()
ain''t broken" so I''m not really sure what to think of
this.

The second problem stems from the fact that Mongrel uses the
Thread#abort_on_exception. I''m not sure why this is even in there, as
the documentation says:

        When set to true, causes all threads (including the main
        program) to abort if an exception is raised in thr. The process
        will effectively exit(0).

The patch simply removes the abort_on_exception from mongrel.rb. After
applying this patch I have been unable to make Mongrel crash.

Finally I have provided a debug patch for the Sync library which simply
adds a lot of debug output to STDERR. I believe it might be of use in
future performance optimizations as there seems to be happening a lot of
work managing the queued up clients.

-- 
Cheers,
- Jacob Atzen
-------------- next part --------------
Index: lib/mongrel.rb
==================================================================---
lib/mongrel.rb	(revision 353)
+++ lib/mongrel.rb	(working copy)
@@ -687,7 +687,6 @@
               reap_dead_workers("max processors")
             else
               thread = Thread.new(client) {|c| process_client(c) }
-              thread.abort_on_exception = true
               thread[:started_on] = Time.now
               @workers.add(thread)
 
-------------- next part --------------
--- sync.rb	Sun Oct  1 21:02:28 2006
+++ sync.new.rb	Sun Oct  1 21:05:28 2006
@@ -131,8 +131,10 @@
   def sync_try_lock(mode = EX)
     return unlock if sync_mode == UN
     
+    print_critical("sync_try_lock", "1", "true")
     Thread.critical = true
     ret = sync_try_lock_sub(sync_mode)
+    print_critical("sync_try_lock", "2", "false")
     Thread.critical = false
     ret
   end
@@ -140,22 +142,27 @@
   def sync_lock(m = EX)
     return unlock if m == UN
 
-    until (Thread.critical = true; sync_try_lock_sub(m))
+    until (print_critical("sync_lock", "1",
"true"); Thread.critical = true; sync_try_lock_sub(m))
       if sync_sh_locker[Thread.current]
 	sync_upgrade_waiting.push [Thread.current, sync_sh_locker[Thread.current]]
 	sync_sh_locker.delete(Thread.current)
       else
+        STDERR.print "[sync_lock:2] Pushing #{Thread.current.inspect}
behind #{sync_waiting.size} others\n"
 	sync_waiting.push Thread.current
       end
+      print_critical("sync_lock", "3", "false")
       Thread.stop
     end
+    print_critical("sync_lock", "4", "false")
     Thread.critical = false
     self
   end
   
   def sync_unlock(m = EX)
+    print_critical("sync_unlock", "1", "true")
     Thread.critical = true
     if sync_mode == UN
+      print_critical("sync_unlock", "2", "false")
       Thread.critical = false
       Err::UnknownLocker.Fail(Thread.current)
     end
@@ -165,6 +172,7 @@
     runnable = false
     case m
     when UN
+      print_critical("sync_unlock", "3", "false")
       Thread.critical = false
       Err::UnknownLocker.Fail(Thread.current)
       
@@ -173,13 +181,17 @@
 	if (self.sync_ex_count = sync_ex_count - 1) == 0
 	  self.sync_ex_locker = nil
 	  if sync_sh_locker.include?(Thread.current)
+            STDERR.print "[sync_unlock] Setting sync_mode = SH\n"
 	    self.sync_mode = SH
 	  else
+            STDERR.print "[sync_unlock] Setting sync_mode = UN\n"
 	    self.sync_mode = UN
 	  end
 	  runnable = true
 	end
       else
+        # Patching criticalities when exceptions are thrown
+        print_critical("sync_unlock", "4",
"false")
 	Thread.critical = false
 	Err::UnknownLocker.Fail(Thread.current)
       end
@@ -191,6 +203,7 @@
 	if (sync_sh_locker[Thread.current] = count - 1) == 0 
 	  sync_sh_locker.delete(Thread.current)
 	  if sync_sh_locker.empty? and sync_ex_count == 0
+            STDERR.print "[sync_unlock] Setting sync_mode = UN\n"
 	    self.sync_mode = UN
 	    runnable = true
 	  end
@@ -205,6 +218,11 @@
 	end
 	wait = sync_upgrade_waiting
 	self.sync_upgrade_waiting = []
+        for w, v in wait
+          STDERR.print "[sync_unlock:5] Starting thread
#{w.inspect}\n"
+        end
+
+        print_critical("sync_unlock", "6",
"false")
 	Thread.critical = false
 	
 	for w, v in wait
@@ -213,22 +231,31 @@
       else
 	wait = sync_waiting
 	self.sync_waiting = []
+        print_critical("sync_unlock", "7",
"false")
 	Thread.critical = false
 	for w in wait
+          STDERR.print "[sync_unlock:8] Running #{w.inspect}\n"
 	  w.run
 	end
       end
     end
-    
+    print_critical("sync_unlock", "9", "false")
     Thread.critical = false
     self
   end
   
+  def print_critical(method, count, bool)
+    STDERR.print "[#{method}:#{count}] Thread.critical = #{bool}
#{Thread.current.inspect}\n"
+  end
+  
   def sync_synchronize(mode = EX)
     begin
+      STDERR.print "[sync_synchronize] Getting lock
#{Thread.current.inspect}\n"
       sync_lock(mode)
+      STDERR.print "[sync_synchronize] Yielding
#{Thread.current.inspect}\n"
       yield
     ensure
+      STDERR.print "[sync_synchronize] Unlocking
#{Thread.current.inspect}\n"
       sync_unlock
     end
   end
@@ -292,6 +319,7 @@
 	ret = false
       end
     else
+      print_critical("sync_try_lock_sub", "1",
"false")
       Thread.critical = false
       Err::LockModeFailer.Fail mode
     end
-------------- next part --------------
--- sync.orig.rb	Sun Oct  1 20:57:39 2006
+++ sync.rb	Sun Oct  1 21:02:28 2006
@@ -180,6 +180,7 @@
 	  runnable = true
 	end
       else
+	Thread.critical = false
 	Err::UnknownLocker.Fail(Thread.current)
       end

Zed A. Shaw

2006-Oct-01 23:39 UTC

head link

[Mongrel] Mongrel woes fixed

On Sun, 1 Oct 2006 21:38:12 +0200
Jacob Atzen <jacob at jacobatzen.dk> wrote:
> Hello all,
> 
> For the past couple of weeks I have been spending some time debugging a
> couple of issues I was having with Mongrel when I put load on it. I have
> seen two distinct issues:
> 
> 1. Mongrel stopped responding as if in an endless loop.
> 2. Mongrel crashed when severely loaded.
> 
Cool, glad you took the time to figure this out.
> I believe to have resolved these two issues and have attached patches
> which shows the resolution (simple as it is). Explanation of the patches
> is given below.
> 
> The first problem is handled by the patch to sync.rb from the standard
> library. What is happening here is that when sync_unlock is called
> Thread.critical is set to true. Now if the thread is not the
> sync_ex_locker an exception is thrown without Thread.critical being set
> to false. This in turn resulted in a situation where the
> mongrel_sleeper_thread (configurator.rb:270) was the only thread getting
> back on the cpu and Thread.critical stayed true. The patch simply
> ensures that Thread.critical is set to false upon leaving sync.rb.
> 
Ok, is there a way to fix this without having people backpatch their ruby? 
Also, why were you the only one having this problem?  I''d like to know
how the error is caused if you could explain it.
> I am not sure if this is really the correct way to handle this issue
> though. As some famous programmers have been known to say "select()
> ain''t broken" so I''m not really sure what to think
of this.
>
Interesting side note is that when you mix threads and select in ruby then the
interpreter can randomly decide it''s in deadlock without much
explanation.  Damned if you do, also if you don''t.
 > The second problem stems from the fact that Mongrel uses the
> Thread#abort_on_exception. I''m not sure why this is even in there,
as
> the documentation says:
> 
>         When set to true, causes all threads (including the main
>         program) to abort if an exception is raised in thr. The process
>         will effectively exit(0).
> 
> The patch simply removes the abort_on_exception from mongrel.rb. After
> applying this patch I have been unable to make Mongrel crash.
> 
The abort was put in there to catch exceptions that are "leaking"
through and not being caught.  I''m curious what exception was being
thrown that *none* of the damn begin/rescue blocks catches.  We seriously need a
begin/rescue-every-damn-thing-no-matter-what construct that actually works.

Now, I believe the problem you''ll have is when this exception leaks
through that thread will become dead and eventually you''ll fill up and
Mongrel will die anyway.  If you can, try to find out what exception causes this
so I can have the server kill off its threads properly by handling this yet
another annoying random exception.
> Finally I have provided a debug patch for the Sync library which simply
> adds a lot of debug output to STDERR. I believe it might be of use in
> future performance optimizations as there seems to be happening a lot of
> work managing the queued up clients.
This might not be needed as the ruby-core guys finally started taking a serious
look at how array works and we can probably switch back to Mutex in the near
future.

Thanks again Jacob, if you can answer the questions I had so I can work on a fix
that doesn''t involve updating ruby.  Also an explanation as to why you
were having these problems will help people decide if they should apply the
patches too.

-- 
Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.

Jacob Atzen

2006-Oct-02 21:39 UTC

head link

[Mongrel] Mongrel woes fixed

On Sun, Oct 01, 2006 at 04:39:07PM -0700, Zed A. Shaw
wrote:> Ok, is there a way to fix this without having people backpatch their
> ruby?  Also, why were you the only one having this problem?  I''d
like
> to know how the error is caused if you could explain it.
I''m not sure about this. I''ll investigate further and see what
I can
come up with.
> > The second problem stems from the fact that Mongrel uses the
> > Thread#abort_on_exception. I''m not sure why this is even in
there, as
> > the documentation says:
> > 
> >         When set to true, causes all threads (including the main
> >         program) to abort if an exception is raised in thr. The
process
> >         will effectively exit(0).
> > 
> > The patch simply removes the abort_on_exception from mongrel.rb. After
> > applying this patch I have been unable to make Mongrel crash.
> > 
> 
> The abort was put in there to catch exceptions that are "leaking"
> through and not being caught.  I''m curious what exception was
being
> thrown that *none* of the damn begin/rescue blocks catches.  We
> seriously need a begin/rescue-every-damn-thing-no-matter-what
> construct that actually works.
> 
> Now, I believe the problem you''ll have is when this exception
leaks
> through that thread will become dead and eventually you''ll fill up
and
> Mongrel will die anyway.  If you can, try to find out what exception
> causes this so I can have the server kill off its threads properly by
> handling this yet another annoying random exception.
I''m not sure this is really a problem. It seems that threads are killed
of nicely anyhow. But I''ll dig into it and report back on this one
too.

-- 
Cheers,
- Jacob Atzen

Jacob Atzen

2006-Oct-03 21:13 UTC

head link

[Mongrel] Mongrel woes fixed

On Sun, Oct 01, 2006 at 04:39:07PM -0700, Zed A. Shaw
wrote:> > The second problem stems from the fact that Mongrel uses the
> > Thread#abort_on_exception. I''m not sure why this is even in
there, as
> > the documentation says:
> > 
> >         When set to true, causes all threads (including the main
> >         program) to abort if an exception is raised in thr. The
process
> >         will effectively exit(0).
> > 
> > The patch simply removes the abort_on_exception from mongrel.rb. After
> > applying this patch I have been unable to make Mongrel crash.
> > 
> 
> The abort was put in there to catch exceptions that are "leaking"
> through and not being caught.  I''m curious what exception was
being
> thrown that *none* of the damn begin/rescue blocks catches.  We
> seriously need a begin/rescue-every-damn-thing-no-matter-what
> construct that actually works.
Some further digging gives another pointer towards the problem. I am
seeing some backtraces that look like this (beware, the line numbers
might be a little off):

/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:646:in
`run'': Mongrel timed out this thread: max
processors#<Thread:0x8f28d9c run> (Mongrel::TimeoutError)
        from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:704:in
`run''
        from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:684:in
`run''
        from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:268:in
`run''
        from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:266:in
`run''
        from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:127:in
`run''
        from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/command.rb:211:in
`run''
        from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:231
        from /usr/local/bin/mongrel_rails:18

The mongrel.rb:646 is the w.raise() call and 704 is:

        thread = Thread.new(client) { |c|

What I make of this is that somehow the thread is put aside as soon as
it''s created, that is _before_ the process_client() call and when the
thread is then killed off there''s nothing to catch the exception.

Now I am seeing exceptions being cast at line 646 before this one, but
they have different backtraces, typically involving sync.rb and seems to
be handled more properly. As far as I can tell they are all handled by
the "Error calling Dispatcher.dispatch" rescue block.

Another interesting note is that after an exception is cast at this
exact point and with the above backtrace all processes go into the
aborting state. And when they''re all killed off Mongrel dies.
It''s
actually quite clear as the following log exerpt will show:


% grep -ni timeout log/mongrel.14.log
127011:Tue Oct 03 21:57:28 CEST 2006: Error calling Dispatcher.dispatch
#<Mongrel::TimeoutError: Mongrel timed out this thread: max
processors#<Thread:0x8fe760c run>> from #<Thread:0x8fe760c run>
203806:/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:646:in
`run'': Mongrel timed out this thread: max
processors#<Thread:0x8f28d9c run> (Mongrel::TimeoutError)

% grep -n "aborting" log/mongrel.14.log | head -n 3
203815:[sync_synchronize] Unlocking #<Thread:0x8f3a72c aborting>
203816:[sync_unlock:1] Thread.critical = true #<Thread:0x8f3a72c aborting>
203817:[sync_unlock:4] Thread.critical = false #<Thread:0x8f3a72c
aborting>


I have tried adding a begin/rescue/end around the process_client() call
like this:

    begin
      process_client(thread_client);
    rescue Object
      STDERR.print "ERROR: RESCUING BEFORE CALL\n"
    end

And it seems to be doing the job. At least I haven''t been able to make
Mongrel crash yet, with whatever little testing I''ve been doing. Yet I
have seen the error message show up a couple of times.

What are your thoughts on this? Does it make sense?
> Now, I believe the problem you''ll have is when this exception
leaks
> through that thread will become dead and eventually you''ll fill up
and
> Mongrel will die anyway.  If you can, try to find out what exception
> causes this so I can have the server kill off its threads properly by
> handling this yet another annoying random exception.
Actually from what I''ve seen so far, the threads seems to be killed of
properly, if somewhat slow. I''m printing the process list in every run
of the sleeper thread, and from what I can see Mongrel will clean out
the threads even when the exception leaks through. From my understanding
when an exception gets to the top level of the thread, the thread is
simply killed off. Feel free to correct me if I''m wrong.

It does seem that some threads keep lingering in Mongrel even after all
requests have died. But this seems to be the case whether the abort flag
is set or not. It looks as if some enter an eternal sleeping state. I
don''t think it''s much of a problem stability wise though, as
they''ll
just get killed of whenever the reaper gets to work.
> This might not be needed as the ruby-core guys finally started taking
> a serious look at how array works and we can probably switch back to
> Mutex in the near future.
I have no idea how Mutex handles locking, but sync seems to be doing a
whole lot of work over and over again. It seems to me it''s really meant
for smaller locking queues.
> Thanks again Jacob, if you can answer the questions I had so I can
> work on a fix that doesn''t involve updating ruby.  Also an
explanation
> as to why you were having these problems will help people decide if
> they should apply the patches too.
I''m still not sure why I''m the only one seeing these problems.
Maybe
others are seing them too and just not being aware of them. Maybe they
only show up when the Mongrels gets severely loaded. Maybe I''m simply
the only one butchering poor Mongrels for fun in my spare-time? ;-)

In regards to the sync issues the jury is still out on that one. I will
need to get a deeper understanding of how sync works to get to the
bottom of it, but I''ll continue poking around.

-- 
Cheers,
- Jacob Atzen

Kirk Haines

2006-Oct-03 22:50 UTC

head link

[Mongrel] Mongrel woes fixed

On 10/3/06, Jacob Atzen <jacob at jacobatzen.dk> wrote:
> I have no idea how Mutex handles locking, but sync seems to be doing a
> whole lot of work over and over again. It seems to me it''s really
meant
> for smaller locking queues.
They handle locking in more or less the same way.  What is really
substantially different is the unlocking.

Sync goes and wakes up the entire queue of waiting threads.  They
scramble for the lock again, and all but one (probably the first one
to be woken up) fail to get it and are relocked.

If you have a big thread queue, it is a lot of extra work.

Mutex, on the other hand, just handles it one thread at a time, so it
is _MUCH_ more efficient with the unlocking.
> In regards to the sync issues the jury is still out on that one. I will
> need to get a deeper understanding of how sync works to get to the
> bottom of it, but I''ll continue poking around.
That with Mutex.  It''t very simple to understand.  Then go take a look
at Sync. You have to shift through an uglier, more antiquated coding
style, and there is a bunch of extra code to deal with the different
types of locks that Sync supports, but if you filter through that to
find the lock and unlock pieces of code for exclusive locks, you''ll
quickly start to understand the differences.

Mutex is also so very simple that you''ll see why I prefer to use a
custom version that works around the Array memory mismanagement that
bites the standard one over Sync, which is ugly enough to make the
baby Jesus cry.

Kirk Haines

Jeremy Kemper

2006-Oct-03 23:51 UTC

head link

[Mongrel] Mongrel woes fixed

On 10/3/06, Jacob Atzen <jacob at jacobatzen.dk>
wrote:>
> I''m still not sure why I''m the only one seeing these
problems. Maybe
> others are seing them too and just not being aware of them. Maybe they
> only show up when the Mongrels gets severely loaded. Maybe I''m
simply
> the only one butchering poor Mongrels for fun in my spare-time? ;-)

I see the same behavior with a handler that does some slow, blocking IO
(resolving symlinks on heavily loaded NFS servers). I worked around it by
using more Mongrels instead of more worker threads and by backgrounding one
long-running task which could safely send an HTTP response before
completing.

It''d be cool if Mongrel forked itself (up to a max # of processes) to
alleviate the issues that arise with Rails  handler locking and with
blocking IO due to Ruby''s green threads. Mongrel cluster could work but
it
feels like overkill when forking would do the job with zero config (minus
Win32).

jeremy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
http://rubyforge.org/pipermail/mongrel-users/attachments/20061003/dcd4c6cf/attachment.html

Luis Lavena

2006-Oct-04 02:49 UTC

head link

[Mongrel] Mongrel woes fixed

On 10/3/06, Jeremy Kemper <jeremy at bitsweat.net>
wrote:> On 10/3/06, Jacob Atzen <jacob at jacobatzen.dk> wrote:
> > I''m still not sure why I''m the only one seeing these
problems. Maybe
> > others are seing them too and just not being aware of them. Maybe they
> > only show up when the Mongrels gets severely loaded. Maybe
I''m simply
> > the only one butchering poor Mongrels for fun in my spare-time? ;-)
>
> I see the same behavior with a handler that does some slow, blocking IO
> (resolving symlinks on heavily loaded NFS servers). I worked around it by
> using more Mongrels instead of more worker threads and by backgrounding one
> long-running task which could safely send an HTTP response before
> completing.
>
> It''d be cool if Mongrel forked itself (up to a max # of processes)
to
> alleviate the issues that arise with Rails  handler locking and with
> blocking IO due to Ruby''s green threads. Mongrel cluster could
work but it
> feels like overkill when forking would do the job with zero config (minus
> Win32).
Jeremy, I''m working on a improved mongrel_service (for win32) that
will handle the mongrel_cluster scenario in a simplified way.

Still, configuration will be required, but each time you need to
change the parameters for your mongrels, wouldn''t need to
remove/install cycle, just change the config file and restart the
service.

Also, with the new schema will get child process monitoring and re
spawning if they are needed.

(Still, don''t know how to differentiate a sleeping-dead-zombie mongrel
from the rest, but guess will figure it out ;)

>
> jeremy
> _______________________________________________
> Mongrel-users mailing list
> Mongrel-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mongrel-users
>
>

-- 
Luis Lavena
Multimedia systems
-
Leaders are made, they are not born. They are made by hard effort,
which is the price which all of us must pay to achieve any goal that
is worthwhile.
Vince Lombardi

Steven Lumos

2006-Oct-04 21:50 UTC

head link

[Mongrel] Mongrel woes fixed

"Zed A. Shaw" <zedshaw at zedshaw.com> writes:
> On Sun, 1 Oct 2006 21:38:12 +0200
> Jacob Atzen <jacob at jacobatzen.dk> wrote:
>
>> Hello all,
>> 
>> For the past couple of weeks I have been spending some time debugging a
>> couple of issues I was having with Mongrel when I put load on it. I
have
>> seen two distinct issues:
>> 
>> 1. Mongrel stopped responding as if in an endless loop.
>> 2. Mongrel crashed when severely loaded.
>> 
>
> Cool, glad you took the time to figure this out.
>
>> I believe to have resolved these two issues and have attached patches
>> which shows the resolution (simple as it is). Explanation of the
patches
>> is given below.
>> 
>> The first problem is handled by the patch to sync.rb from the standard
>> library. What is happening here is that when sync_unlock is called
>> Thread.critical is set to true. Now if the thread is not the
>> sync_ex_locker an exception is thrown without Thread.critical being set
>> to false. This in turn resulted in a situation where the
>> mongrel_sleeper_thread (configurator.rb:270) was the only thread
getting
>> back on the cpu and Thread.critical stayed true. The patch simply
>> ensures that Thread.critical is set to false upon leaving sync.rb.
>
> Ok, is there a way to fix this without having people backpatch their
> ruby?  Also, why were you the only one having this problem?  I''d
> like to know how the error is caused if you could explain it.
He''s definitely not the only one having this problem.  I''ve
been
working on it too as I''ve had time.  I--and possibly others--have been
wasting time with USR1 trying to figure out what *my* code must have
been doing wrong.  Once I realized that wasn''t helping, I had got as
far as commenting out the sleeper thread so that I could get a useful
backtrace, but then it looked like it my query that was hanging.  I
see now that I was stupidly glossing over the most important frame.

thread.rb:100:in `lock''
oci8.rb:126:in `do_ocicall''
oci8.rb:232:in `commit''
(eval):3:in `commit''
active_record/connection_adapters/oracle_adapter.rb:295:in
`commit_db_transaction''
active_record/connection_adapters/abstract/database_stateme ...

For people who have this problem, the commonality is likely to be (1)
the Oracle adapter, (2) actions that create threads.  If other (mysql)
adapters don''t use threads then that may explain it.

Steve

Zed A. Shaw

2006-Oct-04 23:25 UTC

head link

[Mongrel] Mongrel pre-release 0.3.13.5(Was: Mongrel woes fixed)

On Sun, 1 Oct 2006 21:38:12 +0200
Jacob Atzen <jacob at jacobatzen.dk> wrote:
> Hello all,
> 
> For the past couple of weeks I have been spending some time debugging a
> couple of issues I was having with Mongrel when I put load on it. I have
> seen two distinct issues:
> 
> 1. Mongrel stopped responding as if in an endless loop.
> 2. Mongrel crashed when severely loaded.
Alright, I made the slight change to remove the abort on exception call, and a
small tweak to abort if the received size of a body does not match the
content-length.  I also made a slight clarity change to the USR1 logging so that
it reports only the action that is blocking all the other actions.  Previously
it was more confused that this, so I simplified it.

People experiencing problems of either 99% CPU or crashing should grab the
pre-release and run your server with it.  If you see a stack trace saying the
body lengths don''t match, send them to me.  If you crash, send me the
stack trace as well.

I''ll also be looking at another suggested change from Jacob, but I
think this will clean up nicely.

And as usual you install with:

gem install mongrel --source=http://mongrel.rubyforge.org/releases/

Thanks a bunch folks!

-- 
Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.

Zed A. Shaw

2006-Oct-04 23:31 UTC

head link

[Mongrel] Mongrel woes fixed

On Tue, 3 Oct 2006 16:51:55 -0700
"Jeremy Kemper" <jeremy at bitsweat.net> wrote:
> On 10/3/06, Jacob Atzen <jacob at jacobatzen.dk> wrote:
> >
> > I''m still not sure why I''m the only one seeing these
problems. Maybe
> > others are seing them too and just not being aware of them. Maybe they
> > only show up when the Mongrels gets severely loaded. Maybe
I''m simply
> > the only one butchering poor Mongrels for fun in my spare-time? ;-)
> 
> 
> I see the same behavior with a handler that does some slow, blocking IO
> (resolving symlinks on heavily loaded NFS servers). I worked around it by
> using more Mongrels instead of more worker threads and by backgrounding one
> long-running task which could safely send an HTTP response before
> completing.
> 
> It''d be cool if Mongrel forked itself (up to a max # of processes)
to
> alleviate the issues that arise with Rails  handler locking and with
> blocking IO due to Ruby''s green threads. Mongrel cluster could
work but it
> feels like overkill when forking would do the job with zero config (minus
> Win32).
It actually wouldn''t be hard to write a simple HttpHandler that was
inserted before rails which just called fork to continue the request, but I
think it''d require a shift in how the response is processed in Mongrel.

The problem with forking servers is they are slow and they end up having the
same problems we''re seeing with threads just with processes (which eat
up ass loads more ram than a dead thread).  I''m really more interested
in something that''s in-between full on 50M forked processes for each
request and nasty ruby threading causing problems.

Another thought might be to have such a forking handler for Mongrel, but make it
selective so that it only forked the troublesome requests.

-- 
Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.

Zed A. Shaw

2006-Oct-04 23:34 UTC

head link

[Mongrel] Mongrel woes fixed

On Wed, 04 Oct 2006 14:50:19 -0700
Steven Lumos <steven at lumos.us> wrote:
> "Zed A. Shaw" <zedshaw at zedshaw.com> writes:
> 
> > On Sun, 1 Oct 2006 21:38:12 +0200
> > Jacob Atzen <jacob at jacobatzen.dk> wrote:
> >
> > Ok, is there a way to fix this without having people backpatch their
> > ruby?  Also, why were you the only one having this problem? 
I''d
> > like to know how the error is caused if you could explain it.
> 
> He''s definitely not the only one having this problem. 
I''ve been
> working on it too as I''ve had time.  I--and possibly others--have
been
> wasting time with USR1 trying to figure out what *my* code must have
> been doing wrong.  Once I realized that wasn''t helping, I had got
as
> far as commenting out the sleeper thread so that I could get a useful
> backtrace, but then it looked like it my query that was hanging.  I
> see now that I was stupidly glossing over the most important frame.
> 
> thread.rb:100:in `lock''
> oci8.rb:126:in `do_ocicall''
> oci8.rb:232:in `commit''
> (eval):3:in `commit''
> active_record/connection_adapters/oracle_adapter.rb:295:in
`commit_db_transaction''
> active_record/connection_adapters/abstract/database_stateme ...
> 
> For people who have this problem, the commonality is likely to be (1)
> the Oracle adapter, (2) actions that create threads.  If other (mysql)
> adapters don''t use threads then that may explain it.
Ah, but I''m wondering if there''s a possible deadlock in
oci8.rb when it does that lock call.  Ruby''s supposed to detect those. 
Also, if the oci8 driver is kicking up threads to do its requests, and it
isn''t cleaning them out, then you''ll quickly run out of file
descriptors.  Are you also seeing tons of connections to your oracle db?  Like
one per request?

-- 
Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.

Zed A. Shaw

2007-Feb-07 07:37 UTC

head link

[Mongrel] Mongrel woes fixed

Hey Jacob, did you get this resolved?  I flaked out on it so sorry it
was back in October. :-)

On Tue, 3 Oct 2006 23:13:26 +0200
Jacob Atzen <jacob at jacobatzen.dk> wrote:
> On Sun, Oct 01, 2006 at 04:39:07PM -0700, Zed A. Shaw wrote:
> > > The second problem stems from the fact that Mongrel uses the
> > > Thread#abort_on_exception. I''m not sure why this is even
in there, as
> > > the documentation says:
> > > 
> > >         When set to true, causes all threads (including the main
> > >         program) to abort if an exception is raised in thr. The
process
> > >         will effectively exit(0).
> > > 
> > > The patch simply removes the abort_on_exception from mongrel.rb.
After
> > > applying this patch I have been unable to make Mongrel crash.
> > > 
> > 
> > The abort was put in there to catch exceptions that are
"leaking"
> > through and not being caught.  I''m curious what exception was
being
> > thrown that *none* of the damn begin/rescue blocks catches.  We
> > seriously need a begin/rescue-every-damn-thing-no-matter-what
> > construct that actually works.
> 
> Some further digging gives another pointer towards the problem. I am
> seeing some backtraces that look like this (beware, the line numbers
> might be a little off):
> 
> /usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:646:in
`run'': Mongrel timed out this thread: max
processors#<Thread:0x8f28d9c run> (Mongrel::TimeoutError)
>         from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:704:in
`run''
>         from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:684:in
`run''
>         from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:268:in
`run''
>         from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/configurator.rb:266:in
`run''
>         from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:127:in
`run''
>         from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel/command.rb:211:in
`run''
>         from
/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/bin/mongrel_rails:231
>         from /usr/local/bin/mongrel_rails:18
> 
> The mongrel.rb:646 is the w.raise() call and 704 is:
> 
>         thread = Thread.new(client) { |c|
> 
> What I make of this is that somehow the thread is put aside as soon as
> it''s created, that is _before_ the process_client() call and when
the
> thread is then killed off there''s nothing to catch the exception.
> 
> Now I am seeing exceptions being cast at line 646 before this one, but
> they have different backtraces, typically involving sync.rb and seems to
> be handled more properly. As far as I can tell they are all handled by
> the "Error calling Dispatcher.dispatch" rescue block.
> 
> Another interesting note is that after an exception is cast at this
> exact point and with the above backtrace all processes go into the
> aborting state. And when they''re all killed off Mongrel dies.
It''s
> actually quite clear as the following log exerpt will show:
> 
> 
> % grep -ni timeout log/mongrel.14.log
> 127011:Tue Oct 03 21:57:28 CEST 2006: Error calling Dispatcher.dispatch
#<Mongrel::TimeoutError: Mongrel timed out this thread: max
processors#<Thread:0x8fe760c run>> from #<Thread:0x8fe760c run>
>
203806:/usr/local/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4/lib/mongrel.rb:646:in
`run'': Mongrel timed out this thread: max
processors#<Thread:0x8f28d9c run> (Mongrel::TimeoutError)
> 
> % grep -n "aborting" log/mongrel.14.log | head -n 3
> 203815:[sync_synchronize] Unlocking #<Thread:0x8f3a72c aborting>
> 203816:[sync_unlock:1] Thread.critical = true #<Thread:0x8f3a72c
aborting>
> 203817:[sync_unlock:4] Thread.critical = false #<Thread:0x8f3a72c
aborting>
> 
> 
> I have tried adding a begin/rescue/end around the process_client() call
> like this:
> 
>     begin
>       process_client(thread_client);
>     rescue Object
>       STDERR.print "ERROR: RESCUING BEFORE CALL\n"
>     end
> 
> And it seems to be doing the job. At least I haven''t been able to
make
> Mongrel crash yet, with whatever little testing I''ve been doing.
Yet I
> have seen the error message show up a couple of times.
> 
> What are your thoughts on this? Does it make sense?
> 
> > Now, I believe the problem you''ll have is when this exception
leaks
> > through that thread will become dead and eventually you''ll
fill up and
> > Mongrel will die anyway.  If you can, try to find out what exception
> > causes this so I can have the server kill off its threads properly by
> > handling this yet another annoying random exception.
> 
> Actually from what I''ve seen so far, the threads seems to be
killed of
> properly, if somewhat slow. I''m printing the process list in every
run
> of the sleeper thread, and from what I can see Mongrel will clean out
> the threads even when the exception leaks through. From my understanding
> when an exception gets to the top level of the thread, the thread is
> simply killed off. Feel free to correct me if I''m wrong.
> 
> It does seem that some threads keep lingering in Mongrel even after all
> requests have died. But this seems to be the case whether the abort flag
> is set or not. It looks as if some enter an eternal sleeping state. I
> don''t think it''s much of a problem stability wise though,
as they''ll
> just get killed of whenever the reaper gets to work.
> 
> > This might not be needed as the ruby-core guys finally started taking
> > a serious look at how array works and we can probably switch back to
> > Mutex in the near future.
> 
> I have no idea how Mutex handles locking, but sync seems to be doing a
> whole lot of work over and over again. It seems to me it''s really
meant
> for smaller locking queues.
> 
> > Thanks again Jacob, if you can answer the questions I had so I can
> > work on a fix that doesn''t involve updating ruby.  Also an
explanation
> > as to why you were having these problems will help people decide if
> > they should apply the patches too.
> 
> I''m still not sure why I''m the only one seeing these
problems. Maybe
> others are seing them too and just not being aware of them. Maybe they
> only show up when the Mongrels gets severely loaded. Maybe I''m
simply
> the only one butchering poor Mongrels for fun in my spare-time? ;-)
> 
> In regards to the sync issues the jury is still out on that one. I will
> need to get a deeper understanding of how sync works to get to the
> bottom of it, but I''ll continue poking around.
> 
> -- 
> Cheers,
> - Jacob Atzen
> _______________________________________________
> Mongrel-users mailing list
> Mongrel-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mongrel-users

-- 
Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://www.awprofessional.com/title/0321483502 -- The Mongrel Book
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.

Seemingly Similar Threads

Search for more seemingly similar threads

Mongrel users - Oct 2006 - Mongrel woes fixed

[Mongrel] Mongrel woes fixed

[Mongrel] Mongrel woes fixed

[Mongrel] Mongrel woes fixed

[Mongrel] Mongrel woes fixed

[Mongrel] Mongrel woes fixed

[Mongrel] Mongrel woes fixed

[Mongrel] Mongrel woes fixed

[Mongrel] Mongrel woes fixed

[Mongrel] Mongrel pre-release 0.3.13.5(Was: Mongrel woes fixed)

[Mongrel] Mongrel woes fixed

[Mongrel] Mongrel woes fixed

[Mongrel] Mongrel woes fixed

Seemingly Similar Threads