Evan Weaver
2007-May-25 18:26 UTC
[Mongrel] deadlock issue: 1.8.6/fastthread/memcached-client/mongrel
Hi all, I''m getting a strange, intermittent Thread deadlock in production: deadlock 0x2a988d7008: sleep:F(3) - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel.rb:723 CPU usage spikes to a full core and Apache 502''s on requests that got handed to that particular listener each time. I tried attaching with gdb, and gdb twisted itself into a knot. Anyone have any clues? The memcached-client reference seems suspicious. Full backtrace, gdb log, and system information is here: http://pastie.caboo.se/64610 , and also attached below. Thanks Evan -- Evan Weaver Cloudburst, LLC **************************** ## mongrel error log ** Starting Mongrel listening at 0.0.0.0:8225 ** Changing group to app. ** Changing user to app. ** Starting Rails with production environment... ** Rails loaded. ** Loading any Rails specific GemPlugins ** Signals ready. TERM => stop. USR2 => restart. INT => stop (no restart). ** Rails signals registered. HUP => reload (without restart). It might not work well. ** Mongrel available at 0.0.0.0:8225 ** Writing PID file to /opt/rails/chow/production/current/log/mongrel.8225.pid deadlock 0x2a988d7008: sleep:F(3) - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel.rb:723 deadlock 0x2a9a18d700: sleep:- - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/rails.rb:76 deadlock 0x2a9a349ad0: sleep:- - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/rails.rb:76 deadlock 0x2a9ac10f60: sleep:- - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/rails.rb:76 deadlock 0x2a9a87c4a8: sleep:- - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/rails.rb:76 deadlock 0x2a9a705a70: sleep:- - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/rails.rb:76 deadlock 0x2a9aebf9f0: sleep:- - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/rails.rb:76 deadlock 0x2a9a600210: sleep:- - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/rails.rb:76 deadlock 0x2a9a4fc0a8: sleep:- - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/rails.rb:76 deadlock 0x2a9ad415b0: sleep:- - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/rails.rb:76 deadlock 0x2a9aec5d28: sleep:S - /opt/rails/chow/production/releases/20070516172704/vendor/gems/memcache-client-1.3.0/lib/memcache.rb:473 deadlock 0x2a9a7c71c0: sleep:S - (eval):3 deadlock 0x2a955bde10: sleep:J(0x2a988d7008) (main) - /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/configurator.rb:293 /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/configurator.rb:274:in `join'': Thread(0x2a988d6f40): deadlock (fatal) from /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/configurator.rb:293:in `each'' from /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/configurator.rb:293:in `join'' from /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/bin/mongrel_rails:136:in `run'' from /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/lib/mongrel/command.rb:211:in `run'' from /opt/ruby/default/lib/ruby/gems/1.8/gems/mongrel-1.0.1/bin/mongrel_rails:243 from /opt/ruby/default/bin/mongrel_rails:16:in `load'' from /opt/ruby/default/bin/mongrel_rails:16 [process exited] ## gdb attach attempt Loaded symbols for /opt/ruby/default/lib/ruby/1.8/x86_64-linux/digest/sha1.so 0x00000000004130bc in match_fds (dst=0x7fbfff7520, src=0xf45bdc0, max=1025) at eval.c:10527 10527 eval.c: No such file or directory. in eval.c (gdb) source ruby-gdb (gdb) redirect_stdout $1 = 2 (gdb) eval "caller" Program received signal SIGPIPE, Broken pipe. 0x0000003add5b7ee2 in __write_nocancel () from /lib64/tls/libc.so.6 The program being debugged was signaled while in a function called from GDB. GDB remains in the frame where the signal was received. To change this behavior use "set unwindonsignal on" Evaluation of the expression containing the function (rb_p) will be abandoned. (gdb) eval "caller()" [weavere at c13-chd-app1 tools]$ Program received signal SIGINT, Interrupt. 0x00000000004132d9 in rb_thread_schedule () at eval.c:10618 10618 in eval.c The program being debugged was signaled while in a function called from GDB. GDB remains in the frame where the signal was received. To change this behavior use "set unwindonsignal on" Evaluation of the expression containing the function (rb_eval_string_protect) will be abandoned. (gdb) [A [gdb hung at this point, force-killed, and mongrel died] ## system environment: [weavere at c13-chd-app1 ~]$ ruby -v ruby 1.8.6 (2007-03-13 patchlevel 0) [x86_64-linux] [weavere at c13-chd-app1 ~]$ gem list | grep ''^\w'' cgi_multipart_eof_fix (2.1) daemons (1.0.5) fastthread (1.0) gem_plugin (0.2.2) hpricot (0.5) mongrel (1.0.1) mongrel_cluster (1.0.1.1) mysql (2.7) rake (0.7.3) sources (0.0.1) unicode (0.1) [weavere at c13-chd-app1 chow]$ ls -1 vendor/gems/ crypt-1.1.4 image_science-1.1.1 memcache-client-1.3.0 RedCloth-3.0.4 RubyInline-3.6.3 session-2.4.0 sphinx-0.9.7-rc2 test-spec-0.3.0 [weavere at c13-chd-app1 ~]$ uname -r 2.6.9-5.0.3.ELsmp System is RHEL 4, quad-core Opteron. Error occurs in production mode very intermittently.