Jack Nutting
2008-Oct-10 12:12 UTC
[Backgroundrb-devel] magical disappearing background processes!
Hi all,

I've been having trouble for a long time with backgroundrb processes that suddenly vanish without a trace. What happens is that at some point I discover that all the backgroundrb processes are suddenly gone. Nothing special is seen in any of the log files. This has happened intermittently for a long time, and I was hoping that upgrading to 1.0.4 would somehow help me out, but I seem to encounter the same problem.

It happens infrequently, sometimes two or three times a week, sometimes not at all for several weeks. Yesterday it actually happened twice in ten minutes during a period when the server was heavily loaded, but that's unusual. Usually when it happens the server is not under a heavy load.

Yesterday when it happened, I had the fortune of having a "top" log running in a terminal window, so I'm able to present some more data. top was displaying all threads, so most of the processes show up twice or more.

I have 5 background workers running, each apparently has 2 threads, plus log_worker with 1 thread and script/backgroundrb with 2 threads. My architecture is set up so that only "master" is started automatically when backgroundrb starts up, and it in turn starts the rest.

I'm pasting in data for all the backgroundrb processes, sorry for the terrible formatting but I can't really think of a better way to present this all.

Here's what it normally looks like while everything is up and running. This is the last "normal" state I found before it started going haywire:

top - 15:11:13 up 5 days, 5:05, 3 users, load average: 3.10, 3.09, 3.02
  PID USER    PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
17508 deploy  15  0 49300  35m 2688 S 11.8  1.7  7:54.65 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 18:16:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
17504 deploy  15  0 49648  35m 2688 S  8.2  1.7  8:01.64 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
14141 deploy  15  0 20796  17m 1612 S  0.3  0.8  2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14147 deploy  15  0 48232  34m 2556 S  0.3  1.7  5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17523 deploy  17  0  132m 115m 3316 R  0.3  5.6  6:43.89 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 20:18:campaign_starter:39:/home/deploy/mbargo/lib/workers:/home/deploy/
14102 deploy  17  0 48320  31m 1364 R  0.0  1.5  3:08.97 ruby /home/deploy/mbargo/script/backgroundrb start
14144 deploy  15  0 48320  31m 1364 S  0.0  1.5  0:45.35 ruby /home/deploy/mbargo/script/backgroundrb start
17446 deploy  15  0 48232  34m 2556 S  0.0  1.7  0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17486 deploy  15  0 59500  41m 3500 S  0.0  2.0 11:45.15 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
22300 deploy  15  0 59500  41m 3500 S  0.0  2.0  0:45.27 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
23636 deploy  15  0 49648  35m 2688 S  0.0  1.7  0:45.68 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
24042 deploy  15  0 49300  35m 2688 S  0.0  1.7  0:43.58 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 18:16:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
24053 deploy  15  0  132m 115m 3316 S  0.0  5.6  0:43.70 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 20:18:campaign_starter:39:/home/deploy/mbargo/lib/workers:/home/deploy/

Next snapshot, 3 seconds later. script/backgroundrb is gone, and each of my workers (except for master) is down to 1 thread.

top - 15:11:16 up 5 days, 5:05, 3 users, load average: 3.10, 3.09, 3.02
  PID USER    PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
17504 deploy  15  0 49648  35m 2688 S 12.6  1.7  8:02.02 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
17486 deploy  17  0 59500  41m 3500 R  0.3  2.0 11:45.16 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14141 deploy  15  0 20796  17m 1612 S  0.0  0.8  2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14147 deploy  15  0 48232  34m 2556 S  0.0  1.7  5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17446 deploy  15  0 48232  34m 2556 S  0.0  1.7  0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
22300 deploy  15  0 59500  41m 3500 S  0.0  2.0  0:45.27 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
23636 deploy  15  0 49648  35m 2688 S  0.0  1.7  0:45.68 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar

Next, 3 seconds after that, all I have left is master (still 2 threads) and log_worker:

top - 15:11:19 up 5 days, 5:05, 3 users, load average: 2.85, 3.03, 3.01
  PID USER    PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
14141 deploy  15  0 20796  17m 1612 S  0.0  0.8  2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14147 deploy  15  0 48232  34m 2556 S  0.0  1.7  5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17446 deploy  15  0 48232  34m 2556 S  0.0  1.7  0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri

At the next snapshot, all backgroundrb processes are gone.

This is running on Ubuntu 7.10, backgroundrb 1.0.4. I'm nowhere near maxing out system memory, and there are no memory or other limits set on user processes as far as I can tell. If anyone has any ideas about what might cause this, or how to dig deeper, please let me know! I'm nearly at my wits' end.

-- 
// jack // http://www.nuthole.com
hemant kumar
2008-Oct-10 12:34 UTC
[Backgroundrb-devel] magical disappearing background processes!
Are you running two copies of the BackgrounDRb server on the same machine? I see two server instances in your top output.
Jack Nutting
2008-Oct-10 12:52 UTC
[Backgroundrb-devel] magical disappearing background processes!
On Fri, Oct 10, 2008 at 2:34 PM, hemant kumar <gethemant at gmail.com> wrote:
> Are you running two copies of BackgrounDRb server on the same machine?
> I see two server instances in your top output.

No, it's just one. The mode I was running "top" in showed one line for each thread.

-- 
// jack // http://www.nuthole.com
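
For anyone reading the snapshots later: the one-line-per-thread display is easy to misread as duplicate servers. The following is a minimal sketch, not part of the original thread, that groups the rows of the 15:11:13 snapshot by their COMMAND column. It assumes, as Jack explains, that rows sharing the same command are threads of a single process; the helper names `worker_label` and `group_threads` are hypothetical, and the grouping is a heuristic on pasted text rather than an authoritative check against the live system.

```python
from collections import defaultdict

# Rows copied from the 15:11:13 snapshot in the original post.
# The COMMAND column is truncated exactly as top printed it.
RUNNER = "/usr/bin/ruby1.8 /usr/bin/packet_worker_runner "
LIB = ":/home/deploy/mbargo/lib/workers:/home/deploy/"
SNAPSHOT = f"""\
17508 deploy 15 0 49300 35m 2688 S 11.8 1.7 7:54.65 {RUNNER}18:16:mblox_sender:94{LIB}mbar
17504 deploy 15 0 49648 35m 2688 S 8.2 1.7 8:01.64 {RUNNER}16:14:mblox_sender:94{LIB}mbar
14141 deploy 15 0 20796 17m 1612 S 0.3 0.8 2:48.59 {RUNNER}8:7:log_worker:17{LIB}mbargo/s
14147 deploy 15 0 48232 34m 2556 S 0.3 1.7 5:10.90 {RUNNER}11:10:master:4{LIB}mbargo/scri
17523 deploy 17 0 132m 115m 3316 R 0.3 5.6 6:43.89 {RUNNER}20:18:campaign_starter:39{LIB}
14102 deploy 17 0 48320 31m 1364 R 0.0 1.5 3:08.97 ruby /home/deploy/mbargo/script/backgroundrb start
14144 deploy 15 0 48320 31m 1364 S 0.0 1.5 0:45.35 ruby /home/deploy/mbargo/script/backgroundrb start
17446 deploy 15 0 48232 34m 2556 S 0.0 1.7 0:43.62 {RUNNER}11:10:master:4{LIB}mbargo/scri
17486 deploy 15 0 59500 41m 3500 S 0.0 2.0 11:45.15 {RUNNER}14:13:receiver:39{LIB}mbargo/s
22300 deploy 15 0 59500 41m 3500 S 0.0 2.0 0:45.27 {RUNNER}14:13:receiver:39{LIB}mbargo/s
23636 deploy 15 0 49648 35m 2688 S 0.0 1.7 0:45.68 {RUNNER}16:14:mblox_sender:94{LIB}mbar
24042 deploy 15 0 49300 35m 2688 S 0.0 1.7 0:43.58 {RUNNER}18:16:mblox_sender:94{LIB}mbar
24053 deploy 15 0 132m 115m 3316 S 0.0 5.6 0:43.70 {RUNNER}20:18:campaign_starter:39{LIB}
"""


def worker_label(command):
    """Return a short label for the process a top row belongs to (hypothetical helper)."""
    if "packet_worker_runner" in command:
        # The last argument looks like "18:16:mblox_sender:94:/home/...";
        # keep the two leading numbers and the worker name.
        return ":".join(command.split()[-1].split(":")[:3])
    return "script/backgroundrb"


def group_threads(top_rows):
    """Map each label to the IDs top listed for it, in order of appearance."""
    groups = defaultdict(list)
    for line in top_rows.strip().splitlines():
        # 11 fixed columns (PID .. TIME+), then COMMAND with its spaces intact.
        fields = line.split(None, 11)
        groups[worker_label(fields[11])].append(int(fields[0]))
    return dict(groups)


if __name__ == "__main__":
    for label, ids in group_threads(SNAPSHOT).items():
        print(f"{label:24} {len(ids)} line(s): {ids}")
```

Grouped this way, the two `script/backgroundrb start` rows (14102 and 14144) fall under one label, which fits Jack's reading that they are two threads of a single server. Matching command strings alone cannot prove that, though; on a live machine, turning off top's thread display (the `H` toggle) would show one row per process and settle the question directly.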