Hi folks -

I'm having trouble getting backgroundrb to stop after one of the packet_worker_r processes dies.

If backgroundrb is running properly, "/path/to/application/script/backgroundrb stop" works fine, but often one of the packet_worker_r processes dies, and the stop command no longer works after that (it runs, but it does not stop the processes, and so then start doesn't work).

The only thing that seems to work at that point is to manually kill the processes that are still running, and then the start works, but that is going to make restarting via monit a lot less clean.

Any ideas would be much appreciated!

I'm using the github version of backgroundrb, and packet 0.1.13, running on ubuntu.

Thanks!
Ryan

Hi Ryan,

I recently ran into the same issue, where the backgroundrb process
would not respond to the ./script/backgroundrb stop command. The pid file
was being deleted but the actual process was not being killed. I'm
running packet 0.1.12 on gentoo.

I'm not exactly sure what conditions put backgroundrb into such a
state, but I've decided to modify script/backgroundrb to behave a
little differently.

My hypothesis is that if one of the Process.kill method calls in
script/backgroundrb raises an exception, the pid file is deleted even
though the kill signal is never sent. From that point on, starting
and stopping backgroundrb never affects the original, still-running
backgroundrb process.

There are a couple of reasons that I believe an exception could be
raised. Either Process.getpgid(pid), Process.kill('TERM', pid) or
Process.kill('-TERM', pgid) raises an exception on its own, or the
effective uid of the user running script/backgroundrb stop does not have
permission to kill those processes.
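A minimal sketch of the failure mode in this hypothesis, not the actual script: the names stop_with_ensure and demo_lost_pid_file are made up for illustration, and the kill call is replaced by a lambda so the example can raise on purpose. It only shows that an ensure clause removes the pid file whether or not the signal was delivered.

```ruby
require "tmpdir"

# Hypothetical reduction of the stop logic: the ensure clause runs whether or
# not the signal was delivered, so the pid file is removed either way.
def stop_with_ensure(pid_file, killer)
  pid = File.read(pid_file).strip.to_i
  begin
    killer.call(pid)   # stands in for Process.getpgid / Process.kill
  rescue StandardError => e
    puts "kill failed: #{e.class}: #{e.message}"
  ensure
    File.delete(pid_file) if File.exist?(pid_file)
  end
end

# Returns whether the pid file still exists after a failed kill.
def demo_lost_pid_file
  Dir.mktmpdir do |dir|
    pid_file = File.join(dir, "demo.pid")
    File.write(pid_file, "12345\n")
    # Simulate a kill call that raises instead of delivering the signal.
    failing_kill = lambda { |_pid| raise Errno::EPERM }
    stop_with_ensure(pid_file, failing_kill)
    File.exist?(pid_file)
  end
end
```

If this is what happens in practice, the daemon keeps running while nothing on disk points at it any more, which would match the symptom that later stop calls have no effect.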

To fix this, we've removed the Process.getpgid call and the two
Process.kill calls that send the TERM signal. Since we've
architected our backgroundrb jobs to be persistent and idempotent (a
db-backed queue written before the feature appeared in bdrb), we'll
just use the KILL signal.

Thoughts?
Thanks,
Jonathan

On Tue, Sep 16, 2008 at 12:11 PM, Ryan Case <mrryancase at gmail.com> wrote:
> [snip]

Jonathan,

Glad you raised this. I've been spending some time trying to
diagnose this exact same problem.

The exception handling code in the "when 'stop'" block (in
script/backgroundrb) could definitely be improved somewhat:
- check that the process with 'pid' exists before trying to kill it
- rescue permission exceptions (Errno::EPERM)
- only delete the pid file if the process no longer exists (in the
  ensure block)
- be a little more verbose to stdout/stderr
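
The four points above could be sketched roughly as follows. This is an illustration under assumptions, not the script's real code: process_alive? and safe_stop are hypothetical names, the 5 second timeout is arbitrary, and the pid file is expected to hold a single integer.

```ruby
# Signal 0 delivers nothing; it only checks that the pid can be signalled.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false   # no such process
rescue Errno::EPERM
  true    # process exists but belongs to another user
end

# Hypothetical replacement for kill_process. Returns true when the pid file
# was removed because the process is gone, false when it was kept.
def safe_stop(pid_file, timeout = 5)
  pid = File.read(pid_file).to_i
  raise ArgumentError, "no usable pid in #{pid_file}" unless pid > 0

  if process_alive?(pid)
    puts "sending TERM to the process group of #{pid}"
    Process.kill("-TERM", Process.getpgid(pid))
    deadline = Time.now + timeout
    sleep 0.1 while process_alive?(pid) && Time.now < deadline
  else
    puts "process #{pid} is already gone"
  end

  if process_alive?(pid)
    warn "process #{pid} is still running; keeping #{pid_file}"
    false
  else
    puts "deleting #{pid_file}"
    File.delete(pid_file)
    true
  end
rescue Errno::EPERM => e
  warn "not permitted to signal #{pid}: #{e.message}; keeping #{pid_file}"
  false
rescue Errno::ESRCH
  # The process exited between the liveness check and the signal.
  File.delete(pid_file) if File.exist?(pid_file)
  true
end
```

The main design choice is that the pid file is only removed after a liveness check, so a failed or refused kill leaves the file in place for a later retry.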

While we are on the subject of shutdown: when the backgroundrb process
gets a HUP signal, does it wait for existing workers to complete any work
methods that are executing, or is the 'Process.kill('-TERM', pgid)' call
intended to make the OS handle this?

We use capistrano to deploy our application (stopping and restarting
backgroundrb after the rails app has been updated). It would be great
if we could have more predictability regarding shutting down
backgroundrb (i.e. have backgroundrb disable the reactor loop in
idle workers, wait for all active workers to finish their methods, then
shut down).

John.
--
John O'Shea, CTO at Nooked
www: http://www.nooked.com/
cell: +353 87 992 9959
skype: joshea

Hi Jonathan & Ryan,

The problem is cross-platform behavior: just sending "KILL" won't work across
all platforms. Also, we send Process.kill("-TERM", pgid)
because this ensures that the corresponding workers also get killed.

What exception did you get when this happened? That could shed some
light on the issue.
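
To illustrate the group kill mentioned above, here is a small standalone sketch. It makes a simplifying assumption: the "master" and "worker" are two sleeping children placed in one new process group by the calling process, which is not how backgroundrb itself spawns workers. demo_group_kill is a made-up name.

```ruby
# Puts two child processes into one new process group, signals the group
# once, and returns the Process::Status of each child.
def demo_group_kill
  master = fork { sleep 60 }
  Process.setpgid(master, master)   # master leads a new process group
  worker = fork { sleep 60 }
  Process.setpgid(worker, master)   # worker joins the master's group

  pgid = Process.getpgid(master)
  Process.kill("-TERM", pgid)       # leading minus: signal the whole group

  [master, worker].map do |pid|
    _, status = Process.wait2(pid)  # blocks until that child has ended
    status
  end
end
```

A single call with a signal name that starts with a minus reaches every member of the group, so no per-worker pid bookkeeping is needed in the stop script.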
On my ubuntu box,
On Wed, 2008-09-17 at 11:42 -0400, Jonathan Wallace wrote:
> [snip]

I too have been having the same issue. Every time I try to restart backgroundrb after an update to our application (about once a day), I have to forcefully kill it myself. However, I haven't been able to reproduce it in a controlled setting. After I kill and start it, it all works ok. I tried killing packet_worker processes (even with -9), but it still shuts down correctly on the stop command. I'll let it run for a while and try tomorrow, but has anyone been able to predictably reproduce the issue?

-Woody

(debian etch, packet 0.1.10)

I'm not able to reproduce the issue consistently. Often killing (-9) a packet_worker will create the issue, but not always.

Jonathan - thanks for the info! Skipping the pgid sounds interesting, since what I do to manually fix it is usually just kill the pid for the backgroundrb process (however, as Hemant mentioned, I guess that might leave workers still running). And it definitely seems like it is failing in the kill block and deleting the pid file even though the process is still running.

Hemant - I don't see much in the way of exceptions when I run into this issue. The debug log does show the "Address already in use - bind(2) (Errno::EADDRINUSE)" error when start tries to run, but I don't see exceptions for the stop in the log. When I run stop, however, I do get the "Deleting pid file" output, which looks like Errno::ESRCH is being rescued. (Not sure if that is correct, or if there is a way to see more detail on the exception...)

Thanks everyone,
Ryan

On Wed, Sep 17, 2008 at 4:14 PM, Woody Peterson <woody at crystalcommerce.com> wrote:
> [snip]

Okay folks, here is a patch to the "backgroundrb" script, which should fix
some issues:

diff --git a/script/backgroundrb b/script/backgroundrb
index dabf80b..8d4bb78 100755
--- a/script/backgroundrb
+++ b/script/backgroundrb
@@ -49,18 +49,9 @@ when 'stop'
   def kill_process arg_pid_file
     pid = nil
     File.open(arg_pid_file, "r") { |pid_handle| pid = pid_handle.gets.strip.chomp.to_i }
-    begin
-      pgid = Process.getpgid(pid)
-      Process.kill('TERM', pid)
-      Process.kill('-TERM', pgid)
-      Process.kill('KILL', pid)
-    rescue Errno::ESRCH => e
-      puts "Deleting pid file"
-    rescue
-      puts $!
-    ensure
-      File.delete(arg_pid_file) if File.exists?(arg_pid_file)
-    end
+    pgid = Process.getpgid(pid)
+    Process.kill('-TERM', pgid)
+    File.delete(arg_pid_file) if File.exists?(arg_pid_file)
   end
   pid_files = Dir["#{RAILS_HOME}/tmp/pids/backgroundrb_*.pid"]
   pid_files.each { |x| kill_process(x) }

What it does:
1. Signals by process group id only, which is enough to stop the master
process as well.
2. Does not delete the pid file if there was an exception while stopping
the daemon.
3. Does not handle exceptions silently.

Please try this and let me know how it goes.

On Wed, 2008-09-17 at 17:35 +0100, John O'Shea wrote:
> [snip]

Slight variation that
- deletes the pid file for already-gone processes
- exits (with error code -1) without deleting the pid file if there was
  a permission problem

     begin
-      pgid = Process.getpgid(pid)
-      Process.kill('TERM', pid)
-      Process.kill('-TERM', pgid)
-      Process.kill('KILL', pid)
-    rescue Errno::ESRCH => e
-      puts "Deleting pid file"
-    rescue
+      pgid = Process.getpgid(pid)
+      Process.kill('-TERM', pgid)
+    rescue Errno::ESRCH
+      puts $!
+      # No process - Do nothing.
+    rescue Errno::EPERM
+      # Permission denied.
+      puts $!
+      Process.exit!
     ensure
       File.delete(arg_pid_file) if File.exists?(arg_pid_file)
     end
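
The variation above relies on Process.exit! leaving immediately, so the ensure clause never runs on the permission-denied path. A small sketch of that behavior follows. It is an illustration only: demo_exit_bang_skips_ensure is a made-up name, the denied kill is simulated with a plain raise, and the explicit exit status 1 is an assumption chosen for the example.

```ruby
require "tmpdir"

# Runs the rescue/ensure shape in a child process and reports whether the
# pid file survived the child's Process.exit! call.
def demo_exit_bang_skips_ensure
  Dir.mktmpdir do |dir|
    pid_file = File.join(dir, "demo.pid")
    File.write(pid_file, "12345")
    child = fork do
      begin
        raise Errno::EPERM   # stands in for a denied Process.kill
      rescue Errno::EPERM
        Process.exit!(1)     # immediate exit; no unwinding, ensure is skipped
      ensure
        File.delete(pid_file) if File.exist?(pid_file)
      end
    end
    Process.wait(child)
    File.exist?(pid_file)
  end
end
```

With a plain exit instead of exit!, Ruby raises SystemExit, the ensure clause runs, and the pid file would be deleted after all.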
--
John O'Shea, CTO at Nooked
www: http://www.nooked.com/
cell: +353 87 992 9959
skype: joshea

In my particular case I know it's not a permissions issue, as I'm
always using the same user.

I just tried restarting it, and with Hemant's patch I got:

script/backgroundrb:52:in `getpgid': No such process (Errno::ESRCH)

Via the above I found that in this particular case what happened is
that my logrotate wasn't calling stop, only start (it was meant to call
stop, but the call sat inside a failing if statement that checked whether
the pid existed). When you call start, it doesn't check whether
backgroundrb is already running, so it starts backgroundrb and overwrites
the pid file; the new backgroundrb then fails to start, but the pid file
has already been changed. The original process is still running, but stop
can't stop it because the pid file no longer holds the correct pid.
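
A rough sketch of the kind of guard that would avoid the overwrite described above. It is not taken from the script: running? and claim_pid_file are hypothetical names, and the check-then-write sequence is not race-free if two starts run at the same moment.

```ruby
# True when the pid file exists and names a process that can be signalled.
def running?(pid_file)
  return false unless File.exist?(pid_file)
  pid = File.read(pid_file).to_i
  return false unless pid > 0
  Process.kill(0, pid)   # signal 0 only probes for existence
  true
rescue Errno::ESRCH
  false                  # stale pid file: the process is gone
rescue Errno::EPERM
  true                   # exists, but owned by another user
end

# Writes the pid file only when no live process already owns it.
# Returns true if the file was (re)written, false if start should be refused.
def claim_pid_file(pid_file, pid = Process.pid)
  return false if running?(pid_file)
  File.open(pid_file, "w") { |f| f.write(pid.to_s) }
  true
end
```

With a guard like this, a second start would leave the original pid file untouched, so a later stop could still find the process that is actually running.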

Thus, I rewrote script/backgroundrb to be more LSB compliant
(http://refspecs.linux-foundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html)
so I don't have to check for existing pid files myself. I made a
patch, but it's almost as big as the script itself, and Hemant's patch
didn't apply for me (I must have changed something earlier in the
file), so the whole thing is at the end of the email.

While we're on the topic, is there a place to load all the
requirements other than this file? backgroundrb status takes a matter
of seconds to do a simple File.exists?(pid) 'cuz it has to load all
the backgroundrb requirements. Not that it really matters...

-Woody

#!/usr/bin/env ruby

RAILS_HOME = File.expand_path(File.join(File.dirname(__FILE__),".."))
BDRB_HOME = File.join(RAILS_HOME,"vendor","plugins","backgroundrb")
WORKER_ROOT = File.join(RAILS_HOME,"lib","workers")
WORKER_LOAD_ENV = File.join(RAILS_HOME,"script","load_worker_env")

["server","server/lib","lib","lib/backgroundrb"].each { |x| $LOAD_PATH.unshift(BDRB_HOME + "/#{x}")}
$LOAD_PATH.unshift(WORKER_ROOT)

require "rubygems"
require "yaml"
require "erb"
require "logger"
require "packet"
require "optparse"

require "bdrb_config"
require RAILS_HOME + "/config/boot"
require "active_support"

BackgrounDRb::Config.parse_cmd_options ARGV
BDRB_CONFIG = BackgrounDRb::Config.read_config("#{RAILS_HOME}/config/backgroundrb.yml")

require RAILS_HOME + "/config/environment"
require "bdrb_job_queue"
require "backgroundrb_server"

PID_FILE = "#{RAILS_HOME}/tmp/pids/backgroundrb_#{BDRB_CONFIG[:backgroundrb][:port]}.pid"
SERVER_LOGGER = "#{RAILS_HOME}/log/backgroundrb_debug_#{BDRB_CONFIG[:backgroundrb][:port]}.log"

def kill_process arg_pid_file
  pid = nil
  File.open(arg_pid_file, "r") { |pid_handle| pid = pid_handle.gets.strip.chomp.to_i }
  pgid = Process.getpgid(pid)
  puts "stopping backgroundrb"
  Process.kill('-TERM', pgid)
  File.delete(arg_pid_file) if File.exists?(arg_pid_file)
end

def status
  File.exists?(PID_FILE)
end

def start
  if fork
    sleep(5)
    exit
  else
    if status
      puts "already running"
      exit
    end

    puts "starting backgroundrb"
    op = File.open(PID_FILE, "w")
    op.write(Process.pid().to_s)
    op.close
    if BDRB_CONFIG[:backgroundrb][:log].nil? or BDRB_CONFIG[:backgroundrb][:log] != 'foreground'
      log_file = File.open(SERVER_LOGGER,"w+")
      [STDIN, STDOUT, STDERR].each {|desc| desc.reopen(log_file)}
    end

    BackgrounDRb::MasterProxy.new()
  end
end

def stop
  pid_files = Dir["#{RAILS_HOME}/tmp/pids/backgroundrb_*.pid"]
  pid_files.each { |x| kill_process(x) }
end

case ARGV[0]
when 'start'
  start
when 'stop'
  stop
when 'restart'
  stop
  start
when 'status'
  if status
    puts "running"
    exit
  else
    puts "not running"
    exit!(3)
  end
else
  BackgrounDRb::MasterProxy.new()
end
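
The status branch above only tests whether the pid file exists. As a possible refinement, and purely as a sketch, a status check could also probe the process and return the LSB status codes (0 for running, 1 for dead with a pid file left behind, 3 for not running). lsb_status is a hypothetical name. Because it needs only core Ruby, a check like this could in principle run before the heavier requires are loaded, though that has not been tried against this script.

```ruby
# Returns an LSB-style status code for the daemon named by pid_file:
#   0 = running, 1 = dead but pid file exists, 3 = not running.
def lsb_status(pid_file)
  return 3 unless File.exist?(pid_file)
  pid = File.read(pid_file).to_i
  return 1 unless pid > 0    # pid file present but unusable
  Process.kill(0, pid)       # signal 0 only probes for existence
  0
rescue Errno::ESRCH
  1                          # stale pid file
rescue Errno::EPERM
  0                          # running under another user
end
```

A caller could pass the result straight to exit, for example exit(lsb_status(PID_FILE)), assuming PID_FILE is defined as in the script.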
On Sep 18, 2008, at 3:21 AM, John O''Shea wrote:
> Slight variation that
> - deletes pid for already-gone processes
> - exits (with errror code -1) without deleting the pid file if there
> was a permission problem
>
> begin
> - pgid = Process.getpgid(pid)
> - Process.kill(''TERM'', pid)
> - Process.kill(''-TERM'', pgid)
> - Process.kill(''KILL'', pid)
> - rescue Errno::ESRCH => e
> - puts "Deleting pid file"
> - rescue
> + pgid = Process.getpgid(pid) +
Process.kill(''-TERM'',
> pgid) + rescue Errno::ESRCH
> + puts $!
> + # No process - Do nothing.
> + rescue Errno::EPERM
> + # Permission denied. + puts $!
> + Process.exit!
> ensure File.delete(arg_pid_file) if File.exists?
> (arg_pid_file)
> end
> hemant kumar wrote:
>> Okay folks here is a patch to "backgroundrb" script, which
should fix
>> some issues:
>>
>> diff --git a/script/backgroundrb b/script/backgroundrb
>> index dabf80b..8d4bb78 100755
>> --- a/script/backgroundrb
>> +++ b/script/backgroundrb
>> @@ -49,18 +49,9 @@ when ''stop''
>> def kill_process arg_pid_file
>> pid = nil
>> File.open(arg_pid_file, "r") { |pid_handle| pid >>
pid_handle.gets.strip.chomp.to_i }
>> - begin
>> - pgid = Process.getpgid(pid)
>> - Process.kill(''TERM'', pid)
>> - Process.kill(''-TERM'', pgid)
>> - Process.kill(''KILL'', pid)
>> - rescue Errno::ESRCH => e
>> - puts "Deleting pid file"
>> - rescue
>> - puts $!
>> - ensure
>> - File.delete(arg_pid_file) if File.exists?(arg_pid_file)
>> - end
>> + pgid = Process.getpgid(pid)
>> + Process.kill(''-TERM'', pgid)
>> + File.delete(arg_pid_file) if File.exists?(arg_pid_file)
>> end
>> pid_files =
Dir["#{RAILS_HOME}/tmp/pids/backgroundrb_*.pid"]
>> pid_files.each { |x| kill_process(x) }
>>
>> What it does is:
>> 1. Deleting by group id is enough for master process. 2. Do not
>> delete the pid file if, there was an exception while stopping
>> the daemon.
>> 3. Do not handle exceptions silently.
>>
>> Please try this and let me know, how it goes.
>>
>>
>>
>> On Wed, 2008-09-17 at 17:35 +0100, John O''Shea wrote:
>>
>>> Jonathan,
>>> Glad you raised this, I''ve been spending some time
trying to
>>> diagnose this exact same problem. The exception handling code
>>> in the "when ''stop''" block (in
script/backgroundrb) could
>>> definitely could be improved somewhat
>>> - check that the process with ''pid'' exists before
trying to kill it
>>> - rescue permission exceptions (Errno::EPERM)
>>> - only delete the pid file if the process pid does not still exist
>>> (in ensure block)
>>> - be a little more verbose to stdout/stderr
>>>
>>> While we are on the subject of shutdown, - when the backgroundrb
>>> process gets a HUP signal does it wait for existing workers to
>>> complete any work methods that are executing or is the
>>> ''Process.kill(''-TERM'', pgid)''
call intended to make the OS handle
>>> this?
>>> We use capistrano to deploy our application (stopping and
>>> restarting backgroundrb after the rails app has been updated). It
>>> would be great if we could have more predictability regarding
>>> shutting down backgroundrb (i.e. have the backgroundrb disable the
>>> reactor loop in idle workers and wait for all active workers to
>>> finish methods, then shutdown").
>>>
>>> John.
>>>
> --
> John O'Shea, CTO at Nooked
> www: http://www.nooked.com/
> cell: +353 87 992 9959
> skype: joshea
>
> _______________________________________________
> Backgroundrb-devel mailing list
> Backgroundrb-devel at rubyforge.org
> http://rubyforge.org/mailman/listinfo/backgroundrb-devel
Okay, so did you find out why "stop" didn't work from logrotate in the
first place? I think that's rather critical.

On Thu, 2008-09-18 at 11:24 -0700, Woody Peterson wrote:
> In my particular case I know it's not a permissions issue, as I'm
> always using the same user.
>
> I just tried restarting it, and with Hemant's patch I got:
>
> script/backgroundrb:52:in `getpgid': No such process (Errno::ESRCH)
>
> Via the above I found that in this particular case what happened is
> that my logrotate wasn't calling stop, only start (it meant to call
> stop, but was in a failing if statement checking if the pid existed).
> When you call start, it doesn't check to see if it's already running,
> so it starts backgroundrb, overwrites the pid file, then backgroundrb
> fails to start but has had its pid file changed. The original process
> is still running, but can't be stopped because the pid file no longer
> holds its pid.
>
> Thus, I rewrote script/backgroundrb to be more LSB compliant
> (http://refspecs.linux-foundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html)
> so I don't have to check for existing pid files myself. I made a
> patch, but it's almost as big as the script itself and Hemant's patch
> didn't apply for me (I must have changed something earlier in the
> file), so the whole thing is at the end of the email.
>
> While we're on the topic, is there a place to load all the
> requirements other than this file? backgroundrb status takes a matter
> of seconds to do a simple File.exists?(pid) 'cuz it has to load all
> the backgroundrb requirements. Not that it really matters...
>
> -Woody
>
> #!/usr/bin/env ruby
>
> RAILS_HOME = File.expand_path(File.join(File.dirname(__FILE__),".."))
> BDRB_HOME = File.join(RAILS_HOME,"vendor","plugins","backgroundrb")
> WORKER_ROOT = File.join(RAILS_HOME,"lib","workers")
> WORKER_LOAD_ENV = File.join(RAILS_HOME,"script","load_worker_env")
>
> ["server","server/lib","lib","lib/backgroundrb"].each { |x| $LOAD_PATH.unshift(BDRB_HOME + "/#{x}")}
> $LOAD_PATH.unshift(WORKER_ROOT)
>
> require "rubygems"
> require "yaml"
> require "erb"
> require "logger"
> require "packet"
> require "optparse"
>
> require "bdrb_config"
> require RAILS_HOME + "/config/boot"
> require "active_support"
>
> BackgrounDRb::Config.parse_cmd_options ARGV
> BDRB_CONFIG = BackgrounDRb::Config.read_config("#{RAILS_HOME}/config/backgroundrb.yml")
>
> require RAILS_HOME + "/config/environment"
> require "bdrb_job_queue"
> require "backgroundrb_server"
>
> PID_FILE = "#{RAILS_HOME}/tmp/pids/backgroundrb_#{BDRB_CONFIG[:backgroundrb][:port]}.pid"
> SERVER_LOGGER = "#{RAILS_HOME}/log/backgroundrb_debug_#{BDRB_CONFIG[:backgroundrb][:port]}.log"
>
> def kill_process arg_pid_file
>   pid = nil
>   File.open(arg_pid_file, "r") { |pid_handle| pid = pid_handle.gets.strip.chomp.to_i }
>   pgid = Process.getpgid(pid)
>   puts "stopping backgroundrb"
>   Process.kill('-TERM', pgid)
>   File.delete(arg_pid_file) if File.exists?(arg_pid_file)
> end
>
> def status
>   File.exists?(PID_FILE)
> end
>
> def start
>   if fork
>     sleep(5)
>     exit
>   else
>     if status
>       puts "already running"
>       exit
>     end
>
>     puts "starting backgroundrb"
>
>     op = File.open(PID_FILE, "w")
>     op.write(Process.pid().to_s)
>     op.close
>     if BDRB_CONFIG[:backgroundrb][:log].nil? or BDRB_CONFIG[:backgroundrb][:log] != 'foreground'
>       log_file = File.open(SERVER_LOGGER,"w+")
>       [STDIN, STDOUT, STDERR].each {|desc| desc.reopen(log_file)}
>     end
>
>     BackgrounDRb::MasterProxy.new()
>   end
> end
>
> def stop
>   pid_files = Dir["#{RAILS_HOME}/tmp/pids/backgroundrb_*.pid"]
>   pid_files.each { |x| kill_process(x) }
> end
>
> case ARGV[0]
> when 'start'
>   start
> when 'stop'
>   stop
> when 'restart'
>   stop
>   start
> when 'status'
>   if status
>     puts "running"
>     exit
>   else
>     puts "not running"
>     exit!(3)
>   end
> else
>   BackgrounDRb::MasterProxy.new()
> end
>
>
> On Sep 18, 2008, at 3:21 AM, John O'Shea wrote:
>
> > Slight variation that
> > - deletes pid for already-gone processes
> > - exits (with error code -1) without deleting the pid file if there
> > was a permission problem
> >
> > begin
> > - pgid = Process.getpgid(pid)
> > - Process.kill('TERM', pid)
> > - Process.kill('-TERM', pgid)
> > - Process.kill('KILL', pid)
> > - rescue Errno::ESRCH => e
> > - puts "Deleting pid file"
> > - rescue
> > + pgid = Process.getpgid(pid)
> > + Process.kill('-TERM', pgid)
> > + rescue Errno::ESRCH
> > + puts $!
> > + # No process - Do nothing.
> > + rescue Errno::EPERM
> > + # Permission denied.
> > + puts $!
> > + Process.exit!
> > ensure
> > File.delete(arg_pid_file) if File.exists?(arg_pid_file)
> > end
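
Woody's diagnosis (start overwrites the pid file without checking whether a daemon is already running) suggests guarding the write. The sketch below is one possible way to do that, under the assumption that a pid file only counts as "running" when the pid in it still answers to signal 0. The helper names process_alive?, running_pid and claim_pid_file are invented for this example and do not exist in backgroundrb. Note that the check and the write are not atomic, so two simultaneous starts could still race.

```ruby
# Hypothetical helpers, not part of backgroundrb.

# True if a process with this pid exists. Signal 0 performs the existence
# and permission checks without delivering anything.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true   # the process exists but belongs to another user
end

# Returns the pid recorded in pid_file if that process is still alive, else nil.
def running_pid(pid_file)
  return nil unless File.exist?(pid_file)
  pid = File.read(pid_file).to_i
  (pid > 0 && process_alive?(pid)) ? pid : nil
end

# Writes the current pid to pid_file unless it names a live process.
# Returns [:already_running, pid] or [:claimed, pid].
def claim_pid_file(pid_file)
  if (pid = running_pid(pid_file))
    return [:already_running, pid]
  end
  File.open(pid_file, "w") { |f| f.write(Process.pid.to_s) }
  [:claimed, Process.pid]
end
```

With a guard like this, a start issued while the daemon is up leaves the existing pid file untouched, and a stale file left behind by a crashed daemon no longer blocks the next start.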
Thanks for the patch - this works much better. Occasionally, I still
have to "pkill -9 -f backgroundrb", but most of the time just the stop
script will clean up when one of the packet_worker processes dies.

Thanks,
Ryan

On Sep 17, 2008, at 6:08 PM, hemant kumar wrote:
> Okay folks here is a patch to "backgroundrb" script, which should fix
> some issues:
>
> diff --git a/script/backgroundrb b/script/backgroundrb
> index dabf80b..8d4bb78 100755
> --- a/script/backgroundrb
> +++ b/script/backgroundrb
> @@ -49,18 +49,9 @@ when 'stop'
> def kill_process arg_pid_file
> pid = nil
> File.open(arg_pid_file, "r") { |pid_handle| pid = pid_handle.gets.strip.chomp.to_i }
> - begin
> - pgid = Process.getpgid(pid)
> - Process.kill('TERM', pid)
> - Process.kill('-TERM', pgid)
> - Process.kill('KILL', pid)
> - rescue Errno::ESRCH => e
> - puts "Deleting pid file"
> - rescue
> - puts $!
> - ensure
> - File.delete(arg_pid_file) if File.exists?(arg_pid_file)
> - end
> + pgid = Process.getpgid(pid)
> + Process.kill('-TERM', pgid)
> + File.delete(arg_pid_file) if File.exists?(arg_pid_file)
> end
> pid_files = Dir["#{RAILS_HOME}/tmp/pids/backgroundrb_*.pid"]
> pid_files.each { |x| kill_process(x) }
>
> What it does is:
> 1. Killing by group id is enough for the master process.
> 2. Do not delete the pid file if there was an exception while
> stopping the daemon.
> 3. Do not handle exceptions silently.
>
> Please try this and let me know how it goes.
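
For the cases Ryan mentions where a manual "pkill -9" is still needed, one option is to escalate inside the stop path: send TERM to the process group, wait a while, and only then send KILL. The following is a rough sketch of that idea, not something taken from backgroundrb. The names group_alive? and terminate_group, the timeout values and the return symbols are all assumptions made for the example, and KILL is only reasonable when jobs are safe to interrupt (as Jonathan describes for his db-backed queue).

```ruby
# Hypothetical helpers, not part of backgroundrb.

# True while at least one process in the group can still be signalled.
# Signal 0 with a negative pid only probes the group, nothing is delivered.
def group_alive?(pgid)
  Process.kill(0, -pgid)
  true
rescue Errno::ESRCH
  false
end

# Sends TERM to the process group of pid, polls until the group is gone,
# and falls back to KILL once timeout seconds have passed.
# Returns :terminated, :killed or :not_running.
def terminate_group(pid, timeout: 10, interval: 0.5)
  pgid = Process.getpgid(pid)     # raises Errno::ESRCH if pid is already gone
  Process.kill("-TERM", pgid)
  deadline = Time.now + timeout
  while Time.now < deadline
    return :terminated unless group_alive?(pgid)
    sleep interval
  end
  return :terminated unless group_alive?(pgid)
  Process.kill("-KILL", pgid)     # last resort, cannot be caught by workers
  :killed
rescue Errno::ESRCH
  :not_running
end
```

Errno::EPERM is deliberately not rescued here, so a permissions problem surfaces as an exception instead of being reported as a successful stop.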