thr3ads.net - Backgroundrb devel - [Backgroundrb-devel] Scheduling async jobs to workers, and checking if workers are currently running a job [May 2009]


Meyers, Dan

2009-May-14 15:20 UTC

[Backgroundrb-devel] Scheduling async jobs to workers, and checking if workers are currently running a job

I'm trying to schedule jobs from our own database scheduling system,
which we hook BackgrounDRb into. I assign a job to a worker
asynchronously using

MiddleMan.worker(worker).async_wrapper(:job_key => jobkey, :arg => task.id)

This gets called whenever a job is scheduled to start according to my
db. In an older version of BackgrounDRb, if that worker was already
running a job, the new job was neither started nor queued; it just
disappeared.

This was exactly what I wanted, as I would periodically check my db and,
using this method, attempt to start every worker scheduled to run. Any
that were still processing the previous run of the job just didn't run.

For example, I have a job scheduled to run every 5 seconds, for an
indefinite period, on a worker written just for it. There should only
ever be one copy of it running. Normally it takes 3 or 4 seconds to run,
so this is fine. Sometimes, though, it gets a sudden backlog of data to
deal with and takes 30 seconds to process it all. With the old version
of BackgrounDRb, once the 30-second job was done, a new job would start
on the next 5-second 'tick' of our scheduler, and all the jobs it had
tried to start while the worker was busy simply disappeared.

The new version instead seems to queue up the next run of the job on
each call to async_*. The result is that whenever I have a backlog I get
the one 30-second run that actually processes the data, then, as soon as
it finishes, 6 or so further runs that have no data to process. They
execute in almost no time but fill my logging table with
'started...stopped' messages. These runs were queued, one per 5-second
tick, while the 30-second job was still executing.
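To make the arithmetic concrete, here is a small plain-Ruby model of the scheduler described above. It assumes a request every 5 seconds, a first run that takes 30 seconds, and later runs that find no data and are modelled as taking 0 seconds. The simulate method and its :queue and :drop policies are hypothetical names for the new and old behaviours, not anything in BackgrounDRb. The ticks at 5, 10, 15, 20 and 25 seconds arrive while the long run is busy. Under :queue they all run back to back at t = 30, together with the regular 30-second tick, which is roughly where the "6 or so" empty runs come from.

```ruby
# Hypothetical model of a scheduler that asks one worker to run a job every
# `interval` seconds. durations[i] is how long the run requested on tick i takes.
# Returns an array of [start_time, duration] pairs for the runs that execute.
#   policy :queue - a request made while the worker is busy waits its turn
#   policy :drop  - a request made while the worker is busy is discarded
def simulate(durations, interval:, policy:)
  raise ArgumentError, "unknown policy #{policy}" unless %i[queue drop].include?(policy)

  runs = []
  busy_until = 0
  durations.each_with_index do |duration, i|
    tick = i * interval
    if tick >= busy_until
      # Worker is idle: start immediately.
      runs << [tick, duration]
      busy_until = tick + duration
    elsif policy == :queue
      # Worker is busy: this run starts once everything ahead of it has finished.
      runs << [busy_until, duration]
      busy_until += duration
    end
    # With :drop, a request made while busy leaves no trace.
  end
  runs
end

backlog = [30, 0, 0, 0, 0, 0, 0]  # ticks at 0, 5, ..., 30 seconds
queued  = simulate(backlog, interval: 5, policy: :queue)
dropped = simulate(backlog, interval: 5, policy: :drop)
# queued  => [[0, 30], [30, 0], [30, 0], [30, 0], [30, 0], [30, 0], [30, 0]]
# dropped => [[0, 30], [30, 0]]
```

The model ignores the time each empty run really takes, so the start times are idealised. Only the counts matter here: six near-empty runs with queueing, one with dropping.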

I hoped I could use

MiddleMan.worker(worker).worker_info[:status]

to check whether the job was running or not. This *seemed* to work for
the jobs that I started manually (rather than on BackgrounDRb load), by
putting 'set_no_auto_load true' at the top of the worker file and then
calling

MiddleMan.new_worker(:worker => worker, :worker_key => key)

to create the worker before giving it work to do. However, this
sometimes seems to get out of sync. It always returns status :stopped if
a worker or job doesn't exist, but I have had a worker running and
outputting data while worker_info still claimed it was stopped. I have
also had worker_info continue to claim that a worker was running after
it had finished processing data and disappeared from the process list.
Is there any way to be sure that the data returned by worker_info is
current and valid?

Secondly, worker_info seems to *always* return :running if the worker is
started on BackgrounDRb load, instead of being spawned by my own code
using new_worker. This is one of my problems with the indefinite task in
the example above. Even if worker_info returned correct information
reliably, I cannot use it to check whether the worker is currently
running a job (and skip the async_wrapper call if so), because it
*always* says the task is running. Is there any way to find out whether
a worker started on BackgrounDRb load is currently executing a job?
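One way to track this without relying on worker_info is for the job itself to record whether it is running. The plain-Ruby sketch below wraps the job body in a Mutex and uses try_lock, so a call that arrives while a run is in progress is dropped instead of queued, which matches the old behaviour described above. SkipIfBusy is a hypothetical helper, not part of BackgrounDRb. The sketch assumes the job body executes on a different thread from the one receiving requests (for example a thread_pool thread). If the worker handles requests strictly one after another on a single thread, each queued call only starts after the previous one has released the lock, and nothing would be skipped.

```ruby
require "thread"  # Mutex and Queue; a no-op on Rubies where they are already loaded

# Hypothetical helper: runs a job only if no run is already in progress.
class SkipIfBusy
  def initialize
    @lock = Mutex.new
  end

  # Snapshot of whether a run is in progress; it can change right after it is read.
  def busy?
    @lock.locked?
  end

  # Runs the block and returns its value, or returns :skipped without
  # running it if a run is already in progress.
  def run
    return :skipped unless @lock.try_lock
    begin
      yield
    ensure
      @lock.unlock
    end
  end
end

# Usage: a long run on one thread, and a second request arriving meanwhile.
guard   = SkipIfBusy.new
started = Queue.new
gate    = Queue.new

long_run = Thread.new do
  guard.run do
    started << true  # signal that the run has begun
    gate.pop         # block until the main thread lets it finish
    :processed_backlog
  end
end

started.pop                       # wait until the long run holds the lock
busy_during = guard.busy?         # true
second      = guard.run { :ran }  # :skipped, dropped rather than queued
gate << :go
first       = long_run.value      # :processed_backlog
third       = guard.run { :ran }  # :ran, the worker is idle again
```

busy? only reports a moment in time, so a caller that checks it and then acts can still race with a run that starts in between. Letting try_lock make the decision inside the worker avoids that gap.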

--
Dan Meyers
Network Specialist, Lancaster University
E-Mail: d.meyers at lancaster.ac.uk

Fitzhugh, Cary

2009-May-14 18:04 UTC


[Backgroundrb-devel] Scheduling async jobs to workers, and checking if workers are currently running a job

Don't know if this helps, but based on your description I would assume
that a worker knows where to get the data without an argument coming in.

If that's the case, then you might try add_periodic_timer:
http://backgroundrb.rubyforge.org/scheduling/

I'm not sure, but it might do what you want out of the box.

If not, then you could make a thread pool of size 1 and, in your method,
check whether the thread pool has a thread available; if it does, defer
the work to the thread_pool, and if not, skip it.

Something like:

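class MyWorker < BackgrounDRb::MetaWorker  # hypothetical worker name
  set_worker_name :my_worker
  pool_size 1  # a single thread, so at most one run executes at a time

  def create(args = nil)
    # Call my_method every 5 seconds; the timer passes no argument.
    add_periodic_timer(5) { my_method }
  end

  def my_method(arg = nil)
    # Hand work to the pool only if nothing is already waiting in its queue.
    if thread_pool.work_queue.size == 0
      thread_pool.defer(:do_my_method, arg)
    end
  end

  def do_my_method(arg)
    # ........
  end
end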


Note, though, that the schedule is then held in BackgrounDRb rather than
in the main application. If you'd rather keep it in your own scheduler,
you could drop the periodic timer and just call my_method asynchronously.
Either way, testing the work_queue size is probably what you're looking
for.
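Checking work_queue.size is not quite the same as checking whether the pool's thread is busy. In a Queue-based pool, a thread normally pops a job off the queue before running it, so while the single thread is working, the queue is empty again and the check lets one more run in behind it. The plain-Ruby model below shows this, assuming BackgrounDRb's pool behaves like a typical Queue-based one. SingleThreadPool and try_tick are hypothetical stand-ins, not BackgrounDRb's own ThreadPool. Five ticks arrive during a long run: one is queued and the other four are skipped. The check therefore caps the backlog at one extra run rather than none.

```ruby
require "thread"

# Hypothetical single-thread pool: one thread pops jobs off work_queue and runs them.
class SingleThreadPool
  attr_reader :work_queue, :completed

  def initialize
    @work_queue = Queue.new
    @completed = 0
    @worker = Thread.new do
      loop do
        job = @work_queue.pop  # the job leaves the queue before it runs
        break if job == :shutdown
        job.call
        @completed += 1        # only this thread writes @completed
      end
    end
  end

  def defer(&job)
    @work_queue << job
  end

  # Finish queued work, then stop the thread.
  def shutdown
    @work_queue << :shutdown
    @worker.join
  end
end

# Mirrors the check suggested above: defer only when nothing is waiting.
def try_tick(pool, &job)
  return false unless pool.work_queue.size == 0
  pool.defer(&job)
  true
end

pool    = SingleThreadPool.new
started = Queue.new
gate    = Queue.new

accepted = [try_tick(pool) { started << true; gate.pop }]  # the long run
started.pop                                                # thread busy, queue empty
5.times { accepted << try_tick(pool) { } }                 # ticks during the long run
gate << :go
pool.shutdown
# accepted       => [true, true, false, false, false, false]
# pool.completed => 2
```

If exactly one run should ever be in flight, the check needs to look at whether the thread is busy as well as at the queue length.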

I don't know about worker_info returning invalid status and all that,
though.

Thanks,
Cary


