Meyers, Dan
2009-May-14 15:20 UTC
[Backgroundrb-devel] Scheduling async jobs to workers, and checking if workers are currently running a job
I'm trying to schedule jobs from our own database scheduling system that we hook BackgrounDRb into. I assign a job to a worker asynchronously using

    MiddleMan.worker(worker).async_wrapper(:job_key => jobkey, :arg => task.id)

This gets called whenever I have a job scheduled to start according to my db. The behaviour in an older version of BackgrounDRb was that if that worker was already running a job then the new job didn't get started or queued, it just disappeared. This was exactly what I wanted, as I would periodically check my db and attempt to start all workers scheduled to do so using this method. Any that were still processing the previous run of the job just didn't run.

For example I have a job scheduled to run every 5 seconds on a specific worker written just for it, for an indefinite period of time. There should only ever be one copy of it running. Normally it takes 3 or 4 seconds to run, so this is fine. Sometimes it gets a sudden backlog of data to deal with, and takes 30 seconds to process it all. Using the old version of BackgrounDRb, after the 30 second job was up it would start a new job on the next 5 second 'tick' of our scheduler, but all the jobs it had tried to start while the worker was already handling a job would have disappeared.

The behaviour using the new version seems to be to queue up the next run of the job on a call to async_*. The result now becomes that any time I have a backlog I get the one 30 second job run that actually processes the data, then immediately it finishes I get 6 or so other runs of the job which have no data to process, so execute in almost no time but fill my logging table with 'started...stopped' messages. These jobs were queued up to run, 1 per 5 second tick, while the 30 second job was still executing.

I hoped I could use

    MiddleMan.worker(worker).worker_info[:status]

to check whether the job was running or not. This *seemed* to work on the jobs that I started manually (rather than starting on BackgrounDRb load) by putting 'set_no_auto_load true' at the top of the worker file and then calling

    MiddleMan.new_worker(:worker => worker, :worker_key => key)

to create the worker before giving it work to do. However this seems to sometimes get out of sync. It will always return status :stopped if a worker or job doesn't exist, but I had a worker running and outputting data, and worker_info was still claiming it was stopped. I have also had worker_info continue to claim that a worker is running after it has finished processing data and disappeared from the process list. Is there any way to be sure the data returned from worker_info is current and valid?

Secondly, worker_info seems to *always* return :running if the worker is started on BackgrounDRb load, instead of being spawned using new_worker by my own code. This is one of my problems in the example of the indefinite task above. I cannot use worker_info to see whether the worker is currently running a job (even if worker_info reliably returned correct information), and not call async_wrapper if it is, because worker_info *always* says the task is running. Is there any way that I can find out if a worker started on BackgrounDRb load is currently executing a job or not?

--
Dan Meyers
Network Specialist, Lancaster University
E-Mail: d.meyers at lancaster.ac.uk
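
For illustration only, the "drop the request if a run is already in progress" behaviour described above can be sketched in plain Ruby. This is not BackgrounDRb code: the SkipIfBusy class is a hypothetical helper, and where such a guard would have to live (caller side or worker side) depends on how the worker dispatches its calls, which is not settled in this thread.

```ruby
require "thread"

# Hypothetical helper, not part of BackgrounDRb: runs a block only when no
# earlier run is still in progress, and drops the request otherwise.
class SkipIfBusy
  def initialize
    @lock = Mutex.new
  end

  # Returns true if the block ran, false if the request was dropped.
  def run
    return false unless @lock.try_lock   # non-blocking: never waits or queues
    begin
      yield
      true
    ensure
      @lock.unlock
    end
  end
end

guard   = SkipIfBusy.new
runs    = 0
started = Queue.new
release = Queue.new

# Simulate the long-running job on a background thread.
slow = Thread.new do
  guard.run do
    runs += 1
    started << :go
    release.pop                      # stay busy until told to finish
  end
end

started.pop                          # the long job is now in progress
dropped = guard.run { runs += 1 }    # a scheduler tick while busy: dropped
release << :done
slow.join
after = guard.run { runs += 1 }      # a tick after the job finished: runs
```

The point of using try_lock rather than a blocking lock is that a tick arriving during a long run is discarded instead of waiting its turn, so no backlog of empty runs builds up.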
Fitzhugh, Cary
2009-May-14 18:04 UTC
[Backgroundrb-devel] Scheduling async jobs to workers, and checking if workers are currently running a job
Don't know if this helps, but based on your description I would assume that the worker knows where to get its data without an argument coming in. If that's the case, then you might try add_periodic_timer:

http://backgroundrb.rubyforge.org/scheduling/

I'm not sure, but it might do what you want out of the box. If not, then you could make a thread pool of size 1 and, in your method, check whether the thread pool has a thread available, then defer to the thread pool or not. Something like:

    # inside the worker class
    pool_size 1

    def create(args = nil)
      # fire my_method every 5 seconds
      add_periodic_timer(5) { my_method }
    end

    # arg defaults to nil because the timer calls this without arguments
    def my_method(arg = nil)
      # only hand over new work when nothing is waiting in the pool's queue
      if thread_pool.work_queue.size == 0
        thread_pool.defer(:do_my_method, arg)
      end
    end

    def do_my_method(arg)
      # ........
    end

Though the scheduling time is then held in backgroundrb, and not in the main application. So you could dump the periodic_timer and just call my_method async. But testing the work_queue size is probably what you're looking for.

I don't know about the status returning invalid and all that though.

Thanks,
Cary
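
As a rough plain-Ruby sketch of the size-1 pool idea: the SingleSlotPool class below is a hypothetical stand-in, not BackgrounDRb's thread pool. It tracks jobs that are queued or currently executing, because in this sketch a bare queue-size check would read 0 as soon as the worker thread has picked a job up. Whether BackgrounDRb's work_queue behaves the same way is not confirmed here.

```ruby
require "thread"

# Hypothetical stand-in for a thread pool of size 1.
class SingleSlotPool
  def initialize
    @queue   = Queue.new
    @state   = Mutex.new
    @pending = 0                     # jobs queued or currently executing
    @thread  = Thread.new do
      loop do
        job = @queue.pop
        break if job == :shutdown
        begin
          job.call
        ensure
          @state.synchronize { @pending -= 1 }
        end
      end
    end
  end

  # Accepts the job only when nothing is queued or running.
  # Returns true if accepted, false if refused.
  def offer(&job)
    @state.synchronize do
      return false if @pending > 0
      @pending += 1
    end
    @queue << job
    true
  end

  # Stops the worker thread after the jobs already accepted have run.
  def shutdown
    @queue << :shutdown
    @thread.join
  end
end

pool    = SingleSlotPool.new
log     = Queue.new
started = Queue.new
gate    = Queue.new

first = pool.offer { started << true; gate.pop; log << :long_run }
started.pop                                 # job is executing; the queue itself is empty
second = pool.offer { log << :stale_run }   # refused, because a job is still running
gate << :finish
pool.shutdown
```

Here the second offer is refused even though the internal queue is empty at that moment, which is the case a queue-size test alone would miss.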