Greg Campbell
2008-Feb-21 01:03 UTC
[Backgroundrb-devel] Registering status for multithreaded worker?
Hi all,

I'm using a backgroundrb worker to process data reporting tasks, which users of my Rails application can initiate and which need to support status monitoring. I had been spawning a new instance with a new job_id for each task, and reporting/requesting status via that job_id. It appears that this sort of thing may be better handled by thread_pool, but there seem to be two ways of dealing with status reporting, and I'm curious whether people have found one preferable to the other:

I could track status in the database, as I'm creating a new row for each task anyway to store the results, or I could use register_status, with a hash keyed on the equivalent of job_id (inside a mutex, as suggested in the README). Is there any reason to prefer the second over the first? Alternatively, am I incorrect in assuming that thread_pool is preferable to spawning one worker per user request?

Thanks,
Greg Campbell
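The register_status option described above comes down to a mutex-guarded hash keyed by job id. Here is a minimal sketch in plain Ruby. StatusBoard, update, and snapshot are hypothetical names used for illustration; the sketch does not call the backgroundrb API, and in a real worker the hash would presumably be handed to register_status after each update.

```ruby
require 'thread'

# Hypothetical stand-in for a worker's status bookkeeping.
# This is an assumption for illustration, not part of backgroundrb.
class StatusBoard
  def initialize
    @status = {}
    @lock = Mutex.new
  end

  # Record the latest status for one job while holding the lock.
  def update(job_id, status)
    @lock.synchronize { @status[job_id] = status }
  end

  # Hand back a copy taken under the lock, so readers never share
  # the live hash with writer threads.
  def snapshot
    @lock.synchronize { @status.dup }
  end
end

board = StatusBoard.new

# Four plain threads stand in for jobs running in a thread pool.
threads = (1..4).map do |job_id|
  Thread.new do
    board.update(job_id, :running)
    # ... the real work would happen here ...
    board.update(job_id, :done)
  end
end
threads.each(&:join)
```

One reason to keep reads behind the same lock as writes is that a poller then always sees a consistent copy, whichever store (database or in-memory hash) ends up holding the authoritative status.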
hemant
2008-Feb-21 11:54 UTC
[Backgroundrb-devel] Registering status for multithreaded worker?
Hi Greg,

On Thu, Feb 21, 2008 at 6:33 AM, Greg Campbell <gtcampbell at gmail.com> wrote:
> I could track status in the database, as I'm creating a new row for each
> task anyway to store the results, or I could use register_status, with a
> hash keyed on the equivalent of job_id (inside a mutex, as suggested in the
> README). Is there any reason to prefer the second over the first?
> Alternately, am I incorrect in assuming that thread_pool is preferable to
> spawning one worker per user request?

thread_pool is definitely preferable to the one-worker-per-request approach.

Also, register_status is usually faster than a hand-rolled approach using the database. In addition, worker status results can be stored in memcache clusters as well, which is another reason to prefer it.
Greg Campbell
2008-Feb-26 20:27 UTC
[Backgroundrb-devel] Registering status for multithreaded worker?
One followup here:

On Thu, Feb 21, 2008 at 3:54 AM, hemant <gethemant at gmail.com> wrote:
> thread_pool is definitely preferable over one worker per request approach.
>
> Also, usually register_status is faster than your hand rolled approach
> of using databases. Also, if worker status results can be stored in
> memcache clusters as well hence is preferable.

Things seem to be working with thread_pool, except for one issue: ask_status returns something incorrect the first time it's called after an ask_work call. It looks like the first ask_status response is the return value from the worker method that calls thread_pool.defer, when I would think that return value should be irrelevant (as the worker is being invoked with the non-blocking ask_work). Has anyone seen this behavior before?
For reference, my code basically works this way (with all app-specific stuff removed):

(controller)

    def initiate_task
      @task = Task.create
      MiddleMan.ask_work(:worker => :threaded_worker,
                         :worker_method => :process_task,
                         :data => @task.id)
    end

    # called via AJAX polling to update progress bar
    def task_status
      @task_id = params[:task_id].to_i
      status_hash = MiddleMan.ask_status(:worker => :threaded_worker)
      # do something with status_hash[@task_id]...
    end

(worker)

    def create
      @worker_status = {}
      @status_lock = Mutex.new
      register_status(@worker_status)
    end

    def process_task(task_id)
      thread_pool.defer(task_id) do |task_id|
        # do several things which call update_status...
      end
      return {:this_should_be => :irrelevant}
    end

    def update_status(task_id, status)
      # use the same mutex that create set up
      @status_lock.synchronize do
        @worker_status[task_id] = status
      end
      register_status(@worker_status)
    end

Based on my logging in the controller, the first time task_status is called, the status_hash retrieved looks something like this:

    {:type => :response, :data => {:this_should_be => :irrelevant}, :client_signature => 11}

The next time, however, it looks correct:

    {(task_id_1) => (task_status_1), (task_id_2) => (task_status_2)}

and so on. Again, has anyone seen this sort of thing before? Am I using the API incorrectly?

Thanks,
Greg
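Until the stray first response is dealt with in the library, the polling side can ignore it defensively. This is a minimal sketch, assuming the stray result always has the envelope shape shown in the log output above. response_envelope? and status_for are hypothetical helpers, not part of the backgroundrb API, and the sample hashes simply mirror the values reported in this message.

```ruby
# Hypothetical helper (an assumption, not backgroundrb API): true when the
# result looks like the stray response envelope rather than a status hash.
def response_envelope?(result)
  result.is_a?(Hash) && result[:type] == :response && result.key?(:data)
end

# Returns the status for one task, or nil when the result is unusable
# (not a hash, or a stray envelope), so the caller can simply poll again.
def status_for(result, task_id)
  return nil unless result.is_a?(Hash)
  return nil if response_envelope?(result)
  result[task_id]
end

# Sample values shaped like the ones logged in the controller.
stray = { :type => :response,
          :data => { :this_should_be => :irrelevant },
          :client_signature => 11 }
good  = { 1 => :running, 2 => :done }

first_poll  = status_for(stray, 1) # stray envelope, treated as "no status yet"
second_poll = status_for(good, 2)  # normal status hash keyed by task id
```

The check relies on task ids being integers, so a genuine status hash never carries a :type key. If status keys could be symbols, a stricter test would be needed.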
hemant
2008-Feb-26 21:41 UTC
[Backgroundrb-devel] Registering status for multithreaded worker?
Hi Greg,

On Wed, Feb 27, 2008 at 1:57 AM, Greg Campbell <gtcampbell at gmail.com> wrote:
> Things seem to be working with thread_pool, except for one issue -
> ask_status returns something incorrect the first time it's called after an
> ask_work call. It looks like the first ask_status response is the return
> value from the worker method that's calling thread_pool.defer, when I would
> think that return value should be irrelevant (as the worker's being invoked
> with the non-blocking ask_work). Has anyone seen this behavior before?
>
> Based on my logging in the controller, the first time task_status is called,
> the status_hash retrieved looks something like this: {:type => :response,
> :data => {:this_should_be => :irrelevant}, :client_signature => 11}. The
> next time, however, it looks correct: {(task_id_1) => (task_status_1),
> (task_id_2) => (task_status_2)}, etc. Again, has anyone seen this sort of
> thing before? Am I using the API incorrectly?

Thanks for the bug report. I was able to reproduce this and have fixed it in trunk (which has been up on git for a while now). Get the code using:

    git clone git://gitorious.org/backgroundrb/mainline.git backgroundrb

and follow the README as usual. You will need to install the "packet" and "chronic" gems.