Sean O'Hara
2008-May-04 16:56 UTC
[Backgroundrb-devel] best approach to managing workers and getting status
Hi,

I am using backgroundrb to process audio files from a Rails controller. Currently a new worker gets created every time the method is called on the worker, using this code:

    @job_key = MiddleMan.new_worker(:worker => :audio_file_worker, :job_key => Time.now.to_i)
    MiddleMan.worker(:audio_file_worker, @job_key).make_new_audio_file(params[:release_id])

I need to create a new worker each time in order to be able to get the status from the worker using the job key (this info is returned to the client via ajax requests). But this means I end up with many workers eating up memory and hanging around after their jobs are complete.

I am planning to just kill them from the controller each time the status shows they are complete. This will stop the extra processes from hanging around and using memory, but I guess starting a new worker each time still carries some 'cost', since each one contains a Rails instance?

Is there a better approach, e.g. having just one worker and sending all the jobs to it? If so, how do we keep track of the statuses of unique jobs in that case? Since the job_key is created when creating a new worker, isn't it in effect a worker key rather than a job key?

I did read about a different approach discussed here (using thread_pool.defer), but as far as I can tell it doesn't allow getting the status of the individual threads:

http://rubyforge.org/pipermail/backgroundrb-devel/2008-February/001532.html

Any guidance is appreciated.

Thanks,
Sean
Jack Nutting
2008-May-05 08:35 UTC
The way that I (and perhaps many others here?) handle this is to use the database to hold a "work queue", and have a long-running worker that polls the database periodically and handles any pending requests.

In your case, for example, you could have an AudioFileWork model containing fields for "release_id" and "pending". Your app would create a new instance of that with pending=true, and your background worker would poll for rows where pending=true, then set the flag to false when they are complete. For your post-creation interactions (to show the status with ajax) you'd use the id of the created AudioFileWork instead of a job key.

There are many advantages to this approach:

- you always have a known number of workers (the number you configure and start), so you won't have uncontrollable memory usage explosions if your site gets busy, just slower response times
- you can check the status of a request by querying the database
- very little in-memory data is lost in unhappy events (a crash or an unhandled exception)
- you have a historical record in the database of all work completed by a background worker
- you get very loose coupling to backgroundrb. In my case, I simply specify in a model class that it needs processing, and it just happens. My app doesn't "talk" directly to backgroundrb at all, except for an admin page where I can make sure it's up and running.

-- // jack // http://www.nuthole.com
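A minimal sketch of the polling pattern described above, in plain Ruby so it runs on its own. The Struct and the `WorkQueue` class are hypothetical stand-ins for an ActiveRecord `AudioFileWork` model and its table; only the `release_id` and `pending` fields come from the message.

```ruby
# In-memory stand-in for the AudioFileWork table. In a Rails app this would be
# an ActiveRecord model with release_id and pending columns; the Struct and the
# WorkQueue class here are illustrative assumptions.
AudioFileWork = Struct.new(:id, :release_id, :pending)

class WorkQueue
  def initialize
    @rows = []
    @next_id = 1
  end

  # Controller side: insert a row with pending = true and return its id,
  # which the ajax status requests use instead of a backgroundrb job key.
  def create(release_id)
    row = AudioFileWork.new(@next_id, release_id, true)
    @next_id += 1
    @rows << row
    row.id
  end

  def pending
    @rows.select(&:pending)
  end

  def find(id)
    @rows.find { |row| row.id == id }
  end
end

# One pass of the long-running worker: handle every pending row, then clear
# its flag. A real worker would repeat this on a timer, sleeping in between.
def poll_once(queue)
  queue.pending.each do |work|
    yield work.release_id   # e.g. build the audio file for this release
    work.pending = false
  end
end

queue = WorkQueue.new
id = queue.create(42)
processed = []
poll_once(queue) { |release_id| processed << release_id }
poll_once(queue) { |release_id| processed << release_id }  # nothing left to do
puts queue.find(id).pending   # prints "false"
```

Because the worker count is fixed and every request is just a row, a burst of requests makes the table longer rather than starting more processes.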
Jack Nutting
2008-May-05 16:33 UTC
On Mon, May 5, 2008 at 4:25 PM, Sean O'Hara <sohara at sohara.com> wrote:
> Hi Jack,
>
> That approach sounds very good, and makes a lot of sense for this kind of job. Although, it doesn't give me the satisfaction of watching the ajax progress bar :)
>
> But what could we do for a background job that really does require providing feedback to the user, such as processing a credit card transaction for an order in real time? I would like to offload the job to backgroundrb so that the user gets some feedback and isn't tempted to submit the order twice out of impatience, but I also don't want extra workers hanging around, or to incur the extra memory usage associated with starting them up. Is there a way to have a single worker act as the transaction processor, but still be able to give back the status of unique jobs to Rails?

How about an extension of the approach I mentioned earlier: instead of just a simple boolean "pending" flag, you could have a field of any kind you like, such as an integer or a float to indicate percentage complete, or a string for arbitrary text. Then, while your backgroundrb worker is doing its thing, it can update the field with the current status (percentage complete, or "authenticating card number..." etc.), and your ajax method can grab that from the database instead of asking the worker for it.

-- // jack // http://www.nuthole.com
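A sketch of that extension, assuming hypothetical `status` and `percent` fields on the job record. The worker writes progress after each step and the ajax action only reads the stored record; in a real app both would be columns on a model, and the step messages are made up for illustration.

```ruby
require "json"

# Hypothetical job record: a free-text status plus a percentage, in place of
# the boolean "pending" flag. In Rails these would be columns on the model.
PaymentJob = Struct.new(:id, :status, :percent)

JOBS = { 1 => PaymentJob.new(1, "queued", 0) }

# Worker side: write progress into the record after each step.
def process_payment(job)
  steps = ["authenticating card number...", "charging card...", "sending receipt..."]
  steps.each_with_index do |message, i|
    job.status = message
    # ... the real work for this step would happen here ...
    job.percent = ((i + 1) * 100.0 / steps.size).round
  end
  job.status = "complete"
end

# Ajax side: answer status requests from the stored record, never by
# asking the worker itself.
def status_json(id)
  job = JOBS.fetch(id)
  JSON.generate({ status: job.status, percent: job.percent })
end

puts status_json(1)   # prints {"status":"queued","percent":0}
process_payment(JOBS[1])
puts status_json(1)   # prints {"status":"complete","percent":100}
```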
Frank Schwach
2008-Jun-24 17:26 UTC
Jack,

I just found your interesting post in the archive and would like to come back to it. I need to implement something like this:

I have some very long-running tasks (several hours) that should run on a remote machine and talk to the database on the Rails server. I need to keep track of jobs, including those that have been run in the past, so a table for background jobs with their status, as you describe, would be the best solution for me.

I am just wondering whether backgroundrb wouldn't be a bit of overkill in the scenario you describe. In the new "Advanced Rails Recipes" from the Pragmatic Programmers Bookshelf there is a recipe using a simple daemonized Ruby process that polls the database for pending jobs and uses acts_as_state_machine to set the state of the jobs (there is also a nice BackgrounDRb recipe in the book, by the way). Isn't the daemonized process easier to handle in this case, since you don't integrate your app with backgroundrb very tightly anyway?

I would be grateful for any suggestions, because there seem to be lots of possible solutions for this problem and some more or less well documented plugins, and I haven't used any of them before. I need a simple and robust method that doesn't have too many dependencies and doesn't require too much maintenance, because I want to make the finished app available for others to install on their local systems.

Thanks in advance,
Frank
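For comparison, a rough sketch of the daemon-plus-state-column approach mentioned above. It does not use acts_as_state_machine; a hand-rolled transition table stands in for it, and `DaemonJob`, `run_pending` and the state names are assumptions chosen for illustration.

```ruby
# Allowed state transitions (assumed names). The recipe uses
# acts_as_state_machine for this; a plain hash stands in for it here.
TRANSITIONS = {
  "pending" => ["running"],
  "running" => ["complete", "failed"]
}.freeze

class DaemonJob
  attr_reader :id, :state

  def initialize(id, &work)
    @id = id
    @state = "pending"
    @work = work
  end

  def transition!(new_state)
    allowed = TRANSITIONS.fetch(@state, [])
    unless allowed.include?(new_state)
      raise ArgumentError, "cannot go from #{@state} to #{new_state}"
    end
    @state = new_state
  end

  # Runs the job, recording a failure in the state column instead of
  # crashing the daemon.
  def run
    transition!("running")
    begin
      @work.call
      transition!("complete")
    rescue StandardError
      transition!("failed")
    end
  end
end

# One pass of the daemon loop. A real daemon would wrap this in
# `loop { run_pending(jobs); sleep 30 }` and load the jobs from the database.
def run_pending(jobs)
  jobs.select { |job| job.state == "pending" }.each(&:run)
end

jobs = [DaemonJob.new(1) { :ok }, DaemonJob.new(2) { raise "boom" }]
run_pending(jobs)
p jobs.map(&:state)   # prints ["complete", "failed"]
```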
Jack Nutting
2008-Jun-25 07:32 UTC
On Tue, Jun 24, 2008 at 7:26 PM, Frank Schwach <f.schwach at uea.ac.uk> wrote:
> I am just wondering whether backgroundrb wouldn't be a bit of an overkill in the scenario you describe?

This is an interesting question, Frank. My usage of backgroundrb is somewhat of an edge case, and most of what I'm doing with it could definitely be done with a simpler system. I initially chose backgroundrb for my project because it seemed to make the most sense at the time (for what I *thought* I needed; actual needs changed with further exploration of the problem space), and I was enough of a Ruby newbie that it felt comfortable to have a packaged solution that (mostly) "just worked". If I were starting from scratch today, I might make a different decision.

However, it's not only inertia that keeps me using backgroundrb. For one thing, backgroundrb does provide some handy things (centralized logging, IPC for storing runtime status info about my processes, etc.) that would take me some time to implement if I were rolling my own solution with a daemonized script, and from my perspective that would be wasted time, since I have those things working today thanks to backgroundrb. Another reason to keep it is that there are a few spots in my system where I'm considering using some of backgroundrb's other key features, like launching a short-lived process to handle something in response to some action in the main application.

-- // jack // http://www.nuthole.com
hemant
2008-Jun-25 10:26 UTC
Well, I am working on a couple of new things in BackgrounDRb. Result storage and retrieval is one of them, as I mentioned in earlier mails when I solicited opinions from fellows who are using bdrb. You can check out:

http://github.com/gnufied/backgroundrb/commits/testcase

So here is what's on this branch of BackgrounDRb, which will become master very soon.

1> A true clustering system for backgroundrb servers running on N nodes. Tasks are dispatched in a round-robin manner, but you can specify the host on which you want to execute a task:

    MiddleMan.worker(:foo_worker).async_some_work(:args => "lol")

^^ will choose any server in a round-robin manner and run the "some_work" method in the specified worker. You can also specify:

    :host => <local or all or "10.0.0.6:11001">

which overrides the default behaviour and runs the specified method on the local bdrb server, on all bdrb servers, or on the specified server.

2> Clustering is failsafe: if one bdrb node goes down, all requests are immediately routed to the remaining servers. Once that node comes back up, it automatically starts participating in the clustering process again.

3> Results can be stored in memcache, and the register_status method has been replaced by a "cache" object available in all workers. Hence you can cache results with:

    cache[@user.id] = some_data

in your workers, and later you can retrieve them using:

    MiddleMan.worker(:foo_worker).ask_result(@user.id)

I seriously recommend using memcache if you are clustering bdrb servers. Also, the cache object's caching mechanism is completely thread safe and hence can be used from within the thread pool or anywhere you want.

4> Apart from the memory-based job queue that you can use with thread pools, the testcase branch implements database-based job queues. So, to enqueue a particular task:

    MiddleMan.worker(:foo_worker).enq_some_task(:job_key,args)

the some_task method will automatically be called in the first available worker and the task will be dequeued from the database. Also, jobs with duplicate keys automatically get rejected.

Note that the above things are already working on the testcase branch. I think these features make bdrb a very compelling choice.

Some things that I will finish in a day or two:

5> Similar to worker method invocation, with each scheduled method you can specify the host on which the task should run. For example, say you have 5 bdrb servers and you have scheduled a billing task to run every Sunday. You don't want the billing task to run on all the servers. So, by default a scheduled task will run on the server on which it was created, but you can specify the host on which it should run.
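To make the dispatch rules in points 1 and 2 concrete, here is a toy model of round-robin selection with failover and a `:host` override. It is a hypothetical class written for illustration, not backgroundrb's actual cluster code, and it ignores networking entirely.

```ruby
# Toy model of the dispatch rules: round-robin over live servers, failover
# when a node is marked down, and a :host override of "local", "all" or a
# specific "ip:port". Hypothetical; not backgroundrb's own implementation.
class RoundRobinDispatcher
  def initialize(servers, local:)
    @servers = servers
    @local = local
    @down = []
    @index = 0
  end

  def mark_down(server)
    @down << server unless @down.include?(server)
  end

  # A recovered node simply rejoins the rotation.
  def mark_up(server)
    @down.delete(server)
  end

  # Returns the server(s) that should receive a call.
  def targets(host: nil)
    case host
    when nil     then [next_live_server]
    when "local" then [@local]
    when "all"   then live_servers
    else              [host]
    end
  end

  private

  def live_servers
    @servers - @down
  end

  def next_live_server
    live = live_servers
    raise "no live bdrb servers" if live.empty?
    server = live[@index % live.size]
    @index += 1
    server
  end
end

d = RoundRobinDispatcher.new(["a:11001", "b:11001", "c:11001"], local: "a:11001")
first_round = Array.new(4) { d.targets.first }
p first_round      # prints ["a:11001", "b:11001", "c:11001", "a:11001"]
d.mark_down("b:11001")
after_failover = Array.new(2) { d.targets.first }
p after_failover   # prints ["a:11001", "c:11001"]
```

Computing the live list on every call keeps failover and recovery trivial in this sketch; a real cluster would also need health checks to decide when to call `mark_down` and `mark_up`.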
Frank Schwach
2008-Jun-25 11:00 UTC
Hemant,

These latest additions to backgroundrb look pretty cool. Unfortunately, I don't think I will be able to use it this way, because in my setup I can't run anything on the cluster nodes directly. I have to submit jobs to a queuing system on the cluster's master node, which is why I think a simple daemon running on the master node that polls the (remote) db for pending jobs and then submits these to the queue would probably be better for my case. But I'm far from being an expert on distributed systems, so any suggestions are very welcome!

-- 
+++++++++++++++++++++++++++++++
Dr Frank Schwach
School of Computing Sciences
University of East Anglia
Norwich, NR4 7TJ
Tel: 0044/(0)1603 - 592 405
www.cmp.uea.ac.uk
++++++++++++++++++++++++++++++++
hemant
2008-Jun-25 11:20 UTC
On Wed, Jun 25, 2008 at 4:30 PM, Frank Schwach <f.schwach at uea.ac.uk> wrote:
> I have to submit jobs to a queuing system on the cluster's master node, which is why I think a simple daemon running on the master node that polls the (remote) db for pending jobs and then submits these to the queue would probably be better for my case

Hmm, then use the db queuing mechanism built into bdrb. bdrb still stays lightweight, because most of these changes don't affect the core and have really gone into the client side of the code.

    MiddleMan.worker(:foo_worker).enq_some_task(:job_key,args)

The above snippet does exactly that. But anyway, I think you feel it's too complicated for your setup? I can't help that feeling. It would be complicated if it were complicated to set up and use, but it's not. Those features stay totally out of your way if you don't need them.
Steve D
2008-Jul-16 07:54 UTC
hemant,

Would bdrb's queueing mechanism allow for prioritizing tasks?

I have a few long-running, low-priority tasks that I'd want to run only when no high-priority tasks are in the queue.

Jack's solution makes sense to my newbie brain: each time my queue_processing_worker accesses the table, it can sort unprocessed tasks by priority and begin from there. But I'd rather stick to bdrb convention if it's built in already.

- Steve
Frank Schwach
2008-Jul-16 18:10 UTC
Hi Hemant and others,

Once again, thank you for your reply to my last post and for the excellent work you are doing! Seeing your announcement of all those new features really convinced me to go ahead and use backgroundrb for my app. However, I must admit I am quite confused now by all the latest changes to the API and would appreciate some help getting my head around them.

So, basically, I have an app with a database table of long-running tasks. There are several different types of tasks, so in my table I have a column for the type of job to be run and another for serialized data for the job (basically some IDs and args). I want to run the actual jobs on a remote machine, which the new version of backgroundrb seems to cover nicely now.

So this is what I thought I could do. Have a pool of x RemoteJobWorkers on the remote machine waiting to accept jobs:

    class RemoteJobWorker < BackgrounDRb::MetaWorker
      set_worker_name :remote_job_worker

      def enque_job(job_type, job_args)
        # run the method named by job_type in the thread pool
        thread_pool.defer(job_type.to_sym, job_args)
      end

      def job_type1(args)
        # update job table with status
        Jobs.update(job_key, :status => "running")

        # perform the long running task
        large_dataset.each do |item|
          # update job table with progress
          Jobs.update(job_key, :progress => x)
        end

        # job done: update status again
        Jobs.update(job_key, :status => "complete")
      end

      def job_type2(args)
        # ...
      end

      def job_type3(args)
        # ...
      end

      # ... and so on ...
    end

So when a job is submitted by a user, the controller sends it to the queue on the remote machine like this(?):

    MiddleMan.worker(:remote_job_worker).async_enque_job(:arg => data, :job_key => new_job.id, :host => "a.b.c.d:XXX")

where new_job is the newly created instance of my Jobs model.

Does that make sense, or is there a better approach that I should be using? I also like the idea of just having a scheduled worker on the remote machine polling the database for pending jobs and then sending them to the queue, just like I am trying to do above. I would really appreciate opinions from some experienced users to help me decide.
In the above I am assuming that the jobs running in the thread pool have access to the job key and can therefore identify the row in the Jobs table that they need to update. I hope I understood the announcement of the latest version correctly that this is the case?

One of my tasks would itself fork to run a third-party program during its run time. Would that be possible inside a job that is already running in a thread, or would the space-time continuum collapse if I tried that?

I noticed in the announcement that there is support for persistent job queues now, but I am not sure how to tie this in with my existing table. Hemant, do you have a more detailed example of how to use this and what the model would look like?

Apologies for all the questions; your help is more than welcome!

Thank you all in advance,
Frank
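One way to sidestep the question of whether a deferred method can see its job key, sketched under assumptions: hand the handler the job record itself (so its id is always in reach) and map `job_type` values through a whitelist instead of sending them straight to a method. `JobRow`, `RemoteJobRunner` and the handler names are hypothetical, and no backgroundrb calls are involved.

```ruby
# Stand-in for a row of the Jobs table (job_type column + serialized args).
JobRow = Struct.new(:id, :job_type, :args, :status, :progress)

class RemoteJobRunner
  # Only job types listed here can be run; anything else is marked as an error.
  HANDLERS = { "job_type1" => :run_job_type1 }.freeze

  def run(job)
    handler = HANDLERS[job.job_type]
    if handler.nil?
      job.status = "error"
      return job
    end
    job.status = "running"
    send(handler, job)   # the handler receives the record, so job.id is in reach
    job.status = "complete"
    job
  end

  private

  # Placeholder for the long-running work: walks a dataset, recording progress.
  def run_job_type1(job)
    items = job.args.fetch(:items)
    items.each_with_index do |_item, i|
      job.progress = ((i + 1) * 100.0 / items.size).round
    end
  end
end

runner = RemoteJobRunner.new
ok = runner.run(JobRow.new(1, "job_type1", { items: [1, 2, 3, 4] }, "pending", 0))
bad = runner.run(JobRow.new(2, "no_such_type", {}, "pending", 0))
p [ok.status, ok.progress, bad.status]   # prints ["complete", 100, "error"]
```

The whitelist matters because `job_type` comes from a database column: sending an arbitrary string to a worker would let any bad row invoke any method on it.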
hemant
2008-Jul-16 18:17 UTC
On Wed, Jul 16, 2008 at 1:24 PM, Steve D <swertyui at gmail.com> wrote:
> Would bdrb's queueing mechanism allow for prioritizing of tasks?

No, it can't right now. But it should be trivial to add such functionality; I welcome a patch.
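Until such a patch exists, the approach Steve outlines (sort pending rows by priority in your own polling worker) can be sketched as below. The `Task` fields are assumptions, and the integer `created_at` values stand in for timestamps; in SQL the same ordering would be `ORDER BY priority DESC, created_at ASC`.

```ruby
# Hypothetical queue rows for a hand-rolled prioritized queue.
Task = Struct.new(:id, :priority, :created_at, :done)

# Pick the next task: highest priority first, oldest first within a priority.
# Low-priority work only comes out when no higher-priority task is pending.
def next_task(tasks)
  tasks.reject(&:done).min_by { |t| [-t.priority, t.created_at] }
end

tasks = [
  Task.new(1, 0, 1, false),    # low priority, oldest
  Task.new(2, 10, 3, false),   # high priority
  Task.new(3, 10, 2, false)    # high priority, older than task 2
]

order = []
while (t = next_task(tasks))
  order << t.id
  t.done = true
end
p order   # prints [3, 2, 1]
```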
hemant
2008-Jul-16 18:19 UTC
On Wed, Jul 16, 2008 at 11:40 PM, Frank Schwach <f.schwach at uea.ac.uk> wrote:> Hi Hemant and others, > > Once again thank you for your reply to my last post and the excellent > work you are doing! Having seen your announcement of all those new > features really convinced me to go ahead and use backgroundrb for my > app. However, I must admit being quite confused now with all the latest > changes to the API and would appreciate some help getting my head around > this. So, basically, I have an app with a database table of long running > tasks. There are several different types of tasks, so in my table I have > a column for the type of job to be run and another for serialized data > for the job (basically some IDs and args). I want to run the actual jobs > on a remote machine, which the new version of backgroundrb seems to have > covered nicely now. > > So this is what I thought I could do: > > Have a pool of x RemoteJobWorkers on the remote machine waiting to > accept jobs: > > class RemoteJobWorker > def enque_job (job_type, job_args) > thread_pool.defer(:job_type, job_args) > end > > def job_type1 (args) > # update job table with status > Jobs.update(job_key, :status => running) > > # perform the long running task > large_dataset.each do > # update job table with progress > Jobs.update(job_key, :progress => x) > end > > # job done: update status again > Jobs.update(job_key, :status => complete) > end > > def jobtype2 (args) ... end > def jobtype3 (args) ... end > .. and so on .. > > end > > So when a job is submitted by a user, the controller sends it to the > queue on the remote machine like this(?): > > MiddleMan.worker(:remote_job_worker).async_enque_job(:arg => > data,:job_key => new_job.id,:host => "a.b.c.d:XXX") > > where new_job is the newly created instance of my Jobs model. >I see you are rolling out your own table based queue here, why not just use inbuilt persistent database queue? 
For the worker, read this page: http://backgroundrb.rubyforge.org/workers/

The persistent job queue is explained at the bottom. To add tasks to the queue, you need to do:

  MiddleMan.worker(:remote_worker).enq_some_job(:arg => some_arg)

where some_job stands for the name of the worker method to run. The task will be added to the DB queue and picked up automatically. It handles hairy race conditions for you and stuff like that.
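The main race condition the queue has to handle is that taking a job must be atomic, so two workers polling at the same moment never both grab the same task. Below is a minimal in-memory Ruby sketch of that property, assuming a `Mutex` stands in for whatever locking the database-backed queue uses. `InMemoryJobQueue`, `claim`, and the other names are hypothetical. This is neither backgroundrb's API nor its actual implementation.

```ruby
# In-memory model of a persistent job queue. The real backgroundrb
# queue lives in a database table; this only illustrates atomic claiming.
class InMemoryJobQueue
  Job = Struct.new(:id, :worker_method, :args, :taken, :finished)

  def initialize
    @jobs = []
    @lock = Mutex.new
    @next_id = 0
  end

  # Add a pending job (roughly what an enq_* call does to the table).
  def enqueue(worker_method, args)
    @lock.synchronize do
      @next_id += 1
      @jobs << Job.new(@next_id, worker_method, args, false, false)
      @next_id
    end
  end

  # Find one untaken job and mark it taken inside the same critical
  # section, so concurrent workers can never claim the same job.
  def claim
    @lock.synchronize do
      job = @jobs.find { |j| !j.taken }
      job.taken = true if job
      job
    end
  end

  def finish!(job)
    @lock.synchronize { job.finished = true }
  end

  def all_finished?
    @lock.synchronize { @jobs.all?(&:finished) }
  end
end

queue = InMemoryJobQueue.new
10.times { |i| queue.enqueue(:some_job, i) }

processed = Queue.new # thread-safe record of which job ids ran
workers = Array.new(3) do
  Thread.new do
    while (job = queue.claim)
      processed << job.id # stand-in for the real work
      queue.finish!(job)
    end
  end
end
workers.each(&:join)
```

If the "find" and "mark taken" steps were split into two separately locked calls, two threads could both find the same job before either marked it. Keeping both steps in one critical section is what rules that out.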
hemant
2008-Jul-17 09:06 UTC
[Backgroundrb-devel] Fwd: best approach to managing workers and getting status
---------- Forwarded message ----------
From: hemant <gethemant at gmail.com>
Date: Thu, Jul 17, 2008 at 2:36 PM
Subject: Re: [Backgroundrb-devel] best approach to managing workers and getting status
To: Frank Schwach <f.schwach at uea.ac.uk>

On Thu, Jul 17, 2008 at 1:46 PM, Frank Schwach <f.schwach at uea.ac.uk> wrote:
> Thanks Hemant,
>
> Yes, I saw the part about the persistent job queue but I have a couple
> of questions about this:
>
> What does the table for the job queue look like?

You can open mysql (or whatever DB you are using) and run:

  desc bdrb_job_queues;

to see what the schema of the table is.

> How do I set other states of the job like "error"? Is the job status
> changed from "pending" to "running" automatically in the table so that I
> can query this table for pending/running/completed jobs?

When a task is pulled out of the queue, it's flagged as taken, and you can specify a timeout period while creating a task. When you invoke "#finish!", the task is marked finished. There is no "error" state, because if a task is "taken" and not "finished!" within the specified period, it's automatically considered to be in an error state.

> Can I get the jobs ID from within the worker so that I can update the
> job queue table "manually" too? If I want to record a "percentage
> completion" I guess I would need that.

Sure. From anywhere in your worker code, you can use the 'persistent_job' attribute to get the task that was dequeued from the job queue and is currently running.

> Can I specify the (remote) host with enq_some_job like I can with the
> async_enque_job method?

You can't, because for persistent tasks it doesn't matter: whichever worker fetches the task first gets to execute it anyway.

--
Let them talk of their oriental summer climes of everlasting conservatories; give me the privilege of making my own summer with my own coals.
http://gnufied.org
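Hemant's answer implies that "error" is derived rather than stored: a task is pending until taken, finished once `finish!` is called, and presumed failed if it was taken but not finished within its timeout. A minimal Ruby sketch of that derivation follows, assuming hypothetical `taken_at`, `finished_at`, and `timeout` (in seconds) fields. These names are illustrative and are not taken from the real `bdrb_job_queues` schema, which you can inspect with `desc` as described.

```ruby
# Hypothetical model of a queued task's lifecycle:
# pending -> taken (running) -> finished, where a task that was taken
# but not finished within its timeout is treated as failed.
QueuedTask = Struct.new(:taken_at, :finished_at, :timeout, keyword_init: true)

def task_status(task, now)
  return :finished if task.finished_at          # finish! was called
  return :pending  if task.taken_at.nil?        # nobody has claimed it yet
  return :failed   if now - task.taken_at > task.timeout # timed out
  :running
end

t0 = Time.at(0)
task = QueuedTask.new(taken_at: t0, timeout: 60)
puts task_status(task, t0 + 30)
puts task_status(task, t0 + 61)
```

In this sketch a finished task counts as finished even if it completed after its timeout. Whether a late finish should still be reported as an error is a policy choice the queue itself may or may not make.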