After some initial stumbling, I think I've got the hang of backgroundrb. It's great! I'd been thinking for many, many months about how cool something like this would be!

I'm trying to make a "job queue". That is, a pool of worker threads monitors a queue. When a job appears, one of the workers grabs it and executes the task. When it's done, the worker returns to watching the queue. If all workers are busy, jobs simply sit in the queue until resources are available.

I can see how to build an array containing jobs, and a mutex could be used to ensure thread-safe access to the array. But how can the workers "go to sleep" until a job arrives?

Any thoughts?

Thanks again!
Norman
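One minimal way to get sleeping workers, sketched under the assumption that plain Ruby threads are acceptable: keep the Array and Mutex from the question and add a ConditionVariable, so idle workers block in `wait` until `push` signals them. (Ruby's built-in `Queue` packages the same idea; its `pop` blocks while the queue is empty.) `JobQueue` and the `:stop` sentinel are illustrative names, not part of backgroundrb.

```ruby
require "thread"

# Array + Mutex job queue, plus a ConditionVariable so that idle
# workers sleep until a job arrives instead of busy-polling.
class JobQueue
  def initialize
    @jobs  = []
    @lock  = Mutex.new
    @ready = ConditionVariable.new
  end

  # Add a job and wake one sleeping worker.
  def push(job)
    @lock.synchronize do
      @jobs << job
      @ready.signal
    end
  end

  # Block until a job is available, then remove and return it.
  def pop
    @lock.synchronize do
      # Re-check in a loop to guard against spurious wakeups.
      @ready.wait(@lock) while @jobs.empty?
      @jobs.shift
    end
  end
end

queue   = JobQueue.new
results = Queue.new # thread-safe collector for this demo

workers = Array.new(3) do
  Thread.new do
    loop do
      job = queue.pop
      break if job == :stop # hypothetical sentinel: tells the worker to exit
      results << job * 2    # stand-in for real work
    end
  end
end

5.times { |i| queue.push(i) }
workers.size.times { queue.push(:stop) } # one sentinel per worker, after all jobs
workers.each(&:join)

done = []
done << results.pop until results.empty?
p done.sort # => [0, 2, 4, 6, 8]
```

Because the array is FIFO and the sentinels are pushed last, every real job is taken before any worker sees `:stop`.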
On Feb 1, 2008 1:58 PM, Norman Elton <normelton at gmail.com> wrote:
>
> I can see how to build an array containing jobs. And a mutex could be
> used to ensure thread-safe access to the array. But how can the
> workers "go to sleep" until a job arrives?
>
> Any thoughts?

I had a similar requirement and ended up using just a single worker and thread_pool.defer (the documentation explains how to use it). I've looked at the BackgrounDRb code, and thread_pool.defer uses a queue that a configurable number of threads read from. You can set the size of the pool by calling pool_size in your worker. The default pool size is 20 threads.

If you need to know the status of each thread, you need to use a mutex to synchronize access to a hashtable member variable that holds the status for each thread (you would need to decide what the key for each thread should be, maybe the database ID of the job the thread is processing). You would then pass this hashtable to the register_status method. There is a post from hemant on this mailing list from December which shows how to do this pretty well:

http://rubyforge.org/pipermail/backgroundrb-devel/2007-December/001170.html

The only caveat here is that the green Ruby threads used in the thread_pool may not play well with the job processing you are doing. But from the testing I've done it seems pretty good, especially for work involving the network, which probably has a lot of latency.

The general architecture I would suggest is to have a jobs table in the database; when jobs are added, the Rails model (in after_create) can call MiddleMan.ask_work and pass the ID of the job just created. The worker will pass that job_id into thread_pool.defer, which will then process the job. For my own work I tend to put all the heavy processing into separate classes which I simply call from the worker. So for you, maybe something like JobProcessor.run(job_id) or whatever.

Regards,
Ryan
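The defer pattern described above can be sketched in plain Ruby. This is a hypothetical stand-in, not BackgrounDRb's implementation: `SimplePool` and its `defer` method only mimic the described behavior (a fixed number of threads reading blocks off a shared queue), and `JobProcessor.run` is the illustrative helper name from the message, stubbed out here.

```ruby
require "thread"

# Hypothetical stand-in for the pattern described above: defer pushes a
# block onto a shared queue that a fixed number of threads read from.
# NOT BackgrounDRb's code; all names here are illustrative.
class SimplePool
  def initialize(size = 20) # 20 mirrors the default pool size mentioned above
    @tasks   = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        # Queue#pop blocks, so idle threads sleep until work is deferred.
        while (task = @tasks.pop) != :shutdown
          task.call
        end
      end
    end
  end

  # Queue a block to run on the next free pool thread.
  def defer(&block)
    @tasks << block
  end

  # Let already-queued tasks finish, then stop every thread.
  def shutdown
    @threads.size.times { @tasks << :shutdown }
    @threads.each(&:join)
  end
end

# Hypothetical job processor holding the "heavy" work, called from the pool.
module JobProcessor
  def self.run(job_id)
    "processed #{job_id}"
  end
end

pool = SimplePool.new(4)
log  = Queue.new
[101, 102, 103].each do |job_id| # e.g. IDs of newly created job rows
  pool.defer { log << JobProcessor.run(job_id) }
end
pool.shutdown

out = []
out << log.pop until log.empty?
p out.sort # => ["processed 101", "processed 102", "processed 103"]
```

Pushing one `:shutdown` marker per thread after the real tasks lets every queued job finish before the threads exit, since the queue is FIFO.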
I was just about to reply, but you linked to my code sample already :)

The bugs we were discussing in that thread have been resolved, and I've been using the thread pool in combination with a mutex to save a hash of "statuses", as I described in that post, for a little over a month now in production with no problems. Admittedly, the site in question is not exactly high traffic (75k pageviews for the month of January according to analytics), but I've had no stability issues whatsoever in the last month. I currently run a pool of 10 threads, but the job is fairly short, so I doubt I'm ever using more than a couple of threads at once.

- Jason L.
--
My Rails and Linux Blog: http://offtheline.net

On Feb 1, 2008 11:19 AM, Ryan Leavengood <leavengood at gmail.com> wrote:
> There is a post from hemant on this mailing list from December which
> shows how to do this pretty well:
>
> http://rubyforge.org/pipermail/backgroundrb-devel/2007-December/001170.html
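A sketch of the mutex-guarded status hash described in this thread, keyed by job ID. `StatusBoard` is a hypothetical name; in a real worker the hash would be passed to `register_status`, which is left out here rather than guessed at. The pool of 10 threads mirrors the setup above, and with only five short jobs most threads find nothing to do.

```ruby
require "thread"

# Hypothetical mutex-guarded status hash keyed by job ID.
# (A real worker would hand this hash to register_status; omitted here.)
class StatusBoard
  def initialize
    @statuses = {}
    @lock     = Mutex.new
  end

  def update(job_id, status)
    @lock.synchronize { @statuses[job_id] = status }
  end

  # Return a copy so readers never see the hash mid-update.
  def snapshot
    @lock.synchronize { @statuses.dup }
  end
end

board = StatusBoard.new
tasks = Queue.new
[1, 2, 3, 4, 5].each { |id| tasks << id }

threads = Array.new(10) do
  Thread.new do
    loop do
      begin
        job_id = tasks.pop(true) # non-blocking pop raises ThreadError when empty
      rescue ThreadError
        break                    # queue drained: let this thread finish
      end
      board.update(job_id, :running)
      board.update(job_id, :done) # stand-in for the real work
    end
  end
end
threads.each(&:join)

p board.snapshot.values.uniq # => [:done]
```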
On Sat, Feb 2, 2008 at 1:07 AM, Jason LaPier <jason.lapier at gmail.com> wrote:
> The bugs we were discussing in that thread have been resolved and I've
> been using thread pool in combination with mutex to save a hash of
> "statuses" as I described in that post for a little over a month now
> in production with no problems.

I have been toying with some code that I picked up from other Ruby projects for implementing a job queue based on database tables, but I absolutely don't want to add any features without test cases in hand. But yeah, as Ryan and Jason said, you should have no trouble using the thread_pool feature.