thr3ads.net - Backgroundrb devel - [Backgroundrb-devel] best approach to managing workers and getting status [May 2008]

Sean O'Hara

2008-May-04 16:56 UTC

[Backgroundrb-devel] best approach to managing workers and getting status

Hi,

I am using backgroundrb to process audio files from a rails  
controller. Currently a new worker gets created every time the method  
is called on the worker, using this code:

@job_key = MiddleMan.new_worker(:worker => :audio_file_worker, :job_key => Time.now.to_i)
MiddleMan.worker(:audio_file_worker, @job_key).make_new_audio_file(params[:release_id])

I need to create the new worker each time in order to be able to get  
the status from the worker using the job key (this info is returned to  
the client using ajax requests). But this means that I end up with  
many workers eating up memory, and just hanging around after their  
jobs are complete.

I am planning to just kill them from the controller each time the  
status returns that they are complete. This will prevent the extra  
processes from hanging around and using memory, but I guess it still  
entails some 'costs' in starting up the new worker each time, since it
contains a rails instance?

Is there a better approach, e.g. just having one worker, and sending  
the jobs to the same worker? If so, how do we keep track of statuses  
of unique jobs in this case? Since the job_key is created when
creating a new worker, isn't it in effect a worker key, rather than a
job key?

I did read about this different approach discussed here (using  
thread_pool.defer) but it doesn't seem to allow for getting status of
the unique threads, as far as I can tell:
http://rubyforge.org/pipermail/backgroundrb-devel/2008-February/001532.html

Any guidance is appreciated.

Thanks,
Sean

Jack Nutting

2008-May-05 08:35 UTC

The way that I (and perhaps many others here?) handle this is to use
the database to hold a "work queue", and have a long-running worker
that polls the database periodically and handles any pending requests.
In your case, for example, you could have an AudioFileWork model
containing fields for "release_id" and "pending". Your app would
create a new instance of that with pending=true, and your background
worker would poll for rows where pending=true, then mark them as false
when they are complete. For your post-creation interactions (to show
the status with ajax) you'd use the id of the created AudioFileWork
instead of a job key.

There are many advantages to this approach:

- you always have a known number of workers (the number you configure
  and start), so you won't have uncontrollable memory usage explosions
  if your site gets busy, just slower response times
- you can check the status of a request by querying the database
- there is very little in-memory data that is lost in unhappy events
  (a crash or unhandled exception)
- you have a historical record in the database of all work that is
  completed by a background worker
- you get a very loose coupling to backgroundrb. In my case, I simply
  specify in a model class that it needs processing, and it just
  happens. My app doesn't "talk" directly with backgroundrb at all,
  except for an admin page where I can make sure it's up and running.
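
A minimal sketch of this polling pattern, in plain Ruby so that it runs without Rails or backgroundrb. An in-memory array stands in for the database table; the AudioFileWork fields follow the description above, while WorkTable, enqueue and process_pending are hypothetical names chosen for illustration:

```ruby
# Plain-Ruby sketch: an in-memory array stands in for the work queue table.
AudioFileWork = Struct.new(:id, :release_id, :pending)

class WorkTable
  def initialize
    @rows = []
    @next_id = 1
  end

  # What the Rails app would do instead of creating a new worker.
  def enqueue(release_id)
    row = AudioFileWork.new(@next_id, release_id, true)
    @next_id += 1
    @rows << row
    row.id
  end

  # Rows still waiting to be handled.
  def pending
    @rows.select(&:pending)
  end

  # Lookup by id, as an ajax status action would do.
  def find(id)
    @rows.find { |row| row.id == id }
  end
end

# One pass of the long-running worker's polling loop.
def process_pending(table)
  table.pending.each do |work|
    yield work            # the real audio processing would happen here
    work.pending = false  # mark the row as done
  end
end

table = WorkTable.new
first_id = table.enqueue(42)
table.enqueue(43)
process_pending(table) { |work| puts "processing release #{work.release_id}" }
puts table.find(first_id).pending
```

In a real app the worker would call something like process_pending on a timer, and the status action would look the row up by the id returned from enqueue.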



-- 
// jack
// http://www.nuthole.com

Jack Nutting

2008-May-05 16:33 UTC

On Mon, May 5, 2008 at 4:25 PM, Sean O'Hara <sohara at sohara.com> wrote:
> Hi Jack,
>
> That approach sounds very good, and makes a lot of sense for this kind of
> job. Although, it doesn't give me the satisfaction of watching the ajax
> progress bar :)
>
> But what could we do for a background job that really does require providing
> feedback to the user, such as processing a credit card transaction for an
> order in real time? I would like to offload the job to backgroundrb so that
> the user is getting some feedback, and isn't tempted to submit the order
> twice out of impatience, but I also don't want extra workers hanging around,
> or to be incurring the extra memory usage associated with starting them up.
> Is there a way to have a single worker act as the transaction processor, but
> still be able to give back the status of unique jobs to rails?

How about an extension of the approach I mentioned earlier: Instead
of just a simple boolean "pending" flag, you could have a field of any
kind you like:  An integer or a float to indicate percentage complete,
or a string for arbitrary text.  Then, while your backgroundrb worker
is doing its thing, it can update the field with the current status
(percentage complete, or "authenticating card number..." etc), and
your ajax method could grab that from the database instead of asking
the worker for it.
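
As a rough illustration of the status-field idea, here is a plain-Ruby sketch in which a Hash stands in for the database table. PaymentJob, create_job, run_job and status_json are hypothetical names, and the step strings are only examples:

```ruby
require "json"

# Hypothetical row: status is free text, progress is a percentage.
PaymentJob = Struct.new(:id, :status, :progress)

JOBS = {}  # stands in for the database table, keyed by id

def create_job(id)
  JOBS[id] = PaymentJob.new(id, "queued", 0)
end

# Worker side: write each step back to the row as it happens.
def run_job(id, steps)
  job = JOBS.fetch(id)
  steps.each_with_index do |step, index|
    job.status = step
    job.progress = ((index + 1) * 100) / steps.size
  end
  job.status = "complete"
end

# Controller side: what an ajax status action could return.
def status_json(id)
  job = JOBS.fetch(id)
  JSON.generate("status" => job.status, "progress" => job.progress)
end

create_job(7)
puts status_json(7)
run_job(7, ["authenticating card number...", "charging card..."])
puts status_json(7)
```

The ajax action never talks to the worker; it only reads the row, so it keeps working even if the worker is busy or restarted.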

-- 
// jack
// http://www.nuthole.com

Frank Schwach

2008-Jun-24 17:26 UTC

Jack,
I just found your interesting post in the archive and I would like to
come back to this. I need to implement something like this:

I have some very long running tasks (several hours) that should run on a
remote machine and talk to the database on the Rails server. I need to
keep track of jobs including those that have been run in the past, so a
table for background jobs with their status as you describe would be the
best solution for me. 

I am just wondering whether backgroundrb wouldn't be a bit of an
overkill in the scenario you describe? In the new "Advanced Rails
Recipes" from the Pragmatic Programmers Bookshelf there is a recipe
using a simple daemonized ruby process that polls the database for
pending jobs and uses acts_as_state_machine to set the state of the jobs
(there is also a nice BackgrounDRb recipe in the book by the way).
I am just wondering if the daemonized process isn't easier to handle in
this case since you don't integrate your app with backgroundrb very
tightly anyway?

I would be grateful for any suggestions because there seem to be lots of
possible solutions for this problem and some more or less well
documented plugins and I haven't used any of them before. I need a
simple and robust method that doesn't have too many dependencies and
doesn't require too much maintenance because I want to make the finished
app available for others to install on their local systems.

Thanks in advance,

Frank

Jack Nutting

2008-Jun-25 07:32 UTC

This is an interesting question, Frank.  My usage of backgroundrb is
somewhat of an edge case, and most of what I'm doing with it could
definitely be done with a simpler system.  I initially chose
backgroundrb for my project because it seemed to make the most sense
at the time (for what I *thought* I needed; actual needs changed with
further exploration of the problem space), and I was enough of a ruby
newbie that it felt comfortable for me to have a packaged solution
that (mostly) "just worked".  If I were starting from scratch today, I
might make a different decision.

However, it's not only inertia that keeps me using backgroundrb.  For
one thing, backgroundrb does provide some handy things--centralized
logging, IPC for storing runtime status info about my processes,
etc--that would take some time for me to implement if I were rolling
my own solutions with a daemonized script, and from my perspective
that would be wasted time, since I have those things working today
thanks to backgroundrb.  Another reason for me to keep it is that I
have a few spots in my system where I'm considering using some of
backgroundrb's other key features, like launching a short-lived
process to handle something in response to some action happening in
the main application.

-- 
// jack
// http://www.nuthole.com

hemant

2008-Jun-25 10:26 UTC

Well, I am working on a couple of new things with BackgrounDRb. Result
storage and retrieval is one of them, as I mentioned in earlier mails
when I solicited opinions from fellows who are using bdrb. You can
check out

http://github.com/gnufied/backgroundrb/commits/testcase

So here is what's on this branch of BackgrounDRb, which will become
master very soon.

1> True clustering system for clustering backgroundrb servers running
on N nodes. Tasks are dispatched in a round robin manner, but you can
specify the host on which you want to execute a task:

MiddleMan.worker(:foo_worker).async_some_work(:args => "lol")

^^ will choose any server in a round robin manner and run the "some_work"
method in the specified worker. You can also specify:
:host => <local or all or "10.0.0.6:11001">

which overrides the default behaviour and runs the specified method on
the local bdrb server, all bdrb servers, or the specified server.

2> Clustering is failsafe: if one bdrb node goes down, all
requests are immediately routed to the remaining servers.
Once that node comes back up, it automatically starts participating in
the clustering process again.

3> Results can be stored in memcache, and the register_status method has
been replaced by a "cache" object available in all workers. Hence you
can cache results with:

cache[@user.id] = some_data

in your workers and later you can retrieve results using:

MiddleMan.worker(:foo_worker).ask_result(@user.id)

I seriously recommend using memcache if you are clustering bdrb
servers. Also, the cache object's caching mechanism is completely thread
safe and hence can be used from within the thread pool or anywhere you
want.

4> Apart from the memory based job queue that you can use with thread
pools, the testcase branch implements database based job queues. So, to
enqueue a particular task:

MiddleMan.worker(:foo_worker).enq_some_task(:job_key,args)

The some_task method will automatically be called in the first available
worker and the task will be dequeued from the database. Also, jobs with
duplicate keys automatically get rejected.

Note that the above things are already working on the testcase branch. I
think these features make bdrb a very compelling choice.

Some things that I will finish in a day or two:

5> Similar to worker method invocation, with each scheduled method
you can specify the host on which the task should run. For example, say
you have 5 bdrb servers and you have scheduled a billing task to run
every Sunday. You don't want the billing task to run on Sunday on all
the servers. So, by default a scheduled task will run on the server on
which it was created, but you can specify the host on which it should
run.
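
To make the dispatch rules in points 1 and 2 concrete, here is a plain-Ruby sketch of round robin selection with failover and a host override. It is not BackgrounDRb's actual implementation; RoundRobinDispatcher and its methods are hypothetical, and the "local" option is left out for brevity:

```ruby
# Hypothetical dispatcher; illustrates the rules only, not bdrb internals.
class RoundRobinDispatcher
  def initialize(hosts)
    @hosts = hosts
    @down = {}    # hosts currently marked as unreachable
    @cursor = 0   # position in the rotation
  end

  def mark_down(host)
    @down[host] = true
  end

  def mark_up(host)
    @down.delete(host)
  end

  # Returns the list of hosts a task should be sent to.
  #   host: nil      -> next live host in the rotation
  #   host: :all     -> every live host
  #   host: "ip:port" -> that host only, if it is up
  def pick(host: nil)
    live = @hosts.reject { |h| @down[h] }
    raise "no live servers" if live.empty?

    case host
    when nil
      # Walk the rotation, skipping hosts that are down.
      @hosts.size.times do
        candidate = @hosts[@cursor % @hosts.size]
        @cursor += 1
        return [candidate] unless @down[candidate]
      end
    when :all
      live
    else
      raise "#{host} is down" unless live.include?(host)
      [host]
    end
  end
end

dispatcher = RoundRobinDispatcher.new(["10.0.0.5:11001", "10.0.0.6:11001"])
dispatcher.mark_down("10.0.0.5:11001")
puts dispatcher.pick.inspect
```

A node that comes back only needs a mark_up call to rejoin the rotation, which mirrors the behaviour described in point 2.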

Frank Schwach

2008-Jun-25 11:00 UTC

Hemant,
These latest additions to backgroundrb look pretty cool. Unfortunately,
I don't think I will be able to use it this way because in my setup I
can't run anything on the cluster nodes directly. I have to submit jobs
to a queuing system on the cluster's master node, which is why I think a
simple daemon running on the master node that polls the (remote) db for
pending jobs and then submits these to the queue would probably be
better for my case - but I'm far from being an expert on distributed
systems so any suggestions are very welcome!


+++++++++++++++++++++++++++++++
Dr Frank Schwach
School of Computing Sciences
University of East Anglia
Norwich, NR4 7TJ
Tel: 0044/(0)1603 - 592 405
www.cmp.uea.ac.uk
++++++++++++++++++++++++++++++++

hemant

2008-Jun-25 11:20 UTC

On Wed, Jun 25, 2008 at 4:30 PM, Frank Schwach <f.schwach at uea.ac.uk> wrote:
> Hemant,
> These latest additions to backgroundrb look pretty cool. Unfortunately,
> I don't think I will be able to use it this way because in my setup I
> can't run anything on the cluster nodes directly. I have to submit jobs
> to a queuing system on the cluster's master node, which is why I think a
> simple daemon running on the master node that polls the (remote) db for
> pending jobs and then submits these to the queue would probably be
> better for my case - but I'm far from being an expert on distributed
> systems so any suggestions are very welcome!

Hmm, so use the db queuing mechanism built into bdrb. bdrb still stays
lightweight because most of these changes don't affect the core and
have really gone into the client side of the code.

MiddleMan.worker(:foo_worker).enq_some_task(:job_key,args)

The above snippet does exactly that. But anyway, I think you feel it's too
complicated for your setup? I can't help that feeling. It would be
complicated if it were complicated to set up and use, but it's not. Those
features stay totally out of your way if you don't need them.
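
For illustration, the queue behaviour described for enq_some_task (the first free worker takes the job, duplicate keys are rejected) could be sketched as below. This is plain Ruby with an in-memory array, not bdrb's implementation; KeyedJobQueue and its methods are hypothetical, and remembering a key even after its job has been dequeued is an assumption of this sketch:

```ruby
# Hypothetical in-memory stand-in for a database-backed job queue.
class KeyedJobQueue
  def initialize
    @lock = Mutex.new
    @jobs = []       # pending jobs in arrival order
    @seen_keys = {}  # every key ever enqueued (assumption: never forgotten)
  end

  # Returns true if the job was accepted, false if the key is a duplicate.
  def enqueue(job_key, args)
    @lock.synchronize do
      return false if @seen_keys.key?(job_key)

      @seen_keys[job_key] = true
      @jobs << [job_key, args]
      true
    end
  end

  # Called by whichever worker is free first; returns nil when empty.
  def dequeue
    @lock.synchronize { @jobs.shift }
  end
end

queue = KeyedJobQueue.new
puts queue.enqueue(:invoice_17, "first attempt")
puts queue.enqueue(:invoice_17, "second attempt")
puts queue.dequeue.inspect
```

With a real table, the same effect is usually obtained with a unique index on the key column, so that the database itself refuses the second insert.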

Steve D

2008-Jul-16 07:54 UTC

hemant,

Would bdrb's queueing mechanism allow for prioritizing of tasks?

I have a few long-running low priority tasks that I'd want to run on the
condition that no high priority tasks are in the queue.

Jack's solution makes sense to my newbie brain: each time my
queue_processing_worker accesses the table, it can sort unprocessed tasks by
priority and begin from there.  But I'd rather stick to bdrb convention if
it's inbuilt already.
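
The table-based selection I have in mind could be sketched like this in plain Ruby. QueuedTask and next_task are hypothetical names, and treating a lower number as higher priority is an assumption:

```ruby
# Hypothetical task row: lower priority number means more urgent.
QueuedTask = Struct.new(:id, :priority, :processed)

# Picks the most urgent unprocessed task; ties go to the lowest id,
# so older tasks run first. Returns nil when nothing is left.
def next_task(tasks)
  tasks.reject(&:processed).min_by { |task| [task.priority, task.id] }
end

tasks = [
  QueuedTask.new(1, 5, false),  # long-running, low priority
  QueuedTask.new(2, 1, false),  # high priority
]
puts next_task(tasks).id
```

Low priority tasks only get picked once no unprocessed high priority task remains, which is the condition described above.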

- Steve

Frank Schwach

2008-Jul-16 18:10 UTC

Hi Hemant and others,

Once again thank you for your reply to my last post and the excellent
work you are doing! Seeing your announcement of all those new
features really convinced me to go ahead and use backgroundrb for my
app. However, I must admit to being quite confused now with all the latest
changes to the API and would appreciate some help getting my head around
this. So, basically, I have an app with a database table of long running
tasks. There are several different types of tasks, so in my table I have
a column for the type of job to be run and another for serialized data
for the job (basically some IDs and args). I want to run the actual jobs
on a remote machine, which the new version of backgroundrb seems to have
covered nicely now. 

So this is what I thought I could do:

Have a pool of x RemoteJobWorkers on the remote machine waiting to
accept jobs:

class RemoteJobWorker
  # job_type is the name of the method to run, e.g. :job_type1
  def enque_job(job_type, job_args)
    # pass the method name itself, not the literal symbol :job_type
    thread_pool.defer(job_type, job_args)
  end

  def job_type1(args)
    # update job table with status
    # (job_key is assumed to be available here; see my question below)
    Jobs.update(job_key, :status => "running")

    # perform the long running task
    large_dataset.each do
      # update job table with progress (x is a placeholder)
      Jobs.update(job_key, :progress => x)
    end

    # job done: update status again
    Jobs.update(job_key, :status => "complete")
  end

  def jobtype2(args) ... end
  def jobtype3(args) ... end
  # ... and so on ...

end

So when a job is submitted by a user, the controller sends it to the
queue on the remote machine like this(?):

MiddleMan.worker(:remote_job_worker).async_enque_job(:arg => data, :job_key => new_job.id, :host => "a.b.c.d:XXX")

where new_job is the newly created instance of my Jobs model.

Does that make sense or is there a better approach that I should be
using? I also like the idea of just having a scheduled worker on the
remote machine polling the database for pending jobs and then sending
them to the queue just like I am trying to do with the above. I would
really appreciate opinions from some experienced users about this to
help me decide. 
In the above I am assuming that the jobs running in the thread pool have
access to the job key and can therefore identify the row in the Jobs
table that they need to update. I hope I understood the announcement of
the latest version correctly and that this is the case?
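
One way to avoid depending on that assumption is to pass the job key along with the arguments explicitly. The plain-Ruby sketch below shows the idea with an ordinary thread pool built on Queue; it does not use backgroundrb, and JobTable, PooledJobRunner and their methods are hypothetical names (a Hash guarded by a Mutex stands in for the Jobs table):

```ruby
require "thread"

# Stand-in for the Jobs table, safe to update from several threads.
class JobTable
  def initialize
    @lock = Mutex.new
    @rows = {}
  end

  def update(job_key, attrs)
    @lock.synchronize { (@rows[job_key] ||= {}).merge!(attrs) }
  end

  def [](job_key)
    @lock.synchronize { (@rows[job_key] || {}).dup }
  end
end

class PooledJobRunner
  JOB_TYPES = [:job_type1].freeze  # whitelist of callable job methods

  def initialize(table, pool_size = 2)
    @table = table
    @queue = Queue.new
    @threads = Array.new(pool_size) do
      Thread.new do
        while (job = @queue.pop)  # nil is the shutdown signal
          job_type, job_key, args = job
          send(job_type, job_key, args)
        end
      end
    end
  end

  # The job key travels with the job, so the thread knows which row to update.
  def enqueue_job(job_type, job_key, args)
    raise ArgumentError, "unknown job type #{job_type}" unless JOB_TYPES.include?(job_type)

    @table.update(job_key, status: "queued", progress: 0)
    @queue << [job_type, job_key, args]
  end

  # Lets queued jobs finish, then stops the pool.
  def shutdown
    @threads.size.times { @queue << nil }
    @threads.each(&:join)
  end

  private

  def job_type1(job_key, items)
    @table.update(job_key, status: "running")
    items.each_with_index do |_item, index|
      @table.update(job_key, progress: ((index + 1) * 100) / items.size)
    end
    @table.update(job_key, status: "complete")
  end
end

table = JobTable.new
runner = PooledJobRunner.new(table)
runner.enqueue_job(:job_type1, 1, %w[a b c d])
runner.shutdown
puts table[1].inspect
```

Dispatching by method name through a whitelist keeps a job type stored in the table from calling arbitrary methods on the worker.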

One of my tasks would itself fork to run a third-party program during
its run time - would that be possible inside a job running in a thread
already or would the space-time continuum collapse if I tried that?
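
For what it is worth, in plain Ruby starting an external program from inside a thread is an ordinary operation. The sketch below uses Open3 from the standard library and runs the Ruby interpreter itself as a stand-in for the third-party program; run_external_in_thread is a hypothetical helper, and whether the same holds inside a backgroundrb thread pool is not something this example confirms:

```ruby
require "open3"
require "rbconfig"

# Runs an external command from inside a thread.
# Returns [combined stdout and stderr, true/false for success].
def run_external_in_thread(*command)
  Thread.new do
    output, status = Open3.capture2e(*command)
    [output, status.success?]
  end.value  # value joins the thread and returns the block's result
end

output, ok = run_external_in_thread(RbConfig.ruby, "-e", "print 6 * 7")
puts "output=#{output} ok=#{ok}"
```

Only the calling thread waits for the child process, so other jobs in the pool can keep running in the meantime.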

I noticed in the announcement that there is support for persistent job
queues now but I am not sure how to tie this in with my existing table.
Hemant, do you have a more detailed example of how to use this and what
the model would look like? 

Apologies for all the questions - your help is more than welcome!!
Thank you all in advance

Frank

+++++++++++++++++++++++++++++++
Dr Frank Schwach
School of Computing Sciences
University of East Anglia
Norwich, NR4 7TJ
Tel: 0044/(0)1603 - 592 405
www.cmp.uea.ac.uk
++++++++++++++++++++++++++++++++

hemant

2008-Jul-16 18:17 UTC

On Wed, Jul 16, 2008 at 1:24 PM, Steve D <swertyui at gmail.com> wrote:
> hemant,
>
> Would bdrb's queueing mechanism allow for prioritizing of tasks?
>
> I have a few long-running low priority tasks that I'd want to run on the
> condition that no high priority tasks are in the queue.
>
> Jack's solution makes sense to my newbie brain, each time my
> queue_processing_worker accesses the table, it can sort unprocessed tasks by
> priority and begin from there.  But I'd rather stick to bdrb convention if
> it's inbuilt already.
>
No, it can''t right now. But, it should be trivial to add such
functionality, I welcome a patch.
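
For illustration only, here is a minimal sketch of the selection rule Steve describes (highest priority first, oldest first within the same priority). It is plain Ruby over an in-memory list, not bdrb code: the Task struct, its fields and next_task are hypothetical names, and a real queue worker would read the pending rows from its own table instead.

```ruby
# Hypothetical in-memory stand-in for a row in a job table.
Task = Struct.new(:id, :priority, :created_at, :taken)

# Returns the next task to run: highest priority first, oldest first
# among equal priorities. Returns nil when nothing is pending.
def next_task(tasks)
  tasks.reject(&:taken).min_by { |t| [-t.priority, t.created_at] }
end

tasks = [
  Task.new(1, 0, 100, false), # long-running, low priority
  Task.new(2, 5, 200, false), # high priority, queued later
  Task.new(3, 5, 150, false)  # high priority, queued earlier
]

order = []
while (task = next_task(tasks))
  task.taken = true # mark as picked up so it is not selected again
  order << task.id
end

puts order.inspect # prints [3, 2, 1]
```

The low priority task only runs once no high priority task is left pending, which is the condition asked for above.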

hemant

2008-Jul-16 18:19 UTC

[Backgroundrb-devel] best approach to managing workers and getting status

On Wed, Jul 16, 2008 at 11:40 PM, Frank Schwach <f.schwach at uea.ac.uk> wrote:
> Hi Hemant and others,
>
> Once again thank you for your reply to my last post and the excellent
> work you are doing! Having seen your announcement of all those new
> features really convinced me to go ahead and use backgroundrb for my
> app. However, I must admit to being quite confused now with all the latest
> changes to the API and would appreciate some help getting my head around
> this. So, basically, I have an app with a database table of long running
> tasks. There are several different types of tasks, so in my table I have
> a column for the type of job to be run and another for serialized data
> for the job (basically some IDs and args). I want to run the actual jobs
> on a remote machine, which the new version of backgroundrb seems to have
> covered nicely now.
>
> So this is what I thought I could do:
>
> Have a pool of x RemoteJobWorkers on the remote machine waiting to
> accept jobs:
>
> class RemoteJobWorker
>   # hand the job to the thread pool; job_type names the method to run
>   def enque_job(job_type, job_args)
>     thread_pool.defer(job_type, job_args)
>   end
>
>   def job_type1(args)
>     # update job table with status
>     Jobs.update(job_key, :status => "running")
>
>     # perform the long running task
>     large_dataset.each do
>       # update job table with progress
>       Jobs.update(job_key, :progress => x)
>     end
>
>     # job done: update status again
>     Jobs.update(job_key, :status => "complete")
>   end
>
>   def jobtype2 (args) ... end
>   def jobtype3 (args) ... end
>   .. and so on ..
>
> end
>
> So when a job is submitted by a user, the controller sends it to the
> queue on the remote machine like this(?):
>
> MiddleMan.worker(:remote_job_worker).async_enque_job(:arg => data,
>   :job_key => new_job.id, :host => "a.b.c.d:XXX")
>
> where new_job is the newly created instance of my Jobs model.
>
>

I see you are rolling your own table-based queue here. Why not
just use the inbuilt persistent database queue?

For the worker, read this page:

http://backgroundrb.rubyforge.org/workers/

The persistent job queue is explained at the bottom.

To add tasks to the queue you need to do:

MiddleMan.worker(:remote_worker).enq_some_job(:arg => some_arg)

The task will be added to the db queue and automatically picked up.
It handles hairy race conditions for you and stuff like that.

hemant

2008-Jul-17 09:06 UTC

[Backgroundrb-devel] Fwd: best approach to managing workers and getting status

---------- Forwarded message ----------
From: hemant <gethemant at gmail.com>
Date: Thu, Jul 17, 2008 at 2:36 PM
Subject: Re: [Backgroundrb-devel] best approach to managing workers
and getting status
To: Frank Schwach <f.schwach at uea.ac.uk>


On Thu, Jul 17, 2008 at 1:46 PM, Frank Schwach <f.schwach at uea.ac.uk> wrote:
> Thanks Hemant,
>
> Yes, I saw the part about the persistent job queue but I have a couple
> of questions about this:
>
> What does the table for the job queue look like?
>
You can open mysql or whatever db you are using and run:

desc bdrb_job_queues;

to see the schema of the table.

> How do I set other states of the job like "error"? Is the job status
> changed from "pending" to "running" automatically in the table so that I
> can query this table for pending/running/completed jobs?

When a task is pulled out of the queue, it's flagged as taken, and you can
specify a timeout period while creating a task. When you invoke
"#finish!" the task is marked finished. There is no "error" state,
because if a task is "taken" and not "finished!" within the specified
period, it's automatically considered to be in an error state.
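
To make the rule above concrete, here is a minimal sketch in plain Ruby of how a status could be derived from "taken", "finished" and a timeout. It is not bdrb's implementation and does not reflect the real columns of bdrb_job_queues: the QueuedTask struct, its field names and task_state are hypothetical, and times are plain numbers of seconds to keep the example self-contained.

```ruby
# Hypothetical stand-in for a queue row; field names are illustrative only.
QueuedTask = Struct.new(:taken_at, :finished_at, :timeout)

# Derives a status from the fields instead of storing an "error" flag.
def task_state(task, now)
  return :finished if task.finished_at    # finish! was called
  return :pending unless task.taken_at    # nobody has picked it up yet
  # Taken but not finished: overdue tasks count as failed.
  (now - task.taken_at) > task.timeout ? :error : :taken
end

now = 1_000
states = [
  task_state(QueuedTask.new(nil, nil, 60), now), # never taken
  task_state(QueuedTask.new(990, nil, 60), now), # taken 10s ago
  task_state(QueuedTask.new(900, nil, 60), now), # taken 100s ago, overdue
  task_state(QueuedTask.new(900, 950, 60), now)  # finished
]

puts states.inspect # prints [:pending, :taken, :error, :finished]
```

A query for "jobs in error" then becomes "taken, not finished, and older than the timeout", with no separate error column to maintain.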

>
> Can I get the job's ID from within the worker so that I can update the
> job queue table "manually" too? If I want to record a "percentage
> completion" I guess I would need that.

Sure, from anywhere in your worker code you can use the 'persistent_job'
attribute to get the task that's dequeued from the job queue and is currently
running.

> Can I specify the (remote) host with enq_some_job like I can with the
> async_enque_job method?

You can't, because for persistent tasks it doesn't matter: the worker
that fetches the task first gets to execute it anyway.
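
As a rough illustration of "whoever fetches the task first executes it", the sketch below lets three threads compete for tasks from one shared list. It is an assumption-laden stand-in, not how bdrb does it: SharedQueue and claim are hypothetical names, the threads stand in for workers on different hosts, and the Mutex plays the role that locking in the database would play.

```ruby
# In-memory stand-in for the shared job table.
class SharedQueue
  def initialize(task_ids)
    @pending = task_ids.dup
    @lock = Mutex.new
  end

  # Atomically hands out the next pending task id, or nil when empty.
  def claim
    @lock.synchronize { @pending.shift }
  end
end

queue = SharedQueue.new((1..20).to_a)
claimed = Hash.new { |hash, name| hash[name] = [] }
results_lock = Mutex.new

workers = %w[worker_a worker_b worker_c].map do |name|
  Thread.new do
    # Each worker keeps claiming until the queue is drained.
    while (id = queue.claim)
      results_lock.synchronize { claimed[name] << id }
    end
  end
end
workers.each(&:join)

# Every task was executed by exactly one worker, whichever got there first.
all_ids = claimed.values.flatten.sort
puts all_ids == (1..20).to_a
```

Which worker ends up with which task varies from run to run, but no task is claimed twice and none is skipped, which is why a target host is not needed when enqueuing.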



-- 
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://gnufied.org
