Christian Schlaefcke
2007-May-15 18:37 UTC
[Backgroundrb-devel] Behaviour of pool_size setting
Hi,

I have backgroundrb running to decouple the execution of massive business logic from an ActionWebservice request. The service is designed to take some configuration parameters and fire a lot of background workers to do the requested work. For performance reasons I want to limit the number of workers to a maximum of 30. But when I start a configuration that requires, for example, 300 worker executions, I can see that the limit of 30 workers is not kept: about 180 worker processes fill up my process list.

I start my workers like this:

  key = MiddleMan.new_worker(:class => :execution_worker,
                             :args => {...some_args...})

Why do I see so many more than my declared maximum of 30 workers? Am I doing something wrong? How should I understand the behaviour of pool_size? What happens when I have 30 workers running and the 31st, 32nd, ..., 300th request to start a worker comes in?

Thanks & Regards!

Christian
* Christian Schlaefcke (cschlaefcke at wms-network.de) [070515 23:49]:
> Hi,
>
> I have backgroundrb running to decouple the execution of massive
> business logic from an ActionWebservice request. The service is
> designed to take some configuration parameters and fire a lot of
> background workers to do the requested work. For performance reasons
> I want to limit the number of workers to a maximum of 30. But when I
> start a configuration that requires, for example, 300 worker
> executions, I can see that the limit of 30 workers is not kept: about
> 180 worker processes fill up my process list.

ok, I might look at adding a 'max_worker' configuration parameter as something separate from the thread pool (which really just says something about the MiddleMan, not the existing workers) - changing the pool_size semantics could impact current installations.

/skaar

> I start my workers like this:
>
>   key = MiddleMan.new_worker(:class => :execution_worker,
>                              :args => {...some_args...})
>
> Why do I see so many more than my declared maximum of 30 workers? Am I
> doing something wrong? How should I understand the behaviour of
> pool_size? What happens when I have 30 workers running and the 31st,
> 32nd, ..., 300th request to start a worker comes in?
>
> Thanks & Regards!
>
> Christian

--
----------------------------------------------------------------------
|\|\  where in the       | s_u_b_s_t_r_u_c_t_i_o_n
| | >===========  W.A.S.T.E. | genarratologies
|/|/ (_) is the wisdom   | skaar at waste.org
----------------------------------------------------------------------
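P.S. until something like max_worker exists, throttling on the application side is probably the way to go. An untested sketch follows -- the Job model and its columns are made up here; only MiddleMan.new_worker is real:

  MAX_WORKERS = 30

  def start_or_defer_execution(args)
    if Job.count(:conditions => "state = 'running'") < MAX_WORKERS
      Job.create!(:state => 'running', :params => args.to_yaml)
      MiddleMan.new_worker(:class => :execution_worker, :args => args)
    else
      # over the cap: just record the job and let a scheduled sweep
      # (or a finishing worker) start it later
      Job.create!(:state => 'queued', :params => args.to_yaml)
    end
  end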
I resolved a similar situation by pre-allocating a fixed-size pool of workers. These workers are long-running, and they individually poll the database for new work. My backgroundrb_schedules.yml includes this:

<% pub_worker_pool_size = 5 %>
<% pub_worker_pool_size.times do |i| %>
pubworker<%= i %>:
  :class: :pub_worker
  :job_key: :pubworker<%= i %>
  :worker_method: :do_work
  :worker_method_args:
    :ignored: 1
  :trigger_args:
    :start: <%= Time.now + (10 + (10 * i)).seconds %>
<% end %>

My PubWorker#do_work method is:

def do_work(args)
  logger.debug("#{self.jobkey}: do_work called")
  loop do  # loop forever
    if do_publish
      sleep 1   # pause 1 second when work was found, then go again
    else
      sleep 60  # pause 1 minute when no publications were found on the last try
    end
  end
end

My do_publish method checks the queue for work; if there is a publication due, it publishes it and returns true, and if there was no work it returns false (rough sketch below). In do_work you'll see that if the queue is empty it gets polled once per minute per worker; otherwise the workers try to empty the queue as quickly as possible. I stagger the start-up of the workers at 10-second intervals so as to distribute the polls to the database.

I avoid multi-threading issues by making each worker responsible for its own loop. There is no external scheduler that calls the do_work method periodically.

My previous implementation had a single long-running worker that would spawn additional workers as needed to handle the work in the queue. But my experience was that creating a worker from a worker leads to "socket not found" and "connection reset by peer" errors, and poor reliability in general. So I scrapped that in favor of the fixed-size pool above, and it has been much more reliable.

I do think there is potential to create a fixed-size pool of workers plus another long-running "queue manager" worker, and implement it such that the workers request work from the queue manager rather than from the database. This way the queue manager could get a block of work and parcel it out with fewer queries to the database. In addition, the workers could check in with the manager when they are done, making it possible for the manager to compile statistics on the jobs. For me, though, using the database to manage the queue -- and letting it handle the inherent multi-threading issues -- was the more expedient route.
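In case it helps, here is roughly what do_publish looks like, simplified and untested as pasted; Publication and its publish_at/published_at columns are stand-ins for my real model:

def do_publish
  # find the next publication that is due and not yet handled
  pub = Publication.find(:first,
          :conditions => ['publish_at <= ? AND published_at IS NULL', Time.now],
          :order => 'publish_at')
  return false unless pub  # queue is empty -- the caller sleeps longer

  # claim the row with a single atomic UPDATE so the other pool workers
  # skip it; update_all returns the number of rows it changed
  claimed = Publication.update_all(['published_at = ?', Time.now],
              ['id = ? AND published_at IS NULL', pub.id])
  pub.publish if claimed == 1  # publish is my app's method, not bdrb API
  true
end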
Mason

On 5/16/07, skaar <skaar at waste.org> wrote:
> * Christian Schlaefcke (cschlaefcke at wms-network.de) [070515 23:49]:
> > I have backgroundrb running to decouple the execution of massive
> > business logic from an ActionWebservice request. [...] Why do I see
> > so many more than my declared maximum of 30 workers?
>
> ok, I might look at adding a 'max_worker' configuration parameter as
> something separate from the thread pool (which really just says
> something about the MiddleMan, not the existing workers) - changing
> the pool_size semantics could impact current installations.
>
> /skaar
Jonathan Wallace
2007-Jul-26 17:21 UTC
[Backgroundrb-devel] Behaviour of pool_size setting
On 5/16/07, Mason Hale <masonhale at gmail.com> wrote:
> I resolved a similar situation by pre-allocating a fixed-size pool of
> workers. These workers are long-running, and they individually poll
> the database for new work.

I'd prefer not to have the workers polling the database, especially if the number of workers is large. Since responsiveness is one of the goals of my app, I'd prefer to let users' queries slow down the db. :)

> My previous implementation had a single long-running worker that would
> spawn additional workers as needed to handle the work in the queue.
> But my experience was that creating a worker from a worker leads to
> "socket not found" and "connection reset by peer" errors, and poor
> reliability in general. So I scrapped that in favor of the fixed-size
> pool above, and it has been much more reliable.

Has anyone successfully implemented this technique on a high-reliability site?

> I do think there is potential to create a fixed-size pool of workers
> plus another long-running "queue manager" worker, and implement it
> such that the workers request work from the queue manager rather than
> from the database. [...] For me, though, using the database to manage
> the queue -- and letting it handle the inherent multi-threading issues
> -- was the more expedient route.

How about using the result hash as a way to push information from the "queue managing" worker to the other workers? Has anyone tried something like this? Is it reliable? Or are there race conditions, with the queue-managing worker writing to the hash and the other workers reading from their keys?

--
Jonathan Wallace
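P.S. to make the per-key idea concrete, in plain Ruby terms (a hypothetical sketch, not bdrb's actual internals):

  shared_results = {}

  writers = (0...10).map do |i|
    Thread.new do
      # each worker thread only ever touches its own key
      1000.times { |n| shared_results[:"worker#{i}"] = n }
    end
  end
  writers.each { |t| t.join }

The question is whether this stays safe once the hash lives in the MiddleMan and is accessed concurrently.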
> How about using the result hash as a way to push information from the
> "queue managing" worker to the other workers? Has anyone tried
> something like this? Is it reliable? Or are there race conditions,
> with the queue-managing worker writing to the hash and the other
> workers reading from their keys?

I highly suggest avoiding use of the result hash. The current implementation suffers from some threading/locking issues that result in very weird behavior when it is accessed concurrently.

See: http://rubyforge.org/pipermail/backgroundrb-devel/2007-January/000639.html

Still, I do think it would be possible to either fix the results hash issue (which would be great!) or manage the multi-threaded access issues in a queue manager worker yourself.

For my immediate needs, I punted and just store the queue of things to be worked on in the db, and let the db server deal with the concurrency issues. I agree it is a less-than-ideal solution - but given my options and other priorities it was a quick, easy fix to a hairy problem. Perhaps a lightweight db like SQLite could be used just to manage the queue - and thereby offload the traffic from the main db, while also saving the need to deal with the concurrency headaches. (I've never used SQLite, so I'm not sure.)

Mason
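P.S. by "manage the multi-threaded access issues yourself" I mean something along these lines -- an untested sketch, where Publication.find_due and publish! are stand-ins for app code, not bdrb API. Ruby's standard-library Queue is internally synchronized, so the manager thread and the worker threads need no explicit locking:

  require 'thread'  # Queue lives here in Ruby 1.8

  work_queue = Queue.new

  # manager thread: fetch a block of due work in one query and parcel it out
  # (find_due is assumed to atomically claim the rows it returns, so the
  # same job is never enqueued twice)
  manager = Thread.new do
    loop do
      Publication.find_due(:limit => 50).each { |pub| work_queue << pub }
      sleep 30
    end
  end

  # worker threads: block on the in-memory queue instead of polling the db
  workers = (1..5).map do
    Thread.new do
      while pub = work_queue.pop  # pop blocks until the manager enqueues work
        pub.publish!
      end
    end
  end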
Jonathan Wallace
2007-Jul-26 20:02 UTC
[Backgroundrb-devel] Behaviour of pool_size setting
On 7/26/07, Mason Hale <masonhale at gmail.com> wrote:
> > How about using the result hash as a way to push information from the
> > "queue managing" worker to the other workers? Has anyone tried
> > something like this? Is it reliable? Or are there race conditions,
> > with the queue-managing worker writing to the hash and the other
> > workers reading from their keys?
>
> I highly suggest avoiding use of the result hash. The current
> implementation suffers from some threading/locking issues that result
> in very weird behavior when it is accessed concurrently.
>
> See:
> http://rubyforge.org/pipermail/backgroundrb-devel/2007-January/000639.html

Sorry, I hadn't read far enough back in the archives. I'm not sure I follow all the particulars, but does this issue also cover the case where each worker writes to its own key in the results hash? I.e., multiple writers to the hash, but no two writers accessing the same key?

> Still, I do think it would be possible to either fix the results hash
> issue (which would be great!) or manage the multi-threaded access
> issues in a queue manager worker yourself.

I'm going to run the test you detailed in that thread, with the per-key question above, to see what happens. Depending on the results, I may attempt a dive into the backgroundrb code.

As for multi-threaded (process) access in a queue manager worker, I've already ruled that out unless it becomes absolutely necessary. I see no need to introduce concurrency programming (outside of the use of bdrb) in my app just yet. Succinctly put, deadlocks would suck.

> For my immediate needs, I punted and just store the queue of things to
> be worked on in the db, and let the db server deal with the
> concurrency issues.

I'm already storing the jobs to be completed in the db, to ensure that no jobs are ever missed due to a crash of the bdrb process, a worker, or a server. If each worker is idempotent, then it doesn't matter if a job that didn't quite complete is re-run after a bdrb or server restart (rough sketch below). Also, since I want a log of jobs completed, it makes sense to keep them in the db anyway.

As stated before, my concern is to limit the number of queries to the db if at all possible, the reason being the future possibility of multiple dedicated backgroundrb servers. It seems unreasonable to have a bunch of bdrb servers polling the db for jobs. Do you find that db caching eliminates the majority of this concern?

> Perhaps a lightweight db like SQLite could be used just to manage the
> queue - and thereby offload the traffic from the main db, while also
> saving the need to deal with the concurrency headaches. (I've never
> used SQLite, so I'm not sure.)

Ha! It sounds like you're thinking of ruby queue[0]. I thought of using that for my current issue and foregoing bdrb altogether, but I don't think I like the idea of using ruby queue to execute ./script/runner statements. It seems somewhat dirty for some reason. Plus, I don't see any easy way to get at the progress of the ruby queue clients while they are running.

On another note, I remember reading in the archives about the problems with workers spawning workers[1]. Has anyone tried having a worker call a method in a traditional rails class that spawns another worker? This is another task on my list of things to try.
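Back to the idempotency point, the rough shape I have in mind is below -- an untested sketch, where the Job model and its state/claimed_by columns are made up:

  def run_next_job(worker_name)
    job = Job.find(:first, :conditions => "state = 'queued'",
                   :order => 'created_at')
    return false unless job  # nothing queued right now

    # atomic claim: only one worker can flip this row from queued to
    # running, so a second bdrb server polling the same table cannot
    # grab it as well; update_all returns the number of rows changed
    claimed = Job.update_all(["state = 'running', claimed_by = ?", worker_name],
                             ["id = ? AND state = 'queued'", job.id])
    return false unless claimed == 1  # raced with another worker and lost

    job.reload.perform                     # perform must be safe to re-run
    job.update_attribute(:state, 'done')   # finished rows double as the job log
    true
  end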
0. http://codeforpeople.com/lib/ruby/rq/rq-3.1.0/ , found via http://www.forbiddenweb.org/topic/270/index.html
1. among other threads, http://rubyforge.org/pipermail/backgroundrb-devel/2007-February/000755.html

--
Jonathan Wallace
On 7/27/07, Jonathan Wallace <jonathan.wallace at gmail.com> wrote:
> [...]
> On another note, I remember reading in the archives about the problems
> with workers spawning workers[1]. Has anyone tried having a worker
> call a method in a traditional rails class that spawns another worker?
> This is another task on my list of things to try.

Jonathan,

Calling something in Rails that in turn spawns another worker doesn't sound right. It is also not possible in the current scheme of things.

> 1. among other threads,
> http://rubyforge.org/pipermail/backgroundrb-devel/2007-February/000755.html

--
Let them talk of their oriental summer climes of everlasting conservatories; give me the privilege of making my own summer with my own coals.

http://blog.gnufied.org