Hi All,

It might be lack of sleep, but I am struggling to accurately limit our pool size. It seems like I can specify it on the server with the -s command line option and also on the client via the YAML pool_size. Is that right? Which one wins?

Our problem is that we are getting about 40 threads on each backgroundrb box and it's flooring our db and each bgrb box. We want around 8. Is there a way to put a hard ceiling on the server-side thread pool?

Here is our setup:
- 5 app boxes, app01 - app05, with 8 mongrel instances each (this is where the 40 comes from, I think)
- Each app box points to a load balancer in front of two backgroundrb boxes, crawler01 and crawler02

This is the backgroundrb.yml on each app box:

  :host: backgroundrb
  :port: 2000
  :rails_env: production
  :pool_size: 1

We have to keep killing the bgrb processes. But we're OK, because all the state of the workers is stored in a record in the db. That also allows us to use the load balancer in front of the bgrb (crawler) tier.

BTW, our site is www.swivel.com and you can see our use of the progress bar pattern and bgrb (although you won't see much progress until we solve this).

long live bgrb

Brian Mulloy
CEO & Cofounder
www.swivel.com
Hey Brian-

On Dec 7, 2006, at 6:59 PM, Brian Mulloy wrote:

> It might be lack of sleep, but I am struggling to accurately limit
> our pool size. It seems like I can specify it on the server with
> the -s command line option and also on the client via the YAML
> pool_size. Is that right? Which one wins?
> [snip]

You are using the latest release of backgroundrb, 0.2.1? The pool size works like this: it is supposed to allow only the number of workers you specify to be running at one time. Any requests for new workers that arrive while the pool is full get queued up, and they are run when another worker leaves the pool and dies.

What exactly is happening to you? Are you sending a bunch of requests and they all run at once even though you have set the limit?

It may be that we need to add some tighter control over how many active jobs there are. Maybe a way for new_worker to return an error that you can rescue and retry on a different drb server? The current thread pool should limit the actively running workers, but writing properly threaded code is hard, so I wouldn't be too surprised if I made a mistake. I can imagine a way to be more strict about the pool limit, but it will take some doing, I imagine.

Can you explain how your bdrb stuff works? Are you starting a new worker each time you do something? Or do you have a certain number of workers that always run and keep running in a loop accepting jobs?
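[Editor's note: the queueing behavior Ezra describes — at most N workers running at once, extra requests waiting until a slot frees up — can be sketched in plain Ruby. This is an illustrative sketch, not the backgroundrb internals; the names (POOL_SIZE, jobs, peak) are made up for the example.]

```ruby
POOL_SIZE = 2
jobs    = Queue.new   # requests queue up here when the pool is full
running = []
mutex   = Mutex.new
peak    = 0           # highest number of jobs observed running at once

# Only POOL_SIZE threads ever pull from the queue, so at most
# POOL_SIZE jobs can be in flight at any moment.
workers = POOL_SIZE.times.map do
  Thread.new do
    while (job = jobs.pop) != :stop
      mutex.synchronize { running << job; peak = [peak, running.size].max }
      sleep 0.01                     # simulate the actual work
      mutex.synchronize { running.delete(job) }
    end
  end
end

10.times { |i| jobs << i }           # 10 requests, only 2 slots
POOL_SIZE.times { jobs << :stop }    # one :stop sentinel per worker
workers.each(&:join)

puts peak                            # never exceeds POOL_SIZE
```

The excess requests simply wait in the queue; nothing beyond the pool size ever runs concurrently.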
The reason I ask is that with the way the new backgroundrb works, it's better to have a number of named workers that always run in a loop, accepting jobs from a queue. So instead of calling new_worker all the time, you set your workers to autostart at server start time and then just call methods on the daemon-style workers. This way you can start exactly as many workers as you want and give each one a named job key. Then you can have the workers running in a loop where they wake up every second, or whatever interval you set, and do whatever jobs they need to do.

Since you say that you are already keeping the state of the workers in the database, maybe an approach like this would be better suited. The basic idea is something like this — an example worker that pulls pending events off the db, executes them, saves them out, and loops again:

  require 'rss'
  require 'net/http'

  class RSSWorker < BackgrounDRb::Worker::RailsBase
    def do_work(args)
      @args = args
      mainloop
    end

    def mainloop
      loop {
        sleep @args[:sleep]
        RssUrls.find_all_by_pending(true).each do |rss|
          rss.pending = false   # clear the flag so the next pass skips this row
          rss.processing = true
          rss.save
          Net::HTTP.start(rss.host) do |http|
            response = http.get(rss.path)
            raise response.code unless response.code == "200"
            rss_parser = RSS::Parser.new(response.body)
            rss.output = rss_parser.parse
          end
          rss.processing = false
          rss.completed = true
          rss.save
        end
      }
    end
  end
  RSSWorker.register

Something like that may work better, although I am not certain exactly what your workers need to do. You could also do the same thing with long-lived workers but create a method other than do_work, which you can call from Rails via the work_thread method.

Cheers-
-- Ezra Zygmuntowicz
-- Lead Rails Evangelist
-- ez at engineyard.com
-- Engine Yard, Serious Rails Hosting
-- (866) 518-YARD (9273)
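[Editor's note: the poll-process-mark loop in Ezra's worker can be exercised standalone by swapping the database for an in-memory array. A minimal sketch, assuming a made-up Job struct in place of the RssUrls model; the flags mirror the pending/processing/completed columns above.]

```ruby
# Stand-in for the ActiveRecord row: name plus the three state flags.
Job = Struct.new(:name, :pending, :processing, :completed)

queue = [Job.new("a", true), Job.new("b", true), Job.new("c", false)]

2.times do                              # two wake-ups of the main loop
  queue.select(&:pending).each do |job|
    job.pending    = false              # claim the job
    job.processing = true
    # ... fetch and parse the feed here in the real worker ...
    job.processing = false
    job.completed  = true               # hand the result back via the db
  end
  # `sleep @args[:sleep]` would go here in the real worker
end

puts queue.count(&:completed)           # both pending jobs got processed
```

Because all state lives in the shared store, any number of daemon-style workers can run this loop against the same table, which is what makes the load-balanced crawler tier workable.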
> You are using the latest release of backgroundrb? 0.2.1?

module BackgrounDRb
  VERSION = "0.2.0"
end

doh, I am going to upgrade and then give it another go.

these are excellent thoughts, Ezra, thanks.

> [rest of Ezra's message snipped]
Hi Ezra,

I did the upgrade to 0.2.1. I am going to adjust how I am using the workers based on your suggestions. However, here is a unit test I wrote that, for better or worse, has the behavior I was hoping for. The test creates new workers and asserts after each one that the number of worker threads is no more than I specified with the -s option on backgroundrb start.

  class FooWorker < BackgrounDRb::Worker::RailsBase
    def do_work(args)
      sleep 1
    end
  end
  FooWorker.register

  def test_middle_man_processes
    # thread pool size
    max_bdrb_threads = 2

    # stop bdrb just in case
    %x(RAILS_ENV=test; #{RAILS_ROOT}/script/backgroundrb stop -- -c backgroundrb_test)

    # start bdrb in test with the thread pool size
    %x(RAILS_ENV=test; #{RAILS_ROOT}/script/backgroundrb start -- -c backgroundrb_test -s #{max_bdrb_threads})

    # give bdrb time to start
    sleep(5)

    # shouldn't have to do this, but we do
    orig_env = ENV['RAILS_ENV']
    ENV['RAILS_ENV'] = 'test'

    10.times do
      MiddleMan.new_worker :class => :foo_worker
      worker_count = %x(pstree | grep -v grep | grep #{:foo_worker} | wc -l).gsub("\n", '').to_i
      puts "worker_count: #{worker_count} max_bdrb_threads: #{max_bdrb_threads}"
      # assert that the thread pool size stays at or below what we specified
      assert worker_count <= max_bdrb_threads,
             "expected at most #{max_bdrb_threads} workers, was #{worker_count}"
    end

    # shouldn't have to do this, but we do
    %x(RAILS_ENV=test; #{RAILS_ROOT}/script/backgroundrb stop -- -c backgroundrb_test)
    ENV['RAILS_ENV'] = orig_env
  end

thanks again for the help. i'm going back to your other suggestions now.

On 12/8/06, Brian Mulloy <brian at swivel.com> wrote:

> > You are using the latest release of backgroundrb?
>
> module BackgrounDRb
>   VERSION = "0.2.0"
> end
>
> doh, I am going to upgrade and then give it another go.
>
> these are excellent thoughts, Ezra, thanks.
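[Editor's note: the %x(RAILS_ENV=test; ...) calls in the test above may explain the "shouldn't have to do this" ENV workaround. With the semicolon, the shell sets a local, unexported variable and then runs the command in a separate simple command, so the child process never sees RAILS_ENV. Without the semicolon, the assignment is part of the command's environment. A quick sketch, assuming a POSIX shell and ruby on the PATH:]

```ruby
# With ";" the variable is local to the shell; the child sees nothing.
with_semicolon    = %x(FOO=bar; ruby -e 'print ENV["FOO"].inspect')

# Without ";" the assignment applies to the command itself.
without_semicolon = %x(FOO=bar ruby -e 'print ENV["FOO"].inspect')

puts with_semicolon     # nil
puts without_semicolon  # "bar"
```

So dropping the semicolon (e.g. `%x(RAILS_ENV=test #{RAILS_ROOT}/script/backgroundrb stop ...)`) would likely make the ENV['RAILS_ENV'] juggling unnecessary.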