Hi All,

It might be lack of sleep, but I am struggling to accurately limit our pool size. It seems like I can specify it on the server with the -s command line option and also on the client via the YAML pool_size. Is that right? Which one wins?

Our problem is that we are getting about 40 threads on each backgroundrb box and it's flooring our db and each bgrb box. We want around 8. Is there a way to put a hard ceiling on the server-side thread pool?

Here is our setup:
- 5 app boxes, app01 - app05, with 8 mongrel instances each (this is where the 40 comes from, I think)
- Each app box points to a load balancer in front of two backgroundrb boxes, crawler01 and crawler02

This is the backgroundrb.yml on each app box:

  :host: backgroundrb
  :port: 2000
  :rails_env: production
  :pool_size: 1

We have to keep killing the bgrb processes. But we're OK, because all the state of the workers is stored in a record in the db. That also allows us to use the load balancer in front of the bgrb (crawler) tier.

BTW, our site is www.swivel.com and you can see our use of the progress bar pattern and bgrb (although you won't see much progress until we solve this).

long live bgrb

Brian Mulloy
CEO & Cofounder
www.swivel.com
Hey Brian-

On Dec 7, 2006, at 6:59 PM, Brian Mulloy wrote:

> It might be lack of sleep, but I am struggling to accurately limit
> our pool size. It seems like I can specify it on the server with
> the -s command line option and also on the client via the YAML
> pool_size. Is that right? Which one wins?
> [snip]

You are using the latest release of backgroundrb, 0.2.1? The pool size works like this: it is supposed to allow only the number of workers you specify to be running at one time. Any requests for new workers that arrive while the pool is full get queued up, and they are run when another worker leaves the pool and dies.

What exactly is happening to you? Are you sending a bunch of requests and they all run at once even though you have set the limit?

It may be that we need to add some tighter control over how many active jobs there are. Maybe a way for new_worker to return an error that you can rescue and retry on a different drb server? The current thread pool should limit the actively running workers, but writing properly threaded code is hard, so I wouldn't be too surprised if I made a mistake. I can imagine a way to be more strict about the pool limit, but it will take some doing, I imagine.

Can you explain how your bdrb stuff works? Are you starting a new worker each time you do something? Or do you have a certain number of workers that always run and keep running in a loop accepting jobs?
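[Editor's note: the queueing behavior Ezra describes — at most N workers running at once, extra requests waiting until a slot frees up — can be sketched in plain Ruby. This is an illustrative sketch, not the backgroundrb internals; the names (POOL_SIZE, jobs, peak) are made up for the example.]

```ruby
POOL_SIZE = 2
jobs    = Queue.new   # requests queue up here when the pool is full
running = []
mutex   = Mutex.new
peak    = 0           # highest number of jobs observed running at once

# Only POOL_SIZE threads ever pull from the queue, so at most
# POOL_SIZE jobs can be in flight at any moment.
workers = POOL_SIZE.times.map do
  Thread.new do
    while (job = jobs.pop) != :stop
      mutex.synchronize { running << job; peak = [peak, running.size].max }
      sleep 0.01                     # simulate the actual work
      mutex.synchronize { running.delete(job) }
    end
  end
end

10.times { |i| jobs << i }           # 10 requests, only 2 slots
POOL_SIZE.times { jobs << :stop }    # one :stop sentinel per worker
workers.each(&:join)

puts peak                            # never exceeds POOL_SIZE
```

The excess requests simply wait in the queue; nothing beyond the pool size ever runs concurrently.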
The reason I ask is that with the way the new backgroundrb works, it's better to have a number of named workers that always run in a loop, accepting jobs from a queue. So instead of calling new_worker all the time, you set your workers to autostart at server start time and then just call methods on the daemon-style workers. This way you can start exactly as many workers as you want and give each one a named job key. Then you can have the workers running in a loop where they wake up every second, or whatever interval you set, and do whatever jobs they need to do.

Since you say that you are already keeping the state of the workers in the database, maybe an approach like this would be better suited. The basic idea is something like this — an example worker that pulls pending events off the db, executes them, saves them out, and loops again:

  require 'rss'
  require 'net/http'

  class RSSWorker < BackgrounDRb::Worker::RailsBase
    def do_work(args)
      @args = args
      mainloop
    end

    def mainloop
      loop {
        sleep @args[:sleep]
        RssUrls.find_all_by_pending(true).each do |rss|
          rss.pending = false   # clear the flag so the next pass skips this row
          rss.processing = true
          rss.save
          Net::HTTP.start(rss.host) do |http|
            response = http.get(rss.path)
            raise response.code unless response.code == "200"
            rss_parser = RSS::Parser.new(response.body)
            rss.output = rss_parser.parse
          end
          rss.processing = false
          rss.completed = true
          rss.save
        end
      }
    end
  end
  RSSWorker.register

Something like that may work better, although I am not certain exactly what your workers need to do. You could also do the same thing with long-lived workers but create a method other than do_work, which you can call from Rails via the work_thread method.

Cheers-
-- Ezra Zygmuntowicz
-- Lead Rails Evangelist
-- ez at engineyard.com
-- Engine Yard, Serious Rails Hosting
-- (866) 518-YARD (9273)
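[Editor's note: the poll-process-mark loop in Ezra's worker can be exercised standalone by swapping the database for an in-memory array. A minimal sketch, assuming a made-up Job struct in place of the RssUrls model; the flags mirror the pending/processing/completed columns above.]

```ruby
# Stand-in for the ActiveRecord row: name plus the three state flags.
Job = Struct.new(:name, :pending, :processing, :completed)

queue = [Job.new("a", true), Job.new("b", true), Job.new("c", false)]

2.times do                              # two wake-ups of the main loop
  queue.select(&:pending).each do |job|
    job.pending    = false              # claim the job
    job.processing = true
    # ... fetch and parse the feed here in the real worker ...
    job.processing = false
    job.completed  = true               # hand the result back via the db
  end
  # `sleep @args[:sleep]` would go here in the real worker
end

puts queue.count(&:completed)           # both pending jobs got processed
```

Because all state lives in the shared store, any number of daemon-style workers can run this loop against the same table, which is what makes the load-balanced crawler tier workable.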
> You are using the latest release of backgroundrb? 0.2.1?

module BackgrounDRb
  VERSION = "0.2.0"
end

doh, I am going to upgrade and then give it another go.

these are excellent thoughts, Ezra, thanks.

> [rest of Ezra's message snipped]
Hi Ezra,

I did the upgrade to 0.2.1. I am going to adjust how I am using the workers based on your suggestions. However, here is a unit test I wrote that, for better or worse, has the behavior I was hoping for. The test creates new workers and asserts after each one that the number of worker threads is no more than I specified with the -s option on backgroundrb start.

  class FooWorker < BackgrounDRb::Worker::RailsBase
    def do_work(args)
      sleep 1
    end
  end
  FooWorker.register

  def test_middle_man_processes
    # thread pool size
    max_bdrb_threads = 2

    # stop bdrb just in case
    %x(RAILS_ENV=test; #{RAILS_ROOT}/script/backgroundrb stop -- -c backgroundrb_test)

    # start bdrb in test with the thread pool size
    %x(RAILS_ENV=test; #{RAILS_ROOT}/script/backgroundrb start -- -c backgroundrb_test -s #{max_bdrb_threads})

    # give bdrb time to start
    sleep(5)

    # shouldn't have to do this, but we do
    orig_env = ENV['RAILS_ENV']
    ENV['RAILS_ENV'] = 'test'

    10.times do
      MiddleMan.new_worker :class => :foo_worker
      worker_count = %x(pstree | grep -v grep | grep #{:foo_worker} | wc -l).gsub("\n", '').to_i
      puts "worker_count: #{worker_count} max_bdrb_threads: #{max_bdrb_threads}"
      # assert that the thread pool size stays at or below what we specified
      assert worker_count <= max_bdrb_threads,
             "expected at most #{max_bdrb_threads} workers, was #{worker_count}"
    end

    # shouldn't have to do this, but we do
    %x(RAILS_ENV=test; #{RAILS_ROOT}/script/backgroundrb stop -- -c backgroundrb_test)
    ENV['RAILS_ENV'] = orig_env
  end

thanks again for the help. i'm going back to your other suggestions now.

On 12/8/06, Brian Mulloy <brian at swivel.com> wrote:

> > You are using the latest release of backgroundrb?
>
> module BackgrounDRb
>   VERSION = "0.2.0"
> end
>
> doh, I am going to upgrade and then give it another go.
>
> these are excellent thoughts, Ezra, thanks.
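[Editor's note: the %x(RAILS_ENV=test; ...) calls in the test above may explain the "shouldn't have to do this" ENV workaround. With the semicolon, the shell sets a local, unexported variable and then runs the command in a separate simple command, so the child process never sees RAILS_ENV. Without the semicolon, the assignment is part of the command's environment. A quick sketch, assuming a POSIX shell and ruby on the PATH:]

```ruby
# With ";" the variable is local to the shell; the child sees nothing.
with_semicolon    = %x(FOO=bar; ruby -e 'print ENV["FOO"].inspect')

# Without ";" the assignment applies to the command itself.
without_semicolon = %x(FOO=bar ruby -e 'print ENV["FOO"].inspect')

puts with_semicolon     # nil
puts without_semicolon  # "bar"
```

So dropping the semicolon (e.g. `%x(RAILS_ENV=test #{RAILS_ROOT}/script/backgroundrb stop ...)`) would likely make the ENV['RAILS_ENV'] juggling unnecessary.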