Hello, I am trying to build an application that will parse thousands of XML feeds continuously in the background. I really have no idea how to do this "correctly" with Rails. Here is my code:

    class Feed < ActiveRecord::Base
      def parse
        # Parsing using external lib (syndication gem)
      end
    end

    class FeedsController < ApplicationController
      def parse
        feed = Feed.find(params[:id])
        feed.parse
      end
    end

So for now, if I want to parse all my feeds forever, what I have to do is call http://myapp/feeds/1/parse, and then http://myapp/feeds/2/parse ... This is definitely not a good solution!

How can I use Backgroundrb to do this?

Thanks for your help!

--
Julien Genestoux
julien.genestoux at gmail.com
+1 (415) 254 7340
+33 (0)8 70 44 76 29
On Apr 22, 2008, at 7:36 PM, Julien Genestoux wrote:

> So for now, if I want to parse all my feeds forever, what I have to
> do is call http://myapp/feeds/1/parse, and then http://myapp/feeds/2/parse ...
> This is definitely not a good solution!
>
> How can I use Backgroundrb to do this?

1. Use the version of backgroundrb from subversion. The git one was having problems for me.
2. Follow these instructions: http://backgroundrb.rubyforge.org/
3. Then read this: http://backgroundrb.rubyforge.org/rails/index.html
4. Create a worker of your own. Schedule it according to http://backgroundrb.rubyforge.org/scheduling/index.html

adam (a 3 day old user of backgroundrb)
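Step 4 ("create a worker of your own") looks roughly like the sketch below. The `BackgrounDRb::MetaWorker` class here is a minimal stub so the example runs standalone; in a real Rails app the backgroundrb plugin supplies that base class, and the method names (`set_worker_name`, `create`, `add_periodic_timer`) follow the bdrb docs linked above.

```ruby
# Minimal stub of the base class so this sketch runs standalone;
# in a real app the backgroundrb plugin provides the real thing.
module BackgrounDRb
  class MetaWorker
    class << self
      attr_reader :worker_name
      def set_worker_name(name)
        @worker_name = name
      end
    end

    # The stub records (interval, block) pairs instead of scheduling for real.
    def add_periodic_timer(seconds, &block)
      (@timers ||= []) << [seconds, block]
    end

    attr_reader :timers
  end
end

class FeedWorker < BackgrounDRb::MetaWorker
  set_worker_name :feed_worker

  # bdrb calls create once when the worker process starts
  def create(args = nil)
    add_periodic_timer(60) { parse_feeds }
  end

  def parse_feeds
    # look up feeds that are due and parse each one
  end
end
```

The point of the shape: the worker registers a timer once at startup, and bdrb's reactor invokes `parse_feeds` on each tick, so no controller action ever has to drive the parsing.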
Thanks Adam for your help... I still have a few questions: should I have one worker for each feed that is called periodically (add_periodic_timer), or rather one single worker that calls every feed one by one?

What is the best solution, performance-wise?

Thanks again for your help!

Best

On 4/22/08, Adam Williams <adam at thewilliams.ws> wrote:
> 1. Use the version of backgroundrb from subversion. The git one was
> having problems for me. [...]
> 4. Create a worker of your own. Schedule it according to
> http://backgroundrb.rubyforge.org/scheduling/index.html

--
Julien Genestoux
julien.genestoux at gmail.com
http://www.ouvre-boite.com
+1 (415) 254 7340
+33 (0)8 70 44 76 29
On Apr 23, 2008, at 1:07 AM, Julien Genestoux wrote:

> I still have a few questions: should I have one worker for each feed
> that is called periodically (add_periodic_timer) or rather one single
> worker that calls every feed one by one?
>
> What is the best solution, performance-wise?

Good question... I don't suppose I know exactly. I would start by processing all the feeds in one worker invocation - that is what I have done for sending an unknown amount of email. It just seems wrong to me to invoke a worker for one email at a time.

The right answer likely lies in understanding the whole MasterWorker, Packet::Reactor/handler_instance.ask_work bits of the puzzle...

adam
Thanks Adam,

That sounded weird to me as well, to have one worker for each feed... However, if I only have one worker, that also means that I am parsing only one feed at any moment. An option, maybe, is to have a few workers (depending on the number of feeds) that parse feeds concurrently?

If I only have one worker, according to you, what should be the winning strategy to choose the "right" feed to parse? Obviously some feeds need to be parsed once every few minutes, while some others might not need to be parsed more than once every hour...

Any idea/tip on this?

On 4/23/08, Adam Williams <adam at thewilliams.ws> wrote:
> Good question... I don't suppose I know exactly. I would start by
> processing all the feeds in one worker invocation - that is what I
> have done for sending an unknown amount of email. [...]

--
Julien Genestoux
julien.genestoux at gmail.com
http://www.ouvre-boite.com
+1 (415) 254 7340
+33 (0)8 70 44 76 29
Hey Julien/Adam,

There was a great thread about a similar situation about 10 days ago. Check it out here: http://rubyforge.org/pipermail/backgroundrb-devel/2008-April/001681.html

Julien, you definitely don't want a worker for each feed, and you'll want to use thread_pool.defer, which will allow you to concurrently process as many feeds as you want (or as many as your system can handle). From what you've said, it sounds like you'll only need one worker coded up, but you'll probably set multiple periodic timers (e.g. one for hourly parsing of high-priority feeds, one for nightlies, etc.). The method you specify in the periodic timer should use thread_pool.defer to handle processing of multiple feeds at a time -- there's no reason to do them sequentially.

stevie

On Wed, Apr 23, 2008 at 10:30 AM, Julien Genestoux <julien.genestoux at gmail.com> wrote:
> That sounded weird to me as well, to have one worker for each feed...
> However, if I only have one worker, that also means that I am parsing
> only one feed at any moment. [...]
You can use the built-in thread pool to process more than one feed within the same worker. So within the worker, you'd do:

    def parse_feeds
      loop do
        feed = Feed.find_feed_to_process
        thread_pool.defer do
          feed.parse
        end
      end
    end

I think the default pool size is 20. You can control the size of the thread pool using a class-level method; as I recall it is

    pool_size x

Paul

On Wed, Apr 23, 2008 at 7:30 AM, Julien Genestoux <julien.genestoux at gmail.com> wrote:
> That sounded weird to me as well, to have one worker for each feed...
> However, if I only have one worker, that also means that I am parsing
> only one feed at any moment. [...]
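Paul's `thread_pool.defer` can be pictured as a fixed set of threads draining a shared job queue. Here is a standalone plain-Ruby sketch of that behavior; `TinyThreadPool` is a made-up stand-in for illustration, not bdrb's actual implementation.

```ruby
require 'thread'

# A fixed number of threads pull jobs off a shared queue;
# `defer` just enqueues and returns immediately, like bdrb's pool.
class TinyThreadPool
  def initialize(size = 20) # 20 mirrors the default pool size mentioned above
    @queue = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        while (job = @queue.pop) # a nil job is the stop signal
          job.call
        end
      end
    end
  end

  def defer(&job)
    @queue << job
  end

  # Push one nil per thread as a stop signal, then wait for them to drain.
  def shutdown
    @threads.size.times { @queue << nil }
    @threads.each(&:join)
  end
end

# "Parse" ten feeds concurrently with five threads.
pool = TinyThreadPool.new(5)
results = Queue.new
10.times { |i| pool.defer { results << i } }
pool.shutdown
parsed = Array.new(10) { results.pop }.sort
```

The design point is the one Paul is making: the `loop`/`defer` combination decouples *finding* the next feed (one thread) from *parsing* it (up to pool-size threads at once).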
Thanks guys... that's a ton of info! I am definitely going to use the thread_pool... as soon as I can find the documentation ;D

1. For each feed, I define a "frequency" (every minute, every hour, every 30 minutes...) that is updated every time I parse the feed: if the parser returns a "new" element, I increase the frequency (from once per hour to once per 30 min.); if not, I decrease the frequency.

2. I also have a "last_update" field which remembers the time when the feed was last parsed.

3. With 1 & 2, I know how "late" I am to parse a feed... so when I choose my next feed to parse, I always choose the one that is the most "late".

I am not sure Stevie's approach of having multiple tasks for the worker applies here. Actually, I am not even scheduling my worker; I just launch it once, and parse_feeds runs forever (while true do... end).

Also, if I understand Paul's code well, his approach always keeps my worker busy, but doesn't take into account the "lateness" of my feeds.

My idea would be to add/remove workers according to "how late" I am in parsing feeds. If my latest feed is late by more than 10 min, I would add one worker... and if my latest feed is late by less than 5 minutes, I would remove one worker.

Does this approach make sense to you?

Thanks a lot for your help guys...

On 4/23/08, Paul Kmiec <paul.kmiec at appfolio.com> wrote:
> You can use the built-in thread pool to process more than one feed
> within the same worker. [...]
> I think the default pool size is 20.

--
Julien Genestoux
julien.genestoux at gmail.com
http://www.ouvre-boite.com
+1 (415) 254 7340
+33 (0)8 70 44 76 29
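Julien's lateness scheme (points 1-3) can be expressed as a small pure-Ruby sketch. The `Feed` struct and field names here are illustrative stand-ins, not his actual ActiveRecord model.

```ruby
# Illustrative stand-in for the Feed model: each feed has a parse
# frequency in minutes and a timestamp of its last parse.
Feed = Struct.new(:url, :frequency_minutes, :last_update) do
  # Seconds overdue; positive means the feed is "late".
  def lateness(now)
    now - (last_update + frequency_minutes * 60)
  end
end

# Pick the single most-late feed, as in point 3.
def most_late_feed(feeds, now)
  feeds.max_by { |f| f.lateness(now) }
end

now = Time.at(10_000)
feeds = [
  Feed.new('a', 60, Time.at(10_000 - 3600)), # exactly due (lateness 0)
  Feed.new('b', 1,  Time.at(10_000 - 600)),  # 9 minutes late
  Feed.new('c', 30, Time.at(10_000 - 60)),   # not due for another 29 minutes
]
```

With this shape, the adaptive-frequency idea from point 1 is just a matter of mutating `frequency_minutes` after each parse, and the selection in point 3 stays a one-liner.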
Hey Julien,

It sounds like you are planning on using one "long running" feed parsing loop with a do...while. This is exactly the sort of thing you want to avoid in new bdrb, especially if you know you want to do something at discrete time periods -- it totally goes against the twisted paradigm. After thinking about it for a bit, I would recommend setting just one periodic_timer for every minute, and then determining in your parse_feeds method which feeds need to be parsed. If I were you, I wouldn't use last_updated to determine when to parse your feeds -- it adds unnecessary complexity to your system. You can of course save that value for reference, but it's not necessary for your requirements.

In your db you could have a field for every feed called "interval" that would determine the minute intervals at which to parse the feeds. Then every minute when parse_feeds gets called, you could parse every feed with an interval of "1", and then determine based on the current minute in the hour whether or not to try to parse the 15, 30, or 60 minute feeds. And you'll of course want to use thread_pool.defer. So, using Paul's code as a starting point, something like this:

    def parse_feeds
      feeds = Feed.find_feeds_to_process
      feeds.each do |feed|
        thread_pool.defer do
          feed.parse
        end
      end
    end

    class Feed
      def self.find_feeds_to_process
        feeds = []
        [1, 15, 30, 60].each do |interval|
          feeds.concat(find_all_by_interval(interval)) if Time.now.min % interval == 0
        end
        feeds
      end

      def parse
        # parsing code
      end
    end

On my way home yesterday I thought of another sexy addition you could add to this. In the above code, you know that you'll be parsing _every_ feed in your db on the hour, which isn't a very efficient setup. If possible, you want to set it up so that you have an even parsing distribution throughout the hour, so you're not getting hammered. You could add a pretty simple heuristic that would give you a relatively even distribution across the hour by using a hash of the feed url. Along with the url and the interval, save an "offset" value like this example:

    feed = Feed.new
    feed.url = 'my_feed_url'
    feed.interval = 15
    feed.offset = feed.url.hash % 60
    feed.save

Then in find_feeds_to_process, you can do this (untested):

    # the select returns any feed whose interval offset matches
    # the current minute's offset for the same interval
    def self.find_feeds_to_process
      Feed.find(:all).select do |feed|
        [15, 30, 60].detect { |interval| feed.offset % interval == Time.now.min % interval }
      end
    end

Doing a Feed.find(:all) is probably not the best idea if you have a ton of records, so you might want to do multiple db finds to get the same results.

stevie

On Wed, Apr 23, 2008 at 5:46 PM, Julien Genestoux <julien.genestoux at gmail.com> wrote:
> Thanks guys... that's a ton of info! I am definitely going to use the
> thread_pool... as soon as I can find the documentation ;D [...]
> My idea would be to add/remove workers according to "how late" I am
> in parsing feeds. [...]
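Stevie's offset heuristic, extracted into a standalone sketch. One caveat worth adding: in modern Ruby, `String#hash` is seeded per process, so an offset derived from `url.hash` would change across worker restarts; a digest gives a stable value. The `Feed` struct is illustrative.

```ruby
require 'digest/md5'

# Illustrative stand-in for the Feed model.
Feed = Struct.new(:url, :interval, :offset)

# Stable replacement for `url.hash % 60`: String#hash is randomized
# per process in modern Ruby, but an MD5 digest never changes.
def offset_for(url)
  Digest::MD5.hexdigest(url).to_i(16) % 60
end

# A feed is due when its offset lines up with the current minute
# modulo its interval -- the same test as Stevie's `detect`.
def due?(feed, minute)
  feed.offset % feed.interval == minute % feed.interval
end

def feeds_to_process(feeds, minute)
  feeds.select { |f| due?(f, minute) }
end

feeds = [
  Feed.new('a', 15, 7),  # due at minutes 7, 22, 37, 52
  Feed.new('b', 30, 7),  # due at minutes 7 and 37
  Feed.new('c', 60, 40), # due at minute 40
]
```

Because the offsets are spread roughly uniformly over 0..59, the hourly feeds no longer all land on minute 0, which is exactly the "don't get hammered on the hour" property Stevie describes.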
Thanks a lot for this very helpful answer. I implemented a solution very similar to yours and it runs, but I have 2 big problems.

The first one is "throughput". If I have a periodic timer of 1 minute, I can only parse 20 (the number of threads) feeds per minute, which leads to 1200 per hour (since I want to parse a feed at least once every hour). The problem is that I really need to be able to parse at least 10 times this number of feeds... and probably closer to 100k! What if I increase the number of threads? Will I be able to parse more feeds?

The second one is actually a lot worse. I've had my system running for a little more than a day... without monitoring it, and well, this morning, everything was "down". I did a "ps aux" and here is what I got:

    USER PID   %CPU %MEM VSZ     RSS     TTY STAT START TIME  COMMAND
    root 21697 0.0  0.8  32524   15620   ?   D    Apr27 0:13  ruby /mnt/app/current/script/backgroundrb start -e production
    root 21698 0.0  0.2  32504   4736    ?   D    Apr27 0:08  ruby log_worker
    root 21699 1.1  90.5 2170872 1576364 ?   D    Apr27 25:58 ruby parser_worker

As you can see, my parser_worker is consuming a little over 1.5 GB of RAM: wayyyy too much ;) It seems that the vars are not destroyed in my worker? Any idea of what's wrong?

Thanks a lot once again for your help!

Best,

On 4/25/08, Stevie Clifton <stevie at slowbicycle.com> wrote:
> It sounds like you are planning on using one "long running" feed
> parsing loop with a do...while. This is exactly the sort of thing you
> want to avoid in new bdrb, especially if you know you want to do
> something at discrete time periods -- it totally goes against the
> twisted paradigm. [...]

--
Julien Genestoux
julien.genestoux at gmail.com
http://www.ouvre-boite.com
+1 (415) 254 7340
+33 (0)8 70 44 76 29
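One common cause of a long-running worker ballooning like this is loading every record (or keeping every parsed document reachable) at once. A hedged sketch of one mitigation, processing ids in fixed-size slices so only one batch of records is live at a time; the commented ActiveRecord call uses Rails 2-era syntax and is illustrative, not taken from Julien's code.

```ruby
# Process ids in fixed-size slices so the working set stays bounded;
# records from earlier batches become garbage-collectable once the
# block returns.
def each_feed_batch(feed_ids, batch_size = 100)
  feed_ids.each_slice(batch_size) do |batch|
    yield batch
    # In the worker, load and parse only this slice, e.g.:
    #   Feed.find(:all, :conditions => ["id IN (?)", batch]).each { |f| f.parse }
  end
end

sizes = []
each_feed_batch((1..250).to_a, 100) { |b| sizes << b.size }
```

This doesn't fix a genuine leak (e.g. a growing instance variable inside the worker), but it rules out the most common "vars are not destroyed" pattern: holding the entire result set for the whole run.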