Neil Mock
2008-Jun-10 23:56 UTC
[Backgroundrb-devel] adding results from threads to a collection and returning it
Forgive me if this has been addressed somewhere, but I have searched and can't come up with anything.

I am basically trying to distribute several web page scraping tasks among different threads and have the results from each added to an Array, which is ultimately returned by the backgroundrb worker. Here is an example of what I'm trying to do in a worker method:

  pages = Array.new

  pages_to_scrape.each do |url|
    thread_pool.defer(url) do |url|
      begin
        # model object performs the scraping
        page = ScrapedPage.new(url)
        pages << page
      rescue
        logger.info "page scrape failed"
      end
    end
  end

  return pages

From monitoring the backgroundrb logs, it appears that all of the pages are completed successfully in the threads. However, the array that is returned is empty. This is to be expected, I suppose, because the threads don't complete before the array is returned, but my question is: how can I make the worker wait to return the array only when all of the threads are complete?

Thanks!
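The blocking behaviour being asked for here can be sketched with plain Ruby threads: a Mutex guards the shared array, and Thread#join waits for every scrape before returning. This is a general-purpose illustration, not backgroundrb's API; it assumes ScrapedPage scrapes its URL on construction, as in the snippet above.

  require 'thread'

  # General sketch, not backgroundrb's API. Assumes ScrapedPage scrapes
  # its URL on construction, as in the original snippet.
  def scrape_all(pages_to_scrape)
    pages = []
    lock  = Mutex.new

    threads = pages_to_scrape.map do |url|
      Thread.new(url) do |u|
        begin
          page = ScrapedPage.new(u)          # assumed scraping model
          lock.synchronize { pages << page } # guard the shared array
        rescue => e
          puts "page scrape failed: #{e.message}"
        end
      end
    end

    threads.each(&:join) # block until every scrape has finished
    pages                # now safe to return: all threads are done
  end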
hemant
2008-Jun-11 03:35 UTC
[Backgroundrb-devel] adding results from threads to a collection and returning it
On Wed, Jun 11, 2008 at 5:26 AM, Neil Mock <neilmock at gmail.com> wrote:
> [snip code example]
>
> From monitoring the backgroundrb logs, it appears that all of the pages are
> completed successfully in the threads. However, the array that is returned
> is empty. This is to be expected, I suppose, because the threads don't
> complete before the array is returned, but my question is: how can I make
> the worker wait to return the array only when all of the threads are
> complete?

Actually, you are doing a couple of things wrong. First, you are accessing a variable created outside the thread_pool from inside the pool, and hence you have thread-unsafe code, which can cause anything from deadlocks to random crashes. Thread pools are for running concurrent tasks in the background without any reporting; they are a fire-and-forget kind of thing.

However, I am contemplating a change in the behaviour of thread pools which will perhaps enable what you want, so unless your need is dire, please don't use thread pools as in the above snippet.
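A minimal sketch of the fire-and-forget style described here: each deferred task is self-contained and persists its own result instead of appending to a shared array, so no mutable state crosses thread boundaries. It assumes ScrapedPage is an ActiveRecord-style model that scrapes on construction and can save itself.

  # Fire-and-forget sketch: nothing is returned from the worker method;
  # each task owns its result end to end. Assumes ScrapedPage is an
  # ActiveRecord-style model that scrapes on construction.
  def scrape_pages(args = nil)
    pages_to_scrape.each do |url|
      thread_pool.defer(url) do |u|
        begin
          ScrapedPage.new(u).save # persist inside the task itself
        rescue
          logger.info "page scrape failed for #{u}"
        end
      end
    end
    # results are read back from the database later, not returned here
  end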
hemant
2008-Jun-12 19:54 UTC
[Backgroundrb-devel] adding results from threads to a collection and returning it
On Wed, Jun 11, 2008 at 5:26 AM, Neil Mock <neilmock at gmail.com> wrote:
> [snip]

Neil,

I have a solution for you in the git version:

http://gnufied.org/2008/06/12/unthreaded-threads-of-hobbiton/
Stevie Clifton
2008-Jun-13 08:45 UTC
[Backgroundrb-devel] adding results from threads to a collection and returning it
Hey Hemant,

A couple of questions about fetch_parallely:

1) Does it operate in the same way as thread_pool.defer, where the number of concurrent threads is limited by :pool_size?

2) Why did you choose to introduce another method instead of providing a thread-safe register_status? (More out of curiosity than anything else -- in my code I've overridden register_status to use a Mutex, and am wondering what the benefit of fetch_parallely would be over this.)

Thanks!
stevie

On Thu, Jun 12, 2008 at 3:54 PM, hemant <gethemant at gmail.com> wrote:
> [snip]
>
> I have a solution for you in the git version:
>
> http://gnufied.org/2008/06/12/unthreaded-threads-of-hobbiton/
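The override Stevie mentions might look something like the sketch below. This is a hypothetical reconstruction of his code, not backgroundrb's implementation; it assumes register_status takes a single status argument and that create is the worker's initialization hook.

  # Hypothetical reconstruction of the Mutex-wrapped register_status
  # Stevie describes; the real backgroundrb method may differ.
  class ScraperWorker < BackgrounDRb::MetaWorker
    set_worker_name :scraper_worker

    def create(args = nil)
      @status_mutex = Mutex.new # assumed init hook for the worker
    end

    # Serialize concurrent calls arriving from thread_pool.defer blocks
    # so writes to the worker's status cannot interleave.
    def register_status(status)
      @status_mutex.synchronize { super }
    end
  end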
hemant
2008-Jun-13 10:33 UTC
[Backgroundrb-devel] adding results from threads to a collection and returning it
On Fri, Jun 13, 2008 at 2:15 PM, Stevie Clifton <stevie at slowbicycle.com> wrote:
> Hey Hemant,
>
> A couple of questions about fetch_parallely:
>
> 1) Does it operate in the same way as thread_pool.defer, where the
> number of concurrent threads is limited by :pool_size?
>
> 2) Why did you choose to introduce another method instead of
> providing a thread-safe register_status? (More out of curiosity than
> anything else -- in my code I've overridden register_status to use a
> Mutex, and am wondering what the benefit of fetch_parallely would be
> over this.)

register_status is going to invoke send_data at one point or another. Sure, I could probably protect the outbound_data instance variable with a mutex, but that is going to slow down the whole operation by a large margin, and it is simply not the point of an event-driven network programming library: it would mean taking the mutex on every single write. What a waste of time that would be!

On the other hand, if we can make sure that we retrieve results from the thread pool in a thread-safe manner and only then invoke send_data, everything is nice and dandy. fetch_parallely does exactly that. The name is a bit dubious; I didn't want to break existing functionality and at the same time wanted to add this. Let me know if you have a better name.
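The pattern hemant describes (pool threads hand results to the single event-loop thread, which alone performs the write) can be sketched in plain Ruby with the standard-library Queue. This is a general illustration of the idea, not fetch_parallely's actual implementation; a puts stands in for send_data.

  require 'thread'

  # General illustration, not backgroundrb's implementation: workers
  # push results onto a thread-safe Queue; only the single event-loop
  # thread pops them and writes, so send_data never needs a mutex.
  results = Queue.new

  workers = 5.times.map do |n|
    Thread.new(n) do |i|
      results << "page-#{i}" # Queue handles its own locking
    end
  end
  workers.each(&:join)

  # drain on the event-loop side; stands in for invoking send_data
  until results.empty?
    puts "send_data(#{results.pop.inspect})"
  end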