Hello all,

I was just wondering if anybody knew whether mechanize is supposed to be thread-safe or not. I didn't really find any information about it anywhere. I've been getting a strange error in protocol.rb when I run a script that uses mechanize in a multi-threaded fashion, but not with a single thread.

I'm trying to write a spider that does multiple gets in parallel, but it keeps puking when I thread it.

Thanks,
-carl
--
EPA Rating: 3000 Lines of Code / Gallon (of coffee)
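One way to structure a spider like this is to give every thread its own agent and share only a thread-safe work queue. The sketch below assumes modern Ruby (keyword arguments, with `Queue` and `Mutex` built in) and a hypothetical `agent_factory` lambda; with the real gem the factory would be something like `-> { Mechanize.new }` (2007-era releases named the class `WWW::Mechanize`). `StubAgent` is a hypothetical stand-in so the example runs without the gem or network access.

```ruby
# Fetches every URL using `workers` threads. Each thread builds its own
# agent from agent_factory, so no agent (and none of its connections)
# is shared between threads.
def parallel_fetch(urls, agent_factory:, workers: 4)
  queue = Queue.new
  urls.each { |url| queue << url }

  results = {}
  results_lock = Mutex.new # the shared results Hash still needs a lock

  threads = Array.new(workers) do
    Thread.new do
      agent = agent_factory.call # one agent per thread, never shared
      loop do
        url = begin
          queue.pop(true) # non-blocking pop raises ThreadError once empty
        rescue ThreadError
          break
        end
        body = agent.get(url)
        results_lock.synchronize { results[url] = body }
      end
    end
  end

  threads.each(&:join)
  results
end

# Hypothetical stub standing in for a Mechanize agent in this sketch.
StubAgent = Struct.new(:id) do
  def get(url)
    "#{url} fetched by agent #{id}"
  end
end

created = 0
created_lock = Mutex.new
factory = -> { created_lock.synchronize { StubAgent.new(created += 1) } }

urls = (1..10).map { |i| "http://example.com/#{i}" }
pages = parallel_fetch(urls, agent_factory: factory, workers: 3)
puts pages.size # prints 10
```

Because each thread builds its own agent, cookies and history are not shared between threads either, so a crawl that depends on a login would need to log each agent in separately.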
Can you give more information on where it dies? I've run mechanize
successfully with multiple threads, but I did have to work some kinks out,
mostly with database access.
Matt White
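Matt doesn't say how he worked out the database kinks. One common pattern, sketched here under assumptions, is to keep every database call on a single writer thread: crawler threads push their results onto a thread-safe `Queue`, and only the writer touches the connection. `FakeDB`, `save_record`, `crawl_with_single_writer`, and the `:done` sentinel are hypothetical names for this sketch, which assumes modern Ruby.

```ruby
# Hypothetical stand-in for a database connection that must only be
# used from one thread.
class FakeDB
  attr_reader :rows

  def initialize
    @rows = []
  end

  def save_record(record)
    @rows << record
  end
end

# Crawler threads never touch the database; they push records onto a
# Queue, and a single writer thread is the only code that calls save_record.
def crawl_with_single_writer(urls, db, workers: 3)
  records = Queue.new
  writer = Thread.new do
    while (record = records.pop) != :done # pop blocks until an item arrives
      db.save_record(record)
    end
  end

  slice_size = [(urls.size / workers.to_f).ceil, 1].max
  crawlers = urls.each_slice(slice_size).map do |slice|
    Thread.new do
      # A real crawler would fetch each URL here before enqueuing the result.
      slice.each { |url| records << { url: url, title: "title of #{url}" } }
    end
  end

  crawlers.each(&:join)
  records << :done # sentinel: tells the writer no more records are coming
  writer.join
  db
end

db = crawl_with_single_writer(%w[a b c d e], FakeDB.new)
puts db.rows.size # prints 5
```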
----- Original Message ----
From: Carl Lerche <carl.lerche at gmail.com>
To: mechanize-users at rubyforge.org
Sent: Friday, July 27, 2007 12:43:07 AM
Subject: [Mechanize-users] Is mechanize thread safe?
_______________________________________________
Mechanize-users mailing list
Mechanize-users at rubyforge.org
http://rubyforge.org/mailman/listinfo/mechanize-users
Thanks for the response.

Here is my bit of code. I'm no expert coder, but I think I got the
mutex applied where it is needed.

Here are the various errors I get. What I notice is that stuff seems to be
getting overwritten left and right because of the threading.
It all seems to happen in net/http, but as far as I know, net/http is
thread-safe (I've done a lot of threading with it before).

whowhat:~/Developer/Tools/Parser carllerche$ ./spider
/usr/local/lib/ruby/1.8/net/http.rb:2019:in `read_status_line': wrong status line: "gieslist.com/angieslist/Login.aspx\" class=\"link\" title=\"Angie's List Member Login\" onmouseover=\"window.status=this.title;return true;\" onmouseout=\"window.status=defaultStatus;return true;\">" (Net::HTTPBadResponse)
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl'
    from ./spider:6

whowhat:~/Developer/Tools/Parser carllerche$ ./spider
/usr/local/lib/ruby/1.8/net/protocol.rb:176:in `write0': undefined method `+' for nil:NilClass (NoMethodError)
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl'
    from ./spider:6

whowhat:~/Developer/Tools/Parser carllerche$ ./spider
/usr/local/lib/ruby/1.8/net/http.rb:2019:in `read_status_line': wrong status line: " _uacct = \"UA-448811-1\"; " (Net::HTTPBadResponse)
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
    from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl'
    from ./spider:6
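The text in the `wrong status line` errors is HTML from a page body, which is consistent with one thread reading a response that belongs to another thread's request on a shared connection. If a single agent really must be shared, the lock has to cover the whole request, from sending it to reading the full response. A minimal sketch, assuming a hypothetical `FakeAgent` that stands in for a non-thread-safe client and records how many `get` calls overlap, and a hypothetical `SynchronizedAgent` wrapper (modern Ruby):

```ruby
# Hypothetical stand-in for a client that is not thread-safe.
# It records the highest number of overlapping get calls it has seen.
class FakeAgent
  attr_reader :max_overlap

  def initialize
    @active = 0
    @max_overlap = 0
  end

  def get(url)
    @active += 1
    @max_overlap = [@max_overlap, @active].max
    sleep 0.01 # simulate network latency
    "body of #{url}"
  ensure
    @active -= 1
  end
end

class SynchronizedAgent
  def initialize(agent)
    @agent = agent
    @lock = Mutex.new
  end

  # Holds the lock for the entire call, so no two threads are ever
  # inside the wrapped agent's get at the same time.
  def get(url)
    @lock.synchronize { @agent.get(url) }
  end
end

shared = FakeAgent.new
agent = SynchronizedAgent.new(shared)
threads = (1..8).map { |i| Thread.new { agent.get("http://example.com/#{i}") } }
bodies = threads.map(&:value) # Thread#value joins and returns the block's result
puts shared.max_overlap # prints 1
```

The trade-off is that the lock runs requests strictly one at a time, which removes the parallelism a spider is threaded for in the first place.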

On 7/27/07, Matt White <whitethunder922 at yahoo.com> wrote:
--
EPA Rating: 3000 Lines of Code / Gallon (of coffee)

Welp, I looked through the mechanize code. It doesn't look thread-safe to me. Good to know for future reference.

-carl

On 7/27/07, Carl Lerche <carl.lerche at gmail.com> wrote: