Hello all,

I was just wondering if anybody knew whether mechanize is supposed to be thread-safe or not? I didn't really find any information about it anywhere. I've been getting a strange error in protocol.rb when I run a script that uses mechanize in a multi threaded fashion, but not with a single thread.

I'm trying to write a spider that does multiple gets in parallel, but it keeps puking when I thread it.

Thanks,
-carl

--
EPA Rating: 3000 Lines of Code / Gallon (of coffee)
Can you give more information on where it dies? I've run mechanize successfully with multiple threads but I did have to work some kinks out, mostly with database access.

Matt White

----- Original Message ----
From: Carl Lerche <carl.lerche at gmail.com>
To: mechanize-users at rubyforge.org
Sent: Friday, July 27, 2007 12:43:07 AM
Subject: [Mechanize-users] Is mechanize thread safe?
Thanks for the response.

Here is my bit of code. I'm no expert coder, but I think I got the mutex applied where it is needed.

Here are various errors I get. What I notice is that it seems like stuff is getting overwritten left and right because of the threading. It all seems to happen in net/http, but as far as I know, net/http is thread safe (I've done a lot of threading with it before).

whowhat:~/Developer/Tools/Parser carllerche$ ./spider
/usr/local/lib/ruby/1.8/net/http.rb:2019:in `read_status_line': wrong status line: "gieslist.com/angieslist/Login.aspx\" class=\"link\" title=\"Angie's List Member Login\" onmouseover=\"window.status=this.title;return true;\" onmouseout=\"window.status=defaultStatus;return true;\">" (Net::HTTPBadResponse)
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl'
        from ./spider:6

whowhat:~/Developer/Tools/Parser carllerche$ ./spider
/usr/local/lib/ruby/1.8/net/protocol.rb:176:in `write0': undefined method `+' for nil:NilClass (NoMethodError)
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl'
        from ./spider:6

whowhat:~/Developer/Tools/Parser carllerche$ ./spider
/usr/local/lib/ruby/1.8/net/http.rb:2019:in `read_status_line': wrong status line: " _uacct = \"UA-448811-1\"; " (Net::HTTPBadResponse)
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
        from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl'
        from ./spider:6
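The errors in Carl's message (a fragment of a response body read as a status line, a nil write buffer) would fit his reading that threads are overwriting each other's state. His own code is not reproduced in the archive, so what follows is only a minimal sketch of the "mutex where it is needed" idea, assuming a single client object shared by all threads. `FakeClient`, `LockedClient`, `fetch_all`, and the example.com URLs are hypothetical names for illustration; none of them come from mechanize or net/http.

```ruby
require 'thread'

# Hypothetical stand-in for a client that is NOT safe to share between threads.
# It counts how often two calls were inside #get at the same time.
class FakeClient
  attr_reader :overlaps

  def initialize
    @active = 0
    @overlaps = 0
  end

  def get(url)
    @active += 1
    @overlaps += 1 if @active > 1
    sleep 0.001                 # give other threads a chance to run
    @active -= 1
    "body of #{url}"
  end
end

# Wraps any client so that only one thread at a time can call into it.
class LockedClient
  def initialize(client)
    @client = client
    @lock = Mutex.new
  end

  def get(url)
    @lock.synchronize { @client.get(url) }
  end
end

# Starts one thread per URL and collects each thread's return value in order.
def fetch_all(client, urls)
  urls.map { |url| Thread.new { client.get(url) } }.map(&:value)
end

URLS = (1..20).map { |i| "http://example.com/page#{i}" }
```

Calling `fetch_all(LockedClient.new(FakeClient.new), URLS)` keeps the shared client consistent, but every request then runs one at a time, which gives up the parallelism a spider is after. The lock also has to cover every path into the shared object; a lock around only part of the work would not help.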
Welp, I looked through the mechanize code. Doesn't look thread safe to me. Good to know for future reference.

-carl
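If a shared agent is the problem, one common way around it is to give each worker thread its own agent and hand out URLs through a queue. The sketch below shows that pattern under stated assumptions: `StubAgent` and `crawl` are hypothetical names, and the stub stands in for a real agent so the example runs without network access. In a real spider the block would build a fresh mechanize agent instead (presumably `WWW::Mechanize.new` in the gem of that period, which is an assumption here). Nothing in the thread confirms that this is enough to make mechanize safe.

```ruby
require 'thread'

# Hypothetical stand-in for an HTTP agent; a real spider would construct its
# own agent object in the block passed to #crawl.
class StubAgent
  def get(url)
    "fetched #{url}"
  end
end

# Runs worker_count threads. Each thread builds its OWN agent by calling the
# block, so no agent is shared across threads. Returns a Hash of url => body.
def crawl(urls, worker_count, &make_agent)
  pending = Queue.new
  urls.each { |url| pending << url }
  results = Queue.new

  workers = Array.new(worker_count) do
    Thread.new do
      agent = make_agent.call          # private to this thread
      loop do
        url = begin
          pending.pop(true)            # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break                        # no work left for this worker
        end
        results << [url, agent.get(url)]
      end
    end
  end
  workers.each(&:join)

  pages = {}
  until results.empty?
    url, body = results.pop
    pages[url] = body
  end
  pages
end
```

Usage would look like `pages = crawl(urls, 4) { StubAgent.new }`. Only the two queues are shared between threads, and Ruby's `Queue` is designed for that. State that a single agent would normally carry across requests, such as cookies, is not shared between workers in this layout.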