I'm planning to process a bunch of pages in parallel, and I'd love to do it with fibers and asynchronous web requests, something like http://www.igvita.com/2010/03/22/untangling-evented-code-with-ruby-fibers/. It looks like Mechanize is built around Net::HTTP, which, AFAIK, is synchronous only. Is there a way of mixing EventMachine with Mechanize, or is it too closely tied into Net::HTTP?

I'm diving into the code, but I'm wondering if a) anyone's tried this, or maybe b) it's just crazy talk and I should use threads or something, or even c) I'm just totally missing some larger point.

Thanks,
Isaac
On Fri, May 7, 2010 at 1:14 AM, Isaac Cambron <icambron at alum.mit.edu> wrote:
> [...] Is there a way of mixing EventMachine with Mechanize, or is it too
> closely tied into Net::HTTP? [...]

You should look at Paul Dix's typhoeus, http://github.com/pauldix/typhoeus.

> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users
Thanks, I'll check that out. It looks like the right approach for me is to log in with Mechanize, manually add the cookies to Typhoeus, make my requests, and then do the complex parsing with Nokogiri. Is that a non-crazy approach?

On Fri, May 7, 2010 at 9:04 AM, Mike Dalessio <mike at csa.net> wrote:
> You should look at Paul Dix's typhoeus, http://github.com/pauldix/typhoeus.
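For illustration, the approach described above (log in with Mechanize, copy the cookies into Typhoeus, parse with Nokogiri) might be sketched roughly as follows. This is only a sketch under assumptions, not something tested against a real site: the helper names `cookie_header` and `fetch_titles`, the form field names `username` and `password`, and the choice of extracting page titles are all hypothetical, and the keyword-argument style assumes a reasonably recent Ruby and Typhoeus.

```ruby
# Turn cookie objects (anything responding to #name and #value, such as
# the cookies held in Mechanize's cookie jar) into a Cookie header value.
def cookie_header(cookies)
  cookies.map { |cookie| "#{cookie.name}=#{cookie.value}" }.join('; ')
end

# Hypothetical helper: log in once, then fetch +urls+ in parallel and
# return a hash of url => page title.
def fetch_titles(login_url, urls, username, password)
  # Required here so cookie_header stays usable without the gems installed.
  require 'mechanize'
  require 'typhoeus'
  require 'nokogiri'

  # 1. Log in with Mechanize. The field names are assumptions about the site.
  agent = Mechanize.new
  form = agent.get(login_url).forms.first
  form['username'] = username
  form['password'] = password
  form.submit

  # 2. Copy the session cookies into a single header for Typhoeus.
  header = cookie_header(agent.cookie_jar.cookies(URI(login_url)))

  # 3. Queue the requests; Hydra runs them concurrently.
  titles = {}
  hydra = Typhoeus::Hydra.new(max_concurrency: 10)
  urls.each do |url|
    request = Typhoeus::Request.new(url, headers: { 'Cookie' => header })
    request.on_complete do |response|
      next unless response.success?
      # 4. Parse each body with Nokogiri.
      titles[url] = Nokogiri::HTML(response.body).at('title')&.text
    end
    hydra.queue(request)
  end
  hydra.run # blocks until every queued request has completed
  titles
end
```

Note that this copies only name/value pairs for the login URL, so cookies scoped to other paths or domains, and any expiry handling, would need extra care.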
> [...] Is there a way of mixing EventMachine with Mechanize, or is it too
> closely tied into Net::HTTP? [...]

If you're using Ruby 1.9 (I assume so, since you're talking about fibers), you might want to take a look at em-net-http (http://rubygems.org/gems/em-net-http), which I made just a few days ago. It patches Net::HTTP so that if it's running inside EM's reactor loop, it internally uses Fibers and em-http-request to process the request in a non-blocking fashion without any changes to calling code (aka the NeverBlock trick).

The advantage of this approach is that it lets you continue using libraries like Mechanize (and rest-client, weary, right_aws, etc.) that depend on Net::HTTP, while allowing you to achieve high concurrency, without any changes to your code or the library.

Please note that it's not exhaustively tested yet, and I haven't actually tried it with Mechanize, so YMMV. (Insert standard plea for patches, tests and bug reports here.) :-)

Thanks,
James
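To make the fiber trick mentioned above a little more concrete, here is a toy, stdlib-only sketch of the general idea: a call that looks synchronous to its caller but actually suspends the current fiber and is resumed by a callback. It does not use EventMachine or em-net-http, and it is not how em-net-http is implemented; `ToyReactor`, `fetch`, and `fetch_all` are made-up names, and the "response" is just a string.

```ruby
require 'fiber'

# Toy stand-in for an event loop: it stores callbacks and runs them later.
class ToyReactor
  def initialize
    @pending = []
  end

  # Register a callback to run on a later tick (simulates I/O completing).
  def defer(&callback)
    @pending << callback
  end

  # Run callbacks in order until none are left.
  def run
    @pending.shift.call until @pending.empty?
  end
end

# Looks like a blocking call, but only suspends the calling fiber.
def fetch(reactor, url)
  fiber = Fiber.current
  # When the "response" arrives, resume the fiber and hand it the result.
  reactor.defer { fiber.resume("response for #{url}") }
  # Give control back to whoever resumed us; returns the value passed to resume.
  Fiber.yield
end

# Start one fiber per URL, then let the reactor drive them to completion.
def fetch_all(urls)
  reactor = ToyReactor.new
  log = []
  urls.each do |url|
    Fiber.new do
      log << "start #{url}"
      body = fetch(reactor, url) # reads as synchronous code
      log << "done #{url}: #{body}"
    end.resume
  end
  reactor.run
  log
end
```

Calling `fetch_all(['/a', '/b'])` records both "start" entries before either "done" entry, which is the point: the code inside each fiber reads top to bottom, yet neither request waits for the other.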