Justin Brinkerhoff
2010-Apr-13 06:07 UTC
[Mechanize-users] Form submit doesn't behave correctly when form action returns results on same page
Hi,

I am stumped on how to proceed with this problem. I am building an application to scrape Bible passages from the website biblegateway.com so that I can export the retrieved data to a file. I am just trying to get the behavior correct before I write the Ruby script.

Here is what I do. I fire up an irb console:

    irb

I declare the required Ruby libraries:

    require 'rubygems'
    require 'mechanize'

I then create a new object of the Mechanize class:

    agent = Mechanize.new  # Calling WWW::Mechanize.new throws a warning message

I then tell it what page to scrape:

    agent.get("http://www.biblegateway.com/passage")

I then tell it to use the last form on the page, which is the one I am working with:

    form = agent.page.forms.last

I then find the names of the fields and set their values:

    form.search1 = "John 3:16"
    form.version1 = "NKJV"

Those are all the options needed to get the results, so I then submit the form:

    form.submit

Technically speaking, the form does in fact submit. That's not the problem. The problem is that Mechanize is designed to render the results from a new page into a new Mechanize::Page object. But the way they have their website set up, the same page is rendered again with the results loaded on it, and the form uses a GET method instead of a POST method, so the URL ends up looking like:

    http://www.biblegateway.com/passage/?search=John%203:16&version=NKJV

So what I need to know is: what do I need to do to render the same page in a "get fashion", so to speak? The documentation is very difficult to pick apart, and I haven't had much luck with Google...

Thank you in advance for the help.
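
For reference, a GET form submission encodes the field values into the query string of the form's action URL, which is why the results show up at a URL like the one above. Here is a minimal sketch using only Ruby's standard library. It assumes the query parameter names `search` and `version` as they appear in that URL, and the helper name `build_get_url` is hypothetical, not part of Mechanize:

```ruby
require 'uri'

# Hypothetical helper (not a Mechanize API): build the URL that a
# GET form submission would request, given the form's action URL
# and a hash of field names to values.
def build_get_url(action_url, fields)
  uri = URI.parse(action_url)
  # encode_www_form produces application/x-www-form-urlencoded text,
  # the same encoding used for GET form submissions.
  uri.query = URI.encode_www_form(fields)
  uri.to_s
end

# Parameter names are an assumption taken from the URL in the post.
url = build_get_url("http://www.biblegateway.com/passage/",
                    { "search" => "John 3:16", "version" => "NKJV" })
puts url
```

Note that `URI.encode_www_form` writes the space as `+` and the colon as `%3A`, so the string looks slightly different from the `%20` form shown in the browser while decoding to the same values. A URL built this way could presumably be fetched directly with `agent.get`, which returns the fetched page.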