Justin Brinkerhoff
2010-Apr-13 06:07 UTC
[Mechanize-users] Form submit doesn't behave correctly when form action returns results on same page.
Hi,
I am stumped on how to proceed with this problem.
I am building an application that scrapes Bible passages from the website
biblegateway.com, so that I can then export the retrieved data to a file.
So I am just trying to get the behavior correct before I write the Ruby
script.
So here is what I do:
I fire up an irb console.
irb
I'll declare the required Ruby libraries:
require 'rubygems'
require 'mechanize'
I'll then create a new object of the Mechanize class.
agent = Mechanize.new # Calling WWW::Mechanize.new throws a warning message
I'll then tell it what page to scrape.
agent.get("http://www.biblegateway.com/passage")
I'll then tell it to use the last form on the page, which is the one I am
working with.
form = agent.page.forms.last
I'll then find the names of the fields and set their values.
form.search1 = "John 3:16"
form.version1 = "NKJV"
Those are all the options needed to get the results, so I then submit the form.
form.submit
Now technically speaking, the form does in fact submit. That's not the
problem. The problem is that Mechanize is designed to render the results from
a new page to a new Mechanize::Page object.
But the way their website is set up, the same page is rendered again with the
results loaded into it, and the form uses the GET method instead of POST, so
the URL ends up looking like:
http://www.biblegateway.com/passage/?search=John%203:16&version=NKJV
So, what I need to know is, what do I need to do to render the same page in
a "get fashion", so to speak? The documentation is very difficult to pick
apart, and I haven't had much luck with Google...
Thank you in advance for the help.
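
For reference, the GET URL described above can be built by hand with Ruby's standard library. This is only a minimal sketch: the helper name passage_url is hypothetical, and the parameter names "search" and "version" are assumed from the URL quoted in this post, not checked against the site.

```ruby
require 'uri'

# Hypothetical helper: builds the query-string URL for a passage lookup.
# "search" and "version" are assumptions taken from the URL in this post.
def passage_url(reference, version)
  # encode_www_form escapes each value for use in a query string
  # (a space becomes "+", a colon becomes "%3A").
  query = URI.encode_www_form('search' => reference, 'version' => version)
  "http://www.biblegateway.com/passage/?#{query}"
end

url = passage_url('John 3:16', 'NKJV')
puts url
```

The resulting string could be handed to agent.get(url), the same call used earlier in the session. The "+" form of the space is an equivalent query-string encoding to the "%20" seen in the browser URL. As far as I know, form.submit also returns the resulting Mechanize::Page, so capturing its return value may be enough on its own.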