Jack Royal-Gordon
2012-Jan-15 02:45 UTC
[Mechanize-users] Reload previously retrieved page
I'm a newbie with Mechanize, and I have what seems like a simple question: I'm trying to save a page that has been loaded and then reload it later for further processing. I see that Mechanize has a method parse(uri, response, body) that (I presume) can load the DOM without physically accessing the site over the internet. If I save the URI and body from a previous Mechanize.get(), how do I get the "response" from the page that is also required in order to reload the page? Or is there a better way to save and restore the page without incurring the latency of a second internet access? I've also tried Mechanize::Page.new() but I have the same problem in that I need a "response" for that call (nil gives an error).
Hey Jack,
Mechanize uses Nokogiri <http://nokogiri.org/> to parse HTML. You can use
Nokogiri directly to parse HTML or XML from a string or
file <http://nokogiri.org/tutorials/parsing_an_html_xml_document.html>.
However, this returns a Nokogiri::HTML::Document object, while Mechanize
returns a Mechanize::Page object - the Nokogiri document holds the same
information that you would get by calling page.root. This is usually all
that you need, but it lacks the convenience methods that a
Mechanize::Page affords.
To recreate a Mechanize::Page, use the ::new
method <http://mechanize.rubyforge.org/Mechanize/Page.html#method-c-new> (note:
I haven't fully tested this):
require 'mechanize'

agent = Mechanize.new
# Your content as a string; 'saved_page.html' is a hypothetical path.
content = File.read('saved_page.html')
# Arguments: (uri, response headers, body, response code, mechanize agent)
# The ones I've left nil seemed to not be required, though leaving them out
# may affect other methods.
page = Mechanize::Page.new(nil, { 'content-type' => 'text/html' }, content,
                           nil, agent)
As you can tell, this is a bit more complicated than simply running
Nokogiri::HTML(content), but it gives you all the Mechanize::Page methods.
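Whichever route you take, what needs to be kept is the saved response itself. Below is a minimal stdlib-only sketch of persisting the values Mechanize::Page.new takes besides the agent. It assumes they come from a page fetched earlier (in Mechanize 2.x, page.uri, page.response, page.body and page.code, where page.response is a hash of headers). The save_snapshot/load_snapshot helpers and the two-file layout are hypothetical. The body is written in binary mode so that non-UTF-8 bytes survive.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper: save uri, response headers, body and status code.
# The body goes to its own file in binary mode; the rest goes to JSON.
def save_snapshot(base, uri:, headers:, body:, code:)
  File.binwrite("#{base}.body", body)
  meta = { 'uri' => uri.to_s, 'headers' => headers.to_h, 'code' => code.to_s }
  File.write("#{base}.json", JSON.generate(meta))
end

# Hypothetical helper: read the snapshot back as plain Ruby values.
def load_snapshot(base)
  meta = JSON.parse(File.read("#{base}.json"))
  meta.merge('body' => File.binread("#{base}.body"))
end

# Round trip with made-up values standing in for a fetched page.
snapshot = Dir.mktmpdir do |dir|
  base = File.join(dir, 'page')
  save_snapshot(base,
                uri: 'http://example.com/start',
                headers: { 'content-type' => 'text/html; charset=utf-8' },
                body: '<html><head><title>Saved</title></head></html>',
                code: 200)
  load_snapshot(base)
end

puts snapshot['uri']   # prints http://example.com/start
puts snapshot['code']  # prints 200
```

Restoring would then presumably look like Mechanize::Page.new(URI(snapshot['uri']), snapshot['headers'], snapshot['body'], snapshot['code'], agent), following the argument order in the Page.new example. Like that example, this is untested. Passing the saved URI instead of nil is an assumption meant to help methods that depend on the page's location, such as resolving relative links.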
Ben
--
Benjamin Manns
benmanns at gmail.com
(434) 321-8324