thr3ads.net - Mechanize users - [Mechanize-users] Single method call to retrieve the entire page in HTML? [Jan 2007]

William Flanagan

2007-Jan-12 22:01 UTC

[Mechanize-users] Single method call to retrieve the entire page in HTML?

All,

Another easy question.  In Hpricot, on a doc that I am using, I can call
the .to_html method and retrieve the entire page.  However, this doesn't
seem to work in Mechanize.

My goal is to get the text of the page and put it into a database to make it
searchable with ferret (using the acts_as_ferret plugin in Rails).  Does
anyone have a good suggestion short of iterating over the entire document
and grabbing individual texts?

Thanks,

William

Aaron Patterson

2007-Jan-12 23:56 UTC

Hi William,

On Fri, Jan 12, 2007 at 05:01:28PM -0500, William Flanagan wrote:
> All,
> 
> Another easy question.  In Hpricot, on a doc that I am using, I can do a
> .to_html method and retrieve the entire page.  However, this doesn't
> seem to work in Mechanize.

You can get the html in a page by calling "body" on the page object.
For example:

  require 'mechanize'

  mech = WWW::Mechanize.new
  page = mech.get('http://tenderlovemaking.com/')
  puts page.body  # the raw HTML of the page, as a String

Mechanize uses Hpricot to parse the html.  If there is functionality
on Hpricot that you would like to use, you can get a hold of the parser
from the page object by calling the "root" method:

  puts page.root.class
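
For the indexing goal, one option is to reduce the string that "body"
returns to plain text before storing it.  The following is only a rough
sketch under stated assumptions: html_to_text, ENTITIES and sample are
hypothetical names made up for illustration, not part of Mechanize or
Hpricot, and the sample string stands in for page.body so the snippet
runs without a network connection.  Regex-based stripping is crude, and
a real parser will cope better with messy markup.

```ruby
# Hypothetical entity table; only a handful of common entities are decoded.
ENTITIES = {
  '&amp;' => '&', '&lt;' => '<', '&gt;' => '>',
  '&quot;' => '"', '&#39;' => "'", '&nbsp;' => ' '
}.freeze

# Hypothetical helper: turn an HTML string (such as page.body) into plain
# text suitable for a full-text index.
def html_to_text(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1\s*>}mi, ' ') # drop script/style blocks and their contents
  text = text.gsub(/<!--.*?-->/m, ' ')                      # drop HTML comments
  text = text.gsub(/<[^>]+>/, ' ')                          # drop the remaining tags
  text = text.gsub(/&(?:amp|lt|gt|quot|#39|nbsp);/, ENTITIES) # decode the entities listed above
  text.gsub(/\s+/, ' ').strip                               # collapse runs of whitespace
end

# Stand-in for page.body.
sample = '<html><head><title>Hi</title><style>p { color: red }</style></head>' \
         '<body><p>Fish &amp; chips</p><script>var x = 1;</script></body></html>'

puts html_to_text(sample)
```

With a live page, the call would be html_to_text(page.body), and the
result could then go into the database column that ferret indexes.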
> 
> My goal is to get the text of the page and put it into a database to make it
> searchable with ferret (using the acts_as_ferret plugin in Rails).  Does
> anyone have a good suggestion short of iterating over the entire document
> and grabbing individual texts?
> 
> Thanks,
> 
> William
Hope that helps!

-- 
Aaron Patterson
http://tenderlovemaking.com/
