William Flanagan
2007-Jan-12 22:01 UTC
[Mechanize-users] Single method call to retrieve the entire page in HTML?
All,

Another easy question. In Hpricot, on a doc that I am using, I can call the .to_html method and retrieve the entire page. However, this doesn't seem to work in Mechanize.

My goal is to get the text of the page and put it into a database to make it searchable with ferret (using the acts_as_ferret plugin in Rails). Does anyone have a good suggestion short of iterating over the entire document and grabbing individual texts?

Thanks,

William
Aaron Patterson
2007-Jan-12 23:56 UTC
[Mechanize-users] Single method call to retrieve the entire page in HTML?
Hi William,

On Fri, Jan 12, 2007 at 05:01:28PM -0500, William Flanagan wrote:
> All,
>
> Another easy question. In Hpricot, on a doc that I am using, I can do a
> .to_html method and retrieve the entire page. However, this doesn't seem to
> work in Mechanize.

You can get the HTML of a page by calling "body" on the page object. For example:

    require 'rubygems'
    require 'mechanize'

    mech = WWW::Mechanize.new
    page = mech.get('http://tenderlovemaking.com/')
    puts page.body

Mechanize uses Hpricot to parse the HTML. If there is functionality in Hpricot that you would like to use, you can get hold of the parser from the page object by calling the "root" method:

    puts page.root.class

> My goal is to the text of the page and put it into a database to make it
> searchable with ferret (using the acts_as_ferret plugin in Rails). Does
> anyone have a good suggestion short of iterating over the entire document
> and grabbing individual texts?
>
> Thanks,
>
> William

Hope that helps!

--
Aaron Patterson
http://tenderlovemaking.com/
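
For the indexing goal in the original question, one possible approach is to take the string returned by page.body and reduce it to plain text before storing it. The sketch below is only an illustration under stated assumptions: html_to_text is a hypothetical helper, not part of Mechanize or Hpricot, and it strips tags with regular expressions from the standard library rather than using a real HTML parser, so it will mishandle malformed or unusual markup.

```ruby
require 'cgi'

# Hypothetical helper (not part of Mechanize or Hpricot): reduce an HTML
# string, such as the one returned by page.body, to plain text for indexing.
# This is a crude regex-based approach, assumed good enough for a search index.
def html_to_text(html)
  text = html.gsub(/<(script|style)\b.*?<\/\1\s*>/mi, ' ') # drop script/style blocks
  text = text.gsub(/<!--.*?-->/m, ' ')                      # drop HTML comments
  text = text.gsub(/<[^>]+>/, ' ')                          # drop remaining tags
  text = CGI.unescapeHTML(text)                             # decode &amp;, &lt;, etc.
  text.gsub(/\s+/, ' ').strip                               # collapse whitespace
end

# Stand-in for page.body so the example runs without network access.
sample = '<html><head><title>Hi</title><style>p { color: red }</style></head>' \
         '<body><p>Fish &amp; chips</p><!-- note --><p>Second  line</p></body></html>'

puts html_to_text(sample)
```

With a real page, the call would be html_to_text(page.body), and the resulting string could be saved to whichever database column is indexed. A parser-based extraction through the object returned by page.root is likely to be more robust than regular expressions if the pages are messy.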