I am trying to get a page which includes a form, but the form is missing from the WWW::Mechanize::Page object. I retrieve it via:

   page = web_agent.submit(a_different_form)

For debugging this problem, I then immediately write the resulting page to two different files:

   File.open('big.html', 'wb') { |f| f.write(page.body) }
   File.open('little.html', 'wb') { |f| f.write(page.root.to_html) }

The results of these two methods (page.body vs. page.root.to_html) are dramatically different, with most of the page missing from the page.root.to_html version.

In other words, the form appears in page.body, but not in page.root.to_html.

Furthermore, page.body seems to be valid HTML, because I can do this from irb:

   html = File.open('big.html', 'rb') { |f| f.read }
   page = WWW::Mechanize::Page.new(nil, {'content-type' => 'text/html'}, html, 200)

And that page works just fine -- the form is there.

Any idea why the page retrieved from web_agent.submit, which apparently has the same body as the page created by hand, would nevertheless have a different element tree?

Many, many thanks in advance for whatever guidance you can give.

Aaron
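The by-hand comparison of big.html and little.html can be automated. Below is a minimal plain-Ruby sketch; `form_tag_count` and `compare_dumps` are hypothetical helpers, not part of Mechanize, and the regex count is only a rough hint because it does not actually parse HTML. Reading the files in binary mode avoids "invalid byte sequence" errors when scanning a body with stray bytes.

```ruby
# Rough count of opening <form> tags (case-insensitive); a hint, not a parse.
def form_tag_count(html)
  html.scan(/<form\b/i).size
end

# Compare the raw response body with the serialized parse tree.
def compare_dumps(raw_html, parsed_html)
  {
    raw_bytes:    raw_html.bytesize,
    parsed_bytes: parsed_html.bytesize,
    raw_forms:    form_tag_count(raw_html),
    parsed_forms: form_tag_count(parsed_html)
  }
end

# Usage sketch with the two files written above (binary mode on purpose):
#   report = compare_dumps(File.binread('big.html'), File.binread('little.html'))
#   warn "forms lost in parsing: #{report.inspect}" if report[:parsed_forms] < report[:raw_forms]
```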
Or, here's the quick and exciting version of the same question.

On a particular page, I consistently get this weird result:

   page.root.to_html != WWW::Mechanize::Page.new(nil, {'content-type' => 'text/html'}, page.body, 200).root.to_html

Isn't that weird? And exciting? Any idea how something like that could happen? Anyone? Say, Aaron Patterson?

Aaron

(The slow, breathtakingly boring version of the question is in my previous message.)
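The one-line check could be wrapped in a reusable predicate and run after each request to log when the discrepancy appears. This is a sketch: `reparse_mismatch?` is a hypothetical name, and the page class is passed in (WWW::Mechanize::Page in this thread) so the logic can be exercised with stand-in objects.

```ruby
# Hypothetical helper: true when re-parsing a page's raw body produces a
# different tree than the one the page already holds. page_class is injected;
# in this thread it would be WWW::Mechanize::Page.
def reparse_mismatch?(page, page_class)
  rebuilt = page_class.new(nil, { 'content-type' => 'text/html' }, page.body, 200)
  page.root.to_html != rebuilt.root.to_html
end

# Usage sketch (not run here):
#   page = web_agent.submit(a_different_form)
#   warn 'parsed tree differs from raw body' if reparse_mismatch?(page, WWW::Mechanize::Page)
```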
I'm offline so I can't verify this now, but it sounds like the problem that keeps coming up on this list lately. Quoting Anthony F:

   Setting the html_parser to the Nokogiri or Hpricot object (rather than the default Nokogiri::HTML object) worked for me, like so:

      WWW::Mechanize.html_parser = Nokogiri
   or
      WWW::Mechanize.html_parser = Hpricot

Hope that helps. I'll keep this in mind next time I have some downtime and see if I can get a patch together for Aaron. Have you tried cloning and installing his version from GitHub? It might already have this issue fixed.

-Mat

On May 6, 2009, at 2:27 AM, Aaron Starr wrote:
> [...]
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users
Mat,

Thank you so much for your response.

I had temporarily worked around the issue by automatically taking each page as it's returned and building a new page from it, passing the old page's body. And that's how sausage gets made, ladies and gentlemen.

I'll try the "WWW::Mechanize.html_parser = Nokogiri" temporary workaround -- that seems about eight hundred times more clever.

Sincere thanks,

Aaron

On Wed, May 6, 2009 at 5:44 AM, Mat Schaffer <mat.schaffer at gmail.com> wrote:
> [...]
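Aaron's stopgap (rebuilding each returned page from its body) might be packaged as a thin wrapper around the agent. This is a hypothetical sketch, not a Mechanize feature: `RebuildingAgent` is an invented name, and treating `get`, `submit`, and `click` as the page-returning methods is an assumption. The rebuild logic itself stays in a caller-supplied block.

```ruby
# Hypothetical wrapper (not part of Mechanize): each page returned by the
# wrapped agent's page-returning methods is passed through a caller-supplied
# block, e.g. one that rebuilds the page from its raw body.
class RebuildingAgent
  # Assumption: these are the agent methods that return pages.
  PAGE_METHODS = [:get, :submit, :click].freeze

  def initialize(agent, &rebuild)
    raise ArgumentError, 'a rebuild block is required' unless rebuild

    @agent = agent
    @rebuild = rebuild
  end

  PAGE_METHODS.each do |name|
    define_method(name) do |*args, **kwargs, &blk|
      @rebuild.call(@agent.public_send(name, *args, **kwargs, &blk))
    end
  end

  # Any other call is forwarded unchanged to the wrapped agent.
  def method_missing(name, *args, **kwargs, &blk)
    return super unless @agent.respond_to?(name)

    @agent.public_send(name, *args, **kwargs, &blk)
  end

  def respond_to_missing?(name, include_private = false)
    @agent.respond_to?(name, include_private) || super
  end
end

# Usage sketch (not run here), with the constructor call from the thread:
#   agent = RebuildingAgent.new(web_agent) do |page|
#     WWW::Mechanize::Page.new(nil, {'content-type' => 'text/html'}, page.body, 200)
#   end
#   page = agent.submit(a_different_form)
```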