I am trying to get a page which includes a form, but the form is
missing from the WWW::Mechanize::Page object. I retrieve it via:
    page = web_agent.submit(a_different_form)
For debugging this problem, I then immediately write the resulting
page to two different logs:
    File.open('big.html', 'wb')    { |f| f.write(page.body) }
    File.open('little.html', 'wb') { |f| f.write(page.root.to_html) }
The results of these two methods (page.body vs. page.root.to_html) are
dramatically different, with most of the page missing from the
page.root.to_html version.
In other words, the form appears in page.body, but not in page.root.to_html.
Furthermore, page.body seems to be valid HTML, because I can do this from irb:
    html = File.open('big.html', 'rb') { |f| f.read }
    page = WWW::Mechanize::Page.new(nil, {'content-type'=>'text/html'}, html, 200)
And that page works just fine -- the form is there.
Any idea why the page retrieved from web_agent.submit, which
apparently has the same body as the page created by hand, would
nevertheless have two different element lists?
Many, many thanks in advance for whatever guidance you can give.
Aaron
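A quick way to confirm this kind of mismatch without eyeballing the two
dump files is to count form tags on each side. The sketch below is
hypothetical (form_tag_counts and forms_lost_in_parse? are made-up names,
not Mechanize methods); it only assumes the page object responds to
body and to root.to_html, as the page above does, and it uses a rough
case-insensitive regex rather than a real HTML parser.

```ruby
# Hypothetical diagnostic: count <form> opening tags in the raw HTTP body
# and in the HTML re-serialized from the parsed tree. Works with any object
# that responds to #body and #root (whose #to_html returns a String), such
# as the WWW::Mechanize::Page discussed in this thread.
FORM_TAG = /<form[\s>]/i  # rough match for "<form>" or "<form ...>", any case

def form_tag_counts(page)
  raw    = page.body.to_s.scan(FORM_TAG).size
  parsed = page.root.to_html.to_s.scan(FORM_TAG).size
  { :raw => raw, :parsed => parsed }
end

# True when the parsed tree contains fewer forms than the server sent.
def forms_lost_in_parse?(page)
  counts = form_tag_counts(page)
  counts[:parsed] < counts[:raw]
end
```

For the page described above, one would expect forms_lost_in_parse?
to be true for the page returned by web_agent.submit and false for the
page built by hand in irb.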
Or, here's the quick and exciting version of the same question.
On a particular page, I consistently get this weird result:
    page.root.to_html !=
      WWW::Mechanize::Page.new(nil, {'content-type'=>'text/html'},
                               page.body, 200).root.to_html
Isn't that weird? And, exciting? Any idea how something like that
could happen? Anyone? Say, Aaron Patterson?
Aaron
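The one-line comparison above can be wrapped in a reusable check. This is
a sketch assuming the 0.9-era constructor used in this thread,
WWW::Mechanize::Page.new(uri, response_headers, body, code);
reparse_changes_tree? is a hypothetical name, and the page class is a
parameter (defaulting to WWW::Mechanize::Page, which must already be
loaded) so the check can be exercised with stand-in objects.

```ruby
# Hypothetical helper: re-parse page.body from scratch and report whether
# the fresh tree serializes differently from the tree the page already
# holds. Assumes the old WWW::Mechanize API, Page.new(uri, headers, body,
# code). Ruby evaluates the default argument only when page_class is
# omitted, so WWW::Mechanize::Page is needed only in that case.
def reparse_changes_tree?(page, page_class = WWW::Mechanize::Page)
  fresh = page_class.new(nil, { 'content-type' => 'text/html' }, page.body, 200)
  page.root.to_html != fresh.root.to_html
end
```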
(Slow, breath-takingly boring version of the question:)
I'm offline so can't verify now, but this sounds like the
problem that keeps coming up on this list lately:
Quoting Anthony F:
Setting the html_parser to the Nokogiri or Hpricot object (rather than
the default Nokogiri::HTML object) worked for me, like so:
    WWW::Mechanize.html_parser = Nokogiri
or
    WWW::Mechanize.html_parser = Hpricot
Hope that helps. I'll keep this in mind next time I have some downtime
and see if I can get a patch together for Aaron Patterson. Have you tried
cloning and installing his version from GitHub? It might already have
this issue fixed.
-Mat
On May 6, 2009, at 2:27 AM, Aaron Starr wrote:
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users
Mat,

Thank you so much for your response. I had temporarily worked around
the issue by automatically taking each page as it's returned and
building a new page from it, passing the old page's body. And that's
how sausage gets made, ladies and gentlemen.

I'll try the "WWW::Mechanize.html_parser = Nokogiri" temporary
workaround -- that seems about eight hundred times more clever.

Sincere thanks,
Aaron

On Wed, May 6, 2009 at 5:44 AM, Mat Schaffer <mat.schaffer at gmail.com> wrote:
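Aaron's stopgap (rebuild every returned page from its own body) can be
sketched as a small wrapper. This is hypothetical code, not part of
Mechanize: submit_and_reparse is a made-up name, and it assumes the same
0.9-era API used earlier in the thread, namely agent.submit(form)
returning a page and Page.new(uri, response_headers, body, code). The
page class is a parameter so the wrapper can be tried with stand-ins.

```ruby
# Hypothetical wrapper around the workaround Aaron describes: submit a
# form, then rebuild the returned page from its own body so the HTML is
# parsed afresh. Assumes the 0.9-era WWW::Mechanize API from this thread:
#   agent.submit(form)                          -> page
#   Page.new(uri, response_headers, body, code) -> page
# The rebuilt page gets a nil URI and a minimal header hash, as in the
# irb session earlier in the thread, so it does not carry the original
# URI or response headers.
def submit_and_reparse(agent, form, page_class = WWW::Mechanize::Page)
  page = agent.submit(form)
  page_class.new(nil, { 'content-type' => 'text/html' }, page.body, 200)
end
```

Setting WWW::Mechanize.html_parser, as Mat suggests, avoids parsing
every page twice, which is presumably why it looks like the cleaner of
the two workarounds.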