That solves my problem - I somehow missed that in the docs. Thanks
for your help.
-Chris
That helps somewhat. Now, when I parse a local copy of the HTML page,
I get everything from page.body.
The problem is that when I retrieve the data from the server,
page.body is cut off at the original point, right before
the second <html> tag.
The html file in question can be found here:
http://chrisamiller.com/temp.html
Hrmm... maybe some of the page is being written out by JavaScript. If
that's the case, Mechanize won't be able to deal with it,
right?
-Chris
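
One way to narrow this down is to check whether the raw string from
page.body already lacks the second <html> tag (the server sent less, or
the rest is generated client-side) or whether only the parsed tree is
short. The following is a minimal sketch in plain Ruby, not part of
Mechanize; the helper names html_open_tag_count and from_second_html_tag
are hypothetical, and the sample string simply mirrors the markup quoted
further down in this thread.

```ruby
# Hypothetical helpers (not part of Mechanize) for inspecting a raw body string.

HTML_OPEN_TAG = /<html[\s>]/i

# Count opening <html> tags, case-insensitively, with or without attributes.
def html_open_tag_count(body)
  body.scan(HTML_OPEN_TAG).length
end

# Return everything from the second <html> tag onward, or nil if there is none.
def from_second_html_tag(body)
  first = body.index(HTML_OPEN_TAG)
  return nil if first.nil?
  second = body.index(HTML_OPEN_TAG, first + 1)
  second && body[second..-1]
end

# Sample modelled on the malformed page described in this thread.
sample = <<~HTML
  <html>
  <head> yadda yadda</head>
      <p>some text</p>

     <html>
     <table yadda yadda>
HTML

puts "opening <html> tags: #{html_open_tag_count(sample)}"
puts "content after second tag present: #{!from_second_html_tag(sample).nil?}"
```

Assuming page is the object returned by mech.submit, passing page.body
to these helpers for both the local copy and the server response would
show whether the two raw bodies actually differ. If the server response
has only one <html> tag, the missing part never arrived over HTTP, and
the parser is not the culprit.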
On Thu, May 7, 2009 at 8:31 PM, Aaron Starr <astarr at wiredquote.com> wrote:
> page.body should have the raw, original
> text-that-kind-of-reminds-us-of-html, does it not?
>
>
> On Thu, May 7, 2009 at 6:01 PM, Chris Miller <chrisamiller at gmail.com> wrote:
>>
>> I'm using Mechanize to parse an extraordinarily malformed HTML page.
>>
>> After submitting a form like so:
>>   page = mech.submit(dform)
>>
>> The result I get back is truncated. I suspect that it's because the
>> source HTML looks like this:
>>
>> <html>
>> <head> yadda yadda</head>
>>     <p>some text</p>
>>
>>    <html>
>>    <table yadda yadda>
>>
>>
>> My 'page' variable contains only the data that occurs before the
>> second <html> tag.
>>
>> Am I right in suspecting that this is the cause of my problems? Are
>> there any work-arounds that will enable me to grab all of the text,
>> even if it can't be parsed sanely?
>>
>> Thanks,
>>
>> Chris Miller
>> chrisamiller at gmail.com
>> _______________________________________________
>> Mechanize-users mailing list
>> Mechanize-users at rubyforge.org
>> http://rubyforge.org/mailman/listinfo/mechanize-users
>
>