Hi

I apologize up front if this is a dumb question, because I guess Ajax, and thus JavaScript, is involved.

Is there any way to capture the result of a submit if the current page is modified as a result of the submit?

For example: a couple of input fields, a submit, and the result turns up in a modified <div>, which it looks like Mechanize doesn't get.

I hope I haven't answered my own question!

Regards
If the page doesn't refresh, then JavaScript is involved. Of course, that's not to say you couldn't parse the JavaScript response in Ruby and get the information you're looking for. I've done it a lot with good results. I actually scripted most of the major webmail systems with Mechanize a few years back, and AOL's webmail was the only JavaScript nut I couldn't crack.

-Mat

On Mar 24, 2009, at 7:23 PM, Ross Cameron wrote:
> Is there any way to capture the result of a submit if the current
> page is modified as result of the submit?

_______________________________________________
Mechanize-users mailing list
Mechanize-users at rubyforge.org
http://rubyforge.org/mailman/listinfo/mechanize-users
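To illustrate what "parse the JavaScript response in Ruby" can look like, here is a minimal sketch. It assumes the Ajax endpoint answers with JSON that carries an HTML fragment; the payload shape, the `html` key and the helper name `fragment_from_ajax` are hypothetical, and a real site may just as well return plain HTML or script text.

```ruby
require 'json'

# Hypothetical helper: pull the HTML fragment out of an Ajax response body.
# Assumes the body is JSON shaped like {"html": "<div>...</div>"}; when the
# body is not valid JSON (for example a bare HTML fragment), it is returned
# unchanged.
def fragment_from_ajax(body, key = 'html')
  data = JSON.parse(body)
  data.is_a?(Hash) ? data.fetch(key, body) : body
rescue JSON::ParserError
  body
end

# Made-up sample payload, standing in for the body of a real response.
sample = '{"html": "<div id=\"result\">42 items</div>"}'
puts fragment_from_ajax(sample)
```

The string passed in would normally be the `#body` of whatever the scripted request returned; the extracted fragment can then be handed to an HTML parser.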
Hi Mat

Many thanks. I more or less solved it for a form that uses the GET method by scripting the full path for the form action. This wasn't too difficult, because the action URL can be discovered by inspection. POST is somewhat more difficult, but I assume there are ways of finding out what is passed and setting those values.

What would be nicer, if you wouldn't mind, is a pointer in the right direction for getting at the JavaScript response. I'm not sure how to do that. That would nail it.

Regards
Ross

--
Ross Cameron | Director, Roscommon Pty Ltd | w: www.roscommonhq.com
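The GET approach described above, scripting the full path for the form action, might be sketched as follows. The helper name `get_form_url`, the example URL and the field names are all made up for illustration; the real action URL and fields come from inspecting the page.

```ruby
require 'uri'

# Hypothetical helper: combine a form's action URL with its field values,
# the way a browser does when it submits a GET form.
def get_form_url(action, fields)
  uri = URI.parse(action)
  # Keep any query string already present in the action URL.
  parts = [uri.query, URI.encode_www_form(fields)].compact.reject(&:empty?)
  uri.query = parts.join('&') unless parts.empty?
  uri.to_s
end

# Placeholder action URL and fields.
url = get_form_url('http://example.com/search', { 'q' => 'ruby mechanize', 'page' => 1 })
puts url

# With the mechanize gem installed, the URL could then be fetched with:
#   page = Mechanize.new.get(url)
#   puts page.body
```

Building the URL separately keeps the encoding step easy to check against what the browser actually sends.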
On Tue, Mar 24, 2009 at 6:17 PM, Mat Schaffer <mat.schaffer at gmail.com> wrote:
> I actually scripted most of the major webmail systems with mechanize a few
> years back and AOL's webmail was the only javascript nut I couldn't crack.

I think a lot of people came up against that problem when scraping AOL webmail. AOL had an edge case in its URL formatting that Mechanize handled a bit differently than a real web browser does. Here's the duck punch on WWW::Mechanize::to_absolute_uri that can be used to scrape AOL webmail properly.

http://github.com/contentfree/blackbook/blob/ca9d90ff1be576bdbb42a1c6b81940d81840ed9d/lib/blackbook/importer/page_scraper.rb

Mike
Mike

Most helpful. And a very elegant solution to the Mechanize URI problem.

Regards
Ross
On Mar 24, 2009, at 11:35 PM, Mike Mondragon wrote:
> Here's the duck punch on WWW::Mechanize::to_absolute_uri that can be
> used to scrape on AOL webmail properly.

Ha! Nice one, man. Sadly the project I was doing it for is long gone, but thanks for this lovely gem. I'll sure be bookmarking this for later!

-Mat
On Mar 24, 2009, at 11:29 PM, Ross Cameron wrote:
> But what would be nicer, if you wouldn't mind, is pointing me in the
> right direction to get at the JavaScript response - not sure how to
> do that. That would nail it.

I often use Charles in these situations (http://www.charlesproxy.com/). There are other options too, like TamperData, or Fiddler for Windows, but Charles feels a bit more organized and reliable, and the 30-minute time limit is usually enough to get simple jobs done.

Once you've figured out the right request, the response can be obtained from #body in Mechanize as usual.

-Mat
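Once the proxy has shown which request the page's JavaScript sends, replaying it could look roughly like the sketch below. The helper name `replay_ajax_post`, the URL and the field names are placeholders, and the `X-Requested-With` header is an assumption about what some Ajax endpoints expect rather than something every site needs. The three-argument `post(url, params, headers)` call matches current Mechanize releases; older versions may differ.

```ruby
# Hypothetical helper: replay a POST request observed in a proxy such as
# Charles and return the raw response body.
# `agent` is expected to be a Mechanize instance, or any object whose #post
# takes (url, params, headers) and returns something that responds to #body.
def replay_ajax_post(agent, url, params)
  # Assumption: the endpoint behaves like a typical XMLHttpRequest target.
  headers = { 'X-Requested-With' => 'XMLHttpRequest' }
  response = agent.post(url, params, headers)
  response.body
end

# Usage with the mechanize gem installed (URL and field name are placeholders):
#   require 'mechanize'
#   agent = Mechanize.new
#   puts replay_ajax_post(agent, 'http://example.com/ajax/lookup', { 'name' => 'ross' })
```

Passing the agent in, instead of creating it inside the helper, keeps cookies and login state from earlier requests available to the replayed call.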