The site is: http://www.bcbid.gov.bc.ca

It's a weird, complicated piece of crap full of frames and cookies and all sorts of god-awful javascript navigation. However, before I even get into that stuff, I can't even get the site to open in Mechanize. Can anyone else get this working, or is it just me?
Loads for me, but it's also got a javascript redirect in there. You'll have to do that yourself with something like agent.click(page.links.first)

    >> WWW::Mechanize.new.get('http://www.bcbid.gov.bc.ca').body
    => "<html>\r\n<head>\r\n<!-- Copyright \251 2001, 2002 OGMA Consulting Corp. -->\r\n<title>Re-directing to BC Bid...</title>\r\n</head>\r\n<body bgcolor="#FFFFFF" onLoad="document.location='http://www.bcbid.gov.bc.ca/open.dll/welcome'">\r\nIf this page does not automatically re-direct you to BC Bid<sup>®</sup>,<br>\r\nplease <a href="http://www.bcbid.gov.bc.ca/open.dll/welcome">click here</a>.\r\n</body>\r\n</html>"

On Mar 26, 2009, at 6:30 PM, Anthony F wrote:
> The site is: http://www.bcbid.gov.bc.ca
>
> It's a weird, complicated piece of crap full of frames and cookies
> and all sorts of god-awful javascript navigation. However, before I
> even get into that stuff I can't even get the site to open in
> Mechanize. Can anyone else get this working, or is it just me?
>
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users
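Since Mechanize does not run JavaScript, the onLoad redirect has to be followed by hand. Besides clicking the fallback link, the target can also be pulled straight out of the body. The following is only a rough sketch using plain Ruby: the helper name onload_redirect_target and the abbreviated BODY constant are made up for illustration, and the regex assumes the handler looks like the one in the dump above.

```ruby
require 'uri'

# Abbreviated copy of the landing page body from the irb dump above.
BODY = <<~HTML
  <html>
  <head><title>Re-directing to BC Bid...</title></head>
  <body bgcolor="#FFFFFF" onLoad="document.location='http://www.bcbid.gov.bc.ca/open.dll/welcome'">
  please <a href="http://www.bcbid.gov.bc.ca/open.dll/welcome">click here</a>.
  </body>
  </html>
HTML

# Hypothetical helper (not part of Mechanize): returns the URL assigned to
# document.location inside a double-quoted onLoad attribute, or nil if the
# body contains no such assignment.
def onload_redirect_target(body)
  match = body.match(/onload\s*=\s*"[^"]*document\.location\s*=\s*'([^']+)'/i)
  match && URI.parse(match[1].strip).to_s
end

target = onload_redirect_target(BODY)
puts target
```

The resulting string could then be handed to agent.get(target). For this particular page, agent.click(page.links.first) ends up at the same URL because the only link on the page is the "click here" fallback.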
Interesting. I tried it with mechanize 0.8.5 and it seemed to work fine. With 0.9.2 it opens the page, but doesn't seem to parse it properly (i.e. frames => nil, link => nil, etc.). What version are you using?

Mat Schaffer wrote:
> Loads for me, but it's also got a javascript redirect in there.
> You'll have to do that yourself with something like
> agent.click(page.links.first)
My previous run was on 0.9.0, but it works for me with 0.9.2 as well:

    >> require 'mechanize'
    => true
    >> WWW::Mechanize::VERSION
    => "0.9.2"
    >> WWW::Mechanize.new.get('http://www.bcbid.gov.bc.ca').body
    => "<html>\r\n<head>\r\n<!-- Copyright \357\275\251 2001, 2002 OGMA Consulting Corp. -->\r\n<title>Re-directing to BC Bid...</title>\r\n</head>\r\n<body bgcolor="#FFFFFF" onLoad="document.location='http://www.bcbid.gov.bc.ca/open.dll/welcome'">\r\nIf this page does not automatically re-direct you to BC Bid<sup>®</sup>,<br>\r\nplease <a href="http://www.bcbid.gov.bc.ca/open.dll/welcome">click here</a>.\r\n</body>\r\n</html>"

I don't see any frames here. Do you maybe have a transparent web proxy where you are? What does your response look like? You might want to check using curl too.
-Mat

On Mar 27, 2009, at 4:02 AM, Anthony F wrote:
> Interesting. I tried it with mechanize 0.8.5 and it seemed to work
> fine. With 0.9.2 it opens the page, but doesn't seem to parse it
> properly (ie. frames => nil, link => nil, etc). What version are
> you using?
I should be more clear. Your code also works for me as is in 0.9.2. However, if I do this:

    irb(main):001:0> require 'mechanize'
    => true
    irb(main):002:0> WWW::Mechanize.new.get('http://www.bcbid.gov.bc.ca')
    => #<WWW::Mechanize::Page {url #<URI::HTTP:0x1bd1504 URL:http://www.bcbid.gov.bc.ca/>} {meta} {title nil} {iframes} {frames} {links} {forms}>

The problem I'm having is title => nil, links => [], etc. I can't actually do anything with the page other than get the body. And when I say frames is empty, I mean that when I try to parse the redirected page (http://www.bcbid.gov.bc.ca/open.dll/welcome) it comes up empty as well, even though I can do a page.body successfully.

I had it working with 0.8.5 last night, but now that doesn't work anymore either. I'm baffled.
Errr... I take that back. I can still get it to work in 0.8.5, but not 0.9.2. If I understand correctly, the parser changed between those versions? That's probably the issue...
YES!!! That was it. When I switch the parser to Hpricot all is well. Thanks for the help, Mat!

Now I'm off to scrape this god-awful website...
Cool. I dunno if Aaron's on this or not, but it might be good to figure out why nokogiri can't parse that page.

Here's a file captured with:

    File.open('response.html', 'w') { |f| f.print WWW::Mechanize.new.get('http://www.bcbid.gov.bc.ca').body }

I may play with it myself this weekend, but maybe Aaron will beat me to it.

Thanks for finding the bug, Anthony!
-Mat

On Mar 27, 2009, at 12:29 PM, Anthony F wrote:
> YES!!! That was it. When I switch the parser to Hpricot all is
> well. Thanks for the help, Mat!
Just noticed this:

    <!-- Copyright © 2001, 2002 OGMA Consulting Corp. -->

Looks like there's a UTF-8 copyright symbol or something that might be throwing things off, especially because the server doesn't appear to mark it as UTF-8 in the headers.
-Mat

On Mar 27, 2009, at 2:12 PM, Mat Schaffer wrote:
> Cool. I dunno if Aaron's on this or not, but it might be good to
> figure out why nokogiri can't parse that page.
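One way to test the encoding theory without involving any HTML parser is to check the raw bytes. The first dump in this thread shows the symbol as \251, i.e. the single byte 0xA9, which is the copyright sign in ISO-8859-1 but is not valid UTF-8 on its own. Assuming that is what the server really sends (a guess based on that dump only), a rough sketch of detecting and converting it in plain Ruby 1.9+ might look like this. The helper name to_utf8 and the ISO-8859-1 fallback are assumptions for illustration, not anything Mechanize or Nokogiri does.

```ruby
# The comment line with the copyright sign as one raw byte (0xA9), matching
# the \251 octal escape in the first irb dump. `.b` marks it as binary data.
raw = "<!-- Copyright \xA9 2001, 2002 OGMA Consulting Corp. -->".b

# Hypothetical helper: return the text as valid UTF-8. If the bytes are not
# already valid UTF-8, assume they are in the fallback encoding (ISO-8859-1
# by default, which is only a guess for this site) and transcode them.
def to_utf8(bytes, fallback = Encoding::ISO_8859_1)
  utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  bytes.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

fixed = to_utf8(raw)
puts fixed
```

Running the captured response.html through a check like this before parsing would at least show whether the stray byte is involved in the parse failure.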
Just another update... If I switch the html parser to Nokogiri instead of Nokogiri::HTML, it seems to work as well. It turns out I need Nokogiri's extra XPath goodness to deal with this rat's nest, so that's a good thing.

From: mat.schaffer at gmail.com
To: mat.schaffer at gmail.com
Date: Fri, 27 Mar 2009 14:31:25 -0400
CC: mechanize-users at rubyforge.org
Subject: Re: [Mechanize-users] Can't get this site to open

> Just noticed this:
> <!-- Copyright © 2001, 2002 OGMA Consulting Corp. -->
>
> Looks like there's a UTF-8 copyright symbol or something that might
> be throwing things off. Especially because the server doesn't appear
> to mark it as UTF-8 in the headers.
> -Mat