The site is: http://www.bcbid.gov.bc.ca
It's a weird, complicated piece of crap full of frames and cookies and
all sorts of god-awful javascript navigation. However, before I even
get into that stuff I can't even get the site to open in Mechanize. Can
anyone else get this working, or is it just me?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090326/66e79588/attachment.html>
Loads for me, but it's also got a javascript redirect in there.
You'll have to do that yourself with something like
agent.click(page.links.first)
>> WWW::Mechanize.new.get('http://www.bcbid.gov.bc.ca').body
=> "<html>\r\n<head>\r\n<!-- Copyright \251 2001, 2002 OGMA Consulting Corp. -->\r\n<title>Re-directing to BC Bid...</title>\r\n</head>\r\n<body bgcolor="#FFFFFF" onLoad="document.location='http://www.bcbid.gov.bc.ca/open.dll/welcome'">\r\nIf this page does not automatically re-direct you to BC Bid<sup>®</sup>,<br>\r\nplease <a href="http://www.bcbid.gov.bc.ca/open.dll/welcome">click here</a>.\r\n</body>\r\n</html>"
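If clicking the first link turns out to be fragile, the redirect target can also be read straight out of the body. This is a minimal sketch, assuming the redirect always has the document.location='...' shape seen in the onLoad handler above; JS_REDIRECT and js_redirect_target are hypothetical names, and a regex is not a JavaScript interpreter, so anything more dynamic than a quoted string literal will slip through.

```ruby
# Hypothetical helper: extract the URL from a simple JavaScript redirect such as
#   onLoad="document.location='http://www.bcbid.gov.bc.ca/open.dll/welcome'"
# Only handles a quoted string literal assigned to document.location(.href).
JS_REDIRECT = /document\.location(?:\.href)?\s*=\s*(['"])(.*?)\1/i

def js_redirect_target(html)
  # Match on a binary copy so stray non-UTF-8 bytes (like the \251 above)
  # cannot raise an "invalid byte sequence" error during matching.
  m = html.b.match(JS_REDIRECT)
  m && m[2]
end

body = %q{<body bgcolor="#FFFFFF" onLoad="document.location='http://www.bcbid.gov.bc.ca/open.dll/welcome'">}
puts js_redirect_target(body)
```

With a WWW::Mechanize agent named agent, the result could be passed to agent.get to follow the redirect directly; check for nil first, since pages without this pattern return nothing.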
On Mar 26, 2009, at 6:30 PM, Anthony F wrote:
>
> The site is: http://www.bcbid.gov.bc.ca
>
> It's a weird, complicated piece of crap full of frames and cookies
> and all sorts of god-awful javascript navigation. However, before I
> even get into that stuff I can't even get the site to open in
> Mechanize. Can anyone else get this working, or is it just me?
>
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090326/218fca63/attachment-0001.html>
Interesting. I tried it with mechanize 0.8.5 and it seemed to work
fine. With 0.9.2 it opens the page, but doesn't seem to parse it
properly (ie. frames => nil, link => nil, etc). What version are
you using?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090327/f8a8ad6b/attachment.html>
My previous was 0.9.0, but it works for me with 0.9.2 as well:
>> require 'mechanize'
=> true
>> WWW::Mechanize::VERSION
=> "0.9.2"
>> WWW::Mechanize.new.get('http://www.bcbid.gov.bc.ca').body
=> "<html>\r\n<head>\r\n<!-- Copyright \357\275\251 2001, 2002 OGMA Consulting Corp. -->\r\n<title>Re-directing to BC Bid...</title>\r\n</head>\r\n<body bgcolor="#FFFFFF" onLoad="document.location='http://www.bcbid.gov.bc.ca/open.dll/welcome'">\r\nIf this page does not automatically re-direct you to BC Bid<sup>®</sup>,<br>\r\nplease <a href="http://www.bcbid.gov.bc.ca/open.dll/welcome">click here</a>.\r\n</body>\r\n</html>"
I don't see any frames here. Do you maybe have a transparent web proxy
where you are? What does your response look like? You might want to
check using curl too.
-Mat
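One detail worth noting in the two sessions: the copyright byte prints as \251 under 0.9.0 but as \357\275\251 under 0.9.2. Ruby 1.8 shows raw bytes as octal escapes, so the first is the single byte 0xA9 and the second is the three bytes EF BD A9. The sketch below uses only plain Ruby (1.9+ String#force_encoding and String#encode, no Mechanize) to decode 0xA9 a few ways; RAW, valid_utf8? and decode_as are hypothetical names. Read as ISO-8859-1 it is ©, on its own it is not valid UTF-8, and read as Shift_JIS it is the halfwidth katakana U+FF69, whose UTF-8 form is exactly EF BD A9. That hints, without proving, that the 0.9.2 body went through a Shift_JIS guess somewhere between the HTTP response and the printed string.

```ruby
# Decode the single byte 0xA9 (printed as "\251" by Ruby 1.8) under a few
# candidate encodings, to compare with the "\357\275\251" seen under 0.9.2.
RAW = "\xA9".b

def valid_utf8?(bytes)
  bytes.dup.force_encoding('UTF-8').valid_encoding?
end

def decode_as(bytes, encoding)
  # Reinterpret the bytes in the given encoding, then transcode to UTF-8.
  bytes.dup.force_encoding(encoding).encode('UTF-8')
end

puts valid_utf8?(RAW)                # false: a lone 0xA9 is not UTF-8
puts decode_as(RAW, 'ISO-8859-1')    # © (U+00A9, COPYRIGHT SIGN)
sjis = decode_as(RAW, 'Shift_JIS')   # U+FF69, HALFWIDTH KATAKANA LETTER SMALL U
p sjis.bytes.map { |b| b.to_s(8) }   # ["357", "275", "251"], the 0.9.2 bytes
```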
On Mar 27, 2009, at 4:02 AM, Anthony F wrote:
> Interesting. I tried it with mechanize 0.8.5 and it seemed to work
> fine. With 0.9.2 it opens the page, but doesn't seem to parse it
> properly (ie. frames => nil, link => nil, etc). What version are
> you using?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090327/d8234a51/attachment.html>
I should be more clear. Your code also works for me as is in 0.9.2.
However, if I do this:
irb(main):001:0> require 'mechanize'
=> true
irb(main):002:0> WWW::Mechanize.new.get('http://www.bcbid.gov.bc.ca')
=> #<WWW::Mechanize::Page
{url #<URI::HTTP:0x1bd1504 URL:http://www.bcbid.gov.bc.ca/>}
{meta}
{title nil}
{iframes}
{frames}
{links}
{forms}>
The problem I'm having is title => nil, links => [], etc. I
can't actually do anything with the page other than get the body. And
when I say frames is empty I mean that when I try to parse the
redirected page (http://www.bcbid.gov.bc.ca/open.dll/welcome) it comes
up empty as well even though I can do a page.body successfully.
I had it working with 0.8.5 last night, but now that doesn't work
anymore either. I'm baffled.
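One way to tell a fetching problem from a parsing problem is to look for a title and links in the raw body without any HTML parser at all. If they are present in page.body but the parsed page still reports title nil and no links, the bytes arrived fine and the parser is the suspect. This is a rough sketch; raw_peek is a hypothetical helper, and regular expressions are no substitute for a real HTML parser, so treat it only as a sanity check.

```ruby
# Hypothetical sanity check: does the raw body contain a <title> and links at all?
# Regex-based on purpose, so it cannot be tripped up by the same parser bug; it
# will still miss unusual markup, so a negative result proves little.
def raw_peek(body)
  text = body.b # binary copy: bytes like 0xA9 won't raise during matching
  title = text[%r{<title[^>]*>(.*?)</title>}im, 1]
  hrefs = text.scan(/<a\s[^>]*?href\s*=\s*["']([^"']*)["']/i).flatten
  { title: title, hrefs: hrefs }
end

sample = "<html>\r\n<head>\r\n<!-- Copyright \xA9 2001, 2002 OGMA Consulting Corp. -->\r\n" \
         "<title>Re-directing to BC Bid...</title>\r\n</head>\r\n<body>\r\nplease " \
         "<a href=\"http://www.bcbid.gov.bc.ca/open.dll/welcome\">click here</a>.\r\n</body>\r\n</html>"
p raw_peek(sample)
```

In a Mechanize session the same check would be raw_peek(page.body), compared against page.title and page.links.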
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090327/12200925/attachment-0001.html>
Errr... I take that back. I can still get it to work in 0.8.5, but not
0.9.2. If I understand correctly the parser changed between those
versions? That's probably the issue...
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090327/529cba98/attachment.html>
YES!!! That was it. When I switch the parser to Hpricot all is well.
Thanks for the help, Mat!
Now I'm off to scrape this god-awful website...
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090327/18579999/attachment.html>
Cool. I dunno if Aaron's on this or not, but it might be good to
figure out why nokogiri can't parse that page.
Here's a file captured with:
File.open('response.html', 'w') { |f|
  f.print WWW::Mechanize.new.get('http://www.bcbid.gov.bc.ca').body
}
I may play with it myself this weekend, but maybe Aaron will beat
me to it.
Thanks for finding the bug, Anthony!
-Mat
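A variation on that capture, assuming the goal is a byte-for-byte copy for someone else to debug: File.binwrite writes in binary mode, so the bytes land on disk untouched. In text mode on Windows every \n is written as \r\n, which would turn this page's \r\n line endings into \r\r\n and muddy an encoding investigation. save_raw is a hypothetical name; File.binwrite, File.binread and Dir.mktmpdir are standard Ruby.

```ruby
require 'tmpdir'

# Hypothetical helper: save a response body exactly as received.
def save_raw(path, body)
  # Binary mode: no newline translation and no transcoding of bytes like 0xA9.
  File.binwrite(path, body.b)
end

body = "<html>\r\n<!-- Copyright \xA9 2001, 2002 OGMA Consulting Corp. -->\r\n</html>"
Dir.mktmpdir do |dir|
  path = File.join(dir, 'response.html')
  save_raw(path, body)
  puts File.binread(path) == body.b # the file holds the original bytes
end
```

In the thread's setting this would be save_raw('response.html', WWW::Mechanize.new.get('http://www.bcbid.gov.bc.ca').body).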
On Mar 27, 2009, at 12:29 PM, Anthony F wrote:
> YES!!! That was it. When I switch the parser to Hpricot all is
> well. Thanks for the help, Mat!
>
> Now I'm off to scrape this god-awful website...
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090327/bd3d5e87/attachment-0003.html>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090327/bd3d5e87/attachment-0004.html>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090327/bd3d5e87/attachment-0005.html>
Just noticed this:
<!-- Copyright © 2001, 2002 OGMA Consulting Corp. -->
Looks like there's a UTF-8 copyright symbol or something that might be
throwing things off. Especially because the server doesn't appear to
mark it as UTF-8 in the headers.
-Mat
On Mar 27, 2009, at 2:12 PM, Mat Schaffer wrote:
> Cool. I dunno if Aaron's on this or not, but it might be good to
> figure out why nokogiri can't parse that page.
>
> Here's a file captured with: File.open('response.html', 'w') { |f|
> f.print WWW::Mechanize.new.get('http://www.bcbid.gov.bc.ca').body }
>
> I may play with it myself this weekend, but if maybe Aaron will beat
> me to it.
>
> Thanks for finding the bug Anthony!
> -Mat
>
> <response.html>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090327/8bcd86a3/attachment.html>
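Following up on the missing charset, here is a minimal sketch of checking what the server declares and decoding accordingly. It reads charset=... from a Content-Type value and, when there is none, falls back to ISO-8859-1, which HTTP/1.1 (RFC 2616) named as the default for text types; whether that guess is right for this site is an assumption, not something shown in the thread. declared_charset and decode_body are hypothetical helpers, not Mechanize or Nokogiri API.

```ruby
# Hypothetical helpers: pick a charset from a Content-Type header value and
# decode raw response bytes with it, falling back to an assumed default
# (ISO-8859-1 here) when the server declares nothing usable.
def declared_charset(content_type)
  return nil if content_type.nil?
  match = content_type.match(/charset\s*=\s*"?([^";\s]+)"?/i)
  match && match[1]
end

def decode_body(raw, content_type, fallback: 'ISO-8859-1')
  name = declared_charset(content_type) || fallback
  encoding = begin
    Encoding.find(name)
  rescue ArgumentError
    Encoding.find(fallback) # unknown charset label: use the fallback instead
  end
  raw.b.force_encoding(encoding)
     .encode('UTF-8', invalid: :replace, undef: :replace)
     .scrub # encode is a no-op for UTF-8 -> UTF-8, so scrub catches bad bytes there
end

comment = "<!-- Copyright \xA9 2001, 2002 OGMA Consulting Corp. -->"
puts decode_body(comment, 'text/html')                    # no charset declared
puts decode_body(comment, 'text/html; charset=Shift_JIS') # effect of a wrong label
```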
Just another update...
If I switch the html parser to Nokogiri instead of Nokogiri::HTML it seems to
work as well. It turns out I need Nokogiri's extra XPath goodness to
deal with this rat's nest, so that's a good thing.
From: mat.schaffer at gmail.com
To: mat.schaffer at gmail.com
Date: Fri, 27 Mar 2009 14:31:25 -0400
CC: mechanize-users at rubyforge.org
Subject: Re: [Mechanize-users] Can't get this site to open
Just noticed this:
<!-- Copyright © 2001, 2002 OGMA Consulting Corp. -->
Looks like there's a UTF-8 copyright symbol or something that might be
throwing things off. Especially because the server doesn't appear to
mark it as UTF-8 in the headers.
-Mat
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://rubyforge.org/pipermail/mechanize-users/attachments/20090331/0df3aaa4/attachment.html>