Jimmy McGrath
2010-Jan-21 01:14 UTC
[Mechanize-users] Cannot download www.yahoo.com with firefox user_agent string
Hi All,
This is rather lengthy, but I thought the info would be useful. If
anyone can help me it would be much appreciated!
I'm having an issue trying to get mechanize to download yahoo.com whilst
having mechanize identify itself as firefox 3.1. When I leave the
mechanize user_agent string as default, the address
"http://www.yahoo.com" ends up being redirected to
"http://au.yahoo.com/?p=us". When I change the user_agent string to
firefox's (using the built-in alias "Mac FireFox", and also my browser's
user agent) I get directed to "http://m.www.yahoo.com/" which, if put
into firefox, will resolve to "http://au.yahoo.com/?p=us".
Interestingly, if I use the user_agent_alias of "Linux Mozilla"
Has anyone seen this before? It is the only url I have had problems
with, so I'm a bit perplexed. I'm guessing yahoo is doing something a
little weird and mechanize is probably in the right, but it would be
good if there was a workaround someone could suggest. It is quite
important for users of my application to be able to use their own user
agent strings, so I would prefer not to limit them to only the default
user agent (or the preset aliases).
I have tried the following changes to default settings:
agent.redirection_limit = 50
agent.follow_meta_refresh = true
But this has had no effect.
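One way to narrow this down would be to follow the redirects by hand and
log every hop, so it is visible where the user agent changes the
outcome. Below is a minimal sketch using only Ruby's standard library
(Net::HTTP), not mechanize itself. The helper name trace_redirects and
the DEFAULT_FETCHER lambda are made up for illustration, and the sketch
only follows HTTP 3xx Location headers, so it would not catch a meta
refresh or a JavaScript redirect.

```ruby
require 'net/http'
require 'uri'

# Default fetcher: a single GET with no automatic redirect handling.
# Returns [status_code, location_header_or_nil].
DEFAULT_FETCHER = lambda do |uri, user_agent|
  request = Net::HTTP::Get.new(uri.request_uri)
  request['User-Agent'] = user_agent
  response = Net::HTTP.start(uri.host, uri.port,
                             :use_ssl => uri.scheme == 'https') do |http|
    http.request(request)
  end
  [response.code.to_i, response['Location']]
end

# Hypothetical helper (not part of mechanize): follows 3xx redirects by
# hand and returns one [url, status] pair per request made, stopping at
# the first non-redirect response or after `limit` requests.
def trace_redirects(url, user_agent, limit = 20, fetcher = DEFAULT_FETCHER)
  hops = []
  uri = URI.parse(url)
  limit.times do
    status, location = fetcher.call(uri, user_agent)
    hops << [uri.to_s, status]
    break unless (300..399).include?(status) && location
    uri = URI.join(uri.to_s, location) # Location may be relative
  end
  hops
end

# Example (performs real requests, so it is left commented out):
#   ua = "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.15) " \
#        "Gecko/2009102815 Ubuntu/9.04 (jaunty) Firefox/3.0.15"
#   trace_redirects("http://www.yahoo.com", ua).each do |hop_url, status|
#     puts "#{status} #{hop_url}"
#   end
```

Running it once with the default mechanize user agent string and once
with the firefox one should show at which hop the two diverge. The
fetcher is passed in as an argument so the loop can also be exercised
without touching the network.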
BTW: I am running ubuntu 9.04 with "ruby 1.8.7 (2008-08-11 patchlevel
72) [x86_64-linux]"
and the following gems installed:
-mechanize (0.9.3)
-nokogiri (1.4.1)
Here is a copy and paste from the console if you would like to reproduce it:
irb(main):002:0> require 'rubygems'
=> true
irb(main):003:0> require 'mechanize'
=> true
irb(main):004:0> agent = WWW::Mechanize.new
=> #<WWW::Mechanize:0x7f3b3a9abf38
@pre_connect_hook=#<WWW::Mechanize::Chain::PreConnectHook:0x7f3b3a9abbc8
@hooks=[]>, @proxy_port=nil, @history=[], @open_timeout=nil,
@keep_alive=true, @auth_hash={}, @cert=nil,
@post_connect_hook=#<WWW::Mechanize::Chain::PostConnectHook:0x7f3b3a9abba0
@hooks=[]>, @follow_meta_refresh=false, @watch_for_set=nil,
@proxy_pass=nil, @redirect_ok=true, @log=nil, @keep_alive_time=300,
@digest=nil, @verify_callback=nil, @conditional_requests=true,
@pluggable_parser=#<WWW::Mechanize::PluggableParser:0x7f3b3a9abe48
@default=WWW::Mechanize::File,
@parsers={"application/xhtml+xml"=>WWW::Mechanize::Page,
"text/html"=>WWW::Mechanize::Page,
"application/vnd.wap.xhtml+xml"=>WWW::Mechanize::Page}>,
@user_agent="WWW-Mechanize/0.9.3
(http://rubyforge.org/projects/mechanize/)", @proxy_addr=nil, @pass=nil,
@html_parser=Nokogiri::HTML, @connection_cache={}, @password=nil,
@ca_file=nil, @proxy_user=nil, @read_timeout=nil,
@scheme_handlers={"https"=>#<Proc:0x00007f3b3d7269b8@/usr/lib/ruby/gems/1.8/gems/mechanize-0.9.3/lib/www/mechanize.rb:152>,
"file"=>#<Proc:0x00007f3b3d7269b8@/usr/lib/ruby/gems/1.8/gems/mechanize-0.9.3/lib/www/mechanize.rb:152>,
"http"=>#<Proc:0x00007f3b3d7269b8@/usr/lib/ruby/gems/1.8/gems/mechanize-0.9.3/lib/www/mechanize.rb:152>,
"relative"=>#<Proc:0x00007f3b3d7269b8@/usr/lib/ruby/gems/1.8/gems/mechanize-0.9.3/lib/www/mechanize.rb:152>},
@request_headers={}, @key=nil,
@cookie_jar=#<WWW::Mechanize::CookieJar:0x7f3b3a9abee8 @jar={}>,
@redirection_limit=20, @user=nil, @history_added=nil>
irb(main):005:0> page = agent.get "http://www.yahoo.com"
=> #<WWW::Mechanize::Page
{url #<URI::HTTP:0x7f3b3a98d588 URL:http://au.yahoo.com/?p=us>}
{meta}
{title "Yahoo!7"}
[SNIP - it seems to download correctly]
irb(main):006:0> page.uri.to_s
=> "http://au.yahoo.com/?p=us"
**I'm coming from an Australian IP, so yahoo will probably send you
elsewhere if you are not in Oz**
irb(main):007:0> agent.user_agent = "Mozilla/5.0 (X11; U; Linux x86_64;
en-US; rv:1.9.0.15) Gecko/2009102815 Ubuntu/9.04 (jaunty) Firefox/3.0.15"
=> "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.15)
Gecko/2009102815 Ubuntu/9.04 (jaunty) Firefox/3.0.15"
irb(main):008:0> page = agent.get "http://www.yahoo.com"
=> #<WWW::Mechanize::Page
{url #<URI::HTTP:0x7f3b3a9c1f68 URL:http://m.www.yahoo.com/>}
{meta}
{title nil}
{iframes}
{frames}
{links}
{forms}>
irb(main):009:0> page.uri.to_s
=> "http://m.www.yahoo.com/"
Thanks,
-Jimmy