Jimmy McGrath
2010-Jan-21 01:14 UTC
[Mechanize-users] Cannot download www.yahoo.com with firefox user_agent string
Hi All,

This is rather lengthy, but I thought the info would be useful. If anyone can help me, it would be much appreciated!

I'm having an issue getting mechanize to download yahoo.com while mechanize identifies itself as Firefox 3.1. When I leave the mechanize user_agent string at its default, the address "http://www.yahoo.com" ends up being redirected to "http://au.yahoo.com/?p=us". When I change the user_agent string to Firefox's (using the built-in alias "Mac FireFox", and also my own browser's user agent), I get directed to "http://m.www.yahoo.com/" which, if put into Firefox, resolves to "http://au.yahoo.com/?p=us". Interestingly, if I use the user_agent_alias of "Linux Mozilla"

Has anyone seen this before? It is the only URL I have had problems with, so I'm a bit perplexed. I'm guessing Yahoo is doing something a little weird and mechanize is probably in the right, but it would be good if someone could suggest a workaround. It is quite important for my application that users can supply their own user agent strings, so I would prefer not to limit them to the default user agent (or the preset aliases).

I have tried the following changes to the default settings:

    agent.redirection_limit = 50
    agent.follow_meta_refresh = true

But this has had no effect.
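One way to narrow this down might be to follow the redirects by hand, outside mechanize, and compare the chain for each user agent string. The following is only a sketch using Ruby's standard net/http and uri libraries: the names `trace_redirects` and `NET_HTTP_FETCH` are made up for illustration, the keyword-argument syntax needs a newer Ruby than 1.8.7, and it assumes Yahoo's redirection is done with ordinary HTTP 3xx responses rather than meta refresh or JavaScript, which I have not confirmed.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: performs a single GET without following redirects.
# Returns [status_code, location_header_or_nil].
NET_HTTP_FETCH = lambda do |uri, user_agent|
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri, 'User-Agent' => user_agent)
  end
  [response.code.to_i, response['location']]
end

# Hypothetical helper: follows HTTP 3xx redirects manually and returns
# every URL visited, in order, starting with start_url.
# `fetch` can be swapped for a stub so the logic can be checked offline.
def trace_redirects(start_url, user_agent, limit = 20, fetch = NET_HTTP_FETCH)
  chain = [start_url]
  uri = URI.parse(start_url)
  limit.times do
    status, location = fetch.call(uri, user_agent)
    # Stop at the first response that is not a redirect with a Location header.
    break unless (300..399).include?(status) && location
    uri = uri.merge(location) # resolves relative Location values too
    chain << uri.to_s
  end
  chain
end

# Example use (needs network access, so it is left commented out):
#   puts trace_redirects('http://www.yahoo.com', 'WWW-Mechanize/0.9.3')
#   puts trace_redirects('http://www.yahoo.com',
#                        'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.15) ' \
#                        'Gecko/2009102815 Ubuntu/9.04 (jaunty) Firefox/3.0.15')
```

If the two chains differ at the HTTP level, the server is choosing the target from the User-Agent header. If both chains stop at the same URL, the remaining hop presumably happens in the page itself (meta refresh or script), which would explain why raising redirection_limit makes no difference.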
BTW: I am running Ubuntu 9.04 with "ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]" and the following gems installed:

- mechanize (0.9.3)
- nokogiri (1.4.1)

Here is a copy and paste from the console if you would like to reproduce it:

irb(main):002:0> require 'rubygems'
=> true
irb(main):003:0> require 'mechanize'
=> true
irb(main):004:0> agent = WWW::Mechanize.new
=> #<WWW::Mechanize:0x7f3b3a9abf38 @pre_connect_hook=#<WWW::Mechanize::Chain::PreConnectHook:0x7f3b3a9abbc8 @hooks=[]>, @proxy_port=nil, @history=[], @open_timeout=nil, @keep_alive=true, @auth_hash={}, @cert=nil, @post_connect_hook=#<WWW::Mechanize::Chain::PostConnectHook:0x7f3b3a9abba0 @hooks=[]>, @follow_meta_refresh=false, @watch_for_set=nil, @proxy_pass=nil, @redirect_ok=true, @log=nil, @keep_alive_time=300, @digest=nil, @verify_callback=nil, @conditional_requests=true, @pluggable_parser=#<WWW::Mechanize::PluggableParser:0x7f3b3a9abe48 @default=WWW::Mechanize::File, @parsers={"application/xhtml+xml"=>WWW::Mechanize::Page, "text/html"=>WWW::Mechanize::Page, "application/vnd.wap.xhtml+xml"=>WWW::Mechanize::Page}>, @user_agent="WWW-Mechanize/0.9.3 (http://rubyforge.org/projects/mechanize/)", @proxy_addr=nil, @pass=nil, @html_parser=Nokogiri::HTML, @connection_cache={}, @password=nil, @ca_file=nil, @proxy_user=nil, @read_timeout=nil, @scheme_handlers={"https"=>#<Proc:0x00007f3b3d7269b8@/usr/lib/ruby/gems/1.8/gems/mechanize-0.9.3/lib/www/mechanize.rb:152>, "file"=>#<Proc:0x00007f3b3d7269b8@/usr/lib/ruby/gems/1.8/gems/mechanize-0.9.3/lib/www/mechanize.rb:152>, "http"=>#<Proc:0x00007f3b3d7269b8@/usr/lib/ruby/gems/1.8/gems/mechanize-0.9.3/lib/www/mechanize.rb:152>, "relative"=>#<Proc:0x00007f3b3d7269b8@/usr/lib/ruby/gems/1.8/gems/mechanize-0.9.3/lib/www/mechanize.rb:152>}, @request_headers={}, @key=nil, @cookie_jar=#<WWW::Mechanize::CookieJar:0x7f3b3a9abee8 @jar={}>, @redirection_limit=20, @user=nil, @history_added=nil>
irb(main):005:0> page = agent.get "http://www.yahoo.com"
=> #<WWW::Mechanize::Page {url #<URI::HTTP:0x7f3b3a98d588 URL:http://au.yahoo.com/?p=us>} {meta} {title "Yahoo!7"}
[SNIP - it seems to download correctly]
irb(main):006:0> page.uri.to_s
=> "http://au.yahoo.com/?p=us"

**I'm coming from an Australian IP; Yahoo will probably send you elsewhere if you are not in Oz**

irb(main):007:0> agent.user_agent = "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.15) Gecko/2009102815 Ubuntu/9.04 (jaunty) Firefox/3.0.15"
=> "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.15) Gecko/2009102815 Ubuntu/9.04 (jaunty) Firefox/3.0.15"
irb(main):008:0> page = agent.get "http://www.yahoo.com"
=> #<WWW::Mechanize::Page {url #<URI::HTTP:0x7f3b3a9c1f68 URL:http://m.www.yahoo.com/>} {meta} {title nil} {iframes} {frames} {links} {forms}>
irb(main):009:0> page.uri.to_s
=> "http://m.www.yahoo.com/"

Thanks,
-Jimmy