takumi iino
2007-Jan-12 10:39 UTC
[Mechanize-users] why does to_absolute_uri use URI.escape?
Hello.
The following code aborts with Mechanize 0.6.4.
----------------------------
# sample.rb
require "rubygems"
require "mechanize"
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
# main page of the Japanese Wikipedia
agent.get("http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8")
-----------------------------
> ruby sample.rb
C:/opt/ruby-1.8/lib/ruby/1.8/uri/common.rb:432:in `split': bad URI(is not URI?): http://ja.wikipedia.org/wiki/??????????? (URI::InvalidURIError)
        from C:/opt/ruby-1.8/lib/ruby/1.8/uri/common.rb:481:in `parse'
        from C:/opt/ruby-1.8/lib/ruby/gems/1.8/gems/mechanize-0.6.4/lib/mechanize.rb:272:in `to_absolute_uri'
        from C:/opt/ruby-1.8/lib/ruby/gems/1.8/gems/mechanize-0.6.4/lib/mechanize.rb:141:in `get'
        from sample.rb:6
The relevant code in to_absolute_uri in mechanize.rb is:
url = URI.parse(
  URI.unescape(Util.html_unescape(url.to_s.strip)).gsub(/ /, '%20')
) unless url.is_a? URI
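This code cannot handle a URL that contains percent-escaped multibyte characters: URI.unescape turns the escapes back into raw multibyte bytes, and URI.parse then rejects the result.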
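The failure can be reproduced without Mechanize. The following is only a minimal sketch: `parseable?` is a hypothetical helper, and `URI.decode_www_form_component` is used as a stand-in for `URI.unescape`, which newer Ruby versions no longer provide (the two differ in how they treat `+`, which does not occur in this URL).

```ruby
require "uri"

# The escaped URL from sample.rb (main page of the Japanese Wikipedia).
ESCAPED_URL = "http://ja.wikipedia.org/wiki/" \
              "%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8"

# Hypothetical helper: true if URI.parse accepts the string, false if it
# raises URI::InvalidURIError.
def parseable?(str)
  URI.parse(str)
  true
rescue URI::InvalidURIError
  false
end

# Assumption: decode_www_form_component stands in for URI.unescape here.
decoded = URI.decode_www_form_component(ESCAPED_URL)

puts parseable?(ESCAPED_URL) # escaped form: plain ASCII
puts parseable?(decoded)     # decoded form: raw multibyte characters
```

The escaped string is accepted, while the decoded one is rejected, which matches the InvalidURIError in the trace above.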
Why does it call URI.unescape( "uri" ).gsub(/ /, '%20')?
I guess URI.unescape( "uri" ).gsub(/ /, '%20') is not needed, and the code could simply be:
url = URI.parse(
  Util.html_unescape(url.to_s.strip)
) unless url.is_a? URI
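As a rough illustration of the proposed change, the sketch below parses the string directly, with no unescape step. `to_uri_without_unescape` is a hypothetical name, not a Mechanize method, and Mechanize's `Util.html_unescape` is left out on purpose so that the example needs only the standard library.

```ruby
require "uri"

# The escaped URL from sample.rb (main page of the Japanese Wikipedia).
MAIN_PAGE_URL = "http://ja.wikipedia.org/wiki/" \
                "%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8"

# Hypothetical stand-in for the conversion step in to_absolute_uri.
# Assumption: the HTML-unescape step is omitted; only strip + parse remain.
def to_uri_without_unescape(url)
  return url if url.is_a?(URI) # already a URI object: pass it through
  URI.parse(url.to_s.strip)    # percent escapes stay exactly as given
end

uri = to_uri_without_unescape(MAIN_PAGE_URL)
puts uri.host
puts uri.path
```

With this variant the percent escapes reach URI.parse untouched, so the path keeps its escaped form instead of being decoded.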
--------- takumi
