ddellacosta
2011-May-16 13:47 UTC
Problem with GET args and UTF-8 encoding (output of Rack::Utils.unescape() ?)
Hi folks,

Here's my basic issue; hopefully this is clear. I'm trying to submit some UTF-8 values in my query string, but they are coming out mangled on the other end. It *seems* like the problem is that what Rack::Utils.unescape() pushes out gets converted to UTF-8 somewhere in the chain (using Rails 3.0.7 and Ruby 1.9.2, by the way), and it's mangling characters which are two bytes (for example, "%20," which is a space and a one-byte character, gets converted fine). I feel like I've almost figured this out, but I'm still stumped. Here's my "evidence:"

# Example UTF-8 string:

"Adélaïde de Hongrie"

# GET string (obviously URI encoded):

Started GET "/registers/results?filter[title][]=Ad%E9la%EFde%20de%20Hongrie&search=&limit=4" for 127.0.0.1 at 2011-05-16 14:17:33 +0700

# What Rack produces/Rails sees (in Controller):

Parameters: {"filter"=>{"title"=>["Ad\xE9la\xEFde de Hongrie"]}, "search"=>"", "limit"=>"4"}

# Error I'm getting, when I try to "do stuff" with the above string:

ArgumentError (invalid byte sequence in UTF-8):

# What would actually be a valid string with hex UTF code points in the format above:

"Ad\xC3\xA9la\xC3\xAFde de Hongrie"

Or, in the "\u ..." format (see anything interesting here? Something obvious is eluding me...):

"Ad\u{E9}la\u{EF}de de Hongrie"

To be clear, this is not a form, but an ajax query. I've tried adding the 'utf8' snowman thing manually too, but that doesn't seem to do anything... of course, maybe I'm doing that wrong.

Any thoughts/questions/pointing out of obvious errors or confused ways of thinking? I'd also appreciate any pointers to Rails documentation which describes in more detail how this stuff happens; I've just been digging through the code and it's slow going for me.

Help much appreciated!

Cheers,
Dave

-- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
ddellacosta
2011-May-16 17:09 UTC
Re: Problem with GET args and UTF-8 encoding (output of Rack::Utils.unescape() ?)
Okay, I'm still not there, but I've realized I've been confusing a few things. This Stack Overflow answer helped a lot:

http://stackoverflow.com/questions/5083032/how-to-utf-8-encode-a-character-string/5083858#5083858

I was conflating Unicode with UTF-8. But I think that's also essentially what is happening somewhere in the process of ASCII-8BIT (the output of Rack::Utils.unescape()) getting converted to UTF-8. I suppose I have to figure out how to override unescape() in my own initializer, or intercept unescape()'s output and encode it properly.

I think I'm close to a solution, since I'm starting to understand what all the values should be and what is happening. But any help will still be greatly appreciated, since something is still eluding my understanding.

Thanks,
Dave
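The Unicode versus UTF-8 distinction mentioned above can be seen directly in Ruby. This is only a small sketch using core String methods (the variable name is made up for illustration): the code point of "é" is U+00E9, but its UTF-8 encoding is the two bytes C3 A9, while in ISO-8859-1 it is the single byte E9.

```ruby
# "é" written as a Unicode escape so the source file encoding does not matter.
e_acute = "\u00E9"

# The Unicode code point (what the "\u{E9}" notation refers to).
puts e_acute.codepoints.inspect                         # => [233]  (0xE9)

# The UTF-8 byte sequence (what "\xC3\xA9" refers to).
puts e_acute.bytes.inspect                              # => [195, 169]  (0xC3 0xA9)

# The same character in ISO-8859-1 is a single byte equal to the code point.
puts e_acute.encode(Encoding::ISO_8859_1).bytes.inspect # => [233]
```

So a lone \xE9 byte is a valid Latin-1 "é" but is not a valid UTF-8 sequence on its own.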
Frederick Cheung
2011-May-16 18:00 UTC
Re: Problem with GET args and UTF-8 encoding (output of Rack::Utils.unescape() ?)
On 16 May 2011, at 14:47, ddellacosta <ddellacosta-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> # GET string (obviously URI encoded):
>
> Started GET "/registers/results?filter[title][]=Ad%E9la%EFde%20de%20Hongrie&search=&limit=4" for 127.0.0.1 at 2011-05-16 14:17:33 +0700

Who is producing this query string? They should be generating %c3%a9 if they are UTF-8 friendly, since %e9 is just URL-speak for \xe9, which smells like ISO-Latin-something.

Fred
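A rough way to check this point from the Ruby side, as a sketch only: it uses the standard library's URI.decode_www_form_component as a stand-in for Rack::Utils.unescape() (an assumption made so the snippet runs without Rack), and the variable names are made up. The repair step at the end also assumes the bytes really are ISO-8859-1, which is a guess about the sender.

```ruby
require "uri"

# The query value as it was sent, and as a UTF-8 friendly client would send it.
latin1_query = "Ad%E9la%EFde%20de%20Hongrie"
utf8_query   = "Ad%C3%A9la%C3%AFde%20de%20Hongrie"

# Percent-decoding only restores raw bytes; the result is then tagged as UTF-8.
from_latin1 = URI.decode_www_form_component(latin1_query)
from_utf8   = URI.decode_www_form_component(utf8_query)

puts from_latin1.valid_encoding?   # => false (a lone \xE9 is not valid UTF-8)
puts from_utf8.valid_encoding?     # => true

# If the bytes are known to be ISO-8859-1, relabel them and transcode to UTF-8.
repaired = from_latin1.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts repaired == from_utf8         # => true
```

Transcoding on the server only papers over the problem; fixing whatever builds the query string is the cleaner route.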
ddellacosta
2011-May-17 05:07 UTC
Re: Problem with GET args and UTF-8 encoding (output of Rack::Utils.unescape() ?)
Thanks for pointing out the obvious, Frederick (seriously, thank you). The problem was completely on the JavaScript/browser side: the function which prepared the query string was using escape() rather than encodeURIComponent(). I replaced all the calls to escape() and things started to magically work, how about that?

Thank you, I really appreciate the help! I can't believe how much time I spent looking in the wrong places... at least I learned a fair amount about Rails internals as well as encoding issues, though. Haha.

Cheers,
Dave
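For reference, the difference between the two JavaScript functions can be approximated in Ruby. This is a sketch, not the browser's behaviour verbatim: URI.encode_www_form_component stands in for encodeURIComponent() (it writes a space as "+" rather than "%20"), and escaping the ISO-8859-1 bytes imitates what escape() emits for characters below U+0100. The variable names are made up for illustration.

```ruby
require "uri"

title = "Ad\u00E9la\u00EFde de Hongrie"

# Percent-escape the UTF-8 bytes, as encodeURIComponent() does.
utf8_escaped = URI.encode_www_form_component(title)

# Percent-escape the ISO-8859-1 bytes, imitating escape() for this string.
latin1_escaped = URI.encode_www_form_component(title.encode(Encoding::ISO_8859_1).b)

puts utf8_escaped     # => Ad%C3%A9la%C3%AFde+de+Hongrie
puts latin1_escaped   # => Ad%E9la%EFde+de+Hongrie

# Only the UTF-8 form survives a round trip as valid UTF-8.
puts URI.decode_www_form_component(utf8_escaped) == title   # => true
```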