JavaScript's encodeURIComponent works differently from CGI.escape or ERB::Util.u. For example:

encodeURIComponent('中文') = '%D6%D0%CE%C4'

but

>> CGI.escape("中文")
=> "%E4%B8%AD%E6%96%87"
>> ERB::Util.u("中文")
=> "%E4%B8%AD%E6%96%87"

Is there any way to get the same encoded result with Ruby code?

--
Posted via http://www.ruby-forum.com/.

You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en

On Mar 31, 2:06 pm, Nanyang Zhan <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> JavaScript's encodeURIComponent works differently from CGI.escape or
> ERB::Util.u.

Well, the difference is that the JavaScript output is UTF-16 and the Ruby output is UTF-8 (although the documentation I can find suggests that the JavaScript should also be producing UTF-8).

> For example:
> encodeURIComponent('中文') = '%D6%D0%CE%C4'
> but
> >> CGI.escape("中文")
> => "%E4%B8%AD%E6%96%87"
> >> ERB::Util.u("中文")
> => "%E4%B8%AD%E6%96%87"
>
> Is there any way to get the same encoded result with Ruby code?

There are various libraries for messing around with string encodings, including iconv. pack/unpack have some specifiers that are relevant for Unicode, and Rails itself also has various Unicode utilities in it.

Fred

Frederick Cheung wrote:
> Well, the difference is that the JavaScript output is UTF-16 and
> the Ruby output is UTF-8 (although the documentation I can find suggests
> that the JavaScript should also be producing UTF-8).

Thank you for your reply. Maybe that is true. But how can the UTF-16 encodeURIComponent result be the shorter one?

> There are various libraries for messing around with string encodings,
> including iconv. pack/unpack have some specifiers that are
> relevant for Unicode, and Rails itself also has various Unicode
> utilities in it.

I tried converting the string to UTF-16 before passing it to CGI.escape(), but I had no luck producing the same result as encodeURIComponent. (I got "%FE%FFN-e%87" from "中文".)

I found a Perl way and a Python way to do encodeURIComponent on the net. They are here:
http://d.hatena.ne.jp/ruby-U/20081110/1226313786

Unfortunately I know neither Perl nor Python. Can anyone work out the Ruby code from them for me?

On Mar 31, 4:27 pm, Nanyang Zhan <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> Thank you for your reply. Maybe that is true. But how can the UTF-16
> encodeURIComponent result be the shorter one?

Because for double-byte characters like these, UTF-16 is shorter than UTF-8: two bytes per character instead of three.

> I tried converting the string to UTF-16 before passing it to
> CGI.escape(), but I had no luck producing the same result as
> encodeURIComponent. (I got "%FE%FFN-e%87" from "中文".)
>
> I found a Perl way and a Python way to do encodeURIComponent on the net.
> They are here: http://d.hatena.ne.jp/ruby-U/20081110/1226313786
>
> Unfortunately I know neither Perl nor Python. Can anyone work out
> the Ruby code from them for me?

Those aren't playing with encodings, which is apparently the issue here. Why does it matter anyway?

Fred
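
To make the size comparison above concrete, here is a minimal sketch. It assumes Ruby 1.9 or later, where String#encode is available (the Ruby 1.8 of this thread would need iconv instead), and it uses the BOM-less "UTF-16BE" encoding so that only the character bytes are counted.

```ruby
# "中文" written with \u escapes so the source file encoding does not matter.
text = "\u4E2D\u6587"

# Each of these characters takes three bytes in UTF-8...
utf8_bytes = text.encode("UTF-8").bytesize

# ...and two bytes in UTF-16 (big-endian, no byte order mark).
utf16_bytes = text.encode("UTF-16BE").bytesize

puts "UTF-8: #{utf8_bytes} bytes, UTF-16BE: #{utf16_bytes} bytes"
```

Percent-encoding triples each byte into "%XX", so the four UTF-16 bytes would give a 12-character result against 18 characters for UTF-8.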

Frederick Cheung wrote:
> Those aren't playing with encodings, which is apparently the issue
> here. Why does it matter anyway?

OK.

Here is the source code of the ERB::Util.url_encode(s) method:

    # File erb.rb, line 801
    def url_encode(s)
      s.to_s.gsub(/[^a-zA-Z0-9_\-.]/n){ sprintf("%%%02X", $&.unpack("C")[0]) }
    end

Right now it works like this:

>> ERB::Util.url_encode("中文")
=> "%E4%B8%AD%E6%96%87"

Can you help me change the url_encode code a bit so that it returns the UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)

On Mar 31, 4:44 pm, Nanyang Zhan <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> Here is the source code of the ERB::Util.url_encode(s) method:
>
>     # File erb.rb, line 801
>     def url_encode(s)
>       s.to_s.gsub(/[^a-zA-Z0-9_\-.]/n){ sprintf("%%%02X", $&.unpack("C")[0]) }
>     end
>
> Right now it works like this:
>
> >> ERB::Util.url_encode("中文")
> => "%E4%B8%AD%E6%96%87"
>
> Can you help me change the url_encode code a bit so that it returns
> the UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)

Well, s.unpack("U*") will turn a string into an array of integers (Unicode code points), which should then be easy to split into bytes. I'd start from scratch rather than using url_encode though.

Fred
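
A rough sketch of the unpack("U*") idea above, for illustration only. The helper name utf16be_percent_encode is hypothetical (it is not part of ERB or CGI), it assumes big-endian byte order with no byte order mark, and it only handles code points that fit in 16 bits. Unlike url_encode it escapes every byte, including plain ASCII.

```ruby
# Hypothetical helper: percent-encode the UTF-16BE bytes of a UTF-8 string.
def utf16be_percent_encode(str)
  str.unpack("U*").map { |code_point|
    # Code points above 0xFFFF need surrogate pairs, which this sketch skips.
    raise ArgumentError, "code point outside the BMP: #{code_point}" if code_point > 0xFFFF

    high = code_point >> 8    # upper byte of the 16-bit unit
    low  = code_point & 0xFF  # lower byte of the 16-bit unit
    format("%%%02X%%%02X", high, low)
  }.join
end

puts utf16be_percent_encode("\u4E2D\u6587")  # "中文" => %4E%2D%65%87
```

Unescaping the printable bytes of that output (%4E is "N", %2D is "-", %65 is "e") gives "N-e%87", which is the "%FE%FFN-e%87" result reported earlier in the thread without the leading %FE%FF byte order mark.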

Frederick Cheung wrote:
> Well, s.unpack("U*") will turn a string into an array of integers
> (Unicode code points), which should then be easy to split into bytes.
> I'd start from scratch rather than using url_encode though.

Thanks, Fred!

>> "中文".unpack("C*")
=> [228, 184, 173, 230, 150, 135]
>> ERB::Util.url_encode("中文")
=> "%E4%B8%AD%E6%96%87"

For the first time, I have some idea of what url_encode is doing.

And when:

>> "中文".unpack("U*")
=> [20013, 25991]

So it is a matter of turning [20013, 25991] into '%D6%D0%CE%C4', right?
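
The link between the unpack("C*") output and the url_encode result can be spelled out by hand. This is only a sketch of the relationship, not the ERB implementation: it formats every byte as "%XX", whereas url_encode leaves unreserved ASCII characters alone.

```ruby
# "中文" written with \u escapes; the string is UTF-8 encoded.
bytes = "\u4E2D\u6587".unpack("C*")  # the six UTF-8 bytes as integers

# Format each byte as a two-digit uppercase hex escape.
encoded = bytes.map { |byte| format("%%%02X", byte) }.join

puts encoded  # %E4%B8%AD%E6%96%87
```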

On Mar 31, 5:04 pm, Nanyang Zhan <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> And when:
>
> >> "中文".unpack("U*")
> => [20013, 25991]
>
> So it is a matter of turning [20013, 25991] into '%D6%D0%CE%C4', right?

Well, 20013 is 0x4E2D, which is the UTF-16 for the first of your characters. Looking back at what you wrote, I've no idea where D6D0 is coming from. That's a completely different character according to the Unicode character palette I have. Not sure what your JavaScript has been doing.

Fred

Frederick Cheung wrote:
> I've no idea where D6D0 is coming from.

OK, problem solved. Thank you, Fred. I might never have got it done without your help.

It turns out %D6%D0%CE%C4 has nothing to do with UTF-16. It is the GB2312 encoding of the string.

I converted the string from UTF-8 to GB2312 with iconv, and then url_encode produces the string I need.

Thank you again.

Nanyang Zhan wrote:
> Frederick Cheung wrote:
>> I've no idea where D6D0 is coming from.
>
> OK, problem solved. Thank you, Fred. I might never have got it done
> without your help.
>
> It turns out %D6%D0%CE%C4 has nothing to do with UTF-16. It is the
> GB2312 encoding of the string.
>
> I converted the string from UTF-8 to GB2312 with iconv, and then
> url_encode produces the string I need.
>
> Thank you again.

Could you share the code you used to solve the problem? Thanks a lot.
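
The original poster's code is not in the thread, so the following is only a sketch of the approach described: convert to GB2312 first, then percent-encode the resulting bytes. It swaps iconv for String#encode, which assumes Ruby 1.9 or later (iconv was dropped from the standard library in Ruby 2.0). The helper name gb2312_url_encode is hypothetical, and the escaping rule is copied from the url_encode source quoted earlier in the thread.

```ruby
# Hypothetical helper: transcode to GB2312, then escape as url_encode would.
def gb2312_url_encode(str)
  # Raises Encoding::UndefinedConversionError for characters GB2312 lacks.
  gb = str.encode("GB2312")

  # Work on raw bytes and escape everything outside the unreserved set.
  gb.b.gsub(/[^a-zA-Z0-9_\-.]/n) { |byte| format("%%%02X", byte.ord) }
end

puts gb2312_url_encode("\u4E2D\u6587")  # "中文" => %D6%D0%CE%C4
```

On Ruby 1.8 the conversion step would presumably be Iconv.conv("GB2312", "UTF-8", str), with the escaping left to ERB::Util.url_encode as the poster describes.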