Marius Feraru
2006-Sep-19 17:47 UTC
How to deal with Unicode chars inside X-JSON HTTP header (besides UCS)?
Not really a Prototype problem, but given that its approach is to use this (in)famous "X-JSON" HTTP header, I wonder how (if at all) you have managed to pass Unicode data without encoding it to UCS.

At first sight it looks like a Gecko problem, but I'm still curious about other workarounds besides this clumsy UCS encoding.

Gecko seems to parse headers properly (NSPR nsHttp debugging), but something "in the middle" (on the way to JavaScript) messes up all the data (tested with LiveHTTPHeaders, Firebug & XMLHttpRequest).

For instance:

JSON:
    {"a":"ß","b":"ç" }

response:
    X-JSON: {"a":"Ã","b":"ç"}
    X-JSON-UCS: {"a":"\u00df","b":"\u00e7"}

("X-JSON" is obviously the erroneously parsed one)

TIA
--
Marius Feraru

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Spinoffs" group.
To post to this group, send email to rubyonrails-spinoffs-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-spinoffs-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-spinoffs
-~----------~----~----~----~------~----~------~--~---
Martin Bialasinski
2006-Sep-19 22:24 UTC
Re: How to deal with Unicode chars inside X-JSON HTTP header (besides UCS)?
On 9/19/06, Marius Feraru <altblue-9gptZ63fvgw@public.gmane.org> wrote:
> Not really a Prototype problem, but given the fact its approach is to
> use this (in)famous "X-JSON" HTTP header, I wonder how (if any) you
> succeeded to pass on Unicode data without encoding it to UCS.

The value of the X-JSON header is eval()'ed. Strings in JS are UTF-8 encoded. Encode the X-JSON value as UTF-8 and everything will be just fine.
Marius Feraru
2006-Sep-20 13:30 UTC
Re: How to deal with Unicode chars inside X-JSON HTTP header (besides UCS)?
Martin Bialasinski wrote:
> The value of the X-JSON header is eval()'ed. Strings in JS are UTF-8
> encoded.

If I didn't know about this, why do you think I mentioned UCS?!

> Encode the X-JSON value as UTF-8 and everything will be just fine.

I thought it was already obvious from my previous message that the data _is_ UTF-8 encoded.

I suppose my previous message wasn't eloquent enough, so I'll reiterate.

Request:

    new Ajax.Request('/echo', {
      method: 'get',
      parameters: 'a=ß;b=ç',
      onSuccess: function(req, json) {
        // print both response headers exactly as JS sees them
        ['X-JSON', 'X-JSON-UCS'].each(function(h) {
          console.debug(h + ': ', req.getResponseHeader(h));
        });
      }
    });

"/echo" is just a simple "JSON echo" test that encodes the received query params into a JSON object.

Scraping some Gecko debugging messages:

    ...
    -1210520688[9c18b48]: nsHttpTransaction::ParseLine [X-JSON-UCS: {"a":"\u00df","_":"","b":"\u00e7"}]
    -1210520688[9c18b48]: nsHttpTransaction::ParseLine [X-JSON: {"a":"ß","_":"","b":"ç"}]
    ...
    -1210520688[9c18b48]: http response [
    -1210520688[9c18b48]: HTTP/1.1 200 OK
    ...
    -1210520688[9c18b48]: X-JSON-UCS: {"a":"\u00df","_":"","b":"\u00e7"}
    -1210520688[9c18b48]: X-JSON: {"a":"ß","_":"","b":"ç"}
    ...
    -1210520688[9c18b48]: ]
    ...

This proves Gecko properly processes HTTP headers (at the nsHttp level) ... BUT! ... checking the response headers (with anything JS based, e.g. FireBug, LiveHTTPHeaders, etc.), I get:

    X-JSON-UCS: {"a":"\u00df","_":"","b":"\u00e7"}
    X-JSON: {"a":"Ã","_":"","b":"ç"}

Which is obviously wrong (the 'X-JSON' sample) :(

--
Marius Feraru
Martin Bialasinski
2006-Sep-20 15:17 UTC
Re: How to deal with Unicode chars inside X-JSON HTTP header (besides UCS)?
I am sorry, I did not understand you correctly. And I also have to revise my understanding of the matter.

RFC 2616, Section 2.2 says:

    The TEXT rule is only used for descriptive field contents and values
    that are not intended to be interpreted by the message parser. Words
    of *TEXT MAY contain characters from character sets other than
    ISO-8859-1 [22] only when encoded according to the rules of
    RFC 2047 [14].

        TEXT = <any OCTET except CTLs, but including LWS>

What seems to happen is that Firefox assumes the X-JSON header contains an ISO-8859-1 encoded value and performs an ISO-8859-1 -> UTF-8 conversion. So multi-byte code points don't work.

I tried quoted-printable but had no luck, but it might be just me. And there is also the question of how the various browsers handle this.
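The conversion described above can be reproduced outside the browser. The following is only a sketch of the suspected behaviour, not Gecko's actual code: the helper name `misreadUtf8AsLatin1` is made up for illustration, and it assumes the header travels as UTF-8 bytes that the client then reads one byte per character.

```javascript
// Hypothetical helper (not part of Prototype or Gecko): mimics a client
// that receives UTF-8 bytes but decodes them as ISO-8859-1.
function misreadUtf8AsLatin1(text) {
  // encodeURIComponent() emits the UTF-8 bytes as %XX escapes;
  // unescape() turns each %XX into the character with that code,
  // which is what an ISO-8859-1 decoder does with a single byte.
  return unescape(encodeURIComponent(text));
}

var sent = '{"a":"ß","b":"ç"}';
var seen = misreadUtf8AsLatin1(sent);

// Each two-byte character becomes two separate characters.
console.log(sent.length, seen.length);
console.log(seen.charCodeAt(6).toString(16), seen.charCodeAt(7).toString(16));
```

Under that assumption the "ß" arrives in JS as the two code units 0xC3 and 0x9F, which matches the garbled X-JSON value reported earlier in the thread.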
Marius Feraru
2006-Sep-20 16:53 UTC
Re: How to deal with Unicode chars inside X-JSON HTTP header (besides UCS)?
Martin Bialasinski wrote:
> I tried quoted printable but had no luck, but it might be just me.

It's useless to try encoding HTTP headers in any way, as no HTTP spec (rfc2616, rfc1945, errata, etc.) specifies any encoding as necessary. More than that, I already described that Gecko parses the headers correctly (nsHttp module); it's just some other part of their toolkit that messes with those headers before letting JS use them. :(

> And there is also the question how the various browsers handle this.

OK, here's a quick test:

    Mozilla Firefox 1.5.0.7 (Gecko/20060913)   - Fail
    Mozilla Firefox 2.0b2 (Gecko/20060821)     - Fail
    Mozilla SeaMonkey 1.0.5 (Gecko/20060910)   - Fail
    Internet Explorer 6                        - OK
    Opera 9.01                                 - Fail (Gecko-like)
    Konqueror 3.5.4                            - Fail (Gecko-like)

--
Marius Feraru
Martin Bialasinski
2006-Sep-20 19:40 UTC
Re: How to deal with Unicode chars inside X-JSON HTTP header (besides UCS)?
On 9/20/06, Marius Feraru <altblue-9gptZ63fvgw@public.gmane.org> wrote:
> It's useless to try encoding HTTP headers in any way, as no HTTP spec
> (rfc2616, rfc1945, erratas, etc) specify any encoding to be necessary.

But RFC 2616 says

    Words of *TEXT MAY contain characters from character sets other than
    ISO-8859-1 only when encoded according to the rules of RFC 2047

Doesn't that mean that one may use quoted-printable encoding as described in RFC 2047?

> More than that, I already described that Gecko parses headers correctly
> (nsHttp module), it's just some other part of their toolkit that messes
> with those headers before letting JS using them. :(

Just to clear this up:

Sending the ß character

    X-JSON (utf-8 encoded value)                = C3 9F
    X-JSON-UCS (equals ISO-8859-1 in this case) = DF

Firefox has to assume both headers contain ISO-8859-1 encoded values

    X-JSON     = 2 character string C3 9F = "Ã" plus the control character U+009F in ISO-8859-1
    X-JSON-UCS = 1 character string DF = ß in ISO-8859-1

Pass the values to JS as

    X-JSON     = "\u00C3\u009F" (two characters, not ß)
    X-JSON-UCS = "\u00DF" = ß

This seems consistent with the observations and with what it is supposed to do according to the RFC.

Now, as to the debugger messages, as in

    -1210520688[9c18b48]: X-JSON: {"a":"ß","_":"","b":"ç"}

Firefox expects the header to be ISO-8859-1 and it cannot know that the data is actually UTF-8 encoded. Therefore it cannot know that C3 9F is ß. So how can it show "ß" in the above output?

It looks like it is just the display system that assumes UTF-8 data, gets C3 9F and displays a "ß", whereas Firefox internally deals with the byte sequence that it assumes to be an ISO-8859-1 encoded string.

Your tests show that IE6 correctly retrieves a UTF-8 encoded header? Then it should wrongly decode the ISO-8859-1 equivalent value in the X-JSON-UCS header, as either C3 9F or DF can be "ß", but not both, no?

Looks like one needs a function that takes the result from getResponseHeader() and reconstructs the UTF-8 encoded string from the bytes returned by charCodeAt().
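A function of the kind suggested above could look roughly like this. It is a sketch only: `utf8FromByteString` is a hypothetical name, `String.fromCodePoint` is assumed to be available, and the decoder is deliberately lenient (it does not reject overlong encodings or surrogate code points). Whenever the input does not look like a string of UTF-8 bytes, it is returned unchanged.

```javascript
// Hypothetical helper: treats every character code as one byte and
// decodes the resulting byte sequence as UTF-8.
function utf8FromByteString(s) {
  var out = '';
  var i = 0;
  while (i < s.length) {
    var b = s.charCodeAt(i++);
    if (b > 0xFF) return s;                 // not a byte string at all
    var extra, cp;
    if (b < 0x80) { extra = 0; cp = b; }                          // 1 byte (ASCII)
    else if ((b & 0xE0) === 0xC0) { extra = 1; cp = b & 0x1F; }   // 2-byte lead
    else if ((b & 0xF0) === 0xE0) { extra = 2; cp = b & 0x0F; }   // 3-byte lead
    else if ((b & 0xF8) === 0xF0) { extra = 3; cp = b & 0x07; }   // 4-byte lead
    else return s;                          // stray continuation or invalid lead
    while (extra-- > 0) {
      var c = s.charCodeAt(i++);            // NaN when the input ends too early
      if (c > 0xFF || (c & 0xC0) !== 0x80) return s;
      cp = (cp << 6) | (c & 0x3F);          // append 6 payload bits
    }
    if (cp > 0x10FFFF) return s;            // outside the Unicode range
    out += String.fromCodePoint(cp);
  }
  return out;
}

// "\u00C3\u009F" is what JS receives for a UTF-8 encoded "ß".
console.log(utf8FromByteString('{"a":"\u00C3\u009F"}'));
```

Returning the input untouched on any inconsistency keeps the helper harmless for browsers that already deliver a correctly decoded header.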
Marius Feraru
2006-Sep-20 22:46 UTC
Re: How to deal with Unicode chars inside X-JSON HTTP header (besides UCS)?
> Your tests show that IE6 correctly retrieves a UTF-8 encoded header?
> Then it should wrongly decode the ISO-8859-1 equivalent value in the
> X-JSON-UCS header as either C3 9F or DF can be "ß", but not both, no?

When I said UCS encoded, I really meant that :))

Remember this scrap from the Gecko debug output:

    -1210520688[9c18b48]: http response [
    -1210520688[9c18b48]: HTTP/1.1 200 OK
    ...
    -1210520688[9c18b48]: X-JSON-UCS: {"a":"\u00df","_":"","b":"\u00e7"}
    -1210520688[9c18b48]: X-JSON: {"a":"ß","_":"","b":"ç"}
    ...
    -1210520688[9c18b48]: ]

These are the "raw" headers; it's exactly what was received.

> Sending the ß character X-JSON (utf-8 encoded value) = C3 9F
> X-JSON-UCS (equals ISO-8859-1 in this case) = DF

No, the X-JSON-UCS value is 6 (six) bytes: "\", "u", "0", "0", "d" and "f".

Following your thoughts, it looks like IE doesn't assume the data is ISO-8859-1 (or anything else); it just passes the raw header on to JS.

--
Marius Feraru
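For readers unsure what the "UCS" form looks like in practice, here is a minimal sketch of producing it. The thread does not show the server side, so this is an assumption written in JavaScript for illustration; `toAsciiJson` is a made-up name.

```javascript
// Hypothetical helper: serializes a value to JSON and replaces every
// non-ASCII UTF-16 code unit with a literal six-character \uXXXX escape,
// so the header value is pure ASCII.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16);
    return '\\u' + ('0000' + hex).slice(-4); // left-pad to four hex digits
  });
}

console.log(toAsciiJson({ a: 'ß', b: 'ç' }));
```

Because the result contains only ASCII characters, it survives any ISO-8859-1 or UTF-8 interpretation of the header unchanged, and the escapes are resolved when the JSON is evaluated.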
Marius Feraru
2006-Sep-21 01:30 UTC
Re: How to deal with Unicode chars inside X-JSON HTTP header (besides UCS)?
Martin Bialasinski wrote:
> Looks like one needs a function that takes the result from
> getResponseHeader() and reconstructs the UTF-8 encoded string from the
> bytes returned by charCodeAt().

Based on your observations and following this suggestion, I'll settle this discussion with something like "OK, if it's bytes they get, bytes I'll handle".

So, I'll just drop UCS (6 times longer isn't nice, after all) and keep on sending proper UTF-8 data, which I'll handle in JS by decoding those bytes back into the original characters.

I'll be overriding Prototype's "evalJSON" with this one:

    Ajax.Request.prototype.evalJSON = function() {
      var json = this.header('X-JSON') || '{}';
      // escape() turns each byte-valued character into %XX;
      // decodeURIComponent() then reads those bytes as UTF-8
      var jutf = decodeURIComponent(escape(json));
      // a shorter result means multi-byte sequences were collapsed
      if (jutf.length < json.length) {
        json = jutf;
      }
      try {
        return eval('(' + json + ')');
      } catch (e) {
        return {};
      }
    };

cheers
--
Marius Feraru
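One caveat with the escape()/decodeURIComponent() trick: decodeURIComponent() throws a URIError when the escaped bytes are not valid UTF-8, which can happen if a browser hands over an already-decoded string (a lone "ß" escapes to %DF). A standalone variant that guards against this might look like the sketch below; `decodeHeaderUtf8` is a hypothetical name and is not part of Prototype.

```javascript
// Hypothetical standalone variant of the decoding step: returns the
// UTF-8 decoded header when that works, the raw value otherwise.
function decodeHeaderUtf8(raw) {
  try {
    // escape() yields %XX for code units up to 0xFF;
    // decodeURIComponent() interprets those %XX bytes as UTF-8.
    var decoded = decodeURIComponent(escape(raw));
    // Only accept the result if multi-byte sequences were collapsed.
    return decoded.length < raw.length ? decoded : raw;
  } catch (e) {
    // Malformed UTF-8 (URIError): assume the value was already decoded.
    return raw;
  }
}

console.log(decodeHeaderUtf8('{"a":"\u00C3\u009F"}')); // bytes misread as Latin-1
console.log(decodeHeaderUtf8('{"a":"\u00DF"}'));       // already a proper "ß"
```

Both calls should yield the same JSON text containing a real "ß", so the same handler can be used whether or not the browser mangles the header.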
Martin Bialasinski
2006-Sep-22 13:03 UTC
Re: How to deal with Unicode chars inside X-JSON HTTP header (besides UCS)?
Darn, I did not realise it was a literal \uXXXX. Too much exposure to viewers / editors that display the code for control characters they cannot show, I guess...

On 9/21/06, Marius Feraru <altblue-9gptZ63fvgw@public.gmane.org> wrote:
> So, I'll just drop UCS (6 times longer

Three times for "\u00DF" vs. 0xC3 0x9F, but still :-)

>   var jutf = decodeURIComponent(escape(json));
>   if (jutf.length < json.length) {
>     json = jutf;
>   }

Wow. Very, very clever!
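The size comparison can be checked with a few lines of JavaScript. This is an illustrative sketch only; `utf8ByteLength` is a made-up helper, and the six-character figure assumes one \uXXXX escape per character, which holds for characters in the Basic Multilingual Plane.

```javascript
// Hypothetical helper: number of bytes a string occupies in UTF-8.
function utf8ByteLength(text) {
  // encodeURIComponent() writes each UTF-8 byte as %XX and
  // unescape() collapses each %XX into a single character.
  return unescape(encodeURIComponent(text)).length;
}

var ESCAPE_LENGTH = 6; // "\", "u" and four hex digits

['ß', 'ç', '€'].forEach(function (ch) {
  var bytes = utf8ByteLength(ch);
  console.log(ch, 'UTF-8 bytes:', bytes, 'escape/UTF-8 ratio:', ESCAPE_LENGTH / bytes);
});
```

For two-byte characters such as "ß" the escaped form is three times as long; for three-byte characters the ratio drops to two.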