Question: Hi, our company is using Ruby 1.8.6 with Rails 2.2.2. Does anyone know how we can explicitly specify what encoding to use when calling .to_json on an ActiveRecord object?

Description: We have some multibyte characters in our database. For example, we have a table with a name column that contains this French accented e: Café Records. When we serialize this object using ActiveRecord's to_xml() everything looks fine in the browser and with our json objects.

When we render JSON using to_json() we are seeing problems where the accented 'e' character is getting mangled, which causes our calling web client to fail since it's expecting properly UTF-8 encoded characters.

If we use the browser to submit an HTTP GET requesting JSON format, save the file and view it in binary mode in hexadecimal representation, this is what we get. It looks like this is using extended ASCII.

Bytes             Text
43 61 66 E9       C a f (should be accented e, but we get a weird unprintable block character)

If we save that same file and convert it to UTF-8, we get an extra byte that seems to be proper UTF-8 encoding, as shown below.

Bytes             Text
43 61 66 C3 A9    C a f é (correctly get accented e)

Can someone tell me how they've made to_json() UTF-8 compliant?

Thanks in advance, Calvin.
--
Posted via http://www.ruby-forum.com/.
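For reference, the two byte sequences in the question can be reproduced in a console. This is only a sketch: it assumes the mangled bytes are ISO-8859-1 (Latin-1), which is what the single byte E9 for é suggests, and hex_dump and latin1_to_utf8 are hypothetical helper names, not part of Rails or Ruby. It relies on the fact that Latin-1 byte values are identical to the first 256 Unicode code points, so it would not be correct for Windows-1252 bytes in the 0x80-0x9F range. On Ruby 1.8, Iconv from the standard library is the more general tool for this kind of conversion.

```ruby
# Sketch only: treats every input byte as an ISO-8859-1 code point.
# hex_dump and latin1_to_utf8 are hypothetical helpers for illustration.

# Render a string's raw bytes as space-separated uppercase hex.
def hex_dump(str)
  str.unpack('C*').map { |b| '%02X' % b }.join(' ')
end

# Re-encode Latin-1 bytes as UTF-8: read each byte as an integer,
# then pack each integer as a UTF-8 encoded code point.
def latin1_to_utf8(str)
  str.unpack('C*').pack('U*')
end

latin1 = [0x43, 0x61, 0x66, 0xE9].pack('C*')  # "Caf" plus Latin-1 e-acute
utf8   = latin1_to_utf8(latin1)

puts hex_dump(latin1)  # 43 61 66 E9
puts hex_dump(utf8)    # 43 61 66 C3 A9
```

If the name attribute really does hold Latin-1 bytes, converting it along these lines before serializing is one possible way to get UTF-8 into the JSON output; whether that is the right place to fix it depends on how the database connection is configured.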
On Aug 10, 10:02 pm, Calvin Nguyen <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> When we render JSON using to_json() we are seeing problems where the
> accented 'e' character is getting mangled and causes our calling web
> client to fail since it's expecting properly UTF-8 encoded characters.

What's actually stored in the database? If you open up a console and find the relevant object in the database, what does the name attribute contain? I don't think that to_json does much more than spit out the data ActiveRecord already has.

Fred
Frederick Cheung wrote:
> What's actually stored in the database? If you open up a console and
> find the relevant object in the database, what does the name attribute
> contain? I don't think that to_json does much more than spit out the
> data ActiveRecord already has.
>
> Fred

Here is our record in the database:

322    Café Records    NULL

In debugging we had to set up a web client to stream the bytes, and it results in what I summarized previously:

Bytes             Text
43 61 66 E9       C a f (should be accented e, but we get a weird unprintable block character)
On Aug 10, 11:55 pm, Calvin Nguyen <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> Here is our record in the database:
> 322 Café Records NULL

That's not what I meant. What are the actual bytes stored in the database? What encoding does the database think this column is in? If, in a ruby console, you inspect the bytes contained in the name column, what do you see?

Fred
Hi Fred, I appreciate the reply. We are using SQL Server 2005 and our database record looks like this:

name            cast(name as varbinary)
Café Records    0x436166E9205265636F726473

If I use the ruby console and print each byte, I get this. There must be a better way to show the byte stream in Ruby 1.8.6 than this...

>> l.name[0]
=> 67
>> l.name[1]
=> 97
>> l.name[2]
=> 102
>> l.name[3]
=> 233
>> l.name[4]
=> 32

The database table defines the name column as varchar(255). I cannot find the column-level encoding, but I read (http://books.google.com/books?id=_8t73M1r71sC&pg=PA418&lpg=PA418&dq=encoding+sql+server+2005&source=bl&ots=hT6RZJyXl-&sig=KzP2kG73Md7-N5oZviZZ9Xie2PY&hl=en&ei=wcuASu2AIo3gswPvgu32CA&sa=X&oi=book_result&ct=result&resnum=8#v=onepage&q=&f=false) that SQL Server 2005 uses Unicode by default (it doesn't say which Unicode, though). I believe that when you specify a column as nvarchar, it defaults to UCS-2?
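On the question of a more compact way to show the byte stream: String#unpack can dump every byte at once instead of indexing one position at a time. The sketch below builds the string from the byte values reported above (the first five come from the console output, the rest are read off the varbinary value), so it does not need a database connection; in a real console, name would simply be l.name.

```ruby
# Stand-in for l.name: the bytes from the varbinary column shown above.
name = [67, 97, 102, 233, 32, 82, 101, 99, 111, 114, 100, 115].pack('C*')

# All byte values as integers in one call.
byte_values = name.unpack('C*')

# The same bytes as one hex string, comparable to SQL Server's varbinary output.
hex_string = name.unpack('H*').first.upcase

puts byte_values.inspect
puts "0x" + hex_string
```

The fourth byte is a lone 233 (0xE9). In UTF-8, é would be the two bytes C3 A9, so a single E9 indicates the string reaching Ruby is in a single-byte encoding such as Latin-1 or Windows-1252, not UTF-8.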