Steven G. Harms
2008-Jan-05 20:18 UTC
[Repost, with Formatting] Trying to understand unicode character entry, goes into postgres DB backing rails, saved to yaml as \xc4\x81
Apologies for the unformatted message I sent earlier; I hit Enter and the web UI posted it, to my chagrin.

1. Look up "Latin small letter a with macron" in the Unicode standard's collection of code charts.
2. This leads to the chart U0100.pdf.
3. "Latin small letter a with macron" appears on that chart as 0101. This is a hexadecimal number, so its code point is U+0101. Converting 0x0101 to decimal gives 257, which is the same number used in the HTML numeric entity: &#257; renders as ā, and 257 != 325. OK, so I can link this character back to its Unicode source. But here's the question: what's up with the two seemingly broken byte values in step 6?
4. Enter the ā (&#257;) character through a view in a Rails app that is backed by a PostgreSQL database.
5. Using script/console, write the collection of models that contain this accented character to a YAML file.
6. In the YAML dump of accented characters, "Latin small letter a with macron" is stored as \xC4\x81. Hm, OK, that's a start: somehow 0101 (257) is linked to C4 81. Let's convert those two bytes to decimal and see whether a correlation becomes clear (I know, by the way, that the database holding that entry is UTF-8): C4 = 196, 81 = 129, and 196 + 129 = 325, which is not 257 (0x0101). Hm, look at the documentation.
7. Be stumped.

---

I'm building an application that works with foreign languages, and I'm trying to make it easy to enter accented characters. I saved some base data that I had entered as a fixture (so that I could reload it as a sample when needed), and I noticed that in this YAML file my accented characters appear in an unusual \x##\x## format that bears little relation to the code points I've seen in code point charts.

I've always been scared to jump into the "How does Unicode work, really?" discussion, but maybe it's time I tried to sort it out a bit. No doubt people from more multilingual environments understand this much better than those of us in North America, so I'm hoping this is a lot easier than I think!
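A minimal Python sketch of how the numbers in steps 3 and 6 relate, assuming the text is stored as UTF-8 (as the database is described above). The function names `describe` and `decode_two_byte_utf8` are hypothetical, made up for this illustration. The point it shows: a two-byte UTF-8 sequence is not decoded by adding the bytes together. Instead, the marker bits (110 at the top of the first byte, 10 at the top of the second) are masked off and the remaining 5 + 6 payload bits are concatenated, which turns C4 81 back into 257.

```python
# Sketch: relate one Unicode code point (U+0101, "Latin small letter a with macron")
# to its decimal value, its HTML numeric entity, and its UTF-8 bytes.


def describe(ch: str) -> dict:
    """Return the different notations for a single character (hypothetical helper)."""
    cp = ord(ch)               # the code point as an integer
    utf8 = ch.encode("utf-8")  # the bytes a UTF-8 database actually stores
    return {
        "code_point": f"U+{cp:04X}",  # chart notation, e.g. U+0101
        "decimal": cp,                # 0x0101 == 257
        "html_entity": f"&#{cp};",    # HTML numeric character reference
        # Byte-by-byte escape, the same shape as \xC4\x81 in the YAML dump
        "utf8_escaped": "".join(f"\\x{b:02X}" for b in utf8),
    }


def decode_two_byte_utf8(b1: int, b2: int) -> int:
    """Rebuild a code point from a 2-byte UTF-8 sequence: 110xxxxx 10yyyyyy.

    Hypothetical helper for illustration; it does not reject overlong forms.
    """
    if b1 & 0b1110_0000 != 0b1100_0000:
        raise ValueError(f"0x{b1:02X} is not a 2-byte UTF-8 lead byte")
    if b2 & 0b1100_0000 != 0b1000_0000:
        raise ValueError(f"0x{b2:02X} is not a UTF-8 continuation byte")
    # Keep the low 5 bits of the lead byte and the low 6 bits of the
    # continuation byte, then concatenate them (shift and OR, not add).
    return ((b1 & 0b0001_1111) << 6) | (b2 & 0b0011_1111)


if __name__ == "__main__":
    for key, value in describe("\u0101").items():
        print(f"{key:>13}: {value}")
    # Summing the bytes (0xC4 + 0x81) gives 325, which has no meaning;
    # bit-level decoding recovers the real code point.
    print("sum of bytes :", 0xC4 + 0x81)
    print("decoded      :", decode_two_byte_utf8(0xC4, 0x81))
```

Run as a script, this prints U+0101, 257, &#257; and \xC4\x81 for ā, then shows that summing the two bytes gives 325 while decoding them gives 257. Python's built-in `str.encode("utf-8")` does the real encoding; the hand-written decoder only covers the two-byte case (code points U+0080 through U+07FF), which is enough for ā.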
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.