Hi,

I have a load of records in my database which were imported by processing a YAML file. The original YAML files were created by calling 'to_yaml' on an array of Hash objects.

The YAML file contains multibyte characters written as escape sequences, such as:

...and between them and today\xE2\x80\x99s College. The scope, r...

When I imported this data into my DB, these escape sequences changed form but are still there in the DB:

...and between them and today\342\200\231s College. The scope, r...

So I have two questions:

1) Are the original characters retrievable from the copy in the DB, or have they been mangled?

2) If the answer to 1) is yes, then how?

Really appreciate any help on this one. Many thanks in advance.

~ Mark

-- 
Posted via http://www.ruby-forum.com/.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
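The two escape forms in the question spell the same three bytes: \xE2\x80\x99 is hexadecimal notation and \342\200\231 is octal notation (0xE2 = 0342 = 226, 0x80 = 0200 = 128, 0x99 = 0231 = 153). Decoded as UTF-8, those bytes are a single character, U+2019 RIGHT SINGLE QUOTATION MARK. A minimal Ruby sketch that checks this, using the sample text from the post:

```ruby
# The YAML file showed the bytes as hex escapes; the DB output showed octal.
hex_form   = "today\xE2\x80\x99s".b
octal_form = "today\342\200\231s".b

# Same bytes either way: 0xE2 = 0342, 0x80 = 0200, 0x99 = 0231
puts hex_form == octal_form           # => true
p hex_form.bytes[5, 3]                # => [226, 128, 153]

# Read as UTF-8, those three bytes form one character, U+2019
decoded = hex_form.dup.force_encoding(Encoding::UTF_8)
puts decoded.valid_encoding?          # => true
puts decoded.length                   # => 7
puts format("U+%04X", decoded[5].ord) # => U+2019
```

So, as long as nothing else rewrote the bytes on the way in, the copy in the DB holds the same UTF-8 data and only the notation used to display it differs.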
Mark Dodwell wrote:
> 1) Are the original characters retrievable from the copy in the DB, or
> has it been mangled?

What's the encoding of the YAML file (presumably UTF-8), which database are you using, and what encoding is your database/table set to?

-- 
Michael Wang
Hi Michael,

The DB is 'ISO Latin 1 (latin1)' encoding.

I'm not sure about the original YAML file (do you know the default encoding for .to_yaml?), but when I open it directly with, say, TextMate, it shows the escape sequence, *not* the actual character.

Thanks,

~ Mark
Mark Dodwell wrote:
> The DB is 'ISO Latin 1 (latin1)' encoding.

MySQL, if that's what you are using, lets you set the character encoding at several different levels (server, database, table, column). If you are using MySQL, you could try something like an ALTER TABLE to change the encoding to UTF-8 (which I'm guessing is what the original YAML data is in). You might have to export the data and import it into a table that's already set to UTF-8, though; in that case, if you still have all the YAML data around, it might be easier just to reload it with the table set to the proper encoding.

http://dev.mysql.com/doc/refman/5.0/en/charset.html

-- 
Michael Wang
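Whether changing the encoding helps depends on what happens to the bytes when the label changes. The Ruby sketch below illustrates the byte-level effect only, not any particular MySQL statement, and assumes the latin1 column still holds the UTF-8 bytes unchanged. It shows that relabelling those bytes as UTF-8 recovers the text, while a conversion that treats each byte as a separate Latin-1 character produces double-encoded UTF-8. Which of the two a given ALTER TABLE performs is worth checking in the MySQL manual linked above before running it on real data.

```ruby
# Assumption: the latin1 column holds the UTF-8 bytes unchanged, as the
# \342\200\231 escape in the first post suggests.
original = "today\u2019s"             # the text the YAML file meant
stored   = original.b                 # the raw bytes as they sit in the column

# Read as Latin-1, each byte becomes its own character: 9 characters, not 7.
as_latin1 = stored.dup.force_encoding(Encoding::ISO_8859_1)
puts as_latin1.length                 # => 9

# Transcoding those Latin-1 characters to UTF-8 re-encodes every high byte
# ("double encoding"): the 3 bytes of the quote mark become 6.
double_encoded = as_latin1.encode(Encoding::UTF_8)
puts double_encoded.bytesize          # => 12

# Relabelling the untouched bytes as UTF-8 recovers the original text...
relabelled = stored.dup.force_encoding(Encoding::UTF_8)
puts relabelled == original           # => true

# ...and a double-encoded copy can be undone by reversing the trip.
repaired = double_encoded.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
puts repaired == original             # => true
```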
Thanks for the help, Michael. Eventually I managed to sort this out without having to reimport. I thought I'd post how, in case somebody else gets stuck in a similar way.

    require 'iconv'  # part of the Ruby 1.8/1.9 standard library

    # Model stands for the ActiveRecord class whose :content column was affected
    Model.find_all.each do |m|
      # Iconv.conv returns a String (Iconv.iconv returns an Array); //TRANSLIT
      # approximates characters that have no ISO-8859-1 equivalent
      content = Iconv.conv('ISO-8859-1//TRANSLIT', 'UTF-8', m.content)
      m.update_attribute(:content, content)
    end

This translated the UTF-8 encoded characters into their closest ISO-8859-1 equivalents. They are not the exact characters, obviously, but close approximations.

~ Mark
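Iconv was removed from Ruby's standard library in Ruby 2.0. On a current Ruby, a rough equivalent of the conversion above is String#encode with a :fallback option. String#encode does not transliterate by itself, so the APPROXIMATIONS table and the to_latin1 helper here are hypothetical stand-ins for //TRANSLIT, not the mapping Iconv actually uses; this is a minimal sketch, with no ActiveRecord code.

```ruby
# Hypothetical approximations for common "smart" punctuation; extend it for
# whatever characters actually appear in your data.
APPROXIMATIONS = {
  "\u2018" => "'", "\u2019" => "'",   # curly single quotes
  "\u201C" => '"', "\u201D" => '"',   # curly double quotes
  "\u2013" => "-", "\u2014" => "-"    # en and em dashes
}.freeze

# Convert UTF-8 text to ISO-8859-1. Characters Latin-1 already has (such as
# "é") pass through; ones it lacks are looked up in the table, and anything
# not in the table becomes "?" instead of raising UndefinedConversionError.
def to_latin1(utf8_text)
  utf8_text.encode(Encoding::ISO_8859_1,
                   fallback: ->(char) { APPROXIMATIONS.fetch(char, "?") })
end

result = to_latin1("today\u2019s College")
puts result            # => today's College
puts result.encoding   # => ISO-8859-1
```

A lambda is used for the fallback so that unknown characters get a default replacement; a plain Hash would cover only the characters listed in it.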