This is not a question but a report on the difficulties I had, and the solution I found, with respect to UTF-8, YAML::load, and Ruby/Rails. Comments are appreciated.

- - -

I had been struggling for two days to get UTF-8 working in my Rails app. I have a localization file, lib\locale\de.yml, that had iso-8859-1 encoding. I could not get that to display properly.

Marnen, quite correctly, suggested that I transition to UTF-8. Of course, I had tried to do that, but I could not get the YAML localization file to load.

What I had done was load the ANSI (i.e. iso-8859-1) localization file into Notepad, convert it to UTF-8, and save that file. Then all my German (de.yml) localizations failed.

It turns out that Notepad places "\xEF\xBB\xBF" at the beginning of the file to indicate that this is a YAML file. These three bytes appear to screw up YAML::load.

Gimme a break!

Not only does Notepad put in these indicator bytes ... so does TextMate. In fact, TextMate will happily determine that your non-"\xEF\xBB\xBF" file is a UTF-8 file and will automatically reinsert the indicator bytes.

I find this rather hysterical (not in a good way) since in http://blog.macromates.com/2005/handling-encodings-utf-8/ one of the authors of TextMate wrote "Property 3 turns out to be attractive because it means we can heuristically recognize UTF-8 with a near 100% certainty by checking if the file is valid. Some software think it’s a good idea to embed a BOM (byte order mark) in the beginning of an UTF-8 file, but it is not, because the file can already be recognized, and placing a BOM in the beginning of a file means placing three bytes in the beginning of the file which a program that use the file may not expect...".

How thoughtful that TextMate does what the article says it should not do. If there is a way to turn off that behavior, I can't find it. Maybe there's a TextMate bundle ... who knows?

In order to get YAML::load to load the localization, I have to remove the three indicator bytes. Yuck!
Once I did that, YAML loads happily.

- - - - - - - - -

If you store your locales in lib/locale and you use the AVAILABLE_LOCALES idiom as suggested in http://rails-i18n.org/wiki/pages/i18n-available_locales then you can use this in config\initializers\available_locales.rb

- - -
# See http://guides.rubyonrails.org/i18n.html
#
# Get loaded locales conveniently
# See http://rails-i18n.org/wiki/pages/i18n-available_locales
module I18n
  class << self
    def available_locales; backend.available_locales; end
  end
  module Backend
    class Simple
      def available_locales; translations.keys.collect { |l| l.to_s }.sort; end
    end
  end
end

# You need to "force-initialize" loaded locales
I18n.backend.send(:init_translations)

AVAILABLE_LOCALES = I18n.backend.available_locales
RAILS_DEFAULT_LOGGER.debug "* Loaded locales: #{AVAILABLE_LOCALES.inspect}"

# Shnelvar: Remove UTF-8 indicator bytes so that YAML::load works
AVAILABLE_LOCALES.each do |localization_name|
  # localization_name is, e.g. "de"
  localization_name_dot_yml = localization_name + '.yml'
  localization_file_name = File.join('lib/locale', localization_name_dot_yml)
  yaml_str = IO.read(localization_file_name)
  utf_8__3_byte_indicator = "\xEF\xBB\xBF"
  # Compare the first three bytes of the file against the indicator
  if yaml_str[0..2] == utf_8__3_byte_indicator
    # Drop the indicator and write the rest of the file back in place
    yaml_str = yaml_str[3...yaml_str.size]
    File.open(localization_file_name, "w") { |f| f << yaml_str }
    puts localization_file_name + ' has had the UTF-8 indicator bytes removed'
  end
end
- - -

Suggestions and comments are welcome.

--
Posted via http://www.ruby-forum.com/.

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
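The initializer above rewrites the locale files on disk. As an alternative, here is a minimal sketch that leaves the files alone and strips the three bytes in memory just before parsing. It assumes Ruby 1.9 or later (where strings carry an encoding) and a YAML parser that does not tolerate the leading bytes, as in the report above. The names strip_utf8_bom and load_locale_yaml are hypothetical helpers, not part of Rails or the I18n gem.

```ruby
require 'yaml'

# The three indicator bytes from the post, as a binary (ASCII-8BIT) string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: returns the text without a leading UTF-8 BOM.
# The comparison is done on raw bytes so it does not depend on the
# encoding the string happened to be read with.
def strip_utf8_bom(raw)
  bytes = raw.b
  bytes = bytes.byteslice(3..-1) if bytes.start_with?(UTF8_BOM)
  bytes.force_encoding(Encoding::UTF_8)
end

# Hypothetical helper: reads a locale file as bytes, strips the BOM if
# present, and hands the remaining text to the YAML parser.
def load_locale_yaml(path)
  YAML.load(strip_utf8_bom(File.binread(path)))
end
```

Something like load_locale_yaml('lib/locale/de.yml') would then return the parsed hash whether or not the editor added the three bytes. Whether this can be hooked into the I18n backend's own loading depends on the Rails version, so treat it as a standalone illustration.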
> What I had done was load the ANSI (i.e. iso-8859-1) localization file
> into Notepad, convert it to UTF-8, and save that file.
<…>
> It turns out that Notepad places "\xEF\xBB\xBF" at the beginning of the
> file to indicate that this is a YAML file.

This is not to indicate a YAML file (I doubt Notepad knows what YAML is at all). This is the byte order mark (BOM): http://en.wikipedia.org/wiki/Byte-order_mark

> Gimme a break!
>
> Not only does Notepad put in these indicator bytes ... so does
> TextMate.
<…>
> How thoughtful that TextMate does what the article says it should not
> do. If there is a way to turn off that behavior, I can't find it.
> Maybe there's a TextMate bundle ... who knows?

Really? I have never seen TextMate do that. Are you sure you did not just load a file that was saved elsewhere with a BOM?

Regards,
Rimantas
--
http://rimantas.com/
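To make the connection between the byte order mark and the three bytes concrete: the BOM is the single code point U+FEFF, and "\xEF\xBB\xBF" is simply its UTF-8 encoding. A small sketch, assuming Ruby 1.9 or later; hex_bytes is a hypothetical helper introduced only for display.

```ruby
# U+FEFF is the byte order mark code point.
bom = "\uFEFF"

# Hypothetical helper: formats the bytes of a string as hex pairs.
def hex_bytes(str)
  str.bytes.map { |b| format('%02X', b) }.join(' ')
end

puts hex_bytes(bom.encode('UTF-8'))     # EF BB BF
puts hex_bytes(bom.encode('UTF-16BE'))  # FE FF
puts hex_bytes(bom.encode('UTF-16LE'))  # FF FE
```

In UTF-16 the mark distinguishes big-endian from little-endian byte order. UTF-8 has no byte order to signal, so there the three bytes act only as an encoding signature.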
>> How thoughtful that TextMate does what the article says it should not
>> do. If there is a way to turn off that behavior, I can't find it.
>> Maybe there's a TextMate bundle ... who knows?
>
> Really? I have never seen TextMate do that. Are you sure you did not
> just load a file that was saved elsewhere with a BOM?

Yes ... absolutely certain.

I use a hex editor to remove the BOM ... resave.

I examine the file with another hex editor ... the BOM is not there.

I go into TextMate ... load the file ... resave ... and the BOM reappears.

This only happens if TextMate detects UTF-8 characters in the file.
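The hex-editor check described above can also be scripted, which makes it easy to repeat after every save. A minimal sketch, assuming Ruby 1.9 or later and the lib/locale layout from the first post; utf8_bom? is a hypothetical helper name.

```ruby
# Hypothetical helper: true if the file at +path+ begins with the
# three bytes EF BB BF. Only the first three bytes are read.
def utf8_bom?(path)
  File.binread(path, 3) == "\xEF\xBB\xBF".b
end

# Report the state of every locale file (directory assumed from the
# first post in this thread).
Dir.glob('lib/locale/*.yml').sort.each do |path|
  puts "#{path}: #{utf8_bom?(path) ? 'BOM present' : 'no BOM'}"
end
```

Running this before and after a save in the editor shows which step puts the bytes back.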
Ralph Shnelvar wrote:
>>> How thoughtful that TextMate does what the article says it should not
>>> do. If there is a way to turn off that behavior, I can't find it.
>>> Maybe there's a TextMate bundle ... who knows?
>>
>> Really? I have never seen TextMate do that. Are you sure you did not
>> just load a file that was saved elsewhere with a BOM?
>
> Yes ... absolutely certain.
>
> I use a hex editor to remove the BOM ... resave.
>
> I examine the file with another hex editor ... the BOM is not there.
>
> I go into TextMate ... load the file ... resave ... and the BOM
> reappears.
>
> This only happens if TextMate detects UTF-8 characters in the file.

Is there a setting to save as "UTF-8 without BOM" or something?

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org
Marnen Laibow-Koser wrote:
> Is there a setting to save as "UTF-8 without BOM" or something?

As I said earlier, if there is a setting, I can't find it.

TextMate has things called "bundles". These are mini-applications that can be integrated into TextMate. Someone, somewhere may have figured out how to do it.

What UTF-8-compliant editor do you use, Marnen?
> Marnen Laibow-Koser wrote:
>> Is there a setting to save as "UTF-8 without BOM" or something?
>
> As I said earlier, if there is a setting, I can't find it.

TextMate does not save a BOM for UTF-8 files.

Just choose "Save As", UTF-8, and that's it.

Regards,
Rimantas
--
http://rimantas.com/
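For anyone whose editor offers no such option, the conversion from the first post (iso-8859-1 to UTF-8) can be done outside the editor, which sidesteps the BOM question. A minimal sketch, assuming Ruby 1.9 or later; latin1_to_utf8 is a hypothetical helper. It also assumes the source really is ISO-8859-1. If Notepad's "ANSI" was actually Windows-1252, that encoding name would need to be substituted.

```ruby
# Hypothetical helper: re-encodes an ISO-8859-1 file as UTF-8.
# The output is written as raw bytes, so nothing (such as a BOM)
# is added in front of the converted text.
def latin1_to_utf8(src_path, dest_path)
  text = File.binread(src_path).force_encoding(Encoding::ISO_8859_1)
  File.binwrite(dest_path, text.encode(Encoding::UTF_8))
end
```

Writing to a separate destination file first, rather than overwriting the original, keeps the iso-8859-1 copy around in case the guess about the source encoding was wrong.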
Rimantas Liubertas wrote:
> TextMate does not save a BOM for UTF-8 files.
>
> Just choose "Save As", UTF-8, and that's it.

Oh, geez, I feel like a complete idiot ...

I am using "e" as the text editor ... which the advertising says is "textmate for windows."

Sorry! It is "e" that is saving the BOM.
Ralph Shnelvar wrote:
> Marnen Laibow-Koser wrote:
>> Is there a setting to save as "UTF-8 without BOM" or something?
>
> As I said earlier, if there is a setting, I can't find it.
>
> TextMate has things called "bundles". These are mini-applications that can
> be integrated into TextMate. Someone, somewhere may have figured out
> how to do it.
>
> What UTF-8-compliant editor do you use, Marnen?

I mostly use KomodoEdit, for whatever it's worth; also sometimes jEdit, NetBeans, TextWrangler, Eclipse/Aptana...

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org