I was searching for string encoding issues in Ruby. Here is a summary of what I learnt, in case it's useful to anyone else or if anyone has any corrections.

Ruby 1.8 support for encoding:

* A comment like "# -*- coding: utf-8 -*-" at the start of the file is supposed to determine how to parse a .rb file, but I haven't really figured out how to make this work. Non-ANSI characters cause an error while loading the file.
* ruby.exe -K<kcode> sets $KCODE (which can also be set programmatically).
* $KCODE affects the following:
  * Determines the encoding used to parse .rb files. Normally, identifiers have to be ANSI, but the limitation is removed if $KCODE is set to "UTF8".
  * Affects whether inspect escapes non-ASCII chars or leaves them as is.
  * Affects how regexps without an explicit encoding interpret the input string.

Ruby 1.9 support for encodings:

* Identifiers can be non-ANSI by default.

Ruby 2.0 support for encodings:

* Each string and symbol knows its own encoding, and String#force_encoding can change the encoding of an existing string.
* IO#encoding to control the encoding to use for reading/writing from disk.
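To make the "each string knows its own encoding" point concrete, here is a minimal sketch for a Ruby version that has per-string encodings (1.9 or later). It is not taken from any repro in this thread: the sample bytes and the `reinterpret` helper are made up for illustration. It shows that String#force_encoding only changes the label on the bytes, not the bytes themselves.

```ruby
# Hypothetical helper: return a copy of +str+ whose bytes are
# reinterpreted under +encoding+. The bytes are left untouched.
def reinterpret(str, encoding)
  str.dup.force_encoding(encoding)
end

# "café" as five raw UTF-8 bytes, first labelled as binary data.
raw  = reinterpret("caf\xC3\xA9", "BINARY")
text = reinterpret(raw, "UTF-8")

puts raw.encoding          # the binary (ASCII-8BIT) encoding
puts raw.length            # counted byte by byte
puts text.encoding         # UTF-8
puts text.length           # counted character by character
puts text.bytes == raw.bytes
puts text.valid_encoding?
```

The two strings hold identical bytes, yet `length` differs (5 versus 4) because the encoding label decides how the bytes are grouped into characters.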
On Fri, Feb 13, 2009 at 5:01 PM, Shri Borde <Shri.Borde at microsoft.com> wrote:

> Ruby 1.8 support for encoding:
>
> * A comment like "# -*- coding: utf-8 -*-" at the start of the file is supposed to determine how to parse a .rb file, but I haven't really figured out how to make this work. Non-ANSI characters cause an error while loading the file.

Did the UTF-8 file(s) you tried have a BOM or not?

-Matthew
If I use Notepad2's menu to set the encoding to "UTF8 with signature" and run either "ruby utf8_with_signature.rb" or "ruby -Ku utf8_with_signature.rb", the file fails to parse.

If I save the file with the encoding set to plain "UTF8", the file is 3 bytes smaller. "ruby utf8.rb" fails, but "ruby -Ku utf8.rb" works. With "-Ku", things work even if I do not have "# -*- coding: utf-8 -*-" in the file.

The repro files are attached.

From: ironruby-core-bounces at rubyforge.org [mailto:ironruby-core-bounces at rubyforge.org] On Behalf Of Matthew Wilson
Sent: Friday, February 13, 2009 5:11 PM
To: ironruby-core at rubyforge.org
Subject: Re: [Ironruby-core] $KCODE and encodings

On Fri, Feb 13, 2009 at 5:01 PM, Shri Borde <Shri.Borde at microsoft.com> wrote:

Ruby 1.8 support for encoding:

* A comment like "# -*- coding: utf-8 -*-" at the start of the file is supposed to determine how to parse a .rb file, but I haven't really figured out how to make this work. Non-ANSI characters cause an error while loading the file.

Did the UTF-8 file(s) you tried have a BOM or not?

-Matthew

-------------- next part --------------
A non-text attachment was scrubbed...
Name: utf8_with_signature.rb
Type: application/octet-stream
Size: 45 bytes
Desc: utf8_with_signature.rb
URL: <http://rubyforge.org/pipermail/ironruby-core/attachments/20090213/2cc07df2/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utf8.rb
Type: application/octet-stream
Size: 42 bytes
Desc: utf8.rb
URL: <http://rubyforge.org/pipermail/ironruby-core/attachments/20090213/2cc07df2/attachment-0001.obj>
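The 3-byte size difference between the two attachments matches the length of the UTF-8 signature (BOM), which is the byte sequence EF BB BF. A quick way to check whether a given file starts with it is sketched below. This assumes Ruby 2.0 or later (for String#b); the `utf8_bom?` helper and the throwaway file names are hypothetical and are not the attached scripts.

```ruby
require "tmpdir"

# The three bytes that a "UTF-8 with signature" file starts with.
UTF8_BOM = "\xEF\xBB\xBF".b.freeze

# Hypothetical helper: true if the file at +path+ begins with the UTF-8 BOM.
# An empty file makes read(3) return nil, which compares as false.
def utf8_bom?(path)
  File.open(path, "rb") { |f| f.read(3) } == UTF8_BOM
end

# Demo with two temporary files standing in for the two save modes.
with_sig_has_bom = plain_has_bom = size_difference = nil
Dir.mktmpdir do |dir|
  body     = "puts 'hi'\n".b
  with_sig = File.join(dir, "with_signature.rb")
  plain    = File.join(dir, "plain.rb")
  File.binwrite(with_sig, UTF8_BOM + body)
  File.binwrite(plain, body)

  with_sig_has_bom = utf8_bom?(with_sig)
  plain_has_bom    = utf8_bom?(plain)
  size_difference  = File.size(with_sig) - File.size(plain)
end

puts with_sig_has_bom   # true
puts plain_has_bom      # false
puts size_difference    # 3
```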
AFAIK Ruby 1.8 doesn't support magic comments that specify encodings at all; 1.9 does. Ruby 1.8 also doesn't recognize a BOM. Version 1.9 already has full encoding support, not just 2.0.

Tomas

From: ironruby-core-bounces at rubyforge.org [mailto:ironruby-core-bounces at rubyforge.org] On Behalf Of Shri Borde
Sent: Friday, February 13, 2009 3:01 PM
To: ironruby-core at rubyforge.org
Subject: [Ironruby-core] $KCODE and encodings

I was searching for string encoding issues in Ruby. Here is a summary of what I learnt, in case it's useful to anyone else or if anyone has any corrections.

Ruby 1.8 support for encoding:

* A comment like "# -*- coding: utf-8 -*-" at the start of the file is supposed to determine how to parse a .rb file, but I haven't really figured out how to make this work. Non-ANSI characters cause an error while loading the file.
* ruby.exe -K<kcode> sets $KCODE (which can also be set programmatically).
* $KCODE affects the following:
  * Determines the encoding used to parse .rb files. Normally, identifiers have to be ANSI, but the limitation is removed if $KCODE is set to "UTF8".
  * Affects whether inspect escapes non-ASCII chars or leaves them as is.
  * Affects how regexps without an explicit encoding interpret the input string.

Ruby 1.9 support for encodings:

* Identifiers can be non-ANSI by default.

Ruby 2.0 support for encodings:

* Each string and symbol knows its own encoding, and String#force_encoding can change the encoding of an existing string.
* IO#encoding to control the encoding to use for reading/writing from disk.
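To see the magic comment taking effect on Ruby 1.9 or later, as Tomas describes, one option is to load a small script and ask it for `__ENCODING__`, which reports the source encoding of the file it appears in. The sketch below is an illustration only: the script name, the choice of ISO-8859-1, and the `$source_encoding` global are made up here and do not come from the thread.

```ruby
require "tmpdir"

# Contents of a throwaway script whose first line is a magic comment.
source = [
  "# -*- coding: iso-8859-1 -*-",
  "$source_encoding = __ENCODING__",
  ""
].join("\n")

Dir.mktmpdir do |dir|
  script = File.join(dir, "latin1_script.rb")  # hypothetical file name
  File.binwrite(script, source)
  load script  # the parser reads the magic comment on the first line
end

# The loaded file reports the encoding named in its magic comment.
puts $source_encoding
```

On Ruby 1.8 the same comment would be treated as an ordinary comment, which is consistent with the parse failures reported earlier in the thread.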