Stumbled on this when testing yaml.

>>> "\204"
=> "?"

While

irb(main):005:0> "\204"
=> "\204"

I believe a Ruby string can hold arbitrary byte values, but as we are storing content as a string we are obviously losing all values that cannot be represented in the default encoding. Tomas, what do you think?

--
Oleg
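For reference, the octal escape \204 denotes a single byte with value 132 (0x84). The following is a minimal sketch of what MRI is expected to hold for that literal, assuming a current MRI (Ruby 2.0 or later, UTF-8 source file) rather than the 1.8-era irb shown above. The helper name raw_bytes is made up for illustration.

```ruby
# Return the raw byte values stored in a string, ignoring its encoding.
# (raw_bytes is a hypothetical helper, not part of any library.)
def raw_bytes(str)
  str.bytes
end

s = "\204"          # octal 204 == decimal 132 == hex 84
p raw_bytes(s)      # => [132]
p s.bytesize        # => 1
p s == "\x84"       # => true, same byte written with a hex escape
```

Newer MRI versions print such a string as "\x84" rather than "\204", but the stored byte is the same.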
This was a known issue a while back; it's the reason the Zlib library didn't work well with binary files. I'm fairly certain there was work being done on making String be backed by a byte array... and in fact I thought this was already done.

On Mon, Jul 14, 2008 at 2:20 AM, Oleg Tkachenko <olegtk at microsoft.com> wrote:
> Stumbled on this when testing yaml.
>
> >>> "\204"
> => "?"
>
> While
>
> irb(main):005:0>"\204"
> => "\204"
>
> I believe Ruby string can hold arbitrary byte values, but as we are storing
> content as a string we are obviously losing all values that cannot be
> represented in default encoding. Tomas, what do you think?
>
> --
> Oleg
>
> _______________________________________________
> Ironruby-core mailing list
> Ironruby-core at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ironruby-core

--
Michael Letterle
[Polymath Prokrammer]
http://blog.prokrams.com
MutableString can have one of three internal representations, depending on how it was last used. One of these is a byte array. This particular problem may be in the scanner or parser and not in the actual string class, as we don't otherwise have a problem storing the character:

>>> $s = "\204"
=> "?"
>>> $s[0]
=> 63
>>> $s[0] = 132
=> 132
>>> $s
=> "\204"

From: Michael Letterle [mailto:michael.letterle at gmail.com]
Sent: Monday, July 14, 2008 6:21 AM
To: ironruby-core at rubyforge.org
Cc: IronRuby Team
Subject: Re: [Ironruby-core] MutableString encoding issue

This was a known issue a while back, it's the reason the Zlib library didn't work well with binary files. I'm fairly certain there was work being done on making String be backed by a byte array... and in fact I thought this was already done.
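The in-place byte assignment in the session above uses 1.8-style integer indexing ($s[0] = 132). A rough equivalent for newer MRI versions might look like the sketch below. It assumes String#setbyte and String#getbyte (Ruby 1.9+) and String#b (Ruby 2.0+), and patch_first_byte is a hypothetical helper name, not anything from IronRuby.

```ruby
# Overwrite the first byte of a string and return the patched copy.
# Works on a binary (ASCII-8BIT) copy so no character encoding gets in the way.
def patch_first_byte(str, value)
  copy = str.b            # unfrozen binary copy of the input
  copy.setbyte(0, value)  # store the raw byte value at index 0
  copy
end

s = patch_first_byte("?", 132)
p "?".getbyte(0)   # => 63, the value the session reported for $s[0]
p s.getbyte(0)     # => 132
p s.bytes          # => [132]
```

The point of the session carries over: once the byte is written directly, the string stores it without loss, which suggests the value is being lost earlier, when the literal is converted.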
I think it's a bug :). Could you file it? If it's something that's blocking you I can look at it asap.

Tomas

From: Oleg Tkachenko
Sent: Sunday, July 13, 2008 11:20 PM
To: IronRuby Team
Cc: ironruby-core at rubyforge.org
Subject: MutableString encoding issue

Stumbled on this when testing yaml.

>>> "\204"
=> "?"

While

irb(main):005:0>"\204"
=> "\204"

I believe Ruby string can hold arbitrary byte values, but as we are storing content as a string we are obviously losing all values that cannot be represented in default encoding. Tomas, what do you think?

--
Oleg
This problem is probably in StringContent.ToByteArray(). It uses Encoding.GetBytes(string), which obeys .NET encoding semantics and by default replaces any nonconvertible characters with '?'. MutableStringOps.Dump() then uses it to create the string representation. We could make StringContent.ToByteArray() stop replacing nonconvertible characters by using an EncoderFallback. BinaryContent.ToString()/ToStringBuilder() has the same issue.

--
Oleg

From: Curt Hagenlocher
Sent: Monday, July 14, 2008 6:27 AM
To: Michael Letterle; ironruby-core at rubyforge.org
Cc: IronRuby Team
Subject: RE: [Ironruby-core] MutableString encoding issue

MutableString can have one of three internal representations, depending on how it was last used. One of these is a byte array. This particular problem may be in the scanner or parser and not in the actual string class, as we don't otherwise have a problem storing the character:

>>> $s = "\204"
=> "?"
>>> $s[0]
=> 63
>>> $s[0] = 132
=> 132
>>> $s
=> "\204"
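The replacement behaviour Oleg describes above can be imitated in plain Ruby. This is only an analogy under stated assumptions: it uses MRI's String#encode, not IronRuby's StringContent or the .NET Encoding API, and the helper names lossy_bytes and strict_bytes are invented for illustration. It contrasts a conversion that silently substitutes '?' with one that refuses to convert, and with an encoding that can represent the value.

```ruby
# Lossy conversion: anything the target encoding cannot represent becomes "?".
# (Hypothetical helper; mirrors a "replace on failure" fallback policy.)
def lossy_bytes(str, encoding)
  str.encode(encoding, undef: :replace, invalid: :replace, replace: "?").bytes
end

# Strict conversion: raise instead of silently substituting.
# (Hypothetical helper; mirrors a "throw on failure" fallback policy.)
def strict_bytes(str, encoding)
  str.encode(encoding).bytes
end

ch = "\u0084"                       # the character with code point 132

p lossy_bytes(ch, "US-ASCII")       # => [63], i.e. "?"

begin
  strict_bytes(ch, "US-ASCII")
rescue Encoding::UndefinedConversionError => e
  puts "strict conversion refused: #{e.class}"
end

p strict_bytes(ch, "ISO-8859-1")    # => [132], the value survives
```

The lossy result, 63, is the same value the earlier session reported for $s[0]. A strict policy at least makes the data loss visible, while a single-byte encoding that covers the whole 0..255 range keeps the byte intact.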
Sure. Unfortunately this one blocks our yaml impl from passing MRI's test_yaml.rb.

--
Oleg

From: Tomas Matousek
Sent: Monday, July 14, 2008 9:14 AM
To: Oleg Tkachenko; IronRuby Team
Cc: ironruby-core at rubyforge.org
Subject: RE: MutableString encoding issue

I think it's a bug :). Could you file it? If it's something that's blocking you I can look at it asap.

Tomas