Daniel Berger
2012-May-22 00:17 UTC
[Win32utils-devel] Ruby 1.9 and Encoding.default_external
Hi,

Just curious, when using Ruby 1.9 on my Windows 7 laptop, strings are encoded in IBM437 by default. However, when I check my default code page using GetCPInfoEx, I get Windows-1252.

This is causing some confusion when trying to port code to FFI and JRuby, which by default encodes strings as Windows-1252.

# code_page.rb
require 'ffi'

class Windows
  extend FFI::Library
  ffi_convention :stdcall
  ffi_lib :kernel32

  attach_function :GetConsoleCP, [], :uint
  attach_function :GetCPInfoEx, :GetCPInfoExA, [:uint, :ulong, :pointer], :bool

  # From WinNls.h
  MAX_LEADBYTES   = 12
  MAX_DEFAULTCHAR = 2
  CP_ACP          = 0

  # From WinDef.h
  MAX_PATH = 260

  class CPINFOEX < FFI::Struct
    layout(
      :MaxCharSize,        :uint,
      :DefaultChar,        [:uchar, MAX_DEFAULTCHAR],
      :LeadByte,           [:uchar, MAX_LEADBYTES],
      :UnicodeDefaultChar, [:char, 2],   # a WCHAR, i.e. 2 bytes
      :CodePage,           :uint,
      :CodePageName,       [:char, MAX_PATH]
    )
  end

  # Input code page of the attached console.
  def self.cp_number
    GetConsoleCP()
  end

  # Descriptive name of the system ANSI code page (CP_ACP).
  def self.cp_name
    ptr = CPINFOEX.new

    unless GetCPInfoEx(CP_ACP, 0, ptr)
      # SystemCallError.new(message, errno) picks the matching Errno class.
      raise SystemCallError.new("GetCPInfoEx", FFI.errno)
    end

    # The field is an FFI char array; to_s turns it into a Ruby String.
    ptr[:CodePageName].to_s
  end
end

p Windows.cp_number # 437
p Windows.cp_name   # 1252 (ANSI - Latin I)

Is this a case of the system default not being the same as the console code page? If so, isn't this a bug in MRI then?

Regards,

Dan
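The two numbers in the script come from different Windows settings: GetConsoleCP returns the console's input code page (437), while GetCPInfoEx(CP_ACP, ...) describes the system ANSI code page (1252). My understanding, which is an assumption and not something stated in this thread, is that MRI on Windows derives its locale charmap, and hence the default Encoding.default_external, from the console code page, and only falls back to the ANSI code page (GetACP) when no console code page is available. The sketch below models that choice as a pure function; `encoding_for_code_page` and `default_external_for` are hypothetical helpers, and the console-first rule is an assumption about MRI, not its actual source.

```ruby
# Map a Windows code page number to a Ruby Encoding using the "CP<n>"
# aliases Ruby registers (CP437 -> IBM437, CP1252 -> Windows-1252).
# Returns nil when Ruby knows no encoding by that name.
def encoding_for_code_page(number)
  Encoding.find("CP#{number}")
rescue ArgumentError
  nil
end

# Hypothetical model of the default-external choice on Windows:
# prefer the console code page; fall back to the ANSI code page when the
# console code page is 0 (GetConsoleCP returns 0 on failure, e.g. no console).
def default_external_for(console_cp, ansi_cp)
  encoding_for_code_page(console_cp.zero? ? ansi_cp : console_cp)
end

# Values from the laptop above: console code page 437, ANSI code page 1252.
puts default_external_for(437, 1252).name  # IBM437
puts default_external_for(0, 1252).name    # Windows-1252
```

On the affected machine, comparing `Encoding.locale_charmap` with the two numbers printed by code_page.rb would confirm or refute this model.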
Heesob Park
2012-May-22 01:02 UTC
[Win32utils-devel] Ruby 1.9 and Encoding.default_external
Hi,

2012/5/22 Daniel Berger <djberg96 at gmail.com>
> Is this a case of the system default not being the same as the console
> code page? If so, isn't this a bug in MRI then?

IBM437 is a legacy of MS-DOS and is used for console applications.

Refer to
http://en.wikipedia.org/wiki/Code_page
http://blogs.msdn.com/b/michkap/archive/2005/02/08/369197.aspx

Regards,
Park Heesob
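To see why the difference between the console (OEM) code page and the ANSI code page matters, note that IBM437 and Windows-1252 assign different characters to the same high bytes. This sketch decodes the same two bytes under each code page with Ruby's built-in transcoders; the byte values are purely illustrative.

```ruby
# Two raw bytes, 0x82 and 0xE9, as a byte-oriented Windows API might return them.
raw = [0x82, 0xE9].pack("C*")  # pack returns an ASCII-8BIT (binary) string

# Interpret the same bytes in the console code page and in the ANSI code page,
# then transcode both to UTF-8 so they can be compared.
as_oem  = raw.dup.force_encoding("IBM437").encode("UTF-8")
as_ansi = raw.dup.force_encoding("Windows-1252").encode("UTF-8")

# Print code points rather than glyphs so the output is terminal-independent.
puts as_oem.codepoints.map { |c| format("U+%04X", c) }.join(" ")   # U+00E9 U+0398
puts as_ansi.codepoints.map { |c| format("U+%04X", c) }.join(" ")  # U+201A U+00E9
```

In IBM437, 0x82 is "e with acute" and 0xE9 is a capital theta; in Windows-1252, 0x82 is a low single quotation mark and 0xE9 is "e with acute". A string tagged with the wrong one of the two still transcodes without error, it just yields the wrong characters.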
Daniel Berger
2012-May-22 01:22 UTC
[Win32utils-devel] Ruby 1.9 and Encoding.default_external
On Mon, May 21, 2012 at 7:02 PM, Heesob Park <phasis at gmail.com> wrote:
> IBM437 is a legacy of MS-DOS and is used for console applications.
>
> Refer to
> http://en.wikipedia.org/wiki/Code_page
> http://blogs.msdn.com/b/michkap/archive/2005/02/08/369197.aspx

Ok, so what's the correct way to encode strings by default then?

This all started as the result of the ffi branch of the win32-dir project. The tests pass with MRI 1.9.3, but if I try to use JRuby with the --1.9 option I get Encoding::InvalidByteSequenceError failures in the Dir.getwd method. This is odd, because I can't duplicate the issues when I run standalone code with JRuby.

Regards,
Dan
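One common way to end up with Encoding::InvalidByteSequenceError is to tag bytes with the wrong encoding and then transcode them. Bytes returned by an ANSI (...A) Windows function are in the ANSI code page (Windows-1252 here), not necessarily in Encoding.default_external or UTF-8. Whether this is what happens in the win32-dir Dir.getwd code is an assumption; the sketch below only reproduces the general failure and the fix of tagging the bytes with the code page they were actually produced in. The directory name is hypothetical.

```ruby
# "café" as Windows-1252 bytes, e.g. from a hypothetical GetCurrentDirectoryA buffer.
bytes = [0x63, 0x61, 0x66, 0xE9].pack("C*")

# Wrong: tag the bytes as UTF-8 and transcode. 0xE9 starts a multi-byte
# UTF-8 sequence that never completes, so Ruby raises.
error = begin
  bytes.dup.force_encoding("UTF-8").encode("UTF-16LE")
  nil
rescue Encoding::InvalidByteSequenceError => e
  e
end
puts "failed with #{error.class}" if error

# Right: tag the bytes with the code page they came from, then transcode.
path = bytes.dup.force_encoding("Windows-1252").encode("UTF-8")
puts path.valid_encoding?  # true
```

Because Windows-1252 is a single-byte encoding, re-tagging never fails on its own; the important part is choosing the code page that matches the API that produced the bytes.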
Heesob Park
2012-May-22 02:09 UTC
[Win32utils-devel] Ruby 1.9 and Encoding.default_external
Hi,

2012/5/22 Daniel Berger <djberg96 at gmail.com>
> Ok, so what's the correct way to encode strings by default then?
>
> This all started as the result of the ffi branch of the win32-dir
> project. The tests pass with MRI 1.9.3, but if I try to use JRuby
> with the --1.9 option I get Encoding::InvalidByteSequenceError failures
> in the Dir.getwd method. This is odd, because I can't duplicate the
> issues when I run standalone code with JRuby.

Well, I have no idea, because I am not a JRuby user or tester. You will get a better answer on the JRuby mailing lists:
http://jruby.org/community or
http://www.ruby-forum.com/forum/jruby

Regards,
Park Heesob
Daniel Berger
2012-May-22 02:12 UTC
[Win32utils-devel] Ruby 1.9 and Encoding.default_external
On Mon, May 21, 2012 at 8:09 PM, Heesob Park <phasis at gmail.com> wrote:
> Well, I have no idea, because I am not a JRuby user or tester.
> You will get a better answer on the JRuby mailing lists:
> http://jruby.org/community or
> http://www.ruby-forum.com/forum/jruby

Ok, sorry, I will ask there.

Regards,
Dan