radio-9FJW37AA6n1Wk0Htik3J/w@public.gmane.org
2005-Aug-12 13:15 UTC
M17N Ruby, Kconv and Multilingual Rails
Hello.

Though I haven't tried Multilingual Rails yet, two points about it trouble me.

One is that Multilingual Rails uses UTF-8 internally. This goes completely against M17N (Multilingualization) Ruby. Matz will introduce M17N Ruby in a future version of Ruby. Matz has said that in M17N Ruby the internal character encoding must not be fixed to a single encoding (Code Set Independent). When M17N Ruby is widely used, how will Multilingual Rails handle String objects?

The other is the Kconv module. Since Multilingual Rails 0.5, the Kconv methods are redefined using iconv. Because the implementation of iconv depends on the OS, it is possible that iconv can't handle some Japanese encodings (especially when converting from/to ShiftJIS). At least in Ruby 1.8.x, we can convert between euc-jp/iso-2022-jp/ShiftJIS/UTF-8/UTF-16 with Kconv. It seems to me that there is not enough reason for Multilingual Rails to redefine the Kconv methods.

** NISHIO Mizuho
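As a rough illustration of the conversions listed in the post above, here is a minimal sketch that round-trips a short Japanese string through each of those encodings. It does not use Kconv or iconv: it assumes a later Ruby (1.9 or newer) where `String#encode` is available. The sample text, the helper name `round_trip`, and the use of UTF-16BE in place of plain "UTF-16" are assumptions made for the example.

```ruby
# Sample text: "日本語" (three characters), written with \u escapes so the
# literal is UTF-8 regardless of the source file's encoding.
sample = "\u65E5\u672C\u8A9E"

# The encodings named in the post. UTF-16BE stands in for "UTF-16" here
# (an assumption, to sidestep byte-order-mark handling).
targets = %w[EUC-JP ISO-2022-JP Shift_JIS UTF-8 UTF-16BE]

# Hypothetical helper: convert to the target encoding and back, returning
# the byte size of the intermediate form and whether the text survived.
def round_trip(text, target)
  converted = text.encode(target)
  restored  = converted.encode("UTF-8")
  [converted.bytesize, restored == text]
end

targets.each do |target|
  size, ok = round_trip(sample, target)
  puts format("%-12s %2d bytes  round trip ok: %s", target, size, ok)
end
```

Which conversions succeed depends on the converter tables behind the call, which is the point of the Kconv versus iconv concern: the same script can behave differently when the conversion is delegated to whatever iconv the OS ships.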
> Hello.

Hi!

> Though I don't try Multilingual Rails,
> I have two troublesome points in Multilingual Rails.
>
> One is that Multilingual Rails uses UTF-8 internally.
> This is completely against M17N(Multilingualization) Ruby.
> In the future version of Ruby, Matz will introduce M17N Ruby.
> Matz said that in M17N Ruby, the internal character encoding
> must not be determined as only one encoding(Code Set Independent).
> When M17N Ruby is used widely,
> how does Multilingual Rails handle String objects?

When that future version of Ruby is released, Multilingual Rails will probably be updated to make use of it. Until then I'll stick with using UTF-8 internally because UTF-8 can represent any language and character set in the world.

> The other is Kconv module. From Multilingual Rails 0.5,
> Kconv methods are redefined by iconv.
> Because the implementation of iconv depends on the OS,
> we have possibility that iconv can't handle some Japanese encodings.
> (especially, when converting from/to ShiftJIS)
> At least Ruby-1.8.x, we can convert
> euc-jp/iso-2022-jp/ShiftJIS/UTF-8/UTF-16 with Kconv.
> It seems to me that the reason why Multilingual Rails
> redefines the Kconv methods is not enough.

Multilingual Rails is meant for running server-based applications. It doesn't try to be an all-in-one solution for Ruby desktop applications, only for Ruby on Rails-based server applications. I think it's fair to assume that the admin will choose an operating system that supports the needed features, or at least install a newer version of iconv if the existing one is too old...

If you tell me an OS that doesn't support the overridden Kconv methods I'll re-evaluate overriding those methods. :)

// Per
radio-9FJW37AA6n1Wk0Htik3J/w@public.gmane.org
2005-Aug-13 13:39 UTC
Re: M17N Ruby, Kconv and Multilingual Rails
Hi.

>> One is that Multilingual Rails uses UTF-8 internally.
>> This is completely against M17N(Multilingualization) Ruby.
> When that future version of Ruby is released in the future,
> Multilingual Rails will probably be updated to make use of it.

Currently, M17N Ruby has no documentation, though Matz has written its code. If the implementation of M17N Ruby turns out to be far from that of MLR, users of MLR may have to pay the cost of an MLR version upgrade.

> Until then I'll stick with using UTF-8 internally because UTF-8 can
> represent any language and characterset in the world.

No. UTF-8 can't represent every character set in the world. For example, UTF-8 can't deal with Mojikyo-kagami. It has about 120000 characters.
http://www.mojikyo.org/
This may be one of the reasons why Matz doesn't use Unicode as the internal character encoding. Unicode can't represent all character sets in the world.

>> The other is Kconv module. From Multilingual Rails 0.5,
>> Kconv methods are redefined by iconv.
>> Because the implementation of iconv depends on the OS,
>> we have possibility that iconv can't handle some Japanese encodings.
>> (especially, when converting from/to ShiftJIS)
>> At least Ruby-1.8.x, we can convert
>> euc-jp/iso-2022-jp/ShiftJIS/UTF-8/UTF-16 with Kconv.
>> It seems to me that the reason why Multilingual Rails
>> redefines the Kconv methods is not enough.
>
> Multilingual Rails is supposed to run supported server-based
> applications. It doesn't try to be the all-in-one solution for Ruby
> desktop-applications, only Ruby on Rails-based server-applications. I
> think it's fair to make the assumption that the admin will choose an
> operating system that supports the needed features, or at least
> install a newer version of iconv if it's too old...

If the iconv that the OS provides lacks some features, we can't use those features in MLR (even if the original Kconv handles them well). Why should we reinstall libiconv instead of using the original Kconv?

> If you tell me an OS that doesn't support the overridden Kconv
> methods I'll re-evaluate overriding those methods. :)

I use a libiconv patch for version 1.9.2 in order to handle CP932 well.
http://www2d.biglobe.ne.jp/~msyk/software/libiconv-patch.html (Japanese)
Maybe this patch is used by many Japanese users.

Appendix
http://www.miraclelinux.com/english/technet/samba30/iconv_issues.html
This article explains libiconv issues. Though I don't know the details, some of them are not solved in the latest libiconv/glibc.
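To make the CP932 concern above concrete, here is a small sketch of the kind of mismatch involved: CP932 (the Windows variant of Shift_JIS) contains vendor extension characters that plain Shift_JIS, based on JIS X 0208, does not. The sketch uses `String#encode` from a later Ruby (1.9 or newer) rather than Kconv or iconv, and the helper name `convertible?` and the two sample characters are assumptions chosen for the example. Ruby names CP932 "Windows-31J".

```ruby
# Hypothetical helper: true if +text+ can be converted to +encoding+
# without hitting a character the target encoding lacks.
def convertible?(text, encoding)
  text.encode(encoding)
  true
rescue Encoding::UndefinedConversionError
  false
end

circled_one = "\u2460"  # "①", a vendor extension character found in CP932
hiragana_a  = "\u3042"  # "あ", part of JIS X 0208

["Shift_JIS", "Windows-31J"].each do |enc|
  puts "#{enc}: circled one=#{convertible?(circled_one, enc)}, " \
       "hiragana=#{convertible?(hiragana_a, enc)}"
end
```

A conversion library that treats "ShiftJIS" as one of these two tables when the data actually follows the other will fail or mangle exactly such characters, which is a plausible reason for patching libiconv as described.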
>>> One is that Multilingual Rails uses UTF-8 internally.
>>> This is completely against M17N(Multilingualization) Ruby.
>>>
>> When that future version of Ruby is released in the future,
>> Multilingual Rails will probably be updated to make use of it.
>>
> Currently, M17N Ruby has no document though matz wrote its code.
> If the implementation of M17N Ruby is far from that of MLR,
> users of MLR may pay the cost for MLR version up.

I haven't looked at the M17N implementation, but my guess is that it keeps track of which encoding each string uses, so it should be possible to keep the MLR interface 100% API-compatible when M17N is released. (Ruby v2.0?)

>> Until then I'll stick with using UTF-8 internally because UTF-8 can
>> represent any language and characterset in the world.
>>
> No. UTF-8 can't represent any characterset in the world.
> For example, UTF-8 can't deal with Mojikyo-kagami.
> It has about 120000 characters.

Ok, UTF-8 can represent ALMOST any character set in the world. :) Until a stable M17N Ruby is released I'll stick with using UTF-8 internally because that scratches all my itches. If you think this decision is horrible, you are welcome to send patches. :)

> [kconv information]

You've convinced me. MLR v0.6 will no longer overload the Kconv methods.

// Per
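The guess above, that M17N keeps track of which encoding each string uses, can be illustrated with a short sketch. It assumes a Ruby with per-string encodings (1.9 or newer), which did not exist when this thread was written; the variable names and sample text are made up for the example, and it says nothing about how MLR itself would adapt.

```ruby
utf8 = "\u65E5\u672C\u8A9E"      # "日本語", stored as UTF-8
sjis = utf8.encode("Shift_JIS")  # same text, converted to Shift_JIS bytes
euc  = utf8.encode("EUC-JP")     # same text, converted to EUC-JP bytes

# Each string reports its own encoding; the character count is the same
# while the byte count differs.
[utf8, sjis, euc].each do |s|
  puts "#{s.encoding.name}: #{s.length} chars, #{s.bytesize} bytes"
end

# Relabeling bytes without converting them leaves the data unchanged and
# only alters the encoding the string claims to be in.
relabeled = sjis.dup.force_encoding("UTF-8")
puts "Shift_JIS bytes labeled as UTF-8 are valid: #{relabeled.valid_encoding?}"
```

Under this model no single internal encoding is imposed, which is the Code Set Independent idea raised earlier in the thread; a library that wants UTF-8 internally can still call `encode("UTF-8")` at its boundaries.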