I'm testing ruby-head through rvm but can't get 'ação'.mb_chars.upcase == 'AÇÃO'... I get 'AçãO' instead...

This happens both for Rails 2.3.5 and Rails 3 beta 3...

How can I get upcase to work correctly?

Thanks in advance,

Rodrigo.

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group.
To post to this group, send email to rubyonrails-core@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en.
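The reported result can be reproduced without ruby-head or Rails. This sketch assumes Ruby 2.4 or later, which is newer than anything discussed in this thread. On those versions, String#upcase does full Unicode case mapping by default, and its :ascii option restores the a-to-z-only behaviour that 1.9 had:

```ruby
# Reproduces the 1.9-era result on Ruby 2.4+.
# upcase(:ascii) maps only a-z, which is what 1.9's String#upcase did.
word = "ação"

ascii_only = word.upcase(:ascii) # only 'a' and 'o' change
unicode    = word.upcase         # full Unicode case mapping (Ruby 2.4+)

puts ascii_only # AçãO
puts unicode    # AÇÃO
```

On 1.9 itself, a plain upcase behaves like the :ascii line.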
In 1.9, mb_chars simply returns self, so this behaviour is coming straight from Ruby core:

http://github.com/rails/rails/blob/master/activesupport/lib/active_support/core_ext/string/multibyte.rb#L53-65

On Sat, May 8, 2010 at 11:00 AM, Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com> wrote:
> I'm testing ruby-head through rvm but can't get 'ação'.mb_chars.upcase ==
> 'AÇÃO'... I get 'AçãO' instead...

--
Cheers
Koz
Is there any approach currently used for making the Ruby 1.8/Rails 2.3.5 behavior the same in Ruby 1.9?

This is important for virtually any non-English application... Are there any plans to integrate some library to achieve the same results that Rails currently supports?

Rodrigo.

On 08-05-2010 00:04, Michael Koziarski wrote:
> in 1.9 mb_chars simply returns self. This behaviour is coming
> straight from ruby core:
>
> http://github.com/rails/rails/blob/master/activesupport/lib/active_support/core_ext/string/multibyte.rb#L53-65
On Sat, May 8, 2010 at 12:03 PM, Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com> wrote:
> Is there any approach currently used for making the Ruby 1.8/Rails 2.3.5
> behavior the same in Ruby 1.9?
>
> This is important for virtually any non-english application... Are there any
> plans for integration some library for achieving the same results as Rails
> currently supports?

My understanding is that Ruby 1.9 is meant to support all these operations internally. Our mb_chars functionality was only ever intended as a stop-gap until Ruby itself could do native multibyte-aware string operations. So what you're seeing are bugs in Ruby which should be fixed there; we probably shouldn't be maintaining a second multibyte-aware library.

--
Cheers
Koz
On Sat, May 8, 2010 at 02:34, Michael Koziarski <michael@koziarski.com> wrote:
> On Sat, May 8, 2010 at 12:03 PM, Rodrigo Rosenfeld Rosas
> <rr.rosas@gmail.com> wrote:
>> Is there any approach currently used for making the Ruby 1.8/Rails 2.3.5
>> behavior the same in Ruby 1.9?

Not a solution, and perhaps you're already aware of this, but as a workaround to these issues you can get an instance of ActiveSupport::Multibyte::Chars and perform the operations you need:

    ActiveSupport::Multibyte::Chars.new("café").upcase

This lets you use the same methods that would be used on Ruby 1.8.

Regards,

Norman
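The proxy idea can be made concrete with a toy wrapper in the same spirit as ActiveSupport::Multibyte::Chars. This is not ActiveSupport's code. The class name LatinChars is hypothetical. Its case mapping is a String#tr table that covers only ASCII and the Latin-1 letters U+00C0 to U+00FE; a real proxy would consult full Unicode tables. Anything the proxy does not define is delegated to the wrapped String.

```ruby
# Toy proxy in the spirit of ActiveSupport::Multibyte::Chars (NOT its real code).
# Case mapping covers only ASCII plus Latin-1 letters U+00C0..U+00FE,
# skipping the multiplication/division signs (U+00D7, U+00F7).
class LatinChars
  LOWER = "a-zà-öø-þ"
  UPPER = "A-ZÀ-ÖØ-Þ"

  attr_reader :wrapped_string

  def initialize(string)
    @wrapped_string = string
  end

  # Case operations return a new proxy, so chained calls stay on this path.
  def upcase
    LatinChars.new(@wrapped_string.tr(LOWER, UPPER))
  end

  def downcase
    LatinChars.new(@wrapped_string.tr(UPPER, LOWER))
  end

  def to_s
    @wrapped_string
  end

  # Everything else is forwarded to the wrapped String.
  def method_missing(name, *args, &block)
    if @wrapped_string.respond_to?(name)
      @wrapped_string.public_send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @wrapped_string.respond_to?(name, include_private) || super
  end
end

puts LatinChars.new("café").upcase          # CAFÉ
puts LatinChars.new("AÇÃO").downcase.length # 4 (delegated to String#length)
```

Returning a proxy from upcase/downcase, rather than a plain String, is the design choice that keeps chained calls such as `.downcase.length` working through the wrapper in this sketch.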
On 08-05-2010 02:34, Michael Koziarski wrote:
> My understanding is that ruby 1.9 is meant to support all these
> operations internally, our mb_chars functionality was only ever
> intended as a stop-gap until ruby itself could do native multi-byte
> aware string operations. So what you're seeing are bugs in ruby which
> should be fixed there, we probably shouldn't be maintaining a second
> multi-byte aware library.

Please take a look at this documentation for String#upcase:

http://ruby-doc.org/ruby-1.9/classes/String.html#M000593

"Returns a copy of str with all lowercase letters replaced with their uppercase counterparts. The operation is locale insensitive—*only characters ``a’’ to ``z’’ are affected*. Note: case replacement is effective only in ASCII region."

It doesn't seem Ruby 1.9 will change this behavior, so Rails should keep using its proxy approach until Ruby supports this itself. My guess is that mb_chars should be set up during Rails initialization with something like:

    def mb_chars
      self
    end

    String.send :include, StringMultiBytePatch unless 'ação'.upcase == 'AÇÃO'

Of course this is not the real code, just a suggestion of an approach... The StringMultiBytePatch module would override mb_chars to use the ActiveSupport::Multibyte::Chars proxy, as noted by Norman Clarke.
Please see also this thread from 2008:

http://old.nabble.com/String-upcase-downcase-with-UTF-8-strings-in-Ruby-1.9-td18372062.html

---
|in Ruby 1.9 I get the following behaviour:
|
|>> "aoueäöüé".upcase
|=> "AOUEäöüé"
|>> "AOUEÄÖÜÉ".downcase
|=> "aoueÄÖÜÉ"
|
|I can't find however find a bug in the bug tracking system.
|Doesn't this qualify as a bug?

The document for String#upcase says:

  call-seq:
     str.upcase => new_str

  Returns a copy of <i>str</i> with all lowercase letters replaced with their
  uppercase counterparts. The operation is locale insensitive---only
  characters ``a'' to ``z'' are affected.
  Note: case replacement is effective only in ASCII region.

     "hEllO".upcase #=> "HELLO"

See "Note:". Tim Bray have persuaded me to do so, since case
conversion outside of ASCII region is highly dependent on country,
language, culture and script.

matz.
---

So it doesn't seem Matz considers this a bug, and he probably won't change this behavior for Ruby 1.9... So, don't you think we should continue supporting mb_chars as before?

Best regards,

Rodrigo.
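Here is a runnable version of Rodrigo's suggestion. It assumes that mb_chars is defined directly on String and returns self (the 1.9 path). PatchedChars is a hypothetical stand-in for the ActiveSupport proxy.

One detail matters. If mb_chars is defined in String's own body, `String.send :include` cannot override it, because included modules sit behind the class in method lookup. The sketch therefore uses Module#prepend, which only exists from Ruby 2.0. On 1.9, the equivalent would be to redefine the method inside `String.class_eval`.

```ruby
# The 1.9 situation: mb_chars defined on String itself, returning self.
class String
  def mb_chars
    self
  end
end

# Hypothetical stand-in for ActiveSupport::Multibyte::Chars (Latin-1 subset only).
class PatchedChars
  def initialize(string)
    @wrapped_string = string
  end

  def upcase
    @wrapped_string.tr("a-zà-öø-þ", "A-ZÀ-ÖØ-Þ")
  end
end

module StringMultiBytePatch
  def mb_chars
    PatchedChars.new(self)
  end
end

# The feature test from the message: is the native upcase already Unicode-aware?
NATIVE_UNICODE_UPCASE = ("ação".upcase == "AÇÃO")

# prepend (Ruby 2.0+) puts the module in front of String in method lookup;
# include would put it behind String, where it could not override mb_chars.
String.send(:prepend, StringMultiBytePatch) unless NATIVE_UNICODE_UPCASE

puts "ação".mb_chars.upcase # AÇÃO, via whichever path is active
```

On a Ruby whose native upcase already passes the feature test, the patch is simply never applied.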
On 08-05-2010 09:57, Norman Clarke wrote:
> Not a solution, and perhaps you're already aware of this, but as a
> workaround to these issues you can get an instance of
> ActiveSupport::Multibyte::Chars and perform the operations you need:
>
> ActiveSupport::Multibyte::Chars.new("café").upcase
>
> This lets you use the same methods that would be used on Ruby 1.8.

Hi Norman, while this seems to work with Rails 3 beta, it didn't work with Rails 2.3.5 in my tests... Any idea why this behavior differs between 2.3.5 and 3?

Thanks,

Rodrigo.
On Sat, May 8, 2010 at 12:24, Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com> wrote:
> "Returns a copy of str with all lowercase letters replaced with their
> uppercase counterparts. The operation is locale insensitive—*only characters
> ``a’’ to ``z’’ are affected*. Note: case replacement is effective only in
> ASCII region."
>
> See "Note:". Tim Bray have persuaded me to do so, since case
> conversion outside of ASCII region is highly dependent on country,
> language, culture and script.

I had been considering working on a patch to add a "light" proxy class for 1.9.x that uses some, but not all, of the methods in the proxy class for 1.8. If it's true that there are no plans to add UTF-8 case folding to Ruby 1.9, then I think it would be a good idea. I've been working on multibyte a bit lately and would be happy to work on it some more if folks think it would be useful.
There are also a couple of pedantic issues with AS's case folding, such as incomplete support for Greek and Turkic languages, that I'd like to fix. I'll look into it this week to see whether that would be worthwhile as well.

-Norman
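The :turkic option shows why Turkic needs special handling. String#upcase and #downcase accept it on Ruby 2.4 and later, which again postdates this thread. Turkish and Azerbaijani pair dotted i with İ and dotless ı with I. No single locale-insensitive mapping is therefore correct for every language:

```ruby
# Turkish and Azerbaijani distinguish dotted i/İ from dotless ı/I, so the
# correct uppercase of "i" depends on the language. Requires Ruby 2.4+.
default_upper = "i".upcase            # I  (default Unicode mapping)
turkic_upper  = "i".upcase(:turkic)   # İ  (U+0130)
turkic_lower  = "I".downcase(:turkic) # ı  (U+0131)

puts default_upper, turkic_upper, turkic_lower
```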
Interesting, I didn't realize this was going to change in 1.9.2. While I sympathize with Matz for not wanting to step into the minefield that is case folding, I'm a bit disappointed. With no built-in support for that, or for normalization, Ruby's UTF-8 support is so weak that I find myself relying on AS more and more, even outside Rails apps.

I had considered working on a light multibyte proxy class for 1.9 when 1.9.1-p343 broke String#center and a few other methods, but decided against it when I saw it fixed in 1.9.2. AS's case folding is a little lacking too, because it doesn't implement case folding for Greek and Turkic as recommended for Unicode 5.1.

I've been hacking on multibyte quite a bit lately and would be happy to take a longer look if folks think it's worthwhile.

-Norman

On May 8, 2010 12:25 PM, "Rodrigo Rosenfeld Rosas" <rr.rosas@gmail.com> wrote:
> It doesn't seem Ruby 1.9 will change this behavior, so Rails should keep
> using its Proxy approach while Ruby doesn't support it itself.
> sympathize with Matz for not wanting to step into the minefield that is case...

Sorry for the double post; it looks like I accidentally sent an earlier draft from my phone.
On 8-May-10, at 1:31 PM, Norman Clarke wrote:
> If it's true that there are no plans to add UTF-8 case-folding to Ruby
> 1.9 then I think it would be a good idea. I've been working on
> multibyte a bit lately and would be happy to work on it some more if
> folks think it would be useful.

I'd say that developing this as part of the I18n gem, or even standalone, would be better than as part of Rails, as it would be very useful outside of Rails, and not everybody who uses Rails would need this functionality.
On 08-05-2010 15:56, Mateo Murphy wrote:
> I'd say that developing this as part of the I18n gem or even
> standalone would be better than as part of rails, as it would be very
> useful outside of rails, and not everybody who uses rails would need
> this functionality.

I agree that writing this in I18n or a standalone library would probably be better because of your first argument, but not because of the last one...

Rails takes a different approach from Merb or Sinatra in that it is a full-stack framework. I believe multibyte support would be more useful for most people than REST support, for instance...

But since AS is also an independent library and could be used outside Rails too, I don't see any problems in patching String in AS... But I think it would be cleaner if it were an independent library that could be used inside the I18n or AS gem...

Rodrigo.
On Sat, May 8, 2010 at 16:31, Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com> wrote:
> But since AS is also an independent library and could be used outside Rails
> too, I don't see any problems in patching String in AS...
> But I think it
> would be cleaner if it was an independent library that could be used inside
> I18n or AS gem...

These two libraries provide pretty good support for UTF-8 manipulation:

http://github.com/blackwinter/unicode
http://github.com/lang/unicode_utils

Yoshida Masato's is written in C and provides good performance, while Stefan Lang's is written in Ruby and also appears to provide support for proper UTF-8 case folding. So there's probably no need to duplicate the effort of adding that to AS. It should be easy enough to just implement proxy classes that use them, and make AS use them in place of its default proxy class:

    ActiveSupport::Multibyte.proxy_class = PutativeUnicodeProxyClass
    ActiveSupport::Multibyte.proxy_class = PutativeUnicodeUtilsProxyClass

But I do think that Rails should still provide decent support for case folding, and the behavior of commonly used things like #upcase and #downcase should not change so dramatically when you use Ruby 1.9 vs 1.8. It would be pretty simple to extract some methods from Multibyte::Chars into a module that can be shared between the current feature-rich proxy class for 1.8 and a thinner one for 1.9.

-Norman
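A minimal sketch of the extraction Norman describes, with hypothetical names throughout. The case operations live in one module (CaseOps), which a thin 1.9-style proxy and a feature-rich 1.8-style proxy both include. A Latin-1 tr table stands in for real Unicode data. FullChars shows only one of the character-level methods a 1.8 proxy would need.

```ruby
# Case operations shared by both proxies. They rely only on #wrapped_string
# and on the including class taking the string as its sole constructor argument.
module CaseOps
  LOWER = "a-zà-öø-þ" # Latin-1 subset, standing in for real Unicode tables
  UPPER = "A-ZÀ-ÖØ-Þ"

  def upcase
    self.class.new(wrapped_string.tr(LOWER, UPPER))
  end

  def downcase
    self.class.new(wrapped_string.tr(UPPER, LOWER))
  end
end

# Thin proxy for 1.9: String is already encoding-aware there, so only the
# shared case operations are added.
class ThinChars
  include CaseOps
  attr_reader :wrapped_string

  def initialize(string)
    @wrapped_string = string
  end

  def to_s
    @wrapped_string
  end
end

# Feature-rich proxy for 1.8: also reimplements character-level operations
# (only #length is shown) because 1.8 strings are plain byte sequences.
class FullChars
  include CaseOps
  attr_reader :wrapped_string

  def initialize(string)
    @wrapped_string = string
  end

  def length
    @wrapped_string.unpack("U*").size # count UTF-8 code points, not bytes
  end

  def to_s
    @wrapped_string
  end
end

puts ThinChars.new("ação").upcase # AÇÃO
puts FullChars.new("AÇÃO").length # 4
```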
On 08-05-2010 17:02, Norman Clarke wrote:
> But I do think that Rails should still provide decent support for case
> folding, and the behavior of commonly-used things like #upcase and
> #downcase should not change so dramatically when you use Ruby 1.9 vs
> 1.8. It would be pretty simple to extract some methods from
> Multibyte::Chars into a module that can be shared between the current
> feature-rich proxy class for 1.8 and a thinner one for 1.9.

Agreed. Is it possible in Bundler to add a dependency on either the unicode or the unicode_utils gem? This should work like script/server in Rails 2: if it finds Mongrel, it uses it; otherwise, it uses WEBrick... If the faster C implementation is available, use it; otherwise, try the pure-Ruby alternative... Is that possible?

Rodrigo.
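The runtime half of Rodrigo's question can be sketched as a try-each-library fallback, similar to how script/server picked Mongrel or WEBrick. The require names are assumed to match the two gems above. The backend symbols and pick_backend are hypothetical. The sketch says nothing about how Bundler would declare such an either-or dependency.

```ruby
# Try the faster C library first, then the pure-Ruby one, then fall back.
# Require names are assumed from the gem names; the symbols are hypothetical.
CANDIDATE_BACKENDS = [
  ["unicode", :c_extension],     # blackwinter/unicode (C extension)
  ["unicode_utils", :pure_ruby], # lang/unicode_utils (pure Ruby)
].freeze

def pick_backend(candidates = CANDIDATE_BACKENDS)
  candidates.each do |library, backend|
    begin
      require library
      return backend
    rescue LoadError
      # Not installed: move on to the next candidate.
    end
  end
  :builtin # hypothetical marker for "use the tables bundled with Multibyte"
end

puts pick_backend
```

The result depends on which gems happen to be installed, so no fixed output is shown.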
AS::Multibyte currently implements two things: encoding-aware string operations and Unicode algorithms. 1.9 only implements encoding-aware string operations. We could activate the proxy with the Unicode operations for 1.9; that should solve most people's problems.

I don't really like the idea of depending on external libraries for this kind of functionality, because the most used algorithms are already defined in Multibyte.

Manfred

On May 8, 10:02 pm, Norman Clarke <nor...@njclarke.com> wrote:
> These two libraries provide pretty good support for UTF-8 manipulation:
>
> http://github.com/blackwinter/unicode
> http://github.com/lang/unicode_utils
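Normalization illustrates Manfred's distinction between encoding-aware operations and Unicode algorithms. The sketch assumes Ruby 2.2 or later for String#unicode_normalize. In 1.9, length and bytesize already behave as shown, but there was no built-in normalization.

```ruby
# Two spellings of "café" that render identically.
precomposed = "caf\u00E9"  # é as one code point (U+00E9)
decomposed  = "cafe\u0301" # e followed by COMBINING ACUTE ACCENT (U+0301)

# Encoding-aware operations: length counts characters, not bytes.
p precomposed.length   # 4
p precomposed.bytesize # 5

# Encoding awareness alone still treats them as different strings...
p precomposed == decomposed # false
p decomposed.length         # 5

# ...a Unicode algorithm (NFC normalization) is needed to equate them.
p decomposed.unicode_normalize(:nfc) == precomposed # true
```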
On Mon, May 10, 2010 at 04:12, Manfred Stienstra <manfred@gmail.com> wrote:
> I don't really like the idea of depending on external libraries for
> this kind of functionality because the most used algorithms are
> already defined in Multibyte.

I agree. I was thinking more about implementing proxy classes for them in a separate library that people could use, for example, if they needed either the high performance of the library written in C, or the proper case folding for Greek and Turkic that the other one provides.

-Norman
That's more or less how it is right now. The C implementation is called Unichars: http://github.com/Manfred/unichars.

On May 10, 1:55 pm, Norman Clarke <nor...@njclarke.com> wrote:
> I agree. I was thinking more about implementing proxy classes for them
> in a separate library that people could use, for example, if they
> needed either the high performance of the library written in C, or the
> proper case-folding for Greek and Turkic that the other one provides.
2010/5/9 Norman Clarke <norman@njclarke.com>:
> Interesting, I didn't realize this was going to change in 1.9.2.

1.9.2's features are already frozen, and it doesn't have such Unicode utilities.

We at ruby-core know about the need for Unicode utilities and have had some discussion about it, but we can't agree on its spec and implementation. I think it needs more time.

> AS's case folding is a little lacking too,
> because it doesn't implement case folding for Greek and Turkic as
> recommended for Unicode 5.1.

FYI: if you implement case folding for Greek and Turkic, a string (or something similar) needs language information. Selecting font, calculating width,

--
NARUSE, Yui
naruse@airemix.jp
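One way to read NARUSE's point is that language-sensitive casing needs the language to travel with the text. A minimal sketch, assuming Ruby 2.4+ for the :turkic option. TaggedText and TURKIC_LANGUAGES are hypothetical names. Real locale handling would involve far more than two language codes.

```ruby
# ISO 639-1 codes for Turkish and Azerbaijani.
TURKIC_LANGUAGES = %w[tr az].freeze

# A string paired with its language tag.
TaggedText = Struct.new(:text, :lang) do
  # Chooses the case-mapping option from the tag and keeps the tag on the result.
  def upcase
    options = TURKIC_LANGUAGES.include?(lang) ? [:turkic] : []
    TaggedText.new(text.upcase(*options), lang)
  end
end

puts TaggedText.new("istanbul", "tr").upcase.text # İSTANBUL
puts TaggedText.new("istanbul", "en").upcase.text # ISTANBUL
```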
Hi all, I submitted a patch to fix the upcasing issue with 1.9 about a week ago[1], but haven't gotten any follow-up yet. I saw today that there's been some more work in this area, so my patch now conflicts with Rails master. If somebody has the time and inclination, could you let me know if there's any interest in including my changes? In addition to resolving the issue with upcasing on Ruby 1.9, I added an ActiveSupport::Multibyte::Unicode module to contain the class methods from ActiveSupport::Multibyte::Chars, and then moved some related functionality into the module for the sake of consistency.
I'm happy to resolve the conflicts to make the patch apply again, but if people don't like the direction my refactoring went and don't want to include the changes, then no problem: I'll just kill my branch[2] and won't bother resolving the conflicts. Either way, I think it would still be ideal to get a fix for the upcasing issue before 3.0 is released. Regards, Norman [1] https://rails.lighthouseapp.com/projects/8994/tickets/4595-stringmb_charsupcase-doesnt-upcase-non-ascii-chars-on-with-ruby-19x [2] http://github.com/norman/rails/commit/f01dd100a7853e9bb5c7eb9097068ddb9ed1909d
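The upcasing issue itself can be reproduced on a modern Ruby. Before Ruby 2.4, `String#upcase` only mapped ASCII letters, and, as noted earlier in the thread, `mb_chars` on 1.9 simply returned `self`, so "ação" came back as "AçãO". Ruby 2.4 (released long after this thread) made `String#upcase` Unicode-aware and kept the ASCII-only behavior available through the `:ascii` option. This sketch only compares the two built-in modes; it does not use ActiveSupport or Norman's patch.

```ruby
# Requires Ruby 2.4 or later. The :ascii option limits case mapping to
# ASCII letters, which mimics how String#upcase behaved before 2.4.
word = "ação"

puts word.upcase(:ascii)  # AçãO (only "a" and "o" change)
puts word.upcase          # AÇÃO (full Unicode case mapping)
```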
On 21-05-2010 14:30, Norman Clarke wrote:> [1] https://rails.lighthouseapp.com/projects/8994/tickets/4595-stringmb_charsupcase-doesnt-upcase-non-ascii-chars-on-with-ruby-19x Norman, take a look at the above link. It seems Jeremy is willing to accept your patch. Please rebase against master again. Best regards, Rodrigo.