Hi all,

In response to Rodrigo Rosas's message about mb_chars.upcase not giving the expected result on 1.9, I've done some work in a fork to make String#mb_chars always return an instance of a proxy class, both with Ruby 1.8 and Ruby 1.9. The end result of the patch is (hopefully) to make Rails' multibyte functionality behave the same way in 1.8.7 and 1.9.x.

http://github.com/norman/rails/tree/multibyte

Basically, the problem is that with current edge Rails and 1.9.x, `"café".mb_chars.upcase` will return "CAFé" rather than the expected "CAFÉ".

In my changes, the proxy class leaves some methods undefined for 1.9 because they have a native equivalent, but redefines a few others because either they are buggy or, like String#upcase, don't have the same behavior as AS::Multibyte::Chars.

Additionally, I refactored all of the Unicode support in ActiveSupport into a new module, ActiveSupport::Multibyte::Unicode. This makes some useful functionality like UTF-8 normalization/composition/decomposition easier to reuse since it's no longer bound to the ActiveSupport::Multibyte::Chars class.

I'd be very grateful for any feedback.

Regards,

Norman

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group.
To post to this group, send email to rubyonrails-core@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en.
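To make the proxy idea in the message above concrete, here is a minimal sketch in plain Ruby. It is not the code from the multibyte branch: the class name `CharsProxySketch` and the three-entry `UPCASE_MAP` table are assumptions made up for illustration, and a real implementation would rely on full Unicode case-mapping data. It only shows the general shape described above: a few methods (here `upcase`) are redefined, and everything else is forwarded to the wrapped String.

```ruby
# encoding: utf-8

# Hypothetical stand-in for a multibyte proxy class. This is NOT the
# ActiveSupport::Multibyte::Chars implementation.
class CharsProxySketch
  # Tiny illustrative table (an assumption); real code would consult
  # the full Unicode case-mapping data.
  UPCASE_MAP = { "é" => "É", "ç" => "Ç", "ã" => "Ã" }.freeze

  def initialize(string)
    @wrapped = string
  end

  # Redefined method: map the known multibyte characters and
  # ASCII-upcase everything else.
  def upcase
    mapped = @wrapped.each_char.map do |char|
      UPCASE_MAP.fetch(char) { char.tr("a-z", "A-Z") }
    end
    self.class.new(mapped.join)
  end

  def to_s
    @wrapped
  end

  # Methods not defined on the proxy are forwarded to the wrapped
  # String; String results are wrapped in a new proxy.
  def method_missing(name, *args, &block)
    return super unless @wrapped.respond_to?(name)
    result = @wrapped.send(name, *args, &block)
    result.is_a?(String) ? self.class.new(result) : result
  end

  def respond_to_missing?(name, include_private = false)
    @wrapped.respond_to?(name, include_private) || super
  end
end

puts CharsProxySketch.new("café").upcase.to_s
puts CharsProxySketch.new("café").reverse.to_s
```

Forwarding through `method_missing` is only one way to leave methods "undefined" on the proxy; the sketch uses it because it keeps the example short.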
Rodrigo Rosenfeld Rosas
2010-May-11 19:51 UTC
Re: feedback on a few ActiveSupport::Multibyte patches
Norman, I checked out your multibyte branch but it is not working for me. Here is what I did:

$ cd ~/src/rails
$ git remote add norman http://github.com/norman/rails.git
$ git remote update
$ git checkout norman/multibyte -b multibyte
$ rvm ruby-head
$ gem install thor bundle
$ ruby bin/rails ~/temp/multibyte --dev
$ cd ~/temp/multibyte
$ script/rails c
$ > 'ação'.mb_chars.upcase # yields 'AO' instead of 'AÇÃO'
$ > 'ação'.mb_chars.class # yields ActiveSupport::Multibyte::Chars - OK

Any ideas?

Also, from the diffs between master and your branch I can see that there is a lot of multibyte code in ActiveSupport. Maybe this could be put in an external gem on which AS would depend. It would make AS cleaner and it would allow testing other gems as proxies. For instance, when running on JRuby, a different approach would probably be better, since strings in Java are Unicode and String#toUpperCase() would already give the expected results. Any thoughts?

Thank you for your effort on correcting this multibyte issue for Ruby 1.9 on Rails,

Rodrigo.
Norman Clarke
2010-May-11 20:24 UTC
Re: feedback on a few ActiveSupport::Multibyte patches
On Tue, May 11, 2010 at 16:51, Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com> wrote:
> Norman, I checked out your multibyte branch but it is not working for me.
> Here is what I did:
> <...>
> Any ideas?

No, not off the top of my head. But I'll retrace your steps and see if I get the same problems. Thanks for looking into it and getting back to me with your detailed feedback. :)

> Also, from the diffs between master and your branch I could realize that
> there is a lot of multibyte code in ActiveSupport. Maybe this could be put
> in an external gem on which AS would depend of. It would make AS cleaner and
> it would allow testing other gems as proxies... For instance, when running
> on JRuby, it would probably be better to have a different approach since
> strings in Java are unicode and String#toUpperCase() would already give the
> expected results... Any thoughts?

I don't think there's "a lot" of multibyte code in ActiveSupport; it's around 1000 lines, or roughly twice the size of inflector. Maintaining it in a separate gem would be more project management overhead, for something that doesn't usually see a lot of developer activity and is going to be required anyway.

Also, it's very easy to write your own proxy classes if you want, for example, to use one that relies on Java's native string handling for JRuby.

I wouldn't be opposed if the Rails team wanted to do that, but I just don't see any significant benefit.

-Norman
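As a rough illustration of the "write your own proxy class" point above, the sketch below shows a swappable proxy in plain Ruby. Every name in it (`MultibyteSketch`, `wrap`, `NativeProxy`) is a hypothetical stand-in invented for this example and should not be read as ActiveSupport's API. The proxy simply defers to the runtime's own String#upcase, in the way a JRuby-specific proxy might defer to Java's string handling.

```ruby
# Hypothetical registry for the active proxy class. The names are
# illustrative assumptions, not ActiveSupport's API.
module MultibyteSketch
  class << self
    attr_accessor :proxy_class
  end

  # Wrap a string in whichever proxy class is currently registered.
  def self.wrap(string)
    proxy_class.new(string)
  end
end

# A proxy that defers to the runtime's native String#upcase.
class NativeProxy
  def initialize(string)
    @wrapped = string
  end

  def upcase
    self.class.new(@wrapped.upcase)
  end

  def to_s
    @wrapped
  end
end

# Swapping implementations is a one-line change.
MultibyteSketch.proxy_class = NativeProxy
puts MultibyteSketch.wrap("cafe").upcase.to_s
```

Whether the native `upcase` handles non-ASCII characters depends on the Ruby implementation and version in use, which is the whole reason one might want the proxy to be replaceable.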
Norman Clarke
2010-May-12 16:02 UTC
Re: feedback on a few ActiveSupport::Multibyte patches
On Tue, May 11, 2010 at 16:51, Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com> wrote:
> Norman, I checked out your multibyte branch but it is not working for me.
> Here is what I did:
>
> $ cd ~/src/rails
> $ git remote add norman http://github.com/norman/rails.git
> $ git remote update
> $ git checkout norman/multibyte -b multibyte
> $ rvm ruby-head
> $ gem install thor bundle
> $ ruby bin/rails ~/temp/multibyte --dev
> $ cd ~/temp/multibyte
> $ script/rails c
> $ > 'ação'.mb_chars.upcase # yields 'AO' instead of 'AÇÃO'
> $ > 'ação'.mb_chars.class # yields ActiveSupport::Multibyte::Chars - OK
>
> Any ideas?

I just checked this out and it is working correctly for me. I'm not sure where things are going wrong for you, but I'm unable to reproduce your problem. Here's more or less what I just did:

cd ~/work/rails
git checkout master
git pull origin master
git checkout multibyte
git rebase master
cd activesupport
rvm ruby-head
rake test # this pukes because of recent changes to String
rvm 1.9.2
rake test # segfault
rvm 1.9.1
rake test # ok, all tests pass.
cd ..
ruby bin/rails /tmp/mb --dev
cd /tmp/mb

Now create temp.rb with the following contents:

# encoding: utf-8
puts 'ação'.mb_chars.upcase

ruby script/rails runner temp.rb # works
rvm ruby-head
bundle install
ruby script/rails runner temp.rb # also works
rvm ree
ruby script/rails runner temp.rb # also works

These are the Rubies I have installed (I'm on 64-bit Snow Leopard):

$ rvm list

rvm Rubies

   jruby-1.4.0 [ [x86_64-java] ]
   ree-1.8.7-2010.01 [ x86_64 ]
   ruby-1.8.6-p399 [ x86_64 ]
   ruby-1.9.1-p243 [ x86_64 ]
   ruby-1.9.1-p378 [ x86_64 ]
   ruby-1.9.2-preview1 [ x86_64 ]
=> ruby-head [ x86_64 ]

System Ruby

   system [ x86_64 i386 ppc ]

-Norman
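The temp.rb test in the message above depends on the source file declaring its encoding. As a small standalone sketch (plain Ruby 1.9 or later, no Rails, with the constant name `SAMPLE` chosen only for this example), the script below prints how Ruby interprets a literal in a file that carries the magic comment. Note that Ruby only honors the comment when it is written with a colon, as in `# encoding: utf-8`.

```ruby
# encoding: utf-8

# SAMPLE is an illustrative name; the literal is the one used in the thread.
SAMPLE = 'ação'

puts __ENCODING__            # source encoding Ruby assigned to this file
puts SAMPLE.encoding         # encoding attached to the literal
puts SAMPLE.valid_encoding?  # whether the bytes are valid in that encoding
puts SAMPLE.length           # number of characters
puts SAMPLE.bytesize         # number of bytes
```

If the character count and the byte count come out equal for this literal, the accented characters were probably lost or misread before Ruby processed them.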
Rodrigo Rosenfeld Rosas
2010-May-12 23:59 UTC
Re: feedback on a few ActiveSupport::Multibyte patches
Hello Norman,

I couldn't test at work today, but your branch seems to be working in my recent tests at home. I'll try to find some time tomorrow to test at work again.

Thank you,

Rodrigo.
Rodrigo Rosenfeld Rosas
2010-May-13 19:41 UTC
Re: feedback on a few ActiveSupport::Multibyte patches
On 12-05-2010 13:02, Norman Clarke wrote:
> now create temp.rb with following contents:
> # encoding: utf-8
> puts 'ação'.mb_chars.upcase
>
> ruby script/rails runner temp.rb #works

Using this approach (a runner with a file specifying the encoding), your branch works at my work too.

But at home, I can run 'ação'.mb_chars.upcase in the Rails console and it works too. At work, 'ação'.mb_chars yields 'ao'. Any idea why this is not consistent across the two environments?

Thanks,

Rodrigo.
Norman Clarke
2010-May-13 20:01 UTC
Re: feedback on a few ActiveSupport::Multibyte patches
On Thu, May 13, 2010 at 16:41, Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com> wrote:
> On 12-05-2010 13:02, Norman Clarke wrote:
> But at home, I can run 'ação'.mb_chars.upcase in rails console and it works
> too. At work, 'ação'.mb_chars yields 'ao'. Any idea why this is not
> consistent in both environments?

If you're trying it on the console, it's probably a difference in the way your consoles are set up to handle UTF-8 characters. I think the only really reliable way to test this is by putting the text in a file.
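One way to check the console explanation above is to look at what Ruby actually received. The following is a minimal diagnostic sketch for Ruby 1.9 or later; `describe_string` is a hypothetical helper written for this example, not part of Rails. Running the same lines in both consoles and comparing the output may show whether the accented characters were already missing before mb_chars was called.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of Rails): summarize how Ruby sees a string.
def describe_string(str)
  {
    :encoding => str.encoding.name,
    :valid    => str.valid_encoding?,
    :chars    => str.length,
    :bytes    => str.bytes.to_a
  }
end

# Environment settings that commonly influence console input handling.
puts "LANG:             #{ENV['LANG'].inspect}"
puts "default_external: #{Encoding.default_external}"

# A correctly received 'ação' has 4 characters stored in 6 bytes.
p describe_string('ação')
# If the terminal dropped the accented characters, Ruby only sees this:
p describe_string('ao')
```

A result of 'ao' would be consistent with the multibyte characters being stripped on input, though this sketch cannot say which terminal or readline setting is responsible.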