Julian 'Julik' Tarkhanov
2006-Jun-11 07:09 UTC
Getting back to the old story of Unicode support
I ranted and I raved and I tried and failed :-)

Unfortunately, my proposal to wreak havoc in the String class proved futile: it popped up bugs that I had never seen before (among others, CGI escaping and ERB broke, the latter in a rather intricate way). Outside of these two problems the solution worked in my applications, though I would not recommend using it in production for now, until all the implications become clear. Also, the speed overhead proved very substantial.

However, after some more tweaks I have come to the simple idea of having proxy access to the characters instead of subclassing. Maybe this can be useful for ActiveSupport: it is going to be able to address the more expensive routines only where characters are involved, and when Matz finally comes up with a good M17N engine (doh) it will be a matter of aliasing a method to self.

Works (in general) like this:

    @some_unicode_string.u.length
    @some_unicode_string.u.reverse

etc., with modifications to the string in place where necessary.

Here is the test suite:

http://julik.textdriven.com/svn/tools/rails_plugins/unicode_hacks/test/t_string_overrides.rb

I think this can be refactored nicely into the string extensions in ActiveSupport; after that, things like validates_length_of and truncate will be able to address the characters explicitly without resorting to regexen. And the users get a good head start on Unicode in their apps from the get-go.

It has an implicit dependency on the Unicode gem for normalization and capitalization, but that can easily be stubbed out to dummy methods and made optional (with, for example, an alert in the development log). Considering that Rails has some implicit dependencies all by itself, I don't see that as too much of a burden, but this is my opinion of course.

I will gladly provide a patch should the core find this noteworthy.

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
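For readers who want a feel for the proxy idea, here is a minimal sketch. It is not the plugin's code: the class name CharsProxy and the first method are made up for illustration, and only .u, length and reverse come from the message above. It assumes the wrapped string holds UTF-8 bytes and works on codepoints obtained with unpack("U*").

```ruby
# Hypothetical sketch of a character proxy; NOT the unicode_hacks implementation.
# Assumes the wrapped string contains UTF-8 data.
class CharsProxy
  def initialize(string)
    @string = string
  end

  # Number of Unicode codepoints rather than bytes.
  def length
    codepoints.length
  end

  # Reverse codepoint by codepoint so multibyte sequences are not torn apart.
  def reverse
    codepoints.reverse.pack("U*")
  end

  # First `count` codepoints, the kind of primitive a truncate helper needs.
  # (Illustrative addition, not mentioned in the original message.)
  def first(count)
    codepoints.first(count).pack("U*")
  end

  private

  # Decode the UTF-8 bytes into an array of integer codepoints.
  def codepoints
    @string.unpack("U*")
  end
end

class String
  # Entry point: the plain String API stays untouched, and character-aware
  # behaviour is only reached through the proxy.
  def u
    CharsProxy.new(self)
  end
end
```

With this in place, "h\u00e9llo".u.length counts five characters while the underlying string is six bytes long. Keeping the expensive decoding behind .u is the point of the design: code that never calls .u pays nothing for it.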
Mislav Marohnić
2006-Jun-11 12:56 UTC
Re: Getting back to the old story of Unicode support
That is some nice work here, Julian. Proxy access makes it possible to adopt this without breaking anything. I, too, would like to see this someday _without_ a proxy.

I have a question and a request that aren't really about the hacks (I would leave that to more experienced programmers) but about the plugin. First, about 'db_unicode_client.rb': isn't this functionality (setting the client encoding) already available by specifying the 'encoding' attribute in database.yml? Second: when setting the "content-type" header for output, could you not force text/html, but instead use a variable whose value defaults to text/html, so we can provide 'application/xhtml+xml' or other content types in specific controllers?

Sorry to bug you with such not very relevant issues, but I feel that this needs to be a truly universally droppable plugin, and these kinds of minor tweaks will make it such.

Keep up the good i18n work,
-Mislav

On 6/11/06, Julian 'Julik' Tarkhanov <listbox@julik.nl> wrote:
> ...after some more tweaks I have come to the
> simple idea of having proxy access to the characters instead of
> subclassing. Maybe this can be useful for ActiveSupport (it is going
> to be able to address the more expensive routines only where
> characters are involved, and when Matz finally comes up with a good
> M17N engine - doh - it will be a matter of aliasing a method to self.

_______________________________________________
Rails-core mailing list
Rails-core@lists.rubyonrails.org
http://lists.rubyonrails.org/mailman/listinfo/rails-core
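The second request (a content type that defaults to text/html but can be changed per controller) could look roughly like the sketch below. Everything named here is hypothetical: UnicodeContentType, unicode_content_type and content_type_header are illustrative names, not part of Rails or of the plugin, and the controllers are plain Ruby classes standing in for real ones.

```ruby
# Hypothetical helper; none of these names exist in Rails or unicode_hacks.
module UnicodeContentType
  DEFAULT_CONTENT_TYPE = "text/html"

  # Controllers override this method to serve a different media type.
  def unicode_content_type
    DEFAULT_CONTENT_TYPE
  end

  # Build the header value from the (overridable) media type and a charset.
  def content_type_header(charset = "utf-8")
    "#{unicode_content_type}; charset=#{charset}"
  end
end

# Stand-in controller that keeps the default.
class PagesController
  include UnicodeContentType
end

# Stand-in controller that opts into XHTML.
class FeedController
  include UnicodeContentType

  def unicode_content_type
    "application/xhtml+xml"
  end
end
```

Here PagesController.new.content_type_header yields "text/html; charset=utf-8", while FeedController gets "application/xhtml+xml; charset=utf-8" without the plugin having to know about it.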
Julian 'Julik' Tarkhanov
2006-Jun-11 22:29 UTC
Re: Getting back to the old story of Unicode support
On 11-jun-2006, at 14:56, Mislav Marohnić wrote:
> That is some nice work here, Julian. Proxy access makes this
> possible to adopt while not breaking anything - I would, too, like
> to see this someday _without_ a proxy.
>
> I have a question and a request that aren't really about the hacks
> (I would leave that to more experienced programmers) but about the
> plugin. First, about 'db_unicode_client.rb' - isn't this
> functionality (setting client encoding) already present by
> specifying 'encoding' attribute in database.yml? Second: when
> setting "content-type" header for output, could you not force
> text/html but put in a variable which value defaults to text/html so we
> can provide 'application/xhtml+xml' or other content types in
> specific controllers?
>
> Sorry to bug you with such somewhat not very relevant issues, but I
> feel that this needs to be a truly universally droppable plugin and
> these kind of minor tweaks will make it such.

unicode_hacks has to go if the core agrees to the chars proxy.

As to the encoding configuration, this has to be done in the connection adapters. The reason is that I haven't yet met an implementation (in Perl, PHP or Python, and I suspect AR is no different) that would maintain the client encoding should the connection "go away"; maintaining it means issuing another query whenever you have to reconnect. Rails uses persistent connections right now, meaning that without this I have no protection against my NAMES being reset to something I really didn't want when the connection needs to be reestablished (and it's common for SQL sockets on shared hosts to time out). This has to be handled by ActiveRecord's adapters IMO (if it's not handled already).

As to the headers, Rails should just default to utf-8 headers when $KCODE is UTF, for xml, rjs and html alike. See a ticket on this:

http://dev.rubyonrails.org/ticket/4975

All of this can be implemented and tested in a backwards-compatible way that is friendly to (for example) Japanese folks who want their $KCODE set to JIS and friends, or German people who tend to rely on ISO. The Chars abstraction also caters for this requirement by always checking $KCODE.

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
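The reconnect concern can be illustrated with a small sketch. It is an assumption-laden toy, not ActiveRecord code: FakeConnection stands in for a database driver, EncodingAwareAdapter is a hypothetical wrapper, and "SET NAMES" is the MySQL-style statement the message alludes to. The only point it makes is that the encoding statement belongs in the same code path as connecting, so a reconnect cannot forget it.

```ruby
# Stand-in for a database driver connection; purely illustrative.
class FakeConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  # Record the SQL instead of sending it anywhere.
  def execute(sql)
    @statements << sql
  end
end

# Hypothetical adapter wrapper, NOT ActiveRecord's real adapter API.
class EncodingAwareAdapter
  attr_reader :connection

  # `connector` is a block that opens and returns a fresh connection.
  def initialize(encoding, &connector)
    @encoding = encoding
    @connector = connector
    connect
  end

  # Called when the server has dropped the old connection.
  def reconnect!
    connect
  end

  private

  # Every new connection gets the client encoding applied immediately,
  # so the first connect and every later reconnect behave the same.
  def connect
    @connection = @connector.call
    @connection.execute("SET NAMES #{@encoding}")
  end
end
```

An adapter built with EncodingAwareAdapter.new("utf8") { FakeConnection.new } issues the encoding statement once on creation and once more on each reconnect!, each time against the freshly opened connection.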
Charles O Nutter
2006-Jun-12 00:20 UTC
Re: Getting back to the old story of Unicode support
I'll jump in to say that we on the JRuby team are also very interested in finding a way to support Unicode. We're close to running Rails in more general scenarios, and since we run on top of the JVM we obviously have the potential for Unicode support out of the box. However, we have held off providing any API, hoping for Ruby proper to lead the way. We're not interested in forking the community in any way or providing incompatible functionality; but if there's an acceptable API coming out of Rails that works well and feels right, it could be the answer.

I have not had a chance to look at Julian's work, but we'll be watching these developments. One of the most frequent questions we get from would-be JRuby users is "why don't you support unicode." We want to...we really do.

--
Charles Oliver Nutter @ headius.blogspot.com
JRuby Developer @ jruby.sourceforge.net
Application Architect @ www.ventera.com
Thijs van der Vossen
2006-Jun-12 07:36 UTC
Re: Getting back to the old story of Unicode support
On 11 Jun 2006, at 09:09, Julian 'Julik' Tarkhanov wrote:
> @some_unicode_string.u.length
> @some_unicode_string.u.reverse

+1

This is very slick indeed. We at Fingertips would love to see this added to the core.

Kind regards,
Thijs

--
Fingertips - http://www.fngtps.com

Phone: +31 (0)6 24204845
Skype: tvandervossen
MSN Messenger: thijs@fngtps.com
iChat/AOL: t.vandervossen@mac.com
Jabber IM: thijs@jabber.org
Julian 'Julik' Tarkhanov
2006-Jun-12 20:33 UTC
Re: Getting back to the old story of Unicode support
On 12-jun-2006, at 9:36, Thijs van der Vossen wrote:
> On 11 Jun 2006, at 09:09, Julian 'Julik' Tarkhanov wrote:
>> @some_unicode_string.u.length
>> @some_unicode_string.u.reverse
>
> +1

Moved to http://julik.textdriven.com/svn/tools/rails_plugins/unicode_hacks/test/t_chars.rb

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
Julian 'Julik' Tarkhanov
2006-Jun-14 19:00 UTC
[XPATCH] ActiveSupport::Multibyte (was: Getting back to the old story...)
I have incorporated all of the above under

http://dev.rubyonrails.org/ticket/5396

Would love to have some feedback.

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
PJ Hyett
2006-Jun-16 20:55 UTC
Re: [XPATCH] ActiveSupport::Multibyte (was: Getting back to the old story...)
Could you make this a plugin?

Thanks,
PJ
Julian 'Julik' Tarkhanov
2006-Jun-16 22:33 UTC
Re: [XPATCH] ActiveSupport::Multibyte (was: Getting back to the old story...)
On 16-jun-2006, at 22:55, PJ Hyett wrote:
> Could you make this a plugin?

It is a plugin already. Thanks for being helpful.

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
PJ Hyett
2006-Jun-18 04:15 UTC
Re: [XPATCH] ActiveSupport::Multibyte (was: Getting back to the old story...)
I hadn't noticed the unicode_hacks plugin was updated, thanks.

-PJ
Really like this! Has there been any feedback from the core?

--
Abdur-Rahman Advany
http://blog.railsdevelopment.com/