thr3ads.net - Rails core - Fwd: Unicode roadmap? [Jun 2006]


Charles O Nutter

2006-Jun-15 00:19 UTC

Fwd: Unicode roadmap?

I posted this to ruby-talk, but it occurred to me that you folks
implementing Rails functionality probably have a thing or two to say about
unicode support in Ruby. Therefore, I would love to hear your opinions.
Adding native unicode support is only a matter of time in JRuby; its
usefulness as a JVM-based language depends on it. However, we continue to
wrestle with how best to support unicode without stepping on the Ruby
community's toes in the process. Thoughts?

---------- Forwarded message ----------
From: Charles O Nutter <headius@headius.com>
Date: Jun 14, 2006 7:11 PM
Subject: Re: Unicode roadmap?
To: ruby-talk ML <ruby-talk@ruby-lang.org>

Every time these unicode discussions come up my head spins like a top. You
should see it.

We JRubyists have headaches from the unicode question too. Since JRuby is
currently 1.8-compatible, we do not have what most call *native* unicode
support. This is primarily because we do not wish to create an incompatible
version of Ruby or build in support for unicode now that would conflict with
Ruby 2.0 in the future. It is, however, embarrassing to say that although we
run on top of Java, which has arguably pretty good unicode support, we don't
support unicode. Perhaps you can see our conundrum.

I am no unicode expert. I know that Java uses UTF-16 strings internally,
converted to/from the current platform's encoding of choice by default. It
also supports converting those UTF-16 strings into just about every encoding
out there, just by telling it to do so. Java supports the Unicode
specification version 3.0. So Unicode is not a problem for Java.
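A minimal Ruby sketch of that round trip, assuming Ruby 1.9 or later (String#encode does not exist in the 1.8 line under discussion): the same text held as UTF-16, the way a Java String holds it, then converted back to UTF-8 as it would be for a stream. The sample string is arbitrary.

```ruby
# Minimal sketch (Ruby 1.9+): hold text as UTF-16, as a Java String does
# internally, and convert back to UTF-8 at the I/O boundary.
utf8  = "na\u00EFve caf\u00E9"      # "naïve café", 10 characters
utf16 = utf8.encode("UTF-16BE")     # 2 bytes per character for this text
back  = utf16.encode("UTF-8")       # what would be written to a stream

puts utf8.bytesize   # 12: "ï" and "é" take 2 bytes each in UTF-8
puts utf16.bytesize  # 20: every character here fits in one 16-bit unit
puts back == utf8    # true: the round trip is lossless
```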

We would love to be able to support unicode in JRuby, but there's always
that nagging question of what it should look like and what would mesh well
with the Ruby community at large. With the underlying platform already rich
with unicode support, it would not take much effort to modify JRuby. So then
there's a simple question:

What form would you, the Ruby users, want unicode to take? Is there a
specific library that you feel encompasses a reasonable implementation of
unicode support, e.g. icu4r? Should the support be transparent, e.g. no
longer treat or assume strings are byte vectors? JRuby, because we use
Java's String, is already using UTF-16 strings exclusively...however there's
no way to get at them through core Ruby APIs. What would be the most
comfortable way to support unicode now, considering where Ruby may go in the
future?

--
Charles Oliver Nutter @ headius.blogspot.com
JRuby Developer @ jruby.sourceforge.net
Application Architect @ www.ventera.com





_______________________________________________
Rails-core mailing list
Rails-core@lists.rubyonrails.org
http://lists.rubyonrails.org/mailman/listinfo/rails-core

Manfred Stienstra

2006-Jun-15 00:40 UTC

Re: Fwd: Unicode roadmap?

On Jun 15, 2006, at 2:19 AM, Charles O Nutter wrote:
> I posted this to ruby-talk, but it occurred to me that you folks  
> implementing Rails functionality probably have a thing or two to  
> say about unicode support in Ruby. Therefore, I would love to hear  
> your opinions. Adding native unicode support is only a matter of  
> time in JRuby; its usefulness as a JVM-based language depends on  
> it. However, we continue to wrestle with how best to support  
> unicode without stepping on the Ruby community's toes in the
> process. Thoughts?
Julik has done a lot of pioneering in that direction for Rails. His
latest suggestion is to use a proxy class on string objects to  
perform unicode operations:

@some_unicode_string.u.length
@some_unicode_string.u.reverse

I tend to agree with this solution as it doesn't break any previous
string operations and gives us an easy way to perform unicode-aware
operations.
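The proxy is only described in this thread, not shown. A minimal sketch of the idea, assuming UTF-8 input: String#u returns a wrapper whose length and reverse work on code points decoded with unpack("U*"), a directive that exists in Ruby 1.8 as well as later versions. UnicodeProxy is a hypothetical name, not the actual Rails code, and the \u escapes in the sample line require Ruby 1.9+. Under Ruby 1.8, plain s.length on this string would count bytes (6), which is exactly the gap the proxy fills.

```ruby
# Hypothetical sketch of a String#u proxy; UnicodeProxy is an illustrative
# name, not the actual Rails implementation. Assumes UTF-8 input.
class UnicodeProxy
  def initialize(string)
    @string = string
  end

  # Count code points rather than bytes.
  def length
    codepoints.size
  end

  # Reverse code points, so multibyte characters are not split apart.
  def reverse
    codepoints.reverse.pack("U*")
  end

  def to_s
    @string
  end

  private

  # unpack("U*") decodes UTF-8 bytes into integer code points.
  def codepoints
    @string.unpack("U*")
  end
end

class String
  # Only adds #u; existing String methods keep their behavior.
  def u
    UnicodeProxy.new(self)
  end
end

s = "h\u00E9llo"    # "héllo": 5 characters, 6 bytes in UTF-8
puts s.bytesize     # 6
puts s.u.length     # 5
puts s.u.reverse    # olléh
```

Reversing by code point keeps multibyte characters intact but still splits combining sequences such as "e" followed by U+0301 COMBINING ACUTE ACCENT, so a fuller implementation would need to work on grapheme clusters.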

Manfred

Charles O Nutter

2006-Jun-15 01:50 UTC

Re: Fwd: Unicode roadmap?

I agree it's a very attractive solution. I have two related questions
(perhaps you are out there to answer, Julik):

1. How does performance look with the unicode string add-on versus native
strings?
2. Is this the ideal way to support unicode strings in ruby?

I'll explain the second as follows... Suppose we switched from treating a
string as an array of bytes to treating it as a list of characters of
arbitrary width, and had all existing string operations work correctly on
those characters. Would that be a better ideal? Where are the breaking points
in such a design? What's to stop the underlying implementation from actually
using UTF-16 characters, passing UTF-8 to libraries and IO streams but still
allowing you to access everything as UTF-16 or your encoding of choice? (Of
course this is somewhat rhetorical; we do this currently with JRuby since
Java's strings are UTF-16...we just don't have any way to provide access to
UTF-16 characters, and we normalize everything to UTF-8 for Ruby's sake...but
what if we didn't normalize and adjusted string functions to compensate?)
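To make the "breaking points" question a little more concrete, a small Ruby sketch (Ruby 1.9+ for the \u escapes and flat_map) counts the same string three ways: as a byte vector, as code points, and as UTF-16 code units. utf16_units is a hypothetical helper written for illustration. It shows that a UTF-16 representation is not one unit per character either, once characters outside the Basic Multilingual Plane appear.

```ruby
# Sketch: one string, three different "lengths", depending on whether it is
# viewed as bytes, code points, or UTF-16 code units.

# Hypothetical helper: map code points to UTF-16 code units. Code points
# above U+FFFF need a surrogate pair (two units).
def utf16_units(codepoints)
  codepoints.flat_map do |cp|
    if cp < 0x10000
      [cp]
    else
      v = cp - 0x10000
      [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
    end
  end
end

s = "a\u00E9\u{1D11E}"    # "a", "é", and U+1D11E MUSICAL SYMBOL G CLEF
bytes = s.unpack("C*")    # raw bytes of the UTF-8 encoding
cps   = s.unpack("U*")    # decoded code points
units = utf16_units(cps)  # UTF-16 code units

puts bytes.size  # 7: 1 + 2 + 4 bytes
puts cps.size    # 3 code points
puts units.size  # 4: the clef needs a surrogate pair
```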



Julian 'Julik' Tarkhanov

2006-Jun-15 02:14 UTC

Re: Fwd: Unicode roadmap?

On 15-jun-2006, at 3:50, Charles O Nutter wrote:
> I agree it's a very attractive solution. I have two questions
> related (perhaps you are out there to answer, Julik):
>
> 1. How does performance look with the unicode string add-on versus
> native strings?
> 2. Is this the ideal way to support unicode strings in ruby?
>
> And I explain the second as follows....if we could assume that
> switching from treating a string as an array of bytes to a list of
> characters of arbitrary width, and have all existing string
> operations work correctly treating those characters as string,
> would that be a better ideal? Where are the breaking points in such
> a design? What's to stop the underlying implementation from
> actually using a UTF-16 character, passing UTF-8 to libraries and
> IO streams but still allowing you to access everything as UTF-16 or
> your encoding of choice? (Of course this is somewhat rhetorical; we
> do this currently with JRuby since Java's strings are UTF-16...we
> just don't have any way to provide access to UTF-16 characters, and
> we normalize everything to UTF-8 for Ruby's sake...but what if we
> didn't normalize and adjusted string functions to compensate?)
This is more appropriate for ruby-talk.

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl

Charles O Nutter

2006-Jun-15 02:25 UTC

Re: Fwd: Unicode roadmap?

Fair enough; redirected. If any other rails-core folks want to chime in,
please do so...I would expect unicode and multibyte support to be key issues
for worldwide rails deployments.



