thr3ads.net - Rails core - Investigating Unicode. Take 2, with nastities and allegations. [Dec 2005]


Julian 'Julik' Tarkhanov

2005-Dec-21 14:53 UTC

Investigating Unicode. Take 2, with nastities and allegations.

Well, I see that my last email hasn't generated any reaction from the
Rails core team. It looks like all of them are happy users of
"plain text" (which, as we know by now, doesn't exist, but still).

I apologize in advance for the sore bitterness of this message, but I
see that Rails core STILL, despite all of the efforts, sees these
issues as something you can YAGNI away, something "optional",
"additional" or "plugin-able".

What I will try to prove in this message is that it's not
"additional" - and more, it's got poisonous teeth and it bites
painfully. You can forgive Matz, because he has to stay above the
controversy and cater to the Japanese and Chinese users, and he
dislikes Unicode (like most of the enlightened Japanese do). But you
can't forgive _yourself_, because these are _your_ applications. As a
developer, you are accountable.

In my first email I was talking about the low-level mechanics of this
stuff. They are interesting to a Ruby internals developer or to a
deep-down Ruby hacker (like Jamis), but I didn't touch on the
consequences this has for Rails (because I thought I wouldn't need to
draw this knife if there was interest). It turns out we (according to
David and others) are still in the cozy world of "Plain Text", so it's
time I opened this can of worms.

Let's skip for a second over all those nasty, disgusting "question mark
in a rectangle" characters your users will see when you truncate
their text improperly - this is, after all, the temple of Output, the
browser domain: you sent it out, and then the browser has to cope
with it according to Postel's law. And besides, there aren't many of
those users, right? Just some lousy 5.5 billion potential customers,
right? Uhm, sorry, got a little off-topic here.
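The "question mark in a rectangle" appears when a string is cut in the middle of a multibyte character. A minimal sketch, assuming a modern Ruby (2.1 or later, for `byteslice` and `scrub`) rather than the 1.8 interpreter discussed in this thread: `byteslice` cuts at a byte offset, which is effectively what byte-oriented truncation does, while `[]` with a character count cuts on character boundaries. The limits of 5 bytes and 3 characters are arbitrary illustrations.

```ruby
text = [1070, 1083, 1080, 1082].pack("U*")   # "Юлик": 4 characters, 8 bytes

# Byte-oriented truncation: 5 bytes is two and a half Cyrillic letters.
by_bytes = text.byteslice(0, 5)
puts by_bytes.valid_encoding?   # false: the last byte is a dangling lead byte
puts by_bytes.scrub("?")        # "Юл?"; a browser typically draws a replacement glyph instead

# Character-oriented truncation always cuts on a character boundary.
by_chars = text[0, 3]
puts by_chars.valid_encoding?   # true
puts by_chars                   # "Юли"
```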

Let's move somewhat up the stack from my previous message, into a
different domain - the one you care about. The one you foster and
cherish. The domain of Data.

Paul Battley gave a good talk about Unicode in Ruby at the recent
EuRuKo conference in Munich. Among his other slides he had "Doing
mischief with Unicode". Unfortunately I couldn't attend because
EuRuKo fell on my birthday, so I found other fish to fry that day -
but you can find Paul's presentation in its gory MPEG4 here:

http://www.futurometer.com/320x240x15fps/Battley.Unicode.mov.gz

I will merely extend his presentation to Rails - that's right, we
will exploit Rails with Unicode. Let's say you are storing your data
in Unicode (because if you don't, you must spend the rest of your days
in Hell writing Sanskrit in octets on a concrete plate with a dinner
fork). You think your bases are covered, and you did require 'jcode'.
Except that 'jcode' won't help.

Let''s have a look at this nice little snippet.

class User < ActiveRecord::Base
  validates_presence_of :login
end

Looks bulletproof, doesn't it? If a user enters spaces into the form,
they are going to get String#strip'ped, and then the text in the
field is going to be String#blank?, right? So entering all spaces
into the Login field won't work, right?

Well, it will. The Unicode standard, as of now, comprises 26 (!)
characters which can be considered "whitespace". 26, that is, when
used inside a string - at the boundaries it gets to 27 (counting the
zero-width no-break space, AKA the BOM).
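To see which code points a regex treats as whitespace, here is a small sketch for a modern Ruby (2.4 or later, for `match?`). It assumes Onigmo's `[[:space:]]` class, which for UTF-8 strings covers the Unicode space, line and paragraph separators plus the ASCII control whitespace and U+0085. Restricting the scan to the Basic Multilingual Plane is safe because every whitespace code point lives there. On a current Ruby this should report 25 rather than 26, because U+180E MONGOLIAN VOWEL SEPARATOR was dropped from White_Space in Unicode 6.3. U+FEFF is not in the list; it only matters at string boundaries, as described above.

```ruby
# Collect every BMP code point that Ruby's regex engine treats as whitespace.
# Surrogates (U+D800..U+DFFF) are skipped because they cannot appear in UTF-8.
WHITESPACE = (0..0xFFFF)
  .reject { |cp| (0xD800..0xDFFF).cover?(cp) }
  .select { |cp| cp.chr(Encoding::UTF_8).match?(/[[:space:]]/) }

puts "#{WHITESPACE.size} whitespace code points in the BMP"
puts WHITESPACE.map { |cp| format("U+%04X", cp) }.join(" ")
```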

So let''s try:

kinda_lovely_login = [
  0x0020,  # White_Space # Zs  SPACE
  0x00A0,  # White_Space # Zs  NO-BREAK SPACE
  0x202F,  # White_Space # Zs  NARROW NO-BREAK SPACE
].pack("U*")

And lo and behold...

User.new(:login=>kinda_lovely_login).save!

Nice, isn't it? In case you wonder - yes, this exploit exists in
YOUR Rails application RIGHT NOW (albeit a mild one). That one
application that is sooo Web 2.0, with Ajax and stuff. If you like
it, you'd better switch to 7-bit ASCII right away before selling it to
anyone (not that you will be successful unless you only sell to
British and American customers, and as we all know, the Web ends
there). And "just using UTF-8" won't help, because Unicode is hard.

Wonder WHY that happens? Well... String#strip is not Unicode-aware.
Neither are String#empty? and (thus) String#blank?. But don't reach
for your fixtures just yet! Because I'm far from finished...
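A minimal sketch of the mechanics on a modern Ruby (2.4 or later, for `match?`). `String#strip` still removes only ASCII whitespace, so the login built above survives stripping. `unicode_blank?` and `unicode_strip` are hypothetical helpers, not Ruby or Rails APIs; they assume that anything matching the `[[:space:]]` class, which Ruby applies to Unicode for UTF-8 strings, should count as whitespace.

```ruby
# The login from the example above: SPACE, NO-BREAK SPACE, NARROW NO-BREAK SPACE.
login = [0x0020, 0x00A0, 0x202F].pack("U*")

# String#strip only knows about ASCII whitespace, so two characters survive.
stripped = login.strip
puts stripped.empty?   # false
puts stripped.length   # 2

# Hypothetical helper: blank if the string is empty or only Unicode whitespace.
def unicode_blank?(str)
  str.match?(/\A[[:space:]]*\z/)
end

# Hypothetical Unicode-aware strip built on the same character class.
def unicode_strip(str)
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

puts unicode_blank?(login)      # true
puts unicode_blank?("julik")    # false
p unicode_strip(login)          # ""
```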

Let''s move on:

class User < ActiveRecord::Base
  validates_size_of :name, :maximum => 5
end

Ok, this is our User. Now let''s see if I can use this application:

my_name = [1070, 1083, 1080, 1082].pack("U*")

In case you wonder, this is my name in Russian, spelled "Юлик".
The one my mother gave me.

User.new(:login => 'julik', :name => my_name).save!

/usr/local/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/validations.rb:711:in `save!': ActiveRecord::RecordInvalid (ActiveRecord::RecordInvalid)

Ahem, wait a minute. You said it was 5, right? And of course you show
it to me in a nice little error message? But as far as I can tell my
name is all of 4 letters, and it fits within the limit quite nicely.
Well, no. String#size is not Unicode-aware, as we know - so AR just
sticks to that. And my name turns out to be quite a bit longer than I
thought it might be:

my_name.size
=> 8

Well, sure: two bytes per character. David can stick some of his nice
Danish diacritics in there as well, because they are double-byte too.
And yes, the fact that UTF-8 keeps plain ASCII at one byte per
character will nicely conceal this from you as long as you stay in
your cozy "plain-text" land. If you like it THAT way, you'd better
stick the following into the form:

"The length of your name decomposed into bytes should be less than,  
or equal to 5".

I bet your users will love that.
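For comparison, a sketch of how the two counts differ. On Ruby 1.8, `String#size` returned bytes, which is what tripped `validates_size_of`; on Ruby 1.9 and later `length` counts characters and `bytesize` counts bytes, so the sketch below assumes a modern interpreter. `unpack("U*")` recovers the code points the name was built from, and the Danish word is written with escapes to show that æ and ø are two bytes each as well.

```ruby
my_name = [1070, 1083, 1080, 1082].pack("U*")   # "Юлик"

puts my_name.length      # 4 characters
puts my_name.bytesize    # 8 bytes: each Cyrillic letter is 2 bytes in UTF-8
p my_name.unpack("U*")   # [1070, 1083, 1080, 1082]

danish = "\u00C6r\u00F8"   # "Ærø"
puts danish.length         # 3 characters
puts danish.bytesize       # 5 bytes

# A character-based size check, the way a user would expect it to work.
MAX_NAME_LENGTH = 5
puts my_name.length <= MAX_NAME_LENGTH     # true
puts my_name.bytesize <= MAX_NAME_LENGTH   # false: the byte-counting behaviour
```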

Now just grep the Rails sources for string.size (and friends).
Enjoy the mess.

This is not "localization of dates and times", gentlemen, this is
seriously BAD. And if you still think these things are not serious and
Rails can stay plain text, if you still think this can be outsourced
and YAGNI'ed away, if you think it doesn't "touch me because most of
my customers are American anyways", if you think you can sell THIS to
the pointy-haired bosses, or if you think Matz (and other Japanese)
will take care of it for you - I admire you. Keep countin' 'em bytes.

--
Julian 'Julik' Tarkhanov
me at julik.nl

Thijs Van Der Vossen

2005-Dec-22 19:36 UTC


Re: Investigating Unicode. Take 2, with nastities and allegations.

On 21 Dec 2005, at 15:53, Julian 'Julik' Tarkhanov wrote:
> Well, I see that my last email hasn't generated any reaction from
> the Rails core team. [...]

Julian, maybe I've missed it, but do you have a patch for the String
fix you proposed in your previous email? I would really like to test
our current apps against your proposed solution.

Kind regards,
Thijs

--
Fingertips - http://www.fngtps.com
+31 (0)6 24204845
thijs@jabber.org

Kyle Maxwell

2005-Dec-22 22:13 UTC


Re: Investigating Unicode. Take 2, with nastities and allegations.

On 12/21/05, Julian 'Julik' Tarkhanov <listbox@julik.nl> wrote:
> Well, I see that my last email hasn't generated any reaction from the
> Rails core team. [...]

Julian,

I think that everyone is with you about wanting great Unicode support
in Ruby. However, to get 1.0 released, all of the core team guys put
in a massive effort. I imagine that they need some recovery time.
Also, it's the holiday season, and many people are spending time with
friends and family.

Great Unicode support will happen sooner or later, and if you want
sooner, you should start working on a patch. I'd love to contribute,
but first I need to get through the holidays and a major product
launch in January.

--
Kyle Maxwell
Chief Technologist
E Factor Media // FN Interactive
kyle@efactormedia.com
1-866-263-3261

_______________________________________________
Rails-core mailing list
Rails-core@lists.rubyonrails.org
http://lists.rubyonrails.org/mailman/listinfo/rails-core

Julian 'Julik' Tarkhanov

2005-Dec-22 22:47 UTC


Re: Investigating Unicode. Take 2, with nastities and allegations.

On 22-dec-2005, at 20:36, Thijs Van Der Vossen wrote:
> On 21 Dec 2005, at 15:53, Julian 'Julik' Tarkhanov wrote:
>> Well, I see that my last email hasn't generated any reaction from
>> the Rails core team. [...]
>
> Julian, maybe I've missed it, but do you have a patch for the
> String fix you proposed in your previous email? I would really like
> to test our current apps against your proposed solution.

I sent it to you off-list yesterday, I believe. I am working on this:
http://julik.textdriven.com/svn/tools/rails_plugins/unicode_hacks/

If someone wants to help out hacking, I will gladly accept it.

Just grab the Unicode gem, export the plugin, and run rake. It has
some other code (some of which is already addressed in core, like the
DB connection charset - but funny as it may seem, this was protecting
me from the effects of the infamous database timeout problem).

But I need more solid test coverage, and not all methods are shadowed
yet. Unfortunately there is no test suite for the core Ruby String
functionality, so I can't check whether I break it for anyone else. If
such a test suite exists, I would like to know where (is Rubicon still
viable? It hasn't been updated for quite some time). Right now I only
intercept String calls when $KCODE is UTF8, i.e. when strings have
UTF-8 semantics. And you need to have the gem, which means that this
won't work for Windows people - they will need to find out how to
build the gem themselves; I am C-illiterate.
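The approach of only touching strings with UTF-8 semantics could be sketched on a modern Ruby (2.4 or later) with `Module#prepend`. This is not the plugin's code: `UnicodeAwareStrip` is a hypothetical name, and the check on `String#encoding` stands in for the Ruby 1.8 `$KCODE` test. Like the plugin, it changes a core method for every caller in the process, which is exactly what makes such hacks risky.

```ruby
# Hypothetical override: Unicode-aware strip, but only for valid UTF-8 strings.
module UnicodeAwareStrip
  def strip
    # Anything that is not valid UTF-8 falls back to the original String#strip.
    return super unless encoding == Encoding::UTF_8 && valid_encoding?
    gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
  end
end

String.prepend(UnicodeAwareStrip)

login = [0x0020, 0x00A0, 0x202F].pack("U*")
p login.strip          # ""
p "  plain  ".strip    # "plain"
```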

But it overrides the core Ruby class and core Ruby methods. It is, in
general, a very nasty hack - a very deep one. I stand by it (and I
use it daily), but I don't know if it will work for others. I just
felt very, uhm... upset when I found out that Rails basically does
nothing to work around what is (IMO) Matz's hesitation. There is
similar ambiguity in PHP, but every moderately large application (or
framework) at least tries to tackle it through the mbstring
extension. I might hack on this further, but I would like to know the
core team's position on this.

Because if you want Rails apps to be Unicode-enabled, you basically
have 3 options:
1) Hack the String class - Matz will not produce something working in
the near future. Or maybe the Pragmatic guys can convince him, because
the purism of "not doing anything so as not to hurt anybody" is noble,
but it drags on and has bad side effects. I could find talks about
Unicode in Ruby going as far back as 2002, and still absolutely
nothing has been done to address it at the language level.
2) Fork, fork, fork. Every single string truncation, length
calculation or stripping within Rails has to be forked (like the
truncate() helper).
3) Make an extension of String which accommodates hacks like mine
under their own prefix, as if we were in PHP-land calling the mb_*
functions. Again, an enormous code review process would have to
follow, and it gives us no guarantee of covering outside libraries
(or, for that matter, no guarantee that a Rails core developer from
the USA won't forget that you need a prefix to count these darn
letters right).
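Option 3 might look something like the following sketch. `MultibyteString`, `mb_length` and `mb_truncate` are hypothetical names chosen to echo PHP's `mb_*` functions; they are not part of Ruby or Rails. Counting via `unpack("U*")` works on Ruby 1.8 and on later versions alike, and it raises `ArgumentError` on malformed UTF-8. The catch described above remains: every caller has to remember to use the prefixed versions.

```ruby
# Hypothetical opt-in helpers, in the spirit of PHP's mb_* functions.
module MultibyteString
  module_function

  # Number of characters (not bytes) in a UTF-8 string.
  def mb_length(str)
    str.unpack("U*").length
  end

  # Truncate to at most `limit` characters, ending with `omission` if cut.
  def mb_truncate(str, limit, omission = "...")
    chars = str.unpack("U*")
    return str if chars.length <= limit
    keep = [limit - omission.length, 0].max
    chars.first(keep).pack("U*") + omission
  end
end

name = [1070, 1083, 1080, 1082].pack("U*")
puts MultibyteString.mb_length(name)                  # 4
puts MultibyteString.mb_truncate("Hello, world", 8)   # Hello...
```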

I am just upset because it's so broken and I seem to be the only one
whining and asking questions. Maybe I am asking them wrong, I don't
know. Or I seem to be the only Rails user who needs both an ß and a
Ш in a single string, while everyone else is happily building this
new Web 2.0 (which, as it turns out, has problems accepting my first
and last name).

Enjoy the holidays everyone!

--
Julian 'Julik' Tarkhanov
me at julik.nl

Jamis Buck

2005-Dec-23 00:55 UTC


Re: Investigating Unicode. Take 2, with nastities and allegations.

On Dec 22, 2005, at 3:47 PM, Julian 'Julik' Tarkhanov wrote:
> I am just upset because it's so broken and I seem to be the only
> one whining and asking questions. Maybe I am asking them wrong, I
> don't know. Or I seem to be the only Rails user needing to use
> both an ß and a Ш in a single string, while everyone else is
> happily building this new Web 2.0 (which as it turns out has
> problems accepting my first and last name).

Julik,

Allow me, as a core team member, to say, "the core team cares about  
this issue." I hope that assuages some of your pain.

Now, as a core team member, allow me to say, "the core team has no  
experience with i18n". Allow me also to say, "the core team has no  
pressing needs for extensive i18n in their applications." And lastly,  
allow me to say (as has been said multiple times), "patches are  
always welcome."

I apologize if I've come off as snarky here, but no one likes to be
called insensitive. And members of the core team HAVE addressed this
issue, repeatedly, and on this very list. Our universal answer is "if
someone comes up with a good solution, we'll consider it." I'm sorry
if that's not the kind of answer you want to hear, but I can promise
you that the core team will not just go away for a month and come
back with an i18n solution that everyone loves. Mostly because most
of us have never done i18n before, and are therefore not the best
qualified to come up with a solution.

Please, please, please, work on this. Please, please, please come up
with a solution and get the other people on this list (and elsewhere)
who need i18n to buy off on it. And then, please, please, please post
a patch. That is the only way it's going to happen.

- Jamis

Thijs Van Der Vossen

2005-Dec-23 08:07 UTC


Re: Investigating Unicode. Take 2, with nastities and allegations.

On 23 Dec 2005, at 01:55, Jamis Buck wrote:
> On Dec 22, 2005, at 3:47 PM, Julian 'Julik' Tarkhanov wrote:
>> I am just upset because it's so broken and I seem to be the only
>> one whining and asking questions. Maybe I am asking them wrong, I
>> don't know. Or I seem to be the only Rails user needing to use
>> both an ß and a Ш in a single string, while everyone else is
>> happily building this new Web 2.0 (which as it turns out has
>> problems accepting my first and last name).
>
> Julik,
>
> Allow me, as a core team member, to say, "the core team cares about
> this issue." I hope that assuages some of your pain.
>
> Now, as a core team member, allow me to say, "the core team has no
> experience with i18n". Allow me also to say, "the core team has no
> pressing needs for extensive i18n in their applications." And
> lastly, allow me to say (as has been said multiple times), "patches
> are always welcome."

Just to try to clarify the issue: Julik is _not_ whining about
_extensive_ i18n at all; he is whining because Rails breaks in all
kinds of subtle ways when you enter Unicode data that contains
characters beyond the 'Basic Latin' block.

Simply put, a _character_ is no longer _one byte long_ when you get
beyond the characters you can see printed on your keyboard. Even
simple punctuation marks like these “double quotation marks” take up
_three bytes_ each in UTF-8, and so does stuff like ⾦.

Because most string handling in Ruby treats each character as one
byte, there are a lot of places in Rails right now where _every_
character is assumed to be _one byte_ long, which simply is not the
case.
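A quick way to check these numbers on a modern Ruby (1.9 or later) is `String#bytesize`. In UTF-8, U+0000 to U+007F take one byte, U+0080 to U+07FF (accented Latin letters, Greek, Cyrillic and so on) take two, the rest of the Basic Multilingual Plane, including the curly quotation marks U+201C/U+201D and CJK characters such as ⾦, takes three, and anything above U+FFFF takes four. `utf8_length` is a hypothetical helper that just encodes those ranges.

```ruby
# UTF-8 sequence length as a function of the code point.
def utf8_length(cp)
  if cp <= 0x7F then 1
  elsif cp <= 0x7FF then 2
  elsif cp <= 0xFFFF then 3
  else 4
  end
end

# ASCII, Latin-1 e-acute, Cyrillic Yu, curly quotes, a CJK radical, an emoji.
samples = ["a", "\u00E9", "\u042E", "\u201C", "\u201D", "⾦", "\u{1F600}"]

samples.each do |char|
  cp = char.ord
  printf("U+%04X  %d byte(s), expected %d\n", cp, char.bytesize, utf8_length(cp))
end
```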

Everyone interested might like to read the following articles on how
Unicode and the UTF-8 encoding work:

http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode
http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
http://www.joelonsoftware.com/articles/Unicode.html

Kind regards,
Thijs van der Vossen

--
Fingertips - http://www.fngtps.com
+31 (0)6 24204845
thijs@jabber.org

Jean-Christophe Michel

2005-Dec-23 08:46 UTC


Re: Investigating Unicode. Take 2, with nastities and allegations.

Hi Julian,

Julian 'Julik' Tarkhanov wrote:
> Well, I see that my last email hasn't generated any reaction from the
> Rails core team. It looks like all of them are the happy users of
> "plain text" (which, as we know by now, doesn't exist, but still).
...

May I ask a Ruby-ignorant question?
How does the Ruby project work? Is there no way to fix the String
class directly in Ruby by contributing a patch?
We could borrow code from PHP's mbstring or from Python to see how
UTF-8 is unpacked.
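For reference, the unpacking itself is not much code. This is a simplified sketch rather than a production decoder: it checks lead and continuation bytes but does not reject overlong forms or encoded surrogates, which a real decoder must. `decode_utf8` is a hypothetical name, and its result is compared against Ruby's own `unpack("U*")`.

```ruby
# Decode an array of UTF-8 bytes into an array of code points.
def decode_utf8(bytes)
  code_points = []
  i = 0
  while i < bytes.length
    b = bytes[i]
    # The lead byte tells us the sequence length and carries the top bits.
    len, cp =
      if b < 0x80 then [1, b]
      elsif (b & 0xE0) == 0xC0 then [2, b & 0x1F]
      elsif (b & 0xF0) == 0xE0 then [3, b & 0x0F]
      elsif (b & 0xF8) == 0xF0 then [4, b & 0x07]
      else raise ArgumentError, "invalid lead byte #{b}"
      end
    # Each continuation byte (10xxxxxx) contributes 6 more bits.
    (1...len).each do |k|
      cont = bytes[i + k]
      raise ArgumentError, "truncated or invalid sequence" unless cont && (cont & 0xC0) == 0x80
      cp = (cp << 6) | (cont & 0x3F)
    end
    code_points << cp
    i += len
  end
  code_points
end

name = [1070, 1083, 1080, 1082].pack("U*")
p decode_utf8(name.bytes)                        # [1070, 1083, 1080, 1082]
p decode_utf8(name.bytes) == name.unpack("U*")   # true
```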
-- 
Jean-Christophe Michel

Michael Koziarski

2005-Dec-23 10:10 UTC


Re: Investigating Unicode. Take 2, with nastities and allegations.

> Simply put, a _character_ is no longer _one byte long_ when you get
> beyond the characters you can see printed on your keyboard. Even
> simple punctuation marks like these "double quotation marks" take up
> _three bytes_ each in UTF-8, and so does stuff like ⾦.

The problem with UTF-8 is that the length of characters varies. So
something like this:

a_string[434..2443]

is no longer O(1): you have to scan from the start of the string to
find where character 434 begins. This is why things are often stored
as UCS-2 internally and converted at the boundaries. I believe this
is how the JVM handles things, but I could be completely wrong.
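To make the O(1) point concrete: in UTF-8 only lead bytes start a character (continuation bytes all match `10xxxxxx`), so finding where the n-th character begins means walking the bytes from the start. `byte_offset_of_char` is a hypothetical helper for illustration; real implementations can cache offsets, but without such an index the lookup is linear in the string length.

```ruby
# Return the byte offset at which the n-th character (0-based) starts,
# or nil if the string has fewer than n + 1 characters.
def byte_offset_of_char(str, n)
  seen = -1
  str.each_byte.with_index do |byte, offset|
    next if (byte & 0xC0) == 0x80   # continuation byte: not a character start
    seen += 1
    return offset if seen == n
  end
  nil
end

s = [0x61, 0x042E, 0x201C, 0x62].pack("U*")   # 1 + 2 + 3 + 1 bytes
offsets = (0..4).map { |i| byte_offset_of_char(s, i) }
p offsets   # [0, 1, 3, 6, nil]
```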

But Jamis' point is a valid one. I think one of the key reasons that
Rails has been successful is that we haven't just gone mad adding
features left, right and center. Everything which gets in is taken
from an application where it's been proven. In other frameworks,
where this hasn't happened, you get annoying bugs and sub-par APIs.

i18n is something I care about, but it's not something I need for my
paid work. I think the ideal way to get it into core is for people
who are experts *and* need it in their paid work to produce a plugin.

Then, once the plugin has been in use by the community, we can roll it
in. I18n is extremely important; i18n needs to end up in the core
distribution. But we need to do it the 'Rails way'.

--
Cheers

Koz


Thijs Van Der Vossen

2005-Dec-23 10:49 UTC


Re: Investigating Unicode. Take 2, with nastities and allegations.

On 23 Dec 2005, at 11:10, Michael Koziarski wrote:
>> Simply put, a _character_ is no longer _one byte long_ when you get
>> beyond the characters you can see printed on your keyboard. Even
>> simple punctuation marks like these "double quotation marks" take up
>> _three bytes_ each in UTF-8, and so does stuff like ⾦.
>
> The problem with UTF-8 is that the length of characters varies. So
> something like this:
>
> a_string[434..2443]
>
> is no longer O(1). This is why things are often stored with ucs-2
> internally, and converted at the boundaries. I believe this is how
> the JVM handles things, but I could be completely wrong.

You're right, this is how the String class in Java stores Unicode
data internally. The problem with UCS-2 is that it only allows you to
encode the 'Basic Multilingual Plane', because you can only use 16
bits for each character. Don't confuse UCS-2 with UTF-16, where each
character can take up 2 or 4 bytes.
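The UCS-2/UTF-16 difference only shows up outside the Basic Multilingual Plane. A sketch using Ruby's built-in transcoding (Ruby 1.9 or later): a BMP character is a single 16-bit unit, while U+1F600 needs a surrogate pair, which UCS-2 cannot express at all. `surrogate_pair` is a hypothetical helper that applies the standard UTF-16 surrogate formula by hand.

```ruby
bmp    = "\u042E"      # CYRILLIC CAPITAL LETTER YU, inside the BMP
astral = "\u{1F600}"   # outside the BMP

puts bmp.encode("UTF-16BE").bytesize      # 2: one 16-bit unit
puts astral.encode("UTF-16BE").bytesize   # 4: a surrogate pair

# The same pair computed by hand from the code point.
def surrogate_pair(cp)
  v = cp - 0x10000
  [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
end

p surrogate_pair(0x1F600).map { |u| format("%04X", u) }                  # ["D83D", "DE00"]
p astral.encode("UTF-16BE").unpack("n*").map { |u| format("%04X", u) }   # ["D83D", "DE00"]
```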

See http://en.wikipedia.org/wiki/UCS-2 for more on this.

The reason we are talking about UTF-8 is that everyone is already
using this encoding in their Rails apps, and that it allows you to
handle ASCII data without ever thinking about it.

See http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF for why  
UTF-8 might actually be a good idea.
> But Jamis' point is a valid one, I think one of the key reasons that
> rails has been successful is that we haven't just gone mad adding
> features left right and center. Everything which gets in is taken
> from an application where it's been proven. In other frameworks
> where this hasn't happened you get annoying bugs, and sub-par apis.

This is a valid point, but it does not apply to this issue. Rails is
currently annoyingly buggy when you need to handle Unicode data.

> i18n is something I care about, but it's not something I need for my
> paid work. I think the ideal way to get it into core is for people
> who are experts *and* need it in their paid work to produce a plugin.

I think Julian might be our expert, and he's currently working on a
solution. Please see his previous email for details.

> Then once the plugin has been in use by the community, we can roll it
> in. I18n is extremely important, i18n needs to end up in the core
> distribution. But we need to do it the 'rails way'.

Although you can't have proper i18n without good Unicode support,
good Unicode support is _not_ about i18n. Even if your app will never
ever handle anything but English text, you still need to handle stuff
like punctuation in text your users are copying and pasting from Word.

Please, please, don't ignore this issue because David said that i18n
should be handled at the application level.

Kind regards,
Thijs van der Vossen

--
Fingertips - http://www.fngtps.com
+31 (0)6 24204845
thijs@jabber.org

Jamis Buck

2005-Dec-23 14:23 UTC


Re: Investigating Unicode. Take 2, with nastities and allegations.

On Dec 23, 2005, at 3:49 AM, Thijs Van Der Vossen wrote:
> Please, please, don't ignore this issue because David said that
> i18n should be handled at the application level.

One more time, and then I sign out of this thread for good:

The rails team is NOT ignoring this issue. Rather, the rails team is
waiting for someone with i18n chops to come up with a decent
solution. If that person is you, then we're waiting for you to fix
the problem. If that person is not you, then you're in the same boat
we are.

- Jamis

Julian 'Julik' Tarkhanov

2005-Dec-23 19:21 UTC


Re: Investigating Unicode. Take 2, practical

On 23-dec-2005, at 15:23, Jamis Buck wrote:
> On Dec 23, 2005, at 3:49 AM, Thijs Van Der Vossen wrote:
>
>> Please, please, don't ignore this issue because David said that
>> i18n should be handled at the application level.
>
> One more time, and then I sign out of this thread for good:
>
> The rails team is NOT ignoring this issue. Rather, the rails team
> is waiting for someone with i18n chops to come up with a decent
> solution. If that person is you, then we're waiting for you to fix
> the problem. If that person is not you, then you're in the same
> boat we are.

That person is the collective conscience :-) Jamis, thanks for
chiming in. I don't know if it's understandable - I just wanted
someone to say that it's indeed broken (I got nasty because I wanted
to show that it's also broken in Basecamp et al.)

Let's start simply: why don't we assume $KCODE = 'UTF-8' for the test
environment in Rails, so that you don't need to sandbox every Rails
test that has to do with multibyte characters? (i.e. embrace the fact
that most of the text in the world is multibyte). That way you can
type wonky literals right into the test cases and quickly see which
stuff breaks, without starting an extra interpreter every time you
want to truncate a string. This would be step 0 in the right
direction. I am not sure, but I assume all of 37signals' apps run with
that setting.
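On Ruby 1.8 this meant setting `$KCODE` in the test environment. On Ruby 2.0 and later, source files are read as UTF-8 by default, so multibyte literals can go straight into a test. A minimal sketch with plain assertions; `valid_name?` is a hypothetical stand-in for a character-counting `validates_size_of :name, :maximum => 5`, not Rails code.

```ruby
MAX_NAME_LENGTH = 5

# Hypothetical stand-in for the size validation, counting characters.
def valid_name?(name)
  !name.nil? && name.length <= MAX_NAME_LENGTH
end

# Wonky literals typed straight into the test.
cases = {
  "Юлик"   => true,    # 4 characters, 8 bytes
  "Ærøsk"  => true,    # 5 characters, 7 bytes
  "Юлиана" => false,   # 6 characters
}

cases.each do |name, expected|
  actual = valid_name?(name)
  raise "#{name}: expected #{expected}, got #{actual}" unless actual == expected
end
puts "all multibyte cases pass"
```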

--
Julian 'Julik' Tarkhanov
me at julik.nl
