thr3ads.net - Rails - UTF-8 and validates_length

Morus Walter

2005-Oct-24 08:22 UTC

UTF-8 and validates_length_of

Hi,

We are using Rails with UTF-8.
Unfortunately, Ruby's UTF-8 support isn't perfect yet, and length/size for
a UTF-8 string returns the number of bytes in the string, not the number of
characters.
This makes validates_length_of give wrong results for values containing
non-ASCII characters.
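
To make the problem concrete, here is a minimal sketch that counts bytes and characters separately. It assumes the string holds valid UTF-8. The helper names byte_count and char_count are made up for this illustration and are not part of Ruby or Rails. It relies only on String#unpack, so it does not need jcode.

```ruby
# Hypothetical helpers for illustration only.

# Number of bytes: unpack every byte as an unsigned 8-bit integer.
def byte_count(str)
  str.unpack("C*").length
end

# Number of characters: unpack the string as UTF-8 code points.
# Assumption: str is valid UTF-8; malformed input raises ArgumentError.
def char_count(str)
  str.unpack("U*").length
end

# "café" written with escapes: the "é" occupies two bytes in UTF-8.
name = "caf\xC3\xA9"

puts byte_count(name)
puts char_count(name)
```

A maximum length of 4 would reject this value if the check counts bytes, although it has only four characters.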

How should one work around this?

I think I could
a) patch validations.rb to use value.jlength (from jcode) instead of value.size
b) do the same as a) without modifying Rails, by overriding the
   validates_length_of method in some common superclass of our models
c) override String's size method to use jlength (size can be overridden,
   length cannot, since jlength uses length)

a and b are basically the same change made in different ways (the first has the
disadvantage that I have to redo the change whenever I update Rails,
the second has the disadvantage that I'll have my own validates_length_of
method). c is more general and might fix or worsen things in other places.
Currently I'm favouring b.
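
A rough sketch of the idea behind b), written without Rails so that it runs on its own. Everything here is hypothetical: BaseModel stands in for a common superclass of the models, and validates_char_length_of and length_errors are invented names, deliberately different from the real validates_length_of. It only shows where a character-based count would go, not how Rails implements validations.

```ruby
# Hypothetical stand-in for a shared model superclass. This is NOT the
# Rails API; it only illustrates option b.
class BaseModel
  # Each subclass keeps its own list of [attribute, maximum] rules.
  def self.length_rules
    @length_rules ||= []
  end

  # Hypothetical macro that records a maximum length in characters.
  def self.validates_char_length_of(attribute, options)
    length_rules << [attribute, options[:maximum]]
  end

  # Returns a list of error messages; an empty list means valid.
  def length_errors
    self.class.length_rules.map do |attribute, maximum|
      value = send(attribute).to_s
      # Count UTF-8 characters rather than bytes (assumes valid UTF-8).
      chars = value.unpack("U*").length
      "#{attribute} is too long (#{chars} > #{maximum})" if chars > maximum
    end.compact
  end
end

class Person < BaseModel
  attr_accessor :name
  validates_char_length_of :name, :maximum => 4
end

person = Person.new
person.name = "caf\xC3\xA9"   # four characters, five bytes
puts person.length_errors.inspect
```

The trade-off is the one described above: the models depend on a home-grown method instead of the stock one.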

Are there any other possibilities? How do others handle this problem?

  Morus

Julian 'Julik' Tarkhanov

2005-Oct-24 11:52 UTC

Re: UTF-8 and validates_length_of

On 24 Oct 2005, at 10:22, Morus Walter wrote:
>
> How should one work around this?
>
> I think I could
> a) patch validations.rb to use value.jlength (from jcode) instead
>    of value.size
> b) do the same as a) without modification of rails by overwriting the
>    validates_length_of method in some common superclass of our models
> c) overwrite strings size method to use jlength (size can be
>    overwritten, length not, since jlength uses length)
>
> a and b are basically the same change in different ways (

Do a) and wrap it into a patch so that we can all use it.

-- 
Julian "Julik" Tarkhanov

Morus Walter

2005-Oct-25 08:43 UTC

Re: UTF-8 and validates_length_of

On Mon, 24 Oct 2005 13:52:28 +0200
Julian 'Julik' Tarkhanov <listbox-RY+snkucC20@public.gmane.org> wrote:

> > How should one work around this?
> >
> > I think I could
> > a) patch validations.rb to use value.jlength (from jcode) instead  
> > of value.size
>
> do a) and wrap it into a patch so that we all can use it

Hmm. How would you do that in a way that doesn't require everyone (that
is, including those not using UTF-8) to use jcode?

If you look at jcode's implementation of jlength, it's something you don't
really want to have unless there's no alternative (it replaces all non-ASCII
characters by blanks and calls length on the result).
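
For illustration, the "replace, then measure" approach can be imitated roughly as follows. This is not jcode's actual code, only a sketch of the same idea under two assumptions: the input is valid UTF-8, and String#b is available (Ruby 2.0 or later). The name length_via_blanks is made up.

```ruby
# Rough imitation of the "replace, then measure" idea. NOT jcode's code.
def length_via_blanks(str)
  # Work on a binary copy so that length counts bytes.
  bytes = str.b
  # Collapse every multi-byte UTF-8 sequence (a lead byte followed by
  # its continuation bytes) into a single blank.
  collapsed = bytes.gsub(/[\xC0-\xFF][\x80-\xBF]*/n, " ")
  # One byte per character remains, so the byte length is the
  # character count.
  collapsed.length
end

puts length_via_blanks("caf\xC3\xA9")
```

The cost is visible in the sketch: a complete modified copy of the string is built just to obtain a number.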

Morus

Julian 'Julik' Tarkhanov

2005-Oct-25 11:58 UTC

Re: UTF-8 and validates_length_of

On 25 Oct 2005, at 10:43, Morus Walter wrote:
> On Mon, 24 Oct 2005 13:52:28 +0200
> Julian 'Julik' Tarkhanov <listbox-RY+snkucC20@public.gmane.org> wrote:
>
>>> How should one work around this?
>>>
>>> I think I could
>>> a) patch validations.rb to use value.jlength (from jcode) instead
>>> of value.size
>>
>> do a) and wrap it into a patch so that we all can use it
>
> hmm. How would you do that in a way that doesn't require everyone
> (that is those, not using utf8) to use jcode?
>
> If you look at jcode's implementation of jlength, it's something
> you don't really want to have unless there's no alternative (it
> replaces all non-ascii characters by blanks and calls length
> on the result).

I know that Unicode in Ruby sucks big big time. It's my pain as well.
As for your solution:

You can check $KCODE. If it's set to 'u' (Ruby then reports it back as
'UTF8'), you can act like this:

($KCODE == 'UTF8' and str.respond_to?(:jsize)) ? str.jsize : str.size
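
The same check can be wrapped in a small helper so that nobody has to load jcode. This is only a sketch under stated assumptions: the name utf8_aware_length is made up, the encoding flag is passed as a parameter (defaulting to $KCODE) so the behaviour does not depend on how a given Ruby version treats that variable, and String#unpack is swapped in for jsize.

```ruby
# Hypothetical helper: count characters when the flag says UTF-8,
# otherwise fall back to the plain size.
def utf8_aware_length(str, kcode = $KCODE)
  if kcode == "UTF8"
    # Count UTF-8 code points without jcode.
    # Assumption: str is valid UTF-8; malformed input raises ArgumentError.
    str.unpack("U*").length
  else
    # No UTF-8 flag set: keep the default behaviour of String#size.
    str.size
  end
end

puts utf8_aware_length("caf\xC3\xA9", "UTF8")
puts utf8_aware_length("abc", "NONE")
```

Code that does not set the flag keeps the default size behaviour, which addresses the concern about forcing jcode on everyone.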

I believe Shugo Maeda had some more info on the ways jcode is bad in  
his blog.
-- 
Julian "Julik" Tarkhanov
