thr3ads.net - fxruby users - [fxruby-users] Unicode support in FXRuby 1.6 [Aug 2005]

Lyle Johnson

2005-Aug-28 14:26 UTC

[fxruby-users] Unicode support in FXRuby 1.6

All,

As some of you know, Jeroen has added support for Unicode strings in
the unstable development version of FOX (version 1.5). I'm trying to
plan ahead to decide how best to support this for FXRuby 1.6, but I
don't really know anything about Ruby's support for Unicode or i18n in
general. If you're familiar with this topic (how/if Ruby deals with
Unicode strings) I'd appreciate some pointers.

Thanks,

Lyle

olivers@mondrian-ide.com

2005-Sep-02 19:44 UTC

Did you get any information on this?  I decided to look around a bit since
I had to learn a bit of Unicode for work the other day.  I didn't find out
much, but here it is in the interest of getting a discussion going:

require 'jcode' # japanese character support module

$KCODE = 'u' # tells ruby to use the UTF-8 character set

utf_string = "\xc2\xa9" # UTF-8 code for the copyright sign

utf_string.length # -> 2, just a byte count
utf_string.jlength # -> 1, the number of UTF-8 characters

That's all I could get to happen.  So you can store Unicode strings with
the '\x' escape code, but you have to type in the UTF or Japanese encoded
bytes manually (no u'Unicode String' like in Python) and Ruby doesn't
really know the difference.  There are some string utilities in the jcode
module, and jcode also alludes to a PATTERN_UTF8 option which allows you
to use Regexps with Unicode, but it wasn't defined for me.  There is also
an 'iconv' module which allows you to convert between character sets, but
it is just a wrapper around a Unix utility and is not available for me on
Windows XP.
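
The byte-count versus character-count distinction above can also be checked with String#unpack and Array#pack, which do not depend on jcode or $KCODE. What follows is only a minimal sketch: the helper names utf8_from_codepoints, utf8_char_count and byte_count are made up for illustration, and it assumes the strings hold well-formed UTF-8.

```ruby
# Hypothetical helpers (not part of Ruby, jcode, or FXRuby).

# Build a UTF-8 string from Unicode code points, so the encoded
# bytes do not have to be typed in by hand.
def utf8_from_codepoints(*codepoints)
  codepoints.pack('U*')
end

# Count characters (code points) rather than bytes.
# Assumes the string holds well-formed UTF-8.
def utf8_char_count(str)
  str.unpack('U*').length
end

# Count raw bytes, independent of any character set.
def byte_count(str)
  str.unpack('C*').length
end

copyright = utf8_from_codepoints(0xA9)   # same bytes as "\xc2\xa9"
puts byte_count(copyright)               # 2
puts utf8_char_count(copyright)          # 1

# Iterate over the code points of a longer string.
"caf\xc3\xa9".unpack('U*').each do |cp|
  puts format('U+%04X', cp)
end
```

Because the helpers work on bytes and code points only, they should behave the same whether or not the interpreter itself knows anything about character sets.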

Oliver

Sander Jansen

2005-Sep-02 22:52 UTC

> That's all I could get to happen.  So you can store Unicode strings with
> the '\x' escape code, but you have to type in the UTF or Japanese encoded
> bytes manually (no u'Unicode String' like in Python) and ruby doesn't
> really know the difference.

Can't you store the Ruby scripts as UTF-8 encoded text files? Or will the Ruby
interpreter struggle with that?

> There are some string utilities in the jcode
> module, and jcode also alludes to a PATTERN_UTF8 option which allows you
> to use Regexps with Unicode but it wasn't defined for me.
> There is also
> an 'iconv' module which allows you to convert between character sets, but
> it is just a wrapper around a unix utility and is not available for me on
> Windows XP.

FOX has several text codecs already built in.


	Sander

Oliver Smith

2005-Sep-03 01:55 UTC

> > That's all I could get to happen.  So you can store Unicode strings with
> > the '\x' escape code, but you have to type in the UTF or Japanese encoded
> > bytes manually (no u'Unicode String' like in Python) and ruby doesn't
> > really know the difference.
>
> Can't you store the ruby scripts as UTF8 encoded text files? Or
> will the ruby interpreter struggle with that?

I think there is a way to do that, since there's a -K option to the Ruby
interpreter which allows you to specify UTF-8, EUC or Shift-JIS, but when I
tried it, it choked on the non-ASCII character regardless (I just saved a simple
Ruby file in UTF-8 format with Notepad and did 'ruby -Ku test.rb').  I guess
the $KCODE system variable is supposed to do this also.

I just noticed an argument to Regexp.new which allows you to specify from
the same charset choices.  I can use a UTF-8 two-byte character in a
character class and it works as expected.
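
As a small illustration of the character-class behaviour described above, the sketch below puts two-byte UTF-8 characters in a character class and uses the u option on the regexp literal. The sample strings and variable names are invented for the example, and the characters are written as escaped bytes so that the script file itself stays plain ASCII.

```ruby
# Two-byte UTF-8 characters written as escaped bytes.
n_tilde   = "\xc3\xb1"   # U+00F1, LATIN SMALL LETTER N WITH TILDE
copyright = "\xc2\xa9"   # U+00A9, COPYRIGHT SIGN
e_acute   = "\xc3\xa9"   # U+00E9, shares one byte with each of the above

# The trailing "u" marks the pattern as UTF-8, so each two-byte
# sequence inside the brackets is treated as a single character.
special = /[#{n_tilde}#{copyright}]/u

spanish = "espa" + n_tilde + "ol"
french  = "caf" + e_acute

puts((spanish =~ special) ? "match" : "no match")        # class member present
puts((french =~ special) ? "match" : "no match")         # different character
puts(("plain ascii" =~ special) ? "match" : "no match")  # no special characters

# With the u option, "." consumes a whole character, not a single byte.
first_char = copyright[/./u]
puts first_char.unpack('C*').length
```

The e_acute case is the interesting one: it is built from bytes that also occur in the class members, so a byte-oriented match could be fooled by it, while a character-oriented match is not.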

Oliver

Gonzalo Garramuno

2005-Sep-03 03:17 UTC

That's pretty much correct.  Ruby's Unicode support is somewhat weak
compared to Python or Perl.
Only UTF-8 is supported.  No support for UTF-16 is available, afaik.

Basically...  here's everything you wanted to know about Ruby's Unicode but
were afraid to ask....

* $KCODE can be set to support an encoding directly, but this is *NOT*
needed to have a script work with unicode.
It is just a simple shortcut so that any regex like /./ will do the right
thing.

* Without $KCODE, regexp with Unicode support is still available.  It is done
using the /u language option, like
t =~ //u
or
Regexp.new(regex, options, 'u')
(or, alternatively,  //m which is for multi-byte -- meaning ANSI, UTF-8,
EUC, or SJIS depending on what $KCODE is set to, albeit I believe this is
now no longer needed as setting $KCODE will already adjust all regexes).

* Supporting u"" like Python can be added to some extent very easily.
See:
http://redhanded.hobix.com/inspect/closingInOnUnicodeWithJcode.html
This allows you to then do:
c = u'U+00a9'  # same as "\xc2\xa9"

*  You can also use:
     [].pack('U*')
     "".unpack('U*')
     to pack/unpack UTF-8 strings.  This allows you to easily count
     characters and iterate through them, without the need for jcode
     (which really is only needed for getting succ to work).

* jcode.rb is kind of a Ruby hack and it is incomplete.  Methods such as
reverse, capitalize, casecmp, swapcase, all the strip functions and probably
others are not redefined by it and will return incorrect results, depending
on the language.

* Ruby's $KCODE does not add a UTF-8 <-> Latin1 encoding conversion, unlike
Python's unicode strings.  So, albeit with the above, you can do:

question = u'U+00bfHabla espaU+00f1ol?'  # ¿Habla español?
puts question

similar to Python's:
question = u'\u00bfHabla espa\u00f1ol?'  # ¿Habla español?
print question

You will not get the corresponding Latin1 string when you print it (unlike
Python's unicode strings).

* To properly do the above, and convert Latin1 <-> UTF-8 for printing, you
should use iconv.
    ruby -riconv -e 'puts Iconv.iconv("UTF-8", "ISO-8859-1", "\xf1")'
   Iconv, by default, does *NOT* get installed by the One-Click Windows
   installer, even though it is supposed to be a standard part of Ruby.
   Adding something then like:
          require 'iconv'
          class UString
                 def to_s
                     # Iconv.iconv(to, from, *strings) returns an array of
                     # strings; convert UTF-8 to Latin1 and join the result.
                     Iconv.iconv("ISO-8859-1", "UTF-8", self).join
                 end
          end
   will do the trick for Why's UString class.

* The Ruby interpreter should have no problem reading a UTF-8 .rb script
file, but you have to run it with
> ruby -Ku file.rb  (or set RUBYOPT to -Ku, so Ruby always runs with that)
Note, however, that Windows Notepad, when saving UTF-8 files, adds a valid
albeit meaningless 3-byte BOM (byte-order mark) at the start, which will not
work with Ruby 1.8 (and will also corrupt Unix shebang lines on
most -all?- Unixes).  The BOM is allowed by the Unicode standard but is
neither required nor recommended for UTF-8.  Ruby, just like Unix shebang
handling, does not deal with it appropriately.
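
To make the byte-order-mark problem concrete, here is a hedged sketch that detects and removes the three BOM bytes (EF BB BF) from script text before it is handed to Ruby. The name strip_utf8_bom and the sample script text are made up for illustration. The helper works on raw bytes, so it does not depend on $KCODE.

```ruby
# The UTF-8 encoding of U+FEFF, which Notepad writes at the start of a file.
UTF8_BOM = [0xEF, 0xBB, 0xBF]

# Hypothetical helper: return the text without a leading UTF-8 BOM.
# Text that does not start with the BOM is returned unchanged.
def strip_utf8_bom(text)
  bytes = text.unpack('C*')
  return text unless bytes[0, 3] == UTF8_BOM
  stripped = bytes[3..-1].pack('C*')
  # Ruby 1.9 and later tag strings with an encoding, so keep the original tag.
  # Ruby 1.8 strings are plain byte sequences and need nothing more.
  stripped.force_encoding(text.encoding) if stripped.respond_to?(:force_encoding)
  stripped
end

script   = "#!/usr/bin/env ruby\nputs 'hello'\n"
with_bom = "\xEF\xBB\xBF" + script
clean    = strip_utf8_bom(with_bom)

puts with_bom.unpack('C*').length - clean.unpack('C*').length  # 3
puts clean == script                                           # true
```

In practice the cleaned text would be written back to the file, or the editor would be configured to save UTF-8 without a BOM in the first place.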

Gonzalo Garramuno

2005-Sep-03 03:59 UTC

Oh yeah... the plan for Unicode in Ruby 2.0 (or 1.9?) is described at:
http://redhanded.hobix.com/inspect/futurismUnicodeInRuby.html

So what does this mean for FXRuby?

Well, it means that Unicode support could probably be implemented in
one of two ways:

a) By using FXRuby's own FXString, which would do all the encoding and which
would support a constructor like:
     class FXUnicodeString < String
       # encoding being ASCII (Latin1, really), Unicode, SJIS or EUC
       def initialize(str, encoding = $KCODE)
       end
       # ...etc...
       # ...with all of Ruby's standard String methods implemented,
       # using FOX's Unicode as the backend.
     end

    # And perhaps... for ease of use...
     module Kernel   # Kernel is a module, so it is reopened with "module"
       def u(str)
         FXUnicodeString.new(str, 'U')
       end
     end



or...

b) Simply by having a similar text function for the widgets with Unicode, so
that, besides:
               text()
               text=()   # both returning the string unprocessed.
         there's also
               text(str, encoding)
               text_enc()  # returns [text, encoding], if FOX remembers the encoding

Obviously, a) is better in that it would be more similar to what Ruby plans
to eventually do with Unicode support (and thus, eventually, FXUnicodeString
could just be replaced with Ruby's String itself), albeit it may end up
being more work in having to implement all of Ruby's methods.
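
Option a) might look roughly like the following in pure Ruby. This is only a sketch under stated assumptions: the class name UnicodeStringSketch and its methods are hypothetical, it uses pack/unpack rather than FOX's Unicode routines as the backend, it assumes well-formed UTF-8 input, and it covers only a handful of String's methods.

```ruby
# Hypothetical String subclass that treats its contents as UTF-8.
# Not part of FXRuby or FOX; pack/unpack stands in for a real backend.
class UnicodeStringSketch < String
  # Unicode code points of the string, as integers.
  def code_points
    unpack('U*')
  end

  # Number of characters rather than bytes.
  def length
    code_points.length
  end

  # Reverse by character, so multi-byte sequences stay intact.
  def reverse
    self.class.new(code_points.reverse.pack('U*'))
  end

  # Latin1 bytes, e.g. for printing on a Latin1 terminal. Code points
  # above U+00FF have no Latin1 equivalent and become "?" (byte 63).
  def to_latin1
    code_points.map { |cp| cp <= 0xFF ? cp : 63 }.pack('C*')
  end
end

module Kernel
  # Shorthand constructor, in the spirit of Python's u'...' literals.
  def u(str)
    UnicodeStringSketch.new(str)
  end
end

question = u("\xc2\xbfHabla espa\xc3\xb1ol?")
puts question.length                        # characters, not bytes
puts question.reverse.code_points.first     # 63, the code point of "?"
puts question.to_latin1.unpack('C*').first  # 191, Latin1 inverted question mark
```

Even this small subset hints at the amount of work involved: every String method that is sensitive to character boundaries would need the same treatment.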

Gonzalo Garramuno

2005-Sep-03 05:02 UTC

> (or, alternatively,  //m which is for multi-byte -- meaning ANSI, UTF-8,
> EUC, or SJIS depending on what $KCODE is set to, albeit I believe this is
> now no longer needed as setting $KCODE will already adjust all regexes).
>
Err... actually this is not correct.  /m is the multi-line option in a regex
(it makes . match newlines).  Not sure what the heck I was thinking there.

Oliver Smith

2005-Sep-03 12:44 UTC

Gonzalo,

Thanks a lot for your thorough notes!  I think you covered everything I was
curious about.

Oliver