All,

Sorry for cross-posting, but I have been stuck on this for quite some time.

I have a variable x = 1046. How can I convert it into a UTF-8 character? x.chr does not work for it.

Basically, I need to put x in a string as a UTF-8 character to display on a page.

Regards,
- newB

-- 
Posted via http://www.ruby-forum.com/.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
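The x.chr failure newB describes can be reproduced directly. On Ruby 1.9 and later, Integer#chr with no argument only builds single-byte characters (0..255), so 1046 raises a RangeError unless Encoding.default_internal happens to be set; passing an encoding explicitly produces the multibyte character. A minimal sketch, assuming a plain Ruby 1.9+ script:

```ruby
x = 1046

# Without an argument, Integer#chr only builds single-byte characters
# (0..255). Larger values raise RangeError, unless Encoding.default_internal
# is set, in which case that encoding is used instead.
begin
  puts x.chr
rescue RangeError => e
  puts "x.chr failed: #{e.message}"
end

# Ruby 1.9+: name the target encoding and chr builds the multibyte character.
zhe = x.chr(Encoding::UTF_8)
puts zhe            # prints U+0416, CYRILLIC CAPITAL LETTER ZHE
p zhe.bytes         # => [208, 150]
```

The two bytes, 208 and 150, are the same "\320\226" (octal) that appears in Rob's irb output below.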
On Sep 29, 2008, at 7:04 PM, Simon Watson wrote:

> I have a variable x = 1046
>
> How can I convert it into a UTF-8 character? x.chr does not work for it.

Do you mean to say that x holds a Unicode code point? If that's the case (since ASCII is a subset of Unicode, x.to_s => "1046" is trivial), then you can use something like this code I wrote a while back:

irb> ("U+"+('0'*4+x.to_s(16))[-4,4]).to_utf8
=> "\320\226"

Of course, you could hide most of that in an Integer#to_utf8 method.

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob-xa9cJyRlE0mWcWVYNo9pwxS2lgjeYSpx@public.gmane.org

# -*- ruby -*-

class String
  # For a string that matches /(?i:U\+?|\\u)?[[:xdigit:]]{4}/, return a
  # suitable UTF-8 string for that character.
  def to_utf8
    case point = self.match(/[[:xdigit:]]{4}/)[0].to_i(16)
    when 0..0x7f
      point.chr
    when 0x80..0x07ff
      x = point & 0b111111
      point >>= 6
      y = point
      # Ruby 1.9+: tag the raw bytes as UTF-8 so the result compares
      # equal to UTF-8 string literals.
      "#{(0xC0 | y).chr}#{(0x80 | x).chr}".force_encoding('UTF-8')
    when 0x0800..0xFFFF
      x = point & 0b111111
      point >>= 6
      y = point & 0b111111
      point >>= 6
      z = point
      "#{(0xE0 | z).chr}#{(0x80 | y).chr}#{(0x80 | x).chr}".force_encoding('UTF-8')
    when 0x10000..0x10FFFF
      raise NotImplementedError, "UTF-8 four byte sequences not yet supported"
    else
      raise ArgumentError, "Values above U+10FFFF are not supported"
    end
  end
end

if __FILE__ == $0
  require 'test/unit'
  class UnicodeHelperTest < Test::Unit::TestCase
    def test_ascii
      assert_equal '!', "U+0021".to_utf8, 'EXCLAMATION MARK'
      assert_equal 'A', "U+0041".to_utf8, 'UPPERCASE LETTER A'
      assert_equal '-', "U+002D".to_utf8, 'HYPHEN-MINUS'
      assert_equal '~', "U+007E".to_utf8, 'TILDE'

      assert_equal '!', "0021".to_utf8, 'EXCLAMATION MARK'
      assert_equal 'A', "0041".to_utf8, 'UPPERCASE LETTER A'
      assert_equal '-', "002D".to_utf8, 'HYPHEN-MINUS'
      assert_equal '~', "007E".to_utf8, 'TILDE'

      assert_equal '!', "\\u0021".to_utf8, 'EXCLAMATION MARK'
      assert_equal 'A', "\\u0041".to_utf8, 'UPPERCASE LETTER A'
      assert_equal '-', "\\u002D".to_utf8, 'HYPHEN-MINUS'
      assert_equal '~', "\\u007E".to_utf8, 'TILDE'
    end
    def test_hi_bit_ascii
      assert_equal "\xC2\x80", "U+0080".to_utf8, "<control> U+0080"
      assert_equal "\xC2\xA4", "U+00A4".to_utf8, "CURRENCY SIGN"
    end
    def test_general_punctuation
      assert_equal "\342\200\220", "U+2010".to_utf8, "HYPHEN"
      assert_equal "\342\200\221", "U+2011".to_utf8, "NON-BREAKING HYPHEN"
      assert_equal "\342\200\222", "U+2012".to_utf8, "FIGURE DASH"
      assert_equal "\342\200\223", "U+2013".to_utf8, "EN DASH"
      assert_equal "\342\200\224", "U+2014".to_utf8, "EM DASH"
      assert_equal "\342\200\225", "U+2015".to_utf8, "QUOTATION DASH"
    end
  end
end
__END__
On 30 Sep 2008, at 14:00, Rob Biedenharn wrote:

> Do you mean to say that x holds a Unicode code point? If that's the
> case (since ASCII is a subset of Unicode, x.to_s => "1046" is
> trivial), then you can use something like this code I wrote a while
> back:
> irb> ("U+"+('0'*4+x.to_s(16))[-4,4]).to_utf8
> => "\320\226"

Depending on what you are doing, [1046].pack('U') may also be appropriate.

Fred
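Fred's pack('U') suggestion is the built-in route. A minimal sketch, assuming Ruby 1.9+, showing the encoding of the packed result, the reverse conversion with unpack('U*'), and, since the goal is to display the character on a page, an HTML numeric character reference as an alternative that does not depend on the page's character encoding:

```ruby
x = 1046

# The 'U' directive packs integer code points as UTF-8 characters.
utf8 = [x].pack('U')
puts utf8.encoding            # UTF-8
p utf8.bytes                  # => [208, 150]

# unpack('U*') goes the other way, back to code points.
p utf8.unpack('U*')           # => [1046]

# For an HTML page, a numeric character reference refers to the Unicode
# code point whatever the page's declared character encoding is.
html = format('&#%d;', x)
puts html                     # &#1046;
```

If the reference is emitted from a Rails view, it has to be output as raw HTML (for example with raw or html_safe); otherwise the view's escaping turns the leading & into &amp; and the reference shows up literally.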