Hi all,

This problem is driving me nuts. I am using Iconv.conv to convert from UTF-8 to ISO-8859-1:

    Iconv.conv('iso-8859-1//IGNORE', 'utf-8', @data).html_safe

Both locally and on production the Ruby version is 1.9.3p0 (Rails 3.0.3), but it raises the following exception only on production:

    A Iconv::IllegalSequence occurred in newsletters#show:

    "e acompanham, na"...
    app/controllers/newsletters_controller.rb:19:in `conv'

If I delete that part of the text, it raises again at another location. This is really strange because the contents locally and on production are exactly the same. Here is the text I am trying to convert (user-created data): https://gist.github.com/1664294. Any ideas?

Thanks!

Henrique

-- Posted via http://www.ruby-forum.com/.

-- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Peter Vandenabeele
2012-Jan-23 17:22 UTC
Re: Different Iconv behavior with the same Ruby version
FWIW, I was able to reproduce the exception Iconv::IllegalSequence with a simple Ruby program (rvm ruby 1.9.3).

    $ wget https://raw.github.com/gist/1664294/17c4e28a1bf87b331c0425e9ddbb48284d096b00/gistfile1.txt
    --2012-01-23 18:16:02-- https://raw.github.com/gist/1664294/17c4e28a1bf87b331c0425e9ddbb48284d096b00/gistfile1.txt
    Resolving raw.github.com... 207.97.227.243
    Connecting to raw.github.com|207.97.227.243|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 50089 (49K) [text/plain]
    Saving to: `gistfile1.txt'

    100%[======================================>] 50,089 --.-K/s in 0.08s

    2012-01-23 18:16:03 (584 KB/s) - `gistfile1.txt' saved [50089/50089]

    $ cat convert.rb
    @data
    File.open('gistfile1.txt') do |f|
      @data = f.read
    end

    require 'iconv'

    Iconv.conv('iso-8859-1//IGNORE', 'utf-8', @data).html_safe

    $ ruby convert.rb
    /home/peterv/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require': iconv will be deprecated in the future, use String#encode instead.
    convert.rb:7:in `conv': " style=\"padding-"... (Iconv::IllegalSequence)
            from convert.rb:7:in `<main>'

I will do a little more research,

Peter

-- Peter Vandenabeele
http://twitter.com/peter_v
http://rails.vandenabeele.com
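The deprecation warning in the output above points at String#encode. As a minimal sketch (not code from the thread), the following shows how a conversion that silently drops unconvertible characters could be written with String#encode alone. The helper name `to_latin1_dropping` and the sample string are assumptions made for illustration; whether dropping characters is acceptable depends on the data.

```ruby
# Hypothetical helper (name is illustrative, not from the thread):
# converts UTF-8 text to ISO-8859-1 and drops every character that is
# either invalid in the source or has no ISO-8859-1 equivalent.
def to_latin1_dropping(text)
  text.encode("ISO-8859-1", "UTF-8",
              invalid: :replace, # malformed byte sequences in the source
              undef: :replace,   # valid characters missing from ISO-8859-1
              replace: "")       # replace both kinds with nothing
end

# Sample text with an en dash (U+2013), which ISO-8859-1 cannot represent.
sample = "Grupo Zaffari \u2013 ali\u00E1s"
converted = to_latin1_dropping(sample)

puts converted.encoding  # ISO-8859-1
puts converted.bytesize
```

Unlike the Iconv call, this does not depend on the system's iconv library, so it should behave the same on every machine running the same Ruby version.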
Peter Vandenabeele
2012-Jan-23 17:54 UTC
Re: Different Iconv behavior with the same Ruby version
Some relevant links:

http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
http://blog.grayproductions.net/articles/ruby_19s_string
http://www.ruby-doc.org/core-1.9.3/Encoding/Converter.html#method-i-convert

The code that seems to work fairly well is:

    $ cat convert.rb
    File.open('gistfile1.txt') do |f|
      f.readlines.each do |line|
        puts "###############################################"
        puts line.valid_encoding? # always true
        ec = Encoding::Converter.new("utf-8", "ISO-8859-1", :undef => :replace)
        ec.replacement = "UNDEFINED"
        puts ec.convert(line)
      end
    end

    $ ruby convert.rb > result

This code converts your entire document (line by line) without raising exceptions. The source text appears to be valid UTF-8 throughout. But some UTF-8 characters cannot be converted to ISO-8859-1, e.g. the long dash in this piece of text:

    "... institucional do Grupo Zaffari – aliás ..."

In the output it shows up as the "UNDEFINED" replacement string that I defined. Without the :undef option, the conversion produced:

    convert.rb:9:in `convert': U+2013 from UTF-8 to ISO-8859-1 (Encoding::UndefinedConversionError)

That seems quite plausible, since UTF-8 can encode a very large number of code points, while ISO-8859-1 is limited to 1 byte per character, if I understand correctly.

I hope this can put you on the right track,

Peter
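To see which characters trigger Encoding::UndefinedConversionError before deciding how to handle them, one option is to try each character on its own. This is a rough sketch under assumptions: the helper name `latin1_unconvertible_chars` and the sample string are made up for illustration, and the input is assumed to be valid UTF-8 (as the valid_encoding? check above suggests).

```ruby
# Hypothetical helper: returns the distinct characters in +text+ that
# have no ISO-8859-1 equivalent, in order of first appearance.
def latin1_unconvertible_chars(text)
  text.each_char.select do |ch|
    begin
      ch.encode("ISO-8859-1")
      false # conversion succeeded, so the character is fine
    rescue Encoding::UndefinedConversionError
      true  # no ISO-8859-1 equivalent exists
    end
  end.uniq
end

sample = "Grupo Zaffari \u2013 ali\u00E1s \u201Cquoted\u201D"
latin1_unconvertible_chars(sample).each do |ch|
  puts format("U+%04X", ch.ord)
end
# prints U+2013, U+201C and U+201D, one per line
```

Converting one character at a time is slow on large inputs, but for a 49K document it gives a quick inventory of the code points that need a substitute.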
Henrique Testa
2012-Jan-24 00:14 UTC
Re: Different Iconv behavior with the same Ruby version
Thank you very much, Peter! I used your code and replaced these UTF-8-only chars with similar ones in ISO-8859-1 (I tried the transliterate method, but it seems it doesn't work for special chars).

Thanks again,

Henrique
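The substitution approach described here can also be expressed with the `fallback:` option of String#encode, which is called for every character that has no equivalent in the target encoding. This is only a sketch, not the code Henrique actually used: the mapping table `LATIN1_SUBSTITUTES` and the helper `to_latin1_with_substitutes` are hypothetical, and the choice of substitutes is an assumption.

```ruby
# Hypothetical mapping from UTF-8-only characters to similar ones that
# exist in ISO-8859-1. Extend it to match the characters in your data.
LATIN1_SUBSTITUTES = {
  "\u2013" => "-",   # en dash
  "\u2014" => "-",   # em dash
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2026" => "..."  # horizontal ellipsis
}.freeze

# Hypothetical helper: converts UTF-8 text to ISO-8859-1, swapping known
# characters for their substitutes and anything else unknown for "?".
def to_latin1_with_substitutes(text)
  text.encode("ISO-8859-1",
              fallback: ->(ch) { LATIN1_SUBSTITUTES.fetch(ch, "?") })
end

converted = to_latin1_with_substitutes("Grupo Zaffari \u2013 ali\u00E1s")
puts converted.encode("UTF-8") # Grupo Zaffari - aliás
```

Characters that ISO-8859-1 already supports, such as the accented letters in the Portuguese text, pass through unchanged; only the ones without an equivalent reach the fallback.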