What's a good solution for fixing character encoding problems between ASCII and UTF-8? The database is Postgres and is encoded in UTF-8.

Once in a while there is a compatibility error caused by strings submitted from a web form.

Is there a command to fix this besides a_string.force_encoding('utf-8')? Even that doesn't always seem to work.

Thanks,
Erica

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Hi Erica,

I ran into a similar situation a while ago on a web service app where I had to handle a lot of bad / untrusted non-UTF-8 data. I found a fix that met the needs of the app using Iconv (http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/index.html), following a strategy outlined by Paul Battley (http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/):

    ...
    def AppUtil.force_utf8(str)
      # UTF-8 -> UTF-8 with //IGNORE drops byte sequences that are not valid UTF-8.
      ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
      # Pad with a space and strip it again, per the linked article, so that
      # an invalid byte at the very end of the input is also handled.
      return ic.iconv("#{str} ")[0..-2]
    end
    ...

Jeff

On Jun 16, 5:27 pm, Erica wrote:
> [...]

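A note for readers on newer Rubies: Iconv was dropped from the standard library in Ruby 2.0, so the helper above will not load there without the separate iconv gem. A minimal sketch of the same idea (drop whatever is not valid UTF-8) using only core String methods might look like this. It assumes Ruby 2.1 or later for String#scrub, and the name force_utf8 is borrowed from the snippet above purely for illustration.

```ruby
# Hypothetical stand-in for AppUtil.force_utf8, without Iconv.
# Assumes Ruby 2.1+ (String#scrub).
def force_utf8(str)
  # dup first so force_encoding does not relabel the caller's string
  utf8 = str.dup.force_encoding('UTF-8')
  # scrub('') removes every byte sequence that is not valid UTF-8
  utf8.valid_encoding? ? utf8 : utf8.scrub('')
end

p force_utf8("foo\xAE bar")   # => "foo bar"
```

Like the Iconv version, this silently discards the offending bytes, so it suits data where losing a stray character is acceptable. It does not recover characters that arrived in some other encoding.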
Thanks for your response. I tried this on a string that was causing the error and it didn't work. The problem is with Microsoft Word special characters, and I can't find a way to replace them. Here is one website I found that describes the special characters, although it's not about Rails:

http://www.toao.net/48-replacing-smart-quotes-and-em-dashes-in-mysql

Can anyone help me out?

Thanks,
Erica

On Jun 17, 7:38 pm, Jeff Lewis wrote:
> [...]

Hey,

I'm using Rails on a Microsoft platform, so I can't rely on iconv. I had a lot of problems with encoding and finally solved them with the attached script. I hope it helps you!

--
Miquel Cubel Escarré
"Computers are good at following instructions, but not at reading your mind." Donald Knuth.

On 21/06/2011 1:33, Erica wrote:
> [...]

You probably need to figure out the actual encoding and explicitly convert from that to UTF-8. This is a snippet of code that I have in a real project:

    # (needs require 'iconv'; DATAFEED_URI, local_path and shlogger
    # are defined elsewhere in the project)
    open(DATAFEED_URI) do |file|
      local_filename = local_path
      local_filename.open('w') do |outf|
        file.each do |line|
          begin
            outf.write Iconv.conv('UTF-8//TRANSLIT//IGNORE', 'WINDOWS-1252', line)
          rescue Iconv::IllegalSequence => e
            shlogger.error { "#{DATAFEED_URI} line #{file.lineno} could not be translated:\n#{line}" }
          end
        end
      end
      local_filename.open('r') {|opened| yield opened }
    end

The part that you're going to be interested in is the line that calls Iconv and, in particular, the second argument of 'WINDOWS-1252', which is likely the encoding of your data. There are also a couple of aliases for that code page:

    $ iconv -l | grep -e 1252
    CP1252 MS-ANSI WINDOWS-1252

(`iconv -l` prints a list of all the encodings known by iconv.)

I hope that helps.

-Rob

Rob Biedenharn
http://AgileConsultingLLC.com/
http://GaslightSoftware.com/

On Jun 20, 2011, at 7:33 PM, Erica wrote:
> [...]

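The conversion step in the Iconv.conv call above can also be sketched with Ruby's built-in transcoding (String#encode), which does not need Iconv. This is only an illustration under the assumption that the incoming bytes really are Windows-1252; the helper name and the sample string are made up.

```ruby
# Hypothetical helper: label the raw bytes as Windows-1252, then
# transcode them to UTF-8. Assumes the data really is Windows-1252.
def cp1252_to_utf8(bytes)
  bytes.dup
       .force_encoding('Windows-1252')
       # undef: :replace substitutes '?' for any byte that has no
       # Unicode mapping instead of raising an exception
       .encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
end

# Made-up sample: Word-style quotes, en dash, apostrophe and ellipsis
# as single Windows-1252 bytes.
word_text = "\x93quoted\x94 \x96 it\x92s\x85".b
puts cp1252_to_utf8(word_text)
```

Unlike the UTF-8 to UTF-8 cleanup earlier in the thread, this keeps the curly quotes and dashes as real UTF-8 characters rather than dropping them. If the input is already UTF-8, running it through this helper would garble it, so the source encoding has to be known first.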
Hi Erica,

I personally haven't had to deal with encoding issues yet, but I remember reading a couple of posts on the subject by Yehuda Katz (of Merb fame and a core contributor to Rails). Maybe these can help you identify and fix your problem:

http://yehudakatz.com/2010/05/17/encodings-unabridged/
http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/

The articles are a little long, but if you already know a good deal about encodings you can skip towards the end of each post, where he writes about how to deal with conversions.

--
To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/HRgOhAutnN0J.

Maybe post an example of a string/char that's causing the problem, as it's logged in your app's log?

Here's an example of a problem string/char that I was seeing in data posted to my app:

    $ ./script/rails console
    ...
    ruby-1.9.2-p136 :001 > s = "foo\xAE bar"
     => "foo\xAE bar"

    ruby-1.9.2-p136 :002 > s.is_utf8?
     => false

    ruby-1.9.2-p136 :003 > s.valid_encoding?
     => false

    ruby-1.9.2-p136 :004 > s.sub(/bar/, 'biz')
    ArgumentError: invalid byte sequence in UTF-8
      from (irb):4:in `sub'
    ...

    ruby-1.9.2-p136 :005 > s2 = Iconv.new('UTF-8//IGNORE', 'UTF-8').iconv("#{s} ")[0..-2]
     => "foo bar"

    ruby-1.9.2-p136 :006 > s2.gsub(/bar/, 'biz')
     => "foo biz"

And if that's not doing the trick, then maybe try forcing the string to UTF-8 first:

    ruby-1.9.2-p136 :007 > s3 = Iconv.new('UTF-8//IGNORE', 'UTF-8').iconv("#{s.force_encoding('UTF-8')} ")[0..-2]
     => "foo bar"

Jeff

On Jun 20, 4:33 pm, Erica wrote:
> [...]

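To find out which bytes are actually tripping valid_encoding? in a session like the one above, one option is to let String#scrub hand each invalid run to a block and record it as hex. This is a rough diagnostic sketch, assuming Ruby 2.1 or later; the helper name is hypothetical. As far as I know, is_utf8? in the session comes from ActiveSupport, while valid_encoding? and scrub are plain Ruby.

```ruby
# Hypothetical diagnostic helper: collect the byte runs that are not
# valid UTF-8, as hex strings suitable for logging.
# Assumes Ruby 2.1+ (String#scrub with a block).
def invalid_byte_runs(str)
  runs = []
  str.dup.force_encoding('UTF-8').scrub do |bad|
    runs << bad.unpack('H*').first  # hex dump of the invalid run
    ''                              # replacement text; scrub's result is discarded
  end
  runs
end

p invalid_byte_runs("foo\xAE bar")   # => ["ae"]
p invalid_byte_runs("plain ascii")   # => []
```

Logging these hex values alongside the request makes it easier to tell stray binary junk from text that was simply sent in another encoding. A lone byte in the 0x80 to 0x9F range, for instance, often points at Windows-1252 input.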
Thank you everyone for your responses. They helped me figure out a solution. This seems to work for my problem:

    s = s.gsub("\xe2\x80\x9c", '"')    # left double quote
    s = s.gsub("\xe2\x80\x9d", '"')    # right double quote
    s = s.gsub("\xe2\x80\x98", "'")    # left single quote
    s = s.gsub("\xe2\x80\x99", "'")    # right single quote
    s = s.gsub("\xe2\x80\x93", "-")    # en dash
    s = s.gsub("\xe2\x80\x94", "--")   # em dash
    s = s.gsub("\xe2\x80\xa6", "...")  # ellipsis
    s = Iconv.conv('UTF-8//IGNORE', 'UTF-8', s)

-Erica

On Jun 21, 12:24 pm, Jeff Lewis wrote:
> [...]
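The chain of gsub calls above can also be written as a single pass driven by a lookup table, which makes it easier to add further characters later. This is just a sketch of that variation, not what was used in the thread. The constant and method names are made up, and it assumes the string is already valid UTF-8 by the time it is called.

```ruby
# Same pairs as the gsub chain above, written as Unicode escapes.
SMART_PUNCTUATION = {
  "\u201C" => '"',    # left double quote  (bytes E2 80 9C)
  "\u201D" => '"',    # right double quote (E2 80 9D)
  "\u2018" => "'",    # left single quote  (E2 80 98)
  "\u2019" => "'",    # right single quote (E2 80 99)
  "\u2013" => '-',    # en dash            (E2 80 93)
  "\u2014" => '--',   # em dash            (E2 80 94)
  "\u2026" => '...'   # ellipsis           (E2 80 A6)
}.freeze

# One regexp that matches any key of the table.
SMART_PUNCTUATION_RE = Regexp.union(SMART_PUNCTUATION.keys)

# Hypothetical helper name. Assumes str is already valid UTF-8.
def asciify_punctuation(str)
  # gsub with a Hash looks each match up in the table
  str.gsub(SMART_PUNCTUATION_RE, SMART_PUNCTUATION)
end

puts asciify_punctuation("\u201CIt\u2019s done\u201D \u2014 almost\u2026")
# prints: "It's done" -- almost...
```

If the input may still contain invalid bytes, gsub with a regexp raises ArgumentError, so any cleanup of invalid sequences needs to happen before this step rather than after it.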