I've just installed acts_as_ferret, and am trying to build my index, but I'm getting the following error:

    >> r = Topic.find_by_contents('testing')
    StandardError: : Error occured at <analysis.c>:704
    Error: exception 2 not handled: Error decoding input string. Check that you have the locale set correctly
            from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:227:in `<<'
            from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:227:in `rebuild_index'
            from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:227:in `rebuild_index'
            from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:247:in `create_index_instance'
            from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:240:in `ferret_index'
            from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:325:in `find_id_by_contents'
            from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:262:in `find_by_contents'
            from (irb):1

I'm using the current version of ferret (gem install ferret) and acts_as_ferret (script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/plugin/stable/acts_as_ferret) as of today, 7/6/06.

I have tried setting my locale in environment.rb as mentioned here:

http://projects.jkraemer.net/acts_as_ferret/wiki/TypoWithFerret

(Note: I'm not using Typo, but the locale note at the bottom seems to apply.)

So in the Rails::Initializer.run block, I've put this line:

    ENV['LANG'] = 'en_US.UTF-8'

Didn't make a difference. Any other ideas?

Thanks,
Ian.

-- 
Posted via http://www.ruby-forum.com/.
Hi Ian,

On Thu, Jul 06, 2006 at 11:58:57PM +0200, Ian Zabel wrote:
> I've just installed acts_as_ferret, and am trying to build my index, but
> I'm getting the following error:
[..]
> I have tried setting my locale in environment.rb as mentioned here
> http://projects.jkraemer.net/acts_as_ferret/wiki/TypoWithFerret (note:
> i'm not using typo, but the locale note at the bottom seems to apply).
>
> So in the Rails::Initializer.run block, I've put this line:
> ENV['LANG'] = 'en_US.UTF-8'
>
> Didn't make a difference. Any other ideas?

I put this statement at the very top of the file, outside of the block. Maybe that will do the trick.

You also should make sure the locale exists on your system. On a Debian-based system, you could run

    dpkg-reconfigure locales

and make sure the box before "en_US.UTF-8" is ticked.

Hope this helps,
Jens

-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
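One way to check that a locale exists is to compare the name you set in ENV['LANG'] against the output of `locale -a`. The sketch below is only an illustration: `normalize_locale` and `locale_installed?` are hypothetical helpers, not part of Ferret or acts_as_ferret. It assumes glibc-style naming, where the codeset part is compared case-insensitively and without punctuation, so that "en_US.UTF-8" and "en_US.utf8" refer to the same locale. It also uses String#b, so it needs Ruby 2.0 or later.

```ruby
# Hypothetical helper, not part of Ferret or acts_as_ferret.
# Lowercases the codeset part of a locale name and strips punctuation from it,
# so "en_US.UTF-8" and "en_US.utf8" normalize to the same string.
def normalize_locale(name)
  lang, codeset = name.strip.split(".", 2)
  return lang if codeset.nil?
  "#{lang}.#{codeset.downcase.gsub(/[^a-z0-9@]/, '')}"
end

# Returns true if +wanted+ appears in +installed+ after normalization.
# By default +installed+ is read from `locale -a`; the output is treated as
# raw bytes (String#b) because some locale names are not valid UTF-8.
def locale_installed?(wanted, installed = `locale -a`.b.split("\n"))
  installed.map { |l| normalize_locale(l) }.include?(normalize_locale(wanted))
end
```

For example, `locale_installed?("en_US.UTF-8")` could be run in irb before putting that value into environment.rb.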
Thanks for the response!

> I put this statement at the very top of the file, outside of the block.
> Maybe that will do the trick.
>
> You also should make sure the locale exists on your system. On
> a Debian-based system, you could do
> dpkg-reconfigure locales
> and make sure the box before "en_US.UTF-8" is ticked.

I determined with `locale -a` that the locale on the box is called "en_US.utf8", so I added

    ENV['LANG'] = 'en_US.utf8'

at the top of my environment.rb (right after "ENV['RAILS_ENV'] ||= 'production'").

Still getting the same error: "Error decoding input string. Check that you have the locale set correctly"

:(

It may be worth noting that it seems to only be a problem with this particular model. I am able to index a different model without any issues. So it's gotta be something with the data.

I noticed that my InnoDB topics table was set to the latin1 charset, so I changed it to utf8. I still get the same error.

Not sure where to go next.

Ian.
On Fri, Jul 07, 2006 at 05:29:50AM +0200, Ian Zabel wrote:
[..]
> I noticed that my InnoDB topics table was set to latin1 charset, so I
> changed it to utf8. I still get the same error.

IMHO changing the default charset of a table doesn't change the encoding of the data already stored in it. So what you get from your DB is still latin1.

> Not sure where to go next.

The ENV['LANG'] value has to correspond to the encoding of the data you want to index, so if your data is latin1, Ferret needs to run with a matching locale, i.e. an ISO-8859-1 one.

In such cases I dump the data as text, convert it to utf8 (usually with vim, :set fileencoding=utf8), re-create the table with DEFAULT CHARSET=utf8 and re-import the data.

With large data sets other solutions might be more efficient, though.

Jens
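Since only one model fails to index, it may help to find out which rows actually contain bytes that are not valid UTF-8. The following is a minimal sketch, not something from Ferret or acts_as_ferret: `valid_utf8?` and `suspect_row_ids` are hypothetical helpers, and the input is assumed to be a plain list of [id, text] pairs pulled from the model. It relies on the String encoding API, so it assumes Ruby 1.9 or later.

```ruby
# Hypothetical helper: true if the bytes of +text+ form valid UTF-8.
# dup is used so the caller's string is left untouched.
def valid_utf8?(text)
  text.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical helper: given [id, text] pairs, returns the ids of the rows
# whose text is not valid UTF-8. A nil text is treated as an empty string.
def suspect_row_ids(rows)
  rows.reject { |_id, text| valid_utf8?(text.to_s) }.map { |id, _text| id }
end
```

If this returns any ids while the locale is set to a UTF-8 one, those rows are likely still latin1 and are good candidates for the decoding error.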
On 7/7/06, Jens Kraemer <kraemer at webit.de> wrote:
[..]
> With large data sets other solutions might be more efficient, though.

Here's one way you can convert ISO-8859-1 to UTF-8:

    # Walk the string byte by byte (ISO-8859-1 is one byte per character).
    str = str.unpack("C*").map {|c|
      if c < 0x80
        next c.chr                    # ASCII: unchanged
      elsif c < 0xC0
        next "\xC2" + c.chr           # 0x80-0xBF: 0xC2, then the same byte
      else
        next "\xC3" + (c - 64).chr    # 0xC0-0xFF: 0xC3, then the byte minus 0x40
      end
    }.join("")

That may help.

Cheers,
Dave
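On Ruby 1.9 and later, which postdates this thread, the same ISO-8859-1 to UTF-8 conversion can be sketched with the built-in String encoding API instead of byte arithmetic. `latin1_to_utf8` is a hypothetical helper name, and the sketch assumes the input bytes really are ISO-8859-1.

```ruby
# Hypothetical helper: reinterprets the raw bytes of +bytes+ as ISO-8859-1
# and transcodes them to UTF-8. The input string is not modified.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end
```

For the single byte 0xE9 (é in ISO-8859-1) this yields the two bytes 0xC3 0xA9, the same result as the manual mapping.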
> The ENV['LANG'] value has to correspond to the encoding of the data you
> want to index, so if your data is latin1, Ferret needs to run with such
> a locale, i.e. ISO-8859-1.
>
> In such cases I dump the data as text, convert to utf8 (usually
> with vim :set fileencoding=utf8), re-create the table with DEFAULT
> CHARSET UTF-8 and re-import the data.

So I tried to get this to work in MANY different ways. I converted the encoding with vim, iconv, and mysqldump/import. I changed the table types. I tried this:

http://textsnippets.com/posts/show/84

and this:

http://climbtothestars.org/archives/2004/07/18/converting-mysql-database-contents-to-utf-8/

No matter what I try, I get the same error when I set the locale in my environment.rb to en_US.utf8. If I set it to en_US.iso88591, everything works fine.

If I could successfully convert my database to utf8 AND get it to work with ferret, I would love to. But I just can't get it to work.

So.... I think I'm going to stick with latin1 for now. :(

Thanks for your help, guys!

Ian.