I've installed ferret 0.10.9 together with the latest acts_as_ferret on Windows XP and indexed a location database (geonames.org) with Location.rebuild_index. The data is in UTF-8.

Now calling Location.find_by_contents "ö" does not return a result, causes a lot of CPU load, and finally exits with the error "index.rb:702: in 'parse': failed to allocate memory (NoMemoryError)". It seems to be a problem in 'process_query'.

I sometimes get similar results for other German umlauts...

-- Posted via http://www.ruby-forum.com/.
David Balmain
2007-Mar-20 02:43 UTC
[Ferret-talk] "ö" causes find_by_contents not to return
On 3/19/07, Star Burger <starburger234 at yahoo.de> wrote:
> I've installed ferret 0.10.9 together with the latest acts_as_ferret
> on Windows XP and indexed a location database (geonames.org) with
> Location.rebuild_index. The data is in UTF-8.
>
> Now calling Location.find_by_contents "ö" does not return a result,
> causes a lot of CPU load, and finally exits with the error "index.rb:702:
> in 'parse': failed to allocate memory (NoMemoryError)". It seems to be a
> problem in 'process_query'.
>
> I sometimes get similar results for other German umlauts...

Unfortunately Ferret doesn't come with UTF-8 support on Windows, as the win32 runtime environment doesn't seem to support UTF-8. You will therefore need to write your own analyzer on Windows if you want to support UTF-8 searches. Hopefully the NoMemoryError will be fixed in the next win32 gem I release.

-- 
Dave Balmain
http://www.davebalmain.com/
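The tokenizing step that such a custom analyzer would need can be sketched in plain Ruby. This is only an illustration under stated assumptions: `utf8_tokens` is a hypothetical helper, it is not wired into Ferret's analyzer API, and it assumes a modern Ruby (1.9 or later, where strings carry an encoding and `\p{L}` matches Unicode letters) rather than the Ruby of this thread.

```ruby
# Hypothetical helper, not part of Ferret: split UTF-8 text into
# lowercase word tokens without breaking words at non-ASCII letters.
def utf8_tokens(text)
  # \p{L} matches any Unicode letter, so "ü" and "ö" stay inside their words.
  # Note: downcase only handles non-ASCII letters on Ruby 2.4 or later.
  text.scan(/\p{L}+/).map(&:downcase)
end

sample = "Z\u00FCrich, K\u00F6ln"   # "Zürich, Köln"
p utf8_tokens(sample)
```

A byte-oriented tokenizer would instead split "Zürich" in the middle of the two-byte "ü", which is why a search for a single umlaut can behave differently from a search for ASCII text.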
Thomas Senf
2007-Mar-21 13:18 UTC
[Ferret-talk] "ö" causes find_by_contents not to return
David Balmain wrote:
> Unfortunately Ferret doesn't come with UTF-8 support on Windows, as the
> win32 runtime environment doesn't seem to support UTF-8. You will
> therefore need to write your own analyzer on Windows if you want to
> support UTF-8 searches.

Hello Star Burger,

if you're planning to write your own UTF-8 analyzer, consider the unpack/pack duo:

    utf-8_encoded_string_from_db.unpack("U*").pack("C*")
    @index << {:content => utf-8_encoded_string_from_db}
    @index.search_each('content:Behörde') {|id,score| do_sth}

I didn't try this in acts_as_ferret, but with plain Ruby it worked in my case.
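What the unpack/pack pair does to a German word can be sketched as follows. This is an illustration rather than Thomas's exact code: it assumes Ruby 1.9 or later for the `\u` escapes, and the variable names are made up for the example. `unpack("U*")` decodes the UTF-8 string into Unicode code points, and `pack("C*")` writes each code point back as a single byte, which for characters below 256 is the ISO-8859-1 (Latin-1) encoding.

```ruby
utf8 = "Beh\u00F6rde"                 # "Behörde" as UTF-8; "ö" takes two bytes

codepoints = utf8.unpack("U*")        # one integer per character
latin1     = codepoints.pack("C*")    # one byte per integer

p utf8.bytesize      # 8
p codepoints         # [66, 101, 104, 246, 114, 100, 101]
p latin1.bytesize    # 7
```

For this to help with searching, the converted string is what would have to go into the index, and the query string would need the same conversion so that both sides use single-byte characters.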
Julio Cesar Ody
2007-Mar-22 00:07 UTC
[Ferret-talk] "ö" causes find_by_contents not to return
I tried this with a UTF-8 encoded string (Japanese):

    "\u304A\u308C\u3068\u9B5A".unpack("U*").pack("C*")

Which gives me this in return:

    "u304Au308Cu3068u9B5A"

And that's not what I want stored in my index, right? Now I'm pretty sure I'm doing something dumb :-) Hopefully someone can clarify.

Thanks.

On 3/22/07, Thomas Senf <thomas.senf at web.de> wrote:
> Hello Star Burger,
>
> if you're planning to write your own UTF-8 Analyzer consider the
> unpack/pack duo:
>
> utf-8_encoded_string_from_db.unpack("U*").pack("C*")
> @index << {:content => utf-8_encoded_string_from_db}
> @index.search_each('content:Behörde') {|id,score| do_sth}
>
> I didn't try this in afa, but with ruby it worked in my case.

-- 
Julio C. Ody
http://rootshell.be/~julioody
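Two separate effects are probably at work in the Japanese test, and the sketch below tries to show both. It assumes Ruby 1.9 or later; the variable names are made up for the example. First, as far as I know the Ruby 1.8 of that time did not treat `\u` as an escape, so the literal was plain ASCII text beginning with "u304A", and unpack/pack simply handed it back unchanged. Second, with real Japanese characters the trick cannot work anyway: `pack("C*")` keeps only the low 8 bits of each code point, so anything above 255 is mangled.

```ruby
# Case 1: what a Ruby 1.8 literal most likely contained (plain ASCII text).
ascii = 'u304Au308Cu3068u9B5A'
roundtrip = ascii.unpack("U*").pack("C*")
p roundtrip == ascii                 # true: ASCII passes through unchanged

# Case 2: real Japanese characters (needs Ruby 1.9+ for \u escapes).
japanese   = "\u304A\u308C\u3068\u9B5A"
codepoints = japanese.unpack("U*")   # [0x304A, 0x308C, 0x3068, 0x9B5A]
squeezed   = codepoints.pack("C*")   # only the low byte of each survives
p squeezed.bytes                     # [74, 140, 104, 90]
```

So the unpack/pack approach is limited to text whose characters all fit in one byte, such as German umlauts; for Japanese the index would need the original UTF-8 bytes and an analyzer that understands them.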