Greetings,

(using acts_as_ferret)

So I have a book title "M?ngrel ?Horsemen?" in my index.

Searching for "M?ngrel" retrieves the document.

But I would like searching for "Mongrel" to also retrieve the document, which it currently does not.

Does anyone have a good solution to this problem?

I suppose I could filter the documents and queries first with something like:

  (Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv "M?ngrel ?Horsemen?").gsub(/[^a-zA-Z0-9]/im, "")

But perhaps there is a better, or built-in, solution.

Thanks

-- 
Posted via http://www.ruby-forum.com/.
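For readers on a current Ruby: Iconv is no longer part of the standard library (it was removed in Ruby 2.0), so the pre-filtering idea above can be sketched with String#unicode_normalize instead. This is only a minimal sketch of filtering both documents and queries before they reach the index. The helper name asciify and the sample title are made up for illustration, and nothing here is part of Ferret or acts_as_ferret.

```ruby
# Hypothetical helper, not part of Ferret or acts_as_ferret.
# Assumes Ruby 2.2 or later (String#unicode_normalize) and UTF-8 input.
def asciify(str)
  # Decompose accented letters into base letter + combining mark.
  decomposed = str.unicode_normalize(:nfd)
  # Drop the combining marks (Unicode category Mn), keeping the base letters.
  without_marks = decomposed.gsub(/\p{Mn}/, '')
  # Downcase, drop whatever is still not ASCII alphanumeric or whitespace,
  # then collapse repeated spaces.
  without_marks.downcase.gsub(/[^a-z0-9\s]/, '').squeeze(' ').strip
end

# Made-up sample: "M\u00F6ngrel" is "Mongrel" with a diaeresis on the o,
# wrapped in angle quotation marks.
puts asciify("M\u00F6ngrel \u00ABHorsemen\u00BB")  # prints "mongrel horsemen"
```

Letters that have no canonical decomposition (for example the ae ligature or the sharp s) are simply dropped by this sketch, so an explicit mapping table is still needed for them.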
On Fri, Jan 19, 2007 at 06:12:12PM +0100, John Private wrote:
> But perhaps there is a better, or built in solution.

I don't think so - a custom Analyzer would be the right place for this.

Jens

-- 
webit! Gesellschaft für neue Medien mbH     www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer      kraemer at webit.de
Schnorrstraße 76                            Tel +49 351 46766 0
D-01069 Dresden                             Fax +49 351 46766 66
On Jan 22, 2007, at 2:49 PM, Jens Kraemer wrote:
> I don't think so - a custom Analyzer would be the right place for
> this.

We use a normalizer to store/query (to be revised for Rails 1.2):

  # Utility method that returns an ASCIIfied, downcased, and sanitized string.
  # It relies on the Unicode Hacks plugin by means of String#chars. We assume
  # $KCODE is 'u' in environment.rb. So far we support a wide range of Latin
  # accented letters, based on the Unicode Character Palette bundled in Macs.
  def self.normalize(str)
    n = str.chars.downcase.strip.to_s
    n.gsub!(/[????????]/, 'a')
    n.gsub!(/?/, 'ae')
    n.gsub!(/[??]/, 'd')
    n.gsub!(/[?????]/, 'c')
    n.gsub!(/[?????????]/, 'e')
    n.gsub!(/?/, 'f')
    n.gsub!(/[????]/, 'g')
    n.gsub!(/[??]/, 'h')
    n.gsub!(/[????????]/, 'i')
    n.gsub!(/[????]/, 'j')
    n.gsub!(/[??]/, 'k')
    n.gsub!(/[?????]/, 'l')
    n.gsub!(/[??????]/, 'n')
    n.gsub!(/[??????????]/, 'o')
    n.gsub!(/?/, 'oe')
    n.gsub!(/?/, 'q')
    n.gsub!(/[???]/, 'r')
    n.gsub!(/[?????]/, 's')
    n.gsub!(/[????]/, 't')
    n.gsub!(/[??????????]/, 'u')
    n.gsub!(/?/, 'w')
    n.gsub!(/[???]/, 'y')
    n.gsub!(/[???]/, 'z')
    n.gsub!(/\s+/, ' ')
    n.gsub!(/[^\sa-z0-9_-]/, '')
    n
  end

And this convenience class method to use in Rails models with acts_as_ferret (slightly edited):

  # Wrapper function to normalize fields before calling acts_as_ferret
  #
  # Usage: index_fields [:field1, :field2], :option1 => ..., :option2 => ...
  #
  # Please note that your queries should use a "_normalized" suffix on
  # each field, e.g.: +field1_normalized:foo
  class ActiveRecord::Base
    def self.index_fields(fields, *options)
      aaf_fields = []
      fields.each do |f|
        class_eval <<-EOS
          def #{f}_normalized
            MyAppUtils.normalize(#{f})
          end
        EOS
        aaf_fields.push ":#{f}_normalized"
      end
      aaf_call = 'acts_as_ferret :fields => [' + aaf_fields.join(',') + ']'
      options.each do |option_pair|
        option_pair.each do |key, value|
          aaf_call << ", :#{key} => #{value}"
        end
      end
      logger.info aaf_call
      class_eval(aaf_call)
    end
  end

-- fxn
On 1/23/07, Xavier Noria <fxn at hashref.com> wrote:
> We use a normalizer to store/query (to be revised for Rails 1.2):

Sorry to bring this one back from the archives (I'm going through all the email I've missed in my long absence).

Anyway, I thought that since not even Jens knew about this I should point out the existence of MappingFilter:

http://ferret.davebalmain.com/api/classes/Ferret/Analysis/MappingFilter.html

It essentially does the same thing as Xavier's code above but it is much faster. It compiles the mappings to a single deterministic finite automaton (DFA):

http://en.wikipedia.org/wiki/Deterministic_finite_state_machine

Basically, this means the filter does a single pass through the string to do all the mappings rather than a pass for each mapping.

Hope that helps somebody,
Dave

-- 
Dave Balmain
http://www.davebalmain.com/
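
To illustrate the difference between a pass per mapping and a single pass, here is a rough plain-Ruby sketch. It is not Ferret's MappingFilter and does not build a DFA: Ruby's regexp engine is a backtracking matcher, so this only shows the "scan the string once, look up each match in a table" idea. The method names and the small mapping table are made up for the example.

```ruby
# Hypothetical helpers for illustration only; not Ferret's implementation.

# One pass per mapping: the string is scanned once for every table entry,
# in the style of a chain of gsub! calls.
def map_multi_pass(str, mapping)
  mapping.reduce(str) { |s, (from, to)| s.gsub(from, to) }
end

# One pass in total: all keys are combined into a single alternation, and
# each match is replaced by looking it up in the mapping hash.
def build_single_pass_mapper(mapping)
  pattern = Regexp.union(mapping.keys)
  ->(str) { str.gsub(pattern, mapping) }
end

# Made-up sample table (written with escapes to keep the source ASCII-only).
mapping = {
  "\u00F6" => 'o',   # o with diaeresis
  "\u00E9" => 'e',   # e with acute
  "\u00F1" => 'n',   # n with tilde
  "\u00E6" => 'ae'   # ae ligature
}

mapper = build_single_pass_mapper(mapping)
puts mapper.call("M\u00F6ngrel")                 # prints "Mongrel"
puts map_multi_pass("M\u00F6ngrel", mapping)     # prints "Mongrel"
```

Both versions give the same result for a table like this one. The single-pass form just avoids rescanning the string once per entry, which matters more as the table grows.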