thr3ads.net - Ferret talk - [Ferret-talk] How to have 'o' == 'ö' [Jan 2007]

John Private

2007-Jan-19 17:12 UTC

[Ferret-talk] How to have 'o' == 'ö'

Greetings,

(using acts_as_ferret)

So I have a book title "Möngrel ?Horsemen?" in my index.

Searching for "Möngrel" retrieves the document.

But I would like searching for "Mongrel" to also retrieve the
document, which it does not currently.

Anyone have any good solutions to this problem?

I suppose I could filter the documents and queries first with something
like:

(Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv "Möngrel ?Horsemen?").gsub(/[^a-zA-Z0-9]/im, "")

But perhaps there is a better, or built-in, solution.
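
On a current Ruby, where Iconv is no longer part of the standard library, the same filtering idea can be sketched with String#unicode_normalize. This is only an illustration and not something proposed in the thread: the method name ascii_fold is hypothetical, and it assumes UTF-8 input and Ruby 2.2 or later.

```ruby
# Hypothetical helper (not part of Ferret or acts_as_ferret).
# Assumes UTF-8 strings and Ruby >= 2.2 for String#unicode_normalize.
def ascii_fold(text)
  text
    .unicode_normalize(:nfd)   # decompose "ö" into "o" + combining diaeresis
    .gsub(/\p{Mn}/, '')        # drop the combining marks
    .downcase
    .gsub(/[^a-z0-9\s]/, '')   # drop whatever is still not plain ASCII
    .squeeze(' ')              # collapse runs of spaces
    .strip
end

puts ascii_fold('Möngrel Horsemen')
```

Letters that have no canonical decomposition (for example "ø" or "ß") are not folded by this approach; they are simply removed by the final gsub, so an explicit mapping table is still needed for them. The same function would have to be applied both when indexing and when building queries.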


Thanks

-- 
Posted via http://www.ruby-forum.com/.

Jens Kraemer

2007-Jan-22 13:49 UTC

On Fri, Jan 19, 2007 at 06:12:12PM +0100, John Private wrote:
> But perhaps there is a better, or built in solution.
I don't think so - a custom Analyzer would be the right place for
this.

Jens

-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

Xavier Noria

2007-Jan-22 15:24 UTC

On Jan 22, 2007, at 2:49 PM, Jens Kraemer wrote:
> I don't think so - a custom Analyzer would be the right place for
> this.

We use a normalizer to store/query (to be revised for Rails 1.2):

   # Utility method that returns an ASCIIfied, downcased, and sanitized string.
   # It relies on the Unicode Hacks plugin by means of String#chars. We assume
   # $KCODE is 'u' in environment.rb. For now we support a wide range of Latin
   # accented letters, based on the Unicode Character Palette bundled in Macs.
   def self.normalize(str)
     n = str.chars.downcase.strip.to_s
     n.gsub!(/[????????]/,    'a')
     n.gsub!(/?/,            'ae')
     n.gsub!(/[??]/,          'd')
     n.gsub!(/[?????]/,       'c')
     n.gsub!(/[?????????]/,   'e')
     n.gsub!(/?/,             'f')
     n.gsub!(/[????]/,        'g')
     n.gsub!(/[??]/,          'h')
     n.gsub!(/[????????]/,    'i')
     n.gsub!(/[????]/,        'j')
     n.gsub!(/[??]/,          'k')
     n.gsub!(/[?????]/,       'l')
     n.gsub!(/[??????]/,      'n')
     n.gsub!(/[??????????]/,  'o')
     n.gsub!(/?/,            'oe')
     n.gsub!(/?/,             'q')
     n.gsub!(/[???]/,         'r')
     n.gsub!(/[?????]/,       's')
     n.gsub!(/[????]/,        't')
     n.gsub!(/[??????????]/,  'u')
     n.gsub!(/?/,             'w')
     n.gsub!(/[???]/,         'y')
     n.gsub!(/[???]/,         'z')
     n.gsub!(/\s+/,           ' ')
     n.gsub!(/[^\sa-z0-9_-]/, '')
     n
   end

And this convenience class method to use in Rails models with  
acts_as_ferret (slightly edited):

   # Wrapper function to normalize fields before calling acts_as_ferret
   #
   # Usage: index_fields [:field1, :field2], :option1 => ..., :option2 => ...
   #
   # Please note that your queries should use a "_normalized" suffix on
   # each field, e.g. +field1_normalized:foo
   class ActiveRecord::Base
     def self.index_fields(fields, *options)
       aaf_fields = []
       fields.each do |f|
         class_eval <<-EOS
           def #{f}_normalized
             MyAppUtils.normalize(#{f})
           end
         EOS
         aaf_fields.push ":#{f}_normalized"
       end
       aaf_call = 'acts_as_ferret :fields => [' + aaf_fields.join(',') + ']'
       options.each do |option_pair|
         option_pair.each do |key, value|
           aaf_call << ", :#{key} => #{value}"
         end
       end
       logger.info aaf_call
       class_eval(aaf_call)
     end
   end
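
The same "generate a <field>_normalized reader per field" idea can also be sketched with define_method instead of evaluating source strings. This is a plain-Ruby illustration under stated assumptions, not the code used in the thread: NormalizedFields, normalized_fields, and Book are hypothetical names, the MyAppUtils.normalize body here is a simplified stand-in, and a real model would pass the returned field names on to acts_as_ferret.

```ruby
# Simplified stand-in for MyAppUtils.normalize: strips accents and downcases.
module MyAppUtils
  def self.normalize(str)
    str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '').downcase.strip
  end
end

# Hypothetical mixin: defines <field>_normalized readers without building
# source strings, and returns the generated names for the indexing call.
module NormalizedFields
  def normalized_fields(*fields)
    fields.map do |field|
      name = :"#{field}_normalized"
      define_method(name) { MyAppUtils.normalize(public_send(field)) }
      name
    end
  end
end

# Plain Ruby class standing in for an ActiveRecord model.
class Book
  extend NormalizedFields
  attr_reader :title

  def initialize(title)
    @title = title
  end

  # In a Rails model these names would be handed to acts_as_ferret.
  INDEXED = normalized_fields(:title)
end

p Book::INDEXED
puts Book.new('Möngrel').title_normalized
```

Because the field names come back as symbols rather than being spliced into a string, options can be passed on as an ordinary hash, which avoids quoting problems with option values.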

-- fxn

David Balmain

2007-Feb-24 12:34 UTC

On 1/23/07, Xavier Noria <fxn at hashref.com> wrote:
> We use a normalizer to store/query (to be revised for Rails 1.2):
> [...]

Sorry to bring this one back from the archives (I'm going through all
the email I've missed in my long absence). Anyway, I thought that
since not even Jens knew about this I should point out the existence
of MappingFilter:

    http://ferret.davebalmain.com/api/classes/Ferret/Analysis/MappingFilter.html

It essentially does the same thing as Xavier's code above but it is
much faster. It compiles the mappings to a single deterministic finite
automaton (DFA):

    http://en.wikipedia.org/wiki/Deterministic_finite_state_machine

Basically, this means the filter does a single pass through the string
to do all the mappings rather than a pass for each mapping.
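
The difference between one pass per mapping and one pass overall can be illustrated in plain Ruby. The sketch below is not how MappingFilter is implemented (Ruby's regex engine is a backtracking matcher, not a compiled DFA), and the character groups are examples chosen for this sketch rather than the lists from the thread. It only shows the idea of flattening all mappings into one table and scanning the string once.

```ruby
# Illustrative mapping table; the character groups are examples only.
MAPPING = {
  %w[à á â ã ä å] => 'a',
  %w[æ]           => 'ae',
  %w[ç]           => 'c',
  %w[è é ê ë]     => 'e',
  %w[ì í î ï]     => 'i',
  %w[ñ]           => 'n',
  %w[ò ó ô õ ö ø] => 'o',
  %w[œ]           => 'oe',
  %w[ù ú û ü]     => 'u'
}.freeze

# Flatten to { "ö" => "o", ... } so one lookup serves every character.
LOOKUP = MAPPING.each_with_object({}) do |(chars, replacement), table|
  chars.each { |char| table[char] = replacement }
end.freeze

# One alternation covering every mapped character.
PATTERN = Regexp.union(LOOKUP.keys)

# Single scan: each match is replaced by its entry in LOOKUP.
def fold(text)
  text.gsub(PATTERN, LOOKUP)
end

puts fold('möngrel cœur señor')
```

With a chain of gsub! calls the string is scanned once per mapping; here it is scanned once regardless of how many mappings the table holds.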

Hope that helps somebody,
Dave

-- 
Dave Balmain
http://www.davebalmain.com/
