thr3ads.net - Ferret talk - [Ferret-talk] How to deal with accentuated chars in 0.10.8? [Oct 2006]

Edgar

2006-Oct-19 19:57 UTC

[Ferret-talk] How to deal with accentuated chars in 0.10.8?

I'm starting to use Ferret and acts_as_ferret.

I need to use something like EuropeanAnalyzer 
(http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars).

For example, if the user searches for "gonzalez", the search should also
find documents that contain the term "gonzález".

The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter, but it
seems that this class is not available in 0.10.x.

What is the way to do this in 0.10.x?


-- 
Posted via http://www.ruby-forum.com/.

David Balmain

2006-Oct-20 05:55 UTC


On 10/20/06, Edgar <edgargonzalez at gmail.com> wrote:
> I'm starting to use Ferret and acts_as_ferret.
>
> I need to use something like EuropeanAnalyzer
> (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars).
>
> For example, if the user searches for "gonzalez", the search should also
> find documents that contain the term "gonzález".
>
> The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter, but
> it seems that this class is not available in 0.10.x.
>
> What is the way to do this?

# Try this. Make sure you run Ruby with the -KU flag so that strings
# are treated as UTF-8.
require 'rubygems'
require 'ferret'
require 'jcode'

# Characters at the same position in the two strings map to each other.
ACCENTUATED_CHARS = '???A?????a????????????????'
REPLACEMENT_CHARS = 'aaaaaaaaaaooooeeeeeeeeuuuc'

module Ferret::Analysis
  class TokenFilter < TokenStream
    # Construct a token stream filtering the given input.
    def initialize(input)
      @input = input
    end
  end

  # Replace accentuated chars with their ASCII equivalents.
  class ToASCIIFilter < TokenFilter
    def next()
      token = @input.next()
      unless token.nil?
        token.text = token.text.downcase.tr(ACCENTUATED_CHARS, REPLACEMENT_CHARS)
      end
      token
    end
  end

  # Analyzer that tokenizes with StandardTokenizer and then folds accents.
  class EuropeanAnalyzer
    def token_stream(field, string)
      return ToASCIIFilter.new(StandardTokenizer.new(string))
    end
  end
end

analyzer = Ferret::Analysis::EuropeanAnalyzer.new
ts = analyzer.token_stream('xxx', "Let's see what " +
                           "happens to ???A?????a????????????????")
while t = ts.next
  puts t
end
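
The wrapping pattern used above (a filter whose next() pulls a token from its input, rewrites the text, and passes it on) can also be tried without Ferret. What follows is only a minimal sketch in plain Ruby: Token, WhitespaceStream, FoldingFilter and fold_all are hypothetical stand-ins and not part of Ferret's API, and the two character lists are an illustrative assumption rather than the lists from the post. It assumes a Ruby version with UTF-8 source strings.

```ruby
# Hypothetical stand-in for a token object; only the text matters here.
Token = Struct.new(:text)

# Hypothetical tokenizer: returns one Token per whitespace-separated word,
# then nil once the input is exhausted.
class WhitespaceStream
  def initialize(string)
    @words = string.split
  end

  def next
    word = @words.shift
    word && Token.new(word)
  end
end

# Assumed character lists, for illustration only. Characters at the same
# position in the two strings map to each other.
ACCENTED    = 'àáâãäåòóôõöèéêëùúûüç'
REPLACEMENT = 'aaaaaaoooooeeeeuuuuc'

# Wraps another stream and folds accents in each token it passes through.
class FoldingFilter
  def initialize(input)
    @input = input
  end

  def next
    token = @input.next
    token.text = token.text.downcase.tr(ACCENTED, REPLACEMENT) unless token.nil?
    token
  end
end

# Drains the filtered stream and collects the token texts.
def fold_all(string)
  stream = FoldingFilter.new(WhitespaceStream.new(string))
  texts = []
  while (token = stream.next)
    texts << token.text
  end
  texts
end
```

Because the same folding would be applied both when indexing and when parsing a query, a search for "gonzalez" and a document containing "gonzález" end up with the same term.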

Edgar

2006-Oct-20 14:26 UTC


David,

Thanks for the tip, but I'll try your latest release (0.10.13) :-)


