I'm starting to use Ferret and acts_as_ferret.

I need to use something like EuropeanAnalyzer (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars).

For example, if the user searches for "gonzalez", the search should find documents that contain the term "gonzález".

The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter, but it seems that this class is not available in 0.10.x.

What is the way to do this?

-- Posted via http://www.ruby-forum.com/.
David Balmain
2006-Oct-20 05:55 UTC
[Ferret-talk] How to deal with accentuated chars in 0.10.8?
On 10/20/06, Edgar <edgargonzalez at gmail.com> wrote:
> I'm starting to use Ferret and acts_as_ferret.
>
> I need to use something like EuropeanAnalyzer
> (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars).
>
> For example, if the user searches for "gonzalez", the search should find
> documents that contain the term "gonzález".
>
> The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter, but it
> seems that this class is not available in 0.10.x.
>
> What is the way to do this?

# try this. Make sure you use the -KU flag.
require 'rubygems'
require 'ferret'
require 'jcode'

# Each character in ACCENTUATED_CHARS is replaced by the character at the
# same position in REPLACEMENT_CHARS, so both strings must have the same
# number of characters.
ACCENTUATED_CHARS = '???A?????a????????????????'
REPLACEMENT_CHARS = 'aaaaaaaaaaooooeeeeeeeeuuuc'

module Ferret::Analysis
  class TokenFilter < TokenStream
    # Construct a token stream filtering the given input.
    def initialize(input)
      @input = input
    end
  end

  # Replace accentuated chars with their ASCII equivalents.
  class ToASCIIFilter < TokenFilter
    def next()
      token = @input.next()
      unless token.nil?
        # Downcase first, then map each accented char to its replacement.
        token.text = token.text.downcase.tr(ACCENTUATED_CHARS, REPLACEMENT_CHARS)
      end
      token
    end
  end

  # Analyzer that tokenizes with StandardTokenizer and then folds accents.
  class EuropeanAnalyzer
    def token_stream(field, string)
      return ToASCIIFilter.new(StandardTokenizer.new(string))
    end
  end
end

# Quick check: print every token the analyzer produces for a sample string.
analyzer = Ferret::Analysis::EuropeanAnalyzer.new

ts = analyzer.token_stream('xxx', "Let's see what " +
                           "happens to ???A?????a????????????????")

while t = ts.next
  puts t
end
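
The folding step inside ToASCIIFilter#next can be tried on its own, without Ferret installed. The following is only a rough sketch under a few assumptions: it targets Ruby 1.9 or later, where String#tr handles multibyte characters natively (so neither -KU nor jcode is needed), and the names FOLDS and fold_accents as well as the character groups are hypothetical and chosen for illustration; they are not part of Ferret or of the code above.

```ruby
# Hypothetical mapping, for illustration only: each key lists accented
# characters that fold to the single ASCII character in the value.
FOLDS = {
  'àáâãäå' => 'a',
  'èéêë'   => 'e',
  'ìíîï'   => 'i',
  'òóôõö'  => 'o',
  'ùúûü'   => 'u',
  'ç'      => 'c',
  'ñ'      => 'n'
}.freeze

# Hypothetical helper: downcase the text, then fold one group at a time.
# String#tr pads a shorter replacement with its last character, so every
# character in a group maps to the same ASCII letter.
def fold_accents(text)
  FOLDS.reduce(text.downcase) { |acc, (from, to)| acc.tr(from, to) }
end

puts fold_accents('González')
```

Grouping the characters by target letter avoids having to keep two long strings aligned position by position, at the cost of several tr passes per token. The same folding has to be applied at index time and at query time, otherwise a search for "gonzalez" is compared against unfolded terms.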