Hi,

I just started using ferret and the aaf plugin and it seems to work quite nicely. However, my fields are very short (titles of music) and I don't think many users will be typing in apostrophes when they are looking for something. For a simple document such as "what i've done", I'd like it to be indexed as "what ive done" instead.

Right now I'm using this for my aaf line (I don't want any stop words either, since in smaller docs each word, even an article, can have some significance):

    acts_as_ferret( { :fields => [ :name ] },
                    { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) } )

How should I go about removing the apostrophes when docs are added to the index?

Thanks,
Chris

--
Posted via ruby-forum.com.
On Mon, Jun 25, 2007 at 05:02:54PM +0200, Chris Brickley wrote:
> How should I go about removing the apostrophes when docs are added to
> the index?

I'd implement a custom analyzer that does what StandardAnalyzer does, plus filtering out the apostrophes from the tokens (which should be possible with a custom filter added to the chain).

For a starting point, see ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardAnalyzer.html

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
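The "custom filter added to the chain" idea can be sketched in plain Ruby, without Ferret, to show the shape of the pattern. Everything here is an assumption made for illustration: `Token`, `WhitespaceTokenizer`, `DowncaseFilter`, and `analyze` are hypothetical stand-ins, not Ferret classes, and a real Ferret filter would wrap Ferret's own token streams instead. The point is only that each filter keeps a reference to the stream it wraps and rewrites the token text as tokens pass through.

```ruby
# Plain-Ruby stand-ins (NOT Ferret's API) illustrating a token filter chain.
Token = Struct.new(:text)

# Hypothetical tokenizer: splits on whitespace, hands out one token per call.
class WhitespaceTokenizer
  def initialize(str)
    @words = str.split
  end

  # Returns the next token, or nil once the input is exhausted.
  def next
    word = @words.shift
    word && Token.new(word)
  end
end

# Hypothetical filter that lowercases each token's text.
class DowncaseFilter
  def initialize(input)
    @input = input # the wrapped stream must be stored for #next to use
  end

  def next
    token = @input.next
    return nil if token.nil?

    token.text = token.text.downcase
    token
  end
end

# Filter that strips apostrophes from each token's text.
class ApostropheFilter
  def initialize(input)
    @input = input
  end

  def next
    token = @input.next
    return nil if token.nil?

    token.text = token.text.delete("'")
    token
  end
end

# Builds the chain and collects the resulting terms into an array.
def analyze(str)
  stream = ApostropheFilter.new(DowncaseFilter.new(WhitespaceTokenizer.new(str)))
  terms = []
  while (token = stream.next)
    terms << token.text
  end
  terms
end

p analyze("What I've Done")
```

The filter placed last sees the output of everything before it, so the apostrophe removal runs on already lowercased tokens. The same normalisation has to be applied to queries as well, otherwise a search for "i've" would not match the indexed "ive".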
Ok, thanks for that link. However, I am a bit lost as to where I would put my analyzer code. In my model itself or somewhere else?

This is what I came up with:

    class MyAnalyzer < Analyzer
      def initialize(stop_words = FULL_ENGLISH_STOP_WORDS, lower = true)
        @lower = lower
        @stop_words = stop_words
      end

      def token_stream(field, str)
        ts = StandardTokenizer.new(str)
        ts = LowerCaseFilter.new(ts) if @lower
        ts = StopFilter.new(ts, @stop_words)
        ts = HyphenFilter.new(ts)
        ts = ApostropheFilter.new(ts)
      end
    end

    class ApostropheFilter
      def next()
        t = @input.next()

        if (t == nil)
          return nil
        end

        t.term_text = t.term_text.tr("'","")

        return t
      end
    end

I tried putting it below my aaf declaration in my model file, but I just get "NameError: uninitialized constant Ferret::Analysis::MyAnalyzer" when trying to do Model.rebuild_index.

Thanks.
I'd just put this into lib/. If you call the file my_analyzer.rb, it should be found and loaded by Rails automatically when you use the class.

If not, require it explicitly in environment.rb.

Jens

On Tue, Jun 26, 2007 at 04:25:27PM +0200, Chris Brickley wrote:
> Ok thanks for that link. However, I am a bit lost as to where I would
> put my analyzer code? In my model itself or somewhere else?
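As a rough illustration of the lib/ plus explicit require approach, the sketch below imitates it outside Rails: it creates a throwaway lib/ directory, writes a placeholder my_analyzer.rb into it, puts that directory on Ruby's load path, and requires the file by its bare name. The temporary directory and the placeholder class body are assumptions made for the demo; in a real application the file would hold the actual analyzer and the `require 'my_analyzer'` line would sit in environment.rb.

```ruby
require 'tmpdir'

# Simulate an application's lib/ directory in a temporary folder.
# Assumption: lib/ is on the load path, which is what lets a bare
# `require 'my_analyzer'` find the file.
Dir.mktmpdir do |app_root|
  lib_dir = File.join(app_root, 'lib')
  Dir.mkdir(lib_dir)

  # lib/my_analyzer.rb: a placeholder class standing in for the real analyzer.
  File.write(File.join(lib_dir, 'my_analyzer.rb'), <<~RUBY)
    class MyAnalyzer
      def name
        'my_analyzer'
      end
    end
  RUBY

  $LOAD_PATH.unshift(lib_dir)

  # The line that would go into environment.rb.
  require 'my_analyzer'

  # Clean up the load path before the temporary directory is removed.
  $LOAD_PATH.delete(lib_dir)
end

puts MyAnalyzer.new.name
```

Once required, the class stays defined for the rest of the process, so it can be referenced from the model's acts_as_ferret declaration under the name it was actually defined with.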
Jens Kraemer wrote:
> I'd just put this into lib/, if you call the file my_analyzer.rb it
> should be found and loaded by Rails automatically when you use the
> class.
>
> if not, require it explicitly in environment.rb.

Awesome! Thanks Jens :)

Adding the require to environment.rb did the trick (as well as putting it in the lib dir).

Thanks for all your help!