Hello all,

Quick question: I'm using AAF and the following custom analyzer:

  class StemmedAnalyzer < Ferret::Analysis::Analyzer
    include Ferret::Analysis

    def initialize(stop_words = ENGLISH_STOP_WORDS)
      @stop_words = stop_words
    end

    # Tokenize, lowercase, drop stop words, then stem what is left.
    def token_stream(field, str)
      StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words))
    end
  end

However, when my search term includes a stop word I never get any results back. Once I remove the stop word I get the normal results back. Do I need to search my query for stop words and remove them myself? Or is there something I'm doing wrong with passing my query to AAF?

Thanks,
Ray

--
Posted via http://www.ruby-forum.com/.
Depends on how you produced your query. In general, your query has to pass through the same analyzer that was used for indexing.

So, when building a PhraseQuery, for instance, you have to get each word from the analyzer:

  analyzer = StemmedAnalyzer.new   # same analyzer class as used for indexing
  queries = keywords.map do |keyword|
    query = Search::PhraseQuery.new(:fieldname)
    token_stream = analyzer.token_stream(:fieldname, keyword)
    # Add only the terms that survive analysis (stop words are dropped here).
    while (token = token_stream.next)
      query << token.text
    end
    query
  end

This is how I do it; it would be nicer if AAF encapsulated this.

Regards,
Ewout
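To illustrate why the stop word matters, here is a minimal pure-Ruby sketch that does not use Ferret at all. `toy_analyze`, `TOY_STOP_WORDS`, and the crude trailing-"s" stripping are hypothetical stand-ins for the StemmedAnalyzer chain from the original question, not Ferret's behaviour. The only point is that a term removed at index time can never match unless the query goes through the same steps.

```ruby
require 'set'

# Hypothetical stand-in for ENGLISH_STOP_WORDS; not Ferret's actual list.
TOY_STOP_WORDS = Set.new(%w[a an and of the to])

# Toy analyzer: lowercase, split on non-letters, drop stop words,
# then strip a trailing "s" as a crude substitute for real stemming.
def toy_analyze(text)
  text.downcase.scan(/[a-z]+/)
      .reject { |word| TOY_STOP_WORDS.include?(word) }
      .map { |word| word.sub(/s\z/, '') }
end

# A query matches only if every one of its terms is in the index.
def all_terms_match?(query_terms, indexed_terms)
  query_terms.all? { |term| indexed_terms.include?(term) }
end

# The "index" is just the set of analyzed terms of one document.
indexed_terms = toy_analyze('The Lord of the Rings').to_set

raw_query      = 'the rings'.split          # query terms used as typed
analyzed_query = toy_analyze('the rings')   # same analyzer as indexing

puts all_terms_match?(raw_query, indexed_terms)       # prints false
puts all_terms_match?(analyzed_query, indexed_terms)  # prints true
```

The raw query fails because "the" was never indexed (and "rings" was indexed as "ring"); the analyzed query reduces to the single term "ring" and matches.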
On Fri, Jan 12, 2007 at 12:07:07AM +0100, Raymond O'connor wrote:
> However when my search term includes a stop word I never get any results
> back. Once I remove the stop word I get the normal results back. Do I
> need to do a search of my query for stop words and remove them myself?
> Or is there something I'm doing wrong with passing my query to AAF?

Which version of AAF are you using, and what does your call to acts_as_ferret look like?

cheers,
Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
On Fri, Jan 12, 2007 at 01:05:14AM +0100, Ewout wrote:
> Depends on how you produced your query. In general, your query has to
> pass through the same analyzer that was used for indexing.
> [...]
> This is how I do it, it would be nicer if AAF would encapsulate this.

It should do this; if it doesn't, I'd consider that a bug. There have been problems with stop words in the past, but these should finally be sorted out in current trunk.

Jens
> it should do this, if it doesn't, I'd consider this a bug. There have
> been problems with stop words in the past, but these should finally be
> sorted out in current trunk.

I don't see this solved in the trunk @ <http://projects.jkraemer.net/acts_as_ferret/browser/trunk>.

In single_index_find_by_contents and find_by_contents, the ferret query should be taken apart, and be analyzed using the analyzer given by the user in the acts_as_ferret call. Right?

Ewout
On Fri, Jan 12, 2007 at 01:15:29PM +0100, Ewout wrote:
> I don't see this solved in the trunk @ <http://projects.jkraemer.net/acts_as_ferret/browser/trunk>.
>
> In single_index_find_by_contents and find_by_contents, the ferret query
> should be taken apart, and be analyzed using the analyzer given by the
> user in the acts_as_ferret call.

No, this is done by the Ferret Index instance that AAF uses internally.

Jens
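The division of labour Jens describes, where the index object owns the analyzer and applies it to both documents and queries, can be pictured with a toy class. This is only a sketch of the idea: `ToyIndex`, its `add` and `search` methods, and `TOY_INDEX_STOP_WORDS` are hypothetical names and do not reflect Ferret's or AAF's actual implementation.

```ruby
# Hypothetical toy index; illustrates the idea only, not Ferret's internals.
class ToyIndex
  # analyzer: any object responding to call(text) and returning an array of terms
  def initialize(analyzer)
    @analyzer = analyzer
    @postings = Hash.new { |hash, term| hash[term] = [] }  # term => document ids
    @doc_count = 0
  end

  # Analyze the text and record which document each term occurs in.
  def add(text)
    doc_id = @doc_count
    @doc_count += 1
    @analyzer.call(text).uniq.each { |term| @postings[term] << doc_id }
    doc_id
  end

  # Analyze the query with the same analyzer, then AND the terms together.
  def search(query)
    terms = @analyzer.call(query)
    return [] if terms.empty?
    terms.map { |term| @postings.fetch(term, []) }.reduce(:&)
  end
end

# Assumed stop word list for the example only.
TOY_INDEX_STOP_WORDS = %w[a an and of the to].freeze

analyzer = lambda do |text|
  text.downcase.scan(/[a-z]+/).reject { |word| TOY_INDEX_STOP_WORDS.include?(word) }
end

index = ToyIndex.new(analyzer)
index.add('The quick brown fox')   # document 0
index.add('A lazy dog')            # document 1

p index.search('the fox')    # prints [0]
p index.search('lazy fox')   # prints []
```

Because the caller never touches the analyzer directly, a query such as 'the fox' loses its stop word inside `search` and still finds document 0. If a real setup behaves differently, the analyzer used for searching is the first thing worth checking.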