    require 'rubygems'
    require 'ferret'
    include Ferret

    PATH = '/tmp/ferret_stopwords_test'

    index = Index::IndexWriter.new(:path => PATH, :create => true)

    index.analyzer = Analysis::StandardAnalyzer.new([])
    index << {:title => 'a few good men', :language => 'en'}

    index.analyzer = Analysis::StandardAnalyzer.new(['men'])
    index << {:title => 'a few good men', :language => 'nl'}

    index.close

    searcher = Index::Index.new(:path => PATH)
    puts searcher.search('*:men AND language:nl').total_hits
    #=> 1

I'd expect zero results, as 'men' is a stop word at the time of indexing with language:nl. Is this a bug or a lack of understanding on my part?

A workaround would be to close and reopen the index after every language; that returns the expected zero. I don't know how much overhead that would add.

I am on Ruby 1.8.5 / OS X.

Any assistance would be greatly appreciated, since I have no clue why this happens.

Cheers,
Phillip

-- 
Posted via http://www.ruby-forum.com/.
* Addendum 1: I use Ferret 0.11.4.

* Addendum 2: When I comment out the first index.analyzer assignment, I get:

    /Users/phillip/Sites/ruby/playground/ferret_stopwords.rb:13: [BUG] Bus Error
    ruby 1.8.5 (2006-12-25) [i686-darwin8.8.2]

* Addendum 3: The underlying problem is that I have many different languages that have to be indexed correctly. Is there a best practice for doing that? I mean, something better than having one index and switching the analyzer around?

Thanks again,
Phillip
On Wed, May 09, 2007 at 11:59:59PM +0200, Phillip Oertel wrote:
> require 'rubygems'
> require 'ferret'
> include Ferret
>
> PATH = '/tmp/ferret_stopwords_test'
>
> index = Index::IndexWriter.new(:path => PATH, :create => true)
>
> index.analyzer = Analysis::StandardAnalyzer.new([])
> index << {:title => 'a few good men', :language => 'en'}
>
> index.analyzer = Analysis::StandardAnalyzer.new(['men'])
> index << {:title => 'a few good men', :language => 'nl'}
>
> index.close
>
> searcher = Index::Index.new(:path => PATH)
> puts searcher.search('*:men AND language:nl').total_hits
> #=> 1
>
> i'd expect zero results, as 'men' is a stopword at the time of indexing
> with language:nl. is this a bug or a lack of understanding on my part.

Queries get analyzed, too, i.e. to remove stop words from them. So you'll have to use the correct language-dependent Analyzer for your searcher, too.

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
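
To make the point about query-time analysis concrete, here is a minimal pure-Ruby sketch. It does not use Ferret at all: ToyAnalyzer and ToyIndex are hypothetical stand-ins written only for this illustration, and they simplify heavily (no fields, no boolean query syntax). The sketch only shows that the same stop-word filter applied at indexing time is also applied to the query, so the analyzer chosen for searching changes which terms are actually looked up.

```ruby
require 'set'

# Toy analyzer: lowercases, splits on non-word characters, drops stop words.
# NOT Ferret's StandardAnalyzer, only a stand-in for illustration.
class ToyAnalyzer
  def initialize(stop_words = [])
    @stop_words = Set.new(stop_words.map { |w| w.downcase })
  end

  def tokens(text)
    text.downcase.scan(/\w+/).reject { |t| @stop_words.include?(t) }
  end
end

# Toy inverted index: term => list of document ids.
class ToyIndex
  def initialize
    @postings = Hash.new { |h, k| h[k] = [] }
    @next_id = 0
  end

  # Index the text using the analyzer chosen for this document.
  def add(text, analyzer)
    id = @next_id
    @next_id += 1
    analyzer.tokens(text).each do |t|
      @postings[t] << id unless @postings[t].include?(id)
    end
    id
  end

  # Analyze the query with the given analyzer, then return the ids of
  # documents containing every remaining term. Returns [] when the
  # analyzer removes all query terms.
  def search(query, analyzer)
    terms = analyzer.tokens(query)
    return [] if terms.empty?
    terms.map { |t| @postings.fetch(t, []) }.inject { |a, b| a & b }
  end
end

en = ToyAnalyzer.new([])       # no stop words
nl = ToyAnalyzer.new(['men'])  # 'men' is a stop word

index = ToyIndex.new
index.add('a few good men', en)  # document 0, 'men' is indexed
index.add('a few good men', nl)  # document 1, 'men' is dropped

p index.search('men', en)  # => [0]
p index.search('men', nl)  # => []
```

In Ferret itself the analogous step would presumably be to pass the language-specific analyzer when opening the index for searching (Index::Index.new accepts an :analyzer option), but that mapping is an assumption here and is not verified against 0.11.4.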