require 'rubygems'
require 'ferret'
include Ferret

PATH = '/tmp/ferret_stopwords_test'

index = Index::IndexWriter.new(:path => PATH, :create => true)

index.analyzer = Analysis::StandardAnalyzer.new([])
index << {:title => 'a few good men', :language => 'en'}

index.analyzer = Analysis::StandardAnalyzer.new(['men'])
index << {:title => 'a few good men', :language => 'nl'}

index.close

searcher = Index::Index.new(:path => PATH)
puts searcher.search('*:men AND language:nl').total_hits
#=> 1

i'd expect zero results, as 'men' is a stopword at the time of indexing
the document with language:nl. is this a bug or a lack of understanding
on my part?

a workaround would be to close and reopen the index after every
language; that returns the expected zero. i don't know how much
overhead that would be.

i am on ruby 1.8.5 / os x.

any assistance would be greatly appreciated since i have no clue why
this happens ...

cheers,
phillip
--
Posted via http://www.ruby-forum.com/.

* addendum 1: i use ferret 0.11.4

* addendum 2: when i comment out the first index.analyzer assignment, i
get:

/Users/phillip/Sites/ruby/playground/ferret_stopwords.rb:13: [BUG] Bus Error
ruby 1.8.5 (2006-12-25) [i686-darwin8.8.2]

* addendum 3: the underlying problem i have is that i have many
different languages that have to be correctly indexed. is there a best
practice for doing that? i mean, something better than having one index
and switching the analyzer around?

thanks again,
phillip

On Wed, May 09, 2007 at 11:59:59PM +0200, Phillip Oertel wrote:
> require 'rubygems'
> require 'ferret'
> include Ferret
>
> PATH = '/tmp/ferret_stopwords_test'
>
> index = Index::IndexWriter.new(:path => PATH, :create => true)
>
> index.analyzer = Analysis::StandardAnalyzer.new([])
> index << {:title => 'a few good men', :language => 'en'}
>
> index.analyzer = Analysis::StandardAnalyzer.new(['men'])
> index << {:title => 'a few good men', :language => 'nl'}
>
> index.close
>
> searcher = Index::Index.new(:path => PATH)
> puts searcher.search('*:men AND language:nl').total_hits
> #=> 1
>
> i'd expect zero results, as 'men' is a stopword at the time of indexing
> with language:nl. is this a bug or a lack of understanding on my part.

Queries get analyzed, too, i.e. to remove stop words from them. So
you'll have to use the correct language-dependent Analyzer for your
searcher, too.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
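
To make the point about query analysis concrete, here is a minimal sketch in plain Ruby. It is a toy model only, not Ferret's API or internals: ToyAnalyzer, ToyIndex and ANALYZERS are hypothetical names invented for this illustration, and the sketch does not try to explain the hit count of 1 reported above. It assumes one analyzer per language, selected by the document's language when indexing and by the caller when searching.

```ruby
require 'set'

# Hypothetical toy analyzer (not Ferret): lowercases the text, splits it
# into word tokens and drops the configured stop words.
class ToyAnalyzer
  def initialize(stop_words = [])
    @stop_words = Set.new(stop_words)
  end

  def tokens(text)
    text.downcase.scan(/\w+/).reject { |t| @stop_words.include?(t) }
  end
end

# Hypothetical toy index (not Ferret): maps [field, term] pairs to the
# set of document ids that contain the term in that field.
class ToyIndex
  def initialize
    @postings = Hash.new { |h, k| h[k] = Set.new }
    @next_id = 0
  end

  # Index time: every field is run through the analyzer passed in here.
  def add(doc, analyzer)
    id = @next_id
    @next_id += 1
    doc.each do |field, text|
      analyzer.tokens(text).each { |term| @postings[[field, term]] << id }
    end
    id
  end

  # Query time: the query text is analyzed as well, then all remaining
  # terms must match (AND). A query that consists only of stop words
  # matches nothing in this sketch.
  def search(field, query_text, analyzer)
    terms = analyzer.tokens(query_text)
    return Set.new if terms.empty?
    terms.map { |t| @postings.fetch([field, t], Set.new) }.reduce(:&)
  end
end

# Assumption: one analyzer per language, keyed by language code.
ANALYZERS = {
  'en' => ToyAnalyzer.new([]),
  'nl' => ToyAnalyzer.new(['men'])
}

index = ToyIndex.new
index.add({:title => 'a few good men'}, ANALYZERS['en'])
index.add({:title => 'a few good men'}, ANALYZERS['nl'])

puts index.search(:title, 'men', ANALYZERS['en']).size
#=> 1
puts index.search(:title, 'men', ANALYZERS['nl']).size
#=> 0
```

In this model the same query string gives different results depending on which analyzer processes it, which is why the analyzer on the search side has to match the one used for that language at index time.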