Stuart Rackham
2006-Sep-09 00:52 UTC
[Ferret-talk] search_each segmentation fault and parser anomaly
The included test script turned up the following anomalies (run against Ferret 0.10.3; 0.10.2 shows the same problems):

1. When the content word is not in the index, including a wildcard file term causes search_each to throw a segmentation fault.

    $ ./test.rb zzz file:*.txt
    query: +content:zzz +file:*.txt
    ./test.rb:28: [BUG] Segmentation fault
    ruby 1.8.4 (2005-12-24) [i486-linux]

    Aborted

2. When the file query term is the file:* wildcard, the parser translates it to +* instead of +file:*

    $ ./test.rb one file:*
    query: +content:one +*
    file: f1.txt

Am I missing something here?

Cheers
Stuart
--
Stuart Rackham

-----------BEGIN SCRIPT----------------
#!/usr/bin/env ruby

require 'rubygems'
require 'ferret'
include Ferret

path = '/tmp/test_index'

index = Index::IndexWriter.new(:create => true, :path => path)
index.field_infos.add_field(:file, :store => :yes, :index => :untokenized)
index.field_infos.add_field(:content, :store => :no, :index => :yes)
index << {:content => 'one', :file => 'f1.txt'}
index << {:content => 'two', :file => 'f2.txt'}
index << {:content => 'three', :file => 'f3.txt'}
index << {:content => 'four', :file => 'f4.txt'}
index << {:content => 'five', :file => 'f5.txt'}
index.optimize
index.close

query_parser = QueryParser.new({:default_field => :content,
                                :or_default => false,
                               })
query = query_parser.parse(ARGV.join(' '))
puts "query: #{query}"
searcher = Search::Searcher.new(path)
searcher.search_each(query) do |doc, score|
  puts "file: #{searcher[doc][:file]}"
end
-------------END SCRIPT----------------

--
Posted via http://www.ruby-forum.com/.
David Balmain
2006-Sep-09 02:59 UTC
[Ferret-talk] search_each segmentation fault and parser anomaly
On 9/9/06, Stuart Rackham <srackham at methods.co.nz> wrote:
> The included test script turned up the following anomalies (run
> against Ferret 0.10.3, but had same problems with 0.10.2):
>
> 1. When the content word is not in the index the inclusion of a
> wildcard file term causes search_each to throw a segmentation
> fault.
>
> $ ./test.rb zzz file:*.txt
> query: +content:zzz +file:*.txt
> ./test.rb:28: [BUG] Segmentation fault
> ruby 1.8.4 (2005-12-24) [i486-linux]
>
> Aborted

Thanks Stuart. This is fixed in subversion. I'll put another gem out ASAP.

> 2. When the file query term is file:* wildcard the parser
> translates it to +* instead of +file:*
>
> $ ./test.rb one file:*
> query: +content:one +*
> file: f1.txt
>
> Am I missing something here?

"*" matches everything, including empty strings. So basically it will match documents that don't even contain the :file field. I've therefore optimized it to a MatchAllQuery. Before I did this, "*" was pretty much unusable in large indexes, since it would create a massive MultiTermQuery with every term in the index (as long as you set the :max_clauses parameter of QueryParser large enough to accept them all).

Now, if you do need to match only documents that contain the desired field, you can do it like this:

    $ ./test.rb one file:?*

Hope that makes sense.

Dave
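
To make the difference between "*" and "?*" concrete, here is a minimal plain-Ruby sketch of the wildcard semantics described above. It does not use Ferret and is not how Ferret implements wildcard queries; the helpers wildcard_to_regexp and wildcard_match? are hypothetical names made up for this illustration, and a missing :file field is modelled (as an assumption) as an empty string.

```ruby
# Illustration only: a plain-Ruby model of the wildcard semantics,
# not Ferret's implementation.

# Translate a wildcard pattern into an anchored Regexp:
#   "*" stands for any run of characters, including none
#   "?" stands for exactly one character
def wildcard_to_regexp(pattern)
  body = pattern.each_char.map do |ch|
    case ch
    when '*' then '.*'
    when '?' then '.'
    else Regexp.escape(ch)
    end
  end.join
  Regexp.new("\\A#{body}\\z", Regexp::MULTILINE)
end

# Assumption: a document without the field is treated as having
# an empty value for it (nil.to_s is "").
def wildcard_match?(pattern, value)
  wildcard_to_regexp(pattern).match?(value.to_s)
end

docs = [
  { :content => 'one', :file => 'f1.txt' },
  { :content => 'two' } # no :file field at all
]

['*', '?*', '*.txt'].each do |pattern|
  hits = docs.select { |doc| wildcard_match?(pattern, doc[:file]) }
  puts "file:#{pattern} matches #{hits.length} of #{docs.length} documents"
end
```

In this model "*" accepts the document with no :file value, which is why treating it as a match-all is equivalent, while "?*" requires at least one character and so only accepts documents where the field is actually present.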