Stuart Rackham
2006-Sep-09 00:52 UTC
[Ferret-talk] search_each segmentation fault and parser anomaly
The included test script turned up the following anomalies (run
against Ferret 0.10.3; the same problems occur with 0.10.2):
1. When the content word is not in the index, including a
wildcard file term causes search_each to throw a segmentation
fault.
$ ./test.rb zzz file:*.txt
query: +content:zzz +file:*.txt
./test.rb:28: [BUG] Segmentation fault
ruby 1.8.4 (2005-12-24) [i486-linux]
Aborted
2. When the file query term is the wildcard file:*, the parser
translates it to +* instead of +file:*
$ ./test.rb one file:*
query: +content:one +*
file: f1.txt
Am I missing something here?
Cheers Stuart
--
Stuart Rackham
-----------BEGIN SCRIPT----------------
#!/usr/bin/env ruby
require 'rubygems'
require 'ferret'
include Ferret

path = '/tmp/test_index'

# Build a five-document index: a stored, untokenized :file field
# and an indexed, unstored :content field.
index = Index::IndexWriter.new(:create => true, :path => path)
index.field_infos.add_field(:file, :store => :yes, :index => :untokenized)
index.field_infos.add_field(:content, :store => :no, :index => :yes)
index << {:content => 'one', :file => 'f1.txt'}
index << {:content => 'two', :file => 'f2.txt'}
index << {:content => 'three', :file => 'f3.txt'}
index << {:content => 'four', :file => 'f4.txt'}
index << {:content => 'five', :file => 'f5.txt'}
index.optimize
index.close

# Parse the command-line arguments as one query; terms are ANDed.
query_parser = QueryParser.new(:default_field => :content,
                               :or_default => false)
query = query_parser.parse(ARGV.join(' '))
puts "query: #{query}"

searcher = Search::Searcher.new(path)
searcher.search_each(query) do |doc, score|
  puts "file: #{searcher[doc][:file]}"
end
-------------END SCRIPT----------------
David Balmain
2006-Sep-09 02:59 UTC
[Ferret-talk] search_each segmentation fault and parser anomaly
On 9/9/06, Stuart Rackham <srackham at methods.co.nz> wrote:
> The included test script turned up the following anomolies (run
> against Ferret 0.10.3, but had same problems with 0.10.2):
>
> 1. When the content word is not in the index the inclusion of a
> wildcard file term causes search_each to throw a segmentation
> fault.
>
> $ ./test.rb zzz file:*.txt
> query: +content:zzz +file:*.txt
> ./test.rb:28: [BUG] Segmentation fault
> ruby 1.8.4 (2005-12-24) [i486-linux]
>
> Aborted

Thanks Stuart. This is fixed in subversion. I'll put another gem out ASAP.

> 2. When the file query term is file:* wildcard the parser
> translates it to +* instead of +file:*
>
> $ ./test.rb one file:*
> query: +content:one +*
> file: f1.txt
>
> Am I missing something here?

"*" matches everything including empty strings. So basically it will match documents that don't even contain the :file field. I've therefore optimized it to a MatchAllQuery. Before doing this "*" was pretty much unusable in large indexes since it would create a massive MultiTermQuery with every term in the index (as long as you set the :max_clauses parameter of QueryParser large enough to accept them all). Now, if you do need to only match documents that contain the desired field you can do it like this;

$ ./test.rb one file:?*

Hope that makes sense.

Dave
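
To illustrate the difference between "*" and "?*" described above, here is a minimal plain-Ruby sketch. It does not use Ferret and is not how Ferret implements wildcard queries; the helpers wildcard_to_regexp and field_matches? are hypothetical names made up for this example, and treating a missing field as an empty string is an assumption chosen only to model the "matches everything including empty strings" behaviour.

```ruby
# Hypothetical helper (not part of Ferret): translate a wildcard pattern
# into an anchored Regexp. "*" means any run of characters, including
# none; "?" means exactly one character.
def wildcard_to_regexp(pattern)
  body = pattern.each_char.map do |ch|
    case ch
    when '*' then '.*'
    when '?' then '.'
    else Regexp.escape(ch)
    end
  end.join
  Regexp.new("\\A#{body}\\z", Regexp::MULTILINE)
end

# Assumption for illustration: a document without the field is treated
# as if the field held an empty string.
def field_matches?(doc, field, pattern)
  value = doc.fetch(field, '').to_s
  !(wildcard_to_regexp(pattern) =~ value).nil?
end

docs = [
  { :content => 'one', :file => 'f1.txt' },
  { :content => 'one' } # no :file field at all
]

['*', '?*', '*.txt'].each do |pattern|
  docs.each do |doc|
    puts "file:#{pattern} vs #{doc.inspect}: #{field_matches?(doc, :file, pattern)}"
  end
end
```

In this model "*" accepts both documents, while "?*" needs at least one character and so only accepts the document that actually has a :file value.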