roop
2006-Apr-13 09:34 UTC
[Ferret-talk] QueryParser doesn't use StandardAnalyzer correctly?
I am having a problem with my search queries not being parsed correctly, and I wonder if anyone else has experienced this. I have written an index using StandardAnalyzer for analysis. I want to search that index by passing my user query through a QueryParser instance which is also using a StandardAnalyzer. However, the resulting query does not seem to be a valid term query, and therefore the search produces no hits.

Specifically, I have a bunch of docs with the phrase "museum of art" in the source text. A query of 'museum art' gets parsed into '+contents:museum +contents:art', which works just fine and produces hits. A query of 'museum of art' gets parsed into '+contents:museum +contents: +contents:art', which produces no hits. The resulting query seems to be malformed: it contains an extraneous empty term for a stop word that was (correctly) filtered out.

Using the Luke GUI tool for Lucene, I have verified that passing my query through StandardAnalyzer should indeed work, as it produces the expected term query and the expected hits in that environment. But as for the same query in Ferret, I'm at a loss.

This should be easily reproducible with the following code fragment:

    require 'ferret'

    parser = Ferret::QueryParser.new('contents',
      :analyzer => Ferret::Analysis::StandardAnalyzer.new,
      :occur_default => Ferret::Search::BooleanClause::Occur::MUST)

    q1 = parser.parse('museum art')
    q2 = parser.parse('museum of art')
    puts q1, q2

Thanks for any insight.

-Roop

-- Posted via http://www.ruby-forum.com/.
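For readers following along, the expected behaviour described above can be sketched in plain Ruby without Ferret. This is only a toy model, not Ferret's parser: the `STOP_WORDS` list and the `analyze` and `to_query` helpers are hypothetical names made up for illustration. The point it shows is that a token dropped during analysis should contribute no clause at all, rather than an empty `+contents:` clause.

```ruby
# Hypothetical stop list for illustration; Ferret's StandardAnalyzer has its own.
STOP_WORDS = %w[a an and of the].freeze

# Stand-in for analysis: lowercase, split on whitespace, drop stop words.
def analyze(text)
  text.downcase.split.reject { |token| STOP_WORDS.include?(token) }
end

# Build one required clause per token that survives analysis.
def to_query(field, text)
  analyze(text).map { |term| "+#{field}:#{term}" }.join(' ')
end

puts to_query('contents', 'museum art')
puts to_query('contents', 'museum of art')
```

Under these assumptions both calls yield the same two-clause query, which is the result Luke reports for the real analyzer.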
Nathaniel Talbott
2006-Apr-14 13:33 UTC
[Ferret-talk] QueryParser doesn't use StandardAnalyzer correctly?
roop wrote:
> I am having a bit of a problem with my search queries being parsed
> correctly it seems, and I wonder if anyone else has experienced this.

See this recent thread: "Stop words in queries" (http://www.ruby-forum.com/topic/60599).

HTH,
Nathaniel
roop
2006-Apr-14 20:29 UTC
[Ferret-talk] QueryParser doesn't use StandardAnalyzer correctly?
Nathaniel, thanks for the info. I will await the bug fix. In the meantime my own workaround looks like this. In my QueryParser subclass, I override parse() so that it filters out stopwords first:

    class SafeQueryParser < Ferret::QueryParser

      def initialize(default_field, options)
        my_options = { :analyzer => Ferret::Analysis::StandardAnalyzer.new }.update(options)
        super(default_field, my_options)
        # breaking encapsulation here, but whaddya gonna do...
        @stop_words = my_options[:analyzer].instance_variable_get(:@stop_words)
      end

      def parse(query)
        @stop_words.each do |word|
          query.gsub!(/\b#{word}\b\s*/, '')
        end
        super(query)
      end

    end
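The stripping step in the workaround above can also be tried on its own, outside Ferret. The following is a minimal sketch under a few assumptions: `STOP_WORDS` and `strip_stop_words` are hypothetical names, and the stop list is made up rather than taken from StandardAnalyzer. It differs from the posted code in three ways: it returns a new string instead of modifying the caller's query with `gsub!`, it escapes each word with `Regexp.escape`, and it matches case-insensitively.

```ruby
# Hypothetical stop list for illustration; not StandardAnalyzer's actual list.
STOP_WORDS = %w[a an and of the].freeze

# Return a copy of the query with whole-word stop words removed.
# The caller's string is left untouched.
def strip_stop_words(query, stop_words = STOP_WORDS)
  return query.dup if stop_words.empty?

  # One alternation of escaped words, anchored on word boundaries.
  alternation = stop_words.map { |word| Regexp.escape(word) }.join('|')
  pattern = /\b(?:#{alternation})\b\s*/i
  query.gsub(pattern, '').strip
end

puts strip_stop_words('museum of art')
```

Like the original workaround, this works on the raw query text, so it would also remove stop words inside quoted phrases or anything else in the query syntax that happens to match a stop word.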
David Balmain
2006-Apr-18 04:46 UTC
[Ferret-talk] QueryParser doesn't use StandardAnalyzer correctly?
Hey guys,

Since this was a pretty easy fix, I've fixed it in the pure Ruby version. You'll have to get it from the Subversion repo if you want it.

Cheers,
Dave