I'm using a custom stem analyser in my searches and my indexing. The
analyser is defined thus:
module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      text.downcase!
      RAILS_DEFAULT_LOGGER.debug "SEARCHING, field = #{field.inspect}, text = #{text.inspect}"
      tokenizer = StandardTokenizer.new(text)
      filter = StemFilter.new(tokenizer)
      filter
    end
  end
end
I use it in my indexing like this:
acts_as_ferret({ :store_class_name => true,
                 :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
                 :fields => { :property_names => { :boost => 3.0 },
                              ....etc
                            }})
And in a search like this:
search_class.find_ids_with_ferret(search_term, {:limit => 10000,
    :analyzer => Ferret::Analysis::StemmingAnalyzer.new}) do |model, r_id, score|
  r_id = r_id.to_i
  ferret_ids << r_id
  self.scores_hash[r_id] = score
end
I have a problem with case sensitivity: searches only match when the search
term is lowercase, even though the text stored in the index looks like it is
capitalised. From the console:
>> resource.to_doc
=> {:resource_id=>"59", :property_names=>"Bb Clarinet
Clarinet Family
Woodwind Instrumental and Vocal Image Resources Types"
}
>> TeachingObject.find_with_ferret("Vocal", :page => 1, :per_page => 1000).include?(resource)
=> false
>> TeachingObject.find_with_ferret("vocal", :page => 1, :per_page => 1000).include?(resource)
=> true
I think I have my stemming set up wrong; I'm not sure it is even being used.
I implemented it so that searches would match both plural and singular
terms, and that seems to work, e.g.
>> TeachingObject.find_with_ferret("vocals", :page => 1, :per_page => 1000).include?(resource)
=> true
But the case sensitivity thing has me stumped. I thought that the downcase!
call on the search term would make case irrelevant for searching but that
seems not to be the case. Can anyone set me straight?
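To make the behaviour I expected concrete, here is a minimal plain-Ruby sketch. It is a toy model and does not use Ferret at all: toy_tokens, toy_index and toy_search are hypothetical names made up for illustration, and the real analyser pipeline may well behave differently. It only shows that the symptom above (lowercase matches, capitalised does not) is what you would see if the indexed terms were lowercased but the query terms were not.

```ruby
# Toy model only: none of these methods are Ferret API.

# Split text into word tokens, optionally lowercasing each one.
def toy_tokens(text, lowercase:)
  words = text.scan(/[A-Za-z]+/)
  lowercase ? words.map(&:downcase) : words
end

# Build a term => [document ids] hash from {id => text}.
def toy_index(docs, lowercase:)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |id, text|
    toy_tokens(text, lowercase: lowercase).each { |term| index[term] << id }
  end
  index
end

# Look up every query token and return the matching document ids.
def toy_search(index, query, lowercase:)
  toy_tokens(query, lowercase: lowercase)
    .flat_map { |term| index.fetch(term, []) }
    .uniq
end

DOCS  = { 59 => "Bb Clarinet Woodwind Instrumental and Vocal Image Resources" }
INDEX = toy_index(DOCS, lowercase: true)

toy_search(INDEX, "vocal", lowercase: true)   # matches document 59
toy_search(INDEX, "Vocal", lowercase: true)   # matches document 59
toy_search(INDEX, "Vocal", lowercase: false)  # matches nothing
```

In this toy, matching is only case-insensitive when the same lowercasing step runs at both index time and query time, which is what I assumed the downcase! call would give me.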