Hi!

I thought I understood Ferret's query scoring and how to tweak results using boost values. What I currently experience, however, leaves me completely baffled.

Perhaps someone can shed some light on the scoring algorithm, because asking Ferret to "explain" the score for a particular document isn't as informative as I thought. Actually, it confuses me even more.

Here's what I've got:

I'm indexing locations (addresses) in Ferret using the following fields:

street, zipcode, district, city, county, state, country_code

Addresses are stored at different precisions, i.e. not all of the fields contain values, depending on the location's accuracy. Here are two examples:

1. Berlin, Germany:

   country_code: de
   city: Berlin

2. The district 'Berlin' in a town called 'Seedorf':

   country_code: de
   city: Seedorf
   district: Berlin

When querying for "berlin, de", document #2 is ranked higher (probably due to its natural position in the index). Since I want the less accurate locations to rank higher, I added boost values. In the example above, assume that city has a boost of 8 and district has a boost of 7.

With this little adjustment the first document should rank higher, since the term 'berlin' appears in the city field. As you might suspect, this is not what happens. And I consider this a bug.

Then I went and set the document boost to 8 for countries and 1 for streets. This doesn't help either.

The ranking of other results changes slightly, but nothing seems to be consistent with the boost settings. Perhaps the boost settings and the results are related in some way. But it's definitely not a logical relation.

I'm thankful for any hint on how to achieve a proper ranking.

Thanks!
Andy
Hi!

I tried to reproduce this; however, changing the sort order by modifying the boosts works perfectly for me:

require 'rubygems'
require 'ferret'

include Ferret

# Field-level boosts: a match in :city should count for more than one in :district
fi = Index::FieldInfos.new
fi.add_field :country_code
fi.add_field :city, :boost => 8
fi.add_field :district, :boost => 7
i = Ferret::I.new :field_infos => fi

# The two example documents from the original question
i << { :country_code => 'de', :city => 'Berlin' }
i << { :country_code => 'de', :city => 'Seedorf', :district => 'Berlin' }

# Print every hit together with its score
i.search_each 'berlin, de' do |hit, score|
  puts "#{i[hit][:country_code]} #{i[hit][:district]} #{i[hit][:city]} Score: #{score}"
end

This outputs:

de Berlin Score: 0.841327428817749
de Berlin Seedorf Score: 0.740611553192139

Swapping the boost values (city: 7, district: 8) also changes the result sorting.

Any more info on other circumstances that might cause your problems?

Jens

On Wed, Jul 11, 2007 at 02:24:33PM +0200, Andreas Korth wrote:
> When querying for "berlin, de", document #2 is ranked higher
> (probably due to its natural position in the index). Since I want the
> less accurate locations to rank higher, I added boost values. In the
> example above, assume that city has a boost of 8 and district has a
> boost of 7.
>
> With this little adjustment the first document should rank higher
> since the term 'berlin' appears in the city field. As you might
> suspect, this is not what happens. And I consider this a bug.
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
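As a side note on why the field boosts in the test case above decide the order: the following is a deliberately simplified sketch in plain Ruby, without Ferret. It assumes that every matching field simply contributes its boost to the score. That is not Ferret's scoring formula (Ferret is a port of Lucene and weighs further factors), and the names BOOSTS, tokens, toy_score and rank are made up for this illustration.

```ruby
# Toy model only: each query term found in a field adds that field's boost.
# This is NOT Ferret's scoring formula; all names here are hypothetical.
BOOSTS = { country_code: 1.0, city: 8.0, district: 7.0 }.freeze

DOCS = [
  { country_code: 'de', city: 'Berlin' },
  { country_code: 'de', city: 'Seedorf', district: 'Berlin' }
].freeze

# Lowercase and split on non-word characters, for documents and queries alike.
def tokens(text)
  text.downcase.scan(/\w+/)
end

# Sum the boost of every field that contains a query term.
def toy_score(doc, query)
  terms = tokens(query)
  doc.sum do |field, value|
    hits = (tokens(value) & terms).size
    hits * BOOSTS.fetch(field, 1.0)
  end
end

# Pair each document with its toy score, best score first.
def rank(docs, query)
  docs.map { |doc| [doc, toy_score(doc, query)] }
      .sort_by { |_, score| -score }
end

rank(DOCS, 'berlin, de').each do |doc, score|
  puts "#{doc.values.join(' ')}  score=#{score}"
end
```

In this toy model the city match scores 9.0 and the district match 8.0, so the less precise location comes first. The model only works if 'berlin' is actually found in the city field; if the query term and the indexed term differ, no boost can make up for it.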
Hi Jens,

thanks a lot for reminding me that distilling a simple test case can help clear things up quickly. I was buried so deep in my own code that I couldn't see the obvious.

It turns out there is a problem with a custom analyzer of mine. It works OK and passes all tests, but it seems that Ferret isn't using the same analyzer for searching and indexing, although I've arranged for it. Or so I thought.

I still haven't found the culprit, but you put me on the right track anyway.

Thanks,
Andy

On 11.07.2007, at 14:36, Jens Kraemer wrote:
> Any more info on other circumstances that might cause your problems?
On 11.07.2007, at 15:40, Andreas Korth wrote:
> Turns out there is a problem with a custom analyzer of mine. It works
> OK and passed all tests but it seems that Ferret isn't using the same
> analyzer for searching and indexing although I've arranged for it. Or
> so I thought.

Here are three more questions related to the problem. The problem is definitely an analyzer mismatch, but I can't really put my finger on it.

1. Is it required to pass the field_infos every time the index is opened, or is it sufficient if the index is created once via FieldInfos#create_index? In other words: are the field infos stored in the index?

2. The analyzer to be used for both reading and writing is passed to Index.new() via the :analyzer parameter. Correct? This is what I do, and I even set the analyzer explicitly using Index#add_document(doc, analyzer).

3. For a given Index, how can I determine which analyzer is currently used for any given field, both for reading and writing?

Cheers,
Andy
On Wed, Jul 11, 2007 at 04:12:35PM +0200, Andreas Korth wrote:
> 1. Is it required to pass the field_infos every time the index is
> opened, or is it sufficient if the index is created once via
> FieldInfos#create_index? In other words: are the field infos stored
> in the index?

Yes.

> 2. The analyzer to be used for both reading and writing is passed to
> Index.new() via the :analyzer parameter. Correct? This is what I do
> and I even set the analyzer explicitly using Index#add_document(doc,
> analyzer).

Correct.

> 3. For a given Index, how can I determine which analyzer is currently
> used for any given field, both for reading and writing?

I don't know of any way to get this information. You can use process_query to see what the query parser generates from your query string (which involves analyzing it). To see what gets indexed, you could use the ferret_browser Dave introduced with the latest release to inspect your index.

Jens
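The check suggested above, comparing the terms that end up in the index with the terms the query string is analyzed into, can be illustrated without Ferret. The following is a minimal sketch in plain Ruby of what an analyzer mismatch does to a search. ToyIndex, LOWERCASING and CASE_KEEPING are hypothetical stand-ins invented for this example and are not Ferret classes or analyzers.

```ruby
# Toy stand-ins for analyzers (hypothetical, not Ferret classes):
# one lowercases its tokens, the other keeps the original case.
LOWERCASING  = ->(text) { text.downcase.scan(/\w+/) }
CASE_KEEPING = ->(text) { text.scan(/\w+/) }

# Minimal inverted index mapping each term to the ids of matching documents.
class ToyIndex
  def initialize(analyzer)
    @analyzer = analyzer
    @postings = Hash.new { |hash, term| hash[term] = [] }
    @next_id = 0
  end

  # Analyze the text with the index-time analyzer and record its terms.
  def add(text)
    id = @next_id
    @next_id += 1
    @analyzer.call(text).each { |term| @postings[term] << id }
    id
  end

  # Terms are looked up exactly as the query-side analyzer emits them.
  def search(query, analyzer = @analyzer)
    analyzer.call(query).flat_map { |term| @postings.fetch(term, []) }.uniq
  end

  # The terms actually stored, i.e. what an index browser would show.
  def terms
    @postings.keys.sort
  end
end

index = ToyIndex.new(LOWERCASING)
index.add('Berlin')

puts "indexed terms:                #{index.terms.inspect}"
puts "query terms (same analyzer):  #{LOWERCASING.call('Berlin').inspect}"
puts "query terms (other analyzer): #{CASE_KEEPING.call('Berlin').inspect}"
puts "hits, same analyzer:          #{index.search('Berlin').inspect}"
puts "hits, other analyzer:         #{index.search('Berlin', CASE_KEEPING).inspect}"
```

With the same analyzer on both sides the query term 'berlin' matches the stored term; with the case-keeping analyzer the query term 'Berlin' is not in the index and the search returns nothing. Putting the two term lists side by side is the same comparison that process_query and an index browser would allow on a real index.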