Hey,

I'm using Ferret to index various objects, and I'm creating a Ferret::Document for each of these objects. Indexing and searching are working fine.

Each of these Ferret::Documents has a 'relevance' field storing an integer that says how relevant the object is for the search. The 'relevance' is in the range 1..10.

Now I would like to multiply the relevance of the document by the score and sort the results by that.

e.g.: A document with a score of 0.82 and a relevance of 3 should have a final score of 2.46.

I couldn't figure out how to do this. I've read the 'Balancing relevancy and recentness' thread:

> score = yield( doc, score ) if block_given?
>
> This allows a block attached to a search call to adjust
> document scores before documents are sorted, based on
> some (possibly dynamic) numerical factors associated
> with the document, e.g. the number and importance

I guess this works for the pure Ruby implementation but won't work for the C implementation?

> As long as Ferret does what Lucene does with boosts, you could scale
> document boosts at indexing time by some factor related to age and
> that will factor into scoring.

Boost won't help me here. I've even set the boost value for relevance to 0.0, as it should not be part of the query.

Is there any way to recalculate the score?

Thanks,
Ben

--
Posted via http://www.ruby-forum.com/.
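For illustration, the arithmetic asked about above can be sketched in plain Ruby as a re-ranking step applied to hits after a search returns. This is only a sketch under assumptions: the `Hit` struct and the `rerank` helper are hypothetical names, not part of Ferret, and the scores and relevance values are made up. It reorders only the hits it is given, so it is not equivalent to having the index sort by the adjusted score.

```ruby
# Hypothetical container for one search hit: the engine's score plus
# the stored 'relevance' value (1..10) of the matching document.
Hit = Struct.new(:doc_id, :score, :relevance)

# Multiply each score by the document's relevance and sort best first.
# Returns [hit, final_score] pairs.
def rerank(hits)
  hits.map { |hit| [hit, hit.score * hit.relevance] }
      .sort_by { |_hit, final| -final }
end

hits = [
  Hit.new(1, 0.82, 3),  # the example from the post: 0.82 * 3
  Hit.new(2, 0.95, 1),
  Hit.new(3, 0.40, 10)
]

rerank(hits).each do |hit, final|
  puts format("doc %d: %.2f", hit.doc_id, final)
end
```

If only the top few hits are fetched before re-ranking, a document with a low score but a high relevance may never make it into the list, so a sketch like this would need a generous limit on the original search.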
On 7/4/06, Benjamin Krause <bk at benjaminkrause.com> wrote:
> Hey,
>
> I'm using Ferret to index various objects, and I'm creating a
> Ferret::Document for each of these objects. Indexing and searching are
> working fine.
>
> Each of these Ferret::Documents has a 'relevance' field storing an
> integer that says how relevant the object is for the search. The
> 'relevance' is in the range 1..10.
>
> Now I would like to multiply the relevance of the document by the
> score and sort the results by that.
>
> e.g.:
> A document with a score of 0.82 and a relevance of 3 should have a final
> score of 2.46.
>
> I couldn't figure out how to do this.
>
> I've read the 'Balancing relevancy and recentness' thread:
>
> > score = yield( doc, score ) if block_given?
> >
> > This allows a block attached to a search call to adjust
> > document scores before documents are sorted, based on
> > some (possibly dynamic) numerical factors associated
> > with the document, e.g. the number and importance
>
> I guess this works for the pure Ruby implementation but won't work for
> the C implementation?

Hi Ben,

You are right, this is only possible in the pure Ruby version. A more flexible framework for sorting will be coming in the future, but currently you can only sort by integer, float, string, doc_id, and relevance.

> > As long as Ferret does what Lucene does with boosts, you could scale
> > document boosts at indexing time by some factor related to age and
> > that will factor into scoring.
>
> Boost won't help me here. I've even set the boost value for relevance to
> 0.0, as it should not be part of the query.
>
> Is there any way to recalculate the score?

How about setting the boost for the whole document rather than just the :relevance field? Or do you sometimes want to sort by relevance without taking the :relevance field into account?

Cheers,
Dave

PS: While we are on the topic, how would you like the sort API to look? Many have complained that the sort API is too Java-like, but no one has suggested any improvements yet. I'd love to see some ideas.
Hey David,

thanks for the answer.

> How about setting the boost for the whole document rather than just
> the :relevance field? Or do you sometimes want to sort by relevance
> without taking the :relevance field into account?

Ah, you mean I should boost each field of the document? Or is there a way to set a boost level for the document as a whole? If so, I've missed it.

> PS: While we are on the topic, how would you like the sort API to
> look? Many have complained that the sort API is too java-like but
> no-one has suggested any improvements yet. I'd love to see some ideas.

I like the idea of passing a short block with a sort algorithm. I would like to see something like this:

    index.search(:query   => my_query,
                 :sort    => Proc.new { |doc|
                               # some calculation
                               new_score
                             },
                 :reverse => false,
                 :filter  => false,
                 :start   => 0,
                 :limit   => 10)

Alternatively, you should be able to give the sort param the name of a field, like ':sort => :score', or an array of fields, like ':sort => [ :score, :title ]', and sort by the first element and then by the second if two or more docs share the same value for the first element. I guess something like ":sort => :score" is enough for most people.

I think the other options are almost the same as what is implemented right now. I don't think you need the SortField class.

By the way, I don't find the filter API really intuitive; actually, I didn't understand it at all ;) I know what you want to do with filters and how you want to get there, but I haven't found any understandable documentation on how to build one. Maybe you should write a short tutorial on how to write a filter. I would find it very intuitive to have something like a base_query: one query to filter/limit the results, and another query to do the real search.

One more feature I would definitely like to see is limiting the search to a number of fields. I know I can write something like

    field_one:"search string" || field_two:"search string" || field_three:"search string" || field_four:"search string"

but I would like to be able to write something like

    (field_one|field_two|field_three|field_four):"search string"

Furthermore, you should be able to say something like "search in all fields except field_one", like

    (*|!field_one):"search string"

Ben
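As a stopgap for the two field-list requests above, the query string can be generated rather than typed out. The sketch below is plain Ruby string building and does not call Ferret; `fields_query` and `ALL_FIELDS` are hypothetical names, the list of fields has to be maintained by hand, and the search string is assumed to contain no double quotes. It emits the long `||` form written out above, and whether Ferret's query parser accepts that exact string is not checked here.

```ruby
# Hypothetical list of every field the application indexes.
ALL_FIELDS = %w[field_one field_two field_three field_four].freeze

# Build one field:"phrase" clause per field and join them with ||.
# Fields listed in `except` are left out.
def fields_query(phrase, fields = ALL_FIELDS, except: [])
  (fields - except)
    .map { |field| %(#{field}:"#{phrase}") }
    .join(" || ")
end

# Search only two fields.
puts fields_query("search string", %w[field_one field_two])

# Search every known field except field_one.
puts fields_query("search string", except: %w[field_one])
```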
On 7/8/06, Benjamin Krause <bk at benjaminkrause.com> wrote:
> Hey David,
>
> thanks for the answer.
>
> > How about setting the boost for the whole document rather than just
> > the :relevance field? Or do you sometimes want to sort by relevance
> > without taking the :relevance field into account?
>
> Ah, you mean I should boost each field of the document? Or is there a
> way to set a boost level for the document as a whole? If so, I've missed
> it.

    doc = Ferret::Document::Document.new()
    doc.boost = 100.0

> > PS: While we are on the topic, how would you like the sort API to
> > look? Many have complained that the sort API is too java-like but
> > no-one has suggested any improvements yet. I'd love to see some ideas.
>
> I like the idea of passing a short block with a sort algorithm. I would
> like to see something like this:
>
>     index.search(:query   => my_query,
>                  :sort    => Proc.new { |doc|
>                                # some calculation
>                                new_score
>                              },
>                  :reverse => false,
>                  :filter  => false,
>                  :start   => 0,
>                  :limit   => 10)

The way sort works at the moment is that it caches all fields that are sorted on. If you sort like this, you have to load every document in the result set, which would be a huge performance hit. I guess I could make this feature available, though. In the pure Ruby version of Ferret you can do this:

    st_length = SortField::SortType.new("length", lambda{|str| str.length})
    sf = SortField.new("content", {:sort_type => st_length,
                                   :reverse => true,
                                   :comparator => lambda{|i,j| j <=> i}})

The sort type lambda allows you to create the sort cache. Then the comparator lets you compare two cached values. This is flexible while remaining performant, although I still think I can make it more intuitive.

> Alternatively, you should be able to give the sort param the name of a
> field, like ':sort => :score', or an array of fields, like ':sort => [
> :score, :title ]', and sort by the first element and then by the second
> if two or more docs share the same value for the first element.
> I guess something like ":sort => :score" is enough for most people.

Actually, you can already do this. Have you tried it? The only catch is that :score is treated as a field name, so to sort by score you'd have to do this:

    index.search_each(query, :sort => [SortField::RELEVANCE, :title, :price])

> I think the other options are almost the same as what is implemented
> right now. I don't think you need the SortField class.
>
> By the way, I don't find the filter API really intuitive; actually, I
> didn't understand it at all ;)
>
> I know what you want to do with filters and how you want to get there,
> but I haven't found any understandable documentation on how to build
> one.
>
> Maybe you should write a short tutorial on how to write a filter. I
> would find it very intuitive to have something like a base_query: one
> query to filter/limit the results, and another query to do the real
> search.

I will. The TermEnum and TermDocEnum are essential for using filters and they've undergone major changes, so I'll hold off on this until I get the next release out.

> One more feature I would definitely like to see is limiting the search
> to a number of fields.
>
> I know I can write something like
>
>     field_one:"search string" || field_two:"search string" || field_three:"search string" || field_four:"search string"
>
> but I would like to be able to write something like
>
>     (field_one|field_two|field_three|field_four):"search string"

You can do this already, just get rid of the brackets:

    field_one|field_two|field_three|field_four:"search string"

> Furthermore, you should be able to say something like "search in all
> fields except field_one", like
>
>     (*|!field_one):"search string"

You can't do this, but it is a nice idea. I'll think about it. I might also add the brackets into the syntax.

Anyway, thanks for your feedback, Ben. I will definitely use it.

Cheers,
Dave
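The two-step idea described above (build a cache of sort values once, then compare only cached values) can be illustrated without Ferret. The following is a plain-Ruby sketch, not Ferret's implementation: `build_sort_cache` and `sorted_doc_ids` are hypothetical helpers, and the documents are made-up strings standing in for a 'content' field. The two lambdas mirror the `str.length` sort type and the `j <=> i` comparator from the reply.

```ruby
# Made-up documents: doc id => contents of the field being sorted on.
DOCS = {
  0 => "a short note",
  1 => "a considerably longer piece of text",
  2 => "tiny"
}.freeze

# Step 1: run the sort-type lambda once per document and keep the result,
# so that sorting never has to load the documents again.
def build_sort_cache(docs, sort_type)
  docs.each_with_object({}) do |(id, text), cache|
    cache[id] = sort_type.call(text)
  end
end

# Step 2: order doc ids by comparing only the cached values.
def sorted_doc_ids(doc_ids, cache, comparator)
  doc_ids.sort { |a, b| comparator.call(cache[a], cache[b]) }
end

by_length  = lambda { |str| str.length }  # sort value: length of the field
descending = lambda { |i, j| j <=> i }    # larger cached values first

cache = build_sort_cache(DOCS, by_length)
p cache
p sorted_doc_ids(DOCS.keys, cache, descending)
```

The point of the split is that the per-document work happens once when the cache is built, while the comparator only ever sees small cached values.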