thr3ads.net - Ferret talk - [Ferret-talk] Recalculating the score [Jul 2006]

Benjamin Krause

2006-Jul-04 13:19 UTC

[Ferret-talk] Recalculating the score

Hey ..

I'm using ferret to index various objects and i'm creating a
Ferret::Document for each of these objects. Indexing and searching are
working fine.

Each of these Ferret::Documents has a 'relevance' field, storing an
integer that says how relevant this object is for the search. The
'relevance' is in the range of 1..10

Now i would like to multiply the relevance of the document with the 
score, and sort the results by that.

e.g.:
A document with a score of 0.82 and a relevance of 3 should have a final 
score of 2.46
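
To make the goal concrete, here is a minimal sketch in plain Ruby that re-ranks a list of hits by score * relevance. It makes no Ferret calls: the Hit struct, the rerank helper and the sample values are all made up for illustration, and the hits would have to be collected from the search results first.

```ruby
# Plain-Ruby sketch: re-rank search hits by score * relevance.
# Hit, rerank and the sample data are hypothetical, not Ferret API or output.
Hit = Struct.new(:doc_id, :score, :relevance)

def rerank(hits)
  # Highest adjusted score first.
  hits.sort_by { |hit| -(hit.score * hit.relevance) }
end

hits = [
  Hit.new(1, 0.82, 3),   # the example from the post
  Hit.new(2, 0.95, 1),
  Hit.new(3, 0.40, 10),
]

rerank(hits).each do |hit|
  puts format("doc %d: %.2f", hit.doc_id, hit.score * hit.relevance)
end
```

Note that re-ranking after the search only reorders the hits that were actually fetched, so it gives the right top 10 only if all matching documents (or a generous number of them) are retrieved first.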

I couldn't figure out how to do this ..

I've read the 'Balancing relevancy and recentness' thread..
>      score = yield( doc, score ) if block_given?
>
> This allows a block attached to a search call to adjust
> document scores before documents are sorted, based on
> some (possibly dynamic) numerical factors associated
> with the document, e.g. the number and importance
i guess this works for the pure ruby implementation but won't work for
the c-implementation?
> As long as Ferret does what Lucene does with boosts, you could scale
> document boosts at indexing time by some factor related to age and
> that will factor into scoring.  
Boost won't help me here, i've even set the boost value for relevance to
0.0, as it should not be part of the query..

Is there any way to recalculate the score?

Thanks,
 Ben

-- 
Posted via http://www.ruby-forum.com/.

David Balmain

2006-Jul-06 03:53 UTC

[Ferret-talk] Recalculating the score

On 7/4/06, Benjamin Krause <bk at benjaminkrause.com> wrote:
> Hey ..
>
> I'm using ferret to index various objects and i'm creating a
> Ferret::Document for each of these objects. Indexing and searching are
> working fine.
>
> Each of these Ferret::Documents has a 'relevance' field, storing an
> integer that says how relevant this object is for the search. The
> 'relevance' is in the range of 1..10
>
> Now i would like to multiply the relevance of the document with the
> score, and sort the results by that.
>
> e.g.:
> A document with a score of 0.82 and a relevance of 3 should have a final
> score of 2.46
>
> I couldn't figure out how to do this ..
>
> I've read the 'Balancing relevancy and recentness' thread..
>
> >      score = yield( doc, score ) if block_given?
> >
> > This allows a block attached to a search call to adjust
> > document scores before documents are sorted, based on
> > some (possibly dynamic) numerical factors associated
> > with the document, e.g. the number and importance
>
> i guess this works for the pure ruby implementation but won't work for
> the c-implementation?
Hi Ben,
You are right, this is only possible in the pure ruby version. A more
flexible framework for sorting will be coming in the future but
currently you can only sort by integer, float, string, doc_id, and
relevance.
> > As long as Ferret does what Lucene does with boosts, you could scale
> > document boosts at indexing time by some factor related to age and
> > that will factor into scoring.
>
> Boost won't help me here, i've even set the boost value for relevance to
> 0.0, as it should not be part of the query..
>
> Is there any way to recalculate the score?
How about setting the boost for the whole document rather than just
the :relevance field? Or do you sometimes want to sort by relevance
without taking the :relevance field into account?

Cheers,
Dave

PS: While we are on the topic, how would you like the sort API to
look? Many have complained that the sort API is too java-like but
no-one has suggested any improvements yet. I'd love to see some ideas.

Benjamin Krause

2006-Jul-07 17:23 UTC

[Ferret-talk] Recalculating the score

Hey David,

thanks for the answer ..
> How about setting the boost for the whole document rather than just
> the :relevance field? Or do you sometimes want to sort by relevance
> without taking the :relevance field into account?
ah.. you mean i should boost each field of the document? or is there a
way to set a boost level for the document as a whole? if so, i've missed
it ..
> PS: While we are on the topic, how would you like the sort API to
> look? Many have complained that the sort API is too java-like but
> no-one has suggested any improvements yet. I'd love to see some
> ideas.
i like the idea of giving a short block with a sort algorithm.. i would 
like to see something like that:

index.search(:query   => my_query,
             :sort    => Proc.new { |doc| new_score }, # new_score: some calculation on doc
             :reverse => false,
             :filter  => false,
             :start   => 0,
             :limit   => 10)

alternatively you should be able to give the sort param the name of a
field, like ':sort => :score', or an array of fields, like
':sort => [ :score, :title ]', and sort by the first element and then by
the 2nd if two or more docs share the same value for the 1st element.
I guess something like ":sort => :score" is enough for most people ..
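
The tie-breaking behaviour described here (first key, then second key when the first is equal) can be sketched in plain Ruby. The docs array is made-up sample data, and :score and :title are only stand-ins for indexed fields; no Ferret API is involved.

```ruby
# Made-up sample documents; :score and :title stand in for indexed fields.
docs = [
  { :title => "beta",  :score => 0.9 },
  { :title => "alpha", :score => 0.9 },
  { :title => "gamma", :score => 0.5 },
]

# Sort by score (highest first), then by title when scores tie.
# Arrays compare element by element, which gives the tie-breaking for free.
sorted = docs.sort_by { |doc| [-doc[:score], doc[:title]] }

sorted.each { |doc| puts "#{doc[:score]} #{doc[:title]}" }
```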

i think the other options are almost like it is implemented right now ..
i don't think you need the SortField class.

btw.. i do find the filter API not really intuitive, actually i didn't
understand it at all ;)

i know what you want to do with filters and how you want to get there,
but i haven't found any understandable documentation on how to build
one ..

maybe you should write a short tutorial on how to write a filter.. i 
would find it very intuitive, to have something like a base_query.. like 
having one query to filter/limit results, and have another query to do 
the real search..

and btw.. one feature i would definitely like to see is limiting
the search to a number of fields..

i know i can write something like

field_one:"search string" || field_two:"search string" ||
field_three:"search string" || field_four:"search string"

but i would like to be able to write something like

(field_one|field_two|field_three|field_four):"search string"

furthermore, you should be able to say something like .. search in all 
fields, except field_one .. like

(*|!field_one):"search string"

Ben

-- 
Posted via http://www.ruby-forum.com/.

David Balmain

2006-Jul-07 23:02 UTC

[Ferret-talk] Recalculating the score

On 7/8/06, Benjamin Krause <bk at benjaminkrause.com> wrote:
> Hey David,
>
> thanks for the answer ..
>
> > How about setting the boost for the whole document rather than just
> > the :relevance field? Or do you sometimes want to sort by relevance
> > without taking the :relevance field into account?
>
> ah.. you mean i should boost each field of the document? or is there a
> way to set a boost level for the document as a whole? if so, i've missed
> it ..
doc = Ferret::Document::Document.new()
doc.boost = 100.0
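
Applied to the 'relevance' field from the original question, the boost could be derived from the relevance value at indexing time. The sketch below is only an illustration: FakeDoc and build_doc are hypothetical stand-ins (not Ferret classes), and it assumes, as in Lucene, that a document boost is multiplied into the score, so it only approximates the score * relevance ordering.

```ruby
# FakeDoc is a stand-in with a boost attribute; it is NOT a Ferret class.
FakeDoc = Struct.new(:fields, :boost)

# Hypothetical helper: derive the document boost from the relevance field.
def build_doc(fields)
  relevance = fields.fetch(:relevance)
  unless (1..10).cover?(relevance)
    raise ArgumentError, "relevance must be in 1..10"
  end
  FakeDoc.new(fields, relevance.to_f)
end

doc = build_doc(:title => "some object", :relevance => 3)
puts doc.boost
```

With a real document the last step would be the doc.boost = ... assignment shown above, before the document is added to the index.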
> > PS: While we are on the topic, how would you like the sort API to
> > look? Many have complained that the sort API is too java-like but
> > no-one has suggested any improvements yet. I'd love to see
> > some ideas.
>
> i like the idea of giving a short block with a sort algorithm.. i would
> like to see something like that:
>
> index.search(:query   => my_query,
>              :sort    => Proc.new { |doc| new_score }, # new_score: some calculation on doc
>              :reverse => false,
>              :filter  => false,
>              :start   => 0,
>              :limit   => 10)
The way sort works at the moment is that it caches all fields that are
sorted on. If you start sorting like this, you have to load every
document in the result set, which would be a huge performance hit.
I guess I could make this feature available though.

In the pure ruby version of Ferret you can do this;

    st_length = SortField::SortType.new("length", lambda{|str| str.length})
    sf = SortField.new("content", {:sort_type => st_length,
                                   :reverse => true,
                                   :comparator => lambda{|i,j| j <=> i}})

The sort type lambda allows you to create the sort cache. Then the
comparator lets you compare those two values. This is flexible while
remaining performant, although I still think I can make it more
intuitive.
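
The cache-then-compare idea can be sketched in plain Ruby. This is not Ferret's implementation: field_values and hit_ids are made-up stand-ins for the stored "content" field and the ids of the matching documents.

```ruby
# Plain-Ruby sketch of the cache-then-compare idea; not Ferret's implementation.
# field_values is a hypothetical stand-in for the "content" field,
# indexed by document id.
field_values = ["a long sentence", "short", "medium text"]

sort_type  = lambda { |str| str.length }  # builds one cache entry per document
comparator = lambda { |i, j| j <=> i }    # larger cached values first

# Build the cache once; later sorts reuse it instead of loading documents.
cache = field_values.map { |value| sort_type.call(value) }

# Sort the ids of the matching documents using only the cached values.
hit_ids = [0, 1, 2]
sorted_ids = hit_ids.sort { |a, b| comparator.call(cache[a], cache[b]) }

puts sorted_ids.inspect
```

The point of the split is that the sort type lambda runs once per document when the cache is built, while the comparator only ever sees the small cached values.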
> alternatively you should be able to give the sort param the name of a
> field, like ':sort => :score', or an array of fields, like
> ':sort => [ :score, :title ]', and sort by the first element and then by
> the 2nd if two or more docs share the same value for the 1st element.
> I guess something like ":sort => :score" is enough for most people ..
Actually, you can already do this. Have you tried it? The only catch is
that :score is treated as a field name, so to sort by relevance you'd
have to do this;

    index.search_each(query, :sort => [SortField::RELEVANCE, :title, :price])

> i think the other options are almost like it is implemented right now ..
> i don't think you need the SortField class.
>
> btw.. i do find the filter API not really intuitive, actually i didn't
> understand it at all ;)
>
> i know what you want to do with filters and how you want to get there,
> but i haven't found any understandable documentation on how to build
> one ..
>
> maybe you should write a short tutorial on how to write a filter.. i
> would find it very intuitive, to have something like a base_query.. like
> having one query to filter/limit results, and have another query to do
> the real search..
I will. The TermEnum and TermDocEnum are essential for using filters
and they've undergone major changes so I'll hold off on this until I
get the next release out.
> and btw.. one feature i would definitely like to see is limiting
> the search to a number of fields..
>
> i know i can write something like
>
> field_one:"search string" || field_two:"search string" ||
> field_three:"search string" || field_four:"search string"
>
> but i would like to be able to write something like
>
> (field_one|field_two|field_three|field_four):"search string"
You can do this already, just get rid of the brackets;

    field_one|field_two|field_three|field_four:"search string"
> furthermore, you should be able to say something like .. search in all
> fields, except field_one .. like
>
> (*|!field_one):"search string"
You can't do this, but it is a nice idea. I'll think about it. I might
also add the brackets into the syntax.
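
Until something like that exists, the exclusion can be handled in application code by building the field list yourself and using the pipe syntax shown above. The sketch below is a hypothetical helper, not part of Ferret: fields_query and all_fields are made-up names, it assumes the application knows its own field names, and it does not escape quotes inside the phrase.

```ruby
# Hypothetical helper (not part of Ferret): builds a field-qualified query
# string of the form field_a|field_b:"phrase", skipping excluded fields.
def fields_query(fields, phrase, except: [])
  selected = fields - except
  raise ArgumentError, "no fields left to search" if selected.empty?
  %(#{selected.join("|")}:"#{phrase}")
end

# Assumed list of field names known to the application.
all_fields = %w[field_one field_two field_three field_four]

puts fields_query(all_fields, "search string", except: %w[field_one])
```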


Anyway, thanks for your feedback Ben. I will definitely use it.

Cheers,
Dave
