On Sat, Oct 30, 2004 at 12:46:46PM +0200, sinking wrote:
> I have some ideas that I like to play around with (mostly ideas on how
> to rank 'spam' sites lower). All the ideas are based on
> setting a fixed score on the document, that could override the
> probabilistic relevancy based on the
> value of the extra score.
>
> But I am confused on how I should implement it. I would like to hear
> what you think is the best approach.
> My idea.
> Add the score to the Xapian::Document value?
If you want to rank some documents as always lower than some others, you
could set a value for each document and then sort on that, with
relevance as the secondary sort. That won't actually change document
weights though, and a bad A-class document would always beat a good
B-class one. Depending on your application this may be desirable or not.
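To make the trade-off concrete, here is a minimal standalone sketch of "sort on the value, with relevance as the secondary sort". It deliberately uses no Xapian calls: Hit, doc_class and rank_by_class_then_relevance are hypothetical names made up for illustration, and the class numbering (0 = A-class, 1 = B-class) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a matched document; not a Xapian type.
struct Hit {
    std::string name;
    int doc_class;     // per-document value: 0 = A-class, 1 = B-class, ...
    double relevance;  // weight from the probabilistic model, left unchanged
};

// Order by class first (lower class number wins), then by relevance,
// highest first, within each class.
std::vector<Hit> rank_by_class_then_relevance(std::vector<Hit> hits) {
    std::stable_sort(hits.begin(), hits.end(),
                     [](const Hit& a, const Hit& b) {
                         if (a.doc_class != b.doc_class)
                             return a.doc_class < b.doc_class;
                         return a.relevance > b.relevance;
                     });
    return hits;
}
```

With hits {"good-B", 1, 9.0}, {"bad-A", 0, 0.5} and {"good-A", 0, 7.0}, the A-class document with relevance 0.5 still ranks above the B-class one with relevance 9.0, which is exactly the behaviour described above.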
> Add the weight calculation to the abstract weight class?
There's actually a BiasFunctor which does what you want - adding an
extra weight based on a value. But it's an incomplete piece of work
(I started to write it for Ananova as a way to mix date and relevance
ordering, but they didn't take it up).
Currently it's hardwired to produce an exponential decay considering
the value as a time_t, but the idea was it could be generalised to
allow the user to specify their own functor by subclassing (like
you can for Xapian::Weight).
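As a rough illustration of that generalisation, the sketch below shows an abstract functor with an exponential-decay subclass. It is not the code in Xapian: the interface, the class name ExpDecayBias, and the max_bonus and half-life parameters are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <ctime>

// Hypothetical interface: maps a document value (here a time_t) to an
// extra weight. Users would subclass this to supply their own bias.
class BiasFunctor {
  public:
    virtual ~BiasFunctor() = default;
    virtual double operator()(std::time_t value) const = 0;
};

// Exponential decay: a document dated "now" gets max_bonus, and the bonus
// halves for every half_life_secs of age. The result is never negative.
class ExpDecayBias : public BiasFunctor {
    std::time_t now_;
    double max_bonus_;
    double half_life_secs_;

  public:
    ExpDecayBias(std::time_t now, double max_bonus, double half_life_secs)
        : now_(now), max_bonus_(max_bonus), half_life_secs_(half_life_secs) {}

    double operator()(std::time_t value) const override {
        double age = std::difftime(now_, value);  // seconds since the document's date
        if (age < 0) age = 0;                     // treat future dates as "now"
        return max_bonus_ * std::exp2(-age / half_life_secs_);
    }
};
```

A different bias (say, one driven by a spam score rather than a date) would be another subclass with its own operator().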
So with a little work you could turn that into a Xapian::BiasFunctor
class which works a lot like Xapian::Weight.
Or you could probably abuse the extra weight part of your own weight
class to include the extra weight.
Note that with both the BiasFunctor and the extra weight approach you
should give a non-negative bonus. So you need to give good documents
a boost, not bad documents a penalty.
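One way to keep the bonus non-negative is to flip a penalty around its maximum, so the worst document gets zero and everything else gets a boost. This is only a sketch under assumptions: bonus_from_penalty and combined_weight are hypothetical helpers, not Xapian functions, and it assumes you know an upper bound on the penalty.

```cpp
#include <algorithm>
#include <cassert>

// Convert a spam penalty in [0, max_penalty] into a non-negative bonus.
// Differences between documents are preserved: a document with 2 more
// penalty points gets 2 fewer bonus points.
double bonus_from_penalty(double penalty, double max_penalty) {
    double clamped = std::min(std::max(penalty, 0.0), max_penalty);
    return max_penalty - clamped;
}

// The score used for ranking: the relevance weight plus the extra weight.
double combined_weight(double relevance, double penalty, double max_penalty) {
    return relevance + bonus_from_penalty(penalty, max_penalty);
}
```

For example, with max_penalty = 5, a clean document (penalty 0) gets a bonus of 5 and the spammiest one (penalty 5) gets 0, so the relative ordering is the same as subtracting the penalty would give.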
Cheers,
Olly