On Sun, Jul 05, 2009 at 09:58:36AM +0200, James Cauwelier wrote:
> - I want to sort by some other relevance, namely by popularity of
> the document relative to this search query. I want the sort order to
> be influenced by clicking patterns after a query has been executed.
> If a search for 'tomato' gives a set of results and 90% of the users
> click on the third result, then I want to move this document to the
> top of the result set. I would need a logger, which records this
> behavior, but I would also need to sort by generated key, if I am not
> mistaken. As a default, I would like to fall back to the default
> relevance sorting of Xapian.
The alternative (for Xapian 1.0.x) would be to compute a
query-independent ranking of all documents based on click-through data
and store this in a value slot of each document. You'd have to batch
update (e.g. every night or weekend), and it wouldn't allow you to
respond to a document being more popular for some searches than others.
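A minimal sketch of that batch step, assuming the click logger writes
(query, docid) pairs. The log format, the function name and the log-damped
scaling are all assumptions for illustration, not anything Xapian provides.
The resulting score per document is what you would then write into the
value slot during the nightly update.

```python
import math
from collections import Counter


def popularity_scores(click_log):
    """Return {docid: score in (0, 1]} from an iterable of (query, docid) clicks.

    The query is deliberately ignored: this is the query-independent ranking,
    so a document popular for one search gets the same boost for every search.
    """
    clicks = Counter(docid for _query, docid in click_log)
    if not clicks:
        return {}
    top = max(clicks.values())
    # Log-damp the counts so one runaway document doesn't flatten the rest,
    # then scale so the most-clicked document scores exactly 1.0.
    return {docid: math.log1p(n) / math.log1p(top) for docid, n in clicks.items()}


# Hypothetical log entries: (query string, clicked document id).
log = [("tomato", 3), ("tomato", 3), ("tomato", 1), ("soup", 3), ("soup", 7)]
scores = popularity_scores(log)
```

Document 3 ends up with the top score whether it was clicked for 'tomato'
or 'soup', which is exactly the limitation described above.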
> Can I do the subclassing Xapian::Sorter as suggested in the
> documentation in PHP?
Not currently. I'm mentoring a student in Google's Summer of Code to
add the needed support to SWIG to allow such subclassing, and given
his progress so far it seems very likely he'll succeed. So this will
probably be possible in Xapian 1.1.x in a month or two. PHP isn't the
fastest of languages, so you might find this approach is too slow if you
have a large database since it calls back to PHP for every document
being considered (since it's not currently supported, I don't have
actual figures...)
If you're happy to use 1.1.x, another alternative is to use the new
PostingSource feature:
http://trac.xapian.org/browser/trunk/xapian-core/docs/postingsource.rst
In particular, ValueWeightPostingSource:
http://trac.xapian.org/browser/trunk/xapian-core/include/xapian/postingsource.h#L383
> Or do I need to learn C++?
No, but you might have to use Xapian 1.1.x (which is a development
series). It wouldn't have to be C++ though; Python supports subclassing
of xapian.Sorter.
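To give a rough idea of the key-generation logic such a subclass would
need, here is a plain-Python sketch. It does not import xapian or inherit
from xapian.Sorter, so that it runs on its own; the class name, the
(query, docid) log format and the key layout are all assumptions. A real
subclass would be handed a document object rather than a bare docid.

```python
from collections import Counter


class ClickKeyMaker:
    """Hypothetical stand-in for a Sorter subclass, built for one query string."""

    def __init__(self, click_log, query):
        # Only clicks recorded for this exact query count towards the key.
        per_query = Counter(docid for q, docid in click_log if q == query)
        total = sum(per_query.values())
        self.share = {d: n / total for d, n in per_query.items()} if total else {}

    def __call__(self, docid):
        # Keys compare as strings, so use a fixed-width, zero-padded number.
        # The click share is inverted so that popular documents sort first.
        return "%06d" % round((1.0 - self.share.get(docid, 0.0)) * 100000)


# Hypothetical data: nine clicks on doc 12 and one on doc 10 after 'tomato'.
log = [("tomato", 12)] * 9 + [("tomato", 10)]
by_relevance = [10, 11, 12, 13]  # docids in default relevance order

# sorted() is stable, so documents with equal keys keep their relevance order.
reranked = sorted(by_relevance, key=ClickKeyMaker(log, "tomato"))
unchanged = sorted(by_relevance, key=ClickKeyMaker(log, "soup"))
```

For a query with no recorded clicks every document gets the same key, so
the order falls back to plain relevance.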
> Has this already been done before in an open source and free
> solution? I do not want to reinvent the wheel...
I don't know of one, I'm afraid.
Cheers,
Olly