Robert Kaye
2008-May-31 00:03 UTC
[Xapian-discuss] Ordering search results and defining a custom Weight class in python
Hi!

I'm the lead geek over at MusicBrainz ( http://musicbrainz.org ) and we've been using Lucene (PyLucene compiled by a gcj compiled from source -- painful) for our search services. When it works it's great, but getting it to work is an utter nightmare and our method of using it has been deprecated. Sigh.

Which brings me to Xapian -- so far I am very pleased by the ease of installation, the indexing speed and the ease of porting from Lucene. However, since we're actually indexing an SQL database (rather than a corpus of text documents), the default scoring method in Xapian does not do a good job of ordering our search results. (Lucene didn't do so hot either, but with Xapian it's worse.)

Given this, I have two questions:

1. Do you have any tips for how to tweak the ordering of our search results? It appears that the desirable results are found, but what we consider to be the best match usually doesn't show up as a 100% rank. Is there any way to tweak this ranking without creating a custom Weight class?

2. If a custom Weight class is the way to go (which I suspect), does anyone have an example of how to do this in Python? I've tried to port the default C++ example from the docs as such:

    class TinkerWeight(xapian.Weight):

        # def __init__(self):
        #     xapian.Weight.__init__(self)

        def name(self): return "Tinker"
        def serialize(self): return ""
        def get_sumpart(*args): return 1
        def get_maxpart(*args): return 1
        def get_sumextra(*args): return 0
        def get_maxextra(*args): return 0

But when I call:

    self.en.set_weighting_scheme(TinkerWeight())

it dies in xapian.py with:

    2109 def __init__(self): raise AttributeError, "No constructor defined"

This happens whether I define my own constructor or not.

Any feedback, sample code or clue-by-fours would be greatly appreciated!

--
--ruaok Somewhere in Texas a village is *still* missing its idiot.
Robert Kaye -- rob at eorbit.net -- http://mayhem-chaos.net
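The traceback above can be reproduced without Xapian, which may help show why it appears both with and without the commented-out constructor. The sketch below is only a stand-in: `AbstractBase`, `NoConstructor`, `ChainingConstructor` and `construction_error` are hypothetical names, and the base class simply imitates the quoted line from xapian.py (written in Python 3 syntax rather than the Python 2 form in the traceback). It assumes nothing else about how the real bindings behave.

```python
class AbstractBase(object):
    """Hypothetical stand-in for a wrapped abstract class such as xapian.Weight."""

    def __init__(self):
        # Imitates the generated wrapper line quoted in the traceback.
        raise AttributeError("No constructor defined")


class NoConstructor(AbstractBase):
    """Defines no __init__, so Python falls back to AbstractBase.__init__."""


class ChainingConstructor(AbstractBase):
    """Defines __init__, but explicitly calls the base class constructor."""

    def __init__(self):
        AbstractBase.__init__(self)


def construction_error(cls):
    """Return the message raised while constructing cls, or None on success."""
    try:
        cls()
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    for cls in (NoConstructor, ChainingConstructor):
        print(cls.__name__, "->", construction_error(cls))
```

In both subclasses the raising base-class `__init__` ends up being run, once by inheritance and once by the explicit call, which matches the report that the error occurs either way.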
Olly Betts
2008-May-31 01:21 UTC
[Xapian-discuss] Ordering search results and defining a custom Weight class in python
On Fri, May 30, 2008 at 05:03:09PM -0700, Robert Kaye wrote:
> 1. Do you have any tips for how to tweak the ordering of our search
> results? It appears that the desirable results are found, but what we
> consider to be the best match usually doesn't show up as a 100% rank.
> Is there any way to tweak this ranking without creating a custom
> Weight class?

Yes, BM25Weight has several parameters which can be adjusted to change the emphasis of the weighting. If your documents are typically quite short, then you probably will get better results if you make the document length less important.

> 2. If a custom Weight class is the way to go (which I suspect), does
> anyone have an example of how to do this in Python? I've tried to port
> the default C++ example from the docs as such:
>
> class TinkerWeight(xapian.Weight):
>
>     # def __init__(self):
>     #     xapian.Weight.__init__(self)

xapian.Weight is an abstract base class, so has no constructor, but your class needs a constructor so it can be constructed. So I think you need to write:

    def __init__(self):
        pass

Cheers,
    Olly
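To illustrate the first point, about making document length less important, here is a minimal sketch of the textbook BM25 within-document factor. It is not Xapian's implementation: the function names, the example lengths and the default values are assumptions chosen for illustration, and the IDF and query-side terms are left out. In the textbook formula the `b` parameter controls length normalisation, and BM25Weight is understood to expose a parameter with the same role, but the constructor's exact argument list should be checked in the API documentation for the installed version.

```python
def bm25_term_factor(wdf, doc_len, avg_doc_len, k1=1.0, b=0.5):
    """Textbook BM25 within-document factor (IDF and query-side parts omitted).

    wdf is the within-document frequency of the term; b=0 ignores document
    length entirely, while b=1 applies full length normalisation.
    """
    norm_len = doc_len / avg_doc_len
    return (k1 + 1.0) * wdf / (k1 * ((1.0 - b) + b * norm_len) + wdf)


def length_ratio(b):
    """Score of a short record divided by that of a long one, same single match."""
    short = bm25_term_factor(wdf=1, doc_len=2, avg_doc_len=4, b=b)
    long_ = bm25_term_factor(wdf=1, doc_len=8, avg_doc_len=4, b=b)
    return short / long_


if __name__ == "__main__":
    for b in (0.0, 0.5, 1.0):
        print("b=%.1f  short/long score ratio=%.3f" % (b, length_ratio(b)))
```

With `b` at 0 the two hypothetical records score the same, and the gap widens as `b` grows. For short database fields such as artist or track names, a smaller `b` therefore stops a few extra words from dominating the ranking.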