My saga of tweaking Xapian to work right for me continues. Last night I figured out the core issue I am having, and I'm hoping you folks can tell me how to tweak Xapian to make it behave right.

Consider the search for an artist, using "Duran Duran" as an example. Without any weighting tricks, when I search for Duran Duran I get:

100%  Duran Duran Duran
 88%  Duran Duran
 66%  Mike Duran
 66%  Duran Y Garcia
 66%  Duran

(FYI, Duran Duran Duran is a valid band, sigh.)

As a user I would expect to see "Duran Duran" at 100%, since my query matches one document in the database EXACTLY. In text-searching terms I understand the result, since more occurrences of a word ought to yield a higher score. But for my round-peg-into-a-square-hole approach of searching my SQL database with Xapian, this isn't the best result.

Is there any way I can tweak Xapian to move exact matches to 100% and matches that have more/fewer terms lower?

--
--ruaok

Somewhere in Texas a village is *still* missing its idiot.

Robert Kaye -- rob at eorbit.net -- http://mayhem-chaos.net
On Wed, Jul 02, 2008 at 02:21:44PM -0700, Robert Kaye wrote:
> Is there any way I can tweak Xapian to move exact matches to 100% and
> matches that have more/fewer terms lower?

I would suggest that you take the artist name, normalise it (squash punctuation and whitespace to a single space or to nothing, and casefold; perhaps also drop a leading "the" or trailing ", the") and add the result as a prefixed term. Then do the same to the query string and postprocess the parsed query by adding this term (probably with OP_OR, so that punctuation normalisation still works even if the parsed query doesn't match).

So in this case, the artist term would be "XARTISTduran duran" or perhaps "XARTISTduranduran".

Cheers,
    Olly
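The normalisation step suggested above might look roughly like the following. This is a minimal sketch, not code from the thread: the function names `normalise_artist` and `artist_term` are hypothetical, the "XARTIST" prefix is taken from the example, and the exact rules (Unicode NFKC, treating every non-word character as punctuation, the "the" handling) are assumptions you would tune to your data. The key point is that indexing and querying must run through the same function, so only a name that normalises identically produces the same term.

```python
import re
import unicodedata

# Prefix from the example above; any otherwise unused prefix would do.
ARTIST_PREFIX = "XARTIST"


def normalise_artist(name: str, squash_spaces: bool = False) -> str:
    """Hypothetical normaliser: casefold, collapse punctuation and whitespace,
    and drop a leading "the" or a trailing ", the"."""
    s = unicodedata.normalize("NFKC", name).casefold().strip()
    # Drop a trailing ", the" (e.g. "Beatles, The") before punctuation is squashed.
    s = re.sub(r",\s*the$", "", s)
    # Replace each run of non-word characters (punctuation, whitespace, "_")
    # with a single space, then trim the ends.
    s = re.sub(r"[\W_]+", " ", s).strip()
    # Drop a leading "the " (e.g. "The Beatles").
    s = re.sub(r"^the ", "", s)
    if squash_spaces:
        # Variant that squashes separators to nothing: "duranduran".
        s = s.replace(" ", "")
    return s


def artist_term(name: str, squash_spaces: bool = False) -> str:
    """Build the prefixed exact-match term for an artist name."""
    return ARTIST_PREFIX + normalise_artist(name, squash_spaces)


if __name__ == "__main__":
    for n in ["Duran Duran", "duran-duran", "Duran Duran Duran", "The Beatles"]:
        print(repr(n), "->", repr(artist_term(n)))
```

With this, "Duran Duran" and "duran-duran" both map to "XARTISTduran duran", while "Duran Duran Duran" maps to a different term, so only the exact artist gets the extra match. In the Python bindings you would index the term with something like `doc.add_term(artist_term(name))` and combine it with the parsed query via `xapian.Query(xapian.Query.OP_OR, parsed_query, xapian.Query(artist_term(query_string)))`; treat that wiring as an outline to check against your Xapian version rather than a tested recipe.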