Felix Antonius Wilhelm Ostmann
2006-Dec-29 16:25 UTC
[Xapian-discuss] weighting of documents and terms
I want to sort the results by my own special weighting; I never use sort by relevance. But if I use sort_by_value it is really slow :( 20 times slower than sorting by relevance. Can I modify the relevance sort so it works for me? What must I do? The only way I see to modify it is to call set_weighting_scheme on the Enquire at search time.

The second problem is that I need a weight by value and term :-/ The value is perhaps 20 and this term has a weight of 2, so the value must be 40 for the sorting. Argh, so complex :-/ I can't find anything about this in the documentation :-/

Happy New Year btw :)

Best regards,
Felix Antonius Wilhelm Ostmann

On Fri, Dec 29, 2006 at 05:24:53PM +0100, Felix Antonius Wilhelm Ostmann wrote:
> I want to sort the results by my own special weighting; I never use
> sort by relevance. But if I use sort_by_value it is really slow :(
> 20 times slower than sorting by relevance.

Sort by value is slower because we need to read the values for candidate documents.

Incidentally, I think this could be improved by storing values differently. Currently we store all the values for a document together, keyed on the docid (one consequence is that storing extra values you don't use slows down use of values). I think it would be significantly better overall to store a stream for each value number, split into chunks (rather like we do already for posting lists).

> Can I modify the relevance sort so it works for me? What must I do?
> The only way I see to modify it is to call set_weighting_scheme on
> the Enquire at search time.

You don't seem to say what your "special weighting" is. If it's a pre-calculated weight for each document, you could store it as the wdf of a special extra term which indexes every document. Then a query Q becomes `X FILTER Q' and you can write a custom weighting scheme which returns the weight stored in the wdf of the special term X.

It's not really how this was expected to be used, but it would do the job. In a way, it's a quick hack implementation of the different way of storing values I describe above!

> The second problem is that I need a weight by value and term :-/
> The value is perhaps 20 and this term has a weight of 2, so the
> value must be 40 for the sorting.

I'm not sure I follow. Can you describe the situation in a bit more detail?

Cheers,
Olly
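
To make the `X FILTER Q' suggestion above a little more concrete, here is a minimal sketch of the idea as a toy in-memory model in Python. It deliberately does not use the Xapian API: the class ToyIndex, the method names, and the term name XWEIGHT are all hypothetical, made up for illustration. It only shows the shape of the trick, assuming the special weighting is one pre-calculated number per document: the weight is stored as the wdf of a special term X that indexes every document, the query Q only decides which documents match, and the ranking comes from the wdf of X.

```python
# Toy in-memory model of the "X FILTER Q" trick. This is NOT the Xapian API;
# every name below is hypothetical and exists only for illustration.
from collections import defaultdict

SPECIAL_TERM = "XWEIGHT"  # assumed name for the special term X


class ToyIndex:
    def __init__(self):
        # Posting lists: term -> {docid: wdf}
        self.postings = defaultdict(dict)

    def add_document(self, docid, terms, precomputed_weight):
        # Ordinary terms: wdf counts occurrences within the document.
        for term in terms:
            self.postings[term][docid] = self.postings[term].get(docid, 0) + 1
        # The special term indexes every document; its wdf carries the
        # pre-calculated weight.
        self.postings[SPECIAL_TERM][docid] = precomputed_weight

    def search_filtered(self, query_terms):
        """Model of `X FILTER Q`, where Q is an AND of query_terms.

        Q only selects documents; the returned weight is the wdf of X.
        """
        if not query_terms:
            return []
        matching = None
        for term in query_terms:
            docids = set(self.postings.get(term, {}))
            matching = docids if matching is None else matching & docids
        weights = self.postings[SPECIAL_TERM]
        ranked = [(docid, weights[docid]) for docid in matching]
        # Highest stored weight first; ties broken by docid.
        ranked.sort(key=lambda pair: (-pair[1], pair[0]))
        return ranked


if __name__ == "__main__":
    index = ToyIndex()
    index.add_document(1, ["apple", "pie"], precomputed_weight=20)
    index.add_document(2, ["apple"], precomputed_weight=40)
    index.add_document(3, ["pie"], precomputed_weight=99)
    print(index.search_filtered(["apple"]))  # [(2, 40), (1, 20)]
```

Document 3 has the highest stored weight but is left out of the result because it does not match Q. Because the weight sits in a posting list rather than in the per-document values, ranking reads only the posting list of X, which is why this acts as a quick stand-in for the per-value stream storage described above.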