On Wed, Jan 21, 2009 at 11:34:54PM +0100, amix wrote:
> I have tried to implement my own random weight, but that did not work
> out. I would also like this random sorting to perform well and work on
> big result sets.
Implementing a random weighting scheme in Python should be possible,
though the overhead of the callbacks might be an issue if you're working
with a lot of data (I've never profiled, but it's a potential issue as
there's at least one per query term per matching document).
If you're happy using SVN trunk, then BoolWeight plus a PostingSource
which returns a random weight boost between 0 and some fixed value
should do the job. That's one callback per matching document, which
is better for long queries.
> Is this possible (I would really like to see some example code if it's
> possible :-)). I am using Xapian from Python (which probably makes
> things harder).
There's no existing example, and I don't have the time to write one
at present (sorry).
> I could do random selects easily if counts were exact counts and not
> estimates - so returning exact counts would also solve my problem. I
> need performance though, so setting check_at_least to 1 million is
> not a solution (unless it performs really well).
It's probably worth investigating. A high check_at_least prevents various
early-termination optimisations, but it seems to me that anything which
picks random matches will prevent them too. Using check_at_least would
also avoid calling back into Python code.
Cheers,
Olly