thr3ads.net - Xapian discuss - [Xapian-discuss] Random ordering from Python [Jan 2009]


amix

2009-Jan-21 22:34 UTC

[Xapian-discuss] Random ordering from Python

I have tried different full text search engines (Lucene, Sphinx) and I
must say that Xapian is my favorite (full unicode support + live index
updates are really nice features).

I have run into a problem, though: sorting a result set randomly.

I have tried to implement my own random weight, but that did not work
out. I would also like this random sorting to perform well and work on
big result sets.

Is this possible (I would really like to see some example code if it's
possible :-)). I am using Xapian from Python (which probably makes
things harder).

I am using this random ordering to suggest a person some other
interesting people (for example, some users from Denmark that are
between 20 and 30 years old).

I could do random selects easily if counts were exact counts and not
estimates, so returning exact counts would also solve my problem. I
need performance, though, so setting check_at_least to 1 million is
not a solution (unless it performs really well).
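A minimal sketch of that random-select idea, assuming an exact match count is available: draw k distinct offsets uniformly from `range(count)` and fetch the result at each offset. The helper names are hypothetical and a plain Python list stands in for the result set; with the Xapian bindings each offset would presumably be fetched with `enquire.get_mset(offset, 1)`.

```python
import random


def random_offsets(total_matches, k, rng=None):
    """Pick up to k distinct result positions uniformly at random.

    Assumes total_matches is an exact count, not an estimate: with an
    estimate, some offsets could point past the real last match.
    """
    if rng is None:
        rng = random.Random()
    k = min(k, total_matches)
    # sample() draws without replacement, so no result is picked twice.
    return rng.sample(range(total_matches), k)


def pick_random_matches(matches, k, rng=None):
    """Return up to k random entries from a ranked result list.

    `matches` is a stand-in for the real result set (hypothetical); in
    Xapian each chosen offset would be fetched individually instead.
    """
    return [matches[i] for i in random_offsets(len(matches), k, rng)]
```

For example, `pick_random_matches(people, 5)` returns five distinct entries from `people`. The whole approach depends on `len(matches)` being exact, which is why estimated counts get in the way.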

Kind Regards,
Amir Salihefendic ( http://amix.dk/ )

Olly Betts

2009-Jan-22 00:50 UTC


[Xapian-discuss] Random ordering from Python

On Wed, Jan 21, 2009 at 11:34:54PM +0100, amix wrote:
> I have tried to implement my own random weight, but that did not work
> out. I would also like this random sorting to perform well and work on
> big result sets.
Implementing a random weighting scheme in Python should be possible,
though the overhead of the callbacks might be an issue if you're working
with a lot of data (I've never profiled, but it's a potential issue as
there's at least one per query term per matching document).

If you're happy using SVN trunk, then BoolWeight plus a PostingSource
which returns a random weight boost between 0 and some fixed value
should do the job.  That's one callback per matching document, which
is better for long queries.
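The effect of that combination can be modelled in plain Python. This is not Xapian code, and `random_boost_order` is a hypothetical name: BoolWeight gives every match the same weight (zero), so each document's final weight is just its random boost, and ranking by weight yields a random order. In Xapian the boost would come from a PostingSource subclass, which is not shown here.

```python
import random


def random_boost_order(docids, max_boost=1.0, rng=None):
    """Return docids ranked by a random weight, highest first.

    Models "BoolWeight plus a random boost": every match gets base weight
    0.0 (as BoolWeight would give) plus a boost drawn from [0, max_boost).
    """
    if rng is None:
        rng = random.Random()
    base_weight = 0.0  # BoolWeight: identical for every matching document
    # One draw per matching document, like one PostingSource callback each.
    weighted = [(base_weight + rng.random() * max_boost, docid) for docid in docids]
    # Rank as a search engine would: descending total weight.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [docid for _, docid in weighted]
```

Because the base weight is constant, only the boost decides the order, and drawing it once per matching document mirrors the one-callback-per-document cost mentioned above.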
> Is this possible (I would really like to see some example code if it's
> possible :-)). I am using Xapian from Python (which probably makes
> things harder).
There's no existing example, and I don't have the time to write one
at present (sorry).
> I could do random selects easily if counts were exact counts and not
> estimates, so returning exact counts would also solve my problem. I
> need performance, though, so setting check_at_least to 1 million is
> not a solution (unless it performs really well).
It's probably worth investigating. A high check_at_least prevents various
early-termination optimisations, but it seems to me that so will
anything which picks random matches. This approach would also avoid
calling back to Python code.

Cheers,
    Olly
