Hello,

I wonder if there is a way to make Xapian order a result set purely by docid. In other words, once the result set has been determined, I'd like the results returned to me ordered by docid rather than by match relevance.

The problem at hand is that I'm building a search engine for a mailing list and I would like to return matches sorted by date. Ordering by docid seems to be the simplest way to do so, since the messages are indexed in chronological order. But because I'm running a probabilistic query, I don't think I can use Enquire::set_docid_order, since that will first sort by relevance and only then by docid.

I thought about adding the date as a value and then using set_sort_by_value, but I wonder about performance (the database contains about one million records).

Any thoughts?

Thanks,

Marco

On Wed, Jun 29, 2005 at 12:14:34PM -0400, Marco Tabini wrote:
> The problem at hand is that I'm building a search engine for a mailing list
> and I would like to return matches sorted by date; ordering by docid (since
> the messages are indexed in chronological order) seems to be the simplest
> way to do so, but because I'm running a probabilistic query I don't think I
> can use Enquire::set_docid_order, since that will first sort by relevance
> and then by docid.

The answer is to use Enquire::set_weighting_scheme to set BoolWeight as the weighting scheme. This is suggested in the API docs for set_docid_order but it could be more explicit:

    Note: If you add documents in strict date order, then a boolean search
    with set_docid_order(Xapian::Enquire::DESCENDING) is a very efficient
    way to perform "sort by date, newest first".

So you want:

    Xapian::Enquire enq(db);  // db: your already-opened Xapian::Database
    // ...
    enq.set_docid_order(Xapian::Enquire::DESCENDING);
    enq.set_weighting_scheme(Xapian::BoolWeight());

This is the technique I'm using for gmane:

http://rain.gmane.org/

Currently DESCENDING is slower than ASCENDING, because ASCENDING can terminate early. I'm going to tweak things so posting lists are run backwards in the DESCENDING case, which should make it about as fast as ASCENDING.

Using BoolWeight does mean that you don't get the probabilistic weights, but that's probably not really a problem.

> I thought about adding the date as a value and then use set_sort_by_value,
> but I wonder about performance (the database contains about one million
> records).

That would be somewhat slower.

Cheers,

Olly
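
A note on the early-termination point in the message above. The following is a minimal, self-contained sketch of the idea, and it is not Xapian's actual matcher. `PostingList`, `Result`, and the three functions are hypothetical names made up for illustration. It assumes a boolean match has already produced a list of matching docids in ascending order, and it counts how many postings each strategy has to read to return the first few results.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a posting list: docids that match the boolean
// query, stored in ascending order. This is NOT a Xapian type.
using PostingList = std::vector<unsigned>;

struct Result {
    std::vector<unsigned> docids;  // the requested matches, in result order
    std::size_t visited = 0;       // how many postings had to be read
};

// Oldest first: reading forwards, the first `want` postings are already the
// answer, so the scan can stop as soon as it has them.
Result first_ascending(const PostingList& pl, std::size_t want) {
    assert(std::is_sorted(pl.begin(), pl.end()));
    Result r;
    for (unsigned id : pl) {
        if (r.docids.size() == want) break;
        ++r.visited;
        r.docids.push_back(id);
    }
    return r;
}

// Newest first while still reading forwards: the highest docids sit at the
// end of the list, so every posting must be read and only the last `want`
// are kept.
Result first_descending_forward(const PostingList& pl, std::size_t want) {
    assert(std::is_sorted(pl.begin(), pl.end()));
    Result r;
    for (unsigned id : pl) {
        ++r.visited;
        r.docids.push_back(id);
        if (r.docids.size() > want) r.docids.erase(r.docids.begin());
    }
    std::reverse(r.docids.begin(), r.docids.end());
    return r;
}

// Newest first, reading the list backwards: the same early exit as the
// ascending case becomes available.
Result first_descending_backward(const PostingList& pl, std::size_t want) {
    assert(std::is_sorted(pl.begin(), pl.end()));
    Result r;
    for (auto it = pl.rbegin(); it != pl.rend() && r.docids.size() < want; ++it) {
        ++r.visited;
        r.docids.push_back(*it);
    }
    return r;
}
```

With seven matching docids and three results requested, the forward ascending scan and the backward descending scan each read three postings, while the forward descending scan reads all seven. That gap is what running posting lists backwards is meant to close.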

On 6/29/05 1:02 PM, "Olly Betts" <olly@survex.com> wrote:
> On Wed, Jun 29, 2005 at 12:14:34PM -0400, Marco Tabini wrote:
>
> The answer is to use Enquire::set_weighting_scheme to set BoolWeight as the
> weighting scheme. This is suggested in the API docs for set_docid_order
> but it could be more explicit:
>
>     Note: If you add documents in strict date order, then a boolean search
>     with set_docid_order(Xapian::Enquire::DESCENDING) is a very efficient
>     way to perform "sort by date, newest first".
>

Thanks. I missed the bit about how to set the search to Boolean mode :)

Cheers,

Marco

--
Marco Tabini
President & CEO
Marco Tabini & Associates, Inc.
28 Bombay Ave.
Toronto, ON M3H 1B7
Canada
Phone: +1 (416) 630-6202
Fax: +1 (416) 630-5057