I hope I'm not beating a dead horse here, but we recently started evaluating Xapian and Xapwrap (a Python wrapper around it) for our project, and I've been quickly trying to soak up a lot of the docs and concepts, so I hope I can explain my question clearly.

We have a use case where we must return the first 50 most recent documents that match our query. We don't want the first 50 matches to the query that are then sorted by date. I hope the distinction is clear enough.

What we are unsure of from reading the documents is if setting a sort value on our query (enq.set_sort_by_value()) will return the first 50 documents that match the query, or the first 50 matches, then sorted by that value. I've read a couple of the threads on sorting dates, but I was unclear which approach would be needed to successfully execute this kind of query.

Any help from a kind Xapian soul would be much appreciated! We love what Xapian has been able to do for us so far, and hopefully there is a fairly easy way to do this kind of search.

Thanks!

-Michel

On Tue, Mar 21, 2006 at 10:56:32PM -0800, Michel Pelletier wrote:

> We have a use case where we must return the first 50 most recent
> documents that match our query. We don't want the first 50 matches to
> the query that are then sorted by date. I hope the distinction is clear
> enough. What we are unsure of from reading the documents is if setting
> a sort value on our query (enq.set_sort_by_value()) will return the
> first 50 documents that match the query, or the first 50 matches, then
> sorted by that value.

Hi, Michel. Olly is away right now; others may be able to back me up here, but my understanding is that set_sort_by_value() will give you the first N (documents that match the query sorted by the value). This is slower than not sorting for precisely this reason.

Does that answer your question? It's pretty easy to distinguish between the two - you only need about three documents in a db, and to request an mset of size 1.

J

--
James Aylett                                        xapian.org
james@tartarus.org                     uncertaintydivision.org

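The three-document test James suggests can be modelled in plain Python before trying it against a real database. This is only a sketch of the two candidate behaviours, not a call into Xapian: the sample documents, their weights, and both function names are made up for illustration.

```python
from operator import itemgetter

# Hypothetical toy "database": (docid, relevance weight, date as YYYYMMDD).
DOCS = [
    (1, 0.9, "20060101"),
    (2, 0.5, "20060322"),
    (3, 0.7, "20060215"),
]


def sort_then_truncate(docs, n):
    """Sort every match by date (newest first), then keep the first n."""
    return sorted(docs, key=itemgetter(2), reverse=True)[:n]


def truncate_then_sort(docs, n):
    """Keep the n best matches by weight, then sort only those by date."""
    top = sorted(docs, key=itemgetter(1), reverse=True)[:n]
    return sorted(top, key=itemgetter(2), reverse=True)


# Request a result set of size 1, as in the suggested experiment.
newest_overall = [d[0] for d in sort_then_truncate(DOCS, 1)]
newest_of_top = [d[0] for d in truncate_then_sort(DOCS, 1)]
```

With a result set of size 1 the two behaviours give different answers here: document 2 (the newest) if the sort covers all matches, document 1 (the highest weight) if the sort is applied only after truncation. Running the equivalent query against a small real database shows which one applies.
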
> We have a use case where we must return the first 50 most recent
> documents that match our query. We don't want the first 50 matches to
> the query that are then sorted by date. I hope the distinction is clear
> enough. What we are unsure of from reading the documents is if setting
> a sort value on our query (enq.set_sort_by_value()) will return the
> first 50 documents that match the query, or the first 50 matches, then
> sorted by that value.

Xapian first runs the query with the given query terms and then sorts by value. It sorts the whole result set, so you don't need to worry that only the first 50 matches get sorted.

When the query terms are blank it returns nothing, meaning the MSet will be empty. Therefore you must give something as query terms. I guess this is your main concern coming from a general RDBMS point of view, because in an RDBMS you can get all rows, or only the first 50, in date order.

You can add the date to the term list as a term such as 'D20060322' when indexing documents, and at the same time store the date in a value. You can then search for 'D2006*' or 'D200603*' and use set_sort_by_value() on the date value.

If docids are in the same order as the dates, you don't have to use set_sort_by_value().

My answer may not be entirely appropriate, and you may get better ideas from others.

Sungsoo Kim
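
The indexing scheme Sungsoo describes (the date stored both as a 'D'-prefixed term and as a sortable value) can be modelled roughly as follows. This is a plain-Python sketch rather than Xapian code: the in-memory INDEX dict, the helper names, and the sample dates are all hypothetical, and a simple prefix test stands in for a 'D2006*' style search.

```python
from datetime import date


def date_term(d):
    """Build the filter term for a date, e.g. 'D20060322'."""
    return "D" + d.strftime("%Y%m%d")


def date_value(d):
    """Build the sort value; YYYYMMDD strings sort in date order."""
    return d.strftime("%Y%m%d")


# Hypothetical stand-in for the index: docid -> (set of terms, date value).
INDEX = {}


def index_doc(docid, d):
    """Store the date twice: once as a term, once as a value."""
    INDEX[docid] = ({date_term(d)}, date_value(d))


def search(prefix, n):
    """Match documents having a term with this prefix, newest first, top n."""
    hits = [
        (value, docid)
        for docid, (terms, value) in INDEX.items()
        if any(t.startswith(prefix) for t in terms)
    ]
    hits.sort(reverse=True)  # sort all matches by date value, then truncate
    return [docid for _, docid in hits[:n]]


index_doc(1, date(2006, 3, 22))
index_doc(2, date(2005, 12, 31))
index_doc(3, date(2006, 1, 15))
index_doc(4, date(2006, 3, 2))

march_2006 = search("D200603", 50)
two_newest_2006 = search("D2006", 2)
```

The date term gives the query something non-empty to match on, while the value drives the ordering, which is why the scheme stores the same date in both places.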