Tim Brody
2011-Aug-11 07:49 UTC
[Xapian-discuss] Fwd: Re: what is the fastest way to fetch results which are sorted by timestamp ?
(Forwarded off-list message)

-------- Original Message --------
Subject: Re: [Xapian-discuss] what is the fastest way to fetch results which are sorted by timestamp ?
Date: Thu, 11 Aug 2011 01:06:36 +0800
From: 潘俊勇 <panjunyong at gmail.com>
To: Tim Brody <tdb2 at ecs.soton.ac.uk>

On Wed, Aug 10, 2011 at 6:39 PM, Tim Brody <tdb2 at ecs.soton.ac.uk> wrote:

> Hi,
>
> In terms of the enquiry, do you mean this?:
> set_weighting_scheme(Xapian::BoolWeight());
> set_docid_order(Xapian::Enquire::DESCENDING);

In my test, it is more than 10 times slower than:

set_weighting_scheme(Xapian::BoolWeight());
set_docid_order(Xapian::Enquire::ASCENDING);

Why?

> What's the most efficient process to build multiple Xapian indexes? Can
> the "relevance" index provide any hints to building the sorted indexes?
>
> Cheers,
> Tim.
>
> On Tue, 2011-08-09 at 18:04 +0100, Richard Boulton wrote:
> > On 9 August 2011 17:48, makao009 <makao009 at 126.com> wrote:
> > > what is the fastest way to fetch results which are sorted by timestamp ?
> >
> > The fastest possible way is to have your index sorted by timestamp
> > (ie, such that document IDs increase as the timestamp increases).
> > That way, the search can stop as soon as sufficient matches have been
> > found. It can be very awkward to get an index in such order though,
> > particularly in the face of updates, assuming that you want the sort
> > order to show most recent first.
> >
> > > i want to use xapian as my search engine , use
> > > add_boolean_term(something) and
> > > add_value(0,sortable_serialise(get_timestamp())) to a doc.
> > > search through enquire.set_weighting_scheme(xapian.BoolWeight()) and
> > > enquire.set_sort_by_value(0,True) to ensure that the results are sorted by
> > > the timestamp.
> >
> > That's another approach, certainly.
> >
> > > This method is ok , but is there a faster way to do that ? Since i have
> > > millions of records .
> >
> > Sorting the database, or some variant of that, is the way to get
> > really fast sorted results.
> >
> > There's a variation I experimented with using Xappy, involving sorting
> > as much of the database as possible, keeping track of the range of
> > document IDs for which the values were sorted, and using a custom
> > PostingSource to take advantage of that knowledge to skip past the
> > document IDs which were known to be at too low a value. This worked
> > pretty well (not quite as fast as using a fully sorted database), but
> > is quite fiddly to maintain the ordering (and you need to use a custom
> > PostingSource, so if you're using one of the language bindings, you'd
> > need to compile your own custom Xapian).
> >
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss

--
潘俊勇
http://everydo.com

--
All the best,
Tim.
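The partially sorted variation Richard describes in the quoted text can be illustrated with a toy model. The sketch below is not Xappy code and does not use the Xapian API: the function name `docids_at_or_above` and the plain list standing in for a value slot are hypothetical. It only shows the idea of tracking how far the docid range is sorted, skipping the part known to be too low, and checking the unsorted remainder one document at a time.

```python
from bisect import bisect_left
from typing import List


def docids_at_or_above(values: List[float], sorted_upto: int,
                       threshold: float) -> List[int]:
    """Return the 1-based docids whose value is >= threshold.

    values[i] is the sort value of docid i + 1 (a stand-in for a value
    slot). The first `sorted_upto` entries are assumed to be in ascending
    order; anything after that (for example, later updates) is unsorted.
    """
    # Binary search inside the sorted prefix: every docid before `start`
    # is known to be below the threshold and is skipped without being read.
    start = bisect_left(values, threshold, 0, sorted_upto)
    matches = [i + 1 for i in range(start, sorted_upto)]

    # The unsorted tail gives no such guarantee, so each entry is checked.
    matches += [i + 1 for i in range(sorted_upto, len(values))
                if values[i] >= threshold]
    return matches


if __name__ == "__main__":
    # Docids 1-5 are sorted by value; docids 6-8 were added later, unsorted.
    slot = [10, 20, 30, 40, 50, 35, 5, 60]
    print(docids_at_or_above(slot, sorted_upto=5, threshold=35))
```

The larger the sorted prefix, the more documents are skipped, which fits Richard's remark that the approach is close to, but not quite as fast as, a fully sorted database.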
Richard Boulton
2011-Aug-11 08:04 UTC
[Xapian-discuss] Fwd: Re: what is the fastest way to fetch results which are sorted by timestamp ?
On 11 August 2011 08:49, Tim Brody <tdb2 at ecs.soton.ac.uk> wrote:

>> set_weighting_scheme(Xapian::BoolWeight());
>> set_docid_order(Xapian::Enquire::DESCENDING);
>
> In my test, it is more than 10 times slower than :
>
> set_weighting_scheme(Xapian::BoolWeight());
> set_docid_order(Xapian::Enquire::ASCENDING);
>
> Why?

The xapian indexes for each term are stored in ascending order of
document ID. When performing a search, all the indexes for the terms
involved in the query are iterated through in parallel, in ascending
order of document ID. If the BoolWeight scheme is in use (or, more
generally, if the maximum weight that can be returned is 0), and the
result docid order is ASCENDING, as soon as sufficient matching
documents have been found the match process can stop, without getting
to the end of the indexes.

Unfortunately, it is not possible to iterate through the indexes in
reverse order, so if the order is DESCENDING, the match process has to
run to the end of the indexes, in order to find out what the last N
items were. This is obviously quite a bit slower.

--
Richard
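The difference Richard describes can be seen in a toy model of a forward-only posting list. This is a minimal sketch, assuming a single list of ascending docids and no weighting; the function names are hypothetical and it is not how Xapian's matcher is actually implemented. It only counts how many entries must be read to get N results in each order.

```python
from collections import deque
from typing import Iterable, List, Tuple


def first_n_ascending(postings: Iterable[int], n: int) -> Tuple[List[int], int]:
    """Return the first n docids and the number of entries read."""
    if n <= 0:
        return [], 0
    results: List[int] = []
    entries_read = 0
    for docid in postings:
        entries_read += 1
        results.append(docid)
        if len(results) == n:
            break  # enough matches: stop without reaching the end
    return results, entries_read


def last_n_descending(postings: Iterable[int], n: int) -> Tuple[List[int], int]:
    """Return the last n docids, highest first, and the number of entries read."""
    window = deque(maxlen=max(n, 0))  # keeps only the most recent n docids
    entries_read = 0
    for docid in postings:  # forward-only: must run to the end of the list
        entries_read += 1
        window.append(docid)
    return list(reversed(window)), entries_read


if __name__ == "__main__":
    postings = range(1, 100_001)  # ascending docids, as stored on disk
    print(first_n_ascending(postings, 10))
    print(last_n_descending(postings, 10))
```

In this model the ascending case reads only as many entries as it returns, while the descending case reads every entry in the list, so the gap grows with the size of the index.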
Tim Brody
2011-Aug-11 08:55 UTC
[Xapian-discuss] Fwd: Re: what is the fastest way to fetch results which are sorted by timestamp ?
On Thu, 11 Aug 2011 09:04:33 +0100, Richard Boulton <richard at tartarus.org> wrote:

> On 11 August 2011 08:49, Tim Brody <tdb2 at ecs.soton.ac.uk> wrote:
>>> set_weighting_scheme(Xapian::BoolWeight());
>>> set_docid_order(Xapian::Enquire::DESCENDING);
>>
>> In my test, it is more than 10 times slower than :
>>
>> set_weighting_scheme(Xapian::BoolWeight());
>> set_docid_order(Xapian::Enquire::ASCENDING);
>>
>> Why?
>
> The xapian indexes for each term are stored in ascending order of
> document ID. When performing a search, all the indexes for the terms
> involved in the query are iterated through in parallel, in ascending
> order of document ID. If the BoolWeight scheme is in use (or, more
> generally, if the maximum weight that can be returned is 0), and the
> result docid order is ASCENDING, as soon as sufficient matching
> documents have been found the match process can stop, without getting
> to the end of the indexes.
>
> Unfortunately, it is not possible to iterate through the indexes in
> reverse order, so if the order is DESCENDING, the match process has to
> run to the end of the indexes, in order to find out what the last N
> items were. This is obviously quite a bit slower.

I took that example from here:
http://xapian.org/docs/apidoc/html/classXapian_1_1Enquire.html#bbf7ff734ff6adcb301e493f6eed803b

Which says reverse-sorting by docid is "very efficient"?

--
All the best,
Tim.
潘俊勇
2011-Aug-12 06:20 UTC
[Xapian-discuss] Fwd: Re: what is the fastest way to fetch results which are sorted by timestamp ?
On Thu, Aug 11, 2011 at 7:20 PM, Richard Boulton <richard at tartarus.org> wrote:

> 2011/8/11 潘俊勇 <panjunyong at gmail.com>:
> > Thanks!
> > Is it possible to hack xapian to make descending sort faster? Could you
> > give me some hint?
>
> The relevant ticket is http://trac.xapian.org/ticket/52
>
> There's a very old patch attached to that ticket; it won't apply to
> current xapian code, but might be possible to bring up-to-date without
> too much difficulty.

This is very helpful for us. However, I found that the patch is for the
quartz backend, while the current database backend is chert. I know
nothing about the database internals. :-( I wonder whether I would have
to switch to an old Xapian version.

> --
> Richard

--
潘俊勇
http://everydo.com