It struck me that a lot of the complication in the handling of phrases is because we allow the terms to occur with gaps between them. However, in a lot of cases (probably even the majority), this generality isn't actually used, and the window size is equal to the number of terms.

So I think it's worth considering a special case for handling such "exact" phrases, as the much simpler code path is likely to speed up some cases. I'm hoping it might also serve as a basis for a better implementation of the general case.

Here's a patch (which I think should apply cleanly to 0.9.9):

http://oligarchy.co.uk/xapian/patches/faster-exact-phrases.patch

If you try it out, let us know how it does.

Cheers,
Olly
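To illustrate why the exact case is simpler, here is a minimal sketch. It is not the code from the patch, and `exact_phrase_matches` is a hypothetical helper, not part of the Xapian API. It assumes each term's positions within one document are available as a sorted vector. With window size equal to the number of terms, term i must sit exactly i positions after the first term, so no window bookkeeping is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Xapian API): positions[i] holds the sorted
// positions of the i-th phrase term within a single document.
// Returns true if the terms occur at consecutive positions, in order.
bool exact_phrase_matches(const std::vector<std::vector<unsigned>>& positions)
{
    if (positions.empty()) return false;
    // Try every occurrence of the first term as the start of the phrase.
    for (unsigned start : positions[0]) {
        bool ok = true;
        for (std::size_t i = 1; i < positions.size(); ++i) {
            const std::vector<unsigned>& p = positions[i];
            // Term i must appear exactly i positions after the start.
            unsigned wanted = start + static_cast<unsigned>(i);
            if (!std::binary_search(p.begin(), p.end(), wanted)) {
                ok = false;
                break;
            }
        }
        if (ok) return true;
    }
    return false;
}
```

The general case, where gaps are allowed, has to track a sliding window over all the position lists instead of testing one fixed offset per term, which is where most of the extra complexity comes from.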
Olly Betts wrote:
> Here's a patch (which I think should apply cleanly to 0.9.9):
>
> http://oligarchy.co.uk/xapian/patches/faster-exact-phrases.patch
>
> If you try it out, let us know how it does.

It does apply cleanly, but doesn't compile. The patch doesn't touch the Makefile.in files, only the Makefile.am files, so I edited those by hand as well. But after editing matcher/Makefile.in and rerunning ./configure, I still get this message:

../.libs/libxapian.so: undefined reference to `ExactPhrasePostList::ExactPhrasePostList(Xapian::PostingIterator::Internal*, std::vector<Xapian::PostingIterator::Internal*, std::allocator<Xapian::PostingIterator::Internal*> > const&)'

Best regards,
Arjen
Olly Betts wrote:
> So I think it's worth considering a special case for handling such
> "exact" phrases as the much simpler code path is likely to speed up
> some cases. I'm hoping it might also serve as a basis for a better
> implementation of the general case.

I don't see any noticeable difference between the code before this patch and the code with it applied, using the same set of phrase queries I used earlier.

Regards,
Arjen