Elliot Winkler
2008-Oct-09 15:16 UTC
[Xapian-devel] Sorting results by a "sort expression"
Olly,

We currently use Sphinx for our website search function, but we're planning on using Xapian instead for a few of the extra features it has. Our website is written in Ruby on Rails, so of course we're using Xapian with Ruby bindings. I don't know if you're familiar with Sphinx but Sphinx allows you to pass a sort expression when you execute the search that will be evaluated against each result in the result set to calculate the final weight of the result [1]. You can refer to the already calculated weight in the expression (which in this case would be the default BM25 weight) as well as any of the values matched along with that document.

In fact, this [2] is pretty much what we're looking for. In it I notice that you refer to "bias match" functionality and Rusty Conover's ExternalSourcePostList patch. Judging from past discussions [3] and svn [4] it appears that the bias match stuff was never really stable or something, so it was removed a while ago. Was ExternalSourcePostList meant to be dropped in as a replacement at some point in the future? I found two posts that Rusty made about it [5,6], but that's all -- have you heard from Rusty about it since? If so, how difficult would it be to integrate this with Xapian? (Unfortunately I am by no means a C++ expert -- it's been difficult enough just to stumble through the source code.)

Thanks for any insight you can give...

-- Elliot

[1] http://sphinxsearch.com/doc.html#sort-expr
[2] http://thread.gmane.org/gmane.comp.search.xapian.general/4075
[3] http://thread.gmane.org/gmane.comp.search.xapian.general/4774
[4] http://trac.xapian.org/changeset/8296
[5] http://thread.gmane.org/gmane.comp.search.xapian.devel/741
[6] http://thread.gmane.org/gmane.comp.search.xapian.general/4061
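
For readers unfamiliar with the idea, the sort expression described in this message can be pictured with a minimal sketch. It is plain Python, not the Sphinx or Xapian API: the `Hit` class, the `rerank` helper, the `popularity` value and the 0.5 factor are all hypothetical names and numbers chosen for illustration. Each hit carries the weight the engine already calculated plus its stored values, and the expression combines them into the final ranking score.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Hit:
    """One search result: the engine's weight plus its stored values."""
    docid: int
    weight: float  # assumed to be an already calculated score such as BM25
    values: Dict[str, float] = field(default_factory=dict)


def rerank(hits: List[Hit],
           expr: Callable[[float, Dict[str, float]], float]) -> List[Hit]:
    """Return hits ordered by expr(weight, values), highest score first."""
    return sorted(hits, key=lambda h: expr(h.weight, h.values), reverse=True)


def example_expr(weight: float, values: Dict[str, float]) -> float:
    # Hypothetical expression: relevance plus a small popularity boost.
    return weight + 0.5 * values.get("popularity", 0.0)


hits = [
    Hit(1, 2.0, {"popularity": 1.0}),  # 2.0 + 0.5 = 2.5
    Hit(2, 1.5, {"popularity": 4.0}),  # 1.5 + 2.0 = 3.5
    Hit(3, 2.2, {}),                   # 2.2 (no popularity value stored)
]
ranked_ids = [h.docid for h in rerank(hits, example_expr)]
```

Here document 2 overtakes document 1 despite a lower base weight, because the expression is free to mix the weight with a stored value.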
Elliot,

You are welcome to work on any features yourself that Sphinx has and Xapian doesn't. If a feature adds value to Xapian and preserves performance, then I see no reason not to include it in the Xapian source code.

Thanks,
Kevin Duraj
http://myhealthcare.com
On Thu, Oct 09, 2008 at 10:16:04AM -0500, Elliot Winkler wrote:
> Was ExternalSourcePostList meant to be dropped in as a replacement at
> some point in the future?

The functionality from the ExternalSourcePostList patch, plus the ability to supply weighting information, was added to SVN trunk a while ago, but is now called Xapian::PostingSource:

http://trac.xapian.org/browser/trunk/xapian-core/docs/postingsource.rst

Richard's been doing some work on adding support for multiple databases and for remote databases:

http://trac.xapian.org/ticket/295

This is too big a change to go in the 1.0 branch now. We're probably going to have a series of 1.1.x development releases leading to a stable 1.2 series, but we've not finally decided the details.

Cheers,
Olly
On Thu, Oct 09, 2008 at 10:16:04AM -0500, Elliot Winkler wrote:
> I don't know if you're familiar with Sphinx but Sphinx allows you
> to pass a sort expression when you execute the search that will be evaluated
> against each result in the result set to calculate the final weight of the
> result [1]. You can refer to the already calculated weight in the expression
> (which in this case would be the default BM25 weight) as well as any of the
> values matched along with that document.

You might also want to look at Xapian::Sorter, which allows building a sort key from document values.

You can't use the calculated weight to build the sort key though. If you want to involve the weight, you can sort by it either before or after the generated sort key.

This restriction could be lifted, but currently we avoid calculating the weight when we don't need it, so we would need to see if this actually makes a measurable difference here, and if so, allow the Sorter to say it wants the weight. Arjen's patch here adds passing the weight in a simple way:

http://thread.gmane.org/gmane.comp.search.xapian.general/6446/focus=6501

Xapian::Sorter was added in 1.0.5.

Cheers,
Olly
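
The restriction Olly describes can be illustrated with a minimal sketch in plain Python. This is not the Xapian::Sorter API: `make_key`, `sort_key_then_weight`, `sort_weight_then_key` and the tuple layout of a hit are hypothetical names made up for the example. The sort key is built from document values only and never sees the weight; the weight can only act as a separate criterion placed before or after that key.

```python
from typing import Dict, List, Tuple

# Hypothetical hit layout: (docid, weight, values keyed by slot number).
Hit = Tuple[int, float, Dict[int, str]]


def make_key(values: Dict[int, str], slots: List[int]) -> Tuple[str, ...]:
    """Build a sort key from the chosen value slots; it never sees the weight."""
    return tuple(values.get(slot, "") for slot in slots)


def sort_key_then_weight(hits: List[Hit], slots: List[int]) -> List[Hit]:
    # The key decides the order; weight (descending) only breaks ties.
    return sorted(hits, key=lambda h: (make_key(h[2], slots), -h[1]))


def sort_weight_then_key(hits: List[Hit], slots: List[int]) -> List[Hit]:
    # Weight (descending) decides the order; the key only breaks ties.
    return sorted(hits, key=lambda h: (-h[1], make_key(h[2], slots)))


hits = [
    (1, 1.0, {0: "a"}),
    (2, 3.0, {0: "b"}),
    (3, 3.0, {0: "a"}),
]
key_first = [h[0] for h in sort_key_then_weight(hits, [0])]
weight_first = [h[0] for h in sort_weight_then_key(hits, [0])]
```

With the key first, the two "a" documents lead and weight only orders them relative to each other. With the weight first, the two documents weighted 3.0 lead and the key only separates that pair. Neither ordering lets the key itself depend on the weight, which is what a Sphinx-style sort expression would need.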