Hi there,

I am trying to use Xapian with Perl, but I am having difficulty sorting results. I add postings with a wdf that depends on the position and format of the term, e.g.

$doc->add_posting("marketing", 1, 50);

When I search for "marketing", I expect the document with the largest wdf to come first, but it does not work that way.

delve ... -v -t marketing.|head produces:

1 447 11332
2 492 17863
3 268 5452
4 403 6768
5 448 9542
6 358 10792
7 362 6191
8 675 12712
9 311 7547
10 384 9814
11 316 6588
12 2306 160980
13 2038 118782
14 313 6509
15 490 10720
16 268 5167
17 537 12958
18 314 8072
19 317 7535
20 401 10683
21 316 4225

So I expect to see the results in the order 12 13 8 17 .., but I get:

ID 8 100%
ID 17 99%
ID 15 99%
ID 13 99%
ID 5 99%
ID 1 99%
ID 12 99%
ID 4 99%
ID 2 99%
ID 20 99%
ID 10 99%
ID 7 99%
ID 6 99%
ID 19 99%
ID 11 99%

Is it possible to get the desired result with Xapian?

Thanks!
Roki
On Fri, Mar 18, 2005 at 02:26:27PM +0100, roki roki wrote:

> I am trying to use Xapian in combination with Perl but I have some
> difficulties with sorting results. I am adding postings with wdf
> number which depends on position and format of term eg
> $doc->add_posting("marketing", 1, 50); When i search for "marketing"
> i expect that document with biggest wdf come as first but it does
> not work this way.

No, it doesn't. The ranking is based on the BM25 term weighting function, which takes into account the document length as well as the term wdf.

> Is it possible that get desired result with Xapian?

If you add a dummy term you never search over with a wdf such that all document lengths are the same, that might do it. However it will stop the normal relevance mechanism from working, and will probably cause you other problems further down the line.

What are you actually trying to achieve? There may be other approaches worth considering.

J

-- 
James Aylett
xapian.org
james@tartarus.org uncertaintydivision.org
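To make the effect of document length concrete, here is a minimal sketch of the textbook BM25 per-term factor applied to four of the postings from the delve listing. It is not Xapian's exact implementation, which has further parameters. The values k1 = 1.0 and b = 0.5, the function names, and the average length (the rounded mean of the 21 listed document lengths, since the real database average is not given in the thread) are all assumptions made for illustration.

```python
# Textbook BM25 per-term factor (an illustration, NOT Xapian's exact code):
# a higher wdf raises the factor, a longer-than-average document lowers it.
def bm25_term_factor(wdf, doc_len, avg_len, k1=1.0, b=0.5):
    norm_len = doc_len / avg_len  # document length relative to the average
    return wdf * (k1 + 1) / (wdf + k1 * ((1 - b) + b * norm_len))


# Assumption: rounded mean length of the 21 documents in the delve listing.
AVG_LEN = 21440.0

# docid -> (wdf, document length), copied from the delve listing.
POSTINGS = {
    8: (675, 12712),
    12: (2306, 160980),
    13: (2038, 118782),
    17: (537, 12958),
}


def rank(postings, avg_len=AVG_LEN):
    """Return docids ordered by descending term factor."""
    return sorted(
        postings,
        key=lambda docid: bm25_term_factor(*postings[docid], avg_len),
        reverse=True,
    )


ranked = rank(POSTINGS)
for docid in ranked:
    wdf, doc_len = POSTINGS[docid]
    print(docid, bm25_term_factor(wdf, doc_len, AVG_LEN))
```

Under these assumptions, documents 12 and 13 fall behind 8 and 17 despite their much larger wdf, because they are several times longer than the average document. All four factors also sit close to the k1 + 1 ceiling, which is consistent with nearly every hit being reported at 99%.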
On Fri, 2005-03-18 at 13:50 +0000, James Aylett wrote:

> > Is it possible that get desired result with Xapian?
>
> If you add a dummy term you never search over with a wdf such that all
> document lengths are the same, that might do it. However it will stop
> the normal relevance mechanism from working, and will probably cause
> you other problems further down the line.

Or, you could perform your match using different parameters for the BM25 weighting function, such that document length is ignored. Try using a BM25Weight object with the k2 and b parameters to the constructor set to 0.

(See the documentation of Enquire::set_weighting_scheme() at
http://www.xapian.org/docs/apidoc/html/classXapian_1_1Enquire.html#a4
and the documentation of the BM25Weight class at
http://www.xapian.org/docs/apidoc/html/classXapian_1_1BM25Weight.html#a0 )

-- 
Richard Boulton <richard@tartarus.org>
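As a rough check of why setting b to 0 should give the ordering Roki expected, the sketch below applies the textbook BM25 per-term factor to the full delve listing with b = 0. This is an illustration in plain Python, not a call into Xapian: the helper names and k1 = 1.0 are assumptions, and the sketch has no k2 term at all, so it only models the b part of the suggestion.

```python
# Postings from the original message, as flattened triples:
# docid, wdf, document length.
DELVE = """
1 447 11332 2 492 17863 3 268 5452 4 403 6768 5 448 9542 6 358 10792
7 362 6191 8 675 12712 9 311 7547 10 384 9814 11 316 6588 12 2306 160980
13 2038 118782 14 313 6509 15 490 10720 16 268 5167 17 537 12958
18 314 8072 19 317 7535 20 401 10683 21 316 4225
"""


def parse(text):
    """Split the flattened listing into (docid, wdf, doc_len) tuples."""
    nums = [int(tok) for tok in text.split()]
    return [tuple(nums[i:i + 3]) for i in range(0, len(nums), 3)]


def term_factor(wdf, doc_len, avg_len, k1=1.0, b=0.0):
    # With b = 0 the length term vanishes and the factor becomes
    # wdf * (k1 + 1) / (wdf + k1), which depends on wdf alone.
    return wdf * (k1 + 1) / (wdf + k1 * ((1 - b) + b * doc_len / avg_len))


postings = parse(DELVE)
# Assumption: average over the listed documents only.
avg_len = sum(doc_len for _, _, doc_len in postings) / len(postings)

order_b0 = [
    docid
    for docid, _, _ in sorted(
        postings,
        key=lambda p: term_factor(p[1], p[2], avg_len, b=0.0),
        reverse=True,
    )
]
by_wdf = [docid for docid, _, _ in sorted(postings, key=lambda p: p[1], reverse=True)]

print(order_b0[:4])
```

Because the factor is strictly increasing in wdf once b is 0, the resulting order is the same as sorting by wdf, and it starts 12, 13, 8, 17 as in the original question. Whether a real Xapian index reproduces this exactly depends on its actual weighting code and the other BM25 parameters.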