Hi there,
I am trying to use Xapian with Perl, but I am having some difficulty
sorting results.
I add postings with a wdf value that depends on the position and format of
the term, e.g. $doc->add_posting("marketing", 1, 50);
When I search for "marketing" I expect the document with the largest wdf
to come first, but it does not work this way.
delve ... -v -t marketing.|head produces:
1 447 11332
2 492 17863
3 268 5452
4 403 6768
5 448 9542
6 358 10792
7 362 6191
8 675 12712
9 311 7547
10 384 9814
11 316 6588
12 2306 160980
13 2038 118782
14 313 6509
15 490 10720
16 268 5167
17 537 12958
18 314 8072
19 317 7535
20 401 10683
21 316 4225
so I expect to see these results:
12
13
8
17
..
but I get this result instead:
ID 8 100%
ID 17 99%
ID 15 99%
ID 13 99%
ID 5 99%
ID 1 99%
ID 12 99%
ID 4 99%
ID 2 99%
ID 20 99%
ID 10 99%
ID 7 99%
ID 6 99%
ID 19 99%
ID 11 99%
Is it possible to get the desired result with Xapian?
Thanks!
Roki

On Fri, Mar 18, 2005 at 02:26:27PM +0100, roki roki wrote:
> I am trying to use Xapian in combination with Perl but I have some
> difficulties with sorting results. I am adding postings with wdf
> number which depends on position and format of term eg
> $doc->add_posting("marketing", 1, 50); When i search for "marketing"
> i expect that document with biggest wdf come as first but it does
> not work this way.

No, it doesn't. The ranking is based on the BM25 term weighting
function, which takes into account the document length as well as the
term wdf.

> Is it possible that get desired result with Xapian?

If you add a dummy term you never search over with a wdf such that all
document lengths are the same, that might do it. However it will stop
the normal relevance mechanism from working, and will probably cause
you other problems further down the line.

What are you actually trying to achieve? There may be other approaches
worth considering.

J

--
James Aylett
xapian.org
james@tartarus.org
uncertaintydivision.org
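
To make the point about document length concrete, here is a minimal
Python sketch of a textbook BM25-style wdf component applied to the
delve listing from the original post (the columns are read as docid,
wdf, document length). It is not Xapian's exact implementation. The
values k1 = 1 and b = 0.5, the function names, and the use of the mean
length of the 21 listed documents as the average document length are
all assumptions made for illustration.

```python
# docid -> (wdf, document length), copied from the delve output above.
POSTINGS = {
    1: (447, 11332), 2: (492, 17863), 3: (268, 5452), 4: (403, 6768),
    5: (448, 9542), 6: (358, 10792), 7: (362, 6191), 8: (675, 12712),
    9: (311, 7547), 10: (384, 9814), 11: (316, 6588), 12: (2306, 160980),
    13: (2038, 118782), 14: (313, 6509), 15: (490, 10720), 16: (268, 5167),
    17: (537, 12958), 18: (314, 8072), 19: (317, 7535), 20: (401, 10683),
    21: (316, 4225),
}


def bm25_wdf_component(wdf, doclen, avg_len, k1=1.0, b=0.5):
    """Textbook BM25 wdf factor; b controls how much length matters."""
    norm_len = doclen / avg_len  # > 1 for longer-than-average documents
    return wdf * (k1 + 1) / (wdf + k1 * ((1 - b) + b * norm_len))


def rank(postings, b=0.5):
    """Return docids ordered by the sketch score, best first."""
    # Assumption: the average length is taken over the listed docs only.
    avg_len = sum(doclen for _, doclen in postings.values()) / len(postings)
    return sorted(
        postings,
        key=lambda docid: bm25_wdf_component(*postings[docid], avg_len, b=b),
        reverse=True,
    )


print(rank(POSTINGS)[:5])
```

Under these assumptions document 8 ranks ahead of documents 12 and 13:
their much larger wdf values are offset by document lengths roughly ten
times those of the other documents.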

On Fri, 2005-03-18 at 13:50 +0000, James Aylett wrote:
> > Is it possible that get desired result with Xapian?
>
> If you add a dummy term you never search over with a wdf such that all
> document lengths are the same, that might do it. However it will stop
> the normal relevance mechanism from working, and will probably cause
> you other problems further down the line.

Or, you could perform your match using different parameters for the
BM25 weighting function, such that document length is ignored. Try
using a BM25Weight object with the k2 and b parameters to the
constructor set to 0.

(See the documentation of Enquire::set_weighting_scheme() at
http://www.xapian.org/docs/apidoc/html/classXapian_1_1Enquire.html#a4
and the documentation of the BM25Weight class at
http://www.xapian.org/docs/apidoc/html/classXapian_1_1BM25Weight.html#a0 )

--
Richard Boulton <richard@tartarus.org>
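
As a rough illustration of why setting b to 0 removes the length
effect, the Python sketch below uses a textbook BM25-style wdf
component rather than Xapian's own code. The separate k2 term is left
out entirely, which corresponds to k2 = 0. The function names, k1 = 1,
and the average length constant are assumptions for illustration; the
four rows are taken from the delve listing in the original post.

```python
def bm25_wdf_component(wdf, norm_len, k1=1.0, b=0.0):
    """Textbook BM25 wdf factor; with b = 0 norm_len drops out."""
    return wdf * (k1 + 1) / (wdf + k1 * ((1 - b) + b * norm_len))


# (docid, wdf, document length) for four documents from the delve output.
ROWS = [(8, 675, 12712), (12, 2306, 160980), (13, 2038, 118782),
        (17, 537, 12958)]

AVG_LEN = 21440.0  # assumed average document length, illustration only


def rank_ignoring_length(rows, avg_len=AVG_LEN):
    """Order docids by the b = 0 score, i.e. by wdf alone."""
    scored = [
        (bm25_wdf_component(wdf, doclen / avg_len, b=0.0), docid)
        for docid, wdf, doclen in rows
    ]
    return [docid for _, docid in sorted(scored, reverse=True)]


print(rank_ignoring_length(ROWS))
```

With b = 0 the score is an increasing function of wdf only, so the
order becomes 12, 13, 8, 17, which matches the order the original post
asked for. The scores themselves stay very close together for wdf
values this large, because the factor approaches k1 + 1 as wdf grows.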