thr3ads.net - Xapian devel - [Xapian-devel] NearPostList and get

Yann ROBIN

2008-Dec-28 15:01 UTC

[Xapian-devel] NearPostList and get_wdf

Hi,

I'm trying to make a near search that gives better scores to
documents where the words are nearer to each other.
So I thought that I could change the wdf in the NearPostList according
to the distance between words. But it seems that the get_wdf of the
NearPostList is never called ... Instead it's the get_wdf of the
ChertPostList that is called.
I don't think this is the intended behaviour. Should I open a ticket?

Thanks!

-- 
Yann

Yann ROBIN

2008-Dec-28 15:09 UTC

[Xapian-devel] NearPostList and get_wdf

On Sun, Dec 28, 2008 at 4:01 PM, Yann ROBIN <me.show at gmail.com> wrote:
> Hi,
>
> I'm trying to make a near search that gives better scores to
> documents where the words are nearer to each other.
> So I thought that I could change the wdf in the NearPostList according
> to the distance between words. But it seems that the get_wdf of the
> NearPostList is never called ... Instead it's the get_wdf of the
> ChertPostList that is called.
> I don't think this is the intended behaviour. Should I open a ticket?
>
> Thanks!
>
OK, I now understand why it is not called:

NearPostList inherits from SelectPostList, which simply forwards calls to a
given source postlist (which should be the database postlist).

So when get_weight is called on the NearPostList, it runs the
SelectPostList implementation, which calls source->get_weight().

source->get_weight() then calls get_wdf, but on the source postlist itself,
so it can never reach the NearPostList implementation ...
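The delegation described above can be sketched in a few lines of C++. This is only an illustration: the class names, members, and the weight formula below are hypothetical stand-ins, not the real Xapian classes or their signatures. It shows why overriding get_wdf in the outer postlist has no effect once get_weight is forwarded to the source.

```cpp
#include <memory>
#include <utility>

// Hypothetical interface; the real Xapian PostList API is richer than this.
struct PostList {
    virtual ~PostList() = default;
    virtual unsigned get_wdf() const = 0;
    virtual double get_weight() const = 0;
};

// Stands in for the database ("leaf") postlist, e.g. ChertPostList.
struct LeafPostList : PostList {
    unsigned wdf;
    explicit LeafPostList(unsigned wdf_) : wdf(wdf_) {}
    unsigned get_wdf() const override { return wdf; }
    // Made-up formula: the point is only that it uses this object's get_wdf().
    double get_weight() const override { return 2.0 * get_wdf(); }
};

// Stands in for SelectPostList: every call is forwarded to the source.
struct SelectLikePostList : PostList {
    std::unique_ptr<PostList> source;
    explicit SelectLikePostList(std::unique_ptr<PostList> src)
        : source(std::move(src)) {}
    unsigned get_wdf() const override { return source->get_wdf(); }
    double get_weight() const override { return source->get_weight(); }
};

// Stands in for NearPostList with an overridden get_wdf().
struct NearLikePostList : SelectLikePostList {
    using SelectLikePostList::SelectLikePostList;
    // Virtual dispatch inside source->get_weight() resolves on the source
    // object, so this override is never consulted for weighting.
    unsigned get_wdf() const override { return 100; }
};
```

With a leaf wdf of 3, calling get_wdf() on the outer object returns 100, but get_weight() still returns the value computed from the leaf's wdf.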

-- 
Yann

Richard Boulton

2008-Dec-29 12:50 UTC

[Xapian-devel] NearPostList and get_wdf

On Sun, Dec 28, 2008 at 04:01:07PM +0100, Yann ROBIN wrote:
> Hi,
>
> I'm trying to make a near search that gives better scores to
> documents where the words are nearer to each other.

Fair enough.  This isn't what the current NearPostList is intended to do -
the current NearPostList is used to implement the OP_NEAR operator, which
returns only those documents in which the terms occur within the specified
window size, but returns a weight calculated simply by adding the weights
of the component terms.  This is sometimes what is wanted, but it would be
nice to have a way to do a NEAR search which weighted results based on how
near the terms are.

> So I thought that I could change the wdf in the NearPostList according
> to the distance between words. But it seems that the get_wdf of the
> NearPostList is never called ... Instead it's the get_wdf of the
> ChertPostList that is called.

Indeed; the wdf is used in the weight calculation, and the weight
calculation is performed on each "leaf" postlist.

I'm not sure that modifying the wdf is really the way to go about this - it
seems to me that you might do better to use a custom weight class, which
factored in the frequencies of the individual terms, as well as their
proximity.
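As a rough illustration of what "factoring in proximity" could mean, here is a minimal, self-contained C++ sketch. It does not use the Xapian::Weight API; the function names and the bonus formula are assumptions made up for this example. It finds the smallest gap between two terms' positions and scales the summed term weights by a bonus that shrinks as the gap grows.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Smallest absolute distance between any position of term A and any
// position of term B. Both lists must be sorted in ascending order.
// Returns the maximum unsigned value if either list is empty.
unsigned min_distance(const std::vector<unsigned>& a,
                      const std::vector<unsigned>& b) {
    unsigned best = std::numeric_limits<unsigned>::max();
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        unsigned d = a[i] > b[j] ? a[i] - b[j] : b[j] - a[i];
        if (d < best) best = d;
        // Advance whichever list currently holds the smaller position.
        if (a[i] < b[j]) ++i; else ++j;
    }
    return best;
}

// Hypothetical scoring rule (an assumption, not Xapian's): the sum of the
// per-term weights, multiplied by a bonus that decays with the distance.
double proximity_weight(double weight_a, double weight_b, unsigned distance) {
    double base = weight_a + weight_b;
    return base * (1.0 + 1.0 / (1.0 + distance));
}
```

For positions {1, 10, 20} and {13, 40}, the minimum distance is 3, so two terms each weighted 1.0 score 2.5 under this made-up rule, while adjacent terms (distance 1) score 3.0.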

For an example of a postlist which combines several terms together and
calculates a weight on them, take a look at the SynonymPostList (and
corresponding OP_SYNONYM operator) on the "opsynonym" branch in SVN.  This
combines the wdfs of the terms being "synonymed" together, and passes that
into the standard weighting algorithm.  It has a few issues, though (which
is why it's not on trunk, yet).  See http://trac.xapian.org/ticket/50
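The idea of combining wdfs before weighting can be sketched as follows. This is not the code from the "opsynonym" branch: term_weight is a hypothetical saturating function standing in for whatever standard weighting scheme is in use, and synonym_weight is an assumed name for illustration.

```cpp
#include <numeric>
#include <vector>

// Hypothetical stand-in for a standard weighting formula: a simple
// saturating function of the wdf (not Xapian's actual scheme).
double term_weight(unsigned wdf, double k = 1.0) {
    return wdf / (wdf + k);
}

// Treat a group of terms as one: add their wdfs together first, then run
// the combined value through the weighting function once.
double synonym_weight(const std::vector<unsigned>& wdfs) {
    unsigned combined = std::accumulate(wdfs.begin(), wdfs.end(), 0u);
    return term_weight(combined);
}
```

With wdfs of 1 and 2 the combined wdf is 3, giving 0.75 under this toy formula, which is lower than weighting the two terms separately and adding the results. That difference is the reason for combining before weighting rather than after.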

> I don't think this is the intended behaviour. Should I open a ticket?

Feel free to open a feature request ticket, describing the feature that you
would like to exist.  OP_NEAR as it is currently implemented is behaving as
intended, though.

-- 
Richard
