thr3ads.net - Xapian devel - Questions about Weighting Schemes project [Apr 2019]


Sourav Saha

2019-Apr-04 09:12 UTC

Questions about Weighting Schemes project

Hi,
I was going through the Xapian code base of different weighting schemes. In
the lmweight code, I found out that we are returning non-negative numbers
from get_maxpart, get_sumpart methods. Is this to avoid negative weight?
Also in the Language Model with Jelinek Mercer Smoothing (LM-JM)
implementation, I don't see any idf effect involved in that equation. The
LM-JM equation looks something like this:
    LAMBDA * MLE(t,d) + (1-LAMBDA) * MLE(t,c)
However, if we combine it with idf, it looks like this:

    1 + (LAMBDA / (1-LAMBDA)) * (MLE(t,d) / MLE(t,c))

which is widely used. I am planning to submit a patch shortly with an
improved representation of LM-JM that includes the idf effect. Kindly let
me know if you have any concerns.

Thanks and Regards,
-Sourav
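
A minimal sketch of how the two forms quoted above relate, assuming the usual query-likelihood setting in which the log of the smoothed probability is summed over query terms. The function names and the numbers are hypothetical and are not taken from Xapian. Dividing the mixture by its document-independent part, (1-LAMBDA) * MLE(t,c), gives exactly the ratio form, so the two differ per term only by a factor that is the same for every document.

```python
import math

def jm_mixture(p_doc, p_coll, lam):
    """Plain form: LAMBDA * MLE(t,d) + (1-LAMBDA) * MLE(t,c)."""
    return lam * p_doc + (1.0 - lam) * p_coll

def jm_ratio_form(p_doc, p_coll, lam):
    """Ratio form: 1 + (LAMBDA / (1-LAMBDA)) * (MLE(t,d) / MLE(t,c))."""
    return 1.0 + (lam / (1.0 - lam)) * (p_doc / p_coll)

# Hypothetical values, chosen only for illustration.
lam = 0.7        # smoothing parameter LAMBDA
p_coll = 0.001   # MLE(t,c): relative frequency of the term in the collection
p_doc = 0.02     # MLE(t,d): relative frequency of the term in the document

mixture = jm_mixture(p_doc, p_coll, lam)
ratio = jm_ratio_form(p_doc, p_coll, lam)

# The part of the mixture that does not depend on the document.
doc_independent = (1.0 - lam) * p_coll

# mixture == doc_independent * ratio, up to floating-point rounding.
print(math.isclose(mixture, doc_independent * ratio))

# The ratio is never below 1, so its log is never negative; a term that is
# absent from the document (p_doc == 0) contributes exactly log(1) == 0.
print(math.log(ratio) >= 0.0)
print(math.log(jm_ratio_form(0.0, p_coll, lam)))
```

The rarer the term is in the collection (smaller MLE(t,c)), the larger the ratio becomes, which is where the idf-like behaviour comes from.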

Olly Betts

2019-Apr-05 06:01 UTC


Questions about Weighting Schemes project

On Thu, Apr 04, 2019 at 02:42:14PM +0530, Sourav Saha wrote:
> I was going through the Xapian code base of different weighting schemes. In
> the lmweight code, I found out that we are returning non-negative numbers
> from get_maxpart, get_sumpart methods. Is this to avoid negative weight?
Yes - Xapian requires that each term contribute a non-negative weight.
> Also in the Language Model with Jelinek Mercer Smoothing (LM-JM)
> implementation, I don't see any idf effect involved in that equation. The
> LM-JM equation looks something like this:
>     LAMBDA * MLE(t,d) + (1-LAMBDA) * MLE(t,c)
> However, if we combine it with idf, it looks like this:
>
>     1 + (LAMBDA / (1-LAMBDA)) * (MLE(t,d) / MLE(t,c))
>
> which is widely used. I am planning to submit a patch shortly with an
> improved representation of LM-JM that includes the idf effect. Kindly let
> me know if you have any concerns.
Interesting.  I wonder if our JM implementation is just wrong, or if
there are older and newer variants or something.

Do you have a reference handy?

Cheers,
    Olly
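
To illustrate the non-negativity requirement mentioned in the reply: the log of a probability is never positive, so a weight built directly from log-probabilities would be negative unless it is shifted or clamped. The sketch below shows one way to do that, by scaling the probability inside the log and flooring the result at zero. The function name and the `scale` parameter are hypothetical and are not taken from Xapian's source.

```python
import math

def clamped_log_weight(prob, scale):
    """Return log(scale * prob), floored at 0.

    `scale` is a hypothetical constant that lifts typical probabilities
    above 1 before the log is taken; anything still at or below 1 is
    clamped so the term never contributes a negative weight.
    """
    product = scale * prob
    return math.log(product) if product > 1.0 else 0.0

# A raw log-probability is negative for any probability below 1.
print(math.log(0.0141) < 0.0)

# With scaling, a reasonably likely term gets a positive weight ...
print(clamped_log_weight(0.0141, 1000.0) > 0.0)

# ... and a very unlikely one is clamped to zero rather than going negative.
print(clamped_log_weight(0.0003, 1000.0))
```

Clamping loses information for terms that fall below the threshold, which is one reason a form whose log is non-negative by construction can be attractive.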
