On Thu, Apr 04, 2019 at 02:42:14PM +0530, Sourav Saha wrote:
> I was going through the Xapian code base of different weighting schemes. In
> the lmweight code, I found out that we are returning non-negative numbers
> from get_maxpart, get_sumpart methods. Is this to avoid negative weight?
Yes - Xapian requires that each term contribute a non-negative weight.
> Also in the Language Model with Jelinek Mercer Smoothing (LM-JM)
> implementation, I don't see any idf effect involved in that equation. The
> LM-JM equation looks something like this:
> (LAMBDA) * MLE(t,d) + (1-LAMBDA) * MLE(t,c)
> However, if we bind it with idf, it will look like :
>
> 1 + ((LAMBDA) / (1-LAMBDA) * (MLE(t,d) / MLE(t,c)))
> which is widely used everywhere. I am planning to patch an improved
> representation of LM-JM with the idf effect shortly. Kindly let me know for
> any concerns.
Interesting. I wonder if our JM implementation is just wrong, or if
there are older and newer variants or something.
Do you have a reference handy?
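
As a quick sanity check on how the two quoted forms relate, here is a
minimal sketch. It assumes the per-term score is the log of the smoothed
probability (the quoted equations don't show a log, so that is an
assumption), and the function names, LAMBDA value and probabilities are
made up for illustration; none of this is Xapian's API. Under that
assumption the two forms differ only by log((1-LAMBDA) * MLE(t,c)), which
doesn't depend on the document, and the log of the second form is never
negative:

```python
import math

def jm_mixture(lam, p_doc, p_coll):
    """First quoted form: LAMBDA * MLE(t,d) + (1-LAMBDA) * MLE(t,c)."""
    return lam * p_doc + (1.0 - lam) * p_coll

def jm_ratio(lam, p_doc, p_coll):
    """Second quoted form: 1 + (LAMBDA / (1-LAMBDA)) * (MLE(t,d) / MLE(t,c))."""
    return 1.0 + (lam / (1.0 - lam)) * (p_doc / p_coll)

lam = 0.7        # hypothetical smoothing parameter
p_coll = 0.001   # hypothetical MLE(t,c), same for every document

for p_doc in (0.0, 0.01, 0.05):   # hypothetical MLE(t,d) values
    log_mixture = math.log(jm_mixture(lam, p_doc, p_coll))
    log_ratio = math.log(jm_ratio(lam, p_doc, p_coll))
    # Document-independent offset between the two forms.
    offset = math.log((1.0 - lam) * p_coll)
    assert math.isclose(log_mixture, offset + log_ratio)
    # The ratio form is >= 1, so its log is a non-negative weight.
    assert log_ratio >= 0.0
```

Since MLE(t,c) sits in the denominator of the second form, rarer terms
get a larger boost for the same MLE(t,d), which is presumably the
idf-like effect being described.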
Cheers,
Olly