On Sat, Mar 05, 2016 at 07:17:40AM +0530, jaideep singh chauhan wrote:
> I went through the list of your proposed projects and the one that
> fascinates me the most is Ranking:Weighting schemes.
>
> The project proposes incorporating more weighting schemes, and so
> what I wanted to know is what kind of schemes we are looking to
> incorporate: are they other probabilistic weighting schemes similar
> to BM25, or is there scope for improvements in the language modeling?
> Both of the above-mentioned classes of schemes are highly
> parameter-dependent, so we could also look into some non-parametric
> weighting schemes. Based on the response I can come up with the pros
> and cons of the various weighting schemes.

Hi, Jaideep -- welcome to Xapian!

In terms of which schemes to implement, we'd want you to propose what
you think is worth adding. You'll note that the project description
talks specifically about further schemes and options from SMART, and
parameter-free DfR.
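Jaideep's point about parameter dependence is easy to see concretely: the textbook BM25 term weight has free parameters k1 and b that need tuning per collection. A minimal sketch of the standard formula (not Xapian's implementation; the toy numbers are made up):

```python
import math

def bm25_weight(tf, df, N, doclen, avg_doclen, k1=1.2, b=0.75):
    """Textbook BM25 weight for one term in one document.

    tf: term frequency in the document
    df: number of documents containing the term
    N: total number of documents in the collection
    doclen / avg_doclen: inputs to document-length normalisation
    k1, b: the free parameters under discussion
    """
    idf = math.log((N - df + 0.5) / (df + 0.5))
    # b controls how strongly document length is normalised.
    norm = 1 - b + b * (doclen / avg_doclen)
    # k1 controls term-frequency saturation.
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# Toy collection: 1000 documents, term appears in 50 of them.
w_short = bm25_weight(tf=3, df=50, N=1000, doclen=80, avg_doclen=100)
w_long = bm25_weight(tf=3, df=50, N=1000, doclen=300, avg_doclen=100)
```

With b > 0 the same term frequency scores lower in a longer document; choosing k1 and b per collection is exactly the kind of parameter sensitivity being discussed.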

On the language modelling front, there may be other models worth
considering, although it would be worth showing some recent academic
work that justifies them for IR. Note the comment in Manning et al.
(2008, 12.1.2, p241) that:

| However, most language-modeling work in IR has used unigram language
| models. IR is not the place where you most immediately need complex
| language models

Although it does note that there may be value in more sophisticated
models for phrase and proximity queries in particular, with some
references to recent work (12.5, p252).
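For the unigram case the book describes, the query-likelihood approach scores a document by the probability its unigram language model assigns to the query, smoothed against the collection model. A sketch of the textbook Jelinek-Mercer-smoothed form (not Xapian's implementation; the mixing parameter lam and the toy counts are assumptions):

```python
import math

def query_log_likelihood(query, doc_tf, doclen, coll_tf, coll_len, lam=0.5):
    """Smoothed unigram query likelihood: log P(q | d).

    Mixes the document's term distribution with the collection's
    (Jelinek-Mercer smoothing); lam is a free parameter -- another
    example of the parameter dependence under discussion.
    """
    score = 0.0
    for term in query:
        p_doc = doc_tf.get(term, 0) / doclen
        p_coll = coll_tf.get(term, 0) / coll_len
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score

# Toy collection statistics and two documents of length 100.
coll_tf = {"cat": 10, "dog": 10, "fish": 5}
coll_len = 1000
d1 = {"cat": 3, "dog": 1}
d2 = {"fish": 4}
query = ["cat", "dog"]
s1 = query_log_likelihood(query, d1, 100, coll_tf, coll_len)
s2 = query_log_likelihood(query, d2, 100, coll_tf, coll_len)
```

Smoothing keeps the score finite for d2 even though it contains neither query term, while still ranking the matching document d1 higher.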
J

Manning C. D., Raghavan P. and Schütze H. (2008), Introduction to
Information Retrieval, Cambridge University Press. Available online at
<http://nlp.stanford.edu/IR-book/>.
--
James Aylett, occasional trouble-maker
xapian.org