On Tue, Mar 20, 2012 at 03:28:30AM +0530, Chandan kumar Singh wrote:
> I was fascinated by Xapian's work
> and am really interested in the *Weighting Schemes* project. Currently, I
> am doing literature review on some weighting schemes other than BM25. At
> present, I am studying the Divergence from Randomness family.
> It would be of great help if someone at Xapian could help me get started.
> In particular, I would like to know in more detail about Xapian's
> expectations from this project and what really would make a great proposal
> for it.
We don't really have a detailed plan for any of the projects.  The idea
is that you should explore the project yourself: what needs to be done,
and how best to go about it.  That way you can come up with a proposal
which reflects your own interests and which you'll enjoy working on
more, and we can see how you approach problems.
That said, it would be good if this project produced some results which
were widely useful, and I think some of the DfR schemes are most likely
to do that - there are schemes which are fairly robust to keyword
spamming (which is useful when indexing content you have little or no
control over), and which are parameter free (tuning weighting scheme
parameters is rarely done outside of academia in my experience).
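As a concrete illustration of "parameter free" and "robust to keyword spamming", here is a minimal Python sketch of DPH, one parameter-free DfR scheme. It follows the formula as implemented in the Terrier IR platform. It is not Xapian's API, and the function name and arguments are hypothetical. Unlike BM25 there is no k1 or b to tune. The (1 - f)^2 / (tf + 1) normalisation, where f = tf / document length, shrinks the weight as a single term comes to dominate a document.

```python
import math


def dph_weight(tf, doc_len, avg_doc_len, num_docs, coll_freq, query_tf=1):
    """Per-term DPH score, following Terrier's formulation (a sketch).

    tf          -- occurrences of the term in this document
    doc_len     -- length of this document (in terms)
    avg_doc_len -- average document length in the collection
    num_docs    -- number of documents in the collection
    coll_freq   -- total occurrences of the term in the whole collection
    query_tf    -- occurrences of the term in the query
    """
    if tf <= 0 or tf >= doc_len:
        # A term making up the whole document gives f == 1, so the
        # normalisation factor is zero; this also avoids log2(0) below.
        return 0.0
    f = tf / doc_len
    # No tunable parameters: this factor falls off as f approaches 1,
    # which limits the payoff from stuffing a document with one keyword.
    norm = (1.0 - f) ** 2 / (tf + 1.0)
    info = tf * math.log2((tf * avg_doc_len / doc_len) * (num_docs / coll_freq))
    info += 0.5 * math.log2(2.0 * math.pi * tf * (1.0 - f))
    return query_tf * norm * info


if __name__ == "__main__":
    # Hypothetical collection statistics, for illustration only.
    stats = dict(avg_doc_len=100.0, num_docs=1000, coll_freq=50)
    print("normal :", dph_weight(3, 100, **stats))
    print("stuffed:", dph_weight(90, 100, **stats))
```

With these made-up statistics, the document where the term fills 90% of the text scores far lower than the one that mentions it three times.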
Other schemes are also interesting - some may suit specialised
situations, and they could serve as features to feed into the Learning
to Rank framework, or as baselines for academic work.
It would also be interesting to see some evaluations of how the new
schemes compare with the existing ones, and with each other, on
different sorts of data, in terms of both retrieval effectiveness and
speed, so that the documentation can recommend to users which scheme to
use and when.
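On the effectiveness side, one common measure is mean average precision (MAP) over a set of queries with relevance judgements. The Python sketch below assumes that each scheme's rankings and the judgements are available as plain dicts. The function names and data are hypothetical. In practice a standard tool such as trec_eval would normally do this, and speed would be measured separately.

```python
def average_precision(ranking, relevant):
    """Average precision of one ranked list of doc ids against a set of relevant ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    # Relevant documents never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """runs: query id -> ranked doc ids; qrels: query id -> relevant doc ids."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical judgements and runs for two weighting schemes.
    qrels = {"q1": {"d1", "d3"}}
    bm25_run = {"q1": ["d1", "d2", "d3"]}
    dfr_run = {"q1": ["d3", "d1", "d2"]}
    print("BM25:", mean_average_precision(bm25_run, qrels))
    print("DfR :", mean_average_precision(dfr_run, qrels))
```

Running the same query set and judgements against each scheme, on several kinds of collection, gives directly comparable numbers.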
Cheers,
    Olly