On 2 Mar 2015, at 11:55, Ayush Tomar <ayushtomar at gmail.com> wrote:
> I'm Ayush Tomar, junior undergrad in Computer Science from New Delhi,
> India. I love C++ coding and working on machine learning and information
> retrieval projects. I was exploring the GSoC ideas for Xapian and the project on
> "Adding Weighting Schemes" looked really interesting to me. I wanted
> to work on text mining/IR this summer and this idea seems perfect!
Hi Ayush. Xapian hasn't been accepted as a mentoring organisation for GSoC
this year. However, if you're interested in working on this (or any other)
project outside GSoC then we can still provide the same support we would have
done as part of GSoC.
> I have gone through the getting started guide and started to understand
> how the Xapian code is connected and to identify which parts I need to focus on
> for the project. I have started to research what similar and new schemes could
> be added to Xapian. It'll be a great help if someone could suggest a weighting
> scheme on which I should focus for the entry level task.
Many weighting schemes we'd like to add require tracking more statistics,
meaning they aren't really entry-level things (this may not be true for some of
the DfR ones we don't have as yet). You could perhaps improve branch coverage in
tests for the weighting schemes (see
http://lcov.xapian.org/latest/weight/index.html); for instance it looks like
there are various automatic adjustments of smoothing parameters in LMWeight that
aren't tested under all conditions (see
http://lcov.xapian.org/latest/weight/lmweight.cc.gcov.html; anything in orange
isn't being tested, and will require writing a careful unit test to exercise it
and ensure it does what it's supposed to). Alternatively, any other small project
will get you working with the code (and give you an opportunity to get one or
two small commits in, which is valuable in getting familiar with PRs to Xapian,
how commits should look &c).
If you aren't able to get involved with Xapian outside GSoC, then of course you
can ignore all of this, but hopefully you'll be able to in some way either over
the summer or at some other time! Just shout out if you need any help.
J
--
James Aylett, occasional trouble-maker
xapian.org