thr3ads.net - Xapian devel - [Xapian-devel] GSOC - 2013 - Introduction (Learning to Rank) [Mar 2013]

Mudit Gupta

2013-Mar-21 13:28 UTC

[Xapian-devel] GSOC - 2013 - Introduction (Learning to Rank)

Hello Everyone,

I am looking forward to contributing to Xapian and to applying as a Google
Summer of Code student. I would like to start by introducing myself. I am a
final-year M.Sc. (H) Chemistry and B.E. (H) Electronics and Instrumentation
student at BITS - Pilani, Goa. I am interested in Machine Learning and am
presently pursuing my thesis on the same. I have been selected for Google
Summer of Code (GSoC) twice. During GSoC 2011, I worked for the Centre for
the Study of Complex Systems, University of Michigan, on building an
evolutionary model for swarms
(http://code.google.com/p/cscs-repast-demos/wiki/Mudit), and last summer,
in GSoC 2012, I worked on building HLA interfaces for the network
simulator ns-3 to aid it in distributed simulation
(http://www.nsnam.org/wiki/index.php/GSOC2012HLA). My first-degree thesis
last semester was on computational cognitive modelling of artificial
investors
(http://code.google.com/p/multiagent-reinforcement-learning/downloads/list).
This semester I am working on the use of probabilistic algorithms for
dynamic gesture recognition. My Google Code profile can be found here
(http://code.google.com/u/110675325175605367090/) and my LinkedIn profile
can be found here
(http://www.linkedin.com/profile/view?id=79832898&trk=tab_pro).

I am interested in the "Learning To Rank" project. If I am not wrong, I
found the framework incorporated by Parth in the code I cloned. It needed
some refactoring in order to incorporate more algorithms; this was done by
Rishabh and is available in his git repo
(https://github.com/rishabhmehrotra/xapian) but is still not merged. So I
assume I should think of additions to the code in Rishabh's repo. Moreover,
I noticed that SVM-rank, ListMLE and ListNet are already present in the
code. I am interested in adding a random forest approach and am looking for
appropriate libraries. It would be great to get input from the Xapian
community on preferred algorithms and open source libraries. It would also
be great to know the priority of the Letor project to the Xapian community.

Any further pointers/links related to the "Letor" project would be
appreciated. Thank you for your time.

Best Regards,

Mudit Raj Gupta
Olly Betts

2013-Mar-21 22:37 UTC

On Thu, Mar 21, 2013 at 06:58:41PM +0530, Mudit Gupta wrote:
> I am interested in "Learning To Rank" project.  If I am not wrong, I found
> the framework incorporated by Parth in the cloned code. It needed some
> refactoring in order to incorporate more algorithms and was done by Rishabh
> and available in his git repo (https://github.com/rishabhmehrotra/xapian)
> but is still not merged. So, I assume I should think of additions to the
> code in Rishbh's repo.
Yes, I think that's the best starting point.
> Moreover, I noticed that SVM-rank, ListMLE and
> ListNet is already present in the code. I am interested in addition of a
> random forest approach and looking for appropriate libraries. I would be
> great to get input by the Xapian community in terms of preference of
> algorithms and open source libraries. It would also be great to know the
> priority of the Letor project to the Xapian community.
Parth and I talked this over recently, and we concluded that this year a
major focus should be on consolidating the existing work.  That doesn't
necessarily mean that new features can't be looked at, but one of the
deliverables should really be a xapian-letor module which we're happy to
tag as a stable release.  A project which adds more algorithms is
interesting, but if the end result isn't useful to Xapian users, there's
much less benefit to be had from it.

One of the major things missing is a testsuite.  Without any automated
tests, it's hard to have much confidence that the code works, and it
makes it much harder to make changes to the code in the future without
introducing new bugs.  So I think adding a testsuite is important.
The harness from xapian-core is suitable, but testcases need writing,
and the bugs that actually writing testcases will inevitably uncover
need fixing.
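
As a rough illustration of the sort of testcase meant here, the sketch
below checks one small function in isolation.  It deliberately doesn't
use the xapian-core harness: the CHECK macro is a minimal stand-in, and
normalise_features() is a made-up function for the example, not something
taken from xapian-letor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <vector>

// Hypothetical function under test (not from xapian-letor): scale feature
// values into [0, 1] by dividing each one by the maximum value.
std::vector<double> normalise_features(const std::vector<double>& values) {
    std::vector<double> out(values);
    if (out.empty()) return out;
    double max = *std::max_element(out.begin(), out.end());
    if (max <= 0.0) return out;  // Nothing sensible to scale by.
    for (double& v : out) v /= max;
    return out;
}

// Minimal stand-in for a harness check: report the failed condition and exit.
#define CHECK(cond) \
    do { \
        if (!(cond)) { \
            std::cerr << "FAILED: " #cond << " (line " << __LINE__ << ")\n"; \
            std::exit(1); \
        } \
    } while (0)

void test_normalise_features() {
    // Empty input stays empty.
    CHECK(normalise_features({}).empty());
    // The largest value maps to 1 and the others scale proportionally.
    std::vector<double> r = normalise_features({2.0, 4.0, 1.0});
    CHECK(r.size() == 3);
    CHECK(r[0] == 0.5);
    CHECK(r[1] == 1.0);
    CHECK(r[2] == 0.25);
    // All-zero input is returned unchanged rather than dividing by zero.
    std::vector<double> z = normalise_features({0.0, 0.0});
    CHECK(z[0] == 0.0 && z[1] == 0.0);
}
```

Calling test_normalise_features() from main() runs the checks.  Edge cases
like the empty and all-zero inputs are typically where writing testcases
turns up bugs.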

We should also look at what features are missing from xapian-core
which would be useful for xapian-letor, and consider implementing them -
especially if they have other potential uses.  Two that I'm aware of
are:

* Fundamentally, xapian-letor wants to take a Xapian::MSet object and
  reorder it, so an API which allows that would be handy - then the
  output of xapian-letor can be an Xapian::MSet object, allowing it to
  be cleanly slotted into existing applications using the Xapian API.
  An MSet reordering API also has other potential uses - for example,
  clustering results.

* Field-related features currently have to be calculated specially by
  xapian-letor, but these would also be useful to have for other uses
  (e.g. implementing BM25f) so tracking them in the database backend
  in xapian-core is worth investigating.
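
To make the reordering idea in the first point concrete, here's a minimal
standalone sketch.  It is not the Xapian API: RankedDoc is a hypothetical
stand-in for one entry of a result set, and rerank() is an assumed name for
the operation.  The point is just that the output has the same type as the
input, so existing code which consumes results needn't change.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for one entry of a result set: a document id and
// the weight it was ranked by.  Not a Xapian type.
struct RankedDoc {
    unsigned docid;
    double weight;
};

// Re-score every entry with `scorer` and return the entries sorted by the
// new weight, highest first.  A stable sort keeps the original relative
// order of entries whose new weights are equal.
template <typename Scorer>
std::vector<RankedDoc> rerank(const std::vector<RankedDoc>& results,
                              Scorer scorer) {
    std::vector<RankedDoc> out;
    out.reserve(results.size());
    for (const RankedDoc& d : results) {
        out.push_back({d.docid, scorer(d)});
    }
    std::stable_sort(out.begin(), out.end(),
                     [](const RankedDoc& a, const RankedDoc& b) {
                         return a.weight > b.weight;
                     });
    return out;
}
```

A learned ranking model or a clustering step would be plugged in as the
scorer; the caller just gets back a reordered list of the same documents.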

I'll update the entry on the project ideas page with the above shortly.

Cheers,
    Olly
