Mudit Gupta
2013-Mar-21 13:28 UTC
[Xapian-devel] GSOC - 2013 - Introduction (Learning to Rank)
Hello Everyone,

I am looking forward to contributing to Xapian and to applying as a Google Summer of Code student. I would like to start by introducing myself.

I am a final-year M.Sc. (H) Chemistry and B.E. (H) Electronics and Instrumentation student at BITS - Pilani, Goa. I am interested in Machine Learning and am presently pursuing my thesis on the same. I have been selected for Google Summer of Code (GSoC) twice. During GSoC 2011, I worked for the Centre for the Study of Complex Systems, University of Michigan, on an evolutionary model for swarms (http://code.google.com/p/cscs-repast-demos/wiki/Mudit), and last summer, in GSoC 2012, I worked on HLA interfaces for the network simulator ns-3 to aid it in distributed simulation (http://www.nsnam.org/wiki/index.php/GSOC2012HLA). My first-degree thesis last semester was on computational cognitive modelling of artificial investors (http://code.google.com/p/multiagent-reinforcement-learning/downloads/list). This semester I am working on the use of probabilistic algorithms for dynamic gesture recognition.

My Google Code profile can be found here (http://code.google.com/u/110675325175605367090/) and my LinkedIn profile can be found here (http://www.linkedin.com/profile/view?id=79832898&trk=tab_pro).

I am interested in the "Learning To Rank" project. If I am not mistaken, the framework implemented by Parth is in the code I cloned. It needed some refactoring to accommodate more algorithms; Rishabh did that work, which is available in his git repo (https://github.com/rishabhmehrotra/xapian) but has not yet been merged. So, I assume I should think of additions to the code in Rishabh's repo.

Moreover, I noticed that SVM-rank, ListMLE and ListNet are already present in the code. I am interested in adding a random forest approach and am looking for appropriate libraries. It would be great to get input from the Xapian community on preferred algorithms and open source libraries.
It would also be great to know how high a priority the Letor project is for the Xapian community. Any further pointers or links related to the "Letor" project would be appreciated.

Thank you for your time.

Best Regards,
Mudit Raj Gupta
Olly Betts
2013-Mar-21 22:37 UTC
[Xapian-devel] GSOC - 2013 - Introduction (Learning to Rank)
On Thu, Mar 21, 2013 at 06:58:41PM +0530, Mudit Gupta wrote:
> I am interested in "Learning To Rank" project. If I am not wrong, I found
> the framework incorporated by Parth in the cloned code. It needed some
> refactoring in order to incorporate more algorithms and was done by Rishabh
> and available in his git repo (https://github.com/rishabhmehrotra/xapian)
> but is still not merged. So, I assume I should think of additions to the
> code in Rishbh's repo.

Yes, I think that's the best starting point.

> Moreover, I noticed that SVM-rank, ListMLE and
> ListNet is already present in the code. I am interested in addition of a
> random forest approach and looking for appropriate libraries. I would be
> great to get input by the Xapian community in terms of preference of
> algorithms and open source libraries. It would also be great to know the
> priority of the Letor project to the Xapian community.

Parth and I talked this over recently, and we concluded that this year a major focus should be on consolidating the existing work. That doesn't necessarily mean that new features can't be looked at, but one of the deliverables should really be a xapian-letor module which we're happy to tag as a stable release. A project which adds more algorithms is interesting, but if the end result isn't useful to Xapian users, there's much less benefit to be had from it.

One of the major things missing is a testsuite. Without any automated tests, it's hard to have much confidence that the code works, and it makes it much harder to make changes to the code in the future without introducing new bugs. So I think adding a testsuite is important. The harness from xapian-core is suitable, but testcases need writing, and the bugs that actually writing testcases will inevitably uncover need fixing.

We should also look at what features are missing from xapian-core which would be useful for xapian-letor, and consider implementing them - especially if they have other potential uses.
Two that I'm aware of are:

* Fundamentally, xapian-letor wants to take a Xapian::MSet object and reorder it, so an API which allows that would be handy - then the output of xapian-letor can be a Xapian::MSet object, allowing it to be cleanly slotted into existing applications using the Xapian API. An MSet reordering API also has other potential uses - for example, clustering results.

* Field-related features currently have to be calculated specially by xapian-letor, but these would also be useful to have for other uses (e.g. implementing BM25f) so tracking them in the database backend in xapian-core is worth investigating.

I'll update the entry on the project ideas page with the above shortly.

Cheers,
    Olly
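
As a rough illustration of the MSet reordering idea in the message above, here is a minimal Python sketch. It does not use the Xapian API: `Hit`, `rerank` and `toy_model` are hypothetical names invented for this example, and the toy scoring function merely stands in for a trained ranker. The only point it tries to show is that the reordered output has the same shape as the input, so code written against the original result list can consume it unchanged.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for one entry of a result set (not a Xapian class)."""
    docid: int
    weight: float  # score assigned by the original retrieval pass


def rerank(hits: Sequence[Hit], score: Callable[[Hit], float]) -> List[Hit]:
    """Return the same hits, rescored and ordered by descending new score."""
    # Replace each weight with the new score so the output looks like the input.
    rescored = [Hit(h.docid, score(h)) for h in hits]
    # sorted() is stable, so hits with equal scores keep their original order.
    return sorted(rescored, key=lambda h: h.weight, reverse=True)


def toy_model(hit: Hit) -> float:
    """Assumed scoring function: boosts even docids, purely for illustration."""
    return hit.weight + (1.0 if hit.docid % 2 == 0 else 0.0)


original = [Hit(1, 2.5), Hit(2, 2.0), Hit(3, 1.5)]
reranked = rerank(original, toy_model)
```

With these toy values, the hit with docid 2 moves ahead of docid 1 because its boosted score (3.0) exceeds 2.5, while the list length and element type stay the same.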