Hi,

I am Nikhar Agrawal, currently in my third year at IIIT-H, pursuing Computer Science and Engineering. I am fairly proficient in C++. I was a GSoC 2013 participant for the Boost C++ libraries and successfully merged my project into Boost trunk.

As part of my course on Information Retrieval and Extraction, I did a project on searching for queries on the latest 40 GB Wikipedia dump. Hence, I got pretty excited to see projects on the Xapian ideas page that I could identify with.

To summarize the project: I used libxml++ to parse the wiki dump. I built an index of words (using multi-way merge), with each word's posting list in decreasing order of TF-IDF, and then built a secondary index on top of it for fast retrieval. To search for multiword queries, I used a simple tf-idf ranking system.

I would like to apply for GSoC 2014 as well and Xapian seems a great place to learn more and put in practice the theories I am learning in my Information Retrieval and Extraction course.

How would you suggest I should proceed?

Thanks.
Nikhar
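The pipeline summarized in the message above (a multi-way merge of sorted runs, posting lists kept in decreasing TF-IDF order, and a simple tf-idf score for multiword queries) could look roughly like the Python sketch below. It is an illustration only, not the code from the project: the function names, the (term, doc_id, tf) tuple layout, and the tf * log(N / df) weighting are all assumptions, and the secondary index is left out.

```python
import heapq
import math
from itertools import groupby
from operator import itemgetter


def merge_runs(runs):
    """K-way merge of sorted runs of (term, doc_id, tf) tuples.

    Each run must already be sorted by (term, doc_id), as partial
    indexes written out during a first pass would be (assumed layout).
    Returns {term: [(doc_id, tf), ...]}.
    """
    merged = heapq.merge(*runs)  # lazily merges the sorted inputs
    index = {}
    for term, group in groupby(merged, key=itemgetter(0)):
        index[term] = [(doc_id, tf) for _, doc_id, tf in group]
    return index


def rank_postings(index, num_docs):
    """Weight each posting by tf * log(N / df), sorted by decreasing weight."""
    ranked = {}
    for term, postings in index.items():
        idf = math.log(num_docs / len(postings))
        weighted = [(doc_id, tf * idf) for doc_id, tf in postings]
        ranked[term] = sorted(weighted, key=lambda p: p[1], reverse=True)
    return ranked


def search(ranked, query_terms):
    """Score a multiword query by summing per-term tf-idf weights per document."""
    scores = {}
    for term in query_terms:
        for doc_id, weight in ranked.get(term, []):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda s: (-s[1], s[0]))


if __name__ == "__main__":
    # Three hypothetical sorted runs over a four-document collection.
    runs = [
        [("index", 1, 2), ("search", 1, 1)],
        [("index", 2, 1), ("xapian", 2, 3)],
        [("search", 3, 4)],
    ]
    ranked = rank_postings(merge_runs(runs), num_docs=4)
    print(search(ranked, ["index", "search"]))
```

Storing posting lists by decreasing weight, as the message describes, lets a searcher stop reading a list early once the remaining weights cannot change the top results; this sketch sorts the lists but does not implement that early termination.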
Hi Nikhar,

On Fri, Feb 14, 2014 at 10:19:39PM +0530, Nikhar Agrawal wrote:
> I would like to apply for GSoC 2014 as well and Xapian seems a great place
> to learn more and put in practice the theories I am learning in my
> Information Retrieval and Extraction course.
>
> How would you suggest I should proceed?

If you haven't already, I'd suggest checking out the code from git and getting it to build.

Did you have an idea what you might want to work on? There's a list of suggested project ideas here, but students are also welcome to propose their own projects:

http://trac.xapian.org/wiki/GSoCProjectIdeas

Cheers,
Olly
Hi,

On Sat, Feb 15, 2014 at 2:34 PM, Olly Betts <olly at survex.com> wrote:
> If you haven't already, I'd suggest checking out the code from git and
> getting it to build.

I checked out the code from git and got it to build. I went through the QuickStart guide and built the sample indexer and searcher programs without any problems. Everything seems to be working perfectly. :)

> Did you have an idea what you might want to work on?

Yes, the 'Weighting Schemes' and 'Learning to Rank' projects both seem interesting, with my inclination being more towards 'Weighting Schemes'. Which is the higher priority for Xapian?

What would you like me to do next?

Thanks.
Nikhar