Hi,

I am currently pursuing my bachelor's degree in computing science at the University of Alberta, Canada. My specialities lie in information retrieval, machine learning and data mining. In order to get hands-on experience with real-world information retrieval systems, I would like to contribute to the Xapian project.

I have been going through some of the project ideas at https://trac.xapian.org/wiki/GSoCProjectIdeas. I am interested in the project "Clustering of Search Results", since I also have some experience with clustering in machine learning. Would you be able to let me know the status of this project?

Once I get familiar with its codebase and the current system status, I can think of possible extensions and ways of improving it. Feel free to share any directions that you think are preferable. If you have any questions, feel free to ask.

Wish you all a delightful weekend!

Cheers,
Touqir
Hi Touqir,

On Sat, Oct 01, 2016 at 04:47:41PM -0600, Touqir Sajed wrote:
> I am currently pursuing my computing science bachelors degree at
> university of Alberta, Canada. My speciality lie in Information
> retrieval, machine learning and data mining. In order to get hands on
> experience with real world information retrieval systems, I would like
> to contribute to the Xapian project. I have been going through some of
> the project ideas in https://trac.xapian.org/wiki/GSoCProjectIdeas. I
> am interested on the project "Clustering of Search Results" since I
> also have some experience with clustering in machine learning. Would
> you be able to let me know the status of this project??

Richhiey worked on it as a project for this GSoC. I followed progress but not in great detail, so I can tell you there's a pull request which needs some more work, but I can't tell you a lot more than that off the top of my head.

Here is the PR:

https://github.com/xapian/xapian/pull/122

Richhiey or James can probably give a more useful summary of where things are at.

> Once I get
> familiar with its codebase and the current system status, I can think
> of possible extensions and ways of improving it. Feel free to share
> any directions that you think is preferable.

At this point, it's really more of a priority to get the existing code finished off and merged. You can certainly think about ways to extend it, but I'd like to not get distracted from the merge.

So maybe you'd be best off finding something else to look at first? A smaller project would probably be a better way to get to grips with the codebase anyway.

Cheers,
Olly
On 3 Oct 2016, at 07:36, Olly Betts <olly at survex.com> wrote:
> https://github.com/xapian/xapian/pull/122
>
> Richhiey or James can probably give a more useful summary of where things
> are at.

From memory, we're getting close to merging this PR into its own branch, but Richhiey & I (and anyone else interested) would then have to discuss whether that's a suitable point to merge into master, or if there's more work needed to present something that's a useful new (experimental, for now) feature. I've been somewhat distracted from this by a number of other issues lately, unfortunately.

As Olly says, it's probably best to do some small pieces of work to get familiar with the codebase. Our guide for GSoC students should be helpful even to people outside GSoC (https://trac.xapian.org/wiki/GSoC%20Guide), and our list of bite-size project ideas (https://trac.xapian.org/wiki/ProjectIdeas#BiteSize) is a good place to look for something to work on that should take a few days rather than weeks or longer. (That list probably needs a quick check against open PRs, since a couple of them have work in progress from earlier this year.)

J

--
James Aylett
devfort.com — spacelog.org — tartarus.org/james/