Shreedhar Pawar
2014-Mar-18 14:49 UTC
[Xapian-devel] Considering Parallel computing for Letor
Hi everyone,

My name is Shreedhar Pawar. I have already introduced myself on Xapian-discuss.

I feel that Xapian's search and Letor algorithms can be sped up using parallel computing. Techniques like map-reduce, compact-and-split, radix sort, scan, parallel hashing and others could be used to speed up the learning algorithms as well as the search. The support vector machine in the Letor code involves heavy computation at the training stage, which could also be sped up with parallel computing. Should I consider this for my proposal? In other words, is there any chance that we would implement Letor in a parallel way? OpenCL is an open standard for parallel programming that can be used from C/C++, and there is plenty of help available on the forums.

Cheers...!
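To make the OpenCL suggestion a little more concrete, below is a minimal host-side sketch in C/C++ that offloads a toy feature-scaling step to whichever OpenCL device is available. The scale_features kernel, the flat feature array and the weight value are illustrative assumptions only; none of this is taken from Xapian or xapian-letor, and error handling is kept to a bare minimum.

    // Minimal OpenCL host sketch: scale a block of feature values on the device.
    // Purely illustrative; not part of any Xapian code.
    #include <CL/cl.h>
    #include <cstdio>
    #include <vector>

    static const char *kSource =
        "__kernel void scale_features(__global float *f, const float w) {\n"
        "    size_t i = get_global_id(0);\n"
        "    f[i] = f[i] * w;\n"
        "}\n";

    int main() {
        std::vector<float> features(1024, 1.0f);  // stand-in for extracted feature values
        const size_t bytes = features.size() * sizeof(float);

        cl_platform_id platform;
        cl_device_id device;
        cl_int err;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
        cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

        cl_program program = clCreateProgramWithSource(ctx, 1, &kSource, NULL, &err);
        clBuildProgram(program, 1, &device, NULL, NULL, NULL);
        cl_kernel kernel = clCreateKernel(program, "scale_features", &err);

        // Copy the feature block to the device and run one work-item per value.
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    bytes, features.data(), &err);
        float weight = 0.5f;
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
        clSetKernelArg(kernel, 1, sizeof(float), &weight);
        size_t global = features.size();
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, bytes, features.data(), 0, NULL, NULL);

        std::printf("features[0] = %f\n", features[0]);

        clReleaseMemObject(buf);
        clReleaseKernel(kernel);
        clReleaseProgram(program);
        clReleaseCommandQueue(queue);
        clReleaseContext(ctx);
        return 0;
    }

Note that even this tiny example shows how much host-side setup OpenCL needs before any work runs on the device, which is part of why the discussion below turns to CPU-level parallelism.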
Hi Shreedhar,

Right now xapian-letor has three ranking algorithms: svmranker, listmle and listnet. Svmranker uses libsvm in the background to train the model, which is quite efficient. The listmle and listnet algorithms are not parallelised.

Parallel computing is definitely of interest to the xapian-letor project, but as you might have noticed from this year's project ideas page, this year's focus is on sorting out the existing API first and then adding a feature selection algorithm on top of it. We would welcome your proposal, and it could cover sorting out the Letor API and then making it parallel on top of that (replacing the feature selection part). You should also define the scope of your project well: whether you want to make the training functions parallel, the feature extraction parallel, and how you want to achieve it; a sketch of one possible threading pattern follows below.

I strongly recommend that you fork the existing branch from https://github.com/parthg/xapian, work out what you want to change and where, and include those details in the proposal.

Cheers,
Parth.
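As one way of scoping the "feature extraction parallel" part, here is a minimal sketch of the threading pattern using plain std::thread. The compute_features() function and FeatureVector type are hypothetical placeholders, not the real xapian-letor API; the point is only that each worker handles a disjoint set of documents, so no locking is needed.

    // Sketch: compute per-document feature vectors in parallel with std::thread.
    // compute_features() and FeatureVector are hypothetical placeholders for
    // whatever the reworked xapian-letor API ends up exposing.
    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <string>
    #include <thread>
    #include <vector>

    struct FeatureVector { std::vector<double> values; };

    static FeatureVector compute_features(const std::string &doc_data) {
        // Placeholder: real code would use query/document statistics.
        return FeatureVector{{ static_cast<double>(doc_data.size()) }};
    }

    static std::vector<FeatureVector>
    extract_features_parallel(const std::vector<std::string> &docs) {
        std::vector<FeatureVector> out(docs.size());
        unsigned n_threads = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::thread> workers;

        // Each worker takes every n_threads-th document, so the workers write
        // to disjoint slots of `out` and no locking is needed.
        for (unsigned t = 0; t < n_threads; ++t) {
            workers.emplace_back([&, t]() {
                for (std::size_t i = t; i < docs.size(); i += n_threads)
                    out[i] = compute_features(docs[i]);
            });
        }
        for (auto &w : workers) w.join();
        return out;
    }

    int main() {
        std::vector<std::string> docs = { "first document", "second", "third one" };
        std::vector<FeatureVector> features = extract_features_parallel(docs);
        std::printf("extracted %zu feature vectors\n", features.size());
        return 0;
    }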
On Wed, Mar 19, 2014 at 10:51:24AM +0100, Parth Gupta wrote:
> Parallel computing is definitely of interest to the xapian-letor project,
> but as you might have noticed from this year's project ideas page, this
> year's focus is on sorting out the existing API first and then adding a
> feature selection algorithm on top of it.

Yeah, the focus here really does need to be on producing a usable xapian-letor module. A super-fast, ultra-parallel module with the same API issues, thin documentation and complete lack of test coverage would not be a step forwards.

> We would welcome your proposal, and it could cover sorting out the Letor
> API and then making it parallel on top of that (replacing the feature
> selection part).

If you can fit a project that makes a usable xapian-letor and then works on making it more parallel into 13 weeks, I'm also OK with that. As I said on IRC earlier, I think exploiting multiple cores would be more widely useful than GPU work: almost everything new has multiple cores these days, but servers and VMs don't generally have a powerful GPU available.

Cheers,
Olly
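To illustrate the multi-core point: listwise trainers such as listnet compute a gradient per query and then sum the per-query gradients, so that per-query work can be farmed out with std::async and reduced on the main thread. The Query type and query_gradient() below are hypothetical placeholders rather than the existing listnet code; this is only a sketch of the pattern under those assumptions.

    // Sketch: per-query gradient computation spread across CPU cores with
    // std::async, then summed into a single gradient for the weight update.
    // Query and query_gradient() are hypothetical placeholders.
    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <future>
    #include <vector>

    struct Query { std::vector<std::vector<double>> doc_features; };

    static std::vector<double>
    query_gradient(const Query &q, const std::vector<double> &weights) {
        // Placeholder: a real listwise trainer would compute the gradient of
        // the listwise loss over this query's document list.
        (void)q;
        return std::vector<double>(weights.size(), 0.0);
    }

    static std::vector<double>
    batch_gradient(const std::vector<Query> &queries,
                   const std::vector<double> &weights) {
        std::vector<std::future<std::vector<double>>> parts;
        for (const Query &q : queries)
            parts.push_back(std::async(std::launch::async, query_gradient,
                                       std::cref(q), std::cref(weights)));

        // Reduce the per-query gradients into one gradient vector.
        std::vector<double> grad(weights.size(), 0.0);
        for (auto &p : parts) {
            std::vector<double> g = p.get();
            for (std::size_t i = 0; i < grad.size(); ++i) grad[i] += g[i];
        }
        return grad;
    }

    int main() {
        std::vector<Query> queries(8);
        std::vector<double> weights(10, 0.0);
        std::vector<double> grad = batch_gradient(queries, weights);
        std::printf("gradient has %zu components\n", grad.size());
        return 0;
    }

This kind of query-level parallelism only needs the standard library and works on any multi-core server or VM, which fits the point above about multiple cores being more widely available than GPUs.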