Hi Xapian Developers,
I am Dhruv, majoring in Mathematics & Computing at the Indian
Institute of Technology Guwahati, India. I went through your ideas page
and found a few ideas that caught my interest. After doing a bit of
research on those ideas, I feel that "Clustering of Search Results"
and "Weighting Schemes" are the ones that I would like to contribute to,
as they fit my profile well.
A little bit about my background: I have strong software engineering skills
with 3 years of commercial C++ and C experience. I have also worked with MPI
and have 2 years of parallel programming experience in CUDA*. I have been
studying Machine Learning for the past 3 years and have implemented quite a
few advanced techniques in C++ and CUDA, such as Q-learning with a
convolutional neural network as the Q-value estimator, run in parallel [1].
Also, I would like to mention that last year I participated in Google Summer
of Code 2014 and worked on the project "Real Time Vectorization of Brain
Atlases", mentored by INCF. The project repository is at:
https://github.com/INCF/Vectorization-of-brain-atlases
According to my understanding of the above projects, I feel that a parallel
clustering algorithm like PDBSCAN [2] would be well suited to "Clustering
of Search Results" in terms of time complexity and should provide a
significant speed-up. However, the code need not depend on the
availability of multiple processors; instead, we can have general,
well-structured code that is capable of taking advantage of whatever
processors (and even GPUs) are available. *What do you think?* Also, it's
not at all mandatory to implement (only) a density-based clustering
algorithm; we may have multiple other parallel clustering schemes [3] in our
project, but the one that can feasibly provide the highest speed-up should
be identified and implemented first. As in my GSoC project last year, where
my mentors and I discovered a new bitmap vectorization algorithm, we may
come up with a new parallel algorithm for faster clustering.
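To make the point about not depending on multiple processors concrete, here
is a minimal sketch under stated assumptions. It is not PDBSCAN and not Xapian
code: the Point type, the count_neighbours function and its num_threads
parameter are hypothetical names chosen for illustration. It shows only the
brute-force neighbourhood query that density-based clustering repeats for each
point, split across however many threads the machine reports, with a plain
serial path when only one is available.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical 2-D point type for illustration; not part of Xapian.
struct Point { double x, y; };

// For every point, count how many points lie within distance eps
// (the point itself included). num_threads == 0 means "use whatever
// the hardware reports".
std::vector<std::size_t> count_neighbours(const std::vector<Point>& pts,
                                          double eps,
                                          unsigned num_threads = 0) {
    assert(eps >= 0.0);
    if (num_threads == 0) {
        // hardware_concurrency() may return 0 when the count is unknown.
        num_threads = std::max(1u, std::thread::hardware_concurrency());
    }
    const std::size_t n = pts.size();
    std::vector<std::size_t> counts(n, 0);
    const double eps2 = eps * eps;

    // Each worker fills a disjoint slice of counts, so no locking is needed.
    auto worker = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                const double dx = pts[i].x - pts[j].x;
                const double dy = pts[i].y - pts[j].y;
                if (dx * dx + dy * dy <= eps2) ++counts[i];
            }
        }
    };

    if (num_threads == 1 || n < 2) {
        worker(0, n);  // serial path: same code, no threads created
        return counts;
    }

    std::vector<std::thread> threads;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        threads.emplace_back(worker, begin, std::min(n, begin + chunk));
    }
    for (auto& t : threads) t.join();
    return counts;
}
```

The same worker runs in both paths, so the result does not depend on the
thread count; only the running time does.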
For "Weighting Schemes", I believe that the implementation itself would not
be much of a problem; instead, implementing each scheme correctly according
to its mathematical formula will be the main concern.
I would love to contribute to the above-mentioned projects with my full
dedication and would be glad to discuss them further.
Thanks & Best regards,
Dhruv
Link to my GitHub profile: https://github.com/chiggum
References:
[1] http://www.cs.toronto.edu/~vmnih/docs/dqn.pdf (my project repo is
    private).
[2] http://www.cs.gsu.edu/~wkim/index_files/papers/fastParallel_XU.pdf
[3] http://www.cs.gsu.edu/~wkim/index_files/SurveyParallelClustering.pdf
*Our team won the first CUDA challenge in India, organized by Nvidia in 2014.