Richhiey Thomas
2015-Jan-21 12:12 UTC
[Xapian-devel] GSOC-2015 : Clustering of search results
Hello everyone. I sent this mail quite a while ago and am still awaiting a reply. Thanks :)

Looking at the existing approaches, it seems clustering has so far been approached with single-link hierarchical clustering and k-means, which appear to be slow for moderately sized datasets. I would like to propose a density-based clustering technique for Xapian based on DBSCAN or OPTICS, since these approaches can handle clusters of various shapes and sizes and are also resistant to noise. Below are links to the papers on both:

http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD-96.final.frame.pdf
http://fogo.dbs.ifi.lmu.de/Publikationen/Papers/OPTICS.pdf

With good indexing structures, the complexity of the above algorithms is O(n log n), which should be faster and more efficient than the existing single-link and k-means implementations.

Could I know whether this would be a good idea for a project? And if not, how else can I approach this project?
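
To make the DBSCAN idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not Xapian code: the names `dbscan`, `region_query`, `eps` and `min_pts` are my own, the points are plain coordinate tuples with Euclidean distance standing in for document term vectors, and the neighbourhood lookup is a brute-force scan. With that scan the whole run is O(n^2); the O(n log n) figure assumes the scan is replaced by a lookup in a suitable index.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force, O(n))."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also core: keep expanding
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

The point far from both groups ends up labelled as noise rather than being forced into a cluster, which is the noise resistance mentioned above. For real search results the distance would more likely be something like cosine distance between document vectors, but that choice is an assumption on my part.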