search for: docsimcosine

Displaying 5 results from an estimated 5 matches for "docsimcosine".

2016 Aug 17
2
KMeans - Evaluation Results
> How long do 200-300 documents take to cluster? How does it grow as more > documents are included in the MSet? We'd expect an MSet of 1000 documents > to take longer to cluster than one with 100, but the important thing is > _how_ the time increases as the number of documents grows. > > Currently, the number of seconds taken for clustering a set of documents for varying
2016 Aug 17
2
KMeans - Evaluation Results
...ration involving that structure, so I decided to switch to hash maps > (std::tr1::unordered_map). This considerably reduced the time and made it > possible to run the code under the profiler in a decent amount of time. I > noticed that most of the time (about 94%) was spent in > Xapian::DocSimCosine::similarity. As I was looking through the code, I > noticed that the inner product computation was wrong and I reimplemented > it. While doing that, I also changed the implementation to use a hash map > instead of vector. I also noticed that the TF-IDF score was computed for > every doc...
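A note on the snippet above: the hash-map inner product it mentions can be sketched roughly as follows. This is only an illustration, not the code from the thread and not Xapian's DocSimCosine. The TermVector alias and the function names are hypothetical, the weights are assumed to be precomputed (for example TF-IDF), and std::unordered_map stands in for std::tr1::unordered_map.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <unordered_map>

// Hypothetical sparse document vector: term -> weight (e.g. a TF-IDF score).
using TermVector = std::unordered_map<std::string, double>;

// Inner product of two sparse vectors. Only terms present in both
// documents contribute, so we walk the shorter map and probe the longer one.
double inner_product(const TermVector& a, const TermVector& b) {
    const TermVector& shorter = a.size() <= b.size() ? a : b;
    const TermVector& longer = a.size() <= b.size() ? b : a;
    double sum = 0.0;
    for (const auto& entry : shorter) {
        auto it = longer.find(entry.first);
        if (it != longer.end()) sum += entry.second * it->second;
    }
    return sum;
}

// Euclidean length of a sparse vector.
double norm(const TermVector& v) {
    double squares = 0.0;
    for (const auto& entry : v) squares += entry.second * entry.second;
    return std::sqrt(squares);
}

// Cosine similarity; defined here as 0 when either vector is empty or all-zero.
double cosine_similarity(const TermVector& a, const TermVector& b) {
    const double denom = norm(a) * norm(b);
    if (denom == 0.0) return 0.0;
    return inner_product(a, b) / denom;
}
```

Walking the shorter map keeps the cost proportional to the smaller document, with an average constant-time lookup per term. Computing each document's weights once and reusing them across comparisons would address the repeated TF-IDF computation the snippet goes on to mention.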
2016 May 05
2
GSoC 2016 - Introduction
...e metrics use TF-IDF weighting with cosine similarity, which is very similar to the approach I would need for this project. Just in this case, Euclidean distance would be the metric. Would it be good to structure it in a way similar to the previous API with a few changes? For example, the Xapian::DocSimCosine::similarity() function itself calculates the TF-IDF vectors and then the similarity. Instead, would it be possible to have a custom weighting scheme subclassing Xapian::Weight? This can help in providing the user an option about which weighting scheme to use to create document vectors in...
2016 May 01
2
GSoC 2016 - Introduction
Before going ahead with the tests as you mentioned above, I would just like to clarify a few higher-level things that I am still in doubt about. 1) As discussed during the IRC interview, it was suggested that I first implement a standard K-means clustering and then add the PSO module as an option that can improve clustering quality at the cost of speed.
2016 Mar 07
2
GSOC-2016 Project : Clustering of search results
On Mon, Mar 07, 2016 at 01:36:43AM +0530, Richhiey Thomas wrote: > My questions are: > 1) Can you direct me on how to convert this raw idea into a proposal in > context to Xapian with more detail? What areas do I focus on? Our GSoC guide has an application template <https://trac.xapian.org/wiki/GSoCApplicationTemplate> which you should use to structure your proposal. It has some