Displaying 2 results from an estimated 2 matches for "present_docu".

2016 Aug 17
2
KMeans - Evaluation Results
> How long does 200-300 documents take to cluster? How does it grow as more > documents are included in the MSet? We'd expect an MSet of 1000 documents > to take longer to cluster than one with 100, but the important thing is > _how_ the time increases as the number of documents grows. > > Currently, the number of seconds taken for clustering a set of documents for varying
2016 Aug 17
2
KMeans - Evaluation Results
...> it. While doing that, I also changed the implementation to use a hash map > instead of a vector. I also noticed that the TF-IDF score was computed for > every document whenever the similarity was called. I then changed the > interface for Xapian::DocSimCosine by adding a new method, > present_document, which would allow the scores to be precomputed. > > Yes, that's right. I've stored the TF-IDF weights of terms in a hash map and even precomputed the values for documents, since they do not change. It's only the centroid values which change. As the number of documents increases, the number of...