Displaying 4 results from an estimated 4 matches for "documentset".

2016 May 05
2
GSoC 2016 - Introduction
Hello, Thanks James for the reply. That cleared a few things up. Apologies for replying late because of exams going on. I was going through the previous clustering API to understand how it worked and it seems like the approach for construction of the termlists which are used for distance metrics uses TF-IDF weighting with cosine similarity, which is very similar to the approach I would need
2016 Aug 17
2
KMeans - Evaluation Results
> How long does 200-300 documents take to cluster? How does it grow as more > documents are included in the MSet? We'd expect an MSet of 1000 documents > to take longer to cluster than one with 100, but the important thing is > _how_ the time increases as the number of documents grows. > > Currently, the number of seconds taken for clustering a set of documents for varying
2016 Aug 17
2
KMeans - Evaluation Results
...Then gradually move through the > classes that are used by its public API (such as Cluster). > > Note that this means you have to complete the refactor to introduce a > Clusterer class first. (Or maybe 'ClusteringAlgorithm' might be clearer.) > > I'm not sure you need DocumentSet as you've implemented things (just > provide iterators over the documents in the Cluster's API, and put the > vector<Document> directly in the Cluster). That should make things slightly > easier. > > I have two questions about this since I haven't worked with PIMP...
2009 Apr 12
2
Indexing speed benchmark - Xapian, Solr
I came across this benchmark between Xapian & Solr: http://www.anur.ag/blog/2009/03/xapian-and-solr/ According to the benchmark, a doc set that took Solr 34 min to index took Xapian 7 hours. Solr's index is also much smaller - 2.5GB to Xapian's 8.9GB. I'm new to Xapian. Just wondering if results like these are typical? Is indexing speed & size a known issue in Xapian? Or is