Displaying 5 results from an estimated 5 matches for "docsimcosine".
2016 Aug 17
2
KMeans - Evaluation Results
> How long do 200-300 documents take to cluster? How does it grow as more
> documents are included in the MSet? We'd expect an MSet of 1000 documents
> to take longer to cluster than one with 100, but the important thing is
> _how_ the time increases as the number of documents grows.
>
> Currently, the number of seconds taken for clustering a set of documents
for varying
2016 Aug 17
2
KMeans - Evaluation Results
...ration involving that structure, so I decided to switch to hash maps
> (std::tr1::unordered_map). This considerably reduced the time and made it
> possible to run the code under the profiler in a decent amount of time. I
> noticed that most of the time (about 94%) was spent in
> Xapian::DocSimCosine::similarity. As I was looking through the code, I
> noticed that the inner product computation was wrong and I reimplemented
> it. While doing that, I also changed the implementation to use a hash map
> instead of vector. I also noticed that the TF-IDF score was computed for
> every doc...
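To illustrate the kind of hash-map-based computation described in this snippet, here is a minimal sketch of cosine similarity over sparse term-weight vectors. It is not the actual Xapian::DocSimCosine code: the `TermVector` alias and the `sparse_dot`, `norm`, and `cosine_similarity` helpers are hypothetical names introduced for illustration, and the vectors are assumed to already hold precomputed TF-IDF weights.

```cpp
#include <cmath>
#include <string>
#include <unordered_map>

// Hypothetical sparse document vector: term -> precomputed TF-IDF weight.
using TermVector = std::unordered_map<std::string, double>;

// Inner product of two sparse vectors. Only terms present in both
// vectors contribute, so we walk the shorter map and probe the longer.
double sparse_dot(const TermVector& a, const TermVector& b) {
    const TermVector& shorter = a.size() <= b.size() ? a : b;
    const TermVector& longer = a.size() <= b.size() ? b : a;
    double sum = 0.0;
    for (const auto& entry : shorter) {
        auto it = longer.find(entry.first);
        if (it != longer.end()) sum += entry.second * it->second;
    }
    return sum;
}

// Euclidean (L2) norm of a sparse vector.
double norm(const TermVector& v) {
    double squares = 0.0;
    for (const auto& entry : v) squares += entry.second * entry.second;
    return std::sqrt(squares);
}

// Cosine similarity; defined here as 0 when either vector is empty
// or all-zero, to avoid dividing by zero.
double cosine_similarity(const TermVector& a, const TermVector& b) {
    const double denom = norm(a) * norm(b);
    if (denom == 0.0) return 0.0;
    return sparse_dot(a, b) / denom;
}
```

With average constant-time lookups, the inner product costs time proportional to the size of the smaller vector rather than requiring a scan of both.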
2016 May 05
2
GSoC 2016 - Introduction
...e metrics use TF-IDF weighting with cosine similarity,
which is very similar to the approach I would need for this project. Just
in this case, Euclidean distance would be the metric.
Would it be good to structure it in a way similar to the previous API with
a few changes?
For example, the Xapian::DocSimCosine::similarity() function itself
computes the TF-IDF vectors and then calculates the similarity. Instead, would
it be possible to have a custom weighting scheme subclassing
Xapian::Weight? This would give the user a choice of which
weighting scheme to use to create document vectors in...
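The separation proposed in this snippet, with weighting decoupled from the distance metric, might look roughly like the sketch below. It deliberately does not use Xapian::Weight or any other real Xapian class: `TermWeighting`, `TfIdfWeighting`, `build_vector`, and `euclidean_distance` are hypothetical names, and the TF-IDF formula (tf times log(N/df)) is just one common variant chosen for illustration.

```cpp
#include <cmath>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical types: raw per-term counts, and a weighted sparse vector.
using TermCounts = std::map<std::string, unsigned>;
using TermVector = std::unordered_map<std::string, double>;

// Hypothetical weighting interface, so the vector builder does not
// hard-code any particular scheme.
class TermWeighting {
public:
    virtual ~TermWeighting() = default;
    virtual double weight(unsigned tf, unsigned doc_freq,
                          unsigned num_docs) const = 0;
};

// One possible scheme: tf * log(N / df), with weight 0 for unseen terms.
class TfIdfWeighting : public TermWeighting {
public:
    double weight(unsigned tf, unsigned doc_freq,
                  unsigned num_docs) const override {
        if (doc_freq == 0) return 0.0;
        return tf * std::log(static_cast<double>(num_docs) / doc_freq);
    }
};

// Build a document vector using whichever scheme the caller supplies.
TermVector build_vector(const TermCounts& counts, const TermCounts& doc_freqs,
                        unsigned num_docs, const TermWeighting& scheme) {
    TermVector v;
    for (const auto& entry : counts) {
        auto it = doc_freqs.find(entry.first);
        const unsigned df = (it == doc_freqs.end()) ? 0 : it->second;
        v[entry.first] = scheme.weight(entry.second, df, num_docs);
    }
    return v;
}

// Euclidean distance between sparse vectors; a term missing from one
// vector is treated as having weight 0 there.
double euclidean_distance(const TermVector& a, const TermVector& b) {
    double squares = 0.0;
    for (const auto& entry : a) {
        auto it = b.find(entry.first);
        const double diff = entry.second - (it == b.end() ? 0.0 : it->second);
        squares += diff * diff;
    }
    for (const auto& entry : b) {
        if (a.find(entry.first) == a.end()) squares += entry.second * entry.second;
    }
    return std::sqrt(squares);
}
```

Because the metric only sees finished vectors, swapping cosine similarity for Euclidean distance, or TF-IDF for another scheme, would not require changing the other component.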
2016 May 01
2
GSoC 2016 - Introduction
Before going ahead with the tests as you mentioned above, I would just like
to clarify a few higher level things that I am still in doubt about.
1) As discussed during the IRC interview, it was suggested that I first
implement a standard K-means clustering implementation and then add
the PSO module as functionality that can be used to improve the quality of
clustering, with speed as the trade-off.
2016 Mar 07
2
GSOC-2016 Project : Clustering of search results
On Mon, Mar 07, 2016 at 01:36:43AM +0530, Richhiey Thomas wrote:
> My questions are:
> 1) Can you direct me on how to convert this raw idea into a proposal in
> the context of Xapian with more detail? What areas do I focus on?
Our GSoC guide has an application template
<https://trac.xapian.org/wiki/GSoCApplicationTemplate> which you
should use to structure your proposal. It has some