thr3ads.net - similar to: "KMeans

Displaying 20 results from an estimated 9000 matches similar to: "KMeans - Evaluation Results"

2016 Aug 17

KMeans - Evaluation Results

> How long does 200?300 documents take to cluster? How does it grow as more > documents are included in the MSet? We'd expect an MSet of 1000 documents > to take longer to cluster than one with 100, but the important thing is > _how_ the time increases as the number of documents grows. > > Currently, the number of seconds taken for clustering a set of documents for varying

KMeans - Evaluation Results

2016 Aug 17

KMeans - Evaluation Results

On Wed, Aug 17, 2016 at 7:23 PM, James Aylett <james-xapian at tartarus.org> wrote: > >> How long does 200?300 documents take to cluster? How does it grow as > more documents are included in the MSet? We'd expect an MSet of 1000 > documents to take longer to cluster than one with 100, but the important > thing is _how_ the time increases as the number of documents

KMeans - Evaluation Results

2016 Aug 19

KMeans - Evaluation Results

On 18 Aug 2016, at 23:59, Richhiey Thomas <richhiey.thomas at gmail.com> wrote: > I've currently added a few classes which don't really belong to the public API (currently) into private headers and used PIMPL with the Cluster class. I'm having difficulty reading your changes, because you aren't keeping to one complete change per commit. So for instance you've added a

KMeans - Evaluation Results

2016 Aug 17

KMeans - Evaluation Results

I've gone through the link that you sent me and I currently understand how this helps and works to some extent, but I am not too sure of how I should start with converting the current interface to PIMPL design. I'm not used to this design pattern so its taking some time to sink in :) Say I start with the Clusterer class, I create a ClustererImpl class which is the internal class that

K MEANS clustering

2016 Jul 26

K MEANS clustering

Hello, I've been working on the KMeans clustering algorithm recently and since the past week, I have been stuck on a problem which I'm not able to find a solution to. Since we are representing documents as Tf-idf vectors, they are really sparse vectors (a usual corpus can have around 5000 terms). So it gets really difficult to represent these sparse vectors in a way that would be

KMeans - Evaluation Results

2016 Aug 18

KMeans - Evaluation Results

> > > > Actually, you're doing something slightly unusual there: making the > internal member public. Protected would be better, and private is I think > most usual; library clients aren't going to have access to the Internal > class declaration, so they can't call things on it. This means it's > actually difficult right now to subclass Feature. > > I

GSoC 2016 - Introduction

2016 May 01

GSoC 2016 - Introduction

Before going ahead with the tests as you mentioned above, I would just like to clarify a few higher level things that I am still in doubt about. 1) As discussed during the IRC interview, I was suggested about first implementing a normal K-means clustering implementation and then adding on the PSO module as a functionality that can be used to improve quality of clustering for speed as a trade off.

KMeans Clusterer - Going forward

2017 Jun 14

KMeans Clusterer - Going forward

Hello, I have finished moving the API to PIMPL classes and will fix issues within the current code over the next week, based on reviews from mentors. The next step going forward is to start with forming document vectors that are reduced and more useful. This majorly helps in saving run time (since time for distance calculation depends on number of terms). Getting the useful terms within a

GSoC 2016 - Introduction

2016 Apr 25

GSoC 2016 - Introduction

Hello devs, My name is Richhiey Thomas.and I've been selected for GSoC 2016 for the project Clustering of Search Results. I would like to thank the Xapian GSoC admin's for giving me this opportunity and James and Olly to help me with my first merge request. In the next two to three days, I'll critically examine all the aspects of the project that I could have any doubts in and clear

GSOC-2016 Project : Clustering of search results

2016 Mar 12

GSOC-2016 Project : Clustering of search results

On Sat, Mar 12, 2016 at 04:27:55PM +0530, Richhiey Thomas wrote: > Below I write a raw version of my proposal for Clustering of Search Results > based on our previous mails. Hi, Richhiey. Thanks for putting this together ahead of the formal start of applications, and sharing it with us -- and it's really not too long! Project proposals for something that will last the summer are

GSOC-2016 Project : Clustering of search results

2016 Mar 07

GSOC-2016 Project : Clustering of search results

On Mon, Mar 07, 2016 at 01:36:43AM +0530, Richhiey Thomas wrote: > My questions are: > 1) Can you direct me on how to convert this raw idea into a proposal in > context to Xapian with more detail? What areas do I focus on? Our GSoC guide has an application template <https://trac.xapian.org/wiki/GSoCApplicationTemplate> which you should use to structure your proposal. It has some

Otpmial initial centroid in kmeans

2008 Jul 03

Otpmial initial centroid in kmeans

Helo there. I am using kmeans of base package to cluster my customers. As the results of kmeans is dependent on the initial centroid, may I know: 1) how can we specify the centroid in the R function? (I don't want random starting pt) 2) how to determine the optimal (if not, a good) centroid to start with? (I am not after the fixed seed solution as it only ensure that the

GSOC-2016 Project : Clustering of search results

2016 Mar 14

GSOC-2016 Project : Clustering of search results

On Mon, Mar 14, 2016 at 02:09:13AM +0530, Richhiey Thomas wrote: > The way the paper has been written I guess is the main source of your > confusion. Let me provide a paper that explains this same concept in a way > that is easier to understand. I was confused by eq (3) that you mentioned > too. Here it is : > http://www.sau.ac.in/~vivek/softcomp/clustering%20PSO+K-means.pdf Ah,

GSoC 2016 - Introduction

2016 May 05

GSoC 2016 - Introduction

Hello, Thanks James for the reply. That cleared a few things out. Apologies for replying late because of exams going on. I was going through the previous clustering API to understand how it worked and it seems like the the approach for construction of the termlists which are used for distance metrics use TF-IDF weighting with cosine similarity, which is very similar to the approach I would need

kmeans and incom,plete distance matrix concern

2006 Aug 07

kmeans and incom,plete distance matrix concern

Hi there I have been using R to perform kmeans on a dataset. The data is fed in using read.table and then a matrix (x) is created i.e: [ mat <- matrix(0, nlevels(DF$V1), nlevels(DF$V2), dimnames = list(levels(DF$V1), levels(DF$V2))) mat[cbind(DF$V1, DF$V2)] <- DF$V3 This matrix is then taken and a distance matrix (y) created using dist() before performing the kmeans clustering. My query

GSoC 2017 Project Proposal

2017 Mar 09

GSoC 2017 Project Proposal

Hello devs. I would like to propose how I plan to go about improving and getting a system that can be integrated into Xapian in this GSoC for the clustering branch. I have identified three areas of work which were not touched last time. 1) Automated Performance Analysis I had roughly implemented 2 evaluation techniques previously (Distance b/w document and centroids within clusters and

distance in the function kmeans

2004 May 28

distance in the function kmeans

Hi, I want to know which distance is using in the function kmeans and if we can change this distance. Indeed, in the function pam, we can put a distance matrix in parameter (by the line "pam<-pam(dist(matrixdata),k=7)" ) but we can't do it in the function kmeans, we have to put the matrix of data directly ... Thanks in advance, Nicolas BOUGET

Using kmeans given cluster centroids and data with NAs

2005 Mar 31

Using kmeans given cluster centroids and data with NAs

Hello, I have used the functions agnes and cutree to cluster my data (4977 objects x 22 variables) into 8 clusters. I would like to refine the solution using a k-means or similar algorithm, setting the initial cluster centres as the group means from agnes. However my data matrix has NA's in it and the function kmeans does not appear to accept this? > dim(centres) [1] 8 22 > dim(data)

GSOC-2016 Project : Clustering of search results

2016 Mar 06

GSOC-2016 Project : Clustering of search results

On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org> wrote: > On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote: > > K-Means or something related certainly seems like a viable approach, > so what you'll need to do is to come up with a proposal of how you'd > implement this in Xapian (either with reference to the previous work, >

Empty cluster / segfault using vanilla kmeans with version 2.15.2

2013 Mar 13

Empty cluster / segfault using vanilla kmeans with version 2.15.2

Hello, here is a working reproducible example which crashes R using kmeans or gives empty clusters using the nstart option with R 15.2. library(cluster) kmeans(ruspini,4) kmeans(ruspini,4,nstart=2) kmeans(ruspini,4,nstart=4) kmeans(ruspini,4,nstart=10) ?kmeans either we got empty always clusters and or, after some further commands an segfault. regards, Detlef Groth ------------ [R] Empty

similar to: KMeans - Evaluation Results